You've got 500 documents. A client asks for "everything related to the Q3 budget." A flat search engine returns 200 results. Your knowledge graph returns 12—the ones that actually matter.
That's the difference between a database and a graph.
AiFiler's knowledge graph isn't a trendy buzzword layer on top of vector embeddings. It's the structural foundation of how we organize and retrieve information. And the architecture behind it solves a problem most document tools don't even acknowledge: finding the right documents is harder than storing them.
The Problem We Solved
Traditional document management treats files as isolated objects. You search by filename, metadata, or full-text keywords. The connections between documents—"this contract references that statement of work," "this email thread discusses that proposal"—are invisible to the system.
Vector databases helped. You could embed documents and find semantic neighbors. But semantic similarity isn't the same as semantic relevance. A document about "budget forecasting" might be semantically similar to "Q3 financial planning," but if you're looking for "documents that impact the Q3 budget decision," you need to know why they're connected.
We needed a graph.
The Eight Edge Types
AiFiler's knowledge graph uses eight distinct edge types. Each represents a different kind of relationship:
- References — Document A cites or quotes Document B
- Responds To — Document A is a reply to or addresses Document B
- Implements — Document A puts into practice what Document B specifies
- Contradicts — Document A conflicts with or disagrees with Document B
- Extends — Document A builds on or expands Document B
- Precedes — Document A must happen before Document B (temporal dependency)
- Tags — Document A is labeled or categorized by Document B (metadata relationship)
- Mentions — Document A discusses or references Document B (weak semantic link)
Why eight, not three or thirty? We started with twelve. We tested with five. Eight is the sweet spot: specific enough to be useful, general enough to apply across different document types (contracts, emails, spreadsheets, presentations) without requiring manual tagging.
Each edge is directional and weighted. A "References" edge from Contract A to Clause B carries a strength score from 0.0 to 1.0, based on how many times Contract A references Clause B and how central those references are to Contract A's meaning. This matters: a passing mention scores around 0.3, while a foundational reference scores around 0.9.
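To make the edge model concrete, here is a minimal sketch of a directional, weighted edge. Only the eight edge types and the 0.0 to 1.0 weight range come from this article; the identifiers (`EDGE_TYPES`, `Edge`, `makeEdge`) and the document IDs are hypothetical and are not AiFiler's actual schema.

```typescript
// Hypothetical names throughout. Only the eight edge types and the
// 0.0 to 1.0 weight range are taken from the article.
const EDGE_TYPES = [
  "references",
  "respondsTo",
  "implements",
  "contradicts",
  "extends",
  "precedes",
  "tags",
  "mentions",
] as const;

type EdgeType = (typeof EDGE_TYPES)[number];

interface Edge {
  source: string; // ID of the document the edge starts from
  target: string; // ID of the document the edge points to
  type: EdgeType;
  weight: number; // strength score in [0.0, 1.0]
}

// Build an edge, rejecting weights outside the documented range.
function makeEdge(source: string, target: string, type: EdgeType, weight: number): Edge {
  if (!isFinite(weight) || weight < 0 || weight > 1) {
    throw new RangeError(`weight must be between 0.0 and 1.0, got ${weight}`);
  }
  return { source, target, type, weight };
}

// A foundational reference and a passing mention, using the example scores above.
const foundational = makeEdge("contract-a", "clause-b", "references", 0.9);
const passing = makeEdge("contract-a", "memo-c", "mentions", 0.3);
```

Because the edge is directional, `source` and `target` are not interchangeable: an edge from A to B says nothing about an edge from B to A.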
How We Extract Edges
The extraction happens in two phases:
Phase 1: Structural Parsing happens when you upload a document. AiFiler's file parser (lib/ingest/parseFile.ts) extracts citations, cross-references, and metadata relationships. For a contract, this means identifying clause references. For an email thread, it means linking replies to originals. For a spreadsheet, it means finding formula dependencies.
This is deterministic. No AI involved. A reference is a reference.
Phase 2: Semantic Inference runs asynchronously via Claude. The AI reads the document and identifies relationships that aren't explicitly marked: "This proposal contradicts the approach outlined in the previous proposal." "This implementation plan extends the framework described in the design doc."
The AI doesn't invent edge types. It's constrained to the eight above, and it assigns each inferred edge a weight based on specificity and confidence. A high-confidence contradiction gets 0.85; a tentative mention gets 0.2.
Both phases write to the same graph structure. The result: a document isn't just stored; it's positioned relative to every other document in your workspace.
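One way to picture both phases feeding a single structure is sketched below. It assumes the semantic phase hands back loosely typed proposals that must be validated before they are stored. The `origin` field, the `acceptInferredEdge` helper, and the clamping of weights are all assumptions made for illustration, not a description of the real pipeline.

```typescript
// Hypothetical sketch: structural and inferred edges share one list,
// and inferred edges are checked against the eight allowed types.
const EDGE_TYPES = [
  "references", "respondsTo", "implements", "contradicts",
  "extends", "precedes", "tags", "mentions",
] as const;

type EdgeType = (typeof EDGE_TYPES)[number];
type Origin = "structural" | "semantic";

interface Edge {
  source: string;
  target: string;
  type: EdgeType;
  weight: number;
  origin: Origin; // which phase produced the edge (assumed field)
}

function isEdgeType(value: unknown): value is EdgeType {
  return typeof value === "string" && (EDGE_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Validate one edge proposed by the semantic phase. Returns null for
// anything outside the eight types or without a usable target or weight.
function acceptInferredEdge(
  source: string,
  raw: { type?: unknown; target?: unknown; weight?: unknown },
): Edge | null {
  const { type, target, weight } = raw;
  if (!isEdgeType(type)) return null;
  if (typeof target !== "string" || target.length === 0) return null;
  if (typeof weight !== "number" || !isFinite(weight)) return null;
  const clamped = Math.min(1, Math.max(0, weight)); // keep weight in [0, 1]
  return { source, target, type, weight: clamped, origin: "semantic" };
}

// Phase 1 writes a deterministic edge found by parsing.
const graph: Edge[] = [
  { source: "plan", target: "design-doc", type: "references", weight: 0.9, origin: "structural" },
];

// Phase 2 proposes edges; the second one uses an unknown type and is dropped.
const proposals = [
  { type: "contradicts", target: "proposal-1", weight: 0.85 },
  { type: "supersedes", target: "proposal-1", weight: 0.9 },
];
for (const proposal of proposals) {
  const edge = acceptInferredEdge("proposal-2", proposal);
  if (edge !== null) graph.push(edge);
}
```

Recording the origin on each edge keeps the two phases distinguishable later, which is useful if inferred edges are to be trusted differently from parsed ones.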
The Query Architecture
Here's where latency matters.
When you search for "Q3 budget," AiFiler doesn't just find documents with those words. The Universal Command (lib/intelligence/universalRouter.ts) routes your query to the graph engine, which:
- Seeds the graph with direct matches (documents containing "Q3 budget")
- Expands one hop outward along high-confidence edges (documents referenced by or referencing the seeds)
- Ranks results by edge weight, recency, and access frequency
- Filters using your workspace context (permissions, project tags, date ranges)
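The four steps above can be sketched as a single function. This is a simplified illustration under stated assumptions: the substring match used for seeding, the `minWeight` threshold for "high-confidence" edges, and the 0.6 / 0.2 / 0.2 scoring blend are all invented for the example, as are the names `search`, `Doc`, and `QueryOptions`.

```typescript
interface Doc {
  id: string;
  text: string;
  updatedAt: number; // Unix epoch milliseconds
  accessCount: number;
}

interface Edge {
  source: string;
  target: string;
  weight: number;
}

interface QueryOptions {
  minWeight: number; // assumed threshold for a "high-confidence" edge
  now: number; // current time, passed in so results are reproducible
  allowed: (doc: Doc) => boolean; // stand-in for workspace context filters
}

const DAY_MS = 24 * 60 * 60 * 1000;

function search(query: string, docs: Doc[], edges: Edge[], opts: QueryOptions): string[] {
  const needle = query.toLowerCase();

  // 1. Seed: direct text matches get full relevance.
  const relevance = new Map<string, number>();
  docs.forEach((d) => {
    if (d.text.toLowerCase().indexOf(needle) !== -1) relevance.set(d.id, 1);
  });
  const seedIds = docs.filter((d) => relevance.has(d.id)).map((d) => d.id);

  // 2. Expand: one hop from a seed, in either direction, along strong edges.
  const reach = (from: string, to: string, weight: number): void => {
    if (seedIds.indexOf(from) === -1 || seedIds.indexOf(to) !== -1) return;
    relevance.set(to, Math.max(relevance.get(to) || 0, weight));
  };
  edges.forEach((e) => {
    if (e.weight < opts.minWeight) return;
    reach(e.source, e.target, e.weight);
    reach(e.target, e.source, e.weight);
  });

  // 3. Rank: blend edge weight, recency, and access frequency (assumed blend).
  const maxAccess = docs.reduce((m, d) => Math.max(m, d.accessCount), 1);
  const scored = docs
    .filter((d) => relevance.has(d.id))
    .map((d) => {
      const ageDays = Math.max(0, opts.now - d.updatedAt) / DAY_MS;
      const score =
        0.6 * (relevance.get(d.id) || 0) +
        0.2 / (1 + ageDays) +
        0.2 * (d.accessCount / maxAccess);
      return { doc: d, score };
    })
    .sort((a, b) => b.score - a.score);

  // 4. Filter: apply workspace context last.
  return scored.filter((s) => opts.allowed(s.doc)).map((s) => s.doc.id);
}

const now = Date.UTC(2024, 0, 1);
const docs: Doc[] = [
  { id: "budget", text: "Q3 budget draft", updatedAt: now, accessCount: 5 },
  { id: "contract", text: "Vendor contract", updatedAt: now, accessCount: 5 },
  { id: "memo", text: "Lunch memo", updatedAt: now, accessCount: 5 },
  { id: "restricted", text: "Q3 budget salaries", updatedAt: now, accessCount: 5 },
];
const edges: Edge[] = [
  { source: "contract", target: "budget", weight: 0.9 }, // strong: followed
  { source: "budget", target: "memo", weight: 0.2 }, // weak: ignored
];
const results = search("Q3 budget", docs, edges, {
  minWeight: 0.5,
  now,
  allowed: (d) => d.id !== "restricted",
});
```

In this toy workspace, `results` contains the direct match and the contract that references it. The weakly linked memo never enters the candidate set, and the restricted document is removed by the final filter.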
All of this happens in under 200ms. Why? Three architectural choices:
Choice 1: Adjacency Lists, Not Matrices
We don't store the graph as a dense adjacency matrix. That would require O(n²) space for n documents. Instead, we store adjacency lists: for each document, a list of outgoing edges with their targets and weights.
Document A: {
  references: [
    { target: "Doc B", weight: 0.9 },
    { target: "Doc C", weight: 0.4 }
  ],
  contradicts: [
    { target: "Doc D", weight: 0.7 }
  ]
}
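This is memory-efficient and query-efficient. Looking up Document A's outgoing edges takes O(1) time, plus time proportional to the number of edges returned. Finding all documents that reference Document A would mean scanning every list, which is O(n + e) for n documents and e edges, so we cache that direction as well.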
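A minimal sketch of this layout keeps adjacency lists grouped by edge type, plus a reverse index for the cached incoming direction. The class and method names (`DocumentGraph`, `addEdge`, `outgoingOf`, `incomingOf`) are hypothetical, and the reverse index is just one plausible way to cache incoming edges.

```typescript
interface OutEdge {
  target: string;
  weight: number;
}

interface InEdge {
  source: string;
  type: string;
  weight: number;
}

class DocumentGraph {
  // document ID -> edge type -> outgoing edges
  private outgoing = new Map<string, Map<string, OutEdge[]>>();
  // document ID -> incoming edges (the cached reverse direction)
  private incoming = new Map<string, InEdge[]>();

  addEdge(source: string, type: string, target: string, weight: number): void {
    let byType = this.outgoing.get(source);
    if (!byType) {
      byType = new Map<string, OutEdge[]>();
      this.outgoing.set(source, byType);
    }
    let list = byType.get(type);
    if (!list) {
      list = [];
      byType.set(type, list);
    }
    list.push({ target, weight });

    // Maintain the reverse index at write time so reads never scan.
    let back = this.incoming.get(target);
    if (!back) {
      back = [];
      this.incoming.set(target, back);
    }
    back.push({ source, type, weight });
  }

  // Constant-time lookup of one document's outgoing edges of one type.
  outgoingOf(source: string, type: string): OutEdge[] {
    const byType = this.outgoing.get(source);
    return (byType && byType.get(type)) || [];
  }

  // Constant-time lookup of every edge pointing at a document.
  incomingOf(target: string): InEdge[] {
    return this.incoming.get(target) || [];
  }
}

// The same edges as the Document A illustration above.
const g = new DocumentGraph();
g.addEdge("Doc A", "references", "Doc B", 0.9);
g.addEdge("Doc A", "references", "Doc C", 0.4);
g.addEdge("Doc A", "contradicts", "Doc D", 0.7);
```

Storing each edge twice doubles the write cost and the memory used for edges, which is usually an acceptable price when reads far outnumber writes.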
Choice 2: Cached Neighborhoods
We pre-compute and cache the "neighborhood" of each document: all documents within two hops, ranked by aggregate edge weight. This workload is read-heavy (you query documents far more than you edit them), so the cache pays for itself quickly.
When you search, we're not traversing the graph in real time. We're querying a pre-computed, indexed neighborhood.
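As a rough illustration of the pre-computation, the sketch below builds a two-hop neighborhood for every document. The article does not define "aggregate edge weight", so the scoring here is an assumption: a path scores the product of its edge weights, and a document reachable by several paths gets the sum. For brevity it follows outgoing edges only. `buildNeighborhoods` and `Neighbor` are hypothetical names.

```typescript
interface Edge {
  source: string;
  target: string;
  weight: number;
}

interface Neighbor {
  id: string;
  score: number; // assumed aggregate: sum over paths of the product of weights
}

function buildNeighborhoods(edges: Edge[]): Map<string, Neighbor[]> {
  // Build outgoing adjacency lists and collect every document ID.
  const out = new Map<string, Edge[]>();
  const ids: string[] = [];
  const note = (id: string): void => {
    if (!out.has(id)) {
      out.set(id, []);
      ids.push(id);
    }
  };
  edges.forEach((e) => {
    note(e.source);
    note(e.target);
    out.get(e.source)!.push(e);
  });

  const cache = new Map<string, Neighbor[]>();
  ids.forEach((start) => {
    const scores = new Map<string, number>();
    const add = (id: string, score: number): void => {
      if (id !== start) scores.set(id, (scores.get(id) || 0) + score);
    };
    (out.get(start) || []).forEach((first) => {
      add(first.target, first.weight); // one hop
      (out.get(first.target) || []).forEach((second) => {
        add(second.target, first.weight * second.weight); // two hops
      });
    });

    // Rank neighbors once, at build time, so queries only read.
    const ranked: Neighbor[] = [];
    scores.forEach((score, id) => ranked.push({ id, score }));
    ranked.sort((a, b) => b.score - a.score);
    cache.set(start, ranked);
  });
  return cache;
}

const cache = buildNeighborhoods([
  { source: "A", target: "B", weight: 0.9 },
  { source: "B", target: "C", weight: 0.5 },
  { source: "A", target: "C", weight: 0.4 },
]);
const neighborsOfA = cache.get("A") || [];
```

Here `neighborsOfA` lists B ahead of C: B is reached directly at 0.9, while C combines a direct edge (0.4) with a two-hop path through B (0.9 × 0.5). A cache like this has to be rebuilt, at least for the affected documents, whenever edges change.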
Choice 3: Edge Type Filtering
Not all edges matter for every query. If you're searching for "documents related to the Q3 budget," "Mentions" edges are less relevant than "References" or "Implements" edges. The query engine weights edge types differently based on intent.
The Universal Command infers intent from your search. "What contracts reference the budget?" weights "References" heavily. "What does the budget contradict?" weights "Contradicts" heavily. This is handled in lib/intentHeuristics.ts—a heuristic engine that maps natural language to edge type preferences.
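A keyword-based version of this mapping might look like the sketch below. It is not the contents of lib/intentHeuristics.ts: the regular expressions, the default multipliers, and the boost value of 2.0 are all assumptions chosen to show the idea of turning query wording into per-edge-type weights.

```typescript
type EdgeType =
  | "references" | "respondsTo" | "implements" | "contradicts"
  | "extends" | "precedes" | "tags" | "mentions";

type EdgeTypeWeights = Record<EdgeType, number>;

// Assumed defaults: weak "mentions" links count for less than explicit ones.
const DEFAULT_WEIGHTS: EdgeTypeWeights = {
  references: 1.0,
  respondsTo: 0.8,
  implements: 1.0,
  contradicts: 0.8,
  extends: 0.8,
  precedes: 0.6,
  tags: 0.5,
  mentions: 0.3,
};

// Assumed rules: a matching verb in the query boosts one edge type.
const INTENT_RULES: { pattern: RegExp; boost: EdgeType }[] = [
  { pattern: /\b(reference|references|cite|cites)\b/i, boost: "references" },
  { pattern: /\b(contradict|contradicts|conflict|conflicts)\b/i, boost: "contradicts" },
  { pattern: /\b(implement|implements)\b/i, boost: "implements" },
  { pattern: /\b(extend|extends|build on|builds on)\b/i, boost: "extends" },
  { pattern: /\b(reply|replies|respond|responds)\b/i, boost: "respondsTo" },
];

const BOOST = 2.0;

// Map a natural-language query to per-edge-type multipliers.
function edgeTypeWeightsFor(query: string): EdgeTypeWeights {
  const weights: EdgeTypeWeights = { ...DEFAULT_WEIGHTS };
  INTENT_RULES.forEach((rule) => {
    if (rule.pattern.test(query)) weights[rule.boost] = BOOST;
  });
  return weights;
}

const forReferences = edgeTypeWeightsFor("What contracts reference the budget?");
const forContradictions = edgeTypeWeightsFor("What does the budget contradict?");
const generic = edgeTypeWeightsFor("documents related to the Q3 budget");
```

The multipliers would then scale each edge's own weight during ranking, so the first query favors "References" edges, the second favors "Contradicts" edges, and the generic query falls back to the defaults.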
Why This Matters for You
The knowledge graph is invisible when it works. You search for something, you get relevant results, you move on. But the architecture has three practical consequences:
First: Search gets smarter without you doing anything. You don't tag documents. You don't write metadata. The graph builds itself. Every time you upload a document, every time Claude analyzes it, the graph gets denser and more useful.
Second: You can ask questions, not just search. "Show me everything that contradicts the current budget." "What implements the Q3 strategy?" These aren't keyword searches; they're graph queries. The Universal Command understands them because the graph understands relationships.
Third: Latency stays low even as your workspace grows. We've tested with 50,000 documents. Query time is still under 300ms. That's because we're not doing full-graph traversal; we're querying indexed neighborhoods. The architecture scales.
The Trade-offs We Made
We chose this approach over alternatives. It's worth being honest about what we gave up:
We don't do real-time graph updates. When you edit a document, the edge extraction happens asynchronously. There's a 10-30 second lag before the graph reflects your changes. For most use cases, this is fine. For real-time collaboration on the same document, it's not ideal. We accept this trade-off.
We don't support custom edge types. Some teams want domain-specific relationships ("depends on," "funded by," "approved by"). We could make edges customizable, but it would complicate the inference engine and make intent detection harder. We chose consistency over flexibility.
We weight semantic edges lower than structural ones. A reference you explicitly made is trusted more than one Claude inferred. This means the graph is conservative—it won't hallucinate relationships. But it also means some real connections might be missed. We think that's the right bias.
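One simple way to express this bias is a trust multiplier applied by edge origin. The sketch below is illustrative only: the `origin` field, the `TRUST` table, and the 0.7 discount are assumptions, not the values AiFiler uses.

```typescript
type Origin = "structural" | "semantic";

interface Edge {
  target: string;
  weight: number; // raw strength score in [0.0, 1.0]
  origin: Origin; // which extraction phase produced the edge (assumed field)
}

// Assumed trust factors: parsed edges count in full, inferred edges are discounted.
const TRUST: Record<Origin, number> = {
  structural: 1.0,
  semantic: 0.7,
};

// Weight used for ranking after the trust discount is applied.
function effectiveWeight(edge: Edge): number {
  return edge.weight * TRUST[edge.origin];
}

const parsed: Edge = { target: "doc-b", weight: 0.8, origin: "structural" };
const inferred: Edge = { target: "doc-b", weight: 0.8, origin: "semantic" };
```

With equal raw weights, `effectiveWeight(inferred)` comes out lower than `effectiveWeight(parsed)`, so an inferred edge has to be proportionally stronger to outrank an explicit one.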
What's Next
We're working on two improvements:
Graph Visualization — You'll be able to see the graph for a document: all connected nodes, edge types, and weights. This is coming in Q3. It's not just pretty; it helps you understand why a document was returned in a search.
Edge Confidence Tuning — Right now, edge weights are determined by our heuristics. We're building a feedback loop: when you click a search result, we learn. When you ignore a result, we learn. The weights adjust. The graph gets smarter.
The knowledge graph is the foundation of AiFiler's search and retrieval. It's why finding the right document is fast, why the Universal Command understands complex queries, and why your workspace becomes more useful the more you use it.
It's also why we don't need you to organize your documents. The graph does it for you.