You've got 500 documents. A contract references a statement of work. The SOW links to a budget spreadsheet. The budget ties back to three client emails. Finding all of that shouldn't require five separate searches or a manual paper trail.
This is the problem AiFiler's knowledge graph solves. But building a graph system that stays fast, doesn't consume infinite memory, and actually helps you find what you need—that's harder than it looks.
The Problem We Started With
Most document tools treat files as isolated objects. You search, you get results, you open a file. Everything else is disconnected. Even tools that claim to have "relationships" often implement them as simple one-to-one links: "this contract is related to that email."
That's not how knowledge actually works. Documents exist in a web of context. A client contract isn't just related to one email—it connects to the entire conversation thread, the original RFP, the negotiation notes, the signed PDF, and the three follow-up amendments. Real relationships are:
- Multi-directional: A references B, but B also references A
- Typed: A "cites" B differently than A "amends" B
- Weighted: Some relationships matter more than others
- Transitive: If A connects to B and B connects to C, that path has meaning
When we started AiFiler, we knew we needed a graph. But graphs are expensive. They can be slow to query, memory-intensive to store, and nightmarish to keep consistent as documents change.
How We Approached It: Constraints First
Rather than build a generic graph engine, we asked: what are the actual relationships that matter in document work?
We spent two weeks analyzing user workflows. We watched people use AiFiler, asked them what connections they cared about, and logged what they actually searched for. We found eight relationship patterns that covered 94% of real usage:
- Cites — Document A references Document B (e.g., contract cites budget)
- Amends — Document A modifies Document B (e.g., amendment amends original contract)
- Responds to — Document A answers Document B (e.g., email responds to inquiry)
- Elaborates — Document A provides detail on Document B (e.g., spec elaborates on requirements)
- Contradicts — Document A conflicts with Document B (e.g., revised policy contradicts old one)
- Supersedes — Document A replaces Document B (e.g., v2 supersedes v1)
- Supports — Document A backs up Document B (e.g., evidence supports claim)
- Relates to — Generic connection (fallback for everything else)
That constraint—eight types, not unlimited—changed everything. It meant we could optimize for these specific patterns rather than build a universal graph engine.
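A closed set of types is easy to express in code. The sketch below is illustrative only: the names `EDGE_TYPES`, `EdgeType`, `Edge`, and `normalizeEdgeType` are hypothetical, and the snake_case identifiers beyond `cites`, `amends`, and `responds_to` are assumed to follow the same naming pattern.

```typescript
// Hypothetical names throughout; not AiFiler's actual source.
// Only 'cites', 'amends', and 'responds_to' appear in the schema excerpt;
// the remaining identifiers are assumed to use the same snake_case style.
const EDGE_TYPES = [
  "cites",
  "amends",
  "responds_to",
  "elaborates",
  "contradicts",
  "supersedes",
  "supports",
  "relates_to",
] as const;

// Union of the eight string literals above.
type EdgeType = (typeof EDGE_TYPES)[number];

interface Edge {
  sourceDocId: string;
  targetDocId: string;
  edgeType: EdgeType;
  confidence: number; // 0 to 1
}

// Runtime guard: narrows an arbitrary string to EdgeType.
function isEdgeType(value: string): value is EdgeType {
  return (EDGE_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Anything unrecognized falls back to the generic "relates_to" type.
function normalizeEdgeType(value: string): EdgeType {
  return isEdgeType(value) ? value : "relates_to";
}
```

Because the union is closed, the compiler can check that every switch over `EdgeType` handles all eight cases, which is part of what makes per-type styling and tuning tractable.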
The Implementation: Sparse Adjacency + Materialized Paths
Here's the architecture:
Data layer: We store edges in a Supabase table with this schema:
edges (
  id uuid primary key,
  source_doc_id uuid,
  target_doc_id uuid,
  edge_type enum('cites', 'amends', 'responds_to', ...),
  confidence float (0-1),
  created_at timestamp,
  workspace_id uuid
)
This is a sparse adjacency list. For 500 documents, we might have 2,000 edges, not 250,000 potential slots. That matters.
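To make the sparse-versus-dense point concrete, here is a minimal sketch of an adjacency list built from edge rows. The `EdgeRow` shape mirrors the column names in the schema above; `buildAdjacency` and `denseSlots` are hypothetical helpers written for this example.

```typescript
// Hypothetical helper names; the row fields follow the edges table columns.
interface EdgeRow {
  source_doc_id: string;
  target_doc_id: string;
  edge_type: string;
  confidence: number;
}

// Index outgoing edges by source document. Documents without edges get no
// entry at all, which is what keeps the structure sparse.
function buildAdjacency(rows: EdgeRow[]): Map<string, EdgeRow[]> {
  const adjacency = new Map<string, EdgeRow[]>();
  for (const row of rows) {
    const list = adjacency.get(row.source_doc_id);
    if (list) {
      list.push(row);
    } else {
      adjacency.set(row.source_doc_id, [row]);
    }
  }
  return adjacency;
}

// Number of cells a dense N x N matrix would reserve for the same documents.
function denseSlots(docCount: number): number {
  return docCount * docCount;
}

const sampleRows: EdgeRow[] = [
  { source_doc_id: "contract", target_doc_id: "sow", edge_type: "cites", confidence: 0.95 },
  { source_doc_id: "contract", target_doc_id: "budget", edge_type: "cites", confidence: 0.9 },
  { source_doc_id: "sow", target_doc_id: "budget", edge_type: "cites", confidence: 0.9 },
];
const sampleAdjacency = buildAdjacency(sampleRows);
```

Storage grows with the number of edges that actually exist, while `denseSlots(500)` reserves 250,000 cells regardless of how many are used.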
Query layer: When you open a document, we don't traverse the entire graph. Instead, we use materialized paths—precomputed one-hop and two-hop neighbors stored in a separate table:
materialized_paths (
  doc_id uuid,
  neighbor_id uuid,
  hops int (1-2),
  edge_types text[] (array of types),
  path_strength float,
  updated_at timestamp
)
When a document changes, we recalculate only the paths that involve that document, not the entire graph. The work scales with the number of edges within two hops of the changed document rather than with the size of the whole graph, which is what a full recomputation would cost.
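A minimal sketch of recomputing the one-hop and two-hop rows for a single document might look like the following. It makes several assumptions that the article does not confirm: edges are followed in either direction, path strength is the product of the confidences along the path, and a one-hop path always wins over a two-hop path to the same neighbor. All names (`recomputePaths`, `incident`, `PathRow`) are hypothetical.

```typescript
interface Edge { source: string; target: string; type: string; confidence: number; }
interface PathRow {
  docId: string;
  neighborId: string;
  hops: 1 | 2;
  edgeTypes: string[];
  pathStrength: number;
}
interface Incident { other: string; edge: Edge; }

// Edges touching docId, seen from docId's side (assumption: direction is ignored).
function incident(edges: Edge[], docId: string): Incident[] {
  const out: Incident[] = [];
  for (const edge of edges) {
    if (edge.source === docId && edge.target !== docId) out.push({ other: edge.target, edge });
    else if (edge.target === docId && edge.source !== docId) out.push({ other: edge.source, edge });
  }
  return out;
}

// Rebuild the materialized rows for one document only.
function recomputePaths(edges: Edge[], docId: string): PathRow[] {
  const best = new Map<string, PathRow>();
  const consider = (row: PathRow): void => {
    const current = best.get(row.neighborId);
    // Prefer fewer hops, then higher strength (assumed tie-breaking rule).
    const better =
      !current ||
      row.hops < current.hops ||
      (row.hops === current.hops && row.pathStrength > current.pathStrength);
    if (better) best.set(row.neighborId, row);
  };

  const firstHop = incident(edges, docId);
  for (const { other, edge } of firstHop) {
    consider({ docId, neighborId: other, hops: 1, edgeTypes: [edge.type], pathStrength: edge.confidence });
  }
  for (const { other: mid, edge: first } of firstHop) {
    for (const { other: far, edge: second } of incident(edges, mid)) {
      if (far === docId) continue; // do not walk back to the starting document
      consider({
        docId,
        neighborId: far,
        hops: 2,
        edgeTypes: [first.type, second.type],
        // Assumption: strength of a path is the product of its edge confidences.
        pathStrength: first.confidence * second.confidence,
      });
    }
  }
  return Array.from(best.values());
}
```

In a real system the same routine would presumably also be rerun for each document within two hops of the changed one, since their rows can pass through it.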
In-memory layer: The Knowledge state hook in lib/knowledge/ caches materialized paths using SWR with localStorage prefixing. When you click a document, we load its neighbors instantly from cache, then verify freshness in the background. If a neighbor changed, we update it without blocking the UI.
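The pattern described here is stale-while-revalidate: return what the cache has immediately, then refresh in the background. The sketch below is a hand-rolled illustration of that idea and does not use the SWR library's API. The `NeighborCache` class, its methods, and the `workspace:doc` key format are all hypothetical, and a plain `Map` stands in for localStorage so the example runs outside a browser.

```typescript
type Neighbors = string[];

// Hypothetical cache; a Map stands in for localStorage in this sketch.
class NeighborCache {
  private store = new Map<string, Neighbors>();
  private prefix: string;

  constructor(prefix: string) {
    this.prefix = prefix;
  }

  // Keys are prefixed (here by workspace) so entries from different scopes never collide.
  keyFor(docId: string): string {
    return this.prefix + ":" + docId;
  }

  set(docId: string, neighbors: Neighbors): void {
    this.store.set(this.keyFor(docId), neighbors);
  }

  // Returns the cached value right away (possibly stale, possibly undefined)
  // and starts a background fetch that replaces the entry when it resolves.
  read(
    docId: string,
    fetchFresh: (docId: string) => Promise<Neighbors>,
    onUpdate: (fresh: Neighbors) => void
  ): Neighbors | undefined {
    const cached = this.store.get(this.keyFor(docId));
    void fetchFresh(docId)
      .then((fresh) => {
        const changed = JSON.stringify(fresh) !== JSON.stringify(cached);
        this.set(docId, fresh);
        if (changed) onUpdate(fresh); // notify the UI only when something differs
      })
      .catch(() => {
        // On a failed refresh, keep serving the stale entry.
      });
    return cached;
  }
}
```

The caller renders the returned value immediately and re-renders only if `onUpdate` fires, so a slow network call never blocks the click.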
The data flows like this:
Document changes → Trigger edge recalculation → Update materialized_paths → Invalidate SWR cache → User sees fresh neighbors
Why Eight Types, Not Unlimited?
This is where the engineering choice pays off. With eight types, we can:
- Render relationships clearly: Each type has a visual style. "Amends" shows as a red dotted line. "Cites" shows as blue. Users instantly understand the connection type without reading labels.
- Weight them intelligently: When you search, we can boost results that are connected by "cites" or "amends" (strong connections) differently than "relates to" (weak). This is built into the search ranking in lib/searchParser.ts.
- Detect cycles: With typed edges, we can identify problematic patterns. If Document A amends B, B amends C, and C amends A, that's a cycle we should flag. With unlimited types, cycle detection becomes expensive.
- Train the AI to detect them: Claude (our default AI brain) knows exactly what to look for. The intent handler for relationship detection (lib/intelligence/intentHandlers.ts) has specific patterns for each type. "Amends" looks for amendment language. "Responds to" looks for quoted text or timestamps. This beats a generic "find all relationships" approach.
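The cycle check mentioned above can be sketched as a depth-first search restricted to a single edge type. This is a generic textbook approach, not necessarily how AiFiler implements it; `findCycle` and `TypedEdge` are hypothetical names.

```typescript
interface TypedEdge { source: string; target: string; type: string; }

// Returns the documents on one cycle made only of edges of the given type,
// or null if no such cycle exists.
function findCycle(edges: TypedEdge[], type: string): string[] | null {
  // Keep only edges of the requested type, indexed by source document.
  const next = new Map<string, string[]>();
  for (const edge of edges) {
    if (edge.type !== type) continue;
    const list = next.get(edge.source) || [];
    list.push(edge.target);
    next.set(edge.source, list);
  }

  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = []; // current DFS path

  const visit = (node: string): string[] | null => {
    state.set(node, "visiting");
    stack.push(node);
    for (const target of next.get(node) || []) {
      const seen = state.get(target);
      // Reaching a node that is still on the current path closes a cycle.
      if (seen === "visiting") return stack.slice(stack.indexOf(target));
      if (seen === undefined) {
        const found = visit(target);
        if (found) return found;
      }
    }
    stack.pop();
    state.set(node, "done");
    return null;
  };

  for (const start of Array.from(next.keys())) {
    if (!state.has(start)) {
      const found = visit(start);
      if (found) return found;
    }
  }
  return null;
}
```

Filtering by type first means the search only walks the subgraph where a cycle is actually a problem, such as a chain of amendments that loops back on itself.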
The Real Cost: Confidence Scoring
The tricky part isn't the graph structure—it's knowing when a relationship actually exists.
When you upload a document, we run it through Claude to extract relationships. For each one, Claude returns a confidence score (0-1). A contract explicitly citing a budget gets 0.95. A document mentioning a client name that might relate to another document gets 0.3.
We store all of them, but only surface relationships above a threshold. Users can adjust this in Settings > Knowledge > Relationship Sensitivity. Set it to 0.9 if you want only high-confidence links. Set it to 0.5 if you want exploratory connections.
This is where the typed edges help again. We can tune thresholds per type. "Amends" relationships need 0.85+ confidence (they're usually explicit). "Relates to" relationships can be 0.4+ (they're more exploratory).
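Per-type thresholds amount to a small lookup plus a filter. In the sketch below, the 0.85 and 0.4 values come from the paragraph above; how a per-type threshold combines with the user's sensitivity setting is not stated, so treating that setting as the fallback for types without their own threshold is an assumption. `TYPE_THRESHOLDS`, `minimumFor`, and `surfaced` are hypothetical names.

```typescript
interface Candidate { type: string; confidence: number; }

// Thresholds named in the article; other types have no entry here.
const TYPE_THRESHOLDS: Partial<Record<string, number>> = {
  amends: 0.85,
  relates_to: 0.4,
};

// Assumption: types without a specific threshold use the user's global setting.
function minimumFor(type: string, userSensitivity: number): number {
  const specific = TYPE_THRESHOLDS[type];
  return specific === undefined ? userSensitivity : specific;
}

// All candidates stay stored; this only decides which ones are shown.
function surfaced<T extends Candidate>(candidates: T[], userSensitivity: number): T[] {
  return candidates.filter((c) => c.confidence >= minimumFor(c.type, userSensitivity));
}
```

With a sensitivity of 0.5, an "amends" candidate at 0.8 stays hidden while a "relates_to" candidate at 0.45 is shown, which matches the idea that explicit relationship types should clear a higher bar.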
What This Means for You
If you're a power user, here's how this translates to speed:
- Open any document in AiFiler. Look at the right sidebar, where you'll see "Related Documents" grouped by relationship type.
- Click on a relationship (e.g., "Cites") to see all documents this one references. This is instant because it's cached.
- Use the Knowledge view to see your entire graph for a workspace. You can filter by edge type using the dropdown. This view updates in real-time as you add documents.
- Search with context: When you search for "budget," AiFiler doesn't just find documents with that word. It finds documents connected to budget-related docs through the graph. A contract that cites a budget ranks higher than a contract that just mentions "budget" in passing.
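One way to picture graph-aware ranking is a text score plus a boost inherited from connected documents. The sketch below is an illustration under stated assumptions, not the logic in lib/searchParser.ts: the weights in `TYPE_WEIGHTS`, the additive formula, and the names `rankWithGraph`, `ScoredDoc`, and `Link` are all hypothetical.

```typescript
interface ScoredDoc { id: string; textScore: number; }
interface Link { source: string; target: string; type: string; confidence: number; }

// Hypothetical weights: strong types count more than the generic fallback.
const TYPE_WEIGHTS: Partial<Record<string, number>> = { cites: 0.5, amends: 0.5, relates_to: 0.1 };
const DEFAULT_WEIGHT = 0.1;

// Final score = own text score + sum over outgoing links of
// (type weight x link confidence x text score of the linked document).
function rankWithGraph(docs: ScoredDoc[], links: Link[]): { id: string; score: number }[] {
  const textScore = new Map<string, number>();
  for (const doc of docs) textScore.set(doc.id, doc.textScore);

  const boost = new Map<string, number>();
  for (const link of links) {
    const targetScore = textScore.get(link.target) || 0;
    // Skip links whose source is not a candidate or whose target did not match the query.
    if (!textScore.has(link.source) || targetScore === 0) continue;
    const configured = TYPE_WEIGHTS[link.type];
    const weight = configured === undefined ? DEFAULT_WEIGHT : configured;
    boost.set(link.source, (boost.get(link.source) || 0) + weight * link.confidence * targetScore);
  }

  return docs
    .map((doc) => ({ id: doc.id, score: doc.textScore + (boost.get(doc.id) || 0) }))
    .sort((a, b) => b.score - a.score);
}
```

Under these weights, a contract with a weak text match that cites a strongly matching budget can outrank a contract that mentions the word slightly more often but has no connection to the budget.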
The architecture is built for scale. We've tested it with 50,000 documents and 200,000 edges. Query time stays under 200ms. Memory footprint stays under 50MB for cached paths. That's because we're not storing a dense matrix or doing expensive traversals. We're storing sparse edges and precomputed neighbors.
Lessons We Learned
Constraint is a feature. We almost built a system that could handle unlimited relationship types. It would've been more "flexible." It would also have been slower, harder to visualize, and impossible to tune. Eight types forced us to think about what actually matters.
Materialized paths beat dynamic traversal. Every time we considered computing neighbors on-the-fly, we benchmarked it against precomputed paths. Precomputed always won. The cost of keeping them fresh is worth the speed.
Confidence matters more than completeness. It's tempting to extract every possible relationship. But showing users 50 weak connections clutters the graph. Showing 5 strong ones is actually useful. The confidence scoring lets us be selective.
Type-specific AI detection beats generic NLP. We tried using a generic "find relationships" prompt. Claude would find things that weren't really relationships. When we gave Claude specific patterns for each type—"Amends: look for amendment language, version numbers, and legal modification terms"—accuracy jumped from 67% to 91%.
The Next Layer
Right now, the knowledge graph is document-to-document. We're working on expanding it to include:
- People: Documents connected to the people who created or signed them
- Projects: Documents grouped by project, with relationships between projects
- Concepts: Automatically extracted topics that link documents together
But that's a future article. For now, the eight-type system is doing what it should: making your documents discoverable through their actual relationships, without slowing you down.
If you want to see your knowledge graph in action, open any workspace in AiFiler and click Knowledge in the sidebar. Adjust the relationship sensitivity in settings. Start exploring. The graph is there, working quietly, making sure you never lose a connection again.