You've probably noticed that when you open a document in AiFiler, related files appear instantly. No waiting. No "refreshing the index." That's not magic—it's a carefully designed graph architecture that treats your documents as nodes in a network, not isolated files in a folder.
This is the story of how we built it, why we chose this approach over traditional search, and what it means when you're trying to find that one contract that references another contract from three years ago.
The Problem We Solved
Traditional document management treats files as islands. You search for keywords, get a ranked list, and hope the one you need is in the top five results. But knowledge work isn't about isolated documents—it's about relationships.
A contract references a statement of work. That SOW references a pricing agreement. The pricing agreement is attached to an email thread with a client. That email thread mentions a project that has 47 other documents. None of these connections exist in a traditional search index.
We realized early on that AiFiler needed to understand these relationships automatically. Not through manual tagging (nobody does that consistently), but by analyzing document content and extracting what actually connects them.
The Graph Structure: 8 Edge Types
At the core of AiFiler's knowledge graph is a simple concept: nodes (documents) connected by edges (relationships). We defined eight specific relationship types that cover how documents actually relate to each other in real work:
REFERENCES: Document A mentions Document B by name or ID. This is directional—if your contract references a pricing schedule, that's a REFERENCES edge.
CONTAINS_EXTRACT: Document A contains a direct excerpt or copy of content from Document B. We detect this through semantic similarity and exact text matching.
RELATED_TOPIC: Two documents discuss the same subject matter without explicit reference. A project proposal and a competitive analysis might both discuss "enterprise SaaS pricing models."
TEMPORAL_SEQUENCE: Documents with a clear time-based relationship. An email from March references a meeting from February. A revision document supersedes an earlier version.
PARTICIPANT_OVERLAP: Documents share the same people, companies, or entities. If two contracts both involve the same vendor, they're connected.
DERIVED_FROM: One document is clearly derived from another. A summary document, a presentation built from research, a report extracted from raw data.
CONTRADICTS: Two documents make conflicting claims about the same subject. This one catches a lot of version control issues.
ANNOTATES: One document provides commentary, review, or markup on another. A marked-up PDF, a comment document, feedback on a draft.
This taxonomy came from analyzing real user workflows. We watched how people navigate their documents and reverse-engineered the relationships they were implicitly following.
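As a rough illustration, the eight edge types could be modeled as a TypeScript string union. The names `EDGE_TYPES`, `EdgeType`, `isEdgeType`, and `SYMMETRIC_TYPES` are hypothetical, and treating three of the types as symmetric is our reading of the descriptions above, not something the codebase is known to do.

```typescript
// Hypothetical declaration: the article lists the eight types but not how they are declared.
const EDGE_TYPES = [
  "REFERENCES",
  "CONTAINS_EXTRACT",
  "RELATED_TOPIC",
  "TEMPORAL_SEQUENCE",
  "PARTICIPANT_OVERLAP",
  "DERIVED_FROM",
  "CONTRADICTS",
  "ANNOTATES",
] as const;

type EdgeType = (typeof EDGE_TYPES)[number];

// Runtime guard for edge types that arrive as plain strings (for example, from a database row).
function isEdgeType(value: string): value is EdgeType {
  return (EDGE_TYPES as readonly string[]).includes(value);
}

// Assumption: types described as "two documents ..." read the same in both directions.
const SYMMETRIC_TYPES: ReadonlySet<EdgeType> = new Set<EdgeType>([
  "RELATED_TOPIC",
  "PARTICIPANT_OVERLAP",
  "CONTRADICTS",
]);

function isSymmetric(type: EdgeType): boolean {
  return SYMMETRIC_TYPES.has(type);
}
```

A union type like this lets the compiler reject misspelled edge names, while the guard covers values that only exist at runtime.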
How We Extract Relationships
The extraction happens in two phases:
Phase 1: Semantic Analysis runs when you upload a document. We use Claude with the Files API to analyze the document content and extract:
- Explicit mentions of other documents (by title, ID, date, or reference number)
- Key entities (people, companies, projects, products)
- Topics and themes
- Temporal markers
This happens in lib/intelligence/intentHandlers.ts through the document ingestion pipeline in lib/ingest/parseFile.ts. The AI doesn't just extract text—it understands context. It knows that "the Q3 budget" in a September email is different from "the Q3 budget" in a December email.
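To make the Phase 1 output concrete, here is a minimal sketch of what a validated extraction result might look like. It assumes the model is asked to return JSON with four string-array fields; the `ExtractionResult` shape and the `parseExtraction` helper are hypothetical and are not taken from `intentHandlers.ts` or `parseFile.ts`.

```typescript
// Hypothetical shape mirroring the four extracted categories listed above.
interface ExtractionResult {
  mentions: string[]; // explicit references to other documents
  entities: string[]; // people, companies, projects, products
  topics: string[];   // topics and themes
  dates: string[];    // temporal markers, kept as raw strings here
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Parses and validates untrusted model output before it reaches the graph.
function parseExtraction(raw: string): ExtractionResult {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Extraction output must be a JSON object");
  }
  const record = data as Record<string, unknown>;
  const fields = ["mentions", "entities", "topics", "dates"] as const;
  for (const field of fields) {
    if (!isStringArray(record[field])) {
      throw new Error(`Field "${field}" must be an array of strings`);
    }
  }
  return {
    mentions: record.mentions as string[],
    entities: record.entities as string[],
    topics: record.topics as string[],
    dates: record.dates as string[],
  };
}
```

Validating at this boundary means a malformed response fails loudly at ingestion instead of producing silent gaps in the graph.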
Phase 2: Graph Computation matches extracted data against existing documents in your workspace. This is where the real work happens. We:
- Search for documents with matching entities
- Calculate semantic similarity between topic sets
- Detect temporal relationships by comparing dates and references
- Identify version chains and derived documents
The computation runs asynchronously and updates incrementally. When you add a new document, we don't rebuild the entire graph—we compute edges only for that document against existing nodes.
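The incremental step can be sketched as follows. This is an illustration under stated assumptions, not the production algorithm: Jaccard overlap on lowercase strings stands in for whatever semantic similarity measure is actually used, and `ExtractedDoc`, `computeEdgesForNewDoc`, and the 0.5 topic cutoff are hypothetical.

```typescript
interface ExtractedDoc {
  id: string;
  entities: string[];
  topics: string[];
}

interface CandidateEdge {
  sourceDocId: string;
  targetDocId: string;
  edgeType: "PARTICIPANT_OVERLAP" | "RELATED_TOPIC";
  confidence: number;
}

// Jaccard overlap: |A ∩ B| / |A ∪ B|, case-insensitive. A stand-in for semantic similarity.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a.map((s) => s.toLowerCase()));
  const setB = new Set(b.map((s) => s.toLowerCase()));
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  setA.forEach((item) => {
    if (setB.has(item)) shared += 1;
  });
  return shared / (setA.size + setB.size - shared);
}

// Computes edges for one new document only; existing edges are never recomputed.
function computeEdgesForNewDoc(
  newDoc: ExtractedDoc,
  existing: ExtractedDoc[],
  minTopicSimilarity = 0.5 // assumed cutoff, for illustration
): CandidateEdge[] {
  const edges: CandidateEdge[] = [];
  for (const other of existing) {
    if (other.id === newDoc.id) continue;
    const entityScore = jaccard(newDoc.entities, other.entities);
    if (entityScore > 0) {
      edges.push({ sourceDocId: newDoc.id, targetDocId: other.id, edgeType: "PARTICIPANT_OVERLAP", confidence: entityScore });
    }
    const topicScore = jaccard(newDoc.topics, other.topics);
    if (topicScore >= minTopicSimilarity) {
      edges.push({ sourceDocId: newDoc.id, targetDocId: other.id, edgeType: "RELATED_TOPIC", confidence: topicScore });
    }
  }
  return edges;
}
```

The cost of adding a document is proportional to the number of existing documents it is compared against, not to the number of edges already in the graph.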
Storage and Query Architecture
Here's where most knowledge graph implementations stumble: they're either fast or accurate, rarely both.
We use Supabase with a hybrid approach:
The Edge Table stores the graph structure itself:
{
  source_doc_id: uuid,
  target_doc_id: uuid,
  edge_type: enum (REFERENCES | CONTAINS_EXTRACT | ...),
  confidence: float (0.0-1.0),
  metadata: jsonb,
  created_at: timestamp
}
Confidence matters. A REFERENCES edge where we found an explicit mention gets 0.95. A RELATED_TOPIC edge based on semantic similarity might be 0.72. The UI uses this to decide what to show.
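On the application side, a row of this table might be typed roughly like this. The `EdgeRow` interface and the `makeEdge` helper are hypothetical; only the column names and the 0.0-1.0 confidence range come from the schema above.

```typescript
type EdgeType =
  | "REFERENCES"
  | "CONTAINS_EXTRACT"
  | "RELATED_TOPIC"
  | "TEMPORAL_SEQUENCE"
  | "PARTICIPANT_OVERLAP"
  | "DERIVED_FROM"
  | "CONTRADICTS"
  | "ANNOTATES";

// Hypothetical row type mirroring the edge table columns.
interface EdgeRow {
  source_doc_id: string; // uuid
  target_doc_id: string; // uuid
  edge_type: EdgeType;
  confidence: number; // 0.0-1.0
  metadata: Record<string, unknown>; // jsonb
  created_at: string; // ISO 8601 timestamp
}

// Builds a row and enforces the confidence range before anything is written.
function makeEdge(
  source: string,
  target: string,
  type: EdgeType,
  confidence: number,
  metadata: Record<string, unknown> = {}
): EdgeRow {
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
    throw new RangeError(`confidence must be between 0 and 1, got ${confidence}`);
  }
  return {
    source_doc_id: source,
    target_doc_id: target,
    edge_type: type,
    confidence,
    metadata,
    created_at: new Date().toISOString(),
  };
}

// The two examples from the text: an explicit mention and a similarity-based match.
const explicitMention = makeEdge("contract-1", "pricing-1", "REFERENCES", 0.95);
const topicMatch = makeEdge("proposal-1", "analysis-1", "RELATED_TOPIC", 0.72);
```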
The Query Layer lives in lib/knowledge/ and uses SWR with localStorage prefixing for offline-first data fetching. When you open a document, we:
- Query all edges where source_doc_id = current_document (outgoing relationships)
- Query all edges where target_doc_id = current_document (incoming relationships)
- Filter by confidence threshold (user-configurable, default 0.70)
- Fetch the related document metadata in a single batch query
- Cache the result locally with a 5-minute TTL
The entire operation—from click to rendered related documents—takes 200-400ms on average. No pagination, no "load more" buttons. All relationships visible immediately.
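The steps above can be sketched against in-memory data. This is a simplified model, not the real query layer: plain arrays and a `Map` stand in for Supabase, a small class stands in for the SWR and localStorage cache, and `loadRelated` and `RelatedDocsCache` are hypothetical names.

```typescript
interface Edge {
  source_doc_id: string;
  target_doc_id: string;
  edge_type: string;
  confidence: number;
}
interface DocMeta { id: string; title: string; }
interface RelatedResult { outgoing: Edge[]; incoming: Edge[]; docs: DocMeta[]; }

const TTL_MS = 5 * 60 * 1000; // 5-minute TTL, as described above

class RelatedDocsCache {
  private entries = new Map<string, { value: RelatedResult; expiresAt: number }>();

  get(key: string, now: number): RelatedResult | undefined {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > now) return entry.value;
    this.entries.delete(key); // expired or missing
    return undefined;
  }

  set(key: string, value: RelatedResult, now: number): void {
    this.entries.set(key, { value, expiresAt: now + TTL_MS });
  }
}

function loadRelated(
  docId: string,
  edges: Edge[],
  docs: Map<string, DocMeta>,
  cache: RelatedDocsCache,
  threshold = 0.7,
  now = Date.now()
): RelatedResult {
  // The threshold is user-configurable, so it is part of the cache key.
  const key = `${docId}:${threshold}`;
  const cached = cache.get(key, now);
  if (cached) return cached;

  const outgoing = edges.filter((e) => e.source_doc_id === docId && e.confidence >= threshold);
  const incoming = edges.filter((e) => e.target_doc_id === docId && e.confidence >= threshold);

  // Collect neighbour ids once, then resolve metadata in a single batch.
  const ids = new Set<string>();
  outgoing.forEach((e) => ids.add(e.target_doc_id));
  incoming.forEach((e) => ids.add(e.source_doc_id));
  const related: DocMeta[] = [];
  ids.forEach((id) => {
    const meta = docs.get(id);
    if (meta) related.push(meta);
  });

  const result: RelatedResult = { outgoing, incoming, docs: related };
  cache.set(key, result, now);
  return result;
}
```

Passing `now` explicitly keeps the TTL logic easy to test without waiting five minutes.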
Why Zero Latency Matters
Most graph databases are built for analytical queries. You ask "show me all documents that reference contracts from Q3 2024" and wait for the query to scan millions of edges.
AiFiler's graph is built for immediate retrieval. When you click a document, you want to see related files instantly. We optimize for that specific access pattern.
We do this through:
- Aggressive indexing on (source_doc_id, edge_type) and (target_doc_id, edge_type)
- Materialized views for common queries (all relationships for a document, all high-confidence edges)
- Query result caching at the application layer
- Confidence-based filtering to reduce result sets before rendering
The trade-off: analytical queries are slower. If you want "show me all CONTRADICTS edges in my workspace," that's a full table scan. But that's not a common operation. The common operation is "I'm reading this document, what else should I know?"
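An in-memory analogy may help show why the indexed lookup is cheap and the workspace-wide query is not. The `Map` below only imitates a composite index on (source_doc_id, edge_type) or (target_doc_id, edge_type); it is not how the database stores its indexes, and unlike a real multicolumn index it cannot serve lookups by document id alone. All function names are hypothetical.

```typescript
interface Edge {
  source_doc_id: string;
  target_doc_id: string;
  edge_type: string;
}

// Groups edges by "<docId>|<edgeType>", imitating a two-column index.
function buildIndex(
  edges: Edge[],
  column: "source_doc_id" | "target_doc_id"
): Map<string, Edge[]> {
  const index = new Map<string, Edge[]>();
  for (const edge of edges) {
    const key = `${edge[column]}|${edge.edge_type}`;
    const bucket = index.get(key);
    if (bucket) {
      bucket.push(edge);
    } else {
      index.set(key, [edge]);
    }
  }
  return index;
}

// The common case: one keyed lookup, independent of total graph size.
function lookup(index: Map<string, Edge[]>, docId: string, edgeType: string): Edge[] {
  return index.get(`${docId}|${edgeType}`) ?? [];
}

// The analytical case: no document id to key on, so every edge is visited.
function scanByType(edges: Edge[], edgeType: string): Edge[] {
  return edges.filter((edge) => edge.edge_type === edgeType);
}
```

Because the document id leads each key, "what is connected to this document?" stays a keyed lookup, while "all CONTRADICTS edges in the workspace" has nothing to key on and falls back to the scan.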
Integration with Universal Command
The knowledge graph powers the Universal Command (Ctrl+Shift+A). When you invoke it, the command router in lib/intelligence/universalRouter.ts uses graph relationships to provide context-aware suggestions.
If you're viewing a contract, Universal Command suggests:
- Related SOWs and pricing agreements (REFERENCES, RELATED_TOPIC)
- Email threads mentioning this contract (PARTICIPANT_OVERLAP)
- Earlier versions or superseded agreements (TEMPORAL_SEQUENCE)
- Documents that contradict terms (CONTRADICTS)
This context comes from the graph. No full-text search. No keyword matching. Pure relationship navigation.
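One way to picture this is a table that maps suggestion groups to edge types. The sketch below is hypothetical and is not the logic in `universalRouter.ts`; the group labels and edge types are copied from the list above, and ranking by confidence is an assumption.

```typescript
interface Edge {
  source_doc_id: string;
  target_doc_id: string;
  edge_type: string;
  confidence: number;
}
interface SuggestionGroup { label: string; edgeTypes: string[]; }
interface Suggestion { label: string; docIds: string[]; }

// Groups and edge types taken from the contract example in the text.
const CONTRACT_GROUPS: SuggestionGroup[] = [
  { label: "Related SOWs and pricing agreements", edgeTypes: ["REFERENCES", "RELATED_TOPIC"] },
  { label: "Email threads mentioning this contract", edgeTypes: ["PARTICIPANT_OVERLAP"] },
  { label: "Earlier versions or superseded agreements", edgeTypes: ["TEMPORAL_SEQUENCE"] },
  { label: "Documents that contradict terms", edgeTypes: ["CONTRADICTS"] },
];

function buildSuggestions(docId: string, edges: Edge[], groups: SuggestionGroup[]): Suggestion[] {
  // Keep only edges touching the current document, highest confidence first (assumed ranking).
  const touching = edges
    .filter((e) => e.source_doc_id === docId || e.target_doc_id === docId)
    .sort((a, b) => b.confidence - a.confidence);

  const suggestions: Suggestion[] = [];
  for (const group of groups) {
    const docIds: string[] = [];
    for (const edge of touching) {
      if (!group.edgeTypes.includes(edge.edge_type)) continue;
      // The suggestion is whichever document sits at the other end of the edge.
      const other = edge.source_doc_id === docId ? edge.target_doc_id : edge.source_doc_id;
      if (!docIds.includes(other)) docIds.push(other);
    }
    if (docIds.length > 0) suggestions.push({ label: group.label, docIds });
  }
  return suggestions;
}
```

Empty groups are dropped, so the command palette only shows categories that have at least one related document.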
Real-World Example: The Contract Audit
Let's trace through how this works in practice. You're auditing a contract from 2023. You open it in AiFiler.
The system queries the graph and finds:
- 3 REFERENCES edges (this contract mentions 3 other documents)
- 2 incoming REFERENCES edges (2 newer documents reference this one)
- 1 DERIVED_FROM edge (a summary document was created from this contract)
- 4 PARTICIPANT_OVERLAP edges (4 other documents involve the same vendor)
All of this renders in the knowledge panel without a single additional click. You can see the chain of related documents. You click one, the graph updates instantly, and you see what's connected to that document.
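The counts in this example can be reproduced with a small tally over the edges touching the contract. The document ids and the `summarizeEdges` helper are made up for illustration, and counting PARTICIPANT_OVERLAP without regard to direction is an assumption based on its description as a shared-entity relationship.

```typescript
interface Edge {
  source_doc_id: string;
  target_doc_id: string;
  edge_type: string;
}

// Assumption: these types read the same in both directions, so direction is ignored.
const SYMMETRIC = new Set<string>(["RELATED_TOPIC", "PARTICIPANT_OVERLAP", "CONTRADICTS"]);

function summarizeEdges(docId: string, edges: Edge[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const edge of edges) {
    const direction =
      edge.source_doc_id === docId ? "outgoing" :
      edge.target_doc_id === docId ? "incoming" : null;
    if (direction === null) continue; // edge does not touch this document
    const key = SYMMETRIC.has(edge.edge_type)
      ? edge.edge_type
      : `${direction} ${edge.edge_type}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Hypothetical data matching the audit walkthrough.
const C = "contract-2023";
const auditEdges: Edge[] = [
  { source_doc_id: C, target_doc_id: "sow-1", edge_type: "REFERENCES" },
  { source_doc_id: C, target_doc_id: "pricing-1", edge_type: "REFERENCES" },
  { source_doc_id: C, target_doc_id: "nda-1", edge_type: "REFERENCES" },
  { source_doc_id: "amendment-2024", target_doc_id: C, edge_type: "REFERENCES" },
  { source_doc_id: "renewal-2024", target_doc_id: C, edge_type: "REFERENCES" },
  { source_doc_id: "summary-1", target_doc_id: C, edge_type: "DERIVED_FROM" },
  { source_doc_id: C, target_doc_id: "invoice-1", edge_type: "PARTICIPANT_OVERLAP" },
  { source_doc_id: C, target_doc_id: "invoice-2", edge_type: "PARTICIPANT_OVERLAP" },
  { source_doc_id: "email-1", target_doc_id: C, edge_type: "PARTICIPANT_OVERLAP" },
  { source_doc_id: "msa-1", target_doc_id: C, edge_type: "PARTICIPANT_OVERLAP" },
  { source_doc_id: "sow-1", target_doc_id: "pricing-1", edge_type: "REFERENCES" }, // unrelated to the contract
];

const auditSummary = summarizeEdges(C, auditEdges);
// outgoing REFERENCES: 3, incoming REFERENCES: 2,
// incoming DERIVED_FROM: 1, PARTICIPANT_OVERLAP: 4
```

Note that the two documents referencing the contract are ordinary REFERENCES edges seen from the target side; no ninth edge type is needed.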
This is how you find that one clause that matters. Not by searching for keywords. By following relationships.
What This Means For You
If you've used traditional document management, you know the frustration: you find one document, but you don't know what else is connected to it. You end up doing multiple searches, opening multiple tabs, building context manually.
The knowledge graph eliminates that. Your documents organize themselves into a network. The relationships are extracted automatically. Navigation is instant.
For power users, this means you can work faster. For teams, it means knowledge doesn't get lost when someone leaves—the relationships are explicit and discoverable.
The architecture is designed to scale. We've tested it with workspaces containing 50,000+ documents. Query times remain sub-500ms. The graph grows, but the performance doesn't degrade because we're querying by document ID, not scanning the entire graph.
This is why AiFiler feels different from other document tools. It's not just better search. It's a fundamentally different way of thinking about how documents relate to each other.