You've probably noticed that AiFiler doesn't just find documents—it understands how they relate to each other. A contract connects to a client, which connects to a project, which connects to budget forecasts and emails. That's not magic. It's a deliberately designed graph architecture that treats relationships as first-class citizens.
Most document tools treat your files as isolated objects. They're stored in folders, tagged with metadata, indexed for full-text search. But they're still fundamentally alone. AiFiler inverts that assumption: documents are nodes, and the connections between them are where the intelligence lives.
Why We Built a Graph at All
The problem we were solving was simple to state, hard to execute: context collapse. You search for "Q3 budget," and you get 47 results. Some are actual budgets. Some are emails mentioning budgets. Some are presentations that reference budget numbers. Some are Slack conversations about budget reviews. Without understanding how these documents relate to each other, the result list becomes a haystack.
A traditional relational database would solve this with foreign keys and JOINs. But documents in the real world don't fit neat schemas. A contract doesn't have a client_id field. It has a mention of the client's name buried in paragraph 3. An email thread references three projects, two people, and a budget number—but not in a structured way.
A graph database lets us say: "This document mentions this entity. This entity appears in these other documents. These documents were created by the same person. This person collaborated with that person on this project." It's flexible enough to handle messy reality while structured enough to enable fast queries.
The Eight Edge Types
We settled on eight types of edges, each representing a different kind of relationship:
1. Document-to-Document (Mentions): When document A references document B by name, title, or ID, we create a mentions edge. It is the simplest relationship, but a surprisingly powerful one. A contract that mentions a specific SOW creates a direct path for discovery.
2. Document-to-Entity (Contains): Entities are the people, organizations, dates, monetary amounts, and technical terms extracted from document content. When Claude reads a document and identifies "Acme Corp" as an organization, we create a contains edge. This is where the AI layer becomes essential. We're not just doing regex matching; we're doing semantic extraction.
3. Entity-to-Entity (Related): Some entities are related to other entities. A person works at an organization. An organization is a subsidiary of another organization. A project is funded by a budget. These edges come both from explicit mentions in documents and from Claude's understanding of real-world relationships.
4. Document-to-Author (CreatedBy): Every document has an author. This edge is simple but crucial for access control, audit trails, and grouping documents by team member.
5. Document-to-Workspace (BelongsTo): With multi-workspace support, every document belongs to exactly one workspace. This edge is a hard boundary; it's how we enforce isolation between organizations.
6. Document-to-Collection (InCollection): Collections are user-created groupings. Unlike folders, a document can belong to multiple collections. Each collection membership is an edge.
7. Document-to-Knowledge-Base (InKnowledgeBase): Knowledge bases are curated subsets of your workspace (the documents you want Claude to consider when answering questions). This edge determines search scope for intelligence queries.
8. Entity-to-Document-Version (ExtractedFrom): Documents change. We keep extraction metadata tied to specific versions, so if you update a contract, we know which entities came from which version. This prevents stale relationships from polluting your graph.
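One way to picture the eight edge types is as a TypeScript union of edge labels. The sketch below is a hypothetical model, not AiFiler's actual schema: the `Edge` interface, its `sourceId`/`targetId`/`version` fields, and the `currentEntities` helper are all our assumptions. It also illustrates why ExtractedFrom edges matter: when each one records the document version it came from, entities extracted from superseded versions can simply be filtered out.

```typescript
// Hypothetical model of the eight edge types; not AiFiler's real schema.
type EdgeType =
  | "mentions" // document -> document
  | "contains" // document -> entity
  | "related" // entity -> entity
  | "createdBy" // document -> author
  | "belongsTo" // document -> workspace (hard isolation boundary)
  | "inCollection" // document -> collection (many per document)
  | "inKnowledgeBase" // document -> knowledge base (search scope)
  | "extractedFrom"; // entity -> document version

interface Edge {
  type: EdgeType;
  sourceId: string;
  targetId: string;
  version?: number; // assumed: only set on extractedFrom edges
}

// Entities extracted from the given version of a document.
// Entities whose extractedFrom edge points at an older version drop out.
function currentEntities(edges: readonly Edge[], docId: string, latestVersion: number): string[] {
  return edges
    .filter((e) => e.type === "extractedFrom" && e.targetId === docId && e.version === latestVersion)
    .map((e) => e.sourceId);
}

const edges: Edge[] = [
  { type: "contains", sourceId: "contract-1", targetId: "acme-corp" },
  { type: "extractedFrom", sourceId: "acme-corp", targetId: "contract-1", version: 2 },
  { type: "extractedFrom", sourceId: "old-vendor", targetId: "contract-1", version: 1 },
];

console.log(currentEntities(edges, "contract-1", 2)); // [ 'acme-corp' ]
```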
How Data Flows Through the System
Here's the architecture in motion:
Ingestion → Extraction → Graph Construction → Query Resolution
When you upload a document (or connect a Google Drive folder), the file arrives at lib/ingest/parseFile.ts. We determine the file type—DOCX, PDF, XLSX, PPTX—and extract raw text. For PDFs, we use Chromium with Sparticuz to handle complex layouts. For structured formats like XLSX, we preserve the table structure.
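Determining the file type usually combines magic bytes with the file extension. The hypothetical `detectFileType` below is not the actual API of lib/ingest/parseFile.ts; it only illustrates one reasonable approach. PDFs start with the bytes `%PDF`, while DOCX, XLSX, and PPTX are all ZIP packages (signature `PK\x03\x04`), so the extension has to break the tie among the Office formats.

```typescript
type FileKind = "pdf" | "docx" | "xlsx" | "pptx";

const OOXML_KINDS: readonly FileKind[] = ["docx", "xlsx", "pptx"];

// Hypothetical helper: sniff the first bytes, then use the extension
// only to distinguish between ZIP-based Office Open XML formats.
function detectFileType(fileName: string, header: Uint8Array): FileKind | null {
  const startsWith = (sig: number[]) => sig.every((b, i) => header[i] === b);

  if (startsWith([0x25, 0x50, 0x44, 0x46])) return "pdf"; // "%PDF"

  if (startsWith([0x50, 0x4b, 0x03, 0x04])) {
    // "PK\x03\x04": a ZIP container, which is what DOCX/XLSX/PPTX are.
    const ext = fileName.toLowerCase().split(".").pop();
    const match = OOXML_KINDS.find((k) => k === ext);
    if (match) return match;
  }
  return null; // unsupported or unrecognized file
}

const pdfHeader = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d]);
console.log(detectFileType("q3-budget.pdf", pdfHeader)); // pdf
```

Checking the bytes first means a PDF renamed to `.docx` is still parsed as a PDF, and a random ZIP named `.txt` is rejected instead of being handed to the wrong parser.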
That text flows to Claude via the Anthropic Files API (managed in lib/ai/fileStore.ts). Claude reads the document and returns structured extractions: entities, relationships, intent signals, and a semantic summary. We don't ask Claude to return raw JSON—we use prompt engineering to get consistent, parseable output.
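The exact output format Claude is prompted to produce isn't specified here, so the parser below assumes a hypothetical line-oriented format (`ENTITY|kind|name`, `RELATION|from|relation|to`, `SUMMARY|text`). Treat it as a sketch of the general idea: a format simple enough that one malformed line can be skipped without discarding the whole extraction.

```typescript
interface ExtractedEntity { kind: string; name: string; }
interface ExtractedRelation { from: string; relation: string; to: string; }
interface Extraction {
  entities: ExtractedEntity[];
  relations: ExtractedRelation[];
  summary: string;
}

// Parses a hypothetical line-oriented format:
//   ENTITY|<kind>|<name>
//   RELATION|<from>|<relation>|<to>
//   SUMMARY|<text>
// Unrecognized or malformed lines are skipped rather than failing the document.
function parseExtraction(raw: string): Extraction {
  const out: Extraction = { entities: [], relations: [], summary: "" };
  for (const line of raw.split("\n")) {
    const parts = line.split("|").map((p) => p.trim());
    const tag = parts[0];
    if (tag === "ENTITY" && parts.length === 3) {
      out.entities.push({ kind: parts[1], name: parts[2] });
    } else if (tag === "RELATION" && parts.length === 4) {
      out.relations.push({ from: parts[1], relation: parts[2], to: parts[3] });
    } else if (tag === "SUMMARY" && parts.length >= 2) {
      out.summary = parts.slice(1).join("|"); // summary text may itself contain "|"
    }
  }
  return out;
}

const sample = [
  "ENTITY|organization|Acme Corp",
  "ENTITY|person|Jane Doe",
  "RELATION|Jane Doe|works_at|Acme Corp",
  "SUMMARY|Master services agreement with Acme Corp.",
  "this line is noise",
].join("\n");

const parsed = parseExtraction(sample);
console.log(parsed.entities.length, parsed.relations.length); // 2 1
```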
These extractions hit Supabase (via lib/supabaseServer.ts for server-side operations). We insert nodes for each entity and edges for each relationship. The graph isn't a separate database; it's a normalized schema in Postgres with foreign key constraints. This keeps transactions atomic and queries fast.
When you search, the query hits lib/searchParser.ts, which turns natural-language queries or search operators into a graph traversal. "Show me all documents related to Acme Corp" becomes: find the Acme Corp entity node, traverse all contains edges to find documents, then sort by relevance.
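Stripped of parsing, the Acme Corp query is a one-hop lookup followed by a sort. This in-memory sketch assumes hypothetical `EntityNode` and `ContainsEdge` shapes and uses the number of mentions as a stand-in for relevance; how lib/searchParser.ts actually ranks results isn't described here.

```typescript
interface EntityNode { id: string; name: string; }
// contains edges run document -> entity; "mentions" is an assumed relevance signal.
interface ContainsEdge { docId: string; entityId: string; mentions: number; }

// Hypothetical resolver for "documents related to <entity name>".
function documentsRelatedTo(name: string, entities: EntityNode[], edges: ContainsEdge[]): string[] {
  // 1. Find the entity node (case-insensitive exact match).
  const target = entities.find((e) => e.name.toLowerCase() === name.toLowerCase());
  if (!target) return [];
  // 2. Follow every contains edge that points at it, 3. sort by relevance.
  return edges
    .filter((e) => e.entityId === target.id) // filter copies, so sort won't mutate input
    .sort((a, b) => b.mentions - a.mentions)
    .map((e) => e.docId);
}

const entities: EntityNode[] = [
  { id: "e1", name: "Acme Corp" },
  { id: "e2", name: "Globex" },
];
const edges: ContainsEdge[] = [
  { docId: "email-17", entityId: "e1", mentions: 1 },
  { docId: "contract-3", entityId: "e1", mentions: 9 },
  { docId: "sow-2", entityId: "e1", mentions: 4 },
  { docId: "deck-5", entityId: "e2", mentions: 6 },
];

console.log(documentsRelatedTo("acme corp", entities, edges));
// [ 'contract-3', 'sow-2', 'email-17' ]
```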
The Universal Command system (lib/intelligence/universalRouter.ts) handles intent routing. If you ask "What's our contract with Acme?", the router recognizes a search intent, parses the query, and executes a graph traversal. If you ask "Add this to the Acme project," it's a mutation intent—we create new edges.
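The split between search and mutation intents can be shown with a deliberately naive router. The keyword heuristic and the `Intent` type below are assumptions made for illustration; universalRouter.ts may classify intents in an entirely different way (for example, by asking Claude).

```typescript
type Intent =
  | { kind: "search"; query: string }
  | { kind: "mutation"; action: "addEdge"; target: string };

// Naive stand-in for intent classification: a command starting with
// "add", "link" or "attach" and containing "to <target>" is a mutation
// (create a new edge); anything else is treated as a search.
const MUTATION = /^\s*(add|link|attach)\b.*?\bto\s+(?:the\s+)?(.+?)[.?!]?\s*$/i;

function routeCommand(input: string): Intent {
  const m = MUTATION.exec(input);
  if (m) return { kind: "mutation", action: "addEdge", target: m[2] };
  return { kind: "search", query: input.trim() };
}

console.log(routeCommand("Add this to the Acme project").kind); // mutation
console.log(routeCommand("What's our contract with Acme?").kind); // search
```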
The Performance Challenge: Zero Latency
A naive graph traversal is expensive. If you have 10,000 documents and each one mentions 50 entities, and each entity appears in 100 documents, a full traversal could touch millions of rows.
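To see where "millions" comes from, multiply out the paragraph's numbers for a traversal that never remembers which nodes it has already visited. Starting from one document, each document-to-entity hop multiplies the frontier by 50 and each entity-to-document hop by 100. The whole table only holds 10,000 × 50 = 500,000 contains edges, yet four naive hops read about 25 million edge rows, because the same documents and entities are reached over and over. The calculation ignores overlap, so it is an upper bound.

```typescript
// Illustrative numbers from the paragraph above.
const DOCS = 10_000;
const ENTITIES_PER_DOC = 50;
const DOCS_PER_ENTITY = 100;

// Edge rows read per hop by a traversal with no visited-set deduplication,
// starting from a single document.
function naiveRowsPerHop(hops: number): number[] {
  const rows: number[] = [];
  let frontier = 1;
  for (let hop = 1; hop <= hops; hop++) {
    // Odd hops go document -> entity, even hops go entity -> document.
    frontier *= hop % 2 === 1 ? ENTITIES_PER_DOC : DOCS_PER_ENTITY;
    rows.push(frontier);
  }
  return rows;
}

const perHop = naiveRowsPerHop(4);
console.log(perHop); // [ 50, 5000, 250000, 25000000 ]
console.log(perHop.reduce((a, b) => a + b, 0)); // 25255050
console.log(DOCS * ENTITIES_PER_DOC); // 500000 (every contains edge in the table)
```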
We solved this with three techniques:
Denormalization: We cache common traversals. When you open a document, we pre-fetch related documents using SWR backed by a localStorage cache with prefixed keys. The first load hits the database; subsequent loads are served from cache. When the document updates, we invalidate its cache key.
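SWR lets you plug in a custom cache provider through `SWRConfig`, which is one way to back it with localStorage. The sketch below leaves SWR out and shows only the caching pattern described above: prefixed keys, a database fetch on the first miss, and explicit invalidation when a document changes. The `aifiler:related:` prefix and the `RelatedDocsCache` class are hypothetical, and a `Map` stands in for `localStorage` so the code runs outside a browser.

```typescript
// Hypothetical key prefix; namespacing keys keeps these entries separate
// from anything else stored in localStorage.
const PREFIX = "aifiler:related:";

class RelatedDocsCache {
  // Stand-in for window.localStorage, which only stores strings.
  private store = new Map<string, string>();
  private misses = 0;

  constructor(private fetchRelated: (docId: string) => string[]) {}

  get(docId: string): string[] {
    const key = PREFIX + docId;
    const hit = this.store.get(key);
    if (hit !== undefined) return JSON.parse(hit) as string[]; // served from cache
    this.misses++;
    const fresh = this.fetchRelated(docId); // first load hits the database
    this.store.set(key, JSON.stringify(fresh));
    return fresh;
  }

  // Called when the document changes: drop its cached traversal.
  invalidate(docId: string): void {
    this.store.delete(PREFIX + docId);
  }

  get missCount(): number {
    return this.misses;
  }
}

let dbCalls = 0;
const cache = new RelatedDocsCache((id) => {
  dbCalls++;
  return [`${id}-sow`, `${id}-invoice`];
});

cache.get("contract-1"); // miss: queries the "database"
cache.get("contract-1"); // hit: served from cache
cache.invalidate("contract-1");
cache.get("contract-1"); // miss again after invalidation
console.log(dbCalls); // 2
```

A fuller version would also invalidate the entries of documents that point to the updated one, since their related lists may include it.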
Indexed Queries: We index edges on (edge_type, source_id) and (edge_type, target_id), so the Postgres query planner can resolve "find all documents containing entity X" with a single index lookup instead of a table scan.
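In Postgres, the two indexes would look something like `CREATE INDEX ON edges (edge_type, target_id);` plus its `source_id` twin (the `edges` table name is assumed). The TypeScript below is only an in-memory analogue: a `Map` keyed on the (edge type, node ID) pair plays the role of the composite index, so a lookup touches only the matching edges instead of scanning all of them.

```typescript
// A subset of the edge types, for brevity.
type EdgeType = "contains" | "mentions" | "related";
interface Edge { type: EdgeType; sourceId: string; targetId: string; }

// In-memory analogue of composite indexes on (edge_type, target_id)
// and (edge_type, source_id).
class EdgeIndex {
  private byTarget = new Map<string, Edge[]>();
  private bySource = new Map<string, Edge[]>();

  add(edge: Edge): void {
    push(this.byTarget, `${edge.type}|${edge.targetId}`, edge);
    push(this.bySource, `${edge.type}|${edge.sourceId}`, edge);
  }

  // contains edges run document -> entity, so the entity is the target:
  // "documents containing X" is answered by the (edge_type, target_id) index.
  documentsContaining(entityId: string): string[] {
    return (this.byTarget.get(`contains|${entityId}`) ?? []).map((e) => e.sourceId);
  }

  // The mirror index answers "which entities does document D contain?"
  entitiesIn(docId: string): string[] {
    return (this.bySource.get(`contains|${docId}`) ?? []).map((e) => e.targetId);
  }
}

function push(map: Map<string, Edge[]>, key: string, edge: Edge): void {
  const bucket = map.get(key);
  if (bucket) bucket.push(edge);
  else map.set(key, [edge]);
}

const idx = new EdgeIndex();
idx.add({ type: "contains", sourceId: "contract-1", targetId: "acme" });
idx.add({ type: "contains", sourceId: "email-9", targetId: "acme" });
idx.add({ type: "mentions", sourceId: "email-9", targetId: "contract-1" });
console.log(idx.documentsContaining("acme")); // [ 'contract-1', 'email-9' ]
```

Putting `edge_type` first in both indexes matches how the queries filter: nearly every traversal fixes the edge type and then looks up one endpoint.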
Lazy Loading: The UI doesn't load the entire graph. The Knowledge view shows immediate relationships (one hop from the current document). Expanding a section triggers a fresh query for the next level. This keeps the initial render fast.
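Lazy expansion amounts to running a one-hop query per click and remembering what is already on screen. The `LazyGraphView` class and the `fetchNeighbors` callback below are hypothetical; the callback stands in for whatever query the Knowledge view actually issues.

```typescript
// Hypothetical one-hop neighbor query.
type FetchNeighbors = (nodeId: string) => string[];

// Tracks what the view has already shown, so each expansion costs exactly
// one query and never re-displays a node.
class LazyGraphView {
  private shown = new Set<string>();
  queries = 0;

  constructor(private fetchNeighbors: FetchNeighbors, root: string) {
    this.shown.add(root);
  }

  // Called for the root on initial render and whenever the user expands a node.
  expand(nodeId: string): string[] {
    this.queries++;
    const fresh = this.fetchNeighbors(nodeId).filter((n) => !this.shown.has(n));
    fresh.forEach((n) => this.shown.add(n));
    return fresh;
  }
}

const graph: Record<string, string[]> = {
  "contract-1": ["acme", "sow-2"],
  acme: ["contract-1", "email-9", "deck-4"],
  "sow-2": ["contract-1"],
};

const view = new LazyGraphView((id) => graph[id] ?? [], "contract-1");
console.log(view.expand("contract-1")); // [ 'acme', 'sow-2' ] (initial render)
console.log(view.expand("acme")); // [ 'email-9', 'deck-4' ] (user expands "acme")
```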
Why This Matters for You
The graph architecture has three user-facing consequences:
First, search becomes contextual. When you search for "budget," AiFiler doesn't just find documents with that word. It understands that budgets connect to projects, which connect to people, which connect to emails. You can navigate that web instead of wading through a flat list.
Second, the Universal Command (Ctrl+Shift+A) can reason across documents. When you ask "Who's the point person for the Acme project?", the system traverses from the Acme entity to project documents to team member entities. It's not guessing—it's following the graph.
Third, AI context becomes selective. When you ask Claude a question via the Knowledge view, we don't dump your entire workspace into the context window. We traverse the graph to find the most relevant documents and include only those. This keeps context focused and costs down.
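Graph-driven context selection can be sketched as a breadth-first search from the nodes matched by the question, ranking documents by hop distance and packing them into a token budget. The ranking rule (fewer hops first), the greedy packing, and the per-document token counts are assumptions for illustration, not a description of how AiFiler actually scores documents.

```typescript
interface Doc { id: string; tokens: number; }

// Hypothetical context builder. The graph mixes document and entity nodes;
// only nodes present in `docs` are candidates for the context window.
function buildContext(
  seeds: string[],
  neighbors: (id: string) => string[],
  docs: Map<string, Doc>,
  maxHops: number,
  tokenBudget: number,
): string[] {
  // Breadth-first search recording each node's hop distance from the seeds.
  const hops = new Map<string, number>();
  let frontier = [...seeds];
  frontier.forEach((s) => hops.set(s, 0));
  for (let h = 1; h <= maxHops; h++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const n of neighbors(id)) {
        if (!hops.has(n)) {
          hops.set(n, h);
          next.push(n);
        }
      }
    }
    frontier = next;
  }

  // Closer documents first; the stable sort keeps BFS order for ties.
  const ranked = Array.from(hops.entries())
    .filter(([id]) => docs.has(id))
    .sort((a, b) => a[1] - b[1])
    .map(([id]) => id);

  // Greedily pack documents until the token budget is used up.
  const chosen: string[] = [];
  let used = 0;
  for (const id of ranked) {
    const cost = docs.get(id)!.tokens;
    if (used + cost > tokenBudget) continue; // skip documents that don't fit
    chosen.push(id);
    used += cost;
  }
  return chosen;
}

const adj: Record<string, string[]> = {
  acme: ["contract-1", "email-9"],
  "contract-1": ["acme", "jane"],
  "email-9": ["acme"],
  jane: ["contract-1", "org-chart"],
  "org-chart": ["jane"],
};
const docs = new Map<string, Doc>([
  ["contract-1", { id: "contract-1", tokens: 3000 }],
  ["email-9", { id: "email-9", tokens: 800 }],
  ["org-chart", { id: "org-chart", tokens: 2000 }],
]);

console.log(buildContext(["acme"], (id) => adj[id] ?? [], docs, 3, 4000));
// [ 'contract-1', 'email-9' ] (org-chart is 3 hops away and would exceed the budget)
```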
What We'd Do Differently
If we rebuilt this today, we'd probably use a dedicated graph database like Neo4j for the relationship layer, keeping Postgres for documents and metadata. Postgres is fast enough for our current scale, but a true graph engine would give us more sophisticated traversal primitives and better optimization for complex queries.
We'd also invest earlier in edge versioning. Right now, if a relationship changes, we update it in place. But sometimes you want to know how the graph looked six months ago. Timestamping edges would let us answer "What was the project structure in Q2?" without maintaining separate snapshots.
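Timestamped edges of this kind are commonly modeled with a validity interval. In this hypothetical sketch, an update closes the old edge (sets `validTo`) and appends a new one instead of overwriting it, so an as-of query can reconstruct the graph on any date. The `TemporalEdge` shape and the helper names are our assumptions.

```typescript
interface TemporalEdge {
  type: string;
  sourceId: string;
  targetId: string;
  validFrom: Date; // when the relationship started
  validTo: Date | null; // null = still current
}

// Instead of updating in place, close the old edge and open a new one.
function replaceEdge(edges: TemporalEdge[], old: TemporalEdge, newTarget: string, at: Date): void {
  old.validTo = at;
  edges.push({ ...old, targetId: newTarget, validFrom: at, validTo: null });
}

// "What did the graph look like on date X?" = edges whose interval contains X.
function edgesAsOf(edges: TemporalEdge[], at: Date): TemporalEdge[] {
  const t = at.getTime();
  return edges.filter(
    (e) => e.validFrom.getTime() <= t && (e.validTo === null || t < e.validTo.getTime()),
  );
}

const edges: TemporalEdge[] = [
  {
    type: "related",
    sourceId: "project-apollo",
    targetId: "team-a",
    validFrom: new Date("2024-01-01"),
    validTo: null,
  },
];

replaceEdge(edges, edges[0], "team-b", new Date("2024-07-01"));
const q2 = edgesAsOf(edges, new Date("2024-05-15")).map((e) => e.targetId);
const now = edgesAsOf(edges, new Date("2024-09-01")).map((e) => e.targetId);
console.log(q2, now); // [ 'team-a' ] [ 'team-b' ]
```

The trade-off is that the edge table only grows, so queries for the current graph should filter on `validTo` being null (and ideally have an index that supports that filter).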
The graph will keep evolving. As we add more AI capabilities—like automatic relationship inference or anomaly detection—the edge types will multiply. But the core principle stays: relationships are data, and data should be queryable.
That's what separates a document tool from a knowledge tool.