You've probably noticed that when you search for a document in AiFiler, results appear instantly. Related documents surface before you finish typing. Tags suggest themselves. That's not magic—it's a knowledge graph running in the background, connecting documents, people, concepts, and actions into a queryable web.
Most document tools treat each file as an island. They index text, they rank by recency, and that's it. AiFiler inverts that model. Every document is a node. Every relationship—whether it's "authored by," "references," "shares metadata with," or "was edited alongside"—is an edge. The graph doesn't just help you find things. It helps you understand what matters.
This is the architecture behind that system.
The Core Model: Nodes and 8 Edge Types
The knowledge graph operates on a simple premise: documents aren't isolated. They exist in relation to other documents, people, projects, and concepts. We represent this as a directed graph where each relationship has semantic meaning.
Nodes in our graph include:
- Documents (files, regardless of type)
- Users (creators, collaborators, viewers)
- Workspaces (organizational boundaries)
- Tags and categories (metadata)
- Concepts extracted from content (via Claude)
- Time periods (for temporal queries)
Edges connect these nodes with typed relationships. We currently support 8 edge types:
- AUTHORED_BY — A document was created by a user
- REFERENCES — Document A mentions or links to Document B
- SHARES_METADATA — Documents have overlapping tags, categories, or extracted concepts
- EDITED_BY — A user modified a document (with timestamp)
- ACCESSED_BY — A user viewed a document (with frequency)
- BELONGS_TO — A document is in a specific workspace or folder
- COLLABORATES_ON — Multiple users worked on the same document
- TEMPORAL_PROXIMITY — Documents were created or modified within a time window
Each edge carries metadata: creation date, weight (how strong the relationship is), and context (what triggered the relationship).
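The model above can be written down as a handful of types. This is a minimal sketch, not AiFiler's actual schema: every type, field, and function name below (`GraphNode`, `GraphEdge`, `outgoing`, and so on) is an assumption made for illustration, as is the 0..1 range for weights.

```typescript
// Hypothetical type names; the article does not show the real schema.
type NodeKind = "document" | "user" | "workspace" | "tag" | "concept" | "time_period";

type EdgeType =
  | "AUTHORED_BY"
  | "REFERENCES"
  | "SHARES_METADATA"
  | "EDITED_BY"
  | "ACCESSED_BY"
  | "BELONGS_TO"
  | "COLLABORATES_ON"
  | "TEMPORAL_PROXIMITY";

interface GraphNode {
  id: string;
  kind: NodeKind;
}

interface GraphEdge {
  sourceId: string;
  targetId: string;
  type: EdgeType;
  weight: number;   // relationship strength (assumed to lie in 0..1)
  createdAt: Date;  // creation date of the relationship
  context: string;  // what triggered the relationship
}

// The graph is directed, so "outgoing" only follows edges that start at sourceId.
function outgoing(edges: GraphEdge[], sourceId: string, type: EdgeType): GraphEdge[] {
  return edges.filter((e) => e.sourceId === sourceId && e.type === type);
}
```

Keeping `EdgeType` as a closed union means the compiler rejects a misspelled or unsupported edge type before it reaches storage.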
The Data Flow: From Document Ingestion to Graph Query
Here's how a document moves through the system and gets woven into the graph:
1. Ingestion
When you upload a file, AiFiler parses it using lib/ingest/parseFile.ts. This extracts text, metadata, and structure. The file is stored in Anthropic's Files API via lib/ai/fileStore.ts, giving us a persistent, queryable reference.
2. Concept Extraction
The document text flows to Claude via lib/ai/client.ts with a specific prompt: extract entities, themes, and relationships. Claude returns structured JSON with:
- Key concepts (e.g., "Q3 budget," "client retention," "API deprecation")
- Related documents (if mentioned by name or reference)
- Suggested tags
- Confidence scores for each extraction
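Because the extraction comes back as model-generated JSON, it is worth validating before anything is written to the graph. The sketch below assumes a response shape with `concepts`, `relatedDocuments`, and `suggestedTags` arrays of `{ value, confidence }` items; those field names are hypothetical, since the article does not show the real prompt or schema.

```typescript
// Hypothetical response shape for the extraction step.
interface ScoredItem {
  value: string;
  confidence: number; // expected in 0..1
}

interface ExtractionResult {
  concepts: ScoredItem[];
  relatedDocuments: ScoredItem[];
  suggestedTags: ScoredItem[];
}

function isScoredItem(x: unknown): x is ScoredItem {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  return (
    typeof r.value === "string" &&
    typeof r.confidence === "number" &&
    r.confidence >= 0 &&
    r.confidence <= 1
  );
}

// Keep only well-formed items; anything malformed is dropped rather than trusted.
function parseScoredList(x: unknown): ScoredItem[] {
  return Array.isArray(x) ? x.filter(isScoredItem) : [];
}

// Returns null when the payload is not a JSON object at all.
function parseExtraction(raw: string): ExtractionResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const r = data as Record<string, unknown>;
  return {
    concepts: parseScoredList(r.concepts),
    relatedDocuments: parseScoredList(r.relatedDocuments),
    suggestedTags: parseScoredList(r.suggestedTags),
  };
}
```

Dropping malformed items instead of failing the whole document is one possible design choice; a stricter pipeline might reject the response and retry.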
3. Graph Construction
These extractions become edges. If Claude identifies that "Document A references Document B," we create a REFERENCES edge. If multiple documents share the same concept, we create SHARES_METADATA edges. This happens asynchronously via a Supabase trigger, so ingestion doesn't block the user.
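A rough sketch of that step, written as a pure function so it is easy to test. The real work happens behind a Supabase trigger that the article does not show, so the function name `buildEdges`, the `conceptIndex` lookup, and the choice to weight a SHARES_METADATA edge by the highest concept confidence are all assumptions.

```typescript
// Hypothetical shapes; the actual trigger logic is not shown in the article.
interface ScoredConcept {
  value: string;
  confidence: number; // 0..1, from the extraction step
}

interface NewEdge {
  sourceId: string;
  targetId: string;
  type: "REFERENCES" | "SHARES_METADATA";
  weight: number;
  context: string;
}

function buildEdges(
  docId: string,
  concepts: ScoredConcept[],
  referencedDocIds: string[],
  conceptIndex: Map<string, string[]>, // lowercased concept -> ids of docs already carrying it
): NewEdge[] {
  // One REFERENCES edge per explicitly mentioned document.
  const edges: NewEdge[] = referencedDocIds.map((targetId) => ({
    sourceId: docId,
    targetId,
    type: "REFERENCES" as const,
    weight: 1,
    context: "explicit mention",
  }));

  // Group shared concepts by the other document so each pair gets a single edge.
  const shared = new Map<string, { weight: number; concepts: string[] }>();
  for (const concept of concepts) {
    const others = conceptIndex.get(concept.value.toLowerCase()) ?? [];
    for (const otherId of others) {
      if (otherId === docId) continue; // never link a document to itself
      const entry = shared.get(otherId) ?? { weight: 0, concepts: [] };
      entry.weight = Math.max(entry.weight, concept.confidence);
      entry.concepts.push(concept.value);
      shared.set(otherId, entry);
    }
  }

  shared.forEach((entry, otherId) => {
    edges.push({
      sourceId: docId,
      targetId: otherId,
      type: "SHARES_METADATA",
      weight: entry.weight,
      context: `shared concepts: ${entry.concepts.join(", ")}`,
    });
  });
  return edges;
}
```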
4. Query Execution
When you search, the Universal Command (lib/intelligence/universalRouter.ts) interprets your intent. If you search for "contracts from Q3," the intent handler translates this to a graph query:
MATCH (doc:Document)
WHERE doc.concept CONTAINS "Q3"
  AND doc.concept CONTAINS "contract"
OPTIONAL MATCH (doc)-[r]-()
RETURN doc, COUNT(r) AS relevance_score
ORDER BY relevance_score DESC
This isn't literal Cypher (we don't use Neo4j), but the logic is identical. We query Supabase's jsonb columns and materialized views that pre-compute common paths.
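To make the logic concrete without a database, here is the same idea over in-memory rows: keep documents whose concepts cover every search term, then score each one by the number of edges touching it. The row shapes and the function name `searchByConcepts` are hypothetical; the production path runs against jsonb columns and materialized views rather than arrays.

```typescript
// Hypothetical row shapes, standing in for the Supabase tables.
interface DocRow {
  id: string;
  concepts: string[];
}

interface EdgeRow {
  sourceId: string;
  targetId: string;
}

interface ScoredDoc {
  id: string;
  relevanceScore: number;
}

function searchByConcepts(docs: DocRow[], edges: EdgeRow[], terms: string[]): ScoredDoc[] {
  // Count edges touching each node, in either direction.
  const degree = new Map<string, number>();
  for (const e of edges) {
    degree.set(e.sourceId, (degree.get(e.sourceId) ?? 0) + 1);
    degree.set(e.targetId, (degree.get(e.targetId) ?? 0) + 1);
  }

  const lowered = terms.map((t) => t.toLowerCase());
  return docs
    // Every term must appear in at least one of the document's concepts.
    .filter((d) => lowered.every((t) => d.concepts.some((c) => c.toLowerCase().includes(t))))
    .map((d) => ({ id: d.id, relevanceScore: degree.get(d.id) ?? 0 }))
    .sort((a, b) => b.relevanceScore - a.relevanceScore);
}
```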
5. Ranking and Re-ranking
Raw graph results are re-ranked by:
- Edge weight (how confident we are in the relationship)
- Recency (EDITED_BY and ACCESSED_BY edges with recent timestamps)
- User context (documents your team has accessed recently score higher)
- Temporal alignment (if you're searching for Q3 documents, TEMPORAL_PROXIMITY edges matter)
This ranking happens in lib/intelligence/intentHandlers.ts before results are returned to the UI.
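One simple way to combine those four signals is a weighted sum. The sketch below is illustrative only: the blend weights, the 30-day recency scale, the cap of 10 team accesses, and all names are assumptions, not values taken from lib/intelligence/intentHandlers.ts.

```typescript
// Hypothetical candidate shape; each field maps to one of the four signals.
interface Candidate {
  id: string;
  edgeWeight: number;          // 0..1 confidence in the relationship
  lastTouchedDaysAgo: number;  // from EDITED_BY / ACCESSED_BY timestamps
  teamAccessCount: number;     // recent accesses by the searcher's team
  inQueryTimeWindow: boolean;  // TEMPORAL_PROXIMITY to the period searched for
}

// Assumed blend; a real system would tune these.
const BLEND = { edge: 0.4, recency: 0.3, team: 0.2, temporal: 0.1 };

function score(c: Candidate): number {
  const recency = 1 / (1 + c.lastTouchedDaysAgo / 30); // 1 today, 0.5 at 30 days
  const team = Math.min(c.teamAccessCount, 10) / 10;   // saturates at 10 accesses
  const temporal = c.inQueryTimeWindow ? 1 : 0;
  return (
    BLEND.edge * c.edgeWeight +
    BLEND.recency * recency +
    BLEND.team * team +
    BLEND.temporal * temporal
  );
}

// Sort a copy so the caller's array is left untouched.
function rerank(candidates: Candidate[]): Candidate[] {
  return candidates.slice().sort((a, b) => score(b) - score(a));
}
```

With this blend, a strongly linked but long-untouched document can still lose to a moderately linked one that the team opened today, which matches the intent of the list above.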
Why We Avoided Traditional Graph Databases
You might ask: why not use Neo4j or Amazon Neptune? We evaluated both. Here's what we found:
Operational overhead. Graph databases require their own infrastructure, backups, and scaling decisions. We already run Supabase. Adding another database means another thing to monitor, another set of credentials, another potential failure point.
Latency trade-offs. Graph databases excel at deep traversals (finding friends-of-friends-of-friends). Our queries are shallower: we care about direct relationships and 2-hop connections. Supabase's jsonb + materialized views handle this faster than a separate database round-trip.
Cost. At AiFiler's scale, Supabase's compute is cheaper than provisioning a managed graph database.
The trade-off we made: We pre-compute common graph patterns into materialized views. Instead of computing "all documents related to this user" on every query, we refresh a view every 15 minutes. This trades freshness for speed. For most use cases—finding documents, discovering related files, building recommendations—15-minute staleness is invisible.
The Materialized View Strategy
Here's the pattern we use:
CREATE MATERIALIZED VIEW document_relationships AS
SELECT
    d1.id AS source_doc,
    d2.id AS target_doc,
    json_agg(json_build_object(
        'edge_type', edge.type,
        'weight', edge.weight,
        'created_at', edge.created_at
    )) AS edges,
    COUNT(*) AS relationship_count,
    MAX(edge.weight) AS max_weight
FROM documents d1
JOIN graph_edges edge ON d1.id = edge.source_id
JOIN documents d2 ON d2.id = edge.target_id
GROUP BY d1.id, d2.id;

CREATE INDEX idx_doc_relationships ON document_relationships(source_doc, max_weight DESC);
When you search, we query this view instead of computing relationships live. The view refreshes every 15 minutes via a cron job. For real-time accuracy, we also maintain an in-memory cache (lib/knowledge/) that tracks recent changes and merges them with the materialized view results.
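The merge step might look something like the following. The article does not describe the cache's structure, so this sketch assumes it yields a list of per-document changes since the last refresh, with `null` marking a relationship that was removed; the names `RelatedDoc`, `RecentChange`, and `mergeWithRecent` are hypothetical.

```typescript
// One row of the materialized view, reduced to what ranking needs.
interface RelatedDoc {
  targetDoc: string;
  maxWeight: number;
}

// A change recorded after the last refresh; null means the relationship is gone.
interface RecentChange {
  targetDoc: string;
  maxWeight: number | null;
}

function mergeWithRecent(viewRows: RelatedDoc[], recent: RecentChange[]): RelatedDoc[] {
  const merged = new Map<string, number>();
  for (const row of viewRows) merged.set(row.targetDoc, row.maxWeight);

  // Recent changes win over the (possibly stale) view.
  for (const change of recent) {
    if (change.maxWeight === null) merged.delete(change.targetDoc);
    else merged.set(change.targetDoc, change.maxWeight);
  }

  const out: RelatedDoc[] = [];
  merged.forEach((maxWeight, targetDoc) => out.push({ targetDoc, maxWeight }));
  return out.sort((a, b) => b.maxWeight - a.maxWeight);
}
```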
Handling Edge Weight and Decay
Not all relationships are equal. If you co-authored a document with someone six months ago, that's weaker than co-authoring one yesterday. We model this with edge weight decay:
weight(t) = initial_weight * e^(-λt)
Where t is the time in days since the relationship was created and λ is a decay constant (we use 0.001 per day, giving a half-life of ln 2 / λ ≈ 693 days).
This means:
- Fresh relationships (created today) have full weight
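- Relationships fade gradually: with this λ, a year-old relationship still retains roughly 69% of its weight, and it takes several years before a connection contributes minimal signal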
- The graph naturally forgets stale connections without explicit deletion
We recompute weights during the materialized view refresh. This keeps the ranking fresh without requiring a full graph recomputation on every query.
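The decay formula is small enough to sketch directly. This assumes t is measured in days, which is what the stated half-life implies; the function and constant names are illustrative.

```typescript
// Decay constant from the article, interpreted as "per day" (an assumption).
const LAMBDA_PER_DAY = 0.001;

// weight(t) = initial_weight * e^(-lambda * t)
function decayedWeight(initialWeight: number, ageDays: number, lambda: number = LAMBDA_PER_DAY): number {
  return initialWeight * Math.exp(-lambda * ageDays);
}

// Half-life follows from solving e^(-lambda * t) = 1/2, which gives t = ln 2 / lambda.
function halfLifeDays(lambda: number = LAMBDA_PER_DAY): number {
  return Math.LN2 / lambda;
}

console.log(halfLifeDays().toFixed(1));        // about 693 days for lambda = 0.001
console.log(decayedWeight(1, 0));              // a relationship created today keeps full weight
console.log(decayedWeight(1, halfLifeDays())); // approximately 0.5 by construction
```

Exposing λ as a parameter makes it easy to check how a larger constant would shorten the half-life before changing the refresh job.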
The Intent-to-Graph Bridge
This is where the architecture gets interesting. The Universal Command doesn't just search—it understands intent. When you type "show me all contracts we discussed with Acme," the intent handler (lib/intelligence/intentHandlers.ts) does three things:
1. Parse the intent. Extract entities (Acme = company, contracts = document type) and relationships (discussed = REFERENCES edge, we = collaboration).
2. Build a graph query. Translate the intent into a pattern:
   - Find documents tagged "contract"
   - That reference or mention "Acme"
   - That have COLLABORATES_ON edges (indicating team discussion)
3. Execute and rank. Run the query against the materialized view, re-rank by recency and team context, and return.
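The middle step, turning a parsed intent into a graph pattern, can be sketched as plain data transformation. The parse itself is assumed to have already happened, and every shape and name here (`ParsedIntent`, `GraphPattern`, `buildPattern`, `matchesPattern`) is hypothetical rather than the real handler code.

```typescript
// Hypothetical output of the intent parser for the Acme example.
interface ParsedIntent {
  documentType: string | null;   // e.g. "contract"
  mentionedEntities: string[];   // e.g. ["Acme"]
  impliesCollaboration: boolean; // "we discussed" suggests team activity
}

interface GraphPattern {
  requiredTag: string | null;
  mustReference: string[];
  requiredEdgeTypes: string[];
}

// Simplified view of a document as the graph sees it.
interface IndexedDoc {
  id: string;
  tags: string[];
  referencedEntities: string[];
  edgeTypes: string[];
}

function buildPattern(intent: ParsedIntent): GraphPattern {
  return {
    requiredTag: intent.documentType,
    mustReference: intent.mentionedEntities,
    requiredEdgeTypes: intent.impliesCollaboration ? ["COLLABORATES_ON"] : [],
  };
}

// A document matches only if it satisfies every part of the pattern.
function matchesPattern(doc: IndexedDoc, pattern: GraphPattern): boolean {
  const tagOk = pattern.requiredTag === null || doc.tags.includes(pattern.requiredTag);
  const refsOk = pattern.mustReference.every((e) => doc.referencedEntities.includes(e));
  const edgesOk = pattern.requiredEdgeTypes.every((t) => doc.edgeTypes.includes(t));
  return tagOk && refsOk && edgesOk;
}
```

Keeping the pattern as data, separate from the matcher, means the same structure could be compiled to a query against the materialized view instead of being evaluated in memory.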
This bridge between natural language and graph queries is what makes AiFiler's search feel intelligent. You're not typing boolean operators. You're describing what you need, and the graph finds it.
Why This Matters for You
The knowledge graph does three things that traditional search can't:
1. It finds documents you didn't know existed. If you search for "Q3 budget," the graph returns not just documents tagged "Q3" and "budget," but also documents that reference the Q3 budget, were edited by the same people who edited the budget, or were created in the same time window. You discover context.
2. It learns from your team's behavior. Every time someone accesses a document, that's an ACCESSED_BY edge with a timestamp. The graph sees patterns: "This team always looks at these three documents together." Next time you search, those relationships influence ranking.
3. It scales without getting slower. Because we pre-compute relationships and cache frequently accessed patterns, queries stay fast even as your document library grows to thousands of files. The architecture is designed to stay fast as the library grows, not merely to degrade gracefully.
The knowledge graph is why AiFiler's search, recommendations, and the Intelligence system (which powers features like automatic tagging and related document suggestions) all feel connected. They're not separate features. They're all queries against the same graph, interpreted through different intents.
Next time you search in AiFiler and see a result you didn't expect, a document that turned out to be exactly what you needed, that's the graph working. Eight edge types, near-instant results, and a lot of careful engineering underneath.