You've probably noticed that when you open a document in AiFiler, related files appear instantly. No waiting. No "refreshing the index." That's not magic—it's a carefully designed graph architecture that treats your documents as nodes in a network, not isolated files in a folder.

This is the story of how we built it, why we chose this approach over traditional search, and what it means when you're trying to find that one contract that references another contract from three years ago.

The Problem We Solved

Traditional document management treats files as islands. You search for keywords, get a ranked list, and hope the one you need is in the top five results. But knowledge work isn't about isolated documents—it's about relationships.

A contract references a statement of work. That SOW references a pricing agreement. The pricing agreement is attached to an email thread with a client. That email thread mentions a project that has 47 other documents. None of these connections exist in a traditional search index.

We realized early on that AiFiler needed to understand these relationships automatically. Not through manual tagging (nobody does that consistently), but by analyzing document content and extracting what actually connects them.

The Graph Structure: 8 Edge Types

At the core of AiFiler's knowledge graph is a simple concept: nodes (documents) connected by edges (relationships). We defined eight specific relationship types that cover how documents actually relate to each other in real work:

REFERENCES: Document A mentions Document B by name or ID. This is directional—if your contract references a pricing schedule, that's a REFERENCES edge.

CONTAINS_EXTRACT: Document A contains a direct excerpt or copy of content from Document B. We detect this through semantic similarity and exact text matching.

RELATED_TOPIC: Two documents discuss the same subject matter without explicit reference. A project proposal and a competitive analysis might both discuss "enterprise SaaS pricing models."

TEMPORAL_SEQUENCE: Documents with a clear time-based relationship. An email from March references a meeting from February. A revision document supersedes an earlier version.

PARTICIPANT_OVERLAP: Documents share the same people, companies, or entities. If two contracts both involve the same vendor, they're connected.

DERIVED_FROM: One document is clearly derived from another. A summary document, a presentation built from research, a report extracted from raw data.

CONTRADICTS: Two documents make conflicting claims about the same subject. This one catches a lot of version control issues.

ANNOTATES: One document provides commentary, review, or markup on another. A marked-up PDF, a comment document, feedback on a draft.

This taxonomy came from analyzing real user workflows. We watched how people navigate their documents and reverse-engineered the relationships they were implicitly following.
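To make the taxonomy concrete, here is a minimal sketch of the eight edge types as a closed TypeScript union. The names `EDGE_TYPES`, `EdgeType`, and `isEdgeType` are illustrative assumptions, not identifiers from the AiFiler codebase.

```typescript
// The eight relationship types named above, as a closed set.
// Identifier names here are hypothetical, chosen for illustration only.
const EDGE_TYPES = [
  "REFERENCES",
  "CONTAINS_EXTRACT",
  "RELATED_TOPIC",
  "TEMPORAL_SEQUENCE",
  "PARTICIPANT_OVERLAP",
  "DERIVED_FROM",
  "CONTRADICTS",
  "ANNOTATES",
] as const;

// Union type derived from the array, so the list has a single source of truth.
type EdgeType = (typeof EDGE_TYPES)[number];

// Runtime guard for edge types that arrive as plain strings,
// for example from a database row or a model response.
function isEdgeType(value: string): value is EdgeType {
  return (EDGE_TYPES as readonly string[]).indexOf(value) !== -1;
}
```

Deriving the type from the array means adding a ninth edge type later would be a one-line change, and anything outside the list is rejected at the boundary.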

How We Extract Relationships

The extraction happens in two phases:

Phase 1: Semantic Analysis runs when you upload a document. We use Claude with the Files API to analyze the document content and extract:

Explicit mentions of other documents (by title, ID, date, or reference number)
Key entities (people, companies, projects, products)
Topics and themes
Temporal markers

This happens in lib/intelligence/intentHandlers.ts through the document ingestion pipeline in lib/ingest/parseFile.ts. The AI doesn't just extract text—it understands context. It knows that "the Q3 budget" in a September email is different from "the Q3 budget" in a December email.

Phase 2: Graph Computation matches extracted data against existing documents in your workspace. This is where the real work happens. We:

Search for documents with matching entities
Calculate semantic similarity between topic sets
Detect temporal relationships by comparing dates and references
Identify version chains and derived documents

The computation runs asynchronously and updates incrementally. When you add a new document, we don't rebuild the entire graph—we compute edges only for that document against existing nodes.
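The incremental step can be sketched roughly as follows. This is a simplified illustration, not the production pipeline: it assumes Phase 1 yields plain string lists of entities and topics, uses Jaccard overlap as a stand-in for real semantic similarity, and reuses that overlap as the confidence score. The names `ExtractedDoc`, `CandidateEdge`, `jaccard`, and `computeEdgesForNewDoc`, and the 0.5 threshold, are all hypothetical.

```typescript
// Hypothetical shape of what Phase 1 extracts for one document.
interface ExtractedDoc {
  id: string;
  entities: string[]; // people, companies, projects, products
  topics: string[];
}

interface CandidateEdge {
  sourceDocId: string;
  targetDocId: string;
  edgeType: "PARTICIPANT_OVERLAP" | "RELATED_TOPIC";
  confidence: number;
}

// Jaccard similarity: size of the intersection divided by size of the union.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  setA.forEach((item) => {
    if (setB.has(item)) shared += 1;
  });
  const unionSize = setA.size + setB.size - shared;
  return unionSize === 0 ? 0 : shared / unionSize;
}

// Compute edges only between the new document and existing ones.
// Edges among existing documents are never recomputed.
function computeEdgesForNewDoc(
  newDoc: ExtractedDoc,
  existing: ExtractedDoc[],
  topicThreshold = 0.5, // assumed cutoff for illustration
): CandidateEdge[] {
  const edges: CandidateEdge[] = [];
  for (const other of existing) {
    if (other.id === newDoc.id) continue;
    const entityOverlap = jaccard(newDoc.entities, other.entities);
    if (entityOverlap > 0) {
      edges.push({
        sourceDocId: newDoc.id,
        targetDocId: other.id,
        edgeType: "PARTICIPANT_OVERLAP",
        confidence: entityOverlap,
      });
    }
    const topicSimilarity = jaccard(newDoc.topics, other.topics);
    if (topicSimilarity >= topicThreshold) {
      edges.push({
        sourceDocId: newDoc.id,
        targetDocId: other.id,
        edgeType: "RELATED_TOPIC",
        confidence: topicSimilarity,
      });
    }
  }
  return edges;
}
```

The cost of adding a document grows with the number of existing documents, not with the number of existing edges, which is the point of computing edges for the new node only.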

Storage and Query Architecture

Here's where most knowledge graph implementations stumble: they're either fast or accurate, rarely both.

We use Supabase with a hybrid approach:

The Edge Table stores the graph structure itself:

{
  source_doc_id: uuid,
  target_doc_id: uuid,
  edge_type: enum (REFERENCES | CONTAINS_EXTRACT | ...),
  confidence: float (0.0-1.0),
  metadata: jsonb,
  created_at: timestamp
}

Confidence matters. A REFERENCES edge where we found an explicit mention gets 0.95. A RELATED_TOPIC edge based on semantic similarity might be 0.72. The UI uses this score to decide what to show: edges below the confidence threshold are hidden.

The Query Layer lives in lib/knowledge/ and uses SWR with localStorage prefixing for offline-first data fetching. When you open a document, we:

Query all edges where source_doc_id = current_document (outgoing relationships)
Query all edges where target_doc_id = current_document (incoming relationships)
Filter by confidence threshold (user-configurable, default 0.70)
Fetch the related document metadata in a single batch query
Cache the result locally with a 5-minute TTL

The entire operation—from click to rendered related documents—takes 200-400ms on average. No pagination, no "load more" buttons. All relationships visible immediately.
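The lookup steps above can be approximated in a few lines. This sketch runs against an in-memory array instead of Supabase, leaves out the batch metadata fetch, and uses a plain `Map` in place of SWR and localStorage. The names `EdgeRow`, `RelatedResult`, `relatedCache`, and `getRelated` are assumptions made for the example.

```typescript
// Mirrors the relevant columns of the edge table shown earlier.
interface EdgeRow {
  source_doc_id: string;
  target_doc_id: string;
  edge_type: string;
  confidence: number; // 0.0-1.0
}

interface RelatedResult {
  outgoing: EdgeRow[];
  incoming: EdgeRow[];
}

const CACHE_TTL_MS = 5 * 60 * 1000; // 5-minute TTL
const relatedCache = new Map<string, { expiresAt: number; value: RelatedResult }>();

function getRelated(
  allEdges: EdgeRow[], // stand-in for the edge table
  docId: string,
  minConfidence = 0.7, // default threshold
  now: number = Date.now(), // injectable clock, to make expiry testable
): RelatedResult {
  const cacheKey = docId + "|" + minConfidence;
  const cached = relatedCache.get(cacheKey);
  if (cached && cached.expiresAt > now) return cached.value;

  // Filter by confidence first, then split by direction.
  const confident = allEdges.filter((e) => e.confidence >= minConfidence);
  const value: RelatedResult = {
    outgoing: confident.filter((e) => e.source_doc_id === docId),
    incoming: confident.filter((e) => e.target_doc_id === docId),
  };
  relatedCache.set(cacheKey, { expiresAt: now + CACHE_TTL_MS, value });
  return value;
}
```

Keying the cache on both the document ID and the threshold means that changing the confidence setting does not serve stale results computed for a different cutoff.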

Why Zero Latency Matters

Most graph databases are built for analytical queries. You ask "show me all documents that reference contracts from Q3 2024" and wait for the query to scan millions of edges.

AiFiler's graph is built for immediate retrieval. When you click a document, you want to see related files instantly. We optimize for that specific access pattern.

We do this through:

Aggressive indexing on (source_doc_id, edge_type) and (target_doc_id, edge_type)
Materialized views for common queries (all relationships for a document, all high-confidence edges)
Query result caching at the application layer
Confidence-based filtering to reduce result sets before rendering

The trade-off: analytical queries are slower. If you want "show me all CONTRADICTS edges in my workspace," that's a full table scan, because both indexes lead with a document ID rather than the edge type. But that's not a common operation. The common operation is "I'm reading this document, what else should I know?"
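An in-memory analogy shows why the two access patterns differ. The class below imitates a lookup keyed on (source_doc_id, edge_type) with a hash map. It is only an analogy for a database index, and `IndexedEdge` and `SourceTypeIndex` are hypothetical names.

```typescript
interface IndexedEdge {
  source_doc_id: string;
  target_doc_id: string;
  edge_type: string;
}

// Imitates a composite key of (source_doc_id, edge_type).
class SourceTypeIndex {
  private buckets = new Map<string, IndexedEdge[]>();
  private all: IndexedEdge[] = [];

  add(edge: IndexedEdge): void {
    const key = edge.source_doc_id + "|" + edge.edge_type;
    const bucket = this.buckets.get(key);
    if (bucket) {
      bucket.push(edge);
    } else {
      this.buckets.set(key, [edge]);
    }
    this.all.push(edge);
  }

  // Fast path: a single keyed lookup, independent of total graph size.
  outgoing(docId: string, edgeType: string): IndexedEdge[] {
    return this.buckets.get(docId + "|" + edgeType) || [];
  }

  // Slow path: with no document ID to start from, every edge is inspected.
  allOfType(edgeType: string): IndexedEdge[] {
    return this.all.filter((e) => e.edge_type === edgeType);
  }
}
```

The per-document call touches one bucket no matter how large the workspace grows, while the workspace-wide call scales with the total number of edges.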

Integration with Universal Command

The knowledge graph powers the Universal Command (Ctrl+Shift+A). When you invoke it, the command router in lib/intelligence/universalRouter.ts uses graph relationships to provide context-aware suggestions.

If you're viewing a contract, Universal Command suggests:

Related SOWs and pricing agreements (REFERENCES, RELATED_TOPIC)
Email threads mentioning this contract (PARTICIPANT_OVERLAP)
Earlier versions or superseded agreements (TEMPORAL_SEQUENCE)
Documents that contradict terms (CONTRADICTS)

This context comes from the graph. No full-text search. No keyword matching. Pure relationship navigation.
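As a rough illustration of relationship-driven suggestions, the sketch below groups a document's edges into labelled suggestion lists. The labels paraphrase the contract example above. The mapping, the ordering by confidence, and the names `RelatedEdge`, `SUGGESTION_LABELS`, and `buildSuggestions` are assumptions, not the behavior of lib/intelligence/universalRouter.ts.

```typescript
interface RelatedEdge {
  otherDocId: string;
  edgeType: string;
  confidence: number;
}

interface SuggestionGroup {
  label: string;
  docIds: string[];
}

// Hypothetical mapping from edge type to a suggestion label.
const SUGGESTION_LABELS: { [edgeType: string]: string } = {
  REFERENCES: "Related SOWs and pricing agreements",
  RELATED_TOPIC: "Related SOWs and pricing agreements",
  PARTICIPANT_OVERLAP: "Email threads mentioning this contract",
  TEMPORAL_SEQUENCE: "Earlier versions or superseded agreements",
  CONTRADICTS: "Documents that contradict terms",
};

function buildSuggestions(edges: RelatedEdge[]): SuggestionGroup[] {
  const groups: SuggestionGroup[] = [];
  // Highest-confidence edges first, so the strongest links lead each group.
  const sorted = edges.slice().sort((a, b) => b.confidence - a.confidence);
  for (const edge of sorted) {
    const label = SUGGESTION_LABELS[edge.edgeType];
    if (!label) continue; // edge type not surfaced in this sketch
    let group = groups.find((g) => g.label === label);
    if (!group) {
      group = { label, docIds: [] };
      groups.push(group);
    }
    // Avoid listing the same document twice within a group.
    if (group.docIds.indexOf(edge.otherDocId) === -1) {
      group.docIds.push(edge.otherDocId);
    }
  }
  return groups;
}
```

Nothing in this function inspects document text. The suggestions are built entirely from edges that already exist.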

Real-World Example: The Contract Audit

Let's trace through how this works in practice. You're auditing a contract from 2023. You open it in AiFiler.

The system queries the graph and finds:

3 outgoing REFERENCES edges (this contract mentions 3 other documents)
2 incoming REFERENCES edges (2 newer documents reference this one)
1 incoming DERIVED_FROM edge (a summary document was created from this contract)
4 PARTICIPANT_OVERLAP edges (4 other documents involve the same vendor)

All of this renders in the knowledge panel without a single additional click. You can see the chain of related documents. You click one, the graph updates instantly, and you see what's connected to that document.
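A tally like the one in this example could be produced by counting the open document's edges by type and direction. The sketch assumes that "referenced by" means a REFERENCES edge whose target is the open document. `AuditEdge` and `summarizeEdges` are hypothetical names.

```typescript
interface AuditEdge {
  source_doc_id: string;
  target_doc_id: string;
  edge_type: string;
}

// Count the edges touching docId, keyed by "TYPE (direction)".
function summarizeEdges(docId: string, edges: AuditEdge[]): { [key: string]: number } {
  const counts: { [key: string]: number } = {};
  for (const edge of edges) {
    let direction: string;
    if (edge.source_doc_id === docId) {
      direction = "outgoing";
    } else if (edge.target_doc_id === docId) {
      direction = "incoming";
    } else {
      continue; // edge does not touch the open document
    }
    const key = edge.edge_type + " (" + direction + ")";
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```

Clicking through to a related document amounts to calling the same function with a different `docId`.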

This is how you find that one clause that matters. Not by searching for keywords. By following relationships.

What This Means For You

If you've used traditional document management, you know the frustration: you find one document, but you don't know what else is connected to it. You end up doing multiple searches, opening multiple tabs, building context manually.

The knowledge graph eliminates that. Your documents organize themselves into a network. The relationships are extracted automatically. Navigation is instant.

For power users, this means you can work faster. For teams, it means knowledge doesn't get lost when someone leaves—the relationships are explicit and discoverable.

The architecture is designed to scale. We've tested it with workspaces containing 50,000+ documents. Query times remain sub-500ms. The graph grows, but the performance doesn't degrade because we're querying by document ID, not scanning the entire graph.

This is why AiFiler feels different from other document tools. It's not just better search. It's a fundamentally different way of thinking about how documents relate to each other.

How AiFiler's Knowledge Graph Actually Works: 8 Edge Types, Zero Latency
