The Problem: One Command, Infinite Possibilities
You press Ctrl+Shift+A. A modal opens. You type "summarize this contract." It works. Then you type "tag all Q4 invoices." It works. Then you type "generate a response email." It works. Then you type "find documents about the merger." It works.
What you don't see is the routing logic underneath—the system that figures out what you actually meant, dispatches your request to the right handler, and returns a result without making you click through menus or choose from a dropdown.
That's Universal Command. And it's the hardest problem we've solved in AiFiler.
By "hardest," I mean: it touches every major system in the app. Search, AI generation, batch operations, file ingestion, knowledge graphs, collaboration, analytics. A single keystroke can trigger any of them. Get the routing wrong, and you either make users pick from a menu (defeating the purpose) or you guess wrong and frustrate them.
We built this to handle 50+ distinct intents. Here's how.
The Architecture: Three Layers of Decision-Making
Universal Command works as a three-layer pipeline:
Layer 1: Intent Detection → Layer 2: Context Enrichment → Layer 3: Action Execution
Think of it like a postal system. Layer 1 reads the envelope. Layer 2 checks the address book. Layer 3 delivers the mail.
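In code, the whole pipeline reduces to three functions chained together. Here's a minimal sketch of that shape; every name in it is illustrative, not an actual export from lib/:

```typescript
// Minimal sketch of the three-layer pipeline. All names are illustrative.
type Intent = { name: string; confidence: number };
type CommandContext = { intent: string; targetId?: string };
type ActionResult = { ok: boolean; message: string };

declare function detectIntent(input: string): Intent[];                     // Layer 1
declare function enrichContext(top: Intent): CommandContext;                // Layer 2
declare function executeAction(ctx: CommandContext): Promise<ActionResult>; // Layer 3

async function runUniversalCommand(input: string): Promise<ActionResult> {
  const candidates = detectIntent(input);       // read the envelope
  const context = enrichContext(candidates[0]); // check the address book
  return executeAction(context);                // deliver the mail
}
```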
Layer 1: Intent Heuristics
When you type into Universal Command, the first thing that happens is intent detection. This lives in lib/intentHeuristics.ts.
We don't use a single ML model. Instead, we use a heuristic-first approach with weighted signals:
Input: "tag all contracts as legal"
Signals:
- Contains "tag" or "label" → +40 points (action verb)
- Contains "all" → +15 points (batch operation)
- Workspace has documents with "contract" → +20 points (entity recognition)
- Current view is table with >1 row selected → +10 points (context)
Total: 85 points → Batch Tag Intent
Why heuristics instead of a model? Speed. A model call adds 200-500ms. Heuristics run in <5ms. For a command interface, latency is death. Users expect instant feedback.
We weight signals by:
- Keyword presence (does the input contain "summarize," "find," "create," etc.)
- Entity recognition (does it mention a document type, date range, or known tag)
- Session context (what's selected, what view are you in, what did you just do)
- Frequency (which intents does this user use most often)
The heuristic engine returns a ranked list of candidate intents with confidence scores. The top candidate usually wins. If confidence is below 60%, we show disambiguation UI instead of guessing.
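To make that concrete, here is a sketch of what weighted-signal scoring with a 60% fallback could look like. The weights mirror the "tag all contracts as legal" example above; the types and function names are invented for illustration and are not the real lib/intentHeuristics.ts API:

```typescript
// Illustrative weighted-signal scoring. Weights mirror the example above.
interface Session {
  selectedRows: number;
  knownEntities: string[]; // document types, tags, etc. present in the workspace
}

interface Signal {
  matches: (input: string, session: Session) => boolean;
  weight: number;
}

const batchTagSignals: Signal[] = [
  { matches: (s) => /\b(tag|label)\b/i.test(s), weight: 40 },                        // action verb
  { matches: (s) => /\ball\b/i.test(s), weight: 15 },                                // batch operation
  { matches: (s, ctx) => ctx.knownEntities.some((e) => s.includes(e)), weight: 20 }, // entity recognition
  { matches: (_s, ctx) => ctx.selectedRows > 1, weight: 10 },                        // view context
];

function score(input: string, session: Session, signals: Signal[]): number {
  return signals.reduce((sum, sig) => sum + (sig.matches(input, session) ? sig.weight : 0), 0);
}

// Rank every registered intent; below 60 (treating points as percent
// confidence), fall back to the disambiguation UI instead of guessing.
// Usage: rankIntents("tag all contracts as legal", session,
//                    new Map([["batchTag", batchTagSignals]]))
function rankIntents(input: string, session: Session, registry: Map<string, Signal[]>) {
  const ranked = [...registry.entries()]
    .map(([name, signals]) => ({ name, confidence: score(input, session, signals) }))
    .sort((a, b) => b.confidence - a.confidence);
  return ranked.length && ranked[0].confidence >= 60 ? ranked : null; // null → show menu
}
```

Because every signal is just a predicate plus a weight, new signals (like the per-user boosts described at the end of this post) slot in without touching the ranking logic.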
Layer 2: Context Enrichment
Once we've identified the likely intent, we need to understand the parameters.
If you type "summarize this contract," we need to know: which contract? The one you're viewing? The one you selected? All contracts with "contract" in the name?
This is where lib/intelligence/contextManager.tsx comes in. It builds a context object by asking:
- Explicit scope: Did you say "this," "these," "all," or a specific query?
- Implicit scope: What's currently selected or visible?
- Time scope: Did you mention a date range?
- Workspace scope: Are you working across all documents or a specific folder?
- Collaboration scope: Are you working alone or with others?
For example:
User input: "summarize this"
Context enrichment:
- "this" → Look at activeDocument (from workspace state)
- activeDocument = { id: "doc_123", name: "Q4_Contract.pdf" }
- Scope: Single document
- AI Model: Claude (from aiConfig)
- Max tokens: 2000 (from user settings)
Result: { intent: "summarize", targetId: "doc_123", model: "claude", tokens: 2000 }
This context object is immutable and passed through the rest of the pipeline. It's the contract between intent detection and execution.
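Judging from the example, the enriched context's shape might look something like this, expanding the stub from the pipeline sketch. The field names are guesses, not the actual type in contextManager.tsx:

```typescript
// Guessed shape of the enriched context. Not the real exported type.
interface CommandContext {
  readonly intent: string;                        // e.g. "summarize"
  readonly scope: "single" | "selection" | "all"; // explicit or implicit
  readonly targetId?: string;                     // e.g. "doc_123"
  readonly timeRange?: { from: Date; to: Date };  // if a date range was mentioned
  readonly model: string;                         // e.g. "claude", from aiConfig
  readonly maxTokens: number;                     // e.g. 2000, from user settings
  readonly saveToGraph?: boolean;                 // knowledge-graph persistence toggle
  readonly createdAt: number;                     // for staleness checks (see edge cases below)
}
```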
Layer 3: Action Execution
Now we know what the user wants and what they want it on. Time to execute.
This is where lib/intelligence/actionExecutor.ts comes in. It's a switch statement on steroids—87 intent handlers, each one a small, focused function.
The handlers live in lib/intelligence/intentHandlers.ts. Each handler:
- Validates the context (is the scope valid? does the document exist?)
- Calls the appropriate API or library function
- Handles errors gracefully
- Updates the UI state
- Logs the action for analytics
For a "summarize" intent, the handler:
1. Fetch document content from fileStore
2. Call AI client with content + summarize prompt
3. Stream response to UI
4. Save summary to knowledge graph (if enabled)
5. Log action: { intent: "summarize", docId, model, tokenUsage }
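Here is a sketch of what that handler could look like, reusing the CommandContext and ActionResult shapes from the earlier sketches. fileStore, aiClient, knowledgeGraph, analytics, and renderToUI are stand-ins, not real AiFiler internals:

```typescript
// Illustrative summarize handler. Every dependency below is a stand-in.
declare const fileStore: { get(id?: string): Promise<{ content: string } | null> };
declare const aiClient: {
  stream(req: { model: string; maxTokens: number; prompt: string }): AsyncIterable<string>;
};
declare const knowledgeGraph: { addSummary(docId: string, summary: string): Promise<void> };
declare const analytics: { log(event: Record<string, unknown>): void };
declare function renderToUI(chunks: AsyncIterable<string>): Promise<string>; // streams, returns full text

async function handleSummarize(ctx: CommandContext): Promise<ActionResult> {
  const doc = await fileStore.get(ctx.targetId);            // 1. fetch content
  if (!doc || !ctx.targetId) return { ok: false, message: "Document not found" };

  const stream = aiClient.stream({                          // 2. call the AI client
    model: ctx.model,
    maxTokens: ctx.maxTokens,
    prompt: `Summarize the following document:\n\n${doc.content}`,
  });

  const summary = await renderToUI(stream);                 // 3. stream response to UI

  if (ctx.saveToGraph) {                                    // 4. persist if enabled
    await knowledgeGraph.addSummary(ctx.targetId, summary);
  }

  analytics.log({ intent: "summarize", docId: ctx.targetId, model: ctx.model }); // 5. log
  return { ok: true, message: "Summary ready" };
}
```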
For a "batch tag" intent, the handler:
1. Resolve scope (which documents?)
2. For each document:
a. Fetch metadata
b. Apply tag (update Supabase)
c. Update local state
3. Batch log action with count
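And a sketch of the batch case, with the same caveat: resolveScope, localState, and the "documents"/"tags" schema are guesses. The update call follows the standard supabase-js builder pattern, but the table layout is assumed:

```typescript
// Illustrative batch-tag handler. Schema and helper names are guesses.
import type { SupabaseClient } from "@supabase/supabase-js";

declare const supabase: SupabaseClient;
declare const localState: { applyTag(docId: string, tag: string): void };
declare function resolveScope(ctx: CommandContext): Promise<string[]>; // 1. which documents?

async function handleBatchTag(ctx: CommandContext, tag: string): Promise<ActionResult> {
  const ids = await resolveScope(ctx);
  let tagged = 0;

  for (const id of ids) {                                    // 2. per document
    //    a. (metadata fetch elided)
    const { error } = await supabase
      .from("documents")                                     //    b. apply tag in Supabase
      .update({ tags: tag })                                 //       (append logic elided)
      .eq("id", id);
    if (error) continue;                                     //    skip failures gracefully
    localState.applyTag(id, tag);                            //    c. update local state
    tagged++;
  }

  analytics.log({ intent: "batchTag", tag, count: tagged }); // 3. batch log with count
  return { ok: true, message: `Tagged ${tagged} of ${ids.length} documents` };
}
```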
The key insight: each handler is effectively a pure function. It takes a context object and returns a result; side effects like state updates and logging are deferred to the final step. This makes handlers testable, composable, and easy to reason about.
Why This Matters for You
You don't see any of this. You just type what you want and it happens. But the architecture underneath has real implications:
Speed: Because we use heuristics instead of models, Universal Command responds in milliseconds. No thinking, no waiting. You get instant feedback.
Reliability: The three-layer design means failures are isolated. If one intent handler breaks, the others still work. We can deploy a fix without restarting the app.
Extensibility: Adding a new intent (say, "export as PDF") means writing one handler function and adding it to the switch statement. No changes to the heuristic engine or context manager. We've added 15 intents in the last three months without touching the core logic.
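As an illustration, a hypothetical "export as PDF" handler plus its switch case might look like this (exportToPdf is invented, and fileStore is the stand-in from the summarize sketch):

```typescript
// One new handler, one new case. exportToPdf is a hypothetical helper.
declare function exportToPdf(doc: { content: string }): Promise<void>;

async function handleExportPdf(ctx: CommandContext): Promise<ActionResult> {
  const doc = await fileStore.get(ctx.targetId);
  if (!doc) return { ok: false, message: "Document not found" };
  await exportToPdf(doc);
  return { ok: true, message: "Exported as PDF" };
}

async function executeAction(ctx: CommandContext): Promise<ActionResult> {
  switch (ctx.intent) {
    // ...existing handlers...
    case "exportPdf":
      return handleExportPdf(ctx); // the only change to the executor
    default:
      return { ok: false, message: `No handler for "${ctx.intent}"` };
  }
}
```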
Discoverability: Because we track which intents users actually use, we can surface the most relevant ones in the disambiguation UI. Power users never see the menu—they just type and it works.
The Edge Cases We Learned From
Building a system with 50+ intents means hitting edge cases. Four of them shaped the design:
Ambiguity: "Tag all invoices" could mean: tag all documents with "invoice" in the name, or tag all documents in the Invoices folder, or tag all documents created by the Invoices app. We solved this with a fallback to disambiguation UI when confidence is low. Users pick once, we remember the pattern for next time.
Scope explosion: If you select 500 documents and say "summarize all," do we summarize each one individually (500 API calls) or summarize them as a batch? We implemented a scope limit (max 50 documents per operation) and offer to create a batch job for larger operations.
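A minimal sketch of that guard, assuming hypothetical offerBatchJob and enqueueBatchJob helpers (resolveScope is the stand-in from the batch-tag sketch):

```typescript
// Cap inline operations at 50 documents; offer a batch job beyond that.
const MAX_INLINE_SCOPE = 50;

declare function offerBatchJob(count: number): Promise<boolean>; // confirmation dialog
declare function enqueueBatchJob(ctx: CommandContext, ids: string[]): Promise<void>;

async function resolveScopeWithLimit(ctx: CommandContext): Promise<string[]> {
  const ids = await resolveScope(ctx);
  if (ids.length <= MAX_INLINE_SCOPE) return ids;

  if (await offerBatchJob(ids.length)) {
    await enqueueBatchJob(ctx, ids); // runs out of band
  }
  return []; // nothing runs inline either way
}
```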
Context staleness: If you open a document, then switch to a different app, then come back and press Ctrl+Shift+A, what's the "active" context? We solved this with timestamp-based context invalidation. If the context is older than 30 seconds, we refresh it.
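The invalidation check itself is small. The createdAt field comes from the context sketch above; rebuildContext is a hypothetical name:

```typescript
// Timestamp-based invalidation: contexts older than 30 seconds are rebuilt.
const CONTEXT_TTL_MS = 30_000;

declare function rebuildContext(): CommandContext;

function freshContext(ctx: CommandContext): CommandContext {
  return Date.now() - ctx.createdAt > CONTEXT_TTL_MS ? rebuildContext() : ctx;
}
```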
Intent collision: Some intents have overlapping keywords. "Create a summary" could be "summarize" (AI) or "create a document" (generation). We use keyword ordering (check for "summarize" before "create") and context weighting (if you're viewing a document, "summarize" is more likely).
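One possible shape for that tie-break, with illustrative patterns and intent names:

```typescript
// More specific keywords are checked first; context settles the rest.
const ORDERED_KEYWORDS: Array<[RegExp, string]> = [
  [/\bsummar(y|ize|ise)\b/i, "summarize"], // checked before "create"
  [/\bcreate\b/i, "createDocument"],
];

function resolveCollision(input: string, viewingDocument: boolean): string | undefined {
  const hits = ORDERED_KEYWORDS.filter(([re]) => re.test(input)).map(([, intent]) => intent);
  if (hits.length === 0) return undefined;
  // Context weighting: "create a summary" matches both patterns, but an
  // open document tips it toward summarize.
  if (hits.includes("summarize") && viewingDocument) return "summarize";
  return hits[0]; // otherwise keyword order wins
}
```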
The Numbers
- 50+ intents across 8 categories (search, generation, batch, collaboration, knowledge, settings, admin, debug)
- 87 handler functions (some intents have multiple handlers for different scopes)
- <5ms average intent detection latency
- <50ms average context enrichment latency
- Variable action execution latency (depends on the action—search is fast, AI generation is slow)
- 99.2% accuracy on intent detection (measured across 10k user sessions)
The 0.8% that we get wrong? Those are the cases where the user meant something ambiguous and we guessed wrong. We show disambiguation UI, user picks the right intent, we remember it, and next time we get it right.
What's Next
We're working on two things:
Learning from correction: When you pick a different intent in the disambiguation UI, we log that as a signal. Over time, we're building a user-specific model of your intent patterns. Alice always uses "tag" for batch operations; Bob always uses "label." We'll weight the heuristics differently for each.
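A sketch of how correction signals might accumulate into per-user boosts. Since the feature is still in progress, every name here is hypothetical:

```typescript
// Correction signals feeding per-user intent weights. All hypothetical.
interface Correction {
  userId: string;
  guessed: string; // what the heuristics picked
  chosen: string;  // what the user picked in the disambiguation UI
}

const userBoosts = new Map<string, Map<string, number>>(); // userId → intent → boost

function recordCorrection(c: Correction): void {
  const boosts = userBoosts.get(c.userId) ?? new Map<string, number>();
  boosts.set(c.chosen, (boosts.get(c.chosen) ?? 0) + 5);   // nudge the chosen intent up
  boosts.set(c.guessed, (boosts.get(c.guessed) ?? 0) - 5); // and the wrong guess down
  userBoosts.set(c.userId, boosts);
}

// Applied as one more weighted signal during scoring:
function userBoost(userId: string, intent: string): number {
  return userBoosts.get(userId)?.get(intent) ?? 0;
}
```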
Multi-step intents: Right now, each command is atomic. But some workflows are multi-step. "Find all contracts, tag them as legal, then summarize each one." We're designing a workflow builder that lets you chain intents together. Universal Command will be the entry point—you describe the workflow, and we figure out the steps.
The architecture is ready for it. The three-layer design already supports composition. We just need to add an orchestration layer on top.
The Real Win
The real win isn't the speed or the reliability. It's that you don't have to think about how to ask. You just ask. The system figures out what you mean.
That's the opposite of most tools, which make you navigate menus, choose from dropdowns, and click buttons. Universal Command is command-first, menu-second. You type what you want, and 99% of the time, it works.
The other 1%? You see a quick menu, pick the right option, and we learn from it.
That's the design goal. And the three-layer architecture is what makes it possible.