UBI Router Architecture
Benchmarked against: Anthropic, Streaming Messages (content routing)
Spec: MTAAA v1.4 §2.1.6
Pattern: LangChain Router Pattern (rule-based conditional routing)
UBI (Universal Brain Intake) is the single unified intake point for all content entering the SuperPortia ecosystem. It detects the content type and routes each item to the appropriate handler, with no duplication and no confusion about where to send things.
Why a unified intake?
Before UBI, content arrived through multiple paths (manual ingest, API, email, patrol agents) with no consistent routing. Each path had its own logic, leading to:
- Code files accidentally stored in UB as text
- Image metadata lost because photos went straight to filesystem
- No audit trail for what was ingested when, by whom
UBI solves this with one rule: everything enters through one door.
Router design
The Router is deterministic and needs no LLM. Content type detection (is this a .py file? is this image/png?) is rule-based, and the rules handle 80%+ of cases. The remaining ambiguous cases (plain text that could be a spec, an article, or an email) default to the text pipeline for LLM-based classification.
Priority order:
- MIME type: most reliable (from HTTP header or magic bytes)
- File extension: fallback when MIME is generic (application/octet-stream)
- Default to text: safest default; the text pipeline can flag unknowns
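The priority order above can be sketched as a small pure function. This is a minimal sketch rather than the production Router: the function name `route`, the handler labels (`"image"`, `"av"`, `"code"`, `"data"`, `"text"`), and the abbreviated extension sets are assumptions made for illustration. The full sets are listed under Routing rules.

```python
from typing import NamedTuple

# Abbreviated, illustrative subsets of the extension sets listed under "Routing rules".
CODE_EXTENSIONS = {".py", ".json", ".pine", ".sh", ".yaml", ".svg"}
DATA_EXTENSIONS = {".csv", ".parquet", ".xlsx", ".jsonl"}

# Image MIME types named in this document.
IMAGE_MIMES = {
    "image/png", "image/jpeg", "image/gif", "image/webp",
    "image/svg+xml", "image/tiff", "image/heic",
}


class Route(NamedTuple):
    handler: str         # hypothetical handler label
    routing_method: str  # "mime", "extension", or "default"


def route(detected_mime: str, file_extension: str) -> Route:
    """Pick a handler: media MIME types first, then extension, then text."""
    mime = detected_mime.lower()
    ext = file_extension.lower()

    # 1. MIME type: decisive for images and audio/video.
    if mime in IMAGE_MIMES:
        return Route("image", "mime")
    if mime.startswith(("audio/", "video/")):
        return Route("av", "mime")

    # 2. File extension: used when the MIME type is generic or text-like.
    if ext in CODE_EXTENSIONS:
        return Route("code", "extension")
    if ext in DATA_EXTENSIONS:
        return Route("data", "extension")

    # 3. Default to text, where unknowns can be flagged downstream.
    return Route("text", "default")
```

Under this ordering, a file served as `image/svg+xml` goes to the image handler even though `.svg` also appears in `CODE_EXTENSIONS`; whether the real Router resolves that overlap the same way is not stated here.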
Routing rules
Code detection
CODE_EXTENSIONS = {
    # Languages
    ".py", ".js", ".ts", ".jsx", ".tsx", ".go", ".rs",
    ".java", ".c", ".cpp", ".swift", ".kt", ".rb", ".php",
    # Shell scripts
    ".sh", ".bash", ".zsh", ".bat", ".ps1",
    # Trading scripts
    ".pine", ".mq4", ".mq5",
    # Config (executable/deployable, not knowledge)
    ".yaml", ".yml", ".toml", ".json", ".xml", ".ini",
    ".env", ".dockerfile", ".tf", ".hcl",
    # Web markup
    ".html", ".css", ".scss", ".svg",
    # Build
    ".sql", ".makefile", ".cmake", ".gradle",
}
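One wrinkle with extension matching: entries such as `.env`, `.dockerfile`, and `.makefile` usually correspond to files named `.env`, `Dockerfile`, or `Makefile`, and `pathlib` reports an empty suffix for all three. The sketch below shows one possible normalization so that such names still match the set. The helper name `extension_key` and the rule of treating a bare file name as its own extension are assumptions; this page does not say how the Router handles these names.

```python
from pathlib import PurePosixPath


def extension_key(file_name: str) -> str:
    """Return a lowercase lookup key for CODE_EXTENSIONS-style sets."""
    name = PurePosixPath(file_name).name.lower()
    suffix = PurePosixPath(name).suffix
    if suffix:
        return suffix            # "strategy.PINE" -> ".pine"
    if name.startswith("."):
        return name              # ".env" -> ".env" (pathlib sees no suffix here)
    # Assumption: a bare name such as "Dockerfile" is looked up as ".dockerfile".
    return "." + name if name else ""
```

A name like `README` becomes `.readme`, which is in neither set, so it still falls through to the default route.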
Data detection
DATA_EXTENSIONS = {
    ".csv", ".tsv", ".parquet", ".feather", ".arrow",
    ".xlsx", ".xls", ".ods",
    ".db", ".sqlite", ".duckdb",
    ".ndjson", ".jsonl",
    ".sav", ".dta", ".hdf5", ".h5", ".nc",
}
Image and AV detection
Images route by MIME: image/png, image/jpeg, image/gif, image/webp, image/svg+xml, image/tiff, image/heic.
Audio/video routes by MIME prefix: audio/*, video/*.
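Because MIME routing depends on a trustworthy type, and the priority order mentions magic bytes as one source, here is a minimal sketch of signature sniffing for a few of the formats above. The byte signatures are the standard file headers for these formats; the function name `sniff_mime`, the choice to let a recognized signature override the declared type, and the `application/octet-stream` fallback are assumptions.

```python
# Leading byte signatures ("magic bytes") for a few common formats.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime(raw_content: bytes, declared_mime: str = "") -> str:
    """Prefer a recognized signature, then the declared type, then a generic type."""
    for signature, mime in _SIGNATURES:
        if raw_content.startswith(signature):
            return mime
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if raw_content[:4] == b"RIFF" and raw_content[8:12] == b"WEBP":
        return "image/webp"
    return declared_mime or "application/octet-stream"
```

Content that matches no signature keeps its declared type, so a plain-text upload declared as `text/plain` is left alone.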
Edge cases
| Input | MIME | Extension | Routed to | Why |
|---|---|---|---|---|
| Python script | text/x-python | .py | Code Handler | Extension match |
| JPEG photo | image/jpeg | .jpg | Image Handler | MIME match |
| CSV data | text/csv | .csv | Data Handler | Extension overrides text MIME |
| Markdown doc | text/markdown | .md | Text pipeline | .md = knowledge, not code |
| JSON config | application/json | .json | Code Handler | Extension in CODE_EXTENSIONS |
| JSON API payload | application/json | (none) | Text pipeline | No extension → default to text |
| PDF document | application/pdf | .pdf | Text pipeline | PDF = likely document |
| PineScript | text/plain | .pine | Code Handler | Extension match |
| Unknown binary | application/octet-stream | .bin | Text pipeline | Default; the text pipeline flags unknowns |
Multipart handling
When UBI receives multipart content (e.g., an email with text body + image attachment + PDF), it:
- Splits the content into individual pieces
- Gives each piece its own intake_id (with a shared parent_intake_id for linking)
- Routes each piece independently
In the email example, all three pieces share parent_intake_id = "intake-2026-0304-001" for audit linking.
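The split step can be sketched with Python's standard `email` package. This is an illustration under assumptions, not the UBI implementation: the `Piece` dataclass, the function name `split_multipart`, and the child ID scheme (parent ID plus a part counter) are hypothetical, since this page does not specify how child `intake_id` values are generated.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import List, Optional


@dataclass
class Piece:
    intake_id: str
    parent_intake_id: str
    detected_mime: str
    file_name: Optional[str]
    raw_content: bytes


def split_multipart(message: EmailMessage, parent_intake_id: str) -> List[Piece]:
    """Flatten a multipart message into independently routable pieces."""
    pieces: List[Piece] = []
    for part in message.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        pieces.append(
            Piece(
                # Hypothetical child ID scheme: parent ID plus a part counter.
                intake_id=f"{parent_intake_id}-part{len(pieces) + 1}",
                parent_intake_id=parent_intake_id,
                detected_mime=part.get_content_type(),
                file_name=part.get_filename(),
                raw_content=part.get_payload(decode=True) or b"",
            )
        )
    return pieces


# Example: an email with a text body, an image, and a PDF.
msg = EmailMessage()
msg.set_content("Quarterly numbers attached.")
msg.add_attachment(b"\x89PNG\r\n\x1a\n", maintype="image", subtype="png",
                   filename="chart.png")
msg.add_attachment(b"%PDF-1.7\n", maintype="application", subtype="pdf",
                   filename="report.pdf")
pieces = split_multipart(msg, "intake-2026-0304-001")
```

Each resulting piece carries its own MIME type and file name, so it can go through the same routing rules as a standalone upload.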
Intake state schema
Every item passing through UBI carries this state:
from typing import TypedDict

class IntakeState(TypedDict):
    intake_id: str         # "intake-2026-0304-001"
    file_name: str         # "report.pdf"
    file_extension: str    # ".pdf"
    detected_mime: str     # "application/pdf"
    raw_content: bytes     # File content
    input_channel: str     # "manual" | "api" | "email_forward" | ...
    caller_agent: str      # Submitting agent (e.g. a Mac CLI agent)
    caller_ship: str       # "SS1"
    parent_intake_id: str  # For multipart splits
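As an illustration of how such a state might be populated, the sketch below derives `file_extension` and `detected_mime` from the file name using the standard library. Several things here are assumptions: the helper names, the `intake-YYYY-MMDD-NNN` ID format (inferred only from the example value "intake-2026-0304-001"), the use of an empty `parent_intake_id` for items that are not multipart splits, and the sample caller values.

```python
import mimetypes
from datetime import date
from pathlib import PurePosixPath
from typing import TypedDict


class IntakeState(TypedDict):
    intake_id: str
    file_name: str
    file_extension: str
    detected_mime: str
    raw_content: bytes
    input_channel: str
    caller_agent: str
    caller_ship: str
    parent_intake_id: str


def format_intake_id(day: date, sequence: int) -> str:
    """Assumed format: intake-YYYY-MMDD-NNN."""
    return f"intake-{day:%Y-%m%d}-{sequence:03d}"


def new_intake_state(intake_id: str, file_name: str, raw_content: bytes,
                     input_channel: str, caller_agent: str, caller_ship: str,
                     parent_intake_id: str = "") -> IntakeState:
    """Build a state, guessing the MIME type from the file name only."""
    guessed_mime, _encoding = mimetypes.guess_type(file_name)
    return IntakeState(
        intake_id=intake_id,
        file_name=file_name,
        file_extension=PurePosixPath(file_name).suffix.lower(),
        detected_mime=guessed_mime or "application/octet-stream",
        raw_content=raw_content,
        input_channel=input_channel,
        caller_agent=caller_agent,
        caller_ship=caller_ship,
        parent_intake_id=parent_intake_id,  # assumption: empty when not a split
    )


state = new_intake_state(
    intake_id=format_intake_id(date(2026, 3, 4), 1),
    file_name="report.pdf",
    raw_content=b"%PDF-1.7\n",
    input_channel="manual",
    caller_agent="example-agent",  # hypothetical value
    caller_ship="SS1",
)
```

Guessing from the file name alone is the weakest form of MIME detection; an HTTP header or magic bytes, where available, would take precedence under the priority order.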
Audit trail
Every intake is logged to the intake_log table in D1, regardless of whether it becomes a UB entry:
| Column | Purpose |
|---|---|
| intake_id | Unique intake identifier |
| routed_to | Which handler processed it |
| routing_method | How routing was decided: mime, extension, or default |
| input_channel | How content arrived |
| caller_agent | Who submitted it |
| parent_intake_id | Links multipart splits |
| result_entry_id | UB entry ID (if text) |
| result_external_ref | External ref (git SHA, image UID) for non-text |
This enables:
- Audit: What was submitted when? By whom? Where did it go?
- Statistics: How many items per handler? Per channel? Per ship?
- Debugging: Wrong routing? Check routing_method and detected_mime
- Cross-system linking: Trace from UB entry back to original intake
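To make the audit and statistics queries concrete, here is a sketch of the table using Python's built-in `sqlite3`, on the assumption that D1 here means Cloudflare D1, which uses SQLite's SQL dialect. The column types, constraints, and sample rows are illustrative guesses; only the column names come from the table above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE intake_log (
    intake_id           TEXT PRIMARY KEY,
    routed_to           TEXT NOT NULL,
    routing_method      TEXT NOT NULL
                        CHECK (routing_method IN ('mime', 'extension', 'default')),
    input_channel       TEXT NOT NULL,
    caller_agent        TEXT NOT NULL,
    parent_intake_id    TEXT,
    result_entry_id     TEXT,
    result_external_ref TEXT
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Hypothetical rows: one email split into a text body, an image, and a PDF.
parent = "intake-2026-0304-001"
conn.executemany(
    "INSERT INTO intake_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("intake-a", "text", "default", "email_forward", "agent-1", parent, "ub-101", None),
        ("intake-b", "image", "mime", "email_forward", "agent-1", parent, None, "img-7"),
        ("intake-c", "text", "default", "email_forward", "agent-1", parent, "ub-102", None),
    ],
)


def items_per_handler(db: sqlite3.Connection) -> dict:
    """Statistics: how many items each handler processed."""
    rows = db.execute(
        "SELECT routed_to, COUNT(*) FROM intake_log GROUP BY routed_to"
    ).fetchall()
    return dict(rows)


def pieces_of(db: sqlite3.Connection, parent_intake_id: str) -> list:
    """Audit: every piece that came from one multipart submission."""
    rows = db.execute(
        "SELECT intake_id FROM intake_log WHERE parent_intake_id = ? ORDER BY intake_id",
        (parent_intake_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

The `CHECK` constraint is one way to keep `routing_method` limited to the three documented values, which makes the debugging query on that column dependable.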
Cross-handler feedback
Non-text handlers can send extracted text back to the text pipeline for UB classification:
| Handler | Extracts | Feedback to the text pipeline |
|---|---|---|
| Image | OCR text from screenshots, charts | Text description + OCR → UB entry with image_uid reference |
| AV | Whisper transcripts from audio/video | Transcript → UB entry with audio_uid reference |
| Code | Docstrings, README content | Design knowledge → UB entry with git_path reference |
This ensures all text knowledge ends up in UB regardless of its original source format.
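A feedback message could be as simple as the extracted text plus the reference field named in the table. The sketch below is only an illustration: the function name `build_feedback`, the handler labels, and the `text` and `source_handler` keys are hypothetical, while `image_uid`, `audio_uid`, and `git_path` come from the table above.

```python
from typing import Dict

# Reference field each non-text handler attaches, per the table above.
REFERENCE_FIELD = {
    "image": "image_uid",
    "av": "audio_uid",
    "code": "git_path",
}


def build_feedback(handler: str, extracted_text: str, reference: str) -> Dict[str, str]:
    """Package extracted text so the resulting UB entry points back to its source."""
    if handler not in REFERENCE_FIELD:
        raise ValueError(f"unknown non-text handler: {handler!r}")
    if not extracted_text.strip():
        raise ValueError("no text was extracted, so there is nothing to feed back")
    return {
        "text": extracted_text,
        "source_handler": handler,
        REFERENCE_FIELD[handler]: reference,
    }
```

Rejecting empty extractions keeps blank OCR or transcript results from producing empty UB entries.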
Why not n8n / Flowise?
Pre-Flight Check was performed (Decision #34). LangGraph's native features cover all routing needs:
| Need | n8n/Flowise | LangGraph |
|---|---|---|
| Conditional routing | Visual nodes | add_conditional_edges |
| State management | Limited | TypedDict with reducers |
| Retry logic | Manual | Built-in RetryPolicy |
| Subgraph composition | Not native | First-class CompiledGraph |
| Python ecosystem | Separate runtime | Native Python |
| Cost | Self-hosted server | Zero infrastructure |
Adding n8n/Flowise would introduce infrastructure overhead with no capability gain.
Related pages
- File Ingestion (MTAAA): the 5-node text pipeline, 3D classification, cost model
- Embeddings: How UB entries are vectorized after classification
- Search Results: How classified entries are retrieved via hybrid search