
UBI Router Architecture

Benchmarked against: Anthropic Streaming Messages (content routing)
Spec: MTAAA v1.4 §2.1.6
Pattern: LangChain Router Pattern (rule-based conditional routing)

UBI (Universal Brain Intake) is the single intake point for all content entering the SuperPortia ecosystem. It detects each item's content type and routes it to the appropriate handler, so nothing is duplicated and there is no confusion about where to send things.


Why a unified intake?

Before UBI, content arrived through multiple paths (manual ingest, API, email, patrol agents) with no consistent routing. Each path had its own logic, leading to:

  • Code files accidentally stored in UB as text
  • Image metadata lost because photos went straight to filesystem
  • No audit trail of what was ingested, when, and by whom

UBI solves this with one rule: everything enters through one door.


Router design

The Router is deterministic: no LLM is needed. Content type detection (is this a .py file? is this image/png?) is rule-based, and rules handle 80%+ of cases. The remaining ambiguous cases (plain text that could be a spec, an article, or an email) default to 文字運爺, the text handler, for LLM-based classification.

Priority order:

  1. MIME type: most reliable (from the HTTP header or magic bytes)
  2. File extension: fallback when the MIME type is generic (application/octet-stream) or too broad to decide on its own, as with text/csv or application/json (see Edge cases)
  3. Default to text: the safest default, since 文字運爺 can flag unknowns
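Step 1 could work along the lines below. This is a sketch, not UBI's code: `resolve_mime`, `sniff_mime`, and `GENERIC_MIMES` are hypothetical names, and the signature list covers only a few formats. A specific declared type (from the HTTP header) is trusted as-is; only a missing or generic one triggers a look at the leading bytes.

```python
from typing import Optional

# Standard leading-byte signatures ("magic numbers") for a few common formats.
MAGIC_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
)

# Assumption: these declared values carry no routing information.
GENERIC_MIMES = {"", "application/octet-stream"}


def sniff_mime(raw: bytes) -> str:
    """Guess a MIME type from leading bytes; stay generic if nothing matches."""
    for signature, mime in MAGIC_SIGNATURES:
        if raw.startswith(signature):
            return mime
    # WebP is a RIFF container: b"RIFF", a 4-byte size, then b"WEBP".
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return "image/webp"
    return "application/octet-stream"


def resolve_mime(header_mime: Optional[str], raw: bytes) -> str:
    """Trust a specific declared MIME type; otherwise sniff the content."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    declared = (header_mime or "").split(";")[0].strip().lower()
    if declared not in GENERIC_MIMES:
        return declared
    return sniff_mime(raw)
```

Checking the declared type first keeps the common path cheap; sniffing only runs for uploads that arrive without a useful Content-Type.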

Routing rules

Code detection

```python
CODE_EXTENSIONS = {
    # Languages
    ".py", ".js", ".ts", ".jsx", ".tsx", ".go", ".rs",
    ".java", ".c", ".cpp", ".swift", ".kt", ".rb", ".php",
    # Shell scripts
    ".sh", ".bash", ".zsh", ".bat", ".ps1",
    # Trading scripts
    ".pine", ".mq4", ".mq5",
    # Config (executable/deployable, not knowledge)
    ".yaml", ".yml", ".toml", ".json", ".xml", ".ini",
    ".env", ".dockerfile", ".tf", ".hcl",
    # Web markup
    ".html", ".css", ".scss", ".svg",
    # Build
    ".sql", ".makefile", ".cmake", ".gradle",
}
```

Data detection

```python
DATA_EXTENSIONS = {
    ".csv", ".tsv", ".parquet", ".feather", ".arrow",
    ".xlsx", ".xls", ".ods",
    ".db", ".sqlite", ".duckdb",
    ".ndjson", ".jsonl",
    ".sav", ".dta", ".hdf5", ".h5", ".nc",
}
```
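A minimal sketch of the extension lookup against both sets, assuming case-insensitive matching on the final suffix. `extension_of` and `classify_extension` are hypothetical helpers. Two special cases are assumptions worth confirming: `pathlib` reports no suffix for dotfiles such as `.env`, and a bare `Dockerfile` or `Makefile` has no suffix at all, so neither would ever match `.env`, `.dockerfile`, or `.makefile` through a plain suffix check.

```python
from pathlib import PurePath
from typing import Optional

CODE_EXTENSIONS = {
    ".py", ".js", ".ts", ".jsx", ".tsx", ".go", ".rs",
    ".java", ".c", ".cpp", ".swift", ".kt", ".rb", ".php",
    ".sh", ".bash", ".zsh", ".bat", ".ps1",
    ".pine", ".mq4", ".mq5",
    ".yaml", ".yml", ".toml", ".json", ".xml", ".ini",
    ".env", ".dockerfile", ".tf", ".hcl",
    ".html", ".css", ".scss", ".svg",
    ".sql", ".makefile", ".cmake", ".gradle",
}
DATA_EXTENSIONS = {
    ".csv", ".tsv", ".parquet", ".feather", ".arrow",
    ".xlsx", ".xls", ".ods",
    ".db", ".sqlite", ".duckdb",
    ".ndjson", ".jsonl",
    ".sav", ".dta", ".hdf5", ".h5", ".nc",
}


def extension_of(file_name: str) -> str:
    """Lower-cased final suffix, with two assumed special cases."""
    name = PurePath(file_name).name.lower()
    suffix = PurePath(name).suffix
    if suffix:
        return suffix  # "Report.CSV" -> ".csv", "a.tar.gz" -> ".gz"
    if name.startswith("."):
        return name  # PurePath(".env").suffix is "", so use the whole dotfile name
    if f".{name}" in CODE_EXTENSIONS:
        return f".{name}"  # bare "Dockerfile" / "Makefile" (assumption)
    return ""


def classify_extension(file_name: str) -> Optional[str]:
    """Return "code", "data", or None when the extension matches neither set."""
    ext = extension_of(file_name)
    if ext in CODE_EXTENSIONS:
        return "code"
    if ext in DATA_EXTENSIONS:
        return "data"
    return None


# Overlapping sets would let check order silently decide the handler.
assert CODE_EXTENSIONS.isdisjoint(DATA_EXTENSIONS)
```

Note how `.json` (code) and `.jsonl`/`.ndjson` (data) stay separate because the match is on the exact suffix, not a prefix.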

Image and AV detection

Images route by MIME: image/png, image/jpeg, image/gif, image/webp, image/svg+xml, image/tiff, image/heic.

Audio/video routes by MIME prefix: audio/*, video/*.
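The two MIME rules differ in shape: images use an exact list, audio/video use a prefix. A sketch of that check, where `route_by_mime` is a hypothetical name and returning `None` means "not decided by MIME, fall through to the extension step":

```python
from typing import Optional

# Exact-match list from the text above.
IMAGE_MIMES = {
    "image/png", "image/jpeg", "image/gif", "image/webp",
    "image/svg+xml", "image/tiff", "image/heic",
}


def route_by_mime(mime: str) -> Optional[str]:
    """Return "image" or "av" for media MIME types, None to fall through."""
    base = mime.split(";")[0].strip().lower()  # ignore parameters like "; codecs=..."
    if base in IMAGE_MIMES:
        return "image"
    if base.startswith(("audio/", "video/")):
        return "av"
    return None
```

A consequence of the exact list: an image type not named above (for example image/bmp) is not routed by MIME, whereas any audio/* or video/* subtype is.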


Edge cases

| Input | MIME | Extension | Routed to | Why |
|---|---|---|---|---|
| Python script | text/x-python | .py | Code Handler | Extension match |
| JPEG photo | image/jpeg | .jpg | Image Handler | MIME match |
| CSV data | text/csv | .csv | Data Handler | Extension overrides text MIME |
| Markdown doc | text/markdown | .md | 文字運爺 | .md = knowledge, not code |
| JSON config | application/json | .json | Code Handler | Extension in CODE_EXTENSIONS |
| JSON API payload | application/json | (none) | 文字運爺 | No extension → default to text |
| PDF document | application/pdf | .pdf | 文字運爺 | PDF = likely document |
| PineScript | text/plain | .pine | Code Handler | Extension match |
| Unknown binary | application/octet-stream | .bin | 文字運爺 | Default → text; 文字運爺 flags unknowns |
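Read together, the rows above suggest that a declared MIME type settles routing only for media (image, audio, video), while text/* and application/* types defer to the extension. The sketch below encodes that reading and replays the table as a check. `route` and `Route` are hypothetical names, the extension sets are trimmed to the entries the table needs, and `routing_method` uses the three values recorded in the audit trail (mime, extension, default).

```python
from typing import NamedTuple

# Trimmed copies of the rule sets above (only what the table exercises).
IMAGE_MIMES = {"image/png", "image/jpeg", "image/gif", "image/webp",
               "image/svg+xml", "image/tiff", "image/heic"}
CODE_EXTENSIONS = {".py", ".json", ".pine", ".yaml", ".sh"}
DATA_EXTENSIONS = {".csv", ".parquet", ".jsonl"}


class Route(NamedTuple):
    handler: str         # "image" | "av" | "code" | "data" | "text"
    routing_method: str  # "mime" | "extension" | "default"


def route(mime: str, extension: str) -> Route:
    mime = mime.split(";")[0].strip().lower()
    ext = extension.lower()
    # 1. MIME: decisive only for media types.
    if mime in IMAGE_MIMES:
        return Route("image", "mime")
    if mime.startswith(("audio/", "video/")):
        return Route("av", "mime")
    # 2. Extension: text/*, application/* and octet-stream do not settle it.
    if ext in CODE_EXTENSIONS:
        return Route("code", "extension")
    if ext in DATA_EXTENSIONS:
        return Route("data", "extension")
    # 3. Default: hand everything else to the text handler (文字運爺).
    return Route("text", "default")
```

One consequence to check against intent: because MIME is tried first, an SVG declared as image/svg+xml reaches the Image Handler even though .svg is also listed in CODE_EXTENSIONS.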

Multipart handling

When UBI receives multipart content (e.g., an email with text body + image attachment + PDF), it:

  1. Splits into individual pieces
  2. Each piece gets its own intake_id (shared parent_intake_id for linking)
  3. Each piece is independently routed

In the email example above, all three pieces (text body, image, PDF) share parent_intake_id = "intake-2026-0304-001" for audit linking.
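For email, the split-then-tag steps could look like this with Python's standard `email` package. `split_email` and `Piece` are hypothetical, and the child-id scheme (parent id plus a two-digit suffix) is an assumption; the text above only says each piece gets its own intake_id. Container parts (multipart/mixed, multipart/alternative) are skipped because they hold no content of their own.

```python
import email
from email import policy
from typing import List, TypedDict


class Piece(TypedDict):
    intake_id: str
    parent_intake_id: str
    file_name: str
    detected_mime: str
    raw_content: bytes


def split_email(raw_email: bytes, parent_intake_id: str) -> List[Piece]:
    """Split a MIME email into leaf parts that share one parent_intake_id."""
    msg = email.message_from_bytes(raw_email, policy=policy.default)
    pieces: List[Piece] = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers only group other parts
        n = len(pieces) + 1
        pieces.append(Piece(
            intake_id=f"{parent_intake_id}-{n:02d}",  # hypothetical child-id scheme
            parent_intake_id=parent_intake_id,
            file_name=part.get_filename() or "",      # body text has no filename
            detected_mime=part.get_content_type(),
            raw_content=part.get_payload(decode=True) or b"",
        ))
    return pieces
```

Each returned piece would then go through the router independently, exactly like a single-file intake.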


Intake state schema

Every item passing through UBI carries this state:

```python
from typing import TypedDict


class IntakeState(TypedDict):
    intake_id: str         # "intake-2026-0304-001"
    file_name: str         # "report.pdf"
    file_extension: str    # ".pdf"
    detected_mime: str     # "application/pdf"
    raw_content: bytes     # File content
    input_channel: str     # "manual" | "api" | "email_forward" | ...
    caller_agent: str      # "Mac CLI 小克"
    caller_ship: str       # "SS1"
    parent_intake_id: str  # For multipart splits
```
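The example intake_id appears to encode a date and a sequence number (intake-YYYY-MMDD-NNN). That reading is inferred from a single example, so the helpers below are hypothetical, and how the sequence number is allocated (per day, per ship, or globally) is not specified here.

```python
import re
from datetime import date
from typing import Tuple

# Assumed format, inferred from "intake-2026-0304-001".
INTAKE_ID_RE = re.compile(r"^intake-(\d{4})-(\d{2})(\d{2})-(\d{3,})$")


def make_intake_id(day: date, seq: int) -> str:
    """Format as intake-YYYY-MMDD-NNN (sequence zero-padded to 3 digits)."""
    return f"intake-{day:%Y}-{day:%m%d}-{seq:03d}"


def parse_intake_id(intake_id: str) -> Tuple[date, int]:
    """Recover (date, sequence) from an id; raise ValueError if malformed."""
    m = INTAKE_ID_RE.match(intake_id)
    if m is None:
        raise ValueError(f"not an intake id: {intake_id!r}")
    year, month, day, seq = map(int, m.groups())
    return date(year, month, day), seq
```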

Audit trail

Every intake is logged to the intake_log table in D1, regardless of whether it becomes a UB entry:

| Column | Purpose |
|---|---|
| intake_id | Unique intake identifier |
| routed_to | Which handler processed it |
| routing_method | How routing was decided: mime, extension, or default |
| input_channel | How content arrived |
| caller_agent | Who submitted it |
| parent_intake_id | Links multipart splits |
| result_entry_id | UB entry ID (if text) |
| result_external_ref | External ref (git SHA, image UID) for non-text |

This enables:

  • Audit: What was submitted when? By whom? Where did it go?
  • Statistics: How many items per handler? Per channel? Per ship?
  • Debugging: Wrong routing? Check routing_method and detected_mime
  • Cross-system linking: Trace from a UB entry back to the original intake
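If D1 here is Cloudflare D1, which uses SQLite's SQL dialect, an in-memory `sqlite3` database is a convenient local stand-in for trying out these queries. Column names follow the table above; the types, the CHECK constraint, the helper names, and the extra detected_mime column (read by the Debugging use case but not listed in the table) are assumptions.

```python
import sqlite3
from typing import Any, Dict, List

COLUMNS = (
    "intake_id", "routed_to", "routing_method", "input_channel", "caller_agent",
    "parent_intake_id", "result_entry_id", "result_external_ref",
    "detected_mime",  # assumed extra column
)

SCHEMA = """
CREATE TABLE IF NOT EXISTS intake_log (
    intake_id           TEXT PRIMARY KEY,
    routed_to           TEXT NOT NULL,
    routing_method      TEXT NOT NULL
                        CHECK (routing_method IN ('mime', 'extension', 'default')),
    input_channel       TEXT NOT NULL,
    caller_agent        TEXT NOT NULL,
    parent_intake_id    TEXT,
    result_entry_id     TEXT,
    result_external_ref TEXT,
    detected_mime       TEXT
)
"""


def log_intake(conn: sqlite3.Connection, row: Dict[str, Any]) -> None:
    """Insert one audit row; columns missing from `row` are stored as NULL."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(
        f"INSERT INTO intake_log ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        tuple(row.get(col) for col in COLUMNS),
    )


def items_per_handler(conn: sqlite3.Connection) -> Dict[str, int]:
    """Statistics: how many intakes each handler received."""
    cur = conn.execute(
        "SELECT routed_to, COUNT(*) FROM intake_log GROUP BY routed_to ORDER BY routed_to"
    )
    return dict(cur.fetchall())


def multipart_children(conn: sqlite3.Connection, parent_id: str) -> List[str]:
    """Audit: every piece split from one multipart intake."""
    cur = conn.execute(
        "SELECT intake_id FROM intake_log WHERE parent_intake_id = ? ORDER BY intake_id",
        (parent_id,),
    )
    return [r[0] for r in cur.fetchall()]
```

The CHECK constraint rejects any routing_method outside the three documented values, which keeps the Debugging query meaningful.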

Cross-handler feedback

Non-text handlers can send extracted text back to 文字運爺 for UB classification:

| Handler | Extracts | Feedback to 文字運爺 |
|---|---|---|
| 🖼️ Image | OCR text from screenshots, charts | Text description + OCR → UB entry with image_uid reference |
| 🎙️ AV | Whisper transcripts from audio/video | Transcript → UB entry with audio_uid reference |
| ⌨️ Code | Docstrings, README content | Design knowledge → UB entry with git_path reference |

This ensures all text knowledge ends up in UB regardless of its original source format.
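A sketch of the feedback payload a non-text handler might hand to 文字運爺, keeping the reference field from the table so the resulting UB entry can point back to its source artifact. `make_feedback` and the payload keys other than the three reference fields are hypothetical, and skipping empty extractions is an assumed design choice.

```python
from typing import Dict, Optional

# Reference field each non-text handler attaches (from the table above).
REFERENCE_FIELD = {"image": "image_uid", "av": "audio_uid", "code": "git_path"}


def make_feedback(handler: str, extracted_text: str,
                  reference: str) -> Optional[Dict[str, str]]:
    """Package extracted text for 文字運爺 with a pointer to the source.

    Returns None when extraction produced no text (assumed: nothing to classify).
    """
    field = REFERENCE_FIELD.get(handler)
    if field is None:
        raise ValueError(f"handler {handler!r} does not send feedback to 文字運爺")
    if not extracted_text.strip():
        return None
    return {"text": extracted_text.strip(), "source_handler": handler, field: reference}
```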


Why not n8n / Flowise?

Pre-Flight Check was performed (Decision #34). LangGraph's native features cover all routing needs:

| Need | n8n/Flowise | LangGraph |
|---|---|---|
| Conditional routing | Visual nodes | add_conditional_edges |
| State management | Limited | TypedDict with reducers |
| Retry logic | Manual | Built-in RetryPolicy |
| Subgraph composition | Not native | First-class CompiledGraph |
| Python ecosystem | Separate runtime | Native Python |
| Cost | Self-hosted server | Zero infrastructure |

Adding n8n/Flowise would introduce infrastructure overhead with no capability gain.


Related

  • File Ingestion (MTAAA): 文字運爺 5-node pipeline, 3D classification, cost model
  • Embeddings: How UB entries are vectorized after classification
  • Search Results: How classified entries are retrieved via hybrid search