UBI Router Architecture

Benchmarked against: Anthropic Streaming Messages (content routing)
Spec: MTAAA v1.4 §2.1.6
Pattern: LangChain Router Pattern (rule-based conditional routing)

UBI (Universal Brain Intake) is the single unified intake point for all content entering the SuperPortia ecosystem. It detects content type and routes to the appropriate handler — no duplication, no confusion about where to send things.

Why a unified intake?

Before UBI, content arrived through multiple paths (manual ingest, API, email, patrol agents) with no consistent routing. Each path had its own logic, leading to:

Code files accidentally stored in UB (Universal Brain) as text
Image metadata lost because photos went straight to filesystem
No audit trail for what was ingested when, by whom

UBI solves this with one rule: everything enters through one door.

Router design

The Router is deterministic and needs no LLM. Content type detection (is this a .py file? is this image/png?) is rule-based, and the rules handle 80%+ of cases. The remaining ambiguous cases (plain text that could be a spec, an article, or an email) default to 文字鍋爺, the text handler, for LLM-based classification.

Priority order:

MIME type — most reliable (from HTTP header or magic bytes)
File extension — fallback when MIME is generic (application/octet-stream)
Default to text — safest default; 文字鍋爺 can flag unknowns
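The priority order above can be sketched as a small pure function. This is a minimal sketch under stated assumptions, not the actual UBI implementation: the `route` function, the `Route` tuple, the short handler names, and the abbreviated extension sets are all hypothetical. Only the MIME and extension checks follow the rules described on this page.

```python
from typing import NamedTuple

# Abbreviated subsets for illustration; the full sets are listed under "Routing rules".
CODE_EXTENSIONS = {".py", ".json", ".pine", ".sh"}
DATA_EXTENSIONS = {".csv", ".parquet", ".jsonl"}
IMAGE_MIMES = {
    "image/png", "image/jpeg", "image/gif", "image/webp",
    "image/svg+xml", "image/tiff", "image/heic",
}


class Route(NamedTuple):
    handler: str  # hypothetical names: "image", "av", "code", "data", "text"
    method: str   # "mime", "extension", or "default"


def route(detected_mime: str, file_extension: str) -> Route:
    """Pick a handler using MIME first, then extension, then the text default."""
    mime = detected_mime.lower()
    ext = file_extension.lower()

    # 1. MIME type: decisive for images and audio/video.
    if mime in IMAGE_MIMES:
        return Route("image", "mime")
    if mime.startswith(("audio/", "video/")):
        return Route("av", "mime")

    # 2. File extension: used when the MIME type does not pick a handler.
    if ext in CODE_EXTENSIONS:
        return Route("code", "extension")
    if ext in DATA_EXTENSIONS:
        return Route("data", "extension")

    # 3. Default to text, where unknowns can be flagged.
    return Route("text", "default")
```

Under these assumptions, `route("text/csv", ".csv")` returns `Route(handler="data", method="extension")`, and `route("application/octet-stream", ".bin")` falls through to `Route(handler="text", method="default")`. Because the function has no side effects, the same input always produces the same route.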

Routing rules

Code detection

CODE_EXTENSIONS = {
    # Languages
    ".py", ".js", ".ts", ".jsx", ".tsx", ".go", ".rs",
    ".java", ".c", ".cpp", ".swift", ".kt", ".rb", ".php",
    # Shell scripts
    ".sh", ".bash", ".zsh", ".bat", ".ps1",
    # Trading scripts
    ".pine", ".mq4", ".mq5",
    # Config (executable/deployable, not knowledge)
    ".yaml", ".yml", ".toml", ".json", ".xml", ".ini",
    ".env", ".dockerfile", ".tf", ".hcl",
    # Web markup
    ".html", ".css", ".scss", ".svg",
    # Build and SQL scripts
    ".sql", ".makefile", ".cmake", ".gradle",
}

Data detection

DATA_EXTENSIONS = {
    ".csv", ".tsv", ".parquet", ".feather", ".arrow",
    ".xlsx", ".xls", ".ods",
    ".db", ".sqlite", ".duckdb",
    ".ndjson", ".jsonl",
    ".sav", ".dta", ".hdf5", ".h5", ".nc",
}
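Matching against these sets depends on how the extension is extracted from the file name. The following is a minimal sketch, assuming extensions are compared in lowercase and that names such as `.env`, `Dockerfile`, and `Makefile` should map to the `.env`, `.dockerfile`, and `.makefile` entries above. The `normalized_extension` helper and the `BARE_NAMES` mapping are hypothetical, not part of UBI. Note that `pathlib` reports an empty suffix for all three names, so they need explicit handling.

```python
from pathlib import PurePath

# Extensionless file names mapped to pseudo-extensions (assumed mapping).
BARE_NAMES = {"dockerfile": ".dockerfile", "makefile": ".makefile"}


def normalized_extension(file_name: str) -> str:
    """Return a lowercase extension such as ".py", or "" if there is none."""
    name = PurePath(file_name).name
    suffix = PurePath(name).suffix.lower()
    if suffix:
        return suffix
    lowered = name.lower()
    # Dotfiles like ".env" have no suffix according to pathlib,
    # so the whole name is treated as the extension.
    if lowered.startswith(".") and len(lowered) > 1:
        return lowered
    return BARE_NAMES.get(lowered, "")
```

Only the last suffix is used, so `archive.tar.gz` yields `.gz`; whether UBI treats compound suffixes differently is not stated in this page.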

Image and AV detection

Images route by MIME: image/png, image/jpeg, image/gif, image/webp, image/svg+xml, image/tiff, image/heic.

Audio/video routes by MIME prefix: audio/*, video/*.
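When the declared MIME type is missing or generic, the type can sometimes be recovered from the first bytes of the content (the magic bytes mentioned in the priority order). The sketch below covers a few of the formats listed above. The `sniff_mime` helper is hypothetical; the signatures are the standard file headers for these formats, but which formats UBI actually sniffs is an assumption.

```python
from typing import Optional

# (signature, MIME) pairs for headers that start at byte 0.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime(raw_content: bytes) -> Optional[str]:
    """Return a MIME type inferred from magic bytes, or None if unrecognized."""
    for signature, mime in SIGNATURES:
        if raw_content.startswith(signature):
            return mime
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if raw_content[:4] == b"RIFF" and raw_content[8:12] == b"WEBP":
        return "image/webp"
    return None
```

A `None` result means the content was not recognized, in which case the router would fall back to the file extension and then to the text default.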

Edge cases

Input	MIME	Extension	Routed to	Why
Python script	`text/x-python`	`.py`	Code Handler	Extension match
JPEG photo	`image/jpeg`	`.jpg`	Image Handler	MIME match
CSV data	`text/csv`	`.csv`	Data Handler	Extension overrides text MIME
Markdown doc	`text/markdown`	`.md`	文字鍋爺	`.md` = knowledge, not code
JSON config	`application/json`	`.json`	Code Handler	Extension in CODE_EXTENSIONS
JSON API payload	`application/json`	(none)	文字鍋爺	No extension → default to text
PDF document	`application/pdf`	`.pdf`	文字鍋爺	PDF = likely document
PineScript	`text/plain`	`.pine`	Code Handler	Extension match
Unknown binary	`application/octet-stream`	`.bin`	文字鍋爺	Default to text; 文字鍋爺 flags unknowns

Multipart handling

When UBI receives multipart content (e.g., an email with text body + image attachment + PDF), it:

Splits the content into individual pieces
Assigns each piece its own intake_id, plus a shared parent_intake_id for linking
Routes each piece independently

In the email example above, all three pieces (text body, image attachment, and PDF) share parent_intake_id = "intake-2026-0304-001" for audit linking.
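The split can be sketched with Python's standard `email` package. This is an illustration only: `split_multipart` is a hypothetical helper, and the child `intake_id` format (parent ID plus a `-pN` suffix) is an assumption. The page only specifies that each piece gets its own `intake_id` and that all pieces share a `parent_intake_id`.

```python
from email.message import EmailMessage
from typing import Dict, List


def split_multipart(msg: EmailMessage, parent_intake_id: str) -> List[Dict[str, object]]:
    """Split an email into its leaf parts, one intake record per part."""
    pieces: List[Dict[str, object]] = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        index = len(pieces) + 1
        pieces.append({
            # Assumed child ID scheme: parent ID plus a part suffix.
            "intake_id": f"{parent_intake_id}-p{index}",
            "parent_intake_id": parent_intake_id,
            "detected_mime": part.get_content_type(),
            "file_name": part.get_filename() or "",
            "raw_content": part.get_payload(decode=True) or b"",
        })
    return pieces


# Build the example email: text body + image attachment + PDF.
msg = EmailMessage()
msg.set_content("Quarterly numbers attached.")
msg.add_attachment(b"\x89PNG\r\n\x1a\n", maintype="image", subtype="png",
                   filename="chart.png")
msg.add_attachment(b"%PDF-1.7", maintype="application", subtype="pdf",
                   filename="report.pdf")

pieces = split_multipart(msg, "intake-2026-0304-001")
```

Each record in `pieces` can then be routed on its own `detected_mime` and `file_name`, while the shared `parent_intake_id` keeps the pieces linked.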

Intake state schema

Every item passing through UBI carries this state:

from typing import TypedDict

class IntakeState(TypedDict):
    intake_id: str          # "intake-2026-0304-001"
    file_name: str          # "report.pdf"
    file_extension: str     # ".pdf"
    detected_mime: str      # "application/pdf"
    raw_content: bytes      # File content
    input_channel: str      # "manual" | "api" | "email_forward" | ...
    caller_agent: str       # "Mac CLI 小克"
    caller_ship: str        # "SS1"
    parent_intake_id: str   # Shared by all pieces of a multipart split
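As an illustration of how this state might be populated, the sketch below builds an `IntakeState` from a file name and its bytes. The `new_intake` helper is hypothetical, the empty string for a missing `parent_intake_id` is an assumption, and `mimetypes.guess_type` (which guesses from the file name only) stands in for the real header or magic-byte detection.

```python
import mimetypes
from pathlib import PurePath
from typing import TypedDict


class IntakeState(TypedDict):
    intake_id: str
    file_name: str
    file_extension: str
    detected_mime: str
    raw_content: bytes
    input_channel: str
    caller_agent: str
    caller_ship: str
    parent_intake_id: str


def new_intake(intake_id: str, file_name: str, raw_content: bytes,
               input_channel: str, caller_agent: str, caller_ship: str,
               parent_intake_id: str = "") -> IntakeState:
    """Build the state for one item; "" means the item has no parent (assumed)."""
    guessed, _encoding = mimetypes.guess_type(file_name)
    return IntakeState(
        intake_id=intake_id,
        file_name=file_name,
        file_extension=PurePath(file_name).suffix.lower(),
        # Fall back to the generic type when the name gives no hint.
        detected_mime=guessed or "application/octet-stream",
        raw_content=raw_content,
        input_channel=input_channel,
        caller_agent=caller_agent,
        caller_ship=caller_ship,
        parent_intake_id=parent_intake_id,
    )


state = new_intake("intake-2026-0304-001", "report.pdf", b"%PDF-1.7",
                   "manual", "Mac CLI 小克", "SS1")
```

A `TypedDict` is an ordinary `dict` at runtime, so `state["detected_mime"]` is read like any other dictionary key; the type annotations are only checked by static tools.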

Audit trail

Every intake is logged to the intake_log table in D1 — regardless of whether it becomes a UB entry:

Column	Purpose
`intake_id`	Unique intake identifier
`routed_to`	Which handler processed it
`routing_method`	How routing was decided: `mime`, `extension`, or `default`
`input_channel`	How content arrived
`caller_agent`	Who submitted it
`parent_intake_id`	Links multipart splits
`result_entry_id`	UB entry ID (if text)
`result_external_ref`	External ref (git SHA, image UID) for non-text

This enables:

Audit — What was submitted when? By whom? Where did it go?
Statistics — How many items per handler? Per channel? Per ship?
Debugging — Wrong routing? Check routing_method and detected_mime
Cross-system linking — Trace from UB entry back to original intake
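The queries behind these uses can be sketched against an in-memory SQLite database, which stands in for D1 here. The column names come from the table above; the column types, the constraints, and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the D1 database
conn.execute("""
    CREATE TABLE intake_log (
        intake_id           TEXT PRIMARY KEY,
        routed_to           TEXT NOT NULL,
        routing_method      TEXT NOT NULL
            CHECK (routing_method IN ('mime', 'extension', 'default')),
        input_channel       TEXT NOT NULL,
        caller_agent        TEXT NOT NULL,
        parent_intake_id    TEXT,
        result_entry_id     TEXT,
        result_external_ref TEXT
    )
""")

# Hypothetical sample rows: two pieces of one email, plus one API submission.
rows = [
    ("intake-a", "text", "default", "email_forward", "agent-x",
     "intake-parent", "ub-entry-1", None),
    ("intake-b", "image", "mime", "email_forward", "agent-x",
     "intake-parent", None, "image-uid-1"),
    ("intake-c", "text", "default", "api", "agent-y",
     None, "ub-entry-2", None),
]
conn.executemany("INSERT INTO intake_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Statistics: how many items per handler?
per_handler = dict(conn.execute(
    "SELECT routed_to, COUNT(*) FROM intake_log GROUP BY routed_to"))

# Cross-system linking: all pieces that came from the same multipart intake.
linked = [r[0] for r in conn.execute(
    "SELECT intake_id FROM intake_log WHERE parent_intake_id = ? ORDER BY intake_id",
    ("intake-parent",))]
```

With the sample rows, `per_handler` is `{"text": 2, "image": 1}` and `linked` is `["intake-a", "intake-b"]`.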

Cross-handler feedback

Non-text handlers can send extracted text back to 文字鍋爺 for UB classification:

Handler	Extracts	Feedback to 文字鍋爺
🖼️ Image	OCR text from screenshots, charts	Text description + OCR → UB entry with `image_uid` reference
🎙️ AV	Whisper transcripts from audio/video	Transcript → UB entry with `audio_uid` reference
⌨️ Code	Docstrings, README content	Design knowledge → UB entry with `git_path` reference

This ensures all text knowledge ends up in UB regardless of its original source format.
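One way to picture the feedback path is as a small record that pairs the extracted text with the reference field from the table above. This is a hypothetical sketch: the `TextFeedback` class, the `make_feedback` helper, and the short handler keys are assumptions; only the reference field names (`image_uid`, `audio_uid`, `git_path`) come from this page.

```python
from dataclasses import dataclass

# Reference field each non-text handler attaches to its feedback.
REFERENCE_FIELDS = {"image": "image_uid", "av": "audio_uid", "code": "git_path"}


@dataclass(frozen=True)
class TextFeedback:
    text: str             # OCR text, transcript, or docstring content
    reference_field: str  # which kind of reference links back to the source
    reference_value: str  # the UID or path of the original artifact


def make_feedback(handler: str, extracted_text: str, reference_value: str) -> TextFeedback:
    """Wrap extracted text with the reference that links it to its source."""
    if handler not in REFERENCE_FIELDS:
        raise ValueError(f"handler {handler!r} does not send text feedback")
    return TextFeedback(extracted_text, REFERENCE_FIELDS[handler], reference_value)


feedback = make_feedback("av", "Transcript of the weekly sync", "audio-uid-1")
```

Keeping the reference alongside the text means the resulting UB entry can always be traced back to the image, recording, or file it was extracted from.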

Why not n8n / Flowise?

A Pre-Flight Check was performed (Decision #34) and found that LangGraph's native features cover all routing needs:

Need	n8n/Flowise	LangGraph
Conditional routing	Visual nodes	`add_conditional_edges`
State management	Limited	TypedDict with reducers
Retry logic	Manual	Built-in `RetryPolicy`
Subgraph composition	Not native	First-class `CompiledGraph`
Python ecosystem	Separate runtime	Native Python
Cost	Self-hosted server	Zero infrastructure

Adding n8n/Flowise would introduce infrastructure overhead with no capability gain.

Related pages

File Ingestion (MTAAA): 文字鍋爺 5-node pipeline, 3D classification, cost model
Embeddings: How UB entries are vectorized after classification
Search Results: How classified entries are retrieved via hybrid search
