Ingest Fragment API
Benchmarked against: Anthropic Files API
Architecture: MTAAA 5-node classification pipeline
Spec: docs/MTAAA-Spec-v1.4-DRAFT.md
The Ingest Fragment API is SuperPortia's primary method for adding knowledge to the Universal Brain (UB). Every piece of content (files, text, URLs, screenshots) passes through the MTAAA pipeline for automatic classification.
Pipeline overview
| Node | Role | Output |
|---|---|---|
| file_detector | Identifies input type and format | {type, mime, encoding} |
| content_extractor | Extracts readable content | {text, metadata} |
| feature_learner | Identifies entities, topics, patterns | {entities, tags, features} |
| schema_matcher | Maps to MTAAA 3D taxonomy (Topic x Type x Lifecycle) | {topic, type, lifecycle} |
| archivist | Writes to UB with full metadata | {entry_id, status} |
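The five nodes above can be read as a chain in which each node adds its own output to a shared record. The sketch below is illustrative only: the node bodies are hypothetical stand-ins that return fixed values shaped like the "Output" column, and `run_pipeline` is an assumed name, not part of the documented API.

```python
from typing import Any

# Hypothetical stand-ins for the five MTAAA nodes. Each receives the raw
# input plus the outputs collected so far, and returns a dict shaped like
# the "Output" column of the table.
def file_detector(raw, outputs):
    return {"type": "text", "mime": "text/plain", "encoding": "utf-8"}

def content_extractor(raw, outputs):
    return {"text": raw, "metadata": {"length": len(raw)}}

def feature_learner(raw, outputs):
    text = outputs["content_extractor"]["text"]
    return {"entities": [], "tags": [], "features": {"word_count": len(text.split())}}

def schema_matcher(raw, outputs):
    return {"topic": "AI Agents > Architecture", "type": "Specification", "lifecycle": "versioned"}

def archivist(raw, outputs):
    return {"entry_id": "ub-placeholder", "status": "success"}

NODES = [file_detector, content_extractor, feature_learner, schema_matcher, archivist]

def run_pipeline(raw: str) -> dict[str, dict[str, Any]]:
    """Run the nodes in order, storing each node's output under its name."""
    outputs: dict[str, dict[str, Any]] = {}
    for node in NODES:
        outputs[node.__name__] = node(raw, outputs)
    return outputs
```

Outputs are stored per node rather than merged into one flat dict because both file_detector and schema_matcher emit a key named `type`, and a flat merge would overwrite one with the other.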
Input types
| Type | Parameter | Example |
|---|---|---|
| File | input_type: "file" | /path/to/document.pdf |
| Text | input_type: "text" | Raw text content |
| URL | input_type: "url" | https://example.com/article |
| Screenshot | input_type: "screenshot" | Image file path |
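When the caller does not know which input type applies, a small heuristic can pick one before calling the API. The helper below is a rough sketch under stated assumptions: `guess_input_type` and the image suffix list are hypothetical, and the rules (URL prefix, whitespace means text, image suffix means screenshot) are not taken from the MTAAA spec.

```python
from pathlib import PurePosixPath

# Assumed set of image suffixes treated as screenshots; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}

def guess_input_type(value: str) -> str:
    """Return one of "url", "text", "screenshot", "file" for a raw string."""
    if value.startswith(("http://", "https://")):
        return "url"
    # Heuristic: anything containing whitespace is treated as raw text.
    # File paths that contain spaces will be misclassified by this rule.
    if any(ch.isspace() for ch in value.strip()):
        return "text"
    if PurePosixPath(value).suffix.lower() in IMAGE_SUFFIXES:
        return "screenshot"
    # "file" mirrors the documented default for input_type.
    return "file"
```

Passing input_type explicitly remains the safer choice whenever the caller already knows the type.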
API reference
ingest_fragment
# Basic file ingestion
result = ingest_fragment(
    path="/path/to/file.py",
    input_type="file",
    source="manual",
)

# Text ingestion (the text itself is passed in the path parameter)
result = ingest_fragment(
    path="The content to ingest as text...",
    input_type="text",
    source="api",
)

# URL ingestion
result = ingest_fragment(
    path="https://example.com/article",
    input_type="url",
    source="manual",
)
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| path | string | Yes | File path, text content, or URL |
| input_type | string | No | file (default), text, url, screenshot |
| source | string | No | manual (default), nq_alpha, ss_vault, downloads, api |
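The parameter table can be enforced on the caller's side before a request is made. The following is a minimal sketch, assuming the listed values are the complete set; `build_ingest_args` is a hypothetical helper, not part of the API.

```python
# Allowed values copied from the parameter table; the first item of each
# tuple is the documented default.
ALLOWED_INPUT_TYPES = ("file", "text", "url", "screenshot")
ALLOWED_SOURCES = ("manual", "nq_alpha", "ss_vault", "downloads", "api")

def build_ingest_args(path, input_type="file", source="manual"):
    """Validate arguments and return them as keyword arguments for ingest_fragment."""
    if not isinstance(path, str) or not path:
        raise ValueError("path is required and must be a non-empty string")
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"input_type must be one of {ALLOWED_INPUT_TYPES}, got {input_type!r}")
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"source must be one of {ALLOWED_SOURCES}, got {source!r}")
    return {"path": path, "input_type": input_type, "source": source}
```

The returned dict could then be unpacked into the call, for example `ingest_fragment(**build_ingest_args("/path/to/file.py"))`.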
Response
{
  "entry_id": "ub-a1b2c3d4e5f6",
  "category": "source_code",
  "title": "Auto-generated title from content",
  "write_status": "success",
  "vectorized": true
}
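A caller will usually want to check the response before relying on the new entry. The sketch below parses a payload shaped like the example above into a typed record. `IngestResult` and `parse_ingest_response` are hypothetical names, and "success" is the only write_status value shown in this page, so any other value is simply treated as not OK.

```python
import json
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class IngestResult:
    entry_id: str
    category: str
    title: str
    write_status: str
    vectorized: bool

    @property
    def ok(self) -> bool:
        # Only "success" is documented; other values are treated as failures.
        return self.write_status == "success"

def parse_ingest_response(payload: str) -> IngestResult:
    """Parse a JSON response string; raises KeyError if a field is missing."""
    data = json.loads(payload)
    return IngestResult(**{f.name: data[f.name] for f in fields(IngestResult)})

SAMPLE_RESPONSE = """
{
  "entry_id": "ub-a1b2c3d4e5f6",
  "category": "source_code",
  "title": "Auto-generated title from content",
  "write_status": "success",
  "vectorized": true
}
"""
```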
Auto-tagging
The pipeline automatically adds governance metadata:
| Tag | Source | Example |
|---|---|---|
| source_ship | SP_SHIP_ID env var | SS1 |
| ss_agent_id | SP_AGENT_ID env var | mac-cli |
| Timestamp | System clock | 2026-03-05T10:45:00Z |
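To make the table concrete, here is one way the governance tags could be assembled from the environment. This is a sketch, not the pipeline's actual code: `governance_tags` is a hypothetical function, and the key name `timestamp` is an assumption because the table does not give a tag name for that row.

```python
import os
from datetime import datetime, timezone

def governance_tags(env=None, now=None):
    """Build the auto-assigned tags from environment variables and the clock."""
    env = os.environ if env is None else env
    # A timezone-aware datetime is expected; it is normalised to UTC below.
    now = now or datetime.now(timezone.utc)
    return {
        "source_ship": env.get("SP_SHIP_ID"),
        "ss_agent_id": env.get("SP_AGENT_ID"),
        # "timestamp" is an assumed key name (not given in the table).
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Accepting `env` and `now` as arguments keeps the function testable without touching the real environment or system clock.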
5-handler routing
MTAAA routes content through specialized handlers based on detected type:
| Handler | Content types | Key features |
|---|---|---|
| text_subgraph | Articles, notes, decisions, research | Full NLP classification |
| code_handler | Source code, scripts, configs | Language detection, function extraction |
| image_handler | Screenshots, photos, diagrams | Multimodal description |
| structured_handler | JSON, CSV, YAML | Schema inference |
| mixed_handler | PDFs, notebooks, rich documents | Multi-section processing |
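Routing of this kind is often implemented as a lookup from detected type to handler name. The example below keys on file suffix purely for illustration. The suffix mapping and the `route_handler` function are assumptions; the real pipeline routes on the type detected by file_detector, and the table leaves open whether config files in JSON or YAML go to code_handler or structured_handler.

```python
from pathlib import PurePosixPath

# Assumed suffix-to-handler mapping, derived loosely from the table above.
HANDLER_BY_SUFFIX = {
    ".md": "text_subgraph",
    ".txt": "text_subgraph",
    ".py": "code_handler",
    ".sh": "code_handler",
    ".png": "image_handler",
    ".jpg": "image_handler",
    ".json": "structured_handler",
    ".csv": "structured_handler",
    ".yaml": "structured_handler",
    ".pdf": "mixed_handler",
    ".ipynb": "mixed_handler",
}

def route_handler(path: str, default: str = "text_subgraph") -> str:
    """Return the handler name for a path, falling back to a default."""
    suffix = PurePosixPath(path).suffix.lower()
    return HANDLER_BY_SUFFIX.get(suffix, default)
```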
MTAAA 3D classification
Every entry is classified along three dimensions using the Controlled Vocabulary:
| Dimension | What it answers | Example values |
|---|---|---|
| Topic | What is this about? | AI Agents > Architecture, Trading > Strategy |
| Type | What kind of content? | Specification, Decision Record, Research |
| Lifecycle | How current? | versioned, persistent, ephemeral |
The LLM classifier selects values from the Controlled Vocabulary (CV) only; freeform values are not allowed.
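The CV-only rule amounts to a membership check on each of the three dimensions. Below is a minimal sketch of such a check. The vocabulary here contains only the example values from the table, not the full Controlled Vocabulary, and `validate_classification` is a hypothetical helper.

```python
# Example values only; the real Controlled Vocabulary is larger.
CONTROLLED_VOCABULARY = {
    "topic": {"AI Agents > Architecture", "Trading > Strategy"},
    "type": {"Specification", "Decision Record", "Research"},
    "lifecycle": {"versioned", "persistent", "ephemeral"},
}

def validate_classification(classification: dict) -> list[str]:
    """Return one error message per dimension whose value is missing or not in the CV."""
    errors = []
    for dimension, allowed in CONTROLLED_VOCABULARY.items():
        value = classification.get(dimension)
        if value not in allowed:
            errors.append(f"{dimension}: {value!r} is not in the Controlled Vocabulary")
    return errors
```

An empty list means the classification is acceptable; anything else can be sent back to the classifier for another attempt or flagged for review.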
Ingestion quality checklist
Before calling ingest_fragment():
- Title: Will be auto-generated, but you can set it explicitly for important entries
- Language: All UB entries must be in English (Captain decision, 2026-02-28)
- Duplicates: Run search_brain() first to check for existing entries
- Tags: Auto-assigned by pipeline; add manual tags for mandatory categories (see Controlled Vocabulary)
- Self-contained: Content should be understandable without external context
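Parts of this checklist can be automated as a pre-flight step. The sketch below covers only the empty-content and duplicate checks. Because the signature of search_brain() is not documented here, the search function is passed in as a parameter and assumed to return a sequence of matches; `preflight` itself is a hypothetical helper.

```python
from typing import Callable, Sequence

def preflight(content: str, query: str, search: Callable[[str], Sequence]) -> list[str]:
    """Return a list of problems found before ingestion; empty means go ahead."""
    problems = []
    if not content.strip():
        problems.append("content is empty")
    # `search` stands in for search_brain(); any non-empty result is
    # reported as a possible duplicate for a human or agent to review.
    if search(query):
        problems.append(f"possible duplicate: existing entries match {query!r}")
    return problems
```

The language and self-containment items are left out because they need human or LLM judgement rather than a simple check.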
Batch ingestion
For multiple files, create a work order with file paths in the description:
# WO description for batch ingest
/path/to/file1.md
/path/to/file2.py
/path/to/file3.json
Then dispatch the work order with engine: "ingest". This engine is free and incurs no LLM cost.
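The work order description for a batch can be generated from a list of paths. This sketch assumes a work order is represented as a dict with `engine` and `description` keys; that shape and the `build_batch_work_order` name are hypothetical, since the work order format is not specified on this page.

```python
def build_batch_work_order(paths):
    """Build a work order payload with one file path per line in the description."""
    cleaned = [p.strip() for p in paths if p.strip()]
    if not cleaned:
        raise ValueError("at least one file path is required")
    # dict.fromkeys removes duplicate paths while preserving order.
    unique = list(dict.fromkeys(cleaned))
    return {"engine": "ingest", "description": "\n".join(unique)}
```

For the three files shown above, the description would contain the same three lines, one path per line.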
Where content lands
| Stage | Table | Status |
|---|---|---|
| UB Dock | entries | Unclassified, searchable by keyword |
| UB Main | classified_entries | Fully classified (3D), vector-indexed |
Related pages
| Page | Relationship |
|---|---|
| UB Entry CRUD | Reading and updating entries |
| Controlled Vocabulary | Classification taxonomy |
| Search Brain | Finding ingested content |