
Ingest Fragment API

Benchmarked against: Anthropic (Files API)
Architecture: MTAAA 5-node classification pipeline
Spec: docs/MTAAA-Spec-v1.4-DRAFT.md

The Ingest Fragment API is SuperPortia's primary method for adding knowledge to the Universal Brain. Every piece of content — files, text, URLs, screenshots — passes through the MTAAA pipeline for automatic classification.


Pipeline overview

| Node | Role | Output |
| --- | --- | --- |
| file_detector | Identifies input type and format | {type, mime, encoding} |
| content_extractor | Extracts readable content | {text, metadata} |
| feature_learner | Identifies entities, topics, patterns | {entities, tags, features} |
| schema_matcher | Maps to MTAAA 3D taxonomy (Topic x Type x Lifecycle) | {topic, type, lifecycle} |
| archivist | Writes to UB with full metadata | {entry_id, status} |
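
The five nodes above run in sequence, each adding its output to a shared state. The sketch below illustrates that flow only: the node bodies are hypothetical stubs that return the documented output keys with placeholder values, and `run_pipeline` is an assumed name, not part of the MTAAA codebase.

```python
from typing import Any, Callable

# Hypothetical stand-ins for the five MTAAA nodes. The real implementations
# are not shown on this page; each stub only returns the documented keys.
def file_detector(state: dict[str, Any]) -> dict[str, Any]:
    return {"type": "text", "mime": "text/plain", "encoding": "utf-8"}

def content_extractor(state: dict[str, Any]) -> dict[str, Any]:
    raw = state["raw"]
    return {"text": raw, "metadata": {"length": len(raw)}}

def feature_learner(state: dict[str, Any]) -> dict[str, Any]:
    return {"entities": [], "tags": [], "features": {}}

def schema_matcher(state: dict[str, Any]) -> dict[str, Any]:
    return {"topic": "AI Agents > Architecture",
            "type": "Specification",
            "lifecycle": "versioned"}

def archivist(state: dict[str, Any]) -> dict[str, Any]:
    return {"entry_id": "ub-example", "status": "success"}

Node = Callable[[dict[str, Any]], dict[str, Any]]
PIPELINE: list[Node] = [file_detector, content_extractor, feature_learner,
                        schema_matcher, archivist]

def run_pipeline(raw: str) -> dict[str, Any]:
    """Run every node in order, storing each output under the node's name."""
    # Outputs are namespaced because file_detector and schema_matcher
    # both emit a "type" key with different meanings.
    state: dict[str, Any] = {"raw": raw}
    for node in PIPELINE:
        state[node.__name__] = node(state)
    return state
```

Namespacing the outputs by node name is a design choice of this sketch; the spec may merge or structure them differently.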

Input types

| Type | Parameter | Example |
| --- | --- | --- |
| File | input_type: "file" | /path/to/document.pdf |
| Text | input_type: "text" | Raw text content |
| URL | input_type: "url" | https://example.com/article |
| Screenshot | input_type: "screenshot" | Image file path |

API reference

ingest_fragment

# Basic file ingestion
result = ingest_fragment(
    path="/path/to/file.py",
    input_type="file",
    source="manual"
)

# Text ingestion: `path` carries the raw text itself, not a file path
result = ingest_fragment(
    path="The content to ingest as text...",
    input_type="text",
    source="api"
)

# URL ingestion: `path` carries the URL
result = ingest_fragment(
    path="https://example.com/article",
    input_type="url",
    source="manual"
)

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| path | string | Yes | File path, text content, or URL |
| input_type | string | No | file (default), text, url, screenshot |
| source | string | No | manual (default), nq_alpha, ss_vault, downloads, api |
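
Because `input_type` and `source` accept a fixed set of strings, it can help to validate arguments before calling the API. The helper below is a minimal sketch: `build_ingest_args` is a hypothetical name, and it assumes the values listed in the table are exhaustive, which this page does not state.

```python
# Values copied from the parameter table; treating them as the complete
# set of allowed values is an assumption.
ALLOWED_INPUT_TYPES = {"file", "text", "url", "screenshot"}
ALLOWED_SOURCES = {"manual", "nq_alpha", "ss_vault", "downloads", "api"}

def build_ingest_args(path: str,
                      input_type: str = "file",
                      source: str = "manual") -> dict[str, str]:
    """Validate arguments and return keyword arguments for ingest_fragment()."""
    if not path:
        raise ValueError("path is required")
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"unknown input_type: {input_type!r}")
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    return {"path": path, "input_type": input_type, "source": source}
```

The returned dictionary could then be unpacked into the call, for example `ingest_fragment(**build_ingest_args("/path/to/file.py"))`.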

Response

{
  "entry_id": "ub-a1b2c3d4e5f6",
  "category": "source_code",
  "title": "Auto-generated title from content",
  "write_status": "success",
  "vectorized": true
}
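
A caller will usually want to confirm the write succeeded before using the entry ID. The sketch below assumes the response arrives as a Python dictionary with the fields shown above and treats any `write_status` other than "success" as a failure. Only "success" is documented here, so that handling is an assumption, and `entry_id_or_raise` is a hypothetical helper name.

```python
from typing import Any

def entry_id_or_raise(response: dict[str, Any]) -> str:
    """Return the entry ID from an ingest response, or raise if the write failed."""
    status = response.get("write_status")
    if status != "success":
        # Assumption: every non-"success" status means the entry was not written.
        raise RuntimeError(f"ingest failed: write_status={status!r}")
    entry_id = response.get("entry_id")
    if not entry_id:
        raise RuntimeError("ingest response has no entry_id")
    return entry_id
```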

Auto-tagging

The pipeline automatically adds governance metadata:

| Tag | Source | Example |
| --- | --- | --- |
| source_ship | SP_SHIP_ID env var | SS1 |
| ss_agent_id | SP_AGENT_ID env var | mac-cli |
| Timestamp | System clock | 2026-03-05T10:45:00Z |
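
To make the tag sources concrete, the following sketch builds the same three values from the environment and the system clock. It is illustrative only: `governance_tags` and the `timestamp` key name are assumptions, and the pipeline's own implementation is not shown on this page.

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional

def governance_tags(env: Optional[Mapping[str, str]] = None,
                    now: Optional[datetime] = None) -> dict[str, Optional[str]]:
    """Collect governance metadata from environment variables and the clock."""
    env = os.environ if env is None else env
    # Normalize to UTC so the trailing "Z" is accurate. A naive datetime
    # passed as `now` is interpreted as local time by astimezone().
    moment = datetime.now(timezone.utc) if now is None else now.astimezone(timezone.utc)
    return {
        "source_ship": env.get("SP_SHIP_ID"),   # None when the variable is unset
        "ss_agent_id": env.get("SP_AGENT_ID"),
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),  # assumed key name
    }
```

Passing `env` and `now` explicitly keeps the function testable; calling it with no arguments reads the live environment and clock.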

5-handler routing

MTAAA routes content through specialized handlers based on detected type:

| Handler | Content types | Key features |
| --- | --- | --- |
| text_subgraph | Articles, notes, decisions, research | Full NLP classification |
| code_handler | Source code, scripts, configs | Language detection, function extraction |
| image_handler | Screenshots, photos, diagrams | Multimodal description |
| structured_handler | JSON, CSV, YAML | Schema inference |
| mixed_handler | PDFs, notebooks, rich documents | Multi-section processing |
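
Routing of this kind is often implemented as a lookup table. The sketch below approximates it with file extensions purely for illustration: MTAAA routes on the type detected by the pipeline, not on the suffix, and the extension map, the fallback handler, and the name `pick_handler` are all assumptions.

```python
from pathlib import PurePosixPath

# Hypothetical extension map; the real router uses the detected content type.
HANDLER_BY_SUFFIX = {
    ".md": "text_subgraph",
    ".txt": "text_subgraph",
    ".py": "code_handler",
    ".sh": "code_handler",
    ".png": "image_handler",
    ".jpg": "image_handler",
    ".json": "structured_handler",
    ".csv": "structured_handler",
    ".yaml": "structured_handler",
    ".pdf": "mixed_handler",
    ".ipynb": "mixed_handler",
}

def pick_handler(path: str, default: str = "text_subgraph") -> str:
    """Return the handler name for a path, falling back to `default`."""
    suffix = PurePosixPath(path).suffix.lower()
    return HANDLER_BY_SUFFIX.get(suffix, default)
```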

MTAAA 3D classification

Every entry is classified along three dimensions using the Controlled Vocabulary:

| Dimension | What it answers | Example values |
| --- | --- | --- |
| Topic | What is this about? | AI Agents > Architecture, Trading > Strategy |
| Type | What kind of content? | Specification, Decision Record, Research |
| Lifecycle | How current? | versioned, persistent, ephemeral |

The LLM classifier selects values from the Controlled Vocabulary (CV) only; freeform values are not allowed.
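
The CV-only rule amounts to a membership check on each dimension. The sketch below shows one way to enforce it, using only the example values from the table above as a stand-in vocabulary. The full Controlled Vocabulary lives elsewhere, and `validate_classification` is a hypothetical name.

```python
# Stand-in vocabulary built from the example values above; the real
# Controlled Vocabulary is larger and is not reproduced here.
CONTROLLED_VOCABULARY = {
    "topic": {"AI Agents > Architecture", "Trading > Strategy"},
    "type": {"Specification", "Decision Record", "Research"},
    "lifecycle": {"versioned", "persistent", "ephemeral"},
}

def validate_classification(classification: dict[str, str]) -> list[str]:
    """Return one error per dimension whose value is missing or not in the CV."""
    errors = []
    for dimension, allowed in CONTROLLED_VOCABULARY.items():
        value = classification.get(dimension)
        if value not in allowed:
            errors.append(f"{dimension}: {value!r} is not in the Controlled Vocabulary")
    return errors
```

An empty list means all three dimensions passed; a classifier wrapper could retry or reject the entry when the list is non-empty.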


Ingestion quality checklist

Before calling ingest_fragment():

  1. Title: Will be auto-generated, but you can set it explicitly for important entries
  2. Language: All UB entries must be in English (Captain decision, 2026-02-28)
  3. Duplicates: Run search_brain() first to check for existing entries
  4. Tags: Auto-assigned by pipeline; add manual tags for mandatory categories (see Controlled Vocabulary)
  5. Self-contained: Content should be understandable without external context
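
The duplicate check in step 3 can be wrapped around the ingest call. Since the signatures of `search_brain()` and `ingest_fragment()` are not fully specified on this page, the sketch below takes both as injected callables: it assumes the search callable accepts a query string and returns a possibly empty sequence of hits, and `ingest_if_new` is a hypothetical helper.

```python
from typing import Any, Callable, Optional, Sequence

def ingest_if_new(query: str,
                  search: Callable[[str], Sequence[Any]],
                  ingest: Callable[..., dict],
                  **ingest_kwargs: Any) -> Optional[dict]:
    """Ingest only when `search(query)` finds no existing entries.

    Returns the ingest response, or None when a likely duplicate exists.
    """
    if search(query):
        # At least one hit: skip ingestion and let the caller review it.
        return None
    return ingest(**ingest_kwargs)
```

In practice the callables would be `search_brain` and `ingest_fragment`, provided their signatures match these assumptions.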

Batch ingestion

For multiple files, create a work order with file paths in the description:

# WO description for batch ingest
/path/to/file1.md
/path/to/file2.py
/path/to/file3.json

Then dispatch the work order with engine: "ingest". This engine is free and incurs no LLM cost.
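
A work order description like the one above is a newline-separated list of paths. The sketch below builds and parses such a description. The work order dictionary shape is hypothetical, and skipping blank lines and lines starting with "#" is an assumption about how the ingest engine reads the description.

```python
def build_work_order(paths: list[str]) -> dict[str, str]:
    """Return a hypothetical work-order payload for the "ingest" engine."""
    return {"engine": "ingest", "description": "\n".join(paths)}

def parse_batch_description(description: str) -> list[str]:
    """Extract file paths, ignoring blank lines and '#' comment lines."""
    paths = []
    for line in description.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        paths.append(line)
    return paths
```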


Where content lands

| Stage | Table | Status |
| --- | --- | --- |
| UB Dock | entries | Unclassified, searchable by keyword |
| UB Main | classified_entries | Fully classified (3D), vector-indexed |

Related pages

| Page | Relationship |
| --- | --- |
| UB Entry CRUD | Reading and updating entries |
| Controlled Vocabulary | Classification taxonomy |
| Search Brain | Finding ingested content |