
UB Source Tracking

Benchmarked against: Anthropic (Citations)
Tools: ingest_fragment, search_brain, get_entry
Rule: UB Governance (EGS Chapter 9)

Every piece of knowledge in the Universal Brain has provenance — who created it, where it came from, when it was ingested, and which ship produced it. This tracking enables trust, audit, and quality control.


Provenance fields

Every UB entry automatically captures:

| Field | Source | Purpose |
| --- | --- | --- |
| entry_id | Auto-generated | Unique identifier (e.g., ub-396f44b70763) |
| source_ship | SP_SHIP_ID env var | Which ship ingested this (SS1, SS2, SS3) |
| agent_id | SP_AGENT_ID env var | Which agent created this |
| source | Ingestion parameter | Origin: manual, api, nq_alpha, ss_vault, downloads |
| created_at | Auto-generated | UTC timestamp |
| updated_at | Auto-generated | Last modification time |
| tags | Agent or pipeline | Categorization tags |
| entities | MTAAA pipeline | Extracted named entities |
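
A minimal sketch of how these fields could be captured at ingestion time. The `Provenance` class, its helper functions, and the `"unknown"` fallback are hypothetical names chosen for illustration and are not the UB implementation. The ID generator only assumes the shape of the example ID (`ub-` followed by 12 hex characters).

```python
import os
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _utc_now() -> datetime:
    # Timezone-aware UTC timestamp, as the table describes for created_at.
    return datetime.now(timezone.utc)


def _new_entry_id() -> str:
    # Assumption: "ub-" plus 12 hex characters, mirroring the example ID's shape.
    return "ub-" + uuid.uuid4().hex[:12]


@dataclass
class Provenance:
    """Hypothetical container for the provenance fields listed above."""

    source: str  # ingestion parameter, e.g. "manual" or "api"
    tags: list[str] = field(default_factory=list)
    entities: list[str] = field(default_factory=list)
    entry_id: str = field(default_factory=_new_entry_id)
    # Ship and agent are read from the environment variables named in the table.
    source_ship: str = field(
        default_factory=lambda: os.environ.get("SP_SHIP_ID", "unknown")
    )
    agent_id: str = field(
        default_factory=lambda: os.environ.get("SP_AGENT_ID", "unknown")
    )
    created_at: datetime = field(default_factory=_utc_now)
    updated_at: datetime = field(default_factory=_utc_now)
```

The caller supplies only `source`, `tags`, and `entities`; everything else is filled in automatically, which matches the "automatically captures" wording above.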

Tag system

Tags follow a controlled vocabulary: lowercase, hyphenated, and at most 8 per entry.

Mandatory tags by content type

| Content type | Required tags |
| --- | --- |
| Research/Intel | research, [domain], [YYYY-MM] |
| Decision Record | decision, [project], captain-approved |
| Incident/RCA | incident, rca, P0-P3 |
| Spec/Design | spec, [project], [version] |
| Session Record | session, [ship] |
| Session Handoff | session-handoff, [ship] |
| Correction | correction, [topic] |
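
The mandatory-tag table can be checked mechanically. The sketch below is illustrative only: `REQUIRED_TAGS`, its content-type keys, and `missing_required_tags` are hypothetical names. It checks the literal tags and the `[YYYY-MM]` slot for research entries; placeholder slots such as `[domain]` or `[project]` are left unchecked because their allowed values are not listed here.

```python
import re

# Literal tags each content type must carry, taken from the table above.
# The dictionary keys are hypothetical short names for the content types.
REQUIRED_TAGS = {
    "research": {"research"},
    "decision": {"decision", "captain-approved"},
    "incident": {"incident", "rca"},
    "spec": {"spec"},
    "session": {"session"},
    "session-handoff": {"session-handoff"},
    "correction": {"correction"},
}

# Matches a year-month tag such as "2026-03".
YEAR_MONTH = re.compile(r"\d{4}-(0[1-9]|1[0-2])")


def missing_required_tags(content_type: str, tags: list[str]) -> set[str]:
    """Return the required tags that are absent from `tags`."""
    missing = REQUIRED_TAGS[content_type] - set(tags)
    # Research entries also need a date tag in YYYY-MM form.
    if content_type == "research" and not any(
        YEAR_MONTH.fullmatch(tag) for tag in tags
    ):
        missing.add("[YYYY-MM]")
    return missing
```

An empty result means the literal requirements are met; a non-empty set names what still has to be added before ingestion.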

Tag format rules

| Rule | Example | Anti-example |
| --- | --- | --- |
| Lowercase | cloud-ub | Cloud-UB |
| Hyphenated | engine-selection | engine_selection |
| No spaces | work-order | work order |
| Date format | 2026-03 | March 2026 |
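
The format rules translate naturally into a small validator. This is a sketch under assumptions: `TAG_PATTERN`, `MAX_TAGS_PER_ENTRY`, and `tag_problems` are hypothetical names, and allowing digits is inferred from the `2026-03` example. Note that a strictly lowercase pattern would also flag uppercase priority tags such as `P1`; whether those are exempt is not stated here.

```python
import re

# Lowercase letters and digits, in groups joined by single hyphens.
TAG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
MAX_TAGS_PER_ENTRY = 8


def tag_problems(tags: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the tags pass."""
    problems = []
    if len(tags) > MAX_TAGS_PER_ENTRY:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS_PER_ENTRY}")
    for tag in tags:
        # fullmatch rejects uppercase, underscores, spaces, and stray hyphens.
        if not TAG_PATTERN.fullmatch(tag):
            problems.append(f"bad format: {tag!r}")
    return problems
```

Every example in the table passes this check and every anti-example fails it.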

MTAAA 3D classification

Beyond tags, the MTAAA pipeline classifies entries along three dimensions:

| Dimension | What it captures | Example values |
| --- | --- | --- |
| Topic | Subject matter | "AI Agents > Architecture", "Infrastructure > Cloud" |
| Type | Content type | "Specification", "Decision Record", "Research Note" |
| Lifecycle | Currency | "versioned", "persistent", "ephemeral" |

Classification uses a Controlled Vocabulary (CV): the LLM selects only from predefined categories and cannot emit free-form values.
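
One way to enforce "predefined categories only" is to validate the classifier's output against the vocabulary before storing it. The sketch below is an assumption about how that could look, not the MTAAA pipeline's code: `CONTROLLED_VOCABULARY` and `validate_classification` are hypothetical, and the vocabulary is seeded only with the example values from the table above.

```python
# Hypothetical vocabulary, seeded only with the example values shown above.
CONTROLLED_VOCABULARY = {
    "topic": {"AI Agents > Architecture", "Infrastructure > Cloud"},
    "type": {"Specification", "Decision Record", "Research Note"},
    "lifecycle": {"versioned", "persistent", "ephemeral"},
}


def validate_classification(classification: dict[str, str]) -> dict[str, str]:
    """Return the classification unchanged, or raise on an off-vocabulary value."""
    for dimension, allowed in CONTROLLED_VOCABULARY.items():
        value = classification.get(dimension)
        # A missing dimension yields None, which is never in the allowed set.
        if value not in allowed:
            raise ValueError(
                f"{dimension}={value!r} is not in the controlled vocabulary"
            )
    return classification
```

Rejecting the whole classification on any off-vocabulary value keeps free-form labels out of the store, at the cost of requiring a retry or manual review when the model drifts.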


Searching with provenance

# Search returns entries with full metadata
results = search_brain("Cloud UB architecture")

# Each result includes:
# - entry_id, title, content (preview)
# - tags, source_ship, agent_id
# - created_at, relevance score

Filtering by provenance

# Browse by category
search_by_category(category="knowledge", subcategory="architecture")

# Get full entry with all metadata
get_entry(entry_id="ub-396f44b70763")
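
Because each search result carries `source_ship` and `agent_id`, results can also be narrowed on the client side. This sketch assumes results arrive as a list of dictionaries keyed by the field names above; `filter_by_provenance` is a hypothetical helper, and whether the search tools accept provenance filters directly is not covered here.

```python
from typing import Iterable, Optional


def filter_by_provenance(
    results: Iterable[dict],
    source_ship: Optional[str] = None,
    agent_id: Optional[str] = None,
) -> list[dict]:
    """Keep only results matching every provenance field that was given."""
    kept = []
    for entry in results:
        # A filter left as None is ignored; entries lacking the field never match.
        if source_ship is not None and entry.get("source_ship") != source_ship:
            continue
        if agent_id is not None and entry.get("agent_id") != agent_id:
            continue
        kept.append(entry)
    return kept
```

For example, `filter_by_provenance(results, source_ship="SS2")` would keep only the entries ingested by ship SS2.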

Quality checklist

Before calling ingest_fragment():

  1. Title: descriptive, searchable, in English
  2. Content: self-contained (the reader needs no other context)
  3. Tags: follow the controlled vocabulary above
  4. No duplicates: run search_brain() first to check
  5. Source: set correctly (manual, api, etc.)
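
The duplicate check in step 4 could be scripted roughly as follows. This is a sketch under assumptions: `find_possible_duplicates` is a hypothetical helper, the search function is passed in as a parameter so the example stays self-contained, and results are assumed to be dictionaries with a `title` key. Matching on normalized titles is a deliberately crude heuristic, so treat a hit as a prompt for manual review.

```python
from typing import Callable


def _normalize(title: str) -> str:
    # Lowercase and collapse whitespace so trivial variations still match.
    return " ".join(title.lower().split())


def find_possible_duplicates(
    title: str, search: Callable[[str], list[dict]]
) -> list[dict]:
    """Return search hits whose normalized title equals the candidate title."""
    wanted = _normalize(title)
    return [
        hit
        for hit in search(title)
        if _normalize(str(hit.get("title", ""))) == wanted
    ]
```

In practice `search` would wrap `search_brain`; an empty list means no exact-title match was found, not that no related entry exists.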

Freshness tracking

| Tag pattern | Meaning |
| --- | --- |
| verified-2026-03 | Perishable knowledge verified this month |
| stale | Known to be outdated, needs re-verification |
| timeless | Framework/method knowledge, doesn't expire |
| ephemeral | Temporary, can be cleaned up |
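
The freshness tags lend themselves to an automated staleness check. The sketch below is illustrative: `months_since_verified` and `needs_reverification` are hypothetical helpers, and the three-month threshold is an assumption, since no expiry window is specified here.

```python
import re
from datetime import date
from typing import Optional

# Matches tags such as "verified-2026-03" and captures year and month.
VERIFIED_TAG = re.compile(r"verified-(\d{4})-(0[1-9]|1[0-2])")


def months_since_verified(tags: list[str], today: date) -> Optional[int]:
    """Whole months since the most recent verified-YYYY-MM tag, or None."""
    ages = []
    for tag in tags:
        match = VERIFIED_TAG.fullmatch(tag)
        if match:
            year, month = int(match.group(1)), int(match.group(2))
            ages.append((today.year - year) * 12 + (today.month - month))
    return min(ages) if ages else None


def needs_reverification(
    tags: list[str], today: date, max_age_months: int = 3
) -> bool:
    """Flag entries marked stale or verified too long ago (threshold assumed)."""
    if "timeless" in tags:
        return False
    if "stale" in tags:
        return True
    age = months_since_verified(tags, today)
    return age is not None and age > max_age_months
```

Entries without any freshness tag are not flagged by this sketch; a stricter policy could treat them as unverified.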

| Page | Relationship |
| --- | --- |
| UB Governance | Full governance rules |
| File Ingestion | MTAAA pipeline |
| Controlled Vocabulary | CV reference |
| Search Brain | Search details |