UB Source Tracking

Benchmarked against: Anthropic (Citations)
Tools: `ingest_fragment`, `search_brain`, `get_entry`
Rule: UB Governance (EGS Chapter 9)

Every piece of knowledge in the Universal Brain (UB) carries provenance: who created it, where it came from, when it was ingested, and which ship ingested it. This tracking enables trust, auditing, and quality control.

Provenance fields

Every UB entry records the following fields:

Field	Source	Purpose
`entry_id`	Auto-generated	Unique identifier (e.g., `ub-396f44b70763`)
`source_ship`	`SP_SHIP_ID` env var	Which ship ingested this (SS1, SS2, SS3)
`agent_id`	`SP_AGENT_ID` env var	Which agent created this
`source`	Ingestion parameter	Origin: `manual`, `api`, `nq_alpha`, `ss_vault`, `downloads`
`created_at`	Auto-generated	UTC timestamp
`updated_at`	Auto-generated	Last modification time
`tags`	Agent or pipeline	Categorization tags
`entities`	MTAAA pipeline	Extracted named entities
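As a rough illustration of how these fields fit together, the sketch below models an entry's provenance as a Python dataclass. The class name `ProvenanceRecord`, the `"unknown"` fallbacks, and the ID generation scheme are assumptions made for this example; only the field names, the `ub-` prefix, and the `SP_SHIP_ID` / `SP_AGENT_ID` environment variables come from the table above.

```python
import os
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _utc_now() -> datetime:
    """Timezone-aware UTC timestamp."""
    return datetime.now(timezone.utc)


@dataclass
class ProvenanceRecord:
    """Hypothetical container mirroring the provenance fields listed above."""

    source: str  # ingestion parameter, e.g. "manual" or "api"
    # Assumption: IDs look like "ub-" plus 12 hex characters, as in the example ID.
    entry_id: str = field(default_factory=lambda: f"ub-{uuid.uuid4().hex[:12]}")
    # Read from the environment at creation time; "unknown" is an assumed fallback.
    source_ship: str = field(default_factory=lambda: os.environ.get("SP_SHIP_ID", "unknown"))
    agent_id: str = field(default_factory=lambda: os.environ.get("SP_AGENT_ID", "unknown"))
    created_at: datetime = field(default_factory=_utc_now)
    updated_at: datetime = field(default_factory=_utc_now)
    tags: list[str] = field(default_factory=list)
    entities: list[str] = field(default_factory=list)
```

Creating `ProvenanceRecord(source="manual")` with both environment variables set yields a record whose ship and agent are filled in without the caller passing them.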

Tag system

Tags follow a controlled vocabulary: lowercase, hyphenated, and at most 8 per entry.

Mandatory tags by content type

Content type	Required tags
Research/Intel	`research`, `[domain]`, `[YYYY-MM]`
Decision Record	`decision`, `[project]`, `captain-approved`
Incident/RCA	`incident`, `rca`, one of `P0` to `P3`
Spec/Design	`spec`, `[project]`, `[version]`
Session Record	`session`, `[ship]`
Session Handoff	`session-handoff`, `[ship]`
Correction	`correction`, `[topic]`
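The table above could be enforced with a small lookup, sketched here under some assumptions: the content-type keys (`"research"`, `"decision"`, and so on) and the function name are invented for this example, and bracketed placeholders such as `[project]` or `[ship]` are not checked because their vocabularies are not shown on this page.

```python
import re

# Fixed tags taken from the table above. Placeholder tags ([project], [ship],
# [domain], [version], [topic]) are skipped: their allowed values are not listed here.
REQUIRED_FIXED_TAGS = {
    "research": {"research"},
    "decision": {"decision", "captain-approved"},
    "incident": {"incident", "rca"},
    "spec": {"spec"},
    "session": {"session"},
    "session-handoff": {"session-handoff"},
    "correction": {"correction"},
}
SEVERITY = re.compile(r"P[0-3]")
YEAR_MONTH = re.compile(r"\d{4}-(0[1-9]|1[0-2])")


def missing_required_tags(content_type: str, tags: list[str]) -> set[str]:
    """Return the required tags that are absent for the given content type."""
    missing = REQUIRED_FIXED_TAGS[content_type] - set(tags)
    # Incidents need exactly the kind of severity tag the table lists (P0 to P3).
    if content_type == "incident" and not any(SEVERITY.fullmatch(t) for t in tags):
        missing.add("P0-P3")
    # Research entries need a YYYY-MM date tag.
    if content_type == "research" and not any(YEAR_MONTH.fullmatch(t) for t in tags):
        missing.add("YYYY-MM")
    return missing
```

For example, `missing_required_tags("decision", ["decision", "cloud-ub"])` reports that `captain-approved` is absent.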

Tag format rules

Rule	Example	Anti-example
Lowercase	`cloud-ub`	`Cloud-UB`
Hyphenated	`engine-selection`	`engine_selection`
No spaces	`work-order`	`work order`
Date format	`2026-03`	`March 2026`
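The format rules lend themselves to a mechanical check. The following is a minimal sketch, not the pipeline's actual validator: the regular expression is one reading of "lowercase, hyphenated, no spaces", and treating the uppercase severity tags `P0` to `P3` as an explicit exception is an assumption based on the mandatory-tags table.

```python
import re

MAX_TAGS_PER_ENTRY = 8  # "at most 8 per entry" from the tag rules
# Lowercase alphanumeric words joined by single hyphens, e.g. "cloud-ub" or "2026-03".
TAG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
# Assumption: severity tags keep their uppercase form as an explicit exception.
SEVERITY_TAGS = {"P0", "P1", "P2", "P3"}


def validate_tags(tags: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the tags pass."""
    problems = []
    if len(tags) > MAX_TAGS_PER_ENTRY:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS_PER_ENTRY}")
    for tag in tags:
        if tag in SEVERITY_TAGS:
            continue
        if not TAG_PATTERN.fullmatch(tag):
            problems.append(f"bad format: {tag!r}")
    return problems
```

Each anti-example in the table (`Cloud-UB`, `engine_selection`, `work order`, `March 2026`) is rejected by this pattern, while the corresponding examples pass.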

MTAAA 3D classification

Beyond tags, the MTAAA pipeline classifies entries along three dimensions:

Dimension	What it captures	Example values
Topic	Subject matter	"AI Agents > Architecture", "Infrastructure > Cloud"
Type	Content type	"Specification", "Decision Record", "Research Note"
Lifecycle	Currency	"versioned", "persistent", "ephemeral"

Classification uses a Controlled Vocabulary (CV): the LLM may only select from predefined categories and cannot produce freeform labels.
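One way to picture the "predefined categories only" constraint is a guard that rejects any label outside the vocabulary. This sketch is illustrative: the function name is hypothetical, and the vocabulary holds only the example values from the table above, whereas the real CV is presumably larger.

```python
# Example values copied from the table above; the real vocabulary is assumed to be larger.
CONTROLLED_VOCABULARY = {
    "topic": {"AI Agents > Architecture", "Infrastructure > Cloud"},
    "type": {"Specification", "Decision Record", "Research Note"},
    "lifecycle": {"versioned", "persistent", "ephemeral"},
}


def check_classification(classification: dict[str, str]) -> dict[str, str]:
    """Reject any proposed label that is missing or not in the controlled vocabulary."""
    for dimension, allowed in CONTROLLED_VOCABULARY.items():
        value = classification.get(dimension)
        if value not in allowed:
            raise ValueError(f"{dimension}={value!r} is not in the controlled vocabulary")
    return classification
```

A classification such as `{"topic": "Infrastructure > Cloud", "type": "Specification", "lifecycle": "versioned"}` passes unchanged, while a freeform type like `"Blog Post"` raises `ValueError`.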

Searching with provenance

# Search returns entries with full metadata
results = search_brain("Cloud UB architecture")

# Each result includes:
# - entry_id, title, content (preview)
# - tags, source_ship, agent_id
# - created_at, relevance score

Filtering by provenance

# Browse by category
search_by_category(category="knowledge", subcategory="architecture")

# Get full entry with all metadata
get_entry(entry_id="ub-396f44b70763")
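This page does not show a provenance filter parameter on `search_brain`, so one option is to filter the returned results on the client side. The sketch below assumes results behave like dictionaries keyed by the metadata field names listed earlier; the helper name and the sample data are made up for illustration.

```python
def filter_by_provenance(results, source_ship=None, agent_id=None):
    """Keep only results whose provenance fields match the given values."""
    kept = []
    for entry in results:
        if source_ship is not None and entry.get("source_ship") != source_ship:
            continue
        if agent_id is not None and entry.get("agent_id") != agent_id:
            continue
        kept.append(entry)
    return kept


# Stand-in data shaped like the search results described above (values are invented).
sample_results = [
    {"entry_id": "ub-000000000001", "source_ship": "SS1", "agent_id": "agent-a"},
    {"entry_id": "ub-000000000002", "source_ship": "SS2", "agent_id": "agent-b"},
]
ss1_only = filter_by_provenance(sample_results, source_ship="SS1")
```

Passing no filter arguments returns every result, and combining `source_ship` with `agent_id` narrows to entries matching both.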

Quality checklist

Before calling `ingest_fragment()`, check each item:

- Title: descriptive, searchable, in English
- Content: self-contained (the reader needs no other context)
- Tags: follow the controlled vocabulary above
- No duplicates: run `search_brain()` first to check
- Source: set correctly (`manual`, `api`, etc.)
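Part of this checklist can be automated before ingestion. The sketch below is one possible shape, under stated assumptions: `pre_ingest_problems` and its `find_similar` parameter are hypothetical, with `find_similar` standing in for a call to `search_brain` whose return type is not specified here. Tag checks and the "self-contained" judgment are left out.

```python
from typing import Callable


def pre_ingest_problems(
    title: str,
    content: str,
    source: str,
    find_similar: Callable[[str], list],
) -> list[str]:
    """Return checklist violations; an empty list means the fragment looks ready."""
    # Source values as listed in the provenance fields table.
    allowed_sources = {"manual", "api", "nq_alpha", "ss_vault", "downloads"}
    problems = []
    if not title.strip():
        problems.append("title is empty")
    if not content.strip():
        problems.append("content is empty")
    if source not in allowed_sources:
        problems.append(f"unknown source: {source!r}")
    # Duplicate check: any hit for the title is flagged for manual review.
    if title.strip() and find_similar(title):
        problems.append("possible duplicate: search returned existing entries")
    return problems
```

Injecting the search function keeps the check testable without a live UB connection; in practice a caller might pass a thin wrapper around `search_brain`.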

Freshness tracking

Tag pattern	Meaning
`verified-YYYY-MM` (e.g., `verified-2026-03`)	Perishable knowledge, last verified in the given month
`stale`	Known to be outdated, needs re-verification
`timeless`	Framework/method knowledge, doesn't expire
`ephemeral`	Temporary, can be cleaned up
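The `verified-YYYY-MM` pattern makes it possible to compute how long ago an entry was last checked. The following sketch shows one way to do that; the function names and the six-month threshold are assumptions for illustration, since this page does not define when a verified entry should be considered stale.

```python
import re
from datetime import date
from typing import Optional

VERIFIED_TAG = re.compile(r"verified-(\d{4})-(0[1-9]|1[0-2])")


def months_since_verified(tags: list[str], today: date) -> Optional[int]:
    """Months elapsed since the newest verified-YYYY-MM tag, or None if there is none."""
    ages = []
    for tag in tags:
        match = VERIFIED_TAG.fullmatch(tag)
        if match:
            year, month = int(match.group(1)), int(match.group(2))
            ages.append((today.year - year) * 12 + (today.month - month))
    return min(ages) if ages else None


def needs_reverification(tags: list[str], today: date, max_age_months: int = 6) -> bool:
    """True when the last verification is older than the (assumed) threshold."""
    if "timeless" in tags:
        return False  # framework/method knowledge does not expire
    age = months_since_verified(tags, today)
    return age is not None and age > max_age_months
```

Entries flagged by `needs_reverification` would be candidates for the `stale` tag until someone re-verifies them.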

Related pages

Page	Relationship
UB Governance	Full governance rules
File Ingestion	MTAAA pipeline
Controlled Vocabulary	CV reference
Search Brain	Search details
