Screenshot & Image Ingestion

Benchmarked against: Anthropic (Vision)
Tools: ingest_fragment (input_type=screenshot), Chrome MCP screenshot, Claude Code Read tool
Pipeline: MTAAA image handler

SuperPortia agents can process visual content: screenshots, photos, diagrams, and documents. Images can be ingested into UB (the Universal Brain), analyzed inline, or used to verify the results of browser automation.

Image capabilities by tool

Tool	Capability	Use case
Claude Code Read	View any image file (multimodal)	Inline analysis, screenshot review
Chrome MCP screenshot	Capture browser viewport	Automation verification, UI checking
Chrome MCP zoom	Capture specific region	Detailed element inspection
ingest_fragment (screenshot)	Ingest image to UB	Persistent visual record
MTAAA image handler	Extract text/metadata from images	Classification and archiving

Screenshot analysis

Claude is multimodal, so it can view and analyze images directly:

# Read a screenshot file
Read(file_path="/path/to/screenshot.png")
# Claude sees the image and can describe/analyze it
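
Inside Claude Code the Read tool handles image loading itself. For callers that talk to the Anthropic Messages API directly, the sketch below shows one way to package a screenshot as a base64 image content block. The helper name `image_content_block` is hypothetical, and nothing here describes how the Read tool is implemented.

```python
import base64
import mimetypes
from pathlib import Path

# Media types the sketch accepts for image input.
SUPPORTED_MEDIA_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def image_content_block(path: str) -> dict:
    """Build a base64 image content block for a multimodal message.

    Hypothetical helper: it only prepares the payload and sends nothing.
    """
    media_type, _ = mimetypes.guess_type(path)
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"unsupported image type for {path!r}: {media_type}")
    data = base64.standard_b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

The returned dictionary can be placed in the `content` list of a user message alongside a text block that asks the question about the image.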

Browser screenshots

# Take a screenshot of current browser page
computer(action="screenshot", tabId=123)

# Zoom into a specific region
computer(action="zoom", region=[100, 200, 500, 400], tabId=123)
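
The zoom call above takes a four-number region. A small validation helper can catch malformed regions before the tool is called. This sketch assumes the region is ordered `[x0, y0, x1, y1]` in viewport pixels; if the tool expects `[x, y, width, height]` instead, the checks would need to change. The name `make_region` and the viewport parameters are hypothetical.

```python
def make_region(x0: int, y0: int, x1: int, y1: int,
                viewport_w: int, viewport_h: int) -> list[int]:
    """Validate a zoom region, assuming [x0, y0, x1, y1] ordering.

    Raises ValueError if the box is empty, inverted, or outside the viewport.
    """
    if not (0 <= x0 < x1 <= viewport_w):
        raise ValueError(f"invalid horizontal extent: {x0}..{x1}")
    if not (0 <= y0 < y1 <= viewport_h):
        raise ValueError(f"invalid vertical extent: {y0}..{y1}")
    return [x0, y0, x1, y1]


# Usage: build the region first, then pass it to the zoom action.
region = make_region(100, 200, 500, 400, viewport_w=1280, viewport_h=800)
```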

Image ingestion to UB

Images can be ingested into the Universal Brain:

ingest_fragment(
    path="/path/to/screenshot.png",
    input_type="screenshot",
    source="manual"
)

The MTAAA pipeline processes the image:

1. File detector: identifies the file as an image type
2. Content extractor: extracts visible text (OCR) and metadata
3. Feature learner: identifies key features
4. Schema matcher: maps the result to the UB schema
5. Archivist: writes to UB with a classification
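
The five stages above can be sketched as a simple sequential function. This is an illustration of the flow only, not the MTAAA implementation: the `Fragment` fields, the suffix list, the word-based "features", and the injected `ocr` callable are all assumptions, and the archivist step just sets a flag where a real one would write to UB.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable

# Assumed set of suffixes treated as images in this sketch.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


@dataclass
class Fragment:
    """Hypothetical record passed between pipeline stages."""
    path: str
    kind: str = "unknown"
    text: str = ""
    features: list = field(default_factory=list)
    record: dict = field(default_factory=dict)
    archived: bool = False


def run_pipeline(path: str, ocr: Callable[[str], str] = lambda _p: "") -> Fragment:
    frag = Fragment(path=path)
    # 1. File detector: classify by suffix (a stand-in for real detection).
    frag.kind = "image" if Path(path).suffix.lower() in IMAGE_SUFFIXES else "other"
    if frag.kind != "image":
        return frag
    # 2. Content extractor: delegate text extraction to the injected OCR callable.
    frag.text = ocr(path)
    # 3. Feature learner: placeholder that keeps distinct words longer than 3 chars.
    frag.features = sorted({w.lower() for w in frag.text.split() if len(w) > 3})
    # 4. Schema matcher: shape the data into an assumed record layout.
    frag.record = {"input_type": "screenshot", "path": path,
                   "text": frag.text, "features": frag.features}
    # 5. Archivist: a real implementation would write frag.record to UB here.
    frag.archived = True
    return frag
```

Passing the OCR step in as a callable keeps the sketch runnable without an OCR engine and makes each stage easy to test in isolation.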

Visual verification workflow

For browser automation, screenshots verify results:

Example: verifying a web page

# 1. Navigate to page
navigate(url="https://example.com", tabId=123)

# 2. Take screenshot
computer(action="screenshot", tabId=123)

# 3. Agent sees the page visually and can:
#    - Verify content is correct
#    - Check layout and styling
#    - Identify UI elements
#    - Detect errors
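
The navigate, screenshot, and inspect sequence above can be wrapped in one function so every verification produces the same kind of result. In this sketch the `navigate`, `screenshot`, and `check` arguments are injected callables standing in for the browser tools and for the agent's visual inspection; `verify_page` and the result layout are hypothetical.

```python
from typing import Any, Callable


def verify_page(url: str, tab_id: int,
                navigate: Callable[..., Any],
                screenshot: Callable[..., Any],
                check: Callable[[Any], list]) -> dict:
    """Navigate, capture a screenshot, and collect any problems found.

    `check` returns a list of problem descriptions; an empty list means
    the page passed verification.
    """
    navigate(url=url, tabId=tab_id)                         # 1. Navigate to page
    image = screenshot(action="screenshot", tabId=tab_id)   # 2. Take screenshot
    problems = check(image)                                 # 3. Inspect the capture
    return {"url": url, "ok": not problems, "problems": problems}
```

Returning the list of problems rather than a bare boolean lets the caller report what failed, such as a missing element or a visible error banner.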

Photo library access (SS1)

SS1 has access to the Mac Photos library (15,819+ photos, iCloud sync):

Capability	Method
Query photo metadata	SQLite query on Photos.sqlite
Read original photos	Direct file access to originals directory
Analyze photos	Claude multimodal (Read tool)
Ingest to UB	`ingest_fragment(input_type="screenshot")`

Note: Photo-related tasks await Captain's direction before proceeding.
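
For the metadata query listed in the table above, a cautious first step is to open the database read-only and inspect which tables exist. The layout of Photos.sqlite is not a documented public interface and has changed between macOS releases, so this sketch assumes no table or column names, and the database path is left as a parameter. The function name `list_tables` is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the table names in a SQLite database, opened read-only."""
    # mode=ro makes SQLite refuse writes, so the library file is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Working on a copy of the database rather than the live file is a further precaution worth considering, since the Photos app may hold the original open.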

Limitations

Limitation	Impact
Image tokens are expensive	Each image consumes significant context
No real-time video	Only static screenshots
OCR quality varies	Depends on image quality and text clarity
No image generation	Claude analyzes images but doesn't create them
Chrome screenshots are viewport only	Can't capture full-page scrollable content in one shot
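
To put a rough number on the image token cost in the first row, Anthropic's vision documentation gives an approximation of tokens ≈ (width px × height px) / 750. The sketch below applies that formula as is. It ignores any resizing the API may apply to large images, so treat the result as an estimate rather than a billing figure. The function name is hypothetical.

```python
import math


def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Approximate token cost of an image as width * height / 750, rounded up."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("image dimensions must be positive")
    return math.ceil(width_px * height_px / 750)


# A 1500 x 1000 screenshot comes to roughly 2000 tokens under this formula.
tokens = estimate_image_tokens(1500, 1000)
```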

Related pages

Page	Relationship
Computer Use	Browser automation tools
File Ingestion	MTAAA pipeline
File Tools	Local file operations
