
Screenshot & Image Ingestion

Benchmarked against: Anthropic Vision
Tools: ingest_fragment (input_type=screenshot), Chrome MCP screenshot, Claude Code Read tool
Pipeline: MTAAA image handler

SuperPortia agents can process visual content: screenshots, photos, diagrams, and documents. Images can be ingested into the Universal Brain (UB), analyzed inline, or used to verify browser automation results.


Image capabilities by tool

| Tool | Capability | Use case |
| --- | --- | --- |
| Claude Code Read | View any image file (multimodal) | Inline analysis, screenshot review |
| Chrome MCP screenshot | Capture browser viewport | Automation verification, UI checking |
| Chrome MCP zoom | Capture specific region | Detailed element inspection |
| ingest_fragment (screenshot) | Ingest image to UB | Persistent visual record |
| MTAAA image handler | Extract text/metadata from images | Classification and archiving |

Screenshot analysis

Claude is multimodal — it can directly view and analyze images:

# Read a screenshot file
Read(file_path="/path/to/screenshot.png")
# Claude sees the image and can describe/analyze it
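For comparison, outside Claude Code the same kind of inline analysis goes through the Anthropic Messages API, where an image is passed as a base64 `image` content block. The sketch below only builds that block from a local file; the helper name `image_content_block` is hypothetical, and it assumes the file extension reflects the real image type.

```python
import base64
import mimetypes
from pathlib import Path

# Image media types accepted in Messages API image blocks.
SUPPORTED = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_content_block(path: str) -> dict:
    """Build a base64 image content block for the Messages API from a local file."""
    media_type, _ = mimetypes.guess_type(path)  # guessed from the extension only
    if media_type not in SUPPORTED:
        raise ValueError(f"unsupported image type: {media_type}")
    data = base64.standard_b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

The block would then sit in a user message next to a text block (for example `{"type": "text", "text": "Describe this screenshot."}`) in the `messages` list passed to `client.messages.create(...)` in the `anthropic` Python SDK.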

Browser screenshots

# Take a screenshot of current browser page
computer(action="screenshot", tabId=123)

# Zoom into a specific region
computer(action="zoom", region=[100, 200, 500, 400], tabId=123)
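Zoom regions are written as four numbers; this sketch assumes they mean `[x0, y0, x1, y1]`, the top-left and bottom-right corners in viewport pixels. A hypothetical helper, `zoom_region`, pads an element's bounding box (for example one read from `getBoundingClientRect()` in the page) and clamps it to the viewport so the zoom never asks for pixels outside the screenshot.

```python
from typing import List


def zoom_region(x: int, y: int, width: int, height: int,
                viewport_w: int, viewport_h: int, pad: int = 16) -> List[int]:
    """Pad an element's bounding box and clamp it to the viewport.

    Returns [x0, y0, x1, y1] (corner convention assumed by this sketch).
    """
    x0 = max(0, x - pad)
    y0 = max(0, y - pad)
    x1 = min(viewport_w, x + width + pad)
    y1 = min(viewport_h, y + height + pad)
    if x0 >= x1 or y0 >= y1:
        # No part of the padded box falls inside the current viewport.
        raise ValueError("bounding box lies outside the viewport")
    return [x0, y0, x1, y1]
```

With an assumed 1280 × 800 viewport, `zoom_region(116, 216, 368, 168, 1280, 800)` yields `[100, 200, 500, 400]`, the region used in the zoom call above.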

Image ingestion to UB

Images can be ingested into the Universal Brain:

ingest_fragment(
    path="/path/to/screenshot.png",
    input_type="screenshot",
    source="manual"
)

The MTAAA pipeline processes the image:

  1. File detector: identifies the file as an image
  2. Content extractor: extracts visible text (OCR) and metadata
  3. Feature learner: identifies key features
  4. Schema matcher: maps the result to the UB schema
  5. Archivist: writes to UB with a classification
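The first stage can be illustrated with a signature check. This is a hypothetical stand-in for the MTAAA file detector, not its actual code: it assumes detection by the file's leading "magic" bytes rather than by its extension, and the function names are illustrative.

```python
from typing import Optional

# Leading "magic" bytes that identify common image formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def detect_image_type(head: bytes) -> Optional[str]:
    """Name the image format from a file's first 12 bytes, or return None."""
    for signature, name in SIGNATURES:
        if head.startswith(signature):
            return name
    # WebP: a RIFF container whose form type (bytes 8-11) is "WEBP".
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    # HEIC (common for iPhone photos): an ISO-BMFF "ftyp" box with a HEIC brand.
    if head[4:8] == b"ftyp" and head[8:12] in (b"heic", b"heix"):
        return "heic"
    return None


def detect_file(path: str) -> Optional[str]:
    """Read only enough of the file to check its signature."""
    with open(path, "rb") as f:
        return detect_image_type(f.read(12))
```

Checking content instead of the extension means a screenshot saved without a suffix, or with the wrong one, is still routed to the image handler.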

Visual verification workflow

For browser automation, screenshots verify results:

Example: verifying a web page

# 1. Navigate to page
navigate(url="https://example.com", tabId=123)

# 2. Take screenshot
computer(action="screenshot", tabId=123)

# 3. Agent sees the page visually and can:
# - Verify content is correct
# - Check layout and styling
# - Identify UI elements
# - Detect errors
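Because Chrome captures are viewport-only, verifying a long page means scrolling and capturing several times. A minimal sketch of planning those captures, assuming the agent can read the page and viewport heights (for example `document.documentElement.scrollHeight` and `window.innerHeight`) and scroll to each offset (for example with `window.scrollTo(0, y)`). `scroll_offsets` is a hypothetical helper, not part of the Chrome MCP tools.

```python
from typing import List


def scroll_offsets(page_height: int, viewport_height: int,
                   overlap: int = 50) -> List[int]:
    """Vertical scroll positions whose viewport captures together cover the page."""
    if viewport_height <= overlap:
        raise ValueError("viewport must be taller than the overlap")
    if page_height <= viewport_height:
        return [0]  # one capture already shows the whole page
    step = viewport_height - overlap  # consecutive captures share `overlap` px
    last = page_height - viewport_height  # final position shows the page bottom
    offsets = list(range(0, last, step))
    offsets.append(last)
    return offsets
```

For a 2000 px page in an 800 px viewport this plans captures at 0, 750 and 1200; the overlap keeps an element that straddles a boundary fully visible in at least one capture when it is shorter than the overlap.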

Photo library access (SS1)

SS1 has access to the Mac Photos library (15,819+ photos, iCloud sync):

| Capability | Method |
| --- | --- |
| Query photo metadata | SQLite query on Photos.sqlite |
| Read original photos | Direct file access to originals directory |
| Analyze photos | Claude multimodal (Read tool) |
| Ingest to UB | ingest_fragment(input_type="screenshot") |
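A minimal read-only metadata query, sketched under loud assumptions: the library sits at the default location, and the database uses a recent macOS schema in which assets live in a `ZASSET` table with `ZDIRECTORY`, `ZFILENAME`, `ZDATECREATED` and `ZTRASHEDSTATE` columns. Apple does not document this schema and it changes between macOS versions, so verify the table and column names against the actual file first. `PHOTOS_DB`, `open_readonly` and `recent_photos` are hypothetical names.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import List, Tuple

# Assumed default location of the system Photos library database.
PHOTOS_DB = (Path.home() / "Pictures" / "Photos Library.photoslibrary"
             / "database" / "Photos.sqlite")

# Core Data timestamps count seconds from 2001-01-01 00:00:00 UTC.
CORE_DATA_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def open_readonly(db_path: Path) -> sqlite3.Connection:
    """Open the database read-only so a query can never modify it."""
    # as_uri() percent-encodes the space in "Photos Library".
    return sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)


def recent_photos(conn: sqlite3.Connection,
                  limit: int = 10) -> List[Tuple[str, str, datetime]]:
    """Newest non-trashed assets as (directory, filename, created_utc)."""
    rows = conn.execute(
        "SELECT ZDIRECTORY, ZFILENAME, ZDATECREATED FROM ZASSET "
        "WHERE ZTRASHEDSTATE = 0 AND ZDATECREATED IS NOT NULL "
        "ORDER BY ZDATECREATED DESC LIMIT ?",
        (limit,),
    )
    return [
        (directory, filename, CORE_DATA_EPOCH + timedelta(seconds=created))
        for directory, filename, created in rows
    ]
```

Typical use is `conn = open_readonly(PHOTOS_DB)`, then `recent_photos(conn, limit=5)`, then `conn.close()`. In recent library layouts the original file is expected at `originals/<ZDIRECTORY>/<ZFILENAME>` inside the library bundle; treat that path as another assumption to check.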

Note: Photo-related tasks wait for the Captain's direction before proceeding.


Limitations

| Limitation | Impact |
| --- | --- |
| Image tokens are expensive | Each image consumes a significant share of the context window |
| No real-time video | Only static screenshots |
| OCR quality varies | Depends on image quality and text clarity |
| No image generation | Claude analyzes images but doesn't create them |
| Chrome screenshots are viewport-only | Can't capture full-page scrollable content in one shot |
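To put a number on image cost, Anthropic's vision documentation gives a rough estimate of tokens ≈ (width × height) / 750, and says images are downscaled first when the long edge exceeds 1568 px or the image would exceed about 1,600 tokens. The sketch applies those figures as assumptions (limits can differ between models and change over time, so check the current documentation); `estimate_image_tokens` is a hypothetical helper, not a billing calculator.

```python
# Assumed limits from Anthropic's vision docs; verify for the model in use.
MAX_LONG_EDGE = 1568  # px; larger images are scaled down first
MAX_TOKENS = 1600     # approximate per-image ceiling after downscaling


def estimate_image_tokens(width: int, height: int) -> int:
    """Rough token cost of one image, using tokens ~= (width * height) / 750."""
    long_edge = max(width, height)
    if long_edge > MAX_LONG_EDGE:
        # Scale both sides proportionally so the long edge fits the limit.
        scale = MAX_LONG_EDGE / long_edge
        width, height = round(width * scale), round(height * scale)
    # Anything still above the ceiling is assumed to be downscaled to it.
    return min(round(width * height / 750), MAX_TOKENS)
```

Under these assumptions a 1092 × 1092 image costs about 1,590 tokens and a 1920 × 1080 viewport screenshot hits the ~1,600-token ceiling, so a verification loop that captures after every step spends roughly that much context per capture.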

Related pages

| Page | Relationship |
| --- | --- |
| Computer Use | Browser automation tools |
| File Ingestion | MTAAA pipeline |
| File Tools | Local file operations |