Screenshot & Image Ingestion
Benchmarked against: Anthropic - Vision
Tools: ingest_fragment(input_type=screenshot), Chrome MCP screenshot, Claude Code Read tool
Pipeline: MTAAA image handler
SuperPortia agents can process visual content: screenshots, photos, diagrams, and documents. Images can be ingested into the Universal Brain (UB), analyzed inline, or used to verify browser automation.
Image capabilities by tool
| Tool | Capability | Use case |
|---|---|---|
| Claude Code Read | View any image file (multimodal) | Inline analysis, screenshot review |
| Chrome MCP screenshot | Capture browser viewport | Automation verification, UI checking |
| Chrome MCP zoom | Capture specific region | Detailed element inspection |
| ingest_fragment (screenshot) | Ingest image to UB | Persistent visual record |
| MTAAA image handler | Extract text/metadata from images | Classification and archiving |
Screenshot analysis
Claude is multimodal — it can directly view and analyze images:
# Read a screenshot file
Read(file_path="/path/to/screenshot.png")
# Claude sees the image and can describe/analyze it
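Outside Claude Code, the same kind of inline analysis can go through the Anthropic Messages API, which accepts images as base64 content blocks. The sketch below only builds that content block. The helper name `image_content_block` and the extension map are illustrative assumptions, not part of any SuperPortia tool.

```python
import base64
from pathlib import Path

# Extension -> media type for the image formats the Messages API accepts.
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_content_block(path: str) -> dict:
    """Build a base64 image content block from a local file (hypothetical helper)."""
    file = Path(path)
    media_type = MEDIA_TYPES.get(file.suffix.lower())
    if media_type is None:
        raise ValueError(f"unsupported image extension: {file.suffix!r}")
    data = base64.standard_b64encode(file.read_bytes()).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

The returned dict would sit alongside a text block in the `content` list of a user message.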
Browser screenshots
# Take a screenshot of current browser page
computer(action="screenshot", tabId=123)
# Zoom into a specific region
computer(action="zoom", region=[100, 200, 500, 400], tabId=123)
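A zoom region that extends past the viewport is an easy mistake to make when coordinates are computed. The sketch below normalizes and clamps a region before it is passed to the zoom action. It assumes the region is ordered `[x0, y0, x1, y1]` in viewport pixels, which this page does not state explicitly, and `clamp_region` is a hypothetical helper.

```python
def clamp_region(region, viewport_width, viewport_height):
    """Clamp an [x0, y0, x1, y1] region to the viewport (assumed coordinate order)."""
    x0, y0, x1, y1 = region
    # Accept corners given in either order.
    x0, x1 = sorted((x0, x1))
    y0, y1 = sorted((y0, y1))
    # Pull every edge back inside the viewport bounds.
    x0 = max(0, min(x0, viewport_width))
    x1 = max(0, min(x1, viewport_width))
    y0 = max(0, min(y0, viewport_height))
    y1 = max(0, min(y1, viewport_height))
    if x1 <= x0 or y1 <= y0:
        raise ValueError("region has no area inside the viewport")
    return [x0, y0, x1, y1]
```

For a 1280x720 viewport, `clamp_region([-10, 0, 2000, 300], 1280, 720)` yields `[0, 0, 1280, 300]`.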
Image ingestion to UB
Images can be ingested into the Universal Brain:
ingest_fragment(
path="/path/to/screenshot.png",
input_type="screenshot",
source="manual"
)
The MTAAA pipeline processes the image in five stages, in order:
- File detector: identifies the file as an image type
- Content extractor: extracts visible text (OCR) and metadata
- Feature learner: identifies key features
- Schema matcher: maps the result to the UB schema
- Archivist: writes the record to UB with its classification
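The five stages above can be sketched as a chain of small functions. This is a minimal illustration, not the MTAAA implementation: the `Fragment` class, the record field names, the in-memory `store`, and the `ocr` callable are all assumptions. Only the magic-byte signatures and the PNG header layout are standard facts.

```python
import struct
from dataclasses import dataclass, field
from typing import Callable, Optional

# Well-known magic-byte prefixes for common image formats.
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


@dataclass
class Fragment:  # hypothetical carrier object passed between stages
    raw: bytes
    image_type: Optional[str] = None
    text: str = ""
    metadata: dict = field(default_factory=dict)
    features: list = field(default_factory=list)
    record: dict = field(default_factory=dict)


def detect_file(frag: Fragment) -> None:
    """Stage 1: identify the image type from its leading bytes."""
    for signature, kind in IMAGE_SIGNATURES.items():
        if frag.raw.startswith(signature):
            frag.image_type = kind
            return
    raise ValueError("not a recognized image type")


def extract_content(frag: Fragment, ocr: Callable[[bytes], str]) -> None:
    """Stage 2: run the supplied OCR callable and read basic metadata."""
    frag.text = ocr(frag.raw)
    # A PNG stores width and height as big-endian uint32 at bytes 16-23 (IHDR).
    if frag.image_type == "png" and len(frag.raw) >= 24:
        width, height = struct.unpack(">II", frag.raw[16:24])
        frag.metadata.update(width=width, height=height)


def learn_features(frag: Fragment) -> None:
    """Stage 3: derive simple features (placeholder logic)."""
    frag.features = [frag.image_type] + (["has_text"] if frag.text.strip() else [])


def match_schema(frag: Fragment) -> None:
    """Stage 4: map to a record shape (field names are assumptions)."""
    frag.record = {
        "input_type": "screenshot",
        "format": frag.image_type,
        "text": frag.text,
        "metadata": frag.metadata,
        "features": frag.features,
    }


def run_pipeline(raw: bytes, ocr: Callable[[bytes], str], store: list) -> dict:
    """Run all stages; stage 5 (archivist) appends the record to the store."""
    frag = Fragment(raw)
    detect_file(frag)
    extract_content(frag, ocr)
    learn_features(frag)
    match_schema(frag)
    store.append(frag.record)
    return frag.record
```

Passing OCR in as a callable keeps the sketch independent of any particular OCR engine.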
Visual verification workflow
In browser automation, a screenshot taken after each action confirms that the action produced the expected result:
Example: verifying a web page
# 1. Navigate to page
navigate(url="https://example.com", tabId=123)
# 2. Take screenshot
computer(action="screenshot", tabId=123)
# 3. Agent sees the page visually and can:
# - Verify content is correct
# - Check layout and styling
# - Identify UI elements
# - Detect errors
Photo library access (SS1)
SS1 has access to the Mac Photos library (15,819+ photos, iCloud sync):
| Capability | Method |
|---|---|
| Query photo metadata | SQLite query on Photos.sqlite |
| Read original photos | Direct file access to originals directory |
| Analyze photos | Claude multimodal (Read tool) |
| Ingest to UB | ingest_fragment(input_type="screenshot") |
Note: Photo-related tasks must not proceed until the Captain gives direction.
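For illustration only, a read-only metadata query might look like the sketch below. The table and column names (`ZASSET`, `ZFILENAME`, `ZDATECREATED`) are assumptions about the Photos schema on recent macOS versions; the schema is undocumented and changes between releases, so verify it before relying on this. Timestamps are treated as Core Data values, meaning seconds since 2001-01-01 UTC. Per the note above, nothing should run against the real library without direction.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Core Data timestamps count seconds from this reference date.
CORE_DATA_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file in read-only mode so the library cannot be modified."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def recent_photos(conn: sqlite3.Connection, limit: int = 10):
    """Return (filename, created_at) for the newest assets (assumed schema)."""
    rows = conn.execute(
        "SELECT ZFILENAME, ZDATECREATED FROM ZASSET "
        "WHERE ZDATECREATED IS NOT NULL "
        "ORDER BY ZDATECREATED DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [(name, CORE_DATA_EPOCH + timedelta(seconds=ts)) for name, ts in rows]
```

Opening with `mode=ro` is a deliberate choice: the Photos app owns this database, and a metadata query never needs write access.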
Limitations
| Limitation | Impact |
|---|---|
| Image tokens are expensive | Each image consumes significant context |
| No real-time video | Only static screenshots |
| OCR quality varies | Depends on image quality and text clarity |
| No image generation | Claude analyzes images but doesn't create them |
| Chrome screenshots are viewport only | Can't capture full-page scrollable content in one shot |
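To put a rough number on the image token cost, Anthropic's vision documentation gives a rule of thumb of about (width x height) / 750 tokens per image, with images whose long edge exceeds 1568 pixels scaled down first. The sketch below applies that approximation. Both constants are taken from that guidance as assumptions and may differ by model or change over time, so treat the result as an estimate only.

```python
import math

# Assumed constants from Anthropic's published rule of thumb; may vary by model.
MAX_LONG_EDGE = 1568
PIXELS_PER_TOKEN = 750


def estimate_image_tokens(width: int, height: int) -> int:
    """Approximate the token cost of one image from its pixel dimensions."""
    long_edge = max(width, height)
    if long_edge > MAX_LONG_EDGE:
        # Oversized images are downscaled, preserving aspect ratio.
        scale = MAX_LONG_EDGE / long_edge
        width, height = round(width * scale), round(height * scale)
    return math.ceil(width * height / PIXELS_PER_TOKEN)
```

Under these assumptions a 1092x1092 screenshot costs roughly 1,590 tokens, which is why cropping or zooming to the relevant region before analysis saves context.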
Related pages
| Page | Relationship |
|---|---|
| Computer Use | Browser automation tools |
| File Ingestion | MTAAA pipeline |
| File Tools | Local file operations |