EGS Compliance Evaluation

Benchmarked against: Anthropic, "Define success and build evaluations"
Source: EGS v1.2, all chapters
Enforcement: Work incident classification for non-compliance

EGS compliance evaluation checks whether agents and their outputs meet the standards defined in the Engineering Governance Spec. It defines what "correct" means for every type of agent work.


Compliance levels

| Level | Meaning | Consequence |
| --- | --- | --- |
| Compliant | All applicable EGS rules followed | Normal workflow |
| Minor deviation | Non-critical rule partially followed | Correction noted, no incident |
| Non-compliant | Critical rule violated | Work incident; requires RCA |

Critical rules (violation = work incident)

| Rule | Source | What it means |
| --- | --- | --- |
| Knowledge goes to UB only | Constitution §1 | Never store knowledge in Obsidian or personal files |
| WO is the only task channel | Constitution §3 | No verbal promises or chat-based task assignments |
| Watch Rule | Constitution §4 | Every reply must include a Taipei timestamp |
| HITL boundary respected | Constitution §5 | Never make payments, delete data, or publish without Captain approval |
| English ingestion | UB Governance | All UB entries are written in English |
| Pre-Flight Check | EGS Ch.5 | 3D risk assessment before non-trivial tasks |
| Tech freshness verified | Tech Freshness | Perishable knowledge must be verified before use |
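
Taken together, the compliance levels and the critical-rule list amount to a simple classification. The following is a minimal sketch of that logic, not part of EGS itself: the `Level` enum, the `classify` function, and the rule identifiers are hypothetical names chosen for illustration.

```python
from enum import Enum


class Level(Enum):
    COMPLIANT = "Compliant"
    MINOR_DEVIATION = "Minor deviation"
    NON_COMPLIANT = "Non-compliant"


# Hypothetical identifiers, one per critical rule listed above.
CRITICAL_RULES = {
    "ub_only", "wo_only", "watch_rule", "hitl_boundary",
    "english_ingestion", "pre_flight_check", "tech_freshness",
}


def classify(violated_rules: set) -> Level:
    """Map the set of violated rule ids to a compliance level."""
    if not violated_rules:
        return Level.COMPLIANT
    if violated_rules & CRITICAL_RULES:
        return Level.NON_COMPLIANT   # work incident, requires RCA
    return Level.MINOR_DEVIATION     # correction noted, no incident
```

Any single critical-rule violation is enough to make the result non-compliant, regardless of how many non-critical rules were also missed.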

Per-session evaluation checklist

At the end of every session, the agent should self-assess:

| # | Check | Pass criteria |
| --- | --- | --- |
| 1 | Timestamps on all replies? | Every reply starts with ⏰ YYYY-MM-DD HH:MM (Taipei) |
| 2 | Searched UB before decisions? | search_brain() called before important decisions |
| 3 | Used WO system for all tasks? | No tasks accepted verbally |
| 4 | Dual-ingested corrections? | Both Memory MCP + UB for any Captain corrections |
| 5 | Verified perishable knowledge? | No Danger Zone library used from training memory |
| 6 | Submitted WOs for review? | All completed WOs have completion_summary + actual_hours |
| 7 | Followed cost awareness? | No unnecessary Opus token usage |
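
Check 1 is mechanical enough to automate. The sketch below is one possible way to do it, assuming the reply prefix is the alarm clock character (U+23F0) followed by `YYYY-MM-DD HH:MM (Taipei)`; the function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed prefix format: "<U+23F0> YYYY-MM-DD HH:MM (Taipei)" at the start of the reply.
STAMP = re.compile(r"^\u23f0 (\d{4}-\d{2}-\d{2} \d{2}:\d{2}) \(Taipei\)")


def has_valid_stamp(reply: str) -> bool:
    """True if the reply starts with a well-formed, real calendar timestamp."""
    match = STAMP.match(reply)
    if match is None:
        return False
    try:
        # Rejects impossible values such as 2026-02-30 or 25:00.
        datetime.strptime(match.group(1), "%Y-%m-%d %H:%M")
    except ValueError:
        return False
    return True


def replies_missing_stamp(replies: list) -> list:
    """Return the indexes of replies that fail the timestamp check."""
    return [i for i, reply in enumerate(replies) if not has_valid_stamp(reply)]
```

This only validates the format; it does not confirm that the timestamp matches the actual time in Taipei when the reply was sent.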

Per-WO evaluation checklist

When a WO is submitted for review, the Captain evaluates:

| # | Check | Verified by |
| --- | --- | --- |
| 1 | Acceptance criteria met | Captain reviews deliverables against WO spec |
| 2 | Code standards followed (if code) | EGS Ch.2: naming, safety, testing |
| 3 | Verification evidence provided | Screenshots, test results, or logs |
| 4 | No security vulnerabilities | OWASP Top 10 check for web code |
| 5 | Knowledge ingested to UB | Decisions, corrections, and learnings captured |
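
Some of these checks can be screened before the Captain spends time on a review. The sketch below is illustrative only: `completion_summary` and `actual_hours` are the fields named in this document, while the `WorkOrder` class, the `wo_id` and `evidence` fields, and `review_blockers` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkOrder:
    wo_id: str                                    # hypothetical identifier
    completion_summary: str = ""
    actual_hours: Optional[float] = None
    evidence: list = field(default_factory=list)  # hypothetical: screenshots, test results, logs


def review_blockers(wo: WorkOrder) -> list:
    """List the reasons a WO is not ready for Captain review (empty means ready)."""
    problems = []
    if not wo.completion_summary.strip():
        problems.append("missing completion_summary")
    if wo.actual_hours is None or wo.actual_hours <= 0:
        problems.append("missing actual_hours")
    if not wo.evidence:
        problems.append("no verification evidence")
    return problems
```

Acceptance criteria, code standards, and security still need human or tool-assisted review; this pre-check only catches submissions that are incomplete on their face.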

Work incident process

When a critical rule is violated:

RCA format

## Work Incident: [brief description]
- **Date:** YYYY-MM-DD HH:MM (Taipei)
- **Agent:** [identity]
- **Rule violated:** [specific rule reference]
- **What happened:** [factual description]
- **Root cause:** [why it happened]
- **Fix:** [what was done immediately]
- **Prevention:** [what prevents recurrence]

Incidents are ingested to UB with the tags incident and rca, plus one priority tag from P0 to P3.
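
The template and tagging rule can be generated from structured data so that no field is forgotten. This is a sketch under assumptions: the `Incident` class and both function names are hypothetical, and the actual UB ingestion call is deliberately left out because its interface is not described here.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    description: str
    date: str            # "YYYY-MM-DD HH:MM", Taipei time
    agent: str
    rule_violated: str
    what_happened: str
    root_cause: str
    fix: str
    prevention: str
    priority: int        # 0 to 3, rendered as P0 to P3


def render_rca(i: Incident) -> str:
    """Fill the RCA template with one incident's fields."""
    return "\n".join([
        f"## Work Incident: {i.description}",
        f"- **Date:** {i.date} (Taipei)",
        f"- **Agent:** {i.agent}",
        f"- **Rule violated:** {i.rule_violated}",
        f"- **What happened:** {i.what_happened}",
        f"- **Root cause:** {i.root_cause}",
        f"- **Fix:** {i.fix}",
        f"- **Prevention:** {i.prevention}",
    ])


def incident_tags(i: Incident) -> list:
    """Build the tag list used when the incident is ingested."""
    if not 0 <= i.priority <= 3:
        raise ValueError("priority must be between 0 and 3")
    return ["incident", "rca", f"P{i.priority}"]
```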


Enforcement principle

"Use tools to enforce, not discipline." (ๅคๅ“ฅ)

If agents keep violating a rule, the fix is an automated gate (hook, pre-commit check, WO submission gate), not more reminders. Three WOs were rejected on 2026-03-01 for the same root cause (no Pre-Flight Check), which led to the requirement for tool-based enforcement.

| Pattern | Bad fix | Good fix |
| --- | --- | --- |
| Agents forget Pre-Flight Check | Add another reminder to CLAUDE.md | Build a hook that blocks WO submission without PFC record |
| Wrong engine used for important tasks | Write "use Gemini for important tasks" | Build engine selection gate that checks task priority |
| Perishable knowledge used without verification | Add more warnings | Build pre-commit hook checking Danger Zone imports |
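
As an illustration of the third "good fix", the core of a pre-commit check for Danger Zone imports could look like the sketch below. It assumes the code being checked is Python and uses the standard `ast` module; the `DANGER_ZONE` contents are a placeholder, since the real list is defined elsewhere, and `danger_zone_imports` is a hypothetical name.

```python
import ast

# Placeholder entry; the real Danger Zone list is maintained outside this sketch.
DANGER_ZONE = {"example_fast_moving_sdk"}


def danger_zone_imports(source: str, danger_zone: set = DANGER_ZONE) -> list:
    """Return the Danger Zone packages imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]          # absolute "from x import y" only
        else:
            continue
        for name in names:
            root = name.split(".")[0]      # "pkg.sub" counts as "pkg"
            if root in danger_zone:
                found.add(root)
    return sorted(found)
```

A hook built on this would run the function over each staged file and exit non-zero when the result is non-empty and no verification record exists for the listed packages. How that verification record is stored is left open here.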

Continuous improvement

| Trigger | Action |
| --- | --- |
| Same violation 3+ times | Build automated prevention (hook/gate) |
| New capability added | Add corresponding EGS chapter/rule |
| Rule proven unnecessary | Captain approves removal |
| Industry best practice discovered | Research → 66s Review → update EGS |
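
The first trigger can be detected from the incident log itself. A minimal sketch, assuming each incident records the violated rule as a string; the function name and the sample rule ids in the usage line are hypothetical.

```python
from collections import Counter

AUTOMATION_THRESHOLD = 3  # "Same violation 3+ times"


def rules_needing_gates(incident_rules: list, threshold: int = AUTOMATION_THRESHOLD) -> list:
    """Return the rules violated at least `threshold` times, sorted by name."""
    counts = Counter(incident_rules)
    return sorted(rule for rule, n in counts.items() if n >= threshold)


# Example: three Pre-Flight Check incidents cross the threshold, one Watch Rule incident does not.
print(rules_needing_gates(["pfc", "pfc", "watch_rule", "pfc"]))
```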

| Page | Relationship |
| --- | --- |
| EGS Spec | The rulebook being evaluated against |
| 66s Review | Systematic review before major decisions |
| Pre-Flight Check | Pre-task risk assessment |
| Company Constitution | Supreme rules; violation = work incident |