EGS Compliance Evaluation
Benchmarked against: Anthropic, "Define success and build evaluations"
Source: EGS v1.2, all chapters
Enforcement: Work incident classification for non-compliance
EGS compliance evaluation checks whether agents and their outputs meet the standards defined in the Engineering Governance Spec. It defines what "correct" means for every type of agent work.
Compliance levels
| Level | Meaning | Consequence |
|---|---|---|
| Compliant | All applicable EGS rules followed | Normal workflow |
| Minor deviation | Non-critical rule partially followed | Correction noted, no incident |
| Non-compliant | Critical rule violated | Work incident; requires RCA |
Critical rules (violation = work incident)
| Rule | Source | What it means |
|---|---|---|
| Knowledge goes to UB only | Constitution §1 | Never store knowledge in Obsidian or personal files |
| WO is the only task channel | Constitution §3 | No verbal promises or chat-based task assignments |
| Watch Rule | Constitution §4 | Every reply must include Taipei timestamp |
| HITL boundary respected | Constitution §5 | Never make payments, deletions, or publish without Captain approval |
| English ingestion | UB Governance | All UB entries in English |
| Pre-Flight Check | EGS Ch.5 | 3D risk assessment before non-trivial tasks |
| Tech freshness verified | Tech Freshness | Perishable knowledge must be verified before use |
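The two tables above imply a simple decision rule: any violated critical rule makes the work non-compliant, any remaining deviation is minor, and no findings means compliant. The following is a minimal sketch of that rule, not the EGS tooling itself. The `Finding` and `Level` types and the `classify` function are hypothetical names introduced for illustration, and the sketch assumes every non-critical finding counts as a minor deviation.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    COMPLIANT = "Compliant"
    MINOR_DEVIATION = "Minor deviation"
    NON_COMPLIANT = "Non-compliant"


@dataclass(frozen=True)
class Finding:
    """One observed deviation (hypothetical structure, not defined by EGS)."""
    rule: str       # e.g. "Watch Rule"
    critical: bool  # True if the rule appears in the critical-rules table


def classify(findings: list[Finding]) -> Level:
    """Map a list of findings to a compliance level."""
    # Any critical-rule violation is a work incident, regardless of other findings.
    if any(f.critical for f in findings):
        return Level.NON_COMPLIANT
    # Assumption: every remaining (non-critical) finding is a minor deviation.
    if findings:
        return Level.MINOR_DEVIATION
    return Level.COMPLIANT
```

For example, `classify([Finding("Watch Rule", critical=True)])` returns `Level.NON_COMPLIANT`, which per the table would require an RCA.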
Per-session evaluation checklist
At the end of every session, the agent should self-assess:
| # | Check | Pass criteria |
|---|---|---|
| 1 | Timestamps on all replies? | Every reply starts with ⏰ YYYY-MM-DD HH:MM (Taipei) |
| 2 | Searched UB before decisions? | search_brain() called before important decisions |
| 3 | Used WO system for all tasks? | No tasks accepted verbally |
| 4 | Dual-ingested corrections? | Both Memory MCP + UB for any Captain corrections |
| 5 | Verified perishable knowledge? | No Danger Zone library used from training memory |
| 6 | Submitted WOs for review? | All completed WOs have completion_summary + actual_hours |
| 7 | Followed cost awareness? | No unnecessary Opus token usage |
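Check 1 is mechanical enough to automate. The sketch below is one possible way to test it, assuming replies are available as plain strings; `has_valid_timestamp` and `replies_missing_timestamp` are hypothetical helper names. It validates only the format (an optional leading symbol such as the clock marker, then `YYYY-MM-DD HH:MM (Taipei)`), not whether the time shown is actually the current Taipei time.

```python
import re
from datetime import datetime

# Optional non-alphanumeric marker, optional whitespace, then the timestamp.
_STAMP = re.compile(r"^[^\w\s]*\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}) \(Taipei\)")


def has_valid_timestamp(reply: str) -> bool:
    """Return True if the reply starts with a well-formed Taipei timestamp."""
    match = _STAMP.match(reply)
    if match is None:
        return False
    try:
        # Reject impossible dates and times such as month 13 or hour 25.
        datetime.strptime(match.group(1), "%Y-%m-%d %H:%M")
    except ValueError:
        return False
    return True


def replies_missing_timestamp(replies: list[str]) -> list[int]:
    """Return the indexes of replies that fail check 1."""
    return [i for i, reply in enumerate(replies) if not has_valid_timestamp(reply)]
```

An empty result from `replies_missing_timestamp` means check 1 passes for the session.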
Per-WO evaluation checklist
When a WO is submitted for review, the Captain evaluates:
| # | Check | Verified by |
|---|---|---|
| 1 | Acceptance criteria met | Captain reviews deliverables against WO spec |
| 2 | Code standards followed (if code) | EGS Ch.2: naming, safety, testing |
| 3 | Verification evidence provided | Screenshots, test results, or logs |
| 4 | No security vulnerabilities | OWASP top 10 check for web code |
| 5 | Knowledge ingested to UB | Decisions, corrections, and learnings captured |
Work incident process
When a critical rule is violated:
RCA format
## Work Incident: [brief description]
- **Date:** YYYY-MM-DD HH:MM (Taipei)
- **Agent:** [identity]
- **Rule violated:** [specific rule reference]
- **What happened:** [factual description]
- **Root cause:** [why it happened]
- **Fix:** [what was done immediately]
- **Prevention:** [what prevents recurrence]
Incidents are ingested to UB with tags: incident, rca, P0-P3 (priority level).
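To keep RCA entries uniform, the template can be generated rather than typed by hand. The following is a sketch under stated assumptions: the `Incident` dataclass, `render_rca`, and `incident_tags` are hypothetical names, the field names simply mirror the template above, and the actual UB ingestion call is not shown because its interface is not described here.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Fields mirror the RCA template (hypothetical structure)."""
    description: str
    date: str          # "YYYY-MM-DD HH:MM", Taipei time
    agent: str
    rule_violated: str
    what_happened: str
    root_cause: str
    fix: str
    prevention: str
    priority: int      # 0 to 3, rendered as the tag P0..P3


def render_rca(incident: Incident) -> str:
    """Render the incident as markdown in the RCA format."""
    return "\n".join([
        f"## Work Incident: {incident.description}",
        f"- **Date:** {incident.date} (Taipei)",
        f"- **Agent:** {incident.agent}",
        f"- **Rule violated:** {incident.rule_violated}",
        f"- **What happened:** {incident.what_happened}",
        f"- **Root cause:** {incident.root_cause}",
        f"- **Fix:** {incident.fix}",
        f"- **Prevention:** {incident.prevention}",
    ])


def incident_tags(incident: Incident) -> list[str]:
    """Return the UB tags for the incident: incident, rca, and one of P0..P3."""
    if incident.priority not in range(4):
        raise ValueError("priority must be between 0 and 3")
    return ["incident", "rca", f"P{incident.priority}"]
```

The rendered text and the tag list would then be passed to whatever ingestion mechanism UB provides.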
Enforcement principle
"Use tools to enforce, not discipline." - ๅคๅฅ
If agents keep violating a rule, the fix is an automated gate (hook, pre-commit check, WO submission gate), not more reminders. Three WOs were rejected on 2026-03-01 for the same root cause (no Pre-Flight Check); this led to the requirement for tool-based enforcement.
| Pattern | Bad fix | Good fix |
|---|---|---|
| Agents forget Pre-Flight Check | Add another reminder to CLAUDE.md | Build a hook that blocks WO submission without PFC record |
| Wrong engine used for important tasks | Write "use Gemini for important tasks" | Build engine selection gate that checks task priority |
| Perishable knowledge used without verification | Add more warnings | Build pre-commit hook checking Danger Zone imports |
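As an illustration of the first "good fix", a submission gate could refuse any WO that lacks a Pre-Flight Check record. This is a minimal sketch, assuming a WO is represented as a dictionary. The keys `pfc_record` and `non_trivial`, the `SubmissionBlocked` exception, and both function names are hypothetical. `completion_summary` and `actual_hours` are the fields named in the per-session checklist.

```python
class SubmissionBlocked(Exception):
    """Raised when a WO fails the submission gate."""


def check_wo_submission(wo: dict) -> list[str]:
    """Return the list of problems that block submission (empty if none)."""
    problems = []
    # Assumption: tasks are treated as non-trivial unless explicitly marked otherwise.
    if wo.get("non_trivial", True) and not wo.get("pfc_record"):
        problems.append("missing Pre-Flight Check record")
    if not str(wo.get("completion_summary") or "").strip():
        problems.append("missing completion_summary")
    hours = wo.get("actual_hours")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(hours, bool) or not isinstance(hours, (int, float)) or hours < 0:
        problems.append("missing or invalid actual_hours")
    return problems


def submit_wo(wo: dict) -> str:
    """Submit the WO only if the gate finds no problems."""
    problems = check_wo_submission(wo)
    if problems:
        raise SubmissionBlocked("; ".join(problems))
    return "submitted"
```

Because the gate raises instead of warning, an agent cannot reach review without the record, which is the point of enforcing through tools rather than reminders.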
Continuous improvement
| Trigger | Action |
|---|---|
| Same violation 3+ times | Build automated prevention (hook/gate) |
| New capability added | Add corresponding EGS chapter/rule |
| Rule proven unnecessary | Captain approves removal |
| Industry best practice discovered | Research → 66s Review → update EGS |
Related pages
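The first trigger in the table can be detected from incident history. The sketch below assumes each recorded incident contributes the name of the violated rule to a list; `rules_needing_automation` and `AUTOMATION_THRESHOLD` are hypothetical names, and how the list is pulled from UB is left out.

```python
from collections import Counter

AUTOMATION_THRESHOLD = 3  # "Same violation 3+ times"


def rules_needing_automation(
    violated_rules: list[str],
    threshold: int = AUTOMATION_THRESHOLD,
) -> list[str]:
    """Return rules violated at least `threshold` times, sorted by name."""
    counts = Counter(violated_rules)
    return sorted(rule for rule, n in counts.items() if n >= threshold)
```

Each rule returned is a candidate for a hook or gate instead of another reminder.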
| Page | Relationship |
|---|---|
| EGS Spec | The rulebook being evaluated against |
| 66s Review | Systematic review before major decisions |
| Pre-Flight Check | Pre-task risk assessment |
| Company Constitution | Supreme rules; violation = work incident |