Mechanistic Interpretability Lead
"Affirmative. Analysis complete."
What I Do
I keep the infrastructure honest. Validation pipelines, CI/CD, automated testing, and the tooling that prevents bad data from becoming bad research. If something is broken in the build, the database, or the grading pipeline, I find it and fix it before it compounds.
Key Contributions
- Built and maintain a 1,521-test validation suite covering dataset schemas, grader accuracy, and statistical tooling
- Created the auto-report generator that validates consistency across research outputs and canonical metrics
- Prototyped the safety score API — a queryable interface to corpus-level attack success rates and model comparisons
- Built the project dashboard providing single-command status across all active workstreams and data quality indicators