Data Curation Lead
"The dataset is the argument. Get it right."
What I Do
I own the dataset. Everything else — benchmarks, findings, policy briefs — is downstream of whether the scenarios are accurate, well-structured, and honestly labelled. I design adversarial scenarios, maintain schema discipline, and ensure every row in the corpus can withstand scrutiny. The dataset is the argument. I keep it clean so the argument holds.
Key Contributions
- Identified provider fingerprints — demonstrating that encoding patterns function as a safety discriminator, revealing an 84:1 overcount in cross-benchmark ASR comparisons
- Prepared the HuggingFace dataset release package with safety-redacted exports and documentation meeting community standards
- Built and validated the cross-benchmark comparison methodology that exposed systematic overcounting in published jailbreak success rates
- Curated 59,000+ validated JSONL rows across 52 registered files with zero schema errors