What We Test
Red team assessments apply our validated attack taxonomy to your specific system architecture. We test foundation models, agentic workflows, and multi-agent environments against 81 documented attack techniques across 6 eras of jailbreak evolution. Our methodology satisfies VAISS Guardrail 4 (pre-deployment testing) requirements for Australian deployers and aligns with ISO/IEC 42001 and the NIST AI Risk Management Framework.
Methodology
Scoping & Threat Modeling
- Review system architecture and deployment context
- Identify high-risk interaction patterns
- Select attack scenarios from taxonomy
- Define success criteria and reporting thresholds
Adversarial Testing
- Execute tailored attack scenarios (50-100 prompts)
- Document model responses and failure modes
- Test multi-turn interaction chains
- Validate findings across model versions
Analysis & Remediation
- Classify vulnerabilities by severity
- Map findings to regulatory frameworks
- Develop remediation recommendations
- Deliver findings report and debrief call
Attack Taxonomy
Our testing draws from a 141,691-prompt jailbreak corpus with evaluation results across 231+ models. Coverage includes:
Persona Hijacking
Role-playing attacks that exploit instruction-following behavior (DAN, STAN, Developer Mode)
Constraint Erosion
Gradual relaxation of safety boundaries through multi-turn interaction
Format Exploitation
Encoding techniques, Base64, ROT13, character substitution to bypass content filters
Refusal Suppression
Explicit discouragement of safety responses, pre-emptive agreement framing
Reasoning Manipulation
Extended reasoning model exploits that lead models toward harmful conclusions
Multi-Agent Tactics
Environment shaping, delegation cascades, narrative erosion in agent collectives
Deliverables
- Findings Report: 30-50 page PDF with vulnerability classification, severity ratings, and evidence screenshots
- Attack Scenario Database: Complete prompt set used in testing (JSONL format for integration into CI/CD)
- Remediation Playbook: Specific countermeasures mapped to each vulnerability class
- Regulatory Mapping: How findings relate to EU AI Act, NIST AI RMF, ISO/IEC 42001, and Australia's VAISS Guardrail 4 (pre-deployment testing)
- Debrief Call: 90-minute technical walkthrough with your team
Pricing
Engagements are scoped based on system complexity, model count, and regulatory requirements. Typical range: contact for pricing. All engagements include a coordinated disclosure agreement.
Get Started
Free mini-assessment available (10 scenarios, 2-page brief, 1-week delivery). Full assessments typically take 3-4 weeks from kickoff.