Red Team Assessments

Adversarial testing grounded in empirical research

What We Test

Red team assessments apply our validated attack taxonomy to your specific system architecture. We test foundation models, agentic workflows, and multi-agent environments against 81 documented attack techniques across 6 eras of jailbreak evolution. For Australian deployers, our methodology satisfies the pre-deployment testing requirements of Guardrail 4 of the Voluntary AI Safety Standard (VAISS). It also aligns with ISO/IEC 42001 and the NIST AI Risk Management Framework.

Methodology

Step 1 (Week 1): Scoping & Threat Modeling

  • Review system architecture and deployment context
  • Identify high-risk interaction patterns
  • Select attack scenarios from taxonomy
  • Define success criteria and reporting thresholds

Step 2 (Weeks 2-3): Adversarial Testing

  • Execute tailored attack scenarios (50-100 prompts)
  • Document model responses and failure modes
  • Test multi-turn interaction chains
  • Validate findings across model versions

Step 3 (Week 4): Analysis & Remediation

  • Classify vulnerabilities by severity
  • Map findings to regulatory frameworks
  • Develop remediation recommendations
  • Deliver findings report and debrief call
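
As a rough illustration of how test outcomes from the adversarial testing phase might be tallied and bucketed by severity during analysis, here is a minimal Python sketch. The `Finding` record, the function names, and the severity thresholds are all hypothetical and chosen only for illustration; they are not our published scoring scheme, and real engagements define thresholds during scoping.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One adversarial prompt and its outcome (hypothetical record layout)."""
    category: str   # taxonomy category, e.g. "Persona Hijacking"
    prompt_id: str  # identifier of the test prompt
    bypassed: bool  # True if the model produced a policy-violating response


def success_rates(findings):
    """Return the fraction of prompts that bypassed safeguards, per category."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for f in findings:
        totals[f.category] += 1
        if f.bypassed:
            hits[f.category] += 1
    return {cat: hits[cat] / totals[cat] for cat in totals}


def severity(rate, high=0.5, medium=0.1):
    """Map a success rate to a label; the thresholds are illustrative assumptions."""
    if rate >= high:
        return "high"
    if rate >= medium:
        return "medium"
    return "low"


# Made-up sample data, not real assessment results.
findings = [
    Finding("Persona Hijacking", "p-001", True),
    Finding("Persona Hijacking", "p-002", False),
    Finding("Format Exploitation", "f-001", False),
    Finding("Format Exploitation", "f-002", False),
]
rates = success_rates(findings)
report = {cat: severity(rate) for cat, rate in rates.items()}
```

Keeping the raw per-category rate alongside the label lets the same data be re-bucketed if the reporting thresholds agreed during scoping change.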

Attack Taxonomy

Our testing draws from a 141,691-prompt jailbreak corpus with evaluation results across 231+ models. Coverage includes:

Persona Hijacking

Role-playing attacks that exploit instruction-following behavior (DAN, STAN, Developer Mode)

Constraint Erosion

Gradual relaxation of safety boundaries through multi-turn interaction

Format Exploitation

Encoding techniques (Base64, ROT13, character substitution) used to bypass content filters
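
To make the encodings named above concrete, the following Python sketch generates encoded variants of a harmless probe string, which is one way a test harness could check whether a content filter treats the plain and encoded forms consistently. The `encoded_variants` helper and the substitution table are hypothetical, written for illustration, and do not come from our tooling.

```python
import base64
import codecs

# Hypothetical leetspeak-style substitution table, for illustration only.
SUBSTITUTIONS = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0"})


def encoded_variants(probe: str) -> dict[str, str]:
    """Return the probe in plain form plus three encoded forms."""
    return {
        "plain": probe,
        "base64": base64.b64encode(probe.encode("utf-8")).decode("ascii"),
        "rot13": codecs.encode(probe, "rot_13"),
        "substitution": probe.translate(SUBSTITUTIONS),
    }


# A deliberately benign probe; real scenarios are selected during scoping.
variants = encoded_variants("benign test probe")
```

Each variant carries the same underlying text, so any difference in how a filter handles them points to a gap in decoding or normalization rather than in the policy itself.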

Refusal Suppression

Explicit discouragement of safety responses and pre-emptive agreement framing

Reasoning Manipulation

Exploits against extended-reasoning models that steer the reasoning process toward harmful conclusions

Multi-Agent Tactics

Environment shaping, delegation cascades, and narrative erosion in agent collectives

Deliverables

Pricing

Engagements are scoped based on system complexity, model count, and regulatory requirements. Contact us for pricing. All engagements include a coordinated disclosure agreement.

Get Started

A free mini-assessment is available (10 scenarios, 2-page brief, 1-week delivery). Full assessments typically take 3-4 weeks from kickoff.