Hand holding digital network of light points and connections as a symbol for cyber security, OWASP, secure IT systems, and modern security analytics.

OWASP “Vendor Evaluation Criteria for AI Red Teaming Providers & Tooling v1.0”: How to Choose the Right Partner

19. February 2026

A few days ago, OWASP published the first version of the Vendor Evaluation Criteria for AI Red Teaming Providers & Tooling v1.0. The new guide helps companies to evaluate providers offering security analyses of AI-based systems by providing a solid basis for their decision-making.

Clear criteria for the security analysis of AI systems are crucial

Companies are integrating AI features at a rapid pace: from AI chatbots and systems that connect internal knowledge databases via Retrieval-Augmented-Generation (RAG) to complex agent-based AI workflows.

As part of the corporate infrastructure, these systems and applications are subject to the same security regulations as the existing IT landscape. The resulting increase in demand for AI analyses is causing the market for related services to grow rapidly. But how do you choose the right provider who not only promises “AI security” but also performs security analyses with the necessary understanding of risk and the appropriate depth of testing?

The OWSAP guide provides valuable guidance for your selection.

"Vendor Evaluation Criteria for AI Red Teaming" at a glance

The OWASP guide shows companies the key factors they should consider when selecting providers for security analyses. While the title uses the term red teaming, the services described are commonly understood as pentests in German-speaking markets rather than classic red team assessments. Checklists and questionnaires can be used to identify important criteria - both when evaluating automated testing procedures and when selecting a suitable penetration testing partner.

Essentially, the following criteria apply:

  • Substantive rather than superficial testing: The OWASP guide makes it clear that professional providers should not reduce their analysis to simple one-time prompts or standard queries. Instead, multi-stage, scenario-based tests are necessary, in which different roles, intentions, and information flows are played out.
  • Specific capabilities for different AI application scenarios: Depending on the technology used, targeted testing methods are required. In the simplest case, these include jailbreak tests, but also the exfiltration of sensitive information from connected data sources or the circumvention of guardrails. Providers should clearly state their methodology and be able to substantiate it with concrete examples.
  • Additional skills for advanced architectures: For more complex AI systems that are deeply integrated into existing business processes, providers need additional expertise. Creative attack chains are required that operate across tool and agent calls and lead to unwanted actions being executed or cross-agent context manipulation. A provider should be able to explain in a comprehensible manner how such risks arise and how they are tested.
  • Reproducibility & traceability: A provider should document successful attacks in a traceable manner - including clear logs of attack attempts and jailbreak success rates.

Where the OWASP guide reaches its limits and what really matters when it comes to practice

Security analyses of a wide variety of systems and applications are part of the daily core business of our experts at usd HeroLab - and increasingly, this includes analyses of AI-based solutions.

Based on their proven pentest quality criteria, their risk assessment, and their experience from current AI projects, we asked our colleagues to evaluate the OWASP guide for you. Their conclusion: it offers solid guidance, but also has clear limitations.

1. The categorization “simple vs. advanced systems” is too broad.

OWASP makes a clear distinction between simple and complex AI applications. In practice, however, even a supposedly “simple” AI chatbot can process sensitive data, connect to internal APIs, or trigger operational actions. Such a classification often does not do justice to the real risks involved.

Our approach to conducting pentests is therefore based on threat modeling and scenario-based analyses that reveal real risks independently of generic categories.

2. Automated tools prominently displayed

The OWASP guide places a strong focus on automated tools. They perform valuable groundwork in the form of standard checks. What they cannot yet deliver is what makes the difference in practice: creative, context-related security analyses by experienced security experts. Especially with agent-based systems, tool-calling mechanisms, and multi-agent workflows, realistic and reliable results can only be achieved when human expertise is combined with intelligent testing tools in a targeted manner.

3. Focusing exclusively on “AI red teaming” is too narrow an approach

AI-based systems rarely exist in isolation. They are embedded in web front ends, mobile apps, backend services, or APIs. In our projects, we therefore always consider the entire system, including data flows, authorization models, and related infrastructure. This holistic view is missing from the OWASP guide, but it is essential for realistic risk analyses.

Whether we're talking about AI red teaming, LLM pentests, GenAI pentests, or security analyses of AI systems, at the end of the day, it's always about uncovering real vulnerabilities, making risks transparent, and helping companies operate their AI systems securely. The new OWASP guide confirms much of what we have been applying in our work for a long time. In some places, however, it remains quite general. That's precisely why we continue to rely on scenario-based threat modeling and holistic security analyses instead of mere checklists.

Florian Kimmes, Senior Consultant IT Security and expert on Pentests of AI/LLM systems, usd AG
Portrait of Florian Kimmes in formal attire, Senior Consultant IT Security and expert on pentests of AI/LLM systems, usd AG.

Are you looking for a provider who can reliably test your AI applications or advise you on AI governance? Our experts will guide you through the next step toward more security. Contact us.

Also interesting:

Categories

Categories