
Introduction
As AI adoption grows, so do the risks associated with AI security. Ensuring that AI models are resilient against adversarial threats is a critical component of maintaining trust and reliability. The Arena page within Cranium’s platform provides organizations with a centralized view of their AI models, allowing them to assess vulnerabilities, analyze attack scenarios, and strengthen their security posture.
Through automated and manual penetration testing, the Arena simulates real-world adversarial techniques to uncover potential weaknesses before they can be exploited. Each AI model discovered across your AI Bills of Materials (BOMs) is cataloged and analyzed, providing detailed insights into its risk profile. The Model Threat Analysis page further breaks down a model’s security assessment, helping users understand key threats, evaluate attack categories, and track known weaknesses.
This article explores the different components of the Model Threat Analysis page, including the Findings Overview, the Vulnerable Attack Categories list, and the Known Weaknesses list. Each section is designed to help security teams gain a deeper understanding of their AI models’ vulnerabilities and take proactive measures to mitigate risks.
AI Models List
The Arena page displays AI models as cards in a grid view. Each model card provides essential information at a glance, including the model icon or logo for visual identification, the full model name as a clickable link, and the model's vulnerability likelihood shown as a percentage. The cards also display how many AI Systems contain the model, a severity badge indicating the highest risk level found during testing, and the completion status showing when the most recent penetration test was conducted.
You can search for specific models by typing any part of the model name into the search bar at the top of the page. The sort dropdown menu allows you to organize models by highest severity to prioritize critical vulnerabilities, or by most recently updated to see the latest test results. The Filters button provides additional options to narrow your results by specific AI Systems or by the date when penetration tests were conducted.
Models displayed in the Arena can have different statuses depending on their testing progress. Completed models have finished testing and can be clicked to view the full Model Threat Analysis. Models currently in testing appear in the list but are not clickable until results are available. Models identified in your Bills of Materials but not yet tested still appear in the Arena, though they show no test data until testing begins.
Model Threat Analysis

When you click the card of a tested model, you are taken to the Model Threat Analysis page, which provides a detailed assessment of the model’s security posture. This page is designed to help you understand the vulnerabilities associated with the model, analyze potential attack vectors, and track security risks across your AI systems.
The Model Threat Analysis page is structured into three key modules: Findings Overview, Vulnerable Attack Categories, and Known Weaknesses. The Findings Overview presents a high-level summary of the latest test results, including the date of the last scan, the number of identified weaknesses, the model’s vulnerability likelihood, and a breakdown of the attack categories tested—showing how many attacks were performed and how many were successful. This module also lists the AI systems that contain the model and includes an option to download a Model Threat Analysis Report as a PDF.
The Vulnerable Attack Categories list provides insights into the penetration test results, offering a high-level summary of each tested attack category. It includes links to the attack execution details and supporting evidence. Meanwhile, the Known Weaknesses list highlights conditions that could develop into exploitable vulnerabilities under certain circumstances. This list is primarily sourced from security frameworks such as the OWASP Top Ten Vulnerabilities for LLMs and ML Models.
Findings Overview

The Findings Overview module provides a high-level summary of a model’s security assessment, helping users quickly understand its risk profile. This section includes the date of the last scan, ensuring transparency on when the most recent security evaluation was conducted. Users can also download a PDF report containing the full analysis for further review or documentation.
A key metric displayed in this module is the count of known weaknesses, which represents the number of identified conditions that could lead to vulnerabilities. Additionally, the vulnerability likelihood is shown as a percentage, calculated by dividing the number of successful attacks by the total number of attacks tested. This provides a measurable risk indicator for the model’s susceptibility to various adversarial techniques.
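To illustrate how this percentage is derived (using hypothetical figures, not results from an actual scan), here is a minimal Python sketch of the calculation:

    # Hypothetical figures for illustration only
    successful_attacks = 12
    total_attacks = 80

    # Vulnerability likelihood = successful attacks / total attacks, as a percentage
    vulnerability_likelihood = successful_attacks / total_attacks * 100
    print(f"Vulnerability likelihood: {vulnerability_likelihood:.0f}%")  # prints 15%

A model with these example results would display a vulnerability likelihood of 15%.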
The Findings Overview also breaks down the attack categories that were tested. These include generating harmful responses, generating misinformation, generating hallucinations, susceptibility to jailbreaks, encoding attacks, prompt injection, data leakage, and enabling cyberattacks. By analyzing the outcomes of these attack simulations, users can assess the types of threats to which the model is most vulnerable.
Lastly, this module contains a list of AI Systems that incorporate the tested model. Each AI System name is hyperlinked, allowing users to navigate directly to its AI System Details page for further investigation. This interconnected view ensures that organizations can trace vulnerabilities at both the model and system levels, facilitating a more comprehensive security strategy.
Vulnerable Attack Categories List

The Vulnerable Attack Categories list highlights the specific attack categories where tested exploits were successful, indicating areas of confirmed vulnerability. In this context, vulnerabilities represent weaknesses that Cranium was able to exploit using various scanning and penetration testing methods. By identifying these attack categories, users gain insight into the security gaps that adversaries could potentially leverage. Each entry in the list provides key details, including the name of the attack category, the date the exploit was discovered, and the attack type used to expose the vulnerability.
Vulnerability Details

The Vulnerability Details modal provides an in-depth analysis of a successfully exploited vulnerability, offering greater context on how the AI model was compromised. The Vulnerability Overview section includes key information such as the date of the most recent update, the attack category, the attack type, a description of the attack, and the percentage of successful attacks within that category. This summary helps users quickly understand the nature and severity of the vulnerability.
The Successful Attacks section breaks down individual instances where an attack was able to exploit the model. Each entry includes the test date, attack type, attack framework, attack algorithm, attack method (manual or automated), and attack objective. Users can also click the evidence box to review any supporting documentation, such as conversation logs, that demonstrate how the attack was executed. Unsuccessful attack attempts are not included in this section, ensuring that the focus remains on confirmed vulnerabilities that require mitigation.
Known Weaknesses List

The Known Weaknesses list highlights potential security risks identified through Cranium's Vulnerability Assessment. These weaknesses represent conditions in a model that could be exploited under the right circumstances but were not directly tested in penetration assessments. For example, a model may be flagged with "LLM07:2025 System Prompt Leakage," indicating that, per the seventh entry in OWASP's 2025 Top 10 for LLMs, the model may be susceptible to leaking system prompts. You can click on the hyperlinked text to learn more about the known weakness.
While sources like OWASP and OSV do not provide highly specific exploit details, they offer valuable security insights that help organizations assess potential risks. By identifying these weaknesses, users can take proactive security measures to mitigate threats before they are actively exploited.
Please note: A model's Arena data is not dependent on the success of the BOM's Vulnerability Assessment.
