Arena Shield Overview

Overview

Arena Shield enables targeted penetration testing against AI models with system prompt context, followed by guardrail remediation and iterative optimization. This guide covers technical implementation, YAML export integration, and critical cost/performance considerations.

Getting Started

Initial Test Run Setup

Navigate to Arena Shield in the left sidebar menu.
Click the Create a New Test Run button and provide a descriptive name with version numbering.
Select a model from the 10 supported options in the Base Model dropdown (start with smaller models for faster results).
Choose System Prompt:
1. Pre-defined options: Code Assistant, External Chatbot, Internal Chatbot
2. Custom prompt: Type your production system prompt for authentic testing
3. BOM-detected prompt: If a system prompt has been detected for the selected base model in a BOM, it will appear in the dropdown automatically
Click the Begin Test Run button and monitor live progress with real-time updates.

Initial Results Review

After your initial test run completes, you'll see the baseline security assessment results screen with two main sections:

Successful Tests

Attack categories where your model defended successfully (< 10% of attacks in that category succeeded). These represent areas where your current model configuration already provides good security.

Failed Tests

Attack categories where your model needs improvement (> 10% of attacks succeeded). These are the areas that would benefit from guardrail implementation.

At this stage, you won't see vulnerability likelihood percentages or delta comparisons because those appear after remediation. The initial results focus on identifying which attack categories need attention.

Guardrail Remediation Process

Begin Remediation

Review your Failed Tests Categories and focus on areas needing improvement.
Click the Begin Remediation button from the initial results screen.
Choose an Input Guardrail (optional) by selecting one guardrail to filter incoming requests.
Choose an Output Guardrail (optional) by selecting at least one guardrail to filter AI responses.
1. Important Limitation: Maximum of 1 input + 1 output guardrail to prevent conflicts
Click the Apply Guardrails and Retest button to initiate a new test run with your selected protections.

Monitoring Remediation Progress

The remediation test run includes live progress updates showing:

Overall completion percentage
Individual attack category progress
Real-time status messages from testing services

Guardrail Types

Input Guardrails filter malicious prompts, injection attempts, and inappropriate requests before they reach your model.
Output Guardrails review and filter AI-generated responses to prevent harmful, biased, or inappropriate content delivery.
Each guardrail includes description of functionality and recommended use cases. Expand guardrail details before selection to understand impact.

Remediation Results Analysis

After the remediation test completes, you'll access the comprehensive results screen with:

Vulnerability Likelihood Comparison

Post-remediation percentage: Improved vulnerability likelihood with guardrails applied
Delta calculation:Quantified security improvement (should show decrease)
- Due to the non-deterministic nature of LLM outputs, you may experience variability in results between test runs

Updated Test Categories

Successful Tests: Categories that improved to < 10% attack success rate
Failed Tests: Categories still above 10% attack success rate (with updated percentages)
Test Count Changes: Number of categories that moved from failed to successful

Detailed Analysis Options

View System Prompt: Review the prompt configuration used in testing

Verbose Logs: Access detailed attack information and responses for security analysis

YAML Export & Integration

Downloading Configurations

Complete the remediation process by applying guardrails and verifying improved results.
Download the guardrails YAML file, which exports a NeMo Guardrails-compatible configuration file.
Verify the file contents to confirm the YAML structure contains model definitions and guardrail flows.

Integration Overview

YAML Structure

Shield exports YAML files compatible with NVIDIA's NeMo Guardrails framework, containing:

Model configurations (main model, input/output guardrail models)
Rail flow definitions (input and output safety checks)
Prompt templates for content safety validation

Deployment Options

Local Deployment: Run models locally using vLLM servers with GPU resources
Cloud Deployment: Deploy using Hugging Face Inference Endpoints for managed infrastructure
Hybrid Approaches: Combine local and cloud resources based on security and performance requirements

Integration Requirements

Install NeMo Guardrails Python package
Configure appropriate model serving infrastructure (vLLM or HuggingFace endpoints)
Set up environment variables and API keys
Allocate sufficient GPU resources (minimum: 1 GPU per model + guardrails)

Next Steps

For Detailed Implementation:

Review the complete YAML Implementation Guide in the Knowledge Base
Examine provided code examples and configuration templates
Test configurations in staging environments before production deployment

Technical Support: Contact [email protected] for specific deployment architecture guidance.

Iterative Optimization Workflow

Test Run Cloning

Clone completed tests to create a new test run from the post-initial-testing state without requiring full baseline retesting.
Modify your guardrail selection to try different combinations and configurations efficiently.
Compare results across iterations to identify the optimal balance between security improvement and performance impact.
Export the best performing configuration by downloading the YAML file for production implementation.

Systematic Approach

Take a methodical approach to guardrail optimization by testing individual guardrails first to understand their isolated impact on both security and performance. Once you understand how each guardrail behaves independently, combine them carefully to avoid conflicts that could reduce effectiveness or create unexpected blocking behavior. Document the performance implications of each configuration you test, including response time changes, resource utilization impacts, and any user experience effects. Maintain detailed test result history for future reference, as this documentation will prove valuable when scaling your implementation or troubleshooting issues that arise in production.

Performance Benchmarking

Establish comprehensive performance baselines before implementing any guardrails by measuring your model's response times, throughput capacity, and resource utilization under normal operating conditions. As you test different guardrail combinations, track response time increases systematically to understand the performance cost of each security improvement. Monitor resource utilization changes across CPU, GPU, and memory usage to ensure your infrastructure can handle the additional computational overhead. Establish acceptable performance thresholds based on your business requirements and user expectations, creating clear criteria for determining when security improvements justify the performance trade-offs.

Did this answer your question?