Overview
Shield exports guardrail configurations as YAML files compatible with NVIDIA's NeMo Guardrails framework. This guide provides step-by-step instructions for deploying Shield-generated YAML configurations in your production AI systems.
Prerequisites
Required Software
- NeMo Guardrails Python package
- Python 3.8 or higher
- vLLM Python package (required for Deployment Method 1 only)
Infrastructure Requirements
- GPU Hardware: CUDA-compatible GPUs (see the visibility check sketched after this list)
- Minimum GPU Resources: 1 GPU per model component (main model, input guardrail, and output guardrail)
- Recommended Setup: 3 separate GPUs for performance isolation
- Memory Requirements: sufficient VRAM for all models (varies by model size)
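Before provisioning, you can confirm that the GPUs you expect are actually visible to your runtime. This is a minimal sketch assuming PyTorch is installed with CUDA support; nvidia-smi gives the same information from the shell.

import torch

# Verify CUDA is available and count visible GPUs. The count of 3 matches
# the recommended setup above (main model, input guardrail, output
# guardrail); adjust for your own deployment.
assert torch.cuda.is_available(), "No CUDA-capable GPU detected"
gpu_count = torch.cuda.device_count()
print(f"Visible GPUs: {gpu_count}")
for i in range(gpu_count):
    print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")
if gpu_count < 3:
    print("Warning: fewer than 3 GPUs; models may need to share devices.")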
Understanding Shield YAML Structure
Shield exports YAML files with the following key components:
Models Section
The models section defines the main AI model and guardrail models with their serving configurations:
models:
  - type: main
    engine: vllm_openai
    model: meta-llama/Llama-3.2-1B-Instruct
  - type: content_safety_input_0
    engine: vllm_openai
    model: google/shieldgemma-2b
  - type: content_safety_output_0
    engine: vllm_openai
    model: meta-llama/Llama-Guard-3-1B

Rails Section
The rails section specifies the input and output safety check flows:
rails:
  input:
    flows:
      - content safety check input $model=content_safety_input_0
  output:
    flows:
      - content safety check output $model=content_safety_output_0

Prompts Section
The prompts section contains the detailed prompt templates for content safety validation, including specific safety policies and evaluation criteria.
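Shield generates this section automatically, so you typically should not need to edit it. If you want to inspect which prompt templates a given export defines, one option is to load the file with NeMo Guardrails and list the prompt tasks. A minimal sketch, assuming the nemoguardrails package is installed; shield_config.yaml is a hypothetical file name, and the prompts attribute reflects the RailsConfig API as we understand it:

from nemoguardrails import RailsConfig

# Parse the exported YAML; no model servers are needed for this step.
config = RailsConfig.from_path("shield_config.yaml")  # hypothetical file name

# Each prompt entry pairs a task (e.g., a content safety check bound to a
# specific model) with its template text, safety policy, and criteria.
for prompt in config.prompts or []:
    print(prompt.task)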
Deployment Method 1: Local vLLM Servers
This method runs all models locally using vLLM servers, providing full control over your infrastructure.
Step 1: Set Up vLLM Servers
Main Model Server:
CUDA_VISIBLE_DEVICES=0 vllm serve meta-llama/Llama-3.2-1B-Instruct \
  --port 8000 --host 0.0.0.0 --api-key token-main

Input Guardrail Server:
CUDA_VISIBLE_DEVICES=1 vllm serve google/shieldgemma-2b \
  --port 8001 --host 0.0.0.0 --api-key token-input_guardrail_0

Output Guardrail Server:
CUDA_VISIBLE_DEVICES=2 vllm serve meta-llama/Llama-Guard-3-1B \
  --port 8002 --host 0.0.0.0 --api-key token-output_guardrail_0
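Before editing the YAML, it is worth confirming that all three servers came up. The sketch below assumes the ports and API keys from the commands above and that the requests package is installed; it queries each server's OpenAI-compatible /v1/models endpoint.

import requests

# (port, API key) pairs matching the three vllm serve commands above.
servers = [
    (8000, "token-main"),
    (8001, "token-input_guardrail_0"),
    (8002, "token-output_guardrail_0"),
]

for port, key in servers:
    url = f"http://localhost:{port}/v1/models"
    resp = requests.get(url, headers={"Authorization": f"Bearer {key}"}, timeout=10)
    resp.raise_for_status()
    served = [m["id"] for m in resp.json()["data"]]
    print(f"Port {port} is serving: {served}")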
Step 2: Configure YAML File
Update your Shield-exported YAML file to match your local server configuration:
models:
  - type: main
    engine: vllm_openai
    model: meta-llama/Llama-3.2-1B-Instruct
    parameters:
      openai_api_base: http://localhost:8000/v1
      openai_api_key: token-main
  - type: content_safety_input_0
    engine: vllm_openai
    parameters:
      openai_api_base: http://localhost:8001/v1
      model_name: google/shieldgemma-2b
      openai_api_key: token-input_guardrail_0
  - type: content_safety_output_0
    engine: vllm_openai
    parameters:
      openai_api_base: http://localhost:8002/v1
      model_name: meta-llama/Llama-Guard-3-1B
      openai_api_key: token-output_guardrail_0
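A quick way to catch editing mistakes is to parse the file and confirm that all three model entries survived your changes. A minimal sketch, assuming PyYAML is installed and that shield_config.yaml (hypothetical name) is your edited file:

import yaml

# Load the edited configuration and verify the expected model entries.
with open("shield_config.yaml") as f:
    config = yaml.safe_load(f)

types = {m["type"] for m in config["models"]}
expected = {"main", "content_safety_input_0", "content_safety_output_0"}
missing = expected - types
if missing:
    raise SystemExit(f"Missing model entries: {missing}")
print("All expected model entries present:", sorted(types))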
Step 3: Test Deployment
Use the provided Python script to test your configuration:
import argparse

import dotenv
from nemoguardrails import LLMRails, RailsConfig

# Load environment variables (e.g., API tokens) from a .env file if present.
dotenv.load_dotenv()


def main(yaml_path: str, prompt: str):
    config = RailsConfig.from_path(yaml_path)
    rails = LLMRails(config)
    out = rails.generate(messages=[{"role": "user", "content": prompt}])
    print(out["content"])


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Run NeMo Guardrails example deployment."
    )
    parser.add_argument(
        "-f",
        "--file",
        type=str,
        required=True,
        help="Path to the YAML configuration file.",
    )
    parser.add_argument(
        "-p", "--prompt", type=str, required=True, help="Prompt to generate."
    )
    args = parser.parse_args()
    main(args.file, args.prompt)

# Run with: python example_deployment.py -f your_config.yaml -p "Your test prompt"

Deployment Method 2: HuggingFace Inference Endpoints
This method uses managed HuggingFace infrastructure, reducing operational overhead but requiring external dependencies.
Step 1: Create HuggingFace Endpoints
- Create Endpoints: follow the Hugging Face Endpoint Creation Guide (endpoints can also be created programmatically; see the sketch after this list)
- Required Endpoints:
  - Main model endpoint (e.g., meta-llama/Llama-3.2-1B-Instruct)
  - Input guardrail endpoint (e.g., google/shieldgemma-2b)
  - Output guardrail endpoint (e.g., meta-llama/Llama-Guard-3-1B)
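If you prefer scripting to the web console, the huggingface_hub client also exposes create_inference_endpoint. Treat the sketch below as illustrative: the name, vendor, region, and instance values are placeholder assumptions, and current size/type identifiers should be checked against the Hugging Face documentation for your account.

from huggingface_hub import create_inference_endpoint

# Placeholder values throughout: pick hardware and region suited to
# your account, your region, and your model sizes.
endpoint = create_inference_endpoint(
    name="shield-main-model",          # hypothetical endpoint name
    repository="meta-llama/Llama-3.2-1B-Instruct",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",                      # assumption: AWS-backed endpoint
    region="us-east-1",                # assumption
    instance_size="x1",                # assumption: verify current size names
    instance_type="nvidia-a10g",       # assumption: verify current type names
    type="protected",
)
endpoint.wait()  # block until the endpoint is ready
print(endpoint.url)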
Step 2: Configure Environment
Set your HuggingFace API token:
export HUGGINGFACEHUB_API_TOKEN=hf_XXXXXXXXXXXXXXXXXXXXXXXX
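To verify the token before wiring up endpoints, you can ask the Hub who it authenticates as. A minimal sketch, assuming the huggingface_hub package is installed:

import os

from huggingface_hub import HfApi

# whoami() raises if the token is missing or invalid.
api = HfApi(token=os.environ["HUGGINGFACEHUB_API_TOKEN"])
info = api.whoami()
print("Authenticated as:", info["name"])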
Step 3: Update YAML Configuration
Configure your Shield YAML with HuggingFace endpoint URLs:
models:
  - type: main
    engine: huggingface_endpoint
    model: meta-llama/Llama-3.2-1B-Instruct
    parameters:
      endpoint_url: ${HF_MAIN_ENDPOINT_URL}
  - type: content_safety_input_0
    engine: huggingface_endpoint
    parameters:
      endpoint_url: ${HF_INPUT_ENDPOINT_URL}
      model_name: google/shieldgemma-2b
  - type: content_safety_output_0
    engine: huggingface_endpoint
    parameters:
      endpoint_url: ${HF_OUTPUT_ENDPOINT_URL}
      model_name: meta-llama/Llama-Guard-3-1B

Set environment variables for your endpoint URLs:
export HF_MAIN_ENDPOINT_URL="https://your-main-endpoint.endpoints.huggingface.cloud"
export HF_INPUT_ENDPOINT_URL="https://your-input-endpoint.endpoints.huggingface.cloud"
export HF_OUTPUT_ENDPOINT_URL="https://your-output-endpoint.endpoints.huggingface.cloud"
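Note that, depending on your NeMo Guardrails version, ${VAR} placeholders in the YAML may not be expanded automatically. If they are not, one workaround is to expand them yourself before constructing the rails. A minimal sketch, assuming shield_config.yaml (hypothetical name) is your exported file and that RailsConfig.from_content accepts raw YAML content:

import os

from nemoguardrails import LLMRails, RailsConfig

# Read the raw YAML and substitute ${VAR} references from the environment.
with open("shield_config.yaml") as f:
    raw = f.read()
expanded = os.path.expandvars(raw)

# Build the rails from the expanded content instead of the file path.
config = RailsConfig.from_content(yaml_content=expanded)
rails = LLMRails(config)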