Overview
Shield exports guardrail configurations as YAML files compatible with NVIDIA's NeMo Guardrails framework. This guide provides step-by-step instructions for deploying Shield-generated YAML configurations in your production AI systems.
Prerequisites
Required Software
- NeMo Guardrails Python package
- Python 3.8 or higher
- vLLM Python package (required for Deployment Method 1 only)
Infrastructure Requirements
- GPU Hardware: CUDA-compatible GPUs (see the visibility check sketched after this list)
- Minimum GPU Resources: 1 GPU per model component (main model, input guardrail, and output guardrail)
- Recommended Setup: 3 separate GPUs for performance isolation
- Memory Requirements: sufficient VRAM for all models (varies by model size)
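Before provisioning, you can confirm that the GPUs you expect are actually visible to your runtime. This is a minimal sketch assuming PyTorch is installed with CUDA support; nvidia-smi gives the same information from the shell.

import torch

# Verify CUDA is available and count visible GPUs. The count of 3 matches
# the recommended setup above (main model, input guardrail, output
# guardrail); adjust for your own deployment.
assert torch.cuda.is_available(), "No CUDA-capable GPU detected"
gpu_count = torch.cuda.device_count()
print(f"Visible GPUs: {gpu_count}")
for i in range(gpu_count):
    print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")
if gpu_count < 3:
    print("Warning: fewer than 3 GPUs; models may need to share devices.")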
Understanding Shield YAML Structure
Shield exports YAML files with the following key components:
Models Section
The models section defines the main AI model and guardrail models with their serving configurations:
models:
  - type: main
    engine: vllm_openai
    model: meta-llama/Llama-3.2-1B-Instruct
  - type: content_safety_input_0
    engine: vllm_openai
    model: google/shieldgemma-2b
  - type: content_safety_output_0
    engine: vllm_openai
    model: meta-llama/Llama-Guard-3-1B

Rails Section
The rails section specifies the input and output safety check flows:
rails:
  input:
    flows:
      - content safety check input $model=content_safety_input_0
  output:
    flows:
      - content safety check output $model=content_safety_output_0

Prompts Section
The prompts section contains the detailed prompt templates for content safety validation, including specific safety policies and evaluation criteria.
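Shield generates this section automatically, so you typically should not need to edit it. If you want to inspect which prompt templates a given export defines, one option is to load the file with NeMo Guardrails and list the prompt tasks. A minimal sketch, assuming the nemoguardrails package is installed; shield_config.yaml is a hypothetical file name, and the prompts attribute reflects the RailsConfig API as we understand it:

from nemoguardrails import RailsConfig

# Parse the exported YAML; no model servers are needed for this step.
config = RailsConfig.from_path("shield_config.yaml")  # hypothetical file name

# Each prompt entry pairs a task (e.g., a content safety check bound to a
# specific model) with its template text, safety policy, and criteria.
for prompt in config.prompts or []:
    print(prompt.task)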
Deployment Method 1: Local vLLM Servers
This method runs all models locally using vLLM servers, providing full control over your infrastructure.
Step 1: Set Up vLLM Servers
Main Model Server:
CUDA_VISIBLE_DEVICES=0 vllm serve meta-llama/Llama-3.2-1B-Instruct \
  --port 8000 --host 0.0.0.0 --api-key token-main

Input Guardrail Server:
CUDA_VISIBLE_DEVICES=1 vllm serve google/shieldgemma-2b \
  --port 8001 --host 0.0.0.0 --api-key token-input_guardrail_0

Output Guardrail Server:
CUDA_VISIBLE_DEVICES=2 vllm serve meta-llama/Llama-Guard-3-1B \
  --port 8002 --host 0.0.0.0 --api-key token-output_guardrail_0
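Before editing the YAML, it is worth confirming that all three servers came up. The sketch below assumes the ports and API keys from the commands above and that the requests package is installed; it queries each server's OpenAI-compatible /v1/models endpoint.

import requests

# (port, API key) pairs matching the three vllm serve commands above.
servers = [
    (8000, "token-main"),
    (8001, "token-input_guardrail_0"),
    (8002, "token-output_guardrail_0"),
]

for port, key in servers:
    url = f"http://localhost:{port}/v1/models"
    resp = requests.get(url, headers={"Authorization": f"Bearer {key}"}, timeout=10)
    resp.raise_for_status()
    served = [m["id"] for m in resp.json()["data"]]
    print(f"Port {port} is serving: {served}")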
Step 2: Configure YAML File
Update your Shield-exported YAML file to match your local server configuration:
models:
  - type: main
    engine: vllm_openai
    model: meta-llama/Llama-3.2-1B-Instruct
    parameters:
      openai_api_base: http://localhost:8000/v1
      openai_api_key: token-main
  - type: content_safety_input_0
    engine: vllm_openai
    parameters:
      openai_api_base: http://localhost:8001/v1
      model_name: google/shieldgemma-2b
      openai_api_key: token-input_guardrail_0
  - type: content_safety_output_0
    engine: vllm_openai
    parameters:
      openai_api_base: http://localhost:8002/v1
      model_name: meta-llama/Llama-Guard-3-1B
      openai_api_key: token-output_guardrail_0
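A quick way to catch editing mistakes is to parse the file and confirm that all three model entries survived your changes. A minimal sketch, assuming PyYAML is installed and that shield_config.yaml (hypothetical name) is your edited file:

import yaml

# Load the edited configuration and verify the expected model entries.
with open("shield_config.yaml") as f:
    config = yaml.safe_load(f)

types = {m["type"] for m in config["models"]}
expected = {"main", "content_safety_input_0", "content_safety_output_0"}
missing = expected - types
if missing:
    raise SystemExit(f"Missing model entries: {missing}")
print("All expected model entries present:", sorted(types))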
Step 3: Test Deployment
Use the provided Python script to test your configuration:
import argparse

import dotenv
from nemoguardrails import LLMRails, RailsConfig

# Load environment variables (e.g., API tokens) from a .env file if present.
dotenv.load_dotenv()


def main(yaml_path: str, prompt: str):
    config = RailsConfig.from_path(yaml_path)
    rails = LLMRails(config)
    out = rails.generate(messages=[{"role": "user", "content": prompt}])
    print(out["content"])


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Run NeMo Guardrails example deployment."
    )
    parser.add_argument(
        "-f",
        "--file",
        type=str,
        required=True,
        help="Path to the YAML configuration file.",
    )
    parser.add_argument(
        "-p", "--prompt", type=str, required=True, help="Prompt to generate."
    )
    args = parser.parse_args()
    main(args.file, args.prompt)

# Run with: python example_deployment.py -f your_config.yaml -p "Your test prompt"

Deployment Method 2: HuggingFace Inference Endpoints
This method uses managed HuggingFace infrastructure, reducing operational overhead but requiring external dependencies.
Step 1: Create HuggingFace Endpoints
- Create Endpoints: follow the Hugging Face Endpoint Creation Guide (endpoints can also be created programmatically; see the sketch after this list)
- Required Endpoints:
  - Main model endpoint (e.g., meta-llama/Llama-3.2-1B-Instruct)
  - Input guardrail endpoint (e.g., google/shieldgemma-2b)
  - Output guardrail endpoint (e.g., meta-llama/Llama-Guard-3-1B)
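If you prefer scripting to the web console, the huggingface_hub client also exposes create_inference_endpoint. Treat the sketch below as illustrative: the name, vendor, region, and instance values are placeholder assumptions, and current size/type identifiers should be checked against the Hugging Face documentation for your account.

from huggingface_hub import create_inference_endpoint

# Placeholder values throughout: pick hardware and region suited to
# your account, your region, and your model sizes.
endpoint = create_inference_endpoint(
    name="shield-main-model",          # hypothetical endpoint name
    repository="meta-llama/Llama-3.2-1B-Instruct",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",                      # assumption: AWS-backed endpoint
    region="us-east-1",                # assumption
    instance_size="x1",                # assumption: verify current size names
    instance_type="nvidia-a10g",       # assumption: verify current type names
    type="protected",
)
endpoint.wait()  # block until the endpoint is ready
print(endpoint.url)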
Step 2: Configure Environment
Set your HuggingFace API token:
export HUGGINGFACEHUB_API_TOKEN=hf_XXXXXXXXXXXXXXXXXXXXXXXX
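To verify the token before wiring up endpoints, you can ask the Hub who it authenticates as. A minimal sketch, assuming the huggingface_hub package is installed:

import os

from huggingface_hub import HfApi

# whoami() raises if the token is missing or invalid.
api = HfApi(token=os.environ["HUGGINGFACEHUB_API_TOKEN"])
info = api.whoami()
print("Authenticated as:", info["name"])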
Step 3: Update YAML Configuration
Configure your Shield YAML with HuggingFace endpoint URLs:
models:
  - type: main
    engine: huggingface_endpoint
    model: meta-llama/Llama-3.2-1B-Instruct
    parameters:
      endpoint_url: ${HF_MAIN_ENDPOINT_URL}
  - type: content_safety_input_0
    engine: huggingface_endpoint
    parameters:
      endpoint_url: ${HF_INPUT_ENDPOINT_URL}
      model_name: google/shieldgemma-2b
  - type: content_safety_output_0
    engine: huggingface_endpoint
    parameters:
      endpoint_url: ${HF_OUTPUT_ENDPOINT_URL}
      model_name: meta-llama/Llama-Guard-3-1B

Set environment variables for your endpoint URLs:
export HF_MAIN_ENDPOINT_URL="https://your-main-endpoint.endpoints.huggingface.cloud"
export HF_INPUT_ENDPOINT_URL="https://your-input-endpoint.endpoints.huggingface.cloud"
export HF_OUTPUT_ENDPOINT_URL="https://your-output-endpoint.endpoints.huggingface.cloud"
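Note that, depending on your NeMo Guardrails version, ${VAR} placeholders in the YAML may not be expanded automatically. If they are not, one workaround is to expand them yourself before constructing the rails. A minimal sketch, assuming shield_config.yaml (hypothetical name) is your exported file and that RailsConfig.from_content accepts raw YAML content:

import os

from nemoguardrails import LLMRails, RailsConfig

# Read the raw YAML and substitute ${VAR} references from the environment.
with open("shield_config.yaml") as f:
    raw = f.read()
expanded = os.path.expandvars(raw)

# Build the rails from the expanded content instead of the file path.
config = RailsConfig.from_content(yaml_content=expanded)
rails = LLMRails(config)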