Shield YAML Implementation Guide

Overview

Shield exports guardrail configurations as YAML files compatible with NVIDIA's NeMo Guardrails framework. This guide provides step-by-step instructions for implementing Shield-generated YAML configurations in your production AI systems.

Prerequisites

Required Software

  • A recent Python 3 environment
  • NeMo Guardrails (the nemoguardrails package)
  • vLLM (the vllm package, needed for Deployment Method 1)
  • python-dotenv (used by the example test script)

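If you use pip, the packages referenced in this guide can be installed along these lines (version pins and extras are up to your environment):

pip install nemoguardrails vllm python-dotenv
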
Infrastructure Requirements

  • Minimum GPU Resources: 1 GPU per model component (main model + input guardrail + output guardrail)
  • Recommended Setup: 3 separate GPUs for optimal performance isolation
  • Memory Requirements: Sufficient VRAM for all models (varies by model size)
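
Before starting the servers, it is worth confirming that the expected GPUs are visible on the host; the deployment commands later in this guide pin each model to a specific device with CUDA_VISIBLE_DEVICES:

nvidia-smi --list-gpus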

Understanding Shield YAML Structure

Shield exports YAML files with the following key components:

Models Section 

The models section defines the main AI model and guardrail models with their serving configurations:

models:
  - type: main
    engine: vllm_openai
    model: meta-llama/Llama-3.2-1B-Instruct
  - type: content_safety_input_0
    engine: vllm_openai
    model: google/shieldgemma-2b
  - type: content_safety_output_0
    engine: vllm_openai
    model: meta-llama/Llama-Guard-3-1B

Rails Section

The rails section specifies the input and output safety check flows:

rails:
  input:
    flows:
      - content safety check input $model=content_safety_input_0
  output:
    flows:
      - content safety check output $model=content_safety_output_0

Prompts Section

The prompts section contains the detailed prompt templates for content safety validation, including specific safety policies and evaluation criteria.
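
Shield generates these templates for you, and you normally should not need to edit them. For orientation only, a prompts entry follows the NeMo Guardrails content-safety pattern roughly like the sketch below; the policy text is abridged and the parser name is illustrative, so defer to what your export actually contains:

prompts:
  - task: content_safety_check_input $model=content_safety_input_0
    content: |
      Task: Check whether the user message below violates the safety policy.
      <policy and evaluation criteria generated by Shield>
    output_parser: is_content_safe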

Deployment Method 1: Local vLLM Servers

This method runs all models locally using vLLM servers, providing full control over your infrastructure.

Step 1: Set Up vLLM Servers

Main Model Server:

CUDA_VISIBLE_DEVICES=0 vllm serve meta-llama/Llama-3.2-1B-Instruct \
--port 8000 --host 0.0.0.0 --api-key token-main

Input Guardrail Server:

CUDA_VISIBLE_DEVICES=1 vllm serve google/shieldgemma-2b \
--port 8001 --host 0.0.0.0 --api-key token-input_guardrail_0

Output Guardrail Server:

CUDA_VISIBLE_DEVICES=2 vllm serve meta-llama/Llama-Guard-3-1B \
--port 8002 --host 0.0.0.0 --api-key token-output_guardrail_0
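
Once all three servers are up, a quick sanity check is to query each server's OpenAI-compatible model listing (shown here for the main model; repeat for ports 8001 and 8002 with their respective API keys):

curl http://localhost:8000/v1/models -H "Authorization: Bearer token-main"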

Step 2: Configure YAML File

Update your Shield-exported YAML file to match your local server configuration:

models:
  - type: main
    engine: vllm_openai
    model: meta-llama/Llama-3.2-1B-Instruct
    parameters:
      openai_api_base: http://localhost:8000/v1
      openai_api_key: token-main
  - type: content_safety_input_0
    engine: vllm_openai
    parameters:
      openai_api_base: http://localhost:8001/v1
      model_name: google/shieldgemma-2b
      openai_api_key: token-input_guardrail_0
  - type: content_safety_output_0
    engine: vllm_openai
    parameters:
      openai_api_base: http://localhost:8002/v1
      model_name: meta-llama/Llama-Guard-3-1B
      openai_api_key: token-output_guardrail_0

Step 3: Test Deployment

Use the provided Python script to test your configuration:

import argparse

import dotenv
from nemoguardrails import RailsConfig, LLMRails

# Load environment variables (API keys, endpoint URLs) from a local .env file, if present.
dotenv.load_dotenv()


def main(yaml_path: str, prompt: str):
    # Load the Shield-exported guardrails configuration and build the rails runtime.
    config = RailsConfig.from_path(yaml_path)
    rails = LLMRails(config)
    # Send a single user message through the input rail, main model, and output rail.
    out = rails.generate(messages=[{"role": "user", "content": prompt}])
    print(out["content"])


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Run NeMo Guardrails example deployment."
    )
    parser.add_argument(
        "-f",
        "--file",
        type=str,
        required=True,
        help="Path to the YAML configuration file.",
    )
    parser.add_argument(
        "-p", "--prompt", type=str, required=True, help="Prompt to generate."
    )
    args = parser.parse_args()
    main(args.file, args.prompt)


# Run with: python example_deployment.py -f your_config.yaml -p "Your test prompt"

Deployment Method 2: HuggingFace Inference Endpoints

This method uses managed HuggingFace infrastructure, reducing operational overhead at the cost of relying on an externally hosted service.

Step 1: Create HuggingFace Endpoints

  1. Create Endpoints: Follow the Hugging Face Inference Endpoints creation guide to deploy a dedicated endpoint for each model
  2. Required Endpoints:
    • Main model endpoint (e.g., meta-llama/Llama-3.2-1B-Instruct)
    • Input guardrail endpoint (e.g., google/shieldgemma-2b)
    • Output guardrail endpoint (e.g., meta-llama/Llama-Guard-3-1B)

Step 2: Configure Environment

Set your HuggingFace API token:

export HUGGINGFACEHUB_API_TOKEN=hf_XXXXXXXXXXXXXXXXXXXXXXXX
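
The example test script from Deployment Method 1 calls dotenv.load_dotenv(), so you can also keep this token (and the endpoint URLs defined below) in a local .env file rather than exporting them in your shell; a minimal sketch:

HUGGINGFACEHUB_API_TOKEN=hf_XXXXXXXXXXXXXXXXXXXXXXXX
HF_MAIN_ENDPOINT_URL=https://your-main-endpoint.endpoints.huggingface.cloud
HF_INPUT_ENDPOINT_URL=https://your-input-endpoint.endpoints.huggingface.cloud
HF_OUTPUT_ENDPOINT_URL=https://your-output-endpoint.endpoints.huggingface.cloud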

Step 3: Update YAML Configuration

Configure your Shield YAML with HuggingFace endpoint URLs:

models:
  - type: main
    engine: huggingface_endpoint
    model: meta-llama/Llama-3.2-1B-Instruct
    parameters:
      endpoint_url: ${HF_MAIN_ENDPOINT_URL}
  - type: content_safety_input_0
    engine: huggingface_endpoint
    parameters:
      endpoint_url: ${HF_INPUT_ENDPOINT_URL}
      model_name: google/shieldgemma-2b
  - type: content_safety_output_0
    engine: huggingface_endpoint
    parameters:
      endpoint_url: ${HF_OUTPUT_ENDPOINT_URL}
      model_name: meta-llama/Llama-Guard-3-1B

Set environment variables for your endpoint URLs:

export HF_MAIN_ENDPOINT_URL="https://your-main-endpoint.endpoints.huggingface.cloud"
export HF_INPUT_ENDPOINT_URL="https://your-input-endpoint.endpoints.huggingface.cloud"
export HF_OUTPUT_ENDPOINT_URL="https://your-output-endpoint.endpoints.huggingface.cloud"
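
You can then validate the HuggingFace-backed configuration with the same test script used for the local deployment (the file name here is just a placeholder):

python example_deployment.py -f your_hf_config.yaml -p "Your test prompt"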