Profile Analysis Overview

Profiles act as customizable rulesets that control how AI interactions are monitored and governed. They determine which signals to detect, how to respond to violations, and what actions to take on inputs and outputs.

Profile Analysis is where you evaluate how a single profile is performing across all the traffic it governs. Because one profile can be applied across many use cases and interaction types, this page aggregates everything that profile has handled into one view, independent of which use case the traffic came from. It answers three kinds of questions:

Is the profile keeping up with traffic without adding excessive latency?
How is the profile acting on the traffic it sees?
What kinds of risk and data are flowing through the profile?

This is the policy-level counterpart to Use Case Analysis, which looks at activity for a single deployment rather than a single profile.

The page is scoped by three filters at the top. Each filter applies to the entire page.

Log Group: The traffic source being analyzed. This determines which body of traffic the profile is being judged against.
- Monitoring: Live traffic, including all activity sent through the API.
- Playground: Prompts you test in the Playground, run through a profile before any real traffic flows.
Timeframe: 24 hours, 7 days, 30 days, or all time.
Profile Selector: The profile being analyzed.

The page is divided into two areas. Operational Data covers how the profile is performing and acting. Signal Data covers what is present in the traffic.

Operational Data

Operational data describes the profile's throughput, latency, and the actions it is taking on traffic.

Summary Metrics

Four metrics summarize the profile's activity over the selected timeframe:

Total Time: The average total time across all events in the timeframe. This includes input analysis time, output analysis time, the model's processing time, and incoming and outgoing processing time.
Requests: The total number of input-output pairs the profile processed.
Pass Rate: The number of events not classified as blocked or modified, divided by the total event count.
Unique Users: The number of unique users associated with the profile's traffic.
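As a rough illustration of how these four figures relate to the underlying events, here is a minimal Python sketch. The Event record and its field names (user_id, total_time_s, action) are hypothetical and are not part of the product's API. The sketch only mirrors the definitions above.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical record for one input-output pair handled by a profile."""
    user_id: str
    total_time_s: float  # end-to-end time for the event, in seconds
    action: str          # assumed to be "block", "modify", or "pass"


def summarize(events):
    """Compute the four summary metrics from a list of events."""
    requests = len(events)
    if requests == 0:
        # Assumption: an empty timeframe reports zeros rather than an error.
        return {"total_time": 0.0, "requests": 0, "pass_rate": 0.0, "unique_users": 0}
    # Pass Rate: events not classified as blocked or modified, over all events.
    passed = sum(1 for e in events if e.action not in ("block", "modify"))
    return {
        "total_time": sum(e.total_time_s for e in events) / requests,  # average
        "requests": requests,
        "pass_rate": passed / requests,
        "unique_users": len({e.user_id for e in events}),
    }
```

For example, four events from three users, two of which pass, give a Pass Rate of 0.5 and a Unique Users count of 3.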

Safety and Security Signals

Two count cards show the total number of detected signals in each of two high-level groupings:

Safety Signals: Non-adversarial signals spanning content, data, and code. Includes Illegality, Toxicity, Blocklist, PII/PHI/PCI, Secrets, Code Present, and Code Requested.
Security Signals: Adversarial signals that target the model itself. Includes Jailbreaking, Instruction Override, Prompt Leaking, Role Impersonation, Direct Command Injection, Self-Referential Injection, and Goal Hijacking.

Prompt Actions and Response Actions

The Prompt Actions and Response Actions modules show how the profile handled prompt input traffic and response output traffic. Each displays the distribution of three outcomes:

Block: The request or response was blocked.
Modify: The request or response was modified before being passed through.
Pass: The request or response was passed through without modification.

Together these show how aggressively the profile is intervening on each side of an interaction. Each module also displays Total Tokens and Average Tokens for the traffic it covers. Total Tokens is the combined input and output token count for the timeframe, and Average Tokens is the average number of tokens per event. A token is roughly half a word to a full word, depending on the word's length. These figures give a sense of the size of inputs and outputs down to the event level.
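The outcome distribution and the token figures can be thought of along these lines. This is an illustrative Python sketch only: the function names and the lowercase outcome labels are assumptions made for the example, not product identifiers.

```python
from collections import Counter

OUTCOMES = ("block", "modify", "pass")  # assumed labels for the three outcomes


def action_distribution(actions):
    """Return the share of events that ended in each outcome."""
    counts = Counter(actions)
    total = len(actions)
    return {name: (counts[name] / total if total else 0.0) for name in OUTCOMES}


def token_stats(token_counts):
    """Return (Total Tokens, Average Tokens) for a list of per-event token counts."""
    total = sum(token_counts)
    average = total / len(token_counts) if token_counts else 0.0
    return total, average
```

With four prompts of which two pass, one is blocked, and one is modified, the distribution is 50% Pass, 25% Block, and 25% Modify. Two events of 100 and 300 tokens give Total Tokens of 400 and Average Tokens of 200.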

Analysis Duration

Where Total Time reports a single headline figure, the Analysis Duration table shows where that time is spent. It breaks processing into three stages:

Prompt Analysis: Time spent evaluating the prompt.
LLM: Time spent on the model call.
Response Analysis: Time spent evaluating the response.

The table has four columns:

Duration: The processing stage (Prompt Analysis, LLM, or Response Analysis).
Time: How long the stage took, in seconds.
Delta: The percent change from the previous equivalent period. When the page is filtered to all time, this is always zero. For any other timeframe, it compares against the prior equivalent window, such as the last 24 hours against the 24 hours before it.
%: The share of total time spent in that stage.
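To make the Delta and % columns concrete, here is a small Python sketch of the arithmetic they describe. The function names are hypothetical. Passing None for the previous period stands in for the all time filter, and returning zero when the previous period measured zero seconds is an assumption made for this example rather than documented behavior.

```python
def stage_shares(stage_seconds):
    """Return each stage's share of total time, as a percentage."""
    total = sum(stage_seconds.values())
    return {
        stage: (100.0 * seconds / total if total else 0.0)
        for stage, seconds in stage_seconds.items()
    }


def delta_percent(current, previous):
    """Percent change from the previous equivalent period.

    previous=None models the all time filter, where there is no prior
    window and Delta is always zero.
    """
    if previous is None or previous == 0:
        # Assumption: a zero-length previous measurement also yields 0.
        return 0.0
    return 100.0 * (current - previous) / previous
```

For example, if Prompt Analysis took 1.5 seconds over the last 24 hours and 1.0 second over the 24 hours before that, delta_percent(1.5, 1.0) gives a Delta of 50%.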

Signal Data

Signal data describes the risk and entities present in the traffic the profile handled.

Signal Distribution

Signal Distribution displays the distribution of detected signals across signal groups for both prompt traffic and response traffic. On this page, signals are grouped into two categories:

Safety: Signals that identify harmful, hostile, or unlawful material.
- Toxicity: Hostile, aggressive, disrespectful, or harmful language, including hate speech, harassment, threats, and personal attacks.
- Illegality: References to illegal activities or instructions for unlawful behavior across categories such as cybercrime, fraud, drugs, violence, and terrorism.
Security: Signals that identify attempts to manipulate or exploit the model through adversarial inputs. Includes prompt injection, jailbreaking, role impersonation, and similar techniques.

A gear icon on the module opens a Chart Configuration panel with the following controls:

View Mode: Both, Prompts, or Responses. Filters the chart to display signals from prompt traffic, response traffic, or both.
Signal Type: Both, Safety, or Security. Filters the chart to display Safety signals, Security signals, or both.
Show Zero Values: A toggle that controls whether signal categories with zero detections appear in the chart.

Attack Vectors

Attack Vectors displays a ranked list of detected adversarial attack types with counts. These are the same adversarial detections counted in the Security Signals card, presented here as a ranked list. Tracked attack types include:

Jailbreaking: Attempts to bypass safety guardrails through indirect manipulation, hypothetical scenarios, or roleplay.
Goal Hijacking: Attempts to redirect the model away from its intended purpose toward unrelated or harmful objectives.
Prompt Leaking: Attempts to extract the system's internal instructions, prompts, or configuration details.
Direct Command Injection: Attempts to inject executable commands or system-level instructions into user inputs.
Instruction Override: Attempts to replace or supersede the system's original instructions with new directives.
Self-Referential Injection: Attempts to embed instructions within the expected output format or recursive responses.
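A ranked list of this kind amounts to counting detections per attack type and sorting by count. The Python sketch below shows the idea, assuming each detection is available as a plain attack-type label. The function name is hypothetical and does not reflect how the product computes the list.

```python
from collections import Counter


def rank_attack_vectors(detections):
    """Return (attack type, count) pairs, most frequent first."""
    return Counter(detections).most_common()
```

Given three Jailbreaking detections, two Prompt Leaking detections, and one Goal Hijacking detection, the list ranks them in that order.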

Entities

The Entities visualization displays the distribution of detected named entities and can be displayed in multiple formats. Named entities are real-world objects or concepts that can be referred to by a proper name: things that exist in the world and have an identity distinct from a generic category. Entity categories include Academic, Business, People, Institutional, Geographical, Organizational, Temporal, Monetary, and Cultural. A search and filter control lets you explore entity data.
