
Trace is the explainability view. It shows how the classification models evaluated a prompt or a response, so when an event is flagged, modified, or blocked, you can see what was detected, where in the text it was found, and why the models classified it the way they did. You open Trace from the Trace icon in the event detail panel toolbar, and it works the same way wherever you reach it, whether the event came from Monitoring, Cannon, Playground, or Bookmarks.
The Trace view keeps the same event header as the event detail panel, so you stay oriented to which event you are looking at while you investigate it.
Choosing What to Trace
Two sets of controls sit at the top of the view. The first selects which side of the event you are tracing: Prompt or Response. The second selects how you want to view that side: Signals, Chunks, Samples, or All. You pick a side, then a view.
The three main views, Signals, Chunks, and Samples, move from what was detected, to where it was detected, to why. Working through them in that order is the natural path when you are trying to understand a classification.
Signals
The Signals view shows the signals detected on the selected side. Signals appear as labeled tags with counts, grouped by type, such as Prompt Signals, PII Signals, and Prompt Intent. This is the quickest read of what the models found.
Chunks
The Chunks view breaks the text into numbered semantic chunks and shows the signal tags associated with each one. This tells you where in the prompt or response a signal was detected, rather than only that it was detected. When a single prompt raises several signals, the chunks show which part of the text drove each one. For example, a large input might be divided into twelve chunks, with adversarial language detected in one chunk and sentiment derived from another, so each signal traces back to the specific span that produced it.
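The relationship between the Signals and Chunks views can be sketched as a small aggregation: the per-chunk tags roll up into the counts shown in Signals, and the same data tells you which chunk numbers produced each signal. This is an illustrative sketch only. The `summarize` function, the input shape, and the signal names used with it are hypothetical and are not part of the product.

```python
from collections import Counter, defaultdict


def summarize(chunk_signals: dict[int, list[str]]) -> tuple[Counter, dict[str, list[int]]]:
    """Roll per-chunk signal tags up into counts and chunk locations.

    chunk_signals maps a chunk number to the signal tags found in that chunk
    (a hypothetical shape chosen for this sketch).
    """
    counts: Counter = Counter()
    locations: dict[str, list[int]] = defaultdict(list)
    for chunk_no in sorted(chunk_signals):
        for signal in chunk_signals[chunk_no]:
            counts[signal] += 1                  # what was detected, and how often
            locations[signal].append(chunk_no)   # where it was detected
    return counts, dict(locations)
```

Given a mapping in which two chunks carry the same tag, `counts` reports that tag twice and `locations` lists both chunk numbers in ascending order.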
Chunking happens because the analysis models are optimized for inputs up to 80 tokens. When an input is larger than that, it is divided into multiple chunks that are analyzed in parallel. Chunking is semantic, which means an input is never divided mid-sentence or mid-paragraph. Because chunk boundaries follow sentence and paragraph breaks, actual chunk size falls between roughly 65 and 80 tokens.
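As a rough illustration of sentence-respecting chunking, the sketch below greedily packs whole sentences into chunks of at most 80 tokens. It is not the product's chunker: it assumes whitespace-separated words stand in for tokens, uses a naive punctuation-based sentence split, and ignores paragraph breaks and the lower end of the 65 to 80 range. The names `chunk_text` and `count_tokens` are hypothetical.

```python
import re

MAX_TOKENS = 80  # per-chunk budget taken from the text above


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate model tokens.
    return len(text.split())


def chunk_text(text: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    A single sentence longer than max_tokens is kept whole in its own chunk,
    because this sketch never splits mid-sentence.
    """
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk if this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because chunks close only at sentence boundaries, most chunks end somewhat under the budget rather than exactly on it, which is consistent with the size range described above.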
Samples
The Samples view is the deepest level of explanation. It shows why the models classified a chunk the way they did, by comparing it to the training samples it most resembles.
The view has two panels. The Modules panel on the left lists the classification modules used to evaluate the event: CPVS, Intent, IOR, MTVS, and Sentiment. Each module can be expanded to show sub-levels. Selecting a module and level filters what appears on the right.
- Intent: Classifies the intent expressed in the content.
- IOR (Instruction Override): Detects attempts to replace or supersede the system's original instructions with new directives.
The Samples panel on the right displays the training samples from the selected module that are most similar to the selected chunk. Samples are sorted by importance, with the most influential sample first. Each sample shows its text, the label assigned to it, a confidence percentage, and two scores:
- Distance: How far the sample is from the chunk. A smaller distance means a closer match. The most similar samples have the smallest distances.
- Relevance: How much weight the model gives the sample, based on its similarity to the chunk. Higher relevance means the sample had more influence on the classification.
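One way to picture how the two scores relate is the sketch below, which turns distances into normalized weights and sorts samples so the most influential comes first. The documentation does not state the actual formula, so the `exp(-distance)` weighting here is purely an assumption, and the `Sample` class and `rank_samples` function are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    text: str
    label: str
    distance: float  # smaller means a closer match to the chunk


def rank_samples(samples: list[Sample]) -> list[tuple[Sample, float]]:
    """Return (sample, relevance) pairs, most relevant first."""
    if not samples:
        return []
    # Assumed weighting (not from the product): exp(-distance),
    # normalized so that the relevance values sum to 1.
    weights = [math.exp(-s.distance) for s in samples]
    total = sum(weights)
    ranked = [(s, w / total) for s, w in zip(samples, weights)]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

Under this assumed weighting, the sample with the smallest distance always receives the highest relevance, which matches the ordering described above.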
A single sample can carry more than one label, because a sample can belong to multiple classes at once. Each module is threshold-optimized, so a classification fires only when the probability passes that module's threshold.
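The threshold behavior can be sketched as a per-module lookup followed by a filter. The threshold values, label names, and the `fired_labels` function below are hypothetical, and "passes" is read as greater than or equal to, which the text does not specify.

```python
# Hypothetical per-module thresholds; the real values are not documented here.
THRESHOLDS: dict[str, float] = {"Intent": 0.60, "IOR": 0.85}


def fired_labels(
    module: str,
    probabilities: dict[str, float],
    thresholds: dict[str, float] = THRESHOLDS,
) -> list[str]:
    """Return every label whose probability passes the module's threshold.

    More than one label can fire, since content can belong to
    multiple classes at once.
    """
    threshold = thresholds[module]
    # Assumption: "passes" means >= threshold.
    return sorted(label for label, p in probabilities.items() if p >= threshold)
```

The same probability can fire in one module and stay silent in another, because each module applies its own threshold.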
All
The All view combines Signals, Chunks, and Samples into a single scrollable layout, for when you want to see the full picture at once rather than moving between views.
Moving Between Events
Previous and Next buttons at the bottom of the view move you to the neighboring events without leaving Trace, so you can work through a set of events in sequence.
