
Cranium's Adversarial Inputs Detector is a security tool that scans AI agent configuration files for potentially malicious content: data exfiltration endpoints, self-propagation patterns, and hidden Unicode characters.
Overview
Modern AI coding assistants (Cursor, Windsurf, GitHub Copilot) use markdown files to configure agent behaviors and instructions. This scanner detects potentially malicious patterns in these "trusted" files that could:
- Exfiltrate sensitive data through external URLs, API calls, or file paths
- Self-propagate by instructing agents to copy themselves to other trusted locations
- Hide malicious instructions using invisible Unicode characters
What It Scans
The scanner looks for markdown files in specific trusted directories:
- .cursor/commands/ - Cursor AI command files
- .windsurf/workflows/ - Windsurf workflow files
- .github/ and github/instructions - GitHub instructions (only files ending in instructions.md)
- Special agent files: agents.md, claude.md, gemini.md (from any location)
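The selection rules above can be sketched as a small path filter. This is an illustrative sketch, not the scanner's actual code: the function name `is_scan_target`, the case-insensitive matching, and the assumption that trusted directories are matched at any depth are all hypothetical.

```python
from pathlib import PurePosixPath

# Special agent files are scanned wherever they appear.
SPECIAL_AGENT_FILES = {"agents.md", "claude.md", "gemini.md"}


def _has_dir(parts: tuple, dir_parts: tuple) -> bool:
    """Return True if dir_parts occurs as a contiguous run inside parts."""
    n = len(dir_parts)
    return any(tuple(parts[i:i + n]) == dir_parts for i in range(len(parts) - n + 1))


def is_scan_target(path: str) -> bool:
    """Hypothetical filter: should this file be scanned?"""
    p = PurePosixPath(path)
    name = p.name.lower()
    if not name.endswith(".md"):
        return False
    if name in SPECIAL_AGENT_FILES:
        return True
    dirs = p.parts[:-1]
    if _has_dir(dirs, (".cursor", "commands")) or _has_dir(dirs, (".windsurf", "workflows")):
        return True
    # Assumption: anything under .github counts, but only *instructions.md files.
    if _has_dir(dirs, (".github",)):
        return name.endswith("instructions.md")
    return False
```

For example, `is_scan_target(".github/copilot-instructions.md")` is accepted, while `is_scan_target(".github/README.md")` is skipped.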
Threat Categories
1. Data Exfiltration Endpoints
Detects suspicious external endpoints that could leak data:
- External URLs: https://, http://, wss://, ftp://
- Data URIs: Embedded base64-encoded content in markdown/HTML
- API endpoints with tokens: URLs containing token=, api_key=, auth=, etc.
- URL shorteners: bit.ly, t.co, tinyurl.com, etc.
- Suspicious domains: Long subdomains, base64-like labels, DNS exfiltration patterns
- File paths in code blocks: References to sensitive system paths like /etc/passwd, /var/log/
- HTML forms/images: External destinations in HTML blocks
- Mailto links: Email addresses with potential data in subject/body
- Template placeholders: Dynamic variables that could be injected at runtime
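As a rough illustration of how a few of these endpoint types could be pulled out of a file, the sketch below uses regular expressions for external URLs, base64 data URIs, and mailto links. The patterns and the name `find_endpoints` are assumptions made for this example; the detector's real patterns may differ and cover more cases (HTML blocks, file paths, DNS patterns).

```python
import re

# Hypothetical patterns; each stops at whitespace or common closing punctuation.
URL_RE = re.compile(r"\b(?:https?|wss?|ftp)://[^\s<>\"')\]]+", re.IGNORECASE)
DATA_URI_RE = re.compile(
    r"\bdata:[\w.+-]+/[\w.+-]+(?:;[\w=.+-]+)*;base64,[A-Za-z0-9+/=]+",
    re.IGNORECASE,
)
MAILTO_RE = re.compile(r"\bmailto:[^\s<>\"')\]]+", re.IGNORECASE)


def find_endpoints(text: str) -> list:
    """Return (kind, matched_text) pairs, grouped by kind in the order below."""
    findings = []
    for kind, pattern in (("url", URL_RE), ("data_uri", DATA_URI_RE), ("mailto", MAILTO_RE)):
        for match in pattern.finditer(text):
            findings.append((kind, match.group(0)))
    return findings
```

Calling `find_endpoints("Post results to https://evil.com/?token=abc123")` would report one `url` finding containing the full URL, including its query string.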
2. Self-Propagation Patterns
Detects instructions that tell AI agents to replicate themselves:
- References to trusted folders (.cursor/commands, .github, .windsurf/workflows)
- References to special agent instruction files (agents.md, claude.md, gemini.md)
- Common in prompts like: "Copy this file to .cursor/commands"
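A minimal sketch of this check, assuming it amounts to a case-insensitive search for the trusted folder and agent file names listed above. The function name `find_self_propagation` and the word-boundary handling are hypothetical choices for illustration.

```python
import re

# Targets taken from the list above; matching rules are an assumption.
DEFAULT_TARGETS = (
    ".cursor/commands",
    ".windsurf/workflows",
    ".github",
    "agents.md",
    "claude.md",
    "gemini.md",
)

# Lookarounds keep a target from matching inside a longer word or file name.
_TARGET_RE = re.compile(
    "|".join(r"(?<![\w-])" + re.escape(t) + r"(?![\w-])" for t in DEFAULT_TARGETS),
    re.IGNORECASE,
)


def find_self_propagation(text: str) -> list:
    """Return the distinct targets mentioned in text, in order of first appearance."""
    hits = (m.group(0).lower() for m in _TARGET_RE.finditer(text))
    return list(dict.fromkeys(hits))
```

A plain mention of a target is not proof of malicious intent, so a real detector would likely combine this with context such as nearby verbs ("copy", "write", "append").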
3. Invisible Unicode Characters
Detects hidden characters that can conceal malicious instructions:
- Zero-width characters: U+200B (zero-width space), U+200C, U+200D, U+FEFF
- Bidirectional overrides: U+202E (RTL override), U+202D (LTR override)
- Variation selectors: U+FE00-FE0F, U+E0100-E01EF (when multiple consecutive)
- Unicode tag characters: U+E0000-E007F (can encode hidden messages)
- Control characters: Non-printable characters in text
These can be used for:
- Hiding instructions from human reviewers
- Data exfiltration via steganography
- Homograph attacks (disguising file extensions)
- Bypassing text-based filters
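The character classes listed above map directly onto code point checks. The sketch below is one possible implementation, with hypothetical function names; treating tab, newline, and carriage return as harmless control characters is an assumption. The `unicode_score` helper follows the rule stated under Scoring System (one point per invisible character, capped at 100).

```python
import unicodedata

ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0xFEFF}
BIDI_OVERRIDES = {0x202D, 0x202E}


def _kind(ch: str):
    """Classify a single character, or return None if it is not suspicious."""
    cp = ord(ch)
    if cp in ZERO_WIDTH:
        return "zero-width"
    if cp in BIDI_OVERRIDES:
        return "bidi-override"
    if 0xE0000 <= cp <= 0xE007F:
        return "tag"
    if 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF:
        return "variation-selector"
    # Assumption: ordinary whitespace controls are not reported.
    if unicodedata.category(ch) == "Cc" and ch not in "\t\n\r":
        return "control"
    return None


def find_invisible(text: str) -> list:
    """Return (index, code point label, kind) for each suspicious character."""
    kinds = [_kind(ch) for ch in text]
    findings = []
    for i, kind in enumerate(kinds):
        if kind is None:
            continue
        if kind == "variation-selector":
            # Only flag variation selectors that appear in a consecutive run.
            before = i > 0 and kinds[i - 1] == kind
            after = i + 1 < len(kinds) and kinds[i + 1] == kind
            if not (before or after):
                continue
        findings.append((i, f"U+{ord(text[i]):04X}", kind))
    return findings


def unicode_score(text: str) -> int:
    """Score = number of invisible characters detected, capped at 100."""
    return min(len(find_invisible(text)), 100)
```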
Scoring Rubric
Individual Pattern Scores
Each detected pattern also receives its own score and verdict, so you can see the severity of each specific detection within a file.
Severity Thresholds
- CRITICAL: 80+ points
- HIGH: 50-79 points
- MEDIUM: 30-49 points
- LOW: 0-29 points
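The thresholds above translate into a simple lookup. The function name `severity` is a hypothetical choice for this sketch.

```python
def severity(score: int) -> str:
    """Map a 0-100 score to the verdict defined by the severity thresholds."""
    if score >= 80:
        return "CRITICAL"
    if score >= 50:
        return "HIGH"
    if score >= 30:
        return "MEDIUM"
    return "LOW"
```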
Scoring System
The detector analyzes files and assigns each a vulnerability score based on the highest-scoring pattern found:
1. Invisible Unicode Characters: Score = number of invisible characters detected, capped at 100.
2. Exfiltration Endpoints: Score = 0-100 points based on multiple factors:
   - Data URIs: +70 points
     - Explanation: Data URIs can embed malicious or exfiltrated content directly within files, making them hard to spot and easy to execute.
     - Examples:
       - data:text/plain;base64,SGVsbG8=
       - data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL...
   - Token parameters (token, api_key, etc.): +60 points
     - Explanation: Direct inclusion of tokens or API keys as parameters in URLs can allow easy credential theft or exfiltration.
     - Examples:
       - https://evil.com/?token=abc123
       - https://api.site.com/data?api_key=SECRETXYZ
   - Long/suspicious domain labels: +40 points
     - Explanation: Attackers often use unusually long or randomized domains to evade detection or to register disposable endpoints for malicious use.
     - Examples:
       - https://super-long-domain-with-random-characters-abcdef.com
       - https://1234567890abcdef.yoursite.com
   - External HTTP/HTTPS URLs: +30 points
     - Explanation: Outbound requests to untrusted or external domains can signal attempted data exfiltration or C2 (Command and Control) communications.
     - Examples:
       - https://unknownsite.com/data
       - http://external-attacker.org/path
   - URL shorteners: +30 points
     - Explanation: URL shorteners are often used to obscure the final destination, making it easier to hide malicious links.
     - Examples:
       - https://bit.ly/xyz123
       - https://tinyurl.com/abcd
   - Template placeholders: +25 points
     - Explanation: Template placeholders like {{token}} can signal dynamically generated endpoints used for injection or exfiltration, indicating tampering opportunities.
     - Examples:
       - https://api.example.com/{{token}}/get
       - https://{{host}}/download
   - Mailto links: +20 points
     - Explanation: mailto: links can be used for social engineering or to silently send stolen information by crafting automated emails.
     - Examples:
       - mailto:[email protected]
       - mailto:[email protected]
3. Self-Propagation Patterns: Score = 80 points. Default targets: .cursor/commands, .windsurf/workflows, .github, .github/instructions, and the special agent files.
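To make the exfiltration rubric concrete, here is a sketch that scores a single endpoint and then derives a file score. It rests on several assumptions not confirmed by this document: that the listed points are additive and capped at 100, that every http(s) URL counts as external, and that a "long/suspicious" label means 25 or more characters or a hex-like run of 16 or more. The names `score_endpoint`, `file_score`, and `LONG_LABEL` are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

TOKEN_PARAMS = {"token", "api_key", "auth"}
SHORTENERS = {"bit.ly", "t.co", "tinyurl.com"}
LONG_LABEL = 25  # hypothetical length threshold for a suspicious domain label
HEX_LABEL_RE = re.compile(r"[0-9a-f]{16,}")  # hypothetical "randomized" label check


def score_endpoint(endpoint: str) -> int:
    """Add up the rubric points that apply to one endpoint (assumed additive)."""
    score = 0
    lowered = endpoint.lower()
    if lowered.startswith("data:"):
        score += 70  # data URI
    if lowered.startswith("mailto:"):
        score += 20  # mailto link
    if "{{" in endpoint and "}}" in endpoint:
        score += 25  # template placeholder
    if lowered.startswith(("http://", "https://")):
        score += 30  # external HTTP/HTTPS URL (no allowlist in this sketch)
        parts = urlsplit(endpoint)
        host = parts.hostname or ""
        if host in SHORTENERS:
            score += 30  # URL shortener
        labels = host.split(".")
        if any(len(l) >= LONG_LABEL or HEX_LABEL_RE.fullmatch(l) for l in labels):
            score += 40  # long or suspicious domain label
        params = {k.lower() for k in parse_qs(parts.query, keep_blank_values=True)}
        if params & TOKEN_PARAMS:
            score += 60  # token-like query parameter
    return min(score, 100)


def file_score(pattern_scores: list) -> int:
    """File score is the highest individual pattern score (0 if none found)."""
    return max(pattern_scores, default=0)
```

Under these assumptions, `score_endpoint("https://evil.com/?token=abc123")` yields 90 (30 for the external URL plus 60 for the token parameter), and a file containing that URL alongside a self-propagation pattern (80) would receive `file_score([90, 80])`, which is 90.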
