Adversarial Inputs Detector

Cranium's Adversarial Inputs Detector is a security tool that scans AI agent configuration files for potentially malicious content, data exfiltration endpoints, self-propagation patterns, and hidden Unicode characters.

Overview

Modern AI coding assistants (Cursor, Windsurf, GitHub Copilot) use markdown files to configure agent behaviors and instructions. This scanner detects potentially malicious patterns in these "trusted" files that could:

- Exfiltrate sensitive data through external URLs, API calls, or file paths
- Self-propagate by instructing agents to copy themselves to other trusted locations
- Hide malicious instructions using invisible Unicode characters

What It Scans

The scanner looks for markdown files in specific trusted directories:

.cursor/commands/ - Cursor AI command files
.windsurf/workflows/ - Windsurf workflow files
.github/ and .github/instructions/ - GitHub instructions (only files ending in instructions.md)
Special agent files: agents.md, claude.md, gemini.md (from any location)
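
As a rough illustration of the selection rules above, the following Python sketch decides whether a repository-relative path would be scanned. It is not Cranium's implementation: the function name `is_scan_target`, the constants, the case-insensitive file name match, and the assumption that trusted directories sit at the repository root are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical constants mirroring the directory and file lists above.
TRUSTED_DIRS = (".cursor/commands", ".windsurf/workflows")
AGENT_FILES = {"agents.md", "claude.md", "gemini.md"}


def is_scan_target(path: str) -> bool:
    """Return True if a repo-relative POSIX path falls in the scanned set."""
    p = PurePosixPath(path)
    name = p.name.lower()  # assumption: file names are matched case-insensitively
    if not name.endswith(".md"):
        return False
    # Special agent files are scanned from any location.
    if name in AGENT_FILES:
        return True
    parent = p.parent.as_posix()
    # Markdown files under the Cursor and Windsurf trusted directories.
    if any(parent == d or parent.startswith(d + "/") for d in TRUSTED_DIRS):
        return True
    # Under .github/, only files ending in instructions.md qualify.
    in_github = parent == ".github" or parent.startswith(".github/")
    return in_github and name.endswith("instructions.md")
```

For example, `is_scan_target(".github/README.md")` is false under these assumptions, while `is_scan_target("docs/AGENTS.md")` is true because agent files are picked up wherever they live.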

Threat Categories

1. Data Exfiltration Endpoints

Detects suspicious external endpoints that could leak data:

External URLs: https://, http://, wss://, ftp://
Data URIs: Embedded base64-encoded content in markdown/HTML
API endpoints with tokens: URLs containing token=, api_key=, auth=, etc.
URL shorteners: bit.ly, t.co, tinyurl.com, etc.
Suspicious domains: Long subdomains, base64-like labels, DNS exfiltration patterns
File paths in code blocks: References to sensitive system paths like /etc/passwd, /var/log/
HTML forms/images: External destinations in HTML blocks
Mailto links: Email addresses with potential data in subject/body
Template placeholders: Dynamic variables that could be injected at runtime
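
Several of the categories above can be approximated with regular expressions. The sketch below is illustrative only and assumes nothing about the detector's real patterns: the category keys, the regexes, and the function `find_endpoint_categories` are hypothetical, and the suspicious-domain, file-path, and HTML checks are left out for brevity.

```python
import re

# Hypothetical patterns; each key stands for one category from the list above.
PATTERNS = {
    "external_url": re.compile(r"\b(?:https?|wss|ftp)://[^\s)>\"']+", re.I),
    "data_uri": re.compile(r"\bdata:[\w.+-]+/[\w.+-]+;base64,[A-Za-z0-9+/=]+", re.I),
    "token_param": re.compile(r"[?&](?:token|api_key|auth)=[^\s&#]+", re.I),
    "url_shortener": re.compile(r"https?://(?:bit\.ly|t\.co|tinyurl\.com)/\S+", re.I),
    "mailto": re.compile(r"\bmailto:[^\s)>\"']+", re.I),
    "template_placeholder": re.compile(r"\{\{\s*\w+\s*\}\}"),
}


def find_endpoint_categories(text: str) -> set[str]:
    """Return the names of all categories with at least one match in text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}
```

A single URL can fall into more than one category: a shortened link is also an external URL, and a URL with a `token=` parameter matches both the URL and the token pattern.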

2. Self-Propagation Patterns

Detects instructions that tell AI agents to replicate themselves:

References to trusted folders (.cursor/commands, .github, .windsurf/workflows)
References to special agent instruction files (agents.md, claude.md, gemini.md)
Common in prompts like: "Copy this file to .cursor/commands"
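
A minimal sketch of this check, assuming a plain case-insensitive substring match on the trusted locations, might look like the following. The names `TRUSTED_TARGETS` and `find_propagation_targets` are hypothetical, and the real detector may apply additional context rules.

```python
import re

# Hypothetical list of trusted folders and agent files, taken from the text above.
TRUSTED_TARGETS = (
    ".cursor/commands",
    ".windsurf/workflows",
    ".github",
    "agents.md",
    "claude.md",
    "gemini.md",
)

# One alternation over the escaped literals, matched case-insensitively.
TARGET_RE = re.compile("|".join(re.escape(t) for t in TRUSTED_TARGETS), re.I)


def find_propagation_targets(text: str) -> list[str]:
    """Return the distinct trusted locations mentioned in text, lowercased and sorted."""
    return sorted({m.group(0).lower() for m in TARGET_RE.finditer(text)})
```

With this sketch, the prompt "Copy this file to .cursor/commands" yields `[".cursor/commands"]`, and text that mentions no trusted location yields an empty list.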

3. Invisible Unicode Characters

Detects hidden characters that can conceal malicious instructions:

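Zero-width characters: U+200B (zero-width space), U+200C (zero-width non-joiner), U+200D (zero-width joiner), U+FEFF (zero-width no-break space, also used as the byte order mark)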
Bidirectional overrides: U+202E (RTL override), U+202D (LTR override)
Variation selectors: U+FE00-FE0F, U+E0100-E01EF (flagged when several appear consecutively)
Unicode tag characters: U+E0000-E007F (can encode hidden messages)
Control characters: Non-printable characters in text

These can be used for:

Hiding instructions from human reviewers
Data exfiltration via steganography
Homograph attacks (disguising file extensions)
Bypassing text-based filters
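
The code point ranges listed in this section translate directly into a character-by-character check. The sketch below is a simplified illustration, not the detector's code: `classify` and `find_invisible` are hypothetical names, tab, newline, and carriage return are assumed to be acceptable control characters, and every variation selector is reported rather than only consecutive runs.

```python
import unicodedata
from typing import List, Optional, Tuple

ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0xFEFF}
BIDI_OVERRIDES = {0x202D, 0x202E}


def classify(ch: str) -> Optional[str]:
    """Return a category name for an invisible character, or None if it is visible."""
    cp = ord(ch)
    if cp in ZERO_WIDTH:
        return "zero_width"
    if cp in BIDI_OVERRIDES:
        return "bidi_override"
    if 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF:
        return "variation_selector"
    if 0xE0000 <= cp <= 0xE007F:
        return "tag"
    # Assumption: ordinary whitespace controls (tab, LF, CR) are not flagged.
    if unicodedata.category(ch) == "Cc" and ch not in "\t\n\r":
        return "control"
    return None


def find_invisible(text: str) -> List[Tuple[int, str, str]]:
    """Return (index, code point label, category) for each invisible character."""
    hits = []
    for index, ch in enumerate(text):
        kind = classify(ch)
        if kind is not None:
            hits.append((index, f"U+{ord(ch):04X}", kind))
    return hits
```

Reporting the index and code point of each hit lets a reviewer locate characters that a normal editor would not display.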

Scoring Rubric

Individual Pattern Scores

Each detected pattern receives its own score and verdict, so you can see the vulnerability level of each specific detection within a file.

Severity Thresholds

CRITICAL: 80+ points
HIGH: 50-79 points
MEDIUM: 30-49 points
LOW: 0-29 points
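
The thresholds above amount to a simple lookup from score to verdict. A minimal sketch, with the hypothetical function name `severity`:

```python
def severity(score: int) -> str:
    """Map a 0-100 score to the verdict bands listed above."""
    if score >= 80:
        return "CRITICAL"
    if score >= 50:
        return "HIGH"
    if score >= 30:
        return "MEDIUM"
    return "LOW"
```

Under these bands a score of 79 is HIGH and a score of 80 is CRITICAL.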

Scoring System

The detector analyzes each file and assigns it a vulnerability score equal to that of the highest-scoring pattern found:

1. Invisible Unicode Characters: Score = number of invisible characters detected, capped at 100.
2. Exfiltration Endpoints: Score = 0-100 points based on multiple factors:

- Data URIs: +70 points
  - Explanation: Data URIs can embed malicious or exfiltrated content directly within files, making them hard to spot and easy to execute.
  - Examples:
    - data:text/plain;base64,SGVsbG8=
    - data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL...
- Token parameters (token, api_key, etc.): +60 points
  - Explanation: Direct inclusion of tokens or API keys as parameters in URLs can allow easy credential theft or exfiltration.
  - Examples:
    - https://evil.com/?token=abc123
    - https://api.site.com/data?api_key=SECRETXYZ
- Long/suspicious domain labels: +40 points
  - Explanation: Attackers often use unusually long or randomized domains to evade detection or to register disposable endpoints for malicious use.
  - Examples:
    - https://super-long-domain-with-random-characters-abcdef.com
    - https://1234567890abcdef.yoursite.com
- External HTTP/HTTPS URLs: +30 points
  - Explanation: Outbound requests to untrusted or external domains can signal attempted data exfiltration or C2 (Command and Control) communications.
  - Examples:
    - https://unknownsite.com/data
    - http://external-attacker.org/path
- URL shorteners: +30 points
  - Explanation: URL shorteners are often used to obscure the final destination, making it easier to hide malicious links.
  - Examples:
    - https://bit.ly/xyz123
    - https://tinyurl.com/abcd
- Template placeholders: +25 points
  - Explanation: Template placeholders like {{token}} can signal dynamically generated endpoints used for injection or exfiltration, indicating tampering opportunities.
  - Examples:
    - https://api.example.com/{{token}}/get
    - https://{{host}}/download
- Mailto links: +20 points
  - Explanation: mailto: links can be used for social engineering or to silently send stolen information by crafting automated emails.
  - Examples:
    - mailto:[email protected]
    - mailto:[email protected]

3. Self-Propagation Patterns: 80 points (default targets .cursor/commands, .windsurf/workflows, .github, .github/instructions, agent files)
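
The three scoring rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the detector's code: all names are hypothetical, and the exfiltration score is assumed to be the sum of the point values of the distinct factors present, capped at 100, which is one reading of "0-100 points based on multiple factors".

```python
from typing import Iterable

# Point values copied from the rubric above; the keys are hypothetical labels.
EXFIL_WEIGHTS = {
    "data_uri": 70,
    "token_param": 60,
    "suspicious_domain": 40,
    "external_url": 30,
    "url_shortener": 30,
    "template_placeholder": 25,
    "mailto": 20,
}
SELF_PROPAGATION_SCORE = 80


def unicode_score(invisible_count: int) -> int:
    """Score = number of invisible characters detected, capped at 100."""
    return min(invisible_count, 100)


def exfiltration_score(factors: Iterable[str]) -> int:
    """Assumed rule: add the points of each distinct factor, then cap at 100."""
    return min(sum(EXFIL_WEIGHTS[f] for f in set(factors)), 100)


def file_score(pattern_scores: Iterable[int]) -> int:
    """The file score is the highest individual pattern score (0 if none)."""
    return max(pattern_scores, default=0)
```

For instance, a file with 7 invisible characters (7 points), one external URL carrying a token parameter (30 + 60 = 90 points under the additive assumption), and one self-propagation instruction (80 points) would receive a file score of 90.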
