Find the insights and best practices about our product.
Adversarial Inputs Detector

Cranium's Adversarial Inputs Detector is a security tool that scans AI agent configuration files for potential malicious content, data exfiltration endpoints, self-propagation patterns, and hidden Unicode characters.

Overview

Modern AI coding assistants (Cursor, Windsurf, GitHub Copilot) use markdown files to configure agent behaviors and instructions. This scanner detects potentially malicious patterns in these "trusted" files that could:

- Exfiltrate sensitive data through external URLs, API calls, or file paths
- Self-propagate by instructing agents to copy themselves to other trusted locations
- Hide malicious instructions using invisible Unicode characters

What It Scans

The scanner looks for markdown files in specific trusted directories:

  • .cursor/commands/ - Cursor AI command files
  • .windsurf/workflows/ - Windsurf workflow files
  • .github/ and github/instructions - GitHub instructions (only files ending in instructions.md)
  • Special agent files: agents.md, claude.md, gemini.md (from any location)

 

Threat Categories

1. Data Exfiltration Endpoints

Detects suspicious external endpoints that could leak data:

  • External URLs: https://, http://, wss://, ftp://
  • Data URIs: Embedded base64-encoded content in markdown/HTML
  • API endpoints with tokens: URLs containing token=, api_key=, auth=, etc.
  • URL shorteners: bit.ly, t.co, tinyurl.com, etc.
  • Suspicious domains: Long subdomains, base64-like labels, DNS exfiltration patterns
  • File paths in code blocks: References to sensitive system paths like /etc/passwd, /var/log/
  • HTML forms/images: External destinations in HTML blocks
  • Mailto links: Email addresses with potential data in subject/body
  • Template placeholders: Dynamic variables that could be injected at runtime

2. Self-Propagation Patterns

Detects instructions that tell AI agents to replicate themselves:

  • References to trusted folders (.cursor/commands, .github, .windsurf/workflows)
  • References to special agent instruction files (agents.md, claude.md, gemini.md)
  • Common in prompts like: "Copy this file to .cursor/commands"

3. Invisible Unicode Characters

Detects hidden characters that can conceal malicious instructions:

  • Zero-width characters: U+200B (zero-width space), U+200C, U+200D, U+FEFF
  • Bidirectional overrides: U+202E (RTL override), U+202D (LTR override)
  • Variation selectors: U+FE00-FE0F, U+E0100-E01EF (when multiple consecutive)
  • Unicode tag characters: U+E0000-E007F (can encode hidden messages)
  • Control characters: Non-printable characters in text

 

These can be used for:

  • Hiding instructions from human reviewers
  • Data exfiltration via steganography
  • Homograph attacks (disguising file extensions)
  • Bypassing text-based filters

Scoring Rubric

Individual Pattern Scores

Each detected pattern also receives its own individual score and verdict, allowing you to see the vulnerability level of specific detections within a file.

Severity Thresholds

  • CRITICAL: 80+ points
  • HIGH: 50-79 points
  • MEDIUM: 30-49 points
  • LOW: 0-29 points

Scoring System 

The detector analyzes files and assigns a vulnerability score based on the highest scoring pattern found:

1. Invisible Unicode Characters: Score = number of invisible characters detected, capped at 100.
2. Exfiltration Endpoints: Score = 0-100 points based on multiple factors:

    • Data URIs: +70 points  
      • Explanation: Data URIs can embed malicious or exfiltrated content directly within files, making them hard to spot and easy to execute.  
      • Examples:  
        • data:text/plain;base64,SGVsbG8=
        • data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL...
    • Token parameters (token, api_key, etc.): +60 points
      • Explanation: Direct inclusion of tokens or API keys as parameters in URLs can allow easy credential theft or exfiltration.  
      • Examples:  
        • https://evil.com/?token=abc123
        • https://api.site.com/data?api_key=SECRETXYZ
    • Long/suspicious domain labels: +40 points 
      • Explanation: Attackers often use unusually long or randomized domains to evade detection or to register disposable endpoints for malicious use.
      • Examples:
        • https://super-long-domain-with-random-characters-abcdef.com
        • https://1234567890abcdef.yoursite.com
    • External HTTP/HTTPS URLs: +30 points
      • Explanation: Outbound requests to untrusted or external domains can signal attempted data exfiltration or C2 (Command and Control) communications.
      • Examples:
        • https://unknownsite.com/data
        • http://external-attacker.org/path
    • URL shorteners: +30 points
      • Explanation: URL shorteners are often used to obscure the final destination, making it easier to hide malicious links.
      • Examples:
        • https://bit.ly/xyz123
        • https://tinyurl.com/abcd
    • Template placeholders: +25 points
      • Explanation: Template placeholders like {{token}} can signal dynamically generated endpoints used for injection or exfiltration, indicating tampering opportunities.
      • Examples:
        • https://api.example.com/{{token}}/get
        • https://{{host}}/download
    • Mailto links: +20 points
      • Explanation: mailto: links can be used for social engineering or to silently send stolen information by crafting automated emails.
      • Examples:

3. Self-Propagation Patterns: 80 points (default targets .cursor/commands, .windsurf/workflows, .github, .github/instructions, agent files)

Did this answer your question?