Mirror of https://github.com/0xsrb/AASRT.git (synced 2026-04-23)
# Product Requirements Document: AI Agent Security Reconnaissance Tool (AASRT)

**Version:** 1.0

**Date:** February 8, 2026

**Author:** AGK

**Document Type:** Comprehensive Technical Specification

---
## 1. EXECUTIVE SUMMARY

### 1.1 Product Vision

Develop a Python-based security reconnaissance tool that automates the discovery of publicly exposed AI agent implementations (ClawdBot, AutoGPT, LangChain agents, etc.) with potential security misconfigurations across the public internet, using multiple search engines and APIs.

### 1.2 Problem Statement

Security researchers and organizations currently lack automated tools to:

- Identify exposed AI agent instances at scale
- Assess the security posture of AI agent deployments
- Monitor for misconfigurations that expose sensitive data
- Track the proliferation of vulnerable AI agent implementations

### 1.3 Target Audience

- Security researchers and penetration testers
- Bug bounty hunters
- DevSecOps teams auditing AI deployments
- Cybersecurity educators and students
- Incident response teams

### 1.4 Success Metrics

- Ability to discover 90%+ of publicly indexed vulnerable instances
- Query execution time < 30 seconds per search engine
- False positive rate < 15%
- Support for 5+ search engines/data sources
- Exportable reports in multiple formats (JSON, CSV, HTML, PDF)

---
## 2. LEGAL & ETHICAL FRAMEWORK

### 2.1 Compliance Requirements

**MANDATORY IMPLEMENTATION:**

- Tool operates in **passive reconnaissance mode only** (no active exploitation)
- Strict adherence to search engine Terms of Service
- Rate limiting to prevent API abuse
- Legal disclaimer displayed on every execution
- User consent required before execution
- Logging of all queries for accountability

### 2.2 Responsible Disclosure

- Built-in responsible disclosure workflow
- Templates for notifying affected parties
- Integration with CVE/vulnerability databases
- Option to anonymize findings before reporting

### 2.3 Terms of Service Compliance

**Per Search Engine:**

- Shodan: Max 1 query/second, API key required
- Censys: Respect rate limits (120 requests/5 minutes)
- ZoomEye: Authentication required, rate limits enforced
- Google Dorking: robots.txt compliance, no automated queries without approval
- GitHub: API rate limits, token-based authentication
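
The per-engine limits above can be enforced with a small client-side limiter shared by all engine modules. A minimal sketch (the engine names and per-second rates mirror the list above; the class name is illustrative, not an existing API):

```python
import time

class RateLimiter:
    """Blocks so that successive acquire() calls for an engine
    never exceed its configured queries-per-second budget."""

    def __init__(self, rates_per_second):
        self._rates = rates_per_second   # e.g. {"shodan": 1.0}
        self._next_allowed = {}          # engine -> earliest allowed call time

    def acquire(self, engine):
        interval = 1.0 / self._rates[engine]
        now = time.monotonic()
        wait = self._next_allowed.get(engine, now) - now
        if wait > 0:
            time.sleep(wait)
        self._next_allowed[engine] = max(now, self._next_allowed.get(engine, now)) + interval

# Shodan: 1 query/second; Censys: 120 per 5 minutes = 0.4/second
limiter = RateLimiter({"shodan": 1.0, "censys": 0.4})
limiter.acquire("shodan")   # first call returns immediately
```

Each engine module would call `acquire()` before every outbound API request, keeping the tool inside the published limits regardless of how many queries a scan fans out.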

### 2.4 Legal Disclaimer Template

```
WARNING: This tool is for authorized security research and defensive purposes only.
Unauthorized access to computer systems is illegal under the CFAA (US), the Computer Misuse Act (UK),
and similar laws worldwide. Users are responsible for ensuring lawful use.

By proceeding, you acknowledge:
1. You have authorization to scan target systems
2. You will comply with all applicable laws and terms of service
3. You will responsibly disclose findings
4. You will not exploit discovered vulnerabilities
```
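
The §2.1 requirements that the disclaimer be shown on every execution and that consent be given can be wired into the CLI entry point. A minimal sketch (function names are illustrative; the `DISCLAIMER` constant would hold the full template above):

```python
import sys

DISCLAIMER = "WARNING: This tool is for authorized security research..."  # full text from the template above

def require_consent(answer: str) -> bool:
    """Return True only on an explicit affirmative acknowledgement."""
    return answer.strip().lower() in {"yes", "y", "i agree"}

def consent_gate(input_fn=input, output_fn=print) -> None:
    """Print the disclaimer and abort unless the user consents."""
    output_fn(DISCLAIMER)
    if not require_consent(input_fn("Type 'yes' to acknowledge and continue: ")):
        sys.exit("Consent not given; exiting.")
```

Injecting `input_fn`/`output_fn` keeps the gate unit-testable without a terminal, and `sys.exit` guarantees no scan logic runs before acknowledgement.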

---

## 3. FUNCTIONAL REQUIREMENTS

### 3.1 Core Features

#### 3.1.1 Multi-Source Search Integration

**Requirement ID:** FR-001

**Priority:** P0 (Critical)

**Description:**
Integrate with multiple search engines and threat intelligence platforms to maximize discovery coverage.

**Supported Data Sources:**

1. **Shodan**
   - API Endpoint: `https://api.shodan.io/shodan/host/search`
   - Authentication: API Key (user-provided)
   - Search capabilities: Port scanning, service fingerprinting, banner grabbing
   - Rate limit: 1 query/second (enforced client-side)
   - Data returned: IP, port, hostname, service, banner, SSL cert info

2. **Censys**
   - API Endpoint: `https://search.censys.io/api/v2/hosts/search`
   - Authentication: API ID + Secret
   - Search capabilities: Host search, certificate search
   - Rate limit: 120 requests per 5 minutes
   - Data returned: IP, services, protocols, certificates, autonomous system info

3. **ZoomEye**
   - API Endpoint: `https://api.zoomeye.org/host/search`
   - Authentication: JWT token
   - Search capabilities: Device fingerprinting, service identification
   - Rate limit: Variable by account type
   - Data returned: IP, port, service, device info, location

4. **GitHub Code Search**
   - API Endpoint: `https://api.github.com/search/code`
   - Authentication: Personal Access Token
   - Search capabilities: Code pattern matching, configuration file discovery
   - Rate limit: 30 requests/minute (authenticated)
   - Data returned: Repository, file path, code snippets, commit history

5. **Google Custom Search (Optional)**
   - API Endpoint: `https://customsearch.googleapis.com/customsearch/v1`
   - Authentication: API Key + Search Engine ID
   - Search capabilities: Dork-based discovery
   - Rate limit: 100 queries/day (free tier)
   - Data returned: URLs, snippets, page metadata

6. **GreyNoise**
   - API Endpoint: `https://api.greynoise.io/v3/community/`
   - Authentication: API Key
   - Purpose: Filter out internet scanners/noise
   - Rate limit: Variable by tier
   - Data returned: Classification (benign, malicious, unknown)

**Input Parameters:**

```python
{
    "search_engines": ["shodan", "censys", "zoomeye", "github"],
    "query_templates": {
        "shodan": 'product:"ClawdBot" http.title:"ClawdBot"',
        "censys": 'services.http.response.body: "ClawdBot"',
        "github": '"ANTHROPIC_API_KEY" filename:.env'
    },
    "max_results_per_engine": 100,
    "enable_rate_limiting": True
}
```

**Output Format:**

```json
{
  "scan_id": "uuid-v4",
  "timestamp": "ISO-8601",
  "source": "shodan",
  "results": [
    {
      "ip": "192.0.2.1",
      "port": 8080,
      "hostname": "example.com",
      "service": "http",
      "banner": "ClawdBot/1.2.3",
      "vulnerability_indicators": [
        "no_authentication",
        "exposed_api_keys",
        "shell_access_enabled"
      ],
      "risk_score": 9.2,
      "metadata": {}
    }
  ],
  "total_results": 42,
  "query_executed": "product:ClawdBot"
}
```

#### 3.1.2 Search Query Templates

**Requirement ID:** FR-002

**Priority:** P0 (Critical)

**Description:**
Pre-built, optimized search queries for discovering vulnerable AI agent implementations.

**Query Categories:**

**Category 1: Exposed AI Agent Dashboards**

```python
QUERY_TEMPLATES = {
    "clawdbot_instances": {
        "shodan": [
            'http.title:"ClawdBot Dashboard"',
            'http.html:"ClawdBot" port:3000',
            'product:"ClawdBot"'
        ],
        "censys": [
            'services.http.response.html_title: "ClawdBot"',
            'services.http.response.body: "anthropic"'
        ]
    },

    "autogpt_instances": {
        "shodan": [
            'http.title:"Auto-GPT"',
            'http.html:"autogpt" port:8000'
        ]
    },

    "langchain_agents": {
        "shodan": [
            'http.html:"langchain" http.html:"agent"',
            'product:"LangChain"'
        ]
    },

    "openai_playground_exposed": {
        "shodan": [
            'http.title:"OpenAI Playground"',
            'http.html:"sk-" http.html:"openai"'
        ]
    }
}
```

**Category 2: Exposed Configuration Files**

```python
CONFIG_EXPOSURE_QUERIES = {
    "github": [
        '"ANTHROPIC_API_KEY" extension:env',
        '"OPENAI_API_KEY" extension:json',
        '"claude-" AND "sk-ant-" filename:.env',
        'path:.env "API_KEY"',
        'filename:config.json "anthropic"'
    ],

    "google_dorks": [
        'inurl:.env "ANTHROPIC_API_KEY"',
        'filetype:json "openai_api_key"',
        'intitle:"index of" .env'
    ]
}
```

**Category 3: Vulnerable Service Patterns**

```python
VULNERABILITY_PATTERNS = {
    "no_authentication": {
        "shodan": 'http.status:200 "Authorization" -http.headers:"www-authenticate"',
        "censys": 'services.http.response.headers.www_authenticate: ""'
    },

    "exposed_api_endpoints": {
        "shodan": [
            'http.html:"/api/messages"',
            'http.html:"/api/execute"',
            'http.html:"/shell"'
        ]
    },

    "debug_mode_enabled": {
        "shodan": [
            'http.html:"DEBUG=True"',
            'http.html:"development mode"'
        ]
    }
}
```

**Query Customization:**

Users should be able to:

- Add custom queries via configuration file
- Combine multiple query patterns (AND/OR logic)
- Use regex patterns for flexible matching
- Save favorite queries for reuse
- Import/export query sets
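
The AND/OR combination requirement can stay engine-agnostic by joining pattern fragments with each engine's boolean syntax. A hedged sketch (the assumption that Shodan treats space-separated terms as an implicit AND and that other engines accept explicit `AND`/`OR` keywords should be verified per engine):

```python
def combine_queries(fragments, operator="AND", engine="shodan"):
    """Join query fragments using the engine's boolean syntax.

    Shodan treats space-separated terms as an implicit AND;
    other engines are assumed here to accept AND/OR keywords.
    """
    if engine == "shodan" and operator == "AND":
        return " ".join(fragments)  # implicit AND
    joiner = f" {operator} "
    # Parenthesize multi-term fragments so precedence stays explicit
    return joiner.join(f"({f})" if " " in f else f for f in fragments)

# e.g. merge two ClawdBot fingerprints into one Shodan query
q = combine_queries(['http.title:"ClawdBot Dashboard"', 'port:3000'])
```

Keeping combination logic in one helper means custom queries loaded from the configuration file compose the same way as the built-in templates.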

#### 3.1.3 Vulnerability Assessment Engine

**Requirement ID:** FR-003

**Priority:** P0 (Critical)

**Description:**
Analyze discovered instances for specific vulnerability indicators and assign risk scores.

**Vulnerability Checks:**

```python
VULNERABILITY_CHECKS = {
    "authentication": {
        "check_no_auth": {
            "method": "http_request",
            "endpoint": "/",
            "expected_status": [401, 403],
            "vulnerability_if": [200],
            "severity": "CRITICAL",
            "cvss_score": 9.1,
            "description": "No authentication required for access"
        },

        "check_default_credentials": {
            "method": "credential_test",
            "credentials": [
                {"username": "admin", "password": "admin"},
                {"username": "root", "password": "root"}
            ],
            "severity": "HIGH",
            "cvss_score": 8.5
        }
    },

    "information_disclosure": {
        "check_exposed_env": {
            "method": "path_check",
            "paths": ["/.env", "/config.json", "/.git/config"],
            "severity": "CRITICAL",
            "cvss_score": 9.8,
            "description": "Environment files publicly accessible"
        },

        "check_api_key_exposure": {
            "method": "regex_scan",
            "patterns": [
                r"sk-ant-[a-zA-Z0-9_-]{95}",  # Anthropic API key
                r"sk-[a-zA-Z0-9]{48}",        # OpenAI API key
                r"AKIA[0-9A-Z]{16}"           # AWS Access Key
            ],
            "severity": "CRITICAL",
            "cvss_score": 10.0
        }
    },

    "dangerous_functionality": {
        "check_shell_access": {
            "method": "endpoint_check",
            "endpoints": ["/shell", "/exec", "/api/execute"],
            "severity": "CRITICAL",
            "cvss_score": 9.9,
            "description": "Shell command execution accessible"
        },

        "check_file_upload": {
            "method": "endpoint_check",
            "endpoints": ["/upload", "/api/files"],
            "test_upload": False,  # Passive only
            "severity": "HIGH",
            "cvss_score": 7.8
        }
    },

    "network_exposure": {
        "check_public_ip": {
            "method": "network_check",
            "verify": "public_internet_accessible",
            "severity": "MEDIUM",
            "cvss_score": 6.5
        },

        "check_cloudflare": {
            "method": "header_check",
            "headers": ["cf-ray", "server"],
            "protected_if": "cloudflare_present",
            "severity": "INFO"
        }
    }
}
```
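
The `regex_scan` method can be implemented as a purely passive pass over response bodies that were already fetched during discovery. A minimal sketch reusing the key patterns above (matches are truncated before storage so full secrets never land in logs or reports):

```python
import re

KEY_PATTERNS = {
    "anthropic": re.compile(r"sk-ant-[a-zA-Z0-9_-]{95}"),
    "openai": re.compile(r"sk-[a-zA-Z0-9]{48}"),
    "aws": re.compile(r"AKIA[0-9A-Z]{16}"),
}

def scan_for_keys(body: str):
    """Return (provider, redacted_prefix) pairs for key-shaped strings."""
    hits = []
    for provider, pattern in KEY_PATTERNS.items():
        for match in pattern.finditer(body):
            hits.append((provider, match.group()[:12] + "..."))  # redact tail
    return hits

# e.g. scan_for_keys(page_html) on a fetched dashboard page
```

Redacting to a 12-character prefix keeps enough context to notify the owner while honoring the "no credential storage in logs" requirement in §4.4.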

**Risk Scoring Algorithm:**

```python
def calculate_risk_score(vulnerabilities):
    """
    Calculate overall risk score (0-10) based on discovered vulnerabilities.

    Formula:
    - Base score: highest CVSS score found
    - Multiplier: 1.0 + (0.1 * number_of_critical_vulns)
    - Capped at 10.0
    """
    # default=0.0 keeps the scorer safe on an empty vulnerability list
    base_score = max((v['cvss_score'] for v in vulnerabilities), default=0.0)
    severity_counts = {
        level: len([v for v in vulnerabilities if v['severity'] == level])
        for level in ('CRITICAL', 'HIGH', 'MEDIUM', 'LOW')
    }
    critical_count = severity_counts['CRITICAL']

    risk_score = min(base_score * (1.0 + (0.1 * critical_count)), 10.0)

    return {
        "overall_score": round(risk_score, 1),
        "severity_breakdown": {
            "critical": severity_counts['CRITICAL'],
            "high": severity_counts['HIGH'],
            "medium": severity_counts['MEDIUM'],
            "low": severity_counts['LOW']
        },
        "exploitability": "HIGH" if critical_count >= 2 else "MEDIUM"
    }
```
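
As a worked example of the formula (restated compactly here so the arithmetic is runnable on its own), two CRITICAL findings with CVSS 9.8 and 9.1 give base 9.8 and multiplier 1.2, i.e. 11.76, which the cap brings back to 10.0:

```python
# Compact restatement of the scoring formula above, for illustration only.
def score(vulns):
    base = max(v["cvss_score"] for v in vulns)
    crit = sum(1 for v in vulns if v["severity"] == "CRITICAL")
    return round(min(base * (1.0 + 0.1 * crit), 10.0), 1)

# Two CRITICAL findings: base 9.8, multiplier 1.2 -> 11.76, capped at 10.0
two_crit = score([{"severity": "CRITICAL", "cvss_score": 9.8},
                  {"severity": "CRITICAL", "cvss_score": 9.1}])   # -> 10.0

# A single HIGH finding passes through unchanged
one_high = score([{"severity": "HIGH", "cvss_score": 8.5}])       # -> 8.5
```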

#### 3.1.4 Data Enrichment

**Requirement ID:** FR-004

**Priority:** P1 (High)

**Description:**
Enrich discovered instances with additional context and threat intelligence.

**Enrichment Sources:**

1. **WHOIS Lookup**
   - Domain registration details
   - Registrant information (if public)
   - Creation/expiration dates
   - Nameservers

2. **Geolocation**
   - Country, city, region
   - ISP/hosting provider
   - ASN (Autonomous System Number)
   - Coordinates

3. **SSL/TLS Certificate Analysis**
   - Certificate validity
   - Issuer information
   - Subject Alternative Names (SANs)
   - Expiration date
   - Self-signed detection

4. **DNS Records**
   - A, AAAA, MX, TXT records
   - Subdomain enumeration (passive)
   - DNS security (DNSSEC, CAA)

5. **Historical Data**
   - First seen timestamp
   - Changes over time
   - Archive.org Wayback Machine lookups

6. **Threat Intelligence Integration**
   - GreyNoise classification
   - AbuseIPDB reputation score
   - Blocklist presence
   - Known malicious infrastructure

**Enrichment Output:**

```json
{
  "target": "192.0.2.1",
  "enrichment": {
    "whois": {
      "domain": "example.com",
      "registrar": "GoDaddy",
      "creation_date": "2020-01-15",
      "registrant_country": "US"
    },
    "geolocation": {
      "country": "United States",
      "city": "San Francisco",
      "isp": "DigitalOcean",
      "asn": "AS14061"
    },
    "ssl_certificate": {
      "valid": true,
      "issuer": "Let's Encrypt",
      "expiration": "2026-05-08",
      "self_signed": false,
      "sans": ["example.com", "www.example.com"]
    },
    "threat_intelligence": {
      "greynoise": {
        "classification": "benign",
        "last_seen": "2026-02-01"
      },
      "abuseipdb": {
        "abuse_confidence_score": 0,
        "total_reports": 0
      }
    },
    "historical": {
      "first_seen": "2025-12-01",
      "changes_detected": 3,
      "archive_url": "https://web.archive.org/..."
    }
  }
}
```
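
Part of the SSL/TLS analysis can happen offline once a certificate is in hand. A minimal sketch of deriving the `ssl_certificate` fields from parsed subject/issuer data (the input dict shape is an assumption, loosely modeled on what `ssl.getpeercert()` yields after field flattening):

```python
import datetime

def analyze_certificate(cert: dict) -> dict:
    """Derive ssl_certificate enrichment fields from a parsed cert.

    `cert` is assumed to carry 'subject' and 'issuer' (dicts of RDN
    fields) plus 'not_after' (a timezone-aware datetime).
    """
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        # A certificate whose subject equals its issuer is self-signed
        "self_signed": cert["subject"] == cert["issuer"],
        "valid": cert["not_after"] > now,
        "issuer": cert["issuer"].get("organizationName", "unknown"),
        "expiration": cert["not_after"].date().isoformat(),
    }

cert = {
    "subject": {"commonName": "example.com"},
    "issuer": {"organizationName": "Let's Encrypt", "commonName": "R3"},
    "not_after": datetime.datetime(2026, 5, 8, tzinfo=datetime.timezone.utc),
}
info = analyze_certificate(cert)   # self_signed -> False
```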

#### 3.1.5 Reporting & Export

**Requirement ID:** FR-005

**Priority:** P0 (Critical)

**Description:**
Generate comprehensive reports in multiple formats for different audiences.

**Report Formats:**

1. **JSON (Machine-readable)**

```json
{
  "scan_metadata": {
    "scan_id": "uuid",
    "timestamp": "ISO-8601",
    "duration_seconds": 45.3,
    "engines_used": ["shodan", "censys"],
    "total_results": 127
  },
  "summary": {
    "critical_findings": 12,
    "high_findings": 34,
    "medium_findings": 56,
    "low_findings": 25
  },
  "findings": [...]
}
```

2. **CSV (Spreadsheet-friendly)**

```
IP,Port,Hostname,Service,Risk Score,Vulnerabilities,First Seen,Location
192.0.2.1,8080,example.com,http,9.2,"no_auth,api_keys",2026-02-08,US
```

3. **HTML (Interactive Dashboard)**
   - Executive summary with charts
   - Filterable/sortable table of findings
   - Vulnerability distribution pie chart
   - Geographic heatmap
   - Timeline of discoveries
   - Clickable links to detailed findings

4. **PDF (Executive Report)**
   - Professional formatting
   - Executive summary (1 page)
   - Detailed findings with screenshots
   - Remediation recommendations
   - Appendices with technical details

5. **Markdown (Documentation)**
   - GitHub-compatible formatting
   - Suitable for issue tracking
   - Easy to version control
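
The CSV layout above maps directly onto the stdlib `csv` module; quoting of the comma-separated vulnerability list falls out for free. A minimal reporter sketch (field names follow the sample header; the finding dict shape is illustrative):

```python
import csv
import io

FIELDS = ["IP", "Port", "Hostname", "Service", "Risk Score",
          "Vulnerabilities", "First Seen", "Location"]

def write_csv(findings) -> str:
    """Render findings as CSV text matching the sample layout above."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for f in findings:
        row = dict(f)
        # Collapse the vulnerability list into one quoted cell
        row["Vulnerabilities"] = ",".join(f["Vulnerabilities"])
        writer.writerow(row)
    return buf.getvalue()

report = write_csv([{
    "IP": "192.0.2.1", "Port": 8080, "Hostname": "example.com",
    "Service": "http", "Risk Score": 9.2,
    "Vulnerabilities": ["no_auth", "api_keys"],
    "First Seen": "2026-02-08", "Location": "US",
}])
```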

**Report Sections:**

```markdown
# AI Agent Security Reconnaissance Report

## Executive Summary
- Total instances discovered: 127
- Critical vulnerabilities: 12
- Affected organizations: 8
- Average risk score: 7.3/10

## Key Findings
1. [CRITICAL] 12 ClawdBot instances with no authentication
2. [HIGH] 34 instances exposing API keys in source code
3. [MEDIUM] 56 instances with debug mode enabled

## Detailed Analysis
### Finding #1: Unauthenticated ClawdBot Dashboard
- **Target:** 192.0.2.1:8080
- **Risk Score:** 9.2/10
- **Vulnerabilities:**
  - No authentication required
  - Shell access enabled
  - API keys visible in JavaScript
- **Evidence:** [Screenshot/URL]
- **Remediation:** Implement authentication, remove shell access

## Geographic Distribution
[Map showing affected countries]

## Remediation Roadmap
1. Immediate: Disable unauthenticated instances
2. Short-term: Rotate exposed API keys
3. Long-term: Implement security hardening guide

## Appendix
- A: Full query list
- B: CVSS score methodology
- C: Responsible disclosure template
```

#### 3.1.6 Notification & Alerting

**Requirement ID:** FR-006

**Priority:** P2 (Medium)

**Description:**
Alert users when new vulnerable instances are discovered or critical findings emerge.

**Alert Channels:**

- Email (SMTP)
- Slack webhook
- Discord webhook
- Telegram bot
- SMS (Twilio integration)
- PagerDuty (for critical findings)

**Alert Triggers:**

```python
ALERT_RULES = {
    "new_critical_finding": {
        "condition": "risk_score >= 9.0",
        "throttle": "15_minutes",
        "channels": ["email", "slack"],
        "template": "critical_finding_alert"
    },

    "api_key_exposure": {
        "condition": "api_key_pattern_matched",
        "throttle": "immediate",
        "channels": ["email", "slack", "pagerduty"],
        "template": "api_key_leak_alert"
    },

    "bulk_discovery": {
        "condition": "new_findings >= 10",
        "throttle": "1_hour",
        "channels": ["email"],
        "template": "bulk_discovery_summary"
    }
}
```

**Alert Template Example:**

```
🚨 CRITICAL: Exposed AI Agent Instance Detected

Target: 192.0.2.1:8080
Service: ClawdBot Dashboard
Risk Score: 9.2/10

Vulnerabilities:
- No authentication required
- Shell command execution enabled
- Anthropic API key exposed (sk-ant-...)

Action Required:
1. Verify if this is your infrastructure
2. If yes: Immediately disable public access
3. If no: Report via responsible disclosure

View full details: [URL to report]
```
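
The `throttle` values in `ALERT_RULES` reduce to a per-rule timestamp check. A minimal sketch (the clock is injectable so the windowing logic is testable without sleeping; class and constant names are illustrative):

```python
import time

THROTTLE_SECONDS = {"immediate": 0, "15_minutes": 900, "1_hour": 3600}

class AlertThrottle:
    """Suppress repeat alerts for a rule until its window elapses."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_sent = {}   # rule name -> last send time

    def should_send(self, rule_name: str, throttle: str) -> bool:
        now = self._clock()
        window = THROTTLE_SECONDS[throttle]
        last = self._last_sent.get(rule_name)
        if last is not None and now - last < window:
            return False   # still inside the throttle window
        self._last_sent[rule_name] = now
        return True

# e.g. guard a dispatch with:
#   if throttle.should_send("new_critical_finding", "15_minutes"): send(...)
```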

---

### 3.2 Advanced Features

#### 3.2.1 Continuous Monitoring

**Requirement ID:** FR-007

**Priority:** P2 (Medium)

**Description:**
Scheduled scans to monitor for new vulnerable instances over time.

**Features:**

- Cron-like scheduling (hourly, daily, weekly)
- Delta reporting (show only new findings)
- Historical trending
- Automated baseline creation
- Change detection alerts

**Configuration:**

```yaml
monitoring:
  enabled: true
  schedule: "0 */6 * * *"  # Every 6 hours
  retention_days: 90
  delta_only: true
  alert_on_new: true

  targets:
    - name: "ClawdBot Instances"
      queries: ["clawdbot_instances"]
      baseline_date: "2026-02-01"
```
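
Delta reporting reduces to a set difference over a stable finding identity. A minimal sketch (keying findings on ip/port/service is an assumption; any stable tuple works):

```python
def finding_key(f):
    """Stable identity for a finding across scans."""
    return (f["ip"], f["port"], f.get("service", ""))

def delta(previous, current):
    """Return only the findings absent from the previous scan."""
    seen = {finding_key(f) for f in previous}
    return [f for f in current if finding_key(f) not in seen]

baseline = [{"ip": "192.0.2.1", "port": 8080, "service": "http"}]
latest = baseline + [{"ip": "192.0.2.2", "port": 3000, "service": "http"}]
new_findings = delta(baseline, latest)   # only the 192.0.2.2 entry
```

The same keys double as the "first seen" lookup for historical trending: a key absent from every prior scan marks a first observation.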

#### 3.2.2 False Positive Filtering

**Requirement ID:** FR-008

**Priority:** P1 (High)

**Description:**
Machine learning-based or rule-based filtering to reduce false positives.

**Filtering Strategies:**

1. **Honeypot Detection**
   - Identify intentional honeypots
   - Check for deceptive characteristics
   - Cross-reference with known honeypot databases

2. **Legitimate Development Environments**
   - Detect localhost/internal IPs
   - Identify staging/dev subdomain patterns
   - Check for development framework banners

3. **Whitelisting**
   - User-defined whitelist (IP ranges, domains)
   - Trusted organization list
   - Known security research infrastructure

4. **Confidence Scoring**

```python
def calculate_confidence(finding):
    confidence = 100

    # Deduct points for uncertainty indicators
    if finding.get('honeypot_indicators'):
        confidence -= 30
    if finding.get('inconclusive_response'):
        confidence -= 20
    if not finding.get('service_confirmed'):
        confidence -= 15

    return max(confidence, 0)
```
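
Whitelisting by IP range is a natural fit for the stdlib `ipaddress` module; the sample CIDRs below are documentation ranges, not real infrastructure:

```python
import ipaddress

def build_whitelist(cidrs):
    """Parse CIDR strings into network objects once, up front."""
    return [ipaddress.ip_network(c) for c in cidrs]

def is_whitelisted(ip: str, networks) -> bool:
    """True if ip falls inside any whitelisted CIDR range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

nets = build_whitelist(["198.51.100.0/24", "203.0.113.7/32"])
is_whitelisted("198.51.100.42", nets)   # True
```

Findings that hit the whitelist would be tagged rather than silently dropped, so the audit log still records what was seen.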

#### 3.2.3 Integration with Vulnerability Databases

**Requirement ID:** FR-009

**Priority:** P2 (Medium)

**Description:**
Cross-reference findings with known vulnerabilities and exploits.

**Integrations:**

- CVE/NVD database
- ExploitDB
- Metasploit modules
- GitHub security advisories
- Vendor-specific security bulletins

**Output Enhancement:**

```json
{
  "finding": {
    "service": "ClawdBot 1.2.3",
    "related_cves": [
      {
        "cve_id": "CVE-2024-XXXXX",
        "description": "Authentication bypass in ClawdBot < 1.3.0",
        "cvss_score": 9.8,
        "exploit_available": true,
        "exploit_url": "https://exploit-db.com/..."
      }
    ]
  }
}
```

#### 3.2.4 Responsible Disclosure Workflow

**Requirement ID:** FR-010

**Priority:** P1 (High)

**Description:**
Built-in workflow for responsible disclosure of findings.

**Workflow Steps:**

1. **Identify Contact**
   - Extract abuse@ / security@ email from WHOIS
   - Check for a security.txt file
   - Look up the organization on HackerOne/Bugcrowd
   - Search for CERT/PSIRT contacts

2. **Generate Disclosure Report**

```markdown
Subject: Security Issue: Exposed AI Agent Instance

Dear Security Team,

I have discovered a potentially vulnerable AI agent deployment:

**Affected System:**
- URL: https://example.com:8080
- IP: 192.0.2.1
- Discovered: 2026-02-08

**Vulnerability Summary:**
- No authentication required for dashboard access
- Potential exposure of API credentials
- Shell command execution interface accessible

**Risk Assessment:**
- Severity: Critical (CVSS 9.2)
- Exploitability: High
- Impact: Confidentiality, Integrity, Availability

**Recommended Actions:**
1. Implement authentication immediately
2. Audit for credential exposure
3. Review access logs for unauthorized access

I am available to provide additional details if needed.

This disclosure follows a 90-day timeline.

Best regards,
[Your Name]
Security Researcher
```

3. **Track Disclosure**
   - Status tracking (reported, acknowledged, fixed, disclosed)
   - Countdown timer for public disclosure
   - Communication log
   - PGP encryption support for sensitive details

4. **Public Disclosure**
   - Anonymize or redact sensitive details
   - Publish to blog/database
   - Submit to CVE if applicable
   - Credit researchers appropriately

---
## 4. NON-FUNCTIONAL REQUIREMENTS

### 4.1 Performance

**Requirement ID:** NFR-001

**Priority:** P0 (Critical)

- Query execution: < 30 seconds per search engine
- Vulnerability assessment: < 5 seconds per target
- Report generation: < 10 seconds for 1000 findings
- Memory usage: < 500 MB for typical scans
- Concurrent requests: Support 10+ parallel API calls
- Cache frequently accessed data (TTL: 1 hour)

### 4.2 Scalability

**Requirement ID:** NFR-002

**Priority:** P1 (High)

- Handle 10,000+ results per scan
- Support distributed scanning (future: multi-node)
- Queue-based architecture for large jobs
- Pagination for API responses
- Database storage for historical data (SQLite → PostgreSQL)

### 4.3 Reliability

**Requirement ID:** NFR-003

**Priority:** P0 (Critical)

- Graceful handling of API failures
- Retry logic with exponential backoff (3 attempts)
- Timeout handling (30s per request)
- Partial results on engine failure
- State persistence (resume interrupted scans)
- Error logging with stack traces
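
The retry requirement (3 attempts, exponential backoff) can be captured in one helper that every engine call goes through. A minimal sketch (the 1-second base delay is illustrative; `sleep` is injectable for tests):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on exception with exponential backoff.

    Delays grow as base_delay * 2**n between attempts; the last
    exception is re-raised once attempts are exhausted.
    """
    for n in range(attempts):
        try:
            return fn()
        except Exception:
            if n == attempts - 1:
                raise
            sleep(base_delay * (2 ** n))
```

Usage would look like `with_retries(lambda: client.search(query))`, keeping backoff policy out of the individual engine modules.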

### 4.4 Security

**Requirement ID:** NFR-004

**Priority:** P0 (Critical)

- API keys stored encrypted (AES-256)
- No credential storage in logs
- HTTPS-only for API communications
- Input validation on all user inputs
- Rate limiting enforcement
- Audit logging of all scans
- Secrets management via environment variables or vault

### 4.5 Usability

**Requirement ID:** NFR-005

**Priority:** P1 (High)

- CLI interface with intuitive commands
- Progress indicators for long-running tasks
- Colored output for severity levels
- Interactive mode for configuration
- Built-in help documentation
- Example queries provided
- Wizard for first-time setup

### 4.6 Maintainability

**Requirement ID:** NFR-006

**Priority:** P1 (High)

- Modular architecture (plugins for search engines)
- Comprehensive inline documentation
- Type hints for all functions
- Unit test coverage > 80%
- Integration tests for critical paths
- Configuration via YAML/JSON
- Logging at appropriate levels (DEBUG, INFO, WARNING, ERROR)

### 4.7 Portability

**Requirement ID:** NFR-007

**Priority:** P2 (Medium)

- Cross-platform (Linux, macOS, Windows)
- Python 3.9+ compatibility
- Docker containerization
- Minimal external dependencies
- Standalone binary distribution (PyInstaller)

---
## 5. TECHNICAL ARCHITECTURE

### 5.1 System Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                       CLI Interface                         │
│                  (argparse + rich/click)                    │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                       Core Engine                           │
│  - Query Manager                                            │
│  - Result Aggregator                                        │
│  - Vulnerability Assessor                                   │
└────────────────────────┬────────────────────────────────────┘
                         │
         ┌───────────────┼───────────────┬────────────────┐
         ▼               ▼               ▼                ▼
┌─────────────┐  ┌─────────────┐  ┌──────────┐  ┌──────────────┐
│   Shodan    │  │   Censys    │  │  GitHub  │  │   ZoomEye    │
│   Module    │  │   Module    │  │  Module  │  │   Module     │
└─────────────┘  └─────────────┘  └──────────┘  └──────────────┘
         │               │               │                │
         └───────────────┴───────────────┴────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                  Data Enrichment Layer                      │
│  - WHOIS | Geolocation | SSL | DNS | Threat Intel           │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                      Storage Layer                          │
│  - SQLite (local) / PostgreSQL (production)                 │
│  - File system (JSON/CSV exports)                           │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    Reporting Engine                         │
│  - JSON | CSV | HTML | PDF | Markdown                       │
└─────────────────────────────────────────────────────────────┘
```

### 5.2 Data Flow

```
User Input → Query Builder → API Clients → Raw Results →
Deduplication → Vulnerability Assessment → Enrichment →
Filtering → Risk Scoring → Storage → Report Generation →
Output (File/Screen/Alert)
```

### 5.3 Database Schema

```sql
-- SQLite/PostgreSQL schema

CREATE TABLE scans (
    scan_id TEXT PRIMARY KEY,
    timestamp DATETIME NOT NULL,
    engines_used TEXT,        -- JSON array
    total_results INTEGER,
    duration_seconds REAL,
    status TEXT,              -- 'completed', 'failed', 'partial'
    metadata TEXT             -- JSON
);

CREATE TABLE findings (
    finding_id TEXT PRIMARY KEY,
    scan_id TEXT REFERENCES scans(scan_id),
    source_engine TEXT,
    target_ip TEXT,
    target_port INTEGER,
    target_hostname TEXT,
    service TEXT,
    risk_score REAL,
    vulnerabilities TEXT,     -- JSON array
    first_seen DATETIME,
    last_seen DATETIME,
    status TEXT,              -- 'new', 'confirmed', 'false_positive', 'remediated'
    metadata TEXT             -- JSON
);

CREATE TABLE enrichment_data (
    finding_id TEXT REFERENCES findings(finding_id),
    whois_data TEXT,          -- JSON
    geolocation TEXT,         -- JSON
    ssl_cert TEXT,            -- JSON
    threat_intel TEXT,        -- JSON
    timestamp DATETIME
);

CREATE TABLE alerts (
    alert_id TEXT PRIMARY KEY,
    finding_id TEXT REFERENCES findings(finding_id),
    severity TEXT,
    channel TEXT,
    sent_at DATETIME,
    acknowledged BOOLEAN DEFAULT FALSE
);

CREATE INDEX idx_findings_risk ON findings(risk_score DESC);
CREATE INDEX idx_findings_timestamp ON findings(first_seen DESC);
```
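
The schema runs unchanged against the stdlib `sqlite3` driver. A minimal sketch storing one finding (only the two core tables, with columns abbreviated for brevity; JSON-valued columns are serialized with `json.dumps` as the comments above indicate):

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scans (scan_id TEXT PRIMARY KEY, timestamp DATETIME NOT NULL,
                    engines_used TEXT, total_results INTEGER);
CREATE TABLE findings (finding_id TEXT PRIMARY KEY,
                       scan_id TEXT REFERENCES scans(scan_id),
                       target_ip TEXT, target_port INTEGER,
                       risk_score REAL, vulnerabilities TEXT);
""")

scan_id = str(uuid.uuid4())
conn.execute("INSERT INTO scans VALUES (?, datetime('now'), ?, ?)",
             (scan_id, json.dumps(["shodan"]), 1))
conn.execute("INSERT INTO findings VALUES (?, ?, ?, ?, ?, ?)",
             (str(uuid.uuid4()), scan_id, "192.0.2.1", 8080, 9.2,
              json.dumps(["no_authentication"])))

row = conn.execute("SELECT target_ip, risk_score FROM findings").fetchone()
```

Parameterized `?` placeholders keep scan data out of the SQL string itself, consistent with the input-validation requirement in §4.4.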

### 5.4 Configuration Management

**Configuration File: `config.yaml`**

```yaml
# API Credentials
api_keys:
  shodan: ${SHODAN_API_KEY}  # Environment variable
  censys:
    api_id: ${CENSYS_API_ID}
    secret: ${CENSYS_SECRET}
  zoomeye: ${ZOOMEYE_API_KEY}
  github: ${GITHUB_TOKEN}
  greynoise: ${GREYNOISE_API_KEY}

# Search Engine Configuration
engines:
  shodan:
    enabled: true
    rate_limit: 1  # queries per second
    max_results: 100
    timeout: 30

  censys:
    enabled: true
    rate_limit: 0.4  # 120 per 5 min = 0.4/sec
    max_results: 100
    timeout: 30

  github:
    enabled: true
    rate_limit: 0.5  # 30 per min
    max_results: 50
    timeout: 30

# Vulnerability Assessment
vulnerability_checks:
  enabled: true
  passive_only: true
  timeout_per_check: 10
  max_concurrent: 5

# Enrichment
enrichment:
  enabled: true
  sources:
    - whois
    - geolocation
    - ssl_certificate
    - threat_intel
  cache_ttl: 3600  # 1 hour

# Reporting
reporting:
  formats:
    - json
    - csv
    - html
  output_dir: "./reports"
  include_screenshots: false
  anonymize_by_default: false

# Alerting
alerts:
  enabled: false
  channels:
    email:
      smtp_server: "smtp.gmail.com"
      smtp_port: 587
      from_address: "alerts@example.com"
      to_addresses:
        - "security@example.com"
    slack:
      webhook_url: ${SLACK_WEBHOOK_URL}

# Monitoring
monitoring:
  enabled: false
  schedule: "0 */12 * * *"
  retention_days: 90

# Filtering
filtering:
  whitelist_ips: []
  whitelist_domains: []
  min_confidence_score: 70
  exclude_honeypots: true

# Logging
logging:
  level: "INFO"  # DEBUG, INFO, WARNING, ERROR
  file: "./logs/scanner.log"
  max_size_mb: 100
  backup_count: 5

# Database
database:
  type: "sqlite"  # sqlite or postgresql
  sqlite:
    path: "./data/scanner.db"
  postgresql:
    host: "localhost"
    port: 5432
    database: "aasrt"
    user: ${DB_USER}
    password: ${DB_PASSWORD}
```
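
The `${VAR}` placeholders can be expanded at load time, before the YAML text reaches the parser; `string.Template` in the stdlib uses exactly that syntax. A minimal sketch of the loader step (the example key value is obviously illustrative):

```python
import os
from string import Template

def expand_env(raw_text: str) -> str:
    """Replace ${VAR} placeholders with environment values.

    safe_substitute leaves unknown variables in place instead of
    raising, so a missing key surfaces later as an obvious literal.
    """
    return Template(raw_text).safe_substitute(os.environ)

os.environ["SHODAN_API_KEY"] = "demo-key"   # illustration only
snippet = "api_keys:\n  shodan: ${SHODAN_API_KEY}\n"
expanded = expand_env(snippet)
```

Expanding before parsing keeps secrets out of the checked-in config file while letting the rest of the tool see plain values.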
|
|
|
|
---
|
|
|
|
## 6. PYTHON IMPLEMENTATION SPECIFICATIONS
|
|
|
|
### 6.1 Project Structure
|
|
|
|
```
ai-agent-security-scanner/
├── src/
│   ├── __init__.py
│   ├── main.py                        # CLI entry point
│   ├── core/
│   │   ├── __init__.py
│   │   ├── query_manager.py           # Query building & execution
│   │   ├── result_aggregator.py       # Result deduplication & merging
│   │   ├── vulnerability_assessor.py  # Vulnerability checks
│   │   └── risk_scorer.py             # Risk calculation
│   ├── engines/
│   │   ├── __init__.py
│   │   ├── base.py                    # Abstract base class
│   │   ├── shodan_engine.py
│   │   ├── censys_engine.py
│   │   ├── zoomeye_engine.py
│   │   ├── github_engine.py
│   │   └── google_engine.py
│   ├── enrichment/
│   │   ├── __init__.py
│   │   ├── whois.py
│   │   ├── geolocation.py
│   │   ├── ssl_analyzer.py
│   │   ├── dns.py
│   │   └── threat_intel.py
│   ├── reporting/
│   │   ├── __init__.py
│   │   ├── json_reporter.py
│   │   ├── csv_reporter.py
│   │   ├── html_reporter.py
│   │   ├── pdf_reporter.py
│   │   └── markdown_reporter.py
│   ├── storage/
│   │   ├── __init__.py
│   │   ├── database.py                # Database abstraction
│   │   └── cache.py                   # Caching layer
│   ├── alerts/
│   │   ├── __init__.py
│   │   ├── email_notifier.py
│   │   ├── slack_notifier.py
│   │   └── webhook_notifier.py
│   └── utils/
│       ├── __init__.py
│       ├── config.py                  # Configuration loader
│       ├── logger.py                  # Logging setup
│       ├── validators.py              # Input validation
│       └── crypto.py                  # Encryption helpers
├── tests/
│   ├── __init__.py
│   ├── test_engines/
│   ├── test_vulnerability_assessor.py
│   ├── test_enrichment.py
│   └── test_reporting.py
├── queries/
│   ├── clawdbot.yaml                  # ClawdBot query templates
│   ├── autogpt.yaml
│   └── custom.yaml.example
├── reports/                           # Generated reports (gitignored)
├── logs/                              # Log files (gitignored)
├── data/                              # Database files (gitignored)
├── config.yaml.example                # Example configuration
├── requirements.txt                   # Python dependencies
├── Dockerfile                         # Docker image
├── docker-compose.yml                 # Docker Compose setup
├── README.md                          # Project documentation
├── LICENSE                            # GPL-3.0 or MIT
└── .env.example                       # Environment variables example
```
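For illustration, the deduplication performed by `result_aggregator.py` might look like the following sketch: results from different engines that describe the same `(ip, port)` endpoint are merged, keeping the union of source engines so analysts can see cross-engine corroboration. Plain dicts are used here instead of the project's `SearchResult` dataclass to keep the snippet self-contained; the field names mirror that dataclass.

```python
from typing import Any, Dict, Iterable, List, Tuple

def deduplicate(results: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge results that refer to the same (ip, port) endpoint."""
    merged: Dict[Tuple[str, int], Dict[str, Any]] = {}
    for r in results:
        key = (r["ip"], r["port"])
        if key not in merged:
            merged[key] = {**r, "source_engines": {r.get("source_engine")}}
        else:
            merged[key]["source_engines"].add(r.get("source_engine"))
            # Prefer the first non-empty value for optional fields.
            for field in ("hostname", "banner", "service"):
                if not merged[key].get(field) and r.get(field):
                    merged[key][field] = r[field]
    out = []
    for m in merged.values():
        m["source_engines"] = sorted(e for e in m["source_engines"] if e)
        m.pop("source_engine", None)
        out.append(m)
    return out
```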

### 6.2 Required Python Libraries
```txt
# requirements.txt

# API Clients
shodan==1.31.0
censys==2.2.12
requests==2.31.0

# Data Processing
pandas==2.2.0
python-whois==0.9.2
dnspython==2.6.1
pyOpenSSL==24.0.0
cryptography==42.0.5

# Database
sqlalchemy==2.0.28
psycopg2-binary==2.9.9   # PostgreSQL adapter

# Reporting
jinja2==3.1.3            # HTML templates
matplotlib==3.8.3        # Charts
reportlab==4.1.0         # PDF generation
markdown==3.5.2

# CLI & UI
click==8.1.7             # CLI framework
rich==13.7.0             # Rich text formatting
tqdm==4.66.2             # Progress bars

# Utilities
pyyaml==6.0.1
python-dotenv==1.0.1
pydantic==2.6.3          # Data validation
validators==0.22.0

# Networking
geoip2==4.7.0
python-nmap==0.7.1       # Optional
# Note: ipaddress and asyncio are part of the Python 3 standard library;
# do not install the same-named PyPI backports.

# Security
keyring==25.0.0          # Secure credential storage
bcrypt==4.1.2

# Testing
pytest==8.0.2
pytest-cov==4.1.0
pytest-mock==3.12.0

# Async HTTP (optional, for performance)
aiohttp==3.9.3
```

### 6.3 Core Classes & Interfaces

**Abstract Base Class for Search Engines:**
```python
# engines/base.py

import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class SearchResult:
    ip: str
    port: int
    hostname: Optional[str] = None
    service: Optional[str] = None
    banner: Optional[str] = None
    vulnerabilities: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)
    source_engine: Optional[str] = None

class BaseSearchEngine(ABC):
    """Abstract base class for all search engine integrations."""

    def __init__(self, api_key: str, rate_limit: float, timeout: int):
        self.api_key = api_key
        self.rate_limit = rate_limit  # queries per second
        self.timeout = timeout

    @abstractmethod
    def search(self, query: str, max_results: int = 100) -> List[SearchResult]:
        """
        Execute a search query and return results.

        Args:
            query: Search query string
            max_results: Maximum number of results to return

        Returns:
            List of SearchResult objects

        Raises:
            APIException: If the API call fails
            RateLimitException: If the rate limit is exceeded
        """

    @abstractmethod
    def validate_credentials(self) -> bool:
        """Validate API credentials."""

    def _rate_limit_wait(self) -> None:
        """Implement rate limiting."""
        time.sleep(1.0 / self.rate_limit)
```
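As an illustration of how a concrete engine might satisfy this interface, here is a self-contained sketch. It is not the project's real `shodan_engine.py`: the API client is injected (in production it would be `shodan.Shodan(api_key)`) so the parsing and rate-limiting logic is testable offline, the result field names (`matches`, `ip_str`, `port`, `hostnames`, `data`) follow Shodan's documented JSON format, and a minimal `SearchResult` is re-declared so the snippet runs on its own.

```python
import time
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class SearchResult:  # mirrors the dataclass in engines/base.py
    ip: str
    port: int
    hostname: Optional[str] = None
    banner: Optional[str] = None
    source_engine: Optional[str] = None

class ShodanEngine:
    def __init__(self, client: Any, rate_limit: float = 1.0):
        self.client = client          # e.g. shodan.Shodan(api_key)
        self.rate_limit = rate_limit  # queries per second
        self._last_call = 0.0

    def _rate_limit_wait(self) -> None:
        # Sleep only for the remainder of the window since the last call.
        elapsed = time.monotonic() - self._last_call
        wait = (1.0 / self.rate_limit) - elapsed
        if wait > 0:
            time.sleep(wait)
        self._last_call = time.monotonic()

    def search(self, query: str, max_results: int = 100) -> List[SearchResult]:
        self._rate_limit_wait()
        response = self.client.search(query, limit=max_results)
        return [
            SearchResult(
                ip=m["ip_str"],
                port=m["port"],
                hostname=(m.get("hostnames") or [None])[0],
                banner=m.get("data"),
                source_engine="shodan",
            )
            for m in response.get("matches", [])
        ]
```

Tracking the last call with `time.monotonic()` avoids the fixed `sleep(1/rate_limit)` penalty when calls are already naturally spaced out.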

**Vulnerability Assessor Interface:**
```python
# core/vulnerability_assessor.py

from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

from src.engines.base import SearchResult

@dataclass
class Vulnerability:
    check_name: str
    severity: str  # CRITICAL, HIGH, MEDIUM, LOW, INFO
    cvss_score: float
    description: str
    evidence: Dict[str, Any] = field(default_factory=dict)
    remediation: Optional[str] = None

class VulnerabilityAssessor:
    """Performs vulnerability checks on discovered targets."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.passive_only = config.get('passive_only', True)

    def assess(self, target: SearchResult) -> List[Vulnerability]:
        """
        Run all applicable vulnerability checks on a target.

        Args:
            target: SearchResult object to assess

        Returns:
            List of Vulnerability objects
        """
        vulnerabilities = []

        # Run checks
        vulnerabilities.extend(self._check_authentication(target))
        vulnerabilities.extend(self._check_information_disclosure(target))
        vulnerabilities.extend(self._check_dangerous_functionality(target))
        vulnerabilities.extend(self._check_network_exposure(target))

        return vulnerabilities

    def _check_authentication(self, target: SearchResult) -> List[Vulnerability]:
        """Check for authentication issues."""
        # Implementation details
        pass

    def _check_information_disclosure(self, target: SearchResult) -> List[Vulnerability]:
        """Check for exposed sensitive information."""
        # Implementation details
        pass
```
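As one possible shape for `_check_information_disclosure`, here is a passive, regex-based sketch that scans the banner already returned by the search engine (no new traffic to the target) for credential-shaped strings. The patterns, finding fields, and redaction rule are illustrative assumptions, not part of the spec; real patterns would need tuning against false-positive data.

```python
import re
from typing import Any, Dict, List, Optional

# Illustrative credential patterns; tune before production use.
API_KEY_PATTERNS = {
    "anthropic": re.compile(r"sk-ant-[A-Za-z0-9\-_]{8,}"),
    "openai": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
}

def check_information_disclosure(banner: Optional[str]) -> List[Dict[str, Any]]:
    """Scan a service banner for credential-shaped strings (passive check)."""
    findings = []
    if not banner:
        return findings
    for provider, pattern in API_KEY_PATTERNS.items():
        match = pattern.search(banner)
        if match:
            token = match.group(0)
            findings.append({
                "check_name": "api_key_exposure",
                "severity": "CRITICAL",
                "cvss_score": 10.0,
                "description": f"Possible {provider} API key in service banner",
                # Store only a redacted preview, never the full secret.
                "evidence": {"api_key_preview": token[:12] + "..."},
            })
    return findings
```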

**CLI Interface Specification:**
```python
# main.py

import click
from rich.console import Console
from rich.table import Table

@click.group()
@click.version_option(version='1.0.0')
def cli():
    """AI Agent Security Reconnaissance Tool (AASRT)"""
    pass

@cli.command()
@click.option('--engine', '-e', multiple=True,
              type=click.Choice(['shodan', 'censys', 'zoomeye', 'github', 'google', 'all']),
              default=['all'], help='Search engines to use')
@click.option('--query', '-q', help='Custom search query')
@click.option('--template', '-t', help='Use predefined query template')
@click.option('--max-results', '-m', default=100, help='Max results per engine')
@click.option('--output', '-o', type=click.Path(), help='Output file path')
@click.option('--format', '-f',
              type=click.Choice(['json', 'csv', 'html', 'pdf', 'markdown']),
              default='json', help='Output format')
@click.option('--verbose', '-v', is_flag=True, help='Verbose output')
def scan(engine, query, template, max_results, output, format, verbose):
    """Perform a security reconnaissance scan."""
    console = Console()

    # Display legal disclaimer
    if not click.confirm('Have you read and agreed to the terms of use?'):
        console.print('[red]Scan aborted. You must agree to terms of use.[/red]')
        return

    # Execute scan
    console.print('[green]Starting scan...[/green]')
    # Implementation...

@cli.command()
@click.argument('scan_id')
def report(scan_id):
    """Generate a report from a previous scan."""
    # Implementation...

@cli.command()
def configure():
    """Interactive configuration wizard."""
    # Implementation...

@cli.command()
@click.option('--enable/--disable', default=True)
def monitor(enable):
    """Enable or disable continuous monitoring."""
    # Implementation...

if __name__ == '__main__':
    cli()
```

### 6.4 Error Handling Strategy
```python
# Custom Exceptions

import logging
import time
from functools import wraps

logger = logging.getLogger(__name__)

class AASRTException(Exception):
    """Base exception for AASRT."""

class APIException(AASRTException):
    """Raised when API call fails."""

class RateLimitException(AASRTException):
    """Raised when rate limit is exceeded."""

class ConfigurationException(AASRTException):
    """Raised when configuration is invalid."""

class ValidationException(AASRTException):
    """Raised when input validation fails."""

# Error Handling Pattern

def search_with_retry(func, max_retries=3):
    """Decorator for API calls with retry logic."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        for attempt in range(max_retries):
            try:
                return func(*args, **kwargs)
            except RateLimitException:
                if attempt < max_retries - 1:
                    wait_time = 2 ** attempt  # Exponential backoff
                    logger.warning(f"Rate limit hit. Retrying in {wait_time}s...")
                    time.sleep(wait_time)
                else:
                    raise
            except APIException as e:
                logger.error(f"API error on attempt {attempt + 1}: {e}")
                if attempt == max_retries - 1:
                    raise

    return wrapper
```

---

## 7. SECURITY & COMPLIANCE

### 7.1 Ethical Use Policy

**CRITICAL REQUIREMENTS:**
1. **Passive Reconnaissance Only**
   - No exploitation of discovered vulnerabilities
   - No brute force attacks
   - No unauthorized access attempts
   - No modification of target systems

2. **Responsible Disclosure**
   - Mandatory 90-day disclosure window
   - Attempt to contact affected parties
   - Anonymize sensitive details in public reports
   - Coordinate with CERTs when appropriate

3. **Legal Compliance**
   - CFAA compliance (US)
   - GDPR compliance (EU)
   - Computer Misuse Act compliance (UK)
   - Local jurisdiction laws

4. **User Agreement**
   - Explicit acceptance of terms before use
   - Logging of user acknowledgment
   - Clear documentation of permitted uses

### 7.2 Data Privacy

**Personal Data Handling:**
- Do not collect personal information beyond what's publicly indexed
- Anonymize IP addresses in shared reports (option to mask last octet)
- Do not store API responses containing PII indefinitely
- Provide data deletion capabilities
- GDPR right-to-be-forgotten support
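The last-octet masking option could be sketched as follows (the IPv6 branch is an assumption; the spec only mentions the IPv4 last octet, and /48 is a common truncation choice for IPv6 anonymization):

```python
import ipaddress

def anonymize_ip(address: str) -> str:
    """Mask the host portion of an IP for shared reports:
    the last octet for IPv4, everything past the /48 prefix for IPv6."""
    ip = ipaddress.ip_address(address)  # raises ValueError on garbage input
    if ip.version == 4:
        octets = address.split(".")
        return ".".join(octets[:3] + ["x"])
    net = ipaddress.ip_network(f"{address}/48", strict=False)
    return f"{net.network_address}/48"
```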

**API Key Security:**
- Never log API keys
- Encrypt stored credentials (AES-256)
- Support environment variable injection
- Warn on insecure configurations
- Key rotation reminders
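The credential lookup order implied by these requirements — OS keyring first, then environment variable, then (with a warning) plaintext config — could be sketched as below. `keyring_lookup` is injected so the logic is testable without a keyring backend; in production it could be something like `functools.partial(keyring.get_password, "aasrt")`. The function name and warning wording are illustrative.

```python
import os
import warnings
from typing import Callable, Optional

def resolve_api_key(
    name: str,
    config_value: Optional[str] = None,
    keyring_lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Resolve an API key: keyring, then environment, then plaintext config."""
    if keyring_lookup:
        stored = keyring_lookup(name)
        if stored:
            return stored
    env_value = os.environ.get(name)
    if env_value:
        return env_value
    if config_value:
        # Warn on the insecure fallback, per the requirements above.
        warnings.warn(f"{name} loaded from plaintext config; prefer keyring or env vars")
    return config_value
```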

### 7.3 Audit Trail

**Required Logging (audit log entry format):**
```json
{
  "timestamp": "ISO-8601",
  "user": "username_or_id",
  "action": "scan_initiated",
  "engines": ["shodan", "censys"],
  "query": "product:ClawdBot",
  "results_count": 42,
  "ip_address": "user_ip",
  "acknowledged_terms": true
}
```
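A minimal writer that appends entries in this format could look like the following sketch (the function name and the JSON Lines layout are illustrative choices, not part of the spec):

```python
import datetime
import json
from pathlib import Path
from typing import Any, Dict

def write_audit_entry(log_path: str, user: str, action: str, **fields: Any) -> Dict[str, Any]:
    """Append one audit record as a JSON Lines entry and return it."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user,
        "action": action,
        **fields,
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

JSON Lines keeps appends atomic per record and lets retention jobs truncate by line without re-parsing the whole file.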

**Log Retention:**
- Audit logs: 1 year minimum
- Scan results: Configurable (default 90 days)
- Error logs: 30 days
---

## 8. DEPLOYMENT & OPERATIONS

### 8.1 Installation Methods

**Method 1: pip install (PyPI)**
```bash
pip install ai-agent-scanner
aasrt configure  # Interactive setup
aasrt scan --template clawdbot
```

**Method 2: From source**
```bash
git clone https://github.com/yourusername/ai-agent-scanner.git
cd ai-agent-scanner
pip install -r requirements.txt
python -m src.main scan --help
```

**Method 3: Docker**
```bash
docker pull yourusername/aasrt:latest
docker run -it \
  -e SHODAN_API_KEY=your_key \
  -v $(pwd)/reports:/app/reports \
  yourusername/aasrt:latest scan --template clawdbot
```

**Method 4: Standalone binary**
```bash
# PyInstaller-generated executable
./aasrt-linux-x64 scan --help
```

### 8.2 Docker Configuration

**Dockerfile:**
```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY src/ ./src/
COPY queries/ ./queries/
COPY config.yaml.example ./config.yaml

# Create volumes for persistent data
VOLUME ["/app/reports", "/app/data", "/app/logs"]

# Environment variables
ENV PYTHONUNBUFFERED=1

ENTRYPOINT ["python", "-m", "src.main"]
CMD ["--help"]
```

**docker-compose.yml:**
```yaml
version: '3.8'

services:
  aasrt:
    build: .
    environment:
      - SHODAN_API_KEY=${SHODAN_API_KEY}
      - CENSYS_API_ID=${CENSYS_API_ID}
      - CENSYS_SECRET=${CENSYS_SECRET}
    volumes:
      - ./reports:/app/reports
      - ./data:/app/data
      - ./logs:/app/logs
      - ./config.yaml:/app/config.yaml
    command: scan --template clawdbot --output /app/reports/scan.json

  postgres:
    image: postgres:16
    environment:
      - POSTGRES_DB=aasrt
      - POSTGRES_USER=aasrt
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

volumes:
  postgres_data:
```

### 8.3 CI/CD Pipeline

**GitHub Actions Workflow:**
```yaml
name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov

      - name: Run tests
        run: pytest --cov=src --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v3

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Build Docker image
        run: docker build -t ${{ secrets.DOCKER_USERNAME }}/aasrt:${{ github.sha }} .

      - name: Push to registry
        run: |
          echo ${{ secrets.DOCKER_PASSWORD }} | docker login -u ${{ secrets.DOCKER_USERNAME }} --password-stdin
          docker push ${{ secrets.DOCKER_USERNAME }}/aasrt:${{ github.sha }}
```

---

## 9. TESTING STRATEGY

### 9.1 Unit Tests

**Coverage Requirements:**
- Core functions: 90%+ coverage
- Engine modules: 80%+ coverage
- Utilities: 95%+ coverage

**Example Test Cases:**
```python
# tests/test_vulnerability_assessor.py

import pytest
from src.core.vulnerability_assessor import VulnerabilityAssessor
from src.engines.base import SearchResult

def test_authentication_check_no_auth():
    """Test detection of unauthenticated endpoints."""
    assessor = VulnerabilityAssessor({'passive_only': True})

    target = SearchResult(
        ip="192.0.2.1",
        port=8080,
        service="http",
        metadata={"http_status": 200, "auth_required": False}
    )

    vulns = assessor._check_authentication(target)

    assert len(vulns) == 1
    assert vulns[0].severity == "CRITICAL"
    assert vulns[0].check_name == "no_authentication"

def test_api_key_exposure_detection():
    """Test detection of exposed API keys."""
    assessor = VulnerabilityAssessor({'passive_only': True})

    target = SearchResult(
        ip="192.0.2.1",
        port=8080,
        banner="sk-ant-api03-abc123..."
    )

    vulns = assessor._check_information_disclosure(target)

    assert any(v.check_name == "api_key_exposure" for v in vulns)
```

### 9.2 Integration Tests

```python
# tests/test_integration.py

import pytest

@pytest.mark.integration
def test_end_to_end_scan(mock_shodan_api):
    """Test complete scan workflow."""
    from src.main import scan_command

    # Setup
    mock_shodan_api.return_value = [
        {"ip_str": "192.0.2.1", "port": 8080, "data": "ClawdBot"}
    ]

    # Execute
    result = scan_command(
        engines=['shodan'],
        query='product:ClawdBot',
        max_results=10
    )

    # Verify
    assert result['total_results'] > 0
    assert 'findings' in result
    assert result['findings'][0]['risk_score'] >= 0
```

### 9.3 Security Tests

- Input validation tests (SQL injection, XSS, path traversal)
- API key encryption verification
- Rate limiting enforcement tests
- Credential leakage tests (grep logs for secrets)

---

## 10. DOCUMENTATION REQUIREMENTS

### 10.1 User Documentation

**README.md:**
- Installation instructions
- Quick start guide
- CLI command reference
- Configuration examples
- Troubleshooting guide

**USAGE.md:**
- Detailed usage scenarios
- Query template guide
- Custom query creation
- Report interpretation
- Best practices

**LEGAL.md:**
- Terms of service
- Responsible use guidelines
- Legal disclaimers
- Compliance requirements

### 10.2 Developer Documentation

**CONTRIBUTING.md:**
- Code style guide (PEP 8)
- Testing requirements
- Pull request process
- Engine plugin development guide

**API_REFERENCE.md:**
- Class and method documentation
- Data models
- Configuration schema
- Extension points
---

## 11. ROADMAP & FUTURE ENHANCEMENTS

### Phase 1 (MVP) - Months 1-2
- [x] Core engine with Shodan + Censys integration
- [x] Basic vulnerability assessment
- [x] JSON/CSV reporting
- [x] CLI interface

### Phase 2 - Months 3-4
- [ ] GitHub Code Search integration
- [ ] HTML/PDF reporting
- [ ] Data enrichment (WHOIS, geolocation)
- [ ] Docker deployment

### Phase 3 - Months 5-6
- [ ] Continuous monitoring
- [ ] Alert notifications
- [ ] False positive filtering
- [ ] Web UI (optional)

### Phase 4 - Future
- [ ] Machine learning for anomaly detection
- [ ] Distributed scanning (multi-node)
- [ ] Blockchain integration for immutable audit logs
- [ ] Custom CVE tracking
- [ ] Integration with SIEM platforms
---

## 12. SUCCESS CRITERIA

### 12.1 Acceptance Criteria

**Minimum Viable Product (MVP):**
- ✅ Successfully discover ClawdBot instances on Shodan
- ✅ Identify at least 3 vulnerability categories
- ✅ Generate JSON report with risk scores
- ✅ Complete scan in < 2 minutes for 100 results
- ✅ Zero API key leaks in logs
- ✅ Pass all unit tests (80%+ coverage)

**Production Ready:**
- ✅ Support 5+ search engines
- ✅ Multi-format reporting (JSON, CSV, HTML, PDF)
- ✅ Data enrichment functional
- ✅ False positive rate < 15%
- ✅ Comprehensive documentation
- ✅ Docker deployment tested
- ✅ Responsible disclosure workflow implemented

### 12.2 Performance Benchmarks

- Scan 1000 results in < 5 minutes
- Memory usage stays under 500MB
- Report generation < 10 seconds
- Zero data loss on interruptions
- API rate limits never exceeded
---

## 13. APPENDICES

### Appendix A: Query Template Examples

**ClawdBot Detection:**
```yaml
# queries/clawdbot.yaml

name: ClawdBot Vulnerability Scan
description: Detect exposed ClawdBot instances

queries:
  shodan:
    - 'product:"ClawdBot"'
    - 'http.title:"ClawdBot Dashboard"'
    - 'http.html:"anthropic" http.html:"api_key"'

  censys:
    - 'services.http.response.body: "ClawdBot"'
    - 'services.http.response.html_title: "ClawdBot"'

  github:
    - '"ClawdBot" filename:.env "ANTHROPIC_API_KEY"'
    - 'path:config.json "clawdbot" "api_key"'

vulnerability_checks:
  - no_authentication
  - api_key_exposure
  - shell_access
  - debug_mode

risk_threshold: 7.0
alert_on_discovery: true
```
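A template like this might be expanded into per-engine work items as follows (a sketch: `iter_template_jobs` is a hypothetical helper operating on the dict returned by `yaml.safe_load` of the file above):

```python
from typing import Any, Dict, Iterator, List, Tuple

def iter_template_jobs(template: Dict[str, Any],
                       enabled_engines: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (engine, query) pairs from a parsed query template,
    restricted to the engines the user enabled."""
    for engine, queries in template.get("queries", {}).items():
        if engine not in enabled_engines:
            continue
        for query in queries or []:
            yield engine, query
```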

### Appendix B: Sample Output

**JSON Report (Abbreviated):**
```json
{
  "scan_metadata": {
    "scan_id": "550e8400-e29b-41d4-a716-446655440000",
    "timestamp": "2026-02-08T14:30:00Z",
    "engines_used": ["shodan", "censys"],
    "total_results": 42,
    "duration_seconds": 87.3
  },
  "summary": {
    "critical_findings": 12,
    "high_findings": 18,
    "medium_findings": 10,
    "low_findings": 2,
    "average_risk_score": 7.8
  },
  "findings": [
    {
      "finding_id": "finding-001",
      "target_ip": "192.0.2.1",
      "target_port": 8080,
      "target_hostname": "clawdbot.example.com",
      "service": "http",
      "risk_score": 9.2,
      "vulnerabilities": [
        {
          "check_name": "no_authentication",
          "severity": "CRITICAL",
          "cvss_score": 9.1,
          "description": "Dashboard accessible without authentication",
          "evidence": {"http_status": 200, "auth_header": null}
        },
        {
          "check_name": "api_key_exposure",
          "severity": "CRITICAL",
          "cvss_score": 10.0,
          "description": "Anthropic API key visible in source code",
          "evidence": {"api_key_preview": "sk-ant-api03-..."}
        }
      ],
      "enrichment": {
        "geolocation": {"country": "US", "city": "San Francisco"},
        "whois": {"registrar": "GoDaddy"},
        "threat_intel": {"greynoise": "benign"}
      },
      "first_seen": "2026-02-08T14:32:15Z",
      "status": "new"
    }
  ]
}
```

### Appendix C: CVSS Score Mapping

| Vulnerability | CVSS Score | Severity |
|---------------|------------|----------|
| No Authentication | 9.1 | CRITICAL |
| API Key Exposure | 10.0 | CRITICAL |
| Shell Access Enabled | 9.9 | CRITICAL |
| Debug Mode Enabled | 7.5 | HIGH |
| Default Credentials | 8.5 | HIGH |
| Information Disclosure | 6.5 | MEDIUM |
| Missing HTTPS | 5.3 | MEDIUM |
| Outdated Software | 4.0 | LOW |
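The severity labels can be derived from scores programmatically. Here is a sketch using the standard CVSS v3.x qualitative rating bands; note that under those bands a 4.0 rates MEDIUM, so a borderline score like the outdated-software row above would land one band higher than the table's LOW unless the implementation deliberately adjusts the boundary.

```python
def cvss_to_severity(score: float) -> str:
    """Map a CVSS v3.x base score to a severity label
    using the standard qualitative rating scale."""
    if score >= 9.0:
        return "CRITICAL"
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    if score > 0.0:
        return "LOW"
    return "INFO"
```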

### Appendix D: Responsible Disclosure Template

```markdown
# Security Vulnerability Disclosure

**To:** security@example.com
**Subject:** Security Issue: Exposed AI Agent Instance
**Severity:** Critical
**Disclosure Timeline:** 90 days (until May 9, 2026)

---

## Summary

I have discovered a security vulnerability affecting your AI agent deployment that could lead to unauthorized access and data exposure.

## Affected System

- **URL:** https://clawdbot.example.com:8080
- **IP Address:** 192.0.2.1
- **Service:** ClawdBot Dashboard
- **Discovery Date:** February 8, 2026

## Vulnerability Details

### 1. Unauthenticated Access (CVSS 9.1)
The ClawdBot dashboard is accessible without any authentication mechanism.

**Steps to Reproduce:**
1. Navigate to https://clawdbot.example.com:8080
2. Observe full dashboard access without login

### 2. API Key Exposure (CVSS 10.0)
Anthropic API keys are visible in client-side JavaScript.

**Evidence:**
- File: `/static/js/config.js`
- Line 12: `ANTHROPIC_API_KEY: "sk-ant-api03-..."`

### 3. Shell Command Execution (CVSS 9.9)
The `/api/execute` endpoint allows arbitrary command execution.

## Impact

- Unauthorized access to all conversations and data
- Theft of API credentials leading to financial impact
- Potential for lateral movement within your infrastructure
- Reputational damage

## Recommended Remediation

**Immediate Actions:**
1. Disable public access to the instance
2. Rotate all exposed API keys
3. Review access logs for unauthorized activity

**Short-term Actions:**
1. Implement authentication (OAuth 2.0 recommended)
2. Move API keys to server-side environment variables
3. Disable or properly secure shell execution endpoints

**Long-term Actions:**
1. Conduct security audit of all AI agent deployments
2. Implement security hardening guidelines
3. Set up monitoring for similar issues

## Disclosure Timeline

- **Day 0 (Feb 8):** Initial discovery
- **Day 1 (Feb 9):** Disclosure to security team
- **Day 30 (Mar 10):** Follow-up if no response
- **Day 60 (Apr 9):** Limited disclosure to CERT
- **Day 90 (May 9):** Public disclosure if not remediated

## Contact Information

I am available to provide additional details or assistance.

**Researcher:** [Your Name]
**Email:** [Your Email]
**PGP Key:** [Key Fingerprint]

---

*This disclosure follows coordinated vulnerability disclosure practices and is intended to help improve your security posture.*
```

---

## 14. CONCLUSION

This Product Requirements Document provides a comprehensive specification for building an AI Agent Security Reconnaissance Tool. The tool will enable security researchers to discover and assess vulnerable AI agent deployments at scale while maintaining ethical standards and legal compliance.

**Key Takeaways:**
1. **Passive reconnaissance only** - No exploitation
2. **Multi-source intelligence** - Leverage Shodan, Censys, GitHub, etc.
3. **Automated vulnerability assessment** - Risk scoring and categorization
4. **Comprehensive reporting** - Multiple formats for different audiences
5. **Responsible disclosure** - Built-in ethical workflow
6. **Production-ready** - Docker, CI/CD, monitoring capabilities

**Next Steps:**
1. Review and approve this PRD
2. Set up development environment
3. Implement MVP (Phase 1) features
4. Conduct security review
5. Beta testing with trusted researchers
6. Public release with documentation

---

**Document Approval:**

| Role | Name | Signature | Date |
|------|------|-----------|------|
| Product Owner | AGK | _______ | _____ |
| Technical Lead | _______ | _______ | _____ |
| Security Lead | _______ | _______ | _____ |
| Legal Counsel | _______ | _______ | _____ |

---

**Document Revision History:**

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-02-08 | AGK | Initial comprehensive PRD |

---

**END OF DOCUMENT**
|