Files
PentestPilot/bin/web/crawl_words.py
PentestPilot Bot 461c14d676 feat: bootstrap PentestPilot toolkit, docs, and orchestrators
Initial commit of PentestPilot — AI‑assisted pentest recon and orchestration toolkit.

Highlights:
- Resumable pipelines (full_pipeline) with manifest state and elapsed timings
- Rich dashboard (colors, severity bars, durations, compact/json modes)
- Web helpers: httpx→nuclei auto, tech routing + quick scanners
- Agents: multi‑task orchestrator (web/full/ad/notes/post) with resume
- AD/SMB, password utils, shells, transfer, privesc, tunnels
- QoL scripts: proxy toggle, cleanup, tmux init, URL extractor
- Docs: README (Quick Start + Docs Index), HOWTO (deep guide), TOOLKIT (catalog with examples)

Structure:
- bin/automation: pipelines, dashboard, manifest, resume, tech_actions
- bin/web: routing, scanners, helpers
- bin/ai: orchestrators + robust AI utils
- bin/ad, bin/passwords, bin/shells, bin/transfer, bin/privesc, bin/misc, bin/dns, bin/scan, bin/windows, bin/hashes
- HOWTO.md and TOOLKIT.md cross‑linked with examples

Use:
- settarget <target>; agent full <domain|hosts.txt>; dashboard --compact
- See HOWTO.md for setup, semantics, and examples.
2025-10-08 16:00:22 +02:00

47 lines
1.4 KiB
Python
Executable File

#!/usr/bin/env python3
"""Crawl a site breadth-first and emit a wordlist built from page bodies."""
import sys, re, html, urllib.parse, urllib.request

def fetch(url):
    """Fetch a URL and return its body as text; empty string on error."""
    try:
        req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        with urllib.request.urlopen(req, timeout=8) as r:
            return r.read().decode('utf-8', 'ignore')
    except Exception as e:
        sys.stderr.write(f"[!] fetch error for {url}: {e}\n")
        return ''

def extract_links(base, htmltext):
    """Collect absolute URLs from href attributes, skipping fragment and mailto links."""
    links = set()
    for m in re.finditer(r'href=["\']([^"\']+)["\']', htmltext, re.I):
        href = m.group(1)
        if href.startswith('#') or href.startswith('mailto:'):
            continue
        links.add(urllib.parse.urljoin(base, href))
    return links

def words(text):
    """Lower-cased tokens of length >= 4 that start with a letter."""
    text = html.unescape(text)
    return set(w.lower() for w in re.findall(r'[A-Za-z][A-Za-z0-9_\-]{3,}', text))

if len(sys.argv) < 2:
    print(f"Usage: {sys.argv[0]} <url> [depth]", file=sys.stderr)
    sys.exit(1)

start = sys.argv[1]
depth = int(sys.argv[2]) if len(sys.argv) > 2 else 1
start_netloc = urllib.parse.urlparse(start).netloc

visited = {start}
frontier = [start]
all_words = set()

# Breadth-first crawl: one frontier expansion per depth level,
# restricted to pages on the starting host.
for _ in range(depth):
    new = []
    for u in frontier:
        body = fetch(u)
        all_words |= words(body)
        for v in extract_links(u, body):
            if v not in visited and urllib.parse.urlparse(v).netloc == start_netloc:
                visited.add(v)
                new.append(v)
    frontier = new

for w in sorted(all_words):
    print(w)
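The two pure helpers can be sanity-checked offline without hitting the network. The sketch below reproduces `extract_links` and `words` verbatim from the script (so it runs standalone) and feeds them a small hypothetical HTML fixture; the `sample` markup and `https://example.com/` base URL are illustrative assumptions, not part of the toolkit.

```python
import html, re, urllib.parse

def extract_links(base, htmltext):
    # Same logic as in crawl_words.py: absolute URLs from href attributes,
    # skipping in-page fragments and mailto links.
    links = set()
    for m in re.finditer(r'href=["\']([^"\']+)["\']', htmltext, re.I):
        href = m.group(1)
        if href.startswith('#') or href.startswith('mailto:'):
            continue
        links.add(urllib.parse.urljoin(base, href))
    return links

def words(text):
    # Same logic as in crawl_words.py: lower-cased tokens of length >= 4.
    text = html.unescape(text)
    return set(w.lower() for w in re.findall(r'[A-Za-z][A-Za-z0-9_\-]{3,}', text))

# Hypothetical fixture: one crawlable link, one fragment, one mailto.
sample = '<a href="/login">Admin Portal</a> <a href="#top">top</a> <a href="mailto:x@y.z">mail</a>'

print(sorted(extract_links('https://example.com/', sample)))
# → ['https://example.com/login']
print(sorted(words(sample)))
# → ['admin', 'href', 'login', 'mail', 'mailto', 'portal']
```

Note that `words` tokenizes the raw markup, so attribute names like `href` land in the wordlist too; for wordlist generation against a target that is usually harmless noise, since the crawl's purpose is recall rather than precision.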