RFC-001: Project Architecture & Data Model

Author: moonD4rk
Status: Living Document
Created: 2026-04-05

1. Project Positioning

HackBrowserData is a CLI security research tool that extracts and decrypts browser data from Chromium-based browsers and Firefox across Windows, macOS, and Linux.

Key constraints:

  • Go 1.20 — the module must build with Go 1.20 to maintain Windows 7 support. Features from Go 1.21+ (log/slog, slices, maps, cmp) must not be used.
  • Supported engines: Chromium (including Yandex and Opera variants) and Firefox.
  • Supported platforms: Windows (DPAPI), macOS (Keychain), Linux (D-Bus Secret Service).
  • No root-level library API — the CLI calls browser.PickBrowsers() directly; there is no importable pkg/ surface.

2. Directory Structure

HackBrowserData/
├── cmd/hack-browser-data/    # CLI entrypoint: cobra root, dump, list, version
├── browser/                  # Browser interface, PickBrowsers(), platform browser lists
│   ├── chromium/             # Chromium engine: extraction, decryption, profile discovery
│   └── firefox/              # Firefox engine: extraction, NSS key derivation
├── types/                    # Data model: Category enum, Entry structs, BrowserData
├── crypto/                   # Encryption primitives, cipher version detection
│   └── keyretriever/         # Platform-specific master key retrieval (Keychain/DPAPI/D-Bus)
├── filemanager/              # Temp file session, locked file handling (Windows)
├── output/                   # Output Writer: CSV, JSON, CookieEditor formatters
├── log/                      # Logging with level filtering
└── utils/                    # SQLite query helpers, file utilities

3. Core Data Model

3.1 Category

Category is an int enum representing 9 browser-agnostic data kinds: Password, Cookie, Bookmark, History, Download, CreditCard, Extension, LocalStorage, SessionStorage.

IsSensitive() reports true for three categories (Password, Cookie, CreditCard), so an export can leave out secrets by default.
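
A minimal sketch of how such an enum could look. The constant order, the String() method, and the withoutSensitive helper are assumptions made for illustration; only the category names and the IsSensitive() classification come from the text above.

```go
package main

import "fmt"

// Category identifies one browser-agnostic kind of data.
type Category int

// The ordering of these constants is an assumption made for this sketch.
const (
	Password Category = iota
	Cookie
	Bookmark
	History
	Download
	CreditCard
	Extension
	LocalStorage
	SessionStorage
)

// categoryNames is a hypothetical lookup table used only for printing.
var categoryNames = [...]string{
	"Password", "Cookie", "Bookmark", "History", "Download",
	"CreditCard", "Extension", "LocalStorage", "SessionStorage",
}

func (c Category) String() string {
	if c < 0 || int(c) >= len(categoryNames) {
		return "Unknown"
	}
	return categoryNames[c]
}

// IsSensitive reports whether the category holds secrets.
func (c Category) IsSensitive() bool {
	switch c {
	case Password, Cookie, CreditCard:
		return true
	default:
		return false
	}
}

// withoutSensitive is a hypothetical helper that drops sensitive categories.
func withoutSensitive(cats []Category) []Category {
	var kept []Category
	for _, c := range cats {
		if !c.IsSensitive() {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	requested := []Category{Password, Cookie, History, Bookmark}
	fmt.Println(withoutSensitive(requested)) // prints [History Bookmark]
}
```

A caller that wants a safe default can filter the requested categories this way and only include the sensitive ones on explicit request.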

3.2 Entry Types

Each category has a corresponding Entry struct with json and csv struct tags. All structs are flat (no nesting) and use time.Time for timestamps.

| Struct | Category | Key Fields |
| --- | --- | --- |
| LoginEntry | Password | URL, Username, Password, CreatedAt |
| CookieEntry | Cookie | Host, Path, Name, Value, IsSecure, IsHTTPOnly, ExpireAt, CreatedAt |
| BookmarkEntry | Bookmark | Name, URL, Folder, CreatedAt |
| HistoryEntry | History | URL, Title, VisitCount, LastVisit |
| DownloadEntry | Download | URL, TargetPath, TotalBytes, StartTime, EndTime |
| CreditCardEntry | CreditCard | Name, Number, ExpMonth, ExpYear |
| ExtensionEntry | Extension | Name, ID, Description, Version |
| StorageEntry | LocalStorage, SessionStorage | URL, Key, Value |

StorageEntry is shared by both LocalStorage and SessionStorage.
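
To illustrate why flat structs with json and csv tags are convenient, here is a sketch of CookieEntry using the key fields from the table. The tag values and the csvHeader helper are assumptions, not the project's actual definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"time"
)

// CookieEntry mirrors the key fields listed above; the tag values are assumed.
type CookieEntry struct {
	Host       string    `json:"host" csv:"host"`
	Path       string    `json:"path" csv:"path"`
	Name       string    `json:"name" csv:"name"`
	Value      string    `json:"value" csv:"value"`
	IsSecure   bool      `json:"is_secure" csv:"is_secure"`
	IsHTTPOnly bool      `json:"is_http_only" csv:"is_http_only"`
	ExpireAt   time.Time `json:"expire_at" csv:"expire_at"`
	CreatedAt  time.Time `json:"created_at" csv:"created_at"`
}

// csvHeader is a hypothetical helper. It reads the csv tag of every field of
// a flat struct value in declaration order. It panics if v is not a struct.
func csvHeader(v any) []string {
	t := reflect.TypeOf(v)
	header := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		header = append(header, t.Field(i).Tag.Get("csv"))
	}
	return header
}

func main() {
	entry := CookieEntry{
		Host: "example.com", Path: "/", Name: "sid", Value: "abc",
		IsSecure:  true,
		ExpireAt:  time.Date(2027, 1, 1, 0, 0, 0, 0, time.UTC),
		CreatedAt: time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC),
	}
	fmt.Println(csvHeader(entry))

	out, err := json.Marshal(entry)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Because no field is nested, one pass over the struct fields is enough to produce a CSV header, and encoding/json needs no custom marshaling.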

3.3 BrowserData Container

BrowserData is the result container returned by Extract(). It holds typed slices, one per category, and is populated field-by-field during extraction. The output layer uses the generic helper makeExtractor[T]() to pull the matching slice for serialization.
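
One way a generic helper of this kind could work is sketched below. The BrowserData field names and the signature of makeExtractor are assumptions; the source only states that a generic makeExtractor[T]() selects the right slice.

```go
package main

import "fmt"

// Trimmed-down entry types for the sketch.
type LoginEntry struct{ URL, Username, Password string }

type HistoryEntry struct {
	URL, Title string
	VisitCount int
}

// BrowserData holds one typed slice per category (field names assumed).
type BrowserData struct {
	Passwords []LoginEntry
	Histories []HistoryEntry
}

// makeExtractor adapts a typed accessor into a function that returns
// untyped rows, so a single writer can serialize any category.
func makeExtractor[T any](pick func(*BrowserData) []T) func(*BrowserData) []any {
	return func(d *BrowserData) []any {
		items := pick(d)
		rows := make([]any, len(items))
		for i := range items {
			rows[i] = items[i]
		}
		return rows
	}
}

func main() {
	data := &BrowserData{
		Passwords: []LoginEntry{{URL: "https://example.com", Username: "alice"}},
		Histories: []HistoryEntry{
			{URL: "https://go.dev", Title: "Go", VisitCount: 3},
			{URL: "https://example.org", Title: "Example", VisitCount: 1},
		},
	}
	histories := makeExtractor(func(d *BrowserData) []HistoryEntry { return d.Histories })
	for _, row := range histories(data) {
		fmt.Printf("%+v\n", row)
	}
}
```

Generics have been available since Go 1.18, so this pattern is compatible with the Go 1.20 constraint.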

4. Browser Interface & Registration

4.1 BrowserKind

Four engine kinds determine source paths and extractors:

| Kind | Description |
| --- | --- |
| Chromium | Standard Chromium layout |
| ChromiumYandex | Yandex variant: different file names and SQL queries |
| ChromiumOpera | Opera variant: different extension key, Roaming path on Windows |
| Firefox | Firefox: NSS encryption, SQLite + JSON files |

4.2 BrowserConfig

BrowserConfig is the declarative, platform-specific browser definition containing: Key (CLI matching), Name (display), Kind (engine), Storage (keychain label), UserDataDir (data path).
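
As a rough illustration, the five fields could be declared as below. The field types, the placeholder paths, and the selectConfigs helper (including its treatment of "all") are assumptions for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// BrowserKind selects the engine; the constant order is assumed.
type BrowserKind int

const (
	Chromium BrowserKind = iota
	ChromiumYandex
	ChromiumOpera
	Firefox
)

// BrowserConfig is a declarative description of one installed browser.
type BrowserConfig struct {
	Key         string      // CLI matching
	Name        string      // display name
	Kind        BrowserKind // engine
	Storage     string      // keychain label
	UserDataDir string      // data path
}

// selectConfigs is a hypothetical filter: it returns every config when name
// is "all", otherwise the configs whose Key matches case-insensitively.
func selectConfigs(configs []BrowserConfig, name string) []BrowserConfig {
	var picked []BrowserConfig
	for _, cfg := range configs {
		if name == "all" || strings.EqualFold(cfg.Key, name) {
			picked = append(picked, cfg)
		}
	}
	return picked
}

func main() {
	// Placeholder values; real entries are defined per platform.
	configs := []BrowserConfig{
		{Key: "chrome", Name: "Chrome", Kind: Chromium,
			Storage: "Chrome Safe Storage", UserDataDir: "/path/to/Chrome"},
		{Key: "firefox", Name: "Firefox", Kind: Firefox,
			UserDataDir: "/path/to/Firefox/Profiles"},
	}
	for _, cfg := range selectConfigs(configs, "Chrome") {
		fmt.Printf("%s -> %s\n", cfg.Name, cfg.UserDataDir)
	}
}
```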

4.3 PickBrowsers() Flow

PickBrowsers(opts)
  → platformBrowsers()              // build-tagged: returns []BrowserConfig for this OS
  → pickFromConfigs(configs, opts)   // filter by name, apply profile-path/keychain overrides
      → newBrowsers(cfg)             // dispatch by Kind to chromium.NewBrowsers or firefox.NewBrowsers
          → discoverProfiles()       // scan for profile subdirectories
          → resolveSourcePaths()     // stat each candidate path, first match wins

Key design decisions:

  • One KeyRetriever per browser — created once and shared across all profiles to prevent repeated keychain prompts on macOS.
  • Profile discovery differs by engine: Chromium looks for Preferences files in subdirectories; Firefox accepts any subdirectory containing known source files.
  • Flat layout fallback — Opera-style browsers that store data directly in UserDataDir (no profile subdirectories) are handled by falling back to the base directory.
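
The Chromium-style discovery rule, the flat layout fallback, and the "first match wins" path resolution can be sketched as follows. The function signatures and the candidate file names are assumptions; the demo function only builds a throwaway directory tree to exercise them.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// discoverProfiles returns each immediate subdirectory of userDataDir that
// contains a Preferences file. If none qualifies, it falls back to
// userDataDir itself (the flat layout).
func discoverProfiles(userDataDir string) ([]string, error) {
	entries, err := os.ReadDir(userDataDir)
	if err != nil {
		return nil, err
	}
	var profiles []string
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		dir := filepath.Join(userDataDir, e.Name())
		if _, err := os.Stat(filepath.Join(dir, "Preferences")); err == nil {
			profiles = append(profiles, dir)
		}
	}
	if len(profiles) == 0 {
		return []string{userDataDir}, nil
	}
	return profiles, nil
}

// resolveSourcePath stats each candidate under profileDir; first match wins.
func resolveSourcePath(profileDir string, candidates []string) (string, bool) {
	for _, rel := range candidates {
		p := filepath.Join(profileDir, rel)
		if _, err := os.Stat(p); err == nil {
			return p, true
		}
	}
	return "", false
}

// demo reports the discovered profile names, the resolved cookie file
// relative to the root, and whether a flat directory falls back to itself.
func demo() ([]string, string, bool, error) {
	root, err := os.MkdirTemp("", "userdata-*")
	if err != nil {
		return nil, "", false, err
	}
	defer os.RemoveAll(root)
	files := []string{"Default/Preferences", "Default/Cookies",
		"Profile 1/Preferences", "Flat/Cookies"}
	for _, f := range files {
		p := filepath.Join(root, filepath.FromSlash(f))
		if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
			return nil, "", false, err
		}
		if err := os.WriteFile(p, nil, 0o600); err != nil {
			return nil, "", false, err
		}
	}
	found, err := discoverProfiles(root)
	if err != nil {
		return nil, "", false, err
	}
	var names []string
	for _, p := range found {
		names = append(names, filepath.Base(p))
	}
	// Illustrative candidates: a newer location first, then a legacy one.
	candidates := []string{filepath.Join("Network", "Cookies"), "Cookies"}
	cookies := ""
	if p, ok := resolveSourcePath(found[0], candidates); ok {
		rel, err := filepath.Rel(root, p)
		if err != nil {
			return nil, "", false, err
		}
		cookies = filepath.ToSlash(rel)
	}
	flatDir := filepath.Join(root, "Flat")
	flat, err := discoverProfiles(flatDir)
	if err != nil {
		return nil, "", false, err
	}
	return names, cookies, len(flat) == 1 && flat[0] == flatDir, nil
}

func main() {
	names, cookies, flatOK, err := demo()
	fmt.Println(names, cookies, flatOK, err)
}
```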

4.4 Platform Browser Lists

Browser configs are defined per-platform via build tags:

  • macOS — 12 browsers (Chrome, Edge, Chromium, Chrome Beta, Opera, OperaGX, Vivaldi, CocCoc, Brave, Yandex, Arc, Firefox)
  • Windows — 16 browsers (all macOS minus Arc, plus 360 Speed, 360 Speed X, QQ, DC, Sogou)
  • Linux — 8 browsers (Chrome, Edge, Chromium, Chrome Beta, Opera, Vivaldi, Brave, Firefox)

5. Extract() Orchestration

Both Chromium and Firefox engines follow the same extraction pattern:

Extract(categories)
  1. NewSession()               → create isolated temp directory
  2. acquireFiles(session)      → copy source files to temp dir (with dedup and WAL/SHM)
  3. getMasterKey(session)       → platform-specific key retrieval
  4. for each category:
       extractCategory(data, cat, masterKey, path)
  5. defer session.Cleanup()    → remove temp directory

For details on file acquisition, see RFC-008. For encryption details, see RFC-003 (Chromium) and RFC-005 (Firefox). For key retrieval, see RFC-006.
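
The session lifecycle (steps 1, 2, and 5) might look roughly like the sketch below. All names here are hypothetical, and deduplication, WAL/SHM handling, and key retrieval are left out. In this sketch the cleanup is deferred immediately after the session is created, so it runs on every return path.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// session owns one temporary directory for the lifetime of an extraction.
type session struct{ dir string }

func newSession() (*session, error) {
	dir, err := os.MkdirTemp("", "extract-*")
	if err != nil {
		return nil, err
	}
	return &session{dir: dir}, nil
}

// cleanup removes the temp directory and everything copied into it.
func (s *session) cleanup() { _ = os.RemoveAll(s.dir) }

// acquire copies src into the session directory and returns the copy's path.
func (s *session) acquire(src string) (string, error) {
	in, err := os.Open(src)
	if err != nil {
		return "", err
	}
	defer in.Close()
	dst := filepath.Join(s.dir, filepath.Base(src))
	out, err := os.Create(dst)
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return "", err
	}
	return dst, out.Close()
}

// extract reads from a private copy of src, never from the live file.
func extract(src string, read func(path string) (string, error)) (string, error) {
	s, err := newSession()
	if err != nil {
		return "", err // a session failure aborts the whole run
	}
	defer s.cleanup()
	tmp, err := s.acquire(src)
	if err != nil {
		return "", err
	}
	return read(tmp)
}

// demo returns the extracted content and whether the temp copy was removed.
func demo() (string, bool, error) {
	srcDir, err := os.MkdirTemp("", "source-*")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(srcDir)
	src := filepath.Join(srcDir, "History")
	if err := os.WriteFile(src, []byte("history-db"), 0o600); err != nil {
		return "", false, err
	}
	var tmpPath string
	content, err := extract(src, func(path string) (string, error) {
		tmpPath = path
		b, err := os.ReadFile(path)
		return string(b), err
	})
	if err != nil {
		return "", false, err
	}
	_, statErr := os.Stat(tmpPath)
	return content, os.IsNotExist(statErr), nil
}

func main() {
	content, removed, err := demo()
	fmt.Println(content, removed, err)
}
```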

5.1 Collect-and-Continue Pattern

The extraction loop maximizes data recovery. Each category is extracted independently — a failure in one does not affect others. Errors are handled at three levels:

| Level | Trigger | Action |
| --- | --- | --- |
| Session failure | Temp dir cannot be created | Abort entirely, return error |
| Category failure | Source file missing or extraction error | Skip category, continue to next |
| Record failure | Single row decryption fails | Skip record, continue extraction |

Master key failure is non-fatal. If the key cannot be retrieved, categories requiring decryption (passwords, cookies, credit cards) produce empty values, while non-encrypted categories (history, bookmarks, downloads) still succeed.
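
The category-level and record-level rules, plus the non-fatal master key case, can be sketched as below. Every name is hypothetical, and decrypt is a stand-in XOR routine, not the real Chromium or Firefox scheme. Keeping rows with empty values when the key is missing is this sketch's reading of the paragraph above.

```go
package main

import (
	"errors"
	"fmt"
)

var errSourceMissing = errors.New("source file missing")

type row struct {
	URL  string
	Blob []byte // encrypted value; empty simulates a corrupt record
}

type login struct{ URL, Password string }

// decrypt is a stand-in XOR routine used only to make the sketch runnable.
func decrypt(key, blob []byte) (string, error) {
	if len(key) == 0 || len(blob) == 0 {
		return "", errors.New("cannot decrypt record")
	}
	out := make([]byte, len(blob))
	for i, b := range blob {
		out[i] = b ^ key[i%len(key)]
	}
	return string(out), nil
}

// extractLogins applies the record-level rule. A nil key models a failed
// master key retrieval: rows are kept, but their values stay empty.
func extractLogins(rows []row, key []byte) []login {
	var out []login
	for _, r := range rows {
		if key == nil {
			out = append(out, login{URL: r.URL})
			continue
		}
		pw, err := decrypt(key, r.Blob)
		if err != nil {
			continue // record failure: skip this row only
		}
		out = append(out, login{URL: r.URL, Password: pw})
	}
	return out
}

type categoryFunc func() (int, error)

// extractAll applies the category-level rule: a failing category is
// recorded as skipped and the loop moves on.
func extractAll(order []string, run map[string]categoryFunc) (map[string]int, []string) {
	counts := make(map[string]int)
	var skipped []string
	for _, name := range order {
		fn, ok := run[name]
		if !ok {
			skipped = append(skipped, name)
			continue
		}
		n, err := fn()
		if err != nil {
			skipped = append(skipped, name)
			continue
		}
		counts[name] = n
	}
	return counts, skipped
}

func main() {
	key := []byte{0x2a}
	blob, _ := decrypt(key, []byte("hunter2")) // XOR is its own inverse
	rows := []row{{URL: "https://example.com", Blob: []byte(blob)}, {URL: "https://corrupt.example"}}

	counts, skipped := extractAll([]string{"password", "cookie", "history"}, map[string]categoryFunc{
		"password": func() (int, error) { return len(extractLogins(rows, key)), nil },
		"cookie":   func() (int, error) { return 0, errSourceMissing },
		"history":  func() (int, error) { return 5, nil },
	})
	fmt.Println(counts, skipped)
}
```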

5.2 Custom Extractors

The categoryExtractor interface allows browser-specific extraction logic. Yandex and Opera use custom extractors for passwords and extensions respectively, while all other categories fall through to the default Chromium implementation.
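
A possible shape for this override-with-fallback dispatch is sketched below. The method set of categoryExtractor, the lookup tables, and the label strings are assumptions made for illustration.

```go
package main

import "fmt"

type Kind int

const (
	Chromium Kind = iota
	ChromiumYandex
	ChromiumOpera
)

type Category int

const (
	Password Category = iota
	Cookie
	Extension
)

// categoryExtractor is the per-category hook; its method set is assumed.
type categoryExtractor interface {
	Extract(path string) (string, error)
}

// extractorFunc lets a plain function satisfy categoryExtractor.
type extractorFunc func(path string) (string, error)

func (f extractorFunc) Extract(path string) (string, error) { return f(path) }

// labelled builds a dummy extractor that reports which implementation ran.
func labelled(label string) categoryExtractor {
	return extractorFunc(func(path string) (string, error) {
		return label + ":" + path, nil
	})
}

// defaults holds the standard Chromium implementation for each category.
var defaults = map[Category]categoryExtractor{
	Password:  labelled("default-password"),
	Cookie:    labelled("default-cookie"),
	Extension: labelled("default-extension"),
}

// overrides holds only the categories a variant handles differently.
var overrides = map[Kind]map[Category]categoryExtractor{
	ChromiumYandex: {Password: labelled("yandex-password")},
	ChromiumOpera:  {Extension: labelled("opera-extension")},
}

// extractorFor prefers a variant-specific extractor and otherwise falls
// through to the default one.
func extractorFor(kind Kind, cat Category) categoryExtractor {
	if custom, ok := overrides[kind][cat]; ok {
		return custom
	}
	return defaults[cat]
}

func main() {
	for _, kind := range []Kind{Chromium, ChromiumYandex, ChromiumOpera} {
		for _, cat := range []Category{Password, Cookie, Extension} {
			out, _ := extractorFor(kind, cat).Extract("source")
			fmt.Println(out)
		}
	}
}
```

Indexing a missing inner map in Go yields the zero value with ok set to false, so kinds without overrides need no special case.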

6. Dependency Constraints

The module is pinned to go 1.20 in go.mod. This is enforced by a CI lint check that fails if the directive changes.
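
The actual CI check is not shown in this RFC. As one possible approach, a small Go program could read the go directive and compare it with the pinned value. The functions below and the sample module path are hypothetical; a production check would more likely parse the file with golang.org/x/mod/modfile.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// goDirective returns the version named by the first "go" directive in the
// given go.mod contents.
func goDirective(gomod string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "go" {
			return fields[1], true
		}
	}
	return "", false
}

// checkPinned fails when the directive is missing or differs from want.
func checkPinned(gomod, want string) error {
	got, ok := goDirective(gomod)
	if !ok {
		return errors.New("go.mod has no go directive")
	}
	if got != want {
		return fmt.Errorf("go directive is %q, want %q", got, want)
	}
	return nil
}

func main() {
	// Sample contents with a placeholder module path.
	const gomod = "module example.com/demo\n\ngo 1.20\n\nrequire modernc.org/sqlite v1.31.1\n"
	if err := checkPinned(gomod, "1.20"); err != nil {
		fmt.Println("lint failed:", err)
		return
	}
	fmt.Println("go directive OK")
}
```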

| Dependency | Version | Purpose |
| --- | --- | --- |
| modernc.org/sqlite | v1.31.1 (pinned) | Pure-Go SQLite. v1.32+ requires Go 1.21 |
| github.com/syndtr/goleveldb | v1.0.0 | LevelDB for Chromium localStorage/sessionStorage |
| github.com/tidwall/gjson | v1.18.0 | JSON path queries |
| github.com/spf13/cobra | v1.10.2 | CLI framework |
| github.com/moond4rk/keychainbreaker | v0.2.5 | macOS keychain decryption |
| github.com/godbus/dbus/v5 | v5.2.2 | Linux D-Bus Secret Service |
| golang.org/x/sys | v0.27.0 | Windows syscalls (DPAPI, DuplicateHandle) |


7. Related RFCs

| RFC | Topic |
| --- | --- |
| RFC-002 | Chromium data file locations and storage formats |
| RFC-003 | Chromium encryption mechanisms per platform |
| RFC-004 | Firefox data file locations and storage formats |
| RFC-005 | Firefox NSS encryption and key derivation |
| RFC-006 | Platform-specific master key retrieval |
| RFC-007 | CLI commands and output formats |
| RFC-008 | File acquisition and platform quirks |
| RFC-009 | Windows locked file bypass technique |