docs: add architecture refactoring RFCs and switch gitignore to whitelist (#510)

* docs: add architecture refactoring RFCs and switch gitignore to whitelist

- Rename rfc/ to rfcs/
- RFC-001: overall architecture redesign (data models, crypto layer,
  browser registration, CLI separation, error handling)
- RFC-002: data extraction and file acquisition refactoring
- Replace .gitignore blacklist (212 lines) with precise whitelist (43 lines)
  to prevent accidental commit of sensitive browser data files

* feat: update architecture refactoring documentation

- Refactor the architecture to improve scalability and maintainability
- Streamline browser data and file acquisition processes for efficiency

* docs(rfcs): add extract_* naming convention and queryRows[T] helper

- RFC-001: add file naming convention section explaining extract_* prefix
  grouping, add datautil/query.go for queryRows[T] generic helper
- RFC-002: update all extract examples to use datautil.QueryRows[T],
  add Section 3.2 with queryRows[T] definition

* feat: update architecture refactoring documentation

- RFC-001: rename BrowserConfig→Config, BrowsingData→Extract,
  add public/private visibility table, add isValidBrowserDir
  in PickBrowsers, remove storage from Chromium struct,
  NewChain returns KeyRetriever interface, add error wrapping
  convention, unexport PBKDF2 params, flatten log/level
- RFC-002: replace outPutter with writeFile/writeJSON/writeCSV,
  remove golang.org/x/text dependency (3-byte BOM), add Windows
  locked file handling (copyLocked), fix discoverDataFiles to
  check file vs dir type, Firefox New() takes profileDir only,
  add decryptPBE helper, add error handling section, add
  profile discovery with tests, add platform config example
Roger
2026-03-23 01:07:56 +08:00
committed by GitHub
parent cbd4594958
commit 9959c0839a
4 changed files with 1679 additions and 440 deletions
@@ -0,0 +1,798 @@
# RFC-001: Architecture Refactoring
**Author**: moonD4rk
**Status**: Proposed
**Created**: 2025-09-01
**Updated**: 2026-03-22
## Abstract
This RFC addresses the overall architecture of HackBrowserData:
1. **Data model redesign**: `Category` enum + browser-agnostic `*Entry` structs
2. **Crypto layer**: cipher version detection, master key retrieval abstraction
3. **Browser registration & discovery**: declarative config, direct profile scanning
4. **Yandex variant handling**: source overrides + query overrides
5. **Error handling**: collect-and-continue pattern
**Constraint**: Go 1.20 (Windows 7 support).
See RFC-002 for file acquisition, extract method details, and output.
---
## 1. Target Directory Structure
```
hackbrowserdata/
├── cmd/
│   └── hack-browser-data/
│       └── main.go                 # CLI: flag parsing → PickBrowsers → Extract → Output
├── browser/
│   ├── browser.go                  # Browser interface, BrowserKind, Config, PickBrowsers()
│   ├── browser_darwin.go           # platformBrowsers() → []Config
│   ├── browser_windows.go          # platformBrowsers() → []Config
│   ├── browser_linux.go            # platformBrowsers() → []Config
│   │
│   ├── chromium/
│   │   ├── chromium.go             # Chromium struct (holds masterKey []byte), Extract()
│   │   ├── chromium_darwin.go      # platform key retriever wiring
│   │   ├── chromium_windows.go     # platform key retriever wiring
│   │   ├── chromium_linux.go       # platform key retriever wiring
│   │   ├── source.go               # chromiumSources, yandexSources maps
│   │   ├── extract_password.go     # extractPasswords() + default SQL query
│   │   ├── extract_cookie.go       # extractCookies() + default SQL query
│   │   ├── extract_history.go      # extractHistories() + default SQL query
│   │   ├── extract_download.go     # extractDownloads() + default SQL query
│   │   ├── extract_bookmark.go     # extractBookmarks() (JSON)
│   │   ├── extract_creditcard.go   # extractCreditCards() + default SQL query
│   │   ├── extract_extension.go    # extractExtensions() (JSON)
│   │   └── extract_storage.go      # extractLocalStorage(), extractSessionStorage() (LevelDB)
│   │
│   ├── firefox/
│   │   ├── firefox.go              # Firefox struct, Extract(), deriveMasterKey()
│   │   ├── firefox_test.go
│   │   ├── source.go               # firefoxSources map
│   │   ├── extract_password.go     # extractPasswords() (JSON + ASN1PBE)
│   │   ├── extract_cookie.go       # extractCookies() (SQLite, no encryption)
│   │   ├── extract_history.go      # extractHistories() (SQLite)
│   │   ├── extract_download.go     # extractDownloads() (SQLite)
│   │   ├── extract_bookmark.go     # extractBookmarks() (SQLite)
│   │   ├── extract_extension.go    # extractExtensions() (JSON)
│   │   └── extract_storage.go      # extractLocalStorage() (SQLite)
│   │
│   └── exploit/
│       └── gcoredump/
│           └── gcoredump.go        # CVE-2025-24204 macOS exploit (darwin only)
├── browserdata/
│   ├── browserdata.go              # BrowserData struct (typed slices)
│   ├── output.go                   # BrowserData.Output() — CSV/JSON writer
│   ├── output_test.go
│   │
│   └── datautil/
│       ├── sqlite.go               # QuerySQLite() helper
│       ├── query.go                # QueryRows[T]() generic helper (Go 1.20)
│       └── decrypt.go              # DecryptChromiumValue() helper
├── crypto/
│   ├── crypto.go                   # AESCBCDecrypt, AESGCMDecrypt, DES3, PKCS5
│   ├── crypto_darwin.go            # DecryptWithChromium (CBC), DecryptWithDPAPI (returns error)
│   ├── crypto_windows.go           # DecryptWithChromium (GCM), DecryptWithDPAPI
│   ├── crypto_linux.go             # DecryptWithChromium (CBC), DecryptWithDPAPI (returns error)
│   ├── crypto_test.go
│   ├── version.go                  # DetectVersion(), StripPrefix(), CipherVersion
│   ├── asn1pbe.go                  # Firefox ASN.1 PBE key derivation
│   ├── asn1pbe_test.go
│   ├── pbkdf2.go
│   │
│   └── keyretriever/
│       ├── keyretriever.go         # KeyRetriever interface, ChainRetriever
│       ├── keyretriever_darwin.go  # GcoredumpRetriever, SecurityCmdRetriever
│       ├── keyretriever_windows.go # DPAPIRetriever
│       ├── keyretriever_linux.go   # DBusRetriever, FallbackRetriever
│       └── params.go               # PBKDF2Params (saltysalt, iterations)
├── filemanager/
│   └── session.go                  # Session: MkdirTemp, TempDir(), Acquire(), Cleanup()
├── types/
│   ├── category.go                 # Category enum (9 values)
│   ├── models.go                   # LoginEntry, CookieEntry, ... (browser-agnostic)
│   └── types_test.go
├── log/
│   ├── log.go
│   ├── logger.go
│   ├── logger_test.go
│   └── level.go                    # log levels (merged from level/ sub-package)
└── utils/
    ├── byteutil/
    │   └── byteutil.go
    ├── fileutil/
    │   ├── fileutil.go             # renamed from filetutil.go
    │   └── fileutil_test.go
    ├── typeutil/
    │   ├── typeutil.go
    │   └── typeutil_test.go
    └── chainbreaker/
        ├── chainbreaker.go
        └── chainbreaker_test.go
```
### What changed vs current structure
| Change | Current | Target |
|--------|---------|--------|
| **New** `browserdata/datautil/` | — | SQLite, generic query, and decrypt helpers |
| **New** `filemanager/` | — | Session-based temp file management |
| **New** `crypto/keyretriever/` | — | Master key retrieval abstraction |
| **New** `crypto/version.go` | — | Cipher version detection |
| **New** `browser/chromium/extract_*.go` | — | Per-category extract methods |
| **New** `browser/firefox/extract_*.go` | — | Per-category extract methods |
| **New** `browser/*/source.go` | — | File source mapping per engine |
| **Restructured** `types/` | 22 DataType constants + file mappings | 9 Category constants + data model structs |
| **Deleted** `extractor/` | interface + registry + factory | not needed |
| **Deleted** `browserdata/imports.go` | init() side-effect registration | not needed |
| **Deleted** `browserdata/password/`, `cookie/`, etc. | 9 sub-packages | extract logic moved into browser engines |
| **Deleted** `browser/consts.go` | 27 scattered constants | inlined into Config |
| **Renamed** `filetutil.go` | typo | `fileutil.go` |
| **Renamed** `AES128CBCDecrypt` | misleading name | `AESCBCDecrypt` |
### Naming conventions
| Concept | Package | Type/Func | File |
|---------|---------|-----------|------|
| Data category | `types` | `Category` (int enum) | `category.go` |
| Data models | `types` | `LoginEntry`, `CookieEntry`, ... | `models.go` |
| Result container | `browserdata` | `BrowserData` | `browserdata.go` |
| Browser config | `browser` | `Config` | `browser.go` |
| Browser engine kind | `browser` | `BrowserKind` | `browser.go` |
| File source mapping | `chromium`/`firefox` | `source` struct, `chromiumSources` map | `source.go` |
| Key retrieval | `keyretriever` | `KeyRetriever` (interface) | `keyretriever.go` |
| Strategy chain | `keyretriever` | `ChainRetriever` | `keyretriever.go` |
| Cipher version | `crypto` | `CipherVersion` | `version.go` |
| Temp file session | `filemanager` | `Session` | `session.go` |
| SQLite helper | `datautil` | `QuerySQLite` (func) | `sqlite.go` |
| Generic query helper | `datautil` | `QueryRows[T]` (func) | `query.go` |
| Decrypt helper | `datautil` | `DecryptChromiumValue` (func) | `decrypt.go` |
### Public vs private
| Symbol | Exported | Reason |
|--------|----------|--------|
| `Browser` interface | Yes | used by cmd/main.go |
| `Config` struct | Yes | passed to chromium.New() |
| `PickBrowsers()` | Yes | called by cmd/main.go |
| `platformBrowsers()` | No | browser package internal |
| `isValidBrowserDir()` | No | browser package internal |
| `Chromium.Extract()` | Yes | implements Browser interface |
| `Chromium.extractPasswords()` | No | chromium package internal |
| `Chromium.acquireFiles()` | No | chromium package internal |
| `discoverProfiles()` | No | chromium package internal |
| `BrowserData` struct | Yes | returned to cmd/main.go |
| `BrowserData.Output()` | Yes | called by cmd/main.go |
| `QuerySQLite()` | Yes | used by chromium and firefox |
| `QueryRows[T]()` | Yes | used by chromium and firefox |
### File naming convention for `extract_*.go`
Files inside `browser/chromium/` and `browser/firefox/` use the `extract_` prefix for extraction logic. This groups them visually when sorted alphabetically:
```
chromium.go ← struct + Extract orchestration
chromium_darwin.go ← platform: master key
chromium_linux.go
chromium_windows.go
extract_bookmark.go ← extract: one file per Category
extract_cookie.go
extract_creditcard.go
extract_download.go
extract_extension.go
extract_history.go
extract_password.go
extract_storage.go
source.go ← file source mapping
```
Three natural groups: `chromium*` (struct + platform), `extract_*` (data extraction), `source.go` (file mapping). Each `extract_*.go` file contains the default SQL query constant and the extract method (~20-30 lines).
---
## 2. Core Data Model Redesign
### 2.1 Problem: MasterKey mixed with data types
The current `DataType` enum contains 22 constants that conflate three concerns:
- **Infrastructure** (keys): `ChromiumKey`, `FirefoxKey4`
- **Browser engine prefix**: `ChromiumPassword` vs `FirefoxPassword` vs `YandexPassword`
- **File layout**: `Filename()`, `TempFilename()` methods on the enum
A password is a password regardless of which browser it came from. The browser engine determines *how* to extract, not *what* the data is.
### 2.2 New design: Category + Models
**`types/category.go`** — 9 data categories (down from 22 DataType constants):
```go
package types

type Category int

const (
	Password Category = iota
	Cookie
	Bookmark
	History
	Download
	CreditCard
	Extension
	LocalStorage
	SessionStorage
)

var AllCategories = []Category{
	Password, Cookie, Bookmark, History, Download,
	CreditCard, Extension, LocalStorage, SessionStorage,
}

func (c Category) String() string { ... }

func (c Category) IsSensitive() bool {
	switch c {
	case Password, Cookie, CreditCard:
		return true
	default:
		return false
	}
}

func NonSensitiveCategories() []Category {
	var cats []Category
	for _, c := range AllCategories {
		if !c.IsSensitive() {
			cats = append(cats, c)
		}
	}
	return cats
}
```
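The body of `String()` is elided above. A minimal runnable sketch, assuming lowercase names that could double as the category part of output file names such as `chrome_default_password.csv`; the exact strings (for example `creditcard` vs. `credit_card`) are placeholders, not something this RFC fixes. Indexing a name table by the `iota` value keeps `String()` allocation-free, at the cost of keeping the table in the same order as the `const` block.

```go
package main

import "fmt"

type Category int

const (
	Password Category = iota
	Cookie
	Bookmark
	History
	Download
	CreditCard
	Extension
	LocalStorage
	SessionStorage
)

var AllCategories = []Category{
	Password, Cookie, Bookmark, History, Download,
	CreditCard, Extension, LocalStorage, SessionStorage,
}

// categoryNames is indexed by the iota value, so its order must match the const block.
// The strings themselves are illustrative placeholders.
var categoryNames = [...]string{
	"password", "cookie", "bookmark", "history", "download",
	"creditcard", "extension", "localstorage", "sessionstorage",
}

func (c Category) String() string {
	if c < 0 || int(c) >= len(categoryNames) {
		return fmt.Sprintf("Category(%d)", int(c))
	}
	return categoryNames[c]
}

func (c Category) IsSensitive() bool {
	switch c {
	case Password, Cookie, CreditCard:
		return true
	default:
		return false
	}
}

func NonSensitiveCategories() []Category {
	var cats []Category
	for _, c := range AllCategories {
		if !c.IsSensitive() {
			cats = append(cats, c)
		}
	}
	return cats
}

func main() {
	// fmt calls String() on each element of the slice.
	fmt.Println(NonSensitiveCategories())
}
```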
**`types/models.go`** — browser-agnostic data models, no encrypted fields:
```go
package types

import "time"

type LoginEntry struct {
	URL       string    `json:"url" csv:"url"`
	Username  string    `json:"username" csv:"username"`
	Password  string    `json:"password" csv:"password"`
	CreatedAt time.Time `json:"created_at" csv:"created_at"`
}

type CookieEntry struct {
	Host       string    `json:"host" csv:"host"`
	Path       string    `json:"path" csv:"path"`
	Name       string    `json:"name" csv:"name"`
	Value      string    `json:"value" csv:"value"`
	IsSecure   bool      `json:"is_secure" csv:"is_secure"`
	IsHTTPOnly bool      `json:"is_httponly" csv:"is_httponly"`
	ExpireAt   time.Time `json:"expire_at" csv:"expire_at"`
	CreatedAt  time.Time `json:"created_at" csv:"created_at"`
}

type BookmarkEntry struct {
	Name      string    `json:"name" csv:"name"`
	URL       string    `json:"url" csv:"url"`
	Folder    string    `json:"folder" csv:"folder"`
	CreatedAt time.Time `json:"created_at" csv:"created_at"`
}

type HistoryEntry struct {
	URL        string    `json:"url" csv:"url"`
	Title      string    `json:"title" csv:"title"`
	VisitCount int       `json:"visit_count" csv:"visit_count"`
	LastVisit  time.Time `json:"last_visit" csv:"last_visit"`
}

type DownloadEntry struct {
	URL        string    `json:"url" csv:"url"`
	TargetPath string    `json:"target_path" csv:"target_path"`
	TotalBytes int64     `json:"total_bytes" csv:"total_bytes"`
	StartTime  time.Time `json:"start_time" csv:"start_time"`
	EndTime    time.Time `json:"end_time" csv:"end_time"`
}

type CreditCardEntry struct {
	Name     string `json:"name" csv:"name"`
	Number   string `json:"number" csv:"number"`
	ExpMonth string `json:"exp_month" csv:"exp_month"`
	ExpYear  string `json:"exp_year" csv:"exp_year"`
}

type StorageEntry struct {
	URL   string `json:"url" csv:"url"`
	Key   string `json:"key" csv:"key"`
	Value string `json:"value" csv:"value"`
}

type ExtensionEntry struct {
	Name        string `json:"name" csv:"name"`
	ID          string `json:"id" csv:"id"`
	Description string `json:"description" csv:"description"`
	Version     string `json:"version" csv:"version"`
}
```
### 2.3 Result container
**`browserdata/browserdata.go`**:
```go
type BrowserData struct {
	Passwords      []types.LoginEntry
	Cookies        []types.CookieEntry
	Bookmarks      []types.BookmarkEntry
	Histories      []types.HistoryEntry
	Downloads      []types.DownloadEntry
	CreditCards    []types.CreditCardEntry
	Extensions     []types.ExtensionEntry
	LocalStorage   []types.StorageEntry
	SessionStorage []types.StorageEntry
}
```
### 2.4 What was removed from types/
| Removed | Reason |
|---------|--------|
| `ChromiumKey`, `FirefoxKey4` | MasterKey is infrastructure, handled inside browser engine |
| `Chromium*`/`Firefox*`/`Yandex*` prefixes | Browser engine is extraction concern, not type concern |
| `Filename()`, `TempFilename()` | File layout is browser engine's internal knowledge |
| `itemFileNames` map | Moved into `chromium/source.go` and `firefox/source.go` |
| `DefaultChromiumTypes`, `DefaultFirefoxTypes`, `DefaultYandexTypes` | Replaced by `types.AllCategories` |
| `extractor/` package | No longer needed — browser engines have typed extract methods |
| `browserdata/imports.go` | No longer needed — no init() registration |
---
## 3. Crypto Layer
### 3.1 Cipher version detection
**New file**: `crypto/version.go`
```go
type CipherVersion string

const (
	CipherV10   CipherVersion = "v10"   // Chrome 80+
	CipherV20   CipherVersion = "v20"   // Chrome 127+ App-Bound Encryption
	CipherDPAPI CipherVersion = "dpapi" // pre-Chrome 80
)

func DetectVersion(ciphertext []byte) CipherVersion {
	if len(ciphertext) < 3 { return CipherDPAPI }
	prefix := string(ciphertext[:3])
	switch prefix {
	case "v10":
		return CipherV10
	case "v20":
		return CipherV20
	default:
		return CipherDPAPI
	}
}

func StripPrefix(ciphertext []byte) []byte {
	ver := DetectVersion(ciphertext)
	if ver == CipherV10 || ver == CipherV20 {
		return ciphertext[3:]
	}
	return ciphertext
}
```
Version-specific post-processing (e.g., v20 cookie value has a 32-byte header) belongs here, not in extract methods:
```go
// DecryptCookieValue handles version-specific cookie decryption.
func DecryptCookieValue(key, ciphertext []byte) ([]byte, error) {
	version := DetectVersion(ciphertext)
	payload := StripPrefix(ciphertext)
	switch version {
	case CipherV10:
		return decryptPayload(key, payload)
	case CipherV20:
		value, err := decryptPayload(key, payload)
		if err != nil { return nil, err }
		if len(value) > 32 {
			return value[32:], nil // strip App-Bound header
		}
		return value, nil
	default:
		return nil, fmt.Errorf("unsupported cipher version: %s", version)
	}
}
```
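`decryptPayload` is referenced but not defined here. On Windows, Chromium's v10 payload (after `StripPrefix`) is laid out as a 12-byte nonce followed by AES-256-GCM ciphertext with the 16-byte tag appended, which is exactly the input `cipher.AEAD.Open` expects after the nonce. A minimal sketch of that branch, assuming the 32-byte master key has already been recovered; `decryptGCMPayload` and the test-only `encryptGCMPayload` are hypothetical names. The macOS/Linux path uses AES-CBC instead (the `DecryptWithChromium (CBC)` entries in the tree) and is not shown.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

const gcmNonceSize = 12

// decryptGCMPayload decrypts nonce || ciphertext || tag with AES-GCM.
func decryptGCMPayload(key, payload []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, fmt.Errorf("new cipher: %w", err)
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, fmt.Errorf("new gcm: %w", err)
	}
	if len(payload) < gcmNonceSize+gcm.Overhead() {
		return nil, errors.New("payload too short")
	}
	nonce, sealed := payload[:gcmNonceSize], payload[gcmNonceSize:]
	plain, err := gcm.Open(nil, nonce, sealed, nil)
	if err != nil {
		return nil, fmt.Errorf("gcm open: %w", err)
	}
	return plain, nil
}

// encryptGCMPayload builds a payload in the same layout, for testing only.
func encryptGCMPayload(key, plain []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcmNonceSize)
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends ciphertext||tag to its first argument, yielding nonce||ciphertext||tag.
	return gcm.Seal(nonce, nonce, plain, nil), nil
}

func main() {
	key := make([]byte, 32) // AES-256 key; all zeros for the demo only
	payload, err := encryptGCMPayload(key, []byte("hunter2"))
	if err != nil {
		panic(err)
	}
	plain, err := decryptGCMPayload(key, payload)
	fmt.Println(string(plain), err)
}
```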
### 3.2 Key retriever abstraction
**New package**: `crypto/keyretriever/`
```go
type KeyRetriever interface {
	RetrieveKey(storage string, localStatePath string) ([]byte, error)
}

// Note: Windows DPAPIRetriever reads localStatePath to extract the encrypted key.
// macOS and Linux retrievers ignore localStatePath (they use keychain/dbus instead).
type ChainRetriever struct {
	retrievers []KeyRetriever
}

func NewChain(retrievers ...KeyRetriever) KeyRetriever { ... }

func (c *ChainRetriever) RetrieveKey(storage string, localStatePath string) ([]byte, error) {
	var lastErr error
	for _, r := range c.retrievers {
		key, err := r.RetrieveKey(storage, localStatePath)
		if err == nil && len(key) > 0 { return key, nil }
		// An empty key with a nil error is still a failure; without this, lastErr could
		// end up nil and the final message would wrap <nil>.
		if err == nil { err = errors.New("empty key") }
		lastErr = err
	}
	return nil, fmt.Errorf("all key retrievers failed: %w", lastErr)
}
```
Platform defaults:
- macOS: `NewChain(&GcoredumpRetriever{}, &SecurityCmdRetriever{})`
- Windows: `&DPAPIRetriever{}`
- Linux: `NewChain(&DBusRetriever{}, &FallbackRetriever{})`
**`params.go`** centralizes PBKDF2 magic values with source links:
```go
var (
	// https://source.chromium.org/chromium/chromium/src/+/master:components/os_crypt/os_crypt_mac.mm
	macOSParams = PBKDF2Params{Salt: []byte("saltysalt"), Iterations: 1003, KeyLen: 16}
	// https://source.chromium.org/chromium/chromium/src/+/main:components/os_crypt/os_crypt_linux.cc
	linuxParams = PBKDF2Params{Salt: []byte("saltysalt"), Iterations: 1, KeyLen: 16}
)
```
---
## 4. Browser Registration & Discovery
### 4.1 Declarative browser config
```go
// browser/browser.go
type BrowserKind int

const (
	KindChromium BrowserKind = iota
	KindChromiumYandex // Chromium variant with different file names and SQL queries
	KindFirefox
)

type Config struct {
	Key         string // lookup key: "chrome", "firefox"
	Name        string // display name: "Chrome", "Firefox"
	Kind        BrowserKind
	Storage     string // keychain label (macOS/Linux); unused on Windows (DPAPI reads Local State directly)
	UserDataDir string // e.g. ~/Library/Application Support/Google/Chrome/
}

type Browser interface {
	Name() string
	Extract(categories []types.Category) (*browserdata.BrowserData, error)
}
```
### 4.2 Platform browser list & PickBrowsers
Each platform file defines `platformBrowsers()`. Use full paths per line (no shared prefix variable):
```go
// browser/browser_darwin.go
func platformBrowsers() []Config {
	return []Config{
		{Key: "chrome", Name: "Chrome", Kind: KindChromium, Storage: "Chrome",
			UserDataDir: homeDir + "/Library/Application Support/Google/Chrome"},
		{Key: "edge", Name: "Edge", Kind: KindChromium, Storage: "Microsoft Edge",
			UserDataDir: homeDir + "/Library/Application Support/Microsoft Edge"},
		// ... other browsers
	}
}
```
```go
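func PickBrowsers(name, profile string) ([]Browser, error) {
	name = strings.ToLower(name)
	var browsers []Browser
	configs := platformBrowsers()
	for _, cfg := range configs {
		if name != "all" && cfg.Key != name { continue }
		dir := cfg.UserDataDir
		if profile != "" { dir = profile }
		if !isValidBrowserDir(cfg.Kind, dir) {
			continue
		}
		bs, err := newBrowserFromConfig(cfg, dir)
		if err != nil {
			log.Debugf("skip %s: %v", cfg.Name, err)
			continue
		}
		browsers = append(browsers, bs...)
	}
	return browsers, nil
}

func newBrowserFromConfig(cfg Config, dir string) ([]Browser, error) {
	var browsers []Browser
	switch cfg.Kind {
	case KindChromium, KindChromiumYandex:
		instances, err := chromium.New(cfg, dir) // []*Chromium, one per profile
		if err != nil { return nil, err }
		// A []*Chromium cannot be returned as []Browser directly; convert element by element.
		for _, c := range instances {
			browsers = append(browsers, c)
		}
	case KindFirefox:
		// firefox.New takes a single profile directory, so create one instance per profile.
		// discoverFirefoxProfiles is a hypothetical helper implementing the scan in Section 4.3.
		profiles, err := discoverFirefoxProfiles(dir)
		if err != nil { return nil, err }
		for _, profileDir := range profiles {
			f, err := firefox.New(profileDir)
			if err != nil {
				log.Debugf("skip %s profile %s: %v", cfg.Name, profileDir, err)
				continue
			}
			browsers = append(browsers, f)
		}
	default:
		return nil, fmt.Errorf("unknown browser kind: %d", cfg.Kind)
	}
	return browsers, nil
}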
```
### 4.3 Browser installation validation & profile discovery
Before enumerating profiles, confirm the directory is a real browser installation. For Chromium, the `Local State` file is the confirmation signal:
```go
func isValidBrowserDir(kind BrowserKind, dir string) bool {
	if !fileutil.IsDirExists(dir) { return false }
	switch kind {
	case KindChromium, KindChromiumYandex:
		return fileutil.IsFileExists(filepath.Join(dir, "Local State"))
	case KindFirefox:
		return true
	}
	return false
}
```
Chromium profile directory names are predictable (`Default/`, `Profile 1/`, ...), so read the user data directory once with `os.ReadDir()` and check the known file paths directly instead of walking the whole tree with `filepath.Walk`.
Firefox profiles are `xxxxxxxx.name/` directories. Enumerate and check for `key4.db` or `logins.json`.
---
## 5. Yandex Variant Handling
Yandex is Chromium-based with 3 differences:
| Aspect | Standard Chromium | Yandex |
|--------|------------------|--------|
| Password file | `Login Data` | `Ya Passman Data` |
| Password SQL | `SELECT origin_url, ...` | `SELECT action_url, ...` |
| CreditCard file | `Web Data` | `Ya Credit Cards` |
### 5.1 Separate source map
```go
// browser/chromium/source.go
var yandexSources = map[types.Category]source{
	types.Password:       {paths: []string{"Ya Passman Data"}}, // different
	types.Cookie:         {paths: []string{"Network/Cookies", "Cookies"}},
	types.History:        {paths: []string{"History"}},
	types.Download:       {paths: []string{"History"}},
	types.Bookmark:       {paths: []string{"Bookmarks"}},
	types.CreditCard:     {paths: []string{"Ya Credit Cards"}}, // different
	types.Extension:      {paths: []string{"Secure Preferences"}},
	types.LocalStorage:   {paths: []string{"Local Storage/leveldb"}, isDir: true},
	types.SessionStorage: {paths: []string{"Session Storage"}, isDir: true},
}
```
### 5.2 Query overrides (default + override pattern)
Each extract method defines its own default SQL query constant. The Chromium struct holds an optional override map:
```go
// browser/chromium/chromium.go
type Chromium struct {
	name           string
	profileDir     string
	masterKey      []byte                    // retrieved once in New(), shared across profiles
	sources        map[types.Category]source // chromiumSources or yandexSources
	queryOverrides map[types.Category]string // nil for standard Chromium
}

var yandexQueryOverrides = map[types.Category]string{
	types.Password: `SELECT action_url, username_value, password_value, date_created FROM logins`,
}
```
Extract methods check for overrides locally:
```go
// browser/chromium/extract_password.go
const defaultLoginQuery = `SELECT origin_url, username_value, password_value, date_created FROM logins`

func (c *Chromium) extractPasswords(masterKey []byte, path string) ([]types.LoginEntry, error) {
	query := defaultLoginQuery
	// Reading a nil map is safe in Go, so standard Chromium (nil overrides) keeps the default.
	if q, ok := c.queryOverrides[types.Password]; ok {
		query = q
	}
	// ... rest of extraction
}
```
### 5.3 Wiring at creation time
```go
func New(cfg browser.Config, userDataDir string) ([]*Chromium, error) {
	sources := chromiumSources
	var overrides map[types.Category]string
	if cfg.Kind == browser.KindChromiumYandex {
		sources = yandexSources
		overrides = yandexQueryOverrides
	}

	// Retrieve master key ONCE for the entire browser, shared across all profiles.
	localStatePath := filepath.Join(userDataDir, "Local State")
	retriever := platformKeyRetriever() // platform default: a chain on macOS/Linux, DPAPIRetriever on Windows
	masterKey, err := retriever.RetrieveKey(cfg.Storage, localStatePath)
	if err != nil { return nil, fmt.Errorf("retrieve master key: %w", err) }

	// ... discover profiles, create Chromium instances with masterKey + sources + overrides
}
```
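No extract method branches on the browser variant: the only per-variant step is the `queryOverrides` map lookup, which simply misses for standard Chromium. All variant differences are concentrated in `source.go` and `New()`. The master key is retrieved once and injected into every `Chromium` instance (one per profile).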
---
## 6. Error Handling
### 6.1 Collect-and-continue pattern
`Extract()` collects errors per category but continues extracting. The returned `data` and `err` can both be non-nil:
```go
func (c *Chromium) Extract(categories []types.Category) (*browserdata.BrowserData, error) {
	session, err := filemanager.NewSession()
	if err != nil { return nil, err }
	defer session.Cleanup()

	files := c.acquireFiles(session, categories)
	data := &browserdata.BrowserData{}
	var errs []error
	for _, cat := range categories {
		path, ok := files[cat]
		if !ok { continue }
		// Declared per iteration so an error from a previous category is never reported twice.
		var err error
		// c.masterKey was retrieved once in New() and stored on the struct.
		switch cat {
		case types.Password:
			data.Passwords, err = c.extractPasswords(c.masterKey, path)
		case types.Cookie:
			data.Cookies, err = c.extractCookies(c.masterKey, path)
		case types.History:
			data.Histories, err = c.extractHistories(path)
		case types.Download:
			data.Downloads, err = c.extractDownloads(path)
		case types.Bookmark:
			data.Bookmarks, err = c.extractBookmarks(path)
		case types.CreditCard:
			data.CreditCards, err = c.extractCreditCards(c.masterKey, path)
		case types.Extension:
			data.Extensions, err = c.extractExtensions(path)
		case types.LocalStorage:
			data.LocalStorage, err = c.extractLocalStorage(path)
		case types.SessionStorage:
			data.SessionStorage, err = c.extractSessionStorage(path)
		}
		if err != nil {
			log.Debugf("extract %s: %v", cat, err)
			errs = append(errs, fmt.Errorf("%s: %w", cat, err))
		}
	}
	return data, errors.Join(errs...) // errors.Join requires Go 1.20
}
```
### 6.2 Error severity levels
| Level | Behavior | Example |
|-------|----------|---------|
| Session/key failure | `return nil, err` — abort entirely | Disk full, keychain denied |
| Category failure | Log, skip, continue next category | Cookie file locked |
| Single record failure | Skip record, continue extraction | One cookie decryption failed |
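The single-record level can be sketched as a generic row collector. `collectRows` below is a hypothetical stand-in for the `QueryRows[T]` helper listed in the directory tree (its real signature is not shown in this section): each row goes through a converter, a failing row is skipped and counted, and only an iteration error from `rows.Err()` fails the whole category. `RowScanner` is the subset of `*sql.Rows` the helper needs, so a real `*sql.Rows` could be passed in place of the in-memory fake.

```go
package main

import (
	"errors"
	"fmt"
)

// RowScanner is the subset of *sql.Rows used here.
type RowScanner interface {
	Next() bool
	Scan(dest ...any) error
	Err() error
}

// collectRows (hypothetical) converts each row with convert, skipping and
// counting rows whose conversion fails.
func collectRows[T any](rows RowScanner, convert func(RowScanner) (T, error)) ([]T, int, error) {
	var out []T
	skipped := 0
	for rows.Next() {
		v, err := convert(rows)
		if err != nil {
			skipped++ // single record failure: skip and continue
			continue
		}
		out = append(out, v)
	}
	if err := rows.Err(); err != nil {
		return out, skipped, fmt.Errorf("iterate rows: %w", err)
	}
	return out, skipped, nil
}

// fakeRows serves string columns from memory in place of SQLite.
type fakeRows struct {
	data [][]string
	pos  int
}

func (f *fakeRows) Next() bool { f.pos++; return f.pos <= len(f.data) }
func (f *fakeRows) Err() error { return nil }
func (f *fakeRows) Scan(dest ...any) error {
	row := f.data[f.pos-1]
	if len(dest) != len(row) {
		return errors.New("column count mismatch")
	}
	for i, d := range dest {
		p, ok := d.(*string)
		if !ok {
			return errors.New("only *string destinations are supported")
		}
		*p = row[i]
	}
	return nil
}

type login struct{ URL, Password string }

// convertLogin treats an empty password as a failed decryption for the demo.
func convertLogin(r RowScanner) (login, error) {
	var l login
	if err := r.Scan(&l.URL, &l.Password); err != nil {
		return login{}, err
	}
	if l.Password == "" {
		return login{}, errors.New("decrypt failed")
	}
	return l, nil
}

func main() {
	rows := &fakeRows{data: [][]string{{"https://a.example", "pw1"}, {"https://b.example", ""}, {"https://c.example", "pw3"}}}
	logins, skipped, err := collectRows[login](rows, convertLogin)
	fmt.Println(len(logins), "kept,", skipped, "skipped, err:", err)
}
```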
### 6.3 Error wrapping convention
Use `fmt.Errorf` with `%w` for error context. No custom error types needed.
```go
// Good: wraps with context
raw, err := base64.StdEncoding.DecodeString(encoded)
if err != nil { return nil, fmt.Errorf("base64 decode: %w", err) }
// Bad: swallows error
raw, _ := base64.StdEncoding.DecodeString(encoded)
```
The `%w` verb preserves the error chain for `errors.Is()` and `errors.As()` if needed later.
### 6.4 Caller pattern
```go
data, err := b.Extract(categories)
if err != nil {
log.Warnf("%s: %v", b.Name(), err) // partial failure
}
if data == nil {
continue // total failure
}
data.Output(dir, b.Name(), format) // output whatever succeeded
```
---
## 7. Implementation Order
| Phase | Scope | Risk |
|-------|-------|------|
| 1 | `types/category.go` + `types/models.go` + `browserdata/browserdata.go` | Zero — new files only |
| 2 | `browserdata/datautil/sqlite.go` + `query.go` + `decrypt.go` | Zero — new files only |
| 3 | `crypto/version.go`, rename `AESCBCDecrypt` | Low — internal crypto changes |
| 4 | `crypto/keyretriever/` | Low — new package |
| 5 | `browser/chromium/source.go` + `extract_*.go` | Medium — new extract methods |
| 6 | `browser/firefox/source.go` + `extract_*.go` | Medium — new extract methods |
| 7 | `filemanager/session.go` | Low — new package |
| 8 | Wire `Extract()` + `Config` + `PickBrowsers()` | High — connects everything |
| 9 | Delete old code: `extractor/`, `browserdata/*/`, `imports.go` | High — removal |
| 10 | Update CLI, tests, cross-platform build verification | Medium |
---
## 8. Relationship with RFC-002
| Area | RFC-001 (this doc) | RFC-002 |
|------|-------------------|---------|
| Data model (Category + *Entry) | defines | uses |
| BrowserData container | defines | implements Output |
| Cipher version | covered | — |
| Master key retrieval | covered | — |
| Browser registration | covered | — |
| Yandex variant | covered | — |
| Error handling pattern | covered | — |
| Extract() orchestration | covered | — |
| File source mapping | — | covered |
| File acquisition (Session) | — | covered |
| Extract method details | — | covered |
| datautil helpers | — | covered |
| Output implementation | — | covered |
---
## 9. Open Questions
1. **App-Bound Encryption (Chrome 127+ v20)**: `crypto/version.go` has the extension point. Implementation deferred until tested.
2. **Firefox version detection**: is the key-length heuristic in `processMasterKey()` sufficient, or formalize it?
3. **Sort direction**: standardize all categories to DESC by date? (Firefox history/download currently ASC)
---
## References
- [Chromium OS Crypt](https://source.chromium.org/chromium/chromium/src/+/main:components/os_crypt/)
- [Chrome Password Decryption](https://github.com/chromium/chromium/blob/main/components/os_crypt/sync/os_crypt_win.cc)
- [Firefox NSS](https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS)
@@ -0,0 +1,843 @@
# RFC-002: Data Extraction & File Acquisition
**Author**: moonD4rk
**Status**: Proposed
**Created**: 2026-03-14
**Updated**: 2026-03-22
## Abstract
This RFC covers the implementation details of data extraction and file acquisition:
1. **File source mapping**: how each browser engine maps categories to files
2. **File acquisition**: Session-based temp file management with deduplication
3. **Extract methods**: concrete implementations for each data category
4. **Shared helpers**: `QuerySQLite()`, `QueryRows[T]()`, and `DecryptChromiumValue()`
5. **Output**: writing `Extract` results to CSV/JSON files
**Constraint**: Go 1.20 (Windows 7 support).
See RFC-001 for data model (`Category` + `*Entry` types), crypto layer, browser registration, and Yandex variant design.
---
## 1. Data Flow
```
CLI: main.go
  │
  ▼
browser.PickBrowsers("all", "")
  │  platformBrowsers() → []Config
  │  → chromium.New(cfg, dir) / firefox.New(profileDir)
  │       master key obtained here and stored on the struct:
  │         Chromium: retriever.RetrieveKey(cfg.Storage, localStatePath), once per browser
  │         Firefox:  deriveMasterKey(key4dbPath), once per profile
  ▼
Browser.Extract(categories)
  ├─ filemanager.NewSession()
  │   └─ acquireFiles() with dedup → map[Category]tempPath
  └─ per-category extract methods
      ├─ c.extractPasswords(masterKey, path)   → []LoginEntry
      ├─ c.extractCookies(masterKey, path)     → []CookieEntry
      ├─ c.extractHistories(path)              → []HistoryEntry
      ├─ c.extractDownloads(path)              → []DownloadEntry
      ├─ c.extractBookmarks(path)              → []BookmarkEntry
      ├─ c.extractCreditCards(masterKey, path) → []CreditCardEntry
      ├─ c.extractExtensions(path)             → []ExtensionEntry
      ├─ c.extractLocalStorage(path)           → []StorageEntry (LevelDB)
      └─ c.extractSessionStorage(path)         → []StorageEntry (LevelDB)
  ▼
browserdata.BrowserData{Passwords: [...], Cookies: [...], ...}
  ▼
BrowserData.Output(dir, name, format)
  ├─ chrome_default_password.csv
  ├─ chrome_default_cookie.json
  └─ ...
```
---
## 2. File Source Mapping
### 2.1 Category → source (one flat map per engine)
```go
// browser/chromium/source.go
type source struct {
	paths []string // candidates in priority order
	isDir bool
}

var chromiumSources = map[types.Category]source{
	types.Password:       {paths: []string{"Login Data"}},
	types.Cookie:         {paths: []string{"Network/Cookies", "Cookies"}},
	types.History:        {paths: []string{"History"}},
	types.Download:       {paths: []string{"History"}}, // same file, different query
	types.Bookmark:       {paths: []string{"Bookmarks"}},
	types.CreditCard:     {paths: []string{"Web Data"}},
	types.Extension:      {paths: []string{"Secure Preferences"}},
	types.LocalStorage:   {paths: []string{"Local Storage/leveldb"}, isDir: true},
	types.SessionStorage: {paths: []string{"Session Storage"}, isDir: true},
}
```
```go
// browser/firefox/source.go
var firefoxSources = map[types.Category]source{
	types.Password:     {paths: []string{"logins.json"}},
	types.Cookie:       {paths: []string{"cookies.sqlite"}},
	types.History:      {paths: []string{"places.sqlite"}},
	types.Download:     {paths: []string{"places.sqlite"}}, // same file
	types.Bookmark:     {paths: []string{"places.sqlite"}}, // same file
	types.Extension:    {paths: []string{"extensions.json"}},
	types.LocalStorage: {paths: []string{"webappsstore.sqlite"}},
}
```
The Yandex source map is defined in RFC-001 Section 5.1.
### 2.2 File acquisition with deduplication
When multiple categories map to the same file (e.g. History + Download), the file is copied once:
```go
func (c *Chromium) acquireFiles(session *filemanager.Session, categories []types.Category) map[types.Category]string {
	result := make(map[types.Category]string)
	copied := make(map[string]string) // abs src → temp dst
	for _, cat := range categories {
		src, ok := c.sources[cat] // uses c.sources (chromiumSources or yandexSources)
		if !ok { continue }
		for _, rel := range src.paths {
			abs := filepath.Join(c.profileDir, rel)
			if dst, ok := copied[abs]; ok {
				result[cat] = dst // reuse already-copied file
				break
			}
			dst := filepath.Join(session.TempDir(), filepath.Base(rel))
			if err := session.Acquire(abs, dst, src.isDir); err == nil {
				copied[abs] = dst
				result[cat] = dst
				break
			}
			// Acquire failed (e.g. the file is absent): try the next candidate path.
		}
	}
	return result
}
```
### 2.3 Firefox key4.db: infrastructure, not a Category
Each Firefox profile has its own `key4.db`. The master key is derived once in `New()` and stored on the struct, so `Extract()` never re-derives it:
```go
// firefox.New() — called once per profile
func New(profileDir string) (*Firefox, error) {
// derive master key from this profile's key4.db
keyPath := filepath.Join(profileDir, "key4.db")
masterKey, err := deriveMasterKey(keyPath)
if err != nil { return nil, err }
return &Firefox{
profileDir: profileDir,
masterKey: masterKey,
sources: firefoxSources,
}, nil
}
func (f *Firefox) Extract(categories []types.Category) (*browserdata.BrowserData, error) {
session, err := filemanager.NewSession()
if err != nil { return nil, err } // a nil session would panic in Cleanup()
defer session.Cleanup()
files := f.acquireFiles(session, categories)
// masterKey was derived in New() from this profile's key4.db
data := &browserdata.BrowserData{}
// ... extract each category from files[cat] using f.masterKey ...
return data, nil
}
```
### 2.4 Profile Discovery
Profile discovery uses plain functions (no struct receiver) that scan the filesystem:
```go
// profile/finder.go
// discoverProfiles returns sub-directory names that look like Chrome profiles.
// Matches "Default" or any name starting with "Profile ".
// Falls back to ["."] for Opera-style layouts (data files live directly in userDataDir).
func discoverProfiles(userDataDir string) []string {
entries, err := os.ReadDir(userDataDir)
if err != nil { return []string{"."} }
var profiles []string
for _, e := range entries {
if !e.IsDir() { continue }
name := e.Name()
if name == "Default" || strings.HasPrefix(name, "Profile ") {
profiles = append(profiles, name)
}
}
if len(profiles) == 0 {
return []string{"."}
}
return profiles
}
// discoverDataFiles checks which categories have actual data files in profileDir.
func discoverDataFiles(profileDir string, sources map[types.Category]source) map[types.Category]string {
found := make(map[types.Category]string)
for cat, src := range sources {
for _, rel := range src.paths {
abs := filepath.Join(profileDir, rel)
info, err := os.Stat(abs)
if err != nil { continue }
if src.isDir && !info.IsDir() { continue }
if !src.isDir && info.IsDir() { continue }
found[cat] = abs
break
}
}
return found
}
// isValidBrowserDir checks whether the directory belongs to a real browser install.
// Chromium: requires "Local State" file. Firefox: requires directory existence.
func isValidBrowserDir(dir string, kind BrowserKind) bool {
switch kind {
case KindChromium, KindChromiumYandex:
_, err := os.Stat(filepath.Join(dir, "Local State"))
return err == nil
case KindFirefox:
info, err := os.Stat(dir)
return err == nil && info.IsDir()
}
return false
}
```
**Testing approach**: these functions take all inputs as parameters and only touch the filesystem, so they (and `acquireFiles()`) are easy to test with `t.TempDir()`:
```go
func TestDiscoverProfiles(t *testing.T) {
dir := t.TempDir()
os.MkdirAll(filepath.Join(dir, "Default"), 0o755)
os.MkdirAll(filepath.Join(dir, "Profile 1"), 0o755)
os.MkdirAll(filepath.Join(dir, "System Profile"), 0o755)
profiles := discoverProfiles(dir)
assert.Equal(t, []string{"Default", "Profile 1"}, profiles)
}
func TestDiscoverDataFiles(t *testing.T) {
dir := t.TempDir()
os.WriteFile(filepath.Join(dir, "Login Data"), []byte{}, 0o644)
os.MkdirAll(filepath.Join(dir, "Network"), 0o755)
os.WriteFile(filepath.Join(dir, "Network", "Cookies"), []byte{}, 0o644)
files := discoverDataFiles(dir, chromiumSources)
assert.Contains(t, files, types.Password)
assert.Contains(t, files, types.Cookie)
}
func TestAcquireFiles_Dedup(t *testing.T) {
dir := t.TempDir()
os.WriteFile(filepath.Join(dir, "History"), []byte("data"), 0o644)
session, _ := filemanager.NewSession()
defer session.Cleanup()
c := &Chromium{profileDir: dir, sources: chromiumSources}
files := c.acquireFiles(session, []types.Category{types.History, types.Download})
assert.Equal(t, files[types.History], files[types.Download])
}
```
### 2.5 Platform Config Example
Each platform file returns the full list of known browsers with their `UserDataDir` paths:
```go
// browser/browser_windows.go
// homeDir is the current user's home directory, resolved once (e.g. via os.UserHomeDir()).
func platformBrowsers() []Config {
return []Config{
{Key: "chrome", Name: "Chrome", Kind: KindChromium, UserDataDir: homeDir + "/AppData/Local/Google/Chrome/User Data"},
{Key: "edge", Name: "Microsoft Edge", Kind: KindChromium, UserDataDir: homeDir + "/AppData/Local/Microsoft/Edge/User Data"},
{Key: "opera", Name: "Opera", Kind: KindChromium, UserDataDir: homeDir + "/AppData/Roaming/Opera Software/Opera Stable"},
{Key: "yandex", Name: "Yandex", Kind: KindChromiumYandex, UserDataDir: homeDir + "/AppData/Local/Yandex/YandexBrowser/User Data"},
{Key: "firefox", Name: "Firefox", Kind: KindFirefox, UserDataDir: homeDir + "/AppData/Roaming/Mozilla/Firefox/Profiles"},
}
}
```
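`PickBrowsers()` iterates this list, calls `isValidBrowserDir()` to skip browsers that aren't installed, then calls `discoverProfiles()` to find all profiles within each valid browser directory. Note that `discoverProfiles()` only recognizes Chromium naming (`Default`, `Profile N`); Firefox profiles under `Profiles/` have generated names such as `xxxxxxxx.default-release`, so the Firefox path needs its own listing rule.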
---
## 3. Shared Helpers: `browserdata/datautil/`
### 3.1 SQLite query helper
```go
// browserdata/datautil/sqlite.go
func QuerySQLite(dbPath string, journalOff bool, query string, scanFn func(*sql.Rows) error) error {
db, err := sql.Open("sqlite", dbPath)
if err != nil { return err }
defer db.Close()
if journalOff {
if _, err := db.Exec("PRAGMA journal_mode=off"); err != nil { return err }
}
rows, err := db.Query(query)
if err != nil { return err }
defer rows.Close()
for rows.Next() {
if err := scanFn(rows); err != nil {
log.Debugf("scan row error: %v", err)
continue // skip bad row, continue extraction
}
}
return rows.Err()
}
```
### 3.2 Generic query helper — `datautil/query.go`
```go
package datautil
import "database/sql"
// QueryRows is a generic helper (Go 1.18+ type parameters) that wraps
// QuerySQLite and collects results into a typed slice. Each extract method
// only needs to provide the scan function.
func QueryRows[T any](path string, journalOff bool, query string, scanRow func(*sql.Rows) (T, error)) ([]T, error) {
var items []T
err := QuerySQLite(path, journalOff, query, func(rows *sql.Rows) error {
item, err := scanRow(rows)
if err != nil { return err } // QuerySQLite logs it and skips the row
items = append(items, item)
return nil
})
return items, err
}
```
### 3.3 Chromium decrypt helper
```go
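// browserdata/datautil/decrypt.go
// DecryptChromiumValue returns nil for an empty blob. Without a master key it
// uses DPAPI directly; with one it tries DPAPI first and falls back to
// crypto.DecryptWithChromium (AES with masterKey).
func DecryptChromiumValue(masterKey, encrypted []byte) ([]byte, error) {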
if len(encrypted) == 0 { return nil, nil }
if len(masterKey) == 0 {
return crypto.DecryptWithDPAPI(encrypted)
}
value, err := crypto.DecryptWithDPAPI(encrypted)
if err != nil {
value, err = crypto.DecryptWithChromium(masterKey, encrypted)
}
return value, err
}
```
---
## 4. Extract Method Examples
Each extract method lives in its own `extract_*.go` file inside the browser engine package (see RFC-001 for the naming convention). The default SQL query is a `const` in the same file; `c.query(category)` returns the entry from `c.queryOverrides` when one exists and that default otherwise.
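A minimal sketch of how that lookup could work, assuming `queryOverrides` is a map keyed by category on the engine struct. `Category`, `defaultQueries` and the two-category setup are simplified stand-ins, not the RFC's actual declarations; a variant such as Yandex (RFC-001 Section 5) would presumably be the one populating overrides.

```go
package main

// Category is a simplified stand-in for types.Category.
type Category int

const (
	Password Category = iota
	Cookie
)

const (
	defaultLoginQuery  = `SELECT origin_url, username_value, password_value, date_created FROM logins`
	defaultCookieQuery = `SELECT name, encrypted_value, host_key, path, creation_utc, expires_utc, is_secure, is_httponly, has_expires, is_persistent FROM cookies`
)

// defaultQueries (hypothetical) gathers the per-file consts in one place.
var defaultQueries = map[Category]string{
	Password: defaultLoginQuery,
	Cookie:   defaultCookieQuery,
}

type Chromium struct {
	queryOverrides map[Category]string // nil for stock Chromium
}

// query returns the override for cat when one is set, otherwise the default.
// Reading from a nil map is safe in Go and simply reports "not found".
func (c *Chromium) query(cat Category) string {
	if q, ok := c.queryOverrides[cat]; ok {
		return q
	}
	return defaultQueries[cat]
}
```

Since the scan function is shared between the default and the override, an override query must select the same columns in the same order.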
### 4.1 Chromium password (SQLite + decryption)
```go
// browser/chromium/extract_password.go
const defaultLoginQuery = `SELECT origin_url, username_value, password_value, date_created FROM logins`
func (c *Chromium) extractPasswords(masterKey []byte, path string) ([]types.LoginEntry, error) {
logins, err := datautil.QueryRows(path, false, c.query(types.Password),
func(rows *sql.Rows) (types.LoginEntry, error) {
var url, username string
var pwd []byte
var created int64
if err := rows.Scan(&url, &username, &pwd, &created); err != nil {
return types.LoginEntry{}, err
}
password, _ := datautil.DecryptChromiumValue(masterKey, pwd)
return types.LoginEntry{
URL: url,
Username: username,
Password: string(password),
CreatedAt: typeutil.TimeEpoch(created),
}, nil
})
if err != nil { return nil, err }
sort.Slice(logins, func(i, j int) bool {
return logins[i].CreatedAt.After(logins[j].CreatedAt)
})
return logins, nil
}
```
### 4.2 Chromium cookie (SQLite + decryption)
```go
// browser/chromium/extract_cookie.go
const defaultCookieQuery = `SELECT name, encrypted_value, host_key, path,
creation_utc, expires_utc, is_secure, is_httponly,
has_expires, is_persistent FROM cookies`
func (c *Chromium) extractCookies(masterKey []byte, dbPath string) ([]types.CookieEntry, error) {
cookies, err := datautil.QueryRows(dbPath, false, c.query(types.Cookie),
func(rows *sql.Rows) (types.CookieEntry, error) {
var (
name, host, path string
isSecure, isHTTPOnly, hasExpire, isPersistent int
createdAt, expireAt int64
encryptedValue []byte
)
if err := rows.Scan(&name, &encryptedValue, &host, &path,
&createdAt, &expireAt, &isSecure, &isHTTPOnly,
&hasExpire, &isPersistent); err != nil {
return types.CookieEntry{}, err
}
value, _ := datautil.DecryptChromiumValue(masterKey, encryptedValue)
return types.CookieEntry{
Name: name,
Host: host,
Path: path,
Value: string(value),
IsSecure: isSecure != 0,
IsHTTPOnly: isHTTPOnly != 0,
ExpireAt: typeutil.TimeEpoch(expireAt),
CreatedAt: typeutil.TimeEpoch(createdAt),
}, nil
})
if err != nil { return nil, err }
sort.Slice(cookies, func(i, j int) bool {
return cookies[i].CreatedAt.After(cookies[j].CreatedAt)
})
return cookies, nil
}
```
### 4.3 Firefox password (JSON + `decryptPBE()` helper)
Firefox uses `decryptPBE()` to combine the 3-step pipeline (base64 decode → ASN1 PBE parse → decrypt) into one call, so decrypting the username and password needs 2 error checks instead of 6.
```go
// browser/firefox/extract_password.go
// decryptPBE combines base64 decode + ASN1 PBE parse + decrypt.
func decryptPBE(encoded string, masterKey []byte) ([]byte, error) {
raw, err := base64.StdEncoding.DecodeString(encoded)
if err != nil { return nil, fmt.Errorf("base64 decode: %w", err) }
pbe, err := crypto.NewASN1PBE(raw)
if err != nil { return nil, fmt.Errorf("parse asn1 pbe: %w", err) }
plaintext, err := pbe.Decrypt(masterKey)
if err != nil { return nil, fmt.Errorf("decrypt: %w", err) }
return plaintext, nil
}
func (f *Firefox) extractPasswords(masterKey []byte, path string) ([]types.LoginEntry, error) {
data, err := os.ReadFile(path)
if err != nil { return nil, err }
var logins []types.LoginEntry
for _, v := range gjson.GetBytes(data, "logins").Array() {
user, err := decryptPBE(v.Get("encryptedUsername").String(), masterKey)
if err != nil {
log.Debugf("decrypt username: %v", err)
continue
}
pwd, err := decryptPBE(v.Get("encryptedPassword").String(), masterKey)
if err != nil {
log.Debugf("decrypt password: %v", err)
continue
}
url := v.Get("formSubmitURL").String()
if url == "" { url = v.Get("hostname").String() }
logins = append(logins, types.LoginEntry{
URL: url,
Username: string(user),
Password: string(pwd),
CreatedAt: typeutil.TimeStamp(v.Get("timeCreated").Int() / 1000),
})
}
sort.Slice(logins, func(i, j int) bool {
return logins[i].CreatedAt.After(logins[j].CreatedAt)
})
return logins, nil
}
```
### 4.4 Firefox cookie (SQLite, no encryption)
```go
// browser/firefox/extract_cookie.go
const firefoxCookieQuery = `SELECT name, value, host, path,
creationTime, expiry, isSecure, isHttpOnly FROM moz_cookies`
func (f *Firefox) extractCookies(dbPath string) ([]types.CookieEntry, error) {
cookies, err := datautil.QueryRows(dbPath, true, firefoxCookieQuery,
func(rows *sql.Rows) (types.CookieEntry, error) {
var (
name, value, host, path string
isSecure, isHTTPOnly int
createdAt, expiry int64
)
if err := rows.Scan(&name, &value, &host, &path,
&createdAt, &expiry, &isSecure, &isHTTPOnly); err != nil {
return types.CookieEntry{}, err
}
return types.CookieEntry{
Name: name,
Host: host,
Path: path,
Value: value, // not encrypted
IsSecure: isSecure != 0,
IsHTTPOnly: isHTTPOnly != 0,
ExpireAt: typeutil.TimeStamp(expiry),
CreatedAt: typeutil.TimeStamp(createdAt / 1000000),
}, nil
})
if err != nil { return nil, err }
sort.Slice(cookies, func(i, j int) bool {
return cookies[i].CreatedAt.After(cookies[j].CreatedAt)
})
return cookies, nil
}
```
### 4.5 Chromium local storage (LevelDB)
```go
// browser/chromium/extract_storage.go
func (c *Chromium) extractLocalStorage(path string) ([]types.StorageEntry, error) {
db, err := leveldb.OpenFile(path, nil)
if err != nil { return nil, err }
defer db.Close()
var entries []types.StorageEntry
iter := db.NewIterator(nil, nil)
defer iter.Release()
for iter.Next() {
url, name := parseStorageKey(iter.Key(), []byte{0}) // \x00 separator
if url == "" { continue }
entries = append(entries, types.StorageEntry{
URL: url,
Key: name,
Value: string(iter.Value()),
})
}
return entries, iter.Error()
}
func (c *Chromium) extractSessionStorage(path string) ([]types.StorageEntry, error) {
db, err := leveldb.OpenFile(path, nil)
if err != nil { return nil, err }
defer db.Close()
var entries []types.StorageEntry
iter := db.NewIterator(nil, nil)
defer iter.Release()
for iter.Next() {
url, name := parseStorageKey(iter.Key(), []byte("-")) // "-" separator
if url == "" { continue }
entries = append(entries, types.StorageEntry{
URL: url,
Key: name,
Value: string(iter.Value()),
})
}
return entries, iter.Error()
}
func parseStorageKey(key []byte, separator []byte) (url, name string) {
parts := bytes.SplitN(key, separator, 2)
if len(parts) != 2 { return "", "" }
return string(parts[0]), string(parts[1])
}
```
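The `parseStorageKey()` split above returns raw bytes. Related to Open Question 3, here is a sketch of the extra decoding step, assuming the layout that community reverse-engineering of Chromium's LevelDB local storage describes: data rows are keyed `_<origin>\x00<key>`, the key begins with one encoding byte (`0x00` for UTF-16LE, `0x01` for Latin-1), and bookkeeping rows such as `VERSION` and `META:<origin>` carry no `_` prefix. Both function names are hypothetical, and this layout should be verified against real profiles before relying on it.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"unicode/utf16"
)

// decodeChromiumString decodes a string that starts with an encoding byte:
// 0x00 = UTF-16LE, 0x01 = Latin-1 (assumed layout). ok is false otherwise.
func decodeChromiumString(b []byte) (s string, ok bool) {
	if len(b) == 0 {
		return "", false
	}
	body := b[1:]
	switch b[0] {
	case 0x01: // Latin-1: every byte is one Unicode code point
		runes := make([]rune, len(body))
		for i, c := range body {
			runes[i] = rune(c)
		}
		return string(runes), true
	case 0x00: // UTF-16 little-endian code units
		if len(body)%2 != 0 {
			return "", false
		}
		units := make([]uint16, len(body)/2)
		for i := range units {
			units[i] = binary.LittleEndian.Uint16(body[2*i:])
		}
		return string(utf16.Decode(units)), true
	}
	return "", false
}

// parseLocalStorageKey (hypothetical) splits "_<origin>\x00<encoded key>".
// Rows without the "_" prefix (VERSION, META:...) are reported as ok=false.
func parseLocalStorageKey(key []byte) (origin, name string, ok bool) {
	if !bytes.HasPrefix(key, []byte("_")) {
		return "", "", false
	}
	parts := bytes.SplitN(key[1:], []byte{0x00}, 2) // the first NUL ends the origin
	if len(parts) != 2 {
		return "", "", false
	}
	name, ok = decodeChromiumString(parts[1])
	return string(parts[0]), name, ok
}
```

Splitting only at the first NUL matters: a UTF-16LE key contains NUL bytes of its own, which `SplitN(..., 2)` leaves intact.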
### 4.6 Key differences between engines
| Aspect | Chromium | Firefox |
|--------|----------|---------|
| Password source | SQLite (`Login Data`) | JSON (`logins.json`) |
| Password decryption | DPAPI → AES-GCM/CBC | ASN1PBE |
| Cookie encryption | Yes (masterKey needed) | No (plaintext) |
| Cookie journal_mode | Not needed | `PRAGMA journal_mode=off` |
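| Time format | WebKit epoch: microseconds since 1601-01-01 (`TimeEpoch`) | Unix epoch: microseconds in SQLite, e.g. `creationTime / 1e6`; milliseconds in `logins.json` (`timeCreated / 1000`) |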
| Storage format | LevelDB directory | SQLite (`webappsstore.sqlite`) |
| key4.db | Not used | Required for master key derivation |
| masterKey parameter | Passed to password, cookie, creditcard | Passed to password only |
### 4.7 Error handling in extract methods
Three-level rule:
| Level | Action | Example |
|-------|--------|---------|
| File/DB open failure | `return nil, err` | `os.ReadFile` fails, `sql.Open` fails |
| Single record failure | `log.Debugf` + `continue` | One password decryption failed |
| Entire Category failure | Collected into `errs` by caller | Cookie file locked |
Extract methods return an error only for file-level failures. Record-level failures are logged at Debug level and skipped. The caller (`Extract()`) collects per-category errors with `errors.Join`.
Error wrapping uses `fmt.Errorf("context: %w", err)` — no custom error types.
---
## 5. File Acquisition Layer
### 5.1 Session manager
```go
// filemanager/session.go
type Session struct {
tempDir string
}
func NewSession() (*Session, error) {
dir, err := os.MkdirTemp("", "hbd-*")
if err != nil { return nil, err }
return &Session{tempDir: dir}, nil
}
func (s *Session) TempDir() string { return s.tempDir }
func (s *Session) Acquire(src, dst string, isDir bool) error {
if isDir {
return fileutil.CopyDir(src, dst, "lock")
}
// Try normal copy first
err := fileutil.CopyFile(src, dst)
if err != nil {
// Normal copy failed (file may be locked), try platform-specific method
if err2 := copyLocked(src, dst); err2 != nil {
return fmt.Errorf("copy %s: %w; locked copy: %v", src, err, err2)
}
}
// Copy SQLite WAL/SHM companion files if present
for _, suffix := range []string{"-wal", "-shm"} {
if fileutil.IsFileExists(src + suffix) {
_ = fileutil.CopyFile(src+suffix, dst+suffix)
}
}
return nil
}
func (s *Session) Cleanup() {
os.RemoveAll(s.tempDir)
}
```
### 5.2 Locked file handling (Windows)
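On Windows, a running Chrome keeps its Cookies database open, so a plain copy can fail with a sharing violation. `Session.Acquire()` then falls back to `copyLocked()`, which opens the file via `syscall.CreateFile` with `FILE_SHARE_READ|FILE_SHARE_WRITE|FILE_SHARE_DELETE` so the copy does not conflict with the access Chrome already holds. Sharing flags cannot override a handle that was opened with no sharing at all; that case needs one of the deferred strategies in 5.3, such as VSS.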
Platform-specific files:
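- `filemanager/copy_windows.go`: `copyLocked()` with sharing flags
- `filemanager/copy_other.go`: stub returning an error; it needs a `//go:build !windows` constraint, because `_other` is not a GOOS filename suffix and the file would otherwise also compile on Windows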
This is transparent to callers — browser extract methods never know whether a file was copied normally or via the locked-file path.
### 5.3 Acquirer interface (deferred)
While a copy-based acquirer (`CopyAcquirer`) is the only strategy, `Session.Acquire()` implements it directly. An `Acquirer` interface can be introduced later, when Volume Shadow Copy (VSS) or other strategies are needed.
---
## 6. Output
```go
// browserdata/output.go
func (d *BrowserData) Output(dir, browserName, format string) error {
items := []struct {
name string
data interface{}
len int
}{
{"password", d.Passwords, len(d.Passwords)},
{"cookie", d.Cookies, len(d.Cookies)},
{"bookmark", d.Bookmarks, len(d.Bookmarks)},
{"history", d.Histories, len(d.Histories)},
{"download", d.Downloads, len(d.Downloads)},
{"creditcard", d.CreditCards, len(d.CreditCards)},
{"extension", d.Extensions, len(d.Extensions)},
{"localstorage", d.LocalStorage, len(d.LocalStorage)},
{"sessionstorage", d.SessionStorage, len(d.SessionStorage)},
}
var errs []error
for _, item := range items {
if item.len == 0 { continue }
filename := formatFilename(browserName, item.name, format)
if err := writeFile(dir, filename, format, item.data); err != nil {
errs = append(errs, fmt.Errorf("write %s: %w", filename, err))
continue
}
log.Infof("exported: %s (%d items)", filename, item.len)
}
return errors.Join(errs...)
}
func writeFile(dir, filename, format string, data interface{}) error {
if dir != "" {
if err := os.MkdirAll(dir, 0o750); err != nil { return err }
}
path := filepath.Join(dir, filename)
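f, err := os.OpenFile(path, os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0o600)
if err != nil { return err }
switch format {
case "json":
err = writeJSON(f, data)
default:
err = writeCSV(f, data)
}
// Close explicitly: on a written file, a failed Close can mean data never reached disk.
if cerr := f.Close(); err == nil { err = cerr }
return err
}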
func writeJSON(w io.Writer, data interface{}) error {
enc := json.NewEncoder(w)
enc.SetIndent("", " ")
enc.SetEscapeHTML(false)
return enc.Encode(data)
}
func writeCSV(w io.Writer, data interface{}) error {
// UTF-8 BOM (3 bytes) so spreadsheet apps detect the encoding; replaces golang.org/x/text
if _, err := w.Write([]byte{0xEF, 0xBB, 0xBF}); err != nil { return err }
csvWriter := csv.NewWriter(w)
return gocsv.MarshalCSV(data, gocsv.NewSafeCSVWriter(csvWriter))
}
func formatFilename(browserName, dataName, format string) string {
r := strings.NewReplacer(" ", "_", ".", "_", "-", "_")
ext := "csv"
if format == "json" { ext = "json" }
return strings.ToLower(fmt.Sprintf("%s_%s.%s", r.Replace(browserName), dataName, ext))
}
```
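For Open Question 2, a sketch of what dropping `gocsv` could look like: the BOM is written first, then `encoding/csv` handles quoting. The trade-off is that each `*Entry` type would need its own header and record conversion (or reflection), which `gocsv` currently derives from struct tags. `writeCSVStdlib()` and `renderCSV()` are hypothetical names:

```go
package main

import (
	"encoding/csv"
	"io"
	"strings"
)

// utf8BOM marks the file as UTF-8 for spreadsheet applications.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// writeCSVStdlib (hypothetical) writes a BOM, a header row and the records.
// encoding/csv quotes fields that contain commas, quotes or newlines.
func writeCSVStdlib(w io.Writer, header []string, records [][]string) error {
	if _, err := w.Write(utf8BOM); err != nil {
		return err
	}
	cw := csv.NewWriter(w)
	if err := cw.Write(header); err != nil {
		return err
	}
	return cw.WriteAll(records) // WriteAll flushes and reports any write error
}

// renderCSV (hypothetical) is a convenience wrapper that returns the CSV as a string.
func renderCSV(header []string, records [][]string) (string, error) {
	var sb strings.Builder
	err := writeCSVStdlib(&sb, header, records)
	return sb.String(), err
}
```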
---
## 7. What Was Eliminated
| Before | After | Why |
|--------|-------|-----|
| `extractor/` package (interface + registry + factory) | Deleted | Browser engines have typed extract methods |
| `browserdata/password/`, `cookie/`, etc. (9 sub-packages) | Deleted | Extract logic moved into `browser/chromium/` and `browser/firefox/` |
| `browserdata/imports.go` | Deleted | No init() registration needed |
| `types.DataType` (22 iota constants) | `types.Category` (9 constants) | No browser prefix, no key types |
| `itemFileNames` map | `chromiumSources` / `firefoxSources` per engine | File layout is engine-internal |
| `TempFilename()` on DataType | `Session.TempDir()` + `filepath.Base()` | Session manages temp paths |
| `DefaultChromiumTypes`, `DefaultFirefoxTypes`, `DefaultYandexTypes` | `types.AllCategories` | One list for all engines |
| `loginData.encryptPass`, `cookie.encryptValue` | Local variables in extract methods | Encrypted fields don't belong in data models |
| 20 trivial `Name()` / `Len()` methods | Not needed | No Extractor interface |
---
## 8. Implementation Plan
### Phase 1: Foundation (new files only, zero risk)
1. `types/category.go` — Category enum
2. `types/models.go` — all *Entry structs
3. `browserdata/browserdata.go` — BrowserData struct
4. `browserdata/datautil/sqlite.go` and `query.go` — QuerySQLite(), QueryRows[T]()
5. `browserdata/datautil/decrypt.go` — DecryptChromiumValue()
6. `filemanager/session.go` — Session
### Phase 2: Extract methods (new files, coexist with old code)
1. `browser/chromium/source.go` — chromiumSources, yandexSources
2. `browser/chromium/extract_*.go` — all 9 extract methods
3. `browser/firefox/source.go` — firefoxSources
4. `browser/firefox/extract_*.go` — all extract methods
### Phase 3: Wiring (modify existing files)
1. Update `Chromium.Extract()` to use new extract methods
2. Update `Firefox.Extract()` to use new extract methods
3. Update `Config` and `PickBrowsers()`
4. Update `browserdata/output.go`
5. Update CLI `main.go`
### Phase 4: Cleanup (delete old code)
1. Delete `extractor/` package
2. Delete `browserdata/imports.go`
3. Delete `browserdata/password/`, `cookie/`, etc.
4. Delete old `types.DataType`, `itemFileNames`
5. Delete `browser/consts.go`
### Phase 5: Verification
```bash
go test ./...
go vet ./...
gofmt -d .
GOOS=windows GOARCH=amd64 go build ./cmd/hack-browser-data/
GOOS=linux GOARCH=amd64 go build ./cmd/hack-browser-data/
GOOS=darwin GOARCH=amd64 go build ./cmd/hack-browser-data/
```
---
## 9. Open Questions
1. **Sort direction**: standardize all categories to DESC by date?
2. **Output format**: keep `gocsv` or switch to `encoding/csv`?
3. **LevelDB key parsing**: the current `fillKey`/`fillHeader`/`fillValue` logic in localstorage is complex — how much of that detail carries over?
---
## 10. Relationship with RFC-001
| Area | RFC-001 | RFC-002 (this doc) |
|------|---------|-------------------|
| Data model (Category + *Entry) | defines | uses |
| BrowserData container | defines | implements Output |
| Cipher version | covered | — |
| Master key retrieval | covered | — |
| Browser registration | covered | — |
| Yandex variant | covered | — |
| Error handling pattern | covered | — |
| File source mapping | — | covered |
| File acquisition | — | covered |
| Extract methods | — | covered |
| datautil helpers | — | covered |
| Output | — | covered |