When a GitHub repo is renamed and looked up by its new name, the lookup
by name misses but a record with the same externalId still exists,
causing E11000 on save. Fall back to a lookup by externalId before
creating a new document.
Fixes#500
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Split into builder (node:21-slim, full deps + tsc/gulp) and runtime
(node:21-alpine, production deps only). Drops ~hundreds of MB from
the published image and removes dev tooling from the runtime layer.
Builds and pushes tdurieux/anonymous_github to Docker Hub on push to
main, v* tags, and manual dispatch. Multi-arch (amd64/arm64) with GHA
layer caching. Closes#478.
Files tracked by Git LFS used to come out as the pointer text:
version https://git-lfs.github.com/spec/v1
oid sha256:...
size ...
…because GitHub's blob API returns the pointer, not the resolved
content. Detect that prefix on the first ~150 bytes of the blob stream
and switch to a fresh fetch via the web raw URL
(github.com/<owner>/<repo>/raw/<commit>/<path>), which auto-redirects
to media.githubusercontent.com and resolves the LFS object — auth
header carries through. Non-LFS files are forwarded through the
existing pipeline unchanged.
Fixes#95.
The Anonymize form's preview built the readme baseUrl as
"https://github.com/<owner>/<repo>/raw/<source.branch>/". When the
form rendered before the branch field had populated (initial load,
or while waiting on getBranches), the URL became ".../raw//" and
the browser collapsed the empty segment, fetching ".../raw/<file>"
instead of ".../raw/<branch>/<file>". Relative <img src="./X">
references then 404'd against a path with no branch — exactly the
"branch missing" pattern in #407.
Fall back to details.defaultBranch (then "main") so the base URL is
always well-formed.
Fixes#407.
The Anonymize form used the cached RepositoryModel for hasPage,
defaultBranch, etc. — so enabling GitHub Pages (or changing the
default branch) on the source after first cache wouldn't reflect in
the UI, leaving the GitHub Pages checkbox grayed out.
Pass force=1 when loading the form's repo details so the backend
re-queries the GitHub API once. The cost is a single GET /repos/...
call per form load.
Fixes#364.
When the anonymizer doesn't change a slice's text, the streamer used
to push Buffer.from(out, "utf8") — which loses any invalid-UTF-8 bytes
in the input (replaced by U+FFFD via StringDecoder). Files
mistakenly classified as text (binary blobs without a known extension,
text with stray non-UTF-8 bytes, BOMs) came out corrupted even though
nothing in the term list matched.
Track the raw chunk bytes alongside the decoded `pending`. On flush —
where we have every byte buffered — emit the original buffer directly
when the output equals the input, so a pure passthrough is bit-exact.
In the streaming OVERLAP path, do the same when the decode for that
slice round-trips losslessly; fall back to encoded output otherwise
(unchanged from before for that case).
Also add the "missing_content" locale entry for the
/api/anonymize-preview route.
istextorbinary returns null for filenames with no extension, and the
isTextFile() guard treated null as "not text" — so terms in LICENSE,
COPYING, AUTHORS, README (extensionless), CHANGELOG, NOTICE, and
similar conventional filenames went through the binary passthrough
in AnonymizeTransformer and were never anonymized.
Add a small whitelist of these names ahead of the istextorbinary call.
Fixes#493.
The .leftCol-foot was hidden by default and only revealed inside the
mobile/tablet media query, so the last-update line never appeared on
wider viewports.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two regressions stacked from the recent tree work:
1. expandAllFolders (#496) was marking every folder open, including
folders whose children weren't fetched yet. The directive then
rendered an empty <ul> after each <a>, and the openFolder handler's
"no sibling means we need to load" check silently treated the empty
<ul> as already-loaded — so clicking the folder toggled the class
but the children never appeared.
Skip folders with empty children when pre-expanding, and harden the
click handler so an empty <ul> still triggers a fetch.
2. The $routeUpdate handler (#510 follow-up) became async and called
$scope.$apply(updateContent) at the end. Inside an already-running
digest cycle this no-ops or throws, leaving file navigation stuck.
Run updateContent() synchronously like before, and kick off any
missing parent-directory fetches in the background — getContent()
already falls back to sha "0" when the metadata isn't loaded yet.
When a user renamed the original GitHub repository, anonymous_github
kept calling oct.repos.get({owner, repo}) with the cached old name and
got 404, marking the link broken even though the repository still
existed at a new path.
Recover the numeric GitHub id from the dbModel's externalId
("gh_<id>"). On a 404 from the name-based lookup, fall back to
GET /repositories/<id>, which returns the renamed repo. The caller
already updates source.repositoryName from r.full_name afterwards.
Fixes#409.
Clicking a markdown link into a subdirectory's README threw
"Cannot read properties of undefined (reading 'sha')" and left the
viewer on Loading…. The route handler called updateContent() without
loading the new directory's file listing, so getSelectedFile() returned
undefined and getContent() then dereferenced fileInfo.sha.
Two fixes:
- getContent() falls back to sha "0" when fileInfo is undefined.
- The $routeUpdate handler walks the new path and loads any directory
listings that aren't yet in $scope.files before rendering, so the
selected file actually has its sha by the time we fetch.
Fixes#510.
The form's live README/PR preview was running its own copy of
ContentAnonimizer in the browser. The two implementations had been
drifting — recent fixes for word boundaries (#175/#249), accent
matching (#280), custom replacements (#285), and the diacritic-stripped
variants only landed on the server. Reviewers saw one anonymization;
authors composing the form saw another.
Add POST /api/anonymize-preview that takes a snippet (or a batch) plus
the user's options and runs them through the same ContentAnonimizer
the file route uses. Replace the client-side anonymizeReadme() body
with a debounced call to that endpoint. The PR view's
anonymizePrContent() runs as a synchronous template expression, so it
now reads from a {original -> anonymized} cache that's refreshed in
the background whenever the PR details, terms, or options change.
Single-flight + debounce keep the form responsive; an in-flight
request is dropped on the next change.
Entering an IP address (e.g. 192.168.1.1) or any term with regex
metacharacters made the form invalid because the "regex characters
detected" hint was wired up via $setValidity('terms', 'regex', false).
The text in the UI labels it as a warning, but the form treated it as
an error and refused to save.
Track the warning as a plain $scope flag and show it via ng-show on
that flag, so the form stays valid (#430).
A term entered as "Anonymous=>ABC" now scrubs "Anonymous" to "ABC"
instead of "XXXX-N". Lets users keep anonymized identifiers valid in
source code (no hyphen) and align tokens between paper text and repo.
Indexing for default-mask terms is unchanged: a list of
"Alpha=>AAA", "Beta" still produces XXXX-2 for Beta.
Fixes#285.
The file tree opened collapsed, requiring the reviewer to click each
folder before they could see what was inside. Walk the tree on first
render and mark every folder open in $scope.opens. Folders the user
has explicitly toggled (a previous entry already exists in
$scope.opens) are left as-is, so collapsing still works.
Fixes#496.
The server set Accept-Ranges: none on every file response. For text we
anonymize on the fly so byte ranges aren't meaningful, but binary
entries pass through unchanged — and the explicit "none" header makes
some browsers refuse to play <video>/<audio> elements that would
otherwise fall back to a full download. Newly uploaded MP4s under the
inline-preview threshold rendered as a blank progress bar (#538).
Only set Accept-Ranges: none for text entries; let binary entries omit
it so the standard fallback kicks in.
Fixes#538.
Without the path, two different files in the same repo (same sha, same
anonymization options) shared an ETag. If a browser ever sent the cached
ETag for one file while requesting another, the server would have
returned 304 against the wrong cache entry. Fold the path into the
ETag so each file has its own fingerprint.
Follow-up to b3c1030 (#439).
Replace the outdated user-images.githubusercontent.com screenshot in
the README with a locally hosted image, and regenerate the in-app
screenshots to reflect the current UI.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Files were being served with Cache-Control: max-age=18144000 (210 days)
keyed only on the upstream ?v=<sha>. Editing the term list left the
same URL serving stale anonymized bytes — visible to users in regular
tabs but not in incognito. The previous fix-by-incognito recipe in #439
is exactly this.
Switch to ETag-based revalidation that fingerprints both the upstream
sha and the saved anonymization options, with Cache-Control:
no-cache, must-revalidate. Browsers now revalidate on every request and
get a 304 when nothing has changed, or fresh content as soon as terms,
image/link/etc. options are updated.
Fixes#439.
The viewer already supported jumping to a line via #L42 in the URL but
never produced one — users had to type it manually. Wire guttermousedown
on the ACE editor to replaceState a #L<n> hash, with shift-click for a
range. Also reapply the highlight on hashchange so pasting a URL into
the address bar works without reload.
Fixes#392.
When a user added "Davó" to the term list, "Davo" elsewhere in the
content was left untouched (and vice versa). Each term now also runs a
diacritic-insensitive pass: ASCII Latin letters expand to a class
covering common accented siblings, with Unicode-aware lookaround
boundaries so the trailing boundary still fires next to "ó" etc.
Pure helpers moved into src/core/term-matching so the test file can
import them instead of duplicating the logic.
Fixes#280.
urlRel2abs() prepended an extra "." when it saw "./X", turning the
relative path into "../X" and silently moving up a directory. As a
result, raw HTML <img src="./imgs/run.png"> inside a README rendered
under /r/<repo>/<file> resolved to /r/<repo>/imgs/... instead of
/r/<repo>/<dir>/imgs/..., so the image 404'd. Markdown image syntax
went through marked-base-url and was unaffected.
Strip the leading "./" instead so the relative path concatenates
cleanly with baseUrl.
Fixes#346.
The streaming zip pipeline was constructing AnonymizeTransformer first and
then assigning opt.filePath afterwards. AnonymizeTransformer determines
isText in its constructor from opt.filePath, so every entry was classified
as binary and passed through unchanged — the downloaded zip leaked the
original (un-anonymized) terms even though the web view scrubbed them.
Pass filePath via the constructor so isText is computed correctly.
Fixes#342, #349.
marked v12 dropped the headerIds option, so headings rendered with no
id attributes and links like [Releases](#releases-and-contributing)
silently failed to scroll. Add a heading renderer that emits a
GitHub-style slug id, with a numeric suffix for duplicates within a
document.
Fixes#390.
The Generate button silently no-op'd because ng-model="newTokenName" inside
an ng-if block wrote to a child scope, leaving $scope.newTokenName empty.
Use $scope.tokenForm.{name,plaintext} so prototypal lookup resolves to the
controller scope.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Toasts used class="toast show" with ng-repeat but never initialized
Bootstrap's toast plugin, so they stayed pinned until manually closed.
Each navigation re-fired toasts (e.g. "README not found in github
pages"), stacking duplicates.
Add an $scope.addToast helper that schedules removal via $timeout, and
route all push sites through it.
Fixes#246.
The Remove action in the dashboard dropdown was gated on
status == 'ready', so expired repos showed no way to be removed and
stuck on the front page. The backend DELETE route already accepts any
non-'removed' status, so widen the ng-show to include 'expired' and
'error'.
Fixes#463.
Wrapping every user term as `\b${term}\b` silently dropped matches when
the term started or ended with a non-word char (e.g. `@tdurieux`,
`@author .*`), because JS `\b` only fires at a word/non-word transition.
Replace with `withWordBoundaries()`, which only emits `\b` on the side
where the term has a word-char edge.
Fixes#175, #249.
* Fix horizontal overflow causing page content to be cut off on mobile
- Add overflow-x: hidden to html/body and ng-view to prevent horizontal
scrolling across all pages
- Restore .container.page mobile padding to 15px to match Bootstrap .row
negative margins (-15px), which previously caused 5px overflow per side
- Add max-width: 100% constraints to prevent content from exceeding viewport
https://claude.ai/code/session_01L2xhJCKkjghMDBuwXpSHzi
* Fix Ko-fi widget overlapping hamburger menu and simplify desktop layout
- Move Ko-fi "Support me" button from top-right (where it hid the navbar
hamburger) to bottom-right corner
- Remove card styling (border, background, border-radius) from dashboard
quota section for a flatter, cleaner look
- Remove fixed max-width: 960px from dashboard-page so it uses Bootstrap's
standard container widths, consistent with other pages like admin
https://claude.ai/code/session_01L2xhJCKkjghMDBuwXpSHzi
* Redesign anonymize page: centered landing, reorganized form layout
- No URL state: centered input in the middle of the page for a clean
initial experience
- URL provided state: preview on the left, form settings on the right
in a fixed-width sidebar panel (380px on desktop)
- Reorganized form sections into logical groups:
Source (branch/commit) → Identity (ID/conference) → Anonymization
(terms) → Display (checkboxes, no longer hidden in accordion) →
Expiration
- Removed card/accordion wrappers for a flatter, more scannable form
- Mobile: form stacks below preview with sticky submit bar
https://claude.ai/code/session_01L2xhJCKkjghMDBuwXpSHzi
* Reduce navbar height on desktop
- Reduce navbar padding from default .5rem to 4px vertical
- Shrink nav icons from 30px/40px to 20px/28px
- Reduce nav-link font size to 0.9rem with tighter padding
- Shrink navbar-brand font size to 1rem
https://claude.ai/code/session_01L2xhJCKkjghMDBuwXpSHzi
---------
Co-authored-by: Claude <noreply@anthropic.com>