mirror of
https://github.com/wiltodelta/remove-ai-watermarks.git
synced 2026-06-10 12:53:56 +02:00
fix(photomaker-v2): render at SDXL native 1024, use upstream prompt + neg_prompt

The 9-face grid and single-face cert outputs were still a mosaic of training-time
faces even after the id_embeds shape fix. A WebFetch of the upstream
inference_pmv2.py revealed three mismatches:

1. SDXL at width=height=512 falls into its low-res failure mode (small-detail
   collage / mosaic) on the V2 LoRA. Render at native 1024, then downscale into
   the original face bbox at composite time.
2. The upstream prompt is descriptive ("instagram photo, portrait photo of a woman
   img, colorful, perfect face, natural skin, hard shadows, film grain, best
   quality"). Our generic prompt let SDXL drift away from the ID embedding.
   Adopted the upstream pattern.
3. Upstream V2 explicitly passes negative_prompt; the CFG batch mismatch we hit
   on V1 isn't a V2 issue. Re-added negative_prompt with the upstream wording
   (asymmetry/worst quality/etc).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
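
The "render at 1024, then downscale into the face bbox" step from point 1 might look roughly like the sketch below. It is not the project's compositing code: `downscale_nearest` and `composite_face` are hypothetical names, the bbox is assumed to be `(x, y, w, h)` and fully inside the scene, and nearest-neighbour sampling stands in for whatever resampler the project actually uses.

```python
import numpy as np

RENDER_SIZE = 1024  # mirrors _PHOTOMAKER_FACE_SIZE after this commit


def downscale_nearest(render: np.ndarray, out_w: int, out_h: int) -> np.ndarray:
    """Nearest-neighbour resize (assumption: a stand-in for the real resampler)."""
    src_h, src_w = render.shape[:2]
    rows = np.arange(out_h) * src_h // out_h  # source row for each output row
    cols = np.arange(out_w) * src_w // out_w  # source column for each output column
    return render[rows[:, None], cols[None, :]]


def composite_face(
    scene: np.ndarray, render: np.ndarray, bbox: tuple[int, int, int, int]
) -> np.ndarray:
    """Paste the square render into bbox = (x, y, w, h) of a copy of the scene."""
    x, y, w, h = bbox
    out = scene.copy()
    out[y : y + h, x : x + w] = downscale_nearest(render, w, h)
    return out


# Toy data: a black scene and a flat grey 1024x1024 "render".
scene = np.zeros((300, 400, 3), dtype=np.uint8)
render = np.full((RENDER_SIZE, RENDER_SIZE, 3), 200, dtype=np.uint8)
result = composite_face(scene, render, (50, 60, 120, 150))
```

A non-square bbox stretches the square render in this sketch; clipping at the image border and aspect-ratio handling are left out.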
@@ -76,12 +76,25 @@ _SDXL_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
 # `img`, which PhotoMaker replaces with the ID embedding at inference. Keeping it
 # scene-neutral (no extra style words) maximises identity transfer from the embed and
 # minimises hallucinated background/lighting that would not match the cleaned scene.
-_PHOTOMAKER_PROMPT = "a portrait photo of a person img, natural lighting, sharp focus"
-_PHOTOMAKER_NEGATIVE = "blurry, lowres, deformed, distorted, watermark"
+# Prompt format follows the upstream V2 reference (inference_pmv2.py): the trigger
+# word ``img`` must immediately follow a class noun. SDXL is happiest at 1024 and
+# falls into low-res artefacts ("mosaic of tiny faces") at 512, so we render at
+# 1024 then downscale into the face bbox at composite time. Caught visually
+# 2026-06-04: at 512 V2 produced a collage of training-time faces; at 1024 with the
+# upstream-style descriptive prompt it produces a clean face.
+_PHOTOMAKER_PROMPT = (
+    "instagram photo, portrait photo of a person img, natural skin, soft lighting, "
+    "best quality, sharp focus"
+)
+_PHOTOMAKER_NEGATIVE = (
+    "(asymmetry, worst quality, low quality, illustration, 3d, 2d, painting, "
+    "cartoons, sketch), open mouth, blurry, watermark"
+)
 
-# Square size used to feed PhotoMaker (must match a multiple of 64; 512 fits CPU/GPU
-# comfortably and gives the encoder enough pixels for a stable embedding).
-_PHOTOMAKER_FACE_SIZE = 512
+# SDXL native resolution; lower values send V2 into low-res mode and the output
+# becomes a collage of training-time faces. We render at 1024 then downscale into
+# the original face bbox at composite time.
+_PHOTOMAKER_FACE_SIZE = 1024
 
 _pipeline: Any | None = None
 _pipeline_lock = threading.Lock()
@@ -334,15 +347,11 @@ def restore_faces_photomaker(
     id_crop_rgb = cv2.cvtColor(id_crop_bgr, cv2.COLOR_BGR2RGB)
     id_image_pil = Image.fromarray(id_crop_rgb)
 
-    # Don't pass negative_prompt: the PhotoMaker pipeline manages its own CFG by
-    # concatenating [negative_prompt_embeds, prompt_embeds]; if we pass a custom
-    # negative the upstream code splits text_only vs id-injected branches and
-    # the resulting embed batch dims can mismatch (we saw
-    # "Sizes of tensors must match except in dimension 1. Expected size 2 but got
-    # size 1" on a real run). The default empty negative is what the upstream
-    # gradio demo uses.
+    # Upstream V2 reference (inference_pmv2.py) passes negative_prompt; the
+    # batch-mismatch we hit earlier was on V1 only.
     out = pipeline(
         prompt=_PHOTOMAKER_PROMPT,
+        negative_prompt=_PHOTOMAKER_NEGATIVE,
         input_id_images=[id_image_pil],
         id_embeds=id_embeds,
         num_inference_steps=num_inference_steps,
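
The new prompt comment says the trigger word `img` must immediately follow a class noun. A small guard along these lines could catch a prompt edit that breaks that rule before a render is attempted. This is only a sketch: `trigger_follows_class_noun` and the `CLASS_NOUNS` set are hypothetical and not part of PhotoMaker or this repository, and requiring exactly one trigger word is an assumption of the sketch.

```python
import re

TRIGGER = "img"
# Hypothetical class-noun list for this sketch; not taken from PhotoMaker.
CLASS_NOUNS = {"person", "man", "woman", "boy", "girl"}


def trigger_follows_class_noun(prompt: str) -> bool:
    """Return True if `img` occurs exactly once, directly after a class noun."""
    words = re.findall(r"[a-z0-9]+", prompt.lower())
    positions = [i for i, word in enumerate(words) if word == TRIGGER]
    if len(positions) != 1:  # assumption: exactly one trigger word is expected
        return False
    i = positions[0]
    return i > 0 and words[i - 1] in CLASS_NOUNS


new_prompt = (
    "instagram photo, portrait photo of a person img, natural skin, soft lighting, "
    "best quality, sharp focus"
)
ok = trigger_follows_class_noun(new_prompt)
```

Running such a check once at import time, next to the prompt constant, would be cheap compared with a failed SDXL render.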