From 61f5443bcaaa407bf56029e5bc232d82a7976c87 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Sun, 10 May 2026 11:07:55 -0700 Subject: [PATCH] fix(ask-user-format): forbid \uXXXX escaping of CJK chars MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds a self-check item to the AskUserQuestion preamble forbidding `\u`- escape encoding of non-ASCII characters (CJK, accents) in AskUserQuestion fields. The tool parameter pipe is UTF-8 native and passes characters through unchanged; manually escaping requires recalling each codepoint from training, which models get wrong on long CJK strings — the user sees `管理工具` rendered as `㄃3用箱` when the model emits the wrong codepoint thinking it has the right one. Long ≠ escape. Keep characters literal. Generated SKILL.md files for all 36 skills that consume the preamble get regenerated in the next commit. Contributed by @joe51317-dotcom (#1205). Co-Authored-By: Claude Opus 4.7 (1M context) --- .../preamble/generate-ask-user-format.ts | 21 +++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/scripts/resolvers/preamble/generate-ask-user-format.ts b/scripts/resolvers/preamble/generate-ask-user-format.ts index 859ed5b01..5a7d174db 100644 --- a/scripts/resolvers/preamble/generate-ask-user-format.ts +++ b/scripts/resolvers/preamble/generate-ask-user-format.ts @@ -46,6 +46,26 @@ Effort both-scales: when an option involves effort, label both human-team and CC Net line closes the tradeoff. Per-skill instructions may add stricter rules. +12. **Non-ASCII characters — write directly, never \\u-escape.** When any + string field (question, option label, option description) contains + Chinese (繁體/簡體), Japanese, Korean, or other non-ASCII text, emit + the literal UTF-8 characters in the JSON string. **Never escape them + as \`\\uXXXX\`.** Claude Code's tool parameter pipe is UTF-8 native + and passes characters through unchanged. Manually escaping requires + recalling each codepoint from training, which is unreliable for long + CJK strings — the model regularly emits the wrong codepoint (e.g. + writes \`\\u3103\` thinking it is 管 U+7BA1, but \`\\u3103\` is + actually ㄃, so the user sees \`管理工具\` rendered as \`㄃3用箱\`). + The trigger is long, multi-line questions with hundreds of CJK + characters: that is exactly when reflexive escaping kicks in and + exactly when miscoding is most damaging. Long ≠ escape. Keep + characters literal. + + Wrong: \`"question": "請選擇\\uXXXX\\uXXXX\\uXXXX\\uXXXX"\` + Right: \`"question": "請選擇管理工具"\` + + Only JSON-mandatory escapes remain allowed: \`\\n\`, \`\\t\`, \`\\"\`, \`\\\\\`. + ### Self-check before emitting Before calling AskUserQuestion, verify: @@ -58,5 +78,6 @@ Before calling AskUserQuestion, verify: - [ ] Dual-scale effort labels on effort-bearing options (human / CC) - [ ] Net line closes the decision - [ ] You are calling the tool, not writing prose +- [ ] Non-ASCII characters (CJK / accents) written directly, NOT \\u-escaped `; }