feat(cli): CJK word segmentation and Ctrl+arrow navigation optimization by Apophis3158 · Pull Request #2942 · QwenLM/qwen-code

Apophis3158 · 2026-04-07T05:41:01Z

TLDR

This PR adds intelligent CJK (Chinese/Japanese/Korean) word segmentation to the CLI text input, enabling proper Ctrl+Left/Right word-by-word navigation for CJK text.

Problem: Without this change, pressing Ctrl+Left/Right on CJK text jumps over the entire contiguous block of CJK characters until the next whitespace, treating phrases like "你好世界" as a single word. This makes precise cursor positioning in mixed Latin-CJK text nearly impossible.

Solution: Integrates the segmentit library for Chinese word segmentation, with character-by-character fallback for long lines and caching for performance. The implementation:

Adds lazy-loaded segmentit for CJK word boundary detection
Implements caching (up to 500 entries) to avoid repeated segmentation overhead
Falls back to character-by-character navigation for lines exceeding 1500 characters to prevent UI freezing
Extends CJK regex coverage to include CJK Extension A and Compatibility Ideographs
Preserves existing Latin/multi-script word boundary logic via isDifferentScript fallback

Screenshots / Video Demo

Dive Deeper

Implementation Details

Word Navigation (wordLeft / wordRight):

First attempts CJK segmentation via getCjkWordBoundaries() for lines containing CJK characters
Uses findPrevCjkWordStart() / findNextCjkWordEnd() for precise cursor positioning
Falls back to script-boundary detection (isDifferentScript) for mixed text (e.g., Latin + CJK)
Handles edge cases: whitespace-only prefixes, punctuation skipping, and cross-line navigation

Performance Optimizations:

Lazy loading: segmentit is loaded on-demand via createRequire() for ESM/CJS interop
Pre-warming: Background initialization on first call to minimize latency
Caching: Line content → boundary mapping cache with LRU-style eviction
Length limit: Lines >1500 chars skip segmentation to avoid UI freeze

Dependencies:

Added segmentit@^2.0.3 for Chinese word segmentation

Reviewer Test Plan

Open the CLI and type mixed Latin-CJK text, e.g., hello 你好 world 世界
Use Ctrl+Left/Right to navigate word-by-word:
- Verify cursor stops at CJK word boundaries (as determined by segmentit)
- Verify Latin words are still treated as single units
- Verify cross-line navigation works correctly
Test edge cases:
- Long lines (>1500 chars) with CJK text
- CJK text with punctuation: 你好，世界！
- Mixed scripts: 你好hello世界arabicالعربية
Run unit tests: npm run test -- packages/cli/src/ui/components/shared/text-buffer.test.ts

Testing Matrix

	🍏	🪟	🐧
npm run	❓	✅	❓
npx	❓	❓	❓
Docker	❓	❓	❓
Podman	❓	-	-
Seatbelt	❓	-	-

Linked issues / bugs

#2941

🤖 Generated with Qwen Code

wenshao

(duplicate review removed)

wenshao · 2026-04-07T20:05:38Z

packages/cli/src/ui/components/shared/text-buffer.ts

+      return b.end;
+    }
+    if (col < b.start) {
+      return b.end;


[Critical] findNextCjkWordEnd returns b.end when col < b.start, causing Ctrl+Right to skip over non-CJK text and jump directly to the end of a CJK word.

For example, in "hello 你好 world", if the cursor is inside "hello", pressing Ctrl+Right would jump to position 8, skipping "llo " and "你好" entirely. This is asymmetric with findPrevCjkWordStart which correctly returns null in the analogous case.

Suggested change

return b.end;

}

if (col < b.start) {

return b.end;

if (col < b.start) {

return null;

}

— glm-5.1 via Qwen Code /review

wenshao · 2026-04-07T20:06:00Z

packages/cli/src/ui/components/shared/text-buffer.ts

+      );
+      segmentitInstance = null;
+      return;
+    }
+    segmentitInstance = initSegment(new Segment());
+    debugLogger.info('segmentit: loaded successfully');
+  } catch (err) {
+    debugLogger.warn('segmentit: failed to load', err);
+    segmentitInstance = null;
+  }


[Suggestion] ensureSegmentitLoaded sets segmentitInstance = null on failure, causing it to retry createRequire on every keypress. Use a sentinel value to distinguish "not yet attempted" from "attempted and failed".

Three changes needed:

Declaration (line ~114):

let segmentitInstance: { doSegment: (text: string) => Array<{w: string}> } | null | false = null;

Catch block (line ~122): change segmentitInstance = null to:

segmentitInstance = false;

Guard (line ~116): change if (segmentitInstance !== null) return; — this already works since false !== null is true, so it will skip retrying.

— glm-5.1 via Qwen Code /review

wenshao · 2026-04-07T20:06:19Z

packages/cli/src/ui/components/shared/text-buffer.ts

+    debugLogger.warn('getCjkWordBoundaries: error, using char fallback', err);
+    // On error, fall back to char-by-char boundaries (cached)
+    const fallback = charByCharCjkFallback(line);
+    cjkBoundariesCache.set(line, fallback);


[Suggestion] The catch block inserts into the cache without calling evictCacheIfNeeded() first. All other insertion paths call it. If doSegment errors on many distinct lines, the cache can grow beyond the 500-entry CJK_BOUNDARIES_CACHE_MAX limit.

Suggested change

cjkBoundariesCache.set(line, fallback);

evictCacheIfNeeded();

cjkBoundariesCache.set(line, fallback);

— glm-5.1 via Qwen Code /review

wenshao · 2026-04-07T20:06:34Z

packages/cli/package.json

    "prompts": "^2.4.2",
    "react": "^19.1.0",
    "read-package-up": "^11.0.0",
+    "segmentit": "^2.0.3",


[Suggestion] segmentit adds ~15MB to disk footprint (embedded dictionary data) as a mandatory dependency for all CLI users. Since the project requires Node.js 20+, the built-in Intl.Segmenter supports CJK word segmentation with zero extra weight:

const segmenter = new Intl.Segmenter('zh', { granularity: 'word' }); const segments = [...segmenter.segment(line)];

Note: Intl.Segmenter uses ICU data which may produce different word boundaries than segmentit's dictionary-based approach. Recommend testing with representative CJK text samples before switching.

— glm-5.1 via Qwen Code /review

wenshao

These findings could not be posted as inline comments (lines not in diff):

AppContainer.tsx — midTurnDrainRef reads from React state mirror instead of synchronous ref. Fix: use drainQueue() from useMessageQueue directly.
prompts.ts — getActionsSection() says "ask for confirmation" but existing rule says "do not ask permission to use the tool". Contradictory instructions may cause inconsistent model behavior.
text-buffer.ts — delete_word_left/delete_word_right still use Latin-only word boundary logic while move_word now uses CJK segmentation. Inconsistent UX for CJK users.

— glm-5.1 via Qwen Code /review

Apophis3158 added 2 commits April 7, 2026 11:55

feat(cli): CJK word segmentation and Ctrl+arrow optimization

494be1b

Merge branch 'main' into feat/cjk-navigation

bc68132

wenshao reviewed Apr 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cli): CJK word segmentation and Ctrl+arrow navigation optimization#2942

feat(cli): CJK word segmentation and Ctrl+arrow navigation optimization#2942
Apophis3158 wants to merge 2 commits intoQwenLM:mainfrom
Apophis3158:feat/cjk-navigation

Apophis3158 commented Apr 7, 2026 •

edited

Loading

Uh oh!

wenshao left a comment •

edited

Loading

Uh oh!

wenshao Apr 7, 2026 •

edited

Loading

Uh oh!

wenshao Apr 7, 2026 •

edited

Loading

Uh oh!

wenshao Apr 7, 2026 •

edited

Loading

Uh oh!

wenshao Apr 7, 2026 •

edited

Loading

Uh oh!

wenshao left a comment •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	cjkBoundariesCache.set(line, fallback);
	evictCacheIfNeeded();
	cjkBoundariesCache.set(line, fallback);

Conversation

Apophis3158 commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TLDR

Screenshots / Video Demo

Dive Deeper

Implementation Details

Reviewer Test Plan

Testing Matrix

Linked issues / bugs

Uh oh!

wenshao left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wenshao Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wenshao Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wenshao Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wenshao Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wenshao left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Apophis3158 commented Apr 7, 2026 •

edited

Loading

wenshao left a comment •

edited

Loading

wenshao Apr 7, 2026 •

edited

Loading

wenshao Apr 7, 2026 •

edited

Loading

wenshao Apr 7, 2026 •

edited

Loading

wenshao Apr 7, 2026 •

edited

Loading

wenshao left a comment •

edited

Loading