mirror of https://github.com/git-ai-project/git-ai.git synced 2026-04-25 06:15:51 +03:00

[GH-ISSUE #358] [CRITICAL] Panic when processing files with multi-byte UTF-8 characters (Chinese, Japanese, etc.) #132

Closed

opened 2026-03-02 04:12:04 +03:00 by kerem · 1 comment

kerem commented

2026-03-02 04:12:04 +03:00

Owner

Originally created by @harvest-L on GitHub (Jan 16, 2026).
Original GitHub issue: https://github.com/git-ai-project/git-ai/issues/358

Bug Severity

CRITICAL - Causes git-ai to crash with panic, making it unusable for any files containing multi-byte UTF-8 characters.

Error Message

thread 'blocking-2' panicked at src\authorship\attribution_tracker.rs:1754:42:
byte index 2630 is not a char boundary; it is inside '选' (bytes 2629..2632) of `<template>
    <div class="add-person-container">
      <!-- 添加按钮（仅添加模式显示） -->
      <div class="add-btn-wrapper" v-if="mode === 'add'">
        <es-button type="text" icon="el-icon-plus" @click="addRow"> 添加个人 </es-button>
      </d

Note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Affected Triggers

Both git-ai checkpoint hooks fail with the same panic:

PreToolUse:Edit [git-ai checkpoint claude --hook-input stdin] failed with non-blocking status code 101
PostToolUse:Edit [git-ai checkpoint claude --hook-input stdin] failed with non-blocking status code 101

Problem Description

When git-ai processes files containing multi-byte UTF-8 characters (Chinese, Japanese, Korean, emoji, etc.), it panics because it uses byte indices to slice strings directly, but the byte indices can fall in the middle of a multi-byte character.

Root Cause

Location: src/authorship/attribution_tracker.rs:1754

let content_slice = &full_content[std::cmp::max(line_start, attribution.start)
    ..std::cmp::min(line_end, attribution.end)];

The code uses byte indexing (line_start, line_end, attribution.start, attribution.end) to slice the string directly. In UTF-8:

ASCII characters = 1 byte
Most Chinese/Japanese/Korean characters = 3 bytes
Most emoji = 4 bytes per code point (multi-code-point sequences are longer)

When a byte index falls in the middle of a multi-byte character (e.g., byte 2630, inside the 3-byte character '选' that occupies bytes 2629..2632, end exclusive), Rust panics because both ends of a string slice must lie on character boundaries.
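
To make the byte-width point concrete, here is a minimal sketch using only the standard library. The helper names `utf8_widths` and `char_boundaries` are made up for illustration and are not part of git-ai.

```rust
/// Returns each character of `text` paired with its UTF-8 byte length.
fn utf8_widths(text: &str) -> Vec<(char, usize)> {
    text.chars().map(|c| (c, c.len_utf8())).collect()
}

/// Lists every byte index in `0..=text.len()` where a slice may start or end.
fn char_boundaries(text: &str) -> Vec<usize> {
    (0..=text.len())
        .filter(|&i| text.is_char_boundary(i))
        .collect()
}
```

For the string `"a选b"`, `char_boundaries` yields only 0, 1, 4 and 5. Indices 2 and 3 fall inside '选', so `"a选b".get(2..4)` returns `None`, while `&"a选b"[2..4]` would panic.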

Reproduction Steps

Create a file with Chinese, Japanese, or other multi-byte UTF-8 characters

For example, a Vue file with Chinese comments:

<template>
  <div class="add-person-container">
    <!-- 添加按钮（仅添加模式显示） -->
    <div class="add-btn-wrapper" v-if="mode === 'add'">
      <es-button type="text" icon="el-icon-plus" @click="addRow"> 添加个人 </es-button>
    </div>
  </div>
</template>

Use AI (Claude Code or Cursor) to edit the file
git-ai checkpoint will panic when processing this file
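
The same failure can be reproduced outside git-ai with plain string slicing. The sketch below assumes nothing about git-ai's internals: `sample_with_bad_index` and `direct_slice_panics` are hypothetical helpers, and the sample text is a shortened stand-in for the Vue file above.

```rust
use std::panic;

/// Builds a small sample with a Chinese comment and returns it together
/// with a byte index that lands one byte inside the first multi-byte character.
fn sample_with_bad_index() -> (String, usize) {
    let content =
        String::from("<!-- 添加按钮 -->\n<div class=\"add-btn-wrapper\"></div>\n");
    let first_wide = content
        .char_indices()
        .find(|(_, c)| c.len_utf8() > 1)
        .map(|(idx, _)| idx)
        .expect("sample contains a multi-byte character");
    (content, first_wide + 1)
}

/// Returns true if slicing `content[..end]` directly panics.
/// The panic message is still printed to stderr by the default panic hook.
fn direct_slice_panics(content: &str, end: usize) -> bool {
    panic::catch_unwind(|| content[..end].len()).is_err()
}
```

Calling `direct_slice_panics` with the index returned by `sample_with_bad_index` reports a panic, with the same "is not a char boundary" wording as the error message in this report.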

Impact

✗ All files with Chinese/Japanese/Korean text cause git-ai to crash
✗ Vue, React, HTML files with non-English comments are unusable with git-ai
✗ Any file containing emoji triggers the panic
✗ Cannot use git-ai in CJK (China, Japan, Korea) markets
✗ Affects all AI operations - CR, edits, refactoring on files with multi-byte characters

Expected Behavior

git-ai should handle multi-byte UTF-8 characters correctly:

Byte indices should be validated or adjusted to character boundaries before slicing
String operations should use character-safe methods
No panic should occur when processing files with multi-byte characters
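
One way to "adjust to character boundaries" is to snap each index to the nearest boundary before slicing. This is a sketch under assumptions, not the fix that git-ai shipped: the function names are hypothetical, and widening the range outward (start rounded down, end rounded up) is just one possible policy.

```rust
/// Largest char boundary that is <= `index`, clamped to `text.len()`.
fn floor_boundary(text: &str, index: usize) -> usize {
    let mut i = index.min(text.len());
    // Index 0 is always a boundary, so this loop cannot underflow.
    while !text.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Smallest char boundary that is >= `index`, clamped to `text.len()`.
fn ceil_boundary(text: &str, index: usize) -> usize {
    let mut i = index.min(text.len());
    // `text.len()` is always a boundary, so this loop terminates.
    while !text.is_char_boundary(i) {
        i += 1;
    }
    i
}

/// Slices `text` after widening `start..end` outward to char boundaries.
/// A reversed range collapses to an empty slice instead of panicking.
fn slice_snapped(text: &str, start: usize, end: usize) -> &str {
    let start = floor_boundary(text, start);
    let end = ceil_boundary(text, end).max(start);
    &text[start..end]
}
```

Widening keeps a character that the range only partly covers. Rounding inward would drop it instead, so which direction is right depends on how the attribution ranges are meant to be interpreted.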

Affected Users

This affects any user who:

Works with Chinese, Japanese, Korean, or other CJK languages
Uses emoji in their code
Has non-ASCII characters in comments or strings
Works in international teams with multi-language codebases

Suggested Fix

Use Rust's character boundary checks to ensure safe string slicing:

// Option 1: Use is_char_boundary check
let start = std::cmp::max(line_start, attribution.start);
let end = std::cmp::min(line_end, attribution.end);

// Ensure indices are on character boundaries
if !full_content.is_char_boundary(start) || !full_content.is_char_boundary(end) {
    // Option A: Skip this attribution
    continue;
    // Option B (instead of `continue`): move `start` forward to the next
    // character boundary. Declare `start` with `let mut` above and assign to it;
    // a new `let start` here would be scoped to this block and have no effect.
    // start = full_content.char_indices()
    //     .map(|(idx, _)| idx)
    //     .find(|idx| *idx >= start)
    //     .unwrap_or(full_content.len());
    // `end` needs the same adjustment, and `start <= end` must still hold.
}

let content_slice = &full_content[start..end];

Or use .get() for safer slicing:

// Option 2: Use get() method
let start = std::cmp::max(line_start, attribution.start);
let end = std::cmp::min(line_end, attribution.end);

let content_slice = match full_content.get(start..end) {
    Some(slice) => slice,
    None => continue, // Skip if the range is out of bounds, reversed, or not on character boundaries
};
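
For context, the `.get()` variant can be exercised in a complete loop. The sketch below is illustrative only: the `Attribution` struct and the `slices_for_line` function are hypothetical stand-ins, since the surrounding code in `attribution_tracker.rs` is not shown in this report.

```rust
/// Hypothetical stand-in for git-ai's attribution record (byte offsets).
struct Attribution {
    start: usize,
    end: usize,
}

/// Collects the part of each attribution that overlaps the given line,
/// skipping any range that cannot be sliced safely.
fn slices_for_line<'a>(
    full_content: &'a str,
    line_start: usize,
    line_end: usize,
    attributions: &[Attribution],
) -> Vec<&'a str> {
    let mut slices = Vec::new();
    for attribution in attributions {
        let start = std::cmp::max(line_start, attribution.start);
        let end = std::cmp::min(line_end, attribution.end);
        // `str::get` returns None when the range is out of bounds, reversed,
        // or when either end is not on a character boundary.
        let content_slice = match full_content.get(start..end) {
            Some(slice) => slice,
            None => continue,
        };
        slices.push(content_slice);
    }
    slices
}
```

Skipping avoids the panic but silently drops the affected attribution, so it suits a stopgap better than a place where the byte offsets themselves need correcting.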

Environment

git-ai version: 1.0.31
OS: Windows (but affects all platforms)
File types: Vue, HTML, JavaScript, TypeScript, any text file
Character encoding: UTF-8 with multi-byte characters

Additional Context

This is a blocker for using git-ai in many international markets and projects. The panic occurs during checkpoint creation, which means:

AI edits on files with multi-byte characters cannot be tracked
Users must remove all CJK text or emoji from files before using git-ai
Makes git-ai essentially unusable for entire regions (China, Japan, Korea, etc.)

The fix is straightforward and should be prioritized as critical.


kerem

2026-03-02 04:12:04 +03:00

closed this issue
added the bug and working-on labels

kerem commented

2026-03-02 04:12:06 +03:00

Author

Owner

@svarlamov commented on GitHub (Jan 17, 2026):

@harvest-L Thanks for the report again! Fixed in the latest next release: https://github.com/acunniffe/git-ai/releases/tag/v1.0.36-next-8b2936f. To try it, pull the install.sh from that release.

It will be in the next stable release (a day or two).
