[GH-ISSUE #404] Bug Report: When the filename contains Chinese characters, all AI-generated code is incorrectly classified as human-written. #152

Closed
opened 2026-03-02 04:12:20 +03:00 by kerem · 4 comments

Originally created by @ShuboLiang on GitHub (Jan 26, 2026).
Original GitHub issue: https://github.com/git-ai-project/git-ai/issues/404

Issue Summary

When I use an AI agent to generate code in a file whose filename contains Chinese characters, all the generated lines are incorrectly classified as human-written. Running `git-ai stats --json` produces the following output:

```json
{
  "human_additions": 9,
  "mixed_additions": 0,
  "ai_additions": 0,
  "ai_accepted": 0,
  "total_ai_additions": 9,
  "total_ai_deletions": 0,
  "time_waiting_for_ai": 0,
  "git_diff_deleted_lines": 0,
  "git_diff_added_lines": 9,
  "tool_model_breakdown": {
    "turing-coder::deepseek-v3": {
      "ai_additions": 0,
      "mixed_additions": 0,
      "ai_accepted": 0,
      "total_ai_additions": 9,
      "total_ai_deletions": 0,
      "time_waiting_for_ai": 0
    }
  }
}
```

The puzzling part is that `total_ai_additions` correctly reports 9, yet `ai_additions` is 0 and all additions are attributed to `human_additions`. This suggests that git-ai does track the AI-generated content internally but fails to classify it when the filename contains non-ASCII (e.g., Chinese) characters.

Notably, if the filename uses only English (ASCII) characters, the classification works correctly: AI-generated lines are counted under `ai_additions`.

Environment

  • git-ai version: latest
  • OS: Windows 11
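One plausible cause, not confirmed anywhere in this thread, is Git's path quoting. With `core.quotePath` at its default of `true`, Git commands such as `git diff --name-only` print non-ASCII bytes in a path as octal escapes inside double quotes, so `中文.txt` comes out as `"\344\270\255\346\226\207.txt"`. If attribution data were keyed by the real filename but looked up with the quoted form, every line would fall back to human, which would match the numbers above. The Python sketch below illustrates that miss and one way to decode the quoted form; the store `ai_lines_by_path` and the function `unquote_git_path` are hypothetical and are not git-ai code.

```python
import re

# C-style escapes git uses inside quoted paths, besides three-digit octal bytes.
_ESCAPES = {"a": "\a", "b": "\b", "t": "\t", "n": "\n", "v": "\v",
            "f": "\f", "r": "\r", '"': '"', "\\": "\\"}
_OCTAL = re.compile(r"[0-7]{3}")


def unquote_git_path(raw: str) -> str:
    """Hypothetical helper: decode a path as printed with core.quotePath=true.

    Unquoted paths are returned unchanged. Octal escapes are collected as raw
    bytes and decoded as UTF-8 once at the end, because one character
    (e.g. 中) spans several escaped bytes.
    """
    if len(raw) < 2 or not (raw.startswith('"') and raw.endswith('"')):
        return raw
    body = raw[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out += body[i].encode("utf-8")
            i += 1
        elif _OCTAL.fullmatch(body, i + 1, i + 4):
            out.append(int(body[i + 1 : i + 4], 8))  # one raw byte, e.g. \344
            i += 4
        else:
            out += _ESCAPES[body[i + 1]].encode("utf-8")  # e.g. \" or \\
            i += 2
    return out.decode("utf-8")


# Hypothetical attribution store keyed by the real, decoded file path.
ai_lines_by_path = {"中文.txt": 9}

# What git typically prints for 中文.txt under the default core.quotePath.
raw = r'"\344\270\255\346\226\207.txt"'

print(ai_lines_by_path.get(raw, 0))                    # 0: quoted form misses
print(ai_lines_by_path.get(unquote_git_path(raw), 0))  # 9: decoded form matches
```

Requesting NUL-terminated output with `-z` avoids the quoting altogether, since Git then emits pathnames verbatim; whether either approach corresponds to the actual fix in git-ai is not stated in this issue.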

@svarlamov commented on GitHub (Jan 28, 2026):

@ShuboLiang Thank you -- we'll get this patched right away.

@Krishnachaitanyakc Wondering if you'd be interested in taking this one on? I think we should probably also do some TDD with complex UTF-8 characters in file contents, names, etc.
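In the spirit of that TDD suggestion, a table-driven test could pin down how paths with complex UTF-8 names are keyed. The pytest-style sketch below assumes a hypothetical `attribution_key` helper (not part of git-ai) that converts Windows backslashes to `/` and applies Unicode NFC normalization. Beyond the reported Chinese-on-Windows case, it also checks decomposed (NFD) spellings, which some tools and filesystems can produce for the same visible name.

```python
import unicodedata
from pathlib import PureWindowsPath


def attribution_key(path: str) -> str:
    """Hypothetical helper: canonical key for a repo-relative path.

    Backslashes become "/" and the text is NFC-normalized, so different
    spellings of the same file map to one key.
    """
    return unicodedata.normalize("NFC", PureWindowsPath(path).as_posix())


# Names mixing CJK, accented Latin, Hangul, Cyrillic, emoji, and spaces.
UTF8_NAMES = [
    "中文.txt",
    "文档/说明.md",
    "caf\u00e9.py",          # precomposed é (NFC form)
    "한국어.rs",              # Hangul syllables decompose under NFD
    "файл.go",
    "emoji_\U0001F389.ts",   # astral-plane character (party popper)
    "混合 mixed name (1).json",
]


def test_nfd_and_nfc_spellings_share_a_key():
    for name in UTF8_NAMES:
        nfd = unicodedata.normalize("NFD", name)
        assert attribution_key(nfd) == attribution_key(name), name


def test_windows_separators_share_a_key():
    for name in UTF8_NAMES:
        windows_style = name.replace("/", "\\")
        assert attribution_key(windows_style) == attribution_key(name), name


def test_key_preserves_every_character():
    for name in UTF8_NAMES:
        # Nothing is dropped or replaced; only the representation is normalized.
        assert attribution_key(name) == unicodedata.normalize("NFC", name), name
```

Each test loops over the same table, so adding one more tricky filename extends all three checks at once.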


@ShuboLiang commented on GitHub (Feb 4, 2026):

Hello! Is this in progress? @svarlamov


@svarlamov commented on GitHub (Feb 4, 2026):

Hey! I see no one has picked this up yet, but it should be in the release after next. Probably in production by early next week.


@Krishnachaitanyakc commented on GitHub (Feb 4, 2026):

@svarlamov Sorry, couldn't pick this up earlier. I created a PR; I started with Chinese but expanded it to other languages as well to make it comprehensive. I'm not sure if it's overkill, tbh; let me know.
