mirror of https://github.com/git-ai-project/git-ai.git synced 2026-04-25 14:25:53 +03:00

[GH-ISSUE #355] Attribution incorrect after AI code review with no content changes #129

Closed

opened 2026-03-02 04:12:01 +03:00 by kerem · 3 comments

kerem commented

2026-03-02 04:12:01 +03:00

Owner

Originally created by @harvest-L on GitHub (Jan 16, 2026).
Original GitHub issue: https://github.com/git-ai-project/git-ai/issues/355

Problem Description

After using AI (Claude Code or Cursor) to perform code review on a file, some lines are attributed to the current user (the committer) rather than to their original author, even though the content of those lines hasn't changed.

Reproduction Steps

Start with a clean commit where all lines in a file are authored by user a
User b opens the file in Cursor/Claude Code and asks AI to review the code
AI reviews the file but makes no actual changes to certain lines (content remains identical)
Commit immediately after AI review
Check attribution with git-ai blame

Expected Behavior

Lines with unchanged content should maintain their original attribution (user a)
Only lines actually modified by AI should show AI attribution
AI code percentage should reflect actual changes made by AI

Actual Behavior

Some lines with unchanged content are attributed to user b instead of maintaining original author a
git-ai blame shows user b for lines where content hasn't changed
The original author a's attribution is lost for these lines

Environment

git-ai version: 1.0.31
OS: Windows
AI tools: Claude Code / Cursor
Git workflow: Clean commit → AI CR → Immediate commit

Additional Context

This occurs even when file content hasn't changed at the byte level (not line ending or whitespace issues)
The issue appears random - some unchanged lines maintain correct attribution, others don't

kerem

2026-03-02 04:12:01 +03:00

closed this issue
added the
question
label

kerem commented

2026-03-02 04:12:04 +03:00

Author

Owner

@harvest-L commented on GitHub (Jan 16, 2026):

Summary

When AI performs formatting refactoring (e.g., breaking single-line code into multiple lines), these modified
lines are not recorded in Git Notes, causing them to be incorrectly attributed to the current committer instead of
the AI agent in git-ai blame.

Environment

git-ai version: 1.0.31
Platform: Windows

Reproduction Steps

Step 1: Create initial file

Create hello2.js with single-line content:
function hello2() { console.log("Hello2"); }

Commit this file as user A.

Step 2: Create human checkpoint

git-ai checkpoint human

Step 3: AI reformats file

AI changes file from single-line to multi-line:
function hello2() {
  console.log("Hello2");
}

Step 4: Create AI checkpoint

git-ai checkpoint mock_ai

Step 5: Commit changes (as user B)

Switch to different user and commit.

Step 6: Verify

git-ai blame hello2.js

Actual Result

e880fb3 (chengyi 2026-01-17 02:36:20 +0800 1) function hello2() {
e880fb3 (chengyi 2026-01-17 02:36:20 +0800 2) console.log("Hello2");
e880fb3 (chengyi 2026-01-17 02:36:20 +0800 3) }

All 3 lines show current committer "chengyi" instead of AI

Expected Result

All 3 lines should show AI (mock_ai)

Root Cause Analysis

Investigation Results

Character-level attributions: Present and correct; the AI is attributed the newlines and indentation it added
Line-level attributions: Empty []
Git Notes: No line attribution entries

Bug Chain

Token diff discovers tokens unchanged, only line breaks and indentation changed
AI attribution covers whitespace characters only (newlines + spaces)
Character → Line attribution conversion only counts non-whitespace characters to determine dominant author
AI's formatting changes discarded, all lines judged as human
Human lines filtered out, nothing written to Git Notes
Git attributes all lines to current committer
git-ai blame can't find Git Notes, falls back to current committer
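
The token-diff step at the start of this chain can be illustrated with a minimal sketch. This is not git-ai's tokenizer: `tokenize` and `is_formatting_only_change` are hypothetical names, and the tokenization rule (identifier runs plus single punctuation characters, whitespace dropped everywhere, including inside string literals) is an assumption chosen only to show why a reflow produces no token changes.

```rust
/// Hypothetical tokenizer: runs of alphanumerics/underscores form one token,
/// every other non-whitespace character is its own token, and whitespace is
/// dropped. String literals are not special-cased in this sketch.
fn tokenize(src: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    for ch in src.chars() {
        if ch.is_alphanumeric() || ch == '_' {
            current.push(ch);
        } else {
            // Flush the pending identifier-like token, if any.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            if !ch.is_whitespace() {
                tokens.push(ch.to_string());
            }
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// True when the text changed but the token sequence did not,
/// i.e. only line breaks, indentation, or spacing were touched.
fn is_formatting_only_change(old: &str, new: &str) -> bool {
    old != new && tokenize(old) == tokenize(new)
}
```

For the hello2.js example, the single-line and three-line versions yield the same token sequence, so under this sketch the edit counts as formatting-only.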

Key Code Locations

src/authorship/attribution_tracker.rs:890-908
- find_dominant_author_for_line function
- Only counts non-whitespace characters, discards whitespace attributions
src/authorship/attribution_tracker.rs:1730-1732
- Filters out human lines before writing to Git Notes
src/authorship/attribution_tracker.rs:1339-1519
- build_token_aligned_diffs function
- substantive_new_ranges excludes pure formatting changes
src/commands/blame.rs:759-780
- Falls back to Git's original author when Git Notes missing
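
The fallback described for blame.rs can be sketched as follows. The types and the function name are hypothetical stand-ins (git-ai's real note format is not shown in this thread); the sketch only assumes that a note maps line numbers to recorded authors and that a missing entry means "use Git's author".

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a commit's note: line number -> recorded author.
type LineNotes = HashMap<usize, String>;

/// Resolve the author shown for a line: prefer the note entry, otherwise
/// fall back to the author Git itself reports for the commit.
fn blame_author(line: usize, notes: Option<&LineNotes>, git_author: &str) -> String {
    notes
        .and_then(|n| n.get(&line))
        .cloned()
        .unwrap_or_else(|| git_author.to_string())
}
```

With no note at all, or a note that lacks the line, every lookup returns the committer, which matches the `chengyi` output reported above.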

Impact

All AI code reviews and refactoring involving formatting changes cause users to lose accurate AI attribution
information.

kerem commented

2026-03-02 04:12:04 +03:00

Author

Owner

@svarlamov commented on GitHub (Jan 16, 2026):

Hey @harvest-L thanks for the report. We've actually implemented all of this tracking (and stuff like multi-line pure formatting changes).

Please make the following update to your ~/.git-ai/config.json to get this support:

{
  ...
  "feature_flags": {
    "checkpoint_inter_commit_move": true
  }
  ...
}

We plan to make this the default behavior soon, but it’s currently gated behind a feature flag. The feature requires running git blame on checkpoints. While this typically takes under 50 ms, it can take a few seconds for very large files in repositories with massive histories (for example, the Chromium repo with 1M+ commits). Because many Git AI users work in large codebases, we’re keeping it behind a flag until the workflow is fully asynchronous.

kerem commented

2026-03-02 04:12:04 +03:00

Author

Owner

@harvest-L commented on GitHub (Jan 19, 2026):

@svarlamov
It seems this configuration doesn't address the issue I reported; the flag targets a different scenario.
Here's what the AI concluded.

Problem Summary

When AI performs formatting-only changes (e.g., breaking single-line code into multiple lines, adjusting indentation), these modifications are not recorded in Git Notes. Consequently, git-ai blame incorrectly attributes all lines to the current committer instead of the AI agent.

Configuration Analysis

`checkpoint_inter_commit_move` Setting

What it does:

Located in: src/commands/checkpoint.rs:739
Controls whether to use git blame to retrieve cross-commit line attribution
When true: Uses repo.blame() to get actual line attribution
When false: Defaults all lines to "human"
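
As a rough illustration of the behavior described in this list, the flag could be thought of as choosing how initial line authorship is seeded. The function below is a hypothetical sketch, not the code at checkpoint.rs:739; `blame_authors` stands in for whatever `repo.blame()` would return.

```rust
/// Hypothetical sketch: seed per-line authorship before any checkpoint diffing.
/// `use_blame` plays the role of the `checkpoint_inter_commit_move` flag and
/// `blame_authors` the per-line authors a blame pass would report.
fn initial_line_authors(use_blame: bool, blame_authors: &[&str]) -> Vec<String> {
    if use_blame {
        // Flag on: carry over the real per-line authors from blame.
        blame_authors.iter().map(|a| a.to_string()).collect()
    } else {
        // Flag off: every pre-existing line starts out as "human".
        vec!["human".to_string(); blame_authors.len()]
    }
}
```

Either way, this seeding happens before the character-to-line conversion, which is why the flag would not change what that later step discards.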

Why it does NOT fix the formatting bug:

This configuration only affects the initialization phase of attribution retrieval (from git blame). The formatting bug occurs in a completely different part of the codebase - during the character-to-line attribution conversion.

Root Cause

The bug is in find_dominant_author_for_line() function (src/authorship/attribution_tracker.rs:905-910):

// Line 898-900: Only counts NON-whitespace characters
attr_non_whitespace_count = content_slice.chars()
    .filter(|c| !c.is_whitespace())  // ⚠️ Whitespace ignored!
    .count();

// Line 905-910: Discards whitespace-only attributions
if attr_non_whitespace_count > 0 || is_line_empty || is_deletion_marker {
    candidate_attrs.push(attribution.clone());
} else {
    // AI formatting changes discarded here!
    continue;
}
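
To make the effect of that filter concrete, here is a self-contained sketch under simplifying assumptions: the `Attribution` struct, byte-range representation, and function name are hypothetical, and the `is_line_empty` / `is_deletion_marker` escape hatches from the excerpt are left out. It only shows that an attribution covering nothing but whitespace never becomes a candidate.

```rust
/// Hypothetical, simplified attribution: a byte range [start, end) of the
/// file content plus an author id. Ranges are assumed to fall on char boundaries.
#[derive(Clone, Debug, PartialEq)]
struct Attribution {
    start: usize,
    end: usize,
    author_id: String,
}

/// Pick the author of the line spanning [line_start, line_end) by counting the
/// non-whitespace characters each attribution covers inside the line.
/// Whitespace-only overlaps are skipped; with no candidates the line is "human".
fn dominant_author_for_line(
    content: &str,
    line_start: usize,
    line_end: usize,
    attrs: &[Attribution],
) -> String {
    let mut best: Option<(usize, &str)> = None;
    for attr in attrs {
        let start = attr.start.max(line_start);
        let end = attr.end.min(line_end);
        if start >= end {
            continue; // no overlap with this line
        }
        let count = content[start..end]
            .chars()
            .filter(|c| !c.is_whitespace())
            .count();
        if count == 0 {
            continue; // whitespace-only attribution is dropped here
        }
        if best.map_or(true, |(n, _)| count > n) {
            best = Some((count, attr.author_id.as_str()));
        }
    }
    best.map_or_else(|| "human".to_string(), |(_, a)| a.to_string())
}
```

Applied to the reformatted hello2.js, the AI's ranges cover only the inserted newlines and indentation, so every line resolves to "human" in this sketch.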

Bug Chain

1. AI reformats code: function hello() { console.log("Hello"); } → multi-line with newlines and indentation
2. Token diff: tokens unchanged, only whitespace modified ✓
3. Character-level attribution: AI attributed to newlines/spaces ✓
4. Line-level conversion (find_dominant_author_for_line):
   - Counts only non-whitespace characters
   - Discards whitespace-only attributions
   - candidate_attrs becomes empty
   - Returns "human" as default
5. Git Notes filtering (attribution_tracker.rs:874-876):
   - Strips away all human lines
   - Result: Empty line attributions
6. git-ai blame:
   - Cannot find Git Notes
   - Falls back to current committer
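
The Git Notes filtering step in this chain can be pictured with a small sketch. `LineAttribution` here is a hypothetical two-field record and `lines_to_record` a made-up name; the only assumption taken from the thread is that lines judged "human" are stripped before anything is written.

```rust
/// Hypothetical line-level attribution record.
#[derive(Debug, Clone, PartialEq)]
struct LineAttribution {
    line: usize,
    author_id: String,
}

/// Keep only the lines that would be written to the note: everything whose
/// author is not "human". If all lines are "human", the result is empty.
fn lines_to_record(line_attrs: &[LineAttribution]) -> Vec<LineAttribution> {
    line_attrs
        .iter()
        .filter(|la| la.author_id != "human")
        .cloned()
        .collect()
}
```

Once the previous step has labeled all three lines "human", this filter leaves nothing to record, which is consistent with the empty line attributions reported in the investigation.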

Test Evidence

The existing test line_reflow_without_token_change_is_non_substantive (attribution_tracker.rs:1091-1107) validates that line reflow SHOULD preserve original authorship:

let old = "call(foo, bar, baz)";
let new = "call(\n  foo,\n  bar,\n  baz\n)";
let line_attrs = attributions_to_line_attributions(&updated, new);

assert!(
    line_attrs.iter().all(|la| la.author_id == "Alice"),
    "every reflowed line should remain Alice"
);

This test passes, confirming the issue is in the line attribution conversion logic.

Conclusion

The checkpoint_inter_commit_move feature flag is designed for a different purpose (cross-commit move detection) and does not address the formatting attribution bug.

kerem referenced this issue

2026-03-02 04:13:09 +03:00

[PR #129] Update windows set path on install to first set for the user (instead of system-wide) #274