[GH-ISSUE #168] Empty lines in AI-generated code are incorrectly attributed to human #62

Closed
opened 2026-03-02 04:11:34 +03:00 by kerem · 3 comments
Owner

Originally created by @AtnesNess on GitHub (Oct 29, 2025).
Original GitHub issue: https://github.com/git-ai-project/git-ai/issues/168

Description

When AI generates code that includes empty lines between code blocks, those empty lines are incorrectly attributed to the human user instead of the AI agent in the authorship tracking.

Expected Behavior

When AI generates code with empty lines (blank lines used for formatting/readability), those empty lines should be attributed to the AI agent, since the AI generated them as part of the code structure.

Actual Behavior

Empty lines within AI-generated code blocks are attributed to the human user in git-ai stats and git-ai blame output.

Reproduction Steps

  1. Initialize a git repository and create a basic file:
git init
git config user.email "test@example.com"
git config user.name "Test User"

# Initial commit
echo "# My Application" > app.py
git add app.py
git commit -m "Initial commit"
  2. Mark initial human work:
git-ai checkpoint
  3. Simulate AI adding code with empty lines:
cat > app.py <<EOF
# My Application

import os
import sys

def setup():
    print("Setting up")

def main():
    setup()
    print("Running main")

def cleanup():
    print("Cleaning up")

if __name__ == "__main__":
    main()
EOF
  4. Mark as AI-generated:
git-ai checkpoint mock_ai app.py
  5. Commit and check stats:
git add app.py
git commit -m "AI added code with empty lines"
git-ai stats --json

Expected Output

The AI should be credited with all 16 added lines (including the 4 empty lines):

{
  "human_additions": 1,
  "ai_additions": 16,
  "ai_accepted": 16,
  ...
}

Actual Output

Empty lines are incorrectly attributed to human:

{
  "human_additions": 5,  // Should be 1
  "ai_additions": 12,    // Should be 16
  "ai_accepted": 12,     // Should be 16
  ...
}

When running git-ai blame app.py, the empty lines show as authored by the human user instead of mock_ai.

# bc81323 (Test User 2025-10-29 16:32:35 +0000  1) # My Application
# bc81323 (Test User 2025-10-29 16:32:35 +0000  2) 
# bc81323 (mock_ai   2025-10-29 16:32:35 +0000  3) import os
# bc81323 (mock_ai   2025-10-29 16:32:35 +0000  4) import sys
# bc81323 (Test User 2025-10-29 16:32:35 +0000  5) 
# bc81323 (mock_ai   2025-10-29 16:32:35 +0000  6) def setup():
# bc81323 (mock_ai   2025-10-29 16:32:35 +0000  7)     print("Setting up")
# bc81323 (Test User 2025-10-29 16:32:35 +0000  8) 
# bc81323 (mock_ai   2025-10-29 16:32:35 +0000  9) def main():
# bc81323 (mock_ai   2025-10-29 16:32:35 +0000 10)     setup()
# bc81323 (mock_ai   2025-10-29 16:32:35 +0000 11)     print("Running main")
# bc81323 (Test User 2025-10-29 16:32:35 +0000 12) 
# bc81323 (mock_ai   2025-10-29 16:32:35 +0000 13) def cleanup():
# bc81323 (mock_ai   2025-10-29 16:32:35 +0000 14)     print("Cleaning up")
# bc81323 (Test User 2025-10-29 16:32:35 +0000 15) 
# bc81323 (mock_ai   2025-10-29 16:32:35 +0000 16) if __name__ == "__main__":
# bc81323 (mock_ai   2025-10-29 16:32:35 +0000 17)     main()

Impact

This bug causes:

  • Inaccurate AI contribution metrics - AI's actual contribution is underreported
  • Inflated human metrics - Humans get credit for empty lines they didn't write
  • Incorrect blame attribution - Makes it harder to track which agent generated what code
  • Misleading acceptance rates - ai_accepted percentage appears lower than reality
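Using the figures from this reproduction, a back-of-envelope calculation (not actual git-ai output) shows how much the misattribution skews the reported shares:

```python
# Reported by the buggy version (numbers from the Actual Output above):
buggy_ai, buggy_human = 12, 5
# What should be reported (numbers from the Expected Output above):
true_ai, true_human = 16, 1

buggy_share = buggy_ai / (buggy_ai + buggy_human)  # about 0.71
true_share = true_ai / (true_ai + true_human)      # about 0.94
print(f"AI share: reported {buggy_share:.1%}, actual {true_share:.1%}")
```

Four misattributed blank lines are enough to drop the apparent AI share of this commit by more than 20 percentage points.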

Environment

  • git-ai version: 1.0.10
  • Platform: Linux/macOS
kerem closed this issue 2026-03-02 04:11:34 +03:00
@svarlamov commented on GitHub (Oct 30, 2025):

Thank you for the detailed report, Atnes!

The stats bug is definitely not right. I made changes to ensure that only non-empty lines are counted for stats, and it seems that's not being applied to the human counts properly. I think it's happening between the checkpoints -> authorship log step, where the stats are recalc'd in some rather outdated code. My next task actually is to rewrite and drastically simplify the way stats are handled, so that should be fixed in the coming days! Thanks for bearing with us on this!

During the move to diff-match-patch, I ended up deliberately stripping empty lines out of AI attribution ranges. I did that mainly because of some discussions and frustrations we had earlier with how empty lines were being handled (I don't remember the exact details). However, I think the diff-match-patch implementation should be more resilient to those kinds of challenges. What do you think of retaining empty AI lines within a contiguous AI block, but still trimming empty lines at the edges? For example:

1    AI function helloworld() {
2    AI     console.log('hello world');
3    AI     
4    AI }
5 HUMAN
6 HUMAN ...

So even if the AI did have an empty line below the function (line 5), it would be excluded; however, the empty line before the closing brace (line 3) would retain the AI attribution. For the purposes of stats, the example above would credit only 3 lines to AI, even though AI would show up for 4 lines. And the human would only get credit for the non-empty lines (... and beyond).
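The trimming rule proposed above can be sketched roughly like this (a minimal illustration in Python; attribute_ai_block is a hypothetical helper, not git-ai's actual implementation):

```python
def attribute_ai_block(lines, start, end):
    """Attribute a contiguous AI-generated span lines[start:end].

    Empty lines at the edges are trimmed from the AI range, while empty
    lines in the interior keep AI attribution. Returns the trimmed
    (start, end) range plus the non-empty line count used for stats.
    """
    while start < end and not lines[start].strip():
        start += 1
    while end > start and not lines[end - 1].strip():
        end -= 1
    sloc = sum(1 for line in lines[start:end] if line.strip())
    return (start, end), sloc


lines = [
    "function helloworld() {",
    "    console.log('hello world');",
    "",    # interior blank: keeps AI attribution
    "}",
    "",    # trailing blank: trimmed, falls back to the human
    "...",
]
ai_range, sloc = attribute_ai_block(lines, 0, 5)
# ai_range == (0, 4): four lines display as AI in blame
# sloc == 3: only three non-empty lines count toward AI stats
```

This matches the example: the interior blank on line 3 stays AI-attributed, the trailing blank on line 5 is excluded, and stats credit 3 lines while blame shows 4.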

@AtnesNess commented on GitHub (Oct 30, 2025):

I'd say that ideally the edge lines should also be attributed to AI. However, that would definitely be an improvement to the data quality!

@svarlamov commented on GitHub (Nov 1, 2025):

@AtnesNess Updated the default to retain attribution and counts for empty lines. I'm in the process of adding separate source-lines-of-code metrics that will track only 'real' additions/deletions. Should work as expected now!
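Tracking the two metrics separately might look roughly like this (a sketch; the field names ai_additions and ai_sloc_additions are hypothetical, not git-ai's actual JSON keys):

```python
def addition_stats(added_lines):
    # Hypothetical metric names; git-ai's real schema may differ.
    total = len(added_lines)                          # every added line
    sloc = sum(1 for l in added_lines if l.strip())   # non-empty lines only
    return {"ai_additions": total, "ai_sloc_additions": sloc}


stats = addition_stats(["def setup():", '    print("Setting up")', ""])
# The trailing blank counts toward ai_additions but not ai_sloc_additions
```

With this split, empty lines keep their AI attribution in blame while the SLOC metric still reflects only substantive additions.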
