[PR #98] [MERGED] Bug/fix notes merging behavior #256

Closed
opened 2026-03-02 04:13:06 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/git-ai-project/git-ai/pull/98
Author: @svarlamov
Created: 10/8/2025
Status: Merged
Merged: 10/9/2025
Merged by: @svarlamov

Base: mainHead: bug/fix-notes-merging-behavior


📝 Commits (4)

  • 2f42b0c implement local notes merging
  • ccdad69 add comment about the local copy of remote notes for merging
  • 3459410 fmt and change merge strategy to ours
  • 819421d added notes merge benchmark

📊 Changes

4 files changed (+398 additions, -9 deletions)

View changed files

scripts/benchmarks/git/benchmark_notes_merge.py (+228 -0)
📝 src/commands/fetch_hooks.rs (+35 -3)
📝 src/commands/push_hooks.rs (+60 -4)
📝 src/git/refs.rs (+75 -2)

📄 Description

Fixes #95

When we migrated to git notes the other day, we failed to handle the new problem of merging notes. Unlike the independent commit-based AI refs, git notes are trees and so even though each note is still unrelated to the other notes, the tree structure itself needs to get merged. The problem was hidden for a while, because the push hook was passing a prop that would override the tree if it diverged.

This PR solves this by having clients merge their local ai note tree (using the union method) with the latest remote ai note tree. It also removes the force flag on the push. It makes the push a bit slower (need to fetch remote notes before pushing our own), but I think there are many possible optimizations to be made still.

@acunniffe Can you please make sure that this plays nicely with the rebase/squash/etc. work?

More detailed AI-generated commit summary below (I think it does a good job of explaining the small but somewhat complex changes)

Problem Statement

When two independent clones of a repository create AI-authored commits on different branches and push to the same remote, the git notes containing authorship metadata conflict. This manifests as:

  1. git-ai blame <file> fails to show AI attribution for the first branch pushed
  2. git notes --ref ai show <commit> returns no results for affected commits
  3. Manual fetch of notes produces non-fast-forward rejection errors
  4. Fresh clones only receive notes from the most recent push, losing historical authorship data

Root Cause

The issue stems from using a force refspec (+refs/notes/ai:refs/notes/ai) for both push and fetch operations. Git notes are stored as a tree structure under refs/notes/ai, where each note is a blob mapped to a commit SHA. When using force push:

  1. Clone A pushes branch feature-1 → Creates notes tree T1 on remote
  2. Clone B independently pushes branch feature-2Force-overwrites remote with tree T2
  3. Tree T1 is completely lost, orphaning all notes for commits in feature-1

This occurs because notes form a single shared tree structure, not independent per-commit blobs. Each push/fetch operation affects the entire tree, making force operations destructive.

Solution

Implemented a merge-based synchronization workflow that prevents data loss:

Architecture

  1. Tracking Refs Pattern - Remote notes are fetched into isolated tracking refs (refs/notes/ai-remote/{remote}) rather than directly into refs/notes/ai
  2. Automatic Merging - After fetching, tracking refs are merged into local notes using git notes merge -s union
  3. Non-Force Push - Notes are pushed using refs/notes/ai:refs/notes/ai (without + prefix), requiring fast-forward

Implementation Details

1. Helper Functions (src/git/refs.rs)

- tracking_ref_for_remote() // Generates refs/notes/ai-remote/{remote}
- sanitize_remote_name()     // Ensures valid ref names (replaces special chars)
- ref_exists()               // Checks ref existence via show-ref
- merge_notes_from_ref()     // Executes git notes merge -s union
- copy_ref()                 // Initial setup for new tracking refs

2. Fetch Hook (src/commands/fetch_hooks.rs)

fetch origin +refs/notes/ai:refs/notes/ai-remote/origin
  ↓
if refs/notes/ai exists:
  git notes --ref=ai merge -s union refs/notes/ai-remote/origin
else:
  git update-ref refs/notes/ai refs/notes/ai-remote/origin

3. Push Hook (src/commands/push_hooks.rs)

Pre-push:
  fetch origin +refs/notes/ai:refs/notes/ai-remote/origin
  git notes --ref=ai merge -s union refs/notes/ai-remote/origin

Post-push:
  git push origin refs/notes/ai:refs/notes/ai  # No force!

Merge Strategy: Union

Uses git notes merge -s union which concatenates notes content. This is safe because:

  • Each commit SHA is globally unique → notes for different commits never conflict
  • Authorship logs are JSON blobs → concatenation doesn't corrupt structure
  • Alternative strategies like cat_sort_uniq could deduplicate lines but aren't needed for JSON data

Workflow Example

Before (Buggy):

Clone A: Creates note for commit abc123 → Push → Remote has {abc123}
Clone B: Creates note for commit def456 → Push → Remote has {def456} (abc123 LOST!)
Clone C: Fetch → Only sees {def456}

After (Fixed):

Clone A: Creates note for commit abc123 → Push → Remote has {abc123}
Clone B: Creates note for commit def456 → Pre-push fetch+merge → Local has {abc123, def456}
        Push → Remote has {abc123, def456}
Clone C: Fetch → Receives {abc123, def456} → Merge → All notes preserved

Limitations and Edge Cases

1. git push --mirror

Status: Partially handled

  • The post-push hook is skipped for --mirror operations to avoid double-pushing notes (mirror already pushes all refs)
  • If a user hasn't recently fetched notes before running --mirror, their refs/notes/ai may be stale and overwrite remote notes
  • Mitigation: Mirror operations are rare (typically for backups/migrations) and users performing them generally expect to push their exact local state
  • Future improvement: Could add a pre-push hook to fetch+merge before mirror, but adds complexity for minimal benefit

2. Tracking Ref Pollution

Status: Not an issue

  • Tracking refs are stored at refs/notes/ai-remote/{remote} which is outside standard push namespaces
  • Git push behavior by namespace:
    • git push → only refs/heads/* (branches)
    • git push --all → only refs/heads/*
    • git push --tags → only refs/tags/*
    • git push --mirror → all refs (but hook skips this case)
  • Tracking refs remain local-only unless explicitly pushed via git push origin refs/notes/ai-remote/* (extremely unlikely)

3. Race Conditions

Status: Acceptable failure mode

  • If two clones push simultaneously, one may fail with non-fast-forward error
  • Behavior: Failed push logs error via debug_log() but doesn't fail user's operation (best-effort)
  • User can retry push, which will fetch+merge and succeed
  • This is standard git behavior for concurrent pushes

4. Merge Conflicts

Status: Theoretically possible, practically impossible

  • The union strategy auto-resolves by concatenation
  • Conflicts could only occur if the same commit somehow has different notes in tracking ref vs local ref
  • This can't happen in normal operation since notes are created deterministically from authorship logs
  • If it did occur, merge would fail and log error (best-effort handling)

Backward Compatibility

  • No migration required - Existing repositories automatically upgrade on first fetch/push
  • Legacy refspec preserved - AI_AUTHORSHIP_REFSPEC constant retained for potential future use
  • Graceful degradation - If notes sync fails (network issues, etc.), errors are logged but user operations proceed

Testing Considerations

The fix resolves the exact scenario from the bug report:

  1. Two clones modify the same file on different branches
  2. Both push to remote
  3. Fresh clone fetches both branches
  4. git-ai blame now works correctly for both branches
  5. No non-fast-forward errors when fetching notes

Performance Impact

  • Fetch operations: Adds one git fetch + one git notes merge per fetch (negligible overhead)
  • Push operations: Adds one git fetch + one git notes merge + one git push (best-effort, non-blocking)
  • Storage: Tracking refs add minimal overhead (~one ref per remote, typically just origin)

Security Considerations

  • All git operations run with core.hooksPath=/dev/null to prevent recursive hook invocation
  • Remote names are sanitized using sanitize_remote_name() to prevent ref injection attacks
  • Operations are best-effort and cannot fail user's git commands

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/git-ai-project/git-ai/pull/98 **Author:** [@svarlamov](https://github.com/svarlamov) **Created:** 10/8/2025 **Status:** ✅ Merged **Merged:** 10/9/2025 **Merged by:** [@svarlamov](https://github.com/svarlamov) **Base:** `main` ← **Head:** `bug/fix-notes-merging-behavior` --- ### 📝 Commits (4) - [`2f42b0c`](https://github.com/git-ai-project/git-ai/commit/2f42b0cb8439fddd88491af5d84367589b4cb8fa) implement local notes merging - [`ccdad69`](https://github.com/git-ai-project/git-ai/commit/ccdad691bb1da0996bd0d1a70a57017ffc906906) add comment about the local copy of remote notes for merging - [`3459410`](https://github.com/git-ai-project/git-ai/commit/34594101a60ac20665063d9b5110191019fab98e) fmt and change merge strategy to ours - [`819421d`](https://github.com/git-ai-project/git-ai/commit/819421dfc5dd0c8227802cb07aa032def52a475b) added notes merge benchmark ### 📊 Changes **4 files changed** (+398 additions, -9 deletions) <details> <summary>View changed files</summary> ➕ `scripts/benchmarks/git/benchmark_notes_merge.py` (+228 -0) 📝 `src/commands/fetch_hooks.rs` (+35 -3) 📝 `src/commands/push_hooks.rs` (+60 -4) 📝 `src/git/refs.rs` (+75 -2) </details> ### 📄 Description Fixes #95 When we migrated to git notes the other day, we failed to handle the new problem of merging notes. Unlike the independent commit-based AI refs, git notes are trees and so even though each note is still unrelated to the other notes, the tree structure itself needs to get merged. The problem was hidden for a while, because the push hook was passing a prop that would override the tree if it diverged. This PR solves this by having clients merge their local ai note tree (using the `union` method) with the latest remote ai note tree. It also removes the force flag on the push. It makes the push a bit slower (need to fetch remote notes before pushing our own), but I think there are many possible optimizations to be made still. @acunniffe Can you please make sure that this plays nicely with the rebase/squash/etc. work? More detailed AI-generated commit summary below (I think it does a good job of explaining the small but somewhat complex changes) ## Problem Statement When two independent clones of a repository create AI-authored commits on different branches and push to the same remote, the git notes containing authorship metadata conflict. This manifests as: 1. `git-ai blame <file>` fails to show AI attribution for the first branch pushed 2. `git notes --ref ai show <commit>` returns no results for affected commits 3. Manual fetch of notes produces non-fast-forward rejection errors 4. Fresh clones only receive notes from the most recent push, losing historical authorship data ## Root Cause The issue stems from using a **force refspec** (`+refs/notes/ai:refs/notes/ai`) for both push and fetch operations. Git notes are stored as a tree structure under `refs/notes/ai`, where each note is a blob mapped to a commit SHA. When using force push: 1. Clone A pushes branch `feature-1` → Creates notes tree T1 on remote 2. Clone B independently pushes branch `feature-2` → **Force-overwrites** remote with tree T2 3. Tree T1 is completely lost, orphaning all notes for commits in `feature-1` This occurs because notes form a single shared tree structure, not independent per-commit blobs. Each push/fetch operation affects the entire tree, making force operations destructive. ## Solution Implemented a **merge-based synchronization workflow** that prevents data loss: ### Architecture 1. **Tracking Refs Pattern** - Remote notes are fetched into isolated tracking refs (`refs/notes/ai-remote/{remote}`) rather than directly into `refs/notes/ai` 2. **Automatic Merging** - After fetching, tracking refs are merged into local notes using `git notes merge -s union` 3. **Non-Force Push** - Notes are pushed using `refs/notes/ai:refs/notes/ai` (without `+` prefix), requiring fast-forward ### Implementation Details **1. Helper Functions ([src/git/refs.rs](../src/git/refs.rs#L142-L201))** ```rust - tracking_ref_for_remote() // Generates refs/notes/ai-remote/{remote} - sanitize_remote_name() // Ensures valid ref names (replaces special chars) - ref_exists() // Checks ref existence via show-ref - merge_notes_from_ref() // Executes git notes merge -s union - copy_ref() // Initial setup for new tracking refs ``` **2. Fetch Hook ([src/commands/fetch_hooks.rs](../src/commands/fetch_hooks.rs#L49-L98))** ``` fetch origin +refs/notes/ai:refs/notes/ai-remote/origin ↓ if refs/notes/ai exists: git notes --ref=ai merge -s union refs/notes/ai-remote/origin else: git update-ref refs/notes/ai refs/notes/ai-remote/origin ``` **3. Push Hook ([src/commands/push_hooks.rs](../src/commands/push_hooks.rs#L56-L117))** ``` Pre-push: fetch origin +refs/notes/ai:refs/notes/ai-remote/origin git notes --ref=ai merge -s union refs/notes/ai-remote/origin Post-push: git push origin refs/notes/ai:refs/notes/ai # No force! ``` ## Merge Strategy: Union Uses `git notes merge -s union` which concatenates notes content. This is safe because: - Each commit SHA is globally unique → notes for different commits never conflict - Authorship logs are JSON blobs → concatenation doesn't corrupt structure - Alternative strategies like `cat_sort_uniq` could deduplicate lines but aren't needed for JSON data ## Workflow Example **Before (Buggy):** ``` Clone A: Creates note for commit abc123 → Push → Remote has {abc123} Clone B: Creates note for commit def456 → Push → Remote has {def456} (abc123 LOST!) Clone C: Fetch → Only sees {def456} ``` **After (Fixed):** ``` Clone A: Creates note for commit abc123 → Push → Remote has {abc123} Clone B: Creates note for commit def456 → Pre-push fetch+merge → Local has {abc123, def456} Push → Remote has {abc123, def456} Clone C: Fetch → Receives {abc123, def456} → Merge → All notes preserved ``` ## Limitations and Edge Cases ### 1. `git push --mirror` **Status:** Partially handled - The post-push hook is **skipped** for `--mirror` operations to avoid double-pushing notes (mirror already pushes all refs) - If a user hasn't recently fetched notes before running `--mirror`, their `refs/notes/ai` may be stale and overwrite remote notes - **Mitigation:** Mirror operations are rare (typically for backups/migrations) and users performing them generally expect to push their exact local state - **Future improvement:** Could add a pre-push hook to fetch+merge before mirror, but adds complexity for minimal benefit ### 2. Tracking Ref Pollution **Status:** Not an issue - Tracking refs are stored at `refs/notes/ai-remote/{remote}` which is outside standard push namespaces - Git push behavior by namespace: - `git push` → only `refs/heads/*` (branches) - `git push --all` → only `refs/heads/*` - `git push --tags` → only `refs/tags/*` - `git push --mirror` → all refs (but hook skips this case) - Tracking refs remain local-only unless explicitly pushed via `git push origin refs/notes/ai-remote/*` (extremely unlikely) ### 3. Race Conditions **Status:** Acceptable failure mode - If two clones push simultaneously, one may fail with non-fast-forward error - **Behavior:** Failed push logs error via `debug_log()` but doesn't fail user's operation (best-effort) - User can retry push, which will fetch+merge and succeed - This is standard git behavior for concurrent pushes ### 4. Merge Conflicts **Status:** Theoretically possible, practically impossible - The `union` strategy auto-resolves by concatenation - Conflicts could only occur if the same commit somehow has different notes in tracking ref vs local ref - This can't happen in normal operation since notes are created deterministically from authorship logs - If it did occur, merge would fail and log error (best-effort handling) ## Backward Compatibility - **No migration required** - Existing repositories automatically upgrade on first fetch/push - **Legacy refspec preserved** - `AI_AUTHORSHIP_REFSPEC` constant retained for potential future use - **Graceful degradation** - If notes sync fails (network issues, etc.), errors are logged but user operations proceed ## Testing Considerations The fix resolves the exact scenario from the bug report: 1. Two clones modify the same file on different branches 2. Both push to remote 3. Fresh clone fetches both branches 4. `git-ai blame` now works correctly for **both** branches 5. No non-fast-forward errors when fetching notes ## Performance Impact - **Fetch operations:** Adds one `git fetch` + one `git notes merge` per fetch (negligible overhead) - **Push operations:** Adds one `git fetch` + one `git notes merge` + one `git push` (best-effort, non-blocking) - **Storage:** Tracking refs add minimal overhead (~one ref per remote, typically just `origin`) ## Security Considerations - All git operations run with `core.hooksPath=/dev/null` to prevent recursive hook invocation - Remote names are sanitized using `sanitize_remote_name()` to prevent ref injection attacks - Operations are best-effort and cannot fail user's git commands --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
kerem 2026-03-02 04:13:06 +03:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/git-ai#256
No description provided.