[PR #595] perf: batch git exec calls for notes, rev-parse, status, and checkpoints #601

Open
opened 2026-03-02 04:14:05 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/git-ai-project/git-ai/pull/595
Author: @svarlamov
Created: 2/26/2026
Status: 🔄 Open

Base: mainHead: devin/1772084830-perf-batch-git-calls


📝 Commits (3)

  • f49731e perf: batch git exec calls for notes, rev-parse, status, and checkpoints
  • 87cd073 style: fix cargo fmt formatting issues
  • fb7ef32 merge: resolve conflict with origin/main in diff.rs

📊 Changes

6 files changed (+220 additions, -94 deletions)

View changed files

📝 src/commands/blame.rs (+55 -23)
📝 src/commands/checkpoint.rs (+25 -28)
📝 src/commands/diff.rs (+43 -36)
📝 src/git/authorship_traversal.rs (+1 -1)
📝 src/git/refs.rs (+81 -2)
📝 src/git/status.rs (+15 -4)

📄 Description

Summary

This PR implements performance optimizations to reduce git subprocess overhead by batching git exec calls. The primary bottleneck in git-ai is the number of individual git invocations, especially for operations like git notes show and git rev-parse that are called repeatedly in loops.

Changes:

  • Fix #1 & #2: Batch-fetch authorship notes in blame.rs and refs.rs using 2 git calls (1 cat-file --batch-check + 1 cat-file --batch) instead of N individual git notes show calls
  • Fix #4: Combine resolve_commit + resolve_parent into single git rev-parse call in diff.rs
  • Fix #5: Skip redundant get_staged_filenames() call in status() when no pathspecs provided (porcelain v2 output already contains staging info)
  • Fix #7: Read checkpoints once instead of twice in get_all_tracked_files()
  • Add batch_get_authorship_logs() helper function for reuse across codebase
  • Make batch_read_blobs_with_oids() public for cross-module access

Expected impact: Significant reduction in subprocess overhead for blame operations (N commits → 2 git calls), diff operations (2 calls → 1), and status operations (2 calls → 1 when no pathspecs).

Updates since last revision

  • Resolved merge conflict with main in src/commands/diff.rs: the resolve_two_revs function now uses exec_git_with_profile(&args, InternalGitProfile::General)? (combining the PR's ? error propagation + stdout parsing with main's migration from exec_git to exec_git_with_profile). This matches the pattern already used by resolve_commit in the same file.

Review & Testing Checklist for Human

⚠️ Risk level: YELLOW - Core git operations modified without end-to-end testing

  • Test blame command with commits that have authorship notes, especially files with many unique commits. Verify AI/human attribution is correct and performance improves.
  • Test diff command on edge cases: initial commits (no parent), symbolic refs (HEAD, branch names), commit ranges. Verify correct fallback to empty tree hash. Pay attention to the merge-conflict resolution in resolve_two_revs — confirm error propagation via ? behaves correctly when revisions can't be resolved.
  • Test status command with staged files, both with and without pathspecs. Verify staged files are correctly identified without the explicit get_staged_filenames() pre-call.
  • Verify schema version handling: batch_get_authorship_logs() correctly filters by AUTHORSHIP_LOG_VERSION, but the inline batch code in get_commits_with_notes_from_list() does not — it deserializes without a version check. Confirm this difference is intentional.

Notes

  • Only cargo build and cargo clippy were run — no functional tests executed
  • The status.rs change assumes porcelain v2 output correctly identifies staged files; this should be verified with real repos
  • The diff.rs resolve_two_revs now propagates errors with ? instead of the old match on Ok/Err; initial commits fall back to the empty tree hash via the caller's match on resolve_two_revs
  • In get_diff_json_filtered, the parent is resolved via resolve_two_revs(repo, &parent_rev, &to_commit) which redundantly re-resolves to_commit — minor inefficiency but not a bug

Link to Devin run: https://app.devin.ai/sessions/972f0c2bf6d94b17b663d3e7d52da147
Requested by: @svarlamov


Open with Devin

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/git-ai-project/git-ai/pull/595 **Author:** [@svarlamov](https://github.com/svarlamov) **Created:** 2/26/2026 **Status:** 🔄 Open **Base:** `main` ← **Head:** `devin/1772084830-perf-batch-git-calls` --- ### 📝 Commits (3) - [`f49731e`](https://github.com/git-ai-project/git-ai/commit/f49731e872851538c7894e2954daf00a83cff47b) perf: batch git exec calls for notes, rev-parse, status, and checkpoints - [`87cd073`](https://github.com/git-ai-project/git-ai/commit/87cd07357c429075d88dddfee5055ae7837aa777) style: fix cargo fmt formatting issues - [`fb7ef32`](https://github.com/git-ai-project/git-ai/commit/fb7ef32da125c364d6bb5a03f0b0cf709d984c8f) merge: resolve conflict with origin/main in diff.rs ### 📊 Changes **6 files changed** (+220 additions, -94 deletions) <details> <summary>View changed files</summary> 📝 `src/commands/blame.rs` (+55 -23) 📝 `src/commands/checkpoint.rs` (+25 -28) 📝 `src/commands/diff.rs` (+43 -36) 📝 `src/git/authorship_traversal.rs` (+1 -1) 📝 `src/git/refs.rs` (+81 -2) 📝 `src/git/status.rs` (+15 -4) </details> ### 📄 Description ## Summary This PR implements performance optimizations to reduce git subprocess overhead by batching git exec calls. The primary bottleneck in git-ai is the number of individual git invocations, especially for operations like `git notes show` and `git rev-parse` that are called repeatedly in loops. **Changes:** - **Fix #1 & #2**: Batch-fetch authorship notes in `blame.rs` and `refs.rs` using 2 git calls (1 `cat-file --batch-check` + 1 `cat-file --batch`) instead of N individual `git notes show` calls - **Fix #4**: Combine `resolve_commit` + `resolve_parent` into single `git rev-parse` call in `diff.rs` - **Fix #5**: Skip redundant `get_staged_filenames()` call in `status()` when no pathspecs provided (porcelain v2 output already contains staging info) - **Fix #7**: Read checkpoints once instead of twice in `get_all_tracked_files()` - Add `batch_get_authorship_logs()` helper function for reuse across codebase - Make `batch_read_blobs_with_oids()` public for cross-module access **Expected impact:** Significant reduction in subprocess overhead for blame operations (N commits → 2 git calls), diff operations (2 calls → 1), and status operations (2 calls → 1 when no pathspecs). ## Updates since last revision - Resolved merge conflict with `main` in `src/commands/diff.rs`: the `resolve_two_revs` function now uses `exec_git_with_profile(&args, InternalGitProfile::General)?` (combining the PR's `?` error propagation + stdout parsing with main's migration from `exec_git` to `exec_git_with_profile`). This matches the pattern already used by `resolve_commit` in the same file. ## Review & Testing Checklist for Human ⚠️ **Risk level: YELLOW** - Core git operations modified without end-to-end testing - [ ] **Test blame command** with commits that have authorship notes, especially files with many unique commits. Verify AI/human attribution is correct and performance improves. - [ ] **Test diff command** on edge cases: initial commits (no parent), symbolic refs (HEAD, branch names), commit ranges. Verify correct fallback to empty tree hash. Pay attention to the merge-conflict resolution in `resolve_two_revs` — confirm error propagation via `?` behaves correctly when revisions can't be resolved. - [ ] **Test status command** with staged files, both with and without pathspecs. Verify staged files are correctly identified without the explicit `get_staged_filenames()` pre-call. - [ ] **Verify schema version handling**: `batch_get_authorship_logs()` correctly filters by `AUTHORSHIP_LOG_VERSION`, but the inline batch code in `get_commits_with_notes_from_list()` does **not** — it deserializes without a version check. Confirm this difference is intentional. ### Notes - Only `cargo build` and `cargo clippy` were run — no functional tests executed - The `status.rs` change assumes porcelain v2 output correctly identifies staged files; this should be verified with real repos - The `diff.rs` `resolve_two_revs` now propagates errors with `?` instead of the old `match` on `Ok`/`Err`; initial commits fall back to the empty tree hash via the caller's `match` on `resolve_two_revs` - In `get_diff_json_filtered`, the parent is resolved via `resolve_two_revs(repo, &parent_rev, &to_commit)` which redundantly re-resolves `to_commit` — minor inefficiency but not a bug --- Link to Devin run: https://app.devin.ai/sessions/972f0c2bf6d94b17b663d3e7d52da147 Requested by: @svarlamov <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/git-ai-project/git-ai/pull/595" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open with Devin"> </picture> </a> <!-- devin-review-badge-end --> --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/git-ai#601
No description provided.