[PR #157] Move to virtual attestations for faster rewrite ops #295

Closed
opened 2026-03-02 04:13:13 +03:00 by kerem · 0 comments
Owner

Original Pull Request: https://github.com/git-ai-project/git-ai/pull/157

State: closed
Merged: Yes


Rewrite ops were getting slower as repo size and file line counts grew. The old approach of simulating commits and running line-by-line blame was accurate, but it required a ton of git lookups and was not fit for the task.

The new approach introduced in this PR relies on the same engine that powers our DMP checkpoints. `VirtualAttribution` maps are created for both lineages being rewritten (for rebase, cherry-pick, merge+squash, reset). We can then merge them entirely in memory, without running git blame or simulating commits. This is many orders of magnitude faster, requires less complex code (look at the diff), and is more logically correct.
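To make the in-memory merge concrete, here is a minimal sketch, not the code from this PR. It assumes each lineage can be reduced to a list of lines plus a parallel list of author labels, and it uses Python's `difflib` in place of the DMP engine. The function name `merge_attributions` and the author labels are hypothetical.

```python
from difflib import SequenceMatcher

UNATTRIBUTED = None  # placeholder for lines that neither lineage accounts for


def merge_attributions(ours, theirs, final_lines):
    """Carry per-line authorship from two lineages onto the rewritten file.

    `ours` and `theirs` are (lines, authors) pairs of equal-length lists.
    This is an illustrative stand-in for a VirtualAttribution map; the real
    structure in git-ai may look quite different.
    """
    merged = [UNATTRIBUTED] * len(final_lines)
    for lines, authors in (ours, theirs):
        matcher = SequenceMatcher(a=lines, b=final_lines, autojunk=False)
        for block in matcher.get_matching_blocks():
            for offset in range(block.size):
                target = block.b + offset
                # The first lineage to claim a line wins; later ones fill gaps.
                if merged[target] is UNATTRIBUTED:
                    merged[target] = authors[block.a + offset]
    return merged


ours = (["a", "b", "c"], ["human", "ai:one", "human"])
theirs = (["a", "x", "c"], ["human", "ai:two", "human"])
result = merge_attributions(ours, theirs, ["a", "b", "x", "c"])
```

No git process is spawned here: everything works on data already in memory, which is the property the description above relies on.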

Other nice characteristics:

  • Resets and merge+squash do not need to simulate checkpoints. An `INITIAL` attestation is set instead, so the initial checkpoint already includes the changes.
  • No more hanging commits in the repo (they would never be committed since there's no ref attached, but they were still cruft).
  • Should scale with the size of commits (git's goal!), not the size of the repo or the length of history.
  • Blame only needs to run once per version and is parallelized (30 at once).
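The bounded parallelism in the last bullet could look roughly like the sketch below. It is an assumption-laden illustration, not the PR's implementation: `blame_all` and the `blame_one` callable are hypothetical names, and only the cap of 30 concurrent blames comes from the description.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_BLAMES = 30  # cap taken from the PR description


def blame_all(paths, blame_one, max_workers=MAX_PARALLEL_BLAMES):
    """Run `blame_one(path)` once per path, at most `max_workers` at a time.

    `blame_one` is a caller-supplied function (hypothetical here) that
    returns the blame result for a single file at a single version.
    """
    paths = list(paths)  # materialize so we can iterate twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs results correctly.
        results = list(pool.map(blame_one, paths))
    return dict(zip(paths, results))


# Usage with a stand-in blame function that does no git work.
blames = blame_all(["src/a.rs", "src/b.rs"], lambda path: [f"blame for {path}"])
```

Threads suit this case because each blame would mostly wait on an external git process rather than compute in Python.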