[PR #571] fix: reduce memory overflow from checkpoint reads and writes (#344) #577

opened 2026-03-02 04:14:01 +03:00 by kerem · 0 comments

Original Pull Request: https://github.com/git-ai-project/git-ai/pull/571

State: open
Merged: No


fix: reduce memory overflow from checkpoint reads and writes (#344)

Summary

Addresses the runaway memory usage (30-60GB) reported in #344 by fixing the highest-impact patterns in checkpoint I/O:

1. Streaming prune-and-rewrite on append (repo_storage.rs): append_checkpoint previously read ALL checkpoints into memory, appended one, pruned, then wrote ALL back — O(N) memory per append. Now it streams existing checkpoints one at a time through a BufReader, prunes char-level attributions for files superseded by the new checkpoint, writes to a temp file via BufWriter, appends the new checkpoint, then atomically renames. Peak memory is one checkpoint plus the new checkpoint, rather than all checkpoints. The file stays small throughout long agent loops because pruning happens on every append.
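A minimal sketch of this streaming read-modify-write pattern, using a simplified tab-separated record format ("files\tattributions") in place of the real JSONL Checkpoint type — field names and pruning rules here are stand-ins, not the actual repo_storage.rs code:

```rust
use std::fs::{self, File};
use std::io::{BufRead, BufReader, BufWriter, Write};
use std::path::Path;

/// Stream existing records one at a time, blank the attribution blob of any
/// record whose files are superseded by the new checkpoint, write everything
/// to a temp file, append the new record, then atomically rename.
fn append_checkpoint(path: &Path, new_files: &[&str], new_attrs: &str) -> std::io::Result<()> {
    let tmp = path.with_extension("jsonl.tmp");
    {
        let mut writer = BufWriter::new(File::create(&tmp)?);
        if path.exists() {
            // Peak memory: one existing record plus the new one.
            for line in BufReader::new(File::open(path)?).lines() {
                let line = line?;
                let (files, attrs) = line.split_once('\t').unwrap_or((line.as_str(), ""));
                let superseded = files.split(',').any(|f| new_files.contains(&f));
                // Prune attributions for superseded files; keep the record itself.
                writeln!(writer, "{files}\t{}", if superseded { "" } else { attrs })?;
            }
        }
        writeln!(writer, "{}\t{}", new_files.join(","), new_attrs)?;
        writer.flush()?; // surface buffered I/O errors instead of losing them on drop
    }
    fs::rename(&tmp, path) // atomic replace on the same filesystem
}
```

Because the new record is written last and older matching records lose their attributions on every append, the newest checkpoint for a file is always the one carrying attributions.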

2. Eliminate redundant full reads (checkpoint.rs): A single checkpoint::run() call previously triggered 4+ independent read_all_checkpoints() deserializations of the entire JSONL file. Now checkpoints are read once at the top of run() and passed through to get_all_tracked_files via a new preloaded_checkpoints parameter.
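The preloaded-checkpoints pattern might look roughly like the following hedged sketch — the Checkpoint shape and loader here are hypothetical simplifications of the real types:

```rust
// Hypothetical simplified checkpoint record.
#[derive(Clone)]
pub struct Checkpoint {
    pub files: Vec<String>,
}

// Stand-in for the real JSONL deserialization from disk.
fn read_all_checkpoints_from_disk() -> Vec<Checkpoint> {
    Vec::new()
}

/// Callers that already hold the checkpoints pass `Some(..)` and skip the
/// disk read; existing callers pass `None` and behave exactly as before.
pub fn get_all_tracked_files(preloaded_checkpoints: Option<&[Checkpoint]>) -> Vec<String> {
    let owned;
    let checkpoints: &[Checkpoint] = match preloaded_checkpoints {
        Some(cps) => cps,
        None => {
            owned = read_all_checkpoints_from_disk();
            &owned
        }
    };
    // Collect, sort, and deduplicate every file path across all checkpoints.
    let mut files: Vec<String> = checkpoints
        .iter()
        .flat_map(|cp| cp.files.iter().cloned())
        .collect();
    files.sort();
    files.dedup();
    files
}
```

The `Option<&[Checkpoint]>` signature keeps the change backward compatible: only the hot path in run() needs updating to pass the preloaded slice.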

3. Streaming reads (repo_storage.rs): read_all_checkpoints now uses BufReader line-by-line instead of fs::read_to_string, avoiding holding the full file string and parsed structs in memory simultaneously.
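A sketch of the line-by-line read, with record parsing as a stand-in for the real JSONL serde calls:

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;

/// Read one line at a time: peak memory is a single line plus the parsed
/// records, never the whole file held as one String the way
/// fs::read_to_string requires.
fn read_all_checkpoints(path: &Path) -> std::io::Result<Vec<String>> {
    let mut checkpoints = Vec::new();
    for line in BufReader::new(File::open(path)?).lines() {
        let line = line?;
        if !line.trim().is_empty() {
            checkpoints.push(line); // real code: serde_json::from_str(&line)?
        }
    }
    Ok(checkpoints)
}
```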

4. BufWriter for writes (repo_storage.rs): write_all_checkpoints now streams serialization through BufWriter instead of building a full string in memory. An explicit flush() call ensures write errors are propagated rather than silently dropped on BufWriter::drop.
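The buffered write with explicit flush can be sketched as follows (record serialization is a stand-in for the real serde calls):

```rust
use std::fs::File;
use std::io::{BufWriter, Write};
use std::path::Path;

/// Serialize one record at a time through a BufWriter instead of building
/// the full output String first. The explicit flush() is load-bearing:
/// BufWriter's Drop impl also flushes, but it ignores any I/O error it hits.
fn write_all_checkpoints(path: &Path, checkpoints: &[String]) -> std::io::Result<()> {
    let mut writer = BufWriter::new(File::create(path)?);
    for cp in checkpoints {
        // real code: serde_json::to_writer(&mut writer, cp)? plus a newline
        writeln!(writer, "{cp}")?;
    }
    writer.flush() // propagate buffered write errors to the caller
}
```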

All 38 checkpoint-related unit tests pass (31 checkpoint tests + 7 repo_storage tests). No new dependencies added.

Updates since last revision

Major rework: The initial approach deferred char-level attribution pruning to write_all_checkpoints, but this left un-pruned data in the file during the intra-commit loop where memory issues are worst. The new approach prunes on every append_checkpoint call using a streaming read-modify-write pattern that keeps only one checkpoint in memory at a time. The prune_old_char_attributions method was removed entirely (logic inlined into append_checkpoint).

Reset ordering fix (ef4a4d70): Fixed a data loss bug where reset=true + is_pre_commit=true would destroy the working log before the early-exit check, causing the check to always pass (empty checkpoints → no AI edits) and skip checkpoint creation. Now checkpoints are always read first for the early-exit check, then reset happens afterward if needed.
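The corrected ordering can be sketched as below — the struct, method names, and bool return are hypothetical stand-ins for the real checkpoint::run() flow:

```rust
// Hypothetical in-memory stand-in for the on-disk working log.
struct WorkingLog {
    checkpoints: Vec<String>,
}

impl WorkingLog {
    /// Returns true if a fresh checkpoint should be created.
    fn run(&mut self, reset: bool, is_pre_commit: bool) -> bool {
        // 1. Read BEFORE any destructive reset, so the pre-commit
        //    early-exit check inspects the real data.
        let has_ai_edits = !self.checkpoints.is_empty();
        if is_pre_commit && !has_ai_edits {
            return false; // early exit: no AI edits to record
        }
        // 2. Only now is it safe to clear the working log.
        if reset {
            self.checkpoints.clear();
        }
        true
    }
}
```

The buggy ordering did step 2 before step 1, so the check always saw an empty log and skipped checkpoint creation.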

Other changes:

  • write_all_checkpoints signature reverted to &[Checkpoint] (no longer needs &mut since pruning moved to append)
  • Added explicit writer.flush()?; in both append_checkpoint and write_all_checkpoints to ensure I/O errors are propagated

Review & Testing Checklist for Human

  • Reset + pre_commit interaction: When reset=true and is_pre_commit=true, verify that the early-exit check correctly inspects the old data before reset, and that the subsequent reset properly clears both disk state and in-memory checkpoints. Test with a repo that has AI checkpoints, run checkpoint --reset during pre-commit, and verify a fresh checkpoint is created (not skipped).
  • Streaming prune correctness: The new append_checkpoint assumes the checkpoint being appended is the newest for its files (clears attributions from older entries with matching files). This should always be true since we append chronologically, but verify no code path appends out-of-order or writes un-pruned checkpoints via write_all_checkpoints directly.
  • Real-world validation: Test with a repo that has a large checkpoint file (>100MB) and multiple agent sessions. Verify memory usage stays reasonable during git commit and that attributions are correctly preserved end-to-end. Unit tests validate correctness but not the memory improvement.
  • Temp file cleanup: append_checkpoint writes to checkpoints.jsonl.tmp then renames. If the process crashes mid-write, the temp file is left behind (harmless but clutters .git/ai/). Consider if cleanup is needed.

Notes

  • get_all_tracked_files gained an optional preloaded_checkpoints parameter. Existing callers that don't pass it will still work (reads from disk as before).
  • No changes to checkpoint format or serialization — purely I/O optimization.
  • The streaming prune approach means write_all_checkpoints no longer prunes. If any code path writes un-pruned data via write_all_checkpoints, it won't be pruned until the next append_checkpoint.
  • The has_no_ai_edits logic was rewritten from all_ai_touched_files().is_empty() to checkpoints.iter().all(|cp| cp.entries.is_empty() || cp.kind != AiAgent/AiTab). These should be logically equivalent, but the rewrite is worth a careful trace through both code paths.
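The equivalence of the two formulations can be sketched as follows — the Kind variants and fields are hypothetical simplifications, and the "old" version only approximates all_ai_touched_files():

```rust
enum Kind { AiAgent, AiTab, Human }

struct Checkpoint { kind: Kind, entries: Vec<String> }

fn is_ai(kind: &Kind) -> bool {
    matches!(kind, Kind::AiAgent | Kind::AiTab)
}

/// Old formulation (approximated): walk every AI-touched file,
/// then test emptiness.
fn has_no_ai_edits_old(cps: &[Checkpoint]) -> bool {
    cps.iter()
        .filter(|cp| is_ai(&cp.kind))
        .flat_map(|cp| cp.entries.iter())
        .next()
        .is_none()
}

/// New formulation: every checkpoint is either empty or not AI-authored.
fn has_no_ai_edits_new(cps: &[Checkpoint]) -> bool {
    cps.iter().all(|cp| cp.entries.is_empty() || !is_ai(&cp.kind))
}
```

Both return false exactly when some AI-kind checkpoint has at least one entry; the new form avoids materializing the file list.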

Link to Devin run: https://app.devin.ai/sessions/2a46b6eaa71f4f46913488bef2ff52a1
Requested by: @svarlamov

