[PR #562] Store subagent transcripts as separate prompt records #569

Open
opened 2026-03-02 04:14:00 +03:00 by kerem · 0 comments
Owner

Original Pull Request: https://github.com/git-ai-project/git-ai/pull/562

State: open
Merged: No


Summary

  • Parse Claude Code subagent transcripts from <session>/subagents/agent-<id>.jsonl into separate SubagentInfo structs instead of merging them into the parent transcript
  • Add parent_id field to PromptRecord and PromptDbRecord to link subagent prompt records back to their parent
  • Expand subagent metadata into separate prompt records at post-commit time and in virtual attributions, preserving thread hierarchy
  • Make the v3→v4 DB migration idempotent: apply_migration() now catches "duplicate column name" errors from ALTER TABLE ADD COLUMN so a crash between the DDL and the version-bump doesn't brick the database on restart

Closes #371

Design

Subagent transcripts stored in Claude Code's subagents/ directory are now collected as SubagentInfo structs during JSONL parsing rather than being flattened into the parent transcript. These are propagated through the checkpoint pipeline via PromptUpdateResult::Updated and stored as JSON metadata (__subagents key) on the parent checkpoint.

At post-commit time (and in virtual attributions), the subagent metadata is expanded into separate PromptDbRecord entries, each with a unique hash ID and a parent_id linking back to the parent prompt. This preserves the natural thread structure of agentic coding sessions.

DB schema migrated from version 3 to 4 to add the parent_id TEXT column to the prompts table. The migration is idempotent — if the column already exists (e.g. from a previous partial migration), the error is caught and silently skipped.

Updates since last revision

  • Merged latest main into the branch and resolved merge conflicts in tests/claude_code.rs (added new subagent test names to the reuse_tests_in_worktree! macro)
  • Made apply_migration() idempotent by catching "duplicate column name" SQLite errors instead of crashing
  • Added test_initialize_schema_handles_preexisting_parent_id_column to verify the idempotency path
  • Fixed missing parent_id: None in tests/worktrees.rs PromptRecord initializer (compilation error)
  • Fixed formatting (extra blank line in tests/claude_code.rs)

Test plan

  • cargo clippy — no warnings
  • cargo test --test claude_code — all 24 tests pass
  • cargo test --test agent_presets_comprehensive — all 58 tests pass
  • New test: test_parse_claude_code_jsonl_with_subagents verifies subagents are returned separately
  • Updated test: test_parse_claude_code_jsonl_without_subagents_dir verifies empty vec for no subagents
  • New test: test_initialize_schema_handles_preexisting_parent_id_column verifies idempotent migration
  • New test fixtures: claude-code-with-subagents.jsonl and subagents/agent-test-sub-1.jsonl
  • CI: Format , Lint (all platforms) , Ubuntu tests (hooks/wrapper/both)

Review & Testing Checklist for Human

There are 4 items to verify given the scope of DB schema + query changes:

  • Idempotency approach: Verify that catching "duplicate column name" via string matching is acceptable for SQLite migrations (this is a common pattern but relies on error message text)
  • SQL query coverage: Spot-check that all SELECT, INSERT, and UPDATE queries in internal_db.rs include the new parent_id column in the correct position (column order matters for positional binding)
  • Test coverage: Confirm all PromptRecord initializers in test files include parent_id: None (compilation would fail if any were missed, but worth double-checking)
  • Subagent parsing correctness: Review test_parse_claude_code_jsonl_with_subagents to ensure subagent messages are correctly separated from the parent transcript and not duplicated

Notes

  • The parent_id field is Option<String> with #[serde(default)], so existing serialized data without this field will deserialize correctly
  • The migration is safe to run multiple times — if the column already exists, the error is caught and the version is still updated
  • All Ubuntu CI checks are passing; macOS and Windows checks are still pending but not blocking per user request

🤖 Generated with Devin
Requested by: @svarlamov

**Original Pull Request:** https://github.com/git-ai-project/git-ai/pull/562 **State:** open **Merged:** No --- ## Summary - Parse Claude Code subagent transcripts from `<session>/subagents/agent-<id>.jsonl` into separate `SubagentInfo` structs instead of merging them into the parent transcript - Add `parent_id` field to `PromptRecord` and `PromptDbRecord` to link subagent prompt records back to their parent - Expand subagent metadata into separate prompt records at post-commit time and in virtual attributions, preserving thread hierarchy - **Make the v3→v4 DB migration idempotent**: `apply_migration()` now catches `"duplicate column name"` errors from `ALTER TABLE ADD COLUMN` so a crash between the DDL and the version-bump doesn't brick the database on restart Closes #371 ## Design Subagent transcripts stored in Claude Code's `subagents/` directory are now collected as `SubagentInfo` structs during JSONL parsing rather than being flattened into the parent transcript. These are propagated through the checkpoint pipeline via `PromptUpdateResult::Updated` and stored as JSON metadata (`__subagents` key) on the parent checkpoint. At post-commit time (and in virtual attributions), the subagent metadata is expanded into separate `PromptDbRecord` entries, each with a unique hash ID and a `parent_id` linking back to the parent prompt. This preserves the natural thread structure of agentic coding sessions. DB schema migrated from version 3 to 4 to add the `parent_id TEXT` column to the prompts table. The migration is idempotent — if the column already exists (e.g. from a previous partial migration), the error is caught and silently skipped. ## Updates since last revision - Merged latest `main` into the branch and resolved merge conflicts in `tests/claude_code.rs` (added new subagent test names to the `reuse_tests_in_worktree!` macro) - Made `apply_migration()` idempotent by catching `"duplicate column name"` SQLite errors instead of crashing - Added `test_initialize_schema_handles_preexisting_parent_id_column` to verify the idempotency path - Fixed missing `parent_id: None` in `tests/worktrees.rs` `PromptRecord` initializer (compilation error) - Fixed formatting (extra blank line in `tests/claude_code.rs`) ## Test plan - [x] `cargo clippy` — no warnings - [x] `cargo test --test claude_code` — all 24 tests pass - [x] `cargo test --test agent_presets_comprehensive` — all 58 tests pass - [x] New test: `test_parse_claude_code_jsonl_with_subagents` verifies subagents are returned separately - [x] Updated test: `test_parse_claude_code_jsonl_without_subagents_dir` verifies empty vec for no subagents - [x] New test: `test_initialize_schema_handles_preexisting_parent_id_column` verifies idempotent migration - [x] New test fixtures: `claude-code-with-subagents.jsonl` and `subagents/agent-test-sub-1.jsonl` - [x] CI: Format ✅, Lint (all platforms) ✅, Ubuntu tests (hooks/wrapper/both) ✅ ## Review & Testing Checklist for Human There are 4 items to verify given the scope of DB schema + query changes: - [ ] **Idempotency approach**: Verify that catching `"duplicate column name"` via string matching is acceptable for SQLite migrations (this is a common pattern but relies on error message text) - [ ] **SQL query coverage**: Spot-check that all `SELECT`, `INSERT`, and `UPDATE` queries in `internal_db.rs` include the new `parent_id` column in the correct position (column order matters for positional binding) - [ ] **Test coverage**: Confirm all `PromptRecord` initializers in test files include `parent_id: None` (compilation would fail if any were missed, but worth double-checking) - [ ] **Subagent parsing correctness**: Review `test_parse_claude_code_jsonl_with_subagents` to ensure subagent messages are correctly separated from the parent transcript and not duplicated ### Notes - The `parent_id` field is `Option<String>` with `#[serde(default)]`, so existing serialized data without this field will deserialize correctly - The migration is safe to run multiple times — if the column already exists, the error is caught and the version is still updated - All Ubuntu CI checks are passing; macOS and Windows checks are still pending but not blocking per user request 🤖 Generated with [Devin](https://app.devin.ai/sessions/00bd2e733bf8409980f95f21b7245dc1) Requested by: @svarlamov
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/git-ai#569
No description provided.