[PR #121] new move algo #268

Closed
opened 2026-03-02 04:13:08 +03:00 by kerem · 0 comments
Owner

Original Pull Request: https://github.com/git-ai-project/git-ai/pull/121

State: closed
Merged: Yes


Initial implementation of move detection was too slow and not predictable enough. This PR adds a much simpler, more efficient, and robust implementation for move.

Inputs: InsertedLine[] DeletedLine[] lines

InsertedLine:

  • Line content
  • Normalized content (unset, set in the algorithm)
  • Line number (in destination content)
  • Insertion idx (index of the insertion in the DMP insertions vec, for use in move mappings)

DeletedLine:

  • Line content
  • Normalized content (unset, set in the algorithm)
  • Line number (in original content)
  • Deletion idx (index of the deletion in the DMP deletions vec, for use in move mappings)

Algorithm:

  1. Sort inserted[] and deleted[] by line number (asc)
  2. Group insertions and deletions into contiguous ranges (a contiguous range is where the next entry’s line number is prev+1 — no gaps)
  3. Store the ‘normalized’ value for each line by trimming leading/trailing whitespace.
  4. Drop the lines for which the normalized value is empty (e.g., was all whitespace or empty line)
  5. Drop all groups (from both insertions and deletions) that is shorter than threshold (e.g., <3)
  6. For each line in each deletion group, hash the line’s normalized value. In a hashmap, use the line hash as the key and set the value to a tuple with the index of the deletion group and the index of the line in that deletion group.
  7. For each insertion group, loop over the lines and check if that line’s hash matches any hashes in the deleted lines hashmap. When you hit a match, run an inner loop to see how many contiguous lines you can match from that deletion group starting at the next index of the insertion and the subsequent index of the line matched from the deletion group. If you can match to the threshold (e.g., >2) then add the move mapping and move the outer loop to the next index in the insertion group. If you matched all the way to the end, then just move on to the next insertion group.
  8. Return the move mappings
**Original Pull Request:** https://github.com/git-ai-project/git-ai/pull/121 **State:** closed **Merged:** Yes --- Initial implementation of move detection was too slow and not predictable enough. This PR adds a much simpler, more efficient, and robust implementation for move. Inputs: InsertedLine[] DeletedLine[] lines InsertedLine: * Line content * Normalized content (unset, set in the algorithm) * Line number (in destination content) * Insertion idx (index of the insertion in the DMP insertions vec, for use in move mappings) DeletedLine: * Line content * Normalized content (unset, set in the algorithm) * Line number (in original content) * Deletion idx (index of the deletion in the DMP deletions vec, for use in move mappings) Algorithm: 1. Sort inserted[] and deleted[] by line number (asc) 2. Group insertions and deletions into contiguous ranges (a contiguous range is where the next entry’s line number is prev+1 — no gaps) 3. Store the ‘normalized’ value for each line by trimming leading/trailing whitespace. 4. Drop the lines for which the normalized value is empty (e.g., was all whitespace or empty line) 5. Drop all groups (from both insertions and deletions) that is shorter than threshold (e.g., <3) 6. For each line in each deletion group, hash the line’s normalized value. In a hashmap, use the line hash as the key and set the value to a tuple with the index of the deletion group and the index of the line in that deletion group. 7. For each insertion group, loop over the lines and check if that line’s hash matches any hashes in the deleted lines hashmap. When you hit a match, run an inner loop to see how many contiguous lines you can match from that deletion group starting at the next index of the insertion and the subsequent index of the line matched from the deletion group. If you can match to the threshold (e.g., >2) then add the move mapping and move the outer loop to the next index in the insertion group. If you matched all the way to the end, then just move on to the next insertion group. 8. Return the move mappings
kerem 2026-03-02 04:13:08 +03:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/git-ai#268
No description provided.