[GH-ISSUE #1495] Bug: Sort By Size Not Working #880

Open
opened 2026-03-01 14:47:01 +03:00 by kerem · 3 comments
Owner

Originally created by @zero77 on GitHub (Aug 27, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1495

Describe the bug

Under Snapshots, clicking Size once or twice reorders the list of snapshots, but not by size.

Steps to reproduce

Go to Snapshots and click Size once or twice.

Screenshots or log output

ArchiveBox version

v0.8.2


@pirate commented on GitHub (Aug 27, 2024):

That is correct. Size is not stored in the DB because it's an expensive filesystem calculation; it's computed on the fly and cached only for the snapshots currently visible on the page. When you sort by size, it actually sorts by the number of output results recorded, which is the closest approximation that's still reasonably fast.
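
To illustrate why the resulting order can disagree with the displayed sizes, here is a minimal sketch of sorting by a cheap proxy. The `Snapshot` class, its fields, and the sample values are hypothetical stand-ins for illustration, not ArchiveBox's actual model or query.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    # Hypothetical stand-in, not ArchiveBox's actual model.
    url: str
    num_outputs: int  # number of output results recorded (cheap to look up)
    size_bytes: int   # true on-disk size (expensive to compute in practice)


def sort_by_output_count(snapshots, descending=True):
    """Order snapshots by the cheap proxy instead of the real size."""
    return sorted(snapshots, key=lambda s: s.num_outputs, reverse=descending)


# Made-up sample data: the snapshot with the fewest outputs is the largest.
snapshots = [
    Snapshot("https://example.com/a", num_outputs=3, size_bytes=50_000_000),
    Snapshot("https://example.com/b", num_outputs=9, size_bytes=2_000_000),
    Snapshot("https://example.com/c", num_outputs=6, size_bytes=10_000_000),
]

proxy_order = [s.url for s in sort_by_output_count(snapshots)]
size_order = [
    s.url for s in sorted(snapshots, key=lambda s: s.size_bytes, reverse=True)
]
```

With this sample data the two orderings are exact opposites, which matches the symptom reported above: the list reorders, but not by size.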

If you really need to sort by size, I suggest just sorting the data/archive/ dir in your filesystem browser or using a CLI tool like ncdu (both of which, you'll notice, take several minutes to compute sizes for everything).
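
As a scripted alternative to a file browser or ncdu, a rough sketch like the following walks each subdirectory and lists them largest first. It assumes each immediate subdirectory of data/archive/ holds one snapshot; the function names are made up for this example, and it will be just as slow as the other tools because it has to stat every file.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all files under path (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # Skip files that vanish or cannot be read during the walk.
                pass
    return total


def snapshot_dirs_by_size(archive_dir):
    """Return (dir_name, size_in_bytes) pairs, largest first."""
    sizes = []
    with os.scandir(archive_dir) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.name, dir_size_bytes(entry.path)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

Calling `snapshot_dirs_by_size("data/archive")` from the directory that contains your data folder returns the list, which you can then print or filter as needed.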

I will likely add size sorting back to the UI in the future, as it's very useful, but I first need to make some internal architecture improvements to support doing it performantly.


@zero77 commented on GitHub (Aug 28, 2024):

@pirate
Thank you for the explanation. The sizes are already there, even if they are not completely accurate.

In the example below, I sorted by largest first, but it didn't work:

![image](https://github.com/user-attachments/assets/e292af9e-5936-4c47-bb98-0680287eff07)

@pirate commented on GitHub (Aug 28, 2024):

The displayed sizes are accurate, but they rely on a trick: sizes are only calculated for the list page you're actively viewing (<100 snapshots at a time). The sorting needs to operate across all pages matching your active filters, so it can't be based on the displayed sizes in the current architecture (calculating the size of every single snapshot across all pages in order to sort correctly would take too long).
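
The cost difference can be sketched with a toy model. Everything here is an assumption for illustration (`PAGE_SIZE`, `FAKE_SIZES`, `compute_size`, and the call counter), and a dictionary lookup stands in for the expensive filesystem walk; it is not how ArchiveBox is implemented.

```python
from functools import lru_cache

PAGE_SIZE = 3  # hypothetical; the real list page shows fewer than 100 snapshots
FAKE_SIZES = {f"snap{i}": ((i * 37) % 11) * 1000 for i in range(10)}
size_calls = 0  # counts how many "expensive" size computations actually run


@lru_cache(maxsize=None)
def compute_size(snapshot_id):
    """Stand-in for an expensive filesystem walk; results are cached."""
    global size_calls
    size_calls += 1
    return FAKE_SIZES[snapshot_id]


def render_page(ordered_ids, page):
    """Compute sizes only for the snapshots visible on the requested page."""
    start = page * PAGE_SIZE
    visible = ordered_ids[start:start + PAGE_SIZE]
    return [(sid, compute_size(sid)) for sid in visible]


def sort_all_by_size(ordered_ids):
    """A correct size sort needs the size of every snapshot, on every page."""
    return sorted(ordered_ids, key=compute_size, reverse=True)


all_ids = list(FAKE_SIZES)
first_page = render_page(all_ids, page=0)
calls_for_one_page = size_calls

sort_all_by_size(all_ids)
total_calls_after_global_sort = size_calls
```

Rendering one page triggers only `PAGE_SIZE` size computations, while the global sort has to compute every remaining size before it can return the first row. With a real archive, each of those computations is a directory walk rather than a lookup.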
