[GH-ISSUE #702] Question: Performance Comparison docker based vs bare metal #1951

Closed
opened 2026-03-01 17:55:16 +03:00 by kerem · 1 comment
Owner

Originally created by @asitemade4u on GitHub (Apr 11, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/702

Hi,
Thank you for ArchiveBox, which is outstanding -- basically exactly what I was looking for.
My only concern is ArchiveBox's performance when it runs inside a Docker environment. Notably, it does not seem to use all of the available CPU or memory.
Is there a better track record with bare-metal installations? Can ArchiveBox take full advantage of all processing power available and work in parallel?
Best,
Stephen

kerem closed this issue 2026-03-01 17:55:16 +03:00
Author
Owner

@pirate commented on GitHub (Apr 12, 2021):

Going to close this in favor of our existing issue about parallel archiving / performance: https://github.com/ArchiveBox/ArchiveBox/issues/91
Please subscribe to that one if you want updates.

There is no significant difference in performance between Docker and non-Docker setups. The main bottleneck is blocking IO and network access caused by single-threaded extractor execution (which will be removed once we move to a message-passing based worker queue architecture).

The summary is: parallel archiving is semi-doable and safe right now, but not perfect (you might encounter "database locked" errors if you try too many threads, see https://github.com/ArchiveBox/ArchiveBox/issues/601). Just run multiple `archivebox add` commands at once.
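As a rough illustration of the "run multiple `archivebox add` commands at once" suggestion, here is a minimal Python sketch that is not from the original thread. It assumes the `archivebox` CLI is on `PATH`, that the script is started from inside an ArchiveBox data directory, and that `archivebox add` accepts several URLs as arguments. The helper names (`chunk_urls`, `build_command`, `run_parallel`) and the `runner` hook are hypothetical and exist only for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def chunk_urls(urls, workers):
    """Split urls round-robin into at most `workers` non-empty batches."""
    batches = [urls[i::workers] for i in range(workers)]
    return [batch for batch in batches if batch]


def build_command(batch):
    """Build one `archivebox add` invocation for a batch of URLs."""
    return ["archivebox", "add", *batch]


def run_parallel(urls, workers=2, runner=None):
    """Start one `archivebox add` process per batch and wait for all of them.

    `runner` is a hypothetical hook (handy for dry runs and tests); by default
    each command is executed with subprocess.run and its exit code is returned.
    """
    if runner is None:
        def runner(cmd):
            return subprocess.run(cmd, check=False).returncode

    commands = [build_command(batch) for batch in chunk_urls(urls, workers)]
    # Threads are enough here: each one only waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, commands))


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    example_urls = ["https://example.com/a", "https://example.com/b"]
    run_parallel(example_urls, workers=2, runner=print)
```

Keeping `workers` small (two or three) is a cautious starting point, since the comment above warns that too many concurrent writers can trigger "database locked" errors.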
