mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-26 01:26:00 +03:00
[GH-ISSUE #985] UX Wart: 504 error when long-running request times out in web UI #3631
Originally created by @kylrth on GitHub (May 27, 2022).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/985
I reverse proxy my public archive through NGINX. If I request a long-running action from the web UI (like if I click "pull" when I've selected 50 archives), the browser waits for a response from the server, which will only come when the entire job is completed. Since that's going to take a while, the request times out and NGINX serves a "504 gateway time-out" page.
The server should probably respond when the request is received, not when it's completed.
ArchiveBox version
@mAAdhaTTah commented on GitHub (May 29, 2022):
I ran into this and ended up configuring my reverse proxy to not time out. The page can't load until the process is completed (it's not a background task), so I'm not sure your solution would work.
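The page does have to wait for the job if the job runs inside the request, but it does not have to stay silent. Nginx's `proxy_read_timeout` limits the gap between two successive reads from the upstream server, not the whole response. A streamed response that starts immediately and keeps sending small heartbeat chunks should therefore avoid the 504 without raising the proxy timeout. Below is a minimal, framework-agnostic sketch that assumes the job is a plain Python callable. `stream_while_running`, `heartbeat_secs`, and `done_url` are hypothetical names, not ArchiveBox code:

```python
import html
import json
import threading
from typing import Callable, Dict, Iterator


def stream_while_running(
    job: Callable[[], str],
    heartbeat_secs: float = 10.0,
    done_url: str = "/",
) -> Iterator[bytes]:
    """Stream an HTML page while job() runs in a background thread.

    Hypothetical helper: job returns a short summary string, heartbeat_secs
    should stay below the proxy's read timeout, and done_url is where the
    browser is sent once the job finishes.
    """
    outcome: Dict[str, str] = {}
    done = threading.Event()

    def worker() -> None:
        try:
            outcome["summary"] = job()
        except Exception as exc:  # report the failure in the page instead
            outcome["summary"] = f"failed: {exc!r}"
        finally:
            done.set()

    # daemon=True: the job keeps running if the client disconnects, but does
    # not keep the interpreter alive at shutdown.
    threading.Thread(target=worker, daemon=True).start()

    # First chunk goes out immediately, so the proxy receives bytes right away.
    yield b"<html><body><p>Working...</p>\n"

    # Event.wait() returns False on timeout; each heartbeat resets the proxy's
    # per-read timer while the job is still running.
    while not done.wait(heartbeat_secs):
        yield b"<!-- still working -->\n"

    # Last chunk: show the outcome, then navigate to the results page.
    yield (
        f"<p>Done: {html.escape(outcome['summary'])}</p>"
        f"<script>window.location.assign({json.dumps(done_url)})</script>"
        "</body></html>\n"
    ).encode("utf-8")
```

In Django, the generator could be passed to `StreamingHttpResponse(...)`. Nginx may still buffer the chunks before forwarding them to the browser. Sending the `X-Accel-Buffering: no` response header disables that buffering for a single response. If the browser disconnects, the worker thread keeps running, which matches the behavior described later in the thread.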
@kylrth commented on GitHub (May 29, 2022):
Ok, thanks for the recommendation. I'll probably do that for now. What I'm suggesting is to make it a background task, because that seems more appropriate. It's fine if not though.
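Making the work a background task does not necessarily require a task queue. Django's `HttpResponse.close()` is called by the WSGI server at the end of the request. The same hook exists for any WSGI app, because PEP 3333 requires the server to call `close()` on the returned iterable once the request completes. Below is a minimal, stdlib-only sketch of that pattern, assuming the job is a plain callable. `RunAfterResponse` and `make_app` are hypothetical names, not ArchiveBox or Django APIs:

```python
from typing import Callable, Iterable, Iterator, List, Tuple

StartResponse = Callable[[str, List[Tuple[str, str]]], object]


class RunAfterResponse:
    """WSGI response body whose close() runs a deferred job.

    PEP 3333 requires the server to call close() on the returned iterable
    once the request is complete; Django's HttpResponse.close() is invoked
    through this same hook.
    """

    def __init__(self, body: bytes, job: Callable[[], None]) -> None:
        self.body = body
        self.job = job

    def __iter__(self) -> Iterator[bytes]:
        yield self.body

    def close(self) -> None:
        # A PEP 3333 server calls this after it has finished writing the body,
        # so a slow job no longer delays the page. It still occupies this
        # worker until it returns.
        self.job()


def make_app(job: Callable[[], None]):
    """Hypothetical WSGI app that acknowledges the request, then runs job()."""

    def app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        body = b"Started. Refresh the Snapshots page to follow progress.\n"
        start_response("202 Accepted", [
            ("Content-Type", "text/plain; charset=utf-8"),
            # An explicit length tells the client and proxy the body is complete
            # without waiting for the connection to close.
            ("Content-Length", str(len(body))),
        ])
        return RunAfterResponse(body, job)

    return app
```

The trade-off is that the job still occupies the worker that served the request. A server with few workers can therefore still stall other requests while the job runs. The proxy, however, has already received a complete response and has nothing left to time out on. In Django, the same idea would be an `HttpResponse` subclass whose `close()` calls `super().close()` and then runs the job.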
@mAAdhaTTah commented on GitHub (May 29, 2022):
Feasibly, you could do this via the CLI with some combination of `schedule` and/or a cron job. I can't seem to find them now, but there are designs for improving the background job capabilities of ArchiveBox. I wouldn't expect that to land anytime soon, though, given the slow development on the project right now.

@pirate commented on GitHub (Jun 9, 2022):
The archive task actually continues just fine even if the user navigates away after the 504, so I haven't prioritized fixing this, but I've been aware of it for a while. It was convenient to run the archive task from the main request thread without forking. If it finishes in time for the response, the UX is normal. If it 504s and the user refreshes, that also works. Showing the error on long-running tasks is just a UX wart.
One easy way to solve this is Django's little-known post-request pattern. You subclass `HttpResponse` and override its `close()` method to run a function after the response is returned, so we don't have to add a whole async task runner or scheduling system like Celery/dramatiq: https://gist.github.com/pirate/c4deb41c16793c05950a6721a820cde9

Another way is to use `StreamingHttpResponse` to return 90% of the response HTML immediately. The last chunk is sent on completion and runs some JS to trigger a page refresh: https://gist.github.com/pirate/79f84dfee81ba0a38b6113541e827fd5

@pirate commented on GitHub (Jan 19, 2024):
This was improved on the `/add/` page in v0.7.2: the UI now auto-redirects back to the Snapshots page. We should still improve the other long-running admin actions, though.

@Routhinator commented on GitHub (Oct 31, 2025):
I'll note that I landed on this because the Reset action times out. After several tests, I can say that the page is not reset and updated with the latest snapshot unless the request completes without timing out. I have a snapshot with an IP block on the singlepage output. When I reset it, the updated copy is discarded after the 504 and the old copy remains.
If I use Re-snapshot, things work, but then I need to go delete the old snapshot to keep only one copy.
@Routhinator commented on GitHub (Oct 31, 2025):
Actually, this may be something else. I finally got the proxy timeouts set long enough for Reset to complete, and it still never updated the Chrome singlepage output. Running Re-snapshot works. I'm not sure why Reset never updates it, even though the logs show it re-snapshotting the URL.
@pirate commented on GitHub (Dec 29, 2025):
This is fixed on `dev`.