mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #186] wget Errors on latest master #132
Originally created by @n0ncetonic on GitHub (Mar 21, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/186
Describe the bug
wget times out after 30 seconds on the latest build of the master branch. When the same wget command is run outside of ArchiveBox, wget works as expected.
Steps to reproduce
Steps to reproduce the behavior:
Run ./archive
`echo "https://developer.apple.com/library/archive/technotes/tn2218/_index.html#//apple_ref/doc/uid/DTS40007625" | ./archive`
See error
Screenshots or log output
Software versions
(please complete the following information)
d798117
@n0ncetonic commented on GitHub (Mar 21, 2019):
Somehow this appears to have resolved itself, although wget does appear to have been severely slowed down by something in the commits between c79e1df and d798117, and I'm getting throughput of 1 URL archived every 30 or so seconds.
@pirate commented on GitHub (Mar 21, 2019):
Ok this is all helpful, thanks, I'll try to git bisect and see if it's the commit I think it is that slowed everything down so much.
Out of curiosity, how fast is your disk IO? I recently added some code that does ~5x more reading and rewriting of the index and output dir in order to provide a more real-time UI experience during the archiving process, so if disk IO is your bottleneck it would make sense that that change slowed it significantly for you.
@n0ncetonic commented on GitHub (Mar 21, 2019):
Unsure of how to determine speed of disk I/O. I'm pointing ArchiveBox to a mounted NAS share on the local network and the NAS has a gigabit line with 3 x WD RED 8TB NAS Drives
@n0ncetonic commented on GitHub (Mar 21, 2019):
Update: So I checked my settings and it looks like my NAS was mounting using SMB 2 by default. I've since changed this to SMB 3 which should help with any disk I/O issues resulting from network latency.
I'm running ArchiveBox again on the same data set as before and the issue seems to be resolved with an average archive time of 15 seconds per link which is back to a fairly decent speed.
@n0ncetonic commented on GitHub (Mar 22, 2019):
Closing this as it appears to have been fixed by switching to SMB3
@pirate commented on GitHub (Mar 22, 2019):
Also in case you ever need to check in the future, you can determine I/O speed like this:
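The snippet that followed is not preserved here. As a stand-in, and not the command originally posted, here is a minimal Python 3 sketch of one way to estimate sequential throughput for a directory such as a mounted NAS share. The function name `measure_disk_throughput` and its default sizes are illustrative assumptions, not part of ArchiveBox.

```python
import os
import tempfile
import time


def measure_disk_throughput(directory, size_mb=64, block_kb=1024):
    """Write then read a temp file in `directory`; return (write_MBps, read_MBps).

    Hypothetical helper for illustration only; not an ArchiveBox API.
    """
    block = os.urandom(block_kb * 1024)          # incompressible data
    blocks = max(1, (size_mb * 1024) // block_kb)
    total_mb = blocks * block_kb / 1024          # amount actually written

    fd, path = tempfile.mkstemp(dir=directory, prefix="iotest-")
    try:
        # Timed sequential write, flushed all the way to the device/share.
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
        write_s = max(time.perf_counter() - start, 1e-9)

        # Timed sequential read of the same file.
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(len(block)):
                pass
        read_s = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)                          # always clean up the temp file

    return total_mb / write_s, total_mb / read_s
```

Calling `measure_disk_throughput("/path/to/output")` (a placeholder path) against the archive output directory returns approximate write and read speeds in MB/s. The read figure is often inflated by the operating system's cache, so the write figure is usually the more telling one for a network share.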