mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-26 01:26:00 +03:00
[GH-ISSUE #865] Bug: wget timeout retrying unavailable ipv6 (Docker/Pihole) #3556
Originally created by @lkubb on GitHub (Sep 30, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/865
This is not necessarily a bug in ArchiveBox, but an edge case in the default configuration that might cause hiccups for some users. I hope this template is fine nevertheless.
Describe the bug
I'm running ArchiveBox inside a Docker container configured with docker-compose. Pihole runs inside my network as a DNS server. Running the wget extractor on a site with a blocked resource results in a timeout, so the archive attempt fails. wget retries repeatedly to reach the resource, running over the time limit specified by ArchiveBox. This is probably specific to the way Pihole blocks domains on the DNS level by default, some Docker IPv6 weirdness, and wget not recognising `Cannot assign requested address` as fatal.
This is solvable by setting `inet4_only = on` in `/etc/wgetrc`. Not sure if you want to support this edge case, but I would propose either documenting this issue or adding a configuration option `WGET_FORCE_IPV4` which adds `--inet4-only` to the generated wget command. I can open a pull request, if desired.
Related issue: https://github.com/ArchiveBox/ArchiveBox/issues/491
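As a rough sketch of the `/etc/wgetrc` workaround described above, the snippet below appends `inet4_only = on` to a wgetrc file only if no such setting is there yet. The `WGETRC_PATH` variable and its scratch-file default are assumptions made for illustration; inside the container the real target would be `/etc/wgetrc`, and how the change is persisted (custom image, mounted file) is left open.

```shell
#!/bin/sh
# WGETRC_PATH is a hypothetical variable for this sketch; it defaults to a
# scratch file so the snippet can be tried without touching /etc/wgetrc.
WGETRC_PATH="${WGETRC_PATH:-./wgetrc.example}"

# Make sure the file exists so grep does not fail on a missing path.
touch "$WGETRC_PATH"

# Append the IPv4-only setting unless an inet4_only line is already present.
# An existing line (even "inet4_only = off") is left untouched.
if ! grep -q '^[[:space:]]*inet4_only' "$WGETRC_PATH"; then
    printf 'inet4_only = on\n' >> "$WGETRC_PATH"
fi
```

For a one-off run, wget's `--inet4-only` (`-4`) command-line flag has the same effect as the wgetrc setting.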
Steps to reproduce
1. `wget https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/dev/docker-compose.yml`
2. `docker-compose up -d`
3. force DNS queries for `cdn.optimizely.com` to resolve to `ipv4 0.0.0.0 / ipv6 ::`
4. try to archive `https://medium.com` using wget extractor, e.g. via web UI, result:
   or verify that the following command runs longer than 60s:
Screenshots or log output
[Note: my pihole blocks another domain on that site, `static.cloudflareinsights.com`]
ArchiveBox version
@pirate commented on GitHub (Sep 30, 2021):
You can set `WGET_ARGS` for this:
In general, every dependency used has a similar `<dependency name>_ARGS` config option to satisfy edge cases like this. 👍
See here for the source code for the `WGET_ARGS` option: https://github.com/ArchiveBox/ArchiveBox/blob/dev/archivebox/config.py#L150