[GH-ISSUE #654] Bugfix: Error: Search Backend only searching default admin search fields #410

Closed
opened 2026-03-01 14:43:18 +03:00 by kerem · 16 comments
Originally created by @alsokpisz on GitHub (Feb 15, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/654

Describe the bug

Bug:
The bug occurs when I attempt to search any query. An error message appears saying: "Error from the search backend, only showing results from default admin search fields -Error:[Errno -3] Temporary failure in name resolution."

If the search query is a word in the title of a website, it will return results with that word in it.
If it is only in the wget snapshot of the item, it will not return that item.

Context:
I am running ArchiveBox on Windows 10 with docker-compose and have launched the web UI, which I can access at http://127.0.0.1:8000. As far as I can tell, all the snapshots are functional and there are no pending links. The output directory is on an external hard drive, but there have been no issues reading from or writing to this drive (except for speed, though I can't tell whether that's just how the Django web UI behaves).

Relevant Info:
The bug seems similar to what @jdcaballerov described, with search enabled but the backend failing, in his testing (https://github.com/ArchiveBox/ArchiveBox/pull/543#issuecomment-730849375) (see screenshot 4).

Steps to reproduce

mkdir archivebox && cd archivebox
curl -O https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/master/docker-compose.yml

Edit the volumes section of docker-compose.yml to read:

volumes:
  - ./data:/data
  - D:\\archivebox:/mnt/d/archivebox

(unsure if external drive-specific setup needed to reproduce, so wanted to include)

Open a Windows Terminal in administrator mode, navigate to D:/archivebox/, open a Git Bash tab and run the following:

> docker-compose up -d
> docker-compose run archivebox init
> docker-compose run archivebox manage createsuperuser
> docker-compose run archivebox add 'https://www.dailydot.com/parsec/fandom/dieselpunk-steampunk-beginners-guide/'

Navigate to http://127.0.0.1:8000 and search "beginners" (see screenshot 1). Because it is in the title, it will show up. The error message will also show up.

Search "biopunk" (see screenshot 2). Even though it is in the wget file, it will not show up (see screenshot 3). The error message will show up. I have not done extensive testing on whether snapshots of other file types get searched, but I don't think any of them are picked up if the term is not in the title.

Screenshots or log output

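Screenshot 1:
https://user-images.githubusercontent.com/36479341/107894062-40e03880-6ee3-11eb-9720-2ee36c32cc9a.png

Screenshot 2:
https://user-images.githubusercontent.com/36479341/107894100-640ae800-6ee3-11eb-8e23-dea3c8034ec8.png

Screenshot 3:
https://user-images.githubusercontent.com/36479341/107894175-a0d6df00-6ee3-11eb-9bf9-36f49bcde6a0.png

Screenshot 4:
https://user-images.githubusercontent.com/36479341/107894343-2fe3f700-6ee4-11eb-90ea-29dd05301d99.png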

ArchiveBox version

ArchiveBox v0.5.6
Cpython Linux Linux-4.19.128-microsoft-standard-x86_64-with-glibc2.28 x86_64 (in Docker)

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.5.6          valid     /usr/local/bin/archivebox
 √  PYTHON_BINARY         v3.9.1          valid     /usr/local/bin/python3.9
 √  DJANGO_BINARY         v3.1.3          valid     /usr/local/lib/python3.9/site-packages/django/bin/django-admin.py
 √  CURL_BINARY           v7.64.0         valid     /usr/bin/curl
 √  WGET_BINARY           v1.20.1         valid     /usr/bin/wget
 √  NODE_BINARY           v15.8.0         valid     /usr/bin/node
 √  SINGLEFILE_BINARY     v0.1.14         valid     /node/node_modules/single-file/cli/single-file
 √  READABILITY_BINARY    v0.1.0          valid     /node/node_modules/readability-extractor/readability-extractor
 √  MERCURY_BINARY        v1.0.0          valid     /node/node_modules/@postlight/mercury-parser/cli.js
 √  GIT_BINARY            v2.20.1         valid     /usr/bin/git
 √  YOUTUBEDL_BINARY      v2021.02.04.1   valid     /usr/local/bin/youtube-dl
 √  CHROME_BINARY         v88.0.4324.146  valid     /usr/bin/chromium
 √  RIPGREP_BINARY        v0.10.0         valid     /usr/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           22 files        valid     /app/archivebox
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled
 -  COOKIES_FILE          -               disabled

>docker -version
Docker version 20.10.2, build 2291f61

>docker-compose --version
docker-compose version 1.27.4, build 40524192


@jdcaballerov commented on GitHub (Feb 15, 2021):

@alsokpisz The error message points to misconfigured DNS in the docker-compose setup. If the search backend can't be queried, the search only covers the URL and title, i.e. the default admin search fields.
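
A rough sketch of that fallback behaviour, for illustration only. This is not ArchiveBox's actual code: `search_snapshots`, the snapshot dicts, and the `backend_query` callable are hypothetical names, and the warning text simply mirrors the message quoted in the report.

```python
def search_snapshots(query, snapshots, backend_query):
    """Search title/URL first, then add full-text hits if the backend answers.

    snapshots:     list of dicts with "id", "url" and "title" keys (hypothetical shape)
    backend_query: callable returning snapshot ids for a query (hypothetical)
    Returns (matches, warning); warning is None when the backend was reachable.
    """
    q = query.lower()
    # Default fields: always searched, no backend needed.
    matches = [s for s in snapshots
               if q in s["title"].lower() or q in s["url"].lower()]
    warning = None
    try:
        ids = set(backend_query(query))
    except OSError as err:  # socket.gaierror (DNS failure) is a subclass of OSError
        warning = ("Error from the search backend, only showing results "
                   f"from default admin search fields - Error: {err}")
    else:
        seen = {s["id"] for s in matches}
        matches += [s for s in snapshots
                    if s["id"] in ids and s["id"] not in seen]
    return matches, warning
```

With a backend that raises a name-resolution error, a title word still matches while a word found only in the page body returns nothing, which is the pattern described in the report.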


@pirate commented on GitHub (Feb 15, 2021):

@alsokpisz as @jdcaballerov mentioned this is likely a DNS resolving issue inside your docker-compose network. Docker on macOS is infamous for having container DNS issues, so I wouldn't be surprised if Docker on Windows is plagued by similar bugs.

First please make sure you have Sonic's config.cfg file present in ./etc/sonic next to your docker-compose.yml file (if not, create that dir and download the config file within):

# these linux commands may be different on Windows, sorry I don't know the equivalents for batch/powershell
mkdir -p ./etc/sonic
cd ./etc/sonic
curl -O https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/master/etc/sonic/config.cfg

Then confirm that Sonic is up and running and accessible from the archivebox container. Please run these Python commands manually and report back the output you get:

docker-compose run archivebox /usr/local/bin/python3
>>> import socket
>>> HOST = 'sonic'
>>> PORT = 1491
>>> with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
...     s.connect((HOST, PORT))
...     s.sendall(b'Hello, world')
...     data = s.recv(1024)
...
>>> print('Received', repr(data))
Received b'CONNECTED <sonic-server v1.3.0>\r\n'   # this line means everything is working, if your output is different then something is wrong
>>>
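
If the output differs, it can help to tell a DNS failure (the `[Errno -3] Temporary failure in name resolution` from the original report) apart from a refused or silent TCP connection. A minimal sketch using only the standard library; `check_backend` is a hypothetical helper name, and the `sonic` / `1491` defaults are taken from the snippet above:

```python
import socket

def check_backend(host="sonic", port=1491, timeout=5.0):
    """Return a short diagnosis string for the search backend connection."""
    # Step 1: name resolution only. A failure here is a DNS problem.
    try:
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as err:
        return f"dns-failure: {err}"
    # Step 2: TCP connect and read whatever greeting the server sends.
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            banner = s.recv(1024)
    except OSError as err:  # refused, unreachable, or timed out
        return f"socket-failure: {err}"
    return f"ok: {banner!r}"

if __name__ == "__main__":
    print(check_backend())
```

A `dns-failure` result points at the docker-compose network rather than at Sonic itself.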

@alsokpisz commented on GitHub (Feb 15, 2021):

--archivebox
   | -- docker-compose.yml
   | --data
        | -- archive, logs, sources, A...B.conf, A...B.conf.bak, index.sqlite3
   | --etc
        | -- sonic
              | -- config.cfg (file)

Several errors in the terminal (see screenshot 1).
Is the tree I've set up above not correct?

EDIT:
I hard-coded the volume specifier again in the Sonic section (screenshot 2), and everything starts up fine (screenshot 3). Of note, the error messages do not appear anymore on search query, but the search is not working correctly still.

> docker-compose run archivebox C:/Python37/python (which I think would be the equivalent command to launch it with Python) just brings up screenshot 4. Sorry if I'm missing something obvious, but I don't see why the way you wrote that command wouldn't cause a subargument issue.

EDIT 2 (one hot cup of coffee later):
> docker ps shows both services running. I made a Python file with the code you posted above.

#!/usr/bin/env python3
import socket
import time
HOST = 'sonic'
PORT = 1491

def main():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((HOST, PORT))
        s.sendall(b'Hello, world')
        data = s.recv(1024)

    print('Received', repr(data))
    time.sleep(40)

if __name__ == "__main__":
    """ This is executed when run from the command line """
    main()

> py SONICTEST.py

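Result:
https://user-images.githubusercontent.com/36479341/107994185-fc12db00-6f90-11eb-9cfd-6cebb1366791.png

Screenshots
Screenshot 1.
https://user-images.githubusercontent.com/36479341/107991052-a9362500-6f8a-11eb-8f8e-44ef3416abd0.png

Screenshot 2.
https://user-images.githubusercontent.com/36479341/107991731-15655880-6f8c-11eb-97a9-7709177ace41.png

Screenshot 3.
https://user-images.githubusercontent.com/36479341/107991766-21511a80-6f8c-11eb-884e-af2bbebf658f.png

Screenshot 4.
https://user-images.githubusercontent.com/36479341/107992251-28c4f380-6f8d-11eb-95ff-b0a1b55e85e4.png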


@pirate commented on GitHub (Feb 16, 2021):

The inside of a Docker container is always Linux, so a Windows path in this docker command doesn't make sense: docker-compose run archivebox C:/Python37/python

Run it verbatim as I posted above, and paste in the script line by line (don't make a file):

docker-compose run archivebox /usr/local/bin/python3

>>> ... paste in lines above here

@alsokpisz commented on GitHub (Feb 16, 2021):

https://user-images.githubusercontent.com/36479341/108021344-369d6780-6fd3-11eb-9d22-77e9a2f3686d.png
Causes a subargument issue.


@pirate commented on GitHub (Feb 16, 2021):

try docker-compose run archivebox shell


@alsokpisz commented on GitHub (Feb 16, 2021):

Result:
Received b'CONNECTED <sonic-server v1.3.0>\r\nENDED '
I tried this when docker ps still shows both services running.


@pirate commented on GitHub (Feb 17, 2021):

Great! That means both the inter-container DNS and the TCP socket to the sonic container are working. Try this next in docker-compose run archivebox shell:

>>> from sonic import SearchClient
>>> from archivebox.config import SEARCH_BACKEND_HOST_NAME, SEARCH_BACKEND_PORT, SEARCH_BACKEND_PASSWORD, SONIC_BUCKET, SONIC_COLLECTION
>>> with SearchClient(SEARCH_BACKEND_HOST_NAME, SEARCH_BACKEND_PORT, SEARCH_BACKEND_PASSWORD) as querycl:
...     print(querycl.query(SONIC_COLLECTION, SONIC_BUCKET, 'test'))
...

@alsokpisz commented on GitHub (Feb 17, 2021):

Results:
['3ad870d4-82b5-4974-a6ce-ee8cc6a235fa', '5d6734a5-1b9d-418a-a215-8e1e1dbdb8e5', '74696c7d-4421-46b8-8f35-9f1c9537ee1b', 'fca5096f-13da-4d94-8afd-5d742d7b3fb4', '6f009e8d-947a-4fa7-94d7-f21a94c2b525']

EDIT: While troubleshooting why mass-imported links never seem to get the headless Chrome captures (pdf, screenshot, dom), I essentially re-imported all of my links. Two new folders appeared in archivebox/data: fst and kv. Search now finds wget text, but only in admin mode, not when signed out.


@pirate commented on GitHub (Feb 17, 2021):

Ok, getting closer. It sounds like Sonic is working and connected but it's not getting text to index. Can you try running this to force a re-index:

docker-compose run archivebox update --index-only

Then you can test full-text search from the CLI like so:

archivebox list --filter-type=search example

If it works from the Admin and the CLI, then we can try to track down why the public index isn't working. If it's broken on the CLI, then there's still an issue with the Sonic backend that we have to figure out. Thanks for bearing with me here!


@alsokpisz commented on GitHub (Feb 18, 2021):

Seems like any page which is a .pdf or .jpg causes this error during the index command:

[*] <link.pdf>
[X] An Exception ocurred reading the indexable content='utf-8' codec can't decode byte 0xb5 in position 10: invalid start byte:
[*] <link>
[*] <link>
[X] The search backend threw an exception=ERR invalid_format(PUSH <collection> <bucket> <object> "<text>" [LANG(<locale>)]?)
:
[*] <link>
[*] <link>

And then it hangs.

Sometimes I'd get just the one error. I think this happened after I got rid of the .pdf links. I jotted it down but didn't write any context with it.

[X] The search backend threw an exception=ERR invalid_format(PUSH <collection> <bucket> <object> "<text>" [LANG(<locale>)]?)
:
[*] <link>
[*] <link>
*terminal hangs*

After removing all the .pdf/.jpg links, there are no errors in the terminal when I run the re-index command, but it will still spend ages on random pages. Notably stuff with 'weirder' components like live webcam feeds or something. I removed those one by one until it managed to get through the 80ish bookmarks in less than 10 minutes.

The terminal results were the same ones as the admin search, but the public search still didn't work.
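
One plausible way to sidestep the `utf-8` decode failure on binary snapshots such as PDFs or JPEGs is to decode leniently and normalise the text before it is handed to the search backend. This is only a sketch: `to_indexable_chunks` and `max_chars` are hypothetical names, it is not how ArchiveBox reads indexable content, and whether it also avoids the `invalid_format` error is not established in this thread.

```python
import re

def to_indexable_chunks(raw: bytes, max_chars: int = 2000):
    """Turn raw file bytes into a list of clean text chunks for indexing."""
    # Drop bytes that are not valid UTF-8 instead of raising UnicodeDecodeError.
    text = raw.decode("utf-8", errors="ignore")
    # Collapse newlines, tabs and runs of spaces into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Split into bounded pieces; an empty document yields an empty list,
    # so nothing is pushed for it.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

For example, `to_indexable_chunks(b"\xb5hello\n\nworld")` drops the invalid `0xb5` byte and returns `["hello world"]`.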


@jdcaballerov commented on GitHub (Feb 18, 2021):

@alsokpisz the public search view is not connected to the search backend for security and performance reasons.


@alsokpisz commented on GitHub (Feb 18, 2021):

Well that's that then I suppose.
Is there a way to set a "timeout" per link on the docker-compose run archivebox update --index-only command? So it will skip links it spends more than say, a minute trying to index?
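
No per-link timeout option is mentioned anywhere in this thread. As a generic illustration of the idea, a command can be wrapped so that it is killed once it exceeds a time budget. The sketch below assumes each link could be processed by its own command, which is hypothetical; `run_with_timeout` is an illustrative name, not an ArchiveBox function.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_s):
    """Run cmd; return "ok", "failed" (non-zero exit) or "timeout"."""
    try:
        subprocess.run(
            cmd,
            timeout=timeout_s,   # the child process is killed when this expires
            check=True,          # non-zero exit raises CalledProcessError
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    except subprocess.CalledProcessError:
        return "failed"
    return "ok"

if __name__ == "__main__":
    # Stand-in for a slow job: sleeps 5 s but is only allowed 1 s.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, 1))
```

A loop over links could then log the ones that return "timeout" and move on to the next.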


@pirate commented on GitHub (Apr 6, 2021):

Ok this should be somewhat improved in f67a5a2. It will be out with the next v0.6 release soon.
You can also try it early by adding this line to your docker-compose config: build: https://github.com/ArchiveBox/ArchiveBox.git#dev.

Comment back here if you're still having issues with indexing failures/hanging and I'll reopen the issue.


@ghost commented on GitHub (Nov 12, 2021):

print('Received', repr(data))

This is what I get when I follow the troubleshooting:
>>> print('Received', repr(data))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'data' is not defined

Anyone know what I'm doing incorrectly?


@pirate commented on GitHub (Nov 12, 2021):

Looks like you messed up the indentation, make sure to copy paste that whole block together above, or remove the extra newline before that print to be doubly sure. @jdqw210
