mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-26 01:26:00 +03:00
[GH-ISSUE #621] Bugfix: docker-compose instructions create a sonic container that fails to start #1897
Originally created by @johnmaguire on GitHub (Jan 20, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/621
Originally assigned to: @jdcaballerov on GitHub.
Describe the bug
I followed the docker-compose instructions from the README. This is the result:
Search seems to work anyway.
I would expect one of:
a. `sonic` container is not created by default if it requires the user to manually create a config and is not necessary to run ArchiveBox
b. `config.cfg` is created for me by the init script, using the environment variable I set in the docker-compose file
c. `config.cfg` is not required by sonic (however, this is not the case: https://github.com/valeriansaliou/sonic/issues/197)
Steps to reproduce
From the README:
ArchiveBox version
@johnmaguire commented on GitHub (Jan 20, 2021):
After deleting the automatically generated directory and copying https://github.com/valeriansaliou/sonic/blob/master/config.cfg to `./etc/sonic/config.cfg`, I get the following error when running `docker-compose up -d`... it seems this is because the original Sonic container is not recreated, and has the erroneous directory in it. Removing it manually by finding it in `docker ps -a`, then `docker rm <id>`, and then running `docker-compose up -d` again fixes it:
@pirate commented on GitHub (Jan 20, 2021):
Thanks for reporting, @jdcaballerov are there any steps missing I need to add to the readme to get Sonic working?
@johnmaguire commented on GitHub (Jan 20, 2021):
One option might be to simply include the example `config.cfg` from the `sonic` repo when running the init command.
Sorry to continue commenting, but I'm working through this still... now that Sonic is running, search doesn't seem to return any results. After `docker-compose stop sonic`, I get results again, but with the notice "Error from the search backend, only showing results from default admin search fields - Error: [Errno -2] Name or service not known"
Is there some sort of indexing job I need to kick off for Sonic to return results?
@jdcaballerov commented on GitHub (Jan 20, 2021):
@pirate @JohnMaguire The required `config.cfg` is included in `etc/`. We need to devise a solution to include it for people who do not have the `etc` directory (the config file is mounted as a volume) and are using an image from the Docker registry. (BUG)
Using the config from https://github.com/ArchiveBox/ArchiveBox/tree/dev/etc/sonic will likely fail since it uses IPv6, which is not enabled by default in Docker, as I remember. There are other parameters tuned for this use case.
@johnmaguire commented on GitHub (Jan 20, 2021):
Perhaps a simple solution would be to amend the docker-compose instructions to include:
@johnmaguire commented on GitHub (Jan 20, 2021):
After updating to use the config from this repo, I am still not seeing results however...
Pre-Search:
Post-Search:
Logs show nothing of note:
@jdcaballerov commented on GitHub (Jan 20, 2021):
@JohnMaguire By default Sonic will only index the newly added links after it's enabled.
When enabling Sonic on an existing collection you have to retroactively add all the old snapshots to the Sonic index by running:
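Later comments in this thread use `archivebox update --index-only` for this step, run through docker-compose. A minimal sketch, assuming the compose service is named `archivebox` as in those comments; the `reindex` wrapper and its `DRY_RUN` switch are hypothetical helpers added for illustration, not part of ArchiveBox:

```shell
#!/bin/sh
# Re-add existing snapshots to the search index after enabling Sonic.
# By default the command is only printed; set DRY_RUN=0 to execute it.
reindex() {
    cmd="docker-compose run archivebox update --index-only"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

reindex
```

Run it from the directory that holds the compose file, with `DRY_RUN=0` once the printed command looks right.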
@johnmaguire commented on GitHub (Jan 20, 2021):
But still no results or logs to note (tried `google`, `google.com` and `Google`):
@johnmaguire commented on GitHub (Jan 20, 2021):
OK, searching on the public index works correctly. From the admin UI, it returns no results.
@jdcaballerov commented on GitHub (Jan 20, 2021):
@JohnMaguire rebuilding the index is a task that is managed by sonic and doesn't occur immediately after being instructed to do so. Allow some time without killing it and let us know.
@johnmaguire commented on GitHub (Jan 20, 2021):
I added example.com as well, and am seeing the same behavior:
I know you mentioned "Sonic will only index the newly added links after it's enabled," so I think it should index this? And since the public search is returning, it seems unlikely that indexing is broken?
I apologize if I am missing something obvious, or keeping anyone up. This is certainly not urgent.
@thedanbob commented on GitHub (Jan 20, 2021):
I'm seeing this as well: searching while logged in as an admin returns no results while searching logged out works properly. I'm also using docker-compose.
Edit: running `archivebox update --index-only` did the trick. However, sonic doesn't seem to work very well anyway. A few test searches yielded very mixed results (lots of false positives and false negatives). Maybe that's what you're experiencing @JohnMaguire?
@pirate commented on GitHub (Jan 20, 2021):
What kind of false positives and negatives are you seeing @thedanbob, is it similar behavior to what @johnMaguire reported? If you're willing to share screenshots / specific examples of the search queries and bad matches that would help a lot. It could be caused by a number of things, ranging from a bug in the query handling code in the admin backend to using the wrong extractor format for indexing.
@thedanbob commented on GitHub (Jan 20, 2021):
A search for `scar` matches https://www.radiomods.co.nz/kenwood/kenwoodts440.html
`jack` fails to match that page but matches http://tarpn.net/t/faq/faq_networking_on_purpose.html and http://tarpn.net/t/faq/faq_packet_radio.html
`brass`, `dirt`, and `mantra` fail to match https://teddit.net/r/WritingPrompts/comments/5kxe94/wp_you_live_in_a_world_where_each_lie_creates_a/
`dirt` also matches https://nwavguy.blogspot.com/2011/07/o2-headphone-amp.html
I'm running the suggested docker-compose config with PDF, screenshot, DOM, readability, and archive.org saving turned off.
@pirate commented on GitHub (Jan 20, 2021):
Can you try enabling either the `readability` or `mercury` extractor @thedanbob and running `archivebox update --index-only` again? Having at least one article text extractor available will yield the highest quality index; it sometimes struggles when it only indexes raw HTML without the cleaned/extracted text.
@thedanbob commented on GitHub (Jan 20, 2021):
I did have mercury enabled, but I enabled readability as well and reindexed everything. This time a bunch of URLs returned the error `The search backend threw an exception=ERR query_error`. I saw most of the same false positives/negatives, though a few changed:
`jack` no longer matches http://tarpn.net/t/faq/faq_packet_radio.html
`dirt` no longer matches https://nwavguy.blogspot.com/2011/07/o2-headphone-amp.html but does match https://teddit.net/r/WritingPrompts/comments/5pi8t0/pi_everybody_in_the_world_has_a_superpower_that/, a false negative that I didn't catch before. Still has the original false negative.
These are the URLs that returned errors:
http://tarpn.net/t/builders.html
http://tarpn.net/t/builder/builders_tarpn_protocols.html
http://tarpn.net/t/builder/builders_tarpn_hardware.html
http://tarpn.net/t/builder/builders_node_shopping_list.html
http://tarpn.net/f/builder_tarpn_home_page/bth.html
http://nwavguy.blogspot.com/2011/08/o2-summary.html
http://nwavguy.blogspot.com/2011/08/o2-details.html
http://nwavguy.blogspot.com/2011/07/o2-headphone-amp.html
http://nwavguy.blogspot.com/2011/07/o2-design-process.html
@jdcaballerov commented on GitHub (Jan 20, 2021):
Thanks @JohnMaguire @thedanbob for taking time to report. Up to now I've noticed a buffer overflow in sonic that might be causing this weird behavior.
https://github.com/ArchiveBox/ArchiveBox/pull/625
@johnmaguire commented on GitHub (Jan 20, 2021):
Given these entries:
The following searches return a single result on the public index, but not admin:
The following searches return on both:
The following searches return on neither:
From what I can tell, it is only the Google entry (which was added prior to getting Sonic working) that is failing to return from the Admin search.
(Unrelatedly, the pending result was added last night. I think this was maybe a download link to a .tgz, and I was curious how the software would handle it.)
@thedanbob commented on GitHub (Jan 20, 2021):
Thanks @jdcaballerov, #625 fixes all of the false negatives I was seeing (and a few I didn't catch before). I think all of the false positives can be chalked up to sonic's fuzzy search which I wasn't aware of at first.
@johnmaguire commented on GitHub (Jan 22, 2021):
After building a Docker image from `553c3ca219`, running `docker-compose up -d` to create a new container off the new image, and running `docker-compose run archivebox update --index-only`, I am still not getting results for "google" or "Google" in the Admin. It continues to work on the public index.
@pirate commented on GitHub (Feb 1, 2021):
This should be fixed now in v0.5.4, please give it a try. Report back here if you have any issues and I can reopen the ticket.
Please note while the content, title, and tags support full-text search / substring search, URL search must be exact at the moment. This will be improved in a future version.
@thentoorglan-x commented on GitHub (Mar 14, 2021):
I'm still facing this issue.
I tried creating config.cfg as described by @JohnMaguire too.
@thentoorglan-x commented on GitHub (Mar 14, 2021):
Ubuntu 20.04 LTS @pirate
@johnmaguire commented on GitHub (Mar 16, 2021):
@thentoorglan-x What issue are you experiencing? Your logs do not include the error reported in the original post here (which is regarding a missing config.cfg). Your logs appear as though the service started up correctly. I'd advise you file a new ticket clearly describing what you're experiencing versus what you expected.
@erob8 commented on GitHub (Apr 7, 2021):
Also had the issue described initially in this thread and recently by @thentoorglan-x. I think the sonic section in the docker-compose.yaml file should have its volume updated to
This syncs the config file to where it is placed by the wget command `wget https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/master/etc/sonic/config.cfg -O etc/sonic/config.cfg` from the wiki page https://github.com/ArchiveBox/ArchiveBox/wiki/Docker#setup
@pirate commented on GitHub (Apr 7, 2021):
I updated it in the docker-compose.yml file on `dev` yesterday, take a look at the comment section above the sonic block for instructions. Just updated the wiki now too https://github.com/ArchiveBox/ArchiveBox/wiki/Docker. 👍
@asitemade4u commented on GitHub (Apr 11, 2021):
I have tried the latest version in `dev` and face the same issue as previously described.
IMO the question is WHERE to save the Sonic `config.cfg` file.
I am using a docker installation nested within a Proxmox container -- it works very well and I currently have more than 20 docker servers working in production that way.
So, I tried to download the config file in:
`/config.cfg`
None worked: basically Sonic could not find the config file or mistook it for a directory.
Are you sure the path to the config file in the `docker-compose.yaml` file should be `./config.cfg`?
Please help with instructions.
Best,
Stephen
@pirate commented on GitHub (Apr 12, 2021):
Where are you seeing `config.cfg` in the docker-compose.yml file?
Can you double check that it's actually up to date with the one on dev or screenshot exactly the one you're using?
Please note you're not mounting it into the `archivebox` container; you should be mounting it into the `sonic` container at `/etc/sonic.cfg` inside the container. Where you put it or what you name it outside the container doesn't matter, though I recommend downloading it to `./sonic.cfg` next to `./data` as is illustrated in the docker-compose.yml on dev.
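A quick host-side check can help rule out the "directory instead of file" problem reported earlier in the thread. With the short volume syntax, Docker creates an empty directory at the host path when the source of a bind mount does not exist yet, so a directory at `./sonic.cfg` usually means the containers were started before the file was downloaded. The sketch below assumes the `./sonic.cfg` location recommended above; `check_sonic_cfg` is a hypothetical helper name, not something shipped with ArchiveBox or Sonic:

```shell
#!/bin/sh
# Report the state of the host-side Sonic config path:
#   "ok"        - a regular file exists and can be mounted into the container
#   "directory" - a directory sits where the file should be (remove it,
#                 download the config, then recreate the sonic container)
#   "missing"   - nothing exists at the path yet
check_sonic_cfg() {
    path="${1:-./sonic.cfg}"
    if [ -d "$path" ]; then
        echo "directory"
    elif [ -f "$path" ]; then
        echo "ok"
    else
        echo "missing"
    fi
}

check_sonic_cfg "${1:-./sonic.cfg}"
```

If the result is `directory`, the stale sonic container also has to be removed and recreated, as described in the early comments, before the corrected mount takes effect.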