mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-26 01:26:00 +03:00
[GH-ISSUE #1170] Question: Sonic auto index #2235
Originally created by @ghost on GitHub (Jul 3, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1170
Hello there,
I've configured a Docker stack, but Sonic doesn't index new items.
I can search the content of newly archived sites only after I run "archivebox update --index-only" in Docker.
Could you please help me investigate and fix this?
Also, I use NginxProxyManager as a reverse proxy, so port 8000 is not exposed to the Docker host.
My config:
@IvanVas commented on GitHub (Aug 11, 2023):
+1
@pirate commented on GitHub (Jan 19, 2024):
Can you set log_level = "debug" in sonic.cfg, and DEBUG=True on the archivebox environment, then restart everything.
Check the output of docker compose logs when you add a new URL, and post it here.
Also note there have been some changes in v0.7.2 to the docker-compose.yml file, I recommend upgrading your container version and checking for any differences you might want to copy over.
I would also not sync /etc/timezone:/etc/timezone:ro with the archivebox container, as archivebox must always be run in UTC and does not support server-side timezone changes (only client-side).
@pirate commented on GitHub (Mar 1, 2024):
Closing as inactive for now, feel free to comment back if you're still having issues on :dev and I can reopen it.
@dehlen commented on GitHub (Aug 22, 2024):
Hey, I am currently experiencing the same issue. Happy to provide more logs when I am back at my computer tomorrow. I am only adding new URLs via the scheduler. Could this be related? I just triggered update --index-only, but this is very time-consuming as I have quite a large library of snapshots. I stumbled upon this issue because I was searching my library for something I was sure should be in the index. I found my saved article with a valid readability representation containing my keyword, but search did not return it.
@pirate commented on GitHub (Aug 22, 2024):
Ok, I can investigate, but please open a new issue and share logs + archivebox version output when you get a chance.
@dehlen commented on GitHub (Aug 23, 2024):
Never mind, I had another look at my config this morning and I think I found the issue. Here is my assumption:
I use docker compose. In it I use the archivebox service with these environment variables:
I also set up the archivebox scheduler by adding this volume to the archivebox service:
- path/to/crontabs:/var/spool/cron/crontabs
Then I set up the sonic service:
And I set up the archivebox scheduler service:
What was missing was the SEARCH_BACKEND-related environment variables for my scheduler service. Whenever my scheduler ran, archivebox add was executed, but the scheduler service adding the URLs did not know about the sonic backend. I added the 3 SEARCH_BACKEND-related env variables to my scheduler service as well, and now it seems to be working. I checked by connecting to the scheduler service container and running archivebox schedule --run-all. This executed all my scheduled scripts and added new links, which were immediately findable via search. Previously this was not the case for URLs added via the scheduler, whereas URLs added from the web UI were always indexed. So I think this is what caused the problems above.
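For anyone hitting the same symptom, the check described above can be sketched in a few lines of Python. This is only a rough illustration, not part of ArchiveBox: the service names, the image prefix used to recognize ArchiveBox containers, and the sample values are all hypothetical, and the three variable names are the ones I believe ArchiveBox's example docker-compose.yml uses, so verify them against your version. The `services` dict stands in for the `services:` section of an already-parsed compose file.

```python
# Sketch: report ArchiveBox services that lack the search-backend variables
# set on a reference service. All names and values here are placeholders.

SEARCH_PREFIX = "SEARCH_BACKEND_"
ARCHIVEBOX_IMAGE_PREFIX = "archivebox/archivebox"  # assumption: image name used in your stack


def env_to_dict(environment):
    """Normalize a compose `environment:` value (KEY=VALUE list or mapping) to a dict."""
    if isinstance(environment, dict):
        return {str(k): "" if v is None else str(v) for k, v in environment.items()}
    result = {}
    for item in environment or []:
        key, _, value = str(item).partition("=")
        result[key] = value
    return result


def search_vars(service):
    """Return only the SEARCH_BACKEND_* variables of one service definition."""
    env = env_to_dict(service.get("environment"))
    return {k: v for k, v in env.items() if k.startswith(SEARCH_PREFIX)}


def missing_search_vars(services, reference):
    """Return {service_name: [missing keys]} compared with the reference service."""
    expected = set(search_vars(services[reference]))
    report = {}
    for name, service in services.items():
        if name == reference:
            continue
        if not str(service.get("image", "")).startswith(ARCHIVEBOX_IMAGE_PREFIX):
            continue  # skip non-ArchiveBox services such as sonic itself
        missing = sorted(expected - set(search_vars(service)))
        if missing:
            report[name] = missing
    return report


# Hypothetical stack resembling the one described in this thread.
services = {
    "archivebox": {
        "image": "archivebox/archivebox",
        "environment": [
            "SEARCH_BACKEND_ENGINE=sonic",
            "SEARCH_BACKEND_HOST_NAME=sonic",
            "SEARCH_BACKEND_PASSWORD=SomeSecretPassword",
        ],
    },
    "archivebox_scheduler": {
        "image": "archivebox/archivebox",
        "environment": ["TIMEOUT=120"],
    },
    "sonic": {"image": "valeriansaliou/sonic"},
}

print(missing_search_vars(services, reference="archivebox"))
```

With this sample data the scheduler service is the only one reported, which mirrors the situation above: URLs added through the web UI are indexed, while URLs added by the scheduler are not.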
@pirate commented on GitHub (Aug 23, 2024):
That makes sense @dehlen. That's why I generally recommend using ArchiveBox.conf for config (which is shared between all archivebox containers), instead of docker-compose environment: lines (which are per-container).
@virtadpt commented on GitHub (Aug 27, 2024):
Moved to: https://github.com/ArchiveBox/ArchiveBox/issues/1497
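Regarding the recommendation above to keep settings in ArchiveBox.conf: one way to confirm that the shared file actually carries the search settings is to read it as an INI file and list any SEARCH_BACKEND_* keys. The sketch below assumes the file is plain INI and makes no assumption about which section the keys live in; the section name and values in the sample text are hypothetical.

```python
import configparser


def search_settings(conf_text):
    """Collect SEARCH_BACKEND_* keys from INI-style config text, across all sections."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys upper-case instead of lower-casing them
    parser.read_string(conf_text)
    found = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            if key.startswith("SEARCH_BACKEND_"):
                found[key] = value
    return found


# Hypothetical file contents; the section name is an assumption.
sample = """
[SEARCH_BACKEND_CONFIG]
SEARCH_BACKEND_ENGINE = sonic
SEARCH_BACKEND_HOST_NAME = sonic
"""

print(search_settings(sample))
```

If this returns an empty dict for your real file, the search settings are probably coming only from per-container environment: lines, which is the situation that caused the missing index entries in this thread.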