mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #490] index not building? #321
Originally created by @ekiel on GitHub (Sep 25, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/490
I just updated the docker container to the latest (Digest:sha256:48a08a1c1e4a2e3480031f25817db57ae398f9720405f5d7495f92a92ced5659) and now when I add links, they seem to process, but the index is no longer rebuilding. I also tried the Django server and my newest archives are no longer showing up. Here is a snippet of my docker-compose.yml. Currently my archive method is as follows:
This has been working great for me lately so it seems something has broken, but I'm not sure where to look.
docker-compose.yml:
@ekiel commented on GitHub (Sep 26, 2020):
I reverted back to the 0.4.21 docker image and it seems to be building now. Is there somewhere I should look to see the diff? I've never deconstructed the layers before.
@ekiel commented on GitHub (Sep 28, 2020):
After continuing to test, I'm only able to replicate this problem on 1 of my archive sets. Using the same docker-compose.yml, I've found that this is what happens, all on the latest docker image:
* If there are new links in the text file, the index is built and works as expected; otherwise the index is erased (shows 0 links in the archive).
If then I add a new link to the archive, the index is built as expected.
This is causing issues with scheduled cron jobs to run the archive job - if there aren't any new links the archive breaks functionality.
However, this seems to be a weird issue: if I do the exact same thing on another archive, the index isn't blown away. Any ideas as to what I can look at?
@cdvv7788 commented on GitHub (Sep 28, 2020):
Hi. Thanks for reporting. What index are we talking about?
`index.json`? I will try to reproduce it. The `index.json` file will not be generated automatically anymore. We are moving to the `sqlite` index in the current version. However, at this point it should still be working... can you try this command: `archivebox list --json --with-headers` and check if it outputs your links correctly?
@ekiel commented on GitHub (Sep 28, 2020):
It is actually the index.html that gets cleared out
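For anyone wanting to script the check suggested above, here is a minimal sketch that runs `archivebox list --json --with-headers` and counts the links it reports. The function names are hypothetical, and the shape of the JSON is an assumption (an object with the entries under a `links` key when headers are included, or a bare list otherwise); it may differ between versions.

```python
import json
import subprocess


def count_links(index_json: str) -> int:
    """Count the links in text printed by `archivebox list --json --with-headers`.

    Assumption: with headers the output is an object whose entries sit under
    a "links" key; a bare JSON list is also accepted. Adjust for your version.
    """
    data = json.loads(index_json)
    if isinstance(data, dict):
        return len(data.get("links", []))
    return len(data)


def list_links_json(archive_dir: str) -> str:
    """Run the command from the thread inside the archive's data directory."""
    result = subprocess.run(
        ["archivebox", "list", "--json", "--with-headers"],
        cwd=archive_dir,       # hypothetical path to the archive folder
        check=True,            # raise if archivebox exits with an error
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `count_links(list_links_json("/path/to/archive"))` and getting 0 on an archive that should contain links would suggest the main index itself is empty, rather than only the static `index.html`.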
@cdvv7788 commented on GitHub (Sep 28, 2020):
That one will be removed too. You can generate it with `archivebox list --html --with-headers > index.html`.
@ekiel commented on GitHub (Sep 28, 2020):
OK that worked, but is this expected behavior? I would expect that an "archivebox add" wouldn't touch the index if no links are added
@cdvv7788 commented on GitHub (Sep 28, 2020):
This may be fixed briefly (generating it correctly after `archivebox add`) but this is something that will stop working this way in the short term.
We are in the middle of the index refactor. Those will be completely removed in the future. They will not be updated or touched if they exist. You should not rely on them if you are using `v0.5.x`. The django server now has the list, and will be the central point of control. If you still need the `index.html` or `index.json`, you will need to generate it after you run your command using `archivebox list`.
@ekiel commented on GitHub (Sep 28, 2020):
ok, thank you for clarifying - I haven't fully embraced the django server functionality as I have 2 archives, but I may consider merging the archives to simplify this usage.
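For a scheduled job, the workaround from this thread (run `archivebox add`, then regenerate the static index with `archivebox list --html --with-headers > index.html`) could be wrapped roughly as follows. This is only a sketch: `add_and_regenerate`, the `run` parameter, and the temporary file name are hypothetical, and feeding the URL file to `archivebox add` on stdin is an assumption about how the links are added, since the original archive method is not shown.

```python
import subprocess
from pathlib import Path


def add_and_regenerate(archive_dir: str, urls_file: str, run=subprocess.run) -> Path:
    """Add links, then rebuild index.html from the main index.

    `run` is injectable so the flow can be exercised without archivebox installed.
    """
    archive = Path(archive_dir)

    # Step 1 (assumed invocation): pass the URL list to `archivebox add` on stdin.
    with open(urls_file, "rb") as urls:
        run(["archivebox", "add"], cwd=archive, stdin=urls, check=True)

    # Step 2: regenerate the static index with the command given in the thread.
    # Writing to a temporary file first means a failed run does not leave an
    # empty index.html behind.
    tmp = archive / "index.html.tmp"
    with open(tmp, "wb") as out:
        run(
            ["archivebox", "list", "--html", "--with-headers"],
            cwd=archive,
            stdout=out,
            check=True,
        )
    index = archive / "index.html"
    tmp.replace(index)
    return index
```

Because the regeneration step runs on every invocation, the static `index.html` is rebuilt even when the URL file contains no new links, which is the case that broke the cron job described earlier.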
@cdvv7788 commented on GitHub (Sep 28, 2020):
Please be careful with this process to avoid breaking stuff (copy archives before playing with them). With this change, we expect big archives to behave better. Old indexes were written pretty often, and with big archives this was a BIG bottleneck. The sqlite index should be faster to update, so the overall performance of archivebox should be more stable and less dependent on archive size.
@ekiel commented on GitHub (Sep 29, 2020):
OK thanks for your help - now that I know this is expected I'll close this issue.