[GH-ISSUE #251] Behavior: Old links are updated even when ONLY_NEW=True #176

Closed
opened 2026-03-01 14:41:17 +03:00 by kerem · 3 comments

Originally created by @Pofilo on GitHub (Jul 1, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/251

Describe the bug

I'm using a file where I add links I want to archive. When using the ONLY_NEW = True option, my links still get updated. I think the behavior of this option is broken.

Steps to reproduce

  1. Ran ArchiveBox with the default config, except ONLY_NEW = True, on an already archived link.
  2. ArchiveBox archived my link again.

Software versions

not relevant

Discuss the option

Do we agree that my link should not be updated?
If yes, I can push a merge request to fix it immediately.

The problem is in index.py, in the function load_links_index(). This function returns new_links, but if a link from import_path already exists in the index, it is still added to new_links.
My fix: check whether a link is in existing_links before adding it to new_links.
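
A minimal sketch of the proposed check, for illustration only. It assumes each link is a dict with a "url" key; the function name split_new_links and its arguments are hypothetical and do not reproduce the actual load_links_index() code in ArchiveBox.

```python
from typing import Dict, List, Tuple

# Hypothetical minimal link record, e.g. {"url": "https://example.com"}
Link = Dict[str, str]


def split_new_links(
    existing_links: List[Link],
    imported_links: List[Link],
) -> Tuple[List[Link], List[Link]]:
    """Return (all_links, new_links).

    new_links contains only imported links whose URL is not already
    present in existing_links (assumed dedup key: the "url" field).
    """
    seen_urls = {link["url"] for link in existing_links}
    all_links = list(existing_links)
    new_links: List[Link] = []

    for link in imported_links:
        if link["url"] in seen_urls:
            # Already in the index: do not schedule it for archiving again.
            continue
        seen_urls.add(link["url"])
        all_links.append(link)
        new_links.append(link)

    return all_links, new_links
```

With ONLY_NEW enabled, only the second list would be passed on to the archiving step, so links that are already in the index are left untouched.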

If we do agree about this behavior, I will create a PR right now.

Thanks.


@pirate commented on GitHub (Jul 5, 2019):

I think this is already fixed in the upcoming v0.4 release, but I'll keep this open to double-check after I finish the pending security work.

Follow here for updates: https://github.com/pirate/ArchiveBox/pull/207#issuecomment-494107553


@Pofilo commented on GitHub (Jul 6, 2019):

Oh, okay, thanks for the answer, I'll close after the release then.


@pirate commented on GitHub (Jul 24, 2020):

I believe this is fixed now.

```bash
git checkout django
git pull
docker build . -t archivebox
docker run -v $PWD/output:/data archivebox init
docker run -v $PWD/output:/data archivebox add 'https://example.com' --only-new
```

If you still see this on the latest django branch, comment back here and I'll reopen the issue.
