[GH-ISSUE #673] How to search for site URL? #422

Closed
opened 2026-03-01 14:43:25 +03:00 by kerem · 8 comments

Originally created by @voarsh2 on GitHub (Mar 24, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/673

I have various sites and URLs for different parts of a site indexed; however, searching for URLs doesn't seem to work at all.
The only thing that works when searching in the UI is the website title, which isn't good enough for me.

Any work arounds?


@pirate commented on GitHub (Mar 24, 2021):

URL search works but it has to be exact; this is a known issue fixed in v0.6 (coming soon). If you want, you can try running archivebox from the pre-release v0.6 `debug-toolbar` branch to get this fix.
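To illustrate why typing a base domain returns nothing when search requires an exact match, here is a minimal Python sketch contrasting exact equality with a case-insensitive substring match. The URL list and function names are hypothetical, and this is not ArchiveBox's actual search code; it only shows the difference in matching behaviour.

```python
# Hypothetical snapshot URLs for illustration; not real ArchiveBox data.
SNAPSHOT_URLS = [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://docs.example.com/guide",
    "https://other.org/page",
]


def exact_search(query: str, urls: list[str]) -> list[str]:
    """Return only URLs that are identical to the query string."""
    return [u for u in urls if u == query]


def substring_search(query: str, urls: list[str]) -> list[str]:
    """Return URLs that contain the query anywhere, ignoring case."""
    q = query.lower()
    return [u for u in urls if q in u.lower()]


if __name__ == "__main__":
    query = "example.com"
    print(exact_search(query, SNAPSHOT_URLS))      # [] because no URL equals "example.com"
    print(substring_search(query, SNAPSHOT_URLS))  # the three example.com URLs
```

With exact matching, only a full URL such as `https://other.org/page` finds anything, which matches the behaviour described in this thread.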


@saywebsolutions commented on GitHub (Mar 24, 2021):

you can also use filesystem tools like grep, fzf, etc. to search the archive html and text files directly which will get you full text search and a lot of flexibility
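For anyone who wants the grep-style approach without shell tools, here is a minimal Python sketch that walks an archive folder and prints every line containing a search term, similar to `grep -rni`. It assumes snapshots are stored as plain files under an `archive/` folder in the ArchiveBox data directory, and the set of file extensions treated as text is an assumption you may need to adjust.

```python
import sys
from pathlib import Path

# Assumed set of text-like file types worth searching; adjust for your archive.
TEXT_SUFFIXES = {".html", ".htm", ".txt", ".json"}


def grep_archive(archive_dir: str, needle: str) -> list[tuple[str, int, str]]:
    """Case-insensitively search every text-like file under archive_dir.

    Returns (path, line_number, line) tuples, similar to `grep -rni`.
    """
    hits = []
    lowered = needle.lower()
    for path in sorted(Path(archive_dir).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        # errors="ignore" drops undecodable bytes instead of raising
        with path.open(encoding="utf-8", errors="ignore") as fh:
            for lineno, line in enumerate(fh, start=1):
                if lowered in line.lower():
                    hits.append((str(path), lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python grep_archive.py ./archive example.com
    for path, lineno, line in grep_archive(sys.argv[1], sys.argv[2]):
        print(f"{path}:{lineno}: {line}")
```

This gives full-text search over the saved pages, but as noted later in the thread it does not help with bulk actions in the UI.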


@voarsh2 commented on GitHub (Mar 25, 2021):

> URL search works but it has to be exact, this is a known issue fixed in v0.6 (coming soon). If you want you can try running archivebox from the pre-release v0.6 `debug-toolbar` branch to get this fix.

I eagerly await this fix then, because I'm typing almost exact URLs (the base domain) and getting no results...

> you can also use filesystem tools like grep, fzf, etc. to search the archive html and text files directly which will get you full text search and a lot of flexibility

Uh, yeah, but I'd rather use the UI... especially for bulk update, tag, delete, etc.


@voarsh2 commented on GitHub (Apr 2, 2021):

Just thought I would add that [site]/public search performs much better than the native Django admin/site snapshot screens. How come?
I want to delete/rescan, etc., but I can't search the admin area effectively enough to select 30+ records, whereas the public version has no problem.


@pirate commented on GitHub (Apr 2, 2021):

We implemented the public search ourselves, but the admin search is django's built-in search function that's really difficult to customize. Still working out the best way to override it to use our algo instead.

https://github.com/ArchiveBox/ArchiveBox/blob/7162649b03302df455967d9ccbc9c2cca506e33b/archivebox/core/admin.py#L58
https://github.com/ArchiveBox/ArchiveBox/blob/7162649b03302df455967d9ccbc9c2cca506e33b/archivebox/core/mixins.py#L5


@voarsh2 commented on GitHub (Apr 4, 2021):

> We implemented the public search ourselves, but the admin search is django's built-in search function that's really difficult to customize. Still working out the best way to override it to use our algo instead.

I figured as much. Could /public/ at least get the ability to remove content? That would get around the issue,
because I can't manage content at /public/, yet it's the only place I can actually search properly.


@pirate commented on GitHub (Apr 5, 2021):

Can't do that, public is read-only. What you can do is search on public to find the snapshot's timestamp, then search for that timestamp on private (the admin) until v0.6 is released.
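The timestamp lookup in this workaround could also be scripted directly against the data directory. The sketch below is a minimal Python example, assuming each snapshot lives in `archive/<timestamp>/index.json` with `url` and `timestamp` keys; check that layout against your own ArchiveBox version before relying on it. `find_timestamps` is a hypothetical helper, not part of ArchiveBox.

```python
import json
import sys
from pathlib import Path


def find_timestamps(archive_dir: str, url_fragment: str) -> list[tuple[str, str]]:
    """Return (timestamp, url) pairs for snapshots whose URL contains url_fragment.

    Assumes <archive_dir>/<timestamp>/index.json holds "timestamp" and "url"
    keys; folders without a readable JSON object are skipped.
    """
    fragment = url_fragment.lower()
    matches = []
    for index_file in sorted(Path(archive_dir).glob("*/index.json")):
        try:
            data = json.loads(index_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed index; skip it
        if not isinstance(data, dict):
            continue
        url = str(data.get("url", ""))
        if fragment in url.lower():
            # Fall back to the folder name if the JSON has no timestamp field
            timestamp = str(data.get("timestamp", index_file.parent.name))
            matches.append((timestamp, url))
    return matches


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_timestamps.py ./archive example.com
    for timestamp, url in find_timestamps(sys.argv[1], sys.argv[2]):
        print(f"{timestamp}\t{url}")
```

Each printed timestamp can then be pasted into the admin search box, as suggested above.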


@pirate commented on GitHub (Apr 6, 2021):

Ok I just pushed a real fix cadac48 instead of the hack I had before. This is officially fixed in v0.6 (which is on dev/master). I'll be pushing the Docker image, pip package, apt package, etc release versions soon.

I'm going to close this as fixed for now to clean up the issue backlog. Feel free to comment back here if you're still having issues after the new release drops, and I'll reopen the issue.
