[GH-ISSUE #1316] Text Search and Filters don't work at the same time in the web UI #805

Closed
opened 2026-03-01 14:46:27 +03:00 by kerem · 4 comments

Originally created by @sbutcher on GitHub (Jan 11, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1316

Describe the bug

In the UI, providing a search term, then applying a filter (e.g. changed today, or a tag) does not make any difference to the results. I would expect to see the filter applied to the results on screen.

Steps to reproduce

Running archivebox dev version
Open the web UI and enter a search term that gives results e.g. football
Then click the "Today" option in the "By date added" in the filter column

Screenshots or log output

![image](https://github.com/ArchiveBox/ArchiveBox/assets/4619907/f9cecaea-203c-45f7-97e2-0b87fb1c4afb)

Clicking "Today" in the filter view makes no change to the results. The expected result is a subset of the results: everything containing the word football that was added today. The same issue applies to any filter while a search term is specified. If you clear the search term and show all results, the filters work correctly.

ArchiveBox version

0.7.2
ArchiveBox v0.7.2+editable COMMIT_HASH=e43babb BUILD_TIME=2024-01-06 01:22:06 1704504126
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-6.2.0-39-generic-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=True FS_USER=911:911 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.7         valid     /usr/local/bin/python3.11                                                   
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py                                 
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py                  
 √  ARCHIVEBOX_BINARY     v0.7.2          valid     /usr/local/bin/archivebox                                                   

 √  CURL_BINARY           v8.5.0          valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.21.3         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v20.10.0        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v1.1.46         valid     /app/node_modules/single-file-cli/single-file                               
 √  READABILITY_BINARY    v0.0.11         valid     /app/node_modules/readability-extractor/readability-extractor               
 √  MERCURY_BINARY        v1.0.0          valid     /app/node_modules/@postlight/parser/cli.js                                  
 √  GIT_BINARY            v2.39.2         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2023.12.30     valid     /usr/local/bin/yt-dlp                                                       
 √  CHROME_BINARY         v120.0.6099.28  valid     /usr/bin/chromium-browser                                                   
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /app/archivebox                                                             
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates                                                   
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None                                                                        

[i] Secrets locations:
 √  CHROME_USER_DATA_DIR  31 files        valid     ./chrome-profile                                                            
 -  COOKIES_FILE          -               disabled  None                                                                        

[i] Data locations:
 √  OUTPUT_DIR            10 files @      valid     /data                                                                       
 √  SOURCES_DIR           27 files        valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           20 files        valid     ./archive                                                                   
 √  CONFIG_FILE           277.0 Bytes     valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             440.0 KB        valid     ./index.sqlite3                                                             


@pirate commented on GitHub (Jan 12, 2024):

Strange... this used to work. Thanks for reporting!

I'll take a look.


@neel-suthar commented on GitHub (Jan 21, 2024):

@pirate I found the issue. It's a small fix. Look at the following code.

from django.contrib import messages

from archivebox.search import query_search_index

class SearchResultsAdminMixin:
    def get_search_results(self, request, queryset, search_term: str):
        """Enhances the search queryset with results from the search backend"""
        
        qs, use_distinct = super().get_search_results(request, queryset, search_term)

        search_term = search_term.strip()
        if not search_term:
            return qs, use_distinct
        try:
            qsearch = query_search_index(search_term)
            qs = qs | qsearch #THIS LINE NEEDS TO BE UPDATED 
        except Exception as err:
            print(f'[!] Error while using search backend: {err.__class__.__name__} {err}')
            messages.add_message(request, messages.WARNING, f'Error from the search backend, only showing results from default admin search fields - Error: {err}')
        
        return qs.distinct(), use_distinct

In the try clause, the filter results are combined with the search results using the "or" (|) operator on the two querysets. As a result, snapshots that match either the filter or the search term appear in the combined result, instead of only the snapshots that match both.

I did try it out and can confirm that the change fixed the bug mentioned above. Correct me if I am wrong. Thanks.
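
To illustrate the "or" behaviour described above, here is a minimal sketch that uses plain Python sets of snapshot IDs in place of Django querysets (querysets support the same `|` and `&` operators). The IDs and the helper names `combine_with_union` and `combine_within_filter` are hypothetical, and the second helper is only one possible way to keep backend hits inside the active filter, not necessarily the change that was tested in this comment.

```python
# Toy model: sets of snapshot IDs stand in for Django querysets.
# All IDs and function names here are made up for illustration.

def combine_with_union(filtered, admin_hits, backend_hits):
    """Mimics `qs = qs | qsearch`: backend hits are added without the filter."""
    qs = filtered & admin_hits          # default admin search, applied to the filtered rows
    return qs | backend_hits            # backend hits come from the whole index


def combine_within_filter(filtered, admin_hits, backend_hits):
    """One possible alternative: restrict backend hits to the filtered rows first."""
    qs = filtered & admin_hits
    return qs | (filtered & backend_hits)


added_today = {4, 5}        # rows allowed by the "Today" filter
admin_hits = {1, 5}         # rows matching the default admin search fields
backend_hits = {2, 3, 4}    # rows returned by the full-text search backend

union_result = combine_with_union(added_today, admin_hits, backend_hits)
scoped_result = combine_within_filter(added_today, admin_hits, backend_hits)

print(sorted(union_result))   # includes rows 2 and 3, which were not added today
print(sorted(scoped_result))  # stays inside the "Today" filter
```

In the union version, every backend hit is shown regardless of the filter, which matches the symptom in the report: clicking a filter does not narrow the results while a search term is set.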


@neel-suthar commented on GitHub (Jan 21, 2024):

@pirate One more thing, I see lots of print statements in the code. Aren't we using some kind of logger?


@pirate commented on GitHub (Jan 23, 2024):

We're not using a logger yet. I've thought about it in the past but haven't found a logging library that I really like.

The main reason I didn't use a library is that I need fine-grained control over stdout/stderr for the (ncurses-style) live-updating archivebox.logging_util.TimedProgress progress bars we use when SHOW_PROGRESS=True.

If we do use a library I want it to be able to provide ncurses subpanes to display scrolling subcommand output in realtime without filling the entire screen with subcommand stdout/stderr, like: https://github.com/bpython/curtsies
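
As a rough illustration of the stdout/stderr concern above, here is a minimal sketch using only the standard-library `logging` module. It assumes a single-line progress bar redrawn in place on the terminal; the class name `ProgressAwareHandler`, the logger name, and the message format are hypothetical and not part of ArchiveBox. The handler erases the current terminal line before writing each record, so a log message does not land in the middle of a half-drawn bar.

```python
import io
import logging


class ProgressAwareHandler(logging.StreamHandler):
    """Hypothetical handler: erases the current terminal line before each record."""

    CLEAR_LINE = "\r\x1b[2K"  # carriage return + ANSI "erase entire line"

    def emit(self, record):
        self.stream.write(self.CLEAR_LINE)  # wipe any partially drawn progress bar
        super().emit(record)                # then write the formatted record and a newline


stream = io.StringIO()  # stands in for sys.stderr so the output can be inspected
logger = logging.getLogger("archivebox_demo")  # hypothetical logger name
logger.setLevel(logging.WARNING)
logger.propagate = False  # keep the record out of any root-logger handlers

handler = ProgressAwareHandler(stream)
handler.setFormatter(logging.Formatter("[!] %(message)s"))
logger.addHandler(handler)

logger.warning("Error while using search backend: %s", "TimeoutError")
output = stream.getvalue()
```

This covers only the simplest case. It does not redraw the progress bar afterwards and does not provide the ncurses-style subpanes mentioned above.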
