[GH-ISSUE #484] [FEATURE] Bold search terms in results #314

Closed
opened 2026-02-25 20:35:26 +03:00 by kerem · 26 comments
Owner

Originally created by @benbusby on GitHub (Oct 24, 2021).
Original GitHub issue: https://github.com/benbusby/whoogle-search/issues/484

Originally assigned to: @DUOLabs333 on GitHub.

Describe the feature you'd like to see added
Individual words in a search query should be bolded throughout the search results. If parts of a query are wrapped in quotes, only exact matches to contents within the results should be bolded.

Additional context
See discussion here: https://github.com/benbusby/whoogle-search/discussions/450#discussioncomment-1528544 (also includes implementation ideas for anyone interested in helping).


@benbusby commented on GitHub (Oct 24, 2021):

@DUOLabs333 apparently I can't assign this issue to you unless you comment here...so I guess let me know here if you still plan on working on it.


@DUOLabs333 commented on GitHub (Oct 24, 2021):

Yes, I just need to finish up something else.


@benbusby commented on GitHub (Oct 24, 2021):

Great, no rush.


@DUOLabs333 commented on GitHub (Oct 24, 2021):

Where are the previews of the results stored?


@benbusby commented on GitHub (Oct 25, 2021):

The search result response is stored in `response = search_util.generate_response()` for the search endpoint. You can parse that response into a bsoup object with `result_soup = bsoup(response, "html.parser")` and then do the search + replacement tag using that.


@benbusby commented on GitHub (Oct 25, 2021):

Also for the query regex you could probably do something like:

```python
search_terms = re.split(r'\s+(?=[^"]*(?:"[^"]*"[^"]*)*$)', query)
html_soup.find_all(text=re.compile('|'.join(search_terms)))
```

to get individual query elements separated into a list, except for those surrounded by quotes. Haven't tested it out yet though.
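A minimal sketch of how that split behaves, using only the standard `re` module. The helper names `split_query` and `build_pattern`, and the steps that strip quotes and escape each term, are assumptions added for illustration and are not code from the project.

```python
import re

# Split on whitespace only where an even number of double quotes follows,
# i.e. whitespace that sits outside any quoted phrase.
QUERY_SPLIT = re.compile(r'\s+(?=[^"]*(?:"[^"]*"[^"]*)*$)')


def split_query(query: str) -> list:
    """Hypothetical helper: return search terms, keeping quoted phrases whole."""
    terms = QUERY_SPLIT.split(query.strip())
    # Drop the surrounding quotes and skip empty pieces.
    return [t.strip('"') for t in terms if t.strip('"')]


def build_pattern(terms: list):
    """Escape each term so characters like '+' or '.' match literally."""
    return re.compile('|'.join(re.escape(t) for t in terms), re.IGNORECASE)


terms = split_query('quick "red fox" jumps')
print(terms)  # ['quick', 'red fox', 'jumps']
```

Escaping the terms before joining them matters because a raw query such as `c++` would otherwise be interpreted as regex syntax.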

@DUOLabs333 commented on GitHub (Oct 25, 2021):

How would you find the class the descriptions are under, since presumably the class name changes on reload.


@benbusby commented on GitHub (Oct 25, 2021):

I'm not sure I understand -- why do you need class names? Ultimately you just need to get all NavigableStrings from the response bsoup that match the query regex above and bold it. For example if the query is "fox" and the NavString in the response is "quick red fox", you should be able to just swap the element's inner context using `replace('fox', '<b>fox</b>')`.
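As a rough illustration of that swap on a plain string: `str.replace` only matches the exact casing it is given, so this sketch uses `re.sub` with a case-insensitive pattern and re-inserts whatever text actually matched. The function name `bold_term` is made up for the example.

```python
import re


def bold_term(text: str, term: str) -> str:
    """Hypothetical helper: wrap every occurrence of `term` in <b> tags."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    # \g<0> is the matched text itself, so "Fox" stays "Fox".
    return pattern.sub(r'<b>\g<0></b>', text)


print(bold_term('quick red fox', 'fox'))        # quick red <b>fox</b>
print(bold_term('The Fox and the fox', 'fox'))  # The <b>Fox</b> and the <b>fox</b>
```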


@DUOLabs333 commented on GitHub (Oct 25, 2021):

What do you mean? I need to extract the summaries, and do the regex on that.


@DUOLabs333 commented on GitHub (Oct 25, 2021):

Ok, never mind -- apparently Google does not change its class names on reload so for now it's `BNeawe s3v9rd AP7Wnd`.


@silverwings15 commented on GitHub (Oct 25, 2021):

thank you guys for looking into an implementation 🙏


@DUOLabs333 commented on GitHub (Oct 25, 2021):

Finished -- there may be some edge cases though.


@benbusby commented on GitHub (Oct 25, 2021):

Ah ok, I see what you mean now. What I had in mind was something like

```python
target = response.find_all(
    text=re.compile(
        r'' + re.escape(re.sub(r'[^A-Za-z0-9 ]+', '', word)),
        re.IGNORECASE))
for str_item in target:
    str_item.replace_with(str_item.replace(word, ''.join(['<b>', word, '</b>'])))
```

so that there wasn't a need to mess with extracting strings from classes directly, rather just grabbing the string elements directly from the response and then swapping them. I'll review your PR soon though.

@DUOLabs333 commented on GitHub (Oct 25, 2021):

Oh, that would make things much easier -- I just didn't want to make any assumptions.


@DUOLabs333 commented on GitHub (Oct 25, 2021):

I've been dogfooding it, and I found that for some reason, it's also bolding the titles. This may or may not be wanted.


@benbusby commented on GitHub (Oct 25, 2021):

If the intent is to match how Searx behaves, then bolding keywords in the titles is fine. Personally I also prefer words in result titles to be bold, as it gives a quicker idea of how closely the article matches what I'm searching for.


@DUOLabs333 commented on GitHub (Oct 25, 2021):

Ok, then it's fine.


@DUOLabs333 commented on GitHub (Oct 25, 2021):

Ironically enough, this just made me realize how often Google doesn't actually put the keywords in a result's summary.


@DUOLabs333 commented on GitHub (Oct 26, 2021):

There's a problem -- it also bolds the links in the "related links" under each result (for some reason).


@DUOLabs333 commented on GitHub (Oct 26, 2021):

I just tried it out, but the text isn't being bolded correctly -- the `<b>` tags are showing up as literal text instead of being rendered as markup.


@benbusby commented on GitHub (Oct 26, 2021):

Yeah, I just pushed a fix. I didn't try out your branch before merging (my bad). The HTML needed to be unescaped before rendering.
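To illustrate the symptom with the standard library only: when markup is inserted as a plain string it ends up entity-escaped, and the browser shows the tag characters instead of bold text. The sketch below uses `html.escape` to stand in for that serialization step and `html.unescape` to reverse it. It is not the project's actual fix, and unescaping a whole page this way would also unescape every other entity in it.

```python
import html

snippet = 'quick red <b>fox</b>'

# What the page contains when the tag is treated as text.
escaped = html.escape(snippet)
print(escaped)   # quick red &lt;b&gt;fox&lt;/b&gt;

# Reversing the escaping restores real markup.
restored = html.unescape(escaped)
print(restored)  # quick red <b>fox</b>
```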


@DUOLabs333 commented on GitHub (Oct 26, 2021):

Nice (that also dealt with the problem of having tags in the URLs)! Another problem -- if the query contains "in", it also matches inside "linux" and "incompatible".


@DUOLabs333 commented on GitHub (Oct 26, 2021):

I think word boundaries in the regex should fix the problem.
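A small sketch of that idea, assuming the terms are matched with Python's `re` module: wrapping the escaped term in `\b` anchors stops "in" from matching inside longer words. The name `bold_whole_words` is illustrative only.

```python
import re


def bold_whole_words(text: str, term: str) -> str:
    """Hypothetical helper: bold `term` only where it stands as a whole word."""
    pattern = re.compile(r'\b' + re.escape(term) + r'\b', re.IGNORECASE)
    return pattern.sub(r'<b>\g<0></b>', text)


text = 'incompatible drivers in linux'
print(bold_whole_words(text, 'in'))  # incompatible drivers <b>in</b> linux
```

One caveat: `\b` only marks a transition between word and non-word characters, so a term that begins or ends with punctuation (for example `c++`) would need different anchoring.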


@DUOLabs333 commented on GitHub (Oct 26, 2021):

It also breaks for some other things (like for some reason converting to Times New Roman when searching "convert to NavigableString", or not highlighting the first title containing "NavigableString") -- I'll try to fix it from my end.


@benbusby commented on GitHub (Oct 26, 2021):

Ultimately the regex should just match words with whitespace on one side and/or a set of allowed characters (i.e. not `<` or `>` or dashes).

The Times New Roman issue is due to the pattern matching CSS styling that gets embedded in the search results.
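One way to picture the CSS problem, sketched with the standard library's `html.parser` rather than the BeautifulSoup code the project uses: only text outside `<style>` and `<script>` is eligible for bolding, so a font name in embedded CSS is left alone. Start tags are re-emitted unchanged, so attribute values such as URLs are not touched either. The class name `Bolder` and its behaviour are assumptions for illustration, and this sketch drops comments and doctype declarations.

```python
import re
from html.parser import HTMLParser


class Bolder(HTMLParser):
    """Hypothetical sketch: bold a term in visible text, skipping CSS/JS."""

    SKIP = {'style', 'script'}

    def __init__(self, term: str):
        super().__init__(convert_charrefs=False)
        self.pattern = re.compile(r'\b' + re.escape(term) + r'\b', re.IGNORECASE)
        self.out = []
        self.skipping = False

    def handle_starttag(self, tag, attrs):
        # Text directly inside <style>/<script> must not be rewritten.
        self.skipping = tag in self.SKIP
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.skipping = False
        self.out.append(f'</{tag}>')

    def handle_data(self, data):
        if not self.skipping:
            data = self.pattern.sub(r'<b>\g<0></b>', data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f'&{name};')

    def handle_charref(self, name):
        self.out.append(f'&#{name};')


page = '<style>p{font-family:times new roman}</style><p>New Roman roads</p>'
parser = Bolder('roman')
parser.feed(page)
parser.close()
result = ''.join(parser.out)
print(result)
```

With this input the CSS rule passes through untouched while "Roman" in the paragraph is wrapped in `<b>` tags.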


@DUOLabs333 commented on GitHub (Oct 26, 2021):

I just fixed the title issues (not the URLs) -- I'm making a pull request now.

Reference
starred/whoogle-search#314