mirror of
https://github.com/benbusby/whoogle-search.git
synced 2026-04-25 12:15:50 +03:00
[GH-ISSUE #484] [FEATURE] Bold search terms in results #314
Originally created by @benbusby on GitHub (Oct 24, 2021).
Original GitHub issue: https://github.com/benbusby/whoogle-search/issues/484
Originally assigned to: @DUOLabs333 on GitHub.
Describe the feature you'd like to see added
Individual words in a search query should be bolded throughout the search results. If parts of a query are wrapped in quotes, only exact matches to contents within the results should be bolded.
Additional context
See discussion here: https://github.com/benbusby/whoogle-search/discussions/450#discussioncomment-1528544 (also includes implementation ideas for anyone interested in helping).
@benbusby commented on GitHub (Oct 24, 2021):
@DUOLabs333 apparently I can't assign this issue to you unless you comment here...so I guess let me know here if you still plan on working on it.
@DUOLabs333 commented on GitHub (Oct 24, 2021):
Yes, I just need to finish up something else.
@benbusby commented on GitHub (Oct 24, 2021):
Great, no rush.
@DUOLabs333 commented on GitHub (Oct 24, 2021):
Where are the previews of the results stored?
@benbusby commented on GitHub (Oct 25, 2021):
The search result response is stored in response = search_util.generate_response() for the search endpoint. You can parse that response into a bsoup obj with result_soup = bsoup(response, "html.parser") and then do the search + replacement tag using that.
@benbusby commented on GitHub (Oct 25, 2021):
Also for the query regex you could probably do something like:
to get individual query elements separated into a list, except for those surrounded by quotes. Haven't tested it out yet though.
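The snippet from this comment did not survive in the mirror, so the following is not the original code. It is a minimal sketch of the idea described: split a query into a list of terms while keeping anything wrapped in double quotes together as one phrase. The pattern and the name split_query are assumptions made for illustration.

```python
import re

# Hypothetical pattern: either a double-quoted phrase (captured without the
# quotes) or a run of non-whitespace characters.
QUERY_TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def split_query(query):
    """Return the query terms as a list, keeping quoted phrases intact."""
    terms = []
    for quoted, bare in QUERY_TOKEN.findall(query):
        # Exactly one of the two groups is non-empty for each match.
        terms.append(quoted if quoted else bare)
    return terms


print(split_query('quick "red fox" jumps'))  # ['quick', 'red fox', 'jumps']
```

An unbalanced quote falls through to the second alternative and is kept as part of a bare term, which may or may not be the behavior wanted here.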
@DUOLabs333 commented on GitHub (Oct 25, 2021):
How would you find the class the descriptions are under, since presumably the class name changes on reload.
@benbusby commented on GitHub (Oct 25, 2021):
I'm not sure I understand -- why do you need class names? Ultimately you just need to get all NavigableStrings from the response bsoup that match the query regex above and bold it. For example if the query is "fox" and the NavString in the response is "quick red fox", you should be able to just swap the element's inner context using replace('fox', '<b>fox</b>').
@DUOLabs333 commented on GitHub (Oct 25, 2021):
What do you mean? I need to extract the summaries, and do the regex on that.
@DUOLabs333 commented on GitHub (Oct 25, 2021):
Ok, never mind -- apparently Google does not change its class names on reload so for now it's BNeawe s3v9rd AP7Wnd.
@silverwings15 commented on GitHub (Oct 25, 2021):
thank you guys for looking into an implementation 🙏
@DUOLabs333 commented on GitHub (Oct 25, 2021):
Finished -- there may be some edge cases though.
@benbusby commented on GitHub (Oct 25, 2021):
Ah ok, I see what you mean now. What I had in mind was something like
so that there wasn't a need to mess with extracting strings from classes directly, rather just grabbing the string elements directly from the response and then swapping them. I'll review your PR soon though.
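The code from this comment is also missing from the mirror. As a rough sketch of the approach described (walk the text nodes of the parsed response and swap in bolded versions, with no dependence on class names), something along these lines could work. Note that it builds real b tags rather than using a plain string replace, which is a substitution on my part, and the function name bold_terms is hypothetical.

```python
import re

from bs4 import BeautifulSoup
from bs4.element import NavigableString


def bold_terms(html_text, terms):
    """Wrap each occurrence of any term in <b>, touching text nodes only."""
    if not terms:
        return html_text
    soup = BeautifulSoup(html_text, "html.parser")
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    # find_all returns a list, so the tree can be modified while looping.
    for node in soup.find_all(string=pattern):
        text = str(node)
        last = 0
        for match in pattern.finditer(text):
            if match.start() > last:
                # Plain text between the previous match and this one.
                node.insert_before(NavigableString(text[last:match.start()]))
            bold = soup.new_tag("b")
            bold.string = match.group(0)
            node.insert_before(bold)
            last = match.end()
        if last < len(text):
            node.insert_before(NavigableString(text[last:]))
        node.extract()  # remove the original text node, now duplicated
    return str(soup)


print(bold_terms('<div><a href="/fox">quick red fox</a></div>', ["fox"]))
```

Because only text nodes are visited, attribute values such as the href above are left alone. Every text node is visited, though, including titles and any embedded styles or scripts.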
@DUOLabs333 commented on GitHub (Oct 25, 2021):
Oh, that would make things much easier -- I just didn't want to make any assumptions.
@DUOLabs333 commented on GitHub (Oct 25, 2021):
I've been dogfooding it, and I found that for some reason, it's also bolding the titles. This may or may not be wanted.
@benbusby commented on GitHub (Oct 25, 2021):
If the intent is to match how Searx behaves, then bolding keywords in the titles is fine. Personally I also prefer words in result titles to be bold, as it gives a quicker idea of how closely the article matches what I'm searching for.
@DUOLabs333 commented on GitHub (Oct 25, 2021):
Ok, then it's fine.
@DUOLabs333 commented on GitHub (Oct 25, 2021):
Ironically enough, this just made me realize how often Google doesn't
actually put the keywords in a result's summary.
@DUOLabs333 commented on GitHub (Oct 26, 2021):
There's a problem -- it also boldens the links in the "related links" under each result (for some reason).
@DUOLabs333 commented on GitHub (Oct 26, 2021):
I just tried it out, but the text isn't being bolded correctly -- the bolding is being treated as text, no tags.
@benbusby commented on GitHub (Oct 26, 2021):
Yeah, I just pushed a fix. I didn't try out your branch before merging (my bad). The HTML needed to be unescaped before rendering.
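To illustrate the symptom and the kind of fix mentioned here, the following sketch shows how BeautifulSoup escapes markup that is swapped in as a plain string, and how html.unescape from the standard library restores it. This is not the actual patch; how and where the real fix applies the unescaping is not shown in this thread.

```python
import html

from bs4 import BeautifulSoup

soup = BeautifulSoup("<div>quick red fox</div>", "html.parser")
node = soup.div.string

# Swapping in a plain string: BeautifulSoup treats it as text, not markup.
node.replace_with(str(node).replace("fox", "<b>fox</b>"))

escaped = str(soup)
print(escaped)   # <div>quick red &lt;b&gt;fox&lt;/b&gt;</div>

# Unescaping turns the entities back into real tags before rendering.
rendered = html.unescape(escaped)
print(rendered)  # <div>quick red <b>fox</b></div>
```

One caveat: unescaping a whole page also converts entities that were escaped on purpose, so the scope of the unescape step matters.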
@DUOLabs333 commented on GitHub (Oct 26, 2021):
Nice (also dealt with the problems of having tags in the URLs)! Another problem -- for example, if the query contained "in" -- it would also match "linux" and "incompatible".
@DUOLabs333 commented on GitHub (Oct 26, 2021):
I think word boundaries in the regex should fix the problem.
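A small sketch of the word-boundary idea, using only the standard re module. The helper name bold_whole_words is made up for this example and is not taken from the project.

```python
import re


def bold_whole_words(text, term):
    """Bold `term` only where it appears as a whole word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", text)


print(bold_whole_words("linux in incompatible", "in"))
# linux <b>in</b> incompatible
```

Here \b only matches between a word character and a non-word character, so a term that starts or ends with punctuation (for example "c++") would not be matched by this pattern and would need separate handling.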
@DUOLabs333 commented on GitHub (Oct 26, 2021):
It also breaks for some other things (like for some reason converting to Times New Roman when searching "convert to NavigableString", or not highlighting the first title with "NavigableString") -- I'll try to fix it from my end.
@benbusby commented on GitHub (Oct 26, 2021):
Ultimately the regex should just match words with whitespace on one side and/or a set of allowed characters (i.e. not < or > or dashes).
The Times New Roman issue is due to the pattern matching CSS styling that gets embedded in the search results.
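One possible way to avoid touching embedded CSS is to skip text nodes whose parent is a style or script element. This is a sketch under that assumption, not the fix that was actually applied, and the names SKIPPED_PARENTS and visible_text_nodes are hypothetical.

```python
from bs4 import BeautifulSoup

# Assumed set of elements whose text should never be modified.
SKIPPED_PARENTS = {"style", "script"}


def visible_text_nodes(soup):
    """Return text nodes that are not inside <style> or <script>."""
    return [
        node
        for node in soup.find_all(string=True)
        if node.parent.name not in SKIPPED_PARENTS
    ]


page = "<style>.a{font-family:serif}</style><div>serif fonts</div>"
nodes = visible_text_nodes(BeautifulSoup(page, "html.parser"))
print([str(n) for n in nodes])  # ['serif fonts']
```

With this filter, a query word that happens to appear in a stylesheet is only bolded in the visible result text, so the embedded CSS stays valid.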
@DUOLabs333 commented on GitHub (Oct 26, 2021):
I just fixed the title issues (not the urls)-- I'm making a pull request now.