[PR #912] [MERGED] Fix: added a functionality to make sure escaped characters stay escaped. #1015

Closed
opened 2026-02-25 20:37:24 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/benbusby/whoogle-search/pull/912
Author: @ahmad-alkadri
Created: 12/23/2022
Status: Merged
Merged: 12/29/2022
Merged by: @benbusby

Base: main ← Head: fix/908-html-element-need-escape


📝 Commits (1)

  • 50da4b6 Added a function to escape html text

📊 Changes

2 files changed (+6 additions, -4 deletions)

📝 app/routes.py (+4 -2)
📝 app/utils/search.py (+2 -2)

📄 Description

This PR is linked to issue #908, which shows that Whoogle renders HTML characters in search results unescaped. Here's the screenshot referenced in the issue:

image

After checking, I found the following:

  • the characters inside the <div> content tag from the search results (getbody.text in search.py) are already escaped, with "<" and ">" converted into "&lt;" and "&gt;", respectively
  • however, because getbody.text is then passed through several bsoup objects, the escaped tag characters become unescaped.

To prevent this, I replaced "&lt;" and "&gt;" with "andlt;" and "andgt;", respectively. This way, when the 'response' object gets loaded into bsoup (which happens several times between search.py and routes.py), bsoup will not unescape them. Finally, just before the responses object is sent to render_template in routes.py, I replaced "andlt;" and "andgt;" back with "&lt;" and "&gt;".
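
For illustration, here is a minimal sketch of the placeholder round trip described above. It is not the code from this PR: the helper names (`protect`, `restore`, `parse_text`) are hypothetical, and the standard library's `HTMLParser` stands in for bsoup as an assumption, only to show that a parse pass decodes entities.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes; entities are decoded during parsing."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_text(markup):
    """Stand-in for one parse pass: returns the decoded text content."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def protect(text):
    """Swap escaped angle brackets for placeholders the parser ignores."""
    return text.replace("&lt;", "andlt;").replace("&gt;", "andgt;")


def restore(text):
    """Turn the placeholders back into HTML entities before rendering."""
    return text.replace("andlt;", "&lt;").replace("andgt;", "&gt;")


raw = "<div>Use the &lt;b&gt; tag</div>"

# Without protection, the parse pass decodes the entities.
unprotected = parse_text(raw)

# With protection, the placeholders survive parsing and are restored afterwards.
protected = restore(parse_text(protect(raw)))
```

In this sketch `unprotected` contains a literal `<b>`, while `protected` still contains `&lt;b&gt;`. One trade-off of the approach: any text that already contains the literal string "andlt;" or "andgt;" would also be converted by `restore`.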

Here's the screenshot from the search result on Whoogle following this fix:

screenshot-localhost_5000-2022 12 23-22_58_09


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
