[PR #1235] [CLOSED] fix: several bugs affecting alt links #1130

Closed
opened 2026-02-25 21:30:16 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/benbusby/whoogle-search/pull/1235
Author: @RoyalOughtness
Created: 7/16/2025
Status: Closed

Base: main ← Head: main


📝 Commits (2)

  • 8a24e4a fix: don't append wikiless params to wikipedia.org replacements
  • 0b2d523 remove unused changes

📊 Changes

2 files changed (+19 additions, -5 deletions)


📝 app/filter.py (+6 -1)
📝 app/utils/results.py (+13 -4)

📄 Description

This fixes a number of bugs:

  1. Fixes https://github.com/benbusby/whoogle-search/issues/1230, where the subdomain is repeatedly reapplied to the same link. The fix exits the loop if the alt link is already present in the link. However, this bug is a symptom of a broader redundancy in the code: the loop at https://github.com/benbusby/whoogle-search/blob/main/app/filter.py#L652 calls get_site_alt for each link on the page, and get_site_alt then loops over all the site alts again at https://github.com/benbusby/whoogle-search/blob/main/app/utils/results.py#L198. In effect, the following is taking place:
foreach site alt
  foreach link
    foreach site alt

A broader refactor is needed to remove this redundancy and inefficiency. In the meantime, this PR adds the "exit if alt link is already present in the link" check, which is a working band-aid.

  2. Adds edge-case handling for simple.wikipedia.org, which breaks if the provided alt link is a specific language edition of wikipedia.org.

  3. Fixes the link_desc field so that prefixes aren't ignored. Otherwise, we end up with link_descs like "www.https://old.reddit.com".
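
The "exit if alt link is already present in the link" check from item 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the function name apply_site_alt, the EXAMPLE_SITE_ALTS mapping, and its single entry are hypothetical, and the real get_site_alt in app/utils/results.py handles more cases and may differ in shape.

```python
from typing import Mapping

# Hypothetical mapping of original domain -> alternative frontend.
# The real project builds its mapping from configuration.
EXAMPLE_SITE_ALTS = {"reddit.com": "old.reddit.com"}


def apply_site_alt(link: str, site_alts: Mapping[str, str]) -> str:
    """Replace a known domain in `link` with its alt, at most once."""
    for original, alt in site_alts.items():
        # Guard: if the alt is already in the link, a previous pass has
        # handled it. Without this check, "reddit.com" still matches inside
        # "old.reddit.com" and the subdomain would be applied again.
        if alt in link:
            return link
        if original in link:
            return link.replace(original, alt, 1)
    return link


once = apply_site_alt("https://reddit.com/r/python", EXAMPLE_SITE_ALTS)
twice = apply_site_alt(once, EXAMPLE_SITE_ALTS)
```

With the guard in place, the second call leaves the link unchanged, so running the replacement from an outer loop more than once does not stack subdomains. It does not remove the nested iteration described above; it only makes the repeated calls harmless.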


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference
starred/whoogle-search#1130