[GH-ISSUE #276] [FEATURE] Access website though proxy #189

Closed
opened 2026-02-25 20:35:07 +03:00 by kerem · 12 comments
Owner

Originally created by @alkarkhi on GitHub (Apr 10, 2021).
Original GitHub issue: https://github.com/benbusby/whoogle-search/issues/276

Describe the feature you'd like to see added
Sometimes websites block IP addresses, especially those that generate high traffic, such as Tor exit nodes or VPN servers. Startpage (Anonymous View) and Searx (via the Morty proxy) offer a proxy option. I don't know whether Whoogle supports this, but I couldn't find it, so I would be grateful if something like this were introduced to Whoogle.

kerem 2026-02-25 20:35:07 +03:00
Author
Owner

@alkarkhi commented on GitHub (Apr 10, 2021):

https://github.com/asciimoo/morty

@gripped commented on GitHub (Oct 28, 2021):

Just a proof of concept, Ben:

```diff
--- results.py.orig	2021-10-28 11:14:45.000000000 +0100
+++ results.py	2021-10-28 13:55:15.000000000 +0100
@@ -175,9 +175,9 @@
 
     """
     nojs_link = BeautifulSoup(features='html.parser').new_tag('a')
-    nojs_link['href'] = '/window?location=' + result['href']
+    nojs_link['href'] = '/morty/?mortyurl=' + result['href']
     nojs_link['style'] = 'display:block;width:100%;'
-    nojs_link.string = 'NoJS Link: ' + nojs_link['href']
+    nojs_link.string = 'Proxied'
     result.append(BeautifulSoup('<br><hr><br>', 'html.parser'))
     result.append(nojs_link)
```
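In the proof of concept above, the result URL is appended to `/morty/?mortyurl=` without any encoding. A result URL that carries its own query string, such as `https://example.com/search?q=a&page=2`, would then likely be split at the `&`, so the proxy would receive only `https://example.com/search?q=a`. A minimal sketch of the fix, assuming the `/morty/` prefix from the Nginx config in this comment; the helper name `morty_link` is hypothetical:

```python
from urllib.parse import parse_qs, quote, urlsplit


def morty_link(href: str, prefix: str = '/morty/') -> str:
    """Build a Morty proxy link for a search result URL (hypothetical helper)."""
    # safe='' also encodes '/' and ':', so '&', '?' and '#' inside the
    # target URL cannot leak into the proxy link's own query string.
    return f'{prefix}?mortyurl={quote(href, safe="")}'


target = 'https://example.com/search?q=a&page=2'
link = morty_link(target)
print(link)  # /morty/?mortyurl=https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Da%26page%3D2

# Decoding the proxy link's query string recovers the original URL intact.
print(parse_qs(urlsplit(link).query)['mortyurl'][0] == target)  # True
```

For `https://en.wikipedia.org/wiki/BBC` this produces the same encoded `mortyurl` value that appears in the Searx link quoted in this comment.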

I added the following to my Nginx config:

```nginx
	location /morty {
		proxy_pass http://127.0.0.1:3000;
		proxy_set_header Host $host;
		proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
		proxy_set_header X-Scheme $scheme;
		proxy_set_header X-Script-Name /morty;
		proxy_buffering off;
	}
```

Proxying through Morty works (I already have Morty set up, since I use Searx as well).
Searx also adds a hash to the URL, with the help of a key in its settings file:
`https://searx.informationhouse.co.uk/morty/?mortyurl=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FBBC&mortyhash=23020c5f19b4c24ff46fb4ee78801efa7a19ebd9054cda0e2dd4977d7ebf6316`
I haven't even looked into the purpose of the hash, but I doubt it matters for my private instance. It works without it.
The hash code is here: https://github.com/searx/searx/blob/7b368146a1fdeb8d7494fde567737cff0b3c9e46/searx/webapp.py#L337
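For reference, the linked Searx code appears to compute `mortyhash` as an HMAC-SHA256 of the target URL, keyed with the `result_proxy` key from its settings file; the 64 hex characters in the example link are consistent with a SHA-256 digest. Morty's `-key` option enables the matching check, and validation is disabled when no key is given, which would explain why the link works without the hash. The point is to stop a public Morty instance from being used as an open proxy for arbitrary URLs. The sketch below is modeled on that behavior, not copied from Searx, and the key is a placeholder:

```python
import hashlib
import hmac
from typing import Optional
from urllib.parse import urlencode


def proxify(url: str, proxy_base: str, key: Optional[bytes] = None) -> str:
    """Build a Morty link, adding an HMAC-SHA256 'mortyhash' when a key is set."""
    params = {'mortyurl': url}
    if key:
        # Morty, started with the same key, can recompute this digest and
        # refuse URLs that were not signed by the search frontend.
        params['mortyhash'] = hmac.new(key, url.encode(), hashlib.sha256).hexdigest()
    # urlencode percent-encodes each value, so the target URL stays intact.
    return f'{proxy_base}?{urlencode(params)}'


KEY = b'placeholder-shared-secret'  # hypothetical; must match the key Morty uses
print(proxify('https://en.wikipedia.org/wiki/BBC', '/morty/', KEY))
print(proxify('https://en.wikipedia.org/wiki/BBC', '/morty/'))
# /morty/?mortyurl=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FBBC
```

Morty takes its key base64-encoded on the command line, so the bytes passed here would presumably need to be the decoded form of that value.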

I made an attempt to do this properly (adding a new config option, a new section in results.py, etc.) but have failed so far, though I don't think I'm that far off.
I won't do any more, though, unless it's something you'd want to implement?

I was just curious whether it would work and be easy. Yes and yes.

@benbusby commented on GitHub (Nov 2, 2021):

Sorry @gripped, this seems to have gotten lost in my stream of notifications, I'm just now seeing your comment.

Yes, I'm definitely interested in getting this implemented, but am curious what you had in mind for determining if morty is available on a user's Whoogle instance. Are you thinking it would just be a configurable setting in the same vein as the privacy frontend settings (i.e. the WHOOGLE_ALT_[MD|RD|IG|etc] vars)? If that's the case, I assume enabling morty for in-app result views would require that environment variable to be set?

I guess my main priority is ensuring that this gets implemented as agnostically as possible (not dependent on an external proxy). It should be easy enough to allow proxying directly through Whoogle by modifying the existing NoJS option a bit, but it could also allow configuring an external proxy (like Morty) as an add-on. So in my head, implementing this feature completely would mean:

  • A config setting for enabling website access through app proxy
  • An environment variable to allow querying through an external service -- if not set, then results are proxied through Whoogle itself
    • Could be either a full URL or a path if hosted parallel to Whoogle
  • Potentially removal of (or reconsideration of how to enable) the NoJS feature itself. I'm hesitant to have multiple links for both viewing a NoJS link and accessing a result through a normal/non-sanitized proxy.
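One way the environment-variable part of this list could look, as a minimal sketch: the variable name `WHOOGLE_RESULT_PROXY` and the helper are hypothetical, and the external proxy is assumed to accept Morty's `mortyurl` parameter. When the variable is unset, the link falls back to Whoogle's own `/window` route with the `nojs` and `location` parameters (as used in the later `results.py` diff in this thread). A full URL and a same-host path need no special handling, because the browser resolves a path like `/morty/` against the Whoogle host.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def result_proxy_link(href: str, nojs: bool = False,
                      env: Optional[Mapping[str, str]] = None) -> str:
    """Return the link used to open a search result through a proxy."""
    env = os.environ if env is None else env
    # Hypothetical variable: a full URL (https://proxy.example/morty/)
    # or a path on the same host as Whoogle (/morty/).
    external = env.get('WHOOGLE_RESULT_PROXY', '').strip()
    target = quote(href, safe='')
    if external:
        return f'{external}?mortyurl={target}'
    # Not set: proxy through Whoogle itself via its /window route.
    return f'/window?nojs={int(nojs)}&location={target}'


print(result_proxy_link('https://example.com/', env={}))
# /window?nojs=0&location=https%3A%2F%2Fexample.com%2F
print(result_proxy_link('https://example.com/', env={'WHOOGLE_RESULT_PROXY': '/morty/'}))
# /morty/?mortyurl=https%3A%2F%2Fexample.com%2F
```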

Feel free to open a draft PR with your current progress if you want my input before it's ready, otherwise just keep me posted.

@gripped commented on GitHub (Nov 2, 2021):

> Sorry @gripped, this seems to have gotten lost in my stream of notifications, I'm just now seeing your comment.

No problem at all. I have Morty working for me with Whoogle. There's no rush.

In my ideal world I think Whoogle would offer:

  • A link to normal results, which will be fetched directly by the user's browser (It does now)
  • A proxy link which delivers the page to the user while hiding the user's home IP address from the server (unless they are hosting on a home IP), with the following mutually exclusive choices:
    • Unchanged page, just proxied.
    • Proxied page sanitized of JavaScript (It does now: NoJS)
    • An external proxy like Morty, with the proxy part of the URL defined by an environment variable.

Your comment from another issue (#508):

> I think it would probably get used a lot more if it was refactored to serve as a general website proxy feature that allowed users to open a result in Whoogle itself (using the same /window route), and conditionally enable/disable Javascript depending on user preference.

So I think we're thinking along the same lines?
For the configuration page, I guess one option along the lines of 'Show proxy link',
and then a three-state radio button to select the type of proxying?

I'll see what I can come up with. But don't hold your breath.
I do not consider myself a programmer in the least. I like playing around with it all though and occasionally I succeed.
If you decided to jump in and just do it I would not be in the least bit miffed. Probably more relieved.

@benbusby commented on GitHub (Nov 2, 2021):

> So I think we are thinking along the same lines ?
> As for the configuration page I guess one option along the lines of 'Show proxy link'.
> And then a three state radio button to select the type of proxying ?

Yep, sounds like we're on the same page. The only catch I think is that if the user doesn't have the external proxy URL configured, that the external proxy option is disabled somehow. There's similar behavior with the Tor config option when the user's machine doesn't have Tor running.

> I'll see what I can come up with. But don't hold your breath.
> I do not consider myself a programmer in the least. I like playing around with it all though and occasionally I succeed.
> If you decided to jump in and just do it I would not be in the least bit miffed. Probably more relieved.

Well I'd still like to encourage you to give it a shot! I'll hold off on implementing anything on my end. If you end up opening a PR that still needs work, I'm happy to jump in at that point (if needed).

@gripped commented on GitHub (Nov 3, 2021):

> The only catch I think is that if the user doesn't have the external proxy URL configured, that the external proxy option is disabled somehow.

Yeah, I had thought of that but forgot to mention it. Environment variable defined = three choices. Not defined = two choices, with the third greyed out.
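The three-state setting with a greyed-out external option could be modeled roughly as below. This is only a sketch: `ProxyMode`, `proxy_choices` and `effective_mode` are hypothetical names, and falling back to the plain proxied view when a saved external choice is no longer available is an assumed design choice, not something settled in this thread.

```python
from enum import Enum
from typing import List, Optional, Tuple


class ProxyMode(Enum):
    UNCHANGED = 'unchanged'  # page proxied as-is
    NOJS = 'nojs'            # proxied with JavaScript removed
    EXTERNAL = 'external'    # handed off to an external proxy such as Morty


def proxy_choices(external_url: Optional[str]) -> List[Tuple[ProxyMode, bool]]:
    """Return (mode, enabled) pairs for the settings page radio buttons."""
    has_external = bool(external_url and external_url.strip())
    return [
        (ProxyMode.UNCHANGED, True),
        (ProxyMode.NOJS, True),
        (ProxyMode.EXTERNAL, has_external),  # greyed out when the variable is unset
    ]


def effective_mode(selected: ProxyMode, external_url: Optional[str]) -> ProxyMode:
    """Fall back to UNCHANGED if the saved choice is no longer available."""
    enabled = {mode for mode, ok in proxy_choices(external_url) if ok}
    return selected if selected in enabled else ProxyMode.UNCHANGED


for mode, enabled in proxy_choices(None):
    print(mode.value, 'enabled' if enabled else 'disabled')
```

Keeping the availability check in one function lets the settings page and the result-link code agree on which modes are usable.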

> Well I'd still like to encourage you to give it a shot!

It's gonna be a long shot! ;)
I am going to try. Mainly by copying your code wherever possible.

@DUOLabs333 commented on GitHub (Jan 11, 2022):

Doesn't Whoogle already have proxy support? It should support whatever proxies `requests` supports.

@benbusby commented on GitHub (Jan 11, 2022):

@DUOLabs333 this is a bit different. In this case what's being asked for is to view result webpages through Whoogle itself. So a user would be given a "View Result in Proxy" option next to each result, and if clicked, the result page would be loaded as https://whoogle-instance.com/result?page=example.com, where example.com is presented to the user through Whoogle itself.
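A route like the `/result?page=example.com` example would first need to turn the `page` parameter into a URL it is willing to fetch. A minimal sketch, assuming the hypothetical `page` parameter from that example and defaulting bare hostnames to HTTPS. It only rejects non-HTTP(S) schemes such as `file://`; a real proxy route would also need to guard against requests to internal or private addresses, which this sketch does not do.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

ALLOWED_SCHEMES = {'http', 'https'}


def target_from_request(request_url: str) -> Optional[str]:
    """Extract the page to proxy from a hypothetical /result?page=... URL."""
    page = parse_qs(urlsplit(request_url).query).get('page', [''])[0].strip()
    if not page:
        return None
    if '://' not in page:
        page = 'https://' + page  # bare host such as 'example.com'
    parts = urlsplit(page)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return None  # e.g. file:///etc/passwd
    return page


print(target_from_request('https://whoogle-instance.com/result?page=example.com'))
# https://example.com
```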

@gripped did you end up looking into this? No worries if not, just curious :)

@DUOLabs333 commented on GitHub (Jan 11, 2022):

@benbusby Oh, ok. Does the link have to be under the whoogle domain, or can the proxy be on some other domain?

@benbusby commented on GitHub (Jan 11, 2022):

I believe the desired effect is for the request to be proxied by the same whoogle instance that provided the results.

@gripped commented on GitHub (Jun 6, 2022):

> @gripped did you end up looking into this? No worries if not, just curious :)

@benbusby I did, but failed :) I decided to take a break and come back to it. Then life got in the way (I was on version 0.6.0 until minutes ago.)

@gripped commented on GitHub (Jun 6, 2022):

@benbusby
I've hacked Morty into my instance again. I still prefer how Morty works, but I consider this a personal hack:

```diff
--- results.py.bak	2022-06-06 10:42:57.000000000 +0000
+++ results.py	2022-06-06 11:53:35.755138670 +0000
@@ -206,11 +206,8 @@
     av_link = BeautifulSoup(features='html.parser').new_tag('a')
     nojs = 'nojs=1' if config.nojs else 'nojs=0'
     location = f'location={result["href"]}'
-    av_link['href'] = f'{Endpoint.window}?{nojs}&{location}'
-    translation = current_app.config['TRANSLATIONS'][
-       config.get_localization_lang()
-    ]
-    av_link.string = f'{translation["anon-view"]}'
+    av_link['href'] = '/morty/?mortyurl=' + result['href']
+    av_link.string = ' Proxied'
     av_link['class'] = 'anon-view'
     result.append(av_link)
```

I'm only sharing this for the benefit of anyone else who might wish to do the same, which is probably a total of zero people.
