mirror of
https://github.com/benbusby/whoogle-search.git
synced 2026-04-25 04:05:57 +03:00
[GH-ISSUE #558] [BUG] Captcha for Single-person Use #354
Originally created by @DUOLabs333 on GitHub (Nov 27, 2021).
Original GitHub issue: https://github.com/benbusby/whoogle-search/issues/558
Describe the bug
I'm not sure how much of this is a bug on Whoogle's part, but I've been using Whoogle for a while, and today I got hit with a captcha. I switched to Google to complete it and then switched back, but the captcha is still there. I host Whoogle on my own computer for personal use.
To Reproduce
This is hard to reproduce; I'm not even sure exactly what caused it.
Deployment Method
run executable
Version of Whoogle Search
@DUOLabs333 commented on GitHub (Nov 28, 2021):
Never mind, it seems the problem resolved itself.
@nakoo commented on GitHub (Nov 28, 2021):
This issue should be reopened.
I have been running the latest version for a while, but I've been getting a serious number of captchas since the query parameter was changed to gl (related: #544). We need to reconsider using this parameter.
@DUOLabs333 commented on GitHub (Nov 28, 2021):
@nakoo Does switching it back to cp make it work again?
@nakoo commented on GitHub (Nov 29, 2021):
Yes, I tested it and I don't get nearly as many captchas.
But you should test it yourself first.
github.com/benbusby/whoogle-search@3c06519130/app/request.py (L123)
You need to change this line from gl to cp.
Also check that you have already set WHOOGLE_CONFIG_COUNTRY='countryUK' in whoogle.env.
Please make sure to set WHOOGLE_DOTENV=1 before running.
@DUOLabs333 commented on GitHub (Nov 29, 2021):
Does it matter if I don't live in the UK?
@nakoo commented on GitHub (Nov 29, 2021):
No, it's just the default value that was used previously.
@DUOLabs333 commented on GitHub (Nov 29, 2021):
I got hit with it again --- doing param_dict['gl'] = ('&cp=' + config.ctry) if config.ctry else '' does not improve matters.
@benbusby commented on GitHub (Nov 29, 2021):
Looks like you might be getting CAPTCHA'd potentially due to the country value being invalid. The value for gl is different from the value for cp (see the list here -- namely, "country" is not used for the new gl param). If you're setting the WHOOGLE_CONFIG_COUNTRY value to something like country*, it's possible that invalid values used in parameters trigger their CAPTCHA check faster.
Edit: @DUOLabs333 after updating to use the cp param again, what was the behavior? Did it start working again and then hit you with the CAPTCHA again shortly afterwards? Or was it blocked to begin with and the cp param just didn't fix matters?
@nakoo commented on GitHub (Nov 30, 2021):
github.com/benbusby/whoogle-search@b75ff0782d/whoogle.template.env (L28-L29)
We need to change countryUK to UK on this line in whoogle.template.env.
As for my case, I changed all the values correctly following the commit, so that's not my problem. But I believe we need to observe the effect of this change over the long term.
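The value mismatch discussed above can be sketched in a few lines. This is only an illustration: normalize_country and country_param are hypothetical helpers, not functions in Whoogle, and the sketch assumes the only difference between the old and new values is the "country" prefix mentioned in this thread.

```python
def normalize_country(value: str) -> str:
    """Strip the legacy 'country' prefix, e.g. 'countryUK' becomes 'UK'.

    Hypothetical helper for illustration; not part of Whoogle.
    """
    value = value.strip()
    prefix = "country"
    if value.startswith(prefix):
        return value[len(prefix):]
    return value


def country_param(value: str) -> str:
    """Build the '&gl=...' fragment, or '' when no country is configured."""
    code = normalize_country(value)
    return f"&gl={code}" if code else ""
```

With this shape, a stale 'countryUK' setting and a plain 'UK' setting produce the same fragment, and an empty setting produces no parameter at all.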
@DUOLabs333 commented on GitHub (Nov 30, 2021):
@benbusby I was blocked from the beginning -- I tried changing the parameter, but to no avail.
@benbusby commented on GitHub (Dec 1, 2021):
@nakoo that template file has been updated.
@DUOLabs333 changing the param likely wouldn't fix being blocked. I'm not quite sure how Google determines if/when to unblock an instance, but it's unlikely that fixing the param would un-block your instance. I'm also not certain that param is the problem, it just seems coincidental. Public instances with presumably much more traffic than private instances are using that param without any issues.
@DUOLabs333 commented on GitHub (Dec 1, 2021):
I'm just surprised that it only started happening now.
@bhulk commented on GitHub (Dec 2, 2021):
When I get a captcha, I change my VPN to a server (physically) closer to where I actually am, and that usually solves the problem. But lately, I am getting a captcha almost every time I use whoogle.
@DUOLabs333 commented on GitHub (Dec 7, 2021):
One way to solve this might be to add cookies back in (I get captcha'd when searching up technical topics on incognito mode). However, this would negate the privacy benefits, so maybe an environment variable? @benbusby Where are the cookies removed?
I should note that on Whoogle, the CAPTCHA usually comes after I search up multiple technical/programming-related topics.
@cyker commented on GitHub (Dec 7, 2021):
I have received the file you sent.
@DUOLabs333 commented on GitHub (Dec 7, 2021):
@cyker Wrong person?
@benbusby commented on GitHub (Dec 8, 2021):
@DUOLabs333 cookies actually aren't being explicitly removed, they're just not being accepted by the requests library. It seems straightforward enough to just establish a request session with req_session = requests.Session() though and then send all requests with req_session.get(...) (instead of just the regular requests.get(...) call), which should include all cookies between queries. I don't think it would fully sacrifice privacy, since Whoogle would still be acting as a buffer and storing the received cookies for the user.
I'll try to work that in soon, unless you want to take a stab at it.
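A minimal sketch of the session idea, assuming one session shared by the whole app. The names get_session and send are hypothetical and do not come from app/request.py; only requests.Session() and its get(...) method are real library calls.

```python
_session = None  # module-level holder: one shared session for the whole app


def get_session():
    """Return the shared requests.Session, creating it on first use."""
    global _session
    if _session is None:
        import requests  # third-party; imported lazily for this sketch

        _session = requests.Session()
    return _session


def send(url, session=None, **kwargs):
    """Send a GET through a session so received cookies are reused.

    Passing `session` explicitly makes it easy to swap in a per-user
    session, or a stand-in object when experimenting.
    """
    if session is None:
        session = get_session()
    return session.get(url, **kwargs)
```

Calling send("https://www.google.com/search?q=test") twice would reuse whatever cookies the first response set, which is the behavior described above. Whether a single shared session or one session per user is the better choice is left open here.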
@DUOLabs333 commented on GitHub (Dec 8, 2021):
@benbusby So what file should I be looking at?
@ericjohncarlson commented on GitHub (Dec 8, 2021):
I also run Whoogle at home, running 0.54 for months. I upgraded to 0.6 and just got hit with this within the first day of use. My problem is that I can't see where to actually solve the captcha. If I pull the URL from the Whoogle "Our systems have detected..." page and paste that in, google searches work fine. All other google searches work fine without a captcha. This is from the same origin IP, so somehow I'm caught here.
I moved back to 0.54 but still blocked. How do I solve this captcha? I even added a webproxy to the server running whoogle so my browser traffic came from the same internal IP - same issue.
@benbusby commented on GitHub (Dec 9, 2021):
@DUOLabs333 app/request.py -- specifically:
github.com/benbusby/whoogle-search@7bea6349a0/app/request.py (L297-L301)
would need to be updated to send the request with requests.Session() (which would either need to be stored in a global app config var (see app/__init__.py) or store a requests session per user and store it in the Flask session...not sure which is better).
@ericjohncarlson there's no way to solve the captcha while using Whoogle. The captcha is removed from the view since it's loaded using JS (which Whoogle blocks), but even if it were displayed on the page, completing the captcha there wouldn't work since the hostname has to match what the captcha expects (which would be a "google.com" domain).
Are you using the tagged release of 0.6.0 or the latest tag/main branch? I'm still not really sure if 0.6.0 is to blame, since there are public instances running 0.6.0 without any issues. My best guess at this point is that big-G recently made an update that blocks the default Whoogle user agents, but only for residential IPs? Just guessing. I've been meaning to switch how the user agent is generated to make it a bit harder to block, so that could help.
@DUOLabs333 commented on GitHub (Dec 9, 2021):
I'll look into this... how do you make a variable global? I tried placing it in __init__.py, then importing, but no good.
@DUOLabs333 commented on GitHub (Dec 9, 2021):
Ok, got it to work (probably). Still blocked though. Do I have to initialize some cookies in it?
@benbusby commented on GitHub (Dec 9, 2021):
No, the request session should import whatever cookies Google wants to set and then pass them back for each subsequent request. Can you try hardcoding your user agent in app/request.py to your actual user agent (from https://www.whatsmyua.info/ or something similar)? You'd need to change the following line:
github.com/benbusby/whoogle-search@7bea6349a0/app/request.py (L174)
to just be a string version of your UA.
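For anyone trying this experiment, a less invasive variant than editing the source line is to read an override from the environment. The sketch below is an assumption-heavy illustration: EXAMPLE_USER_AGENT is a hypothetical variable name, and DEFAULT_UA is an arbitrary placeholder, not the user agent Whoogle generates.

```python
import os

# Placeholder fallback for this sketch; not Whoogle's generated user agent.
DEFAULT_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0"


def pick_user_agent(env=None) -> str:
    """Return the override from EXAMPLE_USER_AGENT if set, else DEFAULT_UA.

    EXAMPLE_USER_AGENT is a hypothetical variable name used for illustration.
    """
    env = os.environ if env is None else env
    override = env.get("EXAMPLE_USER_AGENT", "").strip()
    return override or DEFAULT_UA


# Headers that would be attached to each outgoing search request.
headers = {"User-Agent": pick_user_agent()}
```

Pasting the string reported by your own browser into the variable gives the same effect as hardcoding it, without touching the code again when you want to revert.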
@DUOLabs333 commented on GitHub (Dec 9, 2021):
Still blocked.
@benbusby commented on GitHub (Dec 9, 2021):
Just to clarify, after you modify your instance, are you clearing the captcha manually with Google (i.e. navigating to google.com and completing the captcha there) before reattempting a search?
@DUOLabs333 commented on GitHub (Dec 9, 2021):
I don't get captcha'd if I search on Google directly (only in Incognito mode).
@benbusby commented on GitHub (Dec 9, 2021):
Oh I see. Yeah, no clue what the problem is in that case and don't have any more ideas at the moment. Since I can't replicate it on my end or on my public instances, I think it's just going to have to involve trial and error from someone who is experiencing the issue.
@DUOLabs333 commented on GitHub (Dec 9, 2021):
I get unblocked every hour or so, and it comes up again when I'm searching up something programming-related (or that is unrelated, and the search just happens to be programming-related due to my searching habits).
@ericjohncarlson commented on GitHub (Dec 9, 2021):
Makes sense.
It was the latest tag of your docker repo (benbusby/whoogle-search:latest). This is on a residential FIOS connection, for sure. Would it be possible to allow setting the UA string from an environment variable? That'd be handy here, as I could find my local UA from the browser and give it over to Whoogle easily.
@accountForIssues commented on GitHub (Dec 12, 2021):
I maintain a private instance on the cloud and I started getting rate limited almost every day (sometimes multiple times a day). Usually I could delete and recreate the app to get a new IP and it would work but a bit later it got limited again.
I noticed that it was using WHOOGLE_CONFIG_COUNTRY=US even though I never used it or set it. Maybe some update caused it. Idk.
When I followed the farside link into another instance and played around with the country parameter (in the URL), that instance got limited after a few (< 10) requests. I don't think that was coincidental.
Since someone above mentioned that using invalid values could be causing these issues, I removed the country by using WHOOGLE_CONFIG_COUNTRY="". For about 2 days now, it has been working fine with no limiting. Maybe it's a coincidence. I'll keep trying but just wanted to share. Even searching for 'technical' and 'specific' terms works fine.
Could it be that Google is blocking repeated use of the gl parameter without auth? Has anyone tried with an account and/or using cookies?
@DUOLabs333 commented on GitHub (Dec 13, 2021):
@accountForIssues I'll try this and see what happens.
@DUOLabs333 commented on GitHub (Dec 13, 2021):
@accountForIssues You're right, I see no rate limits. This may be the solution (or at least a short-term solution).
@DUOLabs333 commented on GitHub (Dec 13, 2021):
Never mind, it appears again.
@accountForIssues commented on GitHub (Dec 13, 2021):
@DUOLabs333 That's interesting. Maybe check your config and the farside link to see what (and if any) settings are being used by default or being overridden that could be causing an issue ?
My instance has been working fine since I made the change (fingers crossed).
The only settings I explicitly have are:
@DUOLabs333 commented on GitHub (Dec 13, 2021):
Interesting, I added WHOOGLE_CONFIG_LANGUAGE=lang_en, then restarted, and now it works. It may be the fact that I restarted, or the option. I'll see what happens.
@nakoo commented on GitHub (Dec 20, 2021):
Thank you for bringing it up. I'm confident that this issue was caused by an invalid gl parameter.
I've had no issues so far after changing it.
Is this working again after adding that line? tbh I'm not sure why you're getting the captcha after this.
@DUOLabs333 commented on GitHub (Dec 20, 2021):
After adding the lang_en, I have had no issues.
@ericjohncarlson commented on GitHub (Dec 20, 2021):
Just chiming in here to say that adding a blank CONFIG_COUNTRY and setting CONFIG_LANGUAGE to lang_en also solved my captcha issues. I can try adding back the COUNTRY setting if that's helpful, but the combination of these two has been great.
@DUOLabs333 commented on GitHub (Dec 22, 2021):
I believe the issue can be closed now.
@nakoo commented on GitHub (Dec 22, 2021):
Since this issue is still relevant to the latest version, @benbusby needs to address this.
@DUOLabs333 commented on GitHub (Dec 22, 2021):
It can probably be solved by making the necessary whoogle.env changes the default, or hardcoding the values into the code.
@DUOLabs333 commented on GitHub (Dec 23, 2021):
I got captcha'd (this is what I'll be calling it now) again. Though that may be due to my use of a self-hosted youtube front-end.
@benbusby commented on GitHub (Dec 23, 2021):
That's very likely a reason. I host my (5) instances separately from other alt frontends for this reason.
I'm going to push a change to make the default country config blank (which will exclude the param from the url sent to big-G), but beyond that, I'm not willing to remove the country param altogether. It's literally the one thing that makes public instances actually useful now, rather than returning results dependent on the instance's hosting location. Without it, the public instances aren't really practical for the majority of users who don't want to self host. And I personally haven't seen any proof on public or private instances that I manage that the country code is what triggers captchas.
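The effect of a blank default can be sketched as follows. This is a rough illustration rather than Whoogle's actual request code: build_search_query is a hypothetical function, and it uses the standard library's urlencode instead of the param_dict string fragments quoted earlier in the thread.

```python
import os
from urllib.parse import urlencode


def build_search_query(query: str, env=None) -> str:
    """Build the query string, adding gl only when a country is configured.

    Hypothetical helper; a blank or missing WHOOGLE_CONFIG_COUNTRY leaves
    the country parameter out of the URL entirely.
    """
    env = os.environ if env is None else env
    params = {"q": query}
    country = env.get("WHOOGLE_CONFIG_COUNTRY", "").strip()
    if country:
        params["gl"] = country
    return urlencode(params)
```

Public instances that want country-specific results would still set the variable, while private instances that leave it blank never send the parameter.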
@ShlomiD83 commented on GitHub (May 9, 2022):
Hi,
I get the same error "instance has been ratelimited".
my current config is:
WHOOGLE_CONFIG_COUNTRY =IL
WHOOGLE_CONFIG_LANGUAGE=lang_en
WHOOGLE_SEARCH_LANGUAGE=lang_iw
I've tried removing these ENV variables, changing them, even removing the entire container and spinning up a new one.
is there a magic fix?
@DUOLabs333 commented on GitHub (May 9, 2022):
Yeah, I get the error sometimes now -- there seems to be no way around it except for just getting lucky.
@ShlomiD83 commented on GitHub (May 9, 2022):
I've been using Whoogle for a while now, just now I started receiving this error.
@ericjohncarlson commented on GitHub (May 9, 2022):
I've had no rate limiting issue since adding these env variables:
WHOOGLE_DOTENV=1
WHOOGLE_CONFIG_LANGUAGE=lang_en
WHOOGLE_CONFIG_COUNTRY=
Important thing for me here was ensuring that COUNTRY is unset. I've had no issues since that point.
@ShlomiD83 commented on GitHub (May 9, 2022):
are you using a .env file? if not then WHOOGLE_DOTENV=1 is redundant.
I've tried your suggestion, unfortunately it didn't help.
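The point about WHOOGLE_DOTENV can be illustrated with a small sketch. It assumes, as the comment above describes, that the flag only controls whether an env file is read. load_env_file is a hypothetical helper and the parsing is deliberately simplified; it is not how Whoogle itself loads whoogle.env.

```python
import os


def load_env_file(path: str, env=None):
    """Copy KEY=VALUE lines from `path` into env, only if WHOOGLE_DOTENV is set.

    Hypothetical helper: values already present in env are left untouched,
    and blank lines, comments, and lines without '=' are skipped.
    """
    env = os.environ if env is None else env
    if not env.get("WHOOGLE_DOTENV"):
        return env  # flag not set: the file is never opened
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env.setdefault(key.strip(), value.strip().strip("'\""))
    return env
```

Under this assumption, variables passed directly to the container behave the same with or without the flag, which is why setting it without an env file changes nothing.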
@suzaku commented on GitHub (Aug 25, 2023):
It seems like what works is restarting, which gets your instance a new IP, not the specific env vars you change.
@AT3K commented on GitHub (Apr 19, 2024):
Would it work to run Gluetun and connect Whoogle to that?