[GH-ISSUE #467] URL encoding error causes proxy failure #371

Open
opened 2026-02-27 23:16:40 +03:00 by kerem · 9 comments
Owner

Originally created by @esingtse on GitHub (Feb 23, 2022).
Original GitHub issue: https://github.com/snail007/goproxy/issues/467

Current Behavior

A POST request is sent with curl, as follows:

curl -x http://test:test@*:7777 -X POST -H "Content-Type: application/x-www-form-urlencoded" -d "page=1&coldStart=true&count=6&pcursor=0&pv=true&newUserRefreshTimes=1&newUserAction=%7B%22click%22%3A%5B%5D%2C%22follow%22%3A%5B%5D%2C%22lik22%3A%5B%5D%7D&recoReportContext=%7B%22pushEvoke%22%3A1%2C%22pushUrl%22%3A%22%22%2C%22adClientInfo%22%3A%7B%22deviceStatBattery%22%3A100%2C%22deviceStatMemory%22%3A2058%2C%22deviceStatDiskFree%22%3A19979%7D%2C%22launchReferrerInfo%22%3A%7B%22packageName%22%3A%22com.google.android.apps.nexuslauncher%22%2C%22launchTime%22%3A1636613724420%7D%7D&source=2&edgeRecoBit=0&edgeRerankConfigVersion=&clientRealReportData=&videoModelCrowdTag=&os=android&cs=false&sig=45ca262eae2ed2125d464dd01c76a7a9&client_key=3c2cd3f3&__NS_sig3=9c8dfddead95da02d5d4d7d691e5dbdef707daac19c8cbdd" "http://apissl.gifshow.com/rest/n/feed/selection?mod=Google%28Pixel%29&keyconfig_state=1&appver=9.7.10.264&isp=&cold=true&language=zh-cn&sys=ANDROID_8.1.0&max_memory=256&ud=0&did_tag=0&egid=&cold_launch_time_ms=1644844417614&oc=ANDROID_DSP_BA_XXLRGZT_NSET_CPA_DSP_JS_GLHD_1%2CSUB10012&sh=1920&app_status=3&ddpi=420&browseType=4&power_mode=0&net=WIFI&socName=Qualcomm%20MSM8996PRO-AB&kcv=5&app=0&kpf=ANDROID_PHONE&bottom_navigation=false&ver=9.7&is_background=1&c=ANDROID_DSP_BA_XXLRGZT_NSET_CPA_DSP_JS_GLHD_1%2CSUB10012&oDid=TEST_ANDROID_860ff74c2e86959b&android_os=0&sw=1080&boardPlatform=msm8996&ftt=&kpn=KUAISHOU&androidApiLevel=27&newOc=ANDROID_DSP_BA_XXLRGZT_NSET_CPA_DSP_JS_GLHD_1%2CSUB10012&abi=arm32&country_code=CN&device_info=MDAydQz1xvFS8JpWQG%252BPGQdmFaT598IvmxlXlmhjxWNGCc8KbRRyZ8j2TdNgOYWP7iJc70oEjgsT%250AvTtG%252FOfhCrGSp28i4XJKqNCf%252BANZn0LO2pEqVEBs16Nc%252B%252FgCx7u9zZpoTh7KPic0PUAoBt%252FD5np7%250AHjWslb5qgcxNVhniCQSftw4ynQXYtxB3ngZITiIy2fMgmJPrNUicFiAojXqh%252FYl50JIFhBXJmYug%250Ag%252BIKuJJ71rA%252B57rGl01ZJr%252BAfvPxk6Pj7biVHp1LtNDTPX4KugfqWlK5ZrpKaOgQ6XT7isRTFfIh%250Al9W5ELOO%252BfSq5IXOK0iS560N%252BgBx4K7mzBODBQcCwlmp6UslxWReYRrnFSIJ1L%252BRpKBt%252B%252BSj
eorP%250A9HNFYwElSH73YJVqhwHA2lLN%252FGTYQK7Jfb8O4k%252Fw5ugoK0gAXUD%252FzeFQ%252BN5eksFdjcGg4qoTklXf%250A8gnQqci0xK%252BAOQUQNLMG6gj7LwiVY7Y4XsqRlBBAiiQt2hqRPFlLHlMmSXglFWjAYxiWKu2U4mSk%250Ak3Ipz8FIXWIx%250A&totalMemory=3754&grant_browse_type=INITIALIZATION&nbh=126&hotfix_ver=&did_gt=1644844417614&iuid=&rdid=ANDROID_bb7c156387310eed&sbh=63&darkMode=false&did=ANDROID_860ff74c2e86959b"

The result returned is curl: (52) Empty reply from server

The debug log written by the proxy is:

2022/02/23 15:12:08.354678 sps/sps.go:883 WARN new http request fail,ERR: http decoder data line err:POST http://apissl.gifshow.com/rest/n/feed/selecti
2022/02/23 15:12:08.354760 sps/sps.go:697 WARN connect to tcp parent  fail, ERR:http decoder data line err:POST http://apissl.gifshow.com/rest/n/feed/selecti from 61.144.147.226:61255

Judging from the log, this may be caused by a URL encoding problem when the proxy forwards the request.

Possible Solution

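  1. Changing the URL scheme from http to https makes the request succeed (but this does not match the intended request logic; I would still like to send the request over http).

  2. Keeping the URL scheme and switching the proxy mode from http to socks5 also makes the request succeed (the crawler does not support socks5 proxies yet, so I would like to go back to http proxy mode).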

Context (Environment)

  1. proxy version is : commercial_11.4

  2. full command is : proxy sps -p :7777 --bind-ip ppp*:7777 --max-conns-rate=0 --authcode xxx --log ~/debug.log -a test:test --debug

  3. system is : CentOS 7.6

Possible Implementation

After careful investigation, I found that the device_info parameter in the URL params triggers the problem. Its value is:

MDAydQz1xvFS8JpWQG%252BPGQdmFaT598IvmxlXlmhjxWNGCc8KbRRyZ8j2TdNgOYWP7iJc70oEjgsT%250AvTtG%252FOfhCrGSp28i4XJKqNCf%252BANZn0LO2pEqVEBs16Nc%252B%252FgCx7u9zZpoTh7KPic0PUAoBt%252FD5np7%250AHjWslb5qgcxNVhniCQSftw4ynQXYtxB3ngZITiIy2fMgmJPrNUicFiAojXqh%252FYl50JIFhBXJmYug%250Ag%252BIKuJJ71rA%252B57rGl01ZJr%252BAfvPxk6Pj7biVHp1LtNDTPX4KugfqWlK5ZrpKaOgQ6XT7isRTFfIh%250Al9W5ELOO%252BfSq5IXOK0iS560N%252BgBx4K7mzBODBQcCwlmp6UslxWReYRrnFSIJ1L%252BRpKBt%252B%252BSjeorP%250A9HNFYwElSH73YJVqhwHA2lLN%252FGTYQK7Jfb8O4k%252Fw5ugoK0gAXUD%252FzeFQ%252BN5eksFdjcGg4qoTklXf%250A8gnQqci0xK%252BAOQUQNLMG6gj7LwiVY7Y4XsqRlBBAiiQt2hqRPFlLHlMmSXglFWjAYxiWKu2U4mSk%250Ak3Ipz8FIXWIx%250A

After URL decoding, the value turns out to contain the / character, which comes from %252F. If that character appears in the URL, it breaks the HTTP URL format.

Could this case be handled in a compatible way?


@snail007 commented on GitHub (Feb 23, 2022):

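proxy log:
image
curl details:
image
I sent the request using the curl command you provided and could not reproduce the problem you describe; the proxy works normally.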


@esingtse commented on GitHub (Feb 23, 2022):

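> proxy log: image curl details: image I sent the request using the curl command you provided and could not reproduce the problem you describe; the proxy works normally.

image
Following your startup command, proxy http -p :7777, the request sometimes succeeds, but the following also occurs:
Client side:
curl: (52) Empty reply from server
Server side:
2022/02/23 18:07:21.898899 http/http.go:638 WARN decoder error , from 61.144.144.15:57418, ERR:http decoder data line err:POST http://apissl.gifshow.com/rest/n/feed/selecti

Do proxy sps and proxy http behave differently?

Also, my version is commercial_11.4.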


@sayue2019 commented on GitHub (Feb 23, 2022):

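I ran into the same thing: when the URL contains the * character (Sogou WeChat links), requests do not work properly. This is a bug, right?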


@snail007 commented on GitHub (Feb 23, 2022):

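> I ran into the same thing: when the URL contains the * character (Sogou WeChat links), requests do not work properly. This is a bug, right?

It is not a character problem and has nothing to do with characters. The cause has been located and is fixed in the next version.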


@snail007 commented on GitHub (Feb 23, 2022):

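Next version:
1. Improved the http/sps proxy: added the --http-header-buffer parameter (unit: bytes), which sets the buffer size used to read the HTTP header, to support cases where the HTTP header is very large. The default is 4096.
2. Improved the http/sps proxy: added the --http-header-timeout parameter (unit: milliseconds), which sets the timeout for reading the HTTP header. The default is 1000 ms.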


@esingtse commented on GitHub (Feb 24, 2022):

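> Next version: 1. Improved the http/sps proxy: added the --http-header-buffer parameter (unit: bytes), which sets the buffer size used to read the HTTP header, to support cases where the HTTP header is very large. The default is 4096. 2. Improved the http/sps proxy: added the --http-header-timeout parameter (unit: milliseconds), which sets the timeout for reading the HTTP header. The default is 1000 ms.

My only header is -H "Content-Type: application/x-www-form-urlencoded"

Could you explain how this relates to the buffer size and the timeout?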


@snail007 commented on GitHub (Feb 24, 2022):

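It has nothing to do with your headers. The error occurs because the program's buffer for reading the header defaults to 1K and your URL is longer than 1K, so the rest of the header cannot be read and therefore cannot be parsed. It has nothing to do with what data or which URL you send.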


@baiheng commented on GitHub (Jun 16, 2022):

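One thing to watch out for; I modified the source of an earlier version. The header size being hard-coded to 4096 is one problem. A more serious problem is that Go's Read method returns even when the full packet is not ready yet. So when the request is large, the inevitable bug is that the data has not been read completely. The (*inConn).Read(buf[n:]) call in structs.go in your code is a bug. I looked at the fix and it is fairly involved: parse the Content-Length field in the HTTP header, then read the body that follows. FYI https://zhuanlan.zhihu.com/p/351174167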


@snail007 commented on GitHub (Jun 16, 2022):

This has already been fixed in the new version.
