[GH-ISSUE #147] Is waf_verify_bot case-sensitive, or does it support regular expressions? #107

Closed
opened 2026-03-04 12:18:58 +03:00 by kerem · 5 comments
Owner

Originally created by @xyz5s on GitHub (Nov 19, 2024).
Original GitHub issue: https://github.com/ADD-SP/ngx_waf/issues/147

ngx_waf: https://hub.docker.com/layers/addsp/ngx_waf-prebuild/ngx-1.25.4-module-current-glibc
nginx version: nginx/1.25.4
conf: waf_verify_bot strict GoogleBot googlebot BingBot BaiduSpider YandexBot ;

error:
curl -I 127.0.0.1 -H 'User-Agent: BaiduSpider'
HTTP/1.1 200 OK
Server: nginx/1.25.4
Date: Tue, 19 Nov 2024 06:25:13 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Tue, 19 Nov 2024 05:54:38 GMT
Connection: keep-alive
ETag: "673c281e-267"
Accept-Ranges: bytes

curl -I 127.0.0.1 -H 'User-Agent: Baiduspider'
HTTP/1.1 403 Forbidden
Server: nginx/1.25.4
Date: Tue, 19 Nov 2024 06:25:10 GMT
Content-Type: text/html
Content-Length: 153
Connection: keep-alive

curl -I 127.0.0.1 -H 'User-Agent: googlebot'
HTTP/1.1 200 OK
Server: nginx/1.25.4
Date: Tue, 19 Nov 2024 06:25:44 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Tue, 19 Nov 2024 05:54:38 GMT
Connection: keep-alive
ETag: "673c281e-267"
Accept-Ranges: bytes

curl -I 127.0.0.1 -H 'User-Agent: Googlebot'
HTTP/1.1 403 Forbidden
Server: nginx/1.25.4
Date: Tue, 19 Nov 2024 06:25:48 GMT
Content-Type: text/html
Content-Length: 153
Connection: keep-alive

log:
2024/11/19 06:25:10 [alert] 81602#81602: *12 ngx_waf: [FAKE-BOT][Baiduspider] while logging request, client: 127.0.0.1, server: localhost, request: "HEAD / HTTP/1.1", host: "127.0.0.1"
2024/11/19 06:25:48 [alert] 81602#81602: *15 ngx_waf: [FAKE-BOT][GoogleBot] while logging request, client: 127.0.0.1, server: localhost, request: "HEAD / HTTP/1.1", host: "127.0.0.1"

kerem closed this issue 2026-03-04 12:18:58 +03:00

@xyz5s commented on GitHub (Nov 19, 2024):

waf_verify_bot strict GoogleBot googlebot BingBot BaiduSpider YandexBot ;

curl -I 127.0.0.1 -H 'User-Agent: googlebot'
HTTP/1.1 200 OK
Server: nginx/1.25.4
Date: Tue, 19 Nov 2024 07:19:37 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Tue, 19 Nov 2024 05:54:38 GMT
Connection: keep-alive
ETag: "673c281e-267"
Accept-Ranges: bytes


@ADD-SP commented on GitHub (Nov 20, 2024):

https://github.com/ADD-SP/ngx_waf/blob/acbf861c0b270f4bd42b70860fbd0e74d6c8271b/src/ngx_http_waf_module_config.c#L975-L993

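You would probably need to edit the source code by hand to get this behaviour; the regular expressions that are currently hard-coded are case-sensitive.


That said, this requirement sounds a bit unusual. Is there a special use case? Search engines' UAs normally don't change arbitrarily.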
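
To illustrate the difference between case-sensitive and case-insensitive matching, here is a minimal Python sketch. It is not ngx_waf's code: the `BOT_PATTERNS` table and the `identify_bot` helper are hypothetical, and the two patterns are assumptions picked to match the spellings seen in the log lines of this issue. The real patterns are in the source file linked above.

```python
import re

# Hypothetical stand-ins for the module's hard-coded User-Agent patterns.
# These are assumptions for illustration, not copied from ngx_waf.
BOT_PATTERNS = {
    "GoogleBot": r"Googlebot",
    "Baiduspider": r"Baiduspider",
}


def identify_bot(user_agent, ignore_case=False):
    """Return the name of the first bot whose pattern matches, else None."""
    flags = re.IGNORECASE if ignore_case else 0
    for name, pattern in BOT_PATTERNS.items():
        if re.search(pattern, user_agent, flags):
            return name
    return None


# Compare both matching modes for the four User-Agent values from the issue.
for ua in ("Googlebot", "googlebot", "Baiduspider", "BaiduSpider"):
    print(ua, identify_bot(ua), identify_bot(ua, ignore_case=True))
```

Under these assumptions, `googlebot` and `BaiduSpider` are not recognised as bots by the case-sensitive patterns, so no verification would be attempted for them. That would be consistent with the 200 responses reported in the issue, while `Googlebot` and `Baiduspider` sent from 127.0.0.1 are recognised, fail verification, and get 403.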


@xyz5s commented on GitHub (Nov 21, 2024):

Isn't part of a WAF's job to block crawlers? If this doesn't count as a bug, can't I simply spoof the UA to bypass the WAF?


@ADD-SP commented on GitHub (Nov 24, 2024):

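> Isn't part of a WAF's job to block crawlers? If this doesn't count as a bug, can't I simply spoof the UA to bypass the WAF?

@xyz5s Please see strict mode, which can verify a bot's identity through reverse DNS.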

https://add-sp.github.io/ngx_waf-docs/zh-cn/advance/directive.html#waf-verify-bot
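
For readers unfamiliar with the idea, reverse-DNS bot verification can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not ngx_waf's implementation: the function names and the `ALLOWED_SUFFIXES` tuple are hypothetical, and the forward-confirmation step (resolving the hostname back to the client IP) is a common practice that this thread does not confirm the module performs.

```python
import socket

# Assumed hostname suffixes for a genuine Googlebot; illustrative only.
ALLOWED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the IPv4 addresses hostname resolves to (empty on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def is_verified_bot(ip, suffixes=ALLOWED_SUFFIXES,
                    reverse=reverse_lookup, forward=forward_lookup):
    """Check that ip reverse-resolves to an allowed domain and back again."""
    host = reverse(ip)
    if host is None or not host.lower().endswith(suffixes):
        return False
    # Forward confirmation: the hostname must resolve back to the same IP.
    return ip in forward(host)
```

A client that only sets `User-Agent: Googlebot` cannot pass this kind of check, because the PTR record of its IP address is outside its control. The `reverse` and `forward` parameters exist only so the resolvers can be replaced in tests.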


@xyz5s commented on GitHub (Nov 26, 2024):

OK.
