[GH-ISSUE #4038] Maintainers, has this any chance of being merged ? #2633

Open
opened 2026-02-26 07:36:16 +03:00 by kerem · 3 comments
Originally created by @AnnoyingTechnology on GitHub (Oct 7, 2024).
Original GitHub issue: https://github.com/NginxProxyManager/nginx-proxy-manager/issues/4038

Love this project and its openappsec-enabled fork (https://github.com/openappsec/open-appsec-npm).

I would like to provide a PR for some enhancements regarding privacy/security.

Before doing so, could the maintainers tell me whether it has any chance of being merged?

Added features:

  • Toggle per vhost "Refuse Indexing" ([i] request that crawling bots don't index this host)
    • expose a /robots.txt containing Disallow: /
    • add the header X-Robots-Tag: noindex, nofollow
add_header X-Robots-Tag "noindex, nofollow";
location = /robots.txt {
  default_type "text/plain";
  return 200 'User-agent: *\nDisallow: /\n';
}
  • Toggle per vhost "Block Indexing" ([i] block crawling bots such as Googlebot) [1]
    • return 404 to user agents known to be crawlers. Useful against bots that disregard robots.txt
if ($http_user_agent ~* (catexplorador|CensysInspect|blexbot|smtbot|nimbostratus|nmap|BlackWidow|ChinaClaw|Custo|DISCo|Download|Demon|eCatch|EirGrabber|EmailSiphon|EmailWolf|SuperHTTP|Surfbot|WebWhacker|Express|WebPictures|ExtractorPro|EyeNetIE|FlashGet|GetRight|GetWeb!|Go!Zilla|Go-Ahead-Got-It|GrabNet|Grafula|HMView|Go!Zilla|Go-Ahead-Got-It|rafula|HMView|HTTrack|Stripper|Sucker|Indy|InterGET|Ninja|JetCar|Spider|larbin|LeechFTP|Downloader|tool|Navroad|NearSite|NetAnts|tAkeOut|WWWOFFLE|GrabNet|NetSpider|Vampire|NetZIP|Octopus|Offline|PageGrabber|Foto|pavuk|pcBrowser|RealDownload|ReGet|SiteSnagger|SmartDownload|SuperBot|WebSpider|Teleport|VoidEYE|Collector|WebAuto|WebCopier|WebFetch|WebGo|WebLeacher|WebReaper|WebSauger|eXtractor|Quester|WebStripper|WebZIP|Widow|Zeus|WebCollector|WebCopy|webcraw|ahrefsbot|alexibot|appengine|aqua_products|archive.org_bot|archive|asterias|attackbot|b2w|backdoorbot|becomebot|blackwidow|blekkobot|blowfish|botalot|builtbottough|bullseye|bunnyslippers|cipacrawler|cliqzbot|coccocbot|domaincheck|lightspeed|linkdex|masscan|megaindex|ccbot|cheesebot|cherrypicker|chinaclaw|chroot|clshttp|collector|control|copernic|copyrightcheck|copyscape|cosmos|craftbot|crescent|custo|demon|disco|dittospyder|dotbot|download|downloader|dumbot|ecatch|eirgrabber|email|emailcollector|emailsiphon|emailwolf|enterprise_search|erocrawler|eventmachine|exabot|express|extractor|extractorpro|eyenetie|fairad|flaming|flashget|foobot|foto|gaisbot|getright|getty|getweb!|gigabot|github|go!zilla|go-ahead-got-it|go-http-client|grabnet|grafula|grub|hari|harvest|hatena|antenna|hloader|hmview|htmlparser|httrack|humanlinks|ia_archiver|indy|infonavirobot|interget|intraformant|iron33|jamesbot|jennybot|jetbot|jetcar|joc|jorgee|kenjin|keyword|larbin|leechftp|lexibot|library|libweb|linkextractorpro|linkpadbot|linkscan|linkwalker|lnspiderguy|looksmart|lwp-trivial|mass|mata|midown|miixpc|mister|netcraft|netestate|nsrbot|mj12bot|moget|msiecrawler|naver|navroad|nearsite|nerdybot|netants|netmechanic|netspider|netzip|nicerspro|ninja|nutch|octopus|offline|openbot|openfind|openlink|pagegrabber|papa|pavuk|pcbrowser|perman|picscout|propowerbot|prowebwalker|psbot|queryn|quester|radiation|realdownload|reget|retriever|seekport|rogerbot|scan|screaming|frog|scooter|searchengineworld|searchpreview|semrush|semrushbot|semrushbot-sa|sogou|xovibot|seokicks-robot|sitesnagger|smartdownload|sootle|spankbot|spanner|spbot|stanford|stripper|superbot|superhttp|surfbot|surveybot|suzuran|szukacz|takeout|teleport|telesoft|thenomad|tocrawl|true_robot|turingos|twengabot|typhoeus|url_spider_pro|urldispatcher|urly|vampire|vci|voideye|warning|webauto|webbandit|webcollector|webcopier|webcopy|webcraw|webenhancer|webfetch|webgo|webleacher|webmasterworld|webmasterworldforumbot|webpictures|webreaper|websauger|webspider|webster|webstripper|webvac|webviewer|webwhacker|webzip|webzip|wesee|widow|plukkie|probethenet|riddler|woobot|www-collector-e|wwwoffle|xenu|semrushbot|ahrefsbot) ) {
        return 404;
}
  • Toggle per vhost "Block Scanners" ([i] block common scanners)
    • return 403 to user agents known to be security scanners
if ($http_user_agent ~* (htmlparser|CensysInspect|libwww|Python|perl|urllib|scan|Curl|email|PycURL|Pyth|PyQ|Wget|wget|okhttp|libwww|Wget|LWP|damnBot|BBBike|java|detection|dirbuster) ) {
	return 403;
}
  • Toggle per vhost "Block AI" ([i] block bots relating to AI/LLM training)
    • return a 404 to a bunch of user agents known to be LLM/AI crawlers
if ($http_user_agent ~* "(AI2Bot|Ai2Bot-Dolma|Amazonbot|Applebot|Applebot-Extended|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|Diffbot|FacebookBot|FriendlyCrawler|GPTBot|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|ICC-Crawler|ISSCyberRiskCrawler|ImagesiftBot|Kangaroo Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|PerplexityBot|PetalBot|Scrapy|Sidetrade indexer bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot|anthropic-ai|cohere-ai|facebookexternalhit|iaskspider/2.0|img2dataset|omgili|omgilibot)") {
        return 404;
}
  • Toggle per vhost "Block commonly probed URLs" ([i] return 403 to many commonly probed URLs. Warning! This will break default WordPress, Joomla and possibly some other software)
    • return 403 to URLs typically scanned by dirbusters, script kiddies and bots. This lets fail2ban enforce an IP-level ban after excessive 403 responses, a particularly effective technique.
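
The user-agent checks above could also be written with a `map`, which keeps the pattern list in one place instead of repeating a long regex in each server block. The following is only a minimal sketch, not the proposed implementation: it assumes the `map` can be placed in the `http` context, uses a hypothetical variable name `$block_ai_bot` and a placeholder host, and shortens the pattern list for illustration.

```nginx
# http context: classify the User-Agent once per request.
# $block_ai_bot is a hypothetical variable name chosen for this sketch.
map $http_user_agent $block_ai_bot {
    default 0;
    # A pattern that contains spaces has to be quoted; unquoted, nginx
    # would split it into several tokens and reject the configuration.
    "~*(GPTBot|ClaudeBot|CCBot|Kangaroo Bot|Sidetrade indexer bot)" 1;
}

server {
    listen 80;
    server_name example.test;   # placeholder host

    # An empty string or "0" evaluates to false, so only flagged
    # user agents reach the return.
    if ($block_ai_bot) {
        return 404;
    }
}
```

Each toggle could get its own variable in the same way, so enabling or disabling a category for a vhost would come down to including or omitting one short `if` block.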

The last toggle ("Block commonly probed URLs") could probably be merged with the existing "Block Common Exploits" option. The URLs it would cover:

  • "//xmlrpc.php?rsd"
  • "//website/wp-includes/wlwmanifest.xml"
  • "//wp/wp-includes/wlwmanifest.xml"
  • "//blog/wp-includes/wlwmanifest.xml"
  • "//wp-includes/wlwmanifest.xml"
  • "/FileZilla.xml"
  • "/filezilla.xml"
  • "/sitemanager.xml"
  • "/WS_FTP.INI"
  • "/ws_ftp.ini"
  • "/deployment-config.json"
  • "/config/database.yml"
  • "/config/databases.yml"
  • "/lfm.php"
  • "/sqlite/main.php"
  • "/sqlitemanager/main.php"
  • "/SQLiteManager/main.php"
  • "/SQlite/main.php"
  • "/agSearch/SQlite/main.php"
  • "/HNAP1/"
  • "/getcfg.php"
  • "/jenkins/script"
  • "/mysqldumper/"
  • "/mysql/"
  • "/sql/"
  • "/phpMyAdmin-*"
  • "/phpMyAdmin-*"
  • "/hudson/script"
  • "/Joomla/administrator/"
  • "/joomla/administrator/"
  • "/status?full=true"
  • "/admin.php"
  • "/admin/login.php"
  • "/administrator/index.php"
  • "/ajaxproxy/proxy.php"
  • "/magmi/web/magmi.php"
  • "/wp-login.php"
  • "/dev/wp-admin/"
  • "/demo/wp-admin/"
  • "/backup/wp-admin/"
  • "/old/wp-admin/"
  • "/wp/wp-admin/"
  • "/new/wp-admin/"
  • "/wordpress/wp-admin/"
  • "/temp/wp-admin/"
  • "/blog/xmlrpc.php"
  • "/.git/"
  • "/.git/HEAD"
  • "//webconfig.txt.php"
  • "//administrator//webconfig.txt.php"
  • "///webconfig.txt.php"
  • "/bogusSkipfish-Inject:bogus"
  • "/sfi9876"
  • "/adminer-*"
  • "/.htaccess*"
  • "/+/skipfish-bom"
  • "/HoHTXVlJ*"
  • "/joomla/"
  • "/.env"
  • "/xmlrpc.php"
  • "/phpinfo.php"
  • "/phpsysinfo/"
  • "/phpmyadmin/"
  • "/login.php"
  • "/config.php"
  • "/config/"
  • "/data/"
  • "/lib/"
  • "/library/"
  • "/cgi/"
  • "/cgi.cgi/"
  • "/bin/"
  • "/phpMyAdmin/"
  • "/admin/cgi"
  • "/piwik/"
  • "/magento/"
  • "/cgi-bin/"
  • "/adm/"
  • "/administrator/"
  • "/3rdparty/phpmyadmin/"
  • "/pma/"
  • "/ownCloud/"
  • "/cms/"
  • "/index.pl"
  • "/index.cgi"
  • "/test/"
  • "/wordpress/"
  • "/cms/"
  • "/index.asp"
  • "/index.aspx"
  • "/index.action"
  • "/login.action"
  • "/manager/"
  • "/mantis/"
  • "/mantisbt/"
  • "/info.php"
  • "/info_php.php"
  • "/test.php"
  • "/admin.cgi"
  • "/login.pl"
  • "/data/owncloud.log"
  • "/data/owncloud.db"
  • "/.htpasswd"
  • "/.passwd"
  • "/private/"
  • "/phpBB/"
  • "/postnuke/"
  • "//wp-admin/admin-post.php*"
  • "//user/register/*"
  • "//wp-admin/admin-post.php*"
  • "/fckeditor/editor/filemanager/*"
  • "/shop/index.php/admin/"
  • "/store/index.php/admin/"
  • "/magento/index.php/admin/"
  • "/downloader/index.php"
  • "/errors/503.php"
  • "/shop/errors/503.php"
  • "/store/errors/503.php"
  • "/pub/errors/503.php"
  • "/magento2/pub/errors/503.php"
  • "/backup/admin.php"
  • "/backup/shell.php"
  • "/backup/shell.aspx"
  • "/backup/admin.aspx"
  • "/backup/admin.pl"
  • "/backup/admin.py"
  • "/.git/config"
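
A few of the paths above could be blocked with `location` rules along these lines. This is a sketch of one possible approach rather than the implementation planned for the PR. It covers only a handful of entries and assumes nginx's default `merge_slashes on`, under which a request for `//xmlrpc.php` is matched as `/xmlrpc.php`. Query strings such as `?rsd` are not part of location matching, so `/xmlrpc.php?rsd` is covered by the `/xmlrpc.php` rule.

```nginx
# server context: return 403 for a small sample of commonly probed paths.

# An exact match ends the location search immediately.
location = /xmlrpc.php   { return 403; }
location = /wp-login.php { return 403; }
location = /.env         { return 403; }

# Prefix match covering /.git/, /.git/HEAD and /.git/config;
# ^~ tells nginx not to check regex locations once this prefix wins.
location ^~ /.git/ { return 403; }

# Wildcard entries such as "/phpMyAdmin-*" and "/adminer-*" need a regex.
location ~ ^/(phpMyAdmin|adminer)- { return 403; }
```

The 403 responses produced this way are what a fail2ban filter would then count to decide on an IP-level ban.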

@kramttocs commented on GitHub (Feb 20, 2025):

Hey @AnnoyingTechnology
It would be great if this stuff could be added.
In the meantime, do you know if things like the Block AI bit can be added as-is to the Custom Nginx Configuration block for each proxy host? I haven't messed with that box so not clear on what it accepts.

if ($http_user_agent ~* (AI2Bot|Ai2Bot-Dolma|Amazonbot|Applebot|Applebot-Extended|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|Diffbot|FacebookBot|FriendlyCrawler|GPTBot|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|ICC-Crawler|ISSCyberRiskCrawler|ImagesiftBot|Kangaroo Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|PerplexityBot|PetalBot|Scrapy|Sidetrade indexer bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot|anthropic-ai|cohere-ai|facebookexternalhit|iaskspider/2.0|img2dataset|omgili|omgilibot) ) {
        return 404;
}


@github-actions[bot] commented on GitHub (Aug 28, 2025):

Issue is now considered stale. If you want to keep it open, please comment 👍


@AnnoyingTechnology commented on GitHub (Aug 28, 2025):

not stale.
