Apache屏蔽爬虫yisouspider访问站点方法
时间:2023年06月04日
/来源:网络
/编辑:佚名
对于Apache和PHP代码屏蔽yisouspider的办法没有亲自测试,本站只采用了Nginx屏蔽yisouspider的办法,所以如果采用其他方法遇到问题的请留言求助。
Apache屏蔽爬虫yisouspider访问站点方法
1、通过修改 .htaccess文件
修改网站目录下的.htaccess,添加如下代码即可(2种代码任选):
可用代码 (1):
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (^$|yisouspider|FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms) [NC]
RewriteRule ^(.*)$ - [F]
可用代码 (2):
SetEnvIfNoCase ^User-Agent$ .*(yisouspider|FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms) BADBOT
Order Allow,Deny
Allow from all
Deny from env=BADBOT
2、通过修改httpd.conf配置文件
找到如下类似位置,根据以下代码 新增 / 修改,然后重启Apache即可:
DocumentRoot /home/wwwroot/xxx
<Directory "/home/wwwroot/xxx">
SetEnvIfNoCase User-Agent ".*(yisouspider|FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms)" BADBOT
Order allow,deny
Allow from all
deny from env=BADBOT
</Directory>
以上是Apache屏蔽爬虫yisouspider访问站点方法
Apache屏蔽爬虫yisouspider访问站点方法
1、通过修改 .htaccess文件
修改网站目录下的.htaccess,添加如下代码即可(2种代码任选):
可用代码 (1):
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (^$|yisouspider|FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms) [NC]
RewriteRule ^(.*)$ - [F]
可用代码 (2):
SetEnvIfNoCase ^User-Agent$ .*(yisouspider|FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms) BADBOT
Order Allow,Deny
Allow from all
Deny from env=BADBOT
2、通过修改httpd.conf配置文件
找到如下类似位置,根据以下代码 新增 / 修改,然后重启Apache即可:
DocumentRoot /home/wwwroot/xxx
<Directory "/home/wwwroot/xxx">
SetEnvIfNoCase User-Agent ".*(yisouspider|FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms)" BADBOT
Order allow,deny
Allow from all
deny from env=BADBOT
</Directory>
以上是Apache屏蔽爬虫yisouspider访问站点方法
新闻资讯 更多
- 【建站知识】查询nginx日志状态码大于400的请求并打印整行04-03
- 【建站知识】Python中的logger和handler到底是个什么?04-03
- 【建站知识】python3拉勾网爬虫之(您操作太频繁,请稍后访问)04-03
- 【建站知识】xpath 获取meta里的keywords及description的方法04-03
- 【建站知识】python向上取整以50为界04-03
- 【建站知识】scrapy xpath遇见乱码解决04-03
- 【建站知识】scrapy爬取后中文乱码,解决word转为html 时cp1252编码问题04-03
- 【建站知识】scrapy采集—爬取中文乱码,gb2312转为utf-804-03