Bad bots ...
... with a cavalier attitude to robots.txt
Note to errant botmeisters: bots that observe the robots.txt standard should not crawl sites where the file contains a 'deny all' setting, unless there is a prior 'allow' setting for that bot. A 'deny all' setting is the consecutive lines 'User-agent: *' and 'Disallow: /'; an 'allow' setting is the consecutive lines 'User-agent: Thatbot' and 'Disallow:'.
Webmasters are not required to individually block the myriad unwanted bots. RTFM!
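For botmeisters who want to check their own logic, here is a minimal sketch of the 'deny all' and 'allow' settings described above, evaluated with the robots.txt parser in Python's standard library. 'Thatbot' is the placeholder name used above; the example.org URL and the helper name may_crawl are assumptions made for illustration, and other parsers may differ in detail.

```python
from urllib.robotparser import RobotFileParser

# An 'allow' record for the placeholder bot 'Thatbot', followed by
# the 'deny all' record that applies to every other bot.
ROBOTS_TXT = """\
User-agent: Thatbot
Disallow:

User-agent: *
Disallow: /
"""


def may_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    may_crawl is a hypothetical helper, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.org is a stand-in for the site being crawled.
    for bot in ("Thatbot", "AhrefsBot"):
        print(bot, may_crawl(bot, "https://example.org/page.html"))
```

With this file, Thatbot is permitted and every other bot is refused, which is the behaviour a well-behaved bot is expected to follow.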
Some bots are also marked 'Unwanted', for the reason given; 'SetEnvIf User-Agent' rules return 'Not Allowed' to them.
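As a rough illustration of the 'SetEnvIf User-Agent' approach, the sketch below generates one rule line per unwanted bot name. The environment variable name bad_bot and the function name setenvif_rules are assumptions, not taken from this site's configuration; the server still needs a separate access rule that refuses requests carrying that variable, which is not shown here.

```python
import re


def setenvif_rules(bot_names, env_var="bad_bot"):
    """Return one Apache 'SetEnvIf User-Agent' line per bot name.

    env_var is a hypothetical variable name; use whatever the
    server's access rules actually test for.
    """
    rules = []
    for name in bot_names:
        # Escape regex metacharacters so the name is matched literally.
        pattern = re.escape(name)
        rules.append(f'SetEnvIf User-Agent "{pattern}" {env_var}')
    return rules


if __name__ == "__main__":
    for line in setenvif_rules(["AhrefsBot", "MJ12bot", "serpstatbot"]):
        print(line)
```

SetEnvIf matching is case-sensitive, so the names must be spelled as they appear in the UA strings listed below.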
Last Updated: 2021.01.14
adscanner
UA string: Mozilla/5.0 (compatible; adscanner/)/1.0 (http://seocompany.store; spider@seocompany.store)
Website: seocompany.store
FFS! Website is insecure, served over http! It serves no visible content. It appears to be connected to 'GoDaddy', the domain registrar; it tries to load JavaScript from godaddy.com.
Unwanted: observes robots.txt standard, but assumed to serve commercial clients.
AhrefsBot/6.1
UA string: Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)
Website: ahrefs.com/robot/
Unwanted: serves commercial clients.
AlphaBot/3.2
UA string: Mozilla/5.0 (compatible; AlphaBot/3.2; +http://alphaseobot.com/bot.html)
Website: alphaseobot.com
Doesn't ask for robots.txt, so can't possibly observe it!
Applebot/0.1
UA string: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)
Website: www.apple.com/go/applebot
Asks for robots.txt, but fails to observe it!
AwarioSmartBot/1.0
UA string: AwarioSmartBot/1.0 (+https://awario.com/bots.html; bots@awario.com)
Website: awario.com/bots.html
Doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: serves commercial clients.
Baiduspider-render
UA string: Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)
Website: www.baidu.com/search/spider.html
FFS! Website is insecure, served over http!
Doesn't ask for robots.txt, so can't possibly observe it!
Baiduspider/2.0
UA string: Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
Website: www.baidu.com/search/spider.html
FFS! Website is insecure, served over http!
Doesn't ask for robots.txt, so can't possibly observe it!
BingPreview/1.0b
UA string: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b
Website: None
Doesn't ask for robots.txt, so can't possibly observe it!
BLEXBot/1.0
UA string: Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
Website: webmeup-crawler.com
FFS! Website is insecure, served over http!
Doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: serves commercial clients.
BorneoBot/0.7.1
UA string: BorneoBot/0.7.1 (crawlcheck123@gmail.com)
Website: None
Asks for robots.txt, but fails to observe it!
coccocbot-image/1.0
UA string: Mozilla/5.0 (compatible; coccocbot-image/1.0; +http://help.coccoc.com/searchengine)
Website: help.coccoc.com/searchengine
Asks for robots.txt, but fails to observe it!
com.tinyspeck.chatlyio
UA string: com.tinyspeck.chatlyio/20.04.20 (iPhone; iOS 13.4.1; Scale/3.00)
Website: None
Doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: 'Slack' app for iOS; serves commercial clients.
DuckDuckBot-Https/1.1
UA string: Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot)
Website: duckduckgo.com/duckduckbot
Doesn't ask for robots.txt, so can't possibly observe it!
e.ventures Investment Crawler
UA string: e.ventures Investment Crawler (eventures.vc)
Website: None
Doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: presumably serves commercial clients.
GarlikCrawler/1.2
UA string: GarlikCrawler/1.2 (http://garlik.com/, crawler@garlik.com)
Website: garlik.com
Doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: serves commercial clients.
Gather Analyze Provide
UA string: https://gdnplus.com:Gather Analyze Provide.
Website: gdnplus.com
Doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: apparently serves commercial clients.
Gluten Free Crawler/1.0
UA string: Mozilla/5.0 (compatible; Gluten Free Crawler/1.0; +http://glutenfreepleasure.com/)
Website: glutenfreepleasure.com
Doesn't ask for robots.txt, so can't possibly observe it!
Googlebot-Image/1.0
UA string: Googlebot-Image/1.0
Website: None
Doesn't ask for robots.txt, so can't possibly observe it!
HealthCheckBot/0.2
UA string: HealthCheckBot/0.2
Website: None specified, though a search reveals pypi.org/project/healthcheckbot as probably relevant.
Doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: it seems to have no business crawling third-party sites.
HTTP Banner Detection
UA string: HTTP Banner Detection (https://security.ipip.net)
Website: security.ipip.net
Website address appears invalid.
Doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: until it provides a valid website address!
HubSpot Links Crawler 2.0
UA string: HubSpot Links Crawler 2.0 http://www.hubspot.com/
Website: www.hubspot.com
Unwanted: serves commercial clients.
hypestat/1.0
UA string: Mozilla/5.0 (compatible; hypestat/1.0; +https://hypestat.com/bot)
Website: hypestat.com/bot
Doesn't ask for robots.txt, so can't possibly observe it!
Internet-structure-research-project-bot
UA string: Internet-structure-research-project-bot
Website: None
Doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: until it provides more information about its purpose.
iodc; odysseus
UA string: Mozilla/5.0 (iodc; odysseus 3352-131-011119113358-349; +https://iodc.co.uk)
Website: iodc.co.uk
Doesn't ask for robots.txt, so can't possibly observe it!
ips-agent
UA string: Mozilla/5.0 (compatible; ips-agent)
Website: None
Apparently used by Verisign (who run the .com and .net domain name servers) to assess traffic on domains known by them to be expiring, using this data to help sell potentially valuable 'busy' domains to bulk buyers at other registrars.
Asks for robots.txt, but fails to observe it!
LightspeedSystemsCrawler
UA string: LightspeedSystemsCrawler Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US
Website: None
Doesn't ask for robots.txt, so can't possibly observe it!
linkdexbot/2.0
UA string: Mozilla/5.0 (compatible; linkdexbot/2.0; +http://www.linkdex.com/bots/)
Website: www.linkdex.com/bots/
Doesn't ask for robots.txt, so can't possibly observe it!
linkdexbot/2.2
UA string: Mozilla/5.0 (compatible; linkdexbot/2.2; +http://www.linkdex.com/bots/)
Website: www.linkdex.com/bots/
Asks for robots.txt, but fails to observe it!
ltx71
UA string: ltx71 - (http://ltx71.com/)
Website: ltx71.com
FFS! Website is insecure, served over http!
Asks for robots.txt, but fails to observe it!
Mail.RU_Bot/Robots/2.0
UA string: Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/Robots/2.0; +http://go.mail.ru/help/robots)
Website: go.mail.ru/help/robots
Asks for robots.txt, but fails to observe it!
masscan 1.0
UA string: masscan 1.0 (http:www)
Website: None
Doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: until it provides a valid website address!
masscan/1.0
UA string: masscan/1.0 (https://github.com/robertdavidgraham/masscan)
Website: github.com/robertdavidgraham/masscan
Doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: promiscuous port scanner, more likely used for harm than good.
MegaIndex.ru/2.0
UA string: 5.9.98.178 Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
Website: megaindex.com/crawler
Doesn't ask for robots.txt, so can't possibly observe it!
MJ12bot/v1.4.8
UA string: Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
Website: mj12bot.com
Unwanted: serves commercial clients, for marketing purposes.
msnbot/2.0b
UA string: msnbot/2.0b (+http://search.msn.com/msnbot.htm)
Website: search.msn.com/msnbot.htm
Doesn't ask for robots.txt, so can't possibly observe it!
NetSystemsResearch
UA string: NetSystemsResearch studies the availability of various services across the internet. Our website is netsystemsresearch.com
Website: netsystemsresearch.com
Website address appears invalid.
Doesn't ask for robots.txt, so can't possibly observe it!
Nimbostratus-Bot/v1.3.2
UA string: Mozilla/5.0 (compatible; Nimbostratus-Bot/v1.3.2; http://cloudsystemnetworks.com)
Website: cloudsystemnetworks.com
FFS! Website is insecure, served over http!
Doesn't ask for robots.txt, so can't possibly observe it!
Nmap Scripting Engine
UA string: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)
Website: nmap.org/book/nse.html
Doesn't ask for robots.txt, so can't possibly observe it!
nsrbot/1.0
UA string: Mozilla/5.0 (compatible; nsrbot/1.0; +http://netsystemsresearch.com)
Website: netsystemsresearch.com
FFS! Website is insecure, served over http!
Doesn't ask for robots.txt, so can't possibly observe it!
oBot/2.3.1
UA string: Mozilla/5.0 (compatible; oBot/2.3.1; http://filterdb.iss.net/crawler/)
Website: filterdb.iss.net/crawler
FFS! Website is insecure, served over http!
Asks for robots.txt, but fails to observe it!
Plukkie/1.6
UA string: Mozilla/5.0 (compatible; Plukkie/1.6; http://www.botje.com/plukkie.htm)
Website: www.botje.com/plukkie.htm
FFS! Website is insecure, served over http!
Asks for robots.txt, but fails to observe it!
probethenet
UA string: www.probethenet.com scanner
Website: www.probethenet.com
FFS! Website is insecure, served over http!
Doesn't ask for robots.txt, so can't possibly observe it!
Riddler
UA string: Riddler (http://riddler.io/about)
Website: riddler.io/about
Asks for robots.txt, but fails to observe it!
SafeDNSBot
UA string: SafeDNSBot (https://www.safedns.com/searchbot)
Website: www.safedns.com/searchbot
Asks for robots.txt, but fails to observe it!
Seekport Crawler
UA string: Mozilla/5.0 (compatible; Seekport Crawler; http://seekport.com/
Website: seekport.com
FFS! Website is insecure, served over http!
Asks for robots.txt, but fails to observe it!
SemanticScholarBot
UA string: Mozilla/5.0 (compatible) SemanticScholarBot (+https://www.semanticscholar.org/crawler)
Website: www.semanticscholar.org/crawler
Doesn't ask for robots.txt, so can't possibly observe it!
SEMrushBot
UA string: SEMrushBot
Website: None
Doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: assumed to serve commercial clients.
SemrushBot-BA
UA string: Mozilla/5.0 (compatible; SemrushBot-BA; +http://www.semrush.com/bot.html)
Website: www.semrush.com/bot.html
Doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: serves commercial clients.
SemrushBot/1.0~bm
UA string: Mozilla/5.0 (compatible; SemrushBot/1.0~bm; +http://www.semrush.com/bot.html)
Website: www.semrush.com/bot.html
Doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: serves commercial clients.
SemrushBot/6~bl
UA string: Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)
Website: www.semrush.com/bot.html
Asks for robots.txt, but fails to observe it!
Unwanted: serves commercial clients.
SEOkicks-Robot
UA string: Mozilla/5.0 (compatible; SEOkicks-Robot; +http://www.seokicks.de/robot.html)
Website: www.seokicks.de/robot.html
Doesn't ask for robots.txt, so can't possibly observe it!
serpstatbot/1.0
UA string: serpstatbot/1.0 (advanced backlink tracking bot; curl/7.58.0; http://serpstatbot.com/; abuse@serpstatbot.com)
Website: serpstatbot.com
Unwanted: serves commercial clients.
Slack-ImgProxy
UA string: Slack-ImgProxy (+https://api.slack.com/robots)
Website: api.slack.com/robots
Explicitly doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: serves commercial clients.
Slackbot 1.0
UA string: Slackbot 1.0 (+https://api.slack.com/robots)
Website: api.slack.com/robots
Explicitly doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: serves commercial clients.
Slackbot-LinkExpanding 1.0
UA string: Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)
Website: api.slack.com/robots
Explicitly doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: serves commercial clients.
startmebot/1.0
UA string: Mozilla/5.0 (compatible; startmebot/1.0; +https://start.me/bot)
Website: start.me/bot
Explicitly doesn't ask for robots.txt, so can't possibly observe it!
SurdotlyBot/1.0
UA string: Mozilla/5.0 (compatible; SurdotlyBot/1.0; +http://sur.ly/bot.html
Website: sur.ly/bot.html
FFS! Website is insecure, served over http!
Doesn't ask for robots.txt, so can't possibly observe it!
TelegramBot
UA string: TelegramBot (like TwitterBot)
Website: None
Doesn't ask for robots.txt, so can't possibly observe it! (So, not like Twitterbot/1.0!)
Unwanted: until it provides a valid website address!
TprAdsTxtCrawler/1.0
UA string: TprAdsTxtCrawler/1.0
Website: None
Started appearing in my server logs 19.04.2020, with HEAD request for 'ads.txt'.
Doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: until it provides a source of information about itself.
TrendsmapResolver/0.1
UA string: Mozilla/5.0 (compatible; TrendsmapResolver/0.1)
Website: None
Doesn't ask for robots.txt, so can't possibly observe it!
TweetmemeBot/4.0
UA string: Mozilla/5.0 (TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0
Website: datasift.com/bot.html
Explicitly doesn't ask for robots.txt, so can't possibly observe it!
Unwanted: serves commercial clients.
unfurlist
UA string: unfurlist (https://github.com/Doist/unfurlist)
Website: github.com/Doist/unfurlist
Doesn't ask for robots.txt, so can't possibly observe it!
Xovibot/2.0
UA string: Mozilla/5.0 (compatible; XoviBot/2.0; +http://www.xovibot.net/
Website: www.xovibot.net/
FFS! Website is insecure, served over http!
Doesn't ask for robots.txt, so can't possibly observe it!
Yahoo! Slurp
UA string: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Website: help.yahoo.com/help/us/ysearch/slurp
Doesn't ask for robots.txt, so can't possibly observe it!
YandexMobileBot/3.0
UA string: Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B411 Safari/600.1.4 (compatible; YandexMobileBot/3.0; +http://yandex.com/bots)
Website: yandex.com/bots
Doesn't ask for robots.txt, so can't possibly observe it!
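The two recurring verdicts in the entries above ('Doesn't ask for robots.txt' and 'Asks for robots.txt, but fails to observe it') could be derived from a server access log roughly as sketched below. This is not the method used to compile this list. It assumes the log is in Apache Combined Log Format and that the site's robots.txt is 'deny all', so any request other than for /robots.txt counts as a crawl. The bot names and log lines in the sample are made up, and the order of requests is ignored.

```python
import re

# Combined Log Format: host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

NO_ASK = "Doesn't ask for robots.txt"
IGNORES = "Asks for robots.txt, but fails to observe it"
OBSERVES = "Asks for robots.txt and observes it"


def classify(log_lines):
    """Map each user-agent string to a verdict, assuming a 'deny all' robots.txt."""
    asked, crawled = set(), set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent")
        path = match.group("path").split("?")[0]
        if path == "/robots.txt":
            asked.add(agent)
        else:
            crawled.add(agent)
    verdicts = {}
    for agent in asked | crawled:
        if agent not in asked:
            verdicts[agent] = NO_ASK
        elif agent in crawled:
            verdicts[agent] = IGNORES
        else:
            verdicts[agent] = OBSERVES
    return verdicts


# Made-up sample lines with hypothetical bot names.
SAMPLE_LOG = [
    '203.0.113.5 - - [14/Jan/2021:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 26 "-" "GoodBot/1.0"',
    '203.0.113.9 - - [14/Jan/2021:10:01:00 +0000] "GET /robots.txt HTTP/1.1" 200 26 "-" "RudeBot/2.0"',
    '203.0.113.9 - - [14/Jan/2021:10:01:05 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "RudeBot/2.0"',
    '203.0.113.7 - - [14/Jan/2021:10:02:00 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "SneakyBot/0.1"',
]

if __name__ == "__main__":
    for agent, verdict in sorted(classify(SAMPLE_LOG).items()):
        print(f"{agent}: {verdict}")
```

A stricter check would only count crawl requests made after the bot fetched robots.txt; that refinement is left out to keep the sketch short.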