Bad bots ...

... with a cavalier attitude to robots.txt

Note to errant botmeisters: bots that observe the robots.txt standard should not crawl sites where the file contains a 'deny all' setting, unless there is a prior 'allow' setting for that bot. A 'deny all' setting is the consecutive lines 'User-agent: *' and 'Disallow: /'; an 'allow' setting is the consecutive lines 'User-agent: Thatbot' and 'Disallow:'.
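As a rough illustration of the rule above, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content, the bot names 'Thatbot' and 'Otherbot', and the example.com URL are all made up for the example; it is not the file served by this site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an 'allow' record for Thatbot, then 'deny all'.
ROBOTS_TXT = """\
User-agent: Thatbot
Disallow:

User-agent: *
Disallow: /
"""


def may_crawl(user_agent: str, url: str = "https://example.com/page.html") -> bool:
    """Return True if the sample robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print("Thatbot:", may_crawl("Thatbot"))    # has its own 'allow' record
    print("Otherbot:", may_crawl("Otherbot"))  # falls through to 'deny all'
```

A bot that behaves like this parser would crawl only if it were 'Thatbot'; every other bot should stop after reading the file.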

Webmasters are not required to individually block the myriad unwanted bots. RTFM!

Some bots are also marked 'Unwanted', for the reason given in their entry; for these, 'SetEnvIf User-Agent' rules return 'Not Allowed'.
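The actual blocking here is done by the web server's 'SetEnvIf User-Agent' rules, which are not reproduced on this page. Purely to show the idea, the following Python sketch matches a User-Agent header against a short blocklist. The pattern list is a hypothetical sample drawn from names in this list, and the 403 status is an assumption, since the text above only says 'Not Allowed'.

```python
import re

# Hypothetical sample of blocked names; not the site's real rule set.
UNWANTED_PATTERNS = ["AhrefsBot", "MJ12bot", "SemrushBot", "serpstatbot"]

# Case-insensitive matching, which is closer to SetEnvIfNoCase than to SetEnvIf.
UNWANTED_RE = re.compile("|".join(re.escape(p) for p in UNWANTED_PATTERNS), re.IGNORECASE)


def status_for(user_agent: str) -> int:
    """Return 403 for a blocked User-Agent, else 200 (both codes are assumptions)."""
    return 403 if UNWANTED_RE.search(user_agent) else 200


if __name__ == "__main__":
    print(status_for("Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)"))
    print(status_for("Mozilla/5.0 (X11; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0"))
```

Matching on a substring of the name rather than the full UA string means that a version bump, such as AhrefsBot/6.1 becoming a later release, does not slip past the rule.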

Last Updated: 2020.06.05

AhrefsBot/6.1

UA string: Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)

Website: ahrefs.com/robot/

Unwanted: serves commercial clients.

AlphaBot/3.2

UA string: Mozilla/5.0 (compatible; AlphaBot/3.2; +http://alphaseobot.com/bot.html)

Website: alphaseobot.com

Doesn't ask for robots.txt, so can't possibly observe it!

Applebot/0.1

UA string: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)

Website: www.apple.com/go/applebot

Asks for robots.txt, but fails to observe it!

AwarioSmartBot/1.0

UA string: AwarioSmartBot/1.0 (+https://awario.com/bots.html; bots@awario.com)

Website: awario.com/bots.html

Doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: serves commercial clients.

BingPreview/1.0b

UA string: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b

Website: None

Doesn't ask for robots.txt, so can't possibly observe it!

BLEXBot/1.0

UA string: Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)

Website: webmeup-crawler.com

FFS! Website is insecure, served over http!

Doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: serves commercial clients.

BorneoBot/0.7.1

UA string: BorneoBot/0.7.1 (crawlcheck123@gmail.com)

Website: None

Asks for robots.txt, but fails to observe it!

coccocbot-image/1.0

UA string: Mozilla/5.0 (compatible; coccocbot-image/1.0; +http://help.coccoc.com/searchengine)

Website: help.coccoc.com/searchengine

Asks for robots.txt, but fails to observe it!

com.tinyspeck.chatlyio

UA string: com.tinyspeck.chatlyio/20.04.20 (iPhone; iOS 13.4.1; Scale/3.00)

Website: None

Doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: 'Slack' app for iOS; serves commercial clients.

DuckDuckBot-Https/1.1

UA string: Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot)

Website: duckduckgo.com/duckduckbot

Doesn't ask for robots.txt, so can't possibly observe it!

e.ventures Investment Crawler

UA string: e.ventures Investment Crawler (eventures.vc)

Website: None

Doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: presumably serves commercial clients.

GarlikCrawler/1.2

UA string: GarlikCrawler/1.2 (http://garlik.com/, crawler@garlik.com)

Website: garlik.com

Doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: serves commercial clients.

Gather Analyze Provide

UA string: https://gdnplus.com:Gather Analyze Provide.

Website: gdnplus.com

Doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: apparently serves commercial clients.

Gluten Free Crawler/1.0

UA string: Mozilla/5.0 (compatible; Gluten Free Crawler/1.0; +http://glutenfreepleasure.com/)

Website: glutenfreepleasure.com

Doesn't ask for robots.txt, so can't possibly observe it!

Googlebot-Image/1.0

UA string: Googlebot-Image/1.0

Website: None

Doesn't ask for robots.txt, so can't possibly observe it!

HealthCheckBot/0.2

UA string: HealthCheckBot/0.2

Website: None specified, though a search reveals pypi.org/project/healthcheckbot as probably relevant.

Doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: it seems to have no business crawling third-party sites.

HTTP Banner Detection

UA string: HTTP Banner Detection (https://security.ipip.net)

Website: security.ipip.net

Website address appears invalid.

Doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: until it provides a valid website address!

HubSpot Links Crawler 2.0

UA string: HubSpot Links Crawler 2.0 http://www.hubspot.com/

Website: www.hubspot.com

Unwanted: serves commercial clients.

hypestat/1.0

UA string: Mozilla/5.0 (compatible; hypestat/1.0; +https://hypestat.com/bot)

Website: hypestat.com/bot

Doesn't ask for robots.txt, so can't possibly observe it!

Internet-structure-research-project-bot

UA string: Internet-structure-research-project-bot

Website: None

Doesn't ask for robots.txt, so can't possibly observe it!

iodc; odysseus

UA string: Mozilla/5.0 (iodc; odysseus 3352-131-011119113358-349; +https://iodc.co.uk)

Website: iodc.co.uk

Doesn't ask for robots.txt, so can't possibly observe it!

ips-agent

UA string: Mozilla/5.0 (compatible; ips-agent)

Website: None

Apparently used by Verisign (who run the .com and .net domain name servers) to assess traffic on domains known by them to be expiring, using this data to help sell potentially valuable 'busy' domains to bulk buyers at other registrars.

Asks for robots.txt, but fails to observe it!

LightspeedSystemsCrawler

UA string: LightspeedSystemsCrawler Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US

Website: None

Doesn't ask for robots.txt, so can't possibly observe it!

linkdexbot/2.2

UA string: Mozilla/5.0 (compatible; linkdexbot/2.2; +http://www.linkdex.com/bots/)

Website: www.linkdex.com/bots

Asks for robots.txt, but fails to observe it!

Mail.RU_Bot/Robots/2.0

UA string: Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/Robots/2.0; +http://go.mail.ru/help/robots)

Website: go.mail.ru/help/robots

Asks for robots.txt, but fails to observe it!

masscan 1.0

UA string: masscan 1.0 (http:www)

Website: None

Doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: until it provides a valid website address!

masscan/1.0

UA string: masscan/1.0 (https://github.com/robertdavidgraham/masscan)

Website: github.com/robertdavidgraham/masscan

Doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: promiscuous port scanner, more likely used for harm than good.

MegaIndex.ru/2.0

UA string: Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)

Website: megaindex.com/crawler

Doesn't ask for robots.txt, so can't possibly observe it!

MJ12bot/v1.4.8

UA string: Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)

Website: mj12bot.com

Unwanted: serves commercial clients, for marketing purposes.

msnbot/2.0b

UA string: msnbot/2.0b (+http://search.msn.com/msnbot.htm)

Website: search.msn.com/msnbot.htm

Doesn't ask for robots.txt, so can't possibly observe it!

NetcraftSurveyAgent/1.0

UA string: Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; +info@netcraft.com)

Website: netcraft.com

Doesn't ask for robots.txt, so can't possibly observe it!

netEstate NE Crawler

UA string: netEstate NE Crawler (+http://www.website-datenbank.de/)

Website: www.website-datenbank.de/

Asks for robots.txt, but fails to observe it!

NetSystemsResearch

UA string: NetSystemsResearch studies the availability of various services across the internet. Our website is netsystemsresearch.com

Website: netsystemsresearch.com

FFS! Website is insecure, served over http! Oh, the irony, for a 'security research organisation'!

Doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: until they start serving their website over https!

Nimbostratus-Bot/v1.3.2

UA string: Mozilla/5.0 (compatible; Nimbostratus-Bot/v1.3.2; http://cloudsystemnetworks.com)

Website: cloudsystemnetworks.com

FFS! Website is insecure, served over http!

Doesn't ask for robots.txt, so can't possibly observe it!

nsrbot/1.0

UA string: Mozilla/5.0 (compatible; nsrbot/1.0; +http://netsystemsresearch.com)

Website: netsystemsresearch.com

FFS! Website is insecure, served over http!

Doesn't ask for robots.txt, so can't possibly observe it!

oBot/2.3.1

UA string: Mozilla/5.0 (compatible; oBot/2.3.1; http://filterdb.iss.net/crawler/)

Website: filterdb.iss.net/crawler

FFS! Website is insecure, served over http!

Asks for robots.txt, but fails to observe it!

PetalBot

UA string: Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://aspiegel.com/petalbot)

Website: aspiegel.com/petalbot

Asks for robots.txt, but never goes further, even when allowed.

Unwanted: clearly so poorly coded that it isn't doing its ostensible job, and merely clutters up server logs.

Plukkie/1.6

UA string: Mozilla/5.0 (compatible; Plukkie/1.6; http://www.botje.com/plukkie.htm)

Website: www.botje.com/plukkie.htm

FFS! Website is insecure, served over http!

Asks for robots.txt, but fails to observe it!

probethenet

UA string: www.probethenet.com scanner

Website: www.probethenet.com

FFS! Website is insecure, served over http!

Doesn't ask for robots.txt, so can't possibly observe it!

Riddler

UA string: Riddler (http://riddler.io/about)

Website: riddler.io/about

Asks for robots.txt, but fails to observe it!

SafeDNSBot

UA string: SafeDNSBot (https://www.safedns.com/searchbot)

Website: www.safedns.com/searchbot

Asks for robots.txt, but fails to observe it!

Seekport Crawler

UA string: Mozilla/5.0 (compatible; Seekport Crawler; http://seekport.com/

Website: seekport.com

FFS! Website is insecure, served over http!

Asks for robots.txt, but fails to observe it!

SemanticScholarBot

UA string: Mozilla/5.0 (compatible) SemanticScholarBot (+https://www.semanticscholar.org/crawler)

Website: www.semanticscholar.org/crawler

Doesn't ask for robots.txt, so can't possibly observe it!

SEMrushBot

UA string: SEMrushBot

Website: None

Doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: assumed to serve commercial clients.

SemrushBot-BA

UA string: Mozilla/5.0 (compatible; SemrushBot-BA; +http://www.semrush.com/bot.html)

Website: www.semrush.com/bot.html

Doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: serves commercial clients.

SemrushBot/1.0~bm

UA string: Mozilla/5.0 (compatible; SemrushBot/1.0~bm; +http://www.semrush.com/bot.html)

Website: www.semrush.com/bot.html

Doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: serves commercial clients.

SemrushBot/6~bl

UA string: Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)

Website: www.semrush.com/bot.html

Asks for robots.txt, but fails to observe it!

Unwanted: serves commercial clients.

seocompany

UA string: Mozilla/5.0 (compatible; adscanner/)/1.1 (http://seocompany.store; spider@seocompany.store)

Website: seocompany.store

FFS! Website is insecure, served over http! Address also appears invalid.

Unwanted: observes robots.txt standard, but serves commercial clients.

SEOkicks-Robot

UA string: Mozilla/5.0 (compatible; SEOkicks-Robot; +http://www.seokicks.de/robot.html)

Website: www.seokicks.de/robot.html

Doesn't ask for robots.txt, so can't possibly observe it!

serpstatbot/1.0

UA string: serpstatbot/1.0 (advanced backlink tracking bot; curl/7.58.0; http://serpstatbot.com/; abuse@serpstatbot.com)

Website: serpstatbot.com

Unwanted: serves commercial clients.

Slack-ImgProxy

UA string: Slack-ImgProxy (+https://api.slack.com/robots)

Website: api.slack.com/robots

Explicitly doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: serves commercial clients.

Slackbot 1.0

UA string: Slackbot 1.0 (+https://api.slack.com/robots)

Website: api.slack.com/robots

Explicitly doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: serves commercial clients.

Slackbot-LinkExpanding 1.0

UA string: Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)

Website: api.slack.com/robots

Explicitly doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: serves commercial clients.

Sogou web spider/4.0

UA string: Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)

Website: www.sogou.com/docs/help/webmasters.htm#07

FFS! Website is insecure, served over http! Address also appears invalid.

Suddenly started asking for robots.txt, and appears to be observing it!

Will not be added to robots.txt until it cleans up its act, and provides a valid https address for more info.

startmebot/1.0

UA string: Mozilla/5.0 (compatible; startmebot/1.0; +https://start.me/bot)

Website: start.me/bot

Explicitly doesn't ask for robots.txt, so can't possibly observe it!

SurdotlyBot/1.0

UA string: Mozilla/5.0 (compatible; SurdotlyBot/1.0; +http://sur.ly/bot.html

Website: sur.ly/bot.html

FFS! Website is insecure, served over http!

Doesn't ask for robots.txt, so can't possibly observe it!

TelegramBot

UA string: TelegramBot (like TwitterBot)

Website: None

Doesn't ask for robots.txt, so can't possibly observe it! (So, not like Twitterbot/1.0!)

Unwanted: until it provides a valid website address!

TprAdsTxtCrawler/1.0

UA string: TprAdsTxtCrawler/1.0

Website: None

Started appearing in my server logs on 19.04.2020, with a HEAD request for 'ads.txt'.

Doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: until it provides a source of information about itself.

TrendsmapResolver/0.1

UA string: Mozilla/5.0 (compatible; TrendsmapResolver/0.1)

Website: None

Doesn't ask for robots.txt, so can't possibly observe it!

TweetmemeBot/4.0

UA string: Mozilla/5.0 (TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0

Website: datasift.com/bot.html

Explicitly doesn't ask for robots.txt, so can't possibly observe it!

Unwanted: serves commercial clients.

unfurlist

UA string: unfurlist (https://github.com/Doist/unfurlist)

Website: github.com/Doist/unfurlist

Doesn't ask for robots.txt, so can't possibly observe it!

XoviBot/2.0

UA string: Mozilla/5.0 (compatible; XoviBot/2.0; +http://www.xovibot.net/

Website: www.xovibot.net/

FFS! Website is insecure, served over http!

Doesn't ask for robots.txt, so can't possibly observe it!

Yahoo! Slurp

UA string: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

Website: help.yahoo.com/help/us/ysearch/slurp

Doesn't ask for robots.txt, so can't possibly observe it!
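The two recurring verdicts in this list ('Doesn't ask for robots.txt' and 'Asks for robots.txt, but fails to observe it') are the sort of thing that can be read off a server log. The following Python sketch shows one way that might be done; it is not the method actually used to compile this list. It assumes a simplified log of (user agent, path) pairs, a site whose robots.txt denies all bots, and made-up bot names.

```python
from collections import defaultdict

ROBOTS_PATH = "/robots.txt"


def classify(requests):
    """Map each user agent to a verdict, given (user_agent, path) pairs from a log.

    Assumes the site's robots.txt is 'deny all' for every bot in the log.
    """
    paths_by_agent = defaultdict(list)
    for user_agent, path in requests:
        paths_by_agent[user_agent].append(path)

    verdicts = {}
    for user_agent, paths in paths_by_agent.items():
        if ROBOTS_PATH not in paths:
            verdicts[user_agent] = "doesn't ask for robots.txt"
        elif any(path != ROBOTS_PATH for path in paths):
            verdicts[user_agent] = "asks for robots.txt, but fails to observe it"
        else:
            verdicts[user_agent] = "asks for robots.txt and goes no further"
    return verdicts


if __name__ == "__main__":
    # Hypothetical log entries with invented bot names.
    sample_log = [
        ("BotA/1.0", "/"),
        ("BotB/2.0", "/robots.txt"),
        ("BotB/2.0", "/index.html"),
        ("BotC/0.1", "/robots.txt"),
    ]
    for agent, verdict in classify(sample_log).items():
        print(agent, "->", verdict)
```

The sketch ignores request order and any 'allow' records, so a real check would need to account for both before naming and shaming a bot.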