In the SEO world, one of the first steps marketers take when pushing their off-site efforts is analysing the backlink profile of their competitor(s). There are several tools on the market, but the two best are Ahrefs and MajesticSEO; most of the remaining tools either have smaller indexes than these two or base their data on those two indexes. Since data from Google Webmaster Tools is private and exclusive to the website owner, most SEOs base their competitor analysis on these two tools.
If you want to keep a competitive advantage over the competition, it is vital to reduce the visibility of your backlink profile by hiding as much data as possible from these two tools.
Hiding backlinks from private networks
Google has been quite active lately in penalizing link networks like MyBlogGuest, PostJoint and, recently, Teliad, although link networks still seem to work for ranking websites fast. Glenn Allsopp explains in depth why he will keep maintaining his Private Blog Network (PBN) in order to rank his websites on Google.
Blocking crawlers with the robots.txt file
So if you are running a PBN, you can easily block the crawlers of popular backlink analysis tools by adding disallow rules to your robots.txt file.
User-agent: Rogerbot
User-agent: Exabot
User-agent: MJ12bot
User-agent: Dotbot
User-agent: Gigabot
User-agent: AhrefsBot
User-agent: BlackWidow
User-agent: ChinaClaw
User-agent: Custo
User-agent: DISCo
User-agent: Download Demon
User-agent: eCatch
User-agent: EirGrabber
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: Express WebPictures
User-agent: ExtractorPro
User-agent: EyeNetIE
User-agent: FlashGet
User-agent: GetRight
User-agent: GetWeb!
User-agent: Go!Zilla
User-agent: Go-Ahead-Got-It
User-agent: GrabNet
User-agent: Grafula
User-agent: HMView
User-agent: HTTrack
User-agent: Image Stripper
User-agent: Image Sucker
User-agent: Indy Library
User-agent: InterGET
User-agent: Internet Ninja
User-agent: JetCar
User-agent: JOC Web Spider
User-agent: larbin
User-agent: LeechFTP
User-agent: Mass Downloader
User-agent: MIDown tool
User-agent: Mister PiX
User-agent: Navroad
User-agent: NearSite
User-agent: NetAnts
User-agent: NetSpider
User-agent: Net Vampire
User-agent: NetZIP
User-agent: Octopus
User-agent: Offline Explorer
User-agent: Offline Navigator
User-agent: PageGrabber
User-agent: Papa Foto
User-agent: pavuk
User-agent: pcBrowser
User-agent: RealDownload
User-agent: ReGet
User-agent: SiteSnagger
User-agent: SmartDownload
User-agent: SuperBot
User-agent: SuperHTTP
User-agent: Surfbot
User-agent: tAkeOut
User-agent: Teleport Pro
User-agent: VoidEYE
User-agent: Web Image Collector
User-agent: Web Sucker
User-agent: WebAuto
User-agent: WebCopier
User-agent: WebFetch
User-agent: WebGo IS
User-agent: WebLeacher
User-agent: WebReaper
User-agent: WebSauger
User-agent: Website eXtractor
User-agent: Website Quester
User-agent: WebStripper
User-agent: WebWhacker
User-agent: WebZIP
User-agent: Wget
User-agent: Widow
User-agent: WWWOFFLE
User-agent: Xaldon WebSpider
User-agent: Zeus
Disallow: /
The above example includes the rules that have to be added to the robots.txt file of every website in a Private Blog Network in order to block Ahrefs, Majestic and other crawlers. You can always analyse your web server log files in depth to see if you're missing any user agent that should be added to the robots.txt file.
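As an illustration, here is a minimal Python sketch for that log analysis. It assumes Apache's default "combined" log format and a log at /var/log/apache2/access.log (both are assumptions; adjust to your own setup):

import re
from collections import Counter

# Matches the last two quoted fields of Apache's "combined" log format:
# the referer and the user agent (assumption: your server uses that format).
UA_PATTERN = re.compile(r'"[^"]*" "(?P<ua>[^"]*)"$')

counts = Counter()
with open("/var/log/apache2/access.log") as log:  # hypothetical path
    for line in log:
        match = UA_PATTERN.search(line.strip())
        if match:
            counts[match.group("ua")] += 1

# Print the most frequent user agents so you can spot crawlers
# that are still missing from your robots.txt rules.
for user_agent, hits in counts.most_common(30):
    print(f"{hits:6d}  {user_agent}")

Running it periodically and scanning the output for unfamiliar bot names is a quick way to keep the disallow list up to date.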
Use the .htaccess file to block crawlers
Since many bots / crawlers don't respect the rules in a robots.txt file, you can reliably block access for specific crawlers from the .htaccess file of your web server.
The easy trick is to redirect the crawler to another website before it starts crawling your own.
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} .*MJ12.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*AhrefsBot.*
RewriteRule ^(.*)$ http://www.domain.com/ [L,R=301]

Order Allow,Deny
Allow from all
Deny from 209.222.8.0/24
Deny from ....
The above .htaccess hack requires the mod_rewrite Apache module to be enabled in order to work. You can update the IP ranges you want to block by frequently checking websites that provide the IPs of popular user agents, like this one.
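If mod_rewrite is not already active, on Debian/Ubuntu servers (an assumption about your distribution) it can be enabled like this:

a2enmod rewrite
service apache2 restart

You also need AllowOverride configured in your virtual host so that Apache actually reads the .htaccess file.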
You can easily apply the above settings even on domains that you redirect to your websites, which will make Ahrefs, Majestic and other crawlers unable to see the redirection.
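As a sketch of that setup, the .htaccess of a redirecting domain could send the blocked crawlers elsewhere before the catch-all redirect fires (www.domain.com and www.yoursite.com below are placeholder names, not part of the original example):

RewriteEngine On
RewriteBase /

# Send backlink crawlers away first, so they never see the real redirect
RewriteCond %{HTTP_USER_AGENT} .*MJ12.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*AhrefsBot.*
RewriteRule ^(.*)$ http://www.domain.com/ [L,R=301]

# Everyone else, including Google, follows the normal 301 to your main site
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [L,R=301]

Because mod_rewrite processes rules in order and [L] stops at the first match, the crawlers are diverted before they ever reach the rule that reveals where the domain really points.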