How to hide part of your backlink profile from your competitors

In the SEO world, one of the first things marketers do when planning their off-site efforts is analyse the backlink profiles of their competitors. There are several tools on the market, but the two best are Ahrefs and MajesticSEO. Most of the remaining tools either have smaller indexes than these two or build their data on top of those two indexes. Since data from Google Webmaster Tools is private and exclusive to the website owner, most SEOs base their competitor analysis on these two tools.

If you want to keep a competitive advantage, hiding as much data as possible from these two tools is vital, because it reduces how much of your backlink profile your competitors can see.

Hiding backlinks from private networks

Google has been quite active lately in penalizing link networks like MyBlogGuest, PostJoint and, more recently, Teliad, although link networks still seem to work for ranking websites quickly. Glen Allsopp explains in depth why he will keep maintaining his Private Blog Network (PBN) in order to rank his websites on Google.

Blocking crawlers in the robots.txt file

So if you are running a PBN, you can easily block the crawlers of the popular backlink analysis tools by adding disallow rules to the robots.txt file of each site.

User-agent: Rogerbot
User-agent: Exabot
User-agent: MJ12bot
User-agent: Dotbot
User-agent: Gigabot
User-agent: AhrefsBot
User-agent: BlackWidow
User-agent: ChinaClaw
User-agent: Custo
User-agent: DISCo
User-agent: Download Demon
User-agent: eCatch
User-agent: EirGrabber
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: Express WebPictures
User-agent: ExtractorPro
User-agent: EyeNetIE
User-agent: FlashGet
User-agent: GetRight
User-agent: GetWeb!
User-agent: Go!Zilla
User-agent: Go-Ahead-Got-It
User-agent: GrabNet
User-agent: Grafula
User-agent: HMView
User-agent: HTTrack
User-agent: Image Stripper
User-agent: Image Sucker
User-agent: Indy Library
User-agent: InterGET
User-agent: Internet Ninja
User-agent: JetCar
User-agent: JOC Web Spider
User-agent: larbin
User-agent: LeechFTP
User-agent: Mass Downloader
User-agent: MIDown tool
User-agent: Mister PiX
User-agent: Navroad
User-agent: NearSite
User-agent: NetAnts
User-agent: NetSpider
User-agent: Net Vampire
User-agent: NetZIP
User-agent: Octopus
User-agent: Offline Explorer
User-agent: Offline Navigator
User-agent: PageGrabber
User-agent: Papa Foto
User-agent: pavuk
User-agent: pcBrowser
User-agent: RealDownload
User-agent: ReGet
User-agent: SiteSnagger
User-agent: SmartDownload
User-agent: SuperBot
User-agent: SuperHTTP
User-agent: Surfbot
User-agent: tAkeOut
User-agent: Teleport Pro
User-agent: VoidEYE
User-agent: Web Image Collector
User-agent: Web Sucker
User-agent: WebAuto
User-agent: WebCopier
User-agent: WebFetch
User-agent: WebGo IS
User-agent: WebLeacher
User-agent: WebReaper
User-agent: WebSauger
User-agent: Website eXtractor
User-agent: Website Quester
User-agent: WebStripper
User-agent: WebWhacker
User-agent: WebZIP
User-agent: Wget
User-agent: Widow
User-agent: WWWOFFLE
User-agent: Xaldon WebSpider
User-agent: Zeus
Disallow: /

The above example includes the rules that have to be added to the robots.txt file of every website in a Private Blog Network in order to block Ahrefs, Majestic and other crawlers. You can always analyse your web server log files in depth to see if you're missing any user agent that should be added to the robots.txt file.
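To act on that last suggestion, a short script can tally the user agents that appear in your access logs so that unwanted crawlers stand out. This is a minimal sketch that assumes the common Apache/Nginx combined log format, where the user agent is the last double-quoted field; adjust the parsing to your server's actual log format.

```python
# Count user agents in a combined-format access log to spot crawlers
# that may need to be added to robots.txt.
import re
from collections import Counter

# In the combined log format the user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(lines):
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Two hypothetical log lines for illustration.
sample = [
    '1.2.3.4 - - [10/Oct/2014:13:55:36 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)"',
    '5.6.7.8 - - [10/Oct/2014:13:55:37 +0000] "GET /post HTTP/1.1" 200 1024 '
    '"-" "MJ12bot/v1.4.5"',
]

for agent, hits in count_user_agents(sample).most_common():
    print(hits, agent)
```

Running this against a real log (e.g. with `open("/var/log/apache2/access.log")` instead of the sample list) gives a ranked list of visiting user agents to review.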

Use the .htaccess file to block crawlers

Since many bots and crawlers don't respect the rules in a robots.txt file, you can block specific crawlers more reliably from the .htaccess file of your web server.

The easy trick is to redirect the crawler to another website before it starts crawling your own.

RewriteEngine On
RewriteBase /

# Redirect known backlink crawlers elsewhere before they crawl this site
RewriteCond %{HTTP_USER_AGENT} .*MJ12.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*AhrefsBot.*
RewriteRule ^(.*)$ http://www.domain.com/ [L,R=301]

# Block crawlers by IP range as well (example range shown; note the mask:
# 209.222.8.0/8 would deny all of 209.0.0.0/8, far more than intended)
Order Allow,Deny
Allow from all
Deny from 209.222.8.0/24
Deny from ....

The above .htaccess hack requires the Apache mod_rewrite module to be enabled in order to work. You can keep the blocked IP ranges up to date by regularly checking websites that publish the IP addresses of popular crawlers.
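Getting the CIDR mask right matters, because "Deny from" applies the mask to the address: a too-short prefix silently widens the block to a huge range. The check Apache performs is simple network membership, which can be sketched with Python's standard ipaddress module; the 209.222.8.0/24 range below is only an example, not an actual crawler range.

```python
# Membership test equivalent to Apache's "Deny from <network>" check.
import ipaddress

# Example range only — substitute the crawler's published IP ranges.
BLOCKED_RANGES = [ipaddress.ip_network("209.222.8.0/24")]

def is_blocked(client_ip):
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in BLOCKED_RANGES)

print(is_blocked("209.222.8.17"))   # inside the /24
print(is_blocked("198.51.100.1"))   # outside it
```

As a side note, `ipaddress.ip_network("209.222.8.0/8")` raises a ValueError ("has host bits set"), which is a useful sanity check: a /8 mask on that address would actually describe 209.0.0.0/8.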

You can easily apply the above settings even to domains that you redirect to your websites, which makes Ahrefs, Majestic and other crawlers unable to see the redirect.
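Before deploying any of this, it is worth confirming that the robots.txt rules from the first section block exactly the crawlers you intend and nothing else. Python's standard urllib.robotparser evaluates the rules the same way a well-behaved bot would; this is a minimal sketch using a trimmed version of the rule set against a hypothetical example.com URL.

```python
# Sanity-check robots.txt rules with the standard-library parser.
from urllib.robotparser import RobotFileParser

# Trimmed version of the rules from the first section.
rules = """\
User-agent: AhrefsBot
User-agent: MJ12bot
User-agent: Rogerbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Listed crawlers are refused everywhere; Googlebot is unaffected.
print(parser.can_fetch("AhrefsBot", "http://example.com/some-page"))  # False
print(parser.can_fetch("Googlebot", "http://example.com/some-page"))  # True
```

Keep in mind this only verifies the rules themselves: crawlers that ignore robots.txt still require the .htaccess blocking described above.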