Posted: . At: 9:15 AM. This was 7 months ago. Post ID: 18510

Use the robots.txt file to prevent indexing of various areas of your site.


A robots.txt file is useful for keeping web crawlers out of areas of your website that you do not want indexed. The example below disallows crawler access to several directories under the site root (the public_html folder). This helps on a large site: crawlers do not spend crawl cycles on unneeded folders, so the important pages get crawled and indexed instead.

robots.txt
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /cache/
Disallow: /class/
Disallow: /images/
Disallow: /include/
Disallow: /install/
Disallow: /kernel/
Disallow: /language/
Disallow: /templates_c/
Disallow: /themes/
Disallow: /uploads/
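
One way to sanity-check rules like these before deploying them is Python's standard-library urllib.robotparser. The sketch below is only an illustration: it uses a shortened copy of the rules above, and the agent name ExampleBot and the sample paths are made up.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules above (assumption: any subset behaves the same way).
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /uploads/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(path, agent="ExampleBot"):
    """Return True if the given (hypothetical) agent may fetch the path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/tmp/session.dat", "/uploads/photo.jpg"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Note that the standard-library parser matches plain path prefixes, which is all this first example needs.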

The example below is very good for a WordPress website.

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Disallow: /readme.html
Disallow: /refer/
Allow: /wp-admin/admin-ajax.php
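# The Sitemap value should be a full URL; example.com is a placeholder for your own domain.
Sitemap: https://example.com/sitemap.xml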

This is a good way to optimise how Google crawls the website, so that it does not waste time on unnecessary files.
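
The Allow line for admin-ajax.php sits inside a disallowed directory, so it matters which rule wins. Under the Robots Exclusion Protocol (RFC 9309), the longest matching rule applies, and Allow wins a tie. Below is a minimal Python sketch of that precedence, using plain prefix matching only. The is_allowed helper and the sample paths are illustrative, and individual crawlers may behave differently.

```python
# Rules from the WordPress example, as (directive, path prefix) pairs.
RULES = [
    ("Disallow", "/wp-admin/"),
    ("Disallow", "/wp-content/plugins/"),
    ("Disallow", "/wp-content/themes/"),
    ("Disallow", "/readme.html"),
    ("Disallow", "/refer/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]


def is_allowed(path):
    """Longest matching prefix wins; Allow wins when lengths are equal."""
    matches = [
        (len(prefix), directive == "Allow")
        for directive, prefix in RULES
        if path.startswith(prefix)
    ]
    if not matches:
        return True  # no rule matches, so the path is crawlable
    # max() compares length first, then True (Allow) beats False (Disallow).
    return max(matches)[1]


if __name__ == "__main__":
    for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/blog/post"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```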

Here is another version that disallows certain paths. It mixes Allow and Disallow rules: CSS and JavaScript files stay crawlable, while the action URLs and any URL containing a query string are blocked.

# Default robots file version:2
User-agent: *
Disallow: /calendar/action*
Disallow: /events/action*
Allow: /*.css
Allow: /*.js
Disallow: /*?
Crawl-delay: 3
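
To see which of these wildcard patterns a given URL path hits, the patterns can be translated into regular expressions. The Python sketch below assumes the common wildcard semantics (* matches any run of characters, a trailing $ anchors the end of the path); the helper names and sample paths are made up for illustration.

```python
import re

# Path patterns from the rules above, without their Allow/Disallow directive.
PATTERNS = ["/calendar/action*", "/events/action*", "/*.css", "/*.js", "/*?"]


def to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' where a '*' stood.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matching_patterns(path):
    """Return every pattern that matches the path from its start."""
    return [p for p in PATTERNS if to_regex(p).match(path)]


if __name__ == "__main__":
    for path in ("/style.css?ver=2", "/events/action/add", "/search?q=robots"):
        print(path, "->", matching_patterns(path))
```

When several patterns match, crawlers that follow RFC 9309 apply the longest one, so a path such as /style.css?ver=2 would stay crawlable because /*.css is longer than /*?.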

And finally, this is how to block certain bots from crawling your website.

#
# Disallow Money for Google News
User-agent: Googlebot-News
Disallow: /tmoney/*
#
# Allow Adsense
User-agent: Mediapartners-Google
Disallow:
#
#
User-agent: CrystalSemanticsBot
Disallow: /
#
User-agent: GPTBot
Disallow: /
#
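
The per-bot groups can be checked the same way with Python's standard-library urllib.robotparser. This is a sketch under a few assumptions: the comment lines are left out, the trailing * on /tmoney/ is dropped because this parser matches plain prefixes only, and SomeOtherBot and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Per-bot groups from above (comments removed, "/tmoney/*" simplified to "/tmoney/").
ROBOTS = """\
User-agent: Googlebot-News
Disallow: /tmoney/

User-agent: Mediapartners-Google
Disallow:

User-agent: CrystalSemanticsBot
Disallow: /

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())


def may_fetch(agent, path):
    """Return True if the named agent is permitted to fetch the path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("GPTBot", "Mediapartners-Google", "Googlebot-News", "SomeOtherBot"):
        print(agent, may_fetch(agent, "/tmoney/story"))
```

An empty Disallow line allows everything for that bot, and a bot that matches no group is unrestricted by this file.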

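Or block bots by user agent in your .htaccess file. Unlike robots.txt, which a crawler may simply ignore, this makes the server refuse the request.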

.htaccess
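    # Return 403 Forbidden to any client whose User-Agent contains one of these names (case-insensitive).
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^.*(Baiduspider|HTTrack|Yandex).*$ [NC]
    RewriteRule .* - [F,L]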
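
Before relying on the RewriteCond pattern, it can be tried against sample user-agent strings. The Python sketch below reuses the same regular expression with case-insensitive matching to mirror the [NC] flag. The sample strings are illustrative, and this only approximates how Apache evaluates the condition.

```python
import re

# Same pattern as the RewriteCond above; re.IGNORECASE stands in for [NC].
BLOCKED = re.compile(r"^.*(Baiduspider|HTTrack|Yandex).*$", re.IGNORECASE)


def is_blocked(user_agent):
    """Return True if the user-agent string would match the RewriteCond."""
    return BLOCKED.match(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Baiduspider/2.0)",
        "mozilla/5.0 (compatible; yandexbot/3.0)",
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0",
    ]
    for ua in samples:
        print("blocked" if is_blocked(ua) else "allowed", "-", ua)
```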
