
Control Which Areas Of Your Site Are Indexed With A Robots.txt File

February 15, 2007 | Tony Williams

Sometimes there are areas of your site that you don’t want search engines to index, such as admin files or images. By creating a robots.txt file, you can tell search engine robots which pages on your website should be crawled and, consequently, indexed.

A robots.txt file is plain text, so you can create one quickly with Notepad. If you are using WordPress, a sample robots.txt file would be:

User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/

“User-agent: *” means that the instructions that follow apply to all search bots (from Google, Yahoo, MSN and so on) when they crawl your website. Unless your website is complex, you will not need to set different instructions for different spiders.
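If you ever do need separate rules for one spider, each group of rules simply starts with its own User-agent line. Here is a rough sketch in Python that checks such a file with the standard library's urllib.robotparser module. The bot name ExampleBot and the rules themselves are made up for illustration and are not part of the sample above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for a made-up bot, one for everyone else.
PER_BOT_RULES = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /feed/
"""

# Parse the text directly instead of downloading it from a site.
per_bot_parser = RobotFileParser()
per_bot_parser.parse(PER_BOT_RULES.splitlines())


def bot_may_fetch(agent, path):
    """Return True if the given user agent may crawl the given path."""
    return per_bot_parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("ExampleBot", "OtherBot"):
        for path in ("/about/", "/feed/"):
            print(agent, path, bot_may_fetch(agent, path))
```

In this sketch ExampleBot is shut out of the whole site, while every other bot falls back to the “User-agent: *” group and is only kept away from /feed/.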

“Disallow: /wp-” tells search engines not to crawl the WordPress files. This line excludes every file and folder whose path starts with “/wp-” from crawling, which keeps duplicate content and admin files out of the index.
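To see the “starts with” behaviour for yourself, you can run the WordPress sample through Python's built-in urllib.robotparser module. This is only a quick sketch: the helper name is_crawlable and the test paths are my own assumptions, and real search engines may interpret edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# The WordPress sample rules from this tip.
WORDPRESS_RULES = """\
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
"""


def is_crawlable(path, agent="ExampleBot"):
    """Return True if the sample rules let `agent` crawl `path`.

    `agent` is a made-up bot name; the rules apply to every bot.
    """
    parser = RobotFileParser()
    parser.parse(WORDPRESS_RULES.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/wp-admin/", "/wp-content/uploads/photo.png", "/feed/", "/about/"):
        print(path, "->", "crawl" if is_crawlable(path) else "skip")
```

Because matching is by prefix, Python's parser blocks /feed/ but still allows /feed without the trailing slash, so it pays to check the exact form of the addresses your site uses.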

If you are not using WordPress, just replace the Disallow lines with the files or folders on your website that should not be crawled, for instance:

User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Disallow: /any-other-folder-to-be-excluded/
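If you have a long list of folders to exclude, a few lines of Python can write the file for you instead of typing it out in Notepad. This is a minimal sketch; the function name build_robots_txt and the folder list are assumptions chosen for illustration.

```python
from pathlib import Path


def build_robots_txt(paths, user_agent="*"):
    """Return robots.txt text that disallows each path for one user agent."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in paths]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example folders only; replace them with the ones on your own site.
    text = build_robots_txt(["/images/", "/cgi-bin/"])
    Path("robots.txt").write_text(text, encoding="utf-8")
    print(text)
```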

After you have created the robots.txt file, just upload it to your site’s root directory and you are done!
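The root directory matters because crawlers look for the file at the top level of your domain, not next to the page they happen to be visiting. The short sketch below works out where that is for any page address; example.com and the function name robots_url are placeholders, not part of the original tip.

```python
from urllib.parse import urljoin


def robots_url(page_url):
    """Return the address where a crawler would look for robots.txt."""
    # A leading slash makes urljoin resolve against the site root.
    return urljoin(page_url, "/robots.txt")


if __name__ == "__main__":
    print(robots_url("https://example.com/blog/some-post/"))
```

Once the upload is finished, opening that address in your browser should show the contents of your file.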

Thanks to DailyBlogTips for today’s tip


Category: Using The Internet


About Tony Williams: Want to get more out of your PC, the internet or your mp3 player? Onetipaday.com is here to provide clear, simple tips and guides to help you achieve just that. Each tip will be easy to implement and will take no more than 5 minutes to read and implement. What will you learn today?
