How to decide the robots.txt for Wordpress blogs

Robots.txt is used to define the Robots Exclusion Protocol for a website. It governs the behaviour of the robots, bots and web-crawler programs that visit it. In simple words, any web crawler or bot visiting a website first checks the root file /robots.txt, which defines the exclusion rules for that bot. A common example of the robots.txt file used at honeytechblog is listed below:

User-Agent: *
User-agent: Googlebot-Image
User-agent: Mediapartners-Google*

Explanation of the common exclusions and agents used in the robots.txt file:

1. Sitemap: http://www.honeytechblog.com/sitemap.xml
Used to define the sitemap location for the bots; this makes it easier for search bots to detect your new pages.

2. User-Agent: *
Already described above.

3. Disallow: */?mobi*
Used to exclude the pages containing "/?mobi". I use this rule to avoid the content-duplication issues generated for mobile users. (It is not necessary for you.)

4. Disallow: /wp-admin/
Used to exclude the WordPress admin pages from the search engines. It is necessary to avoid the listing of any hack-prone pages or errors.

5. Disallow: /wp-includes/
Used to exclude the WordPress includes folder, which should also be kept away from the search bots. It is necessary because sometimes, when your WordPress installation hits a plugin or update issue, it throws serious errors which can easily be indexed by search bots or spotted by hackers.

6. Disallow: /wp-content/
Again, it is not necessary to index all the files in wp-content.

7. Disallow: /wp-
For security purposes it is better to hide all the core files and pages.

8. Disallow: /*.css$
For exclusion of all the style sheets. (If you want to further protect your CSS files.)

9. Disallow: /*?
Used to disallow all the URLs having "?" in them. (Used to keep content-duplication issues, tracking URLs and custom features out of the reach of bots.)

10. Disallow: /name/
Used to disallow any directory, folder or category. For example, if you want to disallow the "admin" folder you can simply use "Disallow: /admin/"; if you want to disallow a category named "download" you can use "Disallow: /category/download*"; and for the uncategorized category you can use "Disallow: /category/uncategorized*".

Extra: To allow all the image bots (like the Google image bot) to search and index all the images of the website / blog:
Allow: /*.png$
Allow: /*.jpg$
Allow: /*.gif$
Allow: /*.jpeg$
Allow: /*.ico$
Allow: /images/
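Putting all of the above together, here is a minimal sketch of what such a WordPress robots.txt could look like. It simply assembles the directives discussed in this post; the way the rules are grouped under the wildcard user agent, and the assumption that the Mediapartners-Google (AdSense) crawler is allowed everywhere, are mine, so adjust the grouping and the sitemap URL to your own blog before using it.

# Sitemap location for all bots
Sitemap: http://www.honeytechblog.com/sitemap.xml

# Rules for every crawler
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /wp-
Disallow: /*.css$
Disallow: /*?
Disallow: */?mobi*

# Let the Google image crawler fetch images
User-agent: Googlebot-Image
Allow: /*.png$
Allow: /*.jpg$
Allow: /*.gif$
Allow: /*.jpeg$
Allow: /*.ico$
Allow: /images/

# Assumed: let the AdSense crawler see everything
User-agent: Mediapartners-Google*
Disallow:

Whichever rules you keep, it is a good idea to test the file (for example with the robots.txt analysis tool in Google Webmaster Tools) before relying on it, because a single over-broad Disallow can knock your whole blog out of the index.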