Creating and maintaining a correct robots.txt file has never been easy, which is why Google has released an updated “robots.txt testing tool” in its Webmaster Tools.
Sites that don’t use a robots.txt file at all have it easy; for everyone else, the updated testing tool can be found in Webmaster Tools under the Crawl section. There you’ll see the current robots.txt file, and you can test new URLs to see whether they’re disallowed for crawling.
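If you want a quick local sanity check alongside the tool, Python’s standard library can answer the same question for a single URL. The sketch below assumes “https://www.example.com” stands in for your own site; it fetches the live robots.txt and asks whether a given user agent may crawl a path.

```python
# Minimal sketch: check one URL against a site's live robots.txt.
# "https://www.example.com" is a placeholder for your own domain.
import urllib.robotparser

parser = urllib.robotparser.RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()  # downloads and parses the live robots.txt

# True if Googlebot is allowed to crawl this URL under the current rules.
print(parser.can_fetch("Googlebot", "https://www.example.com/private/report.html"))
```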
To guide you through complicated directives, the tool highlights the specific directive that led to the final decision. You can make changes in the file and test those too; you’ll then need to upload the new version of the file to your server for the changes to take effect. Google’s developers site has detailed information about robots.txt directives and how the files are processed.
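As a small illustration of how directives interact, the sketch below parses an invented inline rule set with Python’s standard robots.txt parser. Note a hedge: Google resolves conflicts by the most specific (longest) matching rule, while Python’s parser takes the first matching line, so the Allow line is listed first here to keep both interpretations in agreement.

```python
# Sketch of Allow/Disallow interaction on made-up paths.
import urllib.robotparser

rules = """
User-agent: *
Allow: /downloads/free/
Disallow: /downloads/
""".splitlines()

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Googlebot", "/downloads/free/tool.zip"))  # True: the Allow rule applies
print(parser.can_fetch("Googlebot", "/downloads/paid/tool.zip"))  # False: only the Disallow rule matches
```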
You’ll also be able to review older versions of your robots.txt file and see when access issues blocked Google from crawling. For instance, if Googlebot sees a 500 server error for the robots.txt file, Google will generally pause further crawling of the website.
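If you want to catch that situation outside of Webmaster Tools, a quick check of the HTTP status your server returns for robots.txt goes a long way. The sketch below uses Python’s standard library and a placeholder domain.

```python
# Sketch: report the HTTP status of /robots.txt.
# "https://www.example.com" is a placeholder for your own domain.
import urllib.request
import urllib.error

try:
    with urllib.request.urlopen("https://www.example.com/robots.txt", timeout=10) as response:
        print("robots.txt status:", response.status)
except urllib.error.HTTPError as err:
    # 4xx/5xx responses raise HTTPError; err.code holds the status.
    print("robots.txt returned HTTP", err.code)
    if 500 <= err.code < 600:
        print("Server errors here can cause Google to pause crawling the site.")
```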
If you are still seeing errors or warnings on your sites, Google advises double-checking your robots.txt files. You can combine the tester with other parts of Webmaster Tools; for instance, you might use the updated Fetch as Google tool to render important pages on your website.
If any URLs are reported as blocked, the robots.txt tester will help you find the directive that’s blocking them.
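The same idea can be approximated in a few lines of your own code. The sketch below is deliberately simplified, handling only plain path prefixes (Google’s parser also supports * and $ wildcards and prefers Allow on ties), and the rules and paths are invented for illustration.

```python
# Simplified sketch: report which directive decides a path under
# Google's longest-match behavior. Prefix matching only; no wildcards,
# and ties between equally long patterns are not handled.
def winning_directive(rules, path):
    """Return the (directive, pattern) pair that decides `path`."""
    matches = [(directive, pattern) for directive, pattern in rules
               if path.startswith(pattern)]
    if not matches:
        return ("Allow", "")  # no rule matches, so crawling is allowed by default
    # Google resolves conflicts by the most specific, i.e. longest, pattern.
    return max(matches, key=lambda rule: len(rule[1]))

rules = [("Disallow", "/private/"), ("Allow", "/private/docs/")]
print(winning_directive(rules, "/private/docs/help.html"))  # ('Allow', '/private/docs/')
print(winning_directive(rules, "/private/admin.html"))      # ('Disallow', '/private/')
```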
Asaph Arnon, a member of the Webmaster Tools team, said, “A common problem we’ve seen comes from old robots.txt files that block CSS, JavaScript, or mobile content; fixing that is often trivial once you’ve seen it.”