New Media Services
Google Webmaster Tools And Robots.txt
Do you use Google Webmaster Tools, the free service from Google that helps webmasters make their sites more Google-friendly?
I have been thinking about Google Webmaster Tools lately, as JR (from Internet Marketing Strategies) has written a fabulous article covering the tool. He describes the 6 tabs of the tool one by one and covers some tips on why and how to use GWT.
As I mentioned to JR, every time I read a new article on GWT, I learn something new. I wouldn’t necessarily describe GWT as super advanced, but it isn’t exactly child’s play either. Some level of webmastering is required.
I want to get to the bottom of one thing within GWT: robots.txt
The what, who, when, why, and how? Feel free to share your tips and experiences below. When I go to Google Webmaster Tools, I notice a few things in the overview section (see screenshot below).
It seems this blog has 19 not-found URLs and 2 URLs restricted by robots.txt.
What does this mean? How do I fix it? Feel free to weigh in below in the comments.
Missy.
This entry was posted by Missy on October 18, 2008 at 8:10 PM, and is filed under Blogging Tips.



about 1 year ago
Hello Missy,
Thanks for the mention, I appreciate it. RE: the URLs blocked by your robots file, just look at your robots file and see which pages are blocked.
You can see what your robots file says in GWT, but you can only edit it from your hosting files. Usually a robots.txt file is created by the webmaster, so if you did not create it, I would certainly look into what is in yours and edit it correctly.
There are pages that you do not want indexed and that you do not want Google to crawl, so that link juice and crawl juice are preserved for the important pages. You do not want, for instance, your CSS files, wp-admin, feeds, about and contact pages, etc. to be crawled and indexed. You need to block categories and tags on blogs because these would be considered duplicate content. All of this is accomplished with a robots file.
Here is my robots file. You can copy it and use it for your blog; edit out any pages you may not have on your blog, such as terms-of-service.
User-agent: *
Disallow: /wp-admin/
Disallow: /cgi-bin/
Disallow: /*.css$
Disallow: /cont/
Disallow: /scripts/
Disallow: /wp-content/themes/
Disallow: /?s=
Disallow: /rss/
Disallow: /feed/
Disallow: /terms-of-use/
Disallow: /disclaimer/
Disallow: /privacy-policy/
Disallow: /search/
Disallow: /about/
Disallow: /contact-me/
Disallow: /tags/
Disallow: /archives/
Disallow: /category/
Disallow: /trackback/
Disallow: /contact-form/
Disallow: /comments/feed/
Disallow: /*.avi$
Disallow: /*.cgi$
Disallow: /index.php
Disallow: /*.js$
Disallow: /*.inc$
User-agent: Mediapartners-Google*
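One way to see which pages a set of rules like these would block is to run them through a parser. Here is a minimal sketch using Python's standard urllib.robotparser module. The sample paths and the blocked_paths helper are made up for illustration, and only a handful of the plain-prefix rules from the file are included, because as far as I know this parser matches paths by prefix and does not understand the * and $ wildcard patterns (such as /*.css$) that Google supports.

```python
from urllib.robotparser import RobotFileParser

# A few plain-prefix rules taken from the robots file above.
# Wildcard rules (e.g. /*.css$) are left out on purpose: this
# parser treats rule paths as literal prefixes.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /feed/
Disallow: /category/
Disallow: /tags/
"""


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that the rules forbid user_agent to fetch (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    # Hypothetical paths, not taken from any real site.
    sample = [
        "/",
        "/wp-admin/options.php",
        "/category/blogging-tips/",
        "/2008/10/some-post/",
    ]
    for path in blocked_paths(ROBOTS_TXT, sample):
        print("blocked:", path)
```

With these assumed inputs, the admin page and the category page are reported as blocked, while the home page and the individual post are not.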
The 19 not-found URLs may be redirects. What are they? Finding out why GWT is showing them as not found is a good place to start.
about 1 year ago
Hey, thanxs JR. I think I better understand now what the robots.txt file is (or at least its purpose).
You say it needs to be changed via my host. Which file is it in?
Thanxs for the copy of your robots file, will look it over. That is awesomesauce!
As for the 19 URLs not found, I will have to look into it. I honestly don’t know.
Thanxs for your groovy help. Appreciate it loads.
about 1 year ago
Hey Missy
I agree, the article by JR on Google WMT is awesome. On last check, you don’t have a robots.txt file yet, which is not an ideal situation for you. I guess JR has explained to you what it’s for…
The other 19 URLs could be broken links within your site that you need to fix. Use a broken link checker plugin to see where each link is and rectify it immediately.
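For anyone curious what such a checker does under the hood, here is a rough sketch in Python using only the standard library. It is not how any particular plugin works; the function names http_status and find_broken and the example.com URLs are invented for illustration. The idea is simply to request each link and collect the ones that come back with an error status (404 and friends) or cannot be reached at all.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive as exceptions
    except OSError:
        return None  # DNS failure, refused connection, timeout, etc.


def find_broken(urls, fetch_status=http_status):
    """Return (url, status) pairs for links that do not resolve cleanly."""
    broken = []
    for url in urls:
        status = fetch_status(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken


if __name__ == "__main__":
    # Stubbed statuses so the example runs without touching the network;
    # drop the second argument to check real URLs instead.
    stub = {
        "http://example.com/ok": 200,
        "http://example.com/gone": 404,
    }
    for url, status in find_broken(list(stub), stub.get):
        print(status, url)
```

Passing the status lookup in as a parameter keeps the link-sorting logic separate from the network call, so it can be tried out on a fixed list before pointing it at a live site.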
Hope it helps…
Yan