Author Topic: Robots.txt (what not to do)  (Read 2582 times)

Offline Black Viper

  • Administrator
  • ******
  • Posts: 1906
  • "Have you tweaked your OS lately?"
    • Black Viper's Web Site
Robots.txt (what not to do)
« on: April 03, 2009, 11:33:15 am »
I am always tweaking what can and cannot go outbound on my internet connection with relation to my server. As such, robots.txt (for those automated systems that obey it) rocks.
Now, I had included lines such as the following in the robots.txt for the forums:

Code: [Select]
User-agent: *
Disallow: /*?*

This stopped all robots from indexing any URL that had a "?" in it. This is a good thing, as sorting and "action" URLs are no longer crawled.

Just in case, I added this line:

Code: [Select]
Disallow: /*action*
This is a valid rule that stops robots from indexing, for example, all of the "help" links or "search" and "profile" links. This greatly reduces bandwidth.
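
For anyone wondering how a robot might read those two wildcard rules, here is a minimal Python sketch. It assumes the robot supports the "*" wildcard extension (the big search engines do, but it is not part of the original robots.txt standard), and the function name and sample paths are made up for illustration.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a wildcard Disallow pattern covers the given path.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end, and the pattern is matched from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and let '*' stand for "anything".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

# Hypothetical forum paths, for illustration only.
print(rule_matches("/*?*", "/index.php?action=help"))         # True
print(rule_matches("/*?*", "/index.php/topic,1.0.html"))      # False
print(rule_matches("/*action*", "/index.php?action=search"))  # True
```

Real crawlers have their own matching code, so treat this as a way to sanity-check a pattern, not as a guarantee of how any particular robot behaves.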

Also, links strewn throughout the forums have a bookmark appended to them. For example, they look like "<page url>#new".
Those links are sort of pointless, as they only point back to normal pages that should already have been indexed in the topic views.
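
To show why those bookmark links are redundant, here is a small Python sketch using the standard library's `urldefrag`. The forum URLs are hypothetical. The part after "#" is handled by the browser and is not sent to the server in the request, so every variant below is the same page as far as a robot is concerned.

```python
from urllib.parse import urldefrag

def strip_fragment(url: str) -> str:
    """Return the URL without its '#...' bookmark part."""
    return urldefrag(url).url

# Hypothetical links to one topic, for illustration only.
links = [
    "http://forum.example.com/index.php/topic,42.0.html",
    "http://forum.example.com/index.php/topic,42.0.html#new",
    "http://forum.example.com/index.php/topic,42.0.html#top",
]

unique_pages = {strip_fragment(link) for link in links}
print(unique_pages)  # a single URL: all three links are the same page
```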
So, I added in this line:

Code: [Select]
Disallow: /*#*
I "thought" that this would get rid of those bookmark links (jump to #top, etc.) and help the "Page Rank" of my BBS by wiping out redundant redundant URLs pointing to the exact same information.
Boy, was I very wrong.

According to robotstxt.org, and I quote:

Quote
Comments can be included in file using UNIX bourne shell conventions: the '#' character is used to indicate that preceding space (if any) and the remainder of the line up to the line termination is discarded. Lines containing only a comment are discarded completely, and therefore do not indicate a record boundary.


So, in non-geek-speak: "#" indicates a comment... or, "anything after "#" in the line is discarded".
So, let's look at this line again:

Code: [Select]
Disallow: /*#*
This basically states the following:
"Disallow: /*" or... "Disallow everything" in non-geek-speak.
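
Here is a minimal Python sketch of the comment rule quoted above, to show how the line gets chopped. The function names are my own, and this is only a rough approximation of what a robot's parser might do, not any real crawler's code.

```python
import re

def strip_comment(line: str) -> str:
    """Discard a '#' comment plus any whitespace right before it."""
    return re.sub(r"\s*#.*$", "", line)

def parse_rule(line: str):
    """Return (field, value) for a robots.txt line, or None if nothing is left."""
    cleaned = strip_comment(line).strip()
    if not cleaned or ":" not in cleaned:
        return None
    field, value = cleaned.split(":", 1)
    return field.strip().lower(), value.strip()

print(parse_rule("Disallow: /*#*"))    # ('disallow', '/*')
print(parse_rule("Disallow: /*?*"))    # ('disallow', '/*?*')
print(parse_rule("# just a comment"))  # None
```

The first result is the whole problem: once the "#*" is thrown away as a comment, what remains is a rule that blocks everything.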

Needless to say, I have been wondering why Google and others have not visited the BBS lately... and this is why. :P

Take it from a geek... "reducing bandwidth tweaks that ya may think will work" actually do... this one stopped the nice robots from visiting at all!

Ya learn something new every day. :)
« Last Edit: April 03, 2009, 09:13:08 pm by Black Viper »
Charles "Black Viper" Sparks
www.blackviper.com

Offline Black Viper

Re: Robots.txt (what not to do)
« Reply #1 on: April 03, 2009, 12:01:39 pm »
Within a few minutes of this post (and me fixing the issue), both Yahoo and Google have shown up. :)
« Last Edit: April 03, 2009, 09:13:37 pm by Black Viper »