I am always tweaking what can and cannot go outbound on my internet connection from my server. As such, robots.txt (for those automated systems that obey it) rocks.
Now, I had included a line such as the following on the forums:
User-agent: *
Disallow: /*?*
This stopped all robots from indexing anything that had a "?" in it. This is a good thing, as sorting and "action" URLs are no longer indexed.
Just in case, I added this line:
Disallow: /*action*
This is a valid way to stop robots from indexing, for example, all of the "help" links or "search" and "profile" links. This greatly reduces bandwidth.
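To see what those two wildcard rules catch, here is a minimal Python sketch. It assumes the crawler treats "*" as "any run of characters" and matches the rule from the start of the path, which is how the wildcard-aware robots behave; the helper names and the sample forum paths are made up for illustration.

```python
import re

def rule_to_regex(rule_path):
    """Turn a robots.txt path with '*' wildcards into a compiled regex."""
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    pieces = [re.escape(piece) for piece in rule_path.split("*")]
    return re.compile(".*".join(pieces))

def is_blocked(rule_path, url_path):
    """Return True if url_path matches the Disallow rule from its start."""
    return rule_to_regex(rule_path).match(url_path) is not None

# Hypothetical forum paths, only for illustration.
print(is_blocked("/*?*", "/forum/index.php?action=help"))         # True
print(is_blocked("/*?*", "/forum/index.php"))                     # False
print(is_blocked("/*action*", "/forum/index.php?action=search"))  # True
```

Robots that do not understand wildcards will read these rules differently, so treat this as a rough model only.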
Also, links strewn throughout the forums have a bookmark attached to them. For example, they have "<page url>#new".
Those links are sort of pointless as they are only pointing back to normal pages that should have been indexed in the topic views.
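A quick way to see that a bookmark link is the same page underneath is to split the bookmark off. This is a small sketch using Python's standard urllib.parse.urldefrag; the example.com address is a made-up placeholder, not a real forum URL.

```python
from urllib.parse import urldefrag

# Hypothetical bookmark link, only for illustration.
link = "http://example.com/forum/topic/42#new"

# urldefrag splits the link into the page URL and the bookmark (fragment).
page, bookmark = urldefrag(link)

print(page)      # http://example.com/forum/topic/42
print(bookmark)  # new
```

The part after "#" is handled by the browser and is not sent to the server in the request, so both forms fetch the same page.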
So, I added in this line:
Disallow: /*#*
I "thought" that this would get rid of those bookmark links (jump to #top, etc.) and help the "Page Rank" of my BBS by wiping out redundant URLs pointing to the exact same information.
Boy, was I very wrong.
According to robotstxt.org, and I quote:
Comments can be included in file using UNIX bourne shell conventions: the '#' character is used to indicate that preceding space (if any) and the remainder of the line up to the line termination is discarded. Lines containing only a comment are discarded completely, and therefore do not indicate a record boundary.
So, in non-geek-speak: "#" indicates a comment... or, "anything after '#' in the line is discarded".
So, let's look at this line again:
Disallow: /*#*
This basically states the following:
"Disallow: /*" or... "Disallow everything" in non-geek-speak.
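Here is a minimal Python sketch of the comment rule quoted from robotstxt.org, showing what is left of the line once a robot throws the comment away. The strip_comment helper is a made-up name for illustration, not part of any real parser.

```python
def strip_comment(line):
    """Drop a '#' comment and any whitespace before it, per the quoted rule."""
    return line.split("#", 1)[0].rstrip()

print(strip_comment("Disallow: /*#*"))  # Disallow: /*
print(strip_comment("Disallow: /*?*"))  # Disallow: /*?*
```

The "?" rule survives untouched, while the "#" rule collapses into a rule that matches every path.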
Needless to say, I have been wondering why Google and others have not visited the BBS lately... and this is why.
Take it from a geek... a "bandwidth-reducing tweak that ya may think will work" actually does... it stopped nice robots from visiting at all!
Ya learn something new every day.