|
The
robots exclusion standard, or robots.txt protocol, is a convention for
preventing cooperating web spiders and other web robots from accessing all or
part of a website that is otherwise publicly viewable.
Robots are often used by search engines to categorize
and archive web sites, or by webmasters to proofread source code. A
robots.txt file on a website functions as a request that specified
robots ignore specified files or directories in their search. This
might be, for example, out of a preference for privacy from search engine
results, or the belief that the content of the selected directories might be
misleading or irrelevant to the categorization of the site as a whole, or
out of a desire that an application only operate on certain data. |
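As an illustration of how a cooperating robot might interpret such a request, the following minimal sketch parses a small robots.txt file with Python's standard `urllib.robotparser` module. The file contents, the directory names, and the robot names `ExampleBot` and `OtherBot` are hypothetical and chosen only for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the robot name and paths are invented.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /drafts/

User-agent: *
Disallow: /private/
Disallow: /tmp/
"""

parser = RobotFileParser()
# parse() accepts the file as a list of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if the parsed rules permit `agent` to fetch `path`."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent, path in [
        ("ExampleBot", "/drafts/notes.html"),
        ("OtherBot", "/private/report.html"),
        ("OtherBot", "/index.html"),
    ]:
        print(agent, path, allowed(agent, path))
```

In this sketch the record naming `ExampleBot` applies only to that robot, while the `*` record covers every robot that has no record of its own.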
|
The protocol, however, is purely advisory. It
relies on the cooperation of the web robot, so marking an area of your
site out of bounds with robots.txt does not guarantee privacy. Some web site
administrators have tried to use the robots file to make private parts of a
website invisible to the rest of the world, but the file is necessarily
publicly available and its content is easily checked by anyone with a web
browser.
Information about robots.txt, the Web Robots Exclusion Standard, and writing
well-behaved Web robots:
http://www.robotstxt.org |
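To make concrete why the file offers no privacy, the following minimal sketch lists every path named in the `Disallow` lines of a robots.txt file. The helper name `disallowed_paths` and the sample file contents are hypothetical. Anyone who can read the file can run an equivalent of this, so the file tends to advertise the areas it is meant to hide.

```python
def disallowed_paths(robots_txt):
    """Collect the path of every non-empty Disallow line in a robots.txt text."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file contents, invented for this example.
SAMPLE = """\
User-agent: *
Disallow: /private/   # meant to be hidden
Disallow: /admin/
Disallow:
"""

if __name__ == "__main__":
    print(disallowed_paths(SAMPLE))
```

A robot that ignores the convention can request these paths like any others, since nothing in the protocol enforces the request.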