robots.txt : Java Glossary

robots.txt
robots.txt is a file you can place in the root directory of your website to tell web crawlers (such as those run by search engines) which pages they may crawl and which to ignore. A typical robots.txt file might look like this:
# parts of the mindprod.com website not indexed
user-agent: *
disallow: /template.html
disallow: /include/
disallow: /jgloss/include/
Sitemap: http://mindprod.com/sitemap.gz
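It means that no crawler should fetch the file template.html or anything in the two directories mentioned. The original robots.txt convention has no way to exclude files by extension, though some crawlers, such as Googlebot, honour wildcard patterns like disallow: /*.pdf$. Note that the Sitemap directive takes a full URL (Uniform Resource Locator), unlike the others, which take paths relative to the root of the site.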
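To see how a well-behaved crawler might apply these rules, here is a minimal Java sketch. The class RobotsRules and its methods parse and isAllowed are hypothetical names invented for this illustration, not part of any library, and the path /include/menu.html is made up. The sketch assumes the simplest reading of the format: it honours only the disallow lines in a user-agent: * group and does plain prefix matching, with no allow lines, wildcards or per-crawler groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: holds the disallow prefixes that apply to all crawlers. */
final class RobotsRules {
    private final List<String> disallowed = new ArrayList<>();

    /** Parses robots.txt text, keeping only disallow lines under "user-agent: *". */
    static RobotsRules parse(String robotsTxt) {
        RobotsRules rules = new RobotsRules();
        boolean inWildcardGroup = false;
        for (String rawLine : robotsTxt.split("\\R")) {
            // Drop anything after '#', which starts a comment.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            // Split at the FIRST colon, so URLs in a Sitemap value stay intact.
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Simplification: only the most recent user-agent line counts.
                inWildcardGroup = value.equals("*");
            } else if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                rules.disallowed.add(value);
            }
            // Other fields, such as Sitemap, are ignored in this sketch.
        }
        return rules;
    }

    /** Returns false if the path starts with any disallowed prefix. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robotsTxt = String.join("\n",
                "# parts of the mindprod.com website not indexed",
                "user-agent: *",
                "disallow: /template.html",
                "disallow: /include/",
                "disallow: /jgloss/include/",
                "Sitemap: http://mindprod.com/sitemap.gz");
        RobotsRules rules = RobotsRules.parse(robotsTxt);
        System.out.println(rules.isAllowed("/jgloss/robotstxt.html")); // prints true
        System.out.println(rules.isAllowed("/include/menu.html"));     // prints false
    }
}
```

Because matching is by prefix, disallow: /include/ covers every file beneath that directory, while disallow: /template.html covers just that one file (and anything else whose path happens to begin with the same characters).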
