Sitemaps were originally created by Google as a way to help it spider websites more efficiently. Google designed them in a way that made it difficult for other search engines to use them. Since then, sitemaps have been opened up and made accessible to all search engines. Basically, a sitemap is just an XML (Extensible Markup Language) file that lists all the spiderable files on your website, when you last updated them, and how important you think it is to keep the spidering up to date. A typical sitemap file might look something like this:
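One way to produce such a file is sketched below in Java. This is only an illustration, not the code of any particular tool: the class and method names are hypothetical, the two page URLs and their dates are made up, and the namespace assumed is the public sitemaps.org 0.9 one.

```java
import java.time.LocalDate;
import java.util.Locale;

/** Illustrative sketch only; class and method names are hypothetical. */
class SitemapXmlSketch {

    /** Escape the five characters that are special in XML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /** Build one url element: location, last update, change frequency, priority. */
    static String urlElement(String loc, LocalDate lastmod, String changefreq, double priority) {
        return "  <url>\n"
                + "    <loc>" + escape(loc) + "</loc>\n"
                + "    <lastmod>" + lastmod + "</lastmod>\n"   // LocalDate prints as yyyy-MM-dd
                + "    <changefreq>" + changefreq + "</changefreq>\n"
                + "    <priority>" + String.format(Locale.ROOT, "%.1f", priority) + "</priority>\n"
                + "  </url>\n";
    }

    /** Wrap already-built url elements in the urlset root element. */
    static String document(String... urlElements) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String u : urlElements) {
            sb.append(u);
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up pages and dates, purely for demonstration.
        System.out.print(document(
                urlElement("http://mindprod.com/index.html", LocalDate.of(2009, 1, 15), "daily", 0.9),
                urlElement("http://mindprod.com/jgloss/sitemap.html", LocalDate.of(2008, 11, 2), "monthly", 0.5)));
    }
}
```

Each url element carries the three facts described above: where the file is, when it last changed, and how much you care about it being respidered.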
Normally the file itself is GZip compressed. Because XML is so fluffy, it gets about 25 to 1 compression. Before submitting your XML files, you should validate them against the XSD schemas for sitemaps.
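The compression step can be sketched with the standard java.util.zip classes. This is a minimal illustration under assumed names (SitemapGzipSketch, gzip and gunzip are hypothetical); the ratio you actually get depends on your own file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Illustrative sketch only; class and method names are hypothetical. */
class SitemapGzipSketch {

    /** Compress sitemap XML text into GZip bytes. */
    static byte[] gzip(String xml) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the GZip trailer has been written.
        return bytes.toByteArray();
    }

    /** Decompress GZip bytes back into text, to check the round trip. */
    static String gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Repetitive stand-in for a real sitemap body.
        StringBuilder xml = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            xml.append("<url><loc>http://mindprod.com/page").append(i).append(".html</loc></url>\n");
        }
        byte[] raw = xml.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(xml.toString());
        System.out.println("raw bytes: " + raw.length + ", compressed bytes: " + packed.length);
    }
}
```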
How Do Search Engines Find Your Sitemap?
You add a Sitemap line to your robots.txt file, e.g.
# parts of the mindprod.com website not indexed
user-agent: *
disallow: /include/
disallow: /jgloss/include/
disallow: /image/restricted/
Sitemap: http://mindprod.com/sitemap.gz
Note how Sitemap takes a full URL (Uniform Resource Locator), unlike the others.
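To show how a spider might pick up that line, here is a minimal Java sketch that pulls the Sitemap URLs out of the text of a robots.txt file. The class and method names are hypothetical, and it assumes only whole-line # comments need to be skipped.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; class and method names are hypothetical. */
class RobotsSitemapSketch {

    /** Return the URL of every "Sitemap:" line, in file order. */
    static List<String> sitemapUrls(String robotsTxt) {
        List<String> urls = new ArrayList<>();
        for (String line : robotsTxt.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("#")) {
                continue; // whole-line comment
            }
            // The field name is matched case-insensitively.
            if (trimmed.regionMatches(true, 0, "sitemap:", 0, 8)) {
                String url = trimmed.substring(8).trim();
                if (!url.isEmpty()) {
                    urls.add(url);
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String robots = "# parts of the mindprod.com website not indexed\n"
                + "user-agent: *\n"
                + "disallow: /include/\n"
                + "Sitemap: http://mindprod.com/sitemap.gz\n";
        System.out.println(sitemapUrls(robots));
    }
}
```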
I have written a Java program called SiteMap that you can use to generate a Google sitemap file for your own website, offline. It can prepare a sitemap for a website of 10,000 files in under 2 seconds. This is fast enough that you can use it before every upload, ensuring your Google sitemap is 100% up to date for when the Google spider of opportunity knocks to spider your site.
You control the utility by composing three files with a text editor. The first, siteconfig.properties, looks like this:
You also compose a list of entire directories to specially catalog: directories.csv, containing the directory name, frequency of update and relative importance when it comes to keeping it up to date, expressed
You then compose a list of exceptions, the files to catalog differently from the default for their directory: files.csv. It contains directory name, filename, frequency updated, and spidering importance.
# individual file exceptions to the directory rules
# directory, file, frequency, spidering importance.
, whatsnew.html, daily, .9
jgloss, deadpadsites.html, never, 0
project, projects.html, weekly, .7
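Reading a file in that four-column layout takes only a few lines of Java. The sketch below is not the SiteMap utility's own parser; FileRuleSketch and its fields are hypothetical names, and treating an empty directory field as the site root is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; one parsed line of a files.csv style exception list. */
class FileRuleSketch {
    final String directory;   // assumption: empty string means the site root
    final String file;
    final String frequency;   // e.g. daily, weekly, never
    final double importance;  // e.g. .9 parses as 0.9

    FileRuleSketch(String directory, String file, String frequency, double importance) {
        this.directory = directory;
        this.file = file;
        this.frequency = frequency;
        this.importance = importance;
    }

    /** Parse the CSV text, skipping blank lines and # comment lines. */
    static List<FileRuleSketch> parse(String csv) {
        List<FileRuleSketch> rules = new ArrayList<>();
        for (String line : csv.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // Limit -1 keeps empty fields, such as a blank directory.
            String[] f = trimmed.split(",", -1);
            if (f.length != 4) {
                throw new IllegalArgumentException("expected 4 fields: " + line);
            }
            rules.add(new FileRuleSketch(f[0].trim(), f[1].trim(), f[2].trim(),
                    Double.parseDouble(f[3].trim())));
        }
        return rules;
    }

    public static void main(String[] args) {
        String csv = "# directory, file, frequency, spidering importance.\n"
                + ", whatsnew.html, daily, .9\n"
                + "jgloss, deadpadsites.html, never, 0\n";
        for (FileRuleSketch r : parse(csv)) {
            System.out.println("[" + r.directory + "] " + r.file + " " + r.frequency + " " + r.importance);
        }
    }
}
```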
My utility then scans your disk, prepares a Google sitemap entry for every individual file that meets those criteria, and compresses the result. You then upload it to your website. The first time, you must also register that file’s name with Google.
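The scanning step might be sketched as follows. This is a simplified stand-in, not the utility's actual code: it lists every regular file under a local directory and maps it to a URL, leaving out the directory and file rules entirely. The names SiteScanSketch and urlsUnder are hypothetical, and baseUrl is assumed to end with a slash.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative sketch only; class and method names are hypothetical. */
class SiteScanSketch {

    /** Map every regular file under root to a URL under baseUrl, sorted by URL. */
    static List<String> urlsUnder(Path root, String baseUrl) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    // Relative path with forward slashes, whatever the local OS uses.
                    .map(p -> baseUrl + root.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny throwaway site so the demo does not depend on your disk layout.
        Path root = Files.createTempDirectory("site");
        Files.createDirectories(root.resolve("jgloss"));
        Files.write(root.resolve("index.html"), new byte[0]);
        Files.write(root.resolve("jgloss").resolve("sitemap.html"), new byte[0]);
        for (String url : urlsUnder(root, "http://mindprod.com/")) {
            System.out.println(url);
        }
    }
}
```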
You want to regenerate your sitemap just prior to every upload. Otherwise, if the Google spider comes, it will miss some of your recently updated files.
You can validate a sitemap file with a sitemap XML schema.
There is another kind of site map that tries to give you a bird's-eye view of the entire website, so you can jump directly to the section you need. You can prepare these manually using the usual HTML (Hypertext Markup Language) editors, combined with directory listings and search/replace, or you can use a utility such as Coffee Cup SiteMapper to build one for you. I created a primitive one manually for my own website.