You use a sitemap to encourage Google and other search engines to index your website more frequently and efficiently.
If you are not familiar with sitemaps, see the overview information about sitemaps.
The layout of the files that SiteMap generates is defined at Sitemaps.org.
You use this sitemap utility because it is so quick that you can run it before every upload, so your sitemap is always 100% up to date and ready whenever opportunity knocks in the form of the Google spider.
This approach is orders of magnitude quicker than spidering the site yourself with a tool like Xenu or GSiteCrawler. With the sitemap utility, you can prepare a fresh sitemap in a couple of seconds.
To speed spidering and ensure the most important files get spidered frequently, Google has created a system where you leave a compressed catalog of all your files on the website for it to find. The CMP (Canadian Mind Products) SiteMap utility creates that file.
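The catalog is an XML document in the sitemaps.org format, compressed with gzip. The Java sketch below is not the CMP SiteMap source; it only illustrates the shape of such a file. The class SitemapSketch, its Entry class and the method writeGzipped are hypothetical names. It writes one url element per page, with loc, lastmod, changefreq and priority children in the sitemaps.org 0.9 namespace, and gzips the result.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of a sitemap writer; NOT the CMP SiteMap source code.
final class SitemapSketch {

    /** One url element: absolute URL, last-modified date, change frequency, priority. */
    static final class Entry {
        final String loc;
        final LocalDate lastmod;
        final String changefreq;
        final double priority;

        Entry(String loc, LocalDate lastmod, String changefreq, double priority) {
            this.loc = loc;
            this.lastmod = lastmod;
            this.changefreq = changefreq;
            this.priority = priority;
        }
    }

    /** Escapes the five characters that are special in XML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /** Writes the entries as sitemaps.org 0.9 XML through a gzip stream wrapped around out. */
    static void writeGzipped(List<Entry> entries, OutputStream out) throws IOException {
        // Closing the writer also finishes the gzip trailer and closes out.
        try (Writer w = new OutputStreamWriter(new GZIPOutputStream(out), StandardCharsets.UTF_8)) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            w.write("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
            for (Entry e : entries) {
                w.write("  <url>\n");
                w.write("    <loc>" + escape(e.loc) + "</loc>\n");
                // LocalDate.toString() yields YYYY-MM-DD, a valid W3C date.
                w.write("    <lastmod>" + e.lastmod + "</lastmod>\n");
                w.write("    <changefreq>" + e.changefreq + "</changefreq>\n");
                w.write("    <priority>" + e.priority + "</priority>\n");
                w.write("  </url>\n");
            }
            w.write("</urlset>\n");
        }
    }
}
```

To produce a real file you could pass Files.newOutputStream(Paths.get("sitemap.gz")) as the output stream. Writing the text directly keeps the sketch short; a fuller tool might use an XML writer API such as javax.xml.stream instead.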
For a sitemap to take effect, you must upload the generated compressed sitemap.gz file to the root directory of your website, and you must register the file name once with Google ⇒ Tools ⇒ Add Site, so Google knows what you called it and where to look for it. You can check that it is registered properly in your Google ⇒ Webmaster ⇒ Tools ⇒ Dashboard. The sitemap.gz file can be used by all search engines, not just Google, as long as they know to look for it. You can tell all the search engines where to find your sitemap by adding a line to your robots.txt file like this:
# robots.txt. Lives in root directory of the website
# parts of the mindprod.com website not indexed
user-agent: *
disallow: /include/
disallow: /jgloss/include/
disallow: /image/restricted/
Sitemap: http://mindprod.com/sitemap.gz
See more information on robots.txt. You can use it to control which parts of your website get indexed. You can also use the robots meta tag.
It might be wise to put a link to your sitemap somewhere on your website so all the search engines could in theory find it by spidering, without other help.
I wrote SiteMap back in 2006-01 before I knew of the existence of any competition, so I just called it SiteMap, which sounds a bit grandiose now. To distinguish it from others, you might refer to it as CMP SiteMap.
To use the program, you must configure five files:
You also need robots.txt, but that is not for the SiteMap utility.
defines where to find your website html files.
defines which directories you want Google to look at and some facts about them. All files in each directory mentioned are treated the same way, unless there is a special entry for the file in files.csv. It has comma-separated fields:
defines files you want Google to handle specially. It has comma-separated fields:
You may optionally include # comments. The meanings of the frequency and priority fields are defined in the Google FAQ. The program generates the lastmod field automatically.
# individual file exceptions to the directory rules
# directory, file, frequency, spidering importance.
, whatsnew.html, daily, .9
jgloss, deadpadsites.html, never, 0
project, projects.html, weekly, .7
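To make the record layout concrete, here is a hypothetical Java sketch of reading files.csv; the real utility's parser may differ. It skips blank lines and # comments, splits each record into directory, file, frequency and priority, and checks the values against the sitemaps.org protocol (changefreq must be one of always, hourly, daily, weekly, monthly, yearly or never; priority must lie between 0.0 and 1.0). Treating an empty first field, as in ", whatsnew.html, daily, .9", as the website root is an assumption. FilesCsvSketch, FileRule and parse are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of parsing files.csv; the real utility's parser may differ.
final class FilesCsvSketch {

    /** The changefreq values allowed by the sitemaps.org protocol. */
    static final Set<String> FREQUENCIES =
            Set.of("always", "hourly", "daily", "weekly", "monthly", "yearly", "never");

    /** One per-file exception. An empty directory is ASSUMED to mean the website root. */
    static final class FileRule {
        final String directory;
        final String file;
        final String frequency;
        final double priority;

        FileRule(String directory, String file, String frequency, double priority) {
            this.directory = directory;
            this.file = file;
            this.frequency = frequency;
            this.priority = priority;
        }
    }

    /** Parses records such as "jgloss, deadpadsites.html, never, 0". */
    static List<FileRule> parse(List<String> lines) {
        List<FileRule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or # comment
            }
            // Limit -1 keeps trailing empty fields, so they count toward the field total.
            String[] f = line.split(",", -1);
            if (f.length != 4) {
                throw new IllegalArgumentException("expected 4 fields: " + line);
            }
            String frequency = f[2].trim();
            if (!FREQUENCIES.contains(frequency)) {
                throw new IllegalArgumentException("bad frequency: " + line);
            }
            double priority = Double.parseDouble(f[3].trim()); // ".9" parses as 0.9
            if (priority < 0.0 || priority > 1.0) {
                throw new IllegalArgumentException("priority must be 0.0 to 1.0: " + line);
            }
            rules.add(new FileRule(f[0].trim(), f[1].trim(), frequency, priority));
        }
        return rules;
    }
}
```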
A list of records that look like this:
#includes.csv file, which extensions to include in the sitemap
startWith,xxx
endsWith,.html
endsWith,.txt
regexMatch,xxxxx
You can specify strings that file names (not path names) start with or end with, or a regex pattern they must match. You can put only one string per line. If you don’t provide an includes.csv file, it presumes endsWith .html, .htm, .txt and .pdf. These are the files you want to include in the list you leave for Google.
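The matching rule can be sketched in Java as follows. This is a hypothetical illustration, not the utility's code: IncludeRules, fromRecords, defaults and matches are made-up names. The record kinds are spelled as in the sample includes.csv (startWith, endsWith, regexMatch). Two details are assumptions: regexMatch is taken to mean the whole file name must match the pattern, and all comparisons are case-sensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of includes.csv matching; the real utility may differ in detail.
final class IncludeRules {

    private final List<String> startsWith = new ArrayList<>();
    private final List<String> endsWith = new ArrayList<>();
    private final List<Pattern> regexes = new ArrayList<>();

    /** Builds rules from "kind,string" records such as "endsWith,.html"; # lines are comments. */
    static IncludeRules fromRecords(List<String> records) {
        IncludeRules r = new IncludeRules();
        for (String raw : records) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int comma = line.indexOf(',');
            if (comma < 0) {
                throw new IllegalArgumentException("missing comma: " + line);
            }
            String kind = line.substring(0, comma).trim();
            String value = line.substring(comma + 1).trim(); // a regex may itself contain commas
            switch (kind) {
                case "startWith": r.startsWith.add(value); break; // spelling used in the sample
                case "endsWith": r.endsWith.add(value); break;
                case "regexMatch": r.regexes.add(Pattern.compile(value)); break; // ASSUMED: whole-name match
                default: throw new IllegalArgumentException("unknown kind: " + kind);
            }
        }
        return r;
    }

    /** The defaults described above for a missing includes.csv: .html, .htm, .txt and .pdf. */
    static IncludeRules defaults() {
        return fromRecords(List.of("endsWith,.html", "endsWith,.htm", "endsWith,.txt", "endsWith,.pdf"));
    }

    /** True if the bare file name (not its path) satisfies at least one rule. */
    boolean matches(String fileName) {
        for (String s : startsWith) {
            if (fileName.startsWith(s)) return true;
        }
        for (String s : endsWith) {
            if (fileName.endsWith(s)) return true;
        }
        for (Pattern p : regexes) {
            if (p.matcher(fileName).matches()) return true;
        }
        return false;
    }
}
```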
A list of records that look like this:
You can specify strings that file names (not path names) start with or end with, or a regex pattern they must match. You can put only one string per line. If you don’t provide an excludes.csv file, it presumes no excludes. These are the files you want to exclude from the list you leave for Google. The program looks in the current working directory for all five files.
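The include and exclude lists work together: a file lands in the sitemap only if its name matches includes.csv and does not match excludes.csv. The Java sketch below illustrates that combination under stated assumptions; SiteWalker and select are hypothetical names, not the utility's code, and letting an exclusion override an inclusion is an assumption consistent with the description above. It walks a website directory with Files.walk, tests each bare file name against the two predicates, and returns sorted site-relative paths with forward slashes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: how include and exclude rules might combine while walking the site.
final class SiteWalker {

    /**
     * Returns site-relative paths of regular files under root whose bare file name
     * passes include and does not match exclude (exclusion ASSUMED to win).
     */
    static List<String> select(Path root, Predicate<String> include, Predicate<String> exclude)
            throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString(); // file name, not path
                        return include.test(name) && !exclude.test(name);
                    })
                    // Use forward slashes so the paths can be appended to a URL.
                    .map(p -> root.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

With no excludes.csv, you would pass name -> false as the exclude predicate, matching the "no excludes" default.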
You invoke it without parameters:
rem invoke the sitemap utility to create a sitemap.
rem it reads five configuration files from the current directory
sitemap.jar
You can view the generated sitemap.gz with WinZip; tell it that the internal file ends in .xml. You can also view the sitemap.proof file, which contains a decompressed version of what is inside sitemap.gz.
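If you have no unzip utility handy, the JDK's java.util.zip.GZIPInputStream can decompress sitemap.gz directly. This small sketch (GzViewer and readGzipText are hypothetical names, not part of the utility) prints the XML, which should look like the contents of sitemap.proof. It assumes the XML is UTF-8, as sitemaps.org requires.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Sketch: print the XML inside a sitemap.gz without WinZip.
final class GzViewer {

    /** Decompresses a gzip file and returns its contents as UTF-8 text. */
    static String readGzipText(Path gz) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gz))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Uses sitemap.gz in the current directory unless a path is given.
        Path gz = Paths.get(args.length > 0 ? args[0] : "sitemap.gz");
        System.out.print(readGzipText(gz));
    }
}
```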
To install the sitemap utility, extract the zip download with WinZip, available from WinZip.com (or a similar unzip utility), into any directory you please, often C:\, ticking off the Use folder names option. To run it as an application, type:
rem invoke the sitemap utility to create a sitemap.
rem it reads five configuration files from the current directory
sitemap.jar
adjusting as necessary to account for where the jar file is.
See notes in the program and in the sample files.
Note that you normally have both a robots.txt file and a Google sitemap.gz file. The directories you exclude in robots.txt trump inclusion in the sitemap.
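Because robots.txt wins, it can be worth checking whether any URL in your sitemap falls under a Disallow rule. This Java sketch deliberately simplifies the robots exclusion rules: it applies every Disallow line to all crawlers, ignores Allow lines and the * and $ wildcards, and uses plain path-prefix matching. RobotsCheck, disallowedPrefixes and blocked are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified sketch: every Disallow line is treated as applying to all crawlers,
// with plain prefix matching (no Allow lines, no * or $ wildcards).
final class RobotsCheck {

    /** Collects the non-empty Disallow paths from robots.txt lines. */
    static List<String> disallowedPrefixes(List<String> robotsLines) {
        List<String> prefixes = new ArrayList<>();
        for (String raw : robotsLines) {
            int hash = raw.indexOf('#'); // strip # comments
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            // Field names are case-insensitive: "disallow" and "Disallow" are the same.
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            // An empty Disallow value blocks nothing, so it is skipped.
            if (field.equals("disallow") && !value.isEmpty()) {
                prefixes.add(value);
            }
        }
        return prefixes;
    }

    /** Returns the sitemap paths (e.g. "/include/foo.html") that robots.txt would block. */
    static List<String> blocked(List<String> sitemapPaths, List<String> prefixes) {
        List<String> result = new ArrayList<>();
        for (String path : sitemapPaths) {
            for (String prefix : prefixes) {
                if (path.startsWith(prefix)) {
                    result.add(path);
                    break;
                }
            }
        }
        return result;
    }
}
```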
For superfast execution, compile the jar with Jet to create a highly optimised sitemap.exe file. Then you can invoke it with just sitemap on the command line.
Why the spider icon? The sitemap helps Google rapidly spider the website, visiting all the files.
The file created for Google contains a list of all the individual *.html and *.txt files on the website and when they were last updated. It is fairly easy to modify the program to include other types of files. There is no point in including file types that Google does not index, such as zip.
You can tell how often Google is actually spidering your files by looking your files up in Google and noting the date of the cached version.
Technophiles might want to validate the sitemap.xml that the sitemap utility generates and compresses into sitemap.gz, just to make sure it is completely compliant with the sitemap standard. Use the XSD (XML Schema Definition) sitemap schema whose link is embedded in the sitemap.proof file, namely http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd. You can use a utility like Stylus Studio to do the validation. This would be just for your own reassurance; the generated code is always compliant.
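If you would rather not use a separate tool, the JDK's javax.xml.validation API can do the same check. In this sketch SitemapValidator, loadSchema and validate are hypothetical names, not part of SiteMap. It assumes you have first extracted sitemap.xml from sitemap.gz into the current directory; it fetches the schema from the URL above unless you pass another location (for example a local copy, or an https form of the URL if the plain http one redirects) as the first argument.

```java
import java.io.File;
import java.io.IOException;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Sketch using the JDK's javax.xml.validation API; not part of the SiteMap utility.
final class SitemapValidator {

    /** Compiles a W3C XML Schema (XSD) from the given source. */
    static Schema loadSchema(Source xsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(xsd);
    }

    /** Returns an empty Optional if xml is valid, otherwise the first error message. */
    static Optional<String> validate(Schema schema, Source xml) throws IOException {
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the first error is thrown as a SAXException.
            validator.validate(xml);
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(String.valueOf(e.getMessage()));
        }
    }

    public static void main(String[] args) throws Exception {
        // ASSUMPTIONS: sitemap.xml sits in the current directory; args[0] may override the XSD location.
        String xsdLocation = args.length > 0 ? args[0]
                : "http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd";
        Schema schema = loadSchema(new StreamSource(xsdLocation));
        Optional<String> error = validate(schema, new StreamSource(new File("sitemap.xml")));
        System.out.println(error.map(msg -> "invalid: " + msg).orElse("valid"));
    }
}
```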
Package | Version | Released | Licence | Language | Notes
---|---|---|---|---|---
SiteMap For Google | 1.9 | 2010-12-01 | free | Java | Build a Sitemap for a website for more efficient spidering by Google and other search engines.

2.4MB zip for SiteMap For Google: Java source, compiled class files, jar and documentation to run on your own machine as an application.
Runs on any OS that supports Java, e.g. W2K, XP, W2003, Vista, W2008, W7-32, W7-64, W8-32, W8-64, W2012, W10-32, W10-64, Linux, LinuxARM, LinuxX86, LinuxX64, Ubuntu, Solaris, SolarisSPARC, SolarisSPARC64, SolarisX86, SolarisX64 and OSX. First install the most recent Java. To install, extract the zip download with WinZip (or a similar unzip utility) into any directory you please, often J:\, ticking off the Use folder names option. To check out the corresponding source from the Subversion repository, use the TortoiseSVN repo-browser to
After you have installed the jar, you can run it as an application. Type:
java -jar J:\com\mindprod\sitemap\sitemap.jar parms
adjusting as necessary to account for where the jar file is.
SiteMap For Google is free. Full source included. You may even include the source code, modified or unmodified, in free/commercial open source/proprietary programs that you write and distribute. Non-military use only.
This page is posted at http://mindprod.com/application/sitemap.manual.html
Please read the feedback from other visitors, or send your own feedback about the site. Contact Roedy. Please feel free to link to this page without explicit permission.
Canadian Mind Products