A particularly good search engine. They screwed thousands of people by
withdrawing their programmer API and now won’t let anyone use it with a
program, only a browser.
Limiting Searches
Limiting a Search To A Single Site
Use site:mindprod.com in your search criteria to limit
search to one site.
Finding a similar Page
Use related:mindprod.com/religion/god.html to find pages
similar to this URL.
Find a Definition
Use define:zeugma to find definitions of a term.
Find a Website
Use inurl:sun to find websites with the word “sun”
in the URL, including the domain name.
Finding links To a Site
Use link:mindprod.com to find links to that website
from other websites. Find out who is talking about you.
Excluding a word
nuts -almonds means documents containing the word nuts,
but not documents also containing the word almonds.
Exact phrase
"peanut butter" with the quotes, insists the
words appear in that order with nothing in between.
Relaxed Search
almonds OR nuts means get documents
containing either almonds or nuts or both. A normal search insists
on all the words.
Extension filetype
filetype:pdf gets only Adobe pdf documents. filetype:html
gets only html documents (especially useful for Google Desktop, which indexes
more file types). -filetype:html gets everything but
html documents.
You can get at other limiters with the Google
Advanced Search checkboxes. Look at the HTTP query generated, and
learn to compose it directly yourself.
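As a rough sketch of composing such a query yourself, the Java below strings together the limiters described above and URL-encodes the result. The class GoogleQuery and its method names are invented for illustration. The base URL is the one used in the text-link example on this page, and any parameters beyond q are left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch only: composes a Google query string from the limiters described above. */
final class GoogleQuery {
    private final StringBuilder terms = new StringBuilder();

    /** Appends one term, separated from the previous one by a space. */
    private GoogleQuery add(String term) {
        if (terms.length() > 0) {
            terms.append(' ');
        }
        terms.append(term);
        return this;
    }

    GoogleQuery word(String w)       { return add(w); }
    GoogleQuery exclude(String w)    { return add("-" + w); }
    GoogleQuery phrase(String p)     { return add("\"" + p + "\""); }
    GoogleQuery site(String host)    { return add("site:" + host); }
    GoogleQuery filetype(String ext) { return add("filetype:" + ext); }

    /** The query exactly as you would type it into the search box. */
    String text() {
        return terms.toString();
    }

    /** The query as a search URL: spaces become +, other specials become %xx. */
    String url() {
        return "http://www.google.com/search?q="
                + URLEncoder.encode(terms.toString(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        GoogleQuery q = new GoogleQuery().word("nuts").exclude("almonds").site("mindprod.com");
        System.out.println(q.text());
        System.out.println(q.url());
    }
}
```

URLEncoder.encode with a Charset argument needs Java 10 or later. On older JVMs, pass the charset name "UTF-8" instead.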
Google Parts
Google has many parts for searching and other services:
Google International
Google has national search engines, each focusing on one country:
How Google Ranks Pages
Google managed to con the US patent office into issuing them a patent on a
rather obvious idea — ranking popular sites higher where popular is
defined as having a lot of other sites linking to them.
Google works exceedingly well at filtering out junk and repetition. It is very
fast compared with the competition. It seems to put what I am looking for right
at the top of the list most of the time.
Even without using Google’s custom search feature, you can compose a custom
text-link search with the search keywords built-in so the person only has to
click like this: Google
Easter search which looks in HTML like this:
<a href="http://www.google.com/search?q=rabbits+eggs">Google
Easter search</a>
You can make up a link to any search you want just
by typing it in on the Google command line, then pasting the URL Google converts
it into as the href, converting every & to &amp;
(possibly by using Amper).
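If you would rather do the conversion in code, a minimal sketch follows. SearchLink, escapeAttribute and anchor are hypothetical names, and the hl=en parameter is there only to give the URL an & to escape.

```java
/** Sketch only: wraps a search URL in an HTML text link, escaping it for use as an href. */
final class SearchLink {
    /** Escapes the characters that are unsafe inside a double-quoted HTML attribute. */
    static String escapeAttribute(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds the link. The label is assumed to be valid HTML text already. */
    static String anchor(String url, String label) {
        return "<a href=\"" + escapeAttribute(url) + "\">" + label + "</a>";
    }

    public static void main(String[] args) {
        // hl=en is only an illustrative second parameter.
        System.out.println(anchor("http://www.google.com/search?q=rabbits+eggs&hl=en",
                "Google Easter search"));
    }
}
```

Only the href is escaped here. Escaping an already-escaped URL a second time would turn &amp; into &amp;amp;, so feed it the raw URL as copied from the browser.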
You can also set up a search box form where the user
types the search keys, where it searches only your site. See these examples.
Google has a scheme to make it easy to set up custom searches just by asking
questions on their website.
The features include:
- with ads or paid, but ad-free
- narrow the search to a particular website or websites.
- narrow the search to particular web pages or directories.
- salting the search with generic keywords.
- The links you now insert are much shorter and simpler and pass validation, with
the exception of a pair of nugatory <div>
tags and a dangling & that should have been
encoded as &amp;.
- You can choose to have the result appear on your own web pages, or on a Google
page as is traditional.
- You can have the search button hosted on Google if you don’t know how
to insert HTML into your own web page, e.g. if you are a blogger.
The disadvantages include:
- When you create your site search they don’t give you the HTML to paste into your
site, leaving you to wonder if search pages must be hosted at google. There is
no mention where to look under help. It turns out they later email you a
link to it, but they don’t tell you that or mention it anywhere in the FAQs.
Look for the email in your SPAM bucket. Later you can find it under code
on the control panel.
- General big brotherness. Google maintains control and in theory could make the
links you put on your website do anything it pleased.
- Clients must have JavaScript enabled, in a version compatible with Google’s scripts.
This will rule out some of the less popular browsers.
- You can’t let the client choose with a radio button whether for example to
search the web, Sun’s site or your site.
- You can’t set up searches to Google images or Google groups.
- There is nothing in the generated HTML that gives any hint as to what the search
does. It is up to you to add your own comments to help keep track.
- If you put two different searches on the same page, you must manually modify the
id of each search to make it unique.
Here is a typical example that searches just http://mindprod.com/kjv
where the text of the King James bible is located.
Search King James Bible
To search The King James bible for an
exact phrase, enter it in quotes, e.g. "hand of the poor
and needy"
To search for words used together in a chapter, enter the words without quotes e.g.
wrestled angel Jacob
Here is the HTML for the search box:
Note there is no reference to kjv or mindprod.com
in it anywhere. The 005260666645288681202:f8um2hwp-ho
is a reference to the information about the search stored on Google. That
snippet would thus work on any website.
You can access Google via Java with a SOAP
interface. Read the licence agreement carefully. It is quite onerous.
Google Desktop lets you index your local hard disk and search for keywords on it,
using the same sorts of searches you use with Google on the web. It has a
similar function to Microsoft FastFind, but it is more sophisticated. It is
sometimes called DeskBar because you can access it,
even outside a browser from the system tray/taskbar.
It even has a feature to let people on a LAN search each other’s computers.
It also lets you launch apps on your own machine by typing the first few letters
of the name, e.g. wor for MS Word. It is easy to set up
and use. Of course indexing takes up a huge amount of space and computer time.
It cleverly attempts to do its indexing while the computer is otherwise idle. It
appears to work with any browser, though it ignores emails other than those
managed by Outlook and Firefox, unless you find an appropriate plug-in.
To search archives such as 7z, arj, bz2, cab, gz, tar, rar
and zip, you will need to install the Archive
Plug-in. The plug-in also handles nested archives. If Google Desktop does
not do what you want out of the box, check for a plug-in
to add the functionality. Failing that, you could write your own plug-in using
the proprietary Microsoft COM interface.
Unfortunately, it gives you no control over where it puts its giant indexes.
Arrgh! However, it is not totally stupid. It picked a different drive from my
choice for its indexes, but at least one with a reasonable amount of free space.
Infuriatingly, it put all its program files on the overcrowded C:.
I tried a plug-in from PodSync
called TweakGDS
to rectify index placement. It is rather literal. When you tell it to move the
Google indexes to drive G: it will put them in the
root directory! You must provide it a suitable directory name such as G:\Google
Desktop Indices. It does not move all the Google data files, just the
indexes, so you must leave the original X:\Google Desktop Data
and F:\Program Files\google directories intact. Be
aware, if you make any changes to the drives to be indexed via TweakGDS, it
discards all its indexes and starts from scratch. After I installed it, both
Google Desktop and my machine in general became very unstable. Google kept
rebuilding the indexes over and over from scratch. It would stall for hours
making no progress or apparently regressing. I uninstalled it, and of
course that triggered yet another round of total index rebuilding. This seems to
have cleared the instability. Perhaps you will have better luck with it.
My biggest complaint with Google Desktop is that, even though I have it
configured to index only when my machine is idle, it is far too eager to index.
It slows down all my work. I would be much happier if it waited for 30 seconds
of inactivity before catching up on its indexing. It is in there like a hungry
dog indexing instantly every time I change a file.
Make sure you exit Google Desktop before you defrag, or else Google’s
index files (e.g. E:\Google Desktop Data\*\*.cf1) will
not be defragged. Google not only indexes your files, it creates multiple
cached copies of them, presumably stored in delta form the way CVS
does. It has to run all the time normally to detect file changes as they happen.
It will catch up when you run it again by doing a complete disk scan. Similarly,
turn it off while you are backing up to CD or it could stall the backup and ruin
it.
When you click on an HTML reference it has found, the document shows up in the
browser, in my case Opera. From there, I can click
view source to edit the original!! If you click a text file, it
shows up in the editor/viewer you have associated with *.txt
files.
Like DOS Lotus Magellan, Desktop has a plug-in API to allow non-text documents
to be indexed as well. It has no file explorer/copier/mover/filterer/deleter/viewer
like the incomparable Lotus DOS Magellan though. You write indexers in Microsoft
COM compatible languages, e.g. C++ or C#. You write gadgets in JavaScript and
XML.
You can restrict your searches to *.txt files with filetype:txt
in your search criterion.
You can restrict searches to certain directories with under:"C:\Documents
and Settings". I have not yet found a tag that lets you specify file
name wildcards.
from:george@whitehouse.gov looks for emails from a
certain person.
to:laura@whitehouse.gov looks for emails to a certain
person.
Boxes let you narrow the search to within a band around a given date. There are
other advanced
search features.
Google indexing gets run ragged if you move or rename directories frequently as
I do to hide and reveal them in various directory trees. You need to turn off
indexing of all such directories if you don’t want Google going nuts with
endless indexing. Similarly if you have any pipeline batch processing of files,
Google wants to get in there at every stage and take another cached snapshot of
all the files, greatly slowing down the works.
Google does not automatically update itself. You must download and reinstall
Google Desktop every few weeks to get the bug fixes. You can see which version
you have by clicking About.
The index will eventually clog with dozens of versions of the same file. Further
it will mysteriously refuse to index some new files. The only thing I have found
to correct this is to uninstall Google Desktop, manually wipe all its index
files (search for *.cf1 to find the directories where they live), and start from
scratch, and rebuild the indexes which will take several hours. Unfortunately,
you will have to reenter all your preference settings from scratch.
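If you would rather not hunt for the *.cf1 files by hand, something like the following sketch lists the directories that contain them. FindIndexDirs and dirsContaining are made-up names, and the program only reports directories; it deletes nothing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

/** Sketch only: lists the directories under a root that hold files with a given extension. */
final class FindIndexDirs {
    static Set<Path> dirsContaining(Path root, String extension) throws IOException {
        Set<Path> dirs = new TreeSet<>();
        // Files.walk visits the whole tree lazily; close the stream when done.
        try (Stream<Path> walk = Files.walk(root)) {
            walk.filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().toLowerCase().endsWith(extension))
                .forEach(p -> dirs.add(p.getParent()));
        }
        return dirs;
    }

    public static void main(String[] args) throws IOException {
        // Pass the drive or directory to scan; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path dir : dirsContaining(root, ".cf1")) {
            System.out.println(dir);
        }
    }
}
```

Files.walk throws an UncheckedIOException when it hits a directory it is not allowed to read, so on a whole-drive scan you may need to start lower in the tree or switch to Files.walkFileTree with your own error handling.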
The bottom line is, I removed Google Desktop from my machine. It had only about
a 30% chance of finding what I was looking for. It simply did not index
everything it should. I could discern no pattern in its omissions. I needed more
accuracy. I am now using Copernic.
Finding Roedy’s Recommended Books
If you have Google Desktop installed and if
you are reading a local mirror copy of my website provided via the
Replicator, you can do a Google desktop search with: filetype:html
"recommend book", (note the exact-search
quotes) and Google Desktop will find you my book referrals sprinkled throughout
the mindprod.com site, about whatever other search
criteria you add. Note the spelling of recommend: one c,
two ms, not
reccommend, recomend or reccomend. You have to spell it
exactly in the form of a command: "recommend book",
not "recommend books" or "recommended book" or "recommend
a book".
This feature will also kick in later using an ordinary web browser Google.com
site search with site:mindprod.com "recommend book"
for people who browse the ordinary way, when Google next indexes the mindprod.com
website.
Incidentally, Google Desktop will let you find
anything else on the mindprod.com website with
indexes that are only seconds out of date.
You can similarly find my DVD recommendations with filetype:html
"recommend DVD" in Google Desktop and site:mindprod.com
"recommend DVD" in an ordinary web browser Google site search.
You can also use Google
Books to search for books, and browse them on-line to various degrees. You
can also use Copernic, which I find generally
faster and more reliable than Google Desktop. Just search for recommend
book without quotes. You might optionally set the filetype to html to
narrow the search.
Google lets you search for videos. The search is unusual in that it does not
lead you to the original site. Google itself has captured the videos and plays
them for you using their own video system or YouTube, which they bought out.
Oddly you don’t seem to need to download any sort of viewer. Their plugin
allows you to search within the video using thumbnails. There are two big
advantages to this:
- Usually Google has much higher bandwidth than the original site.
- Even after the original site drops carrying the video, Google presumably will
continue to carry it.
I hope this additional competition improves the quality of other video plug-ins.
Google gvi format is copy protected, so when you buy videos, you can view them on-line
as often as you like, but only on-line.
Google Earth is great fun. It lets you zoom over the surface of the entire earth
with an eagle’s view. You can even see your own house. They have digitised
satellite photos and some street level photos. You can swoop down the famous
streets of the world seeing the building fronts and even the people as they were
last snapped. There are three versions. The pro version gives you the best
resolution, but after a free trial you must pay for it. You download a program
that renders the 3D images. You can “fly” at any height and in any
direction swooping around with dizzying rapidity. It is about as close to being
Spiderman as you can get for free. You are not seeing real time images, so it is
not suitable for spying on your mate. You will need a fast computer and fast
Internet connection to get the full effect. It has practical uses as well. You
can swoop around your vicinity and see the commercial establishments specially
marked. You can easily tell forest from clearcut from parkland from residences.
When it shows you driving instructions, you don’t just get a map, you get
to see what the route will look like to drive. The 3D effects work best for
mountains and hills. For buildings, I gather the 3D information is gleaned from
straight overhead satellite photogrammetry which tells you how tall a building
is, but nothing about what its sides look like.
A toolbar you can add into Internet Explorer or Firefox,
but not other browsers. It lets you do the following:
- Ordinary Google web searches.
- Opera/Gator-like automatic fill in of web forms, called autofill
not to be confused with autolink, the button right
beside it. It is not very bright. You have to watch it carefully since it often
puts data in the wrong slots or fails to fill in slots it should be able to
figure out. It does not handle passwords, company names, nicknames, gender, age…
- Spell check of web form input.
- Address lookup showing maps, USA only.
- Translations of foreign languages, like Babelfish.
- Ability to customise buttons and searches.
Google is scanning the books of the world and letting you see the original
scanned images on-line. You can also search the texts. For public domain books,
you can see the entire text of the book. For cooperating copyrighted books you
can see a few pages surrounding the target of your search, table of contents,
index, front page etc. and for non-cooperating copyrighted books, you can see
just a few sentences.
iGoogle
A customisable home web page on one of Google’s servers. iGoogle
has various widgets you can use to decorate the page. It displays news, weather,
the date and time and bits of trivia.
Google AdSense
The ads at the top and bottom of each page on this website are managed by Google.
Click-throughs on those ads generate the revenues that pay the ISP bills for
mindprod.com. Google AdSense is one of the few affiliate
programs worth the effort. You can put up to three banners per page and if the
banners are big enough, sometimes multiple ads will fit in a banner. They are
popular because they are very easy to set up. You just put a standard piece of
HTML on your page that tells Google the size of ad you want and they take it
from there. You can block ads from objectionable companies.
Google pays you for placing most ads on a CPC (Cost Per Click)
basis. For ads that advertisers target to your particular website, Google pays on a
CPM (Cost Per Mille, i.e. per thousand impressions) basis. You will get something in the
order of
to
per click based on your natural ranking, an
advertising desirability measure.
The rate you get per click is the result of an automatic auction between
advertisers for the right to advertise to your site or your keywords. You get an
undisclosed percentage of what the advertiser pays.
You must, of course, register as an affiliate first (via the white box on the
right) or they won’t know where to mail the cheque.
I keep the ads off the high traffic pages since they can slow down loading and,
if Google is unreachable, can freeze the loading entirely. I have written Google
repeatedly about the problem. There is also a problem using the website offsite
when the ad content is not available. Google JavaScript needs to suppress the ad
if you are off-line or if the ad does not appear within a few seconds. It
should not hold up the entire page. For now, you have to turn off JavaScript for
off-line viewing of my web pages.
Ads can only appear on pages that Google spiders, so your robots.txt must leave them
open to robots, and they should appear in your sitemap if you have one. Google needs to
examine their content to select relevant ads.
Google AdWords
The advertiser side of AdSense is called AdWords.
Advertisers usually pay for clicks on ads, not sales or impressions. Advertisers
bid for the right to ad placements for important keywords and sites. Law firms
will bid up to
per click for keywords that may land them clients. Website owners displaying the
ads get an undisclosed share of this largess. Google has to bar website owners
who lard their sites with phony popular keywords. When the bids are too
low, Google displays public service ads. Advertisers can submit a daily
advertising budget to Google and be pretty well guaranteed they can’t go
over it.
Here are the approximate fees:
- to sign up
- You can choose a maximum cost per click in the range
to
.
- Your daily click budget can be as low as
.
- Site targeted ads require a minimum price of
per thousand impressions.
AdWords
Sign up
Site Maps
If you have a website, you can prepare an XML file to describe files on your
website, how frequently they change etc. These are called Google
sitemaps. The file would look like this, only with one item for each file
you want included in the Google index.
You must register
with Google to tell them you have prepared such a file and where you put it. The
file itself is GZip compressed, which, because XML is so fluffy, gets about 25
to 1 compression. You should validate your XML files before submitting them
using these XSD schemas for Simple
SiteMaps and SiteMap
indexes.
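As a rough illustration of the kind of file involved, the Java sketch below writes one url element per page and GZip-compresses the result. It assumes the sitemaps.org 0.9 namespace and element names (loc, lastmod, changefreq, priority), which may differ from the schema version in use when this page was written. SitemapSketch and Entry are invented names, and the date in the sample entry is a placeholder.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPOutputStream;

/** Sketch only: renders sitemap entries as XML and GZip-compresses the result. */
final class SitemapSketch {
    /** One page of the site. All four values are supplied by the caller. */
    static final class Entry {
        final String loc;        // absolute URL of the page
        final String lastmod;    // date last changed, e.g. 2008-01-31
        final String changefreq; // e.g. hourly, daily, weekly
        final String priority;   // relative importance, 0.0 to 1.0

        Entry(String loc, String lastmod, String changefreq, String priority) {
            this.loc = loc;
            this.lastmod = lastmod;
            this.changefreq = changefreq;
            this.priority = priority;
        }
    }

    /** Escapes the five characters XML reserves. The ampersand must go first. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    static String toXml(List<Entry> entries) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        // Namespace assumed: the sitemaps.org 0.9 protocol.
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (Entry e : entries) {
            sb.append("  <url>\n");
            sb.append("    <loc>").append(escapeXml(e.loc)).append("</loc>\n");
            sb.append("    <lastmod>").append(escapeXml(e.lastmod)).append("</lastmod>\n");
            sb.append("    <changefreq>").append(escapeXml(e.changefreq)).append("</changefreq>\n");
            sb.append("    <priority>").append(escapeXml(e.priority)).append("</priority>\n");
            sb.append("  </url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }

    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so close before reading the bytes.
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder entry: the date is invented for illustration.
        List<Entry> entries = List.of(
                new Entry("http://mindprod.com/projects/projects.html", "2008-01-31", "weekly", "0.7"));
        String xml = toXml(entries);
        System.out.print(xml);
        System.out.println(gzip(xml).length + " bytes compressed");
    }
}
```

With a single entry the compressed size is unimpressive. The large ratios show up only when thousands of near-identical url elements repeat.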
GSiteCrawler
Many people are using a program called SOFTPlus
GSiteCrawler to create their sitemaps. It works just like the Google spider
does by chasing links in your on-line website to find all the pages. This is
quite time consuming since it has to download every one of your website’s
pages individually. There is a much faster way, which takes under two
seconds to prepare a sitemap for a website of 10,000 files, by using an off-line
site preparation utility that does not even need to read any of your files, just
the directory entries. Obviously there are side benefits to GSiteCrawler’s
spidering, e.g. broken link detection, but you can get that much faster with off-line
spidering with Xenu. Many FTP utilities, such as NetLoad and FTP Voyager, will
verify the consistency of off-line and on-line versions of your files without the
heavy overhead of spidering.
SiteMap Utility
I have written a Java program called SiteMap you can download
to generate a Google sitemap file for your own website, off-line. It can prepare
a sitemap for a website of 10,000 files in under 2 seconds. This is fast enough
that you can use it before every upload, ensuring your Google sitemap is 100% up
to date for when the Google spider of opportunity knocks to spider your site.
You control the utility by composing three files with a text editor. The first siteconfig.properties
looks like this:
You also compose a list of entire directories to specially catalog: directories.csv,
containing the directory name, frequency of update and relative importance when
it comes to keeping it up to date, expressed as a fraction 0 to 1.
You then compose a list of exceptions, files.csv, naming the files to catalog
differently from the default for their directory.
It contains directory name, filename, frequency updated, and relative importance.
, whatsnew.html, hourly, .9
projects, projects.html, weekly, .7
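To make the four-field layout concrete, here is a small sketch of how lines like the two above could be parsed. It is not the SiteMap utility's own code. FilesCsvSketch, Rule and parseLine are hypothetical names, and the sketch assumes plain comma-separated fields with no quoting.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: parses files.csv style lines of directory, filename, frequency, importance. */
final class FilesCsvSketch {
    static final class Rule {
        final String directory;   // empty string means the root directory
        final String filename;
        final String frequency;   // e.g. hourly, weekly
        final double importance;  // fraction 0 to 1

        Rule(String directory, String filename, String frequency, double importance) {
            this.directory = directory;
            this.filename = filename;
            this.frequency = frequency;
            this.importance = importance;
        }
    }

    static Rule parseLine(String line) {
        // Limit -1 keeps empty fields, such as the empty directory name for the root.
        String[] f = line.split(",", -1);
        if (f.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        double importance = Double.parseDouble(f[3].trim());
        if (importance < 0.0 || importance > 1.0) {
            throw new IllegalArgumentException("importance must be 0 to 1: " + line);
        }
        return new Rule(f[0].trim(), f[1].trim(), f[2].trim(), importance);
    }

    /** Parses every non-blank line. */
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                rules.add(parseLine(line));
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(", whatsnew.html, hourly, .9", "projects, projects.html, weekly, .7");
        for (Rule r : parse(lines)) {
            System.out.println("[" + r.directory + "] " + r.filename + " " + r.frequency + " " + r.importance);
        }
    }
}
```

A file or directory name containing a comma would break this simple split. A real CSV reader would be needed for that case.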
My utility then scans your disk and prepares a Google sitemap entry for every
individual file that meets those criteria and compresses it. You then upload it
to your website. The first time, you must also register that file’s name
with Google.
You want to regenerate your site map just prior to every upload, otherwise if
the Google spider comes, it will miss some of your recently updated files.
Google Accelerator
Google WebAccelerator
speeds up your browsing in IE and Firefox
by caching web pages, and preemptively fetching pages in the background. Beware.
The Google web accelerator proxy drastically slows down Java Web Start unless
you configure javaws.exe to use a direct network
connection. Read up on more details on how it works.
General Search
Google Code hosts your
programming projects with Subversion version control. It does not have the
cachet of SourceForge hosting. Google gives you a choice of 8 licences. Public
domain and non-military use only are not among them.