Introduction
This essay will get you started writing HTML so you can put up a website on the
Internet. HTML stands for HyperText Markup Language, a
platform-independent technique for distributing formatted documents via the web.
The bold, italic etc. in the document you are reading now (presumably
in a web browser) is encoded by embedding tags like <B> and <I>.
This markup scheme works on any brand of computer and allows web sites to send
all information in a standard way, without having to worry about what brand of
computer the recipient has, or what software she uses.
Learning HTML
I found the easiest way to learn HTML is to look at
other people’s examples, to cut and paste from them, and to experiment by
fiddling with the various parameters to see what the visual effects are.
Trying to make sense of W3C HTML standards requires a PhD in computer language
theory. Anyone can play monkey and copy from other sites.
Documentation
It might help to buy an introductory text, but HTML is so simple that it is
probably not necessary.
Proofreading
To ensure your HTML will work when you upload it to a website server, use all
lowercase filenames and directory names. Avoid spaces and punctuation (except _)
in names. Make sure you use relative links to your own files — no C: or
file:///C|/ style absolute references.
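A small sketch of the difference follows. The file and directory names are made up for illustration; only the contrast between the two styles of reference matters.

```html
<!-- Relative references: these keep working after the files are uploaded. -->
<a href="gallery/index.html">Gallery</a>
<img src="images/family_photo.jpg" alt="family photo">

<!-- Absolute local references: these only work on the author's own disk. -->
<a href="file:///C|/website/gallery/index.html">Gallery</a>
<img src="C:\website\images\family_photo.jpg" alt="family photo">
```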
The W3 Consortium offers an on-line validator for the various HTML dialects. It
is a sort of Lint for HTML. It helps ensure your HTML will work properly on
browsers other than the one you tested it on. The W3C also controls the various
HTML standards.
For speed and control, I use
CSE HTMLValidator to check my web pages off-line and in batches.
When it comes to HTML4 and CSS style sheets, browser
support is shaky. TopStyle will help you keep track
of which features work on which browsers.
The Tags
This section just summarises the tags. Sometimes what I tell you here will be
enough to use them, but it is really just designed to jog your memory. Look
elsewhere for details or experiment!
Class
It is easiest to use the class attribute, then specify what it means in your CSS
style sheet.
Use of CSS class
| Start Tag | End Tag | Description |
|---|---|---|
| <span class="strawberry"> | </span> | Encloses text of the strawberry class. The browser will look in the style sheet to figure out what attributes should be applied to strawberry text, perhaps a size, colour, alignment, font etc. |
| <div class="strawberry sweet"> | </div> | Applies two classes to the same tag, here both strawberry and sweet, to a group of lines. The browser will look in the style sheet to figure out what attributes should be applied to strawberry and to sweet text. Note that the class names are separated by a space, not a comma! |
| <ul class="strawberry"> | </ul> | Like a regular UL except everything in it should be treated as strawberry text. |
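Here is a minimal sketch of how the classes and the style sheet fit together. The two rules are arbitrary examples, not taken from any real style sheet; in a real page the style block would go in the head section or in a separate style sheet file.

```html
<style type="text/css">
/* hypothetical rules; pick whatever properties you like */
.strawberry { color: #c00000; font-family: Georgia, serif; }
.sweet { font-style: italic; text-align: center; }
</style>

<p>Plain text with a <span class="strawberry">strawberry phrase</span> inside it.</p>

<div class="strawberry sweet">
<p>Both the strawberry and the sweet rules apply here.</p>
</div>

<ul class="strawberry">
<li>every item in this list</li>
<li>is strawberry text</li>
</ul>
```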
Groups
Groups, Lists, Glossaries
| Start Tag | End Tag | Description |
|---|---|---|
| <ol> | </ol> | ordered, numbered list |
| <ul> | </ul> | unordered, bulleted list. Consider using a borderless table with a column of titles and a column of detail instead. The bullets themselves don’t give much additional information. |
| <menu> | </menu> | menu list, more compact than ul. |
| <li> | </li> | list item |
| <dl> | </dl> | definition (dictionary) list |
| <dt> | </dt> | term being defined |
| <dd> | </dd> | definition of the term |
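The three kinds of list look like this in practice. The item text is invented for illustration.

```html
<ol>
<li>first step</li>
<li>second step</li>
</ol>

<ul>
<li>apples</li>
<li>oranges</li>
</ul>

<dl>
<dt>HTML</dt>
<dd>HyperText Markup Language</dd>
<dt>CSS</dt>
<dd>Cascading Style Sheets</dd>
</dl>
```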
Line Breaks
Line Breaks
| Tags | Description |
|---|---|
| <br> or <br /> | new line, no extra space. To prepare for XHTML, it is better to use <br />. |
| <br clear="all" /> | gets past any flow-around illustration. |
| <p> or <p>…</p> | new paragraph, blank line inserted. To prepare for XHTML, it is better to use <p>…</p> surrounding each paragraph. |
| <p align="center">…</p> | centre each line |
| <hr> or <hr /> | horizontal rule. To prepare for XHTML, it is better to use <hr />. |
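A short sketch of these tags working together, assuming an image called photo.jpg (a made-up name) that the text flows around:

```html
<p>
<img src="photo.jpg" align="left" alt="photo" />
This text flows around the right side of the image.<br />
A plain break starts a new line, still beside the image.
<br clear="all" />
This line starts below the image.
</p>
<hr />
<p align="center">A centred paragraph.</p>
```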
Font selectors
Font Colours and Size
| Start Tag | End Tag | Description |
|---|---|---|
| <h1> | </h1> | major heading |
| <h6> | </h6> | most minor heading |
| <b> | </b> | bold, cf. strong |
| <i> | </i> | italic, cf. em |
| <tt> | </tt> | typewriter font |
| <pre> | </pre> | preformatted |
| <font size="+3"> | </font> | relative size increase. Use 3 rather than +3 for an absolute size. |
| <font color="red"> | </font> | see choice of colours. |
| <font face="Comic Sans MS,Helvetica,sans-serif"> | </font> | suggest a typeface. User must have it installed; you can specify alternates in order of preference. You should end with one of the CSS default fonts serif, sans-serif or monospace. |
| <big> | </big> | shorthand for <font size="+1"> |
| <small> | </small> | shorthand for <font size="-1"> |
| <dfn> | </dfn> | definition |
| <em> | </em> | emphasis, usually rendered as italic. |
| <cite> | </cite> | book titles |
| <code> | </code> | program listings |
| <kbd> | </kbd> | keystrokes |
| <samp> | </samp> | computer status messages |
| <sup> | </sup> | superscript. You can also use entities like &sup2; to get ². |
| <strong> | </strong> | strong emphasis, usually rendered as bold. |
| <var> | </var> | placeholder to be replaced by a specific value when used. Typically rendered in italics. |
| <u> | </u> | underline |
| <address> | </address> | email address, possibly street address. |
| <blockquote> | </blockquote> | long quotation |
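A few of these tags in use. The wording is invented; how each one renders is up to the browser.

```html
<h1>Major heading</h1>
<p>Press <kbd>Ctrl+S</kbd> to save. The program replies <samp>File saved</samp>.</p>
<p>This is <em>important</em> and this is <strong>very important</strong>.</p>
<p>Call <code>print(<var>message</var>)</code> as described in <cite>The Manual</cite>.</p>
<p>E = mc<sup>2</sup>, or with an entity: E = mc&sup2;</p>
<p><font face="Comic Sans MS,Helvetica,sans-serif" size="+1" color="red">suggested typeface, larger and red</font></p>
```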
Comments
You can insert comments in your HTML that are ignored. You can insert them in
the text but not inside tags. Anything between <!-- … --> is ignored. Comments
can span lines. <!> is a dummy comment. Avoid the string -- inside comments. I
always put a space after <!-- and before -->, though it is not strictly
necessary. Note the asymmetry of the start and end tags.
Comments are not treated as white space, e.g. grand<!-- comment -->stand will
render as grandstand, not grand stand.
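These rules can be sketched as follows; the comment text itself is arbitrary.

```html
<!-- a one-line comment -->

<!--
  A comment can span several lines.
  Use single hyphens only inside it, never two in a row.
-->

<!-- the comment below adds no space, so this renders as the single word grandstand -->
<p>grand<!-- not white space -->stand</p>
```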
Anchors
typical target — place where you jump TO:
<h2><a name="GLOSSARY">Roedy’s Java Glossary</a></h2>
Rules for making up anchor names:
- The HTML 4.01 spec section 6.2 states that anchor names must begin with a
letter a-z, A-Z, and may be followed by any number of letters, digits 0-9,
hyphens -, underscores _, colons :, and periods . So a leading _ is not
permitted, and neither are all-numeric anchors.
- Anchor names that differ only in case may not appear in the same document, so
Apple counts as the same name as APPLE. Links, however, are supposed to match
the case of the anchor name exactly. To be safe, always consistently use
UPPER CASE.
- For indirect links, use a trailing underscore _ on
the anchor name, e.g. MAC_, so you will know not to
refer people to those dummy anchors, but rather directly to the anchor they
point to. For example, the HTML at anchor MAC_
may say "see MACINTOSH". People are lazy and
will get angry if you send them to anchor MAC_ rather
than anchor MACINTOSH, because they have to do an
extra click to get to MACINTOSH where the real
information is.
Sun flagrantly ignores these rules and uses space, ( ) and comma in its anchors
in generated Javadoc.
Links To
typical reference that when clicked takes you to a particular spot:
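A sketch of such references, reusing the GLOSSARY anchor defined above. The file name jgloss.html and the example.com address are placeholders, not taken from any real site.

```html
<!-- jump to an anchor in the same page -->
<a href="#GLOSSARY">glossary</a>

<!-- jump to an anchor in another page on the same site (hypothetical file name) -->
<a href="jgloss.html#GLOSSARY">Java glossary</a>

<!-- jump to the top of a page on another site (placeholder URL) -->
<a href="http://www.example.com/index.html">example site</a>
```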
Colours
The colour chart shows Netscape’s 133 standard colours and HTML 3.2’s
16 standard colours. It shows the colours displayed eight ways (colour on white,
colour on black, black on colour, white on colour), each using both alpha names
and hex names. You can check out your browser for Netscape colour compatibility.
It shows the standard Netscape 8.0 alpha names such as "aliceblue" and
also the hex, RGB and HSB values, both as HTML and raw ASCII text.
Figures
<fig> <caption> <credit> <overlay>
are not supported in the big three browsers.
Indenting
You can cheat and use
<ul>
...
</ul>
Happily, the technique also nests properly.
The official way is to use CSS styles.
<div style="padding-left: 30px">
...
</div>
If you want to pad all paragraphs, put this in the "head" section or
in the style sheet.
<style type="text/css">
p {padding-left: 30px}
</style>
Or do this:
<style type="text/css">
p.leftpad {padding-left: 30px}
</style>
and then it will only indent subsequent paragraphs that are marked like this:
<p class="leftpad">
...
</p>
Unfortunately, the technique does not handle nesting. <div> does however.
To indent just the first line of each paragraph, use:
<p style="text-indent: 30px">
Composition Tools
I like to create my web pages with a text editor, but if you want a tool to
help you compose HTML in a more WYSIWYG style, try one of these:
- SlickEdit: This is what I use — a general
purpose editor. It has HTML and Java syntax colouring which makes it much easier
to avoid typos, and Java and HTML beautifying to nicely indent the tags.
- DreamWeaver: A professional tool for creating
HTML, and HTML with embedded JSP, PHP, ASP, ColdFusion etc.
- Netscape Composer: part of
Netscape Communicator.
- TopStyle: helps you compose and manage your style
sheets.
- Microsoft FrontPage 2000
- The Quoter Amanuensis will
automatically convert HTML’s reserved characters to their &amp; &eacute;
&copy; etc. form. You just copy your text to the clipboard, click CONVERT on
the amanuensis, then paste the converted text into your document in a text
editor such as SlickEdit.
Decorating
Here are some tools for snazzing up your web pages with graphics or other gizmos:
Special Character Entity Codes
Here are special characters and the codes you must key to get them in HTML. The
official term for them is entities. These work no
matter what encoding the browser is using. If you want codes that change as the
encoding changes see this ASCII table.
Entities such as &divide; and &#x2713; only
work in HTML, not Java. In Java, you get at the exotic characters by encoding
them in hex in your strings like this: "\u00f7\u2713"
to produce ÷ ✓. See String
literals for more details.
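As a rough illustration, the same character can be written as a named entity, a decimal code or a hex code. The sample sentences are invented.

```html
<p>5 &lt; 7 &amp; 7 &gt; 5</p>              <!-- reserved characters: < & > -->
<p>caf&eacute; &copy; 2005</p>               <!-- named entities: é © -->
<p>6 &divide; 3, 6 &#247; 3, 6 &#xF7; 3</p>  <!-- ÷ by name, decimal and hex -->
<p>done &#x2713;</p>                         <!-- ✓ by hex code, U+2713 -->
```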
For the official set of W3C entities see this definitive
list of entities. For the newer character entities that are not yet fully
supported, see Robin
LionHeart’s Illustrated Entity Table. His table shows both decimal and hex
encoding and shows you what the characters look like, unlike the more complete
W3C official documents. Please tell me about any omissions in my own tables.
Last revised 2005-06-24.
Standard Prelude
Here is a standard header you could use on all your HTML files, with the obvious
modifications.
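What follows is only a sketch of such a header, assembled from the tags described in this section. It is not the header this website actually uses. Every name, file name and content value in it is a placeholder, and the ICRA label and the link rev tag are left out.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<!-- all names and file names below are placeholders -->
<link href="mystyle.css" rel="stylesheet" type="text/css">
<link rel="home" href="index.html">
<link rel="icon" href="icon16.png" type="image/png">
<link rel="prev" href="chapter1.html">
<link rel="next" href="chapter3.html">
<meta http-equiv="Content-Language" content="en">
<meta http-equiv="Content-Style-Type" content="text/css">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Author" content="Jane Example">
<meta name="Copyright" content="(c) 2005 Jane Example">
<meta name="Description" content="A beginner's cheat sheet of common HTML tags.">
<meta name="Generator" content="hand-coded in a text editor">
<meta name="Keywords" content="HTML, tags, cheat sheet">
<title>HTML Cheat Sheet</title>
</head>
<body>
<p>page content goes here</p>
</body>
</html>
```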
The header, link, meta and body tags have the following purposes:
DOCTYPE
says which level of HTML you are using.
- If you use a strict DTD, you must have absolutely
perfect HTML, something quite unlikely unless your website is generated by a
computer program.
- If your page has frames, use a frameset DTD.
- For most websites, use a more relaxed loose DTD.
- If you want an XML-like markup with strictly matching tags and more consistency
than regular HTML, use one of the XHTML DTDs.
Here are the DOCTYPEs I use on this website:
Which DOCTYPE to use? You need a different one depending on whether you are
doing plain, frameset or strict HTML. Note that DOCTYPE HTML PUBLIC
should be upper case. Here are the possible DOCTYPES:
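For reference, these are the standard W3C declarations for HTML 4.01 and XHTML 1.0. They are offered as a reminder of the common forms, not as the exact list this site uses. Note that the XHTML forms use lower case html.

```html
<!-- HTML 4.01 strict -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<!-- HTML 4.01 transitional (loose) -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<!-- HTML 4.01 frameset -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">

<!-- XHTML 1.0 strict -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- XHTML 1.0 transitional -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
```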
link href
point to your master CSS style sheet.
link meta
ICRA rating for sexual and violent content. Replaces old
PICS labels.
link rev
give author’s email address. Used by Lynx browser. Watch out for spam
harvesters. I don’t use it.
link home
Allows keystroke shortcut to get to home page.
link icon
icon for the task bar. Must be 16x16. IE ignores it in favour of the site level favicon.ico.
Netscape ignores it entirely. Opera, Firefox and SeaMonkey will scale it to 16x16,
which will usually badly distort it. The URL is relative to the current page.
link prev
specify the logically previous page so browser can do a better back/up. You can
also have a next link to the next logical page to help the browser navigate.
There are all kinds of special
links you can embed for browsing via a toolbar.
Content-Language
the language you wrote the page in, usually en
for English. See the list of possible language
codes.
Content-Style-Type
Says we are using CSS as our stylesheet language. Without it embedded style
markup might be ignored or misinterpreted.
Content-Type
Tells the browser which character set encoding you are using, which determines
which accented letters you can use. Normally iso-8859-1
for Latin-1, sometimes iso-8859-3 for Esperanto, or a Unicode encoding
(such as utf-8 for compact Unicode) to handle nearly all
languages, including Arabic, Thai and Chinese.
icra-label
encodes your document for adult content, if any.
Author
author’s name.
Copyright
© copyright of the web page.
Description
Summary of what’s on the page. Used by search engines to summarise the
page for people. Should be more detailed than the title.
Generator
What tool you used to create the HTML.
Keywords
important words in the document. Used by search engines to direct people to your
document. Avoid using concept words that don’t actually appear in the
document. Search engines think you are cheating.
title
Title used to display a hit by a search engine. Also used on the window bar when
the page is displayed.
body tags
are for CSS-challenged browsers that don’t understand the style sheet.
Body Tag Details
If you use CSS, you don’t need these attributes on the <body> tag.
Body Tags
| Field | Function |
|---|---|
| BGCOLOR | background RGB in hex |
| BACKGROUND | *.gif to use as a tiled background. For repeating backgrounds, it is best to make the *.gif 25 pixels high even if in theory 1 pixel would do. That speeds rendering even though it slows download. |
| TEXT | ordinary text colour |
| LINK | clickable links not yet visited |
| VLINK | links that have already been visited |
| ALINK | active link text, what you just clicked. |
| MARGINHEIGHT | pixels in border top/bottom. |
| MARGINWIDTH | pixels in border left/right. |
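A sketch of a body tag using these fields, followed by a rough CSS equivalent for the colours and background. The colour values and the file name tile.gif are arbitrary examples.

```html
<body bgcolor="#ffffff" background="tile.gif" text="#000000"
      link="#0000ff" vlink="#800080" alink="#ff0000"
      marginheight="10" marginwidth="10">
<p>page content</p>
</body>

<!-- roughly the same colours and background expressed in a style sheet -->
<style type="text/css">
body { background: #ffffff url(tile.gif); color: #000000; }
a:link { color: #0000ff; }
a:visited { color: #800080; }
a:active { color: #ff0000; }
</style>
```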
To make such code easier to maintain, you could use SSI
(Server Side Includes). You then need to maintain only one copy of the standard
headers. I do it with HTML macros, which do not require any code on the server.
If you look at my HTML source, you can see how I generate standard headers. I
have not yet released the tools I use to the public. If you are curious how I
generate my website using macros, see the HTML entry.
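For the SSI route, a minimal sketch in the Apache style looks like this. It assumes the server has includes enabled for the page, which often means giving it a name ending in .shtml. The path /includes/standard_head.html is a made-up name for the shared file.

```html
<html>
<head>
<!-- the server replaces the next line with the contents of the shared file -->
<!--#include virtual="/includes/standard_head.html" -->
<title>My Page</title>
</head>
<body>
<p>page content</p>
</body>
</html>
```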
CSS Styles
In general, avoid inline styles, and use style sheets. That way you can make a
change to your style sheet and your whole website is instantly updated. If you
really want to do it, look around the net at people’s headers.
CSS Style Sheets
Have a look at my style sheet. Looking at an
example will probably explain nearly everything you need to know. TopStyle
makes it easier to edit style sheets, but it won’t explain what the
million little fields are for. You will probably figure it out much faster that
way than by reading documentation prepared for and by mathematicians.
HTML Used Only In Emails
| HTML | Purpose |
|---|---|
| <x-sigsep> <p> </x-sigsep> | Separates body from the signature. |
| | Indicates some text in your email, in this case http://mindprod.com/, that looked to Eudora like a URL and that it has automatically converted into a link. |
| <blockquote type="cite" class="cite" cite> ... </blockquote> | Used for quoting in replies. Rendered as nested vertical bars down the left margin. |
| | An embedded, as opposed to attached, image. The image itself is made into a hidden attachment that is base64 encoded. HTML email cannot presume web access when mail is written, so ordinary <img> tags can’t be used. |
Learning More
- dot
HTML on-line HTML guide.
- dot
CSS on-line CSS guide.
- Visibone make a series of
cheat sheets, both on-line and printed. Some show the full Unicode set and the
extended Unicode &xxx codes. Others show CSS. Others show XHTML. Some are
colour charts.