HTML vs. XHTML
XHTML is the XML version of HTML. Why do I use it instead of just regular HTML? Well, first off, from a programming point of view, it gives me nice, clean document source to work with, and XML and the Document Object Model (DOM) programming interface go hand in hand. Secondly, it is the standard, and as more software and platforms become available to process it, it just makes my pages available to those new venues :)
This article mainly discusses the differences between HTML and XHTML,
for those who have existing HTML pages and are considering
converting them to XHTML. Therefore, I am assuming some familiarity with HTML.
You should at least know what a tag is. If you don't have at least a working
knowledge of HTML, you may want to take a look at W3 Schools' HTML Tutorial or
the XHTML tutorial on this site.
Resources
- HTML 4.01 Specification - Why the HTML specification for XHTML? Well, the 4.01 specification lists the tags, etc., used for building the web pages. While this information is contained in the XHTML DTD (Document Type Definition), a DTD is not friendly for human eyes to read. So I refer to the HTML 4.01 specification and adhere to the XHTML rules.
- XHTML Strict DTD -- for the curious, here's one of the three valid XHTML DTDs to look at.
- XHTML Specification -- W3C's recommendation.
- Sizzling HTML Jalfrezi - This website was the site I stumbled upon when first trying to learn HTML. It is still a useful resource in learning the basics of HTML, and has a nice interface for looking up HTML tags.
- The Web Design Group also maintains an HTML reference which can be downloaded to refer to offline.
- W3 Schools is a site that has
tutorials on just about anything WWW. You won't know everything about
a web technology after doing a tutorial on a subject there, but you
certainly will have a good grounding to begin building on. A couple of
them are specific to the topic at hand, such as the HTML tutorial mentioned above.
- Dynamic HTML: The Definitive Reference (2nd Edition), by Danny Goodman -- if there were one book, and one book only, that I could have handy when working with my pages, it would have to be this one. Solid coverage of JavaScript, DOM, HTML, and CSS -- just make sure you get the 2nd Edition, not the 1st Edition, which was good in its time, but is definitely dated today.
- A Real Validator -- An inexpensive and valuable tool, this program validates XHTML source, and includes a reference for HTML 4.01 (a copy of that provided by the Web Design Group mentioned above). I run my XHTML pages and ASP output through this validator to make sure my pages comply with the standard.
XHTML Rules
When I create a web page in XHTML, I basically write it in HTML 4.01, keeping in mind the few rules imposed by XHTML. A skeleton of an XHTML page that I could use to start building my pages is given below, with placeholder text where the body content goes.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us">
<head>
<meta http-equiv="Content-Type" content="text/html" />
<title>XHTML Skeleton Page</title>
</head>
<body>
body of html doc here
</body>
</html>
Anyone familiar with basic HTML should find it understandable, with
perhaps the exception of the !DOCTYPE declaration
and the xmlns attribute given in the html tag.
What makes this XHTML as opposed to HTML 4.01? Well, time to enumerate the rules for XHTML.
- The document must contain a !DOCTYPE declaration for XHTML.
- Tagnames must be in lower case.
- The document must contain html, head, body and title elements.
- The document must be well-formed:
  - All elements must be nested under the html element.
  - All elements (tags) must be closed.
  - Elements must be properly nested.
- Attributes must be quoted.
- Attributes cannot be minimized.
The !DOCTYPE Declaration
The !DOCTYPE is what tells the program
reading your source (a browser, for example) where to locate the DTD (Document
Type Definition). It is the DTD that lists the elements, their attributes,
characteristics and more. There are three valid !DOCTYPE
declarations for XHTML 1.0 -- strict, transitional, and frameset.
- Strict, for those pages that do not use any deprecated tags. All
  formatting should be done with CSS stylesheets. The declaration is:
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- Transitional, for those pages that will still contain some deprecated
  tags*, such as <u>, but meet all the other rules.
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
- Frameset, for pages setting up frames:
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
* For a list of deprecated tags, see the index of elements in the HTML 4.01 Specification.
I try to use the Strict DTD when possible, although due to browsers not supporting all of the CSS style properties (especially with table formatting), sometimes I have to use the Transitional DTD. I never use the Frameset DTD, for I avoid using frames in my pages like the plague.
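As a rough illustration of how a program might tell the three declarations apart, here is a minimal Python sketch that looks up the public identifier from the !DOCTYPE line. The function name xhtml_flavor and the regular expression are my own assumptions for this example, not part of any standard tool, and a real parser would be more thorough.

```python
import re

# Public identifiers of the three XHTML 1.0 DTDs listed above.
XHTML_DTDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "frameset",
}

# Captures the quoted public identifier that follows <!DOCTYPE html PUBLIC.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"')


def xhtml_flavor(source):
    """Return 'strict', 'transitional' or 'frameset', or None if no XHTML 1.0 DOCTYPE is found."""
    match = DOCTYPE_RE.search(source)
    if match is None:
        return None
    return XHTML_DTDS.get(match.group(1))


sample = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    "<html></html>"
)
print(xhtml_flavor(sample))
```

The lookup is deliberately strict about the identifier text, so a page with an HTML 4.01 declaration (or none at all) simply yields None.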
Tagnames Must be in Lower Case
In the DTD, the tagnames are specified in lower case. XML is case
sensitive, so for example, if you specify <BODY>, it won't
be recognised -- you should use <body> instead.
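To see the case sensitivity in action, here is a small Python sketch using the standard library's xml.etree.ElementTree module as a stand-in for an XML-aware program. The page fragment is made up for the example and leaves out the !DOCTYPE and namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment that mixes an upper case tagname into the page.
page = ET.fromstring("<html><head><title>Demo</title></head><BODY></BODY></html>")

# The lower case head element is found by name.
print(page.find("head") is not None)

# Looking for body finds nothing: to an XML parser, BODY is a different name.
print(page.find("body") is None)

# The parser keeps the names exactly as written.
print([child.tag for child in page])
```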
Mandatory Elements - the html, head, title, and body Tags
The html element is the root element, in which all the other
elements of the document must be enclosed (excluding the !DOCTYPE
declaration), and every XML document must have a root element. The DTD
declaration of the html element specifies the child elements
head and body as required to occur once,
and only once.
The head element in turn requires a title element
to occur once and only once.
If these required elements were missing, the document parser should reject the document as invalid. You will find that browsers will usually be forgiving and display your page anyway, which is one of the reasons I run my source through A Real Validator to make sure my pages are compliant.
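A full validator checks the page against the whole DTD, but the required-element rule on its own can be sketched in a few lines of Python. This is only an assumption-laden illustration: the function name missing_required is hypothetical, it expects the xmlns attribute to be present on the html element, and it checks nothing beyond the four mandatory elements.

```python
import xml.etree.ElementTree as ET

# ElementTree prefixes each tag with its namespace in curly braces.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def missing_required(source):
    """Return the names of the mandatory elements missing from an XHTML source string."""
    root = ET.fromstring(source)
    missing = []
    if root.tag != XHTML_NS + "html":
        missing.append("html")
    head = root.find(XHTML_NS + "head")
    if head is None:
        missing.append("head")
    # title must sit inside head, so a missing head implies a missing title.
    if head is None or head.find(XHTML_NS + "title") is None:
        missing.append("title")
    if root.find(XHTML_NS + "body") is None:
        missing.append("body")
    return missing


complete = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head><body></body></html>"
)
no_title = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head></head><body></body></html>"
)
print(missing_required(complete))
print(missing_required(no_title))
```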
If the document is written correctly, it should have at a minimum a tree structure that looks like this:
- html
  - head
    - title
  - body
Keep this structure in mind, as it must be reflected when nesting your elements properly, which brings us to...
The Document Must be Well-formed
As mentioned before, all elements in the document must be nested within
the html element. That is rule one of well-formedness (is that a
word?). Secondly, all elements must be closed. You can see in the skeleton
file given above that every opening tag has a matching closing tag.
| Opening Tag | Closing Tag |
|---|---|
| <html xmlns="http://www.w3.org/1999/xhtml" lang="en-us"> | </html> |
| <head> | </head> |
| <title> | </title> |
| <body> | </body> |
There are such things as empty tags, where nothing would be enclosed between the opening tag and the closing tag. A good example of this is the line break tag <br> or the image <img> tag. So do you specify <br></br> for a line break? You can, but some browsers will not like this. But there still is hope!
XHTML allows a shortcut for empty tags. You can specify <tagname/>. However, some browsers still will not recognize this properly unless you include a space prior to the forward slash. So instead of using <br/>, I would use <br />, which is still valid XHTML and will work in older browsers.
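The three spellings of an empty tag mean the same thing to an XML parser, which the following Python sketch tries to show with the standard library's xml.etree.ElementTree module. The helper names describe_br and is_well_formed are made up for this example, and the fragments are not complete XHTML pages.

```python
import xml.etree.ElementTree as ET


def describe_br(fragment):
    """Parse a fragment and summarize its br element as (tag, text, number of children)."""
    br = ET.fromstring(fragment).find("br")
    return (br.tag, br.text, len(br))


def is_well_formed(fragment):
    """Return True if the XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# All three closed forms produce the same empty element.
for form in ("<br></br>", "<br/>", "<br />"):
    print(form, describe_br("<p>one" + form + "two</p>"))

# The HTML habit of leaving <br> unclosed is rejected outright.
print(is_well_formed("<p>one<br>two</p>"))
```

This only speaks to what an XML parser accepts. How a given browser renders each form is a separate matter, which is why I stick to the version with the space.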
Third for well-formedness is proper nesting. Keeping in mind the tree structure of the document, I know the root element is html. So in creating the source, I would start with the required !DOCTYPE declaration and the tags for the html element:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html></html>
Adding the required head and body tags, I end up with this (with line breaks and indentation added for clarity):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head></head>
  <body></body>
</html>
Note that the end tag for the html element is after the tags for the head and body elements. That is because the head and body elements are children of the html element and thus must be enclosed within that element. That is what is meant by proper nesting.
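Improper nesting is easy to produce with inline formatting tags, so here is a short Python sketch of what an XML parser makes of it. The helper nesting_error is a hypothetical name, and the b and i fragments are just illustrations rather than anything from my skeleton page.

```python
import xml.etree.ElementTree as ET


def nesting_error(fragment):
    """Return None if the fragment is well-formed, otherwise the parser's error message."""
    try:
        ET.fromstring(fragment)
        return None
    except ET.ParseError as error:
        return str(error)


# i opens inside b and closes inside b: properly nested.
print(nesting_error("<p><b><i>text</i></b></p>"))

# b is closed while i is still open: the tags overlap instead of nesting.
print(nesting_error("<p><b><i>text</b></i></p>"))
```

The first call prints None, and the second prints the parser's complaint along with the position where the overlap was detected.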
Any element that begins within another must also end within it, so the inner element's closing tag always comes before the outer element's closing tag. So, when adding the title element, I enclose it within the opening and ending tags of the head element like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title></title>
</head>
<body></body>
</html>
Properly nested, lower case tagnames, !DOCTYPE declaration...so far so good! In fact, the only thing I am missing besides the placeholder text is the xmlns attribute specified for the html element.
Attributes Must be Quoted
So I add that attribute to the html element, and I will have reconstructed my XHTML skeleton. The XHTML 1.0 specification requires this attribute, although the DTD gives it a fixed default value, so some parsers will supply it automatically if it is omitted.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us">
<head>
<title></title>
</head>
<body></body>
</html>
An attribute, as you may have guessed, is a keyword/value pair given within
a tag. In the XHTML skeleton, the keyword xmlns is for the XML
namespace (value: http://www.w3.org/1999/xhtml), and the keyword
lang is for the language used in the page content. The xmlns attribute
is required by the XHTML 1.0 specification, while lang is optional, but it is
good practice to code them both. At this
point, this skeleton XHTML file is valid, even if it doesn't contain anything
to show yet.
Note that the values given for the xmlns and lang
attributes are quoted. Any time you have a keyword= followed by a value in a tag, you must
quote the value, even if it is numeric. For example, if creating an
input box with a size of 50, you would specify:
<input type="text" size="50" />
but not:
<input type="text" size=50 />
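The two input tags above can be run through an XML parser to see the difference. This Python sketch uses the standard library's xml.etree.ElementTree module, and attribute_values is a name I made up for the example.

```python
import xml.etree.ElementTree as ET


def attribute_values(fragment):
    """Return the element's attributes as a dict, or None if the parser rejects the fragment."""
    try:
        return dict(ET.fromstring(fragment).attrib)
    except ET.ParseError:
        return None


# Quoted values are accepted; note that the size comes back as the string "50".
print(attribute_values('<input type="text" size="50" />'))

# The unquoted numeric value makes the whole tag unparseable.
print(attribute_values('<input type="text" size=50 />'))
```

This also hints at why the quotes are not optional for numbers: to the parser every attribute value is just text, and it is up to the program reading the page to treat it as a number.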
Attributes Cannot be Minimized
What the skeleton XHTML file doesn't show are any attributes that someone
familiar with HTML is used to seeing minimized. So, I'll give an example
here. In HTML, you could specify
<input type="checkbox" checked>
to create a checkbox that is checked by default.
In XHTML, checked is an attribute like any other and needs a value, so it must be
specified as
<input type="checkbox" checked="checked" />
to comply with XHTML rules.
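When converting existing pages, hunting down minimized and unquoted attributes by hand gets tedious. Below is a minimal Python sketch, built on the standard library's html.parser module, that reads HTML start tags and reports their attributes in XHTML form. The class AttributeCollector and the function xhtml_attributes are hypothetical names for this example, and the sketch only rewrites attributes; it does not produce a complete XHTML page.

```python
from html import escape
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Records each start tag with its attributes rewritten in XHTML form."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lower-cased the tag and attribute names.
        xhtml_attrs = []
        for name, value in attrs:
            if value is None:
                # Minimized attribute such as checked: repeat the name as the value.
                value = name
            xhtml_attrs.append('%s="%s"' % (name, escape(value)))
        self.tags.append((tag, " ".join(xhtml_attrs)))


def xhtml_attributes(html_source):
    """Return a list of (tagname, attribute string) pairs for the start tags in html_source."""
    collector = AttributeCollector()
    collector.feed(html_source)
    collector.close()
    return collector.tags


print(xhtml_attributes("<INPUT TYPE=checkbox CHECKED>"))
```

I lean on the forgiving HTML parser here rather than an XML parser, since the input is by definition not yet well-formed XML.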
Conclusion
Hopefully, the discussion on this page gave an idea of what you need to do to begin converting existing HTML pages to XHTML, and to start creating new ones in XHTML. Converting an existing page can sometimes be a tedious task, but in the long run it is rewarding in terms of maintenance and will position your pages for forward compatibility.