HTML vs. XHTML

XHTML is the XML version of HTML. Why do I use it instead of just regular HTML? Well, first off, from a programming point of view, it gives me nice, clean document source to work with, and XML and the Document Object Model (DOM) programming interface go hand in hand. Secondly, it is the standard, and as more software and platforms become available to process it, it just makes my pages available to those new venues :)

This article mainly discusses the differences between HTML and XHTML, for those who have existing HTML pages and are considering converting them to XHTML. Therefore, I am assuming some familiarity with HTML. You should at least know what a tag is. If you don't have at least a working knowledge of HTML, you may want to take a look at the W3Schools HTML Tutorial or the XHTML tutorial on this site.

XHTML Rules

When I create a web page in XHTML, I basically write it in HTML 4.01, keeping in mind the few rules imposed by XHTML. A skeleton of an XHTML page that I could use to start building my pages is given below, with placeholder text marking where the body content goes.

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us">
  <head>
    <meta http-equiv="Content-Type" content="text/html" />
    <title>XHTML Skeleton Page</title>
  </head>
  <body>
     body of html doc here
  </body>
</html>

Anyone familiar with basic HTML should be able to follow it, with perhaps the exception of the !DOCTYPE declaration and the xmlns attribute given in the html tag.

What makes this XHTML as opposed to HTML 4.01? Well, time to go through the rules for XHTML.

The !DOCTYPE Declaration

The !DOCTYPE is what tells the program reading your source (a browser, for example) where to locate the DTD (Document Type Definition). It is the DTD that lists the elements, their attributes, characteristics and more. There are three valid !DOCTYPE declarations for XHTML 1.0 -- strict, transitional, and frameset.

  1. Strict, for those pages that do not use any deprecated tags. All formatting should be done with CSS stylesheets. The declaration is:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  2. Transitional, for those pages that will still contain some deprecated tags*, such as <u>, but meet all the other rules.
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  3. Frameset, for pages setting up frames:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

* - The deprecated tags are the ones marked as deprecated in the HTML 4.01 specification.

I try to use the Strict DTD when possible, although due to browsers not supporting all of the CSS style properties (especially with table formatting), sometimes I have to use the Transitional DTD. I never use the Frameset DTD, for I avoid using frames in my pages like the plague.
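
To see which of the three declarations a page carries, the DOCTYPE can be read programmatically. The following is a minimal Python sketch, assuming the standard library's expat bindings; the helper name read_doctype and the sample page are my own illustration, not part of any XHTML tooling.

```python
import xml.parsers.expat


def read_doctype(source):
    """Return (root_name, public_id, system_id) from the DOCTYPE, or None."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat calls this once when it meets the <!DOCTYPE ...> declaration.
        found.append((name, public_id, system_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(source, True)  # True marks this as the final chunk
    return found[0] if found else None


# Hypothetical sample page using the Strict declaration.
page = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body></body></html>"
)

name, public_id, system_id = read_doctype(page)
print(public_id)  # -//W3C//DTD XHTML 1.0 Strict//EN
```

The sketch only reads the declaration; it does not download the DTD or validate the page against it.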

Tagnames Must be in Lower Case

In the DTD, the tagnames are specified in lower case. XML is case sensitive, so for example, if you specify <BODY>, it won't be recognised -- you should use <body> instead.
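
The case sensitivity is easy to observe with any XML parser. Here is a small Python sketch using the standard library's ElementTree; the helper name parses_as_xml is hypothetical. Note that it checks only well-formedness, not validity against the DTD: a matching <BODY>...</BODY> pair would parse, but as an element named BODY, which the DTD does not define.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(source):
    """Return True if the source is well-formed XML, False otherwise."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml("<body><p>Hello</p></body>"))  # True
print(parses_as_xml("<BODY><p>Hello</p></body>"))  # False: BODY and body differ

# The parser keeps the case exactly as written, so BODY is not body.
print(ET.fromstring("<BODY/>").tag)  # BODY
```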

Mandatory Elements - the html, head, title, and body Tags

The html element is the root element, in which all the other elements of the document must be enclosed (the !DOCTYPE declaration sits outside it), and every XML document must have a root element. The DTD declaration of the html element specifies the child elements head and body as required to occur once, and only once.

The head element in turn requires a title element to occur once and only once.

If these required elements were missing, the document parser should reject the document as invalid. You will find that browsers will usually be forgiving and display your page anyway, which is one of the reasons I run my source through A Real Validator to make sure my pages are compliant.
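
A validator does much more than this, but the "once and only once" rules for head, body, and title can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name missing_required and its messages are made up, and it checks just these three rules rather than the full DTD.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced tag names as {namespace}localname.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def missing_required(source):
    """Return a list of problems with the mandatory elements (empty if none)."""
    root = ET.fromstring(source)
    problems = []
    if root.tag != XHTML_NS + "html":
        problems.append("root element is not html")
    heads = root.findall(XHTML_NS + "head")
    bodies = root.findall(XHTML_NS + "body")
    if len(heads) != 1:
        problems.append("html needs exactly one head")
    if len(bodies) != 1:
        problems.append("html needs exactly one body")
    titles = heads[0].findall(XHTML_NS + "title") if heads else []
    if len(titles) != 1:
        problems.append("head needs exactly one title")
    return problems


good = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body></body></html>"
)
no_title = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head></head><body></body></html>"
)

print(missing_required(good))      # []
print(missing_required(no_title))  # ['head needs exactly one title']
```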

If the document is written correctly, it should have at a minimum a tree structure that looks like this:

Document tree:

html
  head
    title
  body

Keep in mind this structure, as it must be reflected when nesting your elements properly, which brings us to...

The Document Must be Well-formed

As mentioned before, all elements in the document must be nested within the html element. That is rule one of well-formedness (is that a word?). Secondly, all elements must be closed. You can see this in the skeleton file, where every opening tag has a matching closing tag.

Opening Tag                                               Closing Tag
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us">  </html>
<head>                                                    </head>
<title>                                                   </title>
<body>                                                    </body>

There are such things as empty tags, where nothing would be enclosed between the opening tag and the closing tag. A good example of this is the line break tag <br> or the image <img> tag. So do you specify <br></br> for a line break? You can, but some browsers will not like this. But there still is hope!

XHTML allows a shortcut for empty tags. You can specify <tagname/>. However, some browsers still will not recognize this properly unless you include a space prior to the forward slash. So instead of using <br/>, I would use <br />, which is still valid XHTML and will work in older browsers.
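
The three ways of writing a line break can be compared with an XML parser. The following Python sketch uses the standard library's ElementTree; the helper name well_formed is my own. It says nothing about how older browsers treat each form, only about what XML accepts.

```python
import xml.etree.ElementTree as ET


def well_formed(source):
    """Return True if the source parses as XML."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


print(well_formed("<p>one<br>two</p>"))       # False: br is never closed
print(well_formed("<p>one<br></br>two</p>"))  # True
print(well_formed("<p>one<br />two</p>"))     # True

# ElementTree itself writes empty elements with the space before the slash.
print(ET.tostring(ET.Element("br"), encoding="unicode"))  # <br />
```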

Third for well-formedness is proper nesting. Keeping in mind the tree structure of the document, I know the root element is html. So in creating the source, I would start with the required !DOCTYPE declaration and the tags for the html element:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html></html>

Adding the required head and body tags, I end up with this (with line breaks and indentation added for clarity):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head></head>
  <body></body>
</html>

Note that the end tag for the html element is after the tags for the head and body elements. That is because the head and body elements are children of the html element and thus must be enclosed within that element. That is what is meant by proper nesting.

Any element that begins within another must have its closing tag specified first. So, when adding the title element, I enclose it within the opening and ending tags of the head element like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <title></title>
  </head>
  <body></body>
</html>
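
Improper nesting is something an XML parser refuses outright. As a rough Python illustration (the helper name nesting_error and the sample fragments are hypothetical), closing tags in the wrong order produce a parse error, while the properly nested version goes through:

```python
import xml.etree.ElementTree as ET


def nesting_error(source):
    """Return the parser's complaint as a string, or None if the markup parses."""
    try:
        ET.fromstring(source)
    except ET.ParseError as err:
        return str(err)
    return None


# i opens inside b, so i must close before b does.
print(nesting_error("<p><b><i>text</i></b></p>"))  # None
print(nesting_error("<p><b><i>text</b></i></p>"))  # a "mismatched tag" message
```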

Properly nested, lower case tagnames, !DOCTYPE declaration...so far so good! In fact, the only thing I am missing besides the placeholder text is the xmlns attribute specified for the html element.

Attributes Must be Quoted

So I add that attribute to the html element, and I will have reconstructed my XHTML skeleton. This attribute is required although some parsers will automatically default the value if not supplied.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us">
  <head>
    <title></title>
  </head>
  <body></body>
</html>

An attribute, as you may have guessed, is a keyword/value pair given within a tag. In the XHTML skeleton, the keyword xmlns is for the XML namespace (value: http://www.w3.org/1999/xhtml), and the keyword lang is for the language used in the page content. The XHTML 1.0 specification requires xmlns on the html element (although the DTD supplies a fixed default for it), while lang is optional; it is good practice to code them both. At this point, this skeleton XHTML file is valid, even if it doesn't contain anything to show yet.
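
To show what the two attributes mean to a program reading the page, here is a short Python sketch with the standard library's ElementTree (the sample string is a cut-down copy of the skeleton). ElementTree treats xmlns as a namespace declaration rather than an ordinary attribute, so it shows up in the element's name, while lang is reported as a plain keyword/value pair.

```python
import xml.etree.ElementTree as ET

skeleton = (
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us">'
    "<head><title></title></head><body></body></html>"
)

root = ET.fromstring(skeleton)

# The namespace becomes part of the tag name, written as {namespace}name.
print(root.tag)     # {http://www.w3.org/1999/xhtml}html

# Ordinary attributes are available as a dictionary of strings.
print(root.attrib)  # {'lang': 'en-us'}
```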

Note that the values given for the xmlns and lang attributes are quoted. Anytime you have a keyword= followed by a value for a tag, you must quote it, even if the value passed is numeric. For example, if creating an input box with a size of 50, you would specify:

<input type="text" size="50" />

but not:

<input type="text" size=50 />
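
The difference between the two lines above can be checked with an XML parser. A minimal Python sketch follows, again using ElementTree; the helper name attributes_of is my own. Note that even a numeric value comes back as a string.

```python
import xml.etree.ElementTree as ET


def attributes_of(source):
    """Return the element's attributes as a dict, or None if it does not parse."""
    try:
        return dict(ET.fromstring(source).attrib)
    except ET.ParseError:
        return None


print(attributes_of('<input type="text" size="50" />'))  # {'type': 'text', 'size': '50'}
print(attributes_of('<input type="text" size=50 />'))    # None: unquoted value
```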

Attributes Cannot be Minimized

What the skeleton XHTML file doesn't show are any attributes that someone familiar with HTML is used to seeing minimized. So, I'll give an example here. In HTML, you could specify

<input type="checkbox" checked>

to create a checkbox that is checked by default. XHTML does not allow an attribute name to stand alone without a value, so checked must be specified as

<input type="checkbox" checked="checked" />

to comply with XHTML rules.
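
When converting many old tags, the attribute rules can be applied mechanically. The sketch below is one possible approach in Python, built on the standard library's html.parser, which accepts legacy HTML, lower-cases tag and attribute names, and reports a minimized attribute with a value of None. The class name AttributeExpander and the short EMPTY_TAGS list are my own assumptions for illustration; it rewrites start tags only and is not a complete converter.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import quoteattr

# Assumed short list of empty elements; extend it as your pages require.
EMPTY_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class AttributeExpander(HTMLParser):
    """Collect start tags rewritten with quoted, non-minimized attributes."""

    def __init__(self):
        super().__init__()
        self.rewritten = []

    def handle_starttag(self, tag, attrs):
        parts = [tag]  # html.parser has already lower-cased the name
        for name, value in attrs:
            if value is None:  # minimized attribute, e.g. checked
                value = name
            parts.append(f"{name}={quoteattr(value)}")
        closing = " />" if tag in EMPTY_TAGS else ">"
        self.rewritten.append("<" + " ".join(parts) + closing)


expander = AttributeExpander()
expander.feed("<INPUT TYPE=checkbox CHECKED>")
expander.close()
print(expander.rewritten[0])  # <input type="checkbox" checked="checked" />
```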

Conclusion

Hopefully, the discussion on this page gave an idea of what you need to do to begin converting existing HTML pages to XHTML, and to start creating new ones in XHTML. Converting an existing page can sometimes be a tedious task, but in the long run it pays off in easier maintenance and positions your pages for forward compatibility.

