Document Type Declaration
March 29, 2002
The document type declaration follows the XML
declaration. The purpose of this declaration is to announce the
root element and to provide the location of the DTD.4 The
general syntax is:
<!DOCTYPE RootElement (SYSTEM | PUBLIC)
ExternalDeclarations? [InternalDeclarations]? >
where "<!DOCTYPE" is a literal string and RootElement
is whatever you name the outermost element of your hierarchy.
It is followed by either the literal keyword "SYSTEM" or
"PUBLIC". The optional ExternalDeclarations portion is
typically the relative path or URL to the DTD that describes
your document type. (It is really only optional if the entire
DTD appears as InternalDeclarations, which is neither likely
nor desirable.) If there are InternalDeclarations, they must be
enclosed in square brackets. In general, you’ll encounter
far more cases with ExternalDeclarations than
InternalDeclarations, so let’s ignore the latter for now.
They constitute the internal subset, which is described
in chapter 4.
Let’s start with a simple but common case. In this example, we
indicate that the DTD and the XML document reside in the
same directory (i.e., the ExternalDeclarations are
contained in the file employees.dtd) and that the root
element is Employees:
<!DOCTYPE Employees SYSTEM "employees.dtd">
Similarly,
<!DOCTYPE PriceList SYSTEM "prices.dtd">
indicates a root element PriceList and the local DTD
prices.dtd.
In the next example, we use normal directory path syntax to
indicate a different location for the DTD.
<!DOCTYPE Employees SYSTEM "../dtds/employees.dtd">
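To see how a parser exposes these pieces, here is a minimal sketch using Python's standard xml.dom.minidom module. The one-element documents and the helper name doctype_info are assumptions made for illustration. The parser records the root element name and the system identifier as written, and this sketch never needs the DTD file itself to exist.

```python
from xml.dom import minidom


def doctype_info(xml_text):
    """Return (root element name, system identifier) from the DOCTYPE."""
    doctype = minidom.parseString(xml_text).doctype
    return doctype.name, doctype.systemId


# Hypothetical minimal documents; only the declaration matters here.
same_directory = '<!DOCTYPE Employees SYSTEM "employees.dtd"><Employees/>'
other_directory = '<!DOCTYPE Employees SYSTEM "../dtds/employees.dtd"><Employees/>'

for document in (same_directory, other_directory):
    root, system_id = doctype_info(document)
    print(f"root element: {root}, DTD location: {system_id}")
```

The system identifier comes back exactly as it was written, so resolving a relative path against the document's own location is left to the application.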
Often we want to specify a URL for the DTD, since the XML file
may not even be on the same host as the DTD. This also applies
when you use an XML document for message passing or data
transmission across servers and still want to validate it
against a common DTD.
<!DOCTYPE Employees SYSTEM
"http://somewhere.com/dtds/employees.dtd">
Next, we have the case of the PUBLIC identifier. This
is used in formal environments to declare that a given DTD is
available to the public for shared use. Recall that XML’s true
power as a syntax relates to developing languages that permit
exchange of structured data between applications and across
company boundaries. The syntax is a little different:
<!DOCTYPE RootElement PUBLIC PublicID URI>
The new aspect here is the notion of a PublicID, which
is a formatted string with a somewhat involved structure that
identifies the source of the DTD whose location follows as the
URI. This string is sometimes known as the Formal Public
Identifier (FPI).
For example, I was part of a team that developed (Astronomical)
Instrument Markup Language (AIML, IML) for NASA Goddard Space
Flight Center.5 We wanted our DTD to be available to other
astronomers. Our document type declaration was:
<!DOCTYPE Instrument PUBLIC
"-//NASA//Instrument Markup Language 0.2//EN"
"http://pioneer.gsfc.nasa.gov/public/iml/iml.dtd">
In this case the PublicID is:
"-//NASA//Instrument Markup Language 0.2//EN"
The URI that locates the DTD is:
http://pioneer.gsfc.nasa.gov/public/iml/iml.dtd
Let’s decompose the PublicID. The leading hyphen
indicates that the owner of the identifier (NASA) is not
registered, which is the usual case for organizations that are
not standards bodies. A registered owner would use a plus sign
instead of the hyphen, and identifiers owned by ISO begin with
the string “ISO”. Next we have the name of the organization
responsible for the DTD (NASA, in this case), surrounded with
double slashes, then a short free-text description of the DTD
(“Instrument Markup Language 0.2”), double slashes, and a
two-character language identifier (“EN” for English, in this
case).
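The decomposition above can be sketched as a small Python helper. The names PublicId and parse_public_id are hypothetical, and the sketch assumes the four-part form shown here (indicator, owner, description, language); identifiers that begin with “ISO” or otherwise deviate from that shape are rejected rather than guessed at.

```python
from typing import NamedTuple


class PublicId(NamedTuple):
    indicator: str    # leading "-" or "+"
    owner: str        # organization responsible for the DTD
    description: str  # free-text description of the DTD
    language: str     # two-character language identifier


def parse_public_id(fpi: str) -> PublicId:
    """Split a four-part public identifier on its double slashes."""
    parts = fpi.split("//")
    if len(parts) != 4 or parts[0] not in ("-", "+"):
        raise ValueError(f"unsupported public identifier: {fpi!r}")
    indicator, owner, description, language = parts
    return PublicId(indicator, owner.strip(), description.strip(), language.strip())


nasa = parse_public_id("-//NASA//Instrument Markup Language 0.2//EN")
print(nasa.indicator, nasa.owner, nasa.description, nasa.language, sep=" | ")
```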
Since the XML prolog is the combination of the XML declaration
and the document type declaration, for our NASA example the
complete prolog is:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE Instrument PUBLIC
"-//NASA//Instrument Markup Language 0.2//EN"
"http://pioneer.gsfc.nasa.gov/public/iml/iml.dtd">
As another example, let’s consider a common case involving DTDs
from the W3C, such as those for XHTML 1.0.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
W3C is identified as the organization; “DTD XHTML 1.0
Transitional” is the name of the DTD; it is in English; and the
actual DTD is located by the URI
http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd.
Similarly, the prolog for XHTML Basic 1.0 is:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
The XHTML Basic 1.0 PublicID is similar but not identical to the
XHTML 1.0 case and of course the DTD is different since it’s a
different language.
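Since every prolog shown so far follows the same pattern, it can be assembled from its parts. The following is a minimal sketch; the function build_prolog and its parameter names are assumptions for illustration and do no escaping or validation of their arguments.

```python
def build_prolog(root, system_id, public_id=None,
                 encoding="UTF-8", standalone=None):
    """Compose an XML declaration followed by a document type declaration."""
    declaration = f'<?xml version="1.0" encoding="{encoding}"'
    if standalone is not None:
        declaration += f' standalone="{standalone}"'
    declaration += "?>"

    if public_id is None:
        # SYSTEM form: only the location of the DTD is given.
        doctype = f'<!DOCTYPE {root} SYSTEM "{system_id}">'
    else:
        # PUBLIC form: the public identifier precedes the location.
        doctype = f'<!DOCTYPE {root} PUBLIC "{public_id}" "{system_id}">'
    return declaration + "\n" + doctype


print(build_prolog(
    "Instrument",
    "http://pioneer.gsfc.nasa.gov/public/iml/iml.dtd",
    public_id="-//NASA//Instrument Markup Language 0.2//EN",
    standalone="no",
))
```

Calling it with only a root element and a system identifier yields the simpler SYSTEM form used in the earlier examples.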
If you noticed that the NASA example uses uppercase for the
encoding value UTF-8 and the W3C examples use lowercase, you may
have been bothered because that seems inconsistent with what we
learned about the case-sensitive value for the standalone
attribute. The explanation is that although element and
attribute names are always case-sensitive, attribute values may
or may not be. The XML specification recommends that processors
match encoding names without regard to case, whereas the
standalone value must be exactly “yes” or “no”. A reasonable
rule of thumb is that if the possible attribute values are
easily enumerated (i.e., “yes” or “no”, or another relatively
short list of choices), then case probably matters.
Note:
DTD-related keywords such as DOCTYPE, PUBLIC, and SYSTEM must
be uppercase. XML-related attribute names such as version,
encoding, and standalone must be lowercase.
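These case rules can be checked empirically. The sketch below feeds a few hypothetical one-element documents to the expat parser bundled with Python and reports which ones it accepts; the helper name is_accepted and the sample labels are assumptions, and other parsers may word their errors differently.

```python
from xml.parsers import expat


def is_accepted(document: bytes) -> bool:
    """Return True if expat parses the document without error."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except expat.ExpatError:
        return False
    return True


samples = {
    "uppercase encoding value": b'<?xml version="1.0" encoding="UTF-8"?><a/>',
    "lowercase encoding value": b'<?xml version="1.0" encoding="utf-8"?><a/>',
    "uppercase standalone value": b'<?xml version="1.0" standalone="NO"?><a/>',
    "uppercase attribute name": b'<?xml VERSION="1.0"?><a/>',
    "uppercase DOCTYPE keyword": b'<!DOCTYPE a SYSTEM "a.dtd"><a/>',
    "lowercase doctype keyword": b'<!doctype a SYSTEM "a.dtd"><a/>',
}

for label, document in samples.items():
    verdict = "accepted" if is_accepted(document) else "rejected"
    print(f"{label}: {verdict}")
```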
-
4. A Document Type Definition is a set of rules that describe
the hierarchical structure of any XML document instance based on
that particular DTD. These rules are used to determine whether
the document is valid. DTDs are discussed in detail in chapter
4.
-
5. Thanks to NASA and Commerce One project participants, Julie
Breed, Troy Ames, Carl Hostetter, Rick Shafer, Dave Fout, Lisa
Koons, Craig Warsaw, Melissa Hess, Ken Wootton, Steve Clark,
Randy Wilke, and Lynne Case, among others.
XML Family of Specifications: A Practical Guide