XML Declaration
March 29, 2002
The XML declaration is a piece of markup (which may span
multiple lines of a file) that identifies this as an XML
document. The declaration also indicates whether the document
can be validated by referring to an external Document Type
Definition (DTD). DTDs are the subject of chapter 4; for now,
just think of a DTD as a set of rules that describes the
structure of an XML document.
The minimal XML declaration is:
<?xml version="1.0"?>
XML is case-sensitive (more about this in the next subsection),
so it’s important that you use lowercase for "xml" and
"version". The quotes around the value of the version
attribute are required, as are the “?” characters. At the time
of this writing, "1.0" is the only acceptable value for the
version attribute, but this is certain to change when a
subsequent version of the XML specification appears.
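To see these rules enforced by a real parser, here is a minimal sketch in Python. It assumes the standard library's `xml.parsers.expat` module (our choice for illustration; any conforming parser would do), whose `XmlDeclHandler` callback receives the version, the encoding, and a standalone flag. The helper name `read_declaration` is hypothetical.

```python
import xml.parsers.expat

def read_declaration(doc: str):
    """Return (version, encoding, standalone) from doc's XML declaration, or None."""
    found = []

    def on_xml_decl(version, encoding, standalone):
        # expat reports standalone as 1 ("yes"), 0 ("no"), or -1 (omitted);
        # encoding is None when the declaration has no encoding part.
        found.append((version, encoding, standalone))

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(doc, True)  # raises ExpatError if the markup is malformed
    return found[0] if found else None

print(read_declaration('<?xml version="1.0"?><doc/>'))

try:
    read_declaration('<?xml version=1.0?><doc/>')  # value not quoted
except xml.parsers.expat.ExpatError as err:
    print("rejected:", err)
```

The minimal declaration is accepted and reports version "1.0" with no encoding and no standalone part, while the unquoted version value is rejected outright.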
Note:
Do not include a space before the string "xml" or between
either question mark and its angle bracket. The strings
"<?xml" and "?>" must appear exactly as
indicated. The space before the "?>" is optional. No
blank lines or space may precede the XML declaration; adding
white space here can produce strange error messages.
In most documents the XML declaration is present. If so, it must
be the very first line of the document and must not have leading
white space. The declaration is technically optional, however;
one case in which it may be omitted is when XML storage units are
combined to create a larger, composite document.
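The placement rules can be checked with a short Python sketch, again assuming the standard `xml.parsers.expat` module as the parser. The helper `parses` is hypothetical; it reports whether a document is accepted and prints expat's error text when it is not.

```python
import xml.parsers.expat

def parses(doc: str) -> bool:
    """Return True if expat accepts doc, printing the error message otherwise."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(doc, True)
        return True
    except xml.parsers.expat.ExpatError as err:
        print("rejected:", xml.parsers.expat.ErrorString(err.code))
        return False

print(parses('<?xml version="1.0"?><doc/>'))    # declaration at the very start
print(parses(' <?xml version="1.0"?><doc/>'))   # one leading space
print(parses('\n<?xml version="1.0"?><doc/>'))  # leading blank line
print(parses('<? xml version="1.0"?><doc/>'))   # space after "<?"
print(parses('<doc/>'))                          # declaration omitted entirely
```

Leading white space and a stray space inside "<?xml" are both errors, whereas leaving the declaration out altogether is acceptable.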
Actually, the formal definition of an XML declaration, according
to the XML 1.0 specification, is as follows:
XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
This Extended Backus-Naur Form (EBNF) notation, characteristic
of many W3C specifications, means that an XML declaration
consists of the literal sequence "<?xml", followed
by the required version information, followed by optional
encoding and standalone declarations, followed by an optional
amount of white space, and terminating with the literal sequence
"?>". In this notation, a question mark not
contained in quotes means that the term that precedes it is
optional.
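Because the production is so regular, it can be transcribed almost term by term into a regular expression. The Python sketch below does that under two stated assumptions: the version number is restricted to "1.0" as described above, and the encoding name follows the specification's EncName rule, [A-Za-z] ([A-Za-z0-9._] | '-')*. The names `XML_DECL` and `is_xml_decl` are hypothetical, and a regex like this is only an illustration of the grammar, not a substitute for a parser.

```python
import re

# Pieces of the XML 1.0 grammar, transcribed as regular expressions.
S = r"[\x20\x09\x0D\x0A]+"                  # S ::= (#x20 | #x9 | #xD | #xA)+
EQ = rf"(?:{S})?=(?:{S})?"                  # Eq ::= S? '=' S?
# VersionInfo, restricted to "1.0"; the backreference enforces matching quotes
VERSION_INFO = rf"{S}version{EQ}(?P<vq>[\"'])1\.0(?P=vq)"
ENC_NAME = r"[A-Za-z][A-Za-z0-9._-]*"        # EncName
ENCODING_DECL = rf"{S}encoding{EQ}(?P<encq>[\"']){ENC_NAME}(?P=encq)"
SD_DECL = rf"{S}standalone{EQ}(?P<sdq>[\"'])(?:yes|no)(?P=sdq)"

# XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
XML_DECL = re.compile(
    rf"<\?xml{VERSION_INFO}(?:{ENCODING_DECL})?(?:{SD_DECL})?(?:{S})?\?>"
)

def is_xml_decl(text: str) -> bool:
    """True if text is exactly one syntactically correct XML declaration."""
    return XML_DECL.fullmatch(text) is not None

for decl in ('<?xml version="1.0"?>',
             "<?xml version=\"1.0\" encoding='UTF-8' standalone='no'?>",
             '<?xml version="1.0" standalone="no" encoding="UTF-8"?>'):
    print(is_xml_decl(decl), decl)
```

Note how the optional parts are nested in a fixed sequence, exactly as in the EBNF, so the regex also rejects an encoding declaration that appears after the standalone declaration.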
The following declaration means that there is an external DTD on
which this document depends. See the next subsection for the DTD
that this negative standalone value implies.
<?xml version="1.0" standalone="no"?>
On the other hand, if your XML document has no associated
external DTD, the correct XML declaration is:
<?xml version="1.0" standalone="yes"?>
The XML 1.0 Recommendation states: “If there are external
markup declarations but there is no standalone document
declaration, the value ‘no’ is assumed.”
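The rule just quoted can be expressed in a few lines of Python. This is a sketch under assumptions: it uses the standard `xml.parsers.expat` module to read what the declaration says, and because expat does not fetch external DTDs by default, the caller supplies a hypothetical `has_external_markup` flag saying whether external markup declarations exist. Both helper names are hypothetical.

```python
import xml.parsers.expat

def declared_standalone(doc: str):
    """Return "yes", "no", or None if the XML declaration has no standalone part."""
    flags = []
    parser = xml.parsers.expat.ParserCreate()
    # expat passes standalone as 1 ("yes"), 0 ("no"), or -1 (omitted)
    parser.XmlDeclHandler = lambda version, encoding, standalone: flags.append(standalone)
    parser.Parse(doc, True)
    return {1: "yes", 0: "no"}.get(flags[0] if flags else -1)

def effective_standalone(doc: str, has_external_markup: bool):
    """Apply the Recommendation's default: external markup, no declaration => "no"."""
    declared = declared_standalone(doc)
    if declared is None and has_external_markup:
        return "no"
    return declared

external = '<!DOCTYPE doc SYSTEM "doc.dtd"><doc/>'  # hypothetical external DTD
print(declared_standalone('<?xml version="1.0" standalone="no"?>' + external))
print(effective_standalone('<?xml version="1.0"?>' + external, True))
```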
The encoding part of the declaration tells the XML processor
(parser) how to interpret the bytes based on a particular
character set. The default encoding is UTF-8, which is one of
seven character-encoding schemes defined by the Unicode standard
and is also used as the default for Java. In UTF-8, each ASCII
character is represented by a single byte, while all other
characters take two, three, or four bytes. UTF-8 is therefore an
efficient form of Unicode for ASCII-based documents. In fact,
UTF-8 is a superset of ASCII: every ASCII file is also a valid
UTF-8 file.[3]
<?xml version="1.0" encoding="UTF-8"?>
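The variable-length nature of UTF-8 is easy to observe by encoding a few characters in Python. The sample characters are arbitrary choices: an ASCII letter, a Latin é, a Greek α, a CJK ideograph, and an emoji outside the Basic Multilingual Plane (written as escapes so the source stays ASCII).

```python
# Number of bytes each character occupies when encoded as UTF-8.
samples = ["A", "\u00e9", "\u03b1", "\u4e2d", "\U0001F600"]
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {ch!r}: {len(encoded)} byte(s) -> {encoded.hex(' ')}")

# ASCII text produces the same bytes whether encoded as ASCII or as UTF-8.
text = "plain ASCII text"
print(text.encode("ascii") == text.encode("utf-8"))
```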
For Asian languages such as Chinese, Japanese, and Korean,
however, UTF-16 is often more compact: most of their characters
take three bytes in UTF-8 but only two in UTF-16.
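A rough size comparison makes the trade-off concrete. This Python sketch encodes an English phrase and a short Japanese phrase (日本語の文書, chosen arbitrarily and written as escapes) both ways; "utf-16-le" is used so that the two-byte byte-order mark that plain "utf-16" prepends is not counted.

```python
# Compare encoded sizes of the same text in UTF-8 and UTF-16.
samples = {
    "English": "XML declaration",
    "Japanese": "\u65e5\u672c\u8a9e\u306e\u6587\u66f8",  # 日本語の文書
}
for label, text in samples.items():
    utf8 = len(text.encode("utf-8"))
    utf16 = len(text.encode("utf-16-le"))  # no byte-order mark
    print(f"{label}: {len(text)} chars, UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```

For the English phrase UTF-16 doubles the size, while for the Japanese phrase it saves a third.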
It is also possible to specify an ISO character encoding, as in
the following example, which refers to ASCII plus Greek
characters. Note, however, that some XML processors may not
handle ISO character sets correctly, since the specification
requires only that they handle UTF-8 and UTF-16.
<?xml version="1.0" encoding="ISO-8859-7"?>
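The following Python sketch shows the encoding declaration doing its job: the same single byte, 0xE1, is read as Greek α under ISO-8859-7 and as Latin á under ISO-8859-1. It assumes the standard `xml.etree.ElementTree` module, which happens to support ISO-8859-7 through Python's codecs; as noted above, other processors are only required to handle UTF-8 and UTF-16. The helper `text_of` is hypothetical.

```python
import xml.etree.ElementTree as ET

def text_of(raw: bytes) -> str:
    """Parse raw bytes and return the root element's text."""
    return ET.fromstring(raw).text

body = b"<p>\xe1</p>"  # one byte, 0xE1, whose meaning depends on the declaration

greek = text_of(b'<?xml version="1.0" encoding="ISO-8859-7"?>' + body)
latin1 = text_of(b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body)
print(repr(greek), repr(latin1))
```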
Both the standalone and encoding information may be supplied,
but the encoding declaration must come first:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
Is the next example valid?
<?xml version="1.0" encoding='UTF-8' standalone='no'?>
Yes, it is. Single and double quotes can be used
interchangeably, provided they are of matching kind around any
particular attribute value. (Although there is no good reason in
this example to use double quotes for version and single quotes
for the others, you may need to do so if an attribute value
already contains the kind of quote you prefer.) The lack of a
blank space between 'no' and “?>” is not a problem either. The
order, however, does matter: unlike the attributes of an
element, the parts of the XML declaration must appear in the
sequence given by the EBNF above, namely version, then encoding,
then standalone.
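The advice about switching quote styles matters most for ordinary element attributes, whose values may themselves contain quotes (the values in the XML declaration never do). This Python sketch, assuming the standard `xml.etree.ElementTree` module and a hypothetical `quote` element, parses the mixed-quote declaration from the example above followed by attributes that each use the opposite quote style from the one they contain.

```python
import xml.etree.ElementTree as ET

# Mixed quote styles in the declaration, then attribute values that
# contain one kind of quote and are therefore delimited by the other.
doc = (
    "<?xml version=\"1.0\" encoding='UTF-8' standalone='no'?>"
    "<quote said='She replied \"no\"' note=\"it's fine\"/>"
)
root = ET.fromstring(doc)
print(root.get("said"))
print(root.get("note"))
```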
Neither of the following XML declarations is valid.
<?XML VERSION="1.0" STANDALONE="no"?>
<?xml version="1.0" standalone="No"?>
The first is invalid because these particular attribute names
must be lowercase, as must “xml”. The problem with the second
declaration is that the value of the standalone attribute must
be literally “yes” or “no”, not “No”. (Do I dare call this a “no
No”?)
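Both mistakes are caught by an actual parser. In this Python sketch, which assumes the standard `xml.etree.ElementTree` module, the hypothetical helper `parse_error` returns `None` for an acceptable document and the parser's message otherwise; a trivial root element is appended to each declaration so that only the declaration is under test.

```python
import xml.etree.ElementTree as ET

def parse_error(doc: str):
    """Return None if doc parses, else the parser's error message."""
    try:
        ET.fromstring(doc)
        return None
    except ET.ParseError as err:
        return str(err)

for decl in ('<?xml version="1.0" standalone="no"?>',
             '<?XML VERSION="1.0" STANDALONE="no"?>',
             '<?xml version="1.0" standalone="No"?>'):
    print(decl, "->", parse_error(decl + "<doc/>"))
```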
-
3. UTF stands for Unicode (or UCS) Transformation Format; UCS
stands for Universal Character Set. Complete information about
Unicode is available from http://www.unicode.org/.
XML Family of Specifications: A Practical Guide