Basic Structure

Documents

What is an XML Document?

An XML document is text that follows the XML syntax rules: it must be well-formed and may additionally be valid against a DTD or schema. It consists of elements, attributes, and text organized in a hierarchical (tree) structure. Every XML document has exactly one root element, which contains all other elements.


Structure of an XML Document

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child>Content</child>
  <child>
    <subchild>More content</subchild>
  </child>
</root>
  • The XML declaration (optional but recommended) states the XML version and character encoding.

  • Root Element wraps the entire document.

  • Elements can be nested to represent complex data structures.

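  • The document must be well-formed:

    • Tags are properly nested (elements do not overlap).

    • There is exactly one root element.

    • Every start tag has a matching end tag, or the element is written as an empty-element tag.

    • Tag names are case-sensitive, so start and end tags must match exactly.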
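
The well-formedness rules above can be checked mechanically. The following is a minimal sketch that relies on Python's standard-library xml.etree.ElementTree parser, which raises ParseError for input that is not well-formed. The helper name is_well_formed and the sample strings are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<root><child>Content</child></root>"
bad_nesting = "<root><a><b></a></b></root>"   # tags overlap instead of nesting
two_roots = "<root/><root/>"                  # more than one root element
unclosed = "<root><child>Content</root>"      # <child> is never closed
wrong_case = "<Root></root>"                  # start and end tag differ in case

for sample in (good, bad_nesting, two_roots, unclosed, wrong_case):
    print(is_well_formed(sample), sample)
```

This checks well-formedness only. Checking validity against a DTD or XML Schema needs a validating parser, which this sketch does not use.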


Valid vs Well-Formed

Type         Description
-----------  ---------------------------------------------
Well-Formed  XML syntax rules are followed (required)
Valid        Well-formed + conforms to a DTD or XML Schema

Declaration

XML Declaration

The XML declaration is an optional but recommended statement. When present, it must be the very first thing in the document, with no whitespace or comments before it. It specifies the XML version and, optionally, the character encoding and whether the document is standalone.


Syntax

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

Attributes

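Attribute   Description                                                  Required?
----------  -----------------------------------------------------------  ------------------------------------------------
version     Specifies the XML version. Usually "1.0".                    Required
encoding    Defines the character encoding (e.g., UTF-8, ISO-8859-1).    Optional (UTF-8 or UTF-16 is assumed if omitted)
standalone  Indicates whether the document relies on external markup     Optional
            declarations (such as an external DTD). Values: "yes" or "no".

When present, these must appear in the order version, encoding, standalone.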


Example

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<note>
  <to>Alice</to>
  <from>Bob</from>
</note>
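
To see how a parser reports the declaration, here is a small sketch using Python's standard-library xml.parsers.expat module and its XmlDeclHandler callback. The handler name on_xml_decl and the declaration dictionary are hypothetical names chosen for this example.

```python
import xml.parsers.expat

declaration = {}


def on_xml_decl(version, encoding, standalone):
    # expat reports standalone as 1 ("yes"), 0 ("no"), or -1 (omitted).
    declaration.update(version=version, encoding=encoding, standalone=standalone)


document = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    b"<note><to>Alice</to><from>Bob</from></note>"
)

parser = xml.parsers.expat.ParserCreate()
parser.XmlDeclHandler = on_xml_decl   # called once when the declaration is parsed
parser.Parse(document, True)          # True marks this as the final chunk of input

print(declaration)
```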

Tags & Elements

Tags and Elements

  • An element is the basic building block of an XML document.

  • An element is defined by a start tag, content, and an end tag.

  • Elements can be empty, containing no content.


Syntax Examples

<element>Content</element>          <!-- Element with content -->
<emptyElement />                    <!-- Empty (self-closing) element -->

Rules

  • Tags are enclosed in angle brackets < >.

  • Start tag: <tagName>

  • End tag: </tagName>

  • Empty element tag ends with />.

  • Tags are case-sensitive (<Name> and <name> are different tags).

  • Elements can be nested inside other elements.

  • Elements can contain:

    • Text

    • Other elements

    • Attributes (inside start tag)
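
Two of the rules above, empty-element tags and case sensitivity, can be observed directly with a parser. This is a minimal sketch using Python's standard-library xml.etree.ElementTree; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Both spellings of an empty element parse to an element with no text and no children.
short_form = ET.fromstring("<emptyElement />")
long_form = ET.fromstring("<emptyElement></emptyElement>")
print(short_form.text, len(short_form))
print(long_form.text, len(long_form))

# Tag names are case-sensitive, so </name> does not close <Name>.
try:
    ET.fromstring("<Name>Alice</name>")
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True
print(mismatch_rejected)
```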


Example

<book>
  <title>XML Basics</title>
  <author>John Doe</author>
  <published year="2023" />
</book>
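
As a sketch of how this structure looks to a program, the following parses the book example with Python's standard-library xml.etree.ElementTree and reads its child elements, text, and attribute. The variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    """
<book>
  <title>XML Basics</title>
  <author>John Doe</author>
  <published year="2023" />
</book>
"""
)

child_tags = [child.tag for child in book]   # nested elements, in document order
title = book.findtext("title")               # text content of <title>
year = book.find("published").get("year")    # attribute on the empty element

print(book.tag, child_tags, title, year)
```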

Attributes

What are Attributes?

Attributes provide additional information about elements. They appear inside the start tag and consist of name-value pairs.


Syntax

<tag attributeName="attributeValue" anotherAttr="value" />

Rules

  • Attribute names are case-sensitive.

  • Attribute values must be enclosed in double quotes or single quotes.

  • Multiple attributes are separated by spaces.

  • An attribute value is a single text string and cannot contain elements, and an attribute name may appear only once per element; use child elements instead for complex or repeated data.
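
The quoting and uniqueness rules can be tried out against a parser. Below is a minimal sketch with Python's standard-library xml.etree.ElementTree; the helper parses and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the parser accepts `text` (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


double_quoted = '<tag attr="value" />'
single_quoted = "<tag attr='value' />"
mixed = """<tag title='She said "hi"' />"""   # double quotes inside single quotes
unquoted = "<tag attr=value />"               # rejected: value is not quoted
duplicated = '<tag attr="a" attr="b" />'      # rejected: same name used twice

for sample in (double_quoted, single_quoted, mixed, unquoted, duplicated):
    print(parses(sample), sample)
```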


Example

<book id="b1" language="en" available="true">
  <title>XML Guide</title>
</book>
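
Attribute values always reach the application as text, so any conversion is up to the program reading them. The sketch below uses Python's standard-library xml.etree.ElementTree; treating the string "true" as a boolean and the fallback value "unknown" are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book id="b1" language="en" available="true"><title>XML Guide</title></book>'
)

attributes = dict(book.attrib)                   # every value is a string
is_available = book.get("available") == "true"   # assumed convention for booleans
missing = book.get("isbn", "unknown")            # default for an absent attribute

print(attributes, is_available, missing)
```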

Comments

<!-- This is a comment -->

Rules

  • Comments start with <!-- and end with -->.

  • The sequence -- must not appear inside a comment, so a comment also cannot end with --->.

  • Comments can span multiple lines.
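
A short sketch of how a parser treats comments, using Python's standard-library xml.etree.ElementTree. Its default parser discards comments when building the tree and rejects a comment that contains --. The sample document and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

source = """<note>
  <!-- This comment
       spans two lines -->
  <to>Alice</to>
</note>"""

note = ET.fromstring(source)
child_tags = [child.tag for child in note]   # the comment is not part of the tree

# A double hyphen inside a comment makes the document not well-formed.
try:
    ET.fromstring("<note><!-- a -- b --></note>")
    double_hyphen_rejected = False
except ET.ParseError:
    double_hyphen_rejected = True

print(child_tags, double_hyphen_rejected)
```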

CDATA Sections

What is CDATA?

CDATA (Character Data) sections tell the XML parser to treat the enclosed text as literal character data rather than markup. Characters such as < and & inside the section are kept as-is instead of being interpreted as XML syntax. This is useful for embedding code, HTML, or other text that contains many such characters.


Syntax

<![CDATA[
  Text with <tags>, & characters, and other markup that should NOT be parsed.
]]>

Rules

  • CDATA sections start with <![CDATA[ and end with ]]>.

  • Inside CDATA, characters like <, >, and & are not treated as markup.

  • CDATA sections cannot contain the string ]]>, and they cannot be nested.
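
Because ]]> always ends a CDATA section, text that contains it has to be split across two sections. The following is a minimal sketch of that common workaround in Python; the helper name to_cdata and the payload string are hypothetical, and the standard-library xml.etree.ElementTree parser is used only to confirm the round trip.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap `text` in CDATA, splitting any "]]>" across two sections (hypothetical helper)."""
    # "]]" closes the first section's content and ">" starts the next one.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


payload = "if (a[b[0]]> c && d < e) {}"
element = ET.fromstring("<code>" + to_cdata(payload) + "</code>")
round_trips = element.text == payload   # adjacent sections join into one text value

print(to_cdata("a]]>b"))
print(round_trips)
```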


Example

<script>
  <![CDATA[
    if (a < b && b > c) {
      console.log("Example");
    }
  ]]>
</script>
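
To a parser, a CDATA section is only another way of writing text: the application receives the same characters it would get from the escaped form. A small sketch with Python's standard-library xml.etree.ElementTree, using a one-line variant of the example above; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

with_cdata = ET.fromstring(
    '<script><![CDATA[if (a < b && b > c) { console.log("Example"); }]]></script>'
)
with_escapes = ET.fromstring(
    '<script>if (a &lt; b &amp;&amp; b &gt; c) { console.log("Example"); }</script>'
)

same_text = with_cdata.text == with_escapes.text

# ElementTree does not remember the CDATA wrapper; it writes escaped text instead.
serialized = ET.tostring(with_cdata, encoding="unicode")

print(same_text)
print(serialized)
```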
