XML stands for Extensible Markup Language. It was developed by the World Wide Web Consortium (W3C) and is available as an open standard. It is a text-based markup language derived from Standard Generalized Markup Language (SGML), a meta-markup language for text documents. XML tags identify the data and are used to store and organize it. The following is a typical example of an XML file that represents a book record.
<?xml version="1.0" encoding="UTF-8"?> <Library> <Book> <author>Bill Gates</author> <title>The Road Ahead</title> </Book> </Library>
Here version is the XML version and encoding specifies the character encoding used in the document. Library is the root element; it contains a Book element, which has author and title as sub-elements. An XML element is everything from (and including) the element's start tag to (and including) the element's end tag.
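As a rough illustration, the book record can be read with Python's standard xml.etree.ElementTree module. This is only a sketch; the variable names are illustrative and not part of the original example.

```python
import xml.etree.ElementTree as ET

# The book record, passed as bytes because the document
# carries its own encoding declaration.
BOOK_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<Library>
  <Book>
    <author>Bill Gates</author>
    <title>The Road Ahead</title>
  </Book>
</Library>"""

root = ET.fromstring(BOOK_XML)               # the root element
book = root.find("Book")                     # first Book child of the root
child_tags = [child.tag for child in book]   # names of the sub-elements
author = book.findtext("author")
title = book.findtext("title")

print(root.tag, child_tags, author, title)
```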
XML Elements :
Element names must start with a letter or an underscore. They can contain letters, digits, hyphens, underscores, and periods, but not spaces. Names must not start with the letters xml in any combination of case (xml, XML, Xml, and so on), because such names are reserved. XML elements are user-defined and case-sensitive. XML is designed to store and describe data and to exchange information between organizations and systems. It is both human-readable and machine-readable. It is well structured and easy to read and write from applications. An element can contain text, attributes, other elements, or a mix of these.
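The naming rules can be approximated with a small check. The function name is_valid_element_name is hypothetical, and the pattern is an ASCII-only simplification: the XML specification also permits many non-ASCII letters, and colons (normally reserved for namespaces).

```python
import re

# ASCII-only simplification of the naming rules described in the text:
# first character is a letter or underscore, the rest are letters,
# digits, periods, underscores, or hyphens.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_element_name(name):
    """Return True if `name` follows the simplified naming rules."""
    if name.lower().startswith("xml"):
        return False  # names beginning with "xml" (any case) are reserved
    return _NAME_RE.fullmatch(name) is not None


for candidate in ["book_title", "_id", "2book", "book title", "XmlData"]:
    print(candidate, is_valid_element_name(candidate))
```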
XML with HTML Comparison –
- XML describes data while HTML describes how the data should be displayed. Therefore, HTML is about displaying information while XML is about describing information.
- XML supports user-defined tags while HTML provides pre-defined tags.
- XML is a case-sensitive language while HTML language is not case-sensitive.
- In XML, all tags must be closed; while in HTML, it is not necessary to close each tag.
XML Well-formed – A well-formed document must follow these rules:
- Every start tag has a matching end tag.
- Elements may nest, but must not overlap.
- There must be exactly one root element.
- Attribute values must be quoted.
- An element may not have two attributes with the same name.
- Comments and processing instructions may not appear inside tags.
- No unescaped < or & signs may occur inside character data.
Only well-formed documents can be processed by XML parsers.
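Several of these rules can be observed with a quick experiment. The sketch below uses Python's standard ElementTree parser, which raises ParseError for input that is not well-formed; the helper is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(text):
    """Return True if the parser accepts `text`, False on a ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Each sample maps to the result we expect from the rules above.
samples = {
    "<a><b>ok</b></a>": True,
    "<a><b>missing end tag</a>": False,   # start tag without matching end tag
    "<a><b></a></b>": False,              # overlapping elements
    "<a/><b/>": False,                    # more than one root element
    "<a id=1/>": False,                   # unquoted attribute value
    '<a id="1" id="2"/>': False,          # duplicate attribute name
    "<a>Tom & Jerry</a>": False,          # unescaped & in character data
}
results = {doc: is_well_formed(doc) for doc in samples}

# escape() replaces &, < and > with entity references,
# so the text can be embedded safely.
safe = "<a>" + escape("Tom & Jerry <3") + "</a>"
print(results == samples, is_well_formed(safe))
```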
XML Attributes – An attribute is designed to contain extra information or data related to a specific element. In the following example, bdt is an attribute of the age element that gives extra information: the student's birth date.
<college> <student> <Roll>11</Roll> <fname>John</fname> <age bdt="1/1/2000">17</age> </student> </college>
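For illustration, the attribute and the element text of the student record can be read separately with Python's ElementTree. This is a minimal sketch; the "unit" attribute looked up at the end is deliberately absent from the record and only shows how a fallback value behaves.

```python
import xml.etree.ElementTree as ET

STUDENT_XML = """<college>
  <student>
    <Roll>11</Roll>
    <fname>John</fname>
    <age bdt="1/1/2000">17</age>
  </student>
</college>"""

root = ET.fromstring(STUDENT_XML)
age = root.find("student/age")       # path is relative to the root element

age_text = age.text                  # element content
birth_date = age.get("bdt")          # attribute value
missing = age.get("unit", "n/a")     # absent attribute, so the fallback is used

print(age_text, birth_date, missing)
```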
XML document Validation –
An XML document is valid if it is well-formed and its contents match the elements and attributes declared in its document type definition (DTD). In other words, a document that adheres to the rules defined in the corresponding DTD is a valid XML document.
What is DTD?
A DTD describes every object (e.g. element, attribute) that can appear in an XML document. An external DTD is a text file with a .dtd extension and can be created with any text editor, such as Notepad. It is used to declare each of the building blocks (elements) used in an XML document. It defines the structure of the XML document and the list of its legal elements.
DTD declarations are either internal, written inside the XML document itself, or external, kept in a separate DTD file that is then linked to the XML document.
Internal DTD : You can write the rules inside the XML document using a DOCTYPE declaration. The scope of this DTD is limited to this document. The advantage is that the document can be validated by itself, without any external reference.
<?xml version="1.0" standalone="yes"?> <!DOCTYPE Library [ <!ELEMENT Library (Book+)> <!ELEMENT Book (author,title)> <!ELEMENT author (#PCDATA)> <!ELEMENT title (#PCDATA)> <!ATTLIST title lang CDATA #REQUIRED> ]> <Library> <Book> <author>Bill Gates</author> <title lang="en">The Road Ahead</title> </Book> </Library>
External DTD : You can write the rules in a separate file (with a .dtd extension) and later link this file to an XML document. This way, several XML documents can refer to the same DTD rules. External DTDs are better because definitions can be shared between XML documents. Documents that share the same DTD are more uniform and easier to retrieve. Only valid XML documents, which follow these rules, are valuable for exchanging and retrieving information.
There are two types of external DTD: private and public. A private DTD is identified by the SYSTEM keyword and is intended for a single user or a group of users. A public DTD is identified by the PUBLIC keyword and is accessible to any user.
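DTD file – lib.dtd. It defines the structure of the library. Because XML names are case-sensitive, the declared names must match the element names used in the document exactly.
<!ELEMENT Library (Book+)> <!ELEMENT Book (author,title)> <!ELEMENT author (#PCDATA)> <!ELEMENT title (#PCDATA)> <!ATTLIST title lang CDATA #REQUIRED>
XML file that uses the external DTD file – lib.dtd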
<!DOCTYPE Library SYSTEM "lib.dtd"> <Library> <Book> <author>Bill Gates</author> <title lang="en">The Road Ahead</title> </Book> </Library>
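A non-validating parser such as Python's ElementTree reads this document but does not enforce lib.dtd. To show what the DTD rules mean in practice, the sketch below hand-codes the same constraints. It assumes the DTD names the children author and title exactly as they are spelled in the document. The function check_library is hypothetical and is not a substitute for a validating parser.

```python
import xml.etree.ElementTree as ET


def check_library(xml_text):
    """Return a list of rule violations; an empty list means every check passed."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "Library":
        problems.append("root element must be Library")
    children = list(root)
    # Library (Book+): one or more Book children and nothing else
    if not children or any(child.tag != "Book" for child in children):
        problems.append("Library must contain one or more Book elements only")
    for book in root.findall("Book"):
        # Book (author,title): exactly these children, in this order
        if [child.tag for child in book] != ["author", "title"]:
            problems.append("Book must contain author followed by title")
        title = book.find("title")
        # ATTLIST title lang CDATA #REQUIRED
        if title is not None and "lang" not in title.attrib:
            problems.append("title requires a lang attribute")
    return problems


GOOD = """<!DOCTYPE Library SYSTEM "lib.dtd">
<Library><Book><author>Bill Gates</author><title lang="en">The Road Ahead</title></Book></Library>"""

# Wrong child order and a missing lang attribute.
BAD = """<Library><Book><title>The Road Ahead</title><author>Bill Gates</author></Book></Library>"""

print(check_library(GOOD))
print(check_library(BAD))
```

For real validation, a validating parser should read the DTD itself rather than rely on hand-written checks like these.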
PCDATA & CDATA
- PCDATA – Parsed character data. It is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded.
- CDATA – Character data. In an ATTLIST declaration, the CDATA type means that the attribute value is a character string. Inside a CDATA section (<![CDATA[ ... ]]>), the text is not parsed by a parser: tags inside the text are not treated as markup and entities are not expanded.
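The difference can be seen by parsing the same text both ways. This is a small sketch using Python's ElementTree; the note element and its content are invented for the example, and the unparsed case uses a CDATA section.

```python
import xml.etree.ElementTree as ET

# Parsed character data: &amp; is expanded and <b> becomes a child element.
parsed = ET.fromstring("<note>AT&amp;T says <b>hello</b></note>")

# CDATA section: the same characters are kept as literal text.
literal = ET.fromstring("<note><![CDATA[AT&T says <b>hello</b>]]></note>")

parsed_text = parsed.text                          # text before the first child
parsed_children = [child.tag for child in parsed]
literal_text = literal.text
literal_children = [child.tag for child in literal]

print(repr(parsed_text), parsed_children)
print(repr(literal_text), literal_children)
```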
In the DTD, XML elements are declared with an element declaration. An element declaration has the following syntax: <!ELEMENT element-name (element-content)>
1) Empty Elements: declared with the keyword “EMPTY”, written without parentheses.
<!ELEMENT element-name EMPTY> example: <!ELEMENT img EMPTY>
2) Text-only: Elements that contain only text are declared with #PCDATA inside parentheses:
<!ELEMENT element-name (#PCDATA)> example: <!ELEMENT name (#PCDATA)>
3) Any: the keyword “ANY”, written without parentheses, declares an element with any content:
<!ELEMENT element-name ANY>
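One way to see how a parser classifies element declarations is the ElementDeclHandler hook of Python's xml.parsers.expat module. In this sketch EMPTY and ANY are written as bare keywords, which is the form the XML specification defines. The note, name, img and extra elements are invented for the example.

```python
import xml.parsers.expat
from xml.parsers.expat import model

DOC = """<!DOCTYPE note [
  <!ELEMENT note (name, img)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT img EMPTY>
  <!ELEMENT extra ANY>
]>
<note><name>x</name><img/></note>"""

# Readable labels for the content-type codes reported by expat.
KIND = {
    model.XML_CTYPE_EMPTY: "EMPTY",
    model.XML_CTYPE_ANY: "ANY",
    model.XML_CTYPE_MIXED: "text (#PCDATA)",
    model.XML_CTYPE_SEQ: "sequence of child elements",
}

declared = {}


def on_element_decl(name, content_model):
    # content_model is a tuple; its first item is the content-type code.
    declared[name] = KIND.get(content_model[0], "other")


parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(DOC, True)

for name, kind in declared.items():
    print(name, "->", kind)
```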
DTD – Attributes
In a DTD, XML element attributes are declared with an ATTLIST declaration. An attribute declaration has the following syntax:
<!ATTLIST element-name attribute-name attribute-type default-value>
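The default-value can be one of the following:
- value : The attribute has a default value, written as a quoted string.
- #REQUIRED : The attribute value must be included in the element.
- #IMPLIED : The attribute is optional.
- #FIXED value : The attribute has a fixed value, which is the only value allowed.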
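The effect of a default value can be observed with Python's ElementTree, whose underlying expat parser reads the internal DTD subset and supplies declared defaults for attributes missing from the document. This is a sketch only; the payment element and its method and currency attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Internal DTD: method has the default value "cash", currency is optional.
DTD = """<!DOCTYPE payment [
  <!ELEMENT payment (#PCDATA)>
  <!ATTLIST payment method   CDATA "cash"
                    currency CDATA #IMPLIED>
]>
"""

defaulted = ET.fromstring(DTD + "<payment>100</payment>")
explicit = ET.fromstring(DTD + '<payment method="card">100</payment>')

default_method = defaulted.get("method")       # filled in from the DTD default
explicit_method = explicit.get("method")       # value written in the document wins
optional_currency = defaulted.get("currency")  # optional attribute left absent

print(default_method, explicit_method, optional_currency)
```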