What is DOM?

DOM (Document Object Model) is an API designed to let programmers access the information in an XML document without having to write a parser in their programming language of choice. Because DOM and SAX are standard APIs, your program is free to use whichever parser implementation it prefers. These APIs are available in many languages (Java, C++, etc.). With DOM, the textual information in your XML document is turned into a tree of nodes, and you access your information only by interacting with this tree. Because DOM builds an object model of the whole document by default, it preserves the order of the elements as they appear in the XML document.
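To make the tree-of-nodes idea concrete, here is a toy C++ sketch of what a DOM-style parser produces. It is not MSXML or any real DOM API, and not a conforming XML parser: `Node` and `buildTree` are hypothetical names, the input is assumed to be well formed, and attributes, entities, comments, and CDATA are ignored. The whole document is read into a tree before anything is returned, and each node keeps its children in document order.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: an element (non-empty name) or a text node (empty name).
struct Node {
    std::string name;                             // element name; empty for text nodes
    std::string text;                             // character data for text nodes
    std::vector<std::unique_ptr<Node>> children;  // kept in document order
};

// Toy DOM builder: reads the whole document and returns the root element.
// Skips <?...?> and <!...> constructs; attributes are read past but not stored.
std::unique_ptr<Node> buildTree(const std::string& xml) {
    auto document = std::make_unique<Node>();   // synthetic parent that will hold the root
    std::vector<Node*> open{document.get()};    // stack: innermost open element at the back
    std::size_t i = 0;
    while (i < xml.size()) {
        if (xml[i] == '<') {
            std::size_t end = xml.find('>', i);
            if (end == std::string::npos) break;            // truncated input: stop
            std::string tag = xml.substr(i + 1, end - i - 1);
            i = end + 1;
            if (tag.empty() || tag[0] == '?' || tag[0] == '!') continue;
            if (tag[0] == '/') {                            // end tag: close the element
                if (open.size() > 1) open.pop_back();
                continue;
            }
            auto element = std::make_unique<Node>();
            element->name = tag.substr(0, tag.find_first_of(" \t\r\n/"));
            Node* raw = element.get();
            open.back()->children.push_back(std::move(element));
            if (tag.back() != '/') open.push_back(raw);     // <x/> has no children
        } else {
            std::size_t end = xml.find('<', i);
            if (end == std::string::npos) end = xml.size();
            std::string text = xml.substr(i, end - i);
            i = end;
            if (text.find_first_not_of(" \t\r\n") == std::string::npos) continue;  // whitespace only
            auto t = std::make_unique<Node>();
            t->text = text;
            open.back()->children.push_back(std::move(t));
        }
    }
    if (document->children.empty()) return nullptr;
    return std::move(document->children.front());
}
```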

 

What is SAX?

SAX (Simple API for XML) is also an API designed to let programmers access the information in an XML document without having to write a parser in their programming language of choice. These APIs are available in many languages (Java, C++, etc.). With SAX, the textual information in your XML document is not turned into a tree of nodes but delivered as a sequence of events. SAX is generally faster than DOM. Because SAX does not provide an object model for you, you have to create your own custom object model and write an event handler class that listens to SAX events and maps elements to objects. SAX fires events for every opening and closing tag, for #PCDATA and CDATA sections, and also for DTD declarations, comments, and processing instructions.
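A toy C++ sketch of the event model. `ContentHandler`, `parse`, and `RecordingHandler` are hypothetical names loosely modelled on SAX's startElement/endElement/characters callbacks; this is not the real SAX or MSXML interface. The parser never builds a tree: it scans the text once and calls the handler for each tag and each run of text (attributes are skipped).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler interface, loosely modelled on SAX callbacks.
struct ContentHandler {
    virtual ~ContentHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// Toy event-firing parser: scans the text once and reports what it sees.
// Handles start, end, and empty tags plus text; skips <?...?> and <!...>.
void parse(const std::string& xml, ContentHandler& h) {
    std::size_t i = 0;
    while (i < xml.size()) {
        if (xml[i] == '<') {
            std::size_t end = xml.find('>', i);
            if (end == std::string::npos) break;        // truncated input: stop
            std::string tag = xml.substr(i + 1, end - i - 1);
            i = end + 1;
            if (tag.empty() || tag[0] == '?' || tag[0] == '!') continue;
            if (tag[0] == '/') { h.endElement(tag.substr(1)); continue; }
            bool empty = tag.back() == '/';
            std::string name = tag.substr(0, tag.find_first_of(" \t\r\n/"));
            h.startElement(name);
            if (empty) h.endElement(name);              // <x/> fires both events
        } else {
            std::size_t end = xml.find('<', i);
            if (end == std::string::npos) end = xml.size();
            h.characters(xml.substr(i, end - i));
            i = end;
        }
    }
}

// Example handler: records the event sequence as strings.
struct RecordingHandler : ContentHandler {
    std::vector<std::string> events;
    void startElement(const std::string& n) override { events.push_back("start:" + n); }
    void endElement(const std::string& n) override { events.push_back("end:" + n); }
    void characters(const std::string& t) override { events.push_back("text:" + t); }
};
```

A real application would replace `RecordingHandler` with a handler that builds its own objects from these events.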

 

When to use what?

If your XML document contains document data, DOM is the natural choice.

If your XML document contains structured, machine-readable data (SQL and XQL query results, result sets), SAX is usually the better choice.

 

Explain Lex, Yacc, Flex and Bison.

A compiler or interpreter for a programming language is often decomposed into two parts:

Read the source program and discover its structure

Process this structure

Lex and Yacc can generate program fragments that solve the first task.

The task of discovering the source structure is in turn decomposed into two subtasks:

1) Split the source file into tokens (Lex).

2) Find the hierarchical structure of the program (Yacc).

Lex is a program generator designed for processing character input streams.

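Yacc is a general tool for describing the input to a computer program. The user specifies the structure of the input (a grammar) together with the code to be run as each part of that structure is recognized, and Yacc generates a parser from this specification.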

Flex is a tool for generating scanners: it reads a description of the tokens to recognize and generates C source code for a scanner, which is then compiled into an executable.

Bison is a general-purpose parser generator that converts a grammar description into a C program to parse that grammar.

Bison is upward compatible with Yacc: all properly written Yacc grammars should work with Bison with no change.
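The two phases can be sketched by hand in C++ for a tiny arithmetic language. This is a hypothetical example, not Lex or Yacc output: `tokenize` plays the role of the Lex/Flex scanner, and `Parser` is a hand-written recursive-descent parser standing in for the LALR parser that Yacc or Bison would generate from a grammar.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Phase 1 (the scanner's job): split the character stream into tokens.
enum class Tok { Num, Plus, Star, LParen, RParen, End };
struct Token { Tok kind; long value; };

std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    for (std::size_t i = 0; i < src.size();) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isdigit(c)) {
            long v = 0;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                v = v * 10 + (src[i++] - '0');
            out.push_back({Tok::Num, v});
            continue;
        }
        switch (c) {
            case '+': out.push_back({Tok::Plus, 0}); break;
            case '*': out.push_back({Tok::Star, 0}); break;
            case '(': out.push_back({Tok::LParen, 0}); break;
            case ')': out.push_back({Tok::RParen, 0}); break;
            default: throw std::runtime_error("unexpected character");
        }
        ++i;
    }
    out.push_back({Tok::End, 0});
    return out;
}

// Phase 2 (the parser's job): recover the hierarchical structure.
// Grammar: expr := term ('+' term)*   term := factor ('*' factor)*
//          factor := NUM | '(' expr ')'
class Parser {
public:
    explicit Parser(std::vector<Token> toks) : toks_(std::move(toks)) {}
    long parse() { long v = expr(); expect(Tok::End); return v; }
private:
    std::vector<Token> toks_;
    std::size_t pos_ = 0;
    const Token& peek() const { return toks_[pos_]; }
    void expect(Tok k) {
        if (peek().kind != k) throw std::runtime_error("syntax error");
        ++pos_;
    }
    long expr() {
        long v = term();
        while (peek().kind == Tok::Plus) { ++pos_; v += term(); }
        return v;
    }
    long term() {
        long v = factor();
        while (peek().kind == Tok::Star) { ++pos_; v *= factor(); }
        return v;
    }
    long factor() {
        if (peek().kind == Tok::Num) return toks_[pos_++].value;
        expect(Tok::LParen);
        long v = expr();
        expect(Tok::RParen);
        return v;
    }
};
```

Because `term` sits below `expr` in the grammar, `*` binds tighter than `+`; a Yacc grammar would usually express the same thing with precedence declarations.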

 

Difference between XML and HTML?

     HTML -> responsible for displaying (rendering) data

     XML  -> responsible for holding (describing) data

________________________________________________________________________________________

What is a DTD?

     It defines the document structure. With a DTD, different groups of people can agree on a common format for interchanging data.

     It is also used to verify your own data or the data you receive from the outside world.

     You can declare it inline in your XML document or as an external reference.

    ex:  <!ELEMENT car (brand,type)>

         <!ELEMENT brand (#PCDATA)>

         <!ELEMENT type (#PCDATA)>

________________________________________________________________________________________

What is an XML Schema?

     It is similar to a DTD.

     It supports data types and namespaces, and it is extensible.

     It is written in XML, so it is simple to use and understand.

    ex:  <xs:element name="age" type="xs:integer"/>

 

     Using a DTD we could only specify that a <zip> element contains text (#PCDATA),

     but using XML Schema we can create a data type for zip codes and limit the zip element to a five-digit code.

________________________________________________________________________________________

DOM vs SAX?

   DOM creates an object model.

        It transforms the textual information in the XML document into a tree of nodes.

 

   SAX doesn't create an object model.

        It accesses the information in the XML document as a sequence of events.

        The developer needs to write handler classes that interpret these events.

 

When to use what?

    Use DOM when the XML contains document data, and use SAX when it contains structured data.

________________________________________________________________________________________

 

XML FUNDAMENTALS

 

XML is a good replacement for EDI (Electronic Data Interchange). EDI is expensive because it uses a dedicated communication infrastructure.

 

With XML, everybody knows that the same interpretation of the data is used.

It uses the Internet for data exchange, and it is very flexible. XML makes communication easy and is a great tool for transactions between businesses.

 

XML is a simplified subset of SGML.

XML is a meta-language (a language that is used to define other languages).

XML is about defining data.

 

XML: What it can do

With XML you can:

_ Define data structures

_ Make these structures platform independent

_ Process XML defined data automatically

_ Define your own tags

With XML you cannot:

_ Define how your data is shown. To show data, you need other techniques.

XSL (eXtensible Stylesheet Language) was created for this purpose, but the presentation can also be defined with CSS (Cascading Style Sheets).

 

The XML declaration is written at the very beginning of the document:

<?xml version="1.0"?>

Structure of XML

 


<?xml version="1.0"?>

<root>

<element>

<sub-element>content</sub-element>

<sub-element>content</sub-element>

</element>

</root>

 

 

 

Elements in XML can use attributes. The syntax is:

 


<element attribute-name = "attribute-value"> content </element>

<car color = "green">volvo</car>

 

Try to avoid attributes where elements would do: software that checks XML documents can do a better job with elements than with attributes.

 

A well-formed XML document is one that conforms to the XML syntax rules. The basic rules are (there are more rules pertaining to entities):

_ it contains a root element

_ all other elements are nested inside the root element

_ every start tag has a matching end tag (or the element uses the empty-element form <x/>), and elements are properly nested

_ the element name in a start tag and in its end tag are exactly the same (names are case-sensitive)

_ attribute names are used only once within the same element
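The structural rules above can be checked with a stack of open element names. A toy C++ sketch (`isWellFormed` is a hypothetical helper, not a full XML well-formedness checker): it checks for a single root element, tag pairing, exact case-sensitive names, and nesting, but does not check attributes, entities, comments, CDATA, or stray text outside the root.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Checks only the structural rules: one root element, every start tag closed
// by an end tag with exactly the same name, and proper nesting.
// <?...?> and <!...> constructs are skipped.
bool isWellFormed(const std::string& xml) {
    std::vector<std::string> open;   // names of elements not yet closed
    int roots = 0;                   // top-level elements seen so far
    std::size_t i = 0;
    while ((i = xml.find('<', i)) != std::string::npos) {
        std::size_t end = xml.find('>', i);
        if (end == std::string::npos) return false;        // unterminated tag
        std::string tag = xml.substr(i + 1, end - i - 1);
        i = end + 1;
        if (tag.empty()) return false;
        if (tag[0] == '?' || tag[0] == '!') continue;
        if (tag[0] == '/') {                                // end tag
            if (open.empty() || open.back() != tag.substr(1)) return false;
            open.pop_back();
            continue;
        }
        std::string name = tag.substr(0, tag.find_first_of(" \t\r\n/"));
        if (open.empty() && ++roots > 1) return false;      // a second root element
        if (tag.back() != '/') open.push_back(name);        // <x/> closes itself
    }
    return roots == 1 && open.empty();
}
```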

 

 

To be valid, an XML document must satisfy the following rules:

_ The document must be well formed (see the rules above).

_ The document must conform to the rules defined in a Document Type Definition (DTD).

 

Companies that exchange XML documents can check them against the same DTD.

 

A DTD defines the rules for a particular type of XML document. It describes the elements and the data they may contain (#PCDATA stands for parsed character data, which the parser examines for markup and entities; CDATA is character data that is not parsed). An element can contain sub-elements.

 

<!ELEMENT car (brand, type) >

<!ELEMENT brand (#PCDATA) >

<!ELEMENT type (#PCDATA) >

 

This means that the element car has two child elements, brand and type, which must appear in that order. Each child element contains character data (#PCDATA).

 

To specify the number of possible occurrences, the following indicators can be used:

_ + must occur at least once but may occur more often

_ * may occur any number of times, or may be omitted

_ ? may occur once or not at all

_ | defines a choice between sub-elements

 

A DTD can be an external document that is referenced from the XML document:

<?xml version="1.0"?>

<!DOCTYPE root-element-name SYSTEM "address-of-the-dtd-file">

 

A DTD can also be included in the XML document itself:

<?xml version="1.0"?>

<!DOCTYPE root-element-name [ element declarations go here ]>

 

 XSL can convert XML documents into HTML. XSLT is used to describe how an XML source document is transformed into another XML document that uses the XSL formatting vocabulary.

 

1) Usage of XML?

XML can separate data from HTML (HTML is responsible for rendering data and XML is responsible for holding it).

XML can be used to Store Data

XML can be used to Share Data

 

2) What is a well-formed XML document?

An XML document that has correct XML syntax.

 

3) What is a valid XML document?

A valid XML document is a well-formed XML document that also conforms to the rules of a Document Type Definition (DTD).

 

4) What is the purpose of a DTD?

The purpose of a DTD is to define the legal building blocks of an XML document.

It defines the document structure with a list of legal elements.

 

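6) Can we use a CSS file to format an XML document?

Yes. The style sheet is linked with a declaration such as <?xml-stylesheet type="text/css" href="nameoffile.css"?>.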

 

7) What does the <xml> tag do in HTML, and what is its relevance?

XML data can be embedded directly in an HTML page inside an <xml> tag, or the tag can refer to a separate XML file. This is called an XML data island (an Internet Explorer feature).

Data islands can be bound to HTML elements in this manner.

 

8) What is an XML parser used for?

To create, read, and update an XML document, we need an XML parser.

 

9) Differentiate between SAX (Simple API for XML) and DOM (Document Object Model).

With DOM, the textual information in the XML document gets turned into a tree of nodes.

DOM gives access to the information stored in the XML document as a hierarchical object model.

DOM creates a tree of nodes, and we access the information by interacting with this tree of nodes.

SAX gives you access to the information in the XML document not as a tree of nodes, but as a sequence of events.

SAX doesn't create a default object model on top of the XML document (as DOM does).

This makes SAX faster, but it also requires you to do the following:

create your own custom object model that "holds" all the information in the XML document;

create a document handler class that listens to SAX events and makes sense of these events to create objects in the custom object model.

All SAX requires is that the parser read the XML document and fire a sequence of events depending on the tags it encounters. The developer is responsible for interpreting these events by writing an XML document handler class, which makes sense of all the tag events and creates objects in your own object model.

SAX is faster than DOM because it bypasses the creation of a tree-based object model of the information.

On the other hand, you have to write a SAX document handler to interpret all the SAX events (which can be a lot of work).
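A minimal sketch of such a document handler in C++, assuming the parser has already delivered its events as a list. `Kind`, `Event`, `Car`, and `CarHandler` are hypothetical names; the handler maps <car> elements containing <brand> and <type> children onto `Car` objects, keeping only the state it needs (the object being built and the text collected so far).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event record, as a SAX-style parser might report it.
enum class Kind { Start, End, Text };
struct Event { Kind kind; std::string value; };

// The custom object model that "holds" the information.
struct Car { std::string brand, type; };

// Document handler: interprets the events and creates Car objects.
class CarHandler {
public:
    void handle(const Event& e) {
        switch (e.kind) {
        case Kind::Start:
            if (e.value == "car") current_ = Car{};   // begin a new object
            text_.clear();
            break;
        case Kind::Text:
            text_ += e.value;                         // character data may arrive in pieces
            break;
        case Kind::End:
            if (e.value == "brand") current_.brand = text_;
            else if (e.value == "type") current_.type = text_;
            else if (e.value == "car") cars.push_back(current_);
            text_.clear();
            break;
        }
    }
    std::vector<Car> cars;                            // result: the custom object model
private:
    Car current_;
    std::string text_;
};
```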

 

11) What kinds of SAX events are fired by the SAX parser?

 

SAX fires an event for every opening and closing tag, and it also fires events for #PCDATA and CDATA sections. The document handler has to interpret these events in some meaningful way and build its own custom object model from them. SAX also fires events for processing instructions, DTDs, comments, etc.

 

12) When to use DOM?

If the XML documents contain document data then DOM is a completely natural fit. An example of this is the Datachannel RIO product, which can index and organize information that comes from all kinds of document sources (like Word and Excel files). In this case, DOM is well suited to allow programs access to information stored in these documents.

However, when dealing with structured data DOM is not the best choice. SAX might be a better fit in this scenario.

 

13) Which parser is the best fit?

If your information is structured in a way that makes it easy to create this element-to-object mapping, you should use the SAX API. On the other hand, if your data is much better represented as a tree, you should use DOM.

Advanced XML Questions

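1) How do you resolve name conflicts when two or more XML documents are used in an application?

Use XML namespaces. Each namespace is identified by a URI, which can be a URL or a URN. Element names are qualified with a prefix bound to that URI (xmlns:prefix="URI"), so elements with the same local name from different vocabularies no longer clash.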
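A sketch of how a namespace-aware processor tells two same-named elements apart: the prefix of a qualified name is looked up in the in-scope xmlns declarations, and the element is identified by the pair (namespace URI, local name). The `resolve` function and the prefix bindings in the test are hypothetical; the URN uses the reserved `urn:example:` namespace.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// An expanded name: (namespace URI, local name). Two elements with the same
// local name but different URIs are different names.
using ExpandedName = std::pair<std::string, std::string>;

// Resolve a qualified name such as "h:table" using prefix -> URI bindings
// (as declared by xmlns:prefix="URI"). The empty prefix stands for the
// default namespace declared with xmlns="URI".
ExpandedName resolve(const std::string& qname,
                     const std::map<std::string, std::string>& bindings) {
    std::string::size_type colon = qname.find(':');
    std::string prefix = colon == std::string::npos ? "" : qname.substr(0, colon);
    std::string local = colon == std::string::npos ? qname : qname.substr(colon + 1);
    auto it = bindings.find(prefix);
    if (it == bindings.end()) {
        if (prefix.empty()) return {"", local};   // no default namespace in scope
        throw std::runtime_error("undeclared prefix: " + prefix);
    }
    return {it->second, local};
}
```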

 

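2) What is a CDATA section?

Text inside a CDATA section is not parsed as markup: characters such as < and & are passed to the application as plain character data.

A CDATA section cannot contain another CDATA section, because the first ]]> ends the section.

Also make sure there are no spaces or line breaks inside the ]]> string that ends the section.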
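A toy C++ sketch of what the parser does with a CDATA section: everything between <![CDATA[ and the first ]]> is passed through as character data, without being interpreted as markup or entity references. The `cdataContent` helper is hypothetical and handles a single section only.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Returns the character data of the first CDATA section in `xml`, or nothing
// if there is none (or it is unterminated). Content is returned verbatim:
// '<', '&' and entity references inside are not interpreted.
std::optional<std::string> cdataContent(const std::string& xml) {
    const std::string open = "<![CDATA[";
    const std::string close = "]]>";
    std::string::size_type start = xml.find(open);
    if (start == std::string::npos) return std::nullopt;
    start += open.size();
    std::string::size_type end = xml.find(close, start);   // the first "]]>" ends it,
    if (end == std::string::npos) return std::nullopt;      // which is why sections cannot nest
    return xml.substr(start, end - start);
}
```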

 

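3) Why is XML encoding required?

XML documents can contain non-ASCII (foreign) characters. To let the XML parser understand these characters, save your XML documents as Unicode (for example UTF-8) and make sure the encoding matches the one named in the XML declaration, e.g. <?xml version="1.0" encoding="UTF-8"?>.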

 

4) Name two HTML tags that are used to display XML data.

<span datasrc="#xmldso" datafld="TITLE"></span>

<div datasrc="#xmldso" datafld="TITLE"></div>

DTD & XML Schema

 

1) What is the purpose of a DTD?

The purpose of a Document Type Definition is to define the legal building blocks of an XML document.

It defines the document structure with a list of legal elements.

A DTD can be declared inline in your XML document, or as an external reference.

DTD doesn't support inheritance.

 

2) Why use a DTD?

With a DTD, your XML files can carry a description of their own format.

With a DTD, independent groups of people can agree to use a common DTD for interchanging data.

Your application can use a standard DTD to verify that the data you receive from the outside world is valid. You can also use a DTD to verify your own data.

 

3) What is an XML Schema?

The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.

An XML Schema:

defines elements that can appear in a document

defines which elements are child elements

defines the number of child elements

defines the order of child elements

defines whether an element is empty or can include text

defines attributes that can appear in a document

defines data types for elements and attributes

defines default and fixed values for elements and attributes

 

4) What are the advantages of schemas over DTDs?

XML Schemas are extensible to future additions

XML Schemas support data types

XML Schemas support namespaces


5) How do you declare elements in a DTD?

Declaring only one occurrence of the same element 

<!ELEMENT note (message)>

The example declaration above declares that the child element message must occur exactly once inside the "note" element.

Declaring minimum one occurrence of the same element

<!ELEMENT note (message+)>

The + sign in the example above declares that the child element message must occur one or more times inside the "note" element.

Declaring zero or more occurrences of the same element 

<!ELEMENT note (message*)>

The * sign in the example above declares that the child element message can occur zero or more times inside the "note" element.

Declaring zero or one occurrences of the same element 

<!ELEMENT note (message?)>

The ? sign in the example above declares that the child element message can occur zero or one times inside the "note" element.

Declaring either/or content

<!ELEMENT note (to,from,header,(message|body))>

The example above declares that the "note" element must contain a "to" element, a "from" element, a "header" element, and either a "message" or a "body" element.

Declaring mixed content

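<!ELEMENT note (#PCDATA|to|from|header|message)*>

The example above declares that the "note" element can contain any number of parsed character data, "to", "from", "header", and "message" elements, in any order.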
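DTD content models behave like regular expressions over the sequence of child element names, which is a handy way to reason about the indicators above. A toy C++ sketch, assuming child names are joined as "name," tokens; `joinChildren`, `matchesModel`, and the `k...` constants are hypothetical, and a real validating parser does not work by string matching.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Join child element names as "to,from,header,message," so that each
// name is followed by exactly one comma.
std::string joinChildren(const std::vector<std::string>& children) {
    std::string s;
    for (const std::string& c : children) s += c + ",";
    return s;
}

// True if the children match a content model written as a regex
// over "name," tokens.
bool matchesModel(const std::vector<std::string>& children, const std::string& model) {
    return std::regex_match(joinChildren(children), std::regex(model));
}

// Translations of the DTD declarations shown above.
const std::string kOnce       = "(message,)";                       // (message)
const std::string kOneOrMore  = "(message,)+";                      // (message+)
const std::string kZeroOrMore = "(message,)*";                      // (message*)
const std::string kOptional   = "(message,)?";                      // (message?)
const std::string kChoice     = "to,from,header,(message,|body,)";  // (to,from,header,(message|body))
```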

 

6) What is the root element of every XML Schema?

<schema> (usually written with a namespace prefix, e.g. <xs:schema>)

 

7) How do you reference a schema from an XML document?

<?xml version="1.0"?>

<note xmlns="http://www.microsoft.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

      xsi:schemaLocation="http://www.microsoft.com note.xsd">

<to>sree</to>

<from>baskar</from>

<heading>message</heading>

<body>meet me during lunch </body>

</note>

 

8) How do you define simple elements in XSD?

<xs:element name="firstname" type="xs:string"/>

<xs:element name="age" type="xs:integer"/>

 <xs:element name="dateofbirth" type="xs:date"/>

 

9) What are the common built-in types in XSD?

xs:string

xs:decimal

xs:integer

xs:boolean

xs:date

xs:time

 

10) How do you assign a default value to an element in XSD?

<xs:element name="color" type="xs:string" default="red"/>

 

11) How do you assign a fixed value that cannot be changed?

<xs:element name="color" type="xs:string" fixed="red"/>
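A sketch of how a schema processor treats these two declarations for an element that appears in the document: with empty content, the default or fixed value is supplied automatically; with fixed, any other non-empty content is a validation error. `ElementDecl` and `effectiveValue` are hypothetical, and values are compared as plain strings (a real processor compares them in the value space of the type).

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical model of <xs:element ... default="..."/> or fixed="...";
// at most one of the two is set.
struct ElementDecl {
    std::optional<std::string> defaultValue;
    std::optional<std::string> fixedValue;
};

// Value the application sees for an element with the given text content.
std::string effectiveValue(const ElementDecl& decl, const std::string& content) {
    if (decl.fixedValue) {
        if (content.empty()) return *decl.fixedValue;     // supplied automatically
        if (content != *decl.fixedValue)
            throw std::invalid_argument("value differs from fixed value");
        return content;
    }
    if (content.empty() && decl.defaultValue) return *decl.defaultValue;
    return content;                                       // a default can be overridden
}
```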

 

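12) How do you specify whether an attribute is mandatory or optional? What keyword is used?

Attributes are optional by default. To state this explicitly:

<xs:attribute name="lang" type="xs:string" use="optional"/>

To make an attribute required:

<xs:attribute name="lang" type="xs:string" use="required"/>

The use attribute (lowercase, since XML is case-sensitive) is the keyword that specifies whether an attribute is required or optional.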
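A small C++ sketch of what use="required" means to a validator: the attribute must be present on every instance of the element, while an optional attribute may be missing. `Use`, `AttributeDecl`, and `missingRequired` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class Use { Optional, Required };   // mirrors use="optional" / use="required"

struct AttributeDecl { std::string name; Use use; };

// Returns the names of required attributes absent from an element's attributes.
std::vector<std::string> missingRequired(const std::vector<AttributeDecl>& decls,
                                         const std::map<std::string, std::string>& attrs) {
    std::vector<std::string> missing;
    for (const AttributeDecl& d : decls)
        if (d.use == Use::Required && attrs.count(d.name) == 0) missing.push_back(d.name);
    return missing;
}
```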

 

13) How do you restrict the acceptable values of XML elements or attributes? What is the exact terminology for this?

These restrictions are called facets. The following code defines an element called "age" with a restriction: the value of age cannot be lower than 18 or greater than 60:

<xs:element name="age">

<xs:simpleType>

  <xs:restriction base="xs:integer">

    <xs:minInclusive value="18"/>

    <xs:maxInclusive value="60"/>

  </xs:restriction>

</xs:simpleType>

</xs:element>

 

14) What are the advantages of schemas over DTDs?

I. XML Schemas support data types

One of the greatest strengths of XML Schemas is the support for data types.

With the support for data types:

It is easier to describe permissible document content

It is easier to validate the correctness of data

It is easier to work with data from a database

It is easier to define data facets (restrictions on data)

It is easier to define data patterns (data formats)

It is easier to convert data between different data types

II. XML Schemas use XML Syntax

Another great strength of XML Schemas is that they are written in XML.

Because XML Schemas are written in XML:

You don't have to learn another language

You can use your XML editor to edit your Schema files

You can use your XML parser to parse your Schema files

You can manipulate your Schema with the XML DOM

You can transform your Schema with XSLT

III. XML Schemas are extensible

XML Schemas are extensible, just like XML, because they are written in XML.

With an extensible Schema definition you can:

Reuse your Schema in other Schemas

Create your own data types derived from standard types

Reference multiple schemas from the same document

Restrictions on XML elements are called facets.

 

15) How do you restrict an element to a set of values?

To limit the content of an XML element to a set of acceptable values, we would use the enumeration constraint.

This example defines an element called "car":

<xs:element name="car">

<xs:simpleType>

  <xs:restriction base="xs:string">

    <xs:enumeration value="Audi"/>

    <xs:enumeration value="Golf"/>

    <xs:enumeration value="BMW"/>

  </xs:restriction>

</xs:simpleType>

</xs:element>

The "car" element is a simple type with a restriction. The acceptable values are: Audi, Golf, BMW.
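A validator applying this facet just checks membership in the listed values. A minimal C++ sketch (`satisfiesEnumeration` is a hypothetical helper); the comparison is exact and case-sensitive, so "bmw" is rejected.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Mirrors <xs:restriction base="xs:string"> with <xs:enumeration> facets:
// the value must equal one of the listed values exactly.
bool satisfiesEnumeration(const std::string& value,
                          const std::vector<std::string>& allowed) {
    return std::find(allowed.begin(), allowed.end(), value) != allowed.end();
}
```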

 

16) How do you restrict a series of values?

To limit the content of an XML element to a series of numbers or letters that can be used, we use the pattern constraint.

This example defines an element called "letter":

<xs:element name="letter">

<xs:simpleType>

  <xs:restriction base="xs:string">

    <xs:pattern value="[a-z]"/>

  </xs:restriction>

</xs:simpleType>

</xs:element>

The "letter" element is a simple type with a restriction. The only acceptable value is ONE of the LOWERCASE letters from a to z.
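A pattern facet can be sketched with `std::regex_match`, assuming the pattern uses only syntax that XSD regular expressions and ECMAScript regular expressions share (true for [a-z] here). XSD patterns are implicitly anchored to the whole value, which is exactly what `regex_match` does. The helper name is hypothetical, and `[0-9]{5}` is an assumed illustration of the five-digit zip code mentioned earlier.

```cpp
#include <cassert>
#include <regex>
#include <string>

// XSD patterns must match the whole value, which corresponds to
// std::regex_match rather than std::regex_search. XSD-only syntax such as
// character-class subtraction is not supported by std::regex.
bool satisfiesPattern(const std::string& value, const std::string& pattern) {
    return std::regex_match(value, std::regex(pattern));
}
```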

 

17) How do you control the handling of whitespace characters?

To specify how white space characters should be handled, we would use the whiteSpace constraint.
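The whiteSpace facet takes one of three values: preserve (leave the value unchanged), replace (turn every tab, line feed, and carriage return into a space), and collapse (do what replace does, then squeeze runs of spaces into one and trim leading and trailing spaces). A C++ sketch of those three normalizations; `WhiteSpace` and `normalize` are hypothetical names.

```cpp
#include <cassert>
#include <string>

enum class WhiteSpace { Preserve, Replace, Collapse };

std::string normalize(const std::string& value, WhiteSpace mode) {
    if (mode == WhiteSpace::Preserve) return value;
    std::string replaced = value;
    for (char& c : replaced)
        if (c == '\t' || c == '\n' || c == '\r') c = ' ';
    if (mode == WhiteSpace::Replace) return replaced;
    // Collapse: squeeze runs of spaces into one and trim both ends.
    std::string out;
    bool pendingSpace = false;
    for (char c : replaced) {
        if (c == ' ') { pendingSpace = !out.empty(); continue; }
        if (pendingSpace) out += ' ';
        pendingSpace = false;
        out += c;
    }
    return out;
}
```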

 

18) What restrictions are available on length?

To limit the length of a value in an element, we would use the length, maxLength, and minLength constraints.
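A sketch of the three length facets in C++, with hypothetical type and function names. For xs:string the facets count characters; this sketch counts bytes, so it is only exact for ASCII values.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical bundle of the length, minLength and maxLength facets.
struct LengthFacets {
    std::optional<std::size_t> length, minLength, maxLength;
};

bool satisfiesLength(const std::string& value, const LengthFacets& f) {
    const std::size_t n = value.size();   // bytes, not characters
    if (f.length && n != *f.length) return false;
    if (f.minLength && n < *f.minLength) return false;
    if (f.maxLength && n > *f.maxLength) return false;
    return true;
}
```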

 

19) Differentiate between maxExclusive and maxInclusive, and between minExclusive and minInclusive.

maxExclusive

Specifies the upper bound for numeric values (the value must be less than this value)

maxInclusive

Specifies the upper bound for numeric values (the value must be less than or equal to this value)

minExclusive

Specifies the lower bound for numeric values (the value must be greater than this value)

minInclusive

Specifies the lower bound for numeric values (the value must be greater than or equal to this value)
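The four bounds differ only in whether the boundary value itself is allowed. A C++ sketch with hypothetical names, using the 18 to 60 "age" example from earlier: with the inclusive facets 18 and 60 are valid, with the exclusive facets they are not.

```cpp
#include <cassert>
#include <optional>

// Hypothetical bundle of the four numeric range facets.
struct RangeFacets {
    std::optional<long> minInclusive, minExclusive, maxInclusive, maxExclusive;
};

bool satisfiesRange(long v, const RangeFacets& f) {
    if (f.minInclusive && v <  *f.minInclusive) return false;   // need v >= min
    if (f.minExclusive && v <= *f.minExclusive) return false;   // need v >  min
    if (f.maxInclusive && v >  *f.maxInclusive) return false;   // need v <= max
    if (f.maxExclusive && v >= *f.maxExclusive) return false;   // need v <  max
    return true;
}
```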

 

20) Can an empty complex element contain attributes? Yes.

 

21) What is the restriction on an empty complex element?

It cannot have any content between its opening and closing tags.

 

23) What is the <any> element used for in XSD?

The <any> element enables us to extend the XML document with elements not specified by the schema!

 

 

XSLT

XSL (eXtensible Stylesheet Language) is a language for expressing style sheets. It consists of three parts: XSLT, XPath, and XSL Formatting Objects.

1) What is XSLT?

XSLT is a language for transforming the structure of XML documents.

The XSL Transformations (XSLT) vocabulary provides a rule-based framework for selecting and processing document content, and transforming it into new documents.

 

2) How does XSLT work?

During the transformation process, XSLT uses XPath to define parts of the source document that match one or more predefined templates. When a match is found, XSLT transforms the matching part of the source document into the result document. Parts of the source document that do not match any template are handled by XSLT's built-in template rules, which process their children and copy their text content (but not their element tags) into the result document.
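A toy C++ sketch of this matching process. `Node`, `Templates`, and `applyTemplates` are hypothetical, and real XSLT match patterns are XPath patterns rather than bare element names. When no template matches an element, the built-in rules apply: the element's children are processed in turn and text nodes are copied to the result, but the unmatched element's own tags are not.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy node: an element (non-empty name) or a text node (empty name).
struct Node {
    std::string name;
    std::string text;
    std::vector<Node> children;
};

// Toy stand-in for template rules, keyed by element name.
using Templates = std::map<std::string, std::function<std::string(const Node&)>>;

// Mimics xsl:apply-templates with the built-in rules as the fallback.
std::string applyTemplates(const Node& n, const Templates& t) {
    if (n.name.empty()) return n.text;                   // built-in rule: copy text
    auto it = t.find(n.name);
    if (it != t.end()) return it->second(n);             // a template matched
    std::string out;                                     // built-in rule for elements:
    for (const Node& c : n.children) out += applyTemplates(c, t);   // process children
    return out;
}
```

In the test below only `title` has a template, so the artist's text reaches the output without its `<artist>` tags.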

3) What is the root element of an XSL style sheet?

The root element that declares the document to be an XSL style sheet is <xsl:stylesheet> or <xsl:transform>.

 

4) What is the "match" attribute used for?

The match attribute is used to associate the template with an XML element. The match attribute can also be used to define a template for a whole branch of the XML document (i.e. match="/" defines the whole document).

 

5) Why do I need to use a different XSLT namespace with Internet Explorer?

Actually, you don't!  You should use the standard xmlns:xsl="http://www.w3.org/1999/XSL/Transform".

 

6) What's XPath got to do with XSLT?

XSLT uses XPath path expressions to filter through a node tree.

An XSLT style sheet contains "template rules" that define which parts of a document's content should be selected and how they should be processed to create the desired result.

There are two parts to a template rule: a pattern and a template. A template rule uses XPath syntax to express a "pattern", which is then "matched" against elements in the source tree to select the nodes to be processed by the template.

7) How to connect an XML source document to an XSLT style sheet?

An XML style sheet declaration is used to connect an XML document to its style sheet.

The style sheet declaration is placed after the XML version declaration and before the root element. Here's what one looks like:

<?xml-stylesheet type="text/xsl" href="nameoffile.xsl"?>

 

8) What is XPath (XML Path Language)?

The XPath Recommendation defines a path language and expression syntax used by XSLT, XPointer, and XLink.

XPath syntax operates on the abstract, logical structure of an XML document, rather than its physical "surface syntax."

XML Path Language (XPath) can be used as a general purpose query notation for addressing and filtering the elements and text of XML documents. XPath is supported in the Microsoft® XML Parser (MSXML) within XSL Transformations (XSLT), and through the Document Object Model (DOM) extensions selectNodes and selectSingleNode

 

9) What is an XPath node tree?

A "node tree" is what is constructed by an XPath processor after parsing.

XPath operates on an XML document as a tree of "nodes." A node tree built by an XPath processor can be used to provide a document hierarchy represented as an inverted "tree", with the "root node" at the top and the "branches" and "trunk" below.

 

10) What is an XPath expression?

XPath's primary syntactic construct is the "expression." Two examples of XPath expressions are "location paths" and "function calls."

 

11) What is the relationship between XSLT and XPath?

XSLT uses XPath expressions to select nodes for processing

 

MSXML Example

 

The msxml.tlh and msxml.tli files (generated by #import <msxml.dll>) are to be copied into the project.

 

------------------------

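.h file

#import <msxml.dll>     // generates msxml.tlh / msxml.tli (smart pointers in namespace MSXML)
#include <atlbase.h>

MSXML::IXMLDOMDocumentPtr dptr;
MSXML::IXMLDOMNodeListPtr nlptr;
MSXML::IXMLDOMNodePtr nptr;

.cpp file

// No #include <msxml.h> here: the #import above already provides the declarations.

::CoInitialize(NULL);

HRESULT hr = dptr.CreateInstance(__uuidof(MSXML::DOMDocument));   // in-process DOM document
if (SUCCEEDED(hr))
    dptr->load(_variant_t(xmlfilename));   // xmlfilename: path of the XML file to load

dptr = NULL;            // release COM objects before uninitializing COM
::CoUninitialize();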

 

 

//READING XML USING DOM TECHNIQUE

DOM is a model for parsing an XML document; the following code uses MSXML to parse it in DOM fashion.

 

#include <atlbase.h>

 

//D:\Program Files\Microsoft Visual Studio\VC98\Include\MSXML.h  or  at C:\WINNT\system32\msxml.dll

#import <Msxml.dll> rename_namespace("MSXML")

 

void CReadxmlDlg::OnReadbutt()
{
    HRESULT hr = CoInitialize(NULL);
    if (FAILED(hr))
        return;

    {   // inner scope: all smart pointers are released before CoUninitialize()
        MSXML::IXMLDOMDocumentPtr spXMLDoc = NULL;
        hr = spXMLDoc.CreateInstance(__uuidof(MSXML::DOMDocument));
        //hr = CoCreateInstance(CLSID_DOMDocument, NULL, CLSCTX_INPROC_SERVER, IID_IXMLDOMDocument, (void**)&spXMLDoc);

        CComBSTR bstrFileName("C:\\readxml\\samplexml.xml");
        if (SUCCEEDED(hr) && spXMLDoc->load(_variant_t(bstrFileName)) == VARIANT_TRUE)
        {
            MSXML::IXMLDOMElementPtr spXMLDocumentElement = spXMLDoc->GetdocumentElement();

            //CComBSTR l_combstrRootNodeName;
            //spXMLDocumentElement->get_nodeName(&l_combstrRootNodeName);

            //GET ALL CHILD NODES of the root element
            MSXML::IXMLDOMNodeListPtr l_pXMLDOMNodeList = NULL;
            hr = spXMLDocumentElement->get_childNodes(&l_pXMLDOMNodeList);
            long len = l_pXMLDOMNodeList->Getlength();

            for (long i = 0; i < len; i++)
            {
                //GET EACH NODE DETAILS
                MSXML::IXMLDOMNodePtr listnode;
                l_pXMLDOMNodeList->get_item(i, &listnode);

                CComBSTR txt, node;
                listnode->get_text(&txt);
                listnode->get_nodeName(&node);
                AfxMessageBox(CString(node));
                AfxMessageBox(CString(txt));

                //GET ATTRIBUTE DETAILS (text and other non-element nodes have no attribute map)
                MSXML::IXMLDOMNamedNodeMapPtr attr;
                listnode->get_attributes(&attr);
                long attrlen = (attr != NULL) ? attr->Getlength() : 0;
                if (attrlen == 0)
                    AfxMessageBox(_T("Contains NO Attributes"));

                for (long x = 0; x < attrlen; x++)
                {
                    MSXML::IXMLDOMNodePtr attrnode;
                    attr->get_item(x, &attrnode);

                    CComBSTR attrval;
                    attrnode->get_text(&attrval);
                    AfxMessageBox(CString(attrval));
                }
            }
        }
    }

    CoUninitialize();
}

 

12) What is SOAP?

Simple Object Access Protocol

CORBA uses IIOP (Internet Inter-ORB Protocol) and DCOM uses ORPC (Object RPC). To establish communication between systems built on these two protocols, SOAP, an XML-based messaging protocol, is a good solution.