What is DOM?
DOM is the Document Object Model, an API designed to let programmers access
the information in an XML document without having to write a parser in their
programming language of choice. By programming against the DOM or SAX API,
your program is free to use whatever parser it wishes. These APIs are
available in many languages (Java, C++, etc.). The textual information in your
XML document is turned into a tree of nodes, and you access your information
by interacting with this tree. DOM preserves the order of the elements it
reads from the XML document, because it builds an object model of the whole
document for you.
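A minimal sketch of the tree-of-nodes idea, using Python's standard xml.dom.minidom module. The sample document and the list_cars helper are invented for this illustration; other DOM implementations expose similar, but not identical, calls.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for this illustration.
XML_TEXT = "<cars><car color='green'>volvo</car><car color='red'>saab</car></cars>"


def list_cars(xml_text):
    """Parse the text into a DOM tree and read (color, brand) pairs from it."""
    document = minidom.parseString(xml_text)
    root = document.documentElement
    cars = []
    for node in root.childNodes:
        # A DOM tree can also hold text and comment nodes; keep elements only.
        if node.nodeType == node.ELEMENT_NODE:
            cars.append((node.getAttribute("color"), node.firstChild.data))
    return cars


if __name__ == "__main__":
    print(list_cars(XML_TEXT))
```

The pairs come back in document order, which is the order-preserving behaviour described above.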
What is SAX?
SAX is the Simple API for XML. Like DOM, the SAX API is designed to let
programmers access their information without having to write a parser in their
programming language of choice, and it is available in many languages (Java,
C++, etc.). The textual information in your XML document is not turned into a
tree of nodes; it is delivered as a sequence of events. SAX is generally
faster than DOM because no tree is built. SAX does not provide an object model
for you, so you have to create your own custom object model and write an event
handler class that listens to SAX events and does the element-to-object
mapping. The parser can fire events for every open and close tag and for
#PCDATA and CDATA sections, and it also fires events for DTDs, comments and
processing instructions.
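The event sequence can be made visible with a small handler. This is a sketch using Python's standard xml.sax module; the EventLogger class, the log_events helper and the sample document are assumptions made for the illustration.

```python
import xml.sax

# Hypothetical sample document: a processing instruction followed by two elements.
XML_TEXT = "<?demo hint?><car><brand>volvo</brand></car>"


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX callback instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Ignore whitespace-only chunks to keep the log short.
        if content.strip():
            self.events.append(("text", content.strip()))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target))


def log_events(xml_text):
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    for event in log_events(XML_TEXT):
        print(event)
```

Nothing is kept in memory except what the handler chooses to record, which is why this style suits large or stream-like inputs.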
When to use what?
If your XML document contains document data, then DOM is a natural fit.
If your XML document contains structured, machine-readable data (SQL and XQL
queries, result sets), then SAX is the better fit.
Explain Lex, Yacc, Flex and Bison.
In general, the compiler or interpreter for a programming language is often
decomposed into two parts:
1) Read the source program and discover its structure.
2) Process this structure.
Lex and Yacc can generate program fragments that solve the first task.
The task of discovering the source structure is again decomposed into subtasks:
1) Split the source file into tokens (Lex).
2) Find the hierarchical structure of the program (Yacc).
Lex is a program generator designed for lexical processing of character input
streams.
Yacc is a general tool for describing the input to a computer program. The
user specifies the structure of the input, together with the code to run as
each structure is recognized, and Yacc generates a routine that controls the
input process.
Flex is a tool for generating scanners: it reads a description of the scanner
from the given input file and generates C source code, which is then compiled
into the executable scanner.
Bison is a general-purpose parser generator that converts a grammar
description into a C program to parse that grammar.
Bison is upward compatible with Yacc: properly written Yacc grammars should
work with Bison without changes.
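The two subtasks can be sketched by hand. The code below does not use Lex or Yacc; it is a hand-written Python stand-in for the scanner and parser those tools would generate, over a toy grammar (integers, '+' and '*') chosen only for this illustration.

```python
import re

# Toy token pattern (assumption): an integer, or any other single character.
TOKEN_PATTERN = re.compile(r"\s*(?:(\d+)|(.))")


def tokenize(source):
    """Scanner step (the part Lex or Flex would generate): text -> tokens."""
    tokens = []
    for number, other in TOKEN_PATTERN.findall(source):
        if number:
            tokens.append(("NUMBER", int(number)))
        elif other.strip():
            tokens.append(("OP", other))
    return tokens


def parse(tokens):
    """Parser step (the part Yacc or Bison would generate): tokens -> tree.

    Grammar:  expr : term ('+' term)*      term : NUMBER ('*' NUMBER)*
    """
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else ("EOF", None)

    def number():
        nonlocal position
        kind, value = peek()
        if kind != "NUMBER":
            raise SyntaxError(f"expected a number, found {value!r}")
        position += 1
        return value

    def term():
        nonlocal position
        node = number()
        while peek() == ("OP", "*"):
            position += 1
            node = ("*", node, number())
        return node

    def expr():
        nonlocal position
        node = term()
        while peek() == ("OP", "+"):
            position += 1
            node = ("+", node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree


if __name__ == "__main__":
    print(parse(tokenize("1 + 2 * 3")))
```

The nested tuples are the "hierarchical structure" of the input: multiplication binds tighter than addition because term sits below expr in the grammar.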
Difference between XML and HTML?
HTML -> responsible for rendering data
XML -> responsible for holding data
________________________________________________________________________________________
What is a DTD?
It defines the document structure. With a DTD, different groups of people can
agree on a common format for interchanging data.
It is also used to verify your own data or the data you receive from the
outside world.
You can declare it inline in your XML document or as an external reference.
ex: <!ELEMENT car (brand, type)>
    <!ELEMENT brand (#PCDATA)>
    <!ELEMENT type (#PCDATA)>
________________________________________________________________________________________
What is an XML Schema?
It is similar to a DTD.
It supports data types and namespaces. It is extensible.
It is written in XML, so it is simple to use and understand.
ex: <xs:element name="age" type="xs:integer"/>
Using a DTD we would only be able to specify that a <zip> element contains
text, but using XML Schema we could create a data type for zip codes and limit
the zip element to a five-digit code.
________________________________________________________________________________________
DOM vs SAX?
DOM creates an object model: it transforms the textual information in the XML
document into a tree of nodes.
SAX doesn't create an object model: it gives access to the information in the
XML document as a sequence of events. Here the developer needs to write
handler classes for interpreting these events.
When to use what?
Use DOM when the XML contains document data, and use SAX when the XML contains
structured data.
________________________________________________________________________________________
XML FUNDAMENTALS
XML is a good replacement for EDI. EDI is expensive; it uses a dedicated
communication infrastructure.
By using XML, all parties share the same interpretation of the data.
It uses the Internet for the data exchange, and it's very flexible. XML makes
communication easy. It's a great tool for transactions between businesses.
XML is a simplified subset of SGML.
XML is a meta-language. (A meta-language is a language that's used to define
other languages.)
XML is about defining data.
XML: What it can do
With XML you can:
_ Define data structures
_ Make these structures platform independent
_ Process XML-defined data automatically
_ Define your own tags
With XML you cannot:
_ Define how your data is shown. To show data, you need other techniques.
XSL (eXtensible Stylesheet Language) was created for this purpose, but the
presentation can also be defined with CSS (Cascading Style Sheets).
An XML document starts with the XML declaration:
<?xml version="1.0"?>
Structure of XML
<?xml version="1.0"?>
<root>
  <element>
    <sub-element>content</sub-element>
    <sub-element>content</sub-element>
  </element>
</root>
Elements in XML can have attributes. The syntax is:
<element attribute-name="attribute-value">content</element>
<car color="green">volvo</car>
Try to avoid attributes for data. Software that checks XML documents can do a
better job with elements than with attributes.
A well-formed XML document is one that complies with the syntax rules for XML.
The basic rules are (there are more rules pertaining to entities):
_ it contains exactly one root element
_ all other elements are descendants of the root element
_ all elements are correctly paired and nested
_ the element names in a start-tag and its end-tag are exactly the same
_ an attribute name is used only once within the same element
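These rules are what any conforming parser enforces, so a quick well-formedness check is simply an attempt to parse. A minimal sketch with Python's standard xml.etree.ElementTree; the is_well_formed helper name is invented here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True when the parser accepts the text, False on any syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<root><a>1</a></root>"))   # one root, tags paired
    print(is_well_formed("<root><a>1</b></root>"))   # start and end tag differ
    print(is_well_formed("<a/><b/>"))                # two root elements
    print(is_well_formed('<a x="1" x="2"/>'))        # attribute used twice
```

Only the first document is accepted; each of the others breaks one of the rules listed above. Passing this check says nothing about validity against a DTD or schema.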
To be valid, an XML document needs to comply with the following rules:
_ The document must be well formed (see the rules above).
_ The document must comply with the rules defined in a Document Type
Definition (DTD).
Companies that exchange XML documents can check them against the same DTD.
A DTD defines rules for a particular type of XML document. It describes
elements and their data. (PCDATA stands for parsed character data, which the
parser examines for markup; CDATA is character data that is not parsed.) An
element can contain sub-elements.
<!ELEMENT car (brand, type) >
<!ELEMENT brand (#PCDATA) >
<!ELEMENT type (#PCDATA) >
This means that the element car has two sub-elements: brand and type. Each
sub-element can contain characters.
To specify the number of possible occurrences, the following indicators can be
used:
_ + must occur at least once but may occur more often
_ * may occur any number of times and may also be omitted
_ ? may occur once or not at all
With '|' you define a choice between two sub-elements.
A DTD can be an external document that's referred to.
<?xml version="1.0"?>
<!DOCTYPE name-of-root-element SYSTEM "address">
A DTD can also be included in the XML document itself.
<?xml version="1.0"?>
<!DOCTYPE name-of-root-element [
  followed by the element definitions
]>
XSL can convert XML documents into HTML. XSLT is used to describe how an XML
source document is transformed into another XML document that uses the XSL
formatting vocabulary.
1) Usage of XML?
XML can separate data from HTML (HTML is responsible for rendering data and
XML is responsible for holding data).
XML can be used to store data.
XML can be used to share data.
2) What is a well-formed XML document?
An XML document that has correct XML syntax.
3) What is a valid XML document?
A "valid" XML document is a "well-formed" XML document which also conforms to
the rules of a Document Type Definition (DTD).
4) What is the purpose of a DTD?
The purpose of a DTD is to define the legal building blocks of an XML document.
It defines the document structure with a list of legal elements.
6) Can we use a CSS file to format an XML document?
Yes.
7) What does the <xml> tag do in HTML, and what is its relevance?
In Internet Explorer, the <xml> tag creates an XML data island: XML data can
be embedded directly in an HTML page or attached as a separate XML file.
8) What is the use of an XML parser?
To create, read and update an XML document, we need an XML parser.
9) Differentiate between SAX (Simple API for XML) and DOM (Document Object Model).
With DOM, the textual information in the XML document is turned into a tree of
nodes.
DOM gives access to the information stored in the XML document as a
hierarchical object model.
DOM creates a tree of nodes, and we can access the information by interacting
with this tree.
SAX gives you access to the information in the XML document not as a tree of
nodes, but as a sequence of events.
SAX doesn't create a default object model on top of the XML document (like DOM
does).
This makes SAX faster, but it also requires you to do the following:
_ create your own custom object model which "holds" all the information in the
XML document
_ create a document handler class that listens to SAX events and makes sense
of these events to create objects in the custom object model
All SAX requires is that the parser read in the XML document and fire events
depending on what tags it encounters. The developer is responsible for
interpreting these events by writing an XML document handler class, which
makes sense of all the tag events and creates objects in your own object model.
SAX is faster than DOM because it bypasses the creation of a tree-based object
model of the information.
On the other hand, you have to write a SAX document handler to interpret all
the SAX events (which can be a lot of work).
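A sketch of such a document handler, written with Python's standard xml.sax module. The Car class, the tag-to-field mapping and the sample document are all invented for the illustration; a real handler would mirror your own object model.

```python
import xml.sax
from dataclasses import dataclass

# Hypothetical sample document, invented for this illustration.
XML_TEXT = (
    "<cars>"
    "<car><brand>volvo</brand><type>v40</type></car>"
    "<car><brand>saab</brand><type>900</type></car>"
    "</cars>"
)

# Which child tag fills which field of the custom object (assumed mapping).
FIELD_FOR_TAG = {"brand": "brand", "type": "car_type"}


@dataclass
class Car:
    """The custom object model that SAX does not build for us."""
    brand: str = ""
    car_type: str = ""


class CarHandler(xml.sax.ContentHandler):
    """Listens to SAX events and turns <car> elements into Car objects."""

    def __init__(self):
        super().__init__()
        self.cars = []
        self._current = None
        self._text = []

    def startElement(self, name, attrs):
        if name == "car":
            self._current = Car()
        self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect before using it.
        self._text.append(content)

    def endElement(self, name):
        text = "".join(self._text).strip()
        if name in FIELD_FOR_TAG and self._current is not None:
            setattr(self._current, FIELD_FOR_TAG[name], text)
        elif name == "car":
            self.cars.append(self._current)
            self._current = None
        self._text = []


def load_cars(xml_text):
    handler = CarHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.cars


if __name__ == "__main__":
    print(load_cars(XML_TEXT))
```

The handler keeps only the state it needs (the car being built and the text seen so far), which is the extra work SAX asks for in exchange for not building a tree.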
11) What kinds of SAX events are fired by the SAX parser?
SAX fires an event for every open and close tag; it also fires events for
#PCDATA and CDATA sections. The document handler has to interpret these events
in some meaningful way and create its own custom object model based on them.
SAX also fires events for processing instructions, DTDs, comments, etc.
12) When to use DOM?
If the XML documents contain document data, then DOM is a completely natural
fit. An example of this is the Datachannel RIO product, which can index and
organize information that comes from all kinds of document sources (like Word
and Excel files). In this case, DOM is well suited to allow programs access to
information stored in these documents.
However, when dealing with structured data, DOM is not the best choice. SAX
might be a better fit in this scenario.
13) Which parser is the best fit?
If your information is structured in a way that makes it easy to create an
element-to-object mapping, then you should use the SAX API. On the other hand,
if your data is much better represented as a tree, then you should use DOM.
Advanced XML Questions
1) How do you resolve duplicate names when two or more XML documents are used
in an application?
With namespaces. (Be ready to elaborate on how a namespace is identified by a
URI, which may be a URL or a URN.)
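A short sketch of how namespaces keep two elements with the same local name apart, using Python's standard xml.etree.ElementTree. The two URNs and the sample document are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two vocabularies both define an element called "table".
XML_TEXT = """
<order xmlns:f="urn:example:furniture" xmlns:h="urn:example:html">
  <f:table>dining table</f:table>
  <h:table>row and cell layout</h:table>
</order>
"""


def tables_by_namespace(xml_text):
    """Map each namespace URI to the (local name, text) of its element."""
    root = ET.fromstring(xml_text)
    found = {}
    for child in root:
        # ElementTree spells a qualified name as "{namespace-uri}local-name".
        uri, local = child.tag[1:].split("}")
        found[uri] = (local, child.text)
    return found


if __name__ == "__main__":
    for uri, (local, text) in tables_by_namespace(XML_TEXT).items():
        print(uri, local, text)
```

The prefixes f and h are only shorthand inside the document; what distinguishes the two table elements is the URI each prefix is bound to.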
2) What is a CDATA section?
Anything within a CDATA section is treated as plain character data; the parser
does not interpret it as markup.
A CDATA section cannot contain another CDATA section.
Also make sure there are no spaces or line breaks inside the "]]>" string that
ends the section.
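The effect is easy to see with text that contains markup characters. A minimal sketch using Python's standard xml.etree.ElementTree; the sample strings and the read_text helper are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: the same text with and without a CDATA wrapper.
WITH_CDATA = "<script><![CDATA[if (a < b && b < c) run();]]></script>"
WITHOUT_CDATA = "<script>if (a < b && b < c) run();</script>"


def read_text(xml_text):
    """Return the character data of the root element."""
    return ET.fromstring(xml_text).text


def parses(xml_text):
    """Return True when the text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(read_text(WITH_CDATA))
    print(parses(WITHOUT_CDATA))
```

Inside the CDATA section the characters < and & are passed through as data; without it the same text is rejected as malformed markup.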
3) Why is XML encoding required?
XML documents can contain foreign characters. To let the XML parser understand
these characters, you should save your XML documents as Unicode.
4) Name two HTML tags that are used to display XML data.
<span datasrc="#xmldso" datafld="TITLE"></span>
<div datasrc="#xmldso" datafld="TITLE"></div>
DTD & XML Schema
1) What is the purpose of a DTD?
The purpose of a Document Type Definition is to define the legal building
blocks of an XML document.
It defines the document structure with a list of legal elements.
A DTD can be declared inline in your XML document, or as an external reference.
DTD doesn't support inheritance.
2) Why use a DTD?
With a DTD, your XML file can carry a description of its own format with it.
With a DTD, independent groups of people can agree to use a common DTD for
interchanging data.
Your application can use a standard DTD to verify that the data you receive
from the outside world is valid. You can also use a DTD to verify your own data.
3) What is an XML Schema?
The purpose of an XML Schema is to define the legal building blocks of an XML
document, just like a DTD.
An XML Schema:
_ defines elements that can appear in a document
_ defines which elements are child elements
_ defines the number of child elements
_ defines the order of child elements
_ defines whether an element is empty or can include text
_ defines attributes that can appear in a document
_ defines data types for elements and attributes
_ defines default and fixed values for elements and attributes
4) Advantages of schemas over DTD?
_ XML Schemas are extensible to future additions
_ XML Schemas support data types
_ XML Schemas support namespaces
5) DTD elements?
Declaring only one occurrence of the same element
<!ELEMENT note (message)>
The example declaration above declares that the child element message can
occur only one time inside the "note" element.
Declaring minimum one occurrence of the same element
<!ELEMENT note (message+)>
The + sign in the example above declares that the child element message must
occur one or more times inside the "note" element.
Declaring zero or more occurrences of the same element
<!ELEMENT note (message*)>
The * sign in the example above declares that the child element message can
occur zero or more times inside the "note" element.
Declaring zero or one occurrences of the same element
<!ELEMENT note (message?)>
The ? sign in the example above declares that the child element message can
occur zero or one times inside the "note" element.
Declaring either/or content
<!ELEMENT note (to,from,header,(message|body))>
The example above declares that the "note" element must contain a "to"
element, a "from" element, a "header" element, and either a "message" or a
"body" element.
Declaring mixed content
<!ELEMENT note (#PCDATA|to|from|header|message)*>
6) What is the root element of every schema?
<schema>
7) How do you refer to a schema in an XML document?
<?xml version="1.0"?>
<note xmlns="http://www.microsoft.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.microsoft.com note.xsd">
  <to>sree</to>
  <from>baskar</from>
  <heading>message</heading>
  <body>meet me during lunch </body>
</note>
8) How do you define simple elements in XSD?
<xs:element name="firstname" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="dateofbirth" type="xs:date"/>
9) What are the available common types in XSD?
xs:string
xs:decimal
xs:integer
xs:boolean
xs:date
xs:time
10) How do you assign default values to these common types in XSD?
<xs:element name="color" type="xs:string" default="red"/>
11) How do you assign constant values that cannot be changed?
<xs:element name="color" type="xs:string" fixed="red"/>
12) How do you specify whether an attribute is mandatory or optional? What is
the keyword used?
<xs:attribute name="lang" type="xs:string" use="optional"/>
To make an attribute required:
<xs:attribute name="lang" type="xs:string" use="required"/>
"use" is the keyword that specifies whether an attribute is required.
13) How do you set restricted/acceptable values for XML elements or
attributes? What is the exact terminology for this?
This code defines an element called "age" with a restriction. The value of age
cannot be lower than 18 or greater than 60:
<xs:element name="age">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="18"/>
      <xs:maxInclusive value="60"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
14) Advantages of schemas over DTD?
I. XML Schema has support for data types
One of the greatest strengths of XML Schemas is the support for data types.
With the support for data types:
_ It is easier to describe permissible document content
_ It is easier to validate the correctness of data
_ It is easier to work with data from a database
_ It is easier to define data facets (restrictions on data)
_ It is easier to define data patterns (data formats)
_ It is easier to convert data between different data types
II. XML Schemas use XML syntax
Another great strength of XML Schemas is that they are written in XML.
Because XML Schemas are written in XML:
_ You don't have to learn another language
_ You can use your XML editor to edit your Schema files
_ You can use your XML parser to parse your Schema files
_ You can manipulate your Schema with the XML DOM
_ You can transform your Schema with XSLT
III. XML Schemas are extensible
XML Schemas are extensible, just like XML, because they are written in XML.
With an extensible Schema definition you can:
_ Reuse your Schema in other Schemas
_ Create your own data types derived from standard types
_ Reference multiple schemas from the same document
Restrictions on XML elements are called facets.
15) How do you restrict an element to a set of values?
To limit the content of an XML element to a set of acceptable values, we would
use the enumeration constraint.
This example defines an element called "car":
<xs:element name="car">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="Audi"/>
      <xs:enumeration value="Golf"/>
      <xs:enumeration value="BMW"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
The "car" element is a simple type with a restriction. The acceptable values
are: Audi, Golf, BMW.
16) How do you set restrictions on a series of values?
To limit the content of an XML element to a series of numbers or letters that
can be used, we would use the pattern constraint.
This example defines an element called "letter":
<xs:element name="letter">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-z]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
The "letter" element is a simple type with a restriction. The only acceptable
value is ONE of the LOWERCASE letters from a to z.
17) How do you enforce restrictions on whitespace characters?
To specify how whitespace characters should be handled, we would use the
whiteSpace constraint.
18) What restrictions are available on length?
To limit the length of a value in an element, we would use the length,
maxLength, and minLength constraints.
19) Differentiate between maxExclusive and maxInclusive, and between
minExclusive and minInclusive.
maxExclusive | Specifies the upper bound for numeric values (the value must be less than this value)
maxInclusive | Specifies the upper bound for numeric values (the value must be less than or equal to this value)
minExclusive | Specifies the lower bound for numeric values (the value must be greater than this value)
minInclusive | Specifies the lower bound for numeric values (the value must be greater than or equal to this value)
20) Can an empty complex element contain attributes?
Yes.
21) What is the restriction?
An empty complex element must not have any content between its opening and
closing tags.
23) What is the use of the <any> element in XSD?
The <any> element enables us to extend the XML document with elements not
specified by the schema.
XSLT
XSL (eXtensible Stylesheet Language) is a language for expressing style
sheets. It consists of three parts: XSLT, XPath, and XSL Formatting Objects.
1) What is XSLT?
XSLT is a language for transforming the structure of XML documents.
The XSL Transformations (XSLT) vocabulary provides a rule-based framework for
selecting and processing document content, and transforming it into new
documents.
2) How does XSLT work?
During the transformation process, XSLT uses XPath to define parts of the
source document that match one or more predefined templates. When a match is
found, XSLT transforms the matching part of the source document into the
result document. For parts of the source document that do not match any
template in the style sheet, XSLT's built-in rules apply, which copy their
text content to the result document.
3) What is the root element in an XSL style sheet?
The root element that declares the document to be an XSL style sheet is
<xsl:stylesheet> or <xsl:transform>.
4) What is the use of the "match" attribute?
The match attribute is used to associate the template with an XML element. The
match attribute can also be used to define a template for a whole branch of
the XML document (e.g. match="/" matches the whole document).
5) Why do I need to use a different XSLT namespace with Internet Explorer?
Actually, you don't! You should use the standard
xmlns:xsl="http://www.w3.org/1999/XSL/Transform".
6) What's XPath got to do with XSLT?
XSLT uses XPath path expressions to filter through a node tree.
An XSLT style sheet contains "template rules" that define which parts of a
document's content should be selected and how they should be processed to
create the desired result.
There are two parts to a template rule: a pattern and a template. A template
rule uses XPath syntax to express a "pattern", which is then "matched" against
elements in the source tree to select the nodes to be processed by the template.
7) How do you connect an XML source document to an XSLT style sheet?
An XML style sheet declaration is used to connect an XML document to its style
sheet.
The style sheet declaration is placed after the XML version declaration and
before the root element. Here's what one looks like:
<?xml-stylesheet type="text/xsl" href="nameoffile.xsl"?>
8) What is XPath (XML Path Language)?
The XPath Recommendation defines a path language and expression syntax used by
XSLT, XPointer, and XLink.
XPath syntax operates on the abstract, logical structure of an XML document,
rather than its physical "surface syntax."
XML Path Language (XPath) can be used as a general-purpose query notation for
addressing and filtering the elements and text of XML documents. XPath is
supported in the Microsoft® XML Parser (MSXML) within XSL Transformations
(XSLT), and through the Document Object Model (DOM) extensions selectNodes and
selectSingleNode.
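To give a feel for path expressions, here is a sketch using Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath (it is not MSXML's selectNodes). The catalog document and the titles_from helper are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog, invented for this illustration.
XML_TEXT = (
    "<catalog>"
    "<cd country='UK'><title>First Album</title></cd>"
    "<cd country='USA'><title>Second Album</title></cd>"
    "<cd country='UK'><title>Third Album</title></cd>"
    "</catalog>"
)


def titles_from(xml_text, country):
    """Select the <title> of every <cd> whose country attribute matches."""
    root = ET.fromstring(xml_text)
    # Location path: child cd elements filtered by an attribute predicate,
    # then their title children.
    path = f"./cd[@country='{country}']/title"
    return [title.text for title in root.findall(path)]


if __name__ == "__main__":
    print(titles_from(XML_TEXT, "UK"))
```

The expression addresses nodes by their position in the logical tree and by attribute values, without any reference to how the document was written out as text.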
9) What is an XPath node tree?
A "node tree" is what is constructed by an XPath processor after parsing.
XPath operates on an XML document as a tree of "nodes." A node tree built by
an XPath processor can be used to provide a document hierarchy represented as
an inverted "tree" with the "root node" at the top and the "branches" and
"trunk" below.
10) What is an XPath expression?
XPath's primary syntactic construct is the "expression." Two examples of XPath
expressions are "location paths" and "function calls."
11) What is the relationship between XSLT and XPath?
XSLT uses XPath expressions to select nodes for processing.
MSXML Example
The msxml.tlh and msxml.tli files are to be copied.
------------------------
.h file
#import <msxml.dll>
#include <atlbase.h>
MSXML::IXMLDOMDocumentPtr dptr;
MSXML::IXMLDOMNodeListPtr nlptr;
MSXML::IXMLDOMNodePtr nptr;
.cpp file
#include <msxml.h>
::CoInitialize(0);
MSXML::IXMLDOMDocumentPtr dptr = 0;
HRESULT hr = CoCreateInstance(CLSID_DOMDocument, 0, CLSCTX_INPROC_SERVER, IID_IXMLDOMDocument, (void**) &dptr);
dptr->load(xmlfilename);
dptr = 0;  // release the document before COM is shut down
::CoUninitialize();
//READING XML USING DOM TECHNIQUE
// DOM is a model for parsing an XML document; MSXML is used here to parse it
// in DOM fashion.
#include <atlbase.h>
//D:\Program Files\Microsoft Visual Studio\VC98\Include\MSXML.h or at C:\WINNT\system32\msxml.dll
#import <Msxml.dll> rename_namespace("MSXML")
void CReadxmlDlg::OnReadbutt()
{
    HRESULT hr = CoInitialize(NULL);
    if ( FAILED(hr) ) return;
    {   // Inner scope: all COM smart pointers are released before CoUninitialize().
        MSXML::IXMLDOMDocumentPtr spXMLDoc = NULL;
        hr = spXMLDoc.CreateInstance(__uuidof(MSXML::DOMDocument));
        //hr= CoCreateInstance(CLSID_XMLDocument,0,CLSCTX_INPROC_SERVER,IID_IXMLDocument, (void**) &spXMLDoc);
        CComBSTR bstrFileName("C:\\readxml\\samplexml.xml");
        // load() returns VARIANT_FALSE when the file is missing or not well formed.
        if ( SUCCEEDED(hr) && spXMLDoc->load(_variant_t(bstrFileName)) != VARIANT_FALSE )
        {
            MSXML::IXMLDOMElementPtr spXMLDocumentElement = spXMLDoc->GetdocumentElement();
            //CComBSTR l_combstrRootNodeName;
            //spXMLDocumentElement->get_nodeName(&l_combstrRootNodeName);
            //GET ALL CHILD NODES
            MSXML::IXMLDOMNodeListPtr l_pXMLDOMNodeList = 0;
            hr = spXMLDocumentElement->get_childNodes(&l_pXMLDOMNodeList);
            long len = l_pXMLDOMNodeList->Getlength();
            for (long i = 0; i < len; i++)
            {
                //GET EACH NODE DETAILS
                MSXML::IXMLDOMNodePtr listnode;
                l_pXMLDOMNodeList->get_item(i, &listnode);
                CComBSTR txt, node;
                listnode->get_text(&txt);
                listnode->get_nodeName(&node);
                AfxMessageBox(CString(node));
                AfxMessageBox(CString(txt));
                //GET ATTRIBUTE DETAILS
                MSXML::IXMLDOMNamedNodeMapPtr attr;
                listnode->get_attributes(&attr);
                // The attribute map is NULL for nodes that cannot have attributes (text, comments).
                long attrlen = (attr != NULL) ? attr->Getlength() : 0;
                if (attrlen == 0) AfxMessageBox("Contains NO Attributes");
                for (long x = 0; x < attrlen; x++)
                {
                    MSXML::IXMLDOMNodePtr attrnode;
                    attr->get_item(x, &attrnode);
                    CComBSTR attrval;
                    attrnode->get_text(&attrval);
                    AfxMessageBox(CString(attrval));
                }
            }
        }
    }
    CoUninitialize();
}
12) What is SOAP?
SOAP stands for Simple Object Access Protocol.
CORBA uses IIOP (Internet Inter-ORB Protocol) and DCOM uses ORPC (Object RPC).
SOAP, which exchanges XML messages (typically over HTTP), is a good solution
for establishing communication between systems built on these two protocols.