Java and C++ DOM API Comparisons
The C++ DOM API is very similar in design and use to the Java DOM API bindings. As a consequence, converting existing Java code that uses the DOM to C++ is a straightforward process.
This section outlines the differences between the Java and C++ bindings.
Accessing the API in Application Code
// From C++
#include <dom/DOM.hpp>

// From Java
import org.w3c.dom.*;
The header file <dom/DOM.hpp> includes all of the individual headers for the DOM API classes.
Class Names
The C++ class names are prefixed with "DOM_". The intent is to prevent conflicts between DOM class names and other names that may already be in use by an application or other libraries that a DOM-based application must link with. The use of C++ namespaces would also have solved this conflict problem, but for the fact that many compilers do not yet support them.
// C++
DOM_Document myDocument;
DOM_Node aNode;
DOM_Text someText;

// Java
Document myDocument;
Node aNode;
Text someText;
The Java class names can be defined for use from C++ with a typedef. This is not advisable for the general case, because conflicts really do occur, but it can be very useful when converting a body of existing Java code to C++.
typedef DOM_Document Document;
typedef DOM_Node Node;
Document myDocument; // Now C++ usage is indistinguishable from Java
Node aNode;
Objects and Memory Management
The C++ DOM implementation uses automatic memory management, implemented using reference counting. As a result, the C++ code for most DOM operations is very similar to the equivalent Java code, right down to the use of
factory methods in the DOM document class for nearly all object creation, and the lack of any explicit object deletion. Consider the following code snippets:
// This is C++
DOM_Node aNode;
aNode = someDocument.createElement("ElementName");
DOM_Node docRootNode = someDocument.getDocumentElement();
docRootNode.appendChild(aNode);

// This is Java
Node aNode;
aNode = someDocument.createElement("ElementName");
Node docRootNode = someDocument.getDocumentElement();
docRootNode.appendChild(aNode);
The Java and the C++ are identical on the surface, except for the class names, and this similarity remains true for most DOM code.
However, Java and C++ handle objects in somewhat different ways, making it important to understand a little of what is going on beneath the surface.
In Java, the variable aNode is an object reference, essentially a pointer. It is initially null, and references an object only after the assignment statement in the second line of the code.
In C++, the variable aNode is, from the C++ language's perspective, an actual live object. It is constructed when the first line of the code executes, and DOM_Node::operator=() executes at the second line. The C++ class DOM_Node is essentially a form of smart pointer; it implements much of the behavior of a Java object reference variable, and delegates the DOM behaviors to an implementation class that lives behind the scenes.
Key points to remember when using the C++ DOM classes:
- Create them as local variables, or as member variables of some other class. Never "new" a DOM object into the heap or make an ordinary C pointer variable to one, as this will greatly confuse the automatic memory management.
- The "real" DOM objects (nodes, attributes, CDATA sections, whatever) do live on the heap, and are created with the create... methods on class DOM_Document. DOM_Node and the other DOM classes serve as reference variables to the underlying heap objects.
- The visible DOM classes may be freely copied (assigned), passed as parameters to functions, or returned by value from functions.
- Memory management of the underlying DOM heap objects is automatic, implemented by means of reference counting. So long as some part of a document can be reached, directly or indirectly, via reference variables that are still alive in the application program, the corresponding document data will stay alive in the heap. When all possible paths of access have been closed off (all of the application's DOM objects have gone out of scope), the heap data itself will be automatically deleted.
- There are restrictions on the ability to subclass the DOM classes.
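The handle-and-heap-object arrangement described in the points above can be sketched in standard C++. This is only an illustration under stated assumptions, not the actual DOM implementation: NodeImpl and NodeHandle are hypothetical names, and std::shared_ptr stands in for the library's own reference counting.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical heap object; stands in for a "real" DOM node.
struct NodeImpl {
    std::string name;
    explicit NodeImpl(std::string n) : name(std::move(n)) { ++liveCount(); }
    ~NodeImpl() { --liveCount(); }
    NodeImpl(const NodeImpl&) = delete;
    NodeImpl& operator=(const NodeImpl&) = delete;
    // Number of NodeImpl objects currently alive (for demonstration only).
    static int& liveCount() { static int count = 0; return count; }
};

// Hypothetical handle; plays the role that DOM_Node plays in the text.
class NodeHandle {
public:
    NodeHandle() = default;  // a default-constructed handle refers to nothing
    static NodeHandle create(const std::string& name) {
        NodeHandle h;
        h.impl_ = std::make_shared<NodeImpl>(name);
        return h;            // handles are returned by value
    }
    bool isNull() const { return impl_ == nullptr; }
    const std::string& getNodeName() const { return impl_->name; }
    // Two handles compare equal when they refer to the same heap object.
    bool operator==(const NodeHandle& other) const { return impl_ == other.impl_; }
private:
    std::shared_ptr<NodeImpl> impl_;  // shared ownership acts as the reference count
};

// Creates one node, copies the handle, and reports how many heap nodes
// exist while both handles are in scope.
int liveNodesInsideScope() {
    NodeHandle a = NodeHandle::create("ElementName");
    NodeHandle b = a;        // copies the handle, not the node
    assert(a == b);
    return NodeImpl::liveCount();
}
```

Once liveNodesInsideScope() returns, both handles have gone out of scope and the heap object is released without any explicit delete, which is the behavior the list above describes.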
DOMString
Class DOMString provides the mechanism for passing string data to and from the DOM API. DOMString is not intended to be a completely general string class, but rather to meet the specific needs of the DOM API.
The design derives from two primary sources: the DOM's CharacterData interface and class java.lang.String. Main features are:
- Unicode, with fixed-size 16-bit storage elements.
- Automatic memory management, using reference counting.
- DOMStrings are mutable: characters can be inserted, deleted or appended.
When a string is passed into a method of the DOM, when setting the value of a Node, for example, the string is cloned so that any subsequent alteration or
reuse of the string by the application will not alter the document contents. Similarly, when strings from the document are returned to an application via the
DOM API, the string is cloned so that the document can not be inadvertently altered by subsequent edits to the string.
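The cloning rule can be illustrated with a small sketch. StrHandle and TextNodeSketch are hypothetical classes written for this example, and std::string replaces the 16-bit storage for brevity; the real DOMString and node classes are not shown here.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical mutable string handle with reference semantics.
class StrHandle {
public:
    StrHandle(const char* s = "") : data_(std::make_shared<std::string>(s)) {}
    // Returns a handle to a fresh copy of the characters.
    StrHandle clone() const {
        StrHandle copy;
        *copy.data_ = *data_;
        return copy;
    }
    // Appends in place; every handle sharing this storage sees the change.
    void appendData(const StrHandle& tail) {
        const std::string chars = *tail.data_;  // copy first so self-append is safe
        data_->append(chars);
    }
    const std::string& chars() const { return *data_; }
private:
    std::shared_ptr<std::string> data_;
};

// Hypothetical node that clones strings on the way in and on the way out.
class TextNodeSketch {
public:
    void setNodeValue(const StrHandle& value) { value_ = value.clone(); }
    StrHandle getNodeValue() const { return value_.clone(); }
private:
    StrHandle value_;
};

void demoCloning() {
    StrHandle s("Hello");
    TextNodeSketch node;
    node.setNodeValue(s);
    s.appendData(" World");        // the caller edits its own string afterwards
    assert(node.getNodeValue().chars() == "Hello");

    StrHandle out = node.getNodeValue();
    out.appendData("!");           // editing the returned string
    assert(node.getNodeValue().chars() == "Hello");
}
```

In both directions the node's stored value is unaffected, because the node never shares storage with a string held by the application.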
Note: The ICU classes are a more general solution to Unicode character handling for C++ applications.
Equality Testing
The DOMString equality operators (and all of the rest of the DOM class conventions) are modeled after the Java equivalents. The equals() method compares the content of the string, while the == operator checks whether the string reference variables (the application program variables) refer to the same underlying string in memory. This is also true of DOM_Node, DOM_Element, etc., in that operator == tells whether the variables in the application are referring to the same actual node or not. It's all very Java-like.
- bool operator == () is true if the DOMString variables refer to the same underlying storage.
- bool equals() is true if the strings contain the same characters.
Here is an example of how the equality operators work:
DOMString a = "Hello";
DOMString b = a;
DOMString c = a.clone();
if (b == a)              // This is true
if (a == c)              // This is false
if (a.equals(c))         // This is true
b.appendData(" World");  // Edits the shared string in place
if (b == a)              // Still true, and the string's value is "Hello World"
if (a.equals(c))         // false. a is "Hello World"; c is still "Hello".
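The same identity-versus-content distinction can be reproduced with standard C++ types, which may help when reasoning about the rules above. In this sketch StrRef is a hypothetical alias standing in for a DOMString variable; comparing the shared pointers tests identity, and comparing the pointed-to strings tests content.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a DOMString variable: a shared reference to
// 16-bit character storage.
using StrRef = std::shared_ptr<std::u16string>;

void demoEquality() {
    StrRef a = std::make_shared<std::u16string>(u"Hello");
    StrRef b = a;                                     // same storage as a
    StrRef c = std::make_shared<std::u16string>(*a);  // clone: new storage

    assert(b == a);      // identity: both refer to the same string
    assert(!(a == c));   // identity: c refers to different storage
    assert(*a == *c);    // content: the characters match

    b->append(u" World");            // in-place edit through b
    assert(b == a);                  // still the same storage
    assert(*a == u"Hello World");    // so a sees the edit as well
    assert(*a != *c);                // c still holds "Hello"
}
```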
Down-Casting
Application code sometimes must cast an object reference from DOM_Node to one of the classes deriving from DOM_Node, such as DOM_Element. The syntax for doing this in C++ is different from that in Java.
// This is C++
DOM_Node aNode = someFunctionReturningNode();
DOM_Element el = (DOM_Element &)aNode;

// This is Java
Node aNode = someFunctionReturningNode();
Element el = (Element)aNode;
The C++ cast is not type-safe; the Java cast is checked for compatible types at runtime. If necessary, a type-check can be made in C++ using the node type
information:
// This is C++
DOM_Node aNode = someFunctionReturningNode();
DOM_Element el; // by default, el will == null.
if (aNode.getNodeType() == DOM_Node::ELEMENT_NODE)
{
    el = (DOM_Element &)aNode;
}
else
{
    // aNode does not refer to an element.
    // Do something to recover here.
}
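An application that performs this check in many places might wrap it in a helper function. The following is a minimal sketch of that idea using hypothetical stand-in types (TypedImpl, NodeRef, ElementRef, toElement); it does not use the real DOM classes, and it copies the shared reference instead of casting the handle.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Node type codes as defined by the W3C DOM for these two node kinds.
enum NodeType { ELEMENT_NODE = 1, TEXT_NODE = 3 };

// Hypothetical heap object shared by all handles that refer to it.
struct TypedImpl {
    NodeType type;
    std::string name;
};

// Hypothetical handle standing in for DOM_Node.
struct NodeRef {
    std::shared_ptr<TypedImpl> impl;  // null by default
    bool isNull() const { return impl == nullptr; }
    NodeType getNodeType() const { return impl->type; }
};

// Hypothetical handle standing in for DOM_Element.
struct ElementRef : NodeRef {
    const std::string& getTagName() const { return impl->name; }
};

// Test helper that builds a node of the requested type.
NodeRef makeNode(NodeType type, const std::string& name) {
    NodeRef n;
    n.impl = std::make_shared<TypedImpl>();
    n.impl->type = type;
    n.impl->name = name;
    return n;
}

// Checked down-cast: returns a null ElementRef unless node is an element.
ElementRef toElement(const NodeRef& node) {
    ElementRef el;  // null by default
    if (!node.isNull() && node.getNodeType() == ELEMENT_NODE)
        el.impl = node.impl;
    return el;
}

void demoDownCast() {
    NodeRef element = makeNode(ELEMENT_NODE, "para");
    NodeRef text = makeNode(TEXT_NODE, "#text");
    assert(!toElement(element).isNull());
    assert(toElement(text).isNull());  // caller can detect the mismatch and recover
}
```

Returning a null handle keeps the call site close to the Java idiom of testing the type before using the result, without relying on an unchecked cast.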
Subclassing
The C++ DOM classes, DOM_Node, DOM_Attr, DOM_Document, etc., are not designed to be subclassed by an application program. As an alternative, the DOM_Node class provides a User Data field for use by
applications as a hook for extending nodes by referencing additional data or objects. See the API description for DOM_Node for details.