Structured XML documents are used in a wide range of fields. In many cases it is necessary to process documents of considerable size where runtime matters and the execution window is clearly limited. As we saw, there are two types of XML APIs: memory-based APIs and streaming-based APIs. Memory-based XML APIs keep long-lived structural data in memory, and modifications are allowed only once the parsing process has finished. Streaming-based APIs, by contrast, have a small memory footprint because they allocate and free memory constantly, which in theory allows them to process XML documents of unbounded size.
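The difference between the two models can be sketched with the APIs bundled in the JDK: the DOM method must build the whole tree before it can answer a query, while the StAX method pulls one event at a time and keeps only a counter. This is a minimal illustration, not the benchmark code used in the study; the class name `ParsingModels`, its methods and the sample XML are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;

public class ParsingModels {
    // Memory-based (DOM): the whole tree is materialized before we can query it.
    static int countWithDom(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName(tag).getLength();
    }

    // Streaming-based (StAX): events are pulled one at a time; apart from the
    // parser's own small internal buffers, only the counter is kept.
    static int countWithStax(String xml, String tag) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT && r.getLocalName().equals(tag)) {
                count++;
            }
        }
        r.close();
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/><order id=\"3\"/></orders>";
        System.out.println(countWithDom(xml, "order"));  // 3
        System.out.println(countWithStax(xml, "order")); // 3
    }
}
```

Both methods return the same answer; what differs is that the DOM version's memory use grows with the document, while the StAX version's stays roughly constant.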
Generally, for XML handling, dom4j and DOM are good choices; the preference between them depends on the project's requirements, namely whether Java-specific features (dom4j) or cross-language compatibility (DOM) matter more. Although less flexible for XML transformations, OJXQI is a very good choice when you need to perform standard modifications with good performance. VTD's structure, an array of integers, proved to be the best model in almost all tests. It consumes less memory than the other memory-based APIs, its processing time is very fast, and even its ability to update a document while keeping its structure in memory proved far superior to that of the other memory-based APIs (for the tested scenario). However, the VTD API is more complex to use than the other memory-based APIs and requires additional effort to master its features.
Among the streaming-based APIs, StAX showed better overall performance than SAX and XOM. Because these APIs do not keep long-lived structural data in memory, there is no advantage in using them when you need to perform a set of transformations that change the order of elements in the XML hierarchy. Typically, they are used only for forward-only applications or for simple modifications using XSLT.
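The kind of forward-only, simple modification that suits a streaming API can be sketched with the JDK's StAX event API: the input is copied to the output in a single pass, and only the current event is held. Renaming an element is used here as a hypothetical example of a "simple modification"; the class and method names are illustrative and not from the study.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class StaxRename {
    // Single forward pass: every element whose local name is `from` is written
    // out as `to` (keeping prefix, namespace URI and attributes); all other
    // events are copied unchanged. No tree is ever built.
    static String renameElements(String xml, String from, String to) throws Exception {
        XMLEventReader in = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter w = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory f = XMLEventFactory.newInstance();
        while (in.hasNext()) {
            XMLEvent e = in.nextEvent();
            if (e.isStartElement() && e.asStartElement().getName().getLocalPart().equals(from)) {
                StartElement s = e.asStartElement();
                QName n = s.getName();
                e = f.createStartElement(n.getPrefix(), n.getNamespaceURI(), to,
                        s.getAttributes(), s.getNamespaces());
            } else if (e.isEndElement() && e.asEndElement().getName().getLocalPart().equals(from)) {
                QName n = e.asEndElement().getName();
                e = f.createEndElement(n.getPrefix(), n.getNamespaceURI(), to);
            }
            w.add(e);
        }
        w.close();
        in.close();
        return out.toString();
    }
}
```

Because each event is handled once and immediately written, this pattern works for arbitrarily large inputs, but it cannot move an element to an earlier position in the output.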
Memory-based APIs keep the structure of the whole document in memory, which adds some overhead. However, for updates that change the document structure, these APIs have an advantage over streaming-based APIs, which need to perform more I/O operations to carry out the same transformation.
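A structural update of this kind, reversing the order of the root's child elements, reduces to a few node moves once the tree is in memory. The following is a minimal sketch with the JDK's DOM and identity `Transformer`; the class name `DomReorder` and the reversal task are hypothetical examples, not the transformation benchmarked in the study.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomReorder {
    static String reverseChildren(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();

        // Snapshot the element children first: getChildNodes() is a live list.
        NodeList kids = root.getChildNodes();
        List<Node> elements = new ArrayList<>();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                elements.add(kids.item(i));
            }
        }
        // appendChild on an existing child moves it to the end, so appending
        // from last to first leaves the elements in reverse order.
        for (int i = elements.size() - 1; i >= 0; i--) {
            root.appendChild(elements.get(i));
        }

        // Serialize the modified tree back to text.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

For example, `reverseChildren("<list><a/><b/><c/></list>")` yields the children in the order `c`, `b`, `a`.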
Manipulating a document with memory-based APIs is also much simpler and faster, since streaming-based APIs must constantly use temporary buffers to keep information in memory.
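The need for temporary buffers shows up as soon as a streaming API is asked to perform a reordering. In this sketch, a hypothetical StAX counterpart of a "reverse the root's children" task, events arrive in document order, so every child subtree has to be stored in a list until the root's end tag is reached. It assumes a single root element and uses only the JDK's `javax.xml.stream` package; the names are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

public class StaxReorder {
    static String reverseChildren(String xml) throws Exception {
        XMLEventReader in = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter w = XMLOutputFactory.newInstance().createXMLEventWriter(out);

        List<List<XMLEvent>> buffered = new ArrayList<>(); // one event list per child subtree
        List<XMLEvent> current = null;
        int depth = 0; // 1 = inside the root, 2+ = inside a child subtree

        while (in.hasNext()) {
            XMLEvent e = in.nextEvent();
            if (e.isStartElement()) {
                depth++;
                if (depth == 2) { // a new child of the root starts: open a buffer
                    current = new ArrayList<>();
                    buffered.add(current);
                }
            }
            if (depth >= 2) {
                current.add(e); // inside a child subtree: hold it, don't write yet
            } else if (e.isEndElement()) {
                // Root is closing: flush the buffered children in reverse order.
                for (int i = buffered.size() - 1; i >= 0; i--) {
                    for (XMLEvent b : buffered.get(i)) {
                        w.add(b);
                    }
                }
                w.add(e);
            } else {
                w.add(e); // document start/end, root start tag, text between children
            }
            if (e.isEndElement()) {
                depth--;
            }
        }
        w.close();
        in.close();
        return out.toString();
    }
}
```

In the worst case the buffers hold almost the entire document, which removes the streaming model's memory advantage for this kind of transformation.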
In summary, the choice between the two approaches studied for processing XML documents depends mostly on the project's requirements.
