
Tuesday, April 28, 2015

Processing XML with Java – A Performance Benchmark


Structured XML documents are used across a wide range of fields. In many cases it is necessary to process documents of considerable size, where runtime matters and the execution window is strictly limited. As we saw, there are two types of XML APIs: memory-based APIs and streaming-based APIs. Memory-based XML APIs keep a long-lived structural representation of the document in memory and allow modifications only once parsing has finished. Streaming-based APIs have a small memory footprint, constantly allocating and freeing memory, which in theory allows them to process XML documents of unbounded size.
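To make the difference between the two models concrete, here is a minimal sketch that counts elements once with DOM (memory-based) and once with StAX (streaming-based), using only the parsers bundled with the JDK. The class name `ParsingModels`, the method names and the sample document are made up for illustration and are not part of the benchmark.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: not taken from the benchmark code.
class ParsingModels {

    // Memory-based: the whole tree is built first and can then be queried freely.
    static int countWithDom(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(name).getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    // Streaming: events are pulled one at a time and nothing is retained.
    static int countWithStax(String xml, String name) {
        int count = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && name.equals(reader.getLocalName())) {
                    count++;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX parsing failed", e);
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/><order id=\"3\"/></orders>";
        System.out.println(countWithDom(xml, "order"));
        System.out.println(countWithStax(xml, "order"));
    }
}
```

Both methods return the same count. The DOM version holds the entire document in memory while it works, whereas the StAX version only ever sees the current event.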

Generally, for XML handling, dom4j and DOM are good choices; the preference between them depends on whether the project values Java-specific features (dom4j) or cross-language compatibility (DOM). Although less flexible in XML transformations, OJXQI is a very good choice when you need to make standard modifications with good performance. VTD's array-of-integers structure proved to be the best model in almost all tests. It consumes less memory than the other memory-based APIs, its processing time is very fast, and even its ability to update a document while keeping the structure in memory proved far superior to the other memory-based APIs in the tested scenario. The VTD API is, however, more complex to use than the other memory-based APIs, and mastering its features takes additional effort.

For streaming-based APIs, StAX proved to have better overall performance than SAX and XOM. These APIs do not maintain long-lived structural data in memory, so they offer no advantage when you need to perform a set of transformations that change the order of elements in the XML hierarchy. Typically, they are used only for forward-only applications or for simple modifications using XSLT.
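As an illustration of a forward-only modification, the sketch below copies a document event by event with the StAX event API and drops every element with a given name, including its subtree. Note that it uses a plain StAX event filter rather than XSLT. The class `StaxFilter` and the method `dropElements` are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class: a single forward pass, no document tree is built.
class StaxFilter {

    static String dropElements(String xml, String name) {
        StringWriter out = new StringWriter();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            XMLEventWriter writer = XMLOutputFactory.newInstance()
                    .createXMLEventWriter(out);
            int skipDepth = 0; // > 0 while inside an element being dropped
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (skipDepth > 0) {
                    // Track nesting so the matching end tag ends the skip.
                    if (event.isStartElement()) {
                        skipDepth++;
                    } else if (event.isEndElement()) {
                        skipDepth--;
                    }
                    continue;
                }
                if (event.isStartElement()
                        && name.equals(event.asStartElement().getName().getLocalPart())) {
                    skipDepth = 1;
                    continue;
                }
                writer.add(event); // everything else is copied through unchanged
            }
            writer.flush();
            writer.close();
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML could not be processed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<user><name>Ann</name><secret><k>1</k></secret></user>";
        System.out.println(dropElements(xml, "secret"));
    }
}
```

The only state kept between events is a single counter, which is why this kind of change suits a streaming API. Anything that depends on content seen later in the document, such as moving an element ahead of its earlier siblings, would need extra buffering.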

Memory-based APIs keep the structure of the whole document in memory, which adds some overhead. However, for updates that change the document structure, they have an advantage over streaming-based APIs, which need more I/O operations to perform the same transformation. Manipulating a document with a memory-based API is also easier and quicker, since with streaming-based APIs we constantly need temporary buffers to keep information in memory. In summary, the choice between the two approaches studied for processing XML documents depends mostly on the project's requirements.
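A structural update of the kind described above can be sketched with the JDK's DOM implementation. The example reorders the child elements of a node by their `id` attribute, which is straightforward once the whole tree is in memory. The class `DomReorder`, its method names and the `id` attribute are assumptions made for this illustration, and ids are compared as plain strings.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: reorders elements in an in-memory DOM tree.
class DomReorder {

    // Parses the XML string and returns the root element of the tree.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    // Collects the direct child elements of a node, ignoring text and comments.
    static List<Element> childElements(Element parent) {
        List<Element> children = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                children.add((Element) n);
            }
        }
        return children;
    }

    // Sorts the child elements in place by their "id" attribute (string order).
    static void sortChildrenById(Element parent) {
        List<Element> children = childElements(parent);
        children.sort(Comparator.comparing((Element e) -> e.getAttribute("id")));
        for (Element child : children) {
            // appendChild moves a node that is already in the tree to the end.
            parent.appendChild(child);
        }
    }

    // Returns the "id" attributes of the child elements in document order.
    static List<String> childIds(Element parent) {
        List<String> ids = new ArrayList<>();
        for (Element child : childElements(parent)) {
            ids.add(child.getAttribute("id"));
        }
        return ids;
    }

    public static void main(String[] args) {
        Element root = parse("<list><item id=\"3\"/><item id=\"1\"/><item id=\"2\"/></list>");
        sortChildrenById(root);
        System.out.println(childIds(root));
    }
}
```

With a streaming API, the same reordering would require buffering all `item` elements before any of them could be written out, which is the temporary-buffer cost mentioned above.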