If you use Mule Runtime to process large XML files that contain a collection of elements, and you want to split that collection into separate messages, review the information below.
The file size is on the order of hundreds of megabytes and, due to resource constraints in your environment, you need to keep memory utilization to a minimum.
Usually these kinds of files take the following form:
<?xml version="1.0" encoding="UTF-8"?>
<Customers>
    <Customer>
        <Name>Acme</Name>
        <Address>1234, Main Street, AL</Address>
        ... dozens of inner elements
    </Customer>
    ... thousands of Customer elements
    <Customer>
        <Name>Zen Inc</Name>
        <Address>7890, Other Street, WY</Address>
        ... dozens of inner elements
    </Customer>
</Customers>
After a file:inbound-endpoint, your payload is an InputStream that is consumed by the subsequent message processors. If you use any XPath expression in your flow, the whole input stream is first read into memory and then transformed into a DOM document so that the XPath expression can be evaluated. This can consume gigabytes of RAM for file sizes in the range of 150-250 MB.
To split a large XML file in Mule, use a custom splitter that extends the org.mule.routing.outbound.AbstractSplitter class and overrides the splitMessage method.
The splitMessage method must use StAX classes to keep memory consumption to a minimum.
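As background for the StAX approach, the following is a minimal, standalone sketch of the StAX cursor API outside of Mule. The class name StaxCursorDemo and the method countElements are hypothetical names chosen for illustration; they are not part of the attached source. The sketch only counts matching start tags, which shows that the reader visits one event at a time instead of building a DOM document.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the attached source code.
public class StaxCursorDemo {

    /** Counts start tags named targetElement, reading one event at a time. */
    static int countElements(InputStream in, String targetElement) throws XMLStreamException {
        XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        try {
            while (xsr.hasNext()) {
                // next() advances the cursor and returns the type of the new event.
                if (xsr.next() == XMLStreamConstants.START_ELEMENT
                        && targetElement.equals(xsr.getLocalName())) {
                    count++;
                }
            }
        } finally {
            xsr.close(); // releases parser resources; does not close the stream
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Customers><Customer><Name>Acme</Name></Customer>"
                + "<Customer><Name>Zen Inc</Name></Customer></Customers>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in, "Customer")); // prints 2
    }
}
```

Because the cursor only moves forward, the parser does not need to retain elements it has already passed.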
The most important portion of the splitter code is the following:
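// The payload was converted to an XMLStreamReader earlier in the flow.
XMLStreamReader xsr = (XMLStreamReader) message.getPayload();
XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newInstance();
XMLStreamWriter xsw = null;
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
    // A transformer created without a stylesheet copies its source unchanged.
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    while (xsr.hasNext()) {
        if (xsr.isStartElement() && xsr.getLocalName().equals(this.targetElement)) {
            // Copy the whole target element into the reusable buffer.
            xsw = xmlOutputFactory.createXMLStreamWriter(baos);
            Source source = new StAXSource(xsr);
            StAXResult result = new StAXResult(xsw);
            transformer.transform(source, result);
            xsw.flush();
            splittedMessages.add(baos.toByteArray());
            xsw.close();
            baos.reset();
            // Do not call next() here: the transformer may already have moved
            // the cursor past the closing tag, and an unconditional next()
            // could skip a target element that follows immediately.
        } else {
            xsr.next();
        }
    }
} catch (XMLStreamException e) {
    // Assumed minimal handling; the excerpt does not show the original catch block.
    throw new IllegalStateException("Could not read or write the XML stream", e);
} catch (TransformerException e) {
    throw new IllegalStateException("Could not copy the target element", e);
}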
In the above code, the payload, which has previously been transformed into an instance of a class implementing XMLStreamReader, is iterated using the StAX cursor API.
If the current event is the start of an XML element, and the element's name matches the one provided as a parameter to the custom splitter, then the element and all of its children are written by an XMLStreamWriter to a ByteArrayOutputStream. Finally, the byte array is added to the list of messages returned by the splitter.
Note that, because data is read incrementally from the underlying InputStream, the parser only needs to hold the current target element in memory, so memory consumption during parsing is determined by the size of that element and its children rather than by the size of the whole file.
Please see the attached source code from the CustomXmlMessageSplitter class for more implementation details.
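For readers who want to try the technique outside of Mule, the following is a minimal sketch using only JDK classes. It is not the attached CustomXmlMessageSplitter: the class name StaxSplitDemo and the method split are hypothetical, and for brevity it writes each element through a StreamResult instead of the StAXResult and XMLStreamWriter used in the excerpt above. It also assumes that the XML declaration should be omitted from each fragment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical standalone demo, not the attached Mule splitter.
public class StaxSplitDemo {

    /** Returns one serialized fragment per element named targetElement. */
    static List<byte[]> split(InputStream in, String targetElement) throws Exception {
        XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(in);
        // A transformer created without a stylesheet performs an identity copy.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<byte[]> fragments = new ArrayList<>();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            while (xsr.hasNext()) {
                if (xsr.isStartElement() && targetElement.equals(xsr.getLocalName())) {
                    // Copies the element and its children, advancing the cursor.
                    transformer.transform(new StAXSource(xsr), new StreamResult(baos));
                    fragments.add(baos.toByteArray());
                    baos.reset();
                    // Re-check the current event instead of calling next(), in case
                    // the transformer already moved past the closing tag.
                } else {
                    xsr.next();
                }
            }
        } finally {
            xsr.close();
        }
        return fragments;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Customers><Customer><Name>Acme</Name></Customer>"
                + "<Customer><Name>Zen Inc</Name></Customer></Customers>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (byte[] fragment : split(in, "Customer")) {
            System.out.println(new String(fragment, StandardCharsets.UTF_8));
        }
    }
}
```

Like the excerpt, this sketch collects every fragment in a list before returning, so the list itself grows with the number of target elements even though parsing is streamed.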
Example
The following flow shows how to use the attached custom transformer and splitter. Note that it uses a custom XmlToXMLStreamReader transformer in order to set additional properties on the XMLInputFactory.
<flow name="splitBigXML" doc:name="splitBigXML">
    <file:inbound-endpoint responseTimeout="10000"
        doc:name="File" path="/tmp/in" fileAge="1000"/>
    <!-- The custom splitter requires the payload to be an instance of
         a class implementing XMLStreamReader -->
    <custom-transformer name="XmlToXSR"
        class="com.mulesoft.support.CustomXmlToXMLStreamReader" doc:name="XmlToXSR"/>
    <!-- Set the targetElement property to match the name of the element
         that you want to use to split the big XML file -->
    <custom-splitter class="com.mulesoft.support.CustomXmlMessageSplitter">
        <spring:property name="targetElement" value="Customer"/>
    </custom-splitter>
    <file:outbound-endpoint responseTimeout="10000" doc:name="File" path="/tmp/out"/>
</flow>
In this case we are writing the split XML messages to files, but you can pass them to any message processor or outbound endpoint.
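The flow relies on a custom transformer to configure the XMLInputFactory before the stream reader is created. The article does not say which properties the attached class sets, so the following is only a sketch of the general idea using standard StAX properties. The class name StreamReaderFactoryDemo, the method toStreamReader, and the chosen property values are all assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; the attached transformer may set different properties.
public class StreamReaderFactoryDemo {

    /** Wraps an InputStream in an XMLStreamReader from a configured factory. */
    static XMLStreamReader toStreamReader(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumed settings: merge adjacent character events, and disable DTD
        // processing and external entities.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        return factory.createXMLStreamReader(in);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Customers><Customer><Name>Acme</Name></Customer></Customers>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        XMLStreamReader xsr = toStreamReader(in);
        xsr.nextTag(); // moves from the document start to the root element
        System.out.println(xsr.getLocalName());
        xsr.close();
    }
}
```

The reader returned here is the kind of payload the custom splitter expects, since the splitter casts the payload to XMLStreamReader.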
In severely memory-constrained scenarios, you must set the processingStrategy attribute of the flow to synchronous, to ensure that a single thread reads the big file, splits the messages, and sends them for further processing.
Attachments
An Introduction to StAX
StAX'ing up XML, Part 1: An introduction to Streaming API for XML (StAX)
001118773
