How to Split a Big XML File for MuleSoft

Publication date: Mar 26, 2026
Description

If you use Mule Runtime to process large XML files that contain a collection of elements, and you want to split that collection into separate messages, review the information below.

The file size is on the order of hundreds of megabytes and, due to resource constraints in your environment, you need to keep memory utilization to a minimum.

Usually these kinds of files take the following form:


<?xml version="1.0" encoding="UTF-8"?>
<Customers>
   <Customer>
      <Name>Acme</Name>
      <Address>1234, Main Street, AL</Address>

... dozens of inner elements

   </Customer>

... thousands of Customer elements

   <Customer>
      <Name>Zen Inc</Name>
      <Address>7890, Other Street, WY</Address>

... dozens of inner elements

   </Customer>
</Customers>


After a file:inbound-endpoint, your payload is an InputStream that is consumed by subsequent message processors. If you use any XPath expression in your flow, the whole input stream is first read into memory and then transformed into a DOM document so that the expression can be evaluated. This can consume gigabytes of RAM for file sizes in the range of 150-250 MB.

Solution

To split a large XML file in Mule, use a custom splitter that extends the org.mule.routing.outbound.AbstractSplitter class and overrides the splitMessage method.

The splitMessage method must use StAX classes so that memory consumption is kept to a minimum.

The most important portion of the code is the following:

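// Requires java.io.ByteArrayOutputStream, javax.xml.stream.*,
// javax.xml.transform.* and javax.xml.transform.stax.*
XMLStreamReader xsr = (XMLStreamReader) message.getPayload();
XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newInstance();
ByteArrayOutputStream baos = new ByteArrayOutputStream();

try {
    // Identity transformer: copies StAX events from the reader to the writer
    Transformer transformer = TransformerFactory.newInstance().newTransformer();

    while (xsr.hasNext()) {
        if (xsr.isStartElement() && this.targetElement.equals(xsr.getLocalName())) {
            XMLStreamWriter xsw = xmlOutputFactory.createXMLStreamWriter(baos);
            transformer.transform(new StAXSource(xsr), new StAXResult(xsw));
            xsw.close();
            splittedMessages.add(baos.toByteArray());
            baos.reset();
            // Re-check the current event instead of advancing: the transform may
            // already have moved the cursor past the end tag, so an unconditional
            // next() could skip a sibling that follows without whitespace.
            continue;
        }
        xsr.next();
    }
} catch (XMLStreamException e) {
    throw new IllegalStateException("Failed to read the XML payload", e);
} catch (TransformerException e) {
    throw new IllegalStateException("Failed to copy a " + this.targetElement + " element", e);
}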


In the code above, the payload, which has previously been transformed into an instance of a class implementing XMLStreamReader, is traversed with the StAX cursor API.

If the current event is the start of an XML element, and the element's name matches the one provided as a parameter to the custom splitter, then the element and all of its child elements are written by an XMLStreamWriter to an instance of ByteArrayOutputStream. Finally, the byte array is added to the list of messages returned by the splitter.
Note that because data is read incrementally from the underlying InputStream, memory consumption depends on the size of the elements being copied, not on a DOM representation of the whole file.
Please see the attached source code from the CustomXmlMessageSplitter class for more implementation details.
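
The splitting technique described above can also be tried outside Mule. The following is a minimal, standalone sketch and not the attached CustomXmlMessageSplitter: the class name StaxSplitSketch, the split method, and the sample document are assumptions made for illustration. It applies the same StAX cursor loop and identity transform to a small in-memory document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXResult;
import javax.xml.transform.stax.StAXSource;

// Hypothetical standalone class; not part of the article's attachments.
public class StaxSplitSketch {

    /** Returns one serialized XML fragment per element named targetElement. */
    static List<String> split(InputStream in, String targetElement) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        // A Transformer created without a stylesheet performs an identity copy.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        List<String> fragments = new ArrayList<>();

        while (reader.hasNext()) {
            if (reader.isStartElement() && targetElement.equals(reader.getLocalName())) {
                XMLStreamWriter writer = outputFactory.createXMLStreamWriter(buffer, "UTF-8");
                // Copy the current element and its children into the buffer.
                identity.transform(new StAXSource(reader), new StAXResult(writer));
                writer.close();
                fragments.add(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
                buffer.reset();
                // Re-check the current event: the cursor may already sit on the
                // next sibling's start tag when no whitespace separates them.
                continue;
            }
            reader.next();
        }
        reader.close();
        return fragments;
    }

    public static void main(String[] args) throws Exception {
        // Sample input without whitespace between siblings (assumed data).
        String xml = "<Customers><Customer><Name>Acme</Name></Customer>"
                   + "<Customer><Name>Zen Inc</Name></Customer></Customers>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (String fragment : split(in, "Customer")) {
            System.out.println(fragment);
        }
    }
}
```

Only one target element is buffered at a time while it is being copied, but the returned list still holds every fragment. For very large inputs, a caller would probably hand each fragment to the next processing step as soon as it is produced instead of collecting them all.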

Example

The following flow is an example of usage of the attached custom transformer and splitter. Note that we are providing a custom XmlToXMLStreamReader transformer in order to set additional properties for the XMLInputFactory.

<flow name="splitBigXML" doc:name="splitBigXML">
    <file:inbound-endpoint responseTimeout="10000"
        doc:name="File" path="/tmp/in" fileAge="1000"/>

    <!-- The custom splitter requires the payload to be an instance of
         a class implementing XMLStreamReader -->
    <custom-transformer name="XmlToXSR"
        class="com.mulesoft.support.CustomXmlToXMLStreamReader" doc:name="XmlToXSR"/>

    <!-- Set the targetElement property to match the name of the element
         that you want to use to split the big XML file -->
    <custom-splitter class="com.mulesoft.support.CustomXmlMessageSplitter">
        <spring:property name="targetElement" value="Customer"/>
    </custom-splitter>
    <file:outbound-endpoint responseTimeout="10000" doc:name="File" path="/tmp/out"/>
</flow>


In this case we are writing the split XML messages to files, but you can pass them to any message processor or outbound endpoint.
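
The XmlToXSR step in the flow above converts the file's InputStream into an XMLStreamReader and sets additional XMLInputFactory properties. The article does not say which properties the attached transformer sets, so the sketch below is only an illustration of the idea: the class name StreamReaderFactorySketch, the open method, and the chosen properties (namespace awareness on, DTD and external entity support off) are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; the attached CustomXmlToXMLStreamReader may differ.
public class StreamReaderFactorySketch {

    /** Wraps the stream in a cursor-style StAX reader without buffering the whole file. */
    static XMLStreamReader open(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumed property choices, shown only to illustrate setProperty.
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        return factory.createXMLStreamReader(in);
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Customers><Customer><Name>Acme</Name></Customer></Customers>";
        XMLStreamReader reader = open(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        reader.nextTag(); // move from START_DOCUMENT to the root element
        System.out.println(reader.getLocalName());
        reader.close();
    }
}
```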

In severely memory-constrained scenarios, you must set the processingStrategy attribute of the flow to synchronous, so that only one thread reads the big file, splits the messages, and sends them for further processing.

Attachments

CustomXmlMessageSplitter.java
CustomXmlToXMLStreamReader-for-3.5.x.java
CustomXmlToXMLStreamReader.java

Knowledge Article Number

001118773

 
Salesforce Help | Article