Large data integration operations, especially those that run on a regular basis, need to be managed carefully. An integration operation that processes millions of records every day can have a direct impact on performance for your organization and your users. In some cases, small changes to the integration operation can significantly reduce the amount of data processing, improving both performance and the user experience.
Your integration operations might be doing more work than they have to. If you have operations that reload or rewrite entire tables wholesale, you might benefit from updating them to use incremental processing. Incremental processing involves working with just the data that changed, and only updating (or retrieving, for outbound operations) that data in Salesforce.
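As a minimal sketch of the idea, assuming the external system tracks a last-modified timestamp for each record (the record shapes, field names, and watermark value here are hypothetical, not part of any Salesforce API):

```python
from datetime import datetime, timezone

# Hypothetical source records; "modified_at" mimics a last-modified
# timestamp maintained by the external system.
records = [
    {"id": "001", "name": "Acme",    "modified_at": datetime(2024, 1, 10, tzinfo=timezone.utc)},
    {"id": "002", "name": "Globex",  "modified_at": datetime(2024, 3, 5,  tzinfo=timezone.utc)},
    {"id": "003", "name": "Initech", "modified_at": datetime(2024, 3, 20, tzinfo=timezone.utc)},
]

def records_to_sync(records, watermark):
    """Return only the records modified after the last successful sync."""
    return [r for r in records if r["modified_at"] > watermark]

# Watermark saved at the end of the previous sync run (hypothetical value).
last_sync = datetime(2024, 3, 1, tzinfo=timezone.utc)

changed = records_to_sync(records, last_sync)
print([r["id"] for r in changed])  # only the two records changed since March 1
```

A full reload would push all three records to Salesforce on every run; the incremental approach sends only the two that changed, and the savings grow with the size of the unchanged portion of the data set.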
Why You Should Avoid Full Reloads If Possible
Doing full reloads where you could be doing incremental processing can cause many problems in your organization.
A full reload integration operation might consume an unnecessarily high amount of Salesforce server resources, degrading organization-wide processing performance. That degradation can affect Bulk API jobs (and other data loading jobs), dashboard and report performance, Apex code (@future methods or batch Apex), and even general page performance.
A full reload integration operation will also take longer to complete than one that uses incremental processing, which can cause a synchronization gap between systems. For example, if your integration operation takes several hours or longer to complete, there will be a significant delay before users see Salesforce updates that match the external system changes, leaving users unsure whether the external changes were actually synchronized.
In extreme cases with a large number of records, a full reload integration could take so long to run that it overlaps with a different operation (or even with itself, if the operation runs on a regular basis). This overlap could result in excessive resource consumption, record locking, performance degradation, and possibly even failed data changes that leave the data in an invalid state.
Full reloads can even affect other aspects of Salesforce in subtle ways. For example, one way an integration operation might do a reload is to delete all records for a given object, and then insert all records (changed and unchanged) from the external resource data. Normally, deleted records are moved to the recycle bin, and once the recycle bin is emptied, a regularly scheduled Salesforce process (the physical delete process) does the final removal of the records. Until that happens, the deleted records can still affect the query optimizer, resulting in suboptimal query plans (largely on data that hasn’t really changed) that degrade query and report performance.
Finding Integration Operations That Could Benefit From Incremental Processing
There are many scenarios where you might have created an operation that does full reloads of record sets. You should review the following examples to see if they match your operations, and if so, consider updating your operations to use incremental processing.
- You might have simple, frequently run integration operations that needed to be created quickly. It’s often easier and faster in the short term to develop integration operations that do full reloads, so your initial development work might have used full reloads. You might even have intended to revisit these operations and update them when more time and resources became available.
- You might have operations that involve complex business requirements or technical limitations that meant that only full reloads would work. Sometimes these types of operations can’t be modified to use incremental processing due to the requirements. However, sometimes the requirements or limitations have changed over time.
- You might have integration operations that run infrequently or irregularly but still do full reloads. These operations are likely still doing more work than they need to, even though they don’t run often.
When Should You Use Incremental Processing?
You should use incremental processing whenever the Salesforce processing time to complete the integration data change is less than the processing time it would take to rewrite or reload the entire data set. In most scenarios, for regularly run integration operations that are synchronizing data with Salesforce, the expected processing time for the modified records should be less than the time it would take to completely reload all records.
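As a back-of-the-envelope illustration of that comparison (the record counts and throughput figure below are made-up numbers, not Salesforce benchmarks), estimating the two processing times makes the trade-off concrete:

```python
def estimated_seconds(record_count, records_per_second):
    """Rough processing-time estimate for a batch of records."""
    return record_count / records_per_second

# Hypothetical numbers: 5 million total records, 50,000 changed per day,
# and a sustained throughput of 2,000 records per second.
total_records = 5_000_000
changed_records = 50_000
throughput = 2_000

full_reload = estimated_seconds(total_records, throughput)    # 2500 seconds
incremental = estimated_seconds(changed_records, throughput)  # 25 seconds

# Incremental processing wins whenever its estimate is smaller.
print(incremental < full_reload)  # True
```

With these assumed numbers, incremental processing finishes in a small fraction of the full reload time; real estimates should come from measuring your own operations.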
When Should You Use Full Rewrites?
There are scenarios where you won’t be able to use incremental processing. If the data changes affect all, or the majority of, records for a particular object, you’ll probably need to do full reloads.
Additionally, as mentioned in the examples above, you might have situations where business requirements or technical limitations prevent you from changing the operations to use incremental processing.
What Are Some Ways to Apply Incremental Processing?
For inbound integration operations that push data changes from an external system to Salesforce, here are some ways to use incremental processing.
- The data change source can set a timestamp when the data is modified, so that only data modified after a certain point needs to be synchronized with Salesforce. Alternatively, you can use any sort of custom flag to indicate that a record needs to be synchronized.
- Many Extract, Transform, and Load (ETL) tools support incremental data loads. If your operations use ETL tools, investigate whether you can configure them to load data incrementally.
- For business requirements or technical limitations, there might be ways to use features of Salesforce, such as Apex Web Services or Apex REST, to work around the limitations. For example, if you’re trying to update sharing changes from an external source and can’t use Apex triggers on the sharing tables, see if you can implement an Apex Web service that is called directly when the external change occurs, and that in turn either applies the incremental change immediately or updates a staging table with incremental changes that can be applied later. See Exposing Apex Methods as SOAP Web Services and Exposing Apex Classes as REST Web Services on Salesforce Developers for more details.
For outbound integration operations that pull data from Salesforce into external systems, here are some ways to use incremental processing.
- For queries, consider using SystemModStamp or your own record-level flag to control which records get exported from Salesforce. See System Fields for more details on SystemModStamp. If you’re using the SOAP API or REST API, the getUpdated() call also uses SystemModStamp when available. Also, if you’re using a SOQL query, look at the various ways to make sure it’s selective rather than retrieving all rows: How can I make my SOQL query selective? (And the process to determine the fields that can be custom indexed).
- If you’re using an ETL tool for your outbound integration operation, investigate whether it supports incremental data retrieval, and consider taking advantage of that capability.
- Use Salesforce features that can send notifications of the incremental changes to an external system, such as Apex Callouts or the Streaming API. As an example, you could set up a Streaming API PushTopic to send notifications when a particular field on a particular object is modified, and have your external system subscribe to the PushTopic channel directly. For more information on the Streaming API, see the Streaming API Developer’s Guide on Salesforce Developers.
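As an illustrative sketch of the SystemModStamp approach for outbound queries (the field list, object, and watermark handling are assumptions for the example, not a prescribed pattern), a selective SOQL query could be assembled like this:

```python
from datetime import datetime, timezone

def incremental_soql(sobject, fields, since):
    """Build a SOQL query that retrieves only records modified since the
    last successful export, filtering on the SystemModStamp system field."""
    # SOQL datetime literals are unquoted ISO 8601 values in UTC.
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"SELECT {', '.join(fields)} FROM {sobject} "
        f"WHERE SystemModStamp > {stamp}"
    )

# Watermark recorded at the end of the previous export run (hypothetical value).
last_export = datetime(2024, 3, 1, 0, 0, tzinfo=timezone.utc)

query = incremental_soql("Account", ["Id", "Name"], last_export)
print(query)
# SELECT Id, Name FROM Account WHERE SystemModStamp > 2024-03-01T00:00:00Z
```

Persisting the watermark after each successful run, and advancing it only on success, keeps the export from skipping records when a run fails partway through.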
In most cases, data integration operations process a relatively small percentage of the overall data set involved. Because of this, if any of your integration operations are doing full reloads, review them to see if you can use incremental processing. Doing so will improve the overall performance of your organization, remove potential sources of data integration errors, and help ensure that your architecture can scale with your business.