Why do we get the "System.Exception: Regex too complicated" error?

Knowledge Article Number 000005024
Description

What are some of the reasons for getting the "Regex too complicated" error?

Resolution

The error you are getting [System.Exception: Regex too complicated] can occur in two different situations:

1. Your Matcher is too complex

As documented here (http://www.salesforce.com/us/developer/docs/apexcode/Content/apex_classes_pattern_and_matcher_using.htm):

Salesforce limits the number of times an input sequence for a regular expression can be accessed to 1,000,000 times. If you reach that limit, you receive a runtime error.

Therefore, the Matcher string should not be longer than 1M characters, since scanning it can use up the allowed accesses. In this case, check the length of the Matcher string before matching, and skip the match if the string is over 1M characters.
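
The length check described above could look something like the following. This is only a minimal sketch: the class name RegexGuard, the method safeFind, and the constant MAX_INPUT_LENGTH are hypothetical names chosen for illustration, and the threshold simply reuses the documented 1,000,000 figure. Staying under this length does not guarantee that the limit is never reached, because a pattern may access the same characters more than once.

```apex
public class RegexGuard {
    // Assumption: threshold borrowed from the documented 1,000,000 access limit.
    public static final Integer MAX_INPUT_LENGTH = 1000000;

    // Returns true if the pattern is found, false if it is not found,
    // and null if the input is missing or too long to scan safely.
    public static Boolean safeFind(String regex, String input) {
        if (input == null || input.length() > MAX_INPUT_LENGTH) {
            // Skip matching rather than risk an error that cannot be trapped.
            return null;
        }
        Matcher mat = Pattern.compile(regex).matcher(input);
        return mat.find();
    }
}
```

A caller can treat a null result as "not scanned" and decide whether to shorten or split the string before trying again.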

2. Your Pattern is too complex

This is an example of a complex Pattern:

Pattern pat = Pattern.compile('(A)?(B)?(C)?(D)?(E)?(F)?(G)?(H)?(I)?(J)?(K)?(L)?(M)?(N)?(O)?(P)?(Q)?(R)?(S)?(T)?(U)?(V)?(W)?(X)?(Y)?(Z)?(AA)?(AB)?(AC)?(AD)?(AE)?(AF)?(AG)?(AH)?(AI)?(AJ)?(AK)?(AL)?(AM)?(AN)?(AO)?(AP)?(AQ)?(AR)?(AS)?(AT)?(AU)?(AV)?(AW)?(AX)?(AY)?(AZ)?$');
Matcher mat = pat.matcher('asdfasdfasdfasdfasdfasdf');

This type of pattern will fail with any string.

As you can see, this error can happen for either of the two reasons described above. As a rule of thumb, if the error always happens, the problem is with the pattern, and the solution is to simplify the pattern. If the error only happens sometimes, the Matcher string is probably longer than 1M characters, and the string should be split before it is processed.

The operation is so time-consuming for our servers that, when we detect this exception, we immediately kill the process because it has already spent too much CPU time. This is why the error can’t be trapped, but good coding practices should prevent users from seeing it.

In general, one or more of the following recommendations will need to be implemented to minimize the likelihood of reaching the limit of 1,000,000 accesses to the input sequence:
1) Reduce the size of the input sequence.
2) Reduce, combine, or eliminate the patterns being compared against the input sequence, particularly redundant ones.
3) If the whole batch must be processed, you can sometimes gain some leeway by splitting the input sequence into two parts: find the next period after the halfway point, process everything up to that period as one batch, and process the remainder as a second batch via an @future call. (This is subject to asynchronous limits, so the approach depends largely on the use case, the available asynchronous calls, etc.)
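
The splitting approach in recommendation 3 might be sketched as follows. The class RegexBatchSplitter and its methods splitAtPeriod, process, scanLater, and scan are hypothetical names used only for illustration, and scan is a placeholder for whatever matching work your code really does. The sketch assumes that no match needs to span the split point.

```apex
public class RegexBatchSplitter {
    // Splits the input in two just after the first period found at or
    // beyond the halfway point. If there is no such period, the split
    // falls at the halfway point itself.
    public static List<String> splitAtPeriod(String input) {
        Integer half = input.length() / 2;
        Integer dot = input.indexOf('.', half);
        Integer cut = (dot == -1) ? half : dot + 1;
        return new List<String>{ input.substring(0, cut), input.substring(cut) };
    }

    public static void process(String regex, String input) {
        List<String> parts = splitAtPeriod(input);
        scan(regex, parts[0]);      // first batch, in the current transaction
        scanLater(regex, parts[1]); // second batch, in a separate asynchronous transaction
    }

    // Counts against the org's asynchronous limits.
    @future
    public static void scanLater(String regex, String input) {
        scan(regex, input);
    }

    // Hypothetical placeholder for the real matching work.
    private static void scan(String regex, String input) {
        Matcher mat = Pattern.compile(regex).matcher(input);
        while (mat.find()) {
            System.debug(mat.group());
        }
    }
}
```

Each half is matched in its own transaction, so each gets its own allowance of accesses. If a half is still too large, the same split can be applied again, within the available asynchronous calls.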




