Force.com Query Optimizer FAQ
Knowledge Article Number: 000181277
Internal Query Optimizer implementation
Q: Does Salesforce have its own SQL optimization logic instead of Oracle's built-in?
A: Force.com is a multitenant platform. To consider multitenant statistics, the platform has its own SOQL query optimizer that determines optimal SQL to satisfy a specific tenant's request.
Q: How frequently generated are the statistics used by the query optimizer?
A: Statistics are gathered weekly. Where statistics are not calculated, the optimizer also performs dynamic pre-queries, whose results are cached for 1 hour.
Q: Is there any way to flush the cache when doing your performance testing so your results are not cache biased?
A: Unfortunately not. Queries with selective filters will perform more consistently with less performance variation due to caching, however.
Q: Does salesforce.com provide any archiving options?
A: The best option for archiving large amounts of org data is the Force.com Bulk API. Many tools provide nice UIs for this API, including many third-party tools in the AppExchange.
Q: What archiving strategies should be used?
A: Archiving strategies are often time-based. For example, archive all Opportunities closed for more than six months. You could use the Force.com Bulk API to query and copy this data to an external system, then use the Bulk API to delete the records you just archived.
Q: Will there be a different tool than the Developer Console be available in the next releases to make query-testing easier? If so, will it be able to advise on how to improve the performance of the query?
A: (Safe Harbor) We are considering enhancements that will help you profile the performance of queries, but keep in mind that providing suggestions on how to improve the performance is a very hard problem in computer science. i.e. showing that a query performs poorly is easy, but rewriting is hard.
Q: Can SFDC change to use ANSI standard SQL?
A: Force.com's query language, SOQL, is not ANSI standard SQL. It is, however, similar in many ways and should be easy to learn for most SQL-literate developers. If you really must use SQL with Force.com, consider third-party data access drivers (ODBC and JDBC), such as those from Progress Data Direct (http://www.datadirect.com/products/odbc/salesforce_crm_odbc_driver/index.html).
Q: Why is salesforce.com biased on threshold? I know there is some overhead, but most of the time customers are interested in custom indexes and not standard indexes.
A: The difference in thresholds between standard and custom indexes is driven by the database architecture and our multi-tenancy model. The Force.com Query Optimizer webinar (www.youtube.com/watch?v=mrOzUcycqZM) shows how you could leverage optimizations that could maximize gains of using custom indexes.
Q: What will happen if our query hits the threshold?
A: When the optimizer determines that a given filter condition exceeds the corresponding selectivity threshold for an index, the optimizer does not use the index because the overhead of using the index is not optimal.
Q: Will an index be used if there is null values in indexed field?
A: Starting with Winter '13, indexes can include nulls. All standard indexes automatically include nulls. If a filter condition specifically searches for nulls and the index supports nulls, then the optimizer will consider the index -- it can use the index if the number of nulls in the index is below the selectivity threshold.
Q: What happens with record visibility when we have the class without sharing? Would it search the complete database?
A: The optimizer considers record visibility when determining a query execution plan. If sharing rules do not limit the visibility of data for the user executing a query, then the optimizer can only consider other approaches for record selectivity (indexes, skinny tables, etc.).
Q: Do System Administrator queries get handled quicker than other profiles?
A: Profiles which have view all rights don't go through Sharing evaluation and SQLs run will be different compared to profiles with limited view of data. Depending on selectivity we drive from Sharing or selective filters, whichever is most efficient.
Q: Is there a limit to how many values you can have inside in IN() clause?
A: There is no limit on the number of values inside the IN clause. The only limit would be that you cannot exceed the max size of SOQL query (10,000 characters in Spring ‘13 and 20,000 characters in Summer ‘13).
Q: Do cross object formula fields have performance implications?
A: Several impact possibilities: We do not support indexes for formulae as detailed by the following Wiki article: http://blogs.developerforce.com/engineering/2013/02/force-com-soql-best-practices-nulls-and-formula-fields.html
Formulae are also expanded into more elaborate SQL and, based on their complexity, can throw a SQL query limit exception. They are generally slower to filter by when not indexed.
Q: If we need to display 1M rows by setting a Visualforce page as read only, could indexes play any part in such a scenario?
A: Within the controller / extension class, yes: any SOQL that retrieves records to be displayed is subject to the issues discussed in this article. Performance for those records to be rendered on the page is based on your Visualforce markup.
Q: What's the meaning of SOSL?
A: SOSL stands for Salesforce Object Search Language. It is the search language that you use to search for text across one or more objects. The language closely resembles the search language you use with Apache Lucene. Please see the following link for more information: http://www.salesforce.com/us/developer/docs/soql_sosl/index_Left.htm#StartTopic=Content/sforce_api_calls_sosl.htm
Q: Should we prefer SOQL over SOSL?
A: SOQL is Force.com's database query language, similar to SQL. SOSL is Force.com's full-text search language. If you need to retrieve less than 200 records and you are searching over text fields by using an OR clause, SOSL should be used instead. Please notice in Summer ‘13 SOSL will retrieve up to 2000 records. See the Quick Links here to read more about each language: http://wiki.developerforce.com/page/Documentation
Q: When should we decide if SOSL is going to perform better than SOQL?
A: Generally speaking, SOSL is optimal in these scenarios:
Searching, across multiple objects, text-oriented fields that are not indexed
When you need "fuzzy" text matching
When support of Chinese, Japanese, and Korean (CJK) languages is required.
Q: What is the requirement to exercise the benefit of standard index? What fields are part of standard index?
A: Various standard fields (e.g. Name, Owner, etc.) for both standard and custom objects have an index. If you use such fields in a filter condition, the optimizer considers using the index.
Q: When will custom indexes be considered by Salesforce?
A: Presence of indexes is checked by our query optimizer when the query is run. Custom indexing a field will improve performance only when the filter that uses this field is selective. If you need assistance with a particular query, please raise a case with Salesforce Support.
Q: Is there a way to know what fields are indexes?
A: Please check the "Database Query & Search Optimization Cheat Sheet" that you will find at http://help.salesforce.com/help/pdfs/en/salesforce_query_search_optimization_developer_cheatsheet.pdf.
At present the setup UI does not display custom indexes, but we are working to enhance the user interface in the near future to display which fields have an index (Safe Harbor).
Q: Is External Id a standard or custom index?
A: Fields marked as External Id are always custom indexed.
Q: Are all fields marked as unique indexed automatically?
A: That's right, Unique and ExternalId and foreign keys are automatically indexed whenever possible.
Q: Are there any plans to make datetime field "indexable"?
A: With the help of Salesforce Support, you can request an index for Date/Time fields.
Q: Are the contact emails and phone fields indexed or only if they are requested?
A: Contact Email field is standard indexed, but phone field is not indexed by default.
Q: Is there any limit on number of custom indexes?
A: Yes, the platform supports a limited number of indexes per object. Generally speaking though, it is best practice to put in place custom indexes only when it will help performance of queries, but not so many that it detracts from the performance of DML operations (INSERT, UPDATE, DELETE) and bulk data loads.
Q: Can custom indexes be packaged?
A: Custom indexes associated to fields marked as External Id will be packaged.
Q: Will the index roll over to customers installing the app? Lets say I built appexchange app and ask Customer Support to create an index. Will the indexes hold true in my customers org?
Creation of indexed fields
Q: How do you index a custom field which isn't an external ID? How do we create a custom index?
A: To create custom indexes on picklists, checkboxes, and formula fields you will need to contact Salesforce Support. You can add a custom index to text fields by marking them as External Id.
Q: Any plan to include the functionality of creating index as an option on the UI while the object fields are being created?
A: Right now, there is no plan to allow for creation of custom indexes that way. The overhead is what we like to keep in mind and have only necessary indexes in place. Adding indexes, evaluating them and then having to drop them is not ideal, hence we would like to analyze the impact of custom index first, before creation.
Q: Any plans to automate things like index or skinny table creation if trends are spotted in an organizations queries?
A: Internally, the optimizer does keep track of candidates for custom indexes, and automatically creates one-column indexes that it determines will help selectivity. Skinny tables are created only after working with Salesforce Support. This blog post talks about the considerations needed before creating skinny tables: http://blogs.developerforce.com/engineering/2013/03/long-and-short-term-approaches-for-tuning-force-com-performance.html
Q: How long does it take for Support to index a field once we make a case?
A: Salesforce Support will consider each scenario on a case by case basis and will suggest what approach to take, as indexing a field will not necessarily make the query perform better if the filter that uses the indexed field is not selective.
Q: How long does it take for the index to "kick in" after I mark a field as an external ID?
A: Once you mark a field as external ID, Force.com queues the creation of the index. The amount of time to create the index depends on how much data is already in the object. If the object is empty, then the index is created very quickly. If the object has millions of records, it might take a few minutes for the index to be built in the background.
Optimizing SOQL queries
Q: Beside using SOQL directly in for loop for large data set, what else can be done to optimize it?
A: Using SOQL for loops makes the code within the for loop be executed once per 200 records, but it does not actually improve the performance of the query.
Q: Does converting date literals to specific date/datetimes improve performance?
A: This should make no performance improvement.
Q: Is using nested queries good practice?
A: Nested queries in a SOQL would face the same rules as any other SOQL when it comes to making it perform well. Adding selective filters, limiting the number of records retrieved, and removing redundant joins to objects will help improve performance. Use of nested queries should be driven from the use case.
Q: Will NOT IN work the same as IN when it comes to optimization?
A: The optimizer interprets "field NOT IN (value, value, value)" as "field != value AND field != value AND field != value". Notice at present only != null and != on booleans are optimizable.
Q: What is the best way to write a "not in" SOQL query, so that the query optimizer will use an index?
A: These are some ideas:
1) Switch to an IN operator if the number of values is doable and the union of them would be selective
2) Add another selective filter to your query.
Q: Let's say we have 30 values out of 32 possible values in our IN clause. Is having all that in the IN Clause going to be better than using a not equal to?
A: In this scenario, field != value1 and field != value2 is not optimizable. However if the result of field IN (value1, value2, …, value30) is selective (below threshold), the index will be considered.
Q: Is ORDER BY Id DESC without a LIMIT going to make a query selective?
A: An ORDER BY clause doesn't have anything to do with selectivity. Selectivity is determined by available indexes that align with filter conditions (WHERE clause) and record visibility (sharing rules, etc.). Once the optimizer determines which rows to return, it applies the ORDER BY logic to sort the records in the return set.
Q: We include an "ORDER BY Name" statement in an important SOQL statement that has performance issues at times. We could probably get away with not doing this. Is this practice good or bad or does it not impact much?
A: ORDER BY without LIMIT clause is not going to be optimizable. However, ORDER BY + LIMIT on an indexed field (standard or custom) can actually be a very useful optimization (i.e. a page displaying the first 10 rows of a potentially large, 1000+, list of records).
Q: What about dynamic SOQL? Is there any optimization of repeated invocations of a dynamic SELECT with the same fields retrieved and in the WHERE, but different lookup values in the WHERE?
A: The optimization will be determined based on each individual value provided per invocation.
Q: How do Divisions help? What are the best practices for including Divisions on filters.
A: Divisions is a feature that you can use to physically partition data. Divisions makes sense when you query data according to the partition key (the field that determines in which division to place a record). For more information, see https://help.salesforce.com/HTViewHelpDoc?id=admin_division.htm&language=en_US
Q: What approach should I follow to test selectivity?
A: A sample analysis can be seen here: http://www.salesforce.com/us/developer/docs/apexcode/Content/langCon_apex_SOQL_VLSQ.htm
Q: In a batch process I want to query one million records from an object through an indexed field. It works if the object has 7 million records, but if the object has more than 10 million records, I get a timeout from the db and the batch won't work. What can i do?
A: Take a look at this blog about large data volumes and batch apex. http://blogs.developerforce.com/engineering/2013/02/force-com-batch-apex-and-large-data-volumes.html
Q: I have a query (with multiple subqueries) that takes up to 40 seconds to execute the very first time its executes, regardless of the # of records returned. Any further query are quick. Is this possibly due to the fact the query is not known at compile time so it can't be optimized? I could try to rewrite the query with an explicit query. Or would it be better to break out subqueries into separate transactions?
A: Runtime varies as once most of the rows are in the db cache, the fetching time will decrease considerably. If the subqueries join large objects, you may want to split the query into multiple transactions. Additionally query cursor locators are something to consider when using subqueries.
Q: How can I optimize a query like the one below?
SELECT id, Email, Phone, MobilePhone, HomePhone, OtherPhone, Owner.profile.name, Account.IsPersonAccount
Email IN :emails
OR Phone IN :phones
OR MobilePhone IN :phones
OR HomePhone IN :phones
OR OtherPhone IN :phones
A: If the above query is within threshold (OR clause returns less than 10% of the records), the query will be optimized, but if it would return most of the Contact records, then it won’t be optimized and SOSL should be considered instead.
Q: What are skinny tables?
A: A skinny table is a custom table in the Force.com platform that contains a subset of fields from a standard or custom base Salesforce object. For more information, please read http://blogs.developerforce.com/engineering/2013/03/long-and-short-term-approaches-for-tuning-force-com-performance.html
Q: What is the storage cost for skinny tables ?
A: The storage consumed by a skinny table does not count toward your org's storage.
Q: What is the validity of Skinny table? Does it expire after enabling?
A: Skinny tables do not expire, and are automatically updated by the system as data is uploaded.
Q: If I create a new field will it automatically be added to the Skinny table?
A: Although data for existing fields in the Skinny table are automatically updated, the skinny table will need to be recreated if a new field has been created.
Q: Do we have to ask salesforce.com Support to enables Skinny tables ?
A: Skinny tables are created by our DB Performance Engineering team upon analysing each specific case, if it would improve the performance of reports, list views, or SOQL queries.
Q: Would we get an option for creating skinny tables or indexes by ourselves in future releases?
A: It's not planned at this time. We want to analyse the use case to make sure that skinny tables will actually help it.
Q: Do skinny tables count against the data storage limit?
A: No, they don't count against your org's data storage.
Q: What does a customer see when a skinny table exists? table name, etc? Is there any artifact visible in the web UI, or customers need to track which skinny tables exist for their orgs?
A: There's nothing visible to the customer when a skinny table exists. It's entirely transparent.
Q: Can skinny tables be created to fetch columns from parent-detail objects?
A: No. You may want to consider denormalizing the data using, for instance, a trigger to copy the value from the parent record to the detail record.
Q: Can a skinny table be created on the Task object?
A: No, they cannot be created on Activites (Tasks or Events).
Q: Can skinny tables be created on custom objects?
A: Yes, skinny tables can be created on custom objects.
Q: Where can I find more information on skinny tables?