Compare Data Federation Methods
Data can be federated through live queries, accelerated queries against a local data cache, or file federation. Each option supports different use cases and differs in performance, data freshness, and credit consumption.
Understanding the differences between the three types of data federation helps you select the right method for your use case.
Live Query Federation
Live queries run in real time and return fresh results from the data source.
- Use Cases: Live query data federation is best for interactive analysis and real-time dashboards, and it's optimized for infrequent queries. It also supports real-time personalization and dynamic workflows.
- Data Source Location: External data lakehouses
- Data Freshness: Queries run in real time. Overall data freshness depends on how often the source data is updated. Live query data federation supports sub-second decisioning.
- Performance Drivers: Highly dependent on the external source system's performance. Optimized when predicates and aggregations can be pushed down to the source.
- Credit Consumption Considerations:
  - Credits are consumed per query based on the number of rows accessed. Credits are primarily consumed in the Data Federation and Data Shared Accessed usage type.
  - Live queries can be cost effective for infrequent queries. However, frequent queries on infrequently changing data can result in repeatedly consuming credits to query the same data, making this option more expensive than accelerated queries.
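The pushdown point under Performance Drivers can be illustrated with a minimal sketch. It uses an in-memory SQLite database as a stand-in for an external source system, which is purely an assumption for illustration and not how Data 360 connects to sources. The table, column names, and helper functions are hypothetical. The sketch counts rows returned to the caller, not credits.

```python
import sqlite3

# In-memory SQLite stands in for an external source system (illustrative
# assumption only). Table and column names are hypothetical.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "EMEA" if i % 4 == 0 else "AMER", float(i)) for i in range(1000)],
)


def total_without_pushdown(conn, region):
    """Fetch every row, then filter and aggregate on the caller's side."""
    rows = conn.execute("SELECT region, amount FROM orders").fetchall()
    total = sum(amount for row_region, amount in rows if row_region == region)
    return total, len(rows)


def total_with_pushdown(conn, region):
    """Let the source apply the predicate and the aggregation."""
    rows = conn.execute(
        "SELECT SUM(amount) FROM orders WHERE region = ?", (region,)
    ).fetchall()
    return rows[0][0], len(rows)


no_pushdown_total, no_pushdown_rows = total_without_pushdown(source, "EMEA")
pushdown_total, pushdown_rows = total_with_pushdown(source, "EMEA")

# Same answer either way; only the number of rows returned differs.
print(no_pushdown_total, no_pushdown_rows)
print(pushdown_total, pushdown_rows)
```

Both functions compute the same total. The difference is how much work the source does and how many rows travel back to the caller, which is why queries whose filters and aggregations can run at the source tend to perform better.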
Accelerated Query Federation
When acceleration is enabled for query federation, Data 360 maintains a local data cache that is updated at set intervals. Accelerated queries access the local data cache instead of querying the data source directly.
- Use Cases: Accelerated data federation is best when queries are frequent and data doesn’t change frequently. Enabling acceleration improves performance for frequent access patterns by keeping a local cache that’s updated on scheduled intervals. Accelerated data federation is suitable for dashboards and segmentation. If your data changes more frequently than your cache is updated, acceleration isn’t suitable for sub-second decisions.
- Data Source Location: External data lake
- Data Freshness: Freshness depends on the selected cache interval, which is configurable from 15 minutes to 7 days.
- Performance Drivers: Reduces latency compared to repeated live queries.
- Credit Consumption Considerations:
  - Credits are consumed per row of data added to or updated in the cache. Credits are consumed in the Batch Data Pipeline usage type when the cache is updated.
  - Credits are also consumed for data queries each time the cache is queried.
  - The accelerated query cache can contribute to usage of your org’s Data 360 storage.
  - Credit consumption for accelerated queries is similar to credit consumption for ingesting the same amount of data via batch data ingestion.
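To reason about the trade-off between live and accelerated queries, it can help to estimate row volumes for each. The following is a rough sketch, not a credit calculator: the function names and the workload numbers are hypothetical, no credit rates are modeled (the two paths are billed under different usage types), and the query usage incurred when reading the cache is left out. The only values taken from this page are the cache interval bounds of 15 minutes and 7 days.

```python
from datetime import timedelta

# Cache interval bounds stated on this page: 15 minutes to 7 days.
MIN_INTERVAL = timedelta(minutes=15)
MAX_INTERVAL = timedelta(days=7)


def refreshes_in(period, interval):
    """Number of whole cache refreshes that fit in the period."""
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError("cache interval must be between 15 minutes and 7 days")
    return period // interval


def live_rows_accessed(queries, rows_per_query):
    """Rows accessed when every query goes to the source (hypothetical model)."""
    return queries * rows_per_query


def accelerated_rows_written(initial_rows, changed_rows_per_refresh, refreshes):
    """Rows added to or updated in the cache (hypothetical model)."""
    return initial_rows + changed_rows_per_refresh * refreshes


# Hypothetical workload: 100 queries a day for 30 days, each touching
# 10,000 rows, against a 1,000,000-row table where 5,000 rows change daily.
period = timedelta(days=30)
refreshes = refreshes_in(period, timedelta(days=1))
live = live_rows_accessed(queries=100 * 30, rows_per_query=10_000)
accelerated = accelerated_rows_written(1_000_000, 5_000, refreshes)
print(live, accelerated)
```

Row counts alone don't determine cost, but a large gap between the two numbers is a signal that frequent queries on slowly changing data may be better served by acceleration.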
File Federation
In file federation, Data 360 uses metadata from the data source to build virtual tables, which are used to access and read files in the data source.
- Use Cases: File federation is best for large-scale batch processing and for AI and machine learning model training. File federation is ideal for historical analytics and petabyte-scale reporting. It isn’t suitable for real-time dashboarding.
- Data Source Location: Cloud object stores or cloud data lakes
- Data Freshness: Freshness depends on how often the source files are updated.
- Performance Drivers: Performance depends heavily on object format, partitioning, and external system throughput. Use partitioned, columnar formats (Parquet).
- Credit Consumption Considerations:
  - When your AWS setup is in the same region as your Data 360 tenant, credits aren’t consumed for rows accessed. For example, if your data is in S3 in US-East-1 and your Data 360 tenant is also in US-East-1, credits aren’t consumed for rows that Data 360 accesses from your data source.
  - If your AWS setup is in a different region than your Data 360 tenant, or if you use a different cloud, such as Azure, you consume credits for rows accessed.
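The region rule for file federation can be expressed as a small check. This is a minimal sketch of the rule as described above. The function name, parameter names, and string values are hypothetical and don't correspond to any Data 360 API.

```python
def consumes_rows_accessed_credits(source_cloud, source_region, tenant_region):
    """Return True if rows accessed would consume credits under the rule above.

    Hypothetical helper: credits are skipped only when the source is on AWS
    and in the same region as the Data 360 tenant.
    """
    same_region_aws = (
        source_cloud.strip().lower() == "aws"
        and source_region.strip().lower() == tenant_region.strip().lower()
    )
    return not same_region_aws


# S3 in US-East-1 with a tenant in US-East-1: no credits for rows accessed.
print(consumes_rows_accessed_credits("AWS", "US-East-1", "us-east-1"))
# Different AWS region, or a different cloud such as Azure: credits apply.
print(consumes_rows_accessed_credits("AWS", "us-west-2", "us-east-1"))
print(consumes_rows_accessed_credits("Azure", "eastus", "us-east-1"))
```

Comparing regions case-insensitively is a design choice in this sketch so that "US-East-1" and "us-east-1" are treated as the same region.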

