Data Stream Settings and Refresh Modes

With data streams, you can schedule or run jobs to ingest data from a source system. These jobs pull data from the source system into the connected data stream. The available ingestion methods and frequencies depend on the type of connector. Use the data stream settings to create or change the execution parameters of a data stream.

Required Editions

Available in: Lightning Edition

Refresh Modes

The Refresh Mode settings determine how data is ingested into a data lake object (DLO). Choosing the right mode keeps data fresh and reduces cost, because each refresh processes only as much data as it needs to.

Setting	Description
Incremental	This mode is available for most non-file-based data streams that have a Datetime field. Select the field that indicates when the incoming record was last modified. For each refresh, the data stream is updated with data added or changed since the last refresh. To make sure that there’s no data loss, the last few records from the previous run are reprocessed during each job run. To remove records from an incrementally updated DLO, mark deleted records in your source system with a Delete Record Flag. For example, `record_delete_flag = true`. The DLO then removes these records during the next update. You can permanently delete them from your source system after a successful update run. Incremental refresh is also supported in Zero Copy Data Federation, but deleting records during incremental refresh operations isn't supported there. For more information, see Caching or Acceleration in Data Federation.
Upsert	This mode is available for data streams created from file-based connectors. For each refresh, the data stream is updated with data added or changed since the last refresh. When the refresh mode is set to upsert, you can efficiently delete records from a DLO using a file. See Delete Records by Using a Delete File.
Full Refresh	During each refresh cycle, all existing data is deleted and replaced with the newly imported dataset. If no other refresh setting is available, full refresh is the default refresh mode.
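
The incremental behavior described in the table can be pictured with a small simulation. This is a minimal sketch, not how the product is implemented: the function name `incremental_refresh`, the dictionary-based DLO, the `id` and `last_modified` field names, and the five-minute lookback window (standing in for "the last few records from the previous run") are all assumptions made for illustration. Only `record_delete_flag` comes from the text above.

```python
from datetime import datetime, timedelta

def incremental_refresh(dlo, source_rows, last_run, lookback=timedelta(minutes=5)):
    """Apply one simulated incremental refresh to `dlo`, a dict keyed by record id."""
    # Reprocess a small window before the previous run so that rows written
    # around the time of that run are not missed (assumed lookback size).
    cutoff = last_run - lookback
    for row in source_rows:
        if row["last_modified"] < cutoff:
            continue  # not modified since the previous refresh
        if row.get("record_delete_flag"):
            dlo.pop(row["id"], None)  # flagged rows are removed from the DLO
        else:
            dlo[row["id"]] = row  # new or changed rows replace the old copy
    return dlo
```

Because deletions are only detected through the flag, a row that is physically deleted at the source before a refresh has seen its flag would stay in the simulated DLO, which is why the flagged row should be kept until a refresh has run successfully.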

Conditional Settings

Setting	Description
Filter	Use a filter to extract only the records that fulfill the specified condition. The syntax of the condition depends on the source system. For example, on a source that uses SQL-92 syntax, the filter could be `s_nationkey = 10 and s_acctbal > 2000`.
Supported Operators	For database sources, see the Supported Operators, Functions, and Keywords section.

File Settings

These settings are specific to data streams that use file-based connectors.

Setting	Description
Log an error if no file is found	When enabled and a file isn’t found during the data stream run, the line item in the refresh history for that run shows a failure status. A notification is triggered if you subscribe to data stream failure notifications. For more information, see Data Cloud: Control Event Notifications Using Flows. When this setting isn’t enabled and a file isn’t found during the data stream run, the refresh history shows 0 records but with a success status.
Is headerless file retrieval enabled	When set, Data 360 accepts the data records without headers. Don’t turn on this setting while you’re creating the data stream because the header information is necessary for the initial run when the schema is being generated. You can enable the setting after the initial run. To avoid data corruption, maintain the order of fields in the incoming data records.
Refresh only new files	When set, all files are retrieved when the data stream is first run. For subsequent runs, only new files are retrieved.
Refresh initial file immediately	When set, Data 360 doesn't wait until the scheduled run and retrieves the initial file immediately.
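
How two of these settings interact can be sketched in a few lines. This is an illustrative model only: `plan_run` and its parameters are hypothetical names, and the real service tracks files and statuses internally. It assumes that "no file found" means the source location returned no files at all.

```python
def plan_run(available, already_ingested, refresh_only_new, log_error_if_no_file):
    """Return (status, files_to_read) for one simulated run of a file-based stream."""
    if not available:
        # No file at the source: the run reads 0 records either way; the setting
        # only decides whether the refresh history shows failure or success.
        status = "failure" if log_error_if_no_file else "success"
        return status, []
    if refresh_only_new:
        # On the first run nothing has been ingested yet, so every file is read.
        to_read = [f for f in available if f not in already_ingested]
    else:
        to_read = list(available)
    return "success", to_read
```

For example, `plan_run(["a.csv", "b.csv"], {"a.csv"}, True, False)` returns a success status with only `b.csv` left to read.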

Supported Operators, Functions, and Keywords

For database sources, the WHERE clause supports the following:

Supported Operators

Comparison: <, >, <=, >=, =, <>
Pattern matching: LIKE, RLIKE
Null checks: IS NULL, IS NOT NULL
Set membership: IN, NOT IN
Range evaluation: BETWEEN
Uniqueness: DISTINCT

Supported Functions and Keywords

The following SQL functions and expressions are allowed within the WHERE clause:

CEIL
FLOOR
CAST
TRIM
TIMESTAMP
DATE

Not Supported in the WHERE Clause

The following constructs are not supported and will result in an error if used:

DML operations: INSERT, UPDATE, DELETE
Complex query structures: JOIN, UNION, INTERSECT, and subqueries
Control characters: semicolons (;)
Any other SQL functions or keywords not explicitly listed in the supported sections above
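
Before saving a filter, it can help to check it against the unsupported constructs listed above. The following is a rough pre-check, not the validation that the product performs: `find_filter_problems` is a hypothetical helper, the keyword `SELECT` is used here as a stand-in for detecting subqueries, and ignoring the contents of quoted string literals is an assumption.

```python
import re

# Constructs listed as unsupported; SELECT is an assumed proxy for subqueries.
UNSUPPORTED = ("INSERT", "UPDATE", "DELETE", "JOIN", "UNION", "INTERSECT", "SELECT")

def find_filter_problems(condition: str) -> list[str]:
    """Return the unsupported constructs found in a WHERE-clause condition."""
    problems = []
    # Blank out quoted string literals so keywords inside values are not flagged.
    bare = re.sub(r"'(?:[^']|'')*'", "''", condition)
    if ";" in bare:
        problems.append("semicolon")
    for word in UNSUPPORTED:
        # Match whole words only, in any letter case.
        if re.search(rf"\b{word}\b", bare, flags=re.IGNORECASE):
            problems.append(word)
    return problems
```

An empty list means only that none of these specific constructs were found; it does not confirm that every function in the condition is on the supported list.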

Schedule

The frequency setting describes the time interval between executions of your data stream job. Based on connector capabilities, supported schedule intervals can vary.

Setting	Description
Frequency	Choose how often the data stream refreshes. The available refresh frequencies are: hourly; daily at a specified hour; weekly on a specified day and time; monthly on a specified day and time. All frequency times are in UTC and apply to both full and incremental refreshes. AWS S3, Azure Blob Storage, and Google Cloud Storage data streams have 5-, 15-, or 30-minute intervals. These connectors can synchronize data and deduplicate jobs to provide these narrow intervals.
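
Because all schedule times are in UTC, it is easy to misjudge when a job next runs from a local clock. The sketch below works out the next run time for the hourly, daily, and weekly options. It is an approximation for planning purposes: `next_run` is a hypothetical helper, running hourly jobs at the top of the hour is an assumption, and the monthly option is left out for brevity.

```python
from datetime import datetime, timedelta, timezone

def next_run(now, frequency, hour=0, weekday=0):
    """Return the next run time in UTC. `now` must be timezone-aware; Monday is weekday 0."""
    now = now.astimezone(timezone.utc)  # schedules are evaluated in UTC
    if frequency == "hourly":
        # Assumed: hourly jobs start at the top of the next hour.
        return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if frequency == "daily":
        step = timedelta(days=1)
    elif frequency == "weekly":
        # Move forward to the requested weekday within the current week.
        candidate += timedelta(days=(weekday - candidate.weekday()) % 7)
        step = timedelta(days=7)
    else:
        raise ValueError(f"unsupported frequency: {frequency}")
    # If today's slot has already passed, use the next one.
    return candidate if candidate > now else candidate + step
```

Passing a timezone-aware local time, such as 11:30 at UTC+02:00, is converted to 09:30 UTC before the schedule is evaluated.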
