Incremental Load in an Ingest Instance
Note: This article is intended for database administrators. Ensure the following tasks are completed by someone with the appropriate database administration privileges.
This article describes how to set up incremental loading for an Ingest Instance. See Incremental load in Prepare Instances for more information on setting up incremental loading in a data area within a Prepare instance.
Set up incremental loading in an Ingest instance
In Ingest instances, the incremental load setup is based on rules that are set up to apply for specific columns. When there is a "hit", incremental load can be applied, only loading data where the value is larger than the maximum value for the previous load.
Before you set up these rules, we recommend that you set up the primary keys on the tables that will be incrementally loaded. See Set up Primary Key Rules on an Ingest Instance Data Source for more information.
To add an incremental load setup rule, follow the steps below.
- Right-click the data source and click Set Up Incremental Load. The Incremental Load window opens.
- Click Add… to open the Add Incremental Load Setup Rule window.
- In the Add Incremental Load Setup Rule window, enter the Schema, Table, and Column that contains the value the incremental load should be based on. For each item, you can choose different comparison options: contains, equals, not contains, not equals, and case-insensitive variants of each. This makes it easy to set up a rule that applies across the data source if all tables contain the same useful incremental column, e.g. "ModifiedDate".
- Subtract from value: This option allows subtraction from the field to which the rule applies. This feature is useful for data sources where the modified date is a Date field instead of a DateTime field. Additionally, it can be used for data sources with a created date but no modified date, where incremental loading would improve performance. In such cases, incremental loading based on the created date with an offset allows updates to occur on rows recently created, provided the changes happen within the defined interval. The Amount specifies the decrement to apply, which could be on a timestamp or similar field. Time defines the units—Seconds, Minutes, Hours, Days, Weeks, or Years—that will be subtracted from the last added DateTime.
- If rows can be updated or deleted in the source table(s) you are loading from, the settings under Additional actions can keep the tables up to date without regular full loads. Select Handle primary key updates to update rows in the Ingest storage when a row with the same primary key is loaded from the data source. Select Handle primary key deletes to remove rows from the Ingest storage that have been removed from the data source. When you enable the Handle primary key deletes feature, a primary key (PK) table or folder will be created where the primary keys are stored for comparison to the source for deletes.
- Click OK.
The rules are applied on execution. To see what tables will be affected, click Refresh and review the bottom list in the Incremental Load window.
If you have enabled Handle primary key updates/deletes, Jet Analytics Data Integration will check if the affected tables have a primary key set. If not, the tables that are missing primary keys will be listed in the Incremental Load Primary Key Validation window. Select the primary keys for these tables, then click OK.
Set up incremental load for REST data sources
Performing an incremental load on a REST data source is different from doing so on a traditional database. It is not as simple as appending a general WHERE-statement as you could do with a database data source. Each REST data source has its own requirements and behavior.
In order to support incremental load on any REST data source, a mechanism is needed to create the HTTP request that the data source requires. For the Jet AnalyticsREST data source, this is accomplished by using dynamic values, which allow the last maximum value to be placed at any point in the request that is needed.
The REST API flavor of incremental load does not support handling of updates and deletes, so periodic full loads may be required to keep data completely in sync with the data source.
A quick refresher on dynamic values: Dynamic values are placeholders that are replaced at runtime with values from different sources (another endpoint, a SQL query, etc.). These values can be injected into several places, for instance in the path, headers, query parameters, post body, and even the table builder.
Setting up a REST data source for incremental load is a two-step process. The first step is to select which columns to use for incremental load inJet Analytics Data Integration, and the second step is to use them in the data source configuration in the Jet Analytics Portal as dynamic values.
Step 1: Selecting columns in Jet Analytics Data Integration
To select the columns, follow the steps below.
- Right-click the data source and click Set Up Incremental Load. The Incremental Load window opens.
- Click Add to add a new incremental value column. Unlike the regular functionality, these columns are explicitly selected for each table, not based on rules.
- (1) is the endpoint name.
- (2) is the table name.
- (3) are the available columns. Only columns that can be used for incremental load will be listed.
- Under Incremental load rule, set the options for the selected column:
- Subtract from value: A predefined constant will be subtracted from the last maximum value. For a column with the DateTime data type, this is a time value and the granularity can be changed in the right-hand dropdown.
- Dynamic value name: The dynamic value that must be used in the data source configuration in order to use the last maximum value. This is the core of how incremental load is driven for REST data sources. In the Jet Analytics Portal, this will be the placeholder for the dynamic value and should be written in brackets, e.g. "{TXINC_StandardDataTypeResponse.DateTimeData}". If a subtract value is set, the subtracted value is what will be used in the dynamic value. The name can be changed, but if it is changed it must also be updated in the Jet Analytics Portal wherever it is used.
- Full load value: The value that will be used for the dynamic value when a full load or import metadata is executed. In an incremental load setup for a database data source, performing a full load is straightforward—simply remove the WHERE-statement from the query. When doing incremental load on a REST data source, a value must be included in the query since sending an empty value would not work.
-
Date time format: How to format the datetime when using it in the data source. Only displayed if the column data type is DateTime. Choose a format from the list or enter your own. In addition to datetime-like formats, unix_timestamp and unix_timestamp_ms are available for REST data sources that require them.
- Decimal separator: What to use for the decimal separator. Only displayed if the data type supports it.
- Click Use target-based incremental load to enable a form of incremental load for data sources that do not support real source-based incremental load. With this option enabled, dynamic values are not used. Instead, all data will be loaded from the data source, but only new data will be inserted into the Ingest storage. For this reason, performance improvements are usually limited compared to a regular full load, and this functionality is provided as a backup option only for when the latest data cannot be loaded directly from the data source. Show advanced features must be enabled for this option to be visible.
- Click OK to save your changes.
Step 2: Setup in the Jet Analytics Portal
The changes required in the Jet Analytics Portal will depend on the data source being connected to. Refer to the data source documentation on how to extract only the required data. In the example below, the data source requires adding a query parameter that instructs the REST data source to start from a specific date and retrieve only data that is newer.
Enabling incremental load on a task
A transfer task will use incremental load if Use incremental load when available is enabled and incremental load rules are set up.