Incremental load in Prepare Instances

Note: This article is intended for database administrators. Ensure the following tasks are completed by someone with the appropriate database administration privileges.

Note: Relates to Jet Analytics Data Integration 6024.1 and later versions.

Incremental Load Rules in a Data Area

Tables within a Data Area can have multiple mappings, with each mapping possessing its own incremental load rules, rather than a single overarching rule. This allows for flexibility: one mapped table can run a full load, another can run incrementally with deletes, and a third can run incrementally with updates.

Incremental loading operates by using the max batch number from the incremental table and the source table, comparing it with the Primary Key table (the _PK table) to identify updates, deletes, or new rows. Therefore, it is recommended to configure the primary key field for the table when setting it up for incremental loading.

To set up incremental loading for a table in a Data Area, incremental loading must first be configured in the data source within the Ingest instance to which the table is mapped. For detailed instructions, refer to Incremental Load in an Ingest Instance.

Setting up incremental load on tables

Mapping a table from an Ingest source to a Data Area table

When you map a new table from an Ingest instance, the data extraction setting for the table will be set to Automatic. If the table has an incremental load rule applied in the source, the table will load incrementally. This will be represented by an "I" icon for the mapping in the mappings folder under the table.

Below is an example of a table that is brought in from the Ingest to the Prepare instance, where the mapped table has an incremental load rule applied in the Ingest data source.

It is possible to enable soft or hard deletes by right-clicking on the table and selecting Table Settings, then selecting the Data Extraction tab. There is no need to change the data extraction setting from Automatic to Incremental Load, since the table is already using incremental load automatically based on the incremental mapping; however, you can change it to incremental load if you wish.

If there is no incremental load rule applied in the source, the mapping for the table will not display an "I" icon, and a full load will occur when the table is executed. See an example of this below.

Data Extraction is still set to Automatic by default, and since there is no incremental rule applied in the source and the table mapping is not incremental, the table will automatically be fully loaded. Therefore, there is no requirement to change the data extraction setting from Automatic to Full Load; however, you can change it to Full Load if you wish.

If you want to force a table that has an incremental load mapping to always be fully loaded instead, this can be done by changing the setting to Full Load. Note that this will gray out the delete options, as they will no longer apply.

Mapping multiple tables to a Data Area table from multiple Ingest tables

Incremental rules will be set on each of the mapped tables individually, so it is possible to have one mapped table that is being loaded incrementally and another mapped table that is fully loaded. The Automatic setting is appropriate in these instances.

Mapping tables from another Data Area

If you are mapping a table to a table from another Data Area, and the source table has an incremental rule set up in the Ingest Instance, we recommend that you use the Automate feature to base the incremental rule on the auto-suggested IncrementalTimeStamp field. This IncrementalTimeStamp field is mapped to the DW_Timestamp field of the source table. However, if the source table does not have an incremental rule set up in the Ingest Instance, we recommend that you set up the incremental rule on a date field (e.g. a modified date field).

When the rule has been applied, the incremental load table will contain the incremental rule.

This will require that the data extraction settings are set to Incremental Load and not Automatic.

If you are mapping more than one table that is being incrementally loaded, it is possible to see that there is more than one incremental rule when reviewing the incremental table, as shown below.

If a table has a mapping from an Ingest instance and another mapping from a table in another Prepare instance, it will not be possible to use incremental load. This is because one uses the DW_TimeStamp field and the other uses an Ingest Batch number. You will need to split them into two tables and then merge them at another point.

The incremental table

The _I table has seven fields.

Last id field
- This is the max DW_Id of the table before the latest execution.
Mapping id field
- This is the GUID value of the mapping of the two tables. It is used to ensure the correct table is always referenced.
Status field
- If you open a table during execution and view the _I table, this will contain a null value; after a successful execution it will be set to OK.
Create time field
- This is the value that is applied as the incremental value when data has been added for each table.
Ingest Batch Number field
- This is the reference field in the _R table that is used to determine the largest batch.
Ingest Batch Timestamp field
- This is the date of creation of the latest full load folder or table, depending on the store.
- Data Lake:
- SQL:
Ingest Batch Number Full Load field

Below is how the incremental table looks when two tables are mapped.

As shown here, the table has been executed at least twice but has not received new data from the source. That is why the Ingest fields contain null values.

The raw table

Two fields are added to a raw Data Area table when adding data from an Ingest instance to a Prepare instance.

Ingest Batch Number field
- The reference batch number from the incremental table.
Ingest Batch Id field
- The id of the batches. This is necessary because you can map more than one table, each having a batch 0, and you would not be able to differentiate between them otherwise.

Below is how the raw table looks when there is new data.

Relating that to the incremental table, the batch numbers will contain the same values.

Below is the reference in the data lake. Each file has a higher number, which is the batch number referenced in the other tables.

Below is the reference in an SQL store. It is a field containing the batch number.

Changes to the valid table

The only change is that since a specific field in the source is not used to generate an incremental field, that field is not present in the valid table.

Deletes

As with the other parts, it is also possible to set up deletes on Data Area tables.

Deletes work by comparing new rows with the primary key and then either deleting the row or changing the Is Tombstone field value from False to True.

You can see how this is implemented in the code by looking at the data cleansing procedure. Below is the procedure for the Country table using hard deletes.

-- Incremental load: primary key hard deletes

IF EXISTS

(

SELECT TOP 1 1

FROM [NAV].[Country_I]

)

BEGIN

DELETE V

FROM [NAV].[Country] V

WHERE

NOT EXISTS

(

SELECT TOP 1 1

FROM [NAV].[Country_PK] P

WHERE

P.[Company] = V.[Company]

AND P.[Code] = V.[Code]

)

AND NOT EXISTS

(

SELECT TOP 1 1

FROM [NAV].[Country_R] R

WHERE

R.[Company] = V.[Company]

AND R.[Code] = V.[Code]

)

END

You can also see how updates and soft deletes are applied by reviewing the data cleansing procedure in the same way.

Incremental load in Prepare Instances

Incremental Load Rules in a Data Area

Setting up incremental load on tables

Mapping a table from an Ingest source to a Data Area table

Mapping multiple tables to a Data Area table from multiple Ingest tables

Mapping tables from another Data Area

The incremental table

The raw table

Changes to the valid table

Deletes

Was this article helpful?