Skip to main content

Use History to implement Slowly Changing Dimensions

Note: This article is intended for database administrators. Ensure the following tasks are completed by someone with the appropriate database administration privileges.

This article clarifies the concepts of slowly changing dimensions and explains how to implement a History table in Jet Analytics Data Integration.

What is SCD?

A Slowly Changing Dimension or SCD is a complex data warehouse pattern that maintains historical data for reporting even when that data is no longer available in the source.

Consider this example scenario: You report on customer shipments. Unfortunately, your data source only supports a primary address for each customer and previous versions are not available. Subsequently, a customer moves to a new location and the account manager updates the customer address in the source system. The Prepare Instance is updated, but now the historical shipping records point to the row in the customer table that has the new address, instead of the original address they were shipped to.

By implementing a slowly changing dimension, you can maintain the historical dimension data. There are several different types of slowly changing dimensions depending on how you want to store and display the data. Jet Analytics Data Integration allows you to specify these types uniquely for each field in the table.

SCD Types

Type 0 - Retain Original: Fields with this setting will retain the original value appearing in the table. These are dimensional attributes that should never change, such as birth date or original credit score.

Type I - Overwrite: This is the default option for all History table fields and will update values with the most recent source data. When enabled, the valid table will not truncate during execution, so deleted source records are kept, the natural key will remain unique, and DW_Id values will always stay aligned with the natural key. However, this option does not provide any insight into historical field values.

Type II - Add New Row: In order to preserve historical changes, this method will identify when the given field values have changed, maintain the original values, and create additional records with the new values. In this case, the natural key may be listed several times for each change to the source entry.

History Fields

In order to help keep track of the different versions of data in the table, Jet Analytics Data Integration auto-generates several system fields. To see these fields, you may need to enable Show System Fields in the application's View menu.

  • DW_ID is a unique identifier for each row and should be used as a dependable surrogate key for SCD Type II tables.
  • SCD From DateTime is the earliest date the record was valid.
  • SCD To DateTime is the latest date the record was valid.
  • SCD Is Current is set to 1 for rows that currently align with source data and 0 for all other historical rows.
  • SCD Is TombStone is set to 1 if the record was removed from the source. To see this field, it must be enabled in table settings.

Example

Below is a [Customer] table with the [Id] field set as a Natural Key and [City] and [State] as Type II. Here is how the data is handled when customer "ABC Solutions" moves to Austin, Texas:

As you can see, both records still maintain the natural key of "1", however there are now two records for this customer. The first shows their original City and State of "Portland, Oregon" and the second entry shows their new location. A new DW_Id value was generated to ensure this value remains unique. You can also see that SCD From DateTime, SCD To DateTime, and SCD Is Current have been updated to reflect when the records became valid.

Instructions

The following steps show how to enable History and configure SCD.

Enable History

  1. Ensure that the table has a Primary Key, typically the Natural Key as described above.

  2. Right-click on the table and select Table Settings, then select the History tab.
  3. Check the Enable History box.

  4. Select the Data Extraction tab and disable Truncate valid table before data cleansing.

  5. Click OK to apply and close Table Settings.

Configure Fields

  1. Expand the table and click on History Settings.

    1. The Natural Key is used to determine whether records should be considered an insert or update. By default, this is the primary key of the table.
    2. Type 0 fields will always retain their original values.
    3. Type I fields are the default and will update normally without creating new records.
    4. Type II fields will be tracked for changes. Any change in the values of these fields will result in a new row with the new values, while the original row and its values are preserved.

Additional Settings

  1. Use soft deletes: Records are never deleted from a History-enabled table. However, enabling this setting will create the SCD Is Tombstone system field, which will be set to "1" on deleted rows.
    1. Create a new record when a deleted record reappears in the source
      • When disabled: If the same natural key reappears in the source, the SCD Is Tombstone field value will change from "1" to "0".
      • When enabled: If the same natural key reappears in the source, a new record will be created.
  2. Update also historical records with new value on type I change
    • When disabled: Type I field values will be updated only on the current version of the record.
    • When enabled: Type I field values will be updated on all versions of the record.
    • This may be useful in reporting where the purpose is to have a recognizable or consistent name rather than the correct name at a specific point in time.
  3. Treat type II as type I when field value is null
    • When disabled: Type II field values that change from null to a non-null value will create a new record.
    • When enabled: Type II field values that change from null to a non-null value will simply update in place with no new record created.
    • This setting can be enabled when keeping track of null field values being populated is not necessary. A common use case is when a new field is added to a table and all the initial values for that field are null.
  4. New records valid from:
    • 1900-01-01: The default option. Uses the value "1900-01-01" in the SCD From DateTime field for the first version of new records.
    • Time of load: Uses the current DateTime value at the time of execution in the SCD From DateTime field for the first version of new records.
Note:

Jet Analytics Data Integration does not manage the SQL indexes for History-enabled tables. See Guidelines for maintaining clustered SQL indexes on Incremental Load and History tables for more information on how and when to manage SQL indexes for these types of tables.

Surrogate Keys

Once you enable History and Type II fields, your dimension can have duplicate natural keys. In order to maintain referential integrity with your fact table, you should use the system field DW_Id as a reliable Surrogate Key.

  1. The DW_Id field is auto-generated on every table in the Prepare Instance Data Area and can be found by expanding the System Fields node at the bottom of the field list. You may need to enable Show System Fields in the application's View menu to see them.

  2. Relate your dimension table to your fact table by dragging the primary key on the dimension to the foreign key on the fact table.

  3. Drag the DW_Id from your History-enabled dimension table to the fact table name to create a conditional lookup of the DW_Id field.

  4. Select Copy joins from and choose the relation you just created. This ensures the surrogate key lookup will always use the correct join, even if you change your relations.

  5. If History is properly enabled on your dimension table, you should receive the following prompt:

    • Use the time from this field: This will add additional joins to look up the value of the record that was active at the time of the selected date/time field. This is the recommended option for SCD Surrogate Keys.

    • Use current values: This will add additional joins to look up the value of the current record. All records will display the current value for the attribute regardless of what the value might have been at the time of the fact table transaction.
    • I will setup conditions myself: This will not add any additional joins.
    • Click Yes, which equates to None, if the following lookup dialog appears. The relations as configured will allow for the return of the proper value.

  6. Rename the lookup field to match your desired naming convention, commonly something like <DimensionName>Key or <DimensionName>_SK (e.g. CustomerKey).

  7. Map your tables to the Prepare instance presentation layer Data Area.
    1. Map the Dimension's DW_Id field as a new field in the table, rename it using the desired naming convention, and make it the new primary key.

    2. Map the Fact table.

  8. Relate the two tables in the Prepare Instance presentation layer using the newly-created Surrogate Key.

Was this article helpful?

We're sorry to hear that.