In this post, we will discover Slowly Changing Dimension in Talend Open Studio.
First, What is SCD?
- SCD Type 1 overwrites original data in SCD table with new data. There is no tracking of historical changes (old data cannot be recovered). It allows to correct some kinds of error, for example misspellings in names.
- SCD Type 2 tracks historical changes: if new data is different from old data, a new record is added in the SCD table, instead of erasing the old one. The old record becomes inactive, and the new record active.
Now, an example
For this example, we use a simple table client.
| id | name | city |
The SCD table client_scd we use to track modifications of clients over time is:
| tk | id | name | city | start_date | end_date |
| tk | Unique identifier of records in SCD table (technical key). |
| id | Source key, identifier in source table (that is, field id in client table in our example). |
| name | Name of clients (matching field name in client table). We use SCD Type 1 to manage modifications of this column. |
| city | City where clients live (matching field city in client table). We use SCD Type 2 to manage modifications of this column. |
| start_date | Date of insertion of new records for SCD Type 2 modifications. |
| end_date | End of validity of records. A record becomes invalid when a new record with the same source key is inserted in SCD table. The last inserted record for a given source key has a NULL value as end date. |
Note that columns start_date and end_date are not needed if SCD type 2 is not used.
The Use case
We insert two records in client table, with a mispelling in Robert’s name.
| id | name | city |
| 1 | Rober | San Francisco |
| 2 | Bill | LA |
Now, we update client_scd table (let’s say we are July 1st). Since field id of each record in client table has no matching id (source key) in client_scd table, each record of client table is inserted in client_scd. start_date receive current date (ie when the records were inserted in SCD table).
| tk | id | name | city | start_date | end_date |
| 1 | 1 | Rober | San Francisco | 2007-07-01 | NULL |
| 2 | 2 | Bill | LA | 2007-07-01 | NULL |
Let’s imagine Robert moves to LA. The updated client table would look like this:
| id | name | city |
| 1 | Rober | LA |
| 2 | Bill | LA |
When SCD table is updated (we are July 5th), there is already an existing source key in client_scd matching id of each record in client table. Bill has not been modified, so his record is not updated in SCD table. Robert has moved to LA, and city column is managed with SCD Type 2, so a new record will be inserted in SCD table. start_date of the new record will receive current date, so will end_date.
| tk | id | name | city | start_date | end_date |
| 1 | 1 | Rober | San Francisco | 2007-07-01 | 2007-07-05 |
| 2 | 2 | Bill | LA | 2007-07-01 | NULL |
| 3 | 1 | Rober | LA | 2007-07-05 | NULL |
Now, let’s imagine the mispelling on Robert’s name is corrected. The client table looks like:
| id | name | city |
| 1 | Robert | LA |
| 2 | Bill | LA |
When SCD table is updated (July 10th), Robert’s record has changed. Because name is managed as SCD Type 1, all records matching source key of Robert will be updated, without tracking modifications.
| tk | id | name | city | start_date | end_date |
| 1 | 1 | Robert | San Francisco | 2007-07-01 | 2007-07-05 |
| 2 | 2 | Bill | LA | 2007-07-01 | NULL |
| 3 | 1 | Robert | LA | 2007-07-05 | NULL |
Talend SCD components
Now, let’s see how SCD rules are implemented in tOracleSCD / tMysqlSCD / tDB2SCD / tIngresSCD / tMSSQLSCD / tSybaseSCD components.
SCD Component - General settings

| Parameters | |
| Table | Name of SCD table. |
| Schema type | Schema of SCD table. |
| Source keys | Key fields in source table. |
SCD Component - SCD Type 1

Figure 2: ToracleSCD properties, part 2
If you want to use SCD type 1, check the box, and add input fields which should be managed with SCD type 1.
SCD Component - SCD Type 2

If you want to use SCD type 2, check the box, and add input fields which should be managed with SCD type 2. You must set start and end dates fields (used when new records are inserted in SCD table).
Additionnaly, SCD component provides two options for SCD type 2 rules.
- First is active status: when this option is checked, you need to set an active field, which will contain 1 in tOracleSCD (’true’ in tMysqlSCD) for all records whose end date is NULL, 0 (’false’) for other records.
scd_client table of our last example would look like:
| tk | id | name | city | start_date | end_date | active |
| 1 | 1 | Robert | San Francisco | 2007-07-01 | 2007-07-05 | 0 |
| 2 | 2 | Bill | LA | 2007-07-01 | NULL | 1 |
| 3 | 1 | Robert | LA | 2007-07-05 | NULL | 1 |
- Second option is version management: when this option is checked, you need to set a version field. Each time a new record is inserted in SCD table, either it is the first record of a given source keys set and version field will receive value 1, or there is already a record with source keys matching keys of the new record, and version will be incremented by 1 for the new record.
scd_client table of our last example would look like:
| tk | id | name | city | start_date | end_date | version |
| 1 | 1 | Robert | San Francisco | 2007-07-01 | 2007-07-05 | 1 |
| 2 | 2 | Bill | LA | 2007-07-01 | NULL | 1 |
| 3 | 1 | Robert | LA | 2007-07-05 | NULL | 2 |
Job example
Here is a job example (use case we have seen before) : we read records from a source table, and update a SCD table with these records.

Enjoy SCD component!







Latest Comments