Transformations

PipelineWise can perform row level load time transformations between tap and target components and makes and ideal place to obfuscate, mask or filter sensitive data that should never be replicated in the Data Warehouse.

Warning

Important: Load Time Transformations are not designed for complex transformations, mapping, joins or aggregations. It was designed primarily to meet certain data security requirements and to make sure that PII and other sensitive data does not end up at external service providers like AWS, MS Azure or similar.

If you want to apply complex transformations you’ll need to do it at a later stage once the data is ingested into the Data Warehouse. PipelineWise fits into the ELT landscape and not doing traditional ETL. ELT ingests data first into DWH in the original format and the “transformation” is shifting to the end of the data pipeline.

Transformation Methods

The following transformations can be added optionally into the YAML configuration:

  • SET-NULL: Transforms any column to NULL

  • HASH: Transforms string columns to hash

  • HASH-SKIP-FIRST-n: Transforms string columns to hash skipping first n characters, e.g. HASH-SKIP-FIRST-2

  • MASK-DATA: Replaces the months and day parts of date columns to be always 1st of Jan

  • MASK-NUMBER: Transforms any numeric column to zero

  • MASK-HIDDEN: Transforms any value to ‘hidden’

Conditional Transformations

Using the optional when keyword, you can specify conditions how the transformation should be applied. If the condition matches PipelineWise performs the transformation, otherwise it keeps the original value.

Example

Load Time transformations needs to be defined in the tables section in the YAML configuration:

...
...
tables:
  - table_name: "audit_log"
    replication_method: "INCREMENTAL"
    replication_key: "id"
    transformations:
      - column: "column_1"
        type: "SET-NULL"
        when:
          - column: "class_name"
            equals: 'com.transferwise.fx.user.User'
          - column: "property_name"
            equals: 'passwordHash'

            # Tip: Use 'regex_match' instead of 'equal' if you need
            # more complex matching criteria. For example:
            # regex_match: 'password|salt|passwordHash'

      - column: "column_2"
        type: "HASH"
        when:
          - column: "class_name"
            equals: 'com.transferwise.fx.user.User'
          - column: "property_name"
            equals: 'passwordHash'
...
...