Target Snowflake

Snowflake setup requirements

Warning

You need to create a few objects in a Snowflake schema before start replicating data to Snowflake:
  • Named External Stage: to upload the CSV files to S3 and to MERGE data into snowflake tables.

  • Named File Format: to run MERGE/COPY commands and to parse the CSV files correctly

  • A Role: to grant all the required permissions

  • A User: to run PipelineWise

  1. Create a named external stage object on S3:

CREATE STAGE {database}.{schema}.{stage_name}
url='s3://{s3_bucket}'
credentials=(AWS_KEY_ID='{aws_key_id}' AWS_SECRET_KEY='{aws_secret_key}')
encryption=(MASTER_KEY='{client_side_encryption_master_key}');
Note:
  • The {schema} and {stage_name} can be any name that Snowflake accepts.

  • The encryption option is optional and used for client side encryption. If you want client side encryption you’ll need to define the same master key in the target-snowflake YAML. See the example below.

  1. Create a named file format:

CREATE FILE FORMAT {database}.{schema}.{file_format_name}
TYPE = 'CSV' ESCAPE='\\' FIELD_OPTIONALLY_ENCLOSED_BY='"';
  1. Create a Role with all the required permissions:

CREATE OR REPLACE ROLE ppw_target_snowflake;
GRANT USAGE ON DATABASE {database} TO ROLE ppw_target_snowflake;
GRANT CREATE SCHEMA ON DATABASE {database} TO ROLE ppw_target_snowflake;

GRANT USAGE ON SCHEMA {database}.{schema} TO role ppw_target_snowflake;
GRANT USAGE ON STAGE {database}.{schema}.{stage_name} TO ROLE ppw_target_snowflake;
GRANT USAGE ON FILE FORMAT {database}.{schema}.{file_format_name} TO ROLE ppw_target_snowflake;
GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE ppw_target_snowflake;

Replace database, schema, warehouse, stage_name and file_format_name between { and } characters to the actual values from point 1 and 2.

  1. Create a user and grant permission to the role:

CREATE OR REPLACE USER {user}
PASSWORD = '{password}'
DEFAULT_ROLE = ppw_target_snowflake
DEFAULT_WAREHOUSE = '{warehouse}'
MUST_CHANGE_PASSWORD = FALSE;

GRANT ROLE ppw_target_snowflake TO USER {user};

Replace warehouse between { and } characters to the actual values from point 3.

Configuring where to replicate data

PipelineWise configures every target with a common structured YAML file format. A sample YAML for Snowflake target can be generated into a project directory by following the steps in the Generating Sample Pipelines section.

Example YAML for target-snowflake:

---

# ------------------------------------------------------------------------------
# General Properties
# ------------------------------------------------------------------------------
id: "snowflake"                        # Unique identifier of the target
name: "Snowflake"                      # Name of the target
type: "target-snowflake"               # !! THIS SHOULD NOT CHANGE !!


# ------------------------------------------------------------------------------
# Target - Data Warehouse connection details
# ------------------------------------------------------------------------------
db_conn:
  account: "rtxxxxx.eu-central-1"               # Snowflake account
  dbname: "<DB_NAME>"                           # Snowflake database name
  user: "<USER>"                                # Snowflake user
  password: "<PASSWORD>"                        # Plain string or vault encrypted
  warehouse: "<WAREHOUSE>"                      # Snowflake virtual warehouse

  # We use an external stage on S3 to load data into Snowflake
  # S3 Profile based authentication
  aws_profile: "<AWS_PROFILE>"                  # AWS profile name, if not provided, the AWS_PROFILE environment variable or the 'default' profile will be used

  # S3 Non-profile based authentication
  #aws_access_key_id: "<ACCESS_KEY>"            # Plain string or vault encrypted. Required for non-profile based auth. If not provided, AWS_ACCESS_KEY_ID environment variable will be used.
  #aws_secret_access_key: "<SECRET_ACCESS_KEY"  # Plain string or vault encrypted. Required for non-profile based auth. If not provided, AWS_SECRET_ACCESS_KEY environment variable will be used.
  #aws_session_token: "<AWS_SESSION_TOKEN>"     # Optional: Plain string or vault encrypted. If not provided, AWS_SESSION_TOKEN environment variable will be used.

  #aws_endpoint_url: "<FULL_ENDPOINT_URL>"      # Optional: for non AWS S3, for example https://nyc3.digitaloceanspaces.com

  s3_bucket: "<BUCKET_NAME>"                    # S3 external stbucket name
  s3_key_prefix: "snowflake-imports/"           # Optional: S3 key prefix
  #s3_acl: "<S3_OBJECT_ACL>"                    # Optional: Assign the canned ACL to the uploaded file on S3

  # stage and file_format are pre-created objects in Snowflake that requires to load and
  # merge data correctly from S3 to tables in one step without using temp tables
  #  stage      : External stage object pointing to an S3 bucket
  #  file_format: Named file format object used for bulk loading data from S3 into
  #               snowflake tables.
  stage: "<SCHEMA>.<STAGE_OBJECT_NAME>"
  file_format: "<SCHEMA>.<FILE_FORMAT_OBJECT_NAME>"

  # Optional: Client Side Encryption
  # The same master key has to be added to the external stage object created in snowflake
  #client_side_encryption_master_key: "<MASTER_KEY"> # Plain string or vault encrypted