Target Snowflake

Snowflake setup requirements

Warning

You need to create a few objects in a Snowflake schema before start replicating data to Snowflake:

Named External Stage: to upload the CSV files to S3 and to MERGE data into snowflake tables.
Named File Format: to run MERGE/COPY commands and to parse the CSV files correctly
A Role: to grant all the required permissions
A User: to run PipelineWise

Create a named external stage object on S3:

CREATE STAGE {database}.{schema}.{stage_name}
url='s3://{s3_bucket}'
credentials=(AWS_KEY_ID='{aws_key_id}' AWS_SECRET_KEY='{aws_secret_key}')
encryption=(MASTER_KEY='{client_side_encryption_master_key}');

Note:

The {schema} and {stage_name} can be any name that Snowflake accepts.
The encryption option is optional and used for client side encryption. If you want client side encryption you’ll need to define the same master key in the target-snowflake YAML. See the example below.

Create a named file format:

CREATE FILE FORMAT {database}.{schema}.{file_format_name}
TYPE = 'CSV' ESCAPE='\\' FIELD_OPTIONALLY_ENCLOSED_BY='"';

Create a Role with all the required permissions:

CREATE OR REPLACE ROLE ppw_target_snowflake;
GRANT USAGE ON DATABASE {database} TO ROLE ppw_target_snowflake;
GRANT CREATE SCHEMA ON DATABASE {database} TO ROLE ppw_target_snowflake;

GRANT USAGE ON SCHEMA {database}.{schema} TO role ppw_target_snowflake;
GRANT USAGE ON STAGE {database}.{schema}.{stage_name} TO ROLE ppw_target_snowflake;
GRANT USAGE ON FILE FORMAT {database}.{schema}.{file_format_name} TO ROLE ppw_target_snowflake;
GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE ppw_target_snowflake;

Replace database, schema, warehouse, stage_name and file_format_name between { and } characters to the actual values from point 1 and 2.

Create a user and grant permission to the role:

CREATE OR REPLACE USER {user}
PASSWORD = '{password}'
DEFAULT_ROLE = ppw_target_snowflake
DEFAULT_WAREHOUSE = '{warehouse}'
MUST_CHANGE_PASSWORD = FALSE;

GRANT ROLE ppw_target_snowflake TO USER {user};

Replace warehouse between { and } characters to the actual values from point 3.

Configuring where to replicate data

PipelineWise configures every target with a common structured YAML file format. A sample YAML for Snowflake target can be generated into a project directory by following the steps in the Generating Sample Pipelines section.

Example YAML for target-snowflake:

---

# ------------------------------------------------------------------------------
# General Properties
# ------------------------------------------------------------------------------
id: "snowflake"                        # Unique identifier of the target
name: "Snowflake"                      # Name of the target
type: "target-snowflake"               # !! THIS SHOULD NOT CHANGE !!


# ------------------------------------------------------------------------------
# Target - Data Warehouse connection details
# ------------------------------------------------------------------------------
db_conn:
  account: "rtxxxxx.eu-central-1"               # Snowflake account
  dbname: "<DB_NAME>"                           # Snowflake database name
  user: "<USER>"                                # Snowflake user
  private_key: "<private_key_path>"             # File contains PEM forrmat for connecting to Snowflake
  warehouse: "<WAREHOUSE>"                      # Snowflake virtual warehouse

  # We use an external stage on S3 to load data into Snowflake
  # S3 Profile based authentication
  aws_profile: "<AWS_PROFILE>"                  # AWS profile name, if not provided, the AWS_PROFILE environment
                                                # variable or the 'default' profile will be used, if not
                                                # available, then IAM role attached to the host will be used.

  # S3 Credentials based authentication
  #aws_access_key_id: "<ACCESS_KEY>"            # Plain string or vault encrypted. Required for non-profile based auth. If not provided, AWS_ACCESS_KEY_ID environment variable will be used.
  #aws_secret_access_key: "<SECRET_ACCESS_KEY"  # Plain string or vault encrypted. Required for non-profile based auth. If not provided, AWS_SECRET_ACCESS_KEY environment variable will be used.
  #aws_session_token: "<AWS_SESSION_TOKEN>"     # Optional: Plain string or vault encrypted. If not provided, AWS_SESSION_TOKEN environment variable will be used.

  #aws_endpoint_url: "<FULL_ENDPOINT_URL>"      # Optional: for non AWS S3, for example https://nyc3.digitaloceanspaces.com

  s3_bucket: "<BUCKET_NAME>"                    # S3 external stbucket name
  s3_key_prefix: "snowflake-imports/"           # Optional: S3 key prefix
  #s3_acl: "<S3_OBJECT_ACL>"                    # Optional: Assign the canned ACL to the uploaded file on S3

  # stage and file_format are pre-created objects in Snowflake that requires to load and
  # merge data correctly from S3 to tables in one step without using temp tables
  #  stage      : External stage object pointing to an S3 bucket
  #  file_format: Named file format object used for bulk loading data from S3 into
  #               snowflake tables.
  stage: "<SCHEMA>.<STAGE_OBJECT_NAME>"
  file_format: "<SCHEMA>.<FILE_FORMAT_OBJECT_NAME>"

  # Optional: Client Side Encryption
  # The same master key has to be added to the external stage object created in snowflake
  #client_side_encryption_master_key: "<MASTER_KEY"> # Plain string or vault encrypted