Target Redshift

Configuring Redshift as a replication target is straightforward. You need a user with permissions to create new schemas and tables in a Redshift database, and you can replicate data from all the supported Taps (Data Sources).

Configuring where to replicate data

PipelineWise configures every target with a common, structured YAML file format. A sample YAML file for the Redshift target can be generated into a project directory by following the steps in the Generating Sample Pipelines section.

Example YAML for target-redshift:

---

# ------------------------------------------------------------------------------
# General Properties
# ------------------------------------------------------------------------------
id: "redshift"                        # Unique identifier of the target
name: "Amazon Redshift"               # Name of the target
type: "target-redshift"               # !! THIS SHOULD NOT CHANGE !!


# ------------------------------------------------------------------------------
# Target - Data Warehouse connection details
# ------------------------------------------------------------------------------
db_conn:
  host: "xxxxx.redshift.amazonaws.com"          # Redshift host
  port: 5439                                    # Redshift port
  user: "<USER>"                                # Redshift user
  password: "<PASSWORD>"                        # Plain string or vault encrypted
  dbname: "<DB_NAME>"                           # Redshift database name
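
  # Values marked "Plain string or vault encrypted" can be stored encrypted using
  # the output of the pipelinewise encrypt_string command. A sketch of what an
  # encrypted value looks like in place (the ciphertext is a placeholder only):
  #password: !vault |
  #  $ANSIBLE_VAULT;1.1;AES256
  #  <ENCRYPTED_VALUE>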

  # We use an intermediate S3 to load data into Redshift
  # S3 Profile based authentication
  aws_profile: "<AWS_PROFILE>"                  # AWS profile name. If not provided, the AWS_PROFILE environment
                                                # variable or the 'default' profile is used; if neither is
                                                # available, the IAM role attached to the host is used.

  # S3 Credentials based authentication
  #aws_access_key_id: "<ACCESS_KEY>"            # Plain string or vault encrypted. Required for non-profile based auth. If not provided, AWS_ACCESS_KEY_ID environment variable will be used.
  #aws_secret_access_key: "<SECRET_ACCESS_KEY>" # Plain string or vault encrypted. Required for non-profile based auth. If not provided, AWS_SECRET_ACCESS_KEY environment variable will be used.
  #aws_session_token: "<AWS_SESSION_TOKEN>"     # Optional: Plain string or vault encrypted. If not provided, AWS_SESSION_TOKEN environment variable will be used.
  #aws_redshift_copy_role_arn: "<ROLE_ARN>"     # Optional: AWS Role ARN to be used for the Redshift COPY operation.
                                                #           Allows using environment credentials while delegating the COPY command to a role.
                                                #           If provided, it is used instead of the given AWS keys for the COPY operation.

  s3_bucket: "<BUCKET_NAME>"                    # S3 external bucket name
  s3_key_prefix: "redshift-imports/"            # Optional: S3 key prefix
  #s3_acl: "<S3_OBJECT_ACL>"                    # Optional: Assign the canned ACL to the uploaded file on S3

  # Optional: Overrides the default COPY options to load data into Redshift.
  #           The values below are the defaults and are fit for purpose in most cases.
  #           Some basic file formatting parameters are fixed values and
  #           overriding them with custom ones is not recommended.
  #           These are: CSV GZIP DELIMITER ',' REMOVEQUOTES ESCAPE
  #copy_options: "
  #  EMPTYASNULL BLANKSASNULL TRIMBLANKS TRUNCATECOLUMNS
  #  TIMEFORMAT 'auto'"
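
Once the target is defined, every tap that loads into this warehouse references it by the id above. The abbreviated sketch below uses hypothetical tap id, connector type, and schema/table names, and omits the tap's own connection details:

---
id: "mysql_sample"                    # Hypothetical unique identifier of the tap
name: "Sample MySQL Database"         # Name of the tap
type: "tap-mysql"                     # Tap connector type
target: "redshift"                    # Must match the id of the target above

schemas:
  - source_schema: "my_db"            # Hypothetical schema in the source database
    target_schema: "repl_my_db"       # Destination schema in Redshift
    tables:
      - table_name: "orders"          # Hypothetical table to replicate
        replication_method: "FULL_TABLE"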