Tap Oracle

Warning

Oracle Instant Client is required to use Tap Oracle. If PipelineWise is Running in Docker then no further action needed because PipelineWise Docker Image includes Oracle Instant Client automatically.

If PipelineWise Installing from source then you have to install Oracle Instant Client manually to your machine.

Oracle setup requirements

Step 1. Create a PipelineWise database user

You’ll create a dedicated database user for PipelineWise. Create a new user and grant the required permissions on the database, schema and tables that you want to replicate:

  • CREATE USER pipelinewise IDENTIFIEDBY <password>

  • GRANT CONNECT TO pipelinewise

  • GRANT CREATE SESSION TO pipelinewise

  • GRANT UNLIMITED TABLESPACE TO TO pipelinewise

  • GRANT USAGE ON SCHEMA <schema_name> TO pipelinewise

  • GRANT SELECT ON <schema_name>.<table_name> TO pipelinewise (Repeat this grant on every table that you want to replicate)

Step 2: Check if you have all the required credentials for replicating data from Oracle

Access to V$DATABASE and V_$THREAD performance views. These are required to verify setting configuration while setting up your Oracle database and to retrieve the database’s Oracle System ID.

Step 3: Configure Log-based Incremental Replication with LogMiner

Note

This step is only required if you use Log Based replication method.

Step 3.1: Verify the database’s current archiving mode

To check the database’s current mode, run:

SELECT LOG_MODE FROM V$DATABASE

If the result is ARCHIVELOG, archiving is enabled and no further action is required. Skip to Step 3.3 to configure RMAN backups.

Step 3.2: Enable ARCHIVELOG mode

  1. Shut down the database instance. The database and any associated instances must be shut down before the database’s archiving mode can be changed.

SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP MOUNT
SQL> ALTER DATABASE ARCHIVELOG
SQL> ALTER DATABASE OPEN

Step 3.3: Set retention period by RMAN

RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 3 DAYS;

Note: To ensure that archive log files don’t consume all of your available disk space, you should also set the DB_RECOVERY_FILE_DEST_SIZE parameter to a value that agrees with your available disk quota. Refer to Oracle’s documentation for more info about this parameter.

Step 3.4: Enable supplemental logging

SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS

Note: Alternatively to enable supplemental logging at the table level, run ALTER TABLE <SCHEMA_NAME>.<TABLE_NAME> ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS for every table you want to replicate.

Verify that supplemental logging was successfully enabled by running the following query:

SELECT SUPPLEMENTAL_LOG_DATA_MIN FROM V$DATABASE

If the returned value is YES or IMPLICIT, supplemental logging is enabled.

Warning

If you want to use Log-based Incremental Replication, you’ll also need to grant additional permissions to the pipelinewise user:

  • GRANT EXECUTE_CATALOG_ROLE TO PIPELINEWISE

  • GRANT SELECT ANY TRANSACTION TO PIPELINEWISE

  • GRANT SELECT ANY DICTIONARY TO PIPELINEWISE

  • GRANT EXECUTE ON DBMS_LOGMNR TO PIPELINEWISE

  • GRANT EXECUTE ON DBMS_LOGMNR_D TO PIPELINEWISE

  • GRANT SELECT ON SYS.V_$DATABASE TO PIPELINEWISE

  • GRANT SELECT ON SYS.V_$ARCHIVED_LOG TO PIPELINEWISE

  • GRANT SELECT ON SYS.V_$LOGMNR_CONTENTS TO PIPELINEWISE

If you’re using version 12 of Oracle, you’ll also need to grant the LOGMINING privilege to the PipelineWise user:

  • GRANT LOGMINING TO PIPELINEWISE

Configuring what to replicate

PipelineWise configures every tap with a common structured YAML file format. A sample YAML for Oracle replication can be generated into a project directory by following the steps in the Generating Sample Pipelines section.

Example YAML for tap-oracle:

---

# ------------------------------------------------------------------------------
# General Properties
# ------------------------------------------------------------------------------
id: "oracle_sample"                    # Unique identifier of the tap
name: "Sample Oracle Database"         # Name of the tap
type: "tap-oracle"                     # !! THIS SHOULD NOT CHANGE !!
owner: "somebody@foo.com"              # Data owner to contact
#send_alert: False                     # Optional: Disable all configured alerts on this tap
#slack_alert_channel: "#tap-channel"   # Optional: Sending a copy of specific tap alerts to this slack channel


# ------------------------------------------------------------------------------
# Source (Tap) - Oracle connection details
# ------------------------------------------------------------------------------
db_conn:
  sid: "<SID>"                        # Oracle SID
  host: "<HOST>"                      # Oracle host
  port: 1521                          # Oracle port
  user: "<USER>"                      # Oracle user
  password: "<PASSWORD>"              # Plain string or vault encrypted
  #filter_schemas: "SCHEMA1,SCHEMA2"  # Optional: Scan only the required schemas
                                      #           to improve the performance of
                                      #           data extraction


# ------------------------------------------------------------------------------
# Destination (Target) - Target properties
# Connection details should be in the relevant target YAML file
# ------------------------------------------------------------------------------
target: "snowflake"                    # ID of the target connector where the data will be loaded
batch_size_rows: 20000                 # Batch size for the stream to optimise load performance
stream_buffer_size: 0                  # In-memory buffer size (MB) between taps and targets for asynchronous data pipes
#batch_wait_limit_seconds: 3600        # Optional: Maximum time to wait for `batch_size_rows`. Available only for snowflake target.

# Options only for Snowflake target
#archive_load_files: False                      # Optional: when enabled, the files loaded to Snowflake will also be stored in `archive_load_files_s3_bucket`
#archive_load_files_s3_prefix: "archive"        # Optional: When `archive_load_files` is enabled, the archived files will be placed in the archive S3 bucket under this prefix.
#archive_load_files_s3_bucket: "<BUCKET_NAME>"  # Optional: When `archive_load_files` is enabled, the archived files will be placed in this bucket. (Default: the value of `s3_bucket` in target snowflake YAML)


# ------------------------------------------------------------------------------
# Source to target Schema mapping
# ------------------------------------------------------------------------------
schemas:

  - source_schema: "SCHEMA1"           # Source schema in Oracle with tables
    target_schema: "repl_oracle"       # Target schema in the destination Data Warehouse
    target_schema_select_permissions:  # Optional: Grant SELECT on schema and tables that created
      - grp_stats

    # List of tables to replicate from Oracle to destination Data Warehouse
    #
    # Please check the Replication Strategies section in the documentation to understand the differences.
    # For LOG_BASED replication method you might need to adjust the source Oracle database.
    tables:
      - table_name: "TABLE_ONE"
        replication_method: "INCREMENTAL"   # One of INCREMENTAL, LOG_BASED and FULL_TABLE
        replication_key: "LAST_UPDATE"      # Important: Incremental load always needs replication key

        # OPTIONAL: Load time transformations
        #transformations:
        #  - column: "last_name"            # Column to transform
        #    type: "SET-NULL"               # Transformation type

      # You can add as many tables as you need...
      - table_name: "TABLE_TWO"
        replication_method: "LOG_BASED"     # Important! Log based must be enabled in Oracle

  # You can add as many schemas as you need...
  # Uncommend this if you want replicate tables from multiple schemas
  #- source_schema: "another_schema_in_oracle"
  #  target_schema: "another