.. _tap-oracle: Tap Oracle ---------- .. warning:: `Oracle Instant Client `_ is required to use Tap Oracle. If PipelineWise is :ref:`running_in_docker` then no further action needed because **PipelineWise Docker Image includes Oracle Instant Client** automatically. If PipelineWise :ref:`building_from_source` then you have to **install Oracle Instant Client manually** to your machine. Oracle setup requirements ''''''''''''''''''''''''' **Step 1. Create a PipelineWise database user** You’ll create a dedicated database user for PipelineWise. Create a new user and grant the required permissions on the database, schema and tables that you want to replicate: * ``CREATE USER pipelinewise IDENTIFIEDBY `` * ``GRANT CONNECT TO pipelinewise`` * ``GRANT CREATE SESSION TO pipelinewise`` * ``GRANT UNLIMITED TABLESPACE TO TO pipelinewise`` * ``GRANT USAGE ON SCHEMA TO pipelinewise`` * ``GRANT SELECT ON . TO pipelinewise`` (Repeat this grant on every table that you want to replicate) **Step 2: Check if you have all the required credentials for replicating data from Oracle** Access to ``V$DATABASE`` and ``V_$THREAD`` performance views. These are required to verify setting configuration while setting up your Oracle database and to retrieve the database’s Oracle System ID. **Step 3: Configure Log-based Incremental Replication with LogMiner** .. note:: This step is only required if you use :ref:`log_based` replication method. **Step 3.1: Verify the database's current archiving mode** To check the database’s current mode, run: .. code-block:: bash SELECT LOG_MODE FROM V$DATABASE If the result is ``ARCHIVELOG``, archiving is enabled and no further action is required. Skip to Step 3.3 to configure RMAN backups. **Step 3.2: Enable ARCHIVELOG mode** 1. Shut down the database instance. The database and any associated instances must be shut down before the database’s archiving mode can be changed. .. code-block:: bash SQL> SHUTDOWN IMMEDIATE SQL> STARTUP MOUNT SQL> ALTER DATABASE ARCHIVELOG SQL> ALTER DATABASE OPEN **Step 3.3: Set retention period by RMAN** .. code-block:: bash RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 3 DAYS; **Note**: To ensure that archive log files don’t consume all of your available disk space, you should also set the ``DB_RECOVERY_FILE_DEST_SIZE`` parameter to a value that agrees with your available disk quota. Refer to `Oracle's documentation `_ for more info about this parameter. **Step 3.4: Enable supplemental logging** .. code-block:: bash SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS **Note**: Alternatively to enable supplemental logging at the table level, run ``ALTER TABLE . ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS`` for every table you want to replicate. Verify that supplemental logging was successfully enabled by running the following query: .. code-block:: bash SELECT SUPPLEMENTAL_LOG_DATA_MIN FROM V$DATABASE If the returned value is ``YES`` or ``IMPLICIT``, supplemental logging is enabled. .. warning:: If you want to use Log-based Incremental Replication, you’ll also need to **grant additional permissions** to the ``pipelinewise`` user: * ``GRANT EXECUTE_CATALOG_ROLE TO PIPELINEWISE`` * ``GRANT SELECT ANY TRANSACTION TO PIPELINEWISE`` * ``GRANT SELECT ANY DICTIONARY TO PIPELINEWISE`` * ``GRANT EXECUTE ON DBMS_LOGMNR TO PIPELINEWISE`` * ``GRANT EXECUTE ON DBMS_LOGMNR_D TO PIPELINEWISE`` * ``GRANT SELECT ON SYS.V_$DATABASE TO PIPELINEWISE`` * ``GRANT SELECT ON SYS.V_$ARCHIVED_LOG TO PIPELINEWISE`` * ``GRANT SELECT ON SYS.V_$LOGMNR_CONTENTS TO PIPELINEWISE`` **If you’re using version 12 of Oracle**, you’ll also need to grant the ``LOGMINING`` privilege to the PipelineWise user: * ``GRANT LOGMINING TO PIPELINEWISE`` Configuring what to replicate ''''''''''''''''''''''''''''' PipelineWise configures every tap with a common structured YAML file format. A sample YAML for Oracle replication can be generated into a project directory by following the steps in the :ref:`generating_pipelines` section. Example YAML for ``tap-oracle``: .. code-block:: yaml --- # ------------------------------------------------------------------------------ # General Properties # ------------------------------------------------------------------------------ id: "oracle_sample" # Unique identifier of the tap name: "Sample Oracle Database" # Name of the tap type: "tap-oracle" # !! THIS SHOULD NOT CHANGE !! owner: "somebody@foo.com" # Data owner to contact #send_alert: False # Optional: Disable all configured alerts on this tap #slack_alert_channel: "#tap-channel" # Optional: Sending a copy of specific tap alerts to this slack channel # ------------------------------------------------------------------------------ # Source (Tap) - Oracle connection details # ------------------------------------------------------------------------------ db_conn: sid: "" # Oracle SID host: "" # Oracle host port: 1521 # Oracle port user: "" # Oracle user password: "" # Plain string or vault encrypted #filter_schemas: "SCHEMA1,SCHEMA2" # Optional: Scan only the required schemas # to improve the performance of # data extraction # ------------------------------------------------------------------------------ # Destination (Target) - Target properties # Connection details should be in the relevant target YAML file # ------------------------------------------------------------------------------ target: "snowflake" # ID of the target connector where the data will be loaded batch_size_rows: 20000 # Batch size for the stream to optimise load performance stream_buffer_size: 0 # In-memory buffer size (MB) between taps and targets for asynchronous data pipes #batch_wait_limit_seconds: 3600 # Optional: Maximum time to wait for `batch_size_rows`. Available only for snowflake target. # Options only for Snowflake target #archive_load_files: False # Optional: when enabled, the files loaded to Snowflake will also be stored in `archive_load_files_s3_bucket` #archive_load_files_s3_prefix: "archive" # Optional: When `archive_load_files` is enabled, the archived files will be placed in the archive S3 bucket under this prefix. #archive_load_files_s3_bucket: "" # Optional: When `archive_load_files` is enabled, the archived files will be placed in this bucket. (Default: the value of `s3_bucket` in target snowflake YAML) # ------------------------------------------------------------------------------ # Source to target Schema mapping # ------------------------------------------------------------------------------ schemas: - source_schema: "SCHEMA1" # Source schema in Oracle with tables target_schema: "repl_oracle" # Target schema in the destination Data Warehouse target_schema_select_permissions: # Optional: Grant SELECT on schema and tables that created - grp_stats # List of tables to replicate from Oracle to destination Data Warehouse # # Please check the Replication Strategies section in the documentation to understand the differences. # For LOG_BASED replication method you might need to adjust the source Oracle database. tables: - table_name: "TABLE_ONE" replication_method: "INCREMENTAL" # One of INCREMENTAL, LOG_BASED and FULL_TABLE replication_key: "LAST_UPDATE" # Important: Incremental load always needs replication key # OPTIONAL: Load time transformations #transformations: # - column: "last_name" # Column to transform # type: "SET-NULL" # Transformation type # You can add as many tables as you need... - table_name: "TABLE_TWO" replication_method: "LOG_BASED" # Important! Log based must be enabled in Oracle # You can add as many schemas as you need... # Uncommend this if you want replicate tables from multiple schemas #- source_schema: "another_schema_in_oracle" # target_schema: "another