Resync Tables

Sometimes you will need to resync tables in ad-hoc bases. For example when binlog position deleted in MySQL or when a database migrated to another server and previous CDC bookmarks are not transferable, etc.

  1. Full resync

If you want to resync every table from a specific tap then use the sync_tables command and specify the tap and target ids:

$ pipelinewise sync_tables --target <target_id> --tap <tap_id>

PipelineWise will update the bookmark(s) in the internal state files automatically and at the next normal run it will load only the changes since the resync.

If you want to resync only a list of specific tables then add the --tables argument:

$ pipelinewise sync_tables --target <target_id> --tap <tap_id> --tables schema.table_one,schema.table_two

Warning

The value of the optional --tables argument needs to be a comma separated list of table names using the <schema_name>.<table_name> format. Schema and table names have to be the names in the source database.

Warning

Based on the tap setting, tables can be fully synced or partial synced if they are defined as partial synced. Currently this option is available only for Tap MySQL and Tap PostgreSQL to Snowflake.

Attention

There is an option for Tap MySQL and Tap PostgreSQL to Target Snowflake in main pipelinewise config file for ignoring resync in a case the size of a table in the tap is greater than the defined value. this setting is optional and even yet you can force the resync by using --force argument.

$ pipelinewise sync_tables –target <target_id> –tap <tap_id> –force

this setting can be added in the config.yml for checking the table size:

allowed_resync_max_size:
  table_mb: <integer/float>

Attention

There is an option to chose tables for re-sync which has a specific replication method by --replication_method_only <name of replication method>

$ pipelinewise sync_tables –target <target_id> –tap <tap_id> –replication_method_only log_based

  1. Partial resync

If you want to partial resync a table from a specific tap then use the partial_sync_table command and specify the tap and target ids and table, column ,start_value and end_value(optional)

$ pipelinewise partial_sync_table --target <target_id> --tap <tap_id> --table schema.table --column column_name --start_value start_value_from_column --end_value end_value_from_column

note if there is no end_value, the internal state file will be updated with the replication value (gtid, wal, etc) that was captured at the start of the partial sync

Warning

The value of the --table argument needs to be in the <schema_name>.<table_name> format. Schema and table name have to be the names in the source database.

More description about different cases of partial resync can be found here Different cases of partial resync