Recent Projects

SQL Server Data Sync Pipeline

This project highlights a data synchronization pipeline developed with Azure Data Factory (ADF) to migrate SQL Server databases between two servers. The solution leverages SQL Server's change tracking feature to identify and transfer modified data, ensuring accurate and efficient synchronization. The process is incremental: rather than reloading full tables, it captures and processes only the changes recorded in the source database. Stored procedures executed on the source server handle the data transformation and movement. This approach reduces downtime and guarantees data consistency throughout the migration.
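Before a pipeline like this can read changes, change tracking has to be switched on at both the database and the table level. A minimal sketch of that setup (the database name, table name, and retention period below are illustrative assumptions, not values from this project):

```sql
-- Enable change tracking on the source database (retention window is illustrative).
ALTER DATABASE SourceDb
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

-- Enable change tracking per table; TRACK_COLUMNS_UPDATED records which
-- columns changed, at the cost of some extra storage.
ALTER TABLE dbo.Customers
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);
```

Once enabled, SQL Server records the primary keys of inserted, updated, and deleted rows, which is exactly what the stored procedures described below consume.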

The pipeline is designed to support large-scale database environments while minimizing operational disruptions.

Using Stored Procedures to Handle Change Tracking Data in SQL Server Migration

In this project, two stored procedures are used to manage data identified by SQL Server's Change Tracking during the synchronization process. These procedures work together to ensure that data modifications are accurately captured and processed, respecting table relationships during inserts and deletes.

Capturing and Storing Changed Data:
This procedure is responsible for querying tables enabled for change tracking and collecting the primary key values of modified records. It dynamically generates and executes SQL statements to retrieve change data from all relevant tables. The results are stored in a staging table (ChangeTrackingData).
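A minimal sketch of what this capture step might look like. Only the ChangeTrackingData staging table comes from the project itself; the procedure name, staging columns, and the simplification to single-column primary keys are illustrative assumptions:

```sql
-- Illustrative sketch: collect changed keys from every change-tracked table
-- into the ChangeTrackingData staging table. @LastSyncVersion would normally
-- be persisted between runs; assumes single-column primary keys.
CREATE PROCEDURE dbo.usp_CaptureChangeTrackingData
    @LastSyncVersion BIGINT
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @sql NVARCHAR(MAX) = N'';

    -- Build one INSERT ... SELECT per change-tracked table.
    SELECT @sql = @sql + N'
        INSERT INTO dbo.ChangeTrackingData (TableName, PrimaryKeyValue, Operation)
        SELECT N''' + s.name + N'.' + t.name + N''',
               CAST(ct.' + QUOTENAME(c.name) + N' AS NVARCHAR(128)),
               ct.SYS_CHANGE_OPERATION   -- ''I'', ''U'', or ''D''
        FROM CHANGETABLE(CHANGES ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
            + N', @from) AS ct;'
    FROM sys.change_tracking_tables ctt
    JOIN sys.tables t         ON t.object_id  = ctt.object_id
    JOIN sys.schemas s        ON s.schema_id  = t.schema_id
    JOIN sys.indexes i        ON i.object_id  = t.object_id AND i.is_primary_key = 1
    JOIN sys.index_columns ic ON ic.object_id = i.object_id AND ic.index_id = i.index_id
    JOIN sys.columns c        ON c.object_id  = t.object_id AND c.column_id = ic.column_id;

    EXEC sp_executesql @sql, N'@from BIGINT', @from = @LastSyncVersion;
END;
```

CHANGETABLE(CHANGES ...) returns the primary key columns of each changed row together with SYS_CHANGE_OPERATION, which is why only keys, not full rows, need to be staged.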
Ordering and Managing Referential Integrity:
After the change data is collected, this procedure orders the tables based on referential integrity. It updates the OrderReference column in ChangeTrackingData, ensuring that inserts and deletes occur in the correct sequence and never violate foreign key constraints.
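One way such an ordering step could be implemented is a recursive walk over sys.foreign_keys that assigns each table a dependency level. This is a sketch under two assumptions: the foreign key graph has no cycles, and TableName is stored as schema.table; the procedure name itself is hypothetical:

```sql
-- Illustrative sketch: assign each table a dependency level so that parents
-- (low OrderReference) are processed before children (high OrderReference).
CREATE PROCEDURE dbo.usp_OrderChangeTrackingData
AS
BEGIN
    SET NOCOUNT ON;

    ;WITH Deps AS (
        -- Level 0: tables that reference no other table.
        SELECT t.object_id, 0 AS Lvl
        FROM sys.tables t
        WHERE NOT EXISTS (SELECT 1 FROM sys.foreign_keys fk
                          WHERE fk.parent_object_id = t.object_id
                            AND fk.referenced_object_id <> t.object_id)
        UNION ALL
        -- A referencing table sits one level below each table it references.
        SELECT fk.parent_object_id, d.Lvl + 1
        FROM sys.foreign_keys fk
        JOIN Deps d ON d.object_id = fk.referenced_object_id
        WHERE fk.parent_object_id <> fk.referenced_object_id
    )
    UPDATE ctd
    SET ctd.OrderReference = d.MaxLvl
    FROM dbo.ChangeTrackingData ctd
    JOIN (SELECT object_id, MAX(Lvl) AS MaxLvl
          FROM Deps GROUP BY object_id) d
        ON OBJECT_ID(ctd.TableName) = d.object_id;
END;
```

With levels assigned this way, applying inserts in ascending OrderReference and deletes in descending OrderReference respects every foreign key chain.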

SQL Server Data Synchronization Pipeline – Azure Data Factory

The "Sync Core New Tenant" pipeline synchronizes SQL Server databases between environments with minimal downtime, leveraging SQL Server Change Tracking to capture and process data modifications (inserts, updates, and deletes). This automation reduces manual effort and ensures data integrity during migration.

The pipeline extracts and transfers change data to the target environment, applying inserts and updates first, followed by deletions. This incremental approach avoids full data reloads, making it scalable for large datasets and complex table relationships. Upsert operations prevent duplication and ensure synchronization accuracy.

By integrating Azure Data Factory with SQL Server's change tracking, this pipeline offers a reliable, automated solution for database migration and synchronization, minimizing downtime and maintaining consistency.

Key Technical Details:

Upsert Strategy:
The pipeline uses the upsert operation to avoid duplication and ensure seamless data integration. By using the primary key, records are inserted if they don't exist or updated if they do.
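Conceptually, the upsert applied to each target table behaves like a T-SQL MERGE keyed on the primary key. The table and column names below are illustrative assumptions; in ADF itself this behavior is configured on the Copy activity rather than hand-written:

```sql
-- Illustrative upsert for one synchronized table: insert the row if the
-- key is new, otherwise update it in place.
MERGE INTO dbo.Customers AS tgt
USING staging.Customers AS src
    ON tgt.CustomerId = src.CustomerId        -- match on the primary key
WHEN MATCHED THEN
    UPDATE SET tgt.Name  = src.Name,
               tgt.Email = src.Email
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerId, Name, Email)
    VALUES (src.CustomerId, src.Name, src.Email);
```

Because the match is on the primary key, replaying the same change batch is idempotent: a second run updates rows to identical values instead of creating duplicates.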

Ordered Processing:
The use of OrderReference ensures that operations are executed in the correct sequence, preventing issues with foreign key dependencies.

Parallel and Sequential Execution:
Inserts and updates are processed sequentially to maintain data integrity.
Deletions are handled separately in reverse order to avoid constraint violations.
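A sketch of how the delete phase could consume the staging table, reversing the insert order. ChangeTrackingData and OrderReference come from the project; the Operation and PrimaryKeyValue columns are illustrative assumptions:

```sql
-- Illustrative: fetch pending deletes child-first (highest OrderReference
-- first), so no row is deleted while dependent child rows still reference it.
SELECT TableName, PrimaryKeyValue
FROM dbo.ChangeTrackingData
WHERE Operation = 'D'
ORDER BY OrderReference DESC;
```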

Timeout and Retry Policies:
Each activity is configured with a 12-hour timeout and no automatic retries, keeping failure behavior predictable during large data migrations.
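In the pipeline's JSON definition, this policy is set per activity. A fragment showing the shape of such a configuration (the activity name is hypothetical; the timeout format is ADF's d.hh:mm:ss convention):

```json
{
  "name": "ApplyChangedRows",
  "type": "SqlServerStoredProcedure",
  "policy": {
    "timeout": "0.12:00:00",
    "retry": 0,
    "retryIntervalInSeconds": 30
  }
}
```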

Conclusion

This pipeline provides a robust solution for synchronizing SQL Server data between tenants with minimal manual intervention. By automating the handling of inserts, updates, and deletes, the migration process is streamlined, ensuring data consistency and reducing the risk of errors during migration.

Stack

  • Azure Data Factory
  • SQL Server

© 2025. All Rights Reserved.