Performance considerations
Omni Loader is highly optimized. However, the performance you can extract from it depends on your specific environment.
Identifying bottlenecks
Omni Loader is highly scalable and fully self-tuning. By default, it uses all of the resources available to get the highest performance out of your hardware. With large migrations, some resource is always the bottleneck limiting further gains.
It is important to identify your bottlenecks while Omni Loader is running and adjust resources as necessary:
If your CPU is pegged at 100%, you are CPU-limited. A faster CPU with more cores will unlock better performance.
Looking at network traffic in, for example, Windows Task Manager, you may notice upload close to the uplink capacity your machine has. You are network-limited and will need higher network throughput to increase performance. Keep in mind that cloud providers throttle network throughput per VM and grant higher throughput to VMs with more CPU cores, so skimping on CPU resources also means settling for a low-throughput network.
Looking at memory usage (again, for example, in Task Manager) you may see that your machine is using 90% or more of the available system memory. This is a very bad situation because your system is swapping memory to disk and back. Even modern super-fast NVMe drives are still roughly 1000x slower than memory, so everything will slow to a crawl. In this situation, always provision a VM with more memory.
If CPU and RAM usage is low and the network is not a problem, but you are still not happy with the performance, check your databases. Either the target database struggles to ingest data at the speed Omni Loader serves it, or the source database cannot serve data quickly enough for that many connections. Check your database metrics and, if needed, scale your database server for the duration of the migration.
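The CPU check above can be automated. The sketch below is a minimal, Unix-only heuristic using the 1-minute load average as a coarse proxy for sustained CPU pressure; the function name and threshold are illustrative, not part of Omni Loader.

```python
import os

def cpu_saturated(threshold: float = 0.9) -> bool:
    """Rough check: is the 1-minute load average near the core count?
    Unix-only; load average is a coarse proxy for sustained CPU pressure."""
    cores = os.cpu_count() or 1
    load_1min, _, _ = os.getloadavg()
    return load_1min / cores >= threshold
```

If this returns True for long stretches of a migration run, the guidance above applies: provision a faster CPU or more cores.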
Keep in mind that loading data into an existing schema with indexes in place is typically 5x slower than loading without indexes. Omni Loader handles this optimally when it creates the target schema itself. If you insist on using your existing schema, disable indexes during the load, then re-enable them afterwards.
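As a hedged illustration of the disable-then-rebuild pattern, the statements below use SQL Server syntax with hypothetical table and index names; adapt them to your target database and run them through whatever client you use.

```python
# Hypothetical index names on a hypothetical dbo.orders table;
# syntax shown is SQL Server's ALTER INDEX. Run DISABLE before the
# load and REBUILD after it completes.
DISABLE_INDEX = "ALTER INDEX ix_orders_customer ON dbo.orders DISABLE;"
REBUILD_INDEX = "ALTER INDEX ix_orders_customer ON dbo.orders REBUILD;"
```

Rebuilding once after the load is a single sequential pass, which is why it beats maintaining the index row by row during ingestion.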
Location matters! Data flows from the source database to Omni Loader to the target database. Make sure that path is fast. Your home laptop is not the place to run Omni Loader when both databases are in the cloud. Ensure the network is high-bandwidth and the ping is as low as possible.
Parallelism
The faster and more numerous your CPU cores, the better. Omni Loader will automatically scale vertically and use all your CPU cores (unless limited by the license purchased).
Every table with a primary or unique key will automatically be sliced, and each slice processed by a separate worker.
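Range-based slicing of a numeric key can be sketched as follows. This is a simplified illustration of the idea, not Omni Loader's actual algorithm; the function name and even-split strategy are assumptions.

```python
def make_slices(min_key: int, max_key: int, workers: int) -> list[tuple[int, int]]:
    """Split a contiguous numeric primary-key range into one slice per
    worker. Each (lo, hi) pair is an inclusive key range that a worker
    can copy independently of the others."""
    span = max_key - min_key + 1
    size = -(-span // workers)  # ceiling division
    slices = []
    lo = min_key
    while lo <= max_key:
        hi = min(lo + size - 1, max_key)
        slices.append((lo, hi))
        lo = hi + 1
    return slices
```

For example, keys 1..100 split across 4 workers yield the ranges (1, 25), (26, 50), (51, 75), (76, 100), each processed in parallel.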
Location
We recommend running Omni Loader close to your source database. That way, only compressed data is transferred over the network in the case of data warehouse targets such as Fabric, Snowflake, Databricks, or BigQuery.
If you are migrating from on-premises to the cloud and run Omni Loader on a cloud instance, the data flows uncompressed over the Internet to the Omni Loader instance, is compressed there, and is then sent to storage. If you instead run Omni Loader on-premises, only about 20% of the data volume needs to be sent over the network, which results in incomparably better performance.
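The impact of where compression happens can be estimated with simple wire-time arithmetic. The sketch below is a back-of-envelope model; the function and the 20% ratio from the paragraph above are illustrative, ignoring protocol overhead and congestion.

```python
def transfer_seconds(volume_gb: float, uplink_mbps: float,
                     compression_ratio: float = 1.0) -> float:
    """Rough wire time for moving volume_gb over an uplink_mbps link.
    compression_ratio=0.2 models the ~20% compressed volume when
    Omni Loader runs next to the source database."""
    megabits = volume_gb * compression_ratio * 8 * 1000  # GB -> megabits
    return megabits / uplink_mbps
```

For a 1 TB source over a 1 Gbps uplink: about 8000 seconds uncompressed versus about 1600 seconds when only the compressed 20% crosses the wire.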
Data flow
Data always flows from the source database(s) into the Omni Loader machine, where it is transformed and sent directly to the target database (for OLTP targets) or to a storage account (for data warehouse targets).
A typically overlooked performance killer is a VPN setup. For example, a person working from home over a VPN into the company network might expect an intranet database migration to be fast, because the source database, the target database, and their machine all appear to be on the same fast local network. However, while a VPN abstracts the network topology, it cannot get around the fact that the data physically has to flow from the source database, through typically underpowered encryption nodes, all the way to the person's home, where it is processed on a laptop and sent over a typically abysmal uplink back to the company network and on to the target database or storage.
CPU
The faster the CPU, the better. On-premises machines can have much faster CPUs than cloud instances. The reason is that high-end desktop CPUs run at up to 6 GHz, while high-end server CPUs used in the cloud are designed to have as many cores as possible, a subset of which is exposed to each customer. CPUs with 64 cores produce a lot of heat and therefore have to run at significantly lower frequencies.
If you are moving many billions of records to a data warehouse, Omni Loader will compress all that data into a columnar format optimal for fast ingestion. Do not expect that to be a fast operation if all you provide is an 8-vCPU general-purpose VM. Compressing terabytes requires a fast CPU and many cores.
As a general rule, provide at least a 16-vCPU VM, and go with 64 vCPUs for terabytes of data and tens or hundreds of billions of records. If your CPU is constantly at 100% utilization, provision a larger VM with more CPU cores to increase performance.
If running Omni Loader on a cloud VM, make sure to select a compute-optimized instance:
Azure: F-series
AWS: C7a
GCP: C2
Memory
The more parallel workers you use, the more memory Omni Loader will need. When migrating to a standard row-based relational target, memory usage will be low. However, if you are migrating to a data warehouse such as Fabric, Snowflake, Databricks, or BigQuery, the workers will compress the whole source database content and transform it into Parquet format for fast ingestion. Compressed chunks are uploaded to cloud storage while compression is still streaming, which means a lot of data is held in memory at the same time.
If your VM memory usage is at 90% or higher, performance will suffer. If that happens, you should provision a VM with more memory. If you can't do that, reduce the number of parallel workers (which also incurs a performance penalty). When the system is low on memory, the operating system pages data to and from disk. Disk is several orders of magnitude slower than RAM, so everything slows down and becomes unresponsive.
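A rough way to size memory against worker count is to assume each worker holds a few chunks in flight while compressing and uploading. The sketch below is a back-of-envelope estimate; the chunk size and chunks-in-flight figures are illustrative assumptions, not Omni Loader internals.

```python
def estimated_memory_gb(workers: int, chunk_mb: int = 256,
                        chunks_in_flight: int = 2) -> float:
    """Back-of-envelope RAM estimate: each worker holds a couple of
    chunks being compressed/uploaded at any moment. All parameters
    are illustrative defaults, not measured values."""
    return workers * chunk_mb * chunks_in_flight / 1024
```

Under these assumptions, 64 workers would need on the order of 32 GB just for in-flight chunks, before counting the OS and database drivers.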
Disk
Disk speed doesn't matter much, as Omni Loader processes all of the data in memory to maximize performance. Only the internal database keeping track of projects and individual migration-run work items persists on disk. We recommend never using an HDD; an SSD is good and NVMe is great.
Network latency
The closer the database servers and Omni Loader are, the better. Network ping (the time a message needs to travel between nodes) grows with the distance between them. Part of the slowdown is that signals travel at a finite speed; another part is that data passes through many intermediate servers along the route, each adding latency.
Extracting data from Synapse Dedicated Pool
If the source database is Synapse Dedicated Pool, you can choose to use our standard SQL SELECT mode, in which case the sections above still apply. However, the most efficient way to move data out of Synapse Dedicated Pool is Polybase.
The difference is that, in Polybase mode, Omni Loader acts as a simple orchestrator. We basically just request the Synapse Dedicated Pool nodes to export Parquet files directly into ADLS Gen 2 storage and wait for the process to complete. The efficiency comes from the fact that all Synapse nodes export their data shards in parallel (in SQL SELECT mode, data is routed through a single node to Omni Loader). With Polybase, there is no significant network traffic to Omni Loader at all, so there are no performance concerns. You can run Omni Loader wherever it suits you, even on an underpowered machine if desired.
In Polybase mode, we currently only support Fabric Warehouse as a target. The workflow is very simple: we wait for each table to be exported by Polybase and then trigger Fabric Warehouse ingestion. Unfortunately, you do lose the nice progress bars and duration estimates, as we don't know how long either the Polybase export or the Fabric Warehouse ingestion will take.
For a large number of tables, Synapse may not be able to return metadata in a reasonable amount of time. We have a built-in workaround for that system limitation: Caching Synapse metadata