Performance considerations
Omni Loader is highly optimized. However, the performance you can extract from it depends on your specific environment.
Identifying bottlenecks
Omni Loader is highly scalable and fully self-tuning. By default, it uses all of the resources available to get the highest performance out of your hardware. With large migrations, some resource is always the bottleneck limiting further gains.
It is important to identify your bottlenecks while Omni Loader is running and adjust resources as necessary:
If your CPU is pegged at 100%, you are CPU-limited. A faster CPU with more cores will unlock better performance.
Looking at network traffic in, for example, Windows Task Manager, you may notice upload close to the uplink capacity your machine has. You are network-limited and will need higher network throughput to increase performance. Keep in mind that cloud providers throttle network throughput per VM and grant higher throughput to VMs with more CPU cores, so skimping on CPU resources also means settling for a low-throughput network.
Looking at memory usage (again, for example, in Task Manager) you may see that your machine is using 90% or more of the available system memory. This is a very bad situation because your system is swapping memory to disk and back. Even modern super-fast NVMe drives are still roughly 1000x slower than memory, so everything will slow to a crawl. In this situation, always provision a VM with more memory.
If CPU and RAM usage is low and the network is not a problem, but you are still not happy with the performance, check your databases. Either the target database struggles to ingest data at the speed Omni Loader serves it, or the source database cannot serve data quickly enough for that many connections. Check your database metrics and, if needed, scale your database server for the duration of the migration.
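The CPU check above can be automated. The sketch below is a minimal, Unix-only heuristic using the 1-minute load average as a coarse proxy for sustained CPU pressure; the function name and threshold are illustrative, not part of Omni Loader.

```python
import os

def cpu_saturated(threshold: float = 0.9) -> bool:
    """Rough check: is the 1-minute load average near the core count?
    Unix-only; load average is a coarse proxy for sustained CPU pressure."""
    cores = os.cpu_count() or 1
    load_1min, _, _ = os.getloadavg()
    return load_1min / cores >= threshold
```

If this returns True for long stretches of a migration run, the guidance above applies: provision a faster CPU or more cores.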
Keep in mind that loading data into an existing schema with indexes in place is typically 5x slower than loading without indexes. Omni Loader handles this optimally when it creates the target schema itself. If you insist on using your existing schema, disable indexes during the load, then re-enable them afterwards.
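As a hedged illustration of the disable-then-rebuild pattern, the statements below use SQL Server syntax with hypothetical table and index names; adapt them to your target database and run them through whatever client you use.

```python
# Hypothetical index names on a hypothetical dbo.orders table;
# syntax shown is SQL Server's ALTER INDEX. Run DISABLE before the
# load and REBUILD after it completes.
DISABLE_INDEX = "ALTER INDEX ix_orders_customer ON dbo.orders DISABLE;"
REBUILD_INDEX = "ALTER INDEX ix_orders_customer ON dbo.orders REBUILD;"
```

Rebuilding once after the load is a single sequential pass, which is why it beats maintaining the index row by row during ingestion.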
Location matters! Data flows from the source database to Omni Loader to the target database. Make sure that path is fast. Your home laptop is not the place to run Omni Loader when both databases are in the cloud. Ensure the network is high-bandwidth and the ping is as low as possible.
Parallelism
The faster and more numerous your CPU cores, the better. Omni Loader will automatically scale vertically and use all your CPU cores (unless limited by the license purchased).
Every table with a primary or unique key will automatically be sliced, and each slice processed by a separate worker.
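Range-based slicing of a numeric key can be sketched as follows. This is a simplified illustration of the idea, not Omni Loader's actual algorithm; the function name and even-split strategy are assumptions.

```python
def make_slices(min_key: int, max_key: int, workers: int) -> list[tuple[int, int]]:
    """Split a contiguous numeric primary-key range into one slice per
    worker. Each (lo, hi) pair is an inclusive key range that a worker
    can copy independently of the others."""
    span = max_key - min_key + 1
    size = -(-span // workers)  # ceiling division
    slices = []
    lo = min_key
    while lo <= max_key:
        hi = min(lo + size - 1, max_key)
        slices.append((lo, hi))
        lo = hi + 1
    return slices
```

For example, keys 1..100 split across 4 workers yield the ranges (1, 25), (26, 50), (51, 75), (76, 100), each processed in parallel.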
Location
We recommend running Omni Loader close to your source database. That way, only compressed data is transferred over the network in the case of data warehouse targets such as Fabric, Snowflake, Databricks, or BigQuery.
If you are migrating from on-premises to the cloud and run Omni Loader on a cloud instance, the data flows uncompressed over the Internet to the Omni Loader instance, is compressed there, and is then sent to storage. If you instead run Omni Loader on-premises, only about 20% of the data volume needs to be sent over the network, which results in incomparably better performance.
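The impact of where compression happens can be estimated with simple wire-time arithmetic. The sketch below is a back-of-envelope model; the function and the 20% ratio from the paragraph above are illustrative, ignoring protocol overhead and congestion.

```python
def transfer_seconds(volume_gb: float, uplink_mbps: float,
                     compression_ratio: float = 1.0) -> float:
    """Rough wire time for moving volume_gb over an uplink_mbps link.
    compression_ratio=0.2 models the ~20% compressed volume when
    Omni Loader runs next to the source database."""
    megabits = volume_gb * compression_ratio * 8 * 1000  # GB -> megabits
    return megabits / uplink_mbps
```

For a 1 TB source over a 1 Gbps uplink: about 8000 seconds uncompressed versus about 1600 seconds when only the compressed 20% crosses the wire.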
Data flow
Data always flows from the source database(s) into the Omni Loader machine, where it is transformed and sent directly to the target database (for OLTP targets) or to a storage account (for data warehouse targets).
A typically overlooked performance killer is a VPN setup. For example, a person working from home over a VPN into the company network might expect an intranet database migration to be fast, because the source database, the target database, and their machine all appear to be on the same fast local network. However, while a VPN abstracts the network topology, it cannot get around the fact that the data physically has to flow from the source database, through typically underpowered encryption nodes, all the way to the person's home, where it is processed on a laptop and sent over a typically abysmal uplink back to the company network and on to the target database or storage.
CPU
The faster the CPU, the better. On-premises machines can have much faster CPUs than cloud instances. The reason is that high-end desktop CPUs run at up to 6 GHz, while high-end server CPUs used in the cloud are designed to have as many cores as possible, a subset of which is exposed to each customer. CPUs with 64 cores produce a lot of heat and therefore have to run at significantly lower frequencies.
If you are moving many billions of records to a data warehouse, Omni Loader will compress all that data into a columnar format optimal for fast ingestion. Do not expect that to be a fast operation if all you provide is an 8-vCPU general-purpose VM. Compressing terabytes requires a fast CPU and many cores.
As a general rule, provide at least a 16-vCPU VM, and go with 64 vCPUs for terabytes of data and tens or hundreds of billions of records. If your CPU is constantly at 100% utilization, provision a larger VM with more CPU cores to increase performance.
If running Omni Loader on a cloud VM, make sure to select a compute-optimized instance:
Azure: F-series
AWS: C7a
GCP: C2
Memory
The more parallel workers you use, the more memory Omni Loader will need. When migrating to a standard row-based relational target, memory usage will be low. However, if you are migrating to a data warehouse such as Fabric, Snowflake, Databricks, or BigQuery, the workers will compress the whole source database content and transform it into Parquet format for fast ingestion. Compressed chunks are uploaded to cloud storage while compression is still streaming, which means a lot of data is held in memory at the same time.
If your VM memory usage is at 90% or higher, performance will suffer. If that happens, you should provision a VM with more memory. If you can't do that, reduce the number of parallel workers (which also incurs a performance penalty). When the system is low on memory, the operating system pages data to and from disk. Disk is several orders of magnitude slower than RAM, so everything slows down and becomes unresponsive.
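A rough way to size memory against worker count is to assume each worker holds a few chunks in flight while compressing and uploading. The sketch below is a back-of-envelope estimate; the chunk size and chunks-in-flight figures are illustrative assumptions, not Omni Loader internals.

```python
def estimated_memory_gb(workers: int, chunk_mb: int = 256,
                        chunks_in_flight: int = 2) -> float:
    """Back-of-envelope RAM estimate: each worker holds a couple of
    chunks being compressed/uploaded at any moment. All parameters
    are illustrative defaults, not measured values."""
    return workers * chunk_mb * chunks_in_flight / 1024
```

Under these assumptions, 64 workers would need on the order of 32 GB just for in-flight chunks, before counting the OS and database drivers.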
Disk
Disk speed doesn't matter much, as Omni Loader processes all of the data in memory to maximize performance. Only the internal database keeping track of projects and individual migration-run work items persists on disk. We recommend never using an HDD; an SSD is good and NVMe is great.
Network latency
The closer the database servers and Omni Loader are, the better. Network ping (the time a message needs to travel between nodes) grows with the distance between them. Part of the slowdown is that signals travel at a finite speed; another part is that data passes through many intermediate servers along the route, each adding latency.
Extracting data from Synapse Dedicated Pool
If the source database is Synapse Dedicated Pool, you can choose to use our standard SQL SELECT mode, in which case the sections above still apply. However, the most efficient way to move data out of Synapse Dedicated Pool is Polybase.
The difference is that, in Polybase mode, Omni Loader acts as a simple orchestrator. We basically just request the Synapse Dedicated Pool nodes to export Parquet files directly into ADLS Gen 2 storage and wait for the process to complete. The efficiency comes from the fact that all Synapse nodes export their data shards in parallel (in SQL SELECT mode, data is routed through a single node to Omni Loader). With Polybase, there is no significant network traffic to Omni Loader at all, so there are no performance concerns. You can run Omni Loader wherever it suits you, even on an underpowered machine if desired.
In Polybase mode, we currently only support Fabric Warehouse as a target. The workflow is very simple: we wait for each table to be exported by Polybase and then trigger Fabric Warehouse ingestion. Unfortunately, you do lose the nice progress bars and duration estimates, as we don't know how long either the Polybase export or the Fabric Warehouse ingestion will take.
For a large number of tables, Synapse may not be able to return metadata in a reasonable amount of time. We have a built-in workaround for that system limitation: Caching Synapse metadata