Performance
Last updated
Last updated
By default, Omni Loader will read the metadata (database structure) for all databases in parallel. It will read all the details like columns, data types, indexes, foreign keys, etc. This allows you to have a super-snappy UI because once we fetch the metadata, we don't need to touch the database again if you want to, for example, examine the table columns.
On the other hand, some database have tens of thousands of tables and are very slow. For example, mainframes. If you need to migrate just a few tables, you may not want to wait for 2 hours until mainframe reads its system views to tell us the structure of all of the tables.
If this option is enabled, we will fetch just a list of tables, and all of the tables in the project will be deselected by default. Only when you select a table will we read its structure.
On very large tables it can take a while to determine number of records. We use that information to display a nice progress bar and statistics during the migration. So, by default we query database statistics only. If you want to know exactly how many records are in tables before copying them, disable this option and wait a little longer while measuring tables step completes.
Omni Loader migration engine is a massively parallel distributed cluster. You can have hundreds of workers, each doings the job assigned, never blocking any of the others. To optimally assign jobs, an agent runs as many jobs in parallel as there are CPU cores. You can adjust the pressure on each agent by specifying more or less workers.
Number of jobs ingesting the data into the data warehouse target from the data lake storage. This is separate from the data extraction workers, as usually one can have dozens of extractions jobs without overloading the source database, but not so many ingestion workers as ingestion puts more pressure on the target.
If you have very large tables, it is important to read its many parts in parallel. A single reader has a very real ceiling of performance, so you can multiply the throughput by using more than one connection. Omni Loader will dynamically generate the SQL condition for each slice and load as many of them in parallel as there are available workers.
Especially for data warehouse target, you don't want to add unnecessary pressure by loading many tiny files from the data lake. This setting will lower the number of slices if necessary.
When reading non-indexed large tables, there is no efficient way to subdivide the work and load many parts of that table in parallel. So, to avoid multiple huge table scans we employ a single read. However, to avoid a very slow ingestion, Omni Loader outputs several files and ingests them into the target in parallel. You can specify number of rows for each staging file here.
For Azure Synapse Analytics, we will copy the distribution if source table has it. Alternatively, you can specify the table distribution model here. Please note that for tables less than 4GB in size, we will use REPLICATE distribution.