JSON job specification

Omni Loader can be run as a console like this:

OmniLoader.exe -job \path\to\the\file.json

Things to know:

  • Tables are optional. If omitted, all tables are copied.

  • Source connection needs to have an alias, as this plays important role in the customizations.

  • Most relational databases need just the following parameters: server, port, username, password, database.

  • If several source databases are used, instead of a single Source element, you should use Sources and have it contain an array of items.

A typical example - SQL Server database to Synapse Dedicated Pool

Please note that WHERE and OPTIONS table parameters are optional.

{
  "Source": {
    "Alias": "AdventureWorks2017",
    "SqlServer": {
      "Server": "",
      "Port": 1433,
      "Database": "AdventureWorks2017",
      "Username": "{{user}}",
      "Password": "{{password}}"
    },
    "Tables": [
      { "Name": "Person.Person", "Where": "MiddleName = 'J'", "Options": "DISTRIBUTION = ROUND_ROBIN" }
    ]
  },
  "Target": {
    "Synapse": {
      "Server": "{{yourserver}}.database.windows.net",
      "Database": "{{database name}}",
      "Username": "{{user}}",
      "Password": "{{password}}",
      "AuthenticationType": "sqlserver",  // Windows|SqlServer
      "StorageType": "adlsv2",  // Local|AzureBlob|ADLSv2
      "StorageFormat": "parquet", // CSV|Parquet
      "StorageCompression": "snappy", // None|Snappy|GZip
      "ConnectionString": "{{connection string from Microsoft Azure Storage Explorer}}",
      "Container": "{{container}}",
      "ContainerDirectory": "{{directory in the container}}"
    }
  }
}

An example of a single table to be copied

{
  "Source": {
    "Alias": "Adv",
    "SqlServer": {
      "Server": "",
      "Port": 1433,
      "Database": "AdventureWorks2017",
      "Username": "sa",
      "Password": "test"
    },
    "Tables": [
      { "Name": "dbo.DatabaseLog" }
    ]
  },
  "Target": { ... }
}

Certain useful project options are exposed in JSON job as well.

Log filenames can be relative or absolute. If relative, they are placed into the default logs folder (logs/projects/{project-name}/runs/{timestamp}). Mapping rules can be separated via new lines or semicolons.

{
  "Source": { ... },
  "Target": { ... },
  "Options": {
    "SummaryLogFilename": "summary.log",
    "ErrorLogFilename": "error.log",
    "WarningLogFilename": "warning.log",
    "SqlLogFilename": "sql.log",
    "ErrorSqlLogFilename": "errorsql.log",
    "JsonLogFilename": "details.json",
    "SchemaMapping": "person=people",
    "TableNameMapping": "production=prod",
    "ColumnNameMapping": "parent=parent_id",
    "DataTypeMapping": "varchar=nvarchar;char(10)=char(30)",
    "DefaultValuesMapping": "now=current_timestamp",
    "TableSlices": 16,
    "TableSliceMinMB": 100,
    "FileTableRowLimit": 100000,
    "DryRun": false,
    "DataCopyingMode": "Everything"  // None|Everything|Delta
  }
}

SQL Server

SQL Server authentication

{
  "SqlServer": {
    "Server": "R12",
    "Database": "AdventureWorks2017",
    "AuthenticationType": "SqlServer",  // Windows|SqlServer
    "Username": "sa",
    "Password": "test"
  }
}

Windows authentication

{
  "SqlServer": {
    "Server": "R12",
    "Database": "AdventureWorks2017",
    "AuthenticationType": "Windows" // Windows|SqlServer
  }
}

Oracle

{
  "Oracle": {
    "ConnectionType": "Direct", // Direct|OracleClient
    "Server": "{{server ip, or empty if localhost}}",
    "Port": 11522,
    "Sid": "{{sid}}", // or "ServiceName": "{{service name}}",
    "Username": "sys",
    "Password": "{{password}}",
    "Role": "SysDBA"  // Default|SysDBA|SysOPER
  }
}

Synapse Dedicated Pool

{
  "Synapse": {
    "Server": "{{yourserver}}.database.windows.net,
    "Database": "{{database name}}",
    "Username": "{{user}}",
    "Password": "{{password}}",
    "AuthenticationType": "sqlserver",
    "StorageType": "adlsv2",  // Local|AzureBlob|ADLSv2
    "StorageFormat": "parquet", // CSV|Parquet
    "StorageCompression": "snappy", // None|Snappy|GZip
    "ConnectionString": "{{conn string from storage explorer}}",
    "Container": "{{container}}",
    "ContainerDirectory": "{{directory name in the container}}"
  }
}

PostgreSQL

{
  "PostgreSQL": {
    "Server": "localhost",
    "Port": 5432,
    "Database": "{{db name}}",
    "Username": "postgres",
    "Password": "{{password}}"
  }
}

Teradata

{
  "Teradata": {
    "Server": "127.0.0.1",
    "Port": 1025,
    "Database": "{{db name}}",
    "Username": "dbc",
    "Password": "dbc"
  }
}

Parquet

{
  "Parquet": {
    "DataStorageType": "adlsv2",  // Local|AzureBlob|ADLSv2
    "StorageFormat": "parquet", // CSV|Parquet
    "StorageCompression": "snappy", // None|Snappy|GZip
    "ConnectionString": "{{conn string from storage explorer}}",
    "Container": "adv",
    "ContainerDirectory": "console"
  }
}

Slicing the tables

To improve the performance of data load, specify how many workers should divide the work on a single table. Only tables with a slicing key (usually a PK) can be sliced.

{
  "Source": { ... },
  "Target": { ... },
  "Options": {
    "TableSlices": 16,
    "TableSliceMinMB": 100
  }
}

Tables with no indexes cannot be efficiently sliced as each read requires a full table scan regardless of SQL WHERE filters. Even if read requires a single worker one can still generate multiple target files for efficient ingestion into the data warehouse. FileTableRowLimit option specifies how many records should each target file have. Use it for CSV and Parquet targets. Please note that TableSlices must be set to at least 2 to enable the row limit option.

{
  "Source": { ... },
  "Target": { ... },
  "Options": {
    "TableSlices": 2,
    "FileTableRowLimit": 100000
  }
}

Last updated