Use Azure Data Factory for Data Movement
Relates to Jet Analytics Data Integration 6024.1 and later versions.
Azure Data Factory (ADF) is ideal for moving large data sets. Data is transferred directly from source to target without relying on the resources of the application server.
Jet Analytics Data Integration can use Azure Data Factory to move data in the following two ways:
- Source to Ingest instance storage
- Ingest instance storage to Prepare instance storage
Create an Azure Data Factory resource
If you already have a Data Factory you want to use, you can skip this section.
- In the Azure Portal, go to Create a new Resource > Data Factory > Create.
- Select Version = V2.
- Assign a Subscription name, Resource Group, and Location. Git configuration is not required and can be disabled.
- After deployment, note the following properties — they will be needed later:
- Azure Data Factory Name
- Subscription ID
- Resource Group name
Create an App Registration
Note: Azure Administrator access is required to register the application in the Azure Portal.
To access Data Factory resources from Jet Analytics Data Integration, configure an App Registration in the Azure portal.
- In the Azure Portal, go to Azure Active Directory > App Registrations and click New Registration.
- Enter a name and select Accounts in this organizational directory only. The Redirect URI is the URL where your application is hosted. Click Register.
- On the new App Registration, go to Certificates & secrets and create a New Client Secret. The secret is encrypted after saving — record it somewhere safe. It will appear after you click Add.
- Note the following App Registration properties — they will be needed later:
- Azure Tenant ID
- Application ID
- Client Secret (Application Key)
Enable App Registration access to Data Factory
The following access control steps describe the minimum permissions required in most cases. In production environments, you may adjust these permissions to align with your business and compliance requirements. See the Microsoft Azure RBAC documentation for details.
- Navigate to the resource group where your Data Factory resource is located and open the Data Factory resource.
- Select Access Control (IAM) and click Add a role assignment.
- Add the App Registration you just created to the Data Factory Contributor role.
After adding or removing role assignments, wait at least 5 minutes before executing a task in an Ingest instance. Changes can take up to 30 minutes to take effect. See Troubleshoot Azure RBAC for more details.
Integration runtime
If you are connecting to a data source that is not hosted in Azure — such as a local SQL database, a SQL Server running on a VM, or an Oracle server — you will need a Self-Hosted Integration Runtime installed at that location. The same applies if your Prepare instance is running on a local SQL database and you want to use Azure Data Factory for the transfer.
See the Self-Hosted Integration Runtime Issues section below for troubleshooting the most common issues.
Transfer from Ingest instance to Prepare instance using Azure Data Factory
This option is not supported when using Azure Synapse SQL Pool (SQL DW).
A separate Azure Data Factory resource can be configured for each environment.
- In the Portal, open Instances and click to add a new Prepare instance, or edit an existing one.
- Configure the instance with a Name and database type as required.
- Scroll to the Transfer from Ingest instance section at the bottom of the menu.
- Change Technology from ADO.Net to Azure Data Factory.
- The required fields will appear.
- Complete the following fields:
- Azure Tenant ID — found in the overview of the App Registration.
-
Subscription ID — found in the Data Factory overview in Azure.
- (Optional) Azure Data Factory folder name — pipelines and datasets will be placed in this folder in ADF.
- Azure Data Factory Data Factory Name — the name of your Data Factory in Azure.
- Azure AD App Registration details:
- Application ID
- Application Key (Client Secret)
- Integration Runtime — required when transferring data from or to on-premises systems. See Configure a Self-Hosted Integration Runtime.
- Resource Group — the resource group where your Data Factory is located.
Transfer data from source to Ingest instance storage using Azure Data Factory
- On the Data sources page, search for data factory to select one of the available options.
- On the Connection Info page, enter the required information and click Next:
Azure Data Factory info
These fields are located at the bottom of the data source type menu and always require the same setup as described in the section above. The only field not present here is the Integration Runtime.
Azure SQL connection
- Authentication type: SQL Authentication is the only available option.
- Command timeout: in seconds. The default is 3600 (1 hour).
- Data source: the fully qualified server name. This is the connection property used by ADF when extracting data and synchronizing metadata (schemas, table names, field names, and data types).
- Database: the database name on the data source.
- Integration runtime name: required only when connecting to on-premises data sources. If used, you must also configure a Self-Hosted Integration Runtime.
- Username: the SQL authentication username.
- Password: the password for the above user account.
MySQL connection
- Command timeout: in seconds. The default is 3600 (1 hour).
- Database: the database name on the server.
- Integration runtime name: required only when connecting to on-premises data sources.
- Password: the password for the user account.
- Port: default is 3306.
- Server: the host where MySQL is running.
- SSL mode
- User ID: an account with at least read access to the database.
Oracle connection
- The required fields are the same as those described in the Jet Analytics Oracle Data Source Provider article using the EZCONNECT method. The difference is that the Host field here corresponds to the Server field in that guide, and a Self-Hosted Integration Runtime must be running on the Oracle server.
- If your Ingest instance is using an Azure Data Lake Gen2 container for storage, the Integration Runtime requires the Java Runtime Environment (JRE) to write and parse Parquet files. Install JRE on the same system as the Integration Runtime from java.com/en/download.
PostgreSQL connection
- Database name: a database on the server (e.g. dvdrental).
- Username: a user with read rights on the server. The standard admin account created during installation can be used.
- Password: the password for the user account.
- Command timeout: set to 0 for no timeout, or 3600 (recommended) for 1 hour.
- Force unicode conversion: when enabled, attempts to convert all strings to NVarChar.
- Azure Database for PostgreSQL: enable this when the source is hosted in Azure.
- Integration runtime name: required when the server is on-premises rather than in Azure.
- Server name: the server where PostgreSQL is hosted.
- Port: default is 5432.
Troubleshooting
Connectivity error
Error:System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible.
The data source must be accessible by both Azure Data Factory and the Ingest instance. If you receive this error, isolate which component cannot reach the data source.
If you have made changes to Data Lake or Data Factory configuration (Access Control, IAM, role assignments, etc.):
- Wait for the changes to take effect in Azure. See Troubleshoot Azure RBAC.
- Execute the task and verify it succeeds.
- Preview tables in Data Factory to verify connectivity.
- Synchronize the Ingest instance against the Prepare instance if Data Source changes were made.
- Preview tables in Ingest instance storage to verify the Data Lake setup.
- Deploy and execute in Jet Analytics Data Integration.
Also check your Azure Data Factory settings. In the Author pane under Datasets, locate the Destination dataset created for this data source and click Test connection.
Self-Hosted Integration Runtime issues
Verify that the Self-Hosted Integration Runtime is installed and running. For detailed troubleshooting, see Troubleshoot self-hosted integration runtime in Azure Data Factory.
Review your Data Factory settings in the Author pane under Monitor > Pipeline runs.
JreNotFound error
This error occurs when you have a working Ingest instance using Azure Data Lake storage and add a new ADF data source with a Self-Hosted Integration Runtime. The Sync task completes successfully but the Transfer task fails with a JreNotFound error.
Cause: ADF packages data in Parquet format on the client side before uploading to Azure Data Lake. Parquet format requires the Java Runtime Environment (JRE).
Resolution: Install the Java Runtime Environment from java.com/en/download on the same machine as the Self-Hosted Integration Runtime. See also: UserErrorJreNotFound error when running a copy activity to Azure.
Invalid Client Secret
Error:Failed to get access token by using service principal — invalid client secret.
In your Data Factory settings, go to the Author pane under Datasets and locate the Destination dataset for this data source.
Resolve the access token error, then click Test connection to confirm it works.
Failed to get access token by using MSI authenticator
Error:Failed to get access token by using MSI authenticator. Acquire MI token from AAD failed. ErrorCode: invalid_client — A configuration issue is preventing authentication.
Check whether the SQL authentication password has changed. If so, update and test the linked service in the ADF setup: go to Manage > Linked Services and review the SQL Server connection.
Cannot set the AccessToken property
Error:Cannot set the AccessToken property if the 'Integrated Security' connection string keyword has been set to 'true' or 'SSPI'.
If your Ingest instance storage is on a SQL Server, change its authentication from Windows authentication to SQL authentication using a username and password.