Skip to main content

Jet Analytics CSV Data Source

This data source connects to text or CSV files across folders and subfolders in a variety of locations. Multiple files can be merged into one or more specified tables, and a range of settings are available to handle non-standard file setups.

Configuration manual

The following settings are available for the Jet Analytics CSV Data Source connector.

General notes on setting comma-separated parameters

For every setting that accepts a comma-separated list as a parameter, individual items can be wrapped in square brackets. Any special character — including commas — is permitted within the brackets.

For example, the following date format list is parsed as three items (note the comma inside the brackets):

yyyyMMdd,[dd,MM,yyyy],yyyy-MM-dd

Connection settings

These settings specify the location of the file or files and how to access them.

Path

The path to the CSV file or files. Only one path can be specified per data connector. The path can point to a folder or a single file. For non-local locations, a separate field such as a drive ID (for SharePoint) specifies the root, in which case a / can be used as the path to indicate the root level.

Include sub-folders

Specifies whether CSV files should also be collected from subdirectories.

Note:

This setting is ignored for AWS, Azure, and Google Cloud locations.

Included file types

A comma-separated list of file extensions to read. Any file with an extension not listed here will be ignored.

File aggregation pattern

A comma-separated list of file patterns that should be treated as a single table. Supported wildcard characters are * and ?. When set, all CSV files matching a pattern are treated as one table. Files grouped by a pattern are expected to share the same schema — a schema mismatch will cause the process to break during a full data load.

Location specific fields

The remaining connection settings vary depending on the selected Location type. Only the fields relevant to the chosen location are editable, and those fields are mandatory.

Location

The following location types are supported: Local file or folder, Azure Blob Storage, AWS S3 Bucket, SharePoint or OneDrive, Google Cloud Storage, and SFTP.

Local file or folder

Use when the file is stored locally on a drive. To connect to an Azure File Share, use the UNC path format:

\\<storageAccountName>.file.core.windows.net\<fileShareName>\<folder>\<filename.csv>

Before connecting to an Azure File Share, the file share must be mounted as a network drive. To do this, locate the file share, click Connect, then Show Script, and copy the PowerShell script using the Copy code button.

Run this script in PowerShell as the same user account that runs the Ingest Instance Service. This cannot be done when running as Local System. To launch PowerShell as a different user, search for PowerShell, right-click it, choose Open file location, then right-click the application and select Run as a different user.

Paste the copied script into PowerShell and press Enter. A message will confirm that the folder has been created.

Azure Blob Storage settings

  • Azure blob connection string: The complete connection string to the container. See Configure a connection string — Azure Storage for details. A SAS connection string is also supported.
  • Azure tenant ID: The tenant ID of the storage account. Required for OAuth 2.0 authentication using the client grant type.
  • Azure client ID: The client ID of the application used for OAuth 2.0 client authentication.
  • Azure client secret: The secret value configured for the application.
  • Azure storage account name: The name of the storage account that hosts the container.
  • Azure blob container: The container name.

AWS S3 Bucket settings

This connector supports connecting with Access Key ID only. See the AWS documentation for more information.

  • AWS region: The region string (e.g. eu-west-2).
  • AWS access key ID: The access key ID for the S3 bucket.
  • AWS secret access key: The secret access key for the S3 bucket.
  • AWS bucket: The bucket name.

SharePoint or OneDrive settings

  • SharePoint/OneDrive client ID: The Client ID of the app used for access. The app requires the Application permissions Files.Read.All and Sites.Read.All. For broader access, Group.Read.All and User.Read.All may also be added.
  • SharePoint/OneDrive client secret: The secret code configured for the app.
  • SharePoint/OneDrive Tenant ID: The tenant ID of the company that hosts the SharePoint site.
  • SharePoint/OneDrive Drive ID: The drive ID pointing to the Document Library you are connecting to. See Locate a SharePoint/OneDrive drive ID.

Google Cloud Storage settings

This connector supports GCM authentication with service account keys. See Create and delete service account keys — Google Cloud for more information.

  • Google credential file: Path to the Service Account private key file (JSON or P12).
  • Google storage bucket name: The bucket name.

SFTP settings

This connector supports SFTP authentication with either a password or a public key file. One of these methods is required.

  • SFTP host
  • SFTP port
  • SFTP user name
  • SFTP key path

Culture setup

These settings control culture-specific parsing behaviour and control characters.

Culture and client culture

The specific culture to use for metadata parsing. Both are optional — the built-in invariant culture is used by default.

Delimiter

The delimiter character used to separate fields. Defaults to a comma. For tab-separated files, enter TabDelimited.

Line ending

The character or character sequence used to separate lines. By default, all standard newline characters are supported.

Quote character

The character used to wrap a string that contains control characters.

Ignore quotes

When enabled, the configured quote character is treated as literal text rather than a string wrapper.

Header setup

These settings control how file headers and comment rows are handled.

Has header record

Specifies whether the file contains a header row. If comment rows are present and configured to be skipped, they are excluded before the header record is parsed.

Include empty headers

When a header row contains empty column names, this setting controls whether the associated values are included or ignored during parsing.

Skip top

The number of rows to skip from the top before parsing begins. This count excludes the header row if one is present.

Skip comment rows

Specifies whether comment rows in the file should be ignored. This must be enabled if comment rows are present — failing to do so can cause processing errors.

Comment start character(s)

The character or character sequence that identifies a row as a comment row.

Ignore brackets around column names

When enabled, square brackets surrounding field names in the file are stripped from the column names.

Incomplete row handling

These settings control how empty, incomplete, and null values are handled during parsing.

Ignore blank rows

When enabled, empty rows are ignored during parsing. If disabled, an empty row will break the processing.

Ignore incomplete rows

When enabled, incomplete rows are ignored during parsing. If disabled, an incomplete row will break the processing.

Trim spaces

When enabled, leading and trailing whitespace characters are trimmed from field values.

Empty fields equal to null

When enabled, empty fields are processed as null values in the resulting data tables.

Default value if null

The default value applied when a null is encountered. Only applies when Empty fields equal to null is enabled.

Datatype handling

These settings control how data types are inferred and parsed.

Infer datatypes

When enabled, the connector attempts to infer field data types from the file content. If disabled, all fields are read as string values.

Metadata URI

(Optional) Specify the path to a single file to use as a template for defining metadata across all files in the configured path. This can reduce execution time for the import metadata task by limiting the rows scanned for data type inference to just the rows in the metadata file, rather than rows across all files. For example: C:\FlatFiles\Sales.csv

Note:

When this setting is used, the Infer datatypes and Parsed row limit settings apply to the metadata file rather than the files specified by the path property.

Max cell length

The maximum length for text columns. If not set, the column length is dynamically determined from the parsed rows during metadata parsing, taking the maximum observed length plus an extra 25%. If a longer value is encountered during a full data load than the configured cell length, the process will break.

Data format

A comma-separated list of custom date formats following .NET custom date and time format strings. The default value is derived from culture settings. When this setting is configured, culture-based date formatting is ignored.

True values

A comma-separated list of case-insensitive values that represent a boolean True.

False values

A comma-separated list of case-insensitive values that represent a boolean False.

Parsed row limit

The number of rows parsed to extract metadata. For values of 0 or lower, all rows are parsed. Defaults to 1000 rows.

Was this article helpful?

We're sorry to hear that.