Thursday 11 February 2021

Extracting Data from CSV file to SQL Server using Azure Blob Storage and Azure Data Factory

Objective

One of the common ETL tasks in the cloud is loading data from a CSV file into SQL; here the file sits in Azure Blob Storage and the destination is an Azure SQL database. Once you move to the cloud, the ETL concept changes because of the large volume of data and the large number of file loads, so an automated, configuration-driven mechanism is required to load the hundreds and hundreds of files. An IoT data load is a good example: an IoT service may have more than one hundred devices, and each device generates hundreds of files every day.
The solution is to create a config table in SQL that contains all the information about an ETL, such as:
1 - Destination SQL table
2 - Mapping fields
3 - ...
4 - Source File Path
5 - Source File name and extension 
6 - Source File Delimiter
7 - etc...

Requirements

    Azure Data Factory (ADF) will read the configuration table that contains the source and destination information and then process the ETL load from the Azure Blob Storage container to the Azure SQL database. The most important part is that the config table stores the field mapping in JSON format.


        A mapping JSON example looks like this:
'
{"type": "TabularTranslator","mappings": [
{"source":{"name":"Prénom" ,"type":"String" ,"physicalType":"String"},"sink": { "name": "FirstName","type": "String","physicalType": "nvarchar" }},
{"source":{"name":"famille Nom" ,"type":"String" ,"physicalType":"String"},"sink": { "name": "LastName","type": "String","physicalType": "nvarchar" }},
{"source":{"name":"date de naissance" ,"type":"String" ,"physicalType":"String"},"sink": { "name": "DOB","type": "DateTime","physicalType": "date" }}
]}
'
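At runtime this JSON can be handed to the ADF Copy activity as dynamic content. A minimal sketch, assuming a Lookup activity named 'Lookup ETL Config' that returns one config row (the activity name is an assumption; [desMappingFields] is the mapping column described below):

    Copy activity → Mapping → Add dynamic content:
    @json(activity('Lookup ETL Config').output.firstRow.desMappingFields)

In the pipeline JSON, the Copy activity's translator property would then look roughly like:
'
"translator": { "value": "@json(activity('Lookup ETL Config').output.firstRow.desMappingFields)", "type": "Expression" }
'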

Some of the config table fields are as follows:
    Source Fields 
  1. [srcFolder] nvarchar(2000)
  2. [srcFileName] nvarchar(50)
  3. [srcFileExtension] nvarchar(50)
  4. [srcColumnDelimiter] nvarchar(10)
  5. etc...
    Destination Fields
  1. [desTableSchema] nvarchar(20)
  2. [desTableName] nvarchar(200)
  3. [desMappingFields] nvarchar(2000)
  4. ....
  5. [desPreExecute_usp] nvarchar(50)
  6. [desPostExecute_usp_Success] nvarchar(50)
  7. ....
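A minimal T-SQL sketch of such a config table, combining the source and destination fields above (the table name [etl].[ETLConfig], the identity key and the failure hook column are assumptions for illustration, not part of the original design):

CREATE TABLE [etl].[ETLConfig]
(
    [ETLConfigID]                 int IDENTITY(1,1) PRIMARY KEY,
    -- Source (Azure Blob Storage container)
    [srcFolder]                   nvarchar(2000) NOT NULL,
    [srcFileName]                 nvarchar(50)   NOT NULL,
    [srcFileExtension]            nvarchar(50)   NOT NULL,
    [srcColumnDelimiter]          nvarchar(10)   NOT NULL,
    -- Destination (Azure SQL database)
    [desTableSchema]              nvarchar(20)   NOT NULL,
    [desTableName]                nvarchar(200)  NOT NULL,
    [desMappingFields]            nvarchar(2000) NOT NULL,  -- the mapping JSON shown above
    [desPreExecute_usp]           nvarchar(50)   NULL,      -- PreETL stored procedure
    [desPostExecute_usp_Success]  nvarchar(50)   NULL,      -- PostETLSuccess stored procedure
    [desPostExecute_usp_Failure]  nvarchar(50)   NULL       -- PostETLFailure stored procedure (inferred from the notes below)
);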

Please note…
  • The DFT is called by a Master pipeline and has no custom internal logging system
  • The three custom SQL stored procedures are not included in this solution
  • Three custom SQL stored procedures can be set in the configuration table for each ETL: one for PreETL, one for PostETLSuccess and one for PostETLFailure
Do you want the code? Click here (From Google).

Extracting Data from SQL to CSV files using SSIS and PowerShell 7.x

 Objective

I decided to combine SSIS and PowerShell 7.x to create one new ETL SSIS package that exports data from SQL Server to CSV files using a configuration table. The objective is to create a config table in SQL that contains all the information about an ETL, such as:
1 - Source SQL table
2 - Selected fields
3 - Filters, like the T-SQL WHERE clause
4 - Tracking mechanism
5 - Destination File Path 
6 - Destination File name and extension
7 - etc...


Requirements

    The PowerShell script will run the SELECT query against the source SQL Server and write the result set to a delimited file; the parameters are as follows (a usage sketch follows the list).
  1. [String]$srcSQLServerConnectionString = "Server = ABC ; Database = XYZ; Integrated Security = True;"
  2. [String]$srcSELECTQuery = "SELECT Fields  FROM tblXYZ"
  3. [String]$desFilePath = "<DRIVER>:\...\ETLFolder\FileToBeSFTPed"
  4. [String]$desFileName = "YYYY-MM-DD HHMMSS FileName"
  5. [String]$desFileExtension = "csv"          # or "txt"
  6. [String]$desColumnDelimiter = "TAB"        # e.g. "TAB" (i.e. "`t"), "|", ";"
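A minimal PowerShell 7 sketch of how these parameters might be consumed, assuming the SqlServer module is installed for Invoke-Sqlcmd; the timestamp handling and delimiter translation are illustrative, not the original implementation:

# Requires: Install-Module SqlServer
Import-Module SqlServer

# Build the destination file name following the "YYYY-MM-DD HHMMSS FileName" convention
# (here $desFileName is assumed to hold only the base name; the timestamp is generated at run time)
$timeStamp = Get-Date -Format 'yyyy-MM-dd HHmmss'
$desFile   = Join-Path $desFilePath "$timeStamp $desFileName.$desFileExtension"

# Translate the friendly delimiter name stored in the config into the real character
$delimiter = if ($desColumnDelimiter -eq 'TAB') { "`t" } else { $desColumnDelimiter }

# Run the SELECT built by the SSIS package and export the rows as a delimited file
Invoke-Sqlcmd -ConnectionString $srcSQLServerConnectionString -Query $srcSELECTQuery |
    Export-Csv -Path $desFile -Delimiter $delimiter -NoTypeInformation   # -NoTypeInformation is already the default in PowerShell 7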
    At this point the SSIS package will read the config table, start the tracking mechanism (getting the MAX identity value or the MAX modified date) and then build the final T-SQL string for the PowerShell script.
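A hedged T-SQL sketch of that step, assuming the tracking column is a ModifiedDate-style datetime and that the @src... values have been loaded from the config table (all names and sample values here are illustrative):

-- Tracking mechanism: pick up where the previous export stopped
DECLARE @lastTracked datetime2 = (SELECT MAX(ModifiedDate) FROM dbo.tblXYZ);

-- In the real package these come from the config table
DECLARE @srcTableSchema nvarchar(20)   = N'dbo',
        @srcTableName   nvarchar(200)  = N'tblXYZ',
        @srcFieldList   nvarchar(2000) = N'FirstName, LastName, DOB',
        @srcWHEREClause nvarchar(1000) = N'1 = 1';

-- Final T-SQL string handed to the PowerShell script as $srcSELECTQuery
DECLARE @finalQuery nvarchar(max) =
      N'SELECT ' + @srcFieldList
    + N' FROM '  + QUOTENAME(@srcTableSchema) + N'.' + QUOTENAME(@srcTableName)
    + N' WHERE ' + @srcWHEREClause
    + N' AND ModifiedDate > ''' + CONVERT(nvarchar(30), @lastTracked, 126) + N'''';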

Some of the config table fields are as follows:
    Source Fields 
  1. [srcTableSchema] nvarchar(20)
  2. [srcTableName] nvarchar(200)
  3. [srcFieldList] nvarchar(2000)
  4. [srcWHEREClause] nvarchar(1000)
  5. etc...
    Destination Fields
  1. [desFolder] nvarchar(2000)
  2. [desFileName] nvarchar(50)
  3. [desFileExtension] nvarchar(50)
  4. ....
  5. [srcPreExecute_usp] nvarchar(50)
  6. [srcPostExecute_usp_Success] nvarchar(50)
  7. ....
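A hedged example of registering one export in such a config table (the table name [etl].[ExportConfig], the folder, and the stored procedure names are illustrative assumptions):

INSERT INTO [etl].[ExportConfig]
    ([srcTableSchema], [srcTableName], [srcFieldList], [srcWHEREClause],
     [desFolder], [desFileName], [desFileExtension],
     [srcPreExecute_usp], [srcPostExecute_usp_Success])
VALUES
    (N'dbo', N'tblXYZ', N'FirstName, LastName, DOB', N'DOB >= ''2000-01-01''',
     N'D:\ETLFolder\FileToBeSFTPed', N'FileName', N'csv',
     N'usp_PreETL_tblXYZ', N'usp_PostETLSuccess_tblXYZ');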

Please note…
  • The PowerShell script may need to be registered
  • The SSIS package has an internal logging system
  • Three custom SQL stored procedures can be set in the configuration table for each ETL: one for PreETL, one for PostETLSuccess and one for PostETLFailure
Do you want the code? Click here (From Google).