Sunday 1 April 2018

Looping Through Flat Files using SSIS 2016

Objective

Yes I know I know, ……… the “Looping through CSV or text files” is very very very old subject and it may have 100s of solutions, but I had to blog about it again because looping through CSV or text files is very commonly required in the ETL world, but I see that most of the companies are using very very old blogs (one like mine from 2010) and not seeing that the old approaches require polishing like….
  1. Converting the SSIS to SSIS2016+
  2. Using SSIS Project Model
  3. SSIS Project Parameters
  4. Package Parameters
  5. New techniques/tricks/approach/design
  6. Package Part
  7. SSISDB
  8. Etc…
      I decided to create a small ETL that extracts Flat File to SQL table and apply as much as I can from the above list, so let’s start.

Requirements

  1. A main folder, in my case it is “C:\P_C\2018\SQL2016\SSIS\Txt2SQL” and it will be set in the Package Parameter called “uPara_MainFolderPath”
  2. Create 2 Sub folders in the above folder “SampleFile”, “ToBeProcessed”
  3. A Sample CSV file called “SampleFile.csv” with data in the “Samplefolder”

Over look at the SSIS Project and package




Variables at 3 Levels, "Project", "Package Parameter" and "Package Variables"

        After creating a SSIS package, first thing is to create variables at 3 levels/locations within the SSIS project and package.

        1 - Variables at the “Project Parameter” level
               Create 2 Variables, one for your destination SQL server and one for your destination DB name.        



        2 - Variables at the “Package Parameters” level

  • uPara_MainFolderPath: Mandatory folder that will contain all the required folder related to this ETL package, some folders can be changed/redirected with the next 2 variables.
  • uPara_BackupMainFolderPath: if you don’t want the backup folder to be in the same folder as “uPara_MainFolderPath” you can change it by adding a valid path.
  • uPara_ToBeProcessedFolder: if you don’t want the source folder to be in the same folder as “uPara_MainFolderPath” you can change it by adding a valid path.


        3 - Variables at the “Package Variables” level
                The expression variables will be erased during the run time (My new design and technique). The only “Package Variable” you might want to change during your design time is the “uVar_SourceFileType” that if the source flat file is a text file (*.txt), you can change the “*.csv” to “*.txt”



Source and Destination Connection Objects

        We need one source “Flat File” connection object to do the reading and one SQL destination “OLEDB” connection object to do the writing.

1.“Flat File” (Source)


2.“OLEDB” (Destination)



Package Flow

The package has 2 main section, one is to create backup folder, clear Expressions, and some other variable settings, the second one is to loop through the Flat Files do the ETL and finally move the file to the backup folder, I will not explain the above two section except how I set the “For Each Loop” in SSIS. I have added a T-SQL script to create the destination table (look for DestinationClientTable.sql file)
The Loop must be set in 3 sections


1.Collection

2.Expression

3.Variable Mapping


How To Test?


  1. Set your SQL server name and SQL DB Name in the project parameters.
  2. Create the destination table (You can use DestinationClientTable.sql).
  3. Create a folder in my case it is “C:\P_C\2018\SQL2016\SSIS\Txt2SQL”.
  4. Copy the above Path in the “uPara_MainFolderPath” variable.
  5. Create 2 Sub folders 1 – SampleFile 2- ToBeProcessed.
  6. You must have your Sample.csv file copied in the SampleFile folder (I have provided one).
  7. Copy all of your source files into the ToBeProcessed folder or you can use the Sample.csv to get the ball rolling.
  8. Run your SSIS package
  9. Using windows explorer go to the same path as defined in “uPara_MainFolderPath” variable, you will see a sub folder named “BackupFolder”.
  10. Now your files have moved from ToBeProcessed folder to the BackupFolder folder under todays “…\BackupFolder\yyyy-mm\yyyy-mm-dd-HHmmss\”
  11. Also please check your destination table, my sample uses the [dbo].[Client] table.

Please note if…

  • If your ToBeProcessed folder (Source Folder) is located on another folder/machine you can enter the valid path in the “uPara_ToBeProcessedFolder” variable.
  • If your Backup folder is located on another folder/machine you can enter the valid path in the “uPara_BackupMainFolderPath” variable.
Do you want the code? Click here (From Google).

3 comments: