Wednesday, 30 May 2018

Azure Stream Analytics Presentation

Azure Stream Analytics Job




        On May 31st, 2018, I will be presenting at an event at Microsoft Canada (1950 Meadowvale Blvd, Mississauga, Ontario, Canada).

Details of the Presentation:

        “Azure Stream Analytics” will show you how to simulate your data and work with Azure Stream Analytics Job (ASAJ) queries.


Speaker's Profile:

        Nik is a Senior Technical Team Lead, Senior BI Consultant, and designer for two teams at Finastra (D+H). His main focus is on Business Intelligence, Data Platform solution architecture, Big Data, IoT Hub, Azure Stream Analytics, Power BI, SSIS, SQL Tabular data models, and anything related to data and the back end.
He also blogs, contributes to other online forums, mentors, presents, and much more.

SNikkhah@yahoo.com
https://www.linkedin.com/in/nnikkhah/
http://sqldataside.blogspot.ca/

How to Register

Click here to register
https://www.meetup.com/C-Corner-Toronto-Chapter-Meetup/events/250530743/


Microsoft Canada
1950 Meadowvale blvd, 
Mississauga, 
Ontario, Canada

Wednesday, 18 April 2018

Presentation at Microsoft Canada on "Azure IoT Hub"

Azure IoT Hub




        On May 3rd, 2018, I will be presenting at an event at Microsoft Canada (1950 Meadowvale Blvd, Mississauga, Ontario, Canada).

Details of the Presentation:

        “From: IoTHub To: Power BI Via: Stream Analytics” will show you how to simulate your data and display it as live streaming data in Power BI.



Speaker's Profile:

        Nik is a Senior Technical Team Lead, Senior BI Consultant, and designer for two teams at Finastra (D+H). His main focus is on Business Intelligence, Data Platform solution architecture, Big Data, IoT Hub, Azure Stream Analytics, Power BI, SSIS, SQL Tabular data models, and anything related to data and the back end.
He also blogs, contributes to other online forums, mentors, presents, and much more.

SNikkhah@yahoo.com
https://www.linkedin.com/in/nnikkhah/
http://sqldataside.blogspot.ca/

How to Register

Click here to register
https://www.meetup.com/C-Corner-Toronto-Chapter-Meetup/events/249922416/


Microsoft Canada
1950 Meadowvale blvd, 
Mississauga, 
Ontario, Canada

Tuesday, 10 April 2018

From: IoTHub To: Power BI Via: Stream Analytics

From: IoTHub To: Power BI Via: Stream Analytics

        On April 23rd, 2018, I will be presenting at and sponsoring an event at Microsoft Canada (Downtown Toronto).

Details of the sponsor’s presentation:

        “From: IoTHub To: Power BI Via: Stream Analytics” will show you how to simulate your data and display it as live streaming data in Power BI.

Speaker's Profile:

        Nik is a Senior Technical Team Lead, Senior BI Consultant, and designer for two teams at Finastra (D+H). His main focus is on Business Intelligence, Data Platform solution architecture, Big Data, IoT Hub, Azure Stream Analytics, Power BI, SSIS, SQL Tabular data models, and anything related to data and the back end.
He also blogs, contributes to other online forums, mentors, presents, and much more.

SNikkhah@yahoo.com
https://www.linkedin.com/in/nnikkhah/
http://sqldataside.blogspot.ca/

How to Register

Microsoft Advertising
222 Bay Street, Suite 1201, M5K 1E7
Toronto, ON

https://www.meetup.com/Toronto-Power-BI-Meetup/


Sunday, 8 April 2018

Looping Through Excel (97-2003) Files And Sheets using SSIS 2016

Objective

After writing an article about “Looping Through Flat Files using SSIS 2016”, I decided to blog about the same subject, but this time for Excel. Again, yes, it is a very old subject, but I see that most companies are following very old blogs (like mine from 2010) without noticing that the old approaches need polishing, such as…

  1. Converting the SSIS to SSIS2016+
  2. Using SSIS Project Model
  3. SSIS Project Parameters
  4. Package Parameters
  5. New techniques/tricks/approach/design
  6. Package Part
  7. SSISDB
  8. Etc…

      I decided to create a small ETL that extracts all the sheets from Excel (97-2003) files into a SQL table and applies as much as I can from the list above, so let’s start.

Requirements

  1. A main folder, in my case “C:\P_C\2018\SQL2016\SSIS\Exl2SQL-Exl97-2003”, which will be set in the Package Parameter called “uPara_MainFolderPath”
  2. Create 2 sub folders in the above folder: “SampleFile” and “ToBeProcessed”
  3. A sample Excel file called “SampleFile97-2003.xls”, with data, in the “SampleFile” folder

Overview of the SSIS Project and package





Variables at 3 Levels, "Project", "Package Parameter" and "Package Variables"


        After creating an SSIS package, the first thing to do is create variables at 3 levels/locations within the SSIS project and package.



        1 - Variables at the “Project Parameter” level
               Create 2 Variables, one for your destination SQL server and one for your destination DB name.        



        2 - Variables at the “Package Parameters” level


        3 - Variables at the “Package Variables” level
                Except for the “uVar_ExcelConnectionString” variable, all other expression variables will be erased at run time (my new design and technique). The only “Package Variable” you might want to change at design time is “uVar_ExcelConnectionString”.



             You can see that I have added 3 new package variables related to the Excel file and its objects. The only variables you might need to change during development are the Excel connection string variable (for example, I added IMEX=1) and the sheet name.

1- "uVar_ExcelConnectionString": This is the excel connection string
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @[User::uVar_SourceFile]  + ";Extended Properties=\"EXCEL 8.0;HDR=YES;IMEX=1\";"

2- "uVar_ExcelSheetName": Change this sheet name to your default sheet name in the SampleFile folder. This name only helps you during development; at run time it is overwritten with a new sheet name on each loop.

3- "uVar_ExcelSheetObjectList": This is an SSIS package object variable that will contain the list of Excel sheet names.
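
A note on the connection string in point 1: the Jet 4.0 provider is 32-bit only. If your package must run in 64-bit mode, a commonly used alternative is the ACE provider; this is an assumption about your environment (it requires the Microsoft Access Database Engine redistributable) and is not part of the original package:

"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + @[User::uVar_SourceFile]  + ";Extended Properties=\"Excel 8.0;HDR=YES;IMEX=1\";"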

Source and Destination Connection Objects

        We need one source “Excel File” connection object to do the reading and one SQL destination “OLEDB” connection object to do the writing.


 1. “Excel File” (Source)

 2. “OLEDB” (Destination)




Package Flow

          The package has two main sections: the first creates the backup folder, clears expressions, and sets some other variables; the second loops through each Excel (97-2003) file one by one, loops through each sheet one by one, does the ETL, and finally moves the file to the backup folder. I will not explain these two sections except for how I set up the second “For Each Loop” in SSIS. I have added a T-SQL script to create the destination table (look for the DestinationClientTable.sql file); a sketch of what it might contain is shown below.
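
For reference, here is a minimal sketch of what the destination table could look like. The actual definition ships in DestinationClientTable.sql; the column names below are hypothetical and only illustrate the idea of a simple [dbo].[Client] landing table.

CREATE TABLE [dbo].[Client]
(
    [ClientID]   INT IDENTITY(1, 1) NOT NULL PRIMARY KEY, -- hypothetical surrogate key
    [FirstName]  VARCHAR(50) NULL,                        -- hypothetical data column
    [LastName]   VARCHAR(50) NULL,                        -- hypothetical data column
    [CreateDate] DATETIME NOT NULL DEFAULT (GETDATE())    -- hypothetical audit column
);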

         The first loop must be set up in 3 sections; to see how, check my previous blog, “Looping Through Flat Files using SSIS 2016”.

         After the first loop you will see a .NET script task, “SCRIPT-----Get The List of Excel Sheets”, that extracts the list of Excel sheets and places them into the “uVar_ExcelSheetObjectList” variable, which is an SSIS object variable.


The .NET code that collects the sheet names from each Excel file goes like this:

    Public Sub Main()
        Try
            '--------------------------------------------------------
            ' Reads:  User::uVar_ExcelConnectionString
            ' Writes: User::uVar_ExcelSheetObjectList
            '--------------------------------------------------------

            '--- Added by SQL Data Side Inc.  Nik-Shahriar Nikkhah
            ' Don't forget to add >>>>>    Imports System.Data.OleDb
            Dim excelConnection As OleDbConnection
            Dim connectionString As String
            Dim tablesInFile As DataTable
            Dim tableInFile As DataRow
            Dim currentTable As String
            Dim excelTables As String()
            Dim loopForNumberOfRealTabs As Integer = 0

            connectionString = Dts.Variables("User::uVar_ExcelConnectionString").Value.ToString()

            excelConnection = New OleDbConnection(connectionString)
            excelConnection.Open()

            ' The "Tables" schema collection lists every sheet and named range in the workbook.
            tablesInFile = excelConnection.GetSchema("Tables")

            For Each tableInFile In tablesInFile.Rows
                currentTable = tableInFile.Item("TABLE_NAME").ToString()
                currentTable = currentTable.Replace("'", "")

                ' Real worksheet names end with "$"; skip named ranges and other entries.
                If Right(currentTable, 1) = "$" Then
                    loopForNumberOfRealTabs += 1
                    ReDim Preserve excelTables(loopForNumberOfRealTabs - 1)
                    excelTables(loopForNumberOfRealTabs - 1) = currentTable
                End If
            Next

            excelConnection.Close()
            excelConnection = Nothing

            ' Hand the sheet list back to the package as an object variable.
            Dts.Variables("User::uVar_ExcelSheetObjectList").Value = excelTables
            Dts.TaskResult = ScriptResults.Success

        Catch ex As Exception
            ' Surface the error to the package instead of swallowing it silently.
            Dts.Events.FireError(0, "Get The List of Excel Sheets", ex.Message, String.Empty, 0)
            Dts.TaskResult = ScriptResults.Failure
        End Try
    End Sub
------------------------------

Then comes the second loop. It uses the “uVar_ExcelSheetObjectList” package variable (populated in the “SCRIPT-----Get The List of Excel Sheets” step), which contains the list of Excel sheets, and loops through them one by one; on each iteration it places the current sheet name into the “uVar_ExcelSheetName” variable, which is used by the source object in the DFT.

You can set up the second loop in 2 steps.

1. Collection



 2. Variable Mapping



Please check the DFT settings




How To Test?


  1. Set your SQL Server name and SQL DB name in the project parameters.
  2. Create the destination table (you can use DestinationClientTable.sql).
  3. Create a folder; in my case it is “C:\P_C\2018\SQL2016\SSIS\Exl2SQL-Exl97-2003”.
  4. Copy the above path into the “uPara_MainFolderPath” variable.
  5. Create 2 sub folders: 1 – SampleFile, 2 – ToBeProcessed.
  6. You must have your SampleFile97-2003.xls (97-2003 format) file copied into the SampleFile folder (I have provided two files; one of them has multiple sheets, “SampleFile97-2003 - MultipleSheet.xls”).
  7. Copy all of your source files into the ToBeProcessed folder, or use SampleFile97-2003.xls to get the ball rolling (or use “SampleFile97-2003 - MultipleSheet.xls”).
  8. Run your SSIS package.
  9. Using Windows Explorer, go to the path defined in the “uPara_MainFolderPath” variable; you will see a sub folder named “BackupFolder”.
  10. Your files have now moved from the ToBeProcessed folder to today’s “…\BackupFolder\yyyy-mm\yyyy-mm-dd-HHmmss\” sub folder.
  11. Also check your destination table; my sample uses the [dbo].[Client] table.

Please note if…


  • If your ToBeProcessed folder (source folder) is located in another folder or on another machine, enter the valid path in the “uPara_ToBeProcessedFolder” variable.
  • If your backup folder is located in another folder or on another machine, enter the valid path in the “uPara_BackupMainFolderPath” variable.
Do you want the code? Click here (From Google).

References…

  1. http://plexussql.blogspot.ca/2010/04/looping-through-excel-files-and-sheets.html
  2. https://docs.microsoft.com/en-us/sql/integration-services/connection-manager/excel-connection-manager
  3. https://docs.microsoft.com/en-us/sql/integration-services/connection-manager/connect-to-an-excel-workbook
  4. https://docs.microsoft.com/en-us/sql/integration-services/extending-packages-scripting-task-examples/working-with-excel-files-with-the-script-task
  5. https://docs.microsoft.com/en-us/sql/integration-services/load-data-to-from-excel-with-ssis
  6. http://www.madeiradata.com/load-data-excel-ssis-32-bit-vs-64-bit/

Sunday, 1 April 2018

Looping Through Flat Files using SSIS 2016

Objective

Yes, I know, the “looping through CSV or text files” topic is very old and probably has hundreds of solutions, but I had to blog about it again because looping through CSV or text files is still very commonly required in the ETL world. I see that most companies are following very old blogs (like mine from 2010) without noticing that the old approaches need polishing, such as…
  1. Converting the SSIS to SSIS2016+
  2. Using SSIS Project Model
  3. SSIS Project Parameters
  4. Package Parameters
  5. New techniques/tricks/approach/design
  6. Package Part
  7. SSISDB
  8. Etc…
      I decided to create a small ETL that extracts flat files into a SQL table and applies as much as I can from the list above, so let’s start.

Requirements

  1. A main folder, in my case it is “C:\P_C\2018\SQL2016\SSIS\Txt2SQL” and it will be set in the Package Parameter called “uPara_MainFolderPath”
  2. Create 2 Sub folders in the above folder “SampleFile”, “ToBeProcessed”
  3. A sample CSV file called “SampleFile.csv”, with data, in the “SampleFile” folder

Overview of the SSIS Project and package




Variables at 3 Levels, "Project", "Package Parameter" and "Package Variables"

        After creating an SSIS package, the first thing to do is create variables at 3 levels/locations within the SSIS project and package.

        1 - Variables at the “Project Parameter” level
               Create 2 Variables, one for your destination SQL server and one for your destination DB name.        



        2 - Variables at the “Package Parameters” level

  • uPara_MainFolderPath: the mandatory folder that will contain all the required folders related to this ETL package; some folders can be changed/redirected with the next 2 variables.
  • uPara_BackupMainFolderPath: if you don’t want the backup folder to be inside “uPara_MainFolderPath”, change it by entering a valid path.
  • uPara_ToBeProcessedFolder: if you don’t want the source folder to be inside “uPara_MainFolderPath”, change it by entering a valid path.


        3 - Variables at the “Package Variables” level
                The expression variables will be erased at run time (my new design and technique). The only “Package Variable” you might want to change at design time is “uVar_SourceFileType”: if the source flat file is a text file (*.txt), change “*.csv” to “*.txt”.



Source and Destination Connection Objects

        We need one source “Flat File” connection object to do the reading and one SQL destination “OLEDB” connection object to do the writing.

1. “Flat File” (Source)


2. “OLEDB” (Destination)



Package Flow

The package has two main sections: the first creates the backup folder, clears expressions, and sets some other variables; the second loops through the flat files, does the ETL, and finally moves each file to the backup folder. I will not explain these two sections except for how I set up the “For Each Loop” in SSIS. I have added a T-SQL script to create the destination table (look for the DestinationClientTable.sql file).
The loop must be set up in 3 sections:


1. Collection

2. Expression

3. Variable Mapping


How To Test?


  1. Set your SQL Server name and SQL DB name in the project parameters.
  2. Create the destination table (you can use DestinationClientTable.sql).
  3. Create a folder; in my case it is “C:\P_C\2018\SQL2016\SSIS\Txt2SQL”.
  4. Copy the above path into the “uPara_MainFolderPath” variable.
  5. Create 2 sub folders: 1 – SampleFile, 2 – ToBeProcessed.
  6. You must have your Sample.csv file copied into the SampleFile folder (I have provided one).
  7. Copy all of your source files into the ToBeProcessed folder, or use the Sample.csv to get the ball rolling.
  8. Run your SSIS package.
  9. Using Windows Explorer, go to the path defined in the “uPara_MainFolderPath” variable; you will see a sub folder named “BackupFolder”.
  10. Your files have now moved from the ToBeProcessed folder to today’s “…\BackupFolder\yyyy-mm\yyyy-mm-dd-HHmmss\” sub folder.
  11. Also check your destination table; my sample uses the [dbo].[Client] table.

Please note if…

  • If your ToBeProcessed folder (source folder) is located in another folder or on another machine, enter the valid path in the “uPara_ToBeProcessedFolder” variable.
  • If your backup folder is located in another folder or on another machine, enter the valid path in the “uPara_BackupMainFolderPath” variable.
Do you want the code? Click here (From Google).

Monday, 27 November 2017

List of BI Tools - On Premises Enterprise BI Architecture & Design from a bird's-eye view.

List of BI Tools

There are a variety of BI tools that you need to design yourself. These tools are mechanisms that are required in any Enterprise BI environment.

See the overall objective and complete agenda here: Overall Objective and Agenda

List of BI Tools

HashByte Creator Tool

Problem Statement

I have more than 100 SQL tables that need a "checksum" field, something like a HashBytes MD5. The major problem is that the number of fields keeps changing on the fly, and I have to fix my checksum field each time. I need a tool to handle this, and a mechanism that can identify and apply the new HashBytes changes.

Basic knowledge of HashByte

HASHBYTES is a SQL Server function that hashes its input with one of several algorithms: MD2, MD4, MD5, SHA, SHA1, SHA2_256, SHA2_512.
Click here for more info.

Where can HashByte be used?

HashBytes is mainly used in 2 areas:

1 - Encryption: when protecting credit card information, the developer can save the encrypted information in a SQL table.

2 - Checksum: when an SCD table has 100s of fields, those 100s of fields can be compared through one HashBytes field.

http://sqlbanana.blogspot.ca/2014/07/checksum-hashbytes-and-slowly-changing.html
https://en.wikipedia.org/wiki/Slowly_changing_dimension
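
As a quick illustration of the checksum use, here is a minimal T-SQL sketch (the column names and values are made up for the example):

SELECT HASHBYTES('MD5',
           ISNULL(RTRIM([FirstName]), '') + ISNULL(RTRIM([LastName]), '')) AS RowChecksum
FROM (VALUES ('John', 'Smith')) AS t ([FirstName], [LastName]);
-- Any change in FirstName or LastName yields a different 16-byte MD5 value,
-- so one checksum column can stand in for comparing the fields one by one.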

Fields in a Table

First, you must know your fields very well. You must identify…
1 - External fields (Data related): fields imported from the outside world into your database/table
  • [Natural Key(s)]
  • Other data related fields

2 - Internal fields (Not Data related): This is the list of the fields that have been defined for your table(s)
  • [Surrogate Key] (Numeric)
  • Checksum field "[iHashByteChecksum] [binary](16) NOT NULL"
  • [FromDate], [ToDate] (SCD related)
  • [BatchLogID], [RowNumber]
  • [CreateDate], [UpdatedDate] , [UserName]

 HashByte Creator Tool

 I am providing a few SQL scripts that you will have to read and run in the sequence of the file numbers:

  1. "010 CreateDB.sql"
  2. "020 CreateSchema.sql"
  3. "030 frmwrkDestinationTable.sql"
  4. "040 HashByteConfigTable.sql"
  5. "050 ufnCreateHashByteString.sql"
  6. "060 Test.sql"

  • The file "030 frmwrkDestinationTable.sql" plays the role of a dimension table, or of any table that needs a HashBytes field on it.

  • The file "040 HashByteConfigTable.sql" is part of the "HashByte Creator Tool". It creates a configuration table; you must configure and set this table according to the SQL tables that need to be HashByted (in the BI world, mainly the dimensions).
Scenario: what happens if a new field is added to the dimension table?
Answer: The function picks all the fields except natural keys, internal fields, and excluded fields, so newly added fields will get picked up on the next run; check the WHERE clause of the SQL statement.

SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @SchemaName
AND TABLE_NAME = @TableName
AND COLUMN_NAME NOT LIKE '%checksum%'
AND COLUMN_NAME NOT LIKE '%HashByte%'
AND COLUMN_NAME NOT LIKE 'i%'
AND CHARINDEX (COLUMN_NAME , @SurrogateKeys) < 1
AND CHARINDEX (COLUMN_NAME , @NaturalKeys)   < 1
AND CHARINDEX (COLUMN_NAME , @ExcludeFields) < 1

  • The "050 ufnCreateHashByteString.sql" file creates a string that can be used inside a HASHBYTES function; it looks like:
     ISNULL(RTRIM(CONVERT(varchar(11),[DataField2])),'')
 + ISNULL(RTRIM(CONVERT(varchar(23),[DataField3])),'')
 + ISNULL(RTRIM([DataField4]),'')

  • The "060 Test.sql" file runs the test and wraps the above string in a HASHBYTES call, like:
UPDATE [fw].[frmwrkDestinationTable]
SET [iHashByteChecksum] = convert(varbinary(16) , HASHBYTES('MD5',
   ISNULL(RTRIM(CONVERT(varchar(11),[DataField2])),'')
+ ISNULL(RTRIM(CONVERT(varchar(23),[DataField3])),'')
+ ISNULL(RTRIM([DataField4]),'')
))

Mandatory fields required

All of your SQL tables registered in the HashByteConfigTable must have the following fields:
  1. One Natural Key field (external data)
  2. Data field(s) (external data)
  3. [iHashByteChecksum] [binary](16) NOT NULL (designed by the developer)
  4. [iSurrogateKey] [int] NOT NULL (designed by the developer)
Other mandatory fields
Depending on the type of the SCD, further fields must be added by the developer; in this case I picked Type 2 (a sketch of such a table follows this list).
  1. [iStartDateKey]          --- SCD Type2.2
  2. [iEndDateKey]           --- SCD Type2.2
  3. [iVersion]                   --- SCD Type2.1
  4. [iCurrentFlag]             --- SCD Type2.3
Please note that SCD is not covered in this blog.
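
Putting the mandatory fields together, here is a minimal sketch of such a table; the table name, natural key, and data column are hypothetical, while the internal fields follow the list above:

CREATE TABLE [dbo].[DimSample]                           -- hypothetical table name
(
    [iSurrogateKey]     [int] IDENTITY(1, 1) NOT NULL,   -- internal, designed by the developer
    [NaturalKey]        [varchar](50) NOT NULL,          -- hypothetical natural key (external data)
    [DataField1]        [varchar](100) NULL,             -- hypothetical data field (external data)
    [iHashByteChecksum] [binary](16) NOT NULL,           -- internal checksum field
    [iStartDateKey]     [int] NULL,                      -- SCD Type 2
    [iEndDateKey]       [int] NULL,                      -- SCD Type 2
    [iVersion]          [int] NULL,                      -- SCD Type 2
    [iCurrentFlag]      [bit] NULL                       -- SCD Type 2
);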
 

The Result

The results can be seen in the next screenshot.


Now, with the help of the "HashByte Creator Tool" and the HashByteConfigTable, you can configure as many SQL tables as you want to be HashByted.

All you need to do now is the following (a rough driver sketch follows this list):
  1. Loop through the HashByteConfigTable record by record
  2. Call the ufnCreateHashByteString function on each loop
  3. Wrap the result in a HASHBYTES format
  4. Execute the string
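
Here is a minimal T-SQL sketch of such a driver. The HashByteConfigTable column names and the ufnCreateHashByteString signature below are assumptions based on the WHERE clause shown earlier, not the exact shipped code:

DECLARE @SchemaName sysname, @TableName sysname, @HashString nvarchar(max), @Sql nvarchar(max);

-- 1. Loop through the HashByteConfigTable record by record (assumed column names)
DECLARE cfg CURSOR FAST_FORWARD FOR
    SELECT SchemaName, TableName FROM dbo.HashByteConfigTable;
OPEN cfg;
FETCH NEXT FROM cfg INTO @SchemaName, @TableName;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- 2. Call the function on each loop (assumed signature)
    SELECT @HashString = dbo.ufnCreateHashByteString(@SchemaName, @TableName);

    -- 3. Wrap the result in a HASHBYTES format
    SET @Sql = N'UPDATE ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName)
             + N' SET [iHashByteChecksum] = CONVERT(varbinary(16), HASHBYTES(''MD5'', ' + @HashString + N'))';

    -- 4. Execute the string
    EXEC sp_executesql @Sql;

    FETCH NEXT FROM cfg INTO @SchemaName, @TableName;
END
CLOSE cfg;
DEALLOCATE cfg;
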
 Click here to download the code.

References:

https://www.mssqltips.com/sqlservertip/2543/using-hashbytes-to-track-and-store-historical-changes-for-sql-server-data/

http://www.sqlservercentral.com/articles/Data+Warehousing/69679/
 
https://blogs.technet.microsoft.com/srd/2013/11/12/security-advisory-2880823-recommendation-to-discontinue-use-of-sha-1/
 
http://www.mathstat.dal.ca/~selinger/md5collision/
 
http://www.links.org/?p=6