SQL Data Side Inc.: 2017

Monday, 27 November 2017

List of BI Tools - On Premises Enterprise BI Architecture & Design from a bird's-eye view.

List of BI Tools

There are a variety of BI tools that you need to design yourself. These tools are mechanisms that are required in any Enterprise BI environment.

See the overall objective and complete agenda here: Overall Objective and Agenda

HashByte Creator Tool

Problem Statement

I have more than 100 SQL tables that needs to have a "checksum" field something like HashByte MD5, the major problem is that the number of fields keep on changing on the fly and I need to fix my checksum field each time, I need a tool to handle the issues and I need to have a mechanism that can identify or apply the new HashByte changes.

Basic knowledge of HashByte

Hashbytes is a mathematical algorithm that comes in different variety that are known as… MD2, MD4, MD5, SHA, SHA1, SHA2_256, SHA2_512.
Click here for more info.

Where can HashByte be used?

Hashbyte is mainly used in 2 areas

1 - Encryption : When encrypting credit card information, the developer now can save the encrypted information in a SQL table.

2 - Checksum : When having 100s of fields in a table that is a SCD table, the 100s of fields can be replaced with one Hashbyte field.

http://sqlbanana.blogspot.ca/2014/07/checksum-hashbytes-and-slowly-changing.html
https://en.wikipedia.org/wiki/Slowly_changing_dimension

Fields in a Table

First you must know your fields very well you must identity ....
1 - External fields (Data related): fields imported from the outside world into your database/table

[Natural Key(s)]
Other data related fields

2 - Internal fields (Not Data related): This is the list of the fields that have been defined for your table(s)

[Surrogate Key] (Numeric)
Checksum field "[iHashByteChecksum] [binary](16) NOT NULL"
[FromDate], [ToDate] (SCD related)
[BatchLogID], [RowNumber]
[CreateDate], [UpdatedDate] , [UserName]

HashByte Creator Tool

I am providing few SQL scripts that you will have to read and run in sequence of the file number

"010 CreateDB.sql"
"020 CreateSchema.sql"
"030 frmwrkDestinationTable.sql"
"040 HashByteConfigTable.sql"
"050 ufnCreateHashByteString.sql"
"060 Test.sql"

The file "030 frmwrkDestinationTable.sql" will play a role of a dimension table or any table that you need to have a Hashbyte field on it

The file "040 HashByteConfigTable.sql" is part of the "HashByte Creator Tool" what it does it creates a configuration table, you must configure and set this table according to your sql tables that needs to be HashByted (in the BI world mainly the dimension).

Scenario: If a new field is added in the dimension table what happens?
Answer: The function picks all the fields except, Natural Keys, Internal fields, and Excluded fields, in that case the new added fields will get picked up in the next run, check the WHERE clause of the SQL statement.

FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @SchemaName
AND TABLE_NAME = @TableName
AND COLUMN_NAME NOT LIKE '%checksum%'
AND COLUMN_NAME NOT LIKE '%HashByte%'
AND COLUMN_NAME NOT LIKE 'i%'
AND CHARINDEX (COLUMN_NAME , @SurrogateKeys) < 1
AND CHARINDEX (COLUMN_NAME , @NaturalKeys) < 1
AND CHARINDEX (COLUMN_NAME , @ExcludeFields) < 1

The "050 ufnCreateHashByteString.sql" file creates a string that can be used in a Hashbyte function that looks like

ISNULL(RTRIM(CONVERT(varchar(11),[DataField2])),'')

+ ISNULL(RTRIM(CONVERT(varchar(23),[DataField3])),'')

+ ISNULL(RTRIM([DataField4]),'')

The "060 Test.sql" does the test and wraps the above string into a Hashbyte string like..

UPDATE [fw].[frmwrkDestinationTable]
SET [iHashByteChecksum] = convert(varbinary(16) , HASHBYTES('MD5',

ISNULL(RTRIM(CONVERT(varchar(11),[DataField2])),'')

+ ISNULL(RTRIM(CONVERT(varchar(23),[DataField3])),'')

+ ISNULL(RTRIM([DataField4]),'')

))

Mandatory fields required

All of your SQL tables in the HashByteConfigTable must have the fields as mentioned

One Natural Key field (external data)
Data field (external data)
[iHashByteChecksum] [binary](16) NOT NULL (designed by the developer)
[iSurrogateKey] [Int] NOT NULL (designed by the developer)

Other mandatory
Depending of the type of the SCD the fields must be added by the developer in this case I picked type2.

[iStartDateKey] --- SCD Type2.2
[iEndDateKey] --- SCD Type2.2
[iVersion] --- SCD Type2.1
[iCurrentFlag] --- SCD Type2.3

Please note that the SCD is not covered in this blog

The Result

The results can be seen in the in he next screenshot.

Now with the help of the "HashByte Creator Tool" and the HashByteConfigTable you can configure as much as SQL tables that you want to be HashByted.

all you need to do now is to ...

looping through the HashByteConfigTable record by record
Call the function ufnCreateHashByteString on each loop
Wrap the result in a HashByte format
Execute the string

Click here to download the code.

References:

https://www.mssqltips.com/sqlservertip/2543/using-hashbytes-to-track-and-store-historical-changes-for-sql-server-data/

https://www.mssqltips.com/sqlservertip/2543/using-hashbytes-to-track-and-store-historical-changes-for-sql-server-data/

http://www.sqlservercentral.com/articles/Data+Warehousing/69679/

https://blogs.technet.microsoft.com/srd/2013/11/12/security-advisory-2880823-recommendation-to-discontinue-use-of-sha-1/

http://www.mathstat.dal.ca/~selinger/md5collision/

http://www.links.org/?p=6

Friday, 27 October 2017

111 Approach - On Premises Enterprise BI Architecture & Design from a bird's-eye view.

111 Approach (One, One, One, Approach)

How fast can the data be provided and with what level of accuracy? What's the best approach?

The best approach is referred to as the 111 Approach. This approach accomplishes the following:

Minimum BI interference/dependency to get the ball rolling
Agile approach, a standard software engineering approach
Self-serve BI by the business, an accuracy in providing the requirement before going to the IT department/Designers.
Understanding the data in advance and getting feedback before design and production
Providing relational information within the requirement

See the overall objective and complete agenda here: Overall Objective and Agenda

1 Day

Assuming you have a BI environment up and running, your data scientist is dealing with all the corporate data and suddenly he comes up with an idea. He imports some external data into any tool that he is comfortable with (in this case, I picked Power BI) and studies the data.

Finally, a new self-service report is created that is handed over to the analyst/business team for feedback.

1 Week

The data scientist asks the IT department to find an environment for the external data so that it can be shared within the team. The IT department copies the external data to an environment so that it can be accessed, but the IT department will not support this new entity, this is a one day job for the IT department.

Now the analyst/business team studies the data and provides feedback, documentation and most important part in the requirement for the IT department/designers.

At this point we have enough information/requirements to confirm the design.

1 Month (1 Sprint)

The IT department/designers now have the requirements and can start adding the new external data into the current corporate BI environment so that it follows all the BI standards. For example, the external data will be incrementally loaded, it will follow in the auditing process, data cleansing, framework reports, etc.

Sunday, 15 October 2017

Today's BI Design - On Premises Enterprise BI Architecture & Design from a bird's-eye view.

Today's Enterprise BI Architecture

A good BI enterprise design must be done as though the application will be for sale. It should be able to accommodate any input data, or be as flexible as required for that particular BI business logic. The legacy data should depend on the Source DB because the Source DB has a closer design to the BI DW.

See the overall objective and complete agenda here: Overall Objective and Agenda

Generally you will need the following databases to design an optimized BI environment:

DL DW
A Data Lake data warehouse is known as the Source DB, it's the landing area of the data. Some designers like to call it a flat database because it dose not have any Foreign Keys (FK) and the tables likely have no more than 2-3 indexes.
BI DW
This is the location where part of the business logic is applied. Other things like SCD Type2 using HASHBYTE also happens in this database. You must know the business to create optimized tables with the right Primary Keys (PK) and indexes. One of the things that I generally do is to create all the FK constraints and make sure that they are all disabled.
DM
A DM is a database but not a SQL database and it's the second location where business logic is applied. When I am evaluating a DM I look for:

1. How many measures are being designed.
2. Check the hierarchy and cross join of the hierarchies
3. Perspectives
4. Partitioning
5. Dynamic Processes
6. Documentation
7. See more
Config DB
The configuration database controls the entire enterprise application. It contains lots of non-confidential information and other tools such as:

1. BI Incremental load mechanism
2. HASHBYTE Process
3. Internal/External logs
4. Auditing
5. Report generator tools
6. Multi-language support
7. Documentation
8. Date Dimension
9. SQL and DM Partition
Other Databases
Depending on your companies security/design policies you might need more databases/environments for example you will need SSISDB for SSIS Catalog deployment, or you might need [msdb] database because you are trying to fire off a SQL job using a front-end tool.

The best practice is a combination of the old approach and the new BI Architecture.

Friday, 6 October 2017

Legacy BI Design - On Premises Enterprise BI Architecture & Design from a bird's-eye view.

Objective

It's always good to know the history of everything, including the history of BI, how it started, how it changed and transformed in to what it is today. Did you know that the word BI was born around the year 2005? Prior to that, it was know as Management Information System (MIS).

See the overall objective and complete agenda here: Overall Objective and Agenda

In the Late 90's....

Back in the 90's, a copy of the day to day Transaction DB was taken. It was known as the "Sync DB" or "Reporting DB", all the reports were pointing to this database. There were many ways to achieve this, including SQL Replication, SQL Mirroring, Log Shipping and etc... The report was basically SSRS and Crystal Report.

Why was it designed like this? The Transaction DB was getting big and the number of transactions was growing. The reports were locking/blocking the transactions and making the system slow.

Circa the year 2000

The Sync DB was renamed to Golden DB and some basic logic was added. The Golden DB definition was never clear.

2005-2015ish

This was the start of the cube era; the Golden DB was designed as if it were a Transactional DB and not a BI DB.

Thursday, 5 October 2017

On Premises Enterprise BI Architecture & Design from a bird's-eye view.

Objective

Over the years I have been asked by many people how I design an enterprise level BI from scratch. They're looking to learn what approaches I take to make my design perform at a high level of accuracy and efficiency. To achieve these things, we need to look at many aspects of the overall design, including the framework in use and the set of tools that are available or we have to design.

In order to explain my process in depth and help you better understand the approach I take, I will create a series of blog posts to cover BI Design of the past and present, as well as the recommended approach and BI tools. I have also put together a PowerPoint presentation that I will update over time to include other topics covered in this blog series. You can access this PowerPoint presentation here: OnPrem Enterprise BI Design.pptx

Agenda

Legacy BI Designs
Today's BI Design
List of BI Tools
111 Approach
Reporting Frameworks

Saturday, 20 May 2017

Partition - Document and Track changes in your Tabular Model

Objective

Extract the metadata from a tabular model using a PowerShell script and SQL Server 2016. In this section, we are going to extract the list of Tabular partitions and their corresponding metadata.

Requirements

Please see the requirements listed here: Document and Track changes in your Tabular Model

Steps

If you do not want to store metadata in a separate database, please proceed to step 4.

Download the script file: 0500 Partition.sql
Run the script in SSMS.
Download the PowerShell script: 500--Partition.ps1.
Use PowerShell ISE (or other PowerShell tool) to open the PowerShell file.
Set the parameters in the PowerShell script to align with your environment (see Requirements).

This script can either display the information or populate the table created in the step above.

Run the PowerShell script (generally I use a *.bat file to load all of my PowerShell scripts).

Conclusion

Once the above steps have been completed, your SQL Server instance will have a table called [Partition] within your [ASTabularPropertyCollector] database containing the metadata information for your SQL tabular model.

Querying this table should provide results similar to those seen below.

Thursday, 4 May 2017

Creating SSIS Package Components and Features Programmatically using .NET

Problem Statement

This might sound strange, but I am writing about a subject that has been around since 2005. However, since then, the objects within SSIS 2016 have changed and I recently had a customer request that prompted me to document my findings.

I needed to add and configure some features in an SSIS package and I had old .NET code that did the magic for me in 2007. I used the same code for my SQL 2016 SSIS package but it didn't work. WHAT?!?!

Solution

Some of the changes that were required included the following:

OLE provider is now "Provider=SQLNCLI11.1"
LogProvider object in SQL 2016 is no longer 2, it's 5: "DTS.LogProviderSQLServer.5"

NOTE: As of when this blog post was written, this information could not be found on MSDN Online.

And more...

Benefits of having objects in the package created programmatically

The package framework allows for a standardized approach.
Achieve more consistency in your SSIS packages created through package framework.
Save time and be more accurate in designing the required changes.
Understand the inner layer of SSIS package design.

Solution Breakdown

I am going to create or configure the below objects in a sample SSIS package

Add a Connection Object
Dim conMgrDest As ConnectionManager = Pkg.Connections.Add("OLEDB")
Add Event Handlers
Pkg.EventHandlers.Add("OnError")
Set the SSIS Logging
Dim logProv As LogProvider = Pkg.LogProviders.Add("DTS.LogProviderSQLServer.5")

Pkg.LoggingOptions.SelectedLogProviders.Add(logProv)

Pkg.LoggingMode = DTSLoggingMode.Enabled

Pkg.LoggingOptions.EventFilter = arrayEventFilter

Steps

Download the Sample Solution
I have created 4 empty SSIS packages that are in the "SampleEmptySSISPackages" folder, in the case of my customer I had 300 packages.
Copy all packages into the "ToBeProcessed" folder.
Open the solution, "DotNetCodeOnSSISPkgs.sln".
Open "pkgFwDotNetCodeOnSSISPkgs.dtsx" and go to Parameters.
Make sure that the "uPara_FolderPath" is pointing to the location of your "ToBeProcessed" folder.
To create the Connection Manager, you will need to set the "SSISLogServer" and the "SSISLogDB".
Run the SSIS package

Validating Results

To check the results, open the packages in the folder "ToBeProcessed" in Visual studio. You will see the changes mentioned below:

New Connection Object "conn_OLEDB_SSISLoggingConnection" is added
The Event Handlers: OnError, OnPostExecute and OnPreExecute events have been added at the package level.
In the menu SSIS --> Logging, you will see the following items:

Containers: Package is checked
Providers and logs: new name has been added, "My SSIS Log Provider for SQL Server", and the same connection object has been used.
Details: OnError, OnExecuteStatusChanged, OnPostExecute, OnPreExecute are all checked

NOTE: To do the test again, you must copy the sample files from the "SampleEmptySSISPackages" folder into the "ToBeProcessed" folder and rerun the "pkgFwDotNetCodeOnSSISPkgs.dtsx" package.

References

MSDN LogProviders.Add Method
Adding Connections Programmatically