I have more than 100 SQL tables that need a "checksum" field, something like a HASHBYTES MD5 value. The major problem is that the number of fields keeps changing on the fly, and I have to fix my checksum field each time. I need a tool to handle this, with a mechanism that can identify and apply the new HashByte changes.
Basic knowledge of HashByte
HASHBYTES is a SQL Server function that returns the hash of its input. It supports several algorithms: MD2, MD4, MD5, SHA, SHA1, SHA2_256 and SHA2_512.
Where can HashByte be used?
HashByte is mainly used in 2 areas:
1 - Protecting sensitive data: a one-way hash of a sensitive value, such as credit card information, can be saved in a SQL table instead of the value itself. Note that hashing is not reversible encryption; the original value cannot be recovered from the hash.
2 - Checksum: when an SCD table has 100s of fields, comparing those 100s of fields can be replaced with comparing one HashByte field.
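To make the checksum idea concrete, here is a minimal sketch in Python rather than T-SQL. The row layout, the column names and the "|" separator are assumptions made for illustration, and the digest will not necessarily match what HASHBYTES returns for the same data, because the byte encoding of the input differs.

```python
import hashlib

def row_checksum(row, columns, sep="|"):
    """Return a 16-byte MD5 digest over the chosen columns of one row."""
    # Render NULL (None) as an explicit marker so it differs from an empty string.
    parts = ["<NULL>" if row[c] is None else str(row[c]) for c in columns]
    return hashlib.md5(sep.join(parts).encode("utf-8")).digest()

# Hypothetical rows: the same customer before and after a change of city.
old = {"CustomerName": "Contoso", "City": "Toronto", "Phone": None}
new = {"CustomerName": "Contoso", "City": "Ottawa", "Phone": None}
tracked = ["CustomerName", "City", "Phone"]

# One comparison of two digests replaces a column-by-column comparison.
changed = row_checksum(old, tracked) != row_checksum(new, tracked)
print(len(row_checksum(old, tracked)), changed)
```

An MD5 digest is 16 bytes long, which is why a binary(16) column is enough to store it.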
First, you must know your fields very well. You must identify:
1 - External fields (Data related): fields imported from the outside world into your database/table
Other data related fields
2 - Internal fields (Not Data related): This is the list of the fields that have been defined for your table(s)
[Surrogate Key] (Numeric)
Checksum field "[iHashByteChecksum] [binary](16) NOT NULL"
[FromDate], [ToDate] (SCD related)
[CreateDate], [UpdatedDate] , [UserName]
HashByte Creator Tool
I am providing a few SQL scripts that you will have to read and run in the order of their file numbers.
The file "030 frmwrkDestinationTable.sql" plays the role of a dimension table, or of any table that needs a HashByte field.
The file "040 HashByteConfigTable.sql" is part of the "HashByte Creator Tool". It creates a configuration table, which you must configure according to the SQL tables that need a HashByte field (in the BI world, mainly the dimensions).
Scenario: What happens if a new field is added to the dimension table? Answer: The function picks up all fields except natural keys, internal fields and excluded fields, so a newly added field is picked up automatically on the next run. Check the WHERE clause of the SQL statement:
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @SchemaName
  AND TABLE_NAME = @TableName
  AND COLUMN_NAME NOT LIKE '%checksum%'
  AND COLUMN_NAME NOT LIKE '%HashByte%'
  AND COLUMN_NAME NOT LIKE 'i%'
  AND CHARINDEX(COLUMN_NAME, @SurrogateKeys) < 1
  AND CHARINDEX(COLUMN_NAME, @NaturalKeys) < 1
  AND CHARINDEX(COLUMN_NAME, @ExcludeFields) < 1
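The effect of this filter can be sketched outside the database. The Python below is only an illustration of the WHERE clause, not the tool's code: the function name, the sample column names and the case-insensitive matching (which assumes a case-insensitive collation) are all assumptions.

```python
def hashable_columns(columns, surrogate_keys, natural_keys, exclude_fields):
    """Mimic the WHERE clause: keep only the columns that should feed the hash."""
    def listed(name, csv):
        # CHARINDEX(name, list) < 1 means "name is not a substring of list".
        return name.lower() in csv.lower()

    keep = []
    for col in columns:
        low = col.lower()
        if "checksum" in low or "hashbyte" in low:   # NOT LIKE '%checksum%' / '%HashByte%'
            continue
        if low.startswith("i"):                       # NOT LIKE 'i%'
            continue
        if (listed(col, surrogate_keys) or listed(col, natural_keys)
                or listed(col, exclude_fields)):
            continue
        keep.append(col)
    return keep

# Hypothetical dimension table; "NewField" was added after the last run.
columns = ["CustomerKey", "CustomerCode", "CustomerName", "City",
           "iHashByteChecksum", "FromDate", "ToDate", "NewField"]
picked = hashable_columns(
    columns,
    surrogate_keys="CustomerKey",
    natural_keys="CustomerCode",
    exclude_fields="FromDate,ToDate,CreateDate,UpdatedDate,UserName",
)
print(picked)
```

Two side effects of the pattern are worth checking against your own naming conventions: every column whose name starts with "i" is skipped, and because CHARINDEX matches substrings, a column called "Code" would also be skipped when "CustomerCode" is listed as a natural key.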
The "050 ufnCreateHashByteString.sql" file creates a string that can be used in a Hashbyte function that looks like
Assuming you have a BI environment up and running, your data scientist is dealing with all the corporate data and suddenly he comes up with an idea. He imports some external data into any tool that he is comfortable with (in this case, I picked Power BI) and studies the data.
Finally, a new self-service report is created that is handed over to the analyst/business team for feedback.
The data scientist asks the IT department to find an environment for the external data so that it can be shared within the team. The IT department copies the external data to an environment where it can be accessed, but will not support this new entity. This is a one-day job for the IT department.
Now the analyst/business team studies the data and provides feedback, documentation and, most importantly, the requirements for the IT department/designers.
At this point we have enough information/requirements to confirm the design.
1 Month (1 Sprint)
The IT department/designers now have the requirements and can start adding the new external data to the current corporate BI environment so that it follows all the BI standards. For example, the external data will be incrementally loaded and will go through the auditing process, data cleansing, framework reports, etc.
A good BI enterprise design must be done as though the application will be for sale. It should be able to accommodate any input data, or be as flexible as required for that particular BI business logic. The legacy data should depend on the Source DB because the Source DB has a closer design to the BI DW.
Generally you will need the following databases to design an optimized BI environment:
DL DW A Data Lake data warehouse is known as the Source DB; it's the landing area of the data. Some designers like to call it a flat database because it does not have any Foreign Keys (FK) and the tables likely have no more than 2-3 indexes.
BI DW This is the location where part of the business logic is applied. Other things, like SCD Type 2 using HASHBYTES, also happen in this database. You must know the business to create optimized tables with the right Primary Keys (PK) and indexes. One of the things that I generally do is create all the FK constraints and make sure that they are all disabled.
DM A DM is a database but not a SQL database and it's the second location where business logic is applied. When I am evaluating a DM I look for:
1. How many measures are being designed
2. Check the hierarchy and cross join of the hierarchies
3. Perspectives
4. Partitioning
5. Dynamic Processes
6. Documentation
Config DB The configuration database controls the entire enterprise application. It contains lots of non-confidential information and other tools such as:
1. BI Incremental load mechanism
2. HASHBYTE Process
3. Internal/External logs
4. Auditing
5. Report generator tools
6. Multi-language support
7. Documentation
8. Date Dimension
9. SQL and DM Partition
Other Databases Depending on your company's security/design policies, you might need more databases/environments. For example, you will need SSISDB for SSIS Catalog deployment, or you might need the [msdb] database because you are trying to fire off a SQL job using a front-end tool.
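The SCD Type 2 step mentioned under BI DW comes down to comparing one stored hash per row against the hash of the incoming row. The following Python is a rough sketch of that logic only; the in-memory row layout, the column names and the 9999-12-31 "open" end date are assumptions, and a real implementation would do this in the database.

```python
import hashlib
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed marker for the current version of a row

def checksum(row, columns):
    """MD5 digest over the tracked columns, with NULL rendered as a marker."""
    text = "|".join("<NULL>" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.md5(text.encode("utf-8")).digest()

def apply_scd2(dimension, incoming, key, columns, today):
    """Expire changed rows and append new versions; returns the dimension list."""
    current = {r[key]: r for r in dimension if r["ToDate"] == OPEN_END}
    for src in incoming:
        new_hash = checksum(src, columns)
        existing = current.get(src[key])
        if existing is not None and existing["Checksum"] == new_hash:
            continue                      # unchanged: nothing to do
        if existing is not None:
            existing["ToDate"] = today    # changed: close the old version
        dimension.append({**src, "Checksum": new_hash,
                          "FromDate": today, "ToDate": OPEN_END})
    return dimension

# Hypothetical customer loaded three times: new, unchanged, then changed.
dim = []
apply_scd2(dim, [{"Code": "C1", "City": "Toronto"}], "Code", ["City"], date(2020, 1, 1))
apply_scd2(dim, [{"Code": "C1", "City": "Toronto"}], "Code", ["City"], date(2020, 1, 2))
apply_scd2(dim, [{"Code": "C1", "City": "Ottawa"}], "Code", ["City"], date(2020, 1, 3))
print(len(dim), dim[0]["ToDate"], dim[1]["City"])
```

The second load adds nothing because the hashes match; the third load closes the Toronto version and appends an Ottawa version.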
The best practice is a combination of the old approach and the new BI Architecture.
It's always good to know the history of everything, including the history of BI: how it started, how it changed and how it transformed into what it is today. Did you know that the term BI only became widely used around the year 2005? Prior to that, it was known as Management Information System (MIS).
Back in the 90's, a copy of the day-to-day Transaction DB was taken. It was known as the "Sync DB" or "Reporting DB", and all the reports pointed to this database. There were many ways to achieve this, including SQL Replication, SQL Mirroring, Log Shipping, etc. The reports were basically SSRS and Crystal Reports.
Why was it designed like this? The Transaction DB was getting big and the number of transactions was growing. The reports were locking/blocking the transactions and making the system slow.
Circa the year 2000
The Sync DB was renamed to Golden DB and some basic logic was added. The Golden DB definition was never clear.
This was the start of the cube era; the Golden DB was designed as if it were a Transactional DB and not a BI DB.
Over the years I have been asked by many people how I design an enterprise-level BI from scratch. They're looking to learn what approaches I take to make my design perform at a high level of accuracy and efficiency. To achieve these things, we need to look at many aspects of the overall design, including the framework in use and the set of tools that are available or that we have to design ourselves.
In order to explain my process in depth and help you better understand the approach I take, I will create a series of blog posts to cover BI Design of the past and present, as well as the recommended approach and BI tools. I have also put together a PowerPoint presentation that I will update over time to include other topics covered in this blog series. You can access this PowerPoint presentation here: OnPrem Enterprise BI Design.pptx
Use PowerShell ISE (or other PowerShell tool) to open the PowerShell file.
Set the parameters in the PowerShell script to align with your environment (see Requirements).
This script can either display the information or populate the table created in the step above.
Run the PowerShell script (generally I use a *.bat file to load all of my PowerShell scripts).
Once the above steps have been completed, your SQL Server instance will have a table called [Partition] within your [ASTabularPropertyCollector] database containing the metadata information for your SQL tabular model.
Querying this table should provide results similar to those seen below.
This might sound strange, but I am writing about a subject that has been around since 2005. However, since then, the objects within SSIS 2016 have changed and I recently had a customer request that prompted me to document my findings.
I needed to add and configure some features in an SSIS package and I had old .NET code that did the magic for me in 2007. I used the same code for my SQL 2016 SSIS package but it didn't work. WHAT?!?!
Some of the changes that were required included the following:
OLE provider is now "Provider=SQLNCLI11.1"
LogProvider object in SQL 2016 is no longer 2, it's 5: "DTS.LogProviderSQLServer.5"
NOTE: As of when this blog post was written, this information could not be found on MSDN Online.
Benefits of having objects in the package created programmatically
The package framework allows for a standardized approach.
Achieve more consistency in SSIS packages created through the package framework.
Save time and be more accurate in designing the required changes.
Understand the inner layer of SSIS package design.
I am going to create or configure the objects below in a sample SSIS package:
Add a Connection Object Dim