Conversion of BRIO to SSIS Made Easy
Prerequisite is that we must know how each BRIO query work and some knowledge of Practical implementation of things in SSIS. A BRIO query may be composed of a set of dependant or independant queries.
Let's summarize essential steps for migration:
1) Each BRIO query may consist of list of queries and each of those results may have dependencies for further stages. Study how they actually work.
a) What are the Sources used
b) How many are the Results output
c) How many are computed columns
d) what are the limits for the rows
e) On what basis to sort the rows
d) Whether they have joins ? If yes what type of join and so on.
2) Create all connections required for querying using Connection Managers in SSIS:
a) Make sure to use appropriate drivers. For instance, IBM Red Brick Driver is essentially an ODBC driver. Sql Server can use Native Client, If we use Oracle connection on 64 bit Box, MSDAORA works with RunTime64bit Property set to false. If we use ORACLE 64-bit Driver for OLEDB, then we must cast those columns in the query mentioning appropriate precisions of defined type.
3) Dealing with BRIO Computed Columns by using matching functionalities in SSIS or SSIS transformations on the fly:
a) Replacing BRIO functions - "Prior" , "Next", "Cume" and other cumulative functions to SSIS equivalent functionality (Note: Require extra attention)
Beware of these functions as they require traversing between prev and next rows or require row wise manipulation. Some of my colleagues were more comfortable using cursors after they found no better way of tracking previous or next rows for manipulation of the outgoing rows, as the usage of SSIS design time variables degrade performance. However we should keep in mind that we can arrive at the right output choosing the best among different ways (You might think I'm very conservative..). Well in our case we tracked that "Next" function was used to remove duplicates in a Column after sorting, for which luckily we had Sort transformation with delete duplicate option selected in SSIS. But in the other case we used Cume, Brio syntax: Computed_Col_C = Cume(A,B), Where A is by default 1. and B increments the cumulative value of A based on the repitition of Key column. Cursor was a bruteforce method to be used here since even newly available functionality like "CTEs using ROW_NUMBER OVER (ORDER BY ID)" was unable replace the equivalent functionality.
However its always better to avoid cursors (impact of study of Inside Sql Server 2005 Programming, Microsoft Press by Grant Fritchey) wherever applicable as they are I/O intensive and main cause for performance degrade as I've dicussed about this already while speaking about dealing with tempdb issues.
b) Solution for BRIO If Else, Nested If Else, Nvl and Nested Nvls computed Columns using Derived Column transformation in SSIS.
NVL Equivaent script in SSIS Derived Transform:
ISNULL(ColA)? (ISNULL(ColB)?ColC:ColB):ColA
A Nested if else equivalent in SSIS Script Component Transform, will be very much straight foward.
4) Limits in BRIO can be replaced by Conditional Split Transform, where the output row can be named based on the filtering condition and can be routed to further transformations or to Destination.
5) BRIO joins using source tables coming from different sources in SSIS involves using Sort transform for every query extract based on a key column and then use merge join selecting the nature of join appropriately. Note that while sorting you may be interest in removing duplicate key columns if appropriate.
6) Lookup or SCD transformations can be applied in some cases where we need to keep track of the changes in records or in case we require to track or re-direct the duplicate rows for further processing.
7) You can parallelize loads in situations where BRIO extracts data from different sources that are to be merged into a single dimension table. Do all the transformations necessary for each of the source query, finally adding the Merge Transformation that simply merge the results which you can further Process or send it to the destination.
Refer Screenshots of few SSIS Packages that are results of BRIO query conversion:
Fig 1 and 2, Showing how we can convert BRIO Logic to extract from different sources, clean, massage, sort, merge and join to load finally into a dimension.
Let's summarize essential steps for migration:
1) Each BRIO query may consist of list of queries and each of those results may have dependencies for further stages. Study how they actually work.
a) What are the Sources used
b) How many are the Results output
c) How many are computed columns
d) what are the limits for the rows
e) On what basis to sort the rows
d) Whether they have joins ? If yes what type of join and so on.
2) Create all connections required for querying using Connection Managers in SSIS:
a) Make sure to use appropriate drivers. For instance, IBM Red Brick Driver is essentially an ODBC driver. Sql Server can use Native Client, If we use Oracle connection on 64 bit Box, MSDAORA works with RunTime64bit Property set to false. If we use ORACLE 64-bit Driver for OLEDB, then we must cast those columns in the query mentioning appropriate precisions of defined type.
3) Dealing with BRIO Computed Columns by using matching functionalities in SSIS or SSIS transformations on the fly:
a) Replacing BRIO functions - "Prior" , "Next", "Cume" and other cumulative functions to SSIS equivalent functionality (Note: Require extra attention)
Beware of these functions as they require traversing between prev and next rows or require row wise manipulation. Some of my colleagues were more comfortable using cursors after they found no better way of tracking previous or next rows for manipulation of the outgoing rows, as the usage of SSIS design time variables degrade performance. However we should keep in mind that we can arrive at the right output choosing the best among different ways (You might think I'm very conservative..). Well in our case we tracked that "Next" function was used to remove duplicates in a Column after sorting, for which luckily we had Sort transformation with delete duplicate option selected in SSIS. But in the other case we used Cume, Brio syntax: Computed_Col_C = Cume(A,B), Where A is by default 1. and B increments the cumulative value of A based on the repitition of Key column. Cursor was a bruteforce method to be used here since even newly available functionality like "CTEs using ROW_NUMBER OVER (ORDER BY ID)" was unable replace the equivalent functionality.
However its always better to avoid cursors (impact of study of Inside Sql Server 2005 Programming, Microsoft Press by Grant Fritchey) wherever applicable as they are I/O intensive and main cause for performance degrade as I've dicussed about this already while speaking about dealing with tempdb issues.
b) Solution for BRIO If Else, Nested If Else, Nvl and Nested Nvls computed Columns using Derived Column transformation in SSIS.
NVL Equivaent script in SSIS Derived Transform:
ISNULL(ColA)? (ISNULL(ColB)?ColC:ColB):ColA
A Nested if else equivalent in SSIS Script Component Transform, will be very much straight foward.
4) Limits in BRIO can be replaced by Conditional Split Transform, where the output row can be named based on the filtering condition and can be routed to further transformations or to Destination.
5) BRIO joins using source tables coming from different sources in SSIS involves using Sort transform for every query extract based on a key column and then use merge join selecting the nature of join appropriately. Note that while sorting you may be interest in removing duplicate key columns if appropriate.
6) Lookup or SCD transformations can be applied in some cases where we need to keep track of the changes in records or in case we require to track or re-direct the duplicate rows for further processing.
7) You can parallelize loads in situations where BRIO extracts data from different sources that are to be merged into a single dimension table. Do all the transformations necessary for each of the source query, finally adding the Merge Transformation that simply merge the results which you can further Process or send it to the destination.
Refer Screenshots of few SSIS Packages that are results of BRIO query conversion:
Fig 1 and 2, Showing how we can convert BRIO Logic to extract from different sources, clean, massage, sort, merge and join to load finally into a dimension.
Comments
but what is BRIO? a list of queries?!!
How you did the borders of your figures?! :-)
Cheers!!
WISH YOU PROSPEROUS NEW YEAR AND MAY ALL YOUR DREAM COME TRUE..
Brio Intelligence is a very powerful tool for Sophisticated querying and Analysis. Usually it is comprized of queries that leverage Heterogeneous Enterprise Sources.
For applying borders We need to use Microsoft Outlook 2007 -> Under Format Section -> choose the above format.
Thanks for your Comments
Subhash Subramanyam
I wish a good year 2008 for you!!!
Visit my blog to share some news!
regards!!!
G'day, I would like to know if you know of any tool that reports fast. I have a huge set of data and I am not sure what should I use for ETL. I prefer open source.
Look forward to hear from you soon.
Best,
David
Unfortunately I can possibly help you answer queries on Microsoft Sql Server suite for ETL and Reporting as I have exposure working on them.
You may want to try this link: http://www.google.com/search?hl=en&rls=com.microsoft:en-us&ei=NGjfSbHFHMKrtgfmsLXwDA&sa=X&oi=spell&resnum