Hello and good day to everyone! Today I wanted to take the time to talk about the fully-blocking, asynchronous Sort Transform in SSIS. The purpose of this transformation is to ensure that your data is flowing through the constraints with a certain order determined by you. Here is what it looks like:
Why is this a problem you ask? What the transform is accomplishing is not bad but how it does this is the problem. Because this component is asynchronous it uses a new buffer for the data coming into it. This means that it will wait until all the data coming from the components preceding the Sort Transform and then release it, to move onto other parts of the Data Flow. So if you are pulling 1 million records from a SQL Server table and then have a Sort Transform, it will wait until all 1 million rows are present and then release them in batches (usually 10,000 rows) down the rest of the package, as shown below:
This only occurs with asynchronous components (yes, there are others!), normally the records would process through whatever transform in batches. Do not worry though, this is not a blog to merely gripe about blocking transforms, I will show you how to accomplish this same goal without using the Sort Transform. This solution should help out with performance tuning on packages of any size. Next comes the easy steps to get it done:
1. IMMEDIATELY DELETE THE SORT TRANSFORM!
2. Open the OLE DB source and switch the Data access mode to “SQL commad”:
3. Once this is done, you will need to use T-SQL to select which columns you would like from the appropriate table. The important item in this step is that you will want to setup an ORDER statement by the necessary column (the identity column most likely)
**Here is an example from the AdventureWorks 2012 Database using the Production.Product table**
4. Once this SQL statement is set the data is now sorted but we need to mark it as such for SSIS to recognize this sorting. To do this, right-click that same OLE DB source and select the Show Advanced Editor option
5. You will now be presented with a window that has four tabs and we will be selecting the Input and Output Properties tab. There should only be two option available to you and you will want to highlight the OLE DB Source Output choice and its properties will be populated in the window to the right. The property we are going to manipulate is the IsSorted option and you will want to set this option to True, this is how we tell SSIS about the sorting.
6. Now we need to indicate by which column we are sorting by. Inside the same window as the previous step expand the OLE DB Source Output item to reveal two options ; External Columns and Output Columns. Here we will be expanding the Output Columns options and be presented with the entire list of columns inside of our source. Simply locate the column that you selected in your ORDER BY clause and highlight it, its properties will appear in the window to the right. Locate the field labeled SortKeyPosition and input the value of 1. It should be known that there may be times when you ORDER BY more than one column. In this advanced editor you will just want to go to each column you Ordered by and place a value according to its priority with 1 being the highest moving downwards to 2 and so forth. So if the SQL statement had an order statement as follows: ORDER BY ProductID, SalesID. The ProductID would have a value of 1 placed in the SortKeyPosition field while the SalesID column would have the value of 2.
Once this is done you can hit okay and everything is set! Although this method is a bit longer than simply adding a component and checking a box this solution will provide a huge boost to package performance when dealing with larger data sets. If you are pulling down less the 100,000 rows you can really use either of the two options for sorting. I do hope that this walk-though was helpful and you see good performance gains for your SSIS packages!