from Excel into SQL Server. We'll then look at how to create a sort transformation to sort rows and an aggregate transformation to group data together. We'll look at how to create row and percentage sampling transformations and how to combine data from two or more different sources using a Union All transformation, and I'll explain about synchronous and asynchronous transformations. So let's get started.

Before I begin showing you transformations, let's look at our basic example package. We're going to take an Excel workbook of winners of The X Factor series. You can see there are nine series, and for each series there's a winner, the start and the finish date of the series, and what gender they are. We're going to export that into a SQL Server table, so that the end result looks like this: we'll have the nine winners. If you look at the design of that table, you can see I've made a couple of simplifications. I've made both [...] use importing data from Excel. In a later tutorial in this series we'll look at how to relax those constraints and how to do data conversion so the data imports successfully into just a normal varchar field, but that's for a later date.

So, returning to the issue at hand, I've got a basic package set up which consists of two control flow tasks. The first one is a simple SQL command, and what it does is delete any rows from the tblSeries table we've got, so that the data can be imported afresh. If I double-click on that and have a look, you can see the command it's running is a delete from the table. The second control flow task is the data flow one, which will do the importing: it imports the rows from the Excel workbook into the series table. If I double-click on it now to have a look at it, you can see it has a source, which is a list of X Factor series competitions, and it has a destination, which is a table of series in SQL Server. And if I now run this package, you
can see that it will successfully complete. If I then close that down, or rather stop it running, and go into SQL Server and refresh my table, I'm going to ask you to believe that it has deleted the records and re-imported them successfully, because that's actually what happened. So now that I've established what my basic package is, what I'm going to do is
go about adding a transformation into it. The first transformation I'm going to add is a sort transformation, which will simply sort the data, and I can find it in the common transformations. It's clearly not something I want to do very often, because it's so much more easily done in Excel, SQL Server or another application, but it's a good first transformation to show you.

I've got my source and I've got my destination, and my transformation needs to sit between the two. So what I'm going to have to do is break the link between the two by deleting it, then add in my sort transformation and reconnect my pipework. I'm going to drag the successful data from the source into the transform, and I'm going to drag the successful data from the transform into the destination, so it's just like an intermediate stage which data will flow through. I'm going to rename it as "Sort X Factor series". You can see at the moment there's a red cross, because I haven't specified what I'm sorting by. I can remedy that by double-clicking on the transformation, which is usually the way to edit a task, and I can specify what I want to sort by. I'm going to sort them by gender first, and then within gender by the name of the winner. In the incredibly unlikely event that there are two people of the same gender and the same winner's name, I'm then going to sub-sort them by the series number, and you can see the priority order, one, two, three, for those sort fields.

Now, the start date and the finish date I want to pass through my transformation. If I had unticked those, what would happen is they wouldn't be available to store in my final SQL Server table, so I'm going to make sure that even though I'm not sorting by those, I include them in the output from the transformation. I can choose OK to confirm that, all the red crosses disappear, and I should be in a position now to run it. What should happen is that my SQL Server table will contain exactly the same records, and if I stop that and have a look at it, you can see they're sorted differently: all the females come first, followed by the males, and the females are sorted in alphabetical order, as are the males.

As I said, sorting is not something you commonly do in Integration Services. In particular, it's an example of something called an asynchronous transformation, which means it runs quite slowly; more on that towards the end of this tutorial. So now that we've had a look at sorting, let's go on to have a look at
another transformation: aggregating.

Before working on my aggregation, I first need to get rid of some of the old bits of my data flow task, so I'm going to get rid of the sort transform and also the destination, so that I've just got the Excel workbook source. Now, what my aggregation is going to do is take the data and aggregate it according to whether people were male or female.
The end result will be Female, and next to that I'll show the number of winners, which is 3, and then Male, and next to that I'll show the number of winners, which is 6. Instead of counting, you can also sum, average, perform standard deviations and a number of other statistics as well. Now, before we continue to add our aggregate transformation, it's worth saying that this too is something you won't often do in Integration Services, because it's normally much quicker to do it in SQL, Excel, Access, Oracle or whatever you may be working with.
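As a rough illustration of what this aggregation computes (not of how Integration Services implements it), here is a minimal Python sketch that groups rows by gender and counts them. The winner names are placeholders rather than the real workbook data, and the field names `series`, `winner`, `gender` and `number_of_series` are assumptions made for the example. Only the split of 3 female and 6 male winners comes from the description above.

```python
from collections import Counter

# Placeholder rows standing in for the Excel workbook (hypothetical data:
# the names are invented and the order of genders is arbitrary; only the
# 3 female / 6 male split is taken from the tutorial).
genders = ["Male", "Male", "Female", "Male", "Female",
           "Male", "Male", "Female", "Male"]
rows = [
    {"series": number, "winner": f"Winner {number}", "gender": gender}
    for number, gender in enumerate(genders, start=1)
]


def count_series_by_gender(rows):
    """Group the rows by gender and count how many rows fall in each group."""
    counts = Counter(row["gender"] for row in rows)
    # One output row per group, sorted by gender so the output is predictable.
    return [
        {"gender": gender, "number_of_series": count}
        for gender, count in sorted(counts.items())
    ]


for output_row in count_series_by_gender(rows):
    print(output_row)
```

The output has one row per distinct gender, which is why nine input rows collapse to two.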
But that said, let's add an aggregation task, or rather an aggregate transformation. What I'll do is feed the output from the Excel workbook into the transformation, and then I'll rename it to say what it's going to do, which is to count the series by gender. There's a red cross next to it because at the moment it's not actually outputting anything. To remedy that, I can edit the task and say what I want it to do: firstly I'm going to group by gender, and secondly I'm going to count how many series there are for each gender. I'm going relatively quickly through this because most people will have seen this in Excel, Access or SQL Server. You can see that, having grouped by gender, I can perform a number of different statistics; what I'm going to do is count how many series there are. It also makes sense to change the output alias, the name of this output column, and I'm going to call it "Number of series", because that's what it actually is. If I then choose OK, my red cross will disappear.

I could now run this package, but it would be hard to tell what it was actually doing. I don't want to go to all the trouble of creating a SQL Server table just to hold the output from this, so I'm going to do a common little cheat and add a Union All transformation. What this is usually used for is to combine the results of two different tables, but it can also be used to mop up data as a final stage in your data flow. So what I can do is connect the results of the aggregation straight into the Union All.
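Conceptually, a Union All appends the rows of one input to the rows of another without removing duplicates, and with a single input it just passes the rows through, which is what makes it usable as a harmless end point. A minimal Python sketch of that idea follows. The function name `union_all` and the sample rows are made up for illustration, and keeping the inputs in order is a simplification of this sketch rather than a statement about Integration Services.

```python
from itertools import chain


def union_all(*inputs):
    """Append the rows of every input one after another, keeping duplicates."""
    return list(chain.from_iterable(inputs))


# Two hypothetical inputs with the same columns.
first_input = [{"gender": "Female", "number_of_series": 3}]
second_input = [
    {"gender": "Male", "number_of_series": 6},
    {"gender": "Female", "number_of_series": 3},  # duplicate row is kept
]

combined = union_all(first_input, second_input)
print(combined)

# With only one input, the rows simply pass through unchanged.
passed_through = union_all(first_input)
print(passed_through)
```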
And in order to be able to see what's actually going on with this, because the data will be lost as soon as the package finishes executing, I'm going to add a data viewer, so that as the data flows out of the aggregation and into the Union All I'll be able to monitor it to see whether it's working successfully. If I now try running that package, what I hope we're going to see is the data viewer popping up showing what's actually going on. You can see there that at this stage there are two rows giving the aggregation of the number of series by gender, so that's all working successfully, and I can close that down and stop my package running.

So that's aggregating. What I'm now going to do is take a little look at a less common transform, which is row sampling, and maybe percentage sampling as well. In order to be able to do this I'm going to have to get rid of my aggregate transformation, so bye-bye to that, and also my Union All one as well. What row sampling and percentage sampling allow you to do is take the full set of data and sample either a given number of rows or a given percentage of it.
So, for example, I could say show me three randomly chosen rows, or I could say show me 10% of the rows, randomly chosen. So let's do an example of each of those. I'll start with showing three randomly selected rows. The reason you might want to do this is either to check the integrity of your data or to prepare a sample of data to feed into a data mining algorithm in Analysis Services.
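The two kinds of sampling can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the `seed` parameter and the stand-in rows are invented for the example, and treating percentage sampling as "keep each row with that probability" is one plausible reading, not a description of how Integration Services does it.

```python
import random


def row_sample(rows, n, seed=None):
    """Pick n distinct rows at random (all rows if there are fewer than n)."""
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    return rng.sample(rows, min(n, len(rows)))


def percentage_sample(rows, percent, seed=None):
    """Keep each row with probability percent/100.

    The size of the result is therefore only approximately that
    percentage of the input, especially for small inputs.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < percent / 100]


# Stand-in for the nine series rows (hypothetical: just the series numbers).
rows = list(range(1, 10))

three_rows = row_sample(rows, 3, seed=42)
ten_percent = percentage_sample(rows, 10, seed=42)
print(three_rows)
print(ten_percent)
```

With only nine rows, a 10% sample will often come back with one row or none at all, which is worth bearing in mind when testing on small data sets.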
I'm going to add in my row sampling transformation, and I'm going to change the name here to say, whoops, "Sample three rows". I'm going to feed the Excel source into that and then configure it to say what it's going to do. I'm going to say how many rows I'm going to choose, and the answer is 3. The random number seed at the bottom allows you, as it says in this description of