Saturday, May 9, 2009

Don't forget the data!

Data migration is a nightmare. You never have enough time for it, and the source of the data always looks like this:



And I've got that expression on too. You're likely to encounter this situation at some point, unless you have the luck of writing a brand new system that just requires basic master data (where do I sign up?!).

I've been happily impressed though, with SQL Server Integration Services (SSIS). It's commonly used to do basic mapping and importing of data from Excel from within SQL Server, however using Business Intelligence Studio, you can create some powerful data migration packages by using a workflow of data transformation tasks. Very 5th generation, says a colleague. I'm not going to give a huge breakdown, as that is more effectively given by the SSIS project on CodePlex, but here's a basic example of a data flow task:



The greater Control Flow can contain many data flow tasks as well as other tasks (eg Bulk Inserts, Execute SQL, FTP Tasks, Sending Mail and many more). The data flow is probably the most complex, as this is where your specific data transformations occur. In the screenshot, you see that we read from a source destination (in this case an MS Access database), from which we do some data conversions, followed by a lookup of another field based on one in the source database (think of it like a VLOOKUP in Excel), creating a derived column, perhaps a calculation, and then inserting the transformed data into our SQL Server destination. Of course, you need to set up all of these by double clicking on the individual tasks and performing some MacGuyver tricks but you're hopefully getting the idea of how useful this tool is.


The key is that these packages are reusable. I've spent some many hours massaging the same data for different time periods in separate Excel spreadsheets. One of these packages could've got my public holidays back!

Testing the packages

While the proverbial jury may still be out on unit testing, I've personally found it invaluable in projects. Not as much during initial development, but its usefulness becomes reallyclear during support.

Now I've recently come across the ssisUnit project on CodePlex. This uses the familiar ~unit test format - setup, test, teardown (with setup and teardown used to create and remove test data, respectively) - using XML. I'm busy playing around with this and will post my thoughts up in the next week.


Now, providing a business case for the additional effort required to include any type of unit testing an a project is a whole other story...

No comments:

Post a Comment