Pages

Wednesday, December 28, 2011

The Art of Merging

A preface, I am going to try better to document all the issues I have while programming and my solutions. I've used the wonders of other people's documentation of their solutions to problems to help me fix my own, it's time that I gave back. In reality, no one is probably ever going to come to this blog to solve a programming problem, but I gotta do my part and at least put it out there.

No, I'm not talking about merging your vehicle onto the highway, although that's another post in itself about the proper technique. I'm talking about database merging or more specifically merging data between two tables.

The Problem

I am building an interface between my home grown software on MS SQL and an out of box application that runs on an Oracle database. That isn't particularly pertinent to this solution but it's a basis of why you might want to use this technique. In this interface there is a set of Assets in Oracle and for the ease of use, all those Assets and information are copied into my program in a similarly structured table. 15-20K records are used at one time. In total, several hundred thousand.

Table Equipment in MS SQL
ID
OracleID
SerialNumber
Description
ImportantValue
Active

Table Asset in Oracle
OracleID
SerialNumber
Description
ImportantValue
Active

The problem is that once we load all the values from Asset into Equipment how do we know what changes in Asset so that we can update the records in Equipment? If I were allowed to modify the database that Asset resides in, this would be an easy run time solution using triggers. Unfortunately I am barely allowed read access. That program is locked down tight. In some cases records are added to Asset and we need to add those to Equipment as well. Records will never be deleted from Asset only deactivated.

The Solution

My first thought was to use EXCEPT to find the differences between the table and then run a CURSOR (which is just a while loop in SQL) to do an update or insert on each of the returned records. I really really didn't want to do this because a CURSOR should really be a last resort, it is slow. If you have to traverse potentially tens of thousands of records, that's not the way to go. Not to mention the code is long, and while readable isn't particularly elegant.

After some research I discovered MERGE is a newly supported function in MS SQL Server 2008, although I got the vibe it existed in 2003 as well. What happened in 2005? I don't know. All I know is I could use it in my 2008 database and that makes me happy. Now I don't claim to know everything about the execution and how all these functions work, but I do know it was fast and it did what I wanted to do and I didn't need a cursor.

Here's the full microsoft explaination of MERGE 

This page gave a sample using static values to enter or update, but I modified my own query to use another query instead.


MERGE INTO Equipment AS Target
USING (SELECT OracleID, SerialNumber, Description, ImportantValue, Active
FROM Asset)
  AS Source (NewOracleID, NewSerialNumber, NewDescription, NewImportantValue, NewActive)
ON Target.OracleID = Source.NewOracleID
WHEN MATCHED THEN
UPDATE SET
                    SerialNumber=NewSerialNumber
                    Description=NewDescirption
                    ImportantValue=NewImportantValur
                    Active=NewActive
WHEN NOT MATCHED BY TARGET THEN
INSERT (OracleID, SerialNumber, Description, ImportantValue, Active) VALUES  
                    (NewOracleID, NewSerialNumber, NewDescription, NewImportantValue, NewActive)

And there you have it, a data merge between Asset and Equipment. This case will actually select ALL the rows in Asset and compare them all and perform updates on everything it matches. If you only wanted to update or insert the differences, then you would modify the SELECT statement after the USING clause to only return the affected rows. That can be done using the EXCEPT function.

For Example


SELECT * FROM (SELECT OracleID, SerialNumber, Description, ImportantValue, Active
FROM Asset
EXCEPT
SELECT OracleID, SerialNumber, Description, ImportantValue, Active
FROM Equipment) A


This will return any records changed or added to the Asset table. I found that the query time was acceptable using the MERGE without the EXCEPT so I left it as is.

Program On.

0 comments:

Post a Comment