Sunday, March 13, 2011

The automation maker's nightmare: undocumented API changes.

One of my main job duties is taking mundane, manual processes and figuring out a way to use technology to automate them. I've been doing this for over 10 years now, and I like to think I've gotten pretty good at it. The processes I've built (with the help of some amazingly intelligent business folk who understood the problems and were able to empower me to help them) have saved the company not just in dollars (an easy justification when you're liberating employees from the menial tasks of pure data entry), but in reputation, by driving inaccuracies out and efficiency in. It's not that what I write is flawless (I'm not that good, and I wouldn't trust anyone who claimed to be), but that its flaws tend to show up in a way that is repeatable and, usually, correctable.

And so it goes...

One of the most powerful tools I have in my automation tool-set is Perl. The rich library of modules available gives me the ability to add lots of functionality and automation to the business environment. When it comes to getting external data into SQL databases, it's the first thing for which I reach.
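
For the curious, the basic pattern is nothing exotic. A minimal connection sketch, with a placeholder DSN and credentials (the later snippets assume this $dbh handle):

use DBI;

# Minimal connection sketch; the DSN, user, and password are placeholders.
my $dbh = DBI->connect('dbi:ODBC:MyDSN', 'user', 'secret',
                       { RaiseError => 1, AutoCommit => 1 })
    or die $DBI::errstr;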

I had a script that I wrote a few years ago, and while it worked well at getting records pushed into a table, it wasn't very fast. In short, I was doing a lot of work in the script that could have been done by the DB server if I'd just taken the time to write a stored procedure for the "heavy lifting." But because I was only parsing a handful of rows at a time, and doing it no more often than every 10 minutes, it wasn't worth the effort.

That all changed a few days ago: I discovered that the script had been disabled for about 18 months, and all the data that should have been pushed into the target table had not been. The data in question was the raw feed from an external data source; the script's job was to keep a rarely-used "raw data" table populated for the occasional special report. In our normal process, the raw data is transformed before being consumed by the business systems, and that transformation hadn't been affected by the disabled script, which is why this issue hadn't come to light earlier.

But now I had one of those "occasions" where I needed the raw data for a special report, and 18 months of data meant over 1 million rows, spread across thousands of text files, to read and insert.

Under the original design, here's what the script did:
  1. read a line of data
  2. parse the data into fields
  3. check the target table to see if the data existed
    1. if it didn't exist, insert it
    2. if it existed but was exactly identical, do nothing
    3. if it existed but was different, update the record
That logic worked, but it relied on 3 separate SQL statements (a SELECT, an INSERT, and an UPDATE), and two of them would necessarily run for each record being inserted. Updating the script to use a stored procedure (SP) for the heavy lifting would be easy; because I wanted to retain all the debugging and logging information in the original script, I would need to take some additional steps to make sure the new SP returned useful information back to the calling script. My DBMS allows the use of output parameters, so that wasn't an obstacle.
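
In Perl terms, the per-row logic looked something like this (a simplified sketch, not the production code; it assumes the $dbh handle from above and that each parsed line yields a $pk and an $item):

# Simplified sketch of the original per-row logic; the real code had
# more fields plus logging. Assumes a non-NULL item column.
my ($existing) = $dbh->selectrow_array(
    'SELECT item FROM dbo.tbl_Archive WHERE pk = ?', undef, $pk);

if (!defined $existing) {                      # no record yet: insert
    $dbh->do('INSERT INTO dbo.tbl_Archive (pk, item) VALUES (?, ?)',
             undef, $pk, $item);
}
elsif ($existing != $item) {                   # exists but different: update
    $dbh->do('UPDATE dbo.tbl_Archive SET item = ? WHERE pk = ?',
             undef, $item, $pk);
}
# exists and identical: do nothing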

So I built the new SP. The target engine was Microsoft SQL Server 2005, so it looked a bit like this:

CREATE PROCEDURE dbo.usp_Archive_AddRec (
   @arg1 int, --unique identifier for the record
   @arg2 int, --additional record data
   @oldarg2 int=NULL OUTPUT --previous value of @arg2 if updated
)
AS 
SET NOCOUNT ON;
DECLARE @status int;
SET @status=0; -- returned value: 0=no changes, 1=record inserted, 2=record updated, -1=error

DECLARE @oldbits TABLE (item int);

BEGIN TRY
   INSERT dbo.tbl_Archive (pk,item)
   VALUES (@arg1,@arg2);
   SET @status=1;
END TRY
BEGIN CATCH
   --most typical reason for the failure to insert
   --is an existing record; do an update instead
   BEGIN TRY
      UPDATE dbo.tbl_Archive
      SET item=@arg2
      OUTPUT DELETED.item INTO @oldbits --capture the previous value during update!
      WHERE
         pk=@arg1
         AND item != @arg2 --only update if the record is actually different
      ;
      IF @@rowcount > 0 BEGIN
         SELECT TOP 1 @oldarg2=item FROM @oldbits;
         SET @status=2;
      END;
   END TRY
   BEGIN CATCH
      SET @status=-1; --some unexpected SQL result/error
   END CATCH;
END CATCH
RETURN @status;
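
Manual testing from a query window was simple enough; something like this (values made up):

DECLARE @old int, @rc int;
EXEC @rc = dbo.usp_Archive_AddRec @arg1 = 42, @arg2 = 7, @oldarg2 = @old OUTPUT;
SELECT @rc AS [status], @old AS [old_item];

Note that from T-SQL it's trivial to capture the RETURN value with EXEC @rc = ...; getting at it from Perl turned out to be another story.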

Well, that worked great in manual testing, so I went to update the Perl code to accommodate the changes. My first roadblock was the inability to get ahold of the return value from the SP; the database driver creates an automatic parameter called @RETURNVALUE, but the Perl DBI layer doesn't have any documented way to get at it (at least, not when the SP isn't returning a dataset). I spent a while searching the Internet for a fix, but to no avail. I'm sure it's possible, but in the interest of getting things done, I decided to punt and added the status as another OUTPUT argument.

I moved on to getting the code running with the combination of input and output parameters, using the bind_param_inout form of parameter binding to capture the output data.
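
The binding ended up looking something like this (a simplified sketch; it assumes the $dbh handle from the connection sketch above, and the real code loops over parsed lines with logging):

# Bind two inputs and two outputs to the SP call; the "16" values are
# max-length hints that DBI requires for inout parameters.
my $sql = 'exec dbo.usp_Archive_AddRec '
        . '@arg1 = ?, @arg2 = ?, '
        . '@oldarg2 = ? OUTPUT, @status = ? OUTPUT';
my $sth = $dbh->prepare($sql);

my ($oldarg2, $status);
$sth->bind_param(1, $pk);                  # input: unique identifier
$sth->bind_param(2, $item);                # input: record data
$sth->bind_param_inout(3, \$oldarg2, 16);  # output: previous value, if updated
$sth->bind_param_inout(4, \$status,  16);  # output: 0/1/2/-1 status
$sth->execute;
# $status now holds the result code; $oldarg2 holds the prior value on update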

It took a number of iterations to figure out the idiosyncrasies, but I finally had the script running on my sample dataset. At that point, I was able to turn it loose on the unprocessed data files from my original raw source; it was fantastic: The script chewed through 18 months of records in just over an hour, inserting over 1.1 million records. In comparison, the previous script ran through less than 5 months of files in around 10 hours.

Given the overall success, I put the new script into the production automation process to take advantage of the additional performance (even on small datasets, improved efficiency is a good thing on a multiuser database system).

And immediately started getting syntax errors from the automation system.

What? I checked and rechecked, but everything was right. It tested perfectly on my development machine. I checked the versions of the Perl DBI modules: identical. I checked the SQL drivers: while they weren't identical, they were the "latest/greatest" for each platform. Okay, so that was a possible source; unfortunately, I could prove the drivers weren't the problem, because I could use the same driver from a command-line utility and run the code without error. At that point I knew the issue was within Perl itself, and a quick check showed that I was running 5.8 on my dev machine, but 5.10 on the automation host.
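
The version checks themselves are one-liners (standard Perl, nothing environment-specific):

perl -e 'print "$]\n"'                    # Perl version, e.g. 5.008009 vs 5.010000
perl -MDBI -e 'print "$DBI::VERSION\n"'   # DBI module version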

More troubleshooting, and I finally discovered the root cause through a Profiler run. On my dev machine running 5.8, the statement I wrote and the statement Profiler showed the DBMS receiving were identical:
exec dbo.usp_Archive_AddRec (
      @arg1=@p1,
      @arg2=@p2,
      @oldarg2=@p3 OUTPUT,
      @status=@p4 OUTPUT
   );
But Profiler showed that the 5.10 production system was not sending the SQL as I wrote it; instead, it was applying some automatic edits:
exec dbo.usp_Archive_AddRec (
      @arg1=@p1,
      @arg2=@p2,
      @oldarg2=@p3 OUTPUT OUTPUT,
      @status=@p4 OUTPUT OUTPUT
   );
As near as I can tell, when the bind_param_inout DBI method is used in the 5.10 environment, it automatically rewrites the SQL query to append the OUTPUT keyword to the bound argument, whether or not it's already there. I spent a couple of hours searching for a switch or other argument to disable the behavior, but had no luck. I was stuck with it.

My final solution tests for the Perl version (using the built-in special variable $]) and adjusts the SQL statement accordingly; the script now works without modification on every Perl environment in our datacenter.
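
The gist of that adjustment (a sketch; the 5.10 cutoff is an assumption based on the two versions in play here):

# Omit the literal OUTPUT keyword on Perl builds where the driver
# stack appends it automatically (observed here under 5.10).
my $out = ($] >= 5.010) ? '' : 'OUTPUT';
my $sql = sprintf(
    'exec dbo.usp_Archive_AddRec @arg1 = ?, @arg2 = ?, @oldarg2 = ? %s, @status = ? %s',
    $out, $out
);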
