Data Conversion Testing

Most projects involve some kind of data conversion from a legacy system. Data Conversion Testing it is boring as hell and very easy. Here’s how to do it.

Table of Contents

Fundamentally, data conversion or integration testing is no different from any other types of testing. You should have some sort of requirement spec and as normal you should write your tests off that. The main difference between the way data conversion testing is done and the way functional testing is done is that for the former you should (for the most part) use SQL queries, for the latter you should use the GUI. Therefore the testers that will be writing/performing the data conversion testing need to know SQL. You can get by without knowing the syntax for the INSERT, UPDATE etc queries but knowledge of the SELECT query and everything that can be used with it is pretty much essential. A fine memory-jogger is the SQL Pocket Guide from O’Reilly. The tester will also need an even higher-than-normal boredom threshold level than is usually needed for functional testing. Data conversion testing is *really, really* boring.

So, now on to the different levels of Data Conversion Testing. There are basically 2 levels at which the testing is done: technical testing and business “warm-and-fuzzy” testing. The technical testing will verify the conversion against the specs the business testing will give business representatives a warm and fuzzy feeling inside – ie: confidence that when they lay their old system to rest, all their precious data has been copied across correctly into the new system.

Technical Testing

The starting point of technical testing is a Data Conversion Specification. It should detail the various processes that will be used to convert the data.
It should include statements like: “Copy all [specific object type to convert] records from the [legacy system] with the following criteria: (1) only [field1] = 1 or 2 or 3 but not 4, (2)”, and “for every [insert converted object] create a new record in the [something] table with the fields populated with the following data: [field1] = [value1], [field2] = [value2], [field3] = [value3]“.

Like any other testing, you should begin by establishing test traceability. At least one test should be written against each statement like the above ones. By doing this, you can ensure that the tests will cover all data that is to be converted.

Next: how to write the tests. Just like any other testing, there are actions and there are expected results when writing data conversion tests. The action is usually something like “Execute the following query and record the result”, which is followed by the query that is to be executed to test the condition. The Expected Result should contain something like “Zero rows are returned.” You could also include a phrase such as: “Any rows that are returned are [specific object] IDs that are missing the expected [object property]” – defect diagnosis is made quite a bit easier.

So… the queries to be executed. There are many ways that you could write queries to test that a particular record has been correctly converted. I’ll concentrate on the 2. They are the most commonly used, and you get the biggest bang for your buck with them. They are:

Row Counts
Identify Objects with Missing Data

Row Counts
These are easy. In their simplest format, they go something like this:
select count(*) from [a table]

These can be performed on the source and target tables to compare the number of records. If the conversion is a straight conversion, the above query will do (well, for checking numbers of rows anyway). If only records with particular parameters are being converted, the way to check that the number of rows in the target table is correct is to run the following queries:

Run against Source Table

select count(*) from [source table]
where
[field1] = [some condition] AND
[field2] = [some other condition]

Run against Target Table

select count(*) from [target table]

That will give you the number of rows in the subset of the source table that are to be converted and the total number of rows in the target table. The test passes if the numbers match.

Identify Objects with Missing Data
These queries are more complex than row counts, but they provide a fair bit more value. Their purpose is to identify parent objects whose child objects that should have been converted are missing. In the example spec above, we had the following:
“for every [insert converted object] create a new record in the [something] table with the fields populated with the following data: [field1] = [value1], [field2] = [value2], [field3] = [value3]“

To test that each parent object has their child object, write the SQL query in the following stages:

1) Write a query that will pull the foreign key that goes back to the parent table out of the child objects table, eg:

select parent_id from childObjectTable
where
[field1] = [value1] AND
[field2] = [value2] AND
[field3] = [value3]

This will give you the all of the correctly converted child objects.

2) Write a query against the parent table that pulls out all of the parent_id’s that should have a child object in the childObjectTable, eg:

select id from parentTable
where
[fieldX] = [some value]

3) Now combine the above 2 queries so that the new query only displays the parent_id’s that don’t exist in the first query, ie:

select id from parentTable
where
[fieldX] = [some value] and
id not in (
select parent_id from childObjectTable
where
[field1] = [value1] AND
[field2] = [value2] AND
[field3] = [value3]
)

The end result of this is to list the entire parent id’s that are missing a correctly converted/generated child object in the childObjectTable.

The cool thing about this style of the query is that if it comes back with 0 rows in the result set, the test has passed. Little to no paperwork for anyone involved. If rows *were* returned in the result set, you’ve identified the bad rows and can immediately investigate them, or give the list to the data conversion developer for them to sort out. No fuss.

Now, of course, you’ll have more than just the above 2 styles of queries, but you’ll probably find that you can write the vast majority of the tests in that style.

Business Testing

As part of the development process, business representatives should have written a checklist of things that they’d like to check. Rarely are they in more detail than the technical testing. I’ve usually been handed a sheet of paper with 10 bullet points on saying things like: “Show that the account for customer ABC has got the same balance on both systems”. An interesting thing to keep your eye out for is statements like: “Show that we can still run batch job XYZ against converted data”. This is *not* data conversion testing, it is functional testing being done against converted data. Although this needs to be tested, it should be done as part of functional testing, not data conversion testing.
Anyway, the business tests, once identified can be done either by using SQL queries or by running GUI tests. I’d recommend using the GUI. Business is much happier that way and will have a lot more confidence in the new system and its new set of data. So, write out a test in the style of a functional test that includes navigation through the system etc… that will show the set of data in both the legacy and the new system. The business representative will know what they’re looking for in the old system and by following the actions in the test, you can demonstrate that the data they are interested in has been correctly converted.

Also, check:

Data Conversion Testing