The table is your friend: universal patterns in data validation

Henrik Saterdag
3 min read · Dec 17, 2020
Icons made by Freepik from Flaticon (https://www.flaticon.com/authors/freepik)

When approaching the topic of data validation, it is helpful to get a few things straight before diving into the details. One of those fundamental things is: bring the data you want to validate into the right shape, so it can be validated with minimal effort. And the right shape is always: a table. In many cases this is obvious: checking open purchase orders, account balances, the financial journal — it’s all table-like data.

But the same is true for all the other things you would want to check. It’s all tables: multiple datasets with a common format, where some attributes are key attributes and others are relevant for comparison. Expected state vs. actual state. If you look at it like that, you will be able to validate everything: any data you select from any table, JSON data, string data, table row counts, object consistency information, a tree, a squirrel. Give me something and I’ll make a table out of it.
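To make that concrete, here is a minimal, hypothetical ABAP sketch of how even a non-tabular check (a plain row count) fits the same shape: one key attribute, one comparison attribute, once as expected state and once as actual state. The table name and the numbers are made up.

```abap
" Hypothetical sketch: even a plain row-count check becomes table-shaped data.
TYPES: BEGIN OF ty_count,
         table_name TYPE string, " key attribute
         row_count  TYPE i,      " comparison attribute
       END OF ty_count,
       ty_counts TYPE STANDARD TABLE OF ty_count WITH EMPTY KEY.

" Expected state: defined upfront by whoever knows what should be in the system.
DATA(expected) = VALUE ty_counts( ( table_name = 'MARA' row_count = 1200 ) ).

" Actual state: selected from the system at validation time.
SELECT COUNT(*) FROM mara INTO @DATA(lv_rows).
DATA(actual) = VALUE ty_counts( ( table_name = 'MARA' row_count = lv_rows ) ).
```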

If you achieve this, you can have 1,000 different checks you want to perform, and still the evaluation will always consist of exactly one implementation, a one-size-fits-all functionality. It can only do one thing — but this it will do with brute force at the speed of light: compare two sets of table-like data that represent the actual state and the expected state. Gigabytes of data, if necessary.
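As a sketch of what the signature of such a one-size-fits-all entry point could look like (the interface name and all types are hypothetical, not an existing SAP API):

```abap
" Hypothetical sketch of a single, generic entry point. Concrete checks only
" supply the data; the comparison logic itself lives in exactly one place.
INTERFACE zif_table_validator.
  TYPES:
    ty_field_names TYPE STANDARD TABLE OF abap_compname WITH EMPTY KEY,
    BEGIN OF ty_finding,
      category TYPE c LENGTH 1, " result categories a) to d), see below
      key_info TYPE string,     " rendered key of the affected row
    END OF ty_finding,
    ty_findings TYPE STANDARD TABLE OF ty_finding WITH EMPTY KEY.

  METHODS compare
    IMPORTING
      it_expected        TYPE ANY TABLE      " expected state, table-shaped
      it_actual          TYPE ANY TABLE      " actual state, table-shaped
      it_key_fields      TYPE ty_field_names " which columns identify a row
    RETURNING
      VALUE(rt_findings) TYPE ty_findings.
ENDINTERFACE.
```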

At this point it should also be clear that you will need a proper tool in place that can run this algorithm. A tool that can access data in any format and do what software does best: computing data. And I think if you work with SAP systems it should be a no-brainer that this tool is something that also runs on the SAP system (Excel, you’re out!). Good old ABAP (which has evolved quite a bit in recent years). But we will come to that later.

After chewing through the data, the evaluation will give you the following result:

a) Data where the actual state and the expected state are a perfect match. This is the part of the result that makes everybody happy.

b) Data which exists as expected but differs in its values. Example: the open purchase order for a certain product exists, but it does not contain the right delivery date. In more generic terms: there is a match for the key attributes (material number) but a difference in a non-key attribute (delivery date) between the expected data set and the actual data set.

c) Data which is expected but does not exist in the actual state. Example: the open purchase order for material number 123 does not exist. In more generic terms: there is no match for the key attributes (material number) for an expected data set in the actual data set.

d) Data which is not expected but exists in the actual state. Example: even though no open purchase order should have been created for material 456, it did happen. In more generic terms: there is no match for the key attributes (material number) for an actual data set in the expected data set.

Everything which shows up in a result set other than a) is an error.
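Continuing the hypothetical row-count sketch from above, the whole evaluation boils down to two passes over the data, and every row ends up in exactly one of the four categories:

```abap
" Minimal sketch of the one-size-fits-all evaluation, reusing the ty_count
" shape, the expected table and the actual table from the sketch above.
TYPES: BEGIN OF ty_result,
         category   TYPE c LENGTH 1, " 'A' match, 'B' value diff, 'C' missing, 'D' unexpected
         table_name TYPE string,
       END OF ty_result,
       ty_results TYPE STANDARD TABLE OF ty_result WITH EMPTY KEY.

DATA(results) = VALUE ty_results( ).

" Pass 1: walk the expected set and look up each key in the actual set.
LOOP AT expected INTO DATA(exp_row).
  READ TABLE actual INTO DATA(act_row)
       WITH KEY table_name = exp_row-table_name.
  IF sy-subrc <> 0.
    " c) expected, but missing in the actual state
    APPEND VALUE #( category = 'C' table_name = exp_row-table_name ) TO results.
  ELSEIF act_row-row_count <> exp_row-row_count.
    " b) key matches, but a non-key value differs
    APPEND VALUE #( category = 'B' table_name = exp_row-table_name ) TO results.
  ELSE.
    " a) perfect match
    APPEND VALUE #( category = 'A' table_name = exp_row-table_name ) TO results.
  ENDIF.
ENDLOOP.

" Pass 2: anything in the actual set that has no expected counterpart.
LOOP AT actual INTO act_row.
  IF NOT line_exists( expected[ table_name = act_row-table_name ] ).
    " d) exists, but was never expected
    APPEND VALUE #( category = 'D' table_name = act_row-table_name ) TO results.
  ENDIF.
ENDLOOP.
```

A real implementation would of course work generically on any row type (for example via ABAP's runtime type services) instead of a hard-coded structure; the two-pass logic stays exactly the same.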

Of course, there is one thing I have not mentioned at all yet: how do you get the actual and (sometimes even more challenging) the expected data into a table format? What data do you select, and what steps do you take to come up with a reasonable basis for the comparison?

Well, we will come to that. Spoiler alert: at some point it is inevitable that you bring in your functional knowledge, because not everything is as generic as comparing two tables. But you may be surprised that it’s actually less functional knowledge than you had expected.


Henrik Saterdag

data guy, tech guy, married to ABAP (having an affair with HANA); 2000–2019: working at SAP, since 2020 working at Capgemini