Big Data

Always noted as an exception structure, a transactional link is actually not a link. A table construct that contains a business key, context attributes that describe that key, and relationships in the form of foreign keys is simply not in vault form. This type of table construct would be a closer candidate for a third normal form (3NF) designation. Though that too might be a stretch.

A Transactional Link contains the relationship forming foreign keys (FKs) similar to that of a Link but also includes some set of context attributes (typically metrics, measures) and possibly also business keys from related Hubs as well as some form of transaction key (unique identifier for a specific transaction). It is built out of necessity when the volumes of data and/or burst rate is so high that the standard vaulted form constructs cannot manage (Telco CDRs or Financial Services stock trades for example).

While there are good reasons to deploy this form of construct, I think it is also important to recognize the impact of doing so. Why? Because otherwise we may incorporate it into our standard modeling pattern and make the exception a rule. And it would be odd and potentially risky to use an exception structure by default. But this is easy to fall into. Data vault means we have to think differently. So we can say that vaulting is unconventional and in the same way unvaulting is conventional (it is how we are accustomed to doing things).

How much does this matter? That depends on your specific situation. But it could have a big impact on your EDW. If you are vaulting for agility, for a lower TCO, for easy incremental builds, for ease in absorbing changes (new attributes, sources, rules, etc.) then what is the cost of unvaulting?

Unvaulting is the opposite of vaulting. Let me explain. With data vault we breakout the component parts of a core business concept into their own constructs. Whereas a 3NF Entity keeps together the business key, the context attributes, and the relationships (M:1 relationships) in a single table construct, the data vault modeling representation of the same concept includes separate table constructs for each component part.


Vaulting and Unvaulting

There would be a Hub (business key), Satellites (context attribute) and Links (relationships) all working together to represent the same concept. Breaking the concept into component parts is the process of vaulting. See the above diagram. Now if we take the component parts and put them back together into a single table construct, then we are in effect “unvaulting” that concept. Hence a transactional link is more of an unvaulted construct.

To answer whether this is good or bad is not possible without looking at each unique situation. It could be great and it could be not so great. The only thing we can do here is to help understand why this is an exception and also to make ourselves aware of the impact of what we are doing. Now to understand the impact of unvaulting, one of the first questions might be to ask why we were vaulting in the first place.

We vault to absorb changes without impacting the architecture. So, will the attributes defining a transaction be subject to change in the future?

With vaulting we track slowly changing dimensional attributes easily. So, will the data feeding your transactional link be subject to change? Can any of the context of the transaction be updated by subsequent feeds?

Relationships are split out so can be added or changed easily. So, will the foreign keys making up your transactional link be subject to change in the future?

Well if none of these are an issue, then it could very well be that a transactional link exception structure is the best choice for this case. But there is one more question. Is the volume or burst rate a problem that cannot be solved in any other way? Because if it can, then maybe it makes sense to stay closer to your design pattern.

In any event, there is no form of judgment here on the use of transactional links. Merely information to help us categorize them and to help us make informed decisions concerning our modeling patterns.