Unit of Work Demystified

The Data Vault is about business key alignment, auditability, enterprise-wide data integration, effectively managing history, and unprecedented adaptability.  But in addition, the Data Vault is also about clarity, usability and simplicity.  The Unit of Work (UOW) remains an elusive concept, and most practitioners agree that it is among the most difficult to understand.

Unit of Work is the label we attach to one of the factors we consider when we are designing our Links in the Data Vault. 

The analysis of the Unit of Work related to the design of our Links will typically impact the decisions we make concerning the number of FKs in a particular Link.  We may, for example, consider having two Links connecting three Hubs or we may have a three-way Link connecting all three. 

The Unit of Work is one factor used to help us represent in the Data Vault our understanding of the relationship between Hubs.

So when you are working with Unit of Work analysis, you are really working with one of the factors for understanding the relationship between business keys (Links).  Just as the Hubs should be based on true Business Keys, all Links should focus on natural business relationships.   

In teaching Data Vault we have been using a set of factors to take into consideration: a correlated set of data, a way to keep grouped things together, consistency between arriving data and the data stored in and around Links, and grouped relationships consistent with enterprise-wide business keys.

When you are working on the analysis and design of Links, and contemplating the Unit of Work, you can consider the business-defined correlations (correlated set of relationships).  For lack of a better way to say it, these are “relationships between relationships.”

Relationship:  Customer is related to Motorcycle. 

Correlated set of Relationships:  Customer is related to Sale, Customer is related to Motorcycle, Customer is related to Employee, Sale is related to Motorcycle, and Sale is related to Employee.

We always have to deal with this situation when building Links – find the natural business relationships.  So here we proceed with a diligent design process and analyze the relationships.  Applying the factor of Unit of Work, we find that the Sale is the “correlating” factor for a set of relationships. 

Unit of Work Relationship Link:  A single Link combining Sale with Employee, Customer and Motorcycle.
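To make the difference concrete, here is a minimal Python sketch comparing pairwise Links with a single Unit of Work Link.  The sale data and all names in it (S1, E1, C1, M1 and so on) are made up for illustration; they are assumptions, not part of any real model.  The point it tries to show is that once the correlating Sale is dropped, rejoining pairwise Links on Customer can produce combinations that never happened in the business.

```python
# Hypothetical sale events: (sale, employee, customer, motorcycle).
# Customer C1 bought two motorcycles in two sales from two employees.
sales = [
    ("S1", "E1", "C1", "M1"),
    ("S2", "E2", "C1", "M2"),
]

# Option A: separate pairwise Links that leave out the Sale.
link_customer_motorcycle = {(c, m) for _, _, c, m in sales}
link_customer_employee = {(c, e) for _, e, c, _ in sales}

# Rejoining the pairwise Links on Customer gives every combination
# of employee and motorcycle for that customer.
rejoined = {
    (e, c, m)
    for (c, e) in link_customer_employee
    for (c2, m) in link_customer_motorcycle
    if c == c2
}

# Option B: one Unit of Work Link that keeps the whole sale together.
uow_link = set(sales)
actual = {(e, c, m) for _, e, c, m in uow_link}

# Combinations implied by the pairwise Links that never occurred.
spurious = rejoined - actual
print(sorted(spurious))
```

In this made-up case the pairwise design suggests that employee E1 sold motorcycle M2 to customer C1, which no sale supports.  The single Link keeps the correlated set intact.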

But what drives our design of Unit of Work?  In the case above it was the actual business process.  Where does the source data feed come in?   Assume for example that a commission processing system sends us a file with all Sales and the Employee on those Sales.  Another source (POS system) sends us a set of Sales with the Customers and Motorcycles.  Would our Unit of Work analysis of these source systems lead to a different design?   So how do we design our Links?

Let’s first step away from the trees and have a look at the forest… From the top of the page: “The Data Vault is about business key alignment, auditability, enterprise-wide data integration, effectively managing history, and unprecedented adaptability.  But in addition, the Data Vault is also about clarity, usability and simplicity.”    

So the guidelines for what should drive the Unit of Work can be found in the answers to these questions:

  • Are you building a single source data warehouse or an enterprise data warehouse?
  • Are you planning to maintain separate source silos or integrate around business keys?
  • Are you designing source driven links or looking for natural business relationships?

Assuming you are working on an EDW (the right side of each of these questions above) then the only remaining issue is related to the question of how to maintain auditability and traceability when the sources do not align.  By asking this question, you have moved the conversation to another fun discussion – the Raw Data Vault.   

So going back to the question raised earlier, the EDW should be built on the enterprise-wide business keys and natural business relationships, including the interpretation of Unit of Work at this level.  Most often the source systems do not deliver data consistent with this view.  In these cases we need to take two steps into the data warehouse, where the first landing point is the Raw layer, which is directly auditable and traceable.
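One possible reading of the two steps, using the commission and POS feeds from the example above, can be sketched in Python.  Everything here is hypothetical: the row layouts, the record source labels, the function name build_uow_link, and the assumption that each Sale has exactly one Employee.  It is a sketch of the idea, not a prescribed loading pattern.

```python
# Step 1 (Raw): each source lands as delivered, tagged with its record source.
raw_sale_employee = [
    {"sale": "S1", "employee": "E1", "record_source": "COMMISSION"},
    {"sale": "S2", "employee": "E2", "record_source": "COMMISSION"},
]
raw_sale_customer_motorcycle = [
    {"sale": "S1", "customer": "C1", "motorcycle": "M1", "record_source": "POS"},
    {"sale": "S2", "customer": "C1", "motorcycle": "M2", "record_source": "POS"},
    {"sale": "S3", "customer": "C2", "motorcycle": "M3", "record_source": "POS"},
]


def build_uow_link(sale_employee_rows, sale_customer_motorcycle_rows):
    """Step 2: correlate the raw rows on Sale into one Unit of Work Link.

    Assumes one Employee per Sale.  Sales missing from either feed are
    reported as pending; their raw rows are left untouched.
    """
    employee_by_sale = {r["sale"]: r["employee"] for r in sale_employee_rows}
    link_rows, pending = [], []
    for row in sale_customer_motorcycle_rows:
        employee = employee_by_sale.get(row["sale"])
        if employee is None:
            pending.append(row["sale"])
            continue
        link_rows.append(
            (row["sale"], employee, row["customer"], row["motorcycle"])
        )
    return link_rows, pending


uow_link, pending_sales = build_uow_link(
    raw_sale_employee, raw_sale_customer_motorcycle
)
print(uow_link)
print(pending_sales)
```

The raw rows are never altered, so each source remains traceable exactly as it arrived, while the combined Link reflects the business view of the Sale.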

This is a quick overview of the Data Vault Core Components

The Data Vault consists of three core components: the Hub, the Link and the Satellite.  While we are all discussing certain exceptions and details around Data Vault deployments, I thought it would be useful to “get back to basics” on the main building blocks.  In the end, committing to these constructs in their purest form should be our main goal.

The Hub represents a Business Key and is established the first time a new instance of that business key is introduced to the EDW.  It may require a multi-part key to ensure an enterprise-wide unique key; however, the cardinality of the Hub must be 1:1 with the true business key.  The Hub contains no descriptive information and contains no FKs.  The Hub consists of the business key only, with a warehouse machine sequence id, a load date/time stamp and a record source.
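As a rough illustration, the Hub described above could be sketched in Python like this.  The class and field names (HubRow, Hub, seq_id, load_dts) are invented for the example, and an in-memory dictionary stands in for the database table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count


@dataclass(frozen=True)
class HubRow:
    seq_id: int            # warehouse machine sequence id
    business_key: str      # the true business key
    load_dts: datetime     # load date/time stamp
    record_source: str     # source that first presented the key


class Hub:
    def __init__(self):
        self._rows = {}          # business key -> HubRow, enforcing 1:1
        self._seq = count(1)

    def load(self, business_key, record_source, load_dts=None):
        """Create a row only the first time a business key is seen."""
        row = self._rows.get(business_key)
        if row is None:
            row = HubRow(
                next(self._seq),
                business_key,
                load_dts or datetime.now(timezone.utc),
                record_source,
            )
            self._rows[business_key] = row
        return row

    def __len__(self):
        return len(self._rows)


hub_customer = Hub()
first = hub_customer.load("C1", "POS")
again = hub_customer.load("C1", "COMMISSION")  # same key, no new row
other = hub_customer.load("C2", "POS")
```

Loading the same key a second time returns the existing row, so the record source stays that of the first arrival.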

A Link represents an association between business keys and is established the first time this new unique association is presented to the EDW.  It can represent an association between several Hubs and other Links.  It maintains a 1:1 relationship with the business-defined association between that set of keys.  Just like the Hub, it contains no descriptive information.  The Link consists only of the sequence ids from the Hubs and Links that it relates, with a warehouse machine sequence id, a load date/time stamp and a record source.
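A Link can be sketched the same way.  Again the names (LinkRow, Link, parent_seq_ids) are hypothetical and the dictionary is only a stand-in for a table; the sequence ids passed in are assumed to come from Hubs or Links loaded beforehand.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count
from typing import Tuple


@dataclass(frozen=True)
class LinkRow:
    seq_id: int                      # warehouse machine sequence id
    parent_seq_ids: Tuple[int, ...]  # sequence ids of the related Hubs/Links
    load_dts: datetime               # load date/time stamp
    record_source: str               # source that first presented the association


class Link:
    def __init__(self, parent_names):
        self.parent_names = tuple(parent_names)
        self._rows = {}              # tuple of parent ids -> LinkRow, enforcing 1:1
        self._seq = count(1)

    def load(self, parent_seq_ids, record_source, load_dts=None):
        """Create a row only the first time an association is seen."""
        key = tuple(parent_seq_ids)
        if len(key) != len(self.parent_names):
            raise ValueError("expected one sequence id per parent")
        row = self._rows.get(key)
        if row is None:
            row = LinkRow(
                next(self._seq),
                key,
                load_dts or datetime.now(timezone.utc),
                record_source,
            )
            self._rows[key] = row
        return row

    def __len__(self):
        return len(self._rows)


# A four-way Link for the Sale unit of work, with made-up sequence ids.
link_sale = Link(["sale", "employee", "customer", "motorcycle"])
first = link_sale.load((1, 1, 1, 1), "POS")
repeat = link_sale.load((1, 1, 1, 1), "COMMISSION")  # same association
second = link_sale.load((2, 2, 1, 2), "POS")
```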

The Satellite contains the descriptive information (context) for a business key.  There can be several Satellites used to describe a single business key (or association of keys); however, a Satellite can describe only one key (a Hub or a Link).  There is a good amount of flexibility afforded the modelers in how they design and build Satellites.  Common approaches include using the subject area, rate of change, source system, or type of data to split out context and design the Satellites.  The Satellite is keyed by the sequence id from the Hub or Link to which it is attached plus the date/time stamp, forming a two-part key.

Note that the Satellite then is the only construct that manages time slice data (data warehouse historical tracking of values over time). 
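The two-part key and the time slicing can be illustrated with a small Python sketch.  The Satellite class, its method names, and the customer attributes are all assumptions made for the example; a real Satellite would be a table, typically with further columns such as a record source.

```python
from datetime import datetime


class Satellite:
    def __init__(self):
        # Two-part key: (parent sequence id, load date/time stamp).
        self._rows = {}

    def load(self, parent_seq_id, load_dts, attributes):
        key = (parent_seq_id, load_dts)
        if key in self._rows:
            raise ValueError("duplicate two-part key")
        self._rows[key] = dict(attributes)

    def history(self, parent_seq_id):
        """All time slices for one parent, oldest first."""
        slices = [
            (dts, attrs)
            for (pid, dts), attrs in self._rows.items()
            if pid == parent_seq_id
        ]
        return sorted(slices, key=lambda pair: pair[0])

    def as_of(self, parent_seq_id, moment):
        """Latest slice loaded at or before the given moment, else None."""
        current = None
        for dts, attrs in self.history(parent_seq_id):
            if dts <= moment:
                current = attrs
        return current


# Hypothetical context for the customer Hub row with sequence id 1.
sat_customer = Satellite()
sat_customer.load(1, datetime(2024, 1, 1), {"name": "Ann", "city": "Denver"})
sat_customer.load(1, datetime(2024, 6, 1), {"name": "Ann", "city": "Boulder"})

in_march = sat_customer.as_of(1, datetime(2024, 3, 1))
in_july = sat_customer.as_of(1, datetime(2024, 7, 1))
before_load = sat_customer.as_of(1, datetime(2023, 1, 1))
```

Each change arrives as a new row under a later date/time stamp, so earlier values stay available for point-in-time questions.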

These three constructs are the building blocks for the DV EDW.  Together they can be used to represent all integrated data from the organization.  The Hubs are the business keys, the Links represent all relationships and the Satellites provide all the context and changes over time.