Unit of Work Demystified
The Data Vault is about business key alignment, auditability, enterprise-wide data integration, effectively managing history, and unprecedented adaptability. But in addition, the Data Vault is also about clarity, usability and simplicity. The concept of Unit of Work (UOW) continues to be an elusive concept and most can agree that it is among the most difficult to understand.
Unit of Work is the label we attach to one of the factors we consider when we are designing our Links in the Data Vault.
The analysis of the Unit of Work related to the design of our Links will typically impact the decisions we make concerning the number of FKs in a particular Link. We may, for example, consider having two Links connecting three Hubs or we may have a three-way Link connecting all three.
The Unit of Work is one factor used to help us represent in the Data Vault our understanding of the relationship between Hubs.
So when you are working with Unit of Work analysis, you are really working with one of the factors for understanding the relationship between business keys (Links). Just as the Hubs should be based on true Business Keys, all Links should focus on natural business relationships.
In teaching Data Vault we have been using a set of factors to take into consideration (a correlated set of data, a way to keep grouped things together, establishing consistency between arriving data & data stored in and around Links, and grouped relationships consistent with enterprise wide business keys).
When you are working on the analysis and design of Links, and contemplating the Unit of Work, you can consider the business defined correlations (correlated set of relationships). For lack or a better way to say it, these are “relationships between relationships.”
Relationship: Customer is related to Motorcycle.
Correlated set of Relationships. Customer is related to Sale, Customer is related to Motorcycle, Customer is related to Employee, Sale is related to Motorcycle, and Sale is related to Employee.
We always have to deal with this situation when building Links – find the natural business relationships. So here we proceed with a diligent design process and analyze the relationships. Applying the factor of Unit of Work, we find that the Sale is the “correlating” factor for a set of relationships.
Unit of Work Relationship Link. A single Link combining Sale with Employee, Customer and Motorcycle.
But what drives our design of Unit of Work? In the case above it was the actual business process. Where does the source data feed come in? Assume for example that a commission processing system sends us a file with all Sales and the Employee on those Sales. Another source (POS system) sends us a set of Sales with the Customers and Motorcycles. Would our Unit of Work analysis of these source systems lead to a different design? So how do we design our Links?
Let’s first step away from the trees and have a look at the forest… From the top of the page: “The Data Vault is about business key alignment, auditability, enterprise-wide data integration, effectively managing history, and unprecedented adaptability. But in addition, the Data Vault is also about clarity, usability and simplicity.”
So the guidelines for what should drive the Unit of Work can be found in the answers to these questions:
- Are you building a single source data warehouse or an enterprise data warehouse?
- Are you planning to maintain separate source silos or integrate around business key?
- Are you designing source driven links or looking for natural business relationships?
Assuming you are working on an EDW (the right side of each of these questions above) then the only remaining issue is related to the question of how to maintain auditability and traceability when the sources do not align. By asking this question, you have moved the conversation to another fun discussion – the Raw Data Vault.
So going back to the question raised earlier, the EDW should be built on the enterprise wide business keys and natural business relationships – including the interpretation of Unit of Work at this level. Frequently (most often) the source systems do not deliver data consistent with this view. In these cases we need to take two steps into the data warehouse where the first landing point is the Raw and directly auditable/traceable layer.