Unified Decomposition sounds a bit like an oxymoron. And sure enough, combining the idea of unifying with the idea of breaking things into parts does seem innately contradictory. But upon closer inspection this idea makes a good deal of sense – especially for the field of data warehousing.
With an enterprise data warehouse (EDW), we want to break things out into component parts for reasons of flexibility, adaptability, agility, and generally to facilitate the capture of things that are either interpreted in different ways or changing independently of each other. At the same time a core premise of data warehousing is integration and moving to a common standard view of unified concepts. So we want to tie things together at the same time as we are breaking them out into parts.
If you have ever worked with object-oriented design, you are probably accustomed to the idea of encapsulation. The idea of encapsulation is to bring together methods and data into the same object so that everything that deals with that object is contained within it. One of the advantages of this kind of design is the ability to take an object class from one area and place it in another area knowing that everything it needs to exist (keys and descriptive context) and to perform (behaviors) moved along with it. The object is self-contained.
Another way to look at this is to think about a “self-contained underwater breathing apparatus” or “SCUBA” for short. The idea is that everything you need to breathe underwater is contained in the same thing (apparatus). You don’t need hoses to feed you air from a boat above, because the air, the tanks, the hoses, the mask, the regulator, etc. are all contained in the same apparatus.
These concepts both deal with bringing component parts together to form a whole. This is the idea of unifying: we encircle everything we need to define a concept and somehow keep all of the component parts together in this circle.
Breaking into Parts
The other part of unified decomposition is the idea of breaking things into component parts. The decomposition is in some ways the opposite of unifying. If we strive to keep things together, why then would we want to break them apart? One major reason in data warehousing is that things change. In fact things change all of the time. If there is one constant it is that things change. But not everything about a concept changes at the same time.
If the concept parts are all kept together (in the same table for example) then any change to any one component part would have an impact on the whole. If we want to limit the impact of the changes we need to isolate the part that is changing. In data modeling (especially for data warehousing) this theory is being deployed in many different forms. If we are designing a database that needs to integrate data and also needs to maintain history then the benefits of decomposing the core concepts are very compelling. This happens in Dimensional modeling with mini-dimensions and factless facts, it happens in Data Vault with hubs, links and satellites, and it also happens with other approaches such as Anchor Modeling, 2G and Focal Point. The common theme is data warehousing and the common thread is decomposition.
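To make the isolation of change concrete, here is a minimal sketch in Python. It assumes a hypothetical customer concept split into one hub and two satellites; the class names, column names and sample values are illustrative only and do not come from any particular modeling standard or tool.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical decomposition of a "customer" concept (names are assumptions).

@dataclass(frozen=True)
class HubCustomer:
    customer_key: str          # the common key shared by all parts


@dataclass(frozen=True)
class SatCustomerName:
    customer_key: str
    load_date: date
    name: str


@dataclass(frozen=True)
class SatCustomerAddress:
    customer_key: str
    load_date: date
    city: str


# Each "table" is just a list of rows in this sketch.
hub = [HubCustomer("C001")]
sat_name = [SatCustomerName("C001", date(2012, 1, 1), "Acme Ltd")]
sat_address = [SatCustomerAddress("C001", date(2012, 1, 1), "Denver")]


def record_address_change(rows, customer_key, load_date, city):
    """Append a new address version; no other table is touched."""
    rows.append(SatCustomerAddress(customer_key, load_date, city))


# The address changes, so only the address satellite gains a row.
record_address_change(sat_address, "C001", date(2012, 6, 1), "Boulder")
```

After the change, the hub and the name satellite are exactly as they were; only the part that changed carries the new history.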
Putting it all Together
If all we did was to break things apart then we would be missing half the story. A modem translates a digital signal into an analog signal (modulation), but that is not of much use unless the analog signal is translated back to digital at the other end (demodulation).
Taking a core concept which is represented as an entity (physically a table) and breaking it into component parts (lower level tables, held together by a common key) is the “mo” (modulator) part of the modem. While this is great for data warehousing agility, it does not do much for the business users. People who are considering some form of table decomposition variation (Data Vault, Anchor, 2G, Focal Point) are often stumped when they think about how to get their business intelligence team to access this data for their reporting and analytics. Really the answer is simple. Don’t access it. We need the “dem” (demodulator) part of the modem – we need to first translate it back to digital – or in this case move it back to a combined form (entity). And this combined form is typically a Dimension in a Data Mart.
Think of it this way. In your data warehouse architecture, you take a concept which is represented as a combined form (entity) and you break it into parts (hub, link, satellite). You then put it back together into a combined form (dimension) and deliver it to your downstream users.
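The round trip described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed loading pattern: the table names, column names and the rule of taking the most recent satellite row per key are all hypothetical choices made for the example.

```python
from datetime import date

# Hypothetical decomposed tables, held together by customer_key.
hub_customer = [{"customer_key": "C001"}, {"customer_key": "C002"}]
sat_name = [
    {"customer_key": "C001", "load_date": date(2012, 1, 1), "name": "Acme Ltd"},
    {"customer_key": "C002", "load_date": date(2012, 1, 1), "name": "Globex"},
]
sat_address = [
    {"customer_key": "C001", "load_date": date(2012, 1, 1), "city": "Denver"},
    {"customer_key": "C001", "load_date": date(2012, 6, 1), "city": "Boulder"},
]

# Columns that belong to the satellite itself, not to the dimension row.
TECHNICAL_COLUMNS = ("customer_key", "load_date")


def latest(rows, key):
    """Return the most recent satellite row for a key, or {} if there is none."""
    matches = [r for r in rows if r["customer_key"] == key]
    return max(matches, key=lambda r: r["load_date"]) if matches else {}


def build_dimension(hub, satellites):
    """Recombine the hub and its satellites into one row per key."""
    dimension = []
    for hub_row in hub:
        key = hub_row["customer_key"]
        row = {"customer_key": key}
        for sat in satellites:
            current = latest(sat, key)
            row.update(
                {k: v for k, v in current.items() if k not in TECHNICAL_COLUMNS}
            )
        dimension.append(row)
    return dimension


dim_customer = build_dimension(hub_customer, [sat_name, sat_address])
```

The business intelligence team would then query something shaped like `dim_customer`, never the hub and satellites directly. A customer with no address row simply has no `city` attribute in this sketch; a real data mart load would decide how to fill such gaps.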
Because we need to put it back together before we deliver to the data marts, the unifying factor is a critical feature of unified decomposition. That is to say, the modeling pattern that we deploy must somehow address the unifying at the same time as it is breaking things into parts. With Data Vault modeling, the decomposition is the breaking out into Hubs, Links and Satellites, and the unifying is accomplished through the direct connection between the Hub and the surrounding Satellites and Links.
© Copyright Hans Hultgren, 2012. All rights reserved. Unified Decomposition™