<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>The Hans Blog</title>
	<atom:link href="http://hanshultgren.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://hanshultgren.wordpress.com</link>
	<description>Data Warehousing, DW2.0, EDW, DWBI Data Vault, and Agility</description>
	<lastBuildDate>Tue, 21 Feb 2012 23:54:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='hanshultgren.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>The Hans Blog</title>
		<link>http://hanshultgren.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://hanshultgren.wordpress.com/osd.xml" title="The Hans Blog" />
	<atom:link rel='hub' href='http://hanshultgren.wordpress.com/?pushpress=hub'/>
		<item>
		<title>EDW: All Data is Unstructured</title>
		<link>http://hanshultgren.wordpress.com/2011/10/12/edw-all-data-is-unstructured/</link>
		<comments>http://hanshultgren.wordpress.com/2011/10/12/edw-all-data-is-unstructured/#comments</comments>
		<pubDate>Wed, 12 Oct 2011 15:03:59 +0000</pubDate>
		<dc:creator>hanshultgren</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://hanshultgren.wordpress.com/?p=49</guid>
		<description><![CDATA[EDW: All Data is Unstructured While I was preparing to speak at Bill Inmon’s Advanced Architecture conference the other day I had a couple of epiphanies or “sudden realizations of great truth”… two of them in fact. They both deal with enterprise data warehousing (EDW); meaning that they relate to real data warehousing including the [...]]]></description>
			<content:encoded><![CDATA[<p><strong>EDW: All Data is Unstructured</strong></p>
<p>While I was preparing to speak at Bill Inmon’s Advanced Architecture conference the other day I had a couple of epiphanies or “sudden realizations of great truth”… two of them in fact.</p>
<p>They both deal with enterprise data warehousing (EDW), meaning that they relate to real data warehousing, including the classic characteristics of integrated, non-volatile, time-variant, and so on.  So here is the first one:</p>
<p>1)      <strong>All data is Unstructured to an EDW</strong>.</p>
<p>To be more specific, all of the data that we source into the EDW can be considered unstructured.  For all of you who just said “unstructured data is a misnomer, all data has some structure” – please use semi-structured or multi-structured instead.  For now let’s use the label of <strong><em>n</em>-structured</strong> for the superset of these categories.</p>
<p>The concept of “structure” in the world of data effectively translates to context, consistency and predictability.  A structured source of data has table definitions that contain attributes with field definitions and some representation of context.  In this way we have predictability (we know what to expect), there is consistency (all data for this source arrives in the same structures as defined), and we have some level of associated context (from the simple association of entity keys defined by their attributes to more comprehensive metadata and possibly domain values for validation). </p>
<p>The contemporary concepts of <strong><em>n</em>-structured</strong> data stem from the idea of working with data that somehow does not fit the above description of structured data.  This is to say that this broad category of data falls short somewhere among the concepts of context, consistency and predictability.  To carry this further, this data may not have table definitions with set attributes and field definitions.  We often don’t know what to expect, the data does not arrive consistently, and there is little to no associated context.  Examples include text blobs (contracts, emails, doctor notes, call logs, blogs, social media feeds, etc.), multi-media files (scans, images, videos, sound files, etc.), as well as key-value pair or name-value pair (KVP, NVP) data, XSD-free XML feeds, and other similar types.      </p>
<p>We recognize this type of data exists and that it should also be included in the scope of our data warehouse.  But the assertion here is that <strong>all data is Unstructured to an EDW.</strong></p>
<p>Consider that an EDW, by design, <span style="text-decoration:underline;">a) integrates data from multiple different sources</span>, and also <span style="text-decoration:underline;">b) maintains the history associated with this data</span>.  We know that the source systems do not share the same structures or context.  We also know that source systems will change over time.  So when the sources are contemplated together, and over time, they do not have context, consistency and predictability.  Since there are changes over time, the source data as a whole does not have consistent table definitions with set attributes and field definitions.  We don’t know what to expect over time, data does not arrive consistently, and there is little to no associated enterprise-wide context. </p>
<p>So, <span style="text-decoration:underline;">from several disparate systems, and over time, data is not structured</span>.  Since the EDW integrates data from several disparate systems, and maintains history over time, the EDW sees all data as not structured.  In this regard, <strong>all data is Unstructured to an EDW.</strong></p>
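<p>To make this concrete, here is a minimal sketch in Python.  The source names, years and field names are invented purely for illustration; the only point is that the set of fields we can count on shrinks as soon as we look across sources and across time.</p>

```python
# Hypothetical field layouts: (source system, year) mapped to the fields delivered.
layouts = {
    ("crm", 2009): {"cust_no", "name", "phone"},
    ("crm", 2011): {"customer_id", "full_name", "phone", "email"},
    ("billing", 2009): {"account", "name", "balance"},
    ("billing", 2011): {"account", "customer_id", "balance", "currency"},
}


def stable_fields(selected):
    """Fields present in every one of the selected layouts."""
    return set.intersection(*selected.values())


def only(source, year):
    """Pick a single layout: one source at one point in time."""
    return {(source, year): layouts[(source, year)]}


print(sorted(stable_fields(only("crm", 2011))))  # one source, one moment
print(sorted(stable_fields(layouts)))            # all sources, over time
```

<p>Within a single source at a single point in time every field is predictable; across all four layouts in this made-up example, none is.</p>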
<p><strong>2)      </strong><strong>With an EDW, data integration is impossible.</strong></p>
<p>One of the core concepts of data warehousing is data integration.  To this point nearly everyone in the industry will agree.  Data integration implies that we put data together with like data so that we can support a higher level, central view of the data.  But there is a problem.  To integrate data we need to integrate <em>around</em> something – some form of integration point.  With an EDW, the integration point is some form of central, enterprise-wide concept or key.  This enterprise-wide key should then represent the enterprise view of that concept or key.  In other words, the integration point is not source-system centric, is not department centric but rather is organization-wide centric. </p>
<p>These enterprise-wide keys should then be consistent with the ongoing MDM initiatives, business glossary, and other data centralization initiatives.  The problem is that no such initiatives have been fully completed and adopted in any company.  In fact we can expect that true semantic integration at the organizational level will never happen. </p>
<p>So if we don’t have a defined integration point, we can’t integrate to it.  Which means that <strong>in an EDW, data integration is impossible</strong>. </p>
<p>Then why do we continue to try to integrate data in the EDW?  There are two answers to this.  First, we can do our best to integrate data around keys that have already been defined, and second, we can target an expanded concept of integration, alignment and reconciliation.  This second point implies that we <strong>integrate</strong> where possible, then <strong>align</strong> keys where they remain separated, and at the same time provide for the ability to <strong>reconcile </strong>these differences.</p>
<p>If we put off the data warehouse until all central meaning has been defined and adopted then we will never have a data warehouse.  So by adopting this concept of integrate / align / reconcile we can start now and be part of the process to move towards central context and meaning.</p>
<p>And I believe that this is our charge.  We <em>should</em> be part of the process to move towards an integrated and centralized view of an organization’s data.  At the same time, however, we should recognize that the end goal is not to achieve fully integrated data, but rather data that is integrated to the extent possible, aligned so that we can contrast and understand the differences, and reconciled so that we can meet the needs of both the departments and the enterprise from a trusted and auditable EDW.</p>
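<p>One possible way to picture integrate / align / reconcile is the small Python sketch below.  It is an illustration only, not a prescribed method: the class, the function, the source names and the key values are all assumptions made up for this example.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceKey:
    source: str  # hypothetical source system name
    key: str     # key value as delivered by that source


# Assumption: keys the business has already defined enterprise-wide.
enterprise_keys = {"C-100", "C-200"}

# Assumption: known alignments from a source key to an enterprise key.
alignment = {SourceKey("billing", "ACC-9"): "C-200"}


def place(sk):
    """Return (status, enterprise_key) for one arriving source key."""
    if sk.key in enterprise_keys:
        return ("integrated", sk.key)      # same key, integrate directly
    if sk in alignment:
        return ("aligned", alignment[sk])  # different key, known match
    return ("to reconcile", None)          # kept as-is, difference stays visible


for sk in (SourceKey("crm", "C-100"), SourceKey("billing", "ACC-9"), SourceKey("pos", "X1")):
    print(sk, place(sk))
```

<p>Nothing is rejected in this sketch: a key that cannot yet be integrated or aligned is still loaded and simply flagged, which is what lets us start now rather than wait for every central definition.</p>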
]]></content:encoded>
			<wfw:commentRss>http://hanshultgren.wordpress.com/2011/10/12/edw-all-data-is-unstructured/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1ef977d486ff04cc299c505fa22283e9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hanshultgren</media:title>
		</media:content>
	</item>
		<item>
		<title>Unit of Work Demystified</title>
		<link>http://hanshultgren.wordpress.com/2011/05/04/unit-of-work-demystified/</link>
		<comments>http://hanshultgren.wordpress.com/2011/05/04/unit-of-work-demystified/#comments</comments>
		<pubDate>Wed, 04 May 2011 12:33:55 +0000</pubDate>
		<dc:creator>hanshultgren</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://hanshultgren.wordpress.com/?p=26</guid>
		<description><![CDATA[Unit of Work Demystified The Data Vault is about business key alignment, auditability, enterprise-wide data integration, effectively managing history, and unprecedented adaptability.  But in addition, the Data Vault is also about clarity, usability and simplicity.  The concept of Unit of Work (UOW) continues to be an elusive concept and most can agree that it is [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Unit of Work Demystified </strong></p>
<p>The Data Vault is about business key alignment, auditability, enterprise-wide data integration, effectively managing history, and unprecedented adaptability.  But in addition, the Data Vault is also about clarity, usability and simplicity.  The concept of Unit of Work (UOW) continues to be elusive, and most can agree that it is among the most difficult to understand.</p>
<p>Unit of Work is the label we attach to one of the factors we consider when we are designing our Links in the Data Vault. </p>
<p>The analysis of the Unit of Work related to the design of our Links will typically impact the decisions we make concerning the number of FKs in a particular Link.  We may, for example, consider having two Links connecting three Hubs or we may have a three-way Link connecting all three. </p>
<p><strong><em>The Unit of Work is one factor used to help us represent in the Data Vault our understanding of the relationship between Hubs</em></strong>. </p>
<p>So when you are working with Unit of Work analysis, you are really working with one of the factors for understanding the relationship between business keys (Links).  Just as the Hubs should be based on true Business Keys, all Links should focus on natural business relationships.   </p>
<p>In teaching Data Vault we have been using a set of factors to take into consideration (a correlated set of data, a way to keep grouped things together, establishing consistency between arriving data &amp; data stored in and around Links, and grouped relationships consistent with enterprise wide business keys). </p>
<p>When you are working on the analysis and design of Links, and contemplating the Unit of Work, you can consider the business defined correlations (correlated set of relationships).  For lack of a better way to say it, these are “relationships between relationships.”  </p>
<p><strong>Relationship:</strong>  Customer is related to Motorcycle. </p>
<p><strong>Correlated set of Relationships:</strong>  Customer is related to Sale, Customer is related to Motorcycle, Customer is related to Employee, Sale is related to Motorcycle, and Sale is related to Employee.</p>
<p>We always have to deal with this situation when building Links – find the natural business relationships.  So here we proceed with a diligent design process and analyze the relationships.  Applying the factor of Unit of Work, we find that the Sale is the “correlating” factor for a set of relationships. </p>
<p><strong>Unit of Work Relationship Link:</strong>  A single Link combining Sale with Employee, Customer and Motorcycle. </p>
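<p>As a rough illustration in Python (the key values and names are made up for this sketch), a single Unit of Work Link row for one Sale already carries every two-way relationship in the correlated set listed above:</p>

```python
from itertools import combinations

# One row of a hypothetical Unit of Work Link: the Hub keys involved in one Sale.
uow_row = {"Sale": "S-1001", "Employee": "E-4", "Customer": "C-17", "Motorcycle": "M-88"}

# The correlated set of relationships listed above.
listed = [
    ("Customer", "Sale"),
    ("Customer", "Motorcycle"),
    ("Customer", "Employee"),
    ("Sale", "Motorcycle"),
    ("Sale", "Employee"),
]


def implied_pairs(row):
    """Every two-Hub relationship that can be read off a single Link row."""
    return {frozenset(pair) for pair in combinations(row, 2)}


pairs = implied_pairs(uow_row)
print(all(frozenset(p) in pairs for p in listed))
```

<p>Going the other way is not always possible: pairwise Links that leave out the Sale (such as Customer to Motorcycle) no longer tell us which Sale they came from.</p>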
<p>But what drives our design of Unit of Work?  In the case above it was the actual business process.  Where does the source data feed come in?   Assume for example that a commission processing system sends us a file with all Sales and the Employee on those Sales.  Another source (POS system) sends us a set of Sales with the Customers and Motorcycles.  Would our Unit of Work analysis of these source systems lead to a different design?   So how do we design our Links?</p>
<p>Let’s first step away from the trees and have a look at the forest… From the top of the page: “The Data Vault is about business key alignment, auditability, enterprise-wide data integration, effectively managing history, and unprecedented adaptability.  But in addition, the Data Vault is also about clarity, usability and simplicity.”    </p>
<p>So the guidelines for what should drive the Unit of Work can be found in the answers to these questions:</p>
<ul>
<li>Are you building a single source data warehouse or an enterprise data warehouse?</li>
<li>Are you planning to maintain separate source silos or integrate around business key?</li>
<li>Are you designing source driven links or looking for natural business relationships?</li>
</ul>
<p>Assuming you are working on an EDW (the right side of each of these questions above) then the only remaining issue is related to the question of how to maintain auditability and traceability when the sources do not align.  By asking this question, you have moved the conversation to another fun discussion – the Raw Data Vault.   </p>
<p>So going back to the question raised earlier, the EDW should be built on the enterprise wide business keys and natural business relationships – including the interpretation of Unit of Work at this level.  Frequently (most often) the source systems do not deliver data consistent with this view.  In these cases we need to take two steps into the data warehouse where the first landing point is the Raw and directly auditable/traceable layer.</p>
]]></content:encoded>
			<wfw:commentRss>http://hanshultgren.wordpress.com/2011/05/04/unit-of-work-demystified/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1ef977d486ff04cc299c505fa22283e9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hanshultgren</media:title>
		</media:content>
	</item>
		<item>
		<title>Data Vault Core Components</title>
		<link>http://hanshultgren.wordpress.com/2011/03/25/data-vault-core-components/</link>
		<comments>http://hanshultgren.wordpress.com/2011/03/25/data-vault-core-components/#comments</comments>
		<pubDate>Fri, 25 Mar 2011 14:53:59 +0000</pubDate>
		<dc:creator>hanshultgren</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://hanshultgren.wordpress.com/?p=18</guid>
		<description><![CDATA[This is a quick overview of the Data Vault Core Components The data vault consists of three core components, the Hub, Link and Satellite.  While we are all discussing certain exceptions and details around Data Vault deployments, I thought it would be useful to &#8220;get back to basics&#8221; on the main building blocks.  In the end, [...]]]></description>
			<content:encoded><![CDATA[<p><strong>This is a quick overview of the Data Vault Core Components</strong></p>
<p>The data vault consists of three core components, the Hub, Link and Satellite.  While we are all discussing certain exceptions and details around Data Vault deployments, I thought it would be useful to &#8220;get back to basics&#8221; on the main building blocks.  In the end, committing to these constructs in the most pure form should be our main goal.</p>
<p>The <strong>Hub</strong> represents a Business Key and is established the first time a new instance of that business key is introduced to the EDW.  It may require a multi-part key to assure an enterprise-wide unique key; however, the cardinality of the Hub must be 1:1 with the true business key.  The Hub contains no descriptive information and contains no FKs.  The Hub consists of the business key only, with a warehouse machine sequence id, a load date/time stamp and a record source.</p>
<p><a href="http://hanshultgren.files.wordpress.com/2011/03/hub.jpg"><img class="size-full wp-image-19 aligncenter" title="hub" src="http://hanshultgren.files.wordpress.com/2011/03/hub.jpg?w=600" alt=""   /></a></p>
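<p>A minimal Python sketch of a Hub, assuming an in-memory store and invented names (<code>HubRow</code>, <code>CustomerHub</code>, the key values); it only illustrates the four columns described above and the rule that a business key is established once, the first time it is seen:</p>

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HubRow:
    sequence_id: int    # warehouse machine sequence id
    business_key: str   # the business key itself
    load_dts: datetime  # load date/time stamp
    record_source: str  # where the key was first seen


class CustomerHub:
    """Hypothetical in-memory Hub: one row per distinct business key."""

    def __init__(self):
        self._rows = {}

    def load(self, business_key, load_dts, record_source):
        row = self._rows.get(business_key)
        if row is None:  # only the first arrival creates a row
            row = HubRow(len(self._rows) + 1, business_key, load_dts, record_source)
            self._rows[business_key] = row
        return row

    def __len__(self):
        return len(self._rows)


hub = CustomerHub()
first = hub.load("C-17", datetime(2011, 3, 1), "crm")
again = hub.load("C-17", datetime(2011, 3, 2), "billing")
print(first is again, len(hub))
```

<p>The second load of the same key returns the existing row unchanged, so the Hub stays 1:1 with the business key no matter how many sources deliver it.</p>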
<p>A <strong>Link</strong> represents an association between business keys and is established the first time this new unique association is presented to the EDW.  It can represent an association between several Hubs and other Links.  It maintains a 1:1 relationship with the business defined association between that set of keys.  Just like the Hub, it contains no descriptive information.  The Link consists only of the sequence ids from the Hubs and Links that it relates, together with its own warehouse machine sequence id, a load date/time stamp and a record source.</p>
<p><a href="http://hanshultgren.files.wordpress.com/2011/03/link.jpg"><img class="aligncenter size-full wp-image-20" title="link" src="http://hanshultgren.files.wordpress.com/2011/03/link.jpg?w=600" alt=""   /></a></p>
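<p>The Link can be sketched the same way.  In this hedged Python example the names (<code>LinkRow</code>, <code>load_link</code>) and the id values are assumptions; the sketch shows only that a Link row holds parent sequence ids plus its own id, load date/time stamp and record source, and that each unique association is stored once:</p>

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LinkRow:
    sequence_id: int    # the Link's own warehouse machine sequence id
    parent_ids: tuple   # sequence ids of the Hubs/Links being related
    load_dts: datetime  # load date/time stamp
    record_source: str  # where the association was first seen


def load_link(rows, parent_ids, load_dts, record_source):
    """Add the association to rows (a dict) unless it already exists."""
    key = tuple(parent_ids)
    if key not in rows:
        rows[key] = LinkRow(len(rows) + 1, key, load_dts, record_source)
    return rows[key]


rows = {}
first = load_link(rows, (1, 7), datetime(2011, 3, 1), "pos")
again = load_link(rows, [1, 7], datetime(2011, 3, 5), "commission")
print(first is again, len(rows))
```

<p>Note that this sketch treats the order of the parent ids as significant, so a real design would fix one column order per Link.</p>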
<p>The <strong>Satellite</strong> contains the descriptive information (context) for a business key.  There can be several Satellites used to describe a single business key (or association of keys); however, a Satellite can only describe one key (Hub or Link).  There is a good amount of flexibility afforded the modelers in how they design and build Satellites.  Common approaches include using the subject area, rate of change, source system, or type of data to split out context and design the Satellites.  The Satellite is keyed by the sequence id from the Hub or Link to which it is attached plus the date/time stamp to form a two-part key. </p>
<p>Note that the Satellite then is the only construct that manages time slice data (data warehouse historical tracking of values over time). </p>
<p><a href="http://hanshultgren.files.wordpress.com/2011/03/satellite.jpg"><img class="aligncenter size-full wp-image-21" title="Satellite" src="http://hanshultgren.files.wordpress.com/2011/03/satellite.jpg?w=600" alt=""   /></a></p>
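<p>Because the Satellite is keyed by parent sequence id plus load date/time stamp, reading a time slice amounts to picking the latest row at or before a given moment.  The Python sketch below assumes invented names (<code>SatelliteRow</code>, <code>as_of</code>) and sample values, and is just one simple way to express that lookup:</p>

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SatelliteRow:
    parent_id: int      # sequence id of the one Hub or Link described
    load_dts: datetime  # with parent_id, forms the two-part key
    attributes: dict    # descriptive context valid from load_dts onward


def as_of(rows, parent_id, when):
    """Latest row for parent_id loaded at or before when, or None."""
    candidates = [r for r in rows if r.parent_id == parent_id and when >= r.load_dts]
    return max(candidates, key=lambda r: r.load_dts, default=None)


rows = [
    SatelliteRow(1, datetime(2011, 1, 1), {"city": "Denver"}),
    SatelliteRow(1, datetime(2011, 3, 1), {"city": "Boulder"}),
]
print(as_of(rows, 1, datetime(2011, 2, 1)).attributes)
print(as_of(rows, 1, datetime(2011, 3, 25)).attributes)
```

<p>Earlier rows are never overwritten here; a new row is simply added with a later load date/time stamp, which is what keeps the history intact.</p>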
<p>These three constructs are the building blocks for the DV EDW.  Together they can be used to represent all integrated data from the organization.  The Hubs are the business keys, the Links represent all relationships and the Satellites provide all the context and changes over time.</p>
]]></content:encoded>
			<wfw:commentRss>http://hanshultgren.wordpress.com/2011/03/25/data-vault-core-components/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1ef977d486ff04cc299c505fa22283e9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hanshultgren</media:title>
		</media:content>

		<media:content url="http://hanshultgren.files.wordpress.com/2011/03/hub.jpg" medium="image">
			<media:title type="html">hub</media:title>
		</media:content>

		<media:content url="http://hanshultgren.files.wordpress.com/2011/03/link.jpg" medium="image">
			<media:title type="html">link</media:title>
		</media:content>

		<media:content url="http://hanshultgren.files.wordpress.com/2011/03/satellite.jpg" medium="image">
			<media:title type="html">Satellite</media:title>
		</media:content>
	</item>
		<item>
		<title>Data Vault layers &amp; the Raw Vault</title>
		<link>http://hanshultgren.wordpress.com/2011/02/20/data-vault-layers-the-raw-vault/</link>
		<comments>http://hanshultgren.wordpress.com/2011/02/20/data-vault-layers-the-raw-vault/#comments</comments>
		<pubDate>Sun, 20 Feb 2011 11:49:55 +0000</pubDate>
		<dc:creator>hanshultgren</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://hanshultgren.wordpress.com/?p=15</guid>
		<description><![CDATA[Concerning the various layers that we describe as part of the data warehouse architecture.  Basically, this is MDM for the business terms used in the data warehousing industry.  Ronald Damhof wrote some excellent posts on his blog http://tinyurl.com/6huc2ps and kicked off a much needed discussion on this topic.  The layers that we actually deploy in dv-based [...]]]></description>
			<content:encoded><![CDATA[<p>This post concerns the various layers that we describe as part of the data warehouse architecture.  Basically, this is MDM for the business terms used in the data warehousing industry. </p>
<p>Ronald Damhof wrote some excellent posts on his blog <a href="http://tinyurl.com/6huc2ps">http://tinyurl.com/6huc2ps</a> and kicked off a much needed discussion on this topic.  The layers that we actually deploy in dv-based data warehouses today &#8211; given a common categorization &#8211; probably don&#8217;t vary as much as we think.  In any case, they would probably all fit into a small set of valid variations. </p>
<p>The focus of this discussion has been the Raw Data Vault or Raw Vault.  And in fact the term has been used in connection with different meanings.  As Ronald points out, the Raw Vault from an &#8220;auto-generated&#8221; perspective is of limited value in the overall DV architecture as it is (just as the sources are) on the left side of the semantic gap.  The Data Vault is based on Business Keys (Hubs) and these are defined by the business.  The keys we derive from the sources are not these same keys. </p>
<p>The Raw Vault from an &#8220;auditable, DV-based, Business Key aligned&#8221; perspective represents a layer that moves towards semantic integration but does so only to the point that it can remain traceable back to the sources without the need for soft business rules.  Because the DV based data warehouse is charged with auditability diligence (an integrated mirror of the sources), this layer needs to be persisted before soft business rules are applied.</p>
<p>So the layers that we work with, now going back to Ronald&#8217;s blog, include (1) Staging &#8211; either persistent or not, (2) Data Vault, (3) Staging out/EDW+/CDW+, (4) datamarts.  Let&#8217;s look at each one a bit more closely:</p>
<p>(1) Staging.  Persisted or Not.  This layer is a copy of sources primarily utilized for supporting the process of moving data from various sources to the data warehouse.  This layer is 1:1 with the source systems, typically in the same format as the sources, has no rules applied, and is commonly not persisted.  Alias: System of Record, SoR, and Stage-in. </p>
<p>(2) Data Vault.  The core historized layer, aligned with business keys (to the extent possible); all in-scope data is loaded and persisted, and auditability is maintained.  At the heart of all data warehousing is integration &#8211; and this layer integrates data from multiple sources around the enterprise-wide business keys.  Alias: EDW, CDW, Raw Vault, data warehouse.</p>
<p>(3) Staging out/EDW+/CDW+.  This layer represents the data following the application of the soft business rules that may be required for a) the alignment with the business keys, and b) for common transformations required by the enterprise.  This layer makes the final move to the enterprise-wide business view, including gold-record designations, business driven sub-typing, classifications, categorizations and alignment with reference models.  Alias: EDW, CDW, Business Data Vault (BDV), Business Data Warehouse (BDW), and Mart Stage.</p>
<p>(4) Data Marts.  These represent the presentation layer and are intended to be requirements-driven, scope specific, subsets of the data warehouse data that can be regenerated (so typically not persisted).  With the DV data warehouse approach, these are very flexible and NOT restricted by federated mart paradigms (limited to dimensional models, maintaining conformed dimensions, persistence, etc.).  While this layer tends to be mainly deployed using dimensional modeling, marts can also be flat files, xml, and other structures.  Alias: DM, Marts. </p>
<p>A quick look at the Alias possibilities and we can see that the terms are not universally applied.  However, the delineation of the core layers represents a common understanding.  Deployments that differ from this common view represent exceptions.  In most cases, these exceptions are valid and represent acceptable and encouraged data warehousing practices.  Of course they could also represent an issue that will cause problems in the long run.  But by having a standard understanding of the core layers, we will continue to have a reference point for our analysis and discussions.</p>
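<p>The alias lists above can be put into a small Python lookup to see exactly which terms are overloaded.  The layer names and aliases are taken from the list above (with the abbreviations BDV and BDW entered as separate aliases); the function name is made up for this sketch.</p>

```python
from collections import defaultdict

# Layers and their aliases as listed above.
layers = {
    "Staging": ["System of Record", "SoR", "Stage-in"],
    "Data Vault": ["EDW", "CDW", "Raw Vault", "data warehouse"],
    "Staging out/EDW+/CDW+": [
        "EDW", "CDW", "Business Data Vault", "BDV",
        "Business Data Warehouse", "BDW", "Mart Stage",
    ],
    "Data Marts": ["DM", "Marts"],
}


def ambiguous_aliases(layers):
    """Map each alias used for more than one layer to the layers it names."""
    seen = defaultdict(list)
    for layer, aliases in layers.items():
        for alias in aliases:
            seen[alias].append(layer)
    return {alias: names for alias, names in seen.items() if len(names) > 1}


for alias, names in sorted(ambiguous_aliases(layers).items()):
    print(alias, "is used for:", ", ".join(names))
```

<p>Only EDW and CDW turn up under two layers, which is exactly where most of the naming confusion sits.</p>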
]]></content:encoded>
			<wfw:commentRss>http://hanshultgren.wordpress.com/2011/02/20/data-vault-layers-the-raw-vault/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1ef977d486ff04cc299c505fa22283e9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hanshultgren</media:title>
		</media:content>
	</item>
		<item>
		<title>More on Data Vault and the new EDW</title>
		<link>http://hanshultgren.wordpress.com/2011/01/30/more-on-data-vault-and-the-new-edw/</link>
		<comments>http://hanshultgren.wordpress.com/2011/01/30/more-on-data-vault-and-the-new-edw/#comments</comments>
		<pubDate>Sun, 30 Jan 2011 23:53:17 +0000</pubDate>
		<dc:creator>hanshultgren</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://hanshultgren.wordpress.com/?p=12</guid>
		<description><![CDATA[Data warehousing has evolved.  Data warehousing requirements continue to expand as business needs evolve and new, smarter processes are developed.  Organizations are now generally more mature in their application of business intelligence and data warehousing (DWBI).   This can be attributed to both a higher level of awareness and  the expanding capabilities of the DWBI [...]]]></description>
			<content:encoded><![CDATA[<p>Data warehousing has evolved.  Data warehousing requirements continue to expand as business needs evolve and new, smarter processes are developed.  Organizations are now generally more mature in their application of business intelligence and data warehousing (DWBI).   This can be attributed to both a higher level of awareness and  the expanding capabilities of the DWBI teams within organizations.</p>
<p>For those contemplating true enterprise data warehousing (EDW, as opposed to an ODS or analytical application), the bar has been raised in a number of ways. The new EDW needs to be fully auditable, maintaining a complete and source-accurate history of all data loaded. At the same time, it needs to adapt quickly and efficiently to changes, including new sources and new downstream requirements. This agility is paired with faster throughput, as requirements are increasingly operational (low latency and near real time, or NRT).</p>
<p>As Bill Inmon has clearly defined in his DW2.0 framework, the new EDW also needs to accommodate unstructured data integration and the time relevancy of data.</p>
<p>Today we are seeing an increase in the adoption of Data Vault modeling. This is happening around the world on large EDW projects. Some of the core factors driving this increased rate of adoption are in fact key requirements of the new EDW, and the Data Vault is perfectly suited to address these new and expanding requirements.</p>
<p>Data warehouse agility, for example, means that the EDW must be capable of adapting quickly and efficiently to change: new sources and new source attributes need to be absorbed into the EDW with minimal time and effort. This kind of agility, which the Data Vault handles well through its separation of keys and context, is less a one-off requirement and more a new standard.</p>
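<p>To make the separation of keys and context a little more concrete, here is a minimal sketch in Python rather than in a real database. It borrows the Data Vault terms hub (business keys) and satellite (descriptive context); the class names, the "crm" and "billing" sources, and the sample customer are all hypothetical and chosen only for illustration. The point is that a new source is absorbed by adding a satellite, while the hub and the existing satellite stay untouched, and that context is only ever appended, so history is preserved.</p>

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Hub:
    """Holds business keys only (hypothetical illustration, not a real schema)."""
    name: str
    keys: set = field(default_factory=set)

    def load(self, business_key):
        # Adding a key that already exists is a no-op.
        self.keys.add(business_key)


@dataclass
class Satellite:
    """Holds context for one hub from one source, as append-only history."""
    hub: Hub
    source: str
    rows: list = field(default_factory=list)

    def load(self, business_key, attributes, load_date):
        # Register the key in the hub, then append a new context row.
        # Existing rows are never updated, so earlier states remain available.
        self.hub.load(business_key)
        self.rows.append((business_key, load_date, dict(attributes)))

    def history(self, business_key):
        return [row for row in self.rows if row[0] == business_key]


customer = Hub("customer")

# First source: two loads for the same customer, both kept as history.
crm = Satellite(customer, "crm")
crm.load("C-100", {"name": "Acme"}, datetime(2011, 1, 30))
crm.load("C-100", {"name": "Acme Corp"}, datetime(2011, 2, 20))

# A new source arrives: add a satellite. The hub and the crm satellite
# are not modified in any way.
billing = Satellite(customer, "billing")
billing.load("C-100", {"credit_limit": 5000}, datetime(2011, 2, 21))
```

<p>In this sketch the hub still holds a single key after all three loads, the crm satellite holds both versions of the customer name, and the billing context lives entirely in its own structure.</p>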
]]></content:encoded>
			<wfw:commentRss>http://hanshultgren.wordpress.com/2011/01/30/more-on-data-vault-and-the-new-edw/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1ef977d486ff04cc299c505fa22283e9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hanshultgren</media:title>
		</media:content>
	</item>
		<item>
		<title>A Quick Data Vault Quiz</title>
		<link>http://hanshultgren.wordpress.com/2010/04/14/a-quick-data-vault-quiz/</link>
		<comments>http://hanshultgren.wordpress.com/2010/04/14/a-quick-data-vault-quiz/#comments</comments>
		<pubDate>Wed, 14 Apr 2010 23:44:07 +0000</pubDate>
		<dc:creator>hanshultgren</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://hanshultgren.wordpress.com/?p=4</guid>
		<description><![CDATA[Please take the Data Vault Quiz here: http://tinyurl.com/yypj6rx]]></description>
			<content:encoded><![CDATA[<p>Please take the Data Vault Quiz here:</p>
<p><strong>http://tinyurl.com/yypj6rx</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://hanshultgren.wordpress.com/2010/04/14/a-quick-data-vault-quiz/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1ef977d486ff04cc299c505fa22283e9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hanshultgren</media:title>
		</media:content>
	</item>
	</channel>
</rss>
