There are far too many conflicting and confusing definitions of Data Mart and Data Warehouse floating around. The long-running debate between Ralph Kimball and Bill Inmon, the two Titans of Data Warehousing, only adds to the confusion.

In this post, we’ll try to get some sanity around the concepts, without getting drawn (hopefully) into the crossfire.

A Data Mart is a subject-oriented repository of data designed to answer specific questions for a specific set of users. So an organization could have multiple data marts serving the needs of marketing, sales, operations, collections, etc. A data mart is usually organized as a single dimensional model: a star schema made of one fact table and multiple dimension tables, which is also the usual basis for an OLAP cube.
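To make the star schema concrete, here is a minimal sketch of a tiny sales data mart using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_region), the columns and the sample rows are all made up for illustration; a real mart would have many more dimensions and a proper date dimension.

```python
import sqlite3

# In-memory database standing in for a (hypothetical) sales data mart.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);

-- The fact table holds the measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    sale_date  TEXT,
    amount     REAL
);

INSERT INTO dim_product VALUES (1, 'Apple', 'Fruit'), (2, 'Carrot', 'Vegetable');
INSERT INTO dim_region  VALUES (1, 'North'), (2, 'South');
INSERT INTO fact_sales  VALUES
    (1, 1, '2024-01-05', 100.0),
    (1, 2, '2024-01-06',  50.0),
    (2, 1, '2024-01-07',  30.0);
""")


def sales_by_category(conn):
    """Answer one specific question: total sales per product category."""
    rows = conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
    return dict(rows)


print(sales_by_category(conn))  # {'Fruit': 150.0, 'Vegetable': 30.0}
```

Every question the mart answers follows the same pattern: join the fact table to one or more dimensions, then group by a dimension attribute.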

In contrast, a Data Warehouse (DW) is a single organizational repository of enterprise-wide data across many or all subject areas. The Data Warehouse is the authoritative repository of all the fact and dimension data (which is also available in the data marts) at an atomic level.

Unfortunately, this is where consensus begins to break down into chaos. There are two broad schools of thought, led by Kimball and Inmon, that disagree on the details.

Kimball School:
Ralph Kimball began with the Data Mart as a dimensional model for departmental data and viewed the Data Warehouse as the enterprise-wide collection of Data Marts. This is the bottom-up approach. You may begin with the Sales Data Mart, after some time you put in place the Ops Data Mart, and so on and so forth. If you want, you could have even more specific Data Marts serving specific questions like customer churn. If you take care of consistency of metadata (making sure each departmental Data Mart calls an Apple an Apple) and connectivity, you have a Data Warehouse. So the Data Warehouse is really a virtual collection of Data Marts connected together on a Data Warehouse Bus, and in that sense the data flows from multiple Marts into the Warehouse.
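The "calls an Apple an Apple" requirement is what Kimball calls conformed dimensions. Below is a rough sketch of the idea in plain Python, not anything taken from Kimball's own material: two hypothetical marts (sales and ops) store different measures but share the same product dimension, so their numbers can be lined up side by side. All names and figures are invented.

```python
# Hypothetical conformed product dimension shared by both marts:
# product_id 1 means 'Apple' everywhere.
PRODUCT_DIM = {1: "Apple", 2: "Orange"}

# Sales mart facts: (product_id, revenue)
sales_mart = [(1, 120.0), (2, 80.0), (1, 30.0)]

# Ops mart facts: (product_id, units_shipped)
ops_mart = [(1, 10), (2, 7)]


def drill_across(sales, ops, product_dim):
    """Combine measures from two marts through the shared dimension key."""
    report = {
        name: {"revenue": 0.0, "units_shipped": 0}
        for name in product_dim.values()
    }
    for product_id, revenue in sales:
        report[product_dim[product_id]]["revenue"] += revenue
    for product_id, units in ops:
        report[product_dim[product_id]]["units_shipped"] += units
    return report


report = drill_across(sales_mart, ops_mart, PRODUCT_DIM)
print(report["Apple"])  # {'revenue': 150.0, 'units_shipped': 10}
```

If the ops mart had used its own product codes instead, this join would need a mapping table. That is exactly the integration pain that shows up when metadata consistency is neglected.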

Inmon School:
Inmon’s approach is the exact opposite. It avoids the problem of metadata consistency by treating the Enterprise Data Warehouse as a single repository that feeds subject-oriented Data Marts. You still have your Sales, Marketing, Ops and Churn Data Marts containing atomic or aggregated information, but they are based on the Data Warehouse and are really subsets of the data contained therein. This is the top-down approach.
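As an illustration of the top-down flow, the sketch below derives a sales mart from a warehouse table, again using sqlite3. The single warehouse_transactions table and its columns are assumptions made for brevity; a real Inmon-style warehouse would be a normalized model with many tables. The point is only the direction of the flow: the mart is built from the warehouse, never the other way round.

```python
import sqlite3


def build_sales_mart(conn):
    """Derive an aggregated sales mart as a subset of the warehouse data."""
    conn.execute("""
        CREATE TABLE sales_mart AS
        SELECT customer_id, SUM(amount) AS total_sales
        FROM warehouse_transactions
        WHERE subject_area = 'sales'
        GROUP BY customer_id
    """)


conn = sqlite3.connect(":memory:")

# Hypothetical atomic-level warehouse table covering several subject areas.
conn.execute(
    "CREATE TABLE warehouse_transactions "
    "(customer_id INTEGER, subject_area TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_transactions VALUES (?, ?, ?)",
    [
        (1, "sales", 100.0),
        (1, "sales", 25.0),
        (2, "sales", 40.0),
        (1, "collections", 60.0),  # other subject area, excluded from the mart
    ],
)

build_sales_mart(conn)
rows = conn.execute(
    "SELECT customer_id, total_sales FROM sales_mart ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, 125.0), (2, 40.0)]
```

Because every mart is cut from the same source, two marts cannot disagree about what a customer or a transaction is.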

Kimball’s approach is easier to implement because you are dealing with smaller subject areas to begin with, but the end result often has metadata inconsistencies and can be a nightmare to integrate. Inmon’s approach, on the other hand, does not defer the integration and consistency issues, but takes far longer to implement (which makes it easier for the project to fail). Also, in my experience, organizations that are just starting to do analytics usually do not have the patience or commitment required for Inmon’s approach.

Any BI initiative is extremely iterative in nature. Unless you are confident that you will still have the CEO’s buy-in and a budget one year down the line, it might be better to begin with a Data Mart (to start delivering, and to manage expectations), keeping the metadata consistency requirements in mind, and then scale towards the Data Warehouse.

If you are interested in knowing more about the Great Debate, you can read an article by Katherine Drewek called “Data Warehousing: Similarities and Differences of Inmon and Kimball”.

P.S.: This is what I think, and I’ll be glad to hear of your experiences and views, especially if you have been a part of an Inmon-style implementation.