Wed 2 Nov 2005
So how is a data warehouse different from your regular database? After all, both are databases, and both have some tables containing data. If you look deeper, you’d find that both have indexes, keys, views, and the regular jing-bang. So is that ‘Data warehouse’ really different from the tables in you application? And if the two aren’t really different, maybe you can just run your queries and reports directly from your application databases!
Well, to be fair, that may be just what you are doing right now, running some EOD (end-of-day) reports as complex SQL queries and shipping them off to those who need them. And this scheme might just be serving you fine right now. Nothing wrong with that if it works for you.
But before you start patting yourself on the back for having avoided a data warehouse altogether, do spend a moment to understand the differences, and to appreciate the pros and cons of either approach.
The primary difference betwen you application database and a data warehouse is that while the former is designed (and optimized) to record , the latter has to be designed (and optimized) to respond to analysis questions that are critical for your business. Application databases are OLTP (On-Line Transaction Processing) systems where every transation has to be recorded, and super-fast at that. Consider the scenario where a bank ATM has disbursed cash to a customer but was unable to record this event in the bank records. If this started happening frequently, the bank wouldn’t stay in business for too long. So the banking system is designed to make sure that every trasaction gets recorded within the time you stand before the ATM machine. This system is write-optimized, and you shouldn’t crib if your analysis query (read operation) takes a lot of time on such a system.
A Data Warehouse (DW) on the other end, is a database (yes, you are right, it’s a database) that is designed for facilitating querying and analysis. Often designed as OLAP (On-Line Analytical Processing) systems, these databases contain read-only data that can be queried and analysed far more efficiently as compared to your regular OLTP application databases. In this sense an OLAP system is designed to be read-optimized.
Separation from your application database also ensures that your business intelligence solution is scalable (your bank and ATMs don’t go down just because the CFO asked for a report), better documented and managed (god help the novice who is given the application database diagrams and asked to locate the needle of data in the proverbial haystack of table proliferation), and can answer questions far more efficietly and frequently.
Creation of a DW leads to a direct increase in quality of analyses as the table structures are simpler (you keep only the needed information in simpler tables), standardized (well documented table structures), and denormalized (to reduce the linkages between tables and the corresponding complexity of queries). A DW drastically reduces the ‘cost-per-analysis’ and thus permits more analysis per FTE. Having a well-designed DW is the foundation successful BI/Analytics initiatives are built upon.
If you are still running your reports off the main application database, answer this simple question: Would the solution still work next year with 20% more customers, 50% more business, 70% more users, and 300% more reports? What about the year after next? If you are sure that your solution will run without any changes, great!! However, if you have already budgeted to buy new state-of-the-art hardware and 25 new Oracle licenses with those partition-options, and the 33 other cool-sounding features, you might consider calling up Oracle and letting them know. There’s a good chance they’d make you their brand ambassador.
It’s probably more sensible and simpler to create a new DW exclusively for your BI needs. And if you are cash strapped, you could easily do that at extremely low costs by using excellent open source databases like MySQL.
Tomorrow we’ll see what makes MySQL an obvious choice for data warehousing and analytics. We’ll also see how a DW is designed differently from your usual application database.
Keep watching…
November 24th, 2005 at 9:59 pm
I think this is great stuff especially for those who have yet not stepped onto the analytics bandwagon and ironically (though more importantly), also for those who have with a lot of fanfare.
November 25th, 2005 at 11:51 pm
*LOL* You are bang on the spot Nikhil about many of those who are doing analytics with a lot of fanfare!
November 14th, 2006 at 5:17 pm
This site is excellant
December 13th, 2006 at 3:53 am
Cool!
Many thanks for the very interesting article!
December 18th, 2006 at 11:31 pm
Thank you.I was actually preparing for a paper presentation and it helped me a lot in understanding the concepts deep hidden.
January 23rd, 2007 at 9:14 am
You delivered a very informative information. And I’m gona forward this to many of my friends across US.
January 23rd, 2007 at 12:52 pm
Thanks Vinodh, Anusha, Valentin and John,
I wrote this article as comin from an IT background I had a tough time figuring out the difference first. And there wasn’t much material on the net that I found useful (unfortunately even today a lot of articles from the industry leaders are less on substance and more on marketing hype). Personally I had to really look around and learn from veterans before I finally understood these simple but critical differences.
Thank you for taking the time for dropping your comments. Its readers like you that make this worthwhile.
February 8th, 2007 at 12:23 am
Wow - one of the most concise articles I’ve come across in 3+ years on this subject; also enables the audience to understand why a data warehouse may be value-added and/or cost beneficial (they’re commonly seen as simply a necessary overhead). Nice job and keep up the great work.
February 16th, 2007 at 2:40 pm
It s interesting for me to hear My Sql as a datawarehouse DB. Because only the word “datawarehouse” seemed to spent lots of money for me.
Thanks…
February 16th, 2007 at 8:48 pm
Hi
This is very useful to me because am preparing for paper presentation. From this i’m going to know the difference between application database and datawarehouses
March 1st, 2007 at 7:31 am
This was very informative article for a person like me, who does not have technology backgroud but keep hearing these terms in all company discussions. Great job Nishith. Thanks!!
April 3rd, 2007 at 3:26 pm
I really appreciate the info captured, it directly hits the query raised.
In my first visit only, the site has moved in my fav.
Thanks
May 8th, 2007 at 9:47 pm
This is good..! one can find it interesting, if they are really in need of such kind of information. Thanks Nishith and every one.
June 3rd, 2007 at 1:34 pm
Thank You Nishit
They seem same but, infact there is huge gulf between them.
I was surfing for the difference between these two technologies and finally I understood from your simple interpretation.
August 27th, 2007 at 12:19 pm
The major difference between data mart and data warehousing is that data mart is not full data i.e. it will be selection of data not completed data where data warehousing is replica of full database.
December 13th, 2007 at 2:36 am
Thanks for such a helpful article.
Where can I find the MySQL related article which you have mentioned?
December 16th, 2007 at 12:40 pm
In his post regarding a Data Mart, Raj brings up an important next question frequently asked after one understands what a data warehouse is.
The only piece I can add is that the “not full” data he mentions in a Data Mart is typically data specific to a department (sales, human resources, etc.) while a Data Warehouse combines data for the entire organization.
December 30th, 2007 at 8:33 pm
HI,
First thanks Nisthit . very informative article.
January 7th, 2008 at 4:46 am
Huge WOW from me, this article helped me finally to understand what I heard on a daily base in the last five years or so. Thank you so much for your informative post.
January 22nd, 2008 at 12:57 am
hey thanks a lot for this article.it helped a lot .what i was tring to understand for such a long time i understood it in 5 minutes. really helpful .
February 10th, 2008 at 10:23 am
Hi,
Thanks for such a informative article.. I being working on DWH migration could not articulate teh difference so beautifully..
It was really helpful..
March 26th, 2008 at 4:45 pm
This was a very good article.
Explained in very simple words which worked wonders.
Thanks for the insight of the OLTP and OLAP differences which really differentiate the analysis feasibility to a database system.
Please could you guide where I would find your next article on MySQL
April 5th, 2008 at 1:48 am
The main difference between DW and DB is the first is designed to answer questions that require to have history of data but the second as mentioned in others’ postings is for OLTP, in other words, with no or very limited history of data.
We do not try to have a DW because the business has grown. The data volume, number of users or reports has nothing to do with having a DW, but the type questions (historical). If we see a huge volume of data in DW compared to a DB within same organization is just because we want to keep the history of data (old data) in the DW for analysis purposes, rather than a DB is designed specifically for keeping only the most up-to-date (or very limited history) data.
In short, having the factor (dimension) of time in our design is the most important reason of having DW which enables us to answer those analytical questions about the past and hence forecasting the future (”what-if scenarios”).
Regards,
Babak
May 4th, 2008 at 8:06 pm
Hi, excellent article. Good to see that it’s still relevant although published 2.5 yrs ago. Thanks a lot. Has cleared up a lot for me. And it’s good to see i wasn’t the only one confused, even embarrassed at not understanding the difference.
May 22nd, 2008 at 4:13 pm
hi all;
This was really informative article, and this reply is to motivate for these kind of articles .
June 4th, 2008 at 7:02 pm
Yes, definitely a clear, concise explanation of the difference.
A point raised by Raj is that the data warehouse is a replica of the application database, this is not strictly true. The DW does not have to have an identical structure by any means, often this is categorically not the case. Many DW implementations collate multiple databases into a single DW. And from this DW, multiple ‘data marts’ - subsets of the DW - are then created for specific users/departments/domains/etc.
David
June 12th, 2008 at 10:57 am
Nishith should be thanked for publishing such an informative article on data warehousing.
I visited this site for the first time and am highly impressed by the information given here on various topics.
June 16th, 2008 at 11:34 pm
Gracias me a servido mucho este articulo
June 21st, 2008 at 4:03 pm
Dear Nishith & all,
I wonder why I didn’t discover this site before.
Very well explained in brief.
I am a novice. Will someone explain:
1. How data is transferred from DB to DW?
2. what are the factors to keep in mind while designing a DW?
3. How the analytical tools are run on the DW?
Ansuman
July 15th, 2008 at 12:27 pm
Good work Nishith. Learnt a lot. Keep adding.
RK
August 10th, 2008 at 6:38 pm
Hi Nishith and all others joining in here,
I recently found this excellent blog as I work on design of a new software product. I have some very basic questions about database design for my project.
I am more of a business guy than the technical expert, so bear with me . . .
We will be performing a variety of analytical reporting on a set of transaction data. Many of the metrics we need to generate can be based on aggregated data in a fact table, as is typical in the usual data mart design using star schema. But, there is also a need to drill down and build reports that expose/list the individual transactions in the original data set - sorted and filtered by time, geography, etc.
The size of the original data set is quite small in the context of typical data warehouse and BI applications. I only expect the number of records in the original data set to be in the range of 10,000 records per month.
So, the question is whether or not to use an aggregation approach in the data mart schema design. Maybe we should create a consolidated database of transactions as part of the database, as well as the fact tables in star schema with some fine level of aggregation grain.
Is this kind of question commonly asked? What do you recommend?
I look forward to your comments on this.
Thanks,
Jim
August 19th, 2008 at 4:38 pm
its a well explained and documnted article like a datawarehouse.
i was looking for exact topic for my exam preparation.
thanx for such a wonderful piece of article
August 21st, 2008 at 12:10 am
What do you think of the data warehouse database from Greenplum?
Here’s the link: http://www.greenplum.com/products/greenplum-database/
August 31st, 2008 at 4:48 pm
Thanks for the article. A very good explanation on the differences between the two.
Well done.
November 2nd, 2008 at 3:28 pm
Just found out about this site today. I am really impressed with the quality and simplicity of the explanation. Nice Work.
November 11th, 2008 at 5:46 pm
nishith can i have your email id plssss