Wed 2 Nov 2005
So how is a data warehouse different from your regular database? After all, both are databases, and both have some tables containing data. If you look deeper, you’d find that both have indexes, keys, views, and the regular jing-bang. So is that ‘Data warehouse’ really different from the tables in you application? And if the two aren’t really different, maybe you can just run your queries and reports directly from your application databases!
Well, to be fair, that may be just what you are doing right now, running some EOD (end-of-day) reports as complex SQL queries and shipping them off to those who need them. And this scheme might just be serving you fine right now. Nothing wrong with that if it works for you.
But before you start patting yourself on the back for having avoided a data warehouse altogether, do spend a moment to understand the differences, and to appreciate the pros and cons of either approach.
The primary difference betwen you application database and a data warehouse is that while the former is designed (and optimized) to record , the latter has to be designed (and optimized) to respond to analysis questions that are critical for your business. Application databases are OLTP (On-Line Transaction Processing) systems where every transation has to be recorded, and super-fast at that. Consider the scenario where a bank ATM has disbursed cash to a customer but was unable to record this event in the bank records. If this started happening frequently, the bank wouldn’t stay in business for too long. So the banking system is designed to make sure that every trasaction gets recorded within the time you stand before the ATM machine. This system is write-optimized, and you shouldn’t crib if your analysis query (read operation) takes a lot of time on such a system.
A Data Warehouse (DW) on the other end, is a database (yes, you are right, it’s a database) that is designed for facilitating querying and analysis. Often designed as OLAP (On-Line Analytical Processing) systems, these databases contain read-only data that can be queried and analysed far more efficiently as compared to your regular OLTP application databases. In this sense an OLAP system is designed to be read-optimized.
Separation from your application database also ensures that your business intelligence solution is scalable (your bank and ATMs don’t go down just because the CFO asked for a report), better documented and managed (god help the novice who is given the application database diagrams and asked to locate the needle of data in the proverbial haystack of table proliferation), and can answer questions far more efficietly and frequently.
Creation of a DW leads to a direct increase in quality of analyses as the table structures are simpler (you keep only the needed information in simpler tables), standardized (well documented table structures), and denormalized (to reduce the linkages between tables and the corresponding complexity of queries). A DW drastically reduces the ‘cost-per-analysis’ and thus permits more analysis per FTE. Having a well-designed DW is the foundation successful BI/Analytics initiatives are built upon.
If you are still running your reports off the main application database, answer this simple question: Would the solution still work next year with 20% more customers, 50% more business, 70% more users, and 300% more reports? What about the year after next? If you are sure that your solution will run without any changes, great!! However, if you have already budgeted to buy new state-of-the-art hardware and 25 new Oracle licenses with those partition-options, and the 33 other cool-sounding features, you might consider calling up Oracle and letting them know. There’s a good chance they’d make you their brand ambassador.
It’s probably more sensible and simpler to create a new DW exclusively for your BI needs. And if you are cash strapped, you could easily do that at extremely low costs by using excellent open source databases like MySQL.
Tomorrow we’ll see what makes MySQL an obvious choice for data warehousing and analytics. We’ll also see how a DW is designed differently from your usual application database.
Keep watching…
November 24th, 2005 at 9:59 pm
I think this is great stuff especially for those who have yet not stepped onto the analytics bandwagon and ironically (though more importantly), also for those who have with a lot of fanfare.
November 25th, 2005 at 11:51 pm
*LOL* You are bang on the spot Nikhil about many of those who are doing analytics with a lot of fanfare!
November 14th, 2006 at 5:17 pm
This site is excellant
December 13th, 2006 at 3:53 am
Cool!
Many thanks for the very interesting article!
December 18th, 2006 at 11:31 pm
Thank you.I was actually preparing for a paper presentation and it helped me a lot in understanding the concepts deep hidden.
January 23rd, 2007 at 9:14 am
You delivered a very informative information. And I’m gona forward this to many of my friends across US.
January 23rd, 2007 at 12:52 pm
Thanks Vinodh, Anusha, Valentin and John,
I wrote this article as comin from an IT background I had a tough time figuring out the difference first. And there wasn’t much material on the net that I found useful (unfortunately even today a lot of articles from the industry leaders are less on substance and more on marketing hype). Personally I had to really look around and learn from veterans before I finally understood these simple but critical differences.
Thank you for taking the time for dropping your comments. Its readers like you that make this worthwhile.
February 8th, 2007 at 12:23 am
Wow – one of the most concise articles I’ve come across in 3+ years on this subject; also enables the audience to understand why a data warehouse may be value-added and/or cost beneficial (they’re commonly seen as simply a necessary overhead). Nice job and keep up the great work.
February 16th, 2007 at 2:40 pm
It s interesting for me to hear My Sql as a datawarehouse DB. Because only the word “datawarehouse” seemed to spent lots of money for me.
Thanks…
February 16th, 2007 at 8:48 pm
Hi
This is very useful to me because am preparing for paper presentation. From this i’m going to know the difference between application database and datawarehouses
March 1st, 2007 at 7:31 am
This was very informative article for a person like me, who does not have technology backgroud but keep hearing these terms in all company discussions. Great job Nishith. Thanks!!
April 3rd, 2007 at 3:26 pm
I really appreciate the info captured, it directly hits the query raised.
In my first visit only, the site has moved in my fav.
Thanks
May 8th, 2007 at 9:47 pm
This is good..! one can find it interesting, if they are really in need of such kind of information. Thanks Nishith and every one.
June 3rd, 2007 at 1:34 pm
Thank You Nishit
They seem same but, infact there is huge gulf between them.
I was surfing for the difference between these two technologies and finally I understood from your simple interpretation.
August 27th, 2007 at 12:19 pm
The major difference between data mart and data warehousing is that data mart is not full data i.e. it will be selection of data not completed data where data warehousing is replica of full database.
December 13th, 2007 at 2:36 am
Thanks for such a helpful article.
Where can I find the MySQL related article which you have mentioned?
December 16th, 2007 at 12:40 pm
In his post regarding a Data Mart, Raj brings up an important next question frequently asked after one understands what a data warehouse is.
The only piece I can add is that the “not full” data he mentions in a Data Mart is typically data specific to a department (sales, human resources, etc.) while a Data Warehouse combines data for the entire organization.
December 30th, 2007 at 8:33 pm
HI,
First thanks Nisthit . very informative article.
January 7th, 2008 at 4:46 am
Huge WOW from me, this article helped me finally to understand what I heard on a daily base in the last five years or so. Thank you so much for your informative post.
January 22nd, 2008 at 12:57 am
hey thanks a lot for this article.it helped a lot .what i was tring to understand for such a long time i understood it in 5 minutes. really helpful .
February 10th, 2008 at 10:23 am
Hi,
Thanks for such a informative article.. I being working on DWH migration could not articulate teh difference so beautifully..
It was really helpful..
March 26th, 2008 at 4:45 pm
This was a very good article.
Explained in very simple words which worked wonders.
Thanks for the insight of the OLTP and OLAP differences which really differentiate the analysis feasibility to a database system.
Please could you guide where I would find your next article on MySQL
April 5th, 2008 at 1:48 am
The main difference between DW and DB is the first is designed to answer questions that require to have history of data but the second as mentioned in others’ postings is for OLTP, in other words, with no or very limited history of data.
We do not try to have a DW because the business has grown. The data volume, number of users or reports has nothing to do with having a DW, but the type questions (historical). If we see a huge volume of data in DW compared to a DB within same organization is just because we want to keep the history of data (old data) in the DW for analysis purposes, rather than a DB is designed specifically for keeping only the most up-to-date (or very limited history) data.
In short, having the factor (dimension) of time in our design is the most important reason of having DW which enables us to answer those analytical questions about the past and hence forecasting the future (”what-if scenarios”).
Regards,
Babak
May 4th, 2008 at 8:06 pm
Hi, excellent article. Good to see that it’s still relevant although published 2.5 yrs ago. Thanks a lot. Has cleared up a lot for me. And it’s good to see i wasn’t the only one confused, even embarrassed at not understanding the difference.
May 22nd, 2008 at 4:13 pm
hi all;
This was really informative article, and this reply is to motivate for these kind of articles .
June 4th, 2008 at 7:02 pm
Yes, definitely a clear, concise explanation of the difference.
A point raised by Raj is that the data warehouse is a replica of the application database, this is not strictly true. The DW does not have to have an identical structure by any means, often this is categorically not the case. Many DW implementations collate multiple databases into a single DW. And from this DW, multiple ‘data marts’ – subsets of the DW – are then created for specific users/departments/domains/etc.
David
June 12th, 2008 at 10:57 am
Nishith should be thanked for publishing such an informative article on data warehousing.
I visited this site for the first time and am highly impressed by the information given here on various topics.
June 16th, 2008 at 11:34 pm
Gracias me a servido mucho este articulo
June 21st, 2008 at 4:03 pm
Dear Nishith & all,
I wonder why I didn’t discover this site before.
Very well explained in brief.
I am a novice. Will someone explain:
1. How data is transferred from DB to DW?
2. what are the factors to keep in mind while designing a DW?
3. How the analytical tools are run on the DW?
Ansuman
July 15th, 2008 at 12:26 pm
Good work Nishit. Learnt a lot. Keep adding.
RK
July 15th, 2008 at 12:27 pm
Good work Nishith. Learnt a lot. Keep adding.
RK
August 10th, 2008 at 6:38 pm
Hi Nishith and all others joining in here,
I recently found this excellent blog as I work on design of a new software product. I have some very basic questions about database design for my project.
I am more of a business guy than the technical expert, so bear with me . . .
We will be performing a variety of analytical reporting on a set of transaction data. Many of the metrics we need to generate can be based on aggregated data in a fact table, as is typical in the usual data mart design using star schema. But, there is also a need to drill down and build reports that expose/list the individual transactions in the original data set – sorted and filtered by time, geography, etc.
The size of the original data set is quite small in the context of typical data warehouse and BI applications. I only expect the number of records in the original data set to be in the range of 10,000 records per month.
So, the question is whether or not to use an aggregation approach in the data mart schema design. Maybe we should create a consolidated database of transactions as part of the database, as well as the fact tables in star schema with some fine level of aggregation grain.
Is this kind of question commonly asked? What do you recommend?
I look forward to your comments on this.
Thanks,
Jim
August 19th, 2008 at 4:38 pm
its a well explained and documnted article like a datawarehouse.
i was looking for exact topic for my exam preparation.
thanx for such a wonderful piece of article
August 21st, 2008 at 12:10 am
What do you think of the data warehouse database from Greenplum?
Here’s the link: http://www.greenplum.com/products/greenplum-database/
August 31st, 2008 at 4:48 pm
Thanks for the article. A very good explanation on the differences between the two.
Well done.
November 2nd, 2008 at 3:28 pm
Just found out about this site today. I am really impressed with the quality and simplicity of the explanation. Nice Work.
November 11th, 2008 at 5:46 pm
nishith can i have your email id plssss
December 24th, 2008 at 3:48 am
this is very informative especially for database/dw hacks such as myself.
-mc
December 24th, 2008 at 3:46 pm
Awesome information.
Glad you took the time to delve into the differences between the dw & db.
January 16th, 2009 at 12:32 am
Thanks for such a wonderful explanation.
January 24th, 2009 at 1:32 am
This was execellent. My professor in my graduate class has 30 years of experience with databases. He didn’t know anything about Data Warehouse’s. That’s odd. This page is great.
February 16th, 2009 at 4:07 am
Great!!!! Thanks for your concise and Simple explanation to understand the difference btn. the two.
February 16th, 2009 at 7:50 pm
“20% more customers, 50% more business, 70% more users, and 300% more reports”.
Could you please explain how you arrived on the above mentioned numbers and statistics.
March 25th, 2009 at 7:16 pm
Ingres is also Opensource
Solid State Disks will shortly become the norm.
DW is marketing blurb, just write some SQL and you can get a report.
March 26th, 2009 at 12:01 pm
Thanks for such a wonderful explanation.
June 18th, 2009 at 3:54 am
you helped me getting 98 out of 100 in my presentation on Datawarehouse, i stood first in my class presentation
September 16th, 2009 at 3:35 pm
thanks a lot
October 3rd, 2009 at 7:23 pm
the explanation was really usefull..thanks a lot man..
October 25th, 2009 at 5:26 pm
Realy informative artcile ..
I got all the terms i needed for a concise explaination in interview .Thnks
December 18th, 2009 at 8:42 pm
absolutely great content this is…….we can compare database and data warehouse and we can draw differences and similarities between them….
March 12th, 2010 at 12:34 pm
Nice to know about the Data ware house and database.. If there will be simple example then it would be nice..
advance thanks
May 1st, 2010 at 1:28 am
wonderful.
May 19th, 2010 at 3:05 pm
Thanks , Nitesh
July 6th, 2010 at 4:50 pm
This is a simple yet superb explanation of Database and DWH.
July 11th, 2010 at 7:57 pm
Great article. Thanks
August 30th, 2010 at 2:02 pm
It’s good and know more about data base and data ware house
August 30th, 2010 at 2:04 pm
It’s good and know more about data base and data ware house and thank’s for my presenttation
June 8th, 2011 at 11:48 am
thanks nitesh. it’s very good
June 24th, 2011 at 1:00 pm
nice article
July 22nd, 2011 at 4:12 pm
Thanks a lot Nishith.. Its really helpful
July 28th, 2011 at 8:21 pm
thanks! appreciate the great summary, esp for newbies.
September 19th, 2011 at 9:41 pm
Interesting article, but it left out one other compelling reason for a separate data warehouse repository for analytics: if you have a need to report based on aggregated information housed in disparate databases/applications.
September 21st, 2011 at 4:27 am
Data warehouses and databases do only have a small semantic difference when using relational products such as MySQL and Oracle. There truly are no differences in technology when implementing a DW or database. You optimize for reads typically using a star or snowflake schema in a DW, and a database uses an OLTP structure using 3NF schema which is optimized for writes and minimally optimized for reads. This is true even applying to column store databases such as Infobright, Vertica, Vectorwise and LucidDB which offer significant read performances on star/snowflake schemas.
However DWs take on a whole new meaning when you consider proprietary technologies like Netezza, Aster Data which are a combination of hardware and software. These are not databases. These are appliances that don’t even operate on a relational level.
Taking it further with open source, you have very scalable technologies like Hadoop with Hive and HBase which provide interfaces allowing for relational-like queries, but operate on the principles of MapReduce which are massively scalable on commodity hardware.
Data warehouses do in fact have significant and measurable differences when using the appropriate technologies. MIS/IT departments building DWs on MySQL or Oracle quickly find themselves in search of alternative technologies simply because while they can serve the purpose of a DW, they simply weren’t designed for it. I’d highly recommend doing some reading on MapReduce and Hadoop and you can see some very real-world examples where terabyte sized DWs are not only functional, but solving very real problems of Big Data, performance and reliability.
October 5th, 2011 at 2:24 am
Interesting article, it makes understanding data base and data ware house very easy.
October 9th, 2011 at 9:56 pm
Hey, i have 1 question.
Can a company have more than 1 datawarehouse ?
October 16th, 2011 at 6:48 am
Hi Nishith & Patrick (comment #63),
Wonderful wealth of knowledge you have given to me! Much appreciated and bookmarked.
Cheers
Alex
February 8th, 2012 at 10:11 am
This really helped me and i haven’t got information about the warehouse this much important.
Thank man…..
February 9th, 2012 at 10:37 pm
Great article Nishith. The information is exactly to the point and very helpful. Keep up the nice work.
Cheers,
Jyothsna
February 13th, 2012 at 10:40 am
Great article. I am a recruiter yet found this article interesting & easily understandable with the ATM example. Looking forward to see many more similar articles.
Regards,
Srividhya.
February 24th, 2012 at 11:16 am
good article on DW and let us know more on open source techologies on DW.
April 16th, 2012 at 2:25 pm
thanx alot for uploading such a nice article…..helpful with awesome examples…….!!KEEP IT UP!! Students do need people like u
April 30th, 2012 at 10:22 pm
this is an awesome article. I cant wait to continue reading to find out more. Am designing inventory system for my company and I believe this will give me the right tools to make the right decisions to update the system.