Thu 3 Nov 2005
MySQL makes an obvious and affordable choice for data warehousing. Its traditional weaknesses (from OLTP system point of view) are actually its strengths when it comes to Data warehousing and OLAP.
If you look at the requirements from a DW, you’d see that unlike OLTP systems, transactions are not required, and 60×24x7 uptimes are not really needed. On the contrary the system should be able to handle queries most efficiently and should provide capabilities for high-speed bulk loading in batch jobs. This is where MySQL shines over competition.
MySQL’s non-transactional MyISAM db engine is one of the fastest around when it comes to querying large amounts of data. You don’t get transaction support, and thats alright because you don’t need transactions (and the huge overheads they bring). And its bulk insert features (LOAD DATA INFILE…) are extemely fast as well!
In case you didn’t know it, MyISAM tables can be merged into larger tables for querying. So if your data is split into 4 tables (1 for each quarter of year), you can dfine a merge table on top of them so that your users can directly run SQL on single merge table instead of running 4 queries on 4 tables. Sounds similar to Oracle’s partitioning? Well, it gives you the same benefits free of cost.
MySQL is simple to install and maintain, can be easily setup and optimized for reporting, and is open source licensed under GPL.
A good design for a reporting system based on MySQL is to keep the fact tables as MyISAM (fastest disk based tables) and keep the smaller dimension tables in memory. You’ll be surprised to see how your reports fly. We’ll consider such a design in our comprehensive analytics project this month.
Keep watching…
November 3rd, 2005 at 7:31 pm
Look forward to the comprehensive analytics project, I think it is something that everyone who is into data warehousing will be interested in. I would love to see my reports fly, without help from ceiling fans
November 4th, 2005 at 2:41 am
Thanks RT. I am sure we’ll be able to do away with them silly ceiling fans.
April 12th, 2006 at 1:59 am
Aren’t MyISAM tables more prone to corruption though? InnoDB can do a considerable amount of self-repair, which is useful when the largest (and usually the most valuable) tables crash, while MyISAM would then require you to write a supporting backup/redundancy system to handle crashes.
April 12th, 2006 at 8:51 am
Hi Yishan,
Not really. Since Data Warehouses do not have constant insert/updates ( as required for OLTP systems), MyISAM tables do quite well and shouldn’t get corrupted unless the ETL job fails in some very weird way (server restart, hardware crash, etc.).
Even then, the corrupted portion would be right at the end (Data Warehouse ETL is usually an append process) which makes the repair simpler (you can just throw out all the data from the last load job).
Of course, it’s a good habit to keep checking the table periodically (CHECK TABLE or MYISAMCHK).
July 25th, 2006 at 2:33 pm
Hi Nishith, I really enjoy readingyour posts.
I am interested with the technical detail of the concept. Where can I look for it?