Data Warehousing


Most Data Warehousing projects fail. As many as 70-80% as per some claims. Still, no one talks about them.

Data Mining, Analytics and BI roll-outs are unlike any other project your organization may have undertaken. Political and non-technical issues can derail the fragile project which is anyway struggling to handle ambiguous and constantly changing requirements.
(more…)

Those of you who have been using MySQL for sometime now would know that the MySQL 5.0 Online Reference Manual is not just a manual but also a repository of user comments exploring and solving common and/or deeper problems. If you are stuck with a particular problem that you find unable to frame a SQL for, the comments on the manual pages would usually have a solution.
(more…)

KETL is an open source ETL tool by Kinetic Networks that is gaining mindshare of late. It is currently downloadable as part of Bizgres BI project, but can be setup for other databases with a little tweaking.

KETL is different from Kettle, another open source ETL tool. You can read more about the similar names here at Nicholas Goodman’s blog. While Kettle is GUI oriented, KETL is scripted and probably more robust.

Read the KETL training doc to know more about its architecture and usage.

Building a successful Data Warehouse as part of a BI roll out is going to test both your tolerance for ambiguity and the resilience of your development methodology. Traditional water-fall model tends to fail as BI requirements change frequently. So if the traditional big-bang waterfall is not likely to work, what does?

Agile development is an approach that “cycles” through the development phases, from gathering requirements to delivering functionality into a working release. (more…)

If you have followed some of the earlier posts, you would remember that a data mart is created as a star schema through a process known as dimensional modeling. In this post we will create a dimensional model for Sales data mart at a hypothetical retailer. (more…)

While we may all differ on the definitions of BI, we do know that it is all about extracting and delivering specific and useful information in the midst of the data-explosion around us. And all the definitions and implementations, in their own ways, are geared towards that objective.

Margaret Dunham, the author of Data Mining: Introductory and Advanced Topics once said:

Data mining research and practice is in a state similar to that of databases in the 1960s.

(more…)

There are far too many conflicting and confusing definitions of Data Mart and Data Warehouse floating around. The long running debate between Ralph Kimball and Bill Inmon, the two Titans of Data Warehousing, only adds to the confusion.

In this post, we’ll try to get some sanity around the concepts, without getting drawn (hopefully) into the crossfire.

(more…)

On Friday we released DecisionStudio Professional – a comprehensive and free desktop BI Platform that gives you all the tools needed for analytics under a single package licensed under GNU Public License (GPL).

DecisionStudio Professional (DSP) is an advanced graphical data mining, reporting, modeling, and analysis environment built on top of the best-of-breed open source projects. Some of these include:
Optimized MySQL database as data warehouse platform
SQL Workbench (MySQL Query Browser and DBDesigner) for Data Analysts
R environment for statistical analysis and modeling
iReport Reporting GUI and JasperReport reporting library
Python with Boa Constructor IDE for application and GUI development

DecisionStudio Professional is the only end-to-end open source analytics platform that provides comprehensive capabilities to each role. Data Analysts get to store, process, and publish data on a standard MySQL platform; Reporting Analysts would like iReport and the integration with Office tools; and Modelers would love the excellent R Environment. It also includes Python along with a drag-n-drop GUI building environment for analytics Application Developers.

You can find out more about DecisionStudio Professional at decisionstudio.com, and can download your copy at Sourceforge.net. Click here to download the product brochure (PDF).

Go ahead, it’s completely free and will always stay so. ;-)

Happy New Year, and back to work.

The Data Warehouse is the foundation of any analytics initiative. You take data from various data sources in the organization, clean and pre-process it to fit business needs, and then load it into the data warehouse for everyone to use. This process is called ETL which stands for ‘Extract, transform, and load’.
(more…)

As I was saying in an earlier post (Database vs, Data Warehouse), your Application database is not your Data Warehouse for the simple reason that your application database was never designed to answer queries. Your app is a jumble of tables interlinked to define the entities in your business and their relationships (usually depicted in an Entity-Relationship Diagram). You just have to look at an Entity-Relationship Diagram to know that its not an easy job trying to get some queries and answers out of it. (You can see an example in the article linked to at the end of this post).

To design a good Data Warehouse that actually answers user queries effectively you need to do what is known as Dimensional Modeling. (more…)

Next Page »