November 2005


Sorry, guys, for having been AWOL all this while. Some hectic travelling, followed by some ill health and a lot of overdue work, has kept me away. Having said that, I still should have posted at least a line to inform the regular visitors. By the way, it might be a good idea to grab the RSS feed for this blog, so that you are informed whenever it is updated.

Now back to work.

A concern some of you have raised about the Consumer Finance data model we discussed (see Open Source Analytics in a month and subsequent posts) is that it appears far too simple to deliver any analytical value to the business. Wouldn’t we need the behavioral, payment, response, clickstream, usage data, blah-this, blah-that, and blah-other in order to deliver any value? Isn’t a three-table (Loan, Customer, and Marketing) demo too simplistic to be of any real use?

This post is really about answering this. Get out of the hype-victim mode and start thinking. If you look closely enough, there is plenty you can do with just this much data. And in this post we explore just that. (more…)
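To make the point concrete, here is a minimal sketch of one question the three tables can already answer: which campaign actually turns contacts into loans. Everything specific in it is an assumption for illustration only: the column names (`customer_id`, `segment`, `amount`, `campaign`), the sample rows, and the use of an in-memory SQLite database in place of the real schema, which this excerpt does not show.

```python
import sqlite3


def build_demo_db():
    """Create a toy in-memory version of the three tables (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, segment TEXT);
        CREATE TABLE loan (loan_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        CREATE TABLE marketing (contact_id INTEGER PRIMARY KEY, customer_id INTEGER, campaign TEXT);

        -- Made-up sample rows, for illustration only.
        INSERT INTO customer VALUES (1, 'salaried'), (2, 'salaried'),
                                    (3, 'self-employed'), (4, 'self-employed');
        INSERT INTO loan VALUES (10, 1, 5000), (11, 3, 12000);
        INSERT INTO marketing VALUES (100, 1, 'mailer'), (101, 2, 'mailer'),
                                     (102, 3, 'phone'), (103, 4, 'mailer');
    """)
    return conn


def conversion_by_campaign(conn):
    """Return {campaign: (contacted, converted, rate)} using only Marketing and Loan."""
    rows = conn.execute("""
        SELECT m.campaign,
               COUNT(DISTINCT m.customer_id) AS contacted,
               COUNT(DISTINCT l.customer_id) AS converted
        FROM marketing m
        LEFT JOIN loan l ON l.customer_id = m.customer_id
        GROUP BY m.campaign
        ORDER BY m.campaign
    """).fetchall()
    return {
        campaign: (contacted, converted, converted / contacted)
        for campaign, contacted, converted in rows
    }


if __name__ == "__main__":
    for campaign, stats in conversion_by_campaign(build_demo_db()).items():
        print(campaign, stats)
```

Joining in the `customer` table would let the same query be split by segment, which is already a usable marketing insight from "too simple" data.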

As I was saying in an earlier post (Database vs. Data Warehouse), your application database is not your Data Warehouse, for the simple reason that your application database was never designed to answer queries. Your app is a jumble of tables interlinked to define the entities in your business and their relationships (usually depicted in an Entity-Relationship Diagram). You just have to look at an Entity-Relationship Diagram to know that it’s not an easy job trying to get queries answered out of it. (You can see an example in the article linked to at the end of this post.)

To design a good Data Warehouse that actually answers user queries effectively you need to do what is known as Dimensional Modeling. (more…)
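As a rough illustration of what a dimensional model looks like, here is a small sketch of a star schema: one fact table of measurements surrounded by dimension tables that describe them. The table and column names (`fact_loan`, `dim_customer`, `dim_date`, and so on) and the sample rows are hypothetical, and SQLite is used only so the example runs anywhere; this is not the design developed in the full post.

```python
import sqlite3


def build_star_schema():
    """Create a toy star schema: one fact table keyed to two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimensions hold the descriptive attributes users filter and group by.
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, segment TEXT);
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

        -- The fact table holds the numeric measure plus keys into each dimension.
        CREATE TABLE fact_loan (
            customer_key INTEGER REFERENCES dim_customer(customer_key),
            date_key     INTEGER REFERENCES dim_date(date_key),
            amount       REAL
        );

        -- Made-up sample rows, for illustration only.
        INSERT INTO dim_customer VALUES (1, 'salaried'), (2, 'self-employed');
        INSERT INTO dim_date VALUES (20051101, 2005, 11), (20051201, 2005, 12);
        INSERT INTO fact_loan VALUES (1, 20051101, 5000),
                                     (2, 20051101, 12000),
                                     (1, 20051201, 3000);
    """)
    return conn


def amount_by_segment_and_month(conn):
    """Total loan amount per customer segment and month."""
    return conn.execute("""
        SELECT c.segment, d.year, d.month, SUM(f.amount)
        FROM fact_loan f
        JOIN dim_customer c ON c.customer_key = f.customer_key
        JOIN dim_date d ON d.date_key = f.date_key
        GROUP BY c.segment, d.year, d.month
        ORDER BY c.segment, d.year, d.month
    """).fetchall()


if __name__ == "__main__":
    for row in amount_by_segment_and_month(build_star_schema()):
        print(row)
```

Every business question then takes the same shape: join the fact table to the dimensions you care about and aggregate. That is much easier to write, and to explain to users, than a walk through an Entity-Relationship Diagram.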

For those who joined the party late, this blog is about doing advanced analytics with free open source tools. This month we are developing a completely free analytical solution for a hypothetical consumer finance company. You can get more details in the Open Source Analytics Category.

Today we shall design a Data Warehouse for the data.

In Consumer Finance Data Model – I, we described what AFS (our fictitious consumer finance organization) does, and the main pieces of information it deals with. To summarize, those pieces of data are reproduced below: (more…)

If you found the course on data-mining a little too long or abstract for your taste, here is your Field Guide to Data Mining.

Data mining offers great promise in helping organizations uncover patterns hidden in their data that can be used to predict the behavior of customers, products and processes. However, to realize the value of data-mining, it has to be done by people who understand the business, the data, and the general nature of the analytical methods involved. Unfortunately, due to the amount of hype involved, a surprisingly large number of people mistake the tools for the craft of data mining. (more…)

Big-bang is dead. And it’s awfully obvious these days that Less can be a competitive advantage.

I have had a tough time explaining this. And it’s a pleasant surprise to see today that the guys at 37signals.com do get it.

Times have changed. All other people’s money gets you these days is into debt. And that’s not a great place to start anything from. You don’t need money for hardware: hardware is cheap. You don’t need money for software: software is free. You don’t need money for marketing: there are a variety of ways to get your message out online to a huge audience for next to free. Money doesn’t buy you time and money doesn’t buy you passion (and passion is something you need a boatload of).

(more…)

Let’s begin with what Acme Financial Services (AFS), our fictitious Consumer Finance company, does. This will tell us the critical pieces of information AFS deals with, and that in turn will give us the insights required for building its data model. (more…)

MySQL makes an obvious and affordable choice for data warehousing. Its traditional weaknesses (from an OLTP system’s point of view) are actually its strengths when it comes to data warehousing and OLAP.

If you look at the requirements of a DW, you’ll see that, unlike OLTP systems, transactions are not required and 24x7 uptime is not really needed. Instead, the system should handle queries as efficiently as possible and should provide high-speed bulk loading in batch jobs. This is where MySQL outshines the competition. (more…)

The craft of analytics has suffered due to the severe hype surrounding commercial data-mining tools. Check the websites of most commercial data-mining software vendors to confirm this for yourself. Buying that million-dollar tool will not help if you do not know the craft. Unfortunately, even the ‘consultants’ of such companies confuse the tool for the art. Often enough, all that these consultants can do is regurgitate the same meaningless crap their marketing guy shafted into their heads.

But the playing field is changing. As in other fields of human endeavour, the tools are getting better and cheaper, thus removing the first barrier to entry: access to professional tools. (more…)

So how is a data warehouse different from your regular database? After all, both are databases, and both have some tables containing data. If you look deeper, you’d find that both have indexes, keys, views, and the regular jing-bang. So is that ‘Data warehouse’ really different from the tables in your application? And if the two aren’t really different, maybe you can just run your queries and reports directly from your application databases!

Well, to be fair, that may be just what you are doing right now, running some EOD (end-of-day) reports as complex SQL queries and shipping them off to those who need them. And this scheme might just be serving you fine right now. Nothing wrong with that if it works for you.

But before you start patting yourself on the back for having avoided a data warehouse altogether, do spend a moment to understand the differences, and to appreciate the pros and cons of either approach. (more…)