“I am a bootstrapper. I have initiative and insight and guts, but not much money. I will succeed because my efforts
and my focus will defeat bigger and better-funded competitors. I am fearless. I keep my focus on growing the business—
not on politics, career advancement, or other wasteful distractions.” — Seth Godin’s The Bootstrapper’s Bible.


Decision trees are one of the most widely used and practical forms of machine learning and data mining. They have been extensively researched and applied to a large variety of data mining problems. (Decision trees are also known as Classification Trees or Regression Trees, depending on whether the value being predicted is categorical or real-valued.)
Decision Tree: Forecasting whether Golf will be played based on the Weather condition
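To make the idea concrete, here is a minimal sketch of how such a tree can be stored and walked in Python. The attributes and splits (outlook, humidity, windy) are assumptions for illustration, loosely following the classic textbook weather example; they are not read off the figure in this post.

```python
# A hand-built classification tree for the golf example.
# NOTE: the attributes and splits are hypothetical, chosen for illustration.
# Internal nodes are dicts; leaves are the predicted class ("yes" / "no").
GOLF_TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rainy": {
            "attribute": "windy",
            "branches": {True: "no", False: "yes"},
        },
    },
}


def predict(tree, sample):
    """Walk from the root to a leaf, following the branch that matches
    the sample's value for each node's attribute."""
    node = tree
    while isinstance(node, dict):
        value = sample[node["attribute"]]
        node = node["branches"][value]
    return node


today = {"outlook": "sunny", "humidity": "high", "windy": False}
print(predict(GOLF_TREE, today))
```

In practice the tree is learned from historical data rather than written by hand, but the prediction step is the same walk from root to leaf.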

In Europe they call it Operations Research (OR). OR is the discipline of applying advanced analytical methods to help make better decisions.

By using techniques such as mathematical modeling to analyze complex situations, operations research gives executives the power to make more effective decisions and build more productive systems based on:

* More complete data
* Consideration of all available options
* Careful predictions of outcomes and estimates of risk
* The latest decision tools and techniques

Read The Executive Guide to OR Research by the Institute for Operations Research and the Management Sciences.

Sorry guys to have been AWOL all this while. Some hectic travelling, followed by some ill health and a lot of overdue work, has been keeping me away. Having said that, I still should have posted at least a line to inform the regular visitors. By the way, it might be a good idea to grab the RSS feed for this blog, so that you get informed whenever it is updated.

Now back to work.

A concern some of you have raised about the Consumer Finance data model we discussed (see Open Source Analytics in a month and subsequent posts) is that it appears far too simple to deliver any analytical value to the business. Wouldn’t we need the behavioral, payment, response, clickstream, usage data, blah-this, blah-that, and blah-other in order to deliver any value? Isn’t a three-table (Loan, Customer, and Marketing) demo too simplistic to be of any real use?

This post is really about answering this. Get out of the hype-victim mode and start thinking. If you look closely enough, there is plenty you can do with just this much data. And in this post we explore just that. (more…)

As I was saying in an earlier post (Database vs. Data Warehouse), your application database is not your Data Warehouse, for the simple reason that your application database was never designed to answer queries. Your app is a jumble of tables interlinked to define the entities in your business and their relationships (usually depicted in an Entity-Relationship Diagram). You just have to look at an Entity-Relationship Diagram to know that it’s not an easy job trying to get answers to your queries out of it. (You can see an example in the article linked to at the end of this post.)

To design a good Data Warehouse that actually answers user queries effectively you need to do what is known as Dimensional Modeling. (more…)
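As a rough illustration of what a dimensional model looks like, here is a minimal star-schema sketch using Python's built-in sqlite3 module. The table and column names (fact_loan, dim_customer, dim_campaign, segment, channel) and the sample rows are made up for this example; they are not the actual design developed in this series.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# One central fact table (loans) surrounded by dimension tables.
# NOTE: all table names, column names, and rows are hypothetical.
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, segment TEXT);
CREATE TABLE dim_campaign (campaign_key INTEGER PRIMARY KEY, channel TEXT);
CREATE TABLE fact_loan (
    loan_id      INTEGER PRIMARY KEY,
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    campaign_key INTEGER REFERENCES dim_campaign(campaign_key),
    amount       REAL
);
INSERT INTO dim_customer VALUES (1, 'salaried'), (2, 'self-employed');
INSERT INTO dim_campaign VALUES (1, 'email'), (2, 'branch');
INSERT INTO fact_loan VALUES
    (1, 1, 1, 1000.0),
    (2, 1, 2, 2500.0),
    (3, 2, 1, 4000.0);
""")


def loan_total_by_segment(connection):
    """Answer a typical business question with a single join from the
    fact table to one dimension: total loan amount per customer segment."""
    rows = connection.execute("""
        SELECT c.segment, SUM(f.amount)
        FROM fact_loan AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY c.segment
        ORDER BY c.segment
    """).fetchall()
    return dict(rows)


totals = loan_total_by_segment(conn)
print(totals)
```

The point of the layout is that every business question becomes "pick a measure from the fact table, slice it by one or more dimensions", which is far easier to query than a normalized application schema.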

For those who joined the party late, this blog is about doing advanced analytics with free open source tools. And this month we are developing a completely free analytical solution for a hypothetical consumer finance company. You can get more details in the Open Source Analytics Category.

Today we shall design a Data Warehouse for the data.

In Consumer Finance Data Model – I, we described what AFS (our fictitious consumer finance organization) does, and the main pieces of information it deals with. To summarize, those pieces of data are reproduced below: (more…)

If you found the course on data-mining a little too long or abstract for your taste, here is your Field Guide to Data Mining.

Data mining offers great promise in helping organizations uncover patterns hidden in their data that can be used to predict the behavior of customers, products and processes. However, to realize the value of data mining, it has to be done by people who understand the business, the data, and the general nature of the analytical methods involved. Unfortunately, due to the amount of hype involved, a surprisingly large number of people mistake the tools for the craft of data mining. (more…)

Big-bang is dead. And it’s awfully obvious these days that Less can be a competitive advantage.

I have had a tough time explaining this. And it’s a pleasant surprise to see today that the guys at 37signals.com do get it.

Times have changed. All other people’s money gets you these days is into debt. And that’s not a great place to start anything from. You don’t need money for hardware — hardware is cheap. You don’t need money for software — software is free. You don’t need money for marketing — there are a variety of ways to get your message out online to a huge audience for next to free. Money doesn’t buy you time and money doesn’t buy you passion (and passion is something you need a boatload of).


Let’s begin with what Acme Financial Services (AFS), our fictitious Consumer Finance company, does. Because this will tell us the critical pieces of information AFS deals with, and that in turn will give us the required insights for building its data model. (more…)

MySQL makes an obvious and affordable choice for data warehousing. Its traditional weaknesses (from an OLTP system’s point of view) are actually its strengths when it comes to data warehousing and OLAP.

If you look at the requirements of a DW, you’d see that unlike OLTP systems, transactions are not required, and 60×24×7 uptimes are not really needed. Instead, the system should handle queries as efficiently as possible and provide high-speed bulk loading in batch jobs. This is where MySQL shines over the competition. (more…)
