Open Source Analytics


KETL is an open source ETL tool by Kinetic Networks that has been gaining mindshare lately. It is currently downloadable as part of the Bizgres BI project, but it can be set up for other databases with a little tweaking.

KETL is different from Kettle, another open source ETL tool; you can read more about the similar names on Nicholas Goodman’s blog. While Kettle is GUI-oriented, KETL is scripted and probably more robust.

Read the KETL training doc to learn more about its architecture and usage.

If you have followed some of the earlier posts, you will remember that a data mart is created as a star schema through a process known as dimensional modeling. In this post we will create a dimensional model for a Sales data mart at a hypothetical retailer.
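To make the idea concrete, here is a minimal sketch of such a star schema in Python using the standard-library sqlite3 module. The table and column names (fact_sales, dim_date, dim_product, dim_store) and the sample rows are hypothetical, chosen for illustration rather than taken from the full post. The point is the shape: one fact table holding the numeric measures at a declared grain, with a foreign key to each surrounding dimension table.

```python
import sqlite3

# Hypothetical star schema for a retail Sales data mart: one fact table
# surrounded by dimension tables. All names are illustrative assumptions.
DDL = """
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,  -- e.g. 20060315
    calendar_date TEXT NOT NULL,
    month         INTEGER NOT NULL,
    quarter       INTEGER NOT NULL,
    year          INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
CREATE TABLE dim_store (
    store_key  INTEGER PRIMARY KEY,
    store_name TEXT NOT NULL,
    region     TEXT NOT NULL
);
-- Grain: one row per product, per store, per day.
CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    store_key    INTEGER NOT NULL REFERENCES dim_store(store_key),
    units_sold   INTEGER NOT NULL,
    sales_amount REAL NOT NULL,
    PRIMARY KEY (date_key, product_key, store_key)
);
"""


def build_sales_mart() -> sqlite3.Connection:
    """Create the schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.execute("INSERT INTO dim_date VALUES (20060315, '2006-03-15', 3, 1, 2006)")
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Whole Milk", "Dairy"), (2, "Cheddar", "Dairy")])
    conn.execute("INSERT INTO dim_store VALUES (1, 'Store 1', 'West')")
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                     [(20060315, 1, 1, 10, 25.0), (20060315, 2, 1, 4, 18.0)])
    return conn


def sales_by_category_and_region(conn: sqlite3.Connection):
    """Typical star join: fact table joined to the dimensions it needs,
    then aggregated by dimension attributes."""
    return conn.execute("""
        SELECT p.category, s.region, SUM(f.sales_amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        JOIN dim_store   s ON s.store_key   = f.store_key
        GROUP BY p.category, s.region
    """).fetchall()


if __name__ == "__main__":
    print(sales_by_category_and_region(build_sales_mart()))
```

A production data mart would more likely live in MySQL or PostgreSQL than in in-memory SQLite, but the DDL and the star join carry over with little change.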

In a recent article titled “The Open Source BI Trend Will Grow – Here’s Why” in the DM Direct Newsletter, Rick Mortensen of MARVELIt explains why open source BI is gaining traction and will continue to grow.

In due course, I will spend some time on the R Environment (available as part of DecisionStudio Professional, and separately downloadable from http://r-project.org). R provides an excellent alternative to commercial products for modeling, statistical analysis, and graphics. R is an open source implementation of the S language, which was originally designed at AT&T Bell Labs, and it is fast becoming the standard for cutting-edge number crunching.

On Friday we released DecisionStudio Professional: a comprehensive, free desktop BI platform that packages all the tools needed for analytics into a single bundle licensed under the GNU General Public License (GPL).

DecisionStudio Professional (DSP) is an advanced graphical data mining, reporting, modeling, and analysis environment built on best-of-breed open source projects, including:
- An optimized MySQL database as the data warehouse platform
- SQL Workbench (MySQL Query Browser and DBDesigner) for Data Analysts
- The R environment for statistical analysis and modeling
- The iReport reporting GUI and the JasperReports reporting library
- Python with the Boa Constructor IDE for application and GUI development

DecisionStudio Professional is the only end-to-end open source analytics platform that provides comprehensive capabilities for each role. Data Analysts can store, process, and publish data on a standard MySQL platform; Reporting Analysts will like iReport and the integration with Office tools; and Modelers will love the excellent R environment. It also includes Python along with a drag-and-drop GUI builder for analytics Application Developers.

You can find out more about DecisionStudio Professional at decisionstudio.com and download your copy from SourceForge.net. A product brochure (PDF) is also available for download.

Go ahead, it’s completely free and will always stay so. ;-)

Analytics and Business Intelligence is really about the conversion of raw data into optimal and actionable decisions to create tangible business value. Otherwise, what’s the point?

In the last post (OLAP Reporting on Open Source Software – I) we spoke about Mondrian, an open-source OLAP server.

In this post we will set up OLAP reporting for a hypothetical retailer called FoodMart that sells grocery products through a chain of stores across the US, Canada, and Mexico.

OLAP (On-Line Analytical Processing) reporting systems provide what is commonly known as “slice-and-dice” functionality to non-technical end users: users can build reports and charts on the fly to answer ad-hoc questions as they arise. Another common name for this is “drill-down reporting” on “OLAP cubes”. In essence it is similar to an Excel PivotTable report, except that OLAP systems can do the same thing on massive amounts of data.
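As a rough illustration of what a pivot report computes, the sketch below cross-tabulates one dimension against another and sums a measure, and treats a “slice” as a filter that fixes one dimension to a single value. The rows, the country and quarter dimensions, and the helper names pivot and slice_rows are made up for illustration; an OLAP server such as Mondrian does this work against the database, at far larger scale.

```python
from collections import defaultdict

# Made-up sales rows: (country, quarter, units).
ROWS = [
    ("USA", "Q1", 120), ("USA", "Q2", 150),
    ("Canada", "Q1", 40), ("Canada", "Q2", 55),
    ("Mexico", "Q1", 30), ("USA", "Q1", 10),
]


def pivot(rows, row_idx, col_idx, val_idx):
    """Cross-tab: {row_value: {col_value: summed measure}}, like a PivotTable."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_idx]][r[col_idx]] += r[val_idx]
    # Convert the nested defaultdicts to plain dicts for printing/comparison.
    return {k: dict(v) for k, v in table.items()}


def slice_rows(rows, idx, value):
    """'Slice': keep only the rows where one dimension has a fixed value."""
    return [r for r in rows if r[idx] == value]


if __name__ == "__main__":
    print(pivot(ROWS, 0, 1, 2))                          # countries x quarters
    print(pivot(slice_rows(ROWS, 1, "Q1"), 0, 1, 2))     # Q1 slice only
```

Note that the two USA/Q1 rows are merged into one cell; that summing of a measure over every combination of dimension values is the core operation behind any pivot or cube view.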

OLAP reporting revolves around two simple concepts: Dimensions and Measures.
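A minimal sketch of the two concepts, assuming a tiny in-memory list of fact rows: dimensions (here a hypothetical Country > State > City store hierarchy) are the attributes you group and filter by, and measures (hypothetical unit_sales and store_sales) are the numbers that get summed. The rollup helper is illustrative only; it aggregates at any level of the hierarchy, and passing a filter on a parent member gives a simple drill-down.

```python
from collections import defaultdict

# Made-up fact rows: dimension attributes plus measures.
FACTS = [
    {"country": "USA", "state": "CA", "city": "Los Angeles", "unit_sales": 100, "store_sales": 250.0},
    {"country": "USA", "state": "CA", "city": "San Diego", "unit_sales": 60, "store_sales": 140.0},
    {"country": "USA", "state": "WA", "city": "Seattle", "unit_sales": 80, "store_sales": 210.0},
    {"country": "Mexico", "state": "DF", "city": "Mexico City", "unit_sales": 50, "store_sales": 90.0},
]
HIERARCHY = ["country", "state", "city"]   # dimension levels, coarse to fine
MEASURES = ["unit_sales", "store_sales"]   # additive measures


def rollup(facts, level, filters=None):
    """Sum every measure at the given hierarchy level.

    Keys are member paths, e.g. ('USA', 'CA'), so identically named
    members under different parents stay separate. `filters` restricts
    the rows first, which is how a drill-down into one member works."""
    filters = filters or {}
    depth = HIERARCHY.index(level) + 1
    totals = defaultdict(lambda: dict.fromkeys(MEASURES, 0))
    for f in facts:
        if all(f[k] == v for k, v in filters.items()):
            key = tuple(f[d] for d in HIERARCHY[:depth])
            for m in MEASURES:
                totals[key][m] += f[m]
    return dict(totals)


if __name__ == "__main__":
    print(rollup(FACTS, "country"))                     # top level
    print(rollup(FACTS, "state", {"country": "USA"}))   # drill into the USA
```

For example, rollup(FACTS, "country") returns one entry per country, and rollup(FACTS, "state", {"country": "USA"}) drills into the states under the USA.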

Sorry, guys, for having been AWOL all this while. Some hectic travelling, followed by some ill health and a lot of overdue work, has kept me away. Even so, I should have posted at least a line to let regular visitors know. By the way, it might be a good idea to grab the RSS feed for this blog, so that you are notified whenever it is updated.

Now back to work.

A concern some of you have raised about the Consumer Finance data model we discussed (see Open Source Analytics in a month and subsequent posts) is that it appears far too simple to deliver any analytical value to the business. Wouldn’t we need the behavioral, payment, response, clickstream, usage data, blah-this, blah-that, and blah-other in order to deliver any value? Isn’t a three-table (Loan, Customer, and Marketing) demo too simplistic to be of any real use?

This post is about answering exactly that. Get out of hype-victim mode and start thinking. If you look closely enough, there is plenty you can do with just this much data, and that is what this post explores.
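As one illustration of the kind of analysis three tables already support, here is a sketch using Python’s sqlite3 module with hypothetical, heavily simplified versions of the Customer, Marketing, and Loan tables. All column names and rows are assumptions, not the actual AFS model. A single join yields, per campaign and income band, how many customers were contacted, how many loans were booked, and the default rate among those loans.

```python
import sqlite3

# Hypothetical, simplified versions of the three tables; all columns are assumptions.
SCHEMA = """
CREATE TABLE customer  (customer_id INTEGER PRIMARY KEY, income_band TEXT);
CREATE TABLE marketing (customer_id INTEGER REFERENCES customer(customer_id),
                        campaign TEXT);
CREATE TABLE loan      (loan_id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customer(customer_id),
                        amount REAL,
                        defaulted INTEGER);  -- 1 = defaulted, 0 = performing
"""


def load_sample(conn: sqlite3.Connection) -> None:
    """Create the tables and insert a handful of made-up rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customer VALUES (?, ?)",
                     [(1, "high"), (2, "high"), (3, "low"), (4, "low")])
    conn.executemany("INSERT INTO marketing VALUES (?, ?)",
                     [(1, "mail"), (2, "mail"), (3, "mail"), (4, "web")])
    conn.executemany("INSERT INTO loan VALUES (?, ?, ?, ?)",
                     [(10, 1, 5000.0, 0), (11, 3, 2000.0, 1), (12, 4, 1500.0, 0)])


def campaign_performance(conn: sqlite3.Connection):
    """Contacts, loans booked, and default rate per campaign and income band.

    Simplification: a loan is credited to every campaign that contacted the
    customer; a real response analysis would match loans to campaigns by date."""
    return conn.execute("""
        SELECT m.campaign,
               c.income_band,
               COUNT(DISTINCT m.customer_id) AS contacted,
               COUNT(l.loan_id)              AS loans_booked,
               AVG(l.defaulted)              AS default_rate  -- NULLs (no loan) are ignored
        FROM marketing m
        JOIN customer c  ON c.customer_id = m.customer_id
        LEFT JOIN loan l ON l.customer_id = m.customer_id
        GROUP BY m.campaign, c.income_band
        ORDER BY m.campaign, c.income_band
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    for row in campaign_performance(conn):
        print(row)
```

The LEFT JOIN keeps contacted customers who never took a loan, so the contacted and booked counts together give a take-up rate, while AVG over the defaulted flag skips those customers and reports defaults only among booked loans.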

For those who joined the party late, this blog is about doing advanced analytics with free open source tools. This month we are developing a completely free analytical solution for a hypothetical consumer finance company. You can find more details in the Open Source Analytics category.

Today we shall design a Data Warehouse for the data.

In Consumer Finance Data Model – I, we described what AFS (our fictitious consumer finance organization) does and the main pieces of information it deals with; the full post reproduces those pieces of data as a summary.
