<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Open Source Analytics &#187; Data Warehousing</title>
	<atom:link href="http://opensourceanalytics.com/category/data-warehousing/feed/" rel="self" type="application/rss+xml" />
	<link>http://opensourceanalytics.com</link>
	<description>Comprehensive Analytics on Open Source Software.</description>
	<lastBuildDate>Tue, 25 Sep 2007 15:12:42 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.3</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>What They Didn&#8217;t Tell You About Data Warehousing</title>
		<link>http://opensourceanalytics.com/2006/09/26/what-they-didnt-tell-you-about-data-warehousing/</link>
		<comments>http://opensourceanalytics.com/2006/09/26/what-they-didnt-tell-you-about-data-warehousing/#comments</comments>
		<pubDate>Tue, 26 Sep 2006 06:09:55 +0000</pubDate>
		<dc:creator>Nishith</dc:creator>
				<category><![CDATA[BI, Data Mining, Analytics]]></category>
		<category><![CDATA[Data Warehousing]]></category>

		<guid isPermaLink="false">http://opensourceanalytics.com/?p=55</guid>
		<description><![CDATA[Most Data Warehousing projects fail.  By some claims, as many as 70-80% do.  Still, no one talks about these failures.
Data Mining, Analytics and BI roll-outs are unlike any other project your organization may have undertaken.  Political and non-technical issues can derail a fragile project that is already struggling to handle ambiguous and constantly [...]]]></description>
			<content:encoded><![CDATA[<p>Most Data Warehousing projects fail.  By some claims, as many as 70-80% do.  Still, no one talks about these failures.</p>
<p>Data Mining, Analytics and BI roll-outs are unlike any other project your organization may have undertaken.  Political and non-technical issues can derail a fragile project that is already struggling to handle ambiguous and constantly changing requirements.<br />
<span id="more-55"></span><br />
Marc Demarest, in his classic 1997 article <strong><a href="http://www.noumenal.com/marc/dwpoly.html">The Politics of Data Warehousing</a></strong>, says:</p>
<blockquote><p>Data warehousing projects are frequently side-tracked or derailed completely by non-technical factors, in particular the political treaty lines within the firm, and the politicized nature of data itself. Because data warehouses are infrastructure for sociotechnical systems (STSs) within the firm, politics and the exercise of power are inherent in data warehousing projects, and data warehouse designers have to adopt work practices and methods from non-technical disciplines, think of themselves in new ways, and employ some fairly sophisticated qualitatively sociological methods in order to optimize the chances for successful deployment of data warehouses.
</p></blockquote>
<p><strong><a href="http://www.noumenal.com/marc/dwpoly.html">The Politics of Data Warehousing</a></strong> also lists 10 warning signals for detecting politicization of the project, and 10 countermeasures &#8211; essentially a recipe for delivering on the project&#8217;s promise.</p>
<p>Larry Greenfield, in a more recent article titled <strong><a href="http://www.dwinfocenter.org/politics.html">Data Warehousing Political Issues</a></strong>, identifies three common threads:</p>
<blockquote><p>1) Data warehousing imposes new obligations whose responsibilities are unclear<br />
2) Data warehousing requires changes in processes that an organization is comfortable with<br />
3) Data warehousing requires agreement on some, but not all, definitions of data.
</p></blockquote>
<p>Larry classifies the political issues into those that are within the IS organization (IS to IS), those that are between IS and the users (IS to Users), and those that are between users (User to User).  <a href="http://www.dwinfocenter.org/politics.html">Click here to read <strong>Data Warehousing Political Issues.</strong></a></p>
<p>And don&#8217;t miss the article <a href="http://www.dwinfocenter.org/gotchas.html"><strong>Data Warehousing Gotchas</strong></a>, which lists some little-known nuggets of wisdom and experience that can save your project.  As Larry says, &#8220;Forewarned is forearmed!&#8221;  <a href="http://www.dwinfocenter.org/gotchas.html">Click here to read <strong>Data Warehousing Gotchas</strong>.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://opensourceanalytics.com/2006/09/26/what-they-didnt-tell-you-about-data-warehousing/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Advanced MySQL Join Tips &amp; Tricks</title>
		<link>http://opensourceanalytics.com/2006/07/29/advanced-mysql-join-tips-tricks/</link>
		<comments>http://opensourceanalytics.com/2006/07/29/advanced-mysql-join-tips-tricks/#comments</comments>
		<pubDate>Sat, 29 Jul 2006 13:19:55 +0000</pubDate>
		<dc:creator>Nishith</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>

		<guid isPermaLink="false">http://opensourceanalytics.com/?p=53</guid>
		<description><![CDATA[Those of you who have been using MySQL for some time now will know that the MySQL 5.0 Online Reference Manual is not just a manual but also a repository of user comments exploring and solving common and/or deeper problems.  If you are stuck with a particular problem that you cannot frame a [...]]]></description>
			<content:encoded><![CDATA[<p>Those of you who have been using MySQL for some time now will know that the <a href="http://dev.mysql.com/doc/refman/5.0/en/">MySQL 5.0 Online Reference Manual</a> is not just a manual but also a repository of user comments exploring and solving common and/or deeper problems.  If you are stuck with a particular problem that you cannot frame a SQL query for, the comments on the manual pages will usually have a solution.<br />
<span id="more-53"></span><br />
For example, consider a common scenario:  You have some data in table1 and table2, and you want to find the data in table1 which <em>does not</em> exist in table2.  This is a typical outer join problem, and an anonymous user comment on the <a href="http://dev.mysql.com/doc/refman/5.0/en/join.html">MySQL Manual page for JOIN syntax</a> gives the solution:</p>
<p><code>SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id=table2.id WHERE table2.id IS NULL</code></p>
<p>Now suppose you want to check not against the entire table2, but against a subset of table2 (say CityID = 1).  How do you do that?  You can try and see for yourself that the following SQL <em>is not </em>the solution:</p>
<p><code>SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id=table2.id<br />
WHERE table2.id IS NULL<br />
and table2.CityID = 1</code></p>
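<p>This fails because the WHERE clause is applied after the join.  For the table1 records that <em>do not exist </em>in table2, every table2 column in the joined row is NULL, so the condition on table2.CityID can never be true and the query returns no rows.</p>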
<p>Read the manual page a little further, and another user comment points out:</p>
<blockquote><p>Conditions for the &#8220;right table&#8221; go in the ON clause.<br />
Conditions for the &#8220;left table&#8221; go in the WHERE clause,<br />
except for the joining conditions themselves.</p></blockquote>
<p>That makes the solution:</p>
<p><code>SELECT table1.* FROM table1 LEFT JOIN table2 ON (table1.id=table2.id and table2.CityID = 1)<br />
WHERE table2.id IS NULL</code></p>
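<p>To see the difference for yourself, here is a minimal sketch in Python.  It assumes SQLite (through the built-in sqlite3 module) as a stand-in for MySQL, and the sample rows are made up for illustration; the LEFT JOIN behaviour shown is standard SQL, so the same two queries should behave the same way in MySQL.</p>

```python
import sqlite3

# Hypothetical sample data: table1 holds ids 1-4; table2 holds (id, CityID) pairs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (id INTEGER, CityID INTEGER);
    INSERT INTO table1 (id) VALUES (1), (2), (3), (4);
    INSERT INTO table2 (id, CityID) VALUES (1, 1), (2, 2), (3, 1);
""")


def ids(sql):
    """Run a query and return the first column of every row as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# Condition on table2 in the WHERE clause: unmatched rows have NULL in every
# table2 column, so "table2.CityID = 1" can never hold for them.
where_filter = ids("""
    SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id = table2.id
    WHERE table2.id IS NULL AND table2.CityID = 1
""")

# Condition on table2 in the ON clause: only CityID = 1 rows can match, so
# table1 rows without such a match come back with NULLs and pass the filter.
on_filter = ids("""
    SELECT table1.* FROM table1 LEFT JOIN table2
        ON (table1.id = table2.id AND table2.CityID = 1)
    WHERE table2.id IS NULL
""")

print("Condition in WHERE:", where_filter)
print("Condition in ON:", on_filter)
```

<p>With this sample data the WHERE version returns nothing, while the ON version returns ids 2 and 4, which are the table1 records that have no CityID = 1 match in table2.</p>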
<p>Happy digging!  <img src='http://opensourceanalytics.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://opensourceanalytics.com/2006/07/29/advanced-mysql-join-tips-tricks/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>KETL ETL tool Training Document</title>
		<link>http://opensourceanalytics.com/2006/07/24/ketl-etl-tool-training-document/</link>
		<comments>http://opensourceanalytics.com/2006/07/24/ketl-etl-tool-training-document/#comments</comments>
		<pubDate>Mon, 24 Jul 2006 10:42:15 +0000</pubDate>
		<dc:creator>Nishith</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Open Source Analytics]]></category>

		<guid isPermaLink="false">http://opensourceanalytics.com/?p=51</guid>
		<description><![CDATA[KETL is an open source ETL tool by Kinetic Networks that is gaining mindshare of late.  It is currently downloadable as part of the Bizgres BI project, but can be set up for other databases with a little tweaking.
KETL is different from Kettle, another open source ETL tool.  You can read more about the similar [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.kineticnetworks.com/opensrc.html">KETL</a> is an open source ETL tool by Kinetic Networks that is gaining mindshare of late.  It is currently downloadable as part of the <a href="http://www.bizgres.org/home.php">Bizgres BI project</a>, but can be set up for other databases with a little tweaking.</p>
<p>KETL is different from <a href="http://www.kettle.be/">Kettle</a>, another open source ETL tool.  <a href="http://www.nicholasgoodman.com/bt/blog/2005/12/20/ketl-kettle/">You can read more about the similar names here at Nicholas Goodman&#8217;s blog.</a>  While Kettle is GUI-oriented, KETL is scripted and probably more robust.</p>
<p><a href="http://www.kineticnetworks.com/KETL/KETL_Open_Source_Training.pdf">Read the KETL training doc to learn more about its architecture and usage.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://opensourceanalytics.com/2006/07/24/ketl-etl-tool-training-document/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Agile Data Warehouse Development Methodology</title>
		<link>http://opensourceanalytics.com/2006/06/07/agile-data-warehouse-development/</link>
		<comments>http://opensourceanalytics.com/2006/06/07/agile-data-warehouse-development/#comments</comments>
		<pubDate>Tue, 06 Jun 2006 18:35:43 +0000</pubDate>
		<dc:creator>Nishith</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>

		<guid isPermaLink="false">http://opensourceanalytics.com/?p=46</guid>
		<description><![CDATA[Building a successful Data Warehouse as part of a BI roll-out is going to test both your tolerance for ambiguity and the resilience of your development methodology.  The traditional waterfall model tends to fail because BI requirements change frequently.  So if the traditional big-bang waterfall is not likely to work, what does?
Agile development [...]]]></description>
			<content:encoded><![CDATA[<p>Building a successful Data Warehouse as part of a BI roll-out is going to test both your tolerance for ambiguity and the resilience of your development methodology.  The traditional waterfall model tends to fail because BI requirements change frequently.  So if the traditional big-bang waterfall is not likely to work, what does?</p>
<p><strong>Agile development </strong>is an approach that &#8220;cycles&#8221; through the development phases, from gathering requirements to delivering functionality into a working release. <span id="more-46"></span>  Two widely used agile development frameworks are the <a href="http://www.microsoft.com/technet/itsolutions/msf/default.mspx">Microsoft Solutions Framework (MSF)</a> and <a href="http://c2.com/cgi/wiki?ExtremeProgramming">Extreme Programming (XP)</a>.  </p>
<p>Agile development methodologies begin with &#8216;user stories&#8217;, which are brief, 2-3 sentence problem statements.  They describe just enough of the user requirements to give a rough idea of effort and scope.  These are then used to develop multi-release plans that establish the roadmap and the longer-term vision for the team.  </p>
<p>The actual development is done in frequent cycles of small releases (both fresh development releases as well as stabilization releases).  Stories are assigned to individual development cycles depending upon resources, time and requirements.  This Resource-Time-Requirement triangle is actively managed through triage to make sure that release schedules are met.</p>
<p><a href="http://www.dmreview.com/portals/portalarticle.cfm?articleId=1025869&#038;topicId=230004">Click here to read about applying Agile development methodologies to Data Warehousing projects</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://opensourceanalytics.com/2006/06/07/agile-data-warehouse-development/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Sales Data Mart &#8211; Dimensional Model for Retail</title>
		<link>http://opensourceanalytics.com/2006/04/28/sales-data-mart-dimensional-model-for-retail/</link>
		<comments>http://opensourceanalytics.com/2006/04/28/sales-data-mart-dimensional-model-for-retail/#comments</comments>
		<pubDate>Fri, 28 Apr 2006 06:12:08 +0000</pubDate>
		<dc:creator>Nishith</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[DecisionStudio-Professional]]></category>
		<category><![CDATA[Open Source Analytics]]></category>

		<guid isPermaLink="false">http://opensourceanalytics.com/?p=48</guid>
		<description><![CDATA[If you have followed some of the earlier posts, you will remember that a data mart is created as a star schema through a process known as dimensional modeling.  In this post we will create a dimensional model for a Sales data mart at a hypothetical retailer.  
Note:  To go through the examples [...]]]></description>
			<content:encoded><![CDATA[<p>If you have followed some of the earlier posts, you will remember that a data mart is created as a star schema through a process known as dimensional modeling.  In this post we will create a dimensional model for a Sales data mart at a hypothetical retailer.  <span id="more-48"></span></p>
<p><strong>Note:</strong>  To go through the examples here, you will need the MySQL database and DB Designer for database modeling.  You can either install <a href="http://decisionstudio.com/product" target="_blank"><strong>DecisionStudio Professional</strong></a> (<a href="https://sourceforge.net/projects/ds-professional" target="_blank">download from sourceforge</a>), which has both MySQL and DB Designer along with many other analytics goodies, or install them individually from the <a href="http://mysql.com" target="_blank">MySQL website</a> and the <a href="http://fabforce.net/downloads.php" target="_blank">DB Designer website</a>.  You will also need to <a href="https://sourceforge.net/projects/ds-professional" target="_blank">download the sample foodmart database</a> available along with DecisionStudio Professional.</p>
<p>Now let&#8217;s assume you are an IT person at FoodMart (a hypothetical retailer) who has decided to build a sales data mart as the first step in rolling out comprehensive analytics.  In discussions with the sales department you have learned that the <strong>number of units sold, dollar amount of sales, and the number of unique customers </strong>in a segment are the main metrics they look at.  Digging deeper, you find that the sales team is likely to want <strong>analysis by product, product category/class, brand, store location (city, state, region, country, &#8230;), customer demographics, and also by individual promotions and promotion categories</strong>.  It may not be explicitly mentioned, but the metrics would also be analyzed by time (day, week, month, quarter, &#8230;).</p>
<p>The business metrics to be measured give you the facts you need in the data mart <strong>&#8216;fact table&#8217; </strong>for calculating them.  Similarly, the potential segments for analysis give you the <strong>&#8216;dimensions&#8217; </strong>for analysis.  The &#8216;fact table&#8217; linked to the &#8216;dimension tables&#8217; makes up the <strong>&#8216;star-schema&#8217;</strong> (because of the star-like structure), also known as the data mart.</p>
<p>With this information in place, we have the high level <strong>Dimensional Model for Sales</strong>.<br />
<a href="http://opensourceanalytics.com/wordpress/wp-content/FoodMartDimensionalModelSales.PNG" target="_blank"><img src='http://opensourceanalytics.com/wordpress/wp-content/thumb-FoodMartDimensionalModelSales.PNG' alt='Dimensional Model for Sales Cube at FoodMart' /></a></p>
<p>Sales_Fact_1998 is the main fact table that has sales information by store/location, product, time, customer, and promotion.  Correspondingly there are 5 dimension tables joined to the fact table through foreign keys in the star-schema.</p>
<p>The dimension tables in turn have detailed data that can now be used for <strong>defining ad-hoc analysis segments</strong>.  For example, we can put demographic filters on the customer dimension (say age&lt;30, married, college-educated), choose specific product class(es) in the product table (say Dairy Products), specify a limited time period, and then get our metrics calculated for the ad-hoc segment.</p>
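<p>As a rough sketch of what such an ad-hoc segment query could look like, here is a small Python example.  The tables, columns, and rows below are simplified, hypothetical stand-ins for the FoodMart star-schema (not the actual model), and SQLite is assumed in place of MySQL so that the example runs on its own.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical star schema: one fact table and two dimension tables.
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        marital_status TEXT,
        education TEXT
    );
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        product_class TEXT
    );
    CREATE TABLE sales_fact (
        customer_id INTEGER REFERENCES customer (customer_id),
        product_id INTEGER REFERENCES product (product_id),
        unit_sales INTEGER,
        store_sales REAL
    );
    INSERT INTO customer VALUES
        (1, 'M', 'College'), (2, 'S', 'College'), (3, 'M', 'High School');
    INSERT INTO product VALUES (10, 'Dairy'), (20, 'Snacks');
    INSERT INTO sales_fact VALUES
        (1, 10, 2, 5.0), (1, 10, 1, 2.5), (1, 20, 4, 8.0),
        (2, 10, 3, 7.5), (3, 10, 1, 2.5);
""")

# The segment is defined by filters on the dimension tables; the metrics
# (units sold, dollar sales, unique customers) are aggregated from the fact table.
segment_sql = """
    SELECT SUM(f.unit_sales), SUM(f.store_sales), COUNT(DISTINCT f.customer_id)
    FROM sales_fact AS f
    JOIN customer AS c ON c.customer_id = f.customer_id
    JOIN product AS p ON p.product_id = f.product_id
    WHERE c.marital_status = ? AND c.education = ? AND p.product_class = ?
"""
units_sold, dollar_sales, unique_customers = conn.execute(
    segment_sql, ("M", "College", "Dairy")
).fetchone()

print(units_sold, dollar_sales, unique_customers)
```

<p>Changing the segment only means changing the filter values (or adding another dimension table to the join); the fact table and the metric calculations stay the same.</p>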
<p>The image below shows the detailed information available in the dimension tables for defining ad-hoc segments.<br />
<a href="http://opensourceanalytics.com/wordpress/wp-content/FoodMartSalesCubeData.PNG" target="_blank"><img src='http://opensourceanalytics.com/wordpress/wp-content/thumb-FoodMartSalesCubeData.PNG' alt='Dimensional Data for analysis in Sales Cube' /></a></p>
<p>You can <a href="http://opensourceanalytics.com/wordpress/wp-content/FoodMartDatabaseModel.xml" title='Sales Star Schema, and FoodMart database schema' target="_blank">download the data model here</a>, and then open the saved model using DB Designer in DecisionStudio Professional (Start -> Program Files -> DecisionStudio Professional -> Data Analyst -> DB Designer Workbench).  You can see other tables in the FoodMart database by scrolling around on the canvas (scroller in top-right corner).  </p>
<p>Do note that our dimensional model for sales covers only a small relevant set of tables from the entire FoodMart database.  You can load the entire downloaded FoodMart data into MySQL <a href="http://decisionstudio.com/wiki/doku.php?id=restoring_foodmart_data" target="_blank">as outlined here</a>, and can query on the data using Query Browser (Start -> Program Files -> DecisionStudio Professional -> Data Analyst -> MySQL Query Browser).  </p>
]]></content:encoded>
			<wfw:commentRss>http://opensourceanalytics.com/2006/04/28/sales-data-mart-dimensional-model-for-retail/feed/</wfw:commentRss>
		<slash:comments>33</slash:comments>
		</item>
		<item>
		<title>BI &#8211; An Abstraction Architecture</title>
		<link>http://opensourceanalytics.com/2006/03/27/bi-an-abstraction-architecture/</link>
		<comments>http://opensourceanalytics.com/2006/03/27/bi-an-abstraction-architecture/#comments</comments>
		<pubDate>Mon, 27 Mar 2006 06:35:11 +0000</pubDate>
		<dc:creator>Nishith</dc:creator>
				<category><![CDATA[BI, Data Mining, Analytics]]></category>
		<category><![CDATA[Data Warehousing]]></category>

		<guid isPermaLink="false">http://opensourceanalytics.com/2006/03/27/bi-an-abstraction-architecture/</guid>
		<description><![CDATA[While we may all differ on the definitions of BI, we do know that it is all about extracting and delivering specific and useful information in the midst of the data-explosion around us.  And all the definitions and implementations, in their own ways, are geared towards that objective.
Margaret Dunham, the author of Data Mining: [...]]]></description>
			<content:encoded><![CDATA[<p>While we may all differ on the definitions of BI, we do know that it is all about extracting and delivering specific and useful information in the midst of the data-explosion around us.  And all the definitions and implementations, in their own ways, are geared towards that objective.</p>
<p>Margaret Dunham, the author of <a href="http://www.amazon.com/exec/obidos/ASIN/0130888923/opensourceana-20">Data Mining: Introductory and Advanced Topics</a> once said:</p>
<blockquote><p>Data mining research and practice is in a state similar to that of databases in the 1960s.</p></blockquote>
<p><span id="more-38"></span><br />
That is true, but will have to change.  Databases have long been standardized, and data-mining &#038; BI will get standardized in due course.  Till then, I guess, we&#8217;ll have to make do with whatever information is available, ignore the hype, and take the plunge to define our own solutions.</p>
<p>So what does it take to deliver upon the BI promise?  Data, for sure.  And then you need a lot of metadata as well to make sense of the data.  And most importantly, the context for analysis &#8211; the what, why, and how of the effort.  In today&#8217;s implementations, the context is implicitly identified and built into the solution, which may not be flexible enough to meet the changing needs of the users.</p>
<p>As an excellent effort in the right direction, <a href="http://blogs.ittoolbox.com/bi/confessions/">Dratz</a> recently published a white paper outlining why BI should be viewed as an Abstraction Architecture for Information, how it should be structured, and why a flexible BI architecture should have the ability to apply dynamic contexts at run-time for the conversion of data into meaning.  He proposes a very interesting piece therein: a Transaction Clearinghouse that would house the data till all pieces of an enterprise transaction (possibly spanning multiple systems) are available, and would apply contexts dynamically to create meaning out of the data.</p>
<p>Even if you aren&#8217;t sure what all this means, do read the <a href="http://blogs.ittoolbox.com/bi/confessions/archives/008241.asp">Introduction to an Abstraction Architecture for BI white paper</a> by Dratz.  It will give you a very good overview of what BI is, why it is needed, and how the vision could be actualized.  </p>
<p>In case you&#8217;d like to first read a little bit of background about the white paper, you can read his post <a href="http://blogs.ittoolbox.com/bi/confessions/archives/007871.asp">Abstraction Architectures&#8230;or, Why Everyone Should be thinking of BI.</a></p>
<p>Do let me know what you think.</p>
]]></content:encoded>
			<wfw:commentRss>http://opensourceanalytics.com/2006/03/27/bi-an-abstraction-architecture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Mart vs Data Warehouse &#8211; The Great Debate</title>
		<link>http://opensourceanalytics.com/2006/03/14/data-mart-vs-data-warehouse-the-great-debate/</link>
		<comments>http://opensourceanalytics.com/2006/03/14/data-mart-vs-data-warehouse-the-great-debate/#comments</comments>
		<pubDate>Tue, 14 Mar 2006 09:26:34 +0000</pubDate>
		<dc:creator>Nishith</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>

		<guid isPermaLink="false">http://opensourceanalytics.com/2006/03/14/data-mart-vs-data-warehouse-the-great-debate/</guid>
		<description><![CDATA[There are far too many conflicting and confusing definitions of Data Mart and Data Warehouse floating around.  The long running debate between Ralph Kimball and Bill Inmon, the two Titans of Data Warehousing, only adds to the confusion.  
In this post, we&#8217;ll try to get some sanity around the concepts, without getting drawn [...]]]></description>
			<content:encoded><![CDATA[<p>There are far too many conflicting and confusing definitions of Data Mart and Data Warehouse floating around.  The long running debate between Ralph Kimball and Bill Inmon, the two Titans of Data Warehousing, only adds to the confusion.  </p>
<p>In this post, we&#8217;ll try to get some sanity around the concepts, without getting drawn (hopefully) into the crossfire.</p>
<p><span id="more-35"></span></p>
<blockquote><p>A <strong>Data Mart</strong> is a specific, subject-oriented repository of data designed to answer specific questions for a specific set of users.  So an organization could have multiple data marts serving the needs of marketing, sales, operations, collections, etc.  A data mart is usually organized as a single dimensional model in the form of a star-schema (OLAP cube) made of a fact table and multiple dimension tables.
</p></blockquote>
<blockquote><p>In contrast, a <strong>Data Warehouse (DW)</strong> is a <em>single organizational repository</em> of enterprise-wide data across many or all subject areas.  The Data Warehouse is the authoritative repository of all the fact and dimension data (that is also available in the data marts) at an atomic level.
</p></blockquote>
<p>Unfortunately, this is where consensus begins to break down into chaos.  There are two broad schools of thought, led by Kimball and Inmon, that disagree on the details.</p>
<p><strong>Kimball School:</strong><br />
Ralph Kimball began with the Data Mart as a dimensional model for departmental data and viewed the <strong>Data Warehouse as the enterprise-wide collection of Data Marts</strong>.  This is the <strong>bottom-up approach</strong>.  You may begin with the Sales Data Mart, after some time put in place the Ops Data Mart, and so on and so forth.  If you want, you could have even more specific Data Marts serving specific questions like customer Churn.  If you take care of consistency of metadata (making sure each departmental Data Mart calls an Apple an Apple) and connectivity, you have a Data Warehouse.  So the Data Warehouse is really a <em>virtual </em>collection of Data Marts collected together on a Data Warehouse Bus, and in that sense <em>the data flows from multiple Marts into the Warehouse.</em></p>
<p><strong>Inmon School:</strong><br />
Inmon&#8217;s approach is the exact opposite and avoids the problem of metadata consistency by looking at the Enterprise Data Warehouse as a single repository that feeds subject oriented Data Marts.  You still have your Sales, Marketing, Ops and Churn Data Marts containing atomic or aggregated information, but they are based on the Data Warehouse and are really subsets of the data contained therein.  This is the <strong>top-down approach</strong>.  </p>
<p>Kimball&#8217;s approach is easier to implement as you are dealing with smaller subject areas to begin with, but the end result often has metadata inconsistencies and can be a nightmare to integrate.  Inmon&#8217;s approach, on the other hand, does not defer the integration and consistency issues, but takes far longer to implement (which makes it easier for the project to fail).  Also, in my experience, organizations that are just starting to do analytics usually do not have the patience or commitment required for Inmon&#8217;s approach.</p>
<p>Any BI initiative is extremely iterative in nature.  Unless you are confident that you would still have the CEO&#8217;s buy-in and a budget one year down the line, it might be better to begin with a Data Mart (to start delivering, and to manage expectations) keeping the metadata consistency requirements in mind, and then scale towards the Data Warehouse.</p>
<p>If you are interested in knowing more about the Great Debate, you can read an article by Katherine Drewek called <a href="http://www.b-eye-network.com/view/743">&#8220;Data Warehousing: Similarities and Differences of Inmon and Kimball&#8221;</a>.</p>
<p>P.S.:  This is what I think, and I&#8217;ll be glad to hear of your experiences and views, especially if you have been a part of an Inmon-style implementation.</p>
]]></content:encoded>
			<wfw:commentRss>http://opensourceanalytics.com/2006/03/14/data-mart-vs-data-warehouse-the-great-debate/feed/</wfw:commentRss>
		<slash:comments>41</slash:comments>
		</item>
		<item>
		<title>DecisionStudio Professional &#8211; Desktop BI Platform</title>
		<link>http://opensourceanalytics.com/2006/02/28/decisionstudio-professional-desktop-bi-platform/</link>
		<comments>http://opensourceanalytics.com/2006/02/28/decisionstudio-professional-desktop-bi-platform/#comments</comments>
		<pubDate>Tue, 28 Feb 2006 17:39:34 +0000</pubDate>
		<dc:creator>Nishith</dc:creator>
				<category><![CDATA[BI, Data Mining, Analytics]]></category>
		<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[DecisionStudio-Professional]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[On Your Own]]></category>
		<category><![CDATA[Open Source Analytics]]></category>
		<category><![CDATA[Reporting]]></category>

		<guid isPermaLink="false">http://opensourceanalytics.com/2006/02/28/decisionstudio-professional-desktop-bi-platform/</guid>
		<description><![CDATA[On Friday we released DecisionStudio Professional &#8211; a comprehensive and free desktop BI Platform that gives you all the tools needed for analytics in a single package licensed under the GNU General Public License (GPL).
DecisionStudio Professional (DSP) is an advanced graphical data mining, reporting, modeling, and analysis environment built on top of the best-of-breed open source projects. [...]]]></description>
			<content:encoded><![CDATA[<p>On Friday we released <a href="http://decisionstudio.com/product" target="_blank"><strong>DecisionStudio Professional</strong></a> &#8211; a comprehensive and free <strong>desktop BI Platform </strong>that gives you all the tools needed for analytics in a single package licensed under the <strong>GNU General Public License (GPL).</strong></p>
<p>DecisionStudio Professional (DSP) is an advanced <strong>graphical data mining, reporting, modeling, and analysis environment </strong>built on top of the best-of-breed open source projects.  Some of these include:<br />
      &#8212;  <strong>Optimized MySQL database </strong>as data warehouse platform<br />
      &#8212;  <strong>SQL Workbench</strong> (MySQL Query Browser and DBDesigner) for Data Analysts<br />
      &#8212;  <strong>R environment </strong>for statistical analysis and modeling<br />
      &#8212;  <strong>iReport </strong>Reporting GUI and <strong>JasperReports </strong>reporting library<br />
      &#8212;  <strong>Python </strong>with <strong>Boa Constructor IDE </strong>for application and GUI development</p>
<p>DecisionStudio Professional is the only <strong>end-to-end open source analytics platform </strong>that provides comprehensive capabilities to each role.  Data Analysts get to store, process, and publish data on a standard MySQL platform; Reporting Analysts would like iReport and the integration with Office tools; and Modelers would love the excellent R Environment.  It also includes Python along with a drag-n-drop GUI building environment for analytics Application Developers.</p>
<p>You can <a href="http://decisionstudio.com/product" target="_blank"><strong>find out more about DecisionStudio Professional at decisionstudio.com</strong></a>, and can <a href="https://sourceforge.net/projects/ds-professional" target="_blank"><strong>download your copy at Sourceforge.net</strong></a>.   <a href="http://decisionstudio.com/site/wp-content/decisionstudio-professional.pdf" target="_blank">Click here to download the product brochure (PDF).</a> </p>
<p>Go ahead, it&#8217;s completely free and will always stay so.  <img src='http://opensourceanalytics.com/wordpress/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://opensourceanalytics.com/2006/02/28/decisionstudio-professional-desktop-bi-platform/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>ETL: Extract, Transform, Load</title>
		<link>http://opensourceanalytics.com/2006/01/29/etl-extract-transform-load/</link>
		<comments>http://opensourceanalytics.com/2006/01/29/etl-extract-transform-load/#comments</comments>
		<pubDate>Sat, 28 Jan 2006 20:40:49 +0000</pubDate>
		<dc:creator>Nishith</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>

		<guid isPermaLink="false">http://opensourceanalytics.com/?p=24</guid>
		<description><![CDATA[Happy New Year, and back to work.
The Data Warehouse is the foundation of any analytics initiative.  You take data from various data sources in the organization, clean and pre-process it to fit business needs, and then load it into the data warehouse for everyone to use.  This process is called ETL which stands [...]]]></description>
			<content:encoded><![CDATA[<p>Happy New Year, and back to work.</p>
<p>The Data Warehouse is the foundation of any analytics initiative.  You take data from various data sources in the organization, clean and pre-process it to fit business needs, and then load it into the data warehouse for everyone to use.  This process is called ETL, which stands for &#8216;Extract, Transform, and Load&#8217;.<br />
<span id="more-24"></span><br />
While the extraction (from files and/or databases) and load (into the DW) are pretty straightforward, the real juice of ETL is in the data transformations, where the data is converted into a more business-usable format.  Typical transformation tasks include:<br />
    * Choosing the specific columns out of the available data<br />
    * Relating data from multiple sources<br />
    * Translating and/or encoding values (e.g. storing a CityID in the DW instead of the names to conserve storage and ensure integrity)<br />
    * Encoding free-form values from legacy systems (there can be a lot of valuable information in some of those memo fields)<br />
    * Deriving new calculated values and summarizing multiple rows of data </p>
<p>You can either write a specific processor (Python is a good choice as it aids rapid prototyping, is quite readable, and is easy to maintain) or use an ETL Tool, a software application designed to aid the process.  While ETL tools provide ease of use and rapid deployment, they often assume that the source data is cleaner, better organized, and less quirky than it usually is.  If there are a lot of quirks in the data, you might be better off writing your own processor.  Some ETL tools provide a plugin architecture to handle this better.</p>
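<p>To make the idea of a hand-written processor a little more concrete, here is a minimal sketch in Python.  Everything in it is an assumption made up for illustration: the two inline CSV extracts, the column names, and the plain dict standing in for the data warehouse.  It only shows how the transformation tasks listed earlier (choosing columns, relating sources, encoding a CityID, deriving and summarizing values) might fit together, not how a production processor should look.</p>

```python
import csv
import io
from collections import defaultdict

# Hypothetical source extracts; in practice these would come from files or databases.
ORDERS_CSV = """order_id,customer_id,quantity,unit_price,notes
1,C1,2,10.00,rush
2,C2,1,25.50,
3,C1,4,10.00,gift
"""

CUSTOMERS_CSV = """customer_id,name,city
C1,Asha,Mumbai
C2,Ravi,Pune
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(orders, customers):
    """Transform: encode cities, relate the sources, derive and summarize."""
    # Encode each city name as an integer ID (the CityID idea).
    city_ids = {}
    for customer in customers:
        city_ids.setdefault(customer["city"], len(city_ids) + 1)

    # Relate the two sources through customer_id.
    city_of_customer = {}
    for customer in customers:
        city_of_customer[customer["customer_id"]] = city_ids[customer["city"]]

    totals = defaultdict(float)
    for order in orders:
        # Use only the needed columns and derive a calculated value.
        amount = int(order["quantity"]) * float(order["unit_price"])
        # Summarize many order rows into one row per city.
        totals[city_of_customer[order["customer_id"]]] += amount

    facts = [
        {"city_id": city_id, "sales_amount": round(total, 2)}
        for city_id, total in sorted(totals.items())
    ]
    return city_ids, facts


def load(city_ids, facts, warehouse):
    """Load: the 'warehouse' here is just a dict standing in for DW tables."""
    warehouse["dim_city"] = [
        {"city_id": city_id, "city": name} for name, city_id in city_ids.items()
    ]
    warehouse["fact_sales"] = facts
    return warehouse


city_ids, facts = transform(extract(ORDERS_CSV), extract(CUSTOMERS_CSV))
warehouse = load(city_ids, facts, {})
print(warehouse["fact_sales"])
```

<p>In a real processor, most of the code would go into handling the quirks of the source data (missing values, inconsistent spellings, odd encodings), which is exactly where a hand-written processor tends to earn its keep.</p>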
<p>Some of the ETL tools from the open source domain are listed below:<br />
1)  <a href="http://www.kettle.be/">Kettle</a><br />
2)  <a href="http://octopus.objectweb.org/">Octopus</a><br />
3)  <a href="http://cloveretl.berlios.de/">Clover ETL</a></p>
<p>And a good blog post listing numerous Java based options is <a href="http://www.manageability.org/blog/stuff/open-source-etl/view">http://www.manageability.org/blog/stuff/open-source-etl/view</a>.</p>
<p>Please do let me know if I have missed any significant ones.  If you have used any of these, we&#8217;d all appreciate it if you could comment on your experiences. </p>
]]></content:encoded>
			<wfw:commentRss>http://opensourceanalytics.com/2006/01/29/etl-extract-transform-load/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Your App is not your Data Warehouse</title>
		<link>http://opensourceanalytics.com/2005/11/14/your-app-is-not-your-data-warehouse/</link>
		<comments>http://opensourceanalytics.com/2005/11/14/your-app-is-not-your-data-warehouse/#comments</comments>
		<pubDate>Sun, 13 Nov 2005 20:01:04 +0000</pubDate>
		<dc:creator>Nishith</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>

		<guid isPermaLink="false">http://opensourceanalytics.com/2005/11/14/your-app-is-not-your-data-warehouse/</guid>
		<description><![CDATA[As I was saying in an earlier post (Database vs, Data Warehouse), your Application database is not  your Data Warehouse for the simple reason that your application database was never designed to answer queries.  Your app is a jumble of tables interlinked to define the entities in your business and their relationships (usually [...]]]></description>
			<content:encoded><![CDATA[<p>As I was saying in an earlier post (<a href="http://opensourceanalytics.com/2005/11/02/database-vs-data-warehouse/">Database vs, Data Warehouse</a>), your Application database <em><strong>is not </strong></em> your Data Warehouse for the simple reason that your application database was never designed to answer queries.  Your app is a jumble of tables interlinked to define the entities in your business and their relationships (usually depicted in an Entity-Relationship Diagram).  You just have to look at an Entity-Relationship Diagram to know that it&#8217;s not an easy job trying to get some queries and answers out of it.  (You can see an example in the article linked to at the end of this post).</p>
<p>To design a good Data Warehouse that actually answers user queries effectively you need to do what is known as <strong>Dimensional Modeling</strong>.   <span id="more-15"></span></p>
<p>In a DW, the data is kept in a standard, intuitive structure that facilitates high-performance access.  This usually includes a <em>Fact Table </em> containing aggregatable facts (such as the sum of sales amounts) linked to one or more smaller <em>Dimension Tables</em> containing descriptive information (such as time, region, and product, by which you may want to analyze the facts) in a star-like structure called a <em>Star Schema</em>.   This is exactly how your application database is <em>not structured</em>.  </p>
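<p>As a rough illustration of that structure, the sketch below builds a tiny star schema in an in-memory SQLite database using Python&#8217;s standard sqlite3 module.  The table names, column names, and sample rows are all hypothetical, chosen only to mirror the time, region, and product dimensions mentioned above; a real warehouse would have far richer dimensions.</p>

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# One central fact table linked to three small dimension tables.
conn.executescript("""
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_region  (region_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE fact_sales (
    time_id      INTEGER REFERENCES dim_time(time_id),
    region_id    INTEGER REFERENCES dim_region(region_id),
    product_id   INTEGER REFERENCES dim_product(product_id),
    sales_amount REAL
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                 [(1, 2005, 10), (2, 2005, 11)])
conn.executemany("INSERT INTO dim_region VALUES (?, ?)",
                 [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 100.0),
    (1, 2, 1, 50.0),
    (2, 1, 2, 75.0),
    (2, 1, 1, 25.0),
])

# Analyze the facts by one dimension: total sales per region.
sales_by_region = conn.execute("""
    SELECT r.region, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_region r ON r.region_id = f.region_id
    GROUP BY r.region
    ORDER BY r.region
""").fetchall()

print(sales_by_region)
```

<p>Every analytical question follows the same pattern: join the fact table to whichever dimensions you care about and aggregate.  Swapping dim_region for dim_time or dim_product in the query changes the question without changing its shape.</p>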
<p>The process of identifying the facts and dimensions for a business scenario and designing the corresponding Data Warehouse is called Dimensional Modeling.  Ralph Kimball&#8217;s <a href="http://www.amazon.com/exec/obidos/redirect?link_code=ur2&amp;tag=opensourceana-20&amp;camp=1789&amp;creative=9325&amp;path=tg/detail/-/0471200247/qid=1131912654/sr=2-1/ref=pd_bbs_b_2_1?v=glance%26s=books">The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling</a><img src="http://www.assoc-amazon.com/e/ir?t=opensourceana-20&amp;l=ur2&amp;o=1" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /> is an excellent text that is frequently referred to by amateurs and professionals alike.  It is about <em>how </em>to actually design and build a repository that will deliver real value to real people.  (If you don&#8217;t have this book yet, please stop wasting your time and go get your copy now).</p>
<p>And while you go get that book, do read this article, <a href="http://www.dbmsmag.com/9708d15.html">an original classic by Ralph Kimball (from way back in August 1997) called the Dimensional Modeling Manifesto.  </a><br />
This is the article that sets the stage (and the manifesto) for the ideas developed by Ralph in his book.  And like good wine, its relevance has only increased with time.  </p>
<p>Read it, and then go get that book.</p>
]]></content:encoded>
			<wfw:commentRss>http://opensourceanalytics.com/2005/11/14/your-app-is-not-your-data-warehouse/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
