Friday, June 5, 2009

Data Warehouse Overview

In this post I want to briefly describe what I mean by ultra-light ultra-useful data warehouse.

Our data warehouse development effort came about because our internal IT has largely failed to create a truly easy-to-use integrated data warehouse. We have numerous systems spanning patient encounters, claims, referrals, scheduling, prescriptions, labs, radiology, membership, CRM, provider credentialing, general ledger, payroll, HR and so on. We also have many different "data warehouses." We have such database/reporting technologies as Oracle, SQL Server, DB2 on Unix/mainframes, Sybase, and even a large Teradata complex with Chrystal Enterprise. Yet, in the end, reporting analysts were using Paradox and Access to cobble together integrated data sets from these source systems and existing "data warehouses" and e-mailing PDFs to the business users.

So we have created an externally-hosted SQL Server-based platform to remedy the problem. Here are its main characteristics:

1. Total database size: 480Gb
2. One production database server with SQL Server 2005 Enterprise (migrating to 2008), one production report server with Sharepoint integration, and one development DB server.
3. ETLs use Integration Services and stored procedures.
4. 2.5 FTEs manage the entire solution, excluding server and software management, which is done by the hosting provider.
5. RAD methodology is used to build dimensional models and ETLs quickly and incrementally. All ETLs are framework-driven and highly automated.

In the end, the degree of integration and performance we have achieved makes this solution highly useful, as it quickly eliminated the need for Paradox-based data integration and reporting, provided consistency among analysts, delivered Web-based self-service reports, and provided a way for a rapid response to business needs. The fact that we are building this on a SQL Server platform with 2.5 FTEs and an external hosting vendor makes this an ultra-light solution in terms of the cost to the organization.

In the upcoming posts I will document our ongoing issues and lessons learned, as well as review some interesting designs, such as our highly automated single-code-based "uber-ETL" process, technical server architecture with disk layouts, and so on.

Stay tuned.

No comments: