Quantcast

DB2 9 Data Warehouse Edition: Information Under One Roof

(No Ratings Yet)
Loading ... Loading ...

If you're new here, you may want to subscribe to my RSS feed. So that you can read the latest updates about Web2.0 tools, Making Money Online, Tips in SEO, Ajax and many more. Thanks for visiting ProgramimiCOM!

Sleek, simple, and efficient, DB2 9 Data Warehouse Edition promises a modern enterprise warehouse without the hassle.
By Bob Rumsby

Analysts across many industries — retail, banking, insurance, healthcare, telecommunications, government — need timely, accurate, and new information about their businesses to help them prosper in the marketplace. Retail companies, for example, collect huge amounts of transactional data about daily sales activity. But simply having that information isn’t enough. They need to know why sales have dropped for a particular brand or store and how to promote new products in the most cost-effective way.

Data warehousing project managers have tried to address these requirements for a long time — at great cost. They have had to manage the purchasing, maintenance, and learning curve of pulling together multiple products from multiple vendors. What if the software infrastructure and tools to solve these kinds of business problems were all under one roof? What if customers could:

  • Adopt a single design and deployment tool for data warehouse and OLAP development
  • Eliminate custom-developed SQL scripts that maintain warehouse tables
  • Simplify and accelerate the discovery of new business insights from data mining
  • Create interactive portals, with custom reports based on relational warehouse data.

DB2 Data Warehouse Edition Version 9.1 (available since mid-2006 on Linux, Unix, and Windows platforms) responds to these questions by providing the tools to build and maintain DB2 enterprise data warehouses. DWE 9.1 also supports analytic structures required for data mining and online analytic processing (OLAP) and the development of embedded analytics for business access on the Web.

DWE 9.1 brings many functional components together and exposes many of their functions through two common client tools: the Design Studio and the Administration Console. DWE components include:

  • DB2 UDB Enterprise Server
  • Database Partitioning Feature (DPF)
  • WebSphere Application Server
  • SQL Warehousing Tool
  • Rational Data Architect (RDA) physical data modeling
  • Intelligent Miner
  • Cube Views (OLAP)
  • DB2 Alphablox
  • DB2 Query Patroller.

DWE Design Studio

The Eclipse-based Design Studio is an integrated graphical development environment for designing:

  • Physical data models, including DDL and SQL generation and deployment features, JDBC connectivity to data sources and targets, comparison and synchronization of database objects, and basic metadata support for modeling SQL-based data flows and mining flows (functions derive from the RDA tool set)
  • SQL-based flows, including both data flows that perform transformation functions inside the DB2 database and mining flows that perform easy data mining discovery and visualization
  • Control flows, which sequence data flows and mining flows, integrate external commands, programs, and stored procedures, and provide conditional processing logic
  • OLAP cube models, including recommendations for materialized query tables (MQTs).

The Eclipse Workbench provides some inherent benefits, including import and export capabilities for project management, online update features for new plug-ins, a dynamic help system, and the ability to integrate a ClearCase or CVS-based version control system.

DWE Administration Console

The Administration Console is the DWE Web client, providing a common administration environment for data warehouse applications. The console is hosted by WebSphere Application Server and recognizes data sources and system resources already defined in the WebSphere environment.

Applications and models that warehouse architects have developed and tested in the Design Studio are deployed to WebSphere, where administrators and operators can schedule and monitor jobs on production systems.

These two new tools make key tasks much easier for warehouse architects and administrators. I’ll explain how these tools make some key tasks easier.

Physical Data Modeling

RDA-based physical data modeling features provide the metadata foundation for warehouse sources and targets. You can reverse engineer a model from either a DDL script or a database connection. You can also create new database objects, add constraints, and compare the structure of different objects.

dbt11q4_f3_fig1.jpg

After creating or reverse-engineering the model, you can drag and drop tables and other objects into data flows and mining flows. You can also connect to live data sources via JDBC, sample the contents of tables, and deploy generated DDL scripts (see Figure 1).

SQL Warehousing Tool

Data warehouse architects use the SQL Warehousing Tool (SQW) to build and maintain database tables and other analytic structures. Design functions are exposed to users in the Design Studio; deployment and scheduling functions are exposed in the Administration Console. The driving force behind these functions is a SQL code generation system optimized for the DB2 engine.

SQW data flows provide an extensive set of data transformation “operators.” The core transformations correspond to SQL operations, such as table joins, UNION, and GROUP BY. Several operators offer specialized warehouse building capabilities, such as slowly changing dimension updates, key lookups on dimension tables, and fact key replace operations that update surrogate keys in a warehouse fact table (see Figure 2).

dbt11q4_f3_fig2.jpg

Data flows are easy to develop as graphical free-form assemblies of operators, with specific properties defined at each step (see Figure 3). Reusable “subflows” can be created to capture units of work that can be shared across flows; interim results can be staged at different points in a flow; and the generated code can be tested and inspected before production use. You can define variables for properties such as schemas, tables, and files, then supply the correct information at runtime, based on the specific requirements of the job.

dbt11q4_f3_fig3.jpg

You can set properties to take advantage of the Database Partitioning Feature for expensive operations such as select distinct processing and key lookups. SQW can generate bulk loads from flat files or relational sources, using the native DB2 LOAD and MERGE utilities. For example, the code in Listing 1 uses a SQL merge operator to perform an “upsert” operation into a fact table based on two conditions that define the criteria for updating and inserting rows.

Before deployment, data flows are packaged inside control flows, which dictate the sequence in which data flows run and define error handling, iterative processing, and other activities (see Figure 4). You can run mining flows, SQL scripts, and DB2 stored procedures; embed WebSphere DataStage ETL jobs that are run by a DataStage server; and invoke DB2 utilities such as REORG and RUNSTATS.

dbt11q4_f3_fig4.jpg

Control flows are deployed inside data warehouse applications on the WebSphere Application Server machine, making them visible to the Administration Console for scheduling. Role-based users can see and work with applications in different ways. Each control flow becomes a manageable process, and each process instance that you schedule is equivalent to a job (see Figure 5).

dbt11q4_f3_fig5.jpg

Who benefits from using SQW? Here’s a short list:

  • Existing DB2 customers who build warehouses with hand-coded SQL and stored procedures
  • New DB2 customers who have database skills and data sources that tend to be flat files and relational tables
  • Customers building a layered warehouse architecture, with a conventional ETL system populating tables from external sources at the detail level
  • Customers who want to automate tedious DBA tasks for warehouse maintenance.

SQW supports a full life cycle of warehouse building tasks, from physical data modeling through to the scheduling and monitoring of repeatable processes that maintain the performance and business access layers of the warehouse. Data mining and multidimensional cube objects, as well as any other in-database structures, can be reliably staged and maintained.

Integrated Data Mining

Data mining discovers patterns and unexpected insights that are hidden in large data sets. You can interactively create and visualize a mining model in the Design Studio to gain valuable insights about the data in your organization’s warehouse. You can generate SQL code to compute a mining model or to deploy the model’s related scoring function.

DWE 9.1’s Intelligent Miner component offers a framework for building and visualizing data mining models. These models can make use of several different data mining functions:

  • Associations. What are the affinities in the data? Which products sell together? What combination of items does your supermarket basket contain?
  • Sequences. What kinds of sequential patterns occur? What will customers buy next, given information about what they are buying now?
  • Clustering. Which groups or segments appear in the data? What kinds of similarities exist? What are my customer profiles?
  • Classification. How is the data classified and what categorical predictions can be made based on those classifications? Which types of customers are likely to default on a loan? What is the likely outcome of this treatment?
  • Regression analysis. How is the data classified and what numeric predictions can be made? How much money will each customer spend this year? How likely is it, in numeric probability terms, that a customer will buy this item?

The integration of data modeling, SQL-based data flows, and mining flows in a common data warehouse project makes data mining in the Design Studio less of a technological challenge. Users who are familiar with data flows use the same drag-and-drop interface to build mining flows without writing code or learning complex statistical algorithms (see Figure 6).

dbt11q4_f3_fig6.jpg

For preprocessing steps, mining flows use many of the same operators as data flows, such as table sources, order by operators, and joins (see Figure 7).

dbt11q4_f3_fig7.jpg

The interactive flow editor provides stepwise execution, and data exploration features present an overview of the content and quality of your data. You can work from design through to the development of embedded applications that produce reports based on mining models.

OLAP Modeling

You can use Cube Views OLAP modeling features directly in the Design Studio to create metadata that describes your data in a multidimensional model or cube model. This metadata can also be used to create recommendations for materialized query tables, which contain aggregated data for query optimization.

In the Design Studio, the foundation for a cube model can be the same physical data model that you use to build data flows and mining flows. Based on the physical model, you can define all of the OLAP objects you need, including fact and dimension objects, measures, and hierarchies. You can also import and export OLAP metadata.

DB2 Alphablox

DB2 Alphablox is a custom development platform for BI applications that are visible to business executives and analysts through web pages and portals. You can create customized analytic components that are embedded into existing business processes and web applications. Business users depend on these portals and can take real actions based on the analysis of large volumes of warehouse data.

Developers can assemble applications by customizing modular, reusable Blox components that provide solutions for data access, user interfaces, HTML forms, business logic, and more. The DB2 Alphablox programming model consists of:

  • An extensive set of JavaServer Pages (JSP) custom tag libraries
  • Server-side APIs to Blox components based on JavaBeans technology
  • Server-side access to the Blox user interface model
  • JavaScript methods to invoke server-side APIs from web browsers.

Developers don’t need to know the technical details behind the Blox components — just the syntax and functionality that they provide. Developers can build a DB2 Alphablox cube based on Cube Views metadata and use the cube as the foundation for building and viewing custom reports. They can also generate dynamic, interactive reports from relational data sources — for example, an application that invokes a data mining model, extracts data from the model, and displays reports in a portal.

DB2 9 and DWE

A DWE 9.1.1 refresh available this quarter upgrades the DWE package to DB2 9 and addresses requests from early DWE 9.1 adopters. DB2 9 includes a number of critical enhancements in its role as the underlying engine for DWE, including self-tuning memory management, range partitioning, and row compression. To learn more about these features, search for “Software Announcement 206-128” on ibm.com.

Intelligence On Demand

DWE 9.1 removes complexity as a barrier to enterprise data warehousing. Measurable benefits include time to value, managed cost for warehouse-based analytics, and avoidance of multiple purchase justifications for BI tools. In this regard, DWE 9.1 is closely aligned with the IBM Information On Demand initiative, providing customers with reliable information that is readily available and easy to manage throughout the data warehousing life cycle. Customers can use this information to make fast, strategic business decisions and sustain success in the marketplace.


LISTING 1. SQL merge code generated from a data flow.

MERGE INTO DB2ADMIN.NEW_PRCHS_PROFILE
USING (
SELECT
STR_IP_ID AS STR_IP_ID,
PD_ID AS PD_ID,
TIME_ID AS TIME_ID,
NMBR_OF_MRKT_BSKTS AS NMBR_OF_MRKT_BSKTS,
NUMBER_OF_ITEMS AS NUMBER_OF_ITEMS,
PRDCT_BK_PRC_AMUNT AS PRDCT_BK_PRC_AMUNT,
CST_OF_GDS_SLD_CGS AS CST_OF_GDS_SLD_CGS,
SALES_AMOUNT AS SALES_AMOUNT
FROM
DB2ADMIN.STAGE_PRCHS_PROFILE INPUT_051) INPUT_01
ON (DB2ADMIN.NEW_PRCHS_PROFILE.STR_IP_ID = INPUT_01.STR_IP_ID)
WHEN MATCHED AND DB2ADMIN.NEW_PRCHS_PROFILE.SALES_AMOUNT > 10000 THEN
UPDATE SET
(STR_IP_ID, PD_ID, TIME_ID, NMBR_OF_MRKT_BSKTS, NUMBER_OF_ITEMS, PRDCT_BK_PRC_AMUNT, CST_OF_GDS_SLD_CGS, SALES_AMOUNT)
= (INPUT_01.STR_IP_ID, INPUT_01.PD_ID, INPUT_01.TIME_ID, INPUT_01.NMBR_OF_MRKT_BSKTS, INPUT_01.NUMBER_OF_ITEMS,
INPUT_01.PRDCT_BK_PRC_AMUNT, INPUT_01.CST_OF_GDS_SLD_CGS, INPUT_01.SALES_AMOUNT)
WHEN NOT MATCHED THEN
INSERT
(STR_IP_ID, PD_ID, TIME_ID, NMBR_OF_MRKT_BSKTS, NUMBER_OF_ITEMS, PRDCT_BK_PRC_AMUNT, CST_OF_GDS_SLD_CGS, SALES_AMOUNT)
VALUES (INPUT_01.STR_IP_ID, INPUT_01.PD_ID, INPUT_01.TIME_ID, INPUT_01.NMBR_OF_MRKT_BSKTS, INPUT_01.NUMBER_OF_ITEMS, INPUT_01.PRDCT_BK_PRC_AMUNT, INPUT_01.CST_OF_GDS_SLD_CGS, INPUT_01.SALES_AMOUNT)
ELSE IGNORE


The author thanks Qi Jin, senior manager of DB2 Data Warehouse Edition, for his technical assistance.


Bob Rumsby has worked as an editor, technical writer, course developer, and trainer in Silicon Valley for 16 years. He is currently an information development team lead in the User Technology group at IBM SVL in San Jose, Calif.

The New Data Warehouse: From Information to Action

(No Ratings Yet)
Loading ... Loading ...

Combining new technologies with traditional business intelligence techniques can fulfil the promise of positive business outcomes.
By Michael L. Gonzales

Powerful transaction-oriented information systems are commonplace in every major industry, effectively leveling the playing field. Getting ahead of the competition now requires analysis-oriented systems that can revolutionize an organization’s ability to use information it already owns. These analytic systems derive insight from available data and deliver information that’s conclusive, fact-based, and actionable.

Business intelligence (BI) improves corporate performance in any information-intensive industry. Most companies have the raw data and the people knowledge BI requires. Operational systems generate vast quantities of product, customer, and market data from point-of-sale, reservations, customer service, and technical support systems. But the challenge is to extract and exploit this data, transforming it into information and actionable insight.

Many companies take advantage of only a small fraction of their data for strategic analysis, let alone tactical applications. The remaining untapped data — often combined with data from external sources such as government reports, trade associations, analysts, the Internet, and purchased information — is a gold mine waiting to be explored, refined, and shaped into information. This knowledge can be applied in a number of ways, ranging from charting overall corporate strategy to communicating personally with vendors, suppliers, and customers through call centers, kiosks, billing statements, the Internet, and other touch points that facilitate genuine one-to-one marketing on an unprecedented scale.
Today’s business climate requires the BI environment to evolve beyond the implementation of traditional data warehousing tools and techniques. A fusion of traditional and advanced technologies is necessary to support a broad analytical landscape and serve up a rich blend of historical trending, real-time reporting, and predictive analytics. Finally, the overall environment must improve the knowledge and performance of the enterprise as a whole, ensuring that actions taken as a result of analysis are fed back into the environment.

Business Value

Every BI environment, to achieve real and significant impact, must be designed and built in the context of the business value that it provides. The value connection is expressed in terms of business capabilities that will be provided or improved through intelligence. Figure 1 shows the BI value chain, which turns data into information. To make this transformation happen, BI architects need to fulfill these requirements:

fig1.jpg

  • Align information with the knowledge of the individuals or work groups to whom it’s provided. The information-to-knowledge connection is one that most IT people find difficult. Knowledge is unique to an individual, the product of personal experience, recall, instincts, and beliefs. When information consumers are at the executive level, it is important to align one-to-one information links. When providing information to larger groups, profile the knowledge of the target groups relative to the business, the information subjects, and skill level. This profiling is the basis for customized actionable information products and services.
  • Combine knowledge and information used to take action. The term “actionable information” is pervasive throughout BI literature. But what companies really need is not actionable information, but actionable insight. Action is a process of doing something. All too often, BI architects look only at the event and not the activities and behaviors that lead to the event. Any combination of insight, resolve, decision, and innovation may drive a person to act. Information is actionable and promotes insight when it supports the entire process of taking action. It’s the essential bridge from integrated data to positive business outcomes — the promise of BI.
  • Enable informed actions that lead to positive outcomes. Favorable business outcomes are generally those that reduce cost, save time, optimize resources, increase revenue, satisfy customers, or otherwise help to fulfill business missions and goals.

How actionable your information is depends on the form in which it’s provided and the kinds of information services or business information capabilities that are offered. It’s critical to match the information services with the needs, knowledge, and abilities of information consumers.

Getting business value from BI requires a view of business domains that are within the scope of the BI effort. Value-creation opportunities may arise from many business domains, including strategy and planning, financial management, research and development, marketing, sales customer support, operations, human resources, information systems, and corporate governance.

The Art of Decision-Making

When most companies think about using BI to support better business decisions, they’re usually considering it from the perspective of strategic decision-making. However, as the BI space matures in technique and technology (and user demands grow), BI continues to evolve. Today, there is significant attention and interest in supporting tactical decision-making as well. But strategic and tactical aren’t the only types of decisions made in an organization. Many argue that there’s a third type: operational decision-making. Table 1 shows definitions and examples for the three categories.

dbt11q4_biz_table1.jpg

The end game for BI isn’t simply exposing actionable information and insight. It’s also ensuring that action is taken to improve business performance. The challenge is not only what information to combine with what knowledge, but also how to ensure action.

For strategic and tactical decisions, any action taken requires human intervention. Curiously, there may be considerably less effort on the BI team to service these types of decisions. The BI team may need to build a data store and install tools that allow users to perform their own research for insight (an OLAP cube implemented for a subject matter expert, for example). The BI team often focuses only on providing the cube and tools and leaves the user to find any actionable insight.

Operational decisions can be, and often are, automated. Figure 2 shows the direct relationship between the decision category and the amount of human intervention required. Because these decisions are often automated, more research and implementation work rests on the shoulders of the BI team.

dbt11q4_biz_fig2.jpg

Strategic decisions (such as those regarding expansion) have broad implications for the direction of the organization as a whole. Answers to these types of questions are rarely derived from a purely automated environment. Even when building simulation models, subject matter experts must still evaluate the results and formulate actions to be taken. Tactical decisions are focused on managing processes, such as evaluating and establishing the level of risk the organization is willing to assume for specific loan products.

Operational decisions, however, are the most fundamental. They address individual transactions (such as whether a loan is approved or not). And, they likely represent the highest number of decisions made on a day-to-day basis. It’s precisely for that reason that operational decision-making can and should be targeted for automation.

Valuable Insight

In order to provide BI value, business architects must understand the kinds of decisions made in organizations, including strategic, tactical, and operational.

Each category provides clues as to the type of action process that’s feasible. Strategic and tactical decisions are often best suited with some human intervention. Once a decision has been made, it’s possible that the action process is a composite of several disparate adjustments to operations. On the other hand, operational decisions can often be fully automated and the subsequent actions made part of an inline process.


Content for this article was adapted from BI Strategy: How to Create and Document , by Michael Gonzales and David Wells, HandsOn-Press, 2006.


Michael L. Gonzales has been a BI data architecture and solutions strategist for more than a decade. He teaches a series of courses internationally through HandsOn-BI LLC and has written books on data warehousing and business intelligence.