22 Jul
Posted by ProCOM
on July 22, 2007 – 12:45 pm - 364 views
If you're new here, you may want to subscribe to my RSS feed. So that you can read the latest updates about Web2.0 tools, Making Money Online, Tips in SEO, Ajax and many more. Thanks for visiting ProgramimiCOM!
Sleek, simple, and efficient, DB2 9 Data Warehouse Edition promises a modern enterprise warehouse without the hassle.
By Bob Rumsby
Analysts across many industries — retail, banking, insurance, healthcare, telecommunications, government — need timely, accurate, and new information about their businesses to help them prosper in the marketplace. Retail companies, for example, collect huge amounts of transactional data about daily sales activity. But simply having that information isn’t enough. They need to know why sales have dropped for a particular brand or store and how to promote new products in the most cost-effective way.
Data warehousing project managers have tried to address these requirements for a long time — at great cost. They have had to manage the purchasing, maintenance, and learning curve of pulling together multiple products from multiple vendors. What if the software infrastructure and tools to solve these kinds of business problems were all under one roof? What if customers could:
DB2 Data Warehouse Edition Version 9.1 (available since mid-2006 on Linux, Unix, and Windows platforms) responds to these questions by providing the tools to build and maintain DB2 enterprise data warehouses. DWE 9.1 also supports analytic structures required for data mining and online analytic processing (OLAP) and the development of embedded analytics for business access on the Web.
DWE 9.1 brings many functional components together and exposes many of their functions through two common client tools: the Design Studio and the Administration Console. DWE components include:
The Eclipse-based Design Studio is an integrated graphical development environment for designing:
The Eclipse Workbench provides some inherent benefits, including import and export capabilities for project management, online update features for new plug-ins, a dynamic help system, and the ability to integrate a ClearCase or CVS-based version control system.
The Administration Console is the DWE Web client, providing a common administration environment for data warehouse applications. The console is hosted by WebSphere Application Server and recognizes data sources and system resources already defined in the WebSphere environment.
Applications and models that warehouse architects have developed and tested in the Design Studio are deployed to WebSphere, where administrators and operators can schedule and monitor jobs on production systems.
These two new tools make key tasks much easier for warehouse architects and administrators. I’ll explain how these tools make some key tasks easier.
RDA-based physical data modeling features provide the metadata foundation for warehouse sources and targets. You can reverse engineer a model from either a DDL script or a database connection. You can also create new database objects, add constraints, and compare the structure of different objects.
After creating or reverse-engineering the model, you can drag and drop tables and other objects into data flows and mining flows. You can also connect to live data sources via JDBC, sample the contents of tables, and deploy generated DDL scripts (see Figure 1).
Data warehouse architects use the SQL Warehousing Tool (SQW) to build and maintain database tables and other analytic structures. Design functions are exposed to users in the Design Studio; deployment and scheduling functions are exposed in the Administration Console. The driving force behind these functions is a SQL code generation system optimized for the DB2 engine.
SQW data flows provide an extensive set of data transformation “operators.” The core transformations correspond to SQL operations, such as table joins, UNION, and GROUP BY. Several operators offer specialized warehouse building capabilities, such as slowly changing dimension updates, key lookups on dimension tables, and fact key replace operations that update surrogate keys in a warehouse fact table (see Figure 2).
Data flows are easy to develop as graphical free-form assemblies of operators, with specific properties defined at each step (see Figure 3). Reusable “subflows” can be created to capture units of work that can be shared across flows; interim results can be staged at different points in a flow; and the generated code can be tested and inspected before production use. You can define variables for properties such as schemas, tables, and files, then supply the correct information at runtime, based on the specific requirements of the job.
You can set properties to take advantage of the Database Partitioning Feature for expensive operations such as select distinct processing and key lookups. SQW can generate bulk loads from flat files or relational sources, using the native DB2 LOAD and MERGE utilities. For example, the code in Listing 1 uses a SQL merge operator to perform an “upsert” operation into a fact table based on two conditions that define the criteria for updating and inserting rows.
Before deployment, data flows are packaged inside control flows, which dictate the sequence in which data flows run and define error handling, iterative processing, and other activities (see Figure 4). You can run mining flows, SQL scripts, and DB2 stored procedures; embed WebSphere DataStage ETL jobs that are run by a DataStage server; and invoke DB2 utilities such as REORG and RUNSTATS.
Control flows are deployed inside data warehouse applications on the WebSphere Application Server machine, making them visible to the Administration Console for scheduling. Role-based users can see and work with applications in different ways. Each control flow becomes a manageable process, and each process instance that you schedule is equivalent to a job (see Figure 5).
Who benefits from using SQW? Here’s a short list:
SQW supports a full life cycle of warehouse building tasks, from physical data modeling through to the scheduling and monitoring of repeatable processes that maintain the performance and business access layers of the warehouse. Data mining and multidimensional cube objects, as well as any other in-database structures, can be reliably staged and maintained.
Data mining discovers patterns and unexpected insights that are hidden in large data sets. You can interactively create and visualize a mining model in the Design Studio to gain valuable insights about the data in your organization’s warehouse. You can generate SQL code to compute a mining model or to deploy the model’s related scoring function.
DWE 9.1’s Intelligent Miner component offers a framework for building and visualizing data mining models. These models can make use of several different data mining functions:
The integration of data modeling, SQL-based data flows, and mining flows in a common data warehouse project makes data mining in the Design Studio less of a technological challenge. Users who are familiar with data flows use the same drag-and-drop interface to build mining flows without writing code or learning complex statistical algorithms (see Figure 6).
For preprocessing steps, mining flows use many of the same operators as data flows, such as table sources, order by operators, and joins (see Figure 7).
The interactive flow editor provides stepwise execution, and data exploration features present an overview of the content and quality of your data. You can work from design through to the development of embedded applications that produce reports based on mining models.
You can use Cube Views OLAP modeling features directly in the Design Studio to create metadata that describes your data in a multidimensional model or cube model. This metadata can also be used to create recommendations for materialized query tables, which contain aggregated data for query optimization.
In the Design Studio, the foundation for a cube model can be the same physical data model that you use to build data flows and mining flows. Based on the physical model, you can define all of the OLAP objects you need, including fact and dimension objects, measures, and hierarchies. You can also import and export OLAP metadata.
DB2 Alphablox is a custom development platform for BI applications that are visible to business executives and analysts through web pages and portals. You can create customized analytic components that are embedded into existing business processes and web applications. Business users depend on these portals and can take real actions based on the analysis of large volumes of warehouse data.
Developers can assemble applications by customizing modular, reusable Blox components that provide solutions for data access, user interfaces, HTML forms, business logic, and more. The DB2 Alphablox programming model consists of:
Developers don’t need to know the technical details behind the Blox components — just the syntax and functionality that they provide. Developers can build a DB2 Alphablox cube based on Cube Views metadata and use the cube as the foundation for building and viewing custom reports. They can also generate dynamic, interactive reports from relational data sources — for example, an application that invokes a data mining model, extracts data from the model, and displays reports in a portal.
A DWE 9.1.1 refresh available this quarter upgrades the DWE package to DB2 9 and addresses requests from early DWE 9.1 adopters. DB2 9 includes a number of critical enhancements in its role as the underlying engine for DWE, including self-tuning memory management, range partitioning, and row compression. To learn more about these features, search for “Software Announcement 206-128” on ibm.com.
DWE 9.1 removes complexity as a barrier to enterprise data warehousing. Measurable benefits include time to value, managed cost for warehouse-based analytics, and avoidance of multiple purchase justifications for BI tools. In this regard, DWE 9.1 is closely aligned with the IBM Information On Demand initiative, providing customers with reliable information that is readily available and easy to manage throughout the data warehousing life cycle. Customers can use this information to make fast, strategic business decisions and sustain success in the marketplace.
MERGE INTO DB2ADMIN.NEW_PRCHS_PROFILE
USING (
SELECT
STR_IP_ID AS STR_IP_ID,
PD_ID AS PD_ID,
TIME_ID AS TIME_ID,
NMBR_OF_MRKT_BSKTS AS NMBR_OF_MRKT_BSKTS,
NUMBER_OF_ITEMS AS NUMBER_OF_ITEMS,
PRDCT_BK_PRC_AMUNT AS PRDCT_BK_PRC_AMUNT,
CST_OF_GDS_SLD_CGS AS CST_OF_GDS_SLD_CGS,
SALES_AMOUNT AS SALES_AMOUNT
FROM
DB2ADMIN.STAGE_PRCHS_PROFILE INPUT_051) INPUT_01
ON (DB2ADMIN.NEW_PRCHS_PROFILE.STR_IP_ID = INPUT_01.STR_IP_ID)
WHEN MATCHED AND DB2ADMIN.NEW_PRCHS_PROFILE.SALES_AMOUNT > 10000 THEN
UPDATE SET
(STR_IP_ID, PD_ID, TIME_ID, NMBR_OF_MRKT_BSKTS, NUMBER_OF_ITEMS, PRDCT_BK_PRC_AMUNT, CST_OF_GDS_SLD_CGS, SALES_AMOUNT)
= (INPUT_01.STR_IP_ID, INPUT_01.PD_ID, INPUT_01.TIME_ID, INPUT_01.NMBR_OF_MRKT_BSKTS, INPUT_01.NUMBER_OF_ITEMS,
INPUT_01.PRDCT_BK_PRC_AMUNT, INPUT_01.CST_OF_GDS_SLD_CGS, INPUT_01.SALES_AMOUNT)
WHEN NOT MATCHED THEN
INSERT
(STR_IP_ID, PD_ID, TIME_ID, NMBR_OF_MRKT_BSKTS, NUMBER_OF_ITEMS, PRDCT_BK_PRC_AMUNT, CST_OF_GDS_SLD_CGS, SALES_AMOUNT)
VALUES (INPUT_01.STR_IP_ID, INPUT_01.PD_ID, INPUT_01.TIME_ID, INPUT_01.NMBR_OF_MRKT_BSKTS, INPUT_01.NUMBER_OF_ITEMS, INPUT_01.PRDCT_BK_PRC_AMUNT, INPUT_01.CST_OF_GDS_SLD_CGS, INPUT_01.SALES_AMOUNT)
ELSE IGNORE
| |
22 Jul
Posted by ProCOM
on July 22, 2007 – 12:33 pm - 767 views
Combining new technologies with traditional business intelligence techniques can fulfil the promise of positive business outcomes.
By Michael L. Gonzales
Powerful transaction-oriented information systems are commonplace in every major industry, effectively leveling the playing field. Getting ahead of the competition now requires analysis-oriented systems that can revolutionize an organization’s ability to use information it already owns. These analytic systems derive insight from available data and deliver information that’s conclusive, fact-based, and actionable.
| |
Business intelligence (BI) improves corporate performance in any information-intensive industry. Most companies have the raw data and the people knowledge BI requires. Operational systems generate vast quantities of product, customer, and market data from point-of-sale, reservations, customer service, and technical support systems. But the challenge is to extract and exploit this data, transforming it into information and actionable insight.
Many companies take advantage of only a small fraction of their data for strategic analysis, let alone tactical applications. The remaining untapped data — often combined with data from external sources such as government reports, trade associations, analysts, the Internet, and purchased information — is a gold mine waiting to be explored, refined, and shaped into information. This knowledge can be applied in a number of ways, ranging from charting overall corporate strategy to communicating personally with vendors, suppliers, and customers through call centers, kiosks, billing statements, the Internet, and other touch points that facilitate genuine one-to-one marketing on an unprecedented scale.
Today’s business climate requires the BI environment to evolve beyond the implementation of traditional data warehousing tools and techniques. A fusion of traditional and advanced technologies is necessary to support a broad analytical landscape and serve up a rich blend of historical trending, real-time reporting, and predictive analytics. Finally, the overall environment must improve the knowledge and performance of the enterprise as a whole, ensuring that actions taken as a result of analysis are fed back into the environment.
Every BI environment, to achieve real and significant impact, must be designed and built in the context of the business value that it provides. The value connection is expressed in terms of business capabilities that will be provided or improved through intelligence. Figure 1 shows the BI value chain, which turns data into information. To make this transformation happen, BI architects need to fulfill these requirements:
How actionable your information is depends on the form in which it’s provided and the kinds of information services or business information capabilities that are offered. It’s critical to match the information services with the needs, knowledge, and abilities of information consumers.
Getting business value from BI requires a view of business domains that are within the scope of the BI effort. Value-creation opportunities may arise from many business domains, including strategy and planning, financial management, research and development, marketing, sales customer support, operations, human resources, information systems, and corporate governance.
When most companies think about using BI to support better business decisions, they’re usually considering it from the perspective of strategic decision-making. However, as the BI space matures in technique and technology (and user demands grow), BI continues to evolve. Today, there is significant attention and interest in supporting tactical decision-making as well. But strategic and tactical aren’t the only types of decisions made in an organization. Many argue that there’s a third type: operational decision-making. Table 1 shows definitions and examples for the three categories.
The end game for BI isn’t simply exposing actionable information and insight. It’s also ensuring that action is taken to improve business performance. The challenge is not only what information to combine with what knowledge, but also how to ensure action.
For strategic and tactical decisions, any action taken requires human intervention. Curiously, there may be considerably less effort on the BI team to service these types of decisions. The BI team may need to build a data store and install tools that allow users to perform their own research for insight (an OLAP cube implemented for a subject matter expert, for example). The BI team often focuses only on providing the cube and tools and leaves the user to find any actionable insight.
Operational decisions can be, and often are, automated. Figure 2 shows the direct relationship between the decision category and the amount of human intervention required. Because these decisions are often automated, more research and implementation work rests on the shoulders of the BI team.
Strategic decisions (such as those regarding expansion) have broad implications for the direction of the organization as a whole. Answers to these types of questions are rarely derived from a purely automated environment. Even when building simulation models, subject matter experts must still evaluate the results and formulate actions to be taken. Tactical decisions are focused on managing processes, such as evaluating and establishing the level of risk the organization is willing to assume for specific loan products.
Operational decisions, however, are the most fundamental. They address individual transactions (such as whether a loan is approved or not). And, they likely represent the highest number of decisions made on a day-to-day basis. It’s precisely for that reason that operational decision-making can and should be targeted for automation.
In order to provide BI value, business architects must understand the kinds of decisions made in organizations, including strategic, tactical, and operational.
Each category provides clues as to the type of action process that’s feasible. Strategic and tactical decisions are often best suited with some human intervention. Once a decision has been made, it’s possible that the action process is a composite of several disparate adjustments to operations. On the other hand, operational decisions can often be fully automated and the subsequent actions made part of an inline process.