Presents quality processes for data warehousing from a multiplicity of perspectives, including the business, user, architect, and administrator. Includes methods and strategies for improving such data warehouses, and covers such topics as the managed query environment, analysis, administration of the IDSE, and managing dimension and event data. The accompanying CD-ROM contains schematics, code, and movable objects. Annotation c. by Book News, Inc., Portland, Or.
"synopsis" may belong to another edition of this title.
Nielson Debevoise is a data warehousing consultant for Blue Ridge NCA Corporation, residing in Blue Ridge, VA.
With The Data Warehouse Method, enterprises can transcend yesterday's haphazard approaches to data warehousing and achieve unprecedented strategic gains. Leading consultant Tom Debevoise synthesizes the entire field's experience into a comprehensive methodology that delivers consistency, value, and above all, quality.
Avoiding generalities, Debevoise demonstrates enterprise-class techniques in the context of 64-bit multiprocessing UNIX servers, the Oracle 8 OORDBMS, and the Business Objects Managed Query Environment. Whether you choose identical technologies or not, the resulting methodology is refreshingly specific and practical. Coverage includes:
* Strategy, analysis, design, deployment, and discovery
* Establishing and leveraging an integrated data warehouse support environment
* Use cases, object data modeling, and multidimensional analysis with UML
* Prototyping, converting logical models into physical data structures, and testing
* Cognitive systems administration and performance management techniques
* Why database independence and denormalization no longer make sense
The Data Warehouse Method presents the quality data warehouse from every key perspective: the business, the user, the architect, and the administrator. It previews the evolution of data warehouses into business knowledge repositories that integrate the entire enterprise. Best of all, it offers a path you can follow today to achieve maximum results with maximum efficiency.
CD-ROM INCLUDED
The accompanying CD-ROM contains data warehousing schematics, code, and movable objects that demonstrate exactly how to construct and customize an enterprise aggregate management strategy that massively improves the environment's performance.
FOREWORD
Truisms abound in information technology. The project is perpetually over budget, behind schedule, mired in requirements, and a victim of politics. In response, I have developed my own truism: six of the right people can do more than sixty. I have worked on many IT projects, large and small, and have experienced both truisms. I have found that quality is the single point of failure for any project, where quality is the gap between what a team could achieve and what it did achieve.
The theme of this text is quality processes for the data warehouse. I view this opportunity as a chance to state not what is but what could be. Methodology should control the process of creation. It should be planned with project management and created with discipline.
The data warehouse project must contribute to the performance of an organization. This performance should be measurable. The time to integrate for its own sake has passed. It is time for each organization to examine its IT development processes and find which contribute to the health of the organization and which do not.
For the manager or executive, the methodology that I describe is constructed to meet the strategic objectives of your organization and create a measurable result. Most IT objectives are accomplished in cycles. I suggest that you should use the project cycle to achieve business results for your organization.
Not every project cycle will be strictly for new developments. Your infrastructure must be maintained. Upward-moving events and technology require re-hosting and re-scripting operational systems. The Intel-based workstation and operating system have a very short productive life. Because most organizations have not made this distinction, a challenge for senior management is to separate the infrastructure projects from the business results projects.
The recent business process reengineering fad, while somewhat defunct, was useful in pointing out the age of the processes that are ingrained in today's operational systems. Many legacy systems were implemented by moving paper-based processes onto databases and screens. Somewhere in the jumble of prompts and fields resides the business knowledge of the enterprise. With or without the enabling technology of data warehousing, managers make the decisions that keep the business afloat. Or not.
Many technical books are written to fill a basic human need: to make the impenetrable understandable. With good prose, just about any technical topic can be illuminated. From subatomic physics to the World Wide Web, there are books that beautifully explain their topics; however, these are not likely to make the reader a physicist or even a database designer.
To build the data warehouse requires a broad range of technical disciplines. Adding maturity and capability to the data warehouse team requires stretching their capabilities and challenging each member to grow. Despite the best efforts of the self-help guides, the construction of the data warehouse remains a challenging undertaking. Success requires both a capable team and a group of users willing to change their daily activities.
At the heart of the data warehouse management environment is the discipline of quality. Taken in isolation, quality is the gap between capability and performance. Quality is either high, with a minimal gap, or low, with larger gaps. A quality data warehouse serves the strategic intent of the organization, is created with the best available data, and is achieved at an optimal rate. Both the data available and the rates of implementation are highly dependent on your organization. If your organization has older, less integrated systems and less technical acumen, you can still achieve a quality data warehouse by promoting consistent methods in its creation.
Collectively, the methods that I discuss in this book enable the implementation and maintenance of a quality environment. It is intuitive that the strategic directions taken in the early phase of a project will sway the technical architecture and ultimately the quality and the performance of the system. Beyond a discussion of the activities and personnel that create the data warehouse, there are technical design approaches that should be taken in order to create a high-performance data warehouse. Since my audience in this text is the project implementers, I will need to be very explicit in my description of these technical implementations.
In choosing to describe the specific nature of integrated environments, I can focus on how the environment can be integrated and managed to provide a true solution. My discussions include UNIX servers, relational database management systems (RDBMS), and several managed query environments. I hope that by shifting the focus from a generic treatment to specific solutions, a model of the characteristics and capabilities of the quality data warehouse environment will emerge.
Reuse is an object-oriented concept that makes the efforts of one project available to another; however, it is the use of the product, not its features, that promotes this. Often, I find today's corporate environment a dizzying array of software products with similar capabilities, many of which are object-oriented (OO). On more than one occasion, I have been astonished to discover multiple computer-aided software engineering (CASE) tools, multiple user interfaces, even multiple online analytical processing (OLAP) tools and managed query environments (MQE) on the same IT shop floor.
For the past decade, enterprise products, including CASE and RDBMS products, have been sold as a method of unifying operational systems across lines of business. The vendors' ubiquitous argument has been that the legacy of the mainframe is a series of non-communicating, outdated systems. The parallel component of the marketing assault is the position that their tools offer the best "open" or "environment independent" solution. While marketing personnel attack the enterprise from the top, they also have strategies that appeal to the shop-floor programmer. Software is given away for free. The software vendor will develop subtle appeals to the programmers, from who has the best implementation of Microsoft's component object model (COM) to who supports the strongest inheritance. These are designed to promote various intellectual viewpoints and develop agents of opposition within the competitor's camp.
The result is that client-server architectures have promulgated more stove pipes than the mainframes ever created. These organizations must maintain a confusing array of skills and capabilities. They must manage huge deployment issues. By maintaining multiple tools with correspondingly similar objectives, be they client-server GUI tools, MQEs, reporting tools, or especially CASE tools, these organizations miss the opportunity to promote mature development cycles in their applications deployment.
For instance, adopting database independence can weaken the capabilities of an organization. Databases are no longer merely repositories of data, accessible by standardized structured query language (SQL) and data manipulation language (DML). Databases provide server-side components that have been optimized to perform in a particular environment. These tools include job queues, alerters, and pipes. By now, database designers should be very familiar with these tools. The result of database independence is freedom from applications that scale. My text will show how these components are indispensable for the generation of a quality environment.
With the advent of very large databases (VLDB) and parallel technologies, these non-generic features are required to create world-class, viable data warehouse solutions. By world-class, I am referring to an environment that supports the broad, sweeping strategic decisions of a large enterprise.
In this text I have taken the position that the advanced visualization and modeling capabilities of object-oriented analysis are required to articulate how the components of the quality data warehouse should work together. I utilized the unified modeling language (UML) to detail the steps of the DWM. Not only are deliverable artifacts produced with UML, but the interactions of the project team are also described as discrete services that the development environment should provide to complete a methodology step. All system designers would benefit from adopting the analytical techniques of OO concepts including distributed objects, components, and function points. OO techniques are well suited to design the separation of responsibilities in computer architectures between the operational systems and the data warehouse. Current OO techniques of analysis are applied to the requirements of the data warehouse. For instance, from a business analysis perspective, the economic fact of a customer purchasing an item is not related to the transactions that created the purchase. The transactions are only an atomic element of the economic fact. Within the object-oriented data warehouse methods that are presented here, techniques are developed that separate and assign responsibilities for the evaluation of an organization's data among the multiple tiers of the organization's environment.
To provide strategic views of the data in our current environment, a broad and diverse set of transactional systems must be interfaced with the data warehouse. For example, the data in a 7x24 transactional support system often cannot be queried directly from the system and must instead be extracted. Often, the design of data acquisition must accommodate a lack of computing resources.
The concept of "wrappering" legacy systems to provide methods to send transactions and receive business data is a potential integration tool; however, the business-oriented strategist is only interested in an aggregated subset of the data. Strategic data is mostly digested and processed. Even the finest granularity of analytical data requires summarization. For instance, the daily sale of a product by store requires this summarization along the store and product dimensions. This digest is best housed by the data warehouse technique. Creating systems that would support on-demand queries of production data would be inefficient both in terms of input/output and processing. The data warehouse method has evolved to suggest that new transactional systems include methods that post useful, pre-summarized results to multidimensional data structures. Object-oriented analysis can design these components. Agents, brokers, and managers are visualized and designed. The analysis optimizes the data warehouse environment into a multi-tiered environment.
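As a rough illustration of the summarization described above, the following Python sketch rolls individual sales transactions up to one total per day, store, and product, and then aggregates further along a chosen dimension. It is only a minimal sketch: the record layout, the sample values, and the function names `summarize_daily_sales` and `roll_up` are hypothetical and are not taken from the book or its CD-ROM.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (sale_date, store, product, amount).
# The layout and values are assumptions made for illustration only.
transactions = [
    (date(1998, 3, 2), "store_a", "widget", 10.0),
    (date(1998, 3, 2), "store_a", "widget", 5.0),
    (date(1998, 3, 2), "store_b", "widget", 7.5),
    (date(1998, 3, 3), "store_a", "gadget", 20.0),
]


def summarize_daily_sales(rows):
    """Roll transactions up to one total per (day, store, product)."""
    totals = defaultdict(float)
    for sale_date, store, product, amount in rows:
        totals[(sale_date, store, product)] += amount
    return dict(totals)


def roll_up(daily, keep):
    """Aggregate the daily summary further, keeping only the named dimensions."""
    positions = {"day": 0, "store": 1, "product": 2}
    totals = defaultdict(float)
    for key, amount in daily.items():
        reduced = tuple(key[positions[name]] for name in keep)
        totals[reduced] += amount
    return dict(totals)


# The daily summary is the finest grain kept for analysis in this sketch.
daily_sales = summarize_daily_sales(transactions)

# A coarser aggregate along the product dimension only.
sales_by_product = roll_up(daily_sales, keep=("product",))
```

The strategist queries the pre-summarized structures (`daily_sales`, `sales_by_product`) rather than the transaction list, which is the separation between production data and analytical digest that the paragraph argues for.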
For both the object-oriented and object-relational advocates, the year 2000 issue has breathed new life into the legacy database systems including IDMS, CODASYL, and others. With a more "request broker" oriented approach to data feeds into the data warehouse, gateways could be used to automate the process.
In the final deployment of the data warehouse project, database or operating system independence is a fallacy. As this text will detail, at some point the IT team must choose an infrastructure and become dependent upon their choices. Software companies have invested hundreds of millions of dollars in the research and development of commercial-off-the-shelf (COTS) tools for the user and development community. The "vendor-independence" movement has sometimes caused managers and developers to resist or avoid using such high-quality products as a query environment. With the broad number of databases and systems that COTS products interface with comes increasing flexibility in integrating a large-scale environment. The data warehouse manager can maximize the quality of a data warehouse project through the management of these design choices.
A stable environment, designed from state-of-the-art components, is mandatory to accomplish the goals of the organization building the data warehouse. I feel that the most important elements of a particular development environment are not vendor independence, but its features and characteristics. In VLDBs, how scalable is the technology? Is it efficient? Does it support a methodology? Has it been regression tested? What are the deployment issues? Is it open and extensible? Does it support multi-processing environments? Most importantly, does the environment support repository management and re-use? Finally, the environment must support the business objectives of the organization and it must be capable of maturing along with the users and developers.
I will attempt to present the construction of the quality data warehouse environment in a full life-cycle manner. In this text, I hope to detail the most important aspects of methodology, administration, and performance management. The manager or planner will be able to read and understand the technical portions of this text and understand the breadth of issues in this environment.
This text is divided into 11 chapters:
Chapter 1, The Data Warehouse Method. In this chapter, I present the Data Warehouse Method (DWM) and the repeating processes that create the quality data warehouse. I also present the concept of the Multidimensional Business Model. My two objectives in this chapter are to present the themes of the book and to construct my taxonomy for data warehouse development objects and processes. Chapter 1 also presents the concepts of the Integrated Data Warehouse Support Environment (IDSE). The chapter covers an overview of the DWM team, the characteristics of the flow of data among the workstations, and the use and integration of CASE tools and semantic repositories. These themes are threaded into the more general topics of data warehouse methodology and the development of technical architectures. Finally, I describe how data warehouse administration and performance management work together.
Chapter 2, Managed Query Environment Tools. In this chapter I describe how managers and decision makers use the MQE to decode the business perception developed by the multidimensional business model. The aim of this chapter is to describe the characteristics of the quality data warehouse from the user's perspective. I describe how the environment develops a clear vision and understanding of the business area. The chapter describes the characteristics of the user's interaction with the MQE. These interactions prepare graphical presentations that effectively encode business events against dimensions within the multidimensional business model. I describe standard MQE capabilities including slice and dice and drill methods. The chapter concludes with a short description of the characteristics and application of data mining tools.
Chapter 3, Methodology. Here, I present a detailed vision of how the disciplines of quality, team management, process engineering, and function points are intertwined and directed through methodology. These concepts are the heart of this text. In the sections that describe strategy, analysis, design, deployment, and discovery, I will develop the steps, components, and capabilities necessary to create a successful data warehouse project.
Using CASE tools and the semantic repository, I describe how to integrate development tools into elements of the work flow and how to present methodology output using the Unified Modeling Language (UML).
The chapter presents a simplified function point methodology for estimating the size of the selected project. I describe how designers can move from data modeling to dimension and event modeling. Finally, the text will describe how to develop the IDSE project plan and improve existing projects and teams.
Chapter 4, Strategy. This chapter describes the strategy phase of the Data Warehouse Method. Strategy initiates, defines, and controls the direction of a data warehouse. The chapter presents a method for achieving business objectives through strategic planning. The text details the specific differences between OLAP and OLTP development requirements. This understanding is pivotal to a successful implementation.
Chapter 5, Analysis. In this chapter I discuss the analysis phase of the Data Warehouse Method. In the analysis phase, the multidimensional model is more completely detailed and verified. The capabilities of the model are sharply defined through a process of multidimensional analysis. Through business rules and data harmonization the event spaces are rigorously detailed. During analysis, the business semantics gathered in the strategy phase are critically evaluated. Business rules define the data feeds from operational systems that are included in the data acquisition packages. Short term rapid prototypes are prepared for the MQE.
Chapter 6, Design. In this chapter I describe the design phase of the Data Warehouse Method. The data and process models are generated from the logical model prototypes. The design phase finalizes the general kinds of questions posed of the data warehouse. The logical model is then converted into physical data structures. With the physical data structures in place, limited volume testing is performed to avoid costly mistakes in succeeding phases. The team works to design and prototype the data acquisition modules, the necessary connectivity or collection feeds from data sources.
Chapter 7, Construction. Here I show the final steps of the Data Warehouse Method. In the construction phase, the operational data warehouse is constructed. Data acquisition modules and other programs are developed and tested. The architecture (including the database) is tuned to handle larger data volumes than were tested in the prior phase. For the large data warehouse to run most efficiently, all of the components should be tightly integrated with a management module.
Chapter 8, The Quality DW Technical Architecture. In chapter 8, I present a methodology for developing a data warehouse architecture. I describe modern architectural IT components including parallel technologies, client server architectures, object oriented programming paradigms, and how to present a logical architecture using the Unified Modeling Language (UML).
Integration steps are also enumerated for the IDSE components, including CASE, semantic repositories, systems administration tools, and MQEs. Additionally described are the specific attributes of the physical architecture including parallel architectures and server configurations. The text includes information on designing specific CPU requirements, memory and I/O subsystem requirements, and redundant array of inexpensive disks (RAID) storage. Finally, the chapter describes middleware and its interaction with the network.
Chapter 9, Administration of the IDSE. Chapter 9 details the technical aspects of server administration in the IDSE. The objective is to describe the mission of the database administrator. The chapter describes specific administration and tuning requirements for databases including control, tuning, parallelizing operations, and managing I/O and memory. Also described are partitions in the database and how they fit into a data warehouse environment. The text uses this new capability as a way of achieving greater parallelism in the data warehouse. Chapter 9 also describes process management in the IDSE. Finally, performance management is described in terms of parallelizing operations and managing I/O and memory.
Chapter 10, Managing Dimension and Event Data. In chapter 10, a specific example of integrating the CASE, semantic, and MQE environments is presented. Using an Oracle RDBMS environment with a Business Objects MQE, the text describes a technique for managing the multidimensional model's degrees of freedom with aggregates. Because it describes dimensions and their distributions, the chapter is of general interest. Also described are the solution architecture, availability, and the aggregate management process. Chapter 11 presents specific code for correlating the CASE repository with the semantic repository.
Chapter 11, Conclusion. At the conclusion of the book, I present a vision of how the data warehouse environment fits into the future of all information technology environments. I describe the concept of the business knowledge repository as an aggregate of the operation of an enterprise. The text describes a specific instance of a business knowledge repository developed for an integrated transactional/data warehouse environment.
Suggested Reading Patterns. To master the concepts of this text, one should proceed through the entire text. Designers and analysts should complete the text. However, executives and non-technical managers can read a subset of the text and achieve a grasp of the concepts. Non-technical managers should read Chapters 1 through 8 and the conclusion. Executives should read Chapters 1, 2, 3, and 4 and the conclusion. For the executive, Chapter 4 describes how a project cycle of the DWM can be used to create impact on the bottom line. Quality personnel should read Chapters 1, 2, and 3, the introductions of Chapters 4 and 7, the conclusion, and the appendix. Today, there is a great deal of confusion about components of architecture including RAID and parallel servers, how they should work together, and when they should be used in the data warehouse. I have presented clear explanations of how these should be used so that all may understand.
Through a contemplation of the evolution of the CASE, semantic, and administration repositories, I hope that the reader can visualize the directions of new methodologies emerging from these increasing abstractions to the construction of an IT infrastructure. The data warehouse method has created the final corner of what will become the business knowledge method. In a closed loop, this method, in combination with the data warehouse method, will have the ability to deploy new systems that marry existing business models with the fine tunings of a data warehouse analysis. These elements of the new operational systems will enable a zero cost deployment of new business rules.
"About this title" may belong to another edition of this title.