Unlock the power of SQL for IBM i database modernization! This comprehensive guide equips IBM i professionals with the SQL skills to modernize legacy databases and build data-centric applications.
SQL for IBM i delivers practical techniques for database design, data validation, and user-friendly data access. Learn to leverage SQL/PSM, stored procedures, user-defined functions, and triggers to create efficient, application-independent solutions. Discover how to transform your IBM i database into an intelligent, well-structured asset. Ideal for IBM i programmers, system administrators, and database developers.
"synopsis" may belong to another edition of this title.
Rafael Victória-Pereira has more than 16 years of IBM i experience as a programmer, analyst, and manager.
Acknowledgments,
Why You Need This Book,
1: Meet the UMADB: Our Example Database,
2: A Data Manipulation Language Basics Recap,
3: A Data Definition Language Basics Recap,
4: Making the Database User-Friendly,
5: Tidying Up the Database,
6: Introducing Data-Centric Programming and SQL/PSM,
7: Creating and Using Stored Procedures,
8: Exploring User-Defined Functions,
9: Making a Trigger-Happy Database,
10: Moving Business Validations to the Database,
11: Exploring the Database Catalog Information,
12: Parting Gifts,
Meet the UMADB: Our Example Database
This chapter introduces UMADB, a database for a university management application. It's important to have a notion of how this database is built — its flaws and shortcomings — because I'll be using it in almost every example in the book. UMADB is in bad shape and will be improved throughout the book, by applying the concepts discussed in each chapter.
Behind (almost) every application worthy of that name is a database. Some databases are small, some are huge, some are simple, and some (I'd say most) are complex. In IBM i's world, they're usually old, big, and confusing. Although our example database, UMADB, is not very big (I downsized it for simplicity's sake), it is also poorly built and can be rather confusing to both programmers and users.
Let's start with what this database should do. It supports a university management application. This means it should manage student, teacher, class, course, classroom, grade, and timetable records. In its current state, it kind of does, but with some room for improvement. The UMA application, an "old-style" RPG application, should keep track of the students' academic lives: the classes they attended, the grades they received, and so on. It should also keep track of classroom usage and timetables, for both students and teachers. However, these last two tasks were deemed "too complex for the application" by the application's manager, and were left out of the database. In other words, these are manual tasks, performed by the university administrative staff. This is one of the many shortcomings of the application and its database.
There are plans to change the existing application, and we (you, dear reader, and I) are part of them. We are going to improve the current database, which is basically a set of DDS-defined physical files, by applying DB2 for i SQL techniques, tricks, and novel features!
But first, you need to get to know the database in some depth. Let's take a look at the current database structure, table by table, starting with the Students table.
The Students Table
The Students table started with a simple student name column and grew to include other pieces of data, as do many DB2 tables in real-life applications. However, the growth was not planned properly (again, as in many real-life DB2 tables), and there are some problems in this table, which might not be obvious at first. But we'll get back to that later; now let's have a look at the actual table (Table 1.1).
This looks just like most IBM i physical files I've seen: cryptically short file (or table) and field (or column) names, concentrating a lot of information in a single row. The columns are mostly unremarkable as well: the list includes student contacts (addresses and phone numbers) and IDs (driver's license and Social Security number).
There are a couple of eyebrow-raising features. First, the column that stores the date of birth is a decimal with a length of 8,0, meaning that it's a number, not a date. Note that the database isn't prepared to validate the content of the field — it's just a number that some convention says represents a date. Another noticeable "feature" is the absence of the student's record unique identifier. This identifier was deemed unimportant, because all searches are done using the student's name.
These flaws are just a couple of examples of textbook problems with IBM i tables: they are, in a word, dumb. Even though it's possible, for instance, to perform basic checks like the validity of a date at database level, this and many other similar tasks are almost always performed at application level, thus making the database a simple (and dumb) repository of data. The same could be said about the absence of a record ID. The problem occurs when there are other, non-native applications accessing and manipulating the data. Without checks at the database level, it's possible, and very likely, to insert rubbish into the tables. Introducing those checks is actually very easy to do. Later in this book, you'll learn how to create validations that mimic (and can even go a step further and actually replace) business rules that currently exist for RPG programs.
There's something else wrong with this table, but it's not obvious yet. We'll need to go over a couple more tables for you to see it. Let's move on to the Courses table.
The Courses Table
Let me take a moment to explain the structure of the information in this database. The students take classes, which are taught by teachers, and are part of courses. At the end of each semester, the teachers grade the students in each of the classes they attended. This may sound obvious and redundant, but it's important to keep the structure in mind from this point on. In a way, the Students and Courses tables are the center of the database, because all the other tables are somehow linked to one (or both) of these tables.
Now let's take a look at the Courses table structure, shown in Table 1.2.
Again, the table is pretty typical: the same cryptic names and the lack of a unique record identifier that characterized the Students table. By the way, I imagine that you're curious about the hidden flaw in the Students table, mentioned earlier. Don't worry, it's going to become obvious in the next section, where we'll look at the Teachers table.
Meanwhile, there's something common to all the tables in this database: a status column. As the name implies, it indicates the status of the record. The convention used here is the following:
• 0 — Created but not active record
• 1 — Active record
• 9 — Inactive (deleted) record
This is something that has to be taken into account when querying the database, and it has been the source of many misunderstandings. Sometimes the users forget to include a condition in their queries and end up mixing active and inactive records, which leads to inconsistent or just plain wrong information.
The Teachers Table
The teachers are a very important part of any teaching system. They're also a very important part of the application our database supports, although the table that keeps their records is not very "polished." You'll see what I mean when we analyze the Teachers table, shown in Table 1.3.
This table is similar to the previous ones, but it includes a sensitive piece of information: the teacher's salary. As things stand, anyone with access to the table can see how much each teacher earns, which might not be a very good idea. I'll get back to this later, when I discuss how to hide a column's data from prying eyes.
Notice the similarities between this and the Students table: the personal information (addresses and IDs) is the same. Even though this makes sense — both teachers and students are people and share the same type of information, it begs the question: what if a student becomes a teacher, or vice versa? There will be duplicate and possibly inconsistent information in the database. I'll address this issue later, in the discussion about database normalization and how that translates to SQL.
Having said that, let me take a moment to explain the other columns in the table. Besides the obvious teacher name and the aforementioned personal information, this table also includes a "teacher rank" (which can be something like Assistant Professor, Professor, and so on) and a "subjects taught" column. The latter is supposed to link to the Classes table, presented in the next section, but the connection is kept by humans, not the database. Because the same person can teach multiple classes in the same school year, the application's manager thought it would be simpler to manually track the link between teachers and classes — yet another shortcoming we'll need to solve later.
It's now time to move on to the next section and review the Classes table.
The Classes Table
Here's where things start to get interesting: finally, a table with links to other tables. The Classes table contains information about the students who form a class of a given subject during a given year and the course to which the class belongs. As I said before, the teacher is not part of the setup, at least not at database level. Table 1.4 shows the complete Classes table structure.
As you can see from this table, the links I mentioned before are based on the names of the student and the course, which might cause some problems. The ideal situation would be to have record identifiers in each of the tables and keep those IDs, instead of the respective names, in the Classes table records. The next issue is the duplicate student information. The application manager thinks this duplication makes sense, because the student information might change from school year to school year, and keeping the information here allows the teacher to contact the student using the most current addresses. We'll also have to deal with this situation later.
Finally, the last table of the downsized version of the UMADB database is the Grades table. Let's analyze it in the next section.
The Grades Table
After the end of the semester, the students are graded on their performance in each of the classes they attended. The results are stored in the Grades table, shown in detail in Table 1.5.
Just like the Classes table, this one also depends on another table's information to form its unique key. In this case, that key is formed by the student name, the class name, and class year. Of these three, two are names stored in character strings. This makes them prone to error (character fields usually make awful keys because of the possible mismatches caused by different character cases — "John" is not the same as "john", for example) and slower to work with (because it takes longer to process a string of characters than a numeric value). The other problem with this table is the Grade column: there's no validation in the database to prevent inconsistent values, such as invalid grades. It's assumed that the letters A, B, C, D, and F will be used, optionally followed by a plus/minus sign, but there's no actual check for a valid grade at the database level. Just like the student's date of birth validation, this one also exists at the application level, buried in some RPG program.
Just a Few Tables, and So Many Problems
From what you've read so far, you probably concluded this (exaggerated) scenario has some similarities with real-life issues in IBM i databases you've seen. Probably not all at the same time, but you know what I mean. It's true that some of the issues are very basic and easy to solve, while others require some database redesign and ingenuity. I'll address all these issues and a few more, which are related to the non-implemented functionalities that are currently handled outside the application's scope, over the next chapters of this book.
You can skip a chapter or two, but keep in mind that the database will evolve, and each chapter will build upon the foundations laid by its predecessor. If you're comfortable with the topics discussed in a chapter, you can simply have a quick look at the SQL code samples to keep track of the changes to the database. (You can download the code samples and other supplementary book materials on the book's page at "https://www.mc-store.com/ products/sql-for-ibm-i-adatabase-modernization-guide".)
Before starting in earnest, I'll start by reviewing some SQL data manipulation language (DML) statements and sharing a few tricks I've learned over the years that can, hopefully, help you get more productive when it comes to manipulating data using SQL.
In This Chapter, You've Learned ...
The UMADB is a mess! Here's why:
• The existing tables don't have unique record identifiers, which makes the connections between its tables weak, to say the least.
• There's no data checking at database level, which can ... well, let me rephrase that ... probably will cause inconsistencies in the data, such as invalid dates and meaningless grades (yup, I'm talking receiving a G or a 24 as a final grade), just to name a couple.
• The table and column names are short and cryptic, following the age-old IBM i tradition, which makes querying the database rather user-unfriendly.
These issues will be addressed throughout the book, accompanied by the discussion of the relevant underlying concepts.
CHAPTER 2A Data Manipulation Language Basics Recap
This chapter recaps the basic data manipulation language (DML) statements and uses the sample UMADB database in all its examples. It will go over the SELECT, INSERT, UPDATE, and DELETE statements. However, this chapter won't discuss the syntax of these statements. Here we will explain how you can write shorter and clearer statements by resorting to a few keywords that you might not be aware of. If you want to play around with the examples, be sure to restore the UMADB_CHP2 library from the downloadable source code, at "https://www.mc-store.com/products/sql-for-ibm-i-a -databasemodernization-guide".
I'm going to assume that you're familiar with the most commonly used DML statements and will not explain their syntax in depth. Instead, I'll focus on some details that can simplify the statements — for instance, shorter "implementations" of concepts.
Using the BETWEEN and IN Predicates
Let's get started with a simple yet very powerful keyword. If you started querying the IBM i's database using Query/400 (as most of us did), one of the things you might miss is the RANGE keyword. This simple-to-use tool allows you specify the lower and upper limits of a range of values in a clear and concise way. What you might not know is thatSQL has a RANGE equivalent: BETWEEN. This keyword's equally easy to use, but it has a different syntax, which is closer to common English than the robot-speak of RANGE.
It's easier to explain with an example, so let's imagine that a user needs a list of all the university students who were born in the 1990s. The kneejerk reaction would be to write something like this:
SELECT STNM
, STDB
FROM UMADB_CHP2.PFSTM
WHERE STDB >= 19900101
AND STDB <= 20000101
;
Even though this statement is correct (assuming that the student's birth date, columnSTDB of the PFSTM table, is in YYYYMMDD format), it can be made clearer with BETWEEN:
SELECT STNM
, STDB
FROM UMADB_CHP2.PFSTM
WHERE STDB BETWEEN 19900101 AND STDB 20000101
;
Notice how using the BETWEEN predicate made the statement easier to read. By the way, I'm a big fan of clear code, so you'll see a lot of indentation in my code examples. It makes the code easier to read and, more important, easier to maintain. For instance, if I want to add a new column to the query, I simple add a new line wherever I need to add it, and insert a comma followed by the column name. If all the columns are in the same line, this might not be so simple, especially in queries with a lot of columns. The only downside to this is that my queries tend to get a bit long. However, if you use IBM Access Client Solutions' Run SQL Scripts or any other non-native query tool (IBM Rational Developer for i's query tool, WinSQL, Toad, and so on), this is not a big issue.
All the examples shown here were written in Run SQL Scripts. You'll notice the SQL syntax with the period (.) separating the library (schema) and file (table) names, instead of the system's native syntax with the slash character (/) acting as a separator between the library (or schema) and the table, and the semicolon (;) terminating each statement.
Even if you knew BETWEEN, you might not know that you can invert the selection by adding a simple keyword: NOT. Here's an example to illustrate what I mean: the user actually wanted a list of all the students who weren't born in the 1990s. Well, let's not waste the statement we just wrote. Let's modify it instead:
SELECT STNM
, STDB
FROM UMADB_CHP2.PFSTM
WHERE STDB NOT BETWEEN 19900101 AND STDB 20000101
;
See how with a very simple change you can get the exact opposite result of the original query? You can use NOT in all sorts of ways to easily negate a comparison. It's particularly useful when the opposite of a comparison is complex to write down. Instead, you can simply write NOT (original comparison), and you're done. I'll provide additional examples in a moment.
Another tedious and error-prone situation is when you want to find all the records that have one of several values in a given column. For instance, let's say that someone wants a list of all the teachers with the rank of Dark Master, Maximus Praeceptor, or Praeceptor. (Yes, the Teachers table is a bit quirky — actually, the entire database is! Take a moment to query the tables, and you'll see what I mean.) In Query/400 you'd use LIST, but I've noticed many people still write the comparison statement using something like this:
SELECT TENM
, TETR
FROM UMADB_CHP2.PFSTEM
WHERE TETR = 'Dark Master'
OR TETR = 'Maximus Praeceptor'
OR TETR = 'Praeceptor'
;
Excerpted from SQL for IBM i by Rafael Victória-Pereira. Copyright © 2018 Rafael Victória-Pereira. Excerpted by permission of MC Press.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
"About this title" may belong to another edition of this title.
Seller: HPB-Red, Dallas, TX, U.S.A.
paperback. Condition: Good. Connecting readers with great books since 1972! Used textbooks may not include companion materials such as access codes, etc. May have some wear or writing/highlighting. We ship orders daily and Customer Service is our top priority! Seller Inventory # S_423402192
Seller: GreatBookPrices, Columbia, MD, U.S.A.
Condition: As New. Unread book in perfect condition. Seller Inventory # 30924945
Seller: Grand Eagle Retail, Bensenville, IL, U.S.A.
Paperback. Condition: new. Paperback. This book aims to give IBM i technical users basic to intermediate SQL knowledge and tools they can use to get more out of the IBM i database. The book can be useful to veteran IBM i programmers, who have RPG and COBOL roots, system administrators looking to get more information out of their IBM i system, or even Java and .NET developers who need to talk to IBM i database. The author provides comprehensive examples and exercises to help readers understand and practice what they have learned. Shipping may be from multiple locations in the US or from the UK, depending on stock availability. Seller Inventory # 9781583474495
Seller: GreatBookPrices, Columbia, MD, U.S.A.
Condition: New. Seller Inventory # 30924945-n
Seller: California Books, Miami, FL, U.S.A.
Condition: New. Seller Inventory # I-9781583474495
Seller: BargainBookStores, Grand Rapids, MI, U.S.A.
Paperback or Softback. Condition: New. SQL for IBM I: A Database Modernization Guide. Book. Seller Inventory # BBS-9781583474495
Seller: GreatBookPricesUK, Woodford Green, United Kingdom
Condition: As New. Unread book in perfect condition. Seller Inventory # 30924945
Quantity: 11 available
Seller: Revaluation Books, Exeter, United Kingdom
Paperback. Condition: Brand New. 440 pages. 8.50x7.00x1.00 inches. In Stock. Seller Inventory # x-1583474498
Quantity: 2 available
Seller: Kennys Bookshop and Art Galleries Ltd., Galway, GY, Ireland
Condition: New. 2018. None ed. paperback. . . . . . Seller Inventory # V9781583474495
Quantity: Over 20 available
Seller: GreatBookPricesUK, Woodford Green, United Kingdom
Condition: New. Seller Inventory # 30924945-n
Quantity: 11 available