If you're new here, you may want to subscribe to my RSS feed. So that you can read the latest updates about Web2.0 tools, Making Money Online, Tips in SEO, Ajax and many more. Thanks for visiting ProgramimiCOM!
SQL Plan Cache Snapshot, Visual Explain, and other tools solve a DB2 for i5/OS performance mystery.
By Jim Flanagan and Shantan Kethireddy
The story in this column illustrates a methodology DBAs can use to solve SQL query performance issues in DB2 for i5/OS V5R4 when crucial evidence is missing.
Mr. D.B. Admin of Company A wants to hire famed detective Shantan Kolmes and his sidekick Dr. Flatson to help with a serious performance problem. As of last week, Company A’s main order application has been experiencing drastic performance degradation. Mr. D.B. Admin calls Kolmes and Flatson to explain the problem.
“Every day at 10 a.m., our customers’ orders are at their peak, and our operators can’t process them fast enough. We’ve had customers disconnect mid-order. We’ve lost several long-time customers and who knows how many new customers. Our chief competitor, Company Z, is quickly taking over several of our accounts. Performance wasn’t an issue until very recently. Please help — the CEO is ready to fire the entire IT staff!”
| |
“Do you have a SQL Performance Monitor from when the system was running well?” asks Kolmes.
“No, we didn’t see the need for it.”
“I suppose you don’t have an SQL Plan Cache Snapshot either?” Kolmes says.
“No, does that matter?”
“It will make things a bit more difficult.” But Kolmes remained confident. “Has anyone changed the application since you started seeing this problem?” Kolmes asks.
“No, all changes are supposed to be approved through me.”
Kolmes finds Mr. Admin’s choice of words curious. “You say, ‘supposed to.’ It sounds like you might have had the same issue in the past?”
“Well, yes. I have one developer who is a bit of a maverick. But he assures me he didn’t change anything.”
“Have you noticed anything else unusual?” Kolmes asks.
“We did have a contract programmer recently for a minor update, but I approved that change myself.”
Kolmes says he’ll arrive before 10 a.m. the next day. “Whatever you do, please don’t IPL the system!”
In the meantime, Flatson researches Company Z. As he looks through his notes from the questioning of Mr. Admin, Kolmes begins to develop a preliminary strategy for clue collection and analysis.
Mr. Kolmes and Dr. Flatson arrive at Company A at 9 the next morning. Mr. Admin reports that he successfully stopped the system operator from performing an IPL of the system.
If you’re in the situation facing Mr. Kolmes, the first order of business is to capture the SQL performance environment by taking a SQL Plan Cache Snapshot. To do so, go to the iSeries Navigator interface, expand the Databases folder and the specific database folder, and then right-click it. Select the SQL Plan Cache option and then the Show Statements menu action.
On the statements screen, you can fill in the filters you’re interested in. In the Kolmes case, we know which application is causing the problems, so you’d fill in the schema name for the queries that use or reference these objects. You can also set the filter for Top ‘n’ queries with the largest total accumulated runtime, if you were sure you’d catch the queries for KEYPROD1. Figure 1 shows what Kolmes retrieved. You can save the list of queries to a schema for later viewing by choosing Create Snapshot and giving the schema and snapshot name.
![]()
Figure 1. List of queries referencing KEYPROD1.
Figure 1 shows the names of the contract programmer, Mr. Admin, and the maverick programmer. But row six shows something interesting; the User Name in that row matches the last name of Company Z’s CIO.
Notice how easily we (and the two detectives) uncovered preliminary evidence. Before V5R4, you had to turn on a monitor during a period when the application was experiencing difficulties and then analyze the monitor data using monitor analysis queries. This approach could cause system performance degradation if appropriate filtering wasn’t used. You also had to try to duplicate the environment in which the application was running. Now, you can simply take a SQL Plan Cache Snapshot with no system overhead, and have all of the same information and analysis tools at your disposal. Keep in mind that the SQL Plan Cache only contains information about queries that went down the SQL Query Engine (SQE). In-depth analysis of queries that went down the Classic Query Engine (CQE) may still require use of the Database Monitor. See Resources for more information about the difference between SQE and CQE.
If we had had a SQL Plan Cache Snapshot or a monitor from before the point of the initial degradation, we could have run the comparison menu action between the old and new environment to see what was changed. Instead, we’ll need to analyze the worst running queries from the application under the present conditions, figure out the root cause of the performance bottleneck, and then try to figure out what changed (and why) to cause this bottleneck.
In order to view the detailed performance information for a given query, you would use a tool called Visual Explain. Visual Explain is a database tool that graphically represents the optimizer implementation of a query request. It provides a method of identifying and analyzing database performance problems. The tree-like output provides a comprehensive understanding of the optimized plan. Visual Explain shows all the attributes necessary for users and developers to understand implementations, performance bottlenecks, indexing strategies, and so on.
Going back to the SQL Plan Cache viewer dialog, you can simply select the desired statement and click Run Visual Explain.
The Final Select icon shows all environmental attributes such as pool size, number of processors, QAQQINI information, and so on. Based on the list of environmental settings Mr. Admin provided, the detectives find these attributes to be in order. Mr. Admin’s company doesn’t have SMP installed, so the lack of parallel processing is expected. You can see the area of the tree that was estimated to be the most costly by going to the View menu and selecting the Highlight Index Advised menu option. In Mr. Admin’s case, the most costly area of the tree was a table scan on the two-billion-row ORDERS table, which populated a hash table used for joining.
The Index Advisor (available as an icon in the toolbar) suggests indexes to improve performance. In Mr. Admin’s case, the Index Advisor suggests an index over a heavily used join column from the two-billion-row ORDERS table, which would explain the poor performance.
If you look at the rest of Mr. Admin’s poorly performing statements, you see that all the statements reference the same join column from the same ORDERS table. To find out if this missing index is behind the performance problems, you have to determine whether this index, or a comparable index, had existed before the performance degradation and that it was removed immediately prior to the degradation.
To find a record of objects in the database from a period when the application was running well, you would open the iSeries Navigator application and expand the Systems folder. Right click on the database folder and select the Health Center menu option. You should see the Overview page. Click on the Size Limits tab. In Mr. Admin’s system, the settings are already set to the schema the application is running in. Hit Refresh to show the current size limits for KEYPROD1. If you want to save the results on this screen, choose save, and the save dialog displays with the history file name filled in.
Now that the current environment is saved to the history file, you can do a comparison to the last capture. Click View History… and fill in the history filename.
When Mr. Kolmes sees the history file that contains the two collections of the Size Limits for this application, he expands the Maximum number of index entries node in the two collections so that he can compare the history files of the current and last months (see Figure 2).
![]()
Figure 2. Comparison of current and past history files.
His examination shows that there is one less index in the current environment than in the previous environment. The missing index is IMPORTANT_IDX1.”
Kolmes tells Flatson that they need one final piece of data: the auditing journal (QSYS/QAUDJRN). Using the Display Journal (DSPJRN) command on the auditing journal reveals the precise moment when the index was removed — and who removed it.
Mr. Kolmes calls a meeting with Company A’s CEO, Mr. Admin, the system operator, and the maverick programmer.
“We found a DROP INDEX entry in the audit journal. Does anyone know why that was done?”
The system operator says it sounds just like the maverick programmer.
Kolmes replies, “We thought that might be the case, but the queries in the SQL Plan Cache Snapshot show your User Name, Mr. System Operator.”
“That’s not unusual; I do queries when diagnosing a system problem.”
“Ah, but your name also shows up in the audit journal as the User Name who did the DROP INDEX. We also found out that the CIO of Company Z has a nephew that works in this company,” Flatson chimes in.
The system operator yells out, “It was me alright, I admit it! I would have gotten away with it too if it hadn’t been for those new tools, and if you hadn’t stopped me from IPLing the system.”
“Tell it to the judge,” Kolmes replies.
The moral of the story? Keep an SQL Plan Cache Snapshot of your environment so you have something to check against if you encounter performance problems in the future.
Jim Flanagan is an advisory software engineer at IBM in database development for DB2 for i5/OS and team leader of iSeries Navigator — Database.
Shantan Kethireddy is a member of the IBM SQL Query Engine team and a frequent speaker at System i technical conferences.
Print This Post
Email This Post
Comments RSS
TrackBack Identifier URI
You must be logged in to post a comment.