Karl Arao's Blog

Just another weblog about Oracle,Linux,Troubleshooting,Performance,etc..etc..

Posts Tagged ‘Forecasting_Performance’

Fast Analytics of AWR Top Events

Posted by karlarao on March 24, 2012

I’ve been working on a lot of good schtuff lately on the area of capacity planning. And I’ve greatly improved my time to generate workload characterization visualization and analysis using my AWR scripts which I enhanced to fit on the analytics tool that I’ve been using.. and that is Tableau.

So I’ve got a couple of performance and capacity planning use case scenarios which I will blog in parts in the next few days or weeks. But before that I need to familiarize you on how I mine this valuable AWR performance data.

Let’s get started with the AWR top events, the same top events that you see in your AWR reports but presented in a time series manner across SNAP_IDs…

So I’ve got this script called awr_topevents_v2.sql http://goo.gl/TufUj which I added a section to compute for “CPU wait” (new metric in 11g) to include the “unaccounted for DB Time” on high run queue or CPU intensive workloads. This is a newer version of the script that I used here http://karlarao.wordpress.com/2010/07/25/graphing-the-aas-with-perfsheet-a-la-enterprise-manager/

On the screenshots below I spooled the output on a text file and transformed it to a CSV format. Then from there you can play around with the data points using Tableau that you can download here http://www.tableausoftware.com/public/download

Is this a free software? Yes. But the catch is Tableau Public only allows you to have 100K rows on a given text or csv file. But for my capacity planning and troubleshooting purposes that’s enough to do the work :) The paid version of Tableau (Desktop) allows you to connect directly to the database but that will cost you $1999. ouch. The Tableau (Desktop Personal) is much like the Public version but with no data limit and it allows you to save your work on a specific file format (.twb and .tde files), Yes guys in Public version you are only allowed to save it to their Public portal… but that’s fine as long as I have the data points I can quickly work on the visualization again. Here’s the comparison if you are interested on the detailed product comparison http://www.tableausoftware.com/products/desktop/specs

Once you get the CSV file ready which you can download HERE

Open the Tableau and follow the steps below…

1) This is a trial version of Tableau Desktop.. the steps should be the same or similar with the Public version. Click “Connect to data”

2) Select the CSV file

3) Select “Text File”

4) Then you can view the data points

5) Data Points are shown below

6) Click OK

7) Import all data

8) It will create a file format that I believe where it stores the dimension data

9) After the data points import

10) Drag the “Events” on the color section, then “Add all members”

11) Click OK

12) Drag the AAS on the Rows

13) SNAP_ID on the columns. That’s your AWR SNAP_IDs

14) Filter the negative values. Those are the periods where you had instance shutdown.

15) Exclude the Nulls

16) And now you can select an area! hover on it and it will give you the event and the AAS number (the higher the AAS the more database work is happening)

17) Play around with the AAS data points

18) Now here it will show you detailed SNAP_ID data points (the dots) and you can hover on it

19) You can also view the underlying data points of those dots

20) Here it shows the event, SNAP_IDs, and the AAS value.. so here you can grab the SNAP_ID and execute the AWRRPT.sql on the database server to generate the larger AWR report

21) Drilling down more on the data points

22) You can also focus just on specific events. Let’s say just the CPU..

23) or the event that’s hogging most of the resources

24) and just select the data points from that bottleneck

25) Copy the summary info

26) Then play around with the data in command line

And clearly the database bottleneck is the PX event… which could just be the surface of the problem or a symptom for another issue..
And then visually you’ll start finding trends on the workload..

let’s say on this workload I found sustained high workload periods that would last for days..and these could be the periods where the batch jobs are executed..
SNAP_ID 34241 – 34306
11/07/06 03:30 – 11/07/07 12:00SNAP_ID 34417 – 34516
11/07/10 13:30 – 11/07/12 15:00SNAP_ID 34565 – 34694
11/07/13 15:30 – 11/07/16 08:00SNAP_ID 35213 – 35322
11/07/27 03:30 – 11/07/29 10:00

SNAP_ID 35363 – 35477
11/07/30 06:30 – 11/08/01 15:30

SNAP_ID 35590 – 35670
11/08/04 00:00 – 11/08/05 16:00

And if you want to have regression analysis, you must put all of these data points on the Y and X axis (I’ll create a script for that one of these days).. that way the r2toolkit will have a very good scope of data.. and then you’ll get your nice R2 trend..
but it doesn’t stop there..
you have to drill down on the data points, use the awrrpt, addmrpt, awrsqrpt, ashrpt, and drill down on sessions with snapper, os utilities, etc. to get to the bottom of the issue
so that’s forecasting+troubleshooting!
Update: 
The screenshots above just makes sense of the SNAP_ID for the time series data. After a little more AWR data hacking I was able to make sense of the date value which is really useful because I no longer have think of SNAP_IDs across instances and getting them aligned on a particular period… and the trick here is Tableau automatically creates a time dimension for the time column of my AWR scripts.  Kewlness :)  and that way I can stitch performance data across hostnames and consolidated instances of an entire Exadata Full Rack! and I’ll show how it is done on the next couple of posts.
The graph below shows an example of having a time dimension. Below is the AWR data (shown by day and hour) coming from a non-RAC database on a SAN storage with a default AWR retention period. The arrows below are pointing to a workload period where they are  running a report which has a SQL that has access paths doing lots of multiblock IOs (db file scattered read) and that throttles the performance of the (already slow) storage causing the IO latency to shoot up. This graph was pretty useful on the storage performance troubleshooting which correlates to the numbers of OS (iostat, vmstat, and atop) and session level (snapper) tools. Now the single block latency (db file sequential read) has always been on the 20ms below range regardless of load. So fixing the SQL to favor the use of indexes (db file sequential read) made it do efficient IO, that made the SQL run from 15mins to 10secs.

 

Hope I’ve shared you some good stuff!

 

 

Posted in Oracle, Performance, Troubleshooting | Tagged: , , , | 1 Comment »

VirtaThon – Mining the AWR

Posted by karlarao on July 18, 2011

Earlier I did a presentation at VirtaThon which is the same topic that I presented at Hotsos 2011.. Mining the AWR and Capacity Planning are very dear to my heart and up until now I’m using every research I did on that presentation to work on an “Exadata Provisioning Tool” which I’m planning to present at the next Hotsos 2012… well, the only thing that’s different this time is.. my attendees are virtual geeks all over the world ;)

I was at the Virtual Room #100, and the staff earlier were really helpful and most of all I had fun doing that webinar.. and I get to “doodle” on a lot of my slides!

Overall, that was a fun first time experience ;)

Check out the presentation slides below and the scripts here..

BTW, as per Tariq Farooq.. Everything is being recorded! and they will be “broken up” and uploaded within a week or so.. So I’ll be updating this post and include that recording once the video comes out ;)

Hope I’ve shared you some good stuff ;)





Posted in Oracle, Performance, Uncategorized | Tagged: , , , , | 5 Comments »

Hotsos 2011 – Mining the AWR Repository for Capacity Planning, Visualization, and other Real World Stuff

Posted by karlarao on March 11, 2011

For all that loves Oracle performance.. Hotsos is truly the best conference, all the speakers are performance geeks, all the attendees talks about performance. Everything is about performance! There are lot more stuff that I like about my first Hotsos experience, the following are some of them..

I like that presenters and attendees are curious about what each performance geek has to say..

I like that whenever their brains are already fried. They consume as much coffee/soda as they can.. and just lay their butts on this couch.. and still.. talk about performance.

I like that after all the sessions, they will head straight to the hotel’s bar.. drink beer.. and still.. talk about performance.

I like that you can get yourself into healthy debates, validate each other’s methods and work, and also possible to collaborate with some other geeks doing the same research.

I like that I had a blast on my presentation.. :)

I like that I got a good number of attendees, good feedback, and good questions..

Below is the presentation slides BTW..

and below are the daily updates of the conference

Day1 http://goo.gl/H9rJ9

Day2 http://goo.gl/wLh4H

Day3 http://goo.gl/l74C8

Training Day http://goo.gl/cTXZq

 

So.. Definitely I’ll come back, and.. I’ll try to make a good research every a year, so I can go again for free! ;)

Posted in Community, Oracle, Performance | Tagged: , , , , | 3 Comments »

 
Follow

Get every new post delivered to your Inbox.

Join 27 other followers