Tuesday, February 27, 2007

OraSRP v3.0b - a very short initial impression

Update: This has evolved into a series of posts so I added links at the bottom to guide readers who may want to read the entire series of posts.




Last Friday, February 23, Doug Burns mentioned in his blog that OraSRP v3.0b (beta) has been released. I have a special interest in the Oracle Extended SQL Trace File because I have used it to help me solve a lot of performance problems. My interest with the trace file started after reading the excellent book Optimizing Oracle Performance by Cary Millsap and Jeff Holt of Hotsos.

Anyway, I've tried earlier versions of OraSRP but I had some problems. Off the top of my head:
  • It could not finish processing certain files. This has nothing to do with speed. It had something to do with OraSRP not being able to finish processing large files with a lot of BIND lines in them. I had two big files (44+ and 66+ million lines) where OraSRP would just stall. I ran OraSRP against these files on several platforms (Solaris, Linux, Windows) and I had to kill them after two days. The workaround to this type of problem was to bypass or limit the processing of BIND lines.
  • I compared the output to that of the Hotsos profiler (v4.0.16). There were some numbers that were too far apart, so far that they can not be attributed to rounding off. This does not mean, of course that OraSRP was wrong. I'm just saying the outputs of the two profilers were vastly different in some sections.
Again, those two bullet points are for an earlier version of OraSRP. I also want to make clear that I did not have time to prove which output is correct and so I could not conclude that OraSRP was wrong; after all, it is entirely possible that there are some bugs in the Hotsos profiler itself. But I have to say that the big differences in the numbers made me very uncomfortable not only because I know that the people at Hotsos wrote THE book on the Oracle Extended SQL Trace File, but also because I have used earlier versions of the Hotsos profiler to help me solve some very nasty performance problems at some of Sun's big customers.

And so it is natural for me to want to see what is new with OraSRP v3.0b. This new version is a complete rewrite from Python to the D programming language (not to be confused with a similarly named language for Solaris DTrace). It is also in binary-only format for Linux and Windows.

These days, my main workstation runs Solaris and I no longer have a machine that runs Linux so I had to use the Windows version. Here's a screenshot of the first two sections of the report:


Please note the second arrow pointing at "Total lines processed in trace file". OraSRP is reporting 2,150,340 lines. That did not look right to me. So I went back to the raw trace file and did this:

jforonda@u20$ wc -l 4071.trc
2156850 4071.trc
jforonda@u20$

jforonda@u20$ bc
2156850-2150340
6510
jforonda@u20$


Hmm... I know that this specific trace file is 'clean' and it does not have things like those "dump from memory" lines. So it seems like there are 6,510 lines that are somehow not accounted for. What does this mean? Is it a big deal? I'll let the reader draw their own conclusions.

Next, I looked at the total elapsed time for the session. In the image above, that information would be where I drew an arrow at the very bottom of the page. For the total elapsed time, OraSRP reports 14,187.9512 seconds. Again, that didn't look right to me. So again, I went back to the raw trace file. To get a quick approximation of the session start timestamp, I did this:

jforonda@u20$ grep tim 4071.trc | head -1
PARSING IN CURSOR #5 len=69 dep=1 uid=413 oct=42 lid=413 tim=3928564402960 hv=2004533713 ad='ab3a9520'
jforonda@u20$


And to get a quick approximation of the session end timestamp:

jforonda@u20$ grep tim 4071.trc | tail -1
EXEC #15:c=0,e=1077,p=0,cr=3,cu=1,mis=0,r=1,dep=0,
og=4,tim=3942736181428

jforonda@u20$


I used the term "quick approximation" because I understand that those values may not be the absolute start and end timestamps. However, for the purpose of this blog entry, they should be good enough. I think it is safe to say the following:

  • the actual start timestamp can not be greater than 3928564402960
  • the actual end timestamp can not be less than 3942736181428
So I went back to my trusty old command line calculator. :) Note that I set the scale to 6 and divided by 1,000,000 because this is an Oracle9i trace file and so the tim value is expressed in microseconds.

jforonda@u20$ bc
scale=6
(3942736181428-3928564402960)/1000000
14171.778468
jforonda@u20$


So according to the raw trace file, the elapsed time is approximately 14,171 seconds. In fact, I can probably conclude at this point that the elapsed time can not be less than 14,171 seconds. OraSRP reports 14,187 seconds or around 16 seconds more. Okay, OraSRP's elapsed time is not less than my rough estimate but a 16-second difference does not look quite right. Hmm...

Then I noticed that OraSRP has an option to recognize idle 'SQL*Net message from client' events which was turned on in my initial run. So I made one more run with this option turned off. Here's a screenshot:



Now, the title of this blog entry partly reads "a very short initial impression" and it is getting too long :). So... can you spot the difference between the two screenshots?


OraSRP v3.0b posts to date

OraSRP v3.0b - a very short initial impression
OraSRP author replies
OraSRP v3.0b - questionable session elapsed time