Note to subscribers: If your reader does not properly display some formatted content like bold and italics, I suggest you visit the blog site itself.
For those who may just be passing by, this is a continuation of two previous blog entries:
If you have not read these two entries, I suggest that you do so first, specially the first one, since that is where the possible problem with the calculation of session elapsed time was initially described. I also suggest that you read Egor's comment for that entry.
As the title says, this entry is about an observation that I made regarding the discrepancy between the session elapsed time reported by OraSRP, and the contents of the raw Oracle Extended SQL Trace File (simply referred to as trace file from hereon). Earlier, I said that OraSRP reports a number that is around 16 seconds more than what I can readily see and approximate from the trace file. In his comment, Egor explained:
It depends on how to calculate elapsed time.
For example, OraSRP ignores 'tim' in 'PARSING IN CURSOR' lines.
Also, if trace file starts from WAIT lines then OraSRP will sum up elapsed times of such waits and substract it from the first 'tim' value.
Thank you, Egor, for that explanation. But please allow me to be more specific.
Session Start Timestamp
Here's a screenshot that shows the top of the trace file (sensitive information has been crossed out, please click to enlarge):
From that screenshot, I make the following observations:
- There are no WAIT lines before the first 'PARSING IN CURSOR' line so the mention of WAIT lines in Egor's comments is not applicable to this specific situation.
- Let's follow Egor's comment and ignore the tim value from the 'PARSING IN CURSOR' line (line 22). The next tim value after line 22 comes from the EXEC line (line 25). The value of tim for that line is 3928564402472. Those who are closely following the values of the timestamps may notice that the value of tim from the EXEC line (line 25) is less than the value of tim from the 'PARSING IN CURSOR' line. In this specific case, the EXEC tim is 488 microseconds (or 488/1,000,000 of a second) before the tim from 'PARSING IN CURSOR'. This may seem a little bit odd but if you happen to have the Optimizing Oracle Performance book, the reason why this is so is explained on pages 100-101 in the section titled "Oracle Release 9".
At this point, I think it is safe to make the following statements regarding the start timestamp:
-
The actual start timestamp can not be greater than 3928564402472. By that I mean that it can be less, but not 16,000,000 microseconds less.
-
I believe that the earliest timestamp that one can get from this specific tracefile is the one in the 'SESSION ID' line (line 19). In fact, the Optimizing Oracle Performance book describes this line as the line that "identifies the time and the session at which the first trace line was emitted" (page 104). Purists will probably disagree with me but personally, I can tolerate not using the value in line 19 as start time, specially in cases where the session elapsed time is big (like in this case where it is more than 14,171 seconds). I say this for the following reasons:
- The 'SESSION ID' time is in a different format and converting it to the tim format is not always straightforward. Of course, if one can figure out a reliable way to convert the 'SESSION ID' time to tim format, then this is the time that should be used as session start time.
- The difference between the times that are represented between lines 19 and 25 is likely very small, just like the difference between lines 22 and 25 is also small (488 microseconds). I can personally tolerate this difference knowing that it is a small one-time occurence. And in the context of what is being discussed here now, I can not imagine a valid case where this difference would go as high as 16 seconds (or 16,000,000 microseconds).
Session End TimestampHere's a screenshot that shows the bottom of the trace file (sensitive information has been crossed out, please click to enlarge):
From that screenshot, I make the following observations:
- The last line that has a tim is the EXEC line (line 2156805) and the value of tim from this line is 3942736181428.
I said in the first post that the actual end timestamp can not be less than the tim value in the EXEC line above. By that, I implied that it can be more, but not 16,000,000 microseconds more. In this specific case, there are several WAIT lines that follow this EXEC line. The ela values for these WAIT lines must be accounted for.
The total amount of time that can be attributed to these WAIT lines is the sum of their
ela values:
Line # ela
------- -----
2156806 1
2156807 560
2156809 917
2156810 2
2156811 518
------- -----
Total 1,998
jforonda@u20$ bc
1+560+917+2+518
1998
jforonda@u20$
To get the session's end timestamp, simply add the last tim value (3942736181428 from line 2156805) to the total ela of the WAIT lines above (see Clock Walk Formulas, p101-102, Optimizing Oracle Performance).
jforonda@u20$ bc
3942736181428+1998
3942736183426
jforonda@u20$
To the best of my knowledge, that's the maximum value of the end timestamp that one can get from this specific tracefile. If anyone can see anything that I've missed, I will welcome your corrections.
Session Elapsed Time
The session elapsed time should just be the difference between the end timestamp and start timestamp:
jforonda@u20$ bcscale=6(3942736183426-3928564402472)/100000014171.780954jforonda@u20$So there you go. That is how I would calculate this specific trace file's session elapsed time. Is this how others would do it? If what I presented above is wrong, where did it go wrong? If the method above is wrong, is it so way off to a point that it can not be relied on? Is it even
reasonably close enough to be within the bounds of what is acceptable?
Comments are welcome, whether you agree or disagree. Moderation is on to control spam.
Assuming that the calculation above is correct (or at least reasonably close enough to what it should be), it means that nothing much has changed from the initial observations that I made in the first post -- that is, there is a discrepancy of around 16 seconds from the value reported by OraSRP:
jforonda@u20$ bc
scale=6
14187.9512-14171.780954
16.170246
jforonda@u20$
Again, assuming that the calculation above is correct, some people might then ask:
why bother with 16 seconds specially since it is just around 0.11% of the total? I'm very interested to hear what others think about this. Does this 16-second discrepancy tell you anything? At the very least, does it
suggest anything? What kinds of questions come up on your mind as a result of this? If you are a person who does Oracle performance analysis and improvement in
high profile situations, is this 16-second discrepancy acceptable to you? If you have some level of understanding of the trace file and you have a programming background at the same time, is this discrepancy acceptable to you? Why? Why not? Again, your comments are welcome. If there are no comments, I will take this up in a future post when my schedule permits.
If you find this series of posts was worth your time, I would be honored if you
subscribe.