Thursday, March 01, 2007

OraSRP author replies

Note to subscribers: If your reader does not properly display some formatted content like bold and italics (I know that at the very least, Bloglines has this problem), I suggest you visit the blog site itself.



First of all, I would like to thank OraSRP's author, Egor Starostin, for dropping by and making a comment on yesterday's blog entry, OraSRP v3.0b - a very short initial impression. I've decided to put my reply in a separate blog entry so that readers who do not subscribe to comment feeds can pick it up.

With regards to my observation that the number of lines that OraSRP reports is not equal to the actual number of lines in the raw trace file, Egor says:

OraSRP ignores some lines like
'===' or '*** 2007-02-28 12:45:08.424' and doesn't report them as 'processed'. So, 'Total lines processed in trace file' != 'Total lines in trace file'.


That's good to know, and I suspected that much. But without that clarification, a different line count may mean that there are some relevant lines that are being missed. And when relevant lines are missed, it then becomes possible that the output may be affected to a point that it no longer presents a correct profile. When the profile is not correct, then that may lead people to try to solve problems that are not the real ones to begin with. This is very important to me because the real strength of a correct profile, at least as far as I'm concerned, is its ability to pinpoint the root cause of a problem.


Of course, I have no problems bypassing irrelevant lines and since this is already being done, I respectfully suggest that the number of lines in the raw trace file still be reported, even as a separate line item, so that people don't get thrown off by differing numbers.


Now, let's double check the numbers and take into account the types of lines that Egor mentioned above. Let me deal with the '^===' lines first:

jforonda@u20$ egrep '^===' 4071.trc | head -5
=====================
=====================
=====================
=====================
=====================
jforonda@u20$


That looks good enough. Now, let's count them:


jforonda@u20$ egrep '^===' 4071.trc | wc -l
2165
jforonda@u20$


So we have 2,165 lines that fit this criteria.


For the date/timestamp lines:


jforonda@u20$ egrep '\*\*\* 200' 4071.trc | head -5
*** 2004-10-18 19:51:28.453
*** 2004-10-18 19:55:50.302
*** 2004-10-18 19:56:07.899
*** 2004-10-18 19:56:48.561
*** 2004-10-18 19:57:41.135
jforonda@u20$

jforonda@u20$ egrep '\*\*\* 200' 4071.trc | wc -l
106
jforonda@u20$


That's another 106 lines. If we add these two numbers to the total reported by OraSRP, we get a total of 2,152,611:

jforonda@u20$ bc
2150340+106+2165
2152611
jforonda@u20$

This means that there are still 4,239 lines that are not accounted for:

jforonda@u20$ bc
2156850-2152611
4239
jforonda@u20$



This is still a lot of lines that are not unaccounted for. Now, Egor's comment said "OraSRP ignores some lines like..." and that makes me think that there are other types of lines, aside from those that were already mentioned, that are being bypassed on purpose. If Egor gives me a complete list of the types of lines that are being bypassed on purpose, I'd be happy to check my trace file.


With regards to the different totals between the “with idle” and "without idle” runs, Egor said:

As for different times with/without idle waits recognition -- I believe it's a bug. Unfortunatey, I can't reproduce it yet, so I can't fix it too.

In as much as I want to help, I will say now that I can not give this trace file to Egor. As most people reading this probably know, 10046 trace files contain confidential information. This is also the reason why I crossed out the instance and node names from the screenshots that I posted yesterday. If there are readers who can share their trace files to Egor, I suggest that they try to do what I've done here and see if they can reproduce the problems that I have described.


Now, there's still that issue about the total elapsed time but I have to attend to other things so I will stop for now. I will post another entry to talk about the elapsed time within the next day or so. If you want to follow this topic, it will make it easier if you subscribe. :)

3 comments:

egor said...

If Egor gives me a complete list of the types of lines that are being bypassed on purpose, I'd be happy to check my trace file.
Obviously, there is no such list. For example, when OraSRP meet 'select blah-blah...' outside 'PARSING IN CURSOR/END OF STMT' such line is ignored; when inside -- it's counted.
Download http://oracledba.ru/orasrp/orasrp-windows302.zip and run it with --debug option. It will print skipped lines to stderr.

As most people reading this probably know, 10046 trace files contain confidential information.
10046 trace files may contain confidential information.

jforonda said...

Egor,

Thank you for your reply. As usual, I will reply sometime tonight or maybe tomorrow as I try my best to do non-work related things outside my working hours. We are also dealing with a winter storm right now.

James

jforonda said...

Egor,

I'm sorry for the delayed response for this... I was trying to decide if I should write a separate post or do my reply in the comments.

I've decided to do my reply in the comments and so I will keep it as short as I can.

> Obviously, there is no such list.

I'm sorry but not having a list of lines that are bypassed on purpose is not so obvious to me. Personally, I think it makes a lot of sense to keep such a list and to know which lines are being bypassed on purpose.

If there is no list of lines that are being bypassed on purpose, then I can not check the line count from that angle. And if that is the case, please allow me to put this another way: Under what conditions is the "number of lines processed" incremented? What types of lines are being counted? What types of lines trigger that counter to be incremented? The kind of answer that I'm looking for is something like: the following lines are counted - 'PARSING IN CURSOR', 'END OF STMT', 'WAITS', database calls (PARSE|EXEC...), etc. If you give me that list, then I believe I can write a very simple script to count those types of lines... again, keeping in mind that my trace file is clean and simple (not WAITS before first 'PARSING IN CURSOR', no "dump from memory", etc.)

The reason I ask is that I believe that not only should there be a *full accounting* of where the time goes, there should also be a full accounting of the lines processed. In fact, it would be nice if the "number of lines processed" points to a breakdown that shows the number of lines processed for each line type. This may not mean a lot to most people but to me, it serves as a basic sanity check.


>Download http://oracledba.ru/orasrp/orasrp-windows302.zip and run it with --debug option.

The debug option might be good for development purposes but for the purpose of independently checking the output of OraSRP (which is what I've done), I really prefer using separate, well-known, and well-accepted tools, if possible. That is the reason why in my posts, I used things like grep and wc. After all, who would question the output of something like 'wc -l'? Some people may question it... but I won't.

> 10046 trace files may contain confidential information.
Thank you for the correction... and yes, I agree with you that not all 10046 trace files are confidential. I did allude to that when I said "If there are readers who can share their trace files to Egor". I think we both understand, though, that the main thing I was saying there is that I can not share this file with you because it does contain confidential information.

James