[Topaz-dev] [Topaz-pubapp-dev] Mulgara Performance Woes
Life is hard, and then you die
ronald at innovation.ch
Wed Mar 5 07:42:46 PST 2008
On Tue, Mar 04, 2008 at 09:49:47AM -0800, Russell Uman wrote:
>
> > I just tried checking the sar logs, but they only go back to the 24th.
> > However, it seems you upgraded on the 26th at 2:53PM? In that
> > case, the sar log from the 25th indicates about 31% avg cpu
> > usage, and on the 27th 41% avg cpu usage. If the update date
> > is correct, then this would indicate that while there is some
> > increased cpu usage in the new code, it's not huge and not
> > nearly what I thought. But if the increase is only those 10%,
> > then why are you suddenly running into these massive
> > problems? Or have I got a completely wrong picture of
> > things and you've been having big problems before the
> > upgrade? (I know you've had problems while ingesting, but now
> > you seem to have problems all the time?)
>
> :) i certainly haven't been monitoring mulgara's load regularly. i was
> just curious what evidence you had to favor an inflection point over a
> slow increase in traffic. it sounds like we don't know - but at least it
> wasn't a sudden jump at the release of 0.8.2.1.
Agreed, looks like my memory and knowledge were faulty (though there
was some noticeable increase with 0.8.2.1).
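To put a number on "some noticeable increase", here is a quick sketch of the arithmetic on the two sar averages quoted above (31% on the 25th, 41% on the 27th). The variable names are mine, and this only restates those two figures; it says nothing about peaks or I/O wait.

```python
# Average CPU usage from the sar logs quoted above (percent).
cpu_before = 31.0  # 25th, before the upgrade
cpu_after = 41.0   # 27th, after the upgrade

# Absolute change in percentage points, and change relative to the old level.
point_increase = cpu_after - cpu_before
relative_increase = point_increase / cpu_before * 100

print(f"{point_increase:.0f} percentage points, {relative_increase:.1f}% relative increase")
```

So the "10%" is 10 percentage points, or roughly a third more cpu than before: noticeable, but not enough on its own to explain the problems.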
> the timeline iirc is as follows:
>
> we identified the ehcache memory leak or whatever issue sometime before
> 0.8.2 (maybe even before 0.8.1?) and we instituted cron restarts to
> prevent it from happening, and to try and get the inevitable
> restart-related crashes to happen at a known time.
>
> then, for the entire month of february, we were ingesting 12 hours a day
> and performance was terrible, presumably because of ingest.
>
> next, since launching the CJs (on 2/29, after a 2/26 upgrade to 0.8.2.1)
> we've had terrible performance.
>
> my theory is that the extra traffic from the CJs is the cause of our
> load and instability now that we're not ingesting constantly.
Looking at the cpu usage after 2/29 I can't really say there's been a
noticeable increase. But cpu usage is only part of the picture, of
course.
> however, we should definitely consider other possibilities. it's
> definitely possible that we've introduced or uncovered some bugs related
> to caching - perhaps we're flushing caches prematurely on some
> operations - i'll ask the front end devs what logging options we have to
> investigate this.
I would suggest setting the log level for ItqlHelper to debug on the
pub-app side - that'll give you a nice log of all itql requests sent,
their responses, and you'll see how long stuff is taking. You can then
easily correlate those queries with the requests that are causing them.
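As a rough sketch, assuming the pub-app logs through log4j with a properties-style config, and assuming ItqlHelper lives in the org.topazproject.mulgara.itql package (please check both against your deployment before relying on this), the change would look something like:

```properties
# Assumption: log4j 1.x properties config; the package name is my best guess.
# Enables debug output for ItqlHelper only, leaving other loggers unchanged.
log4j.logger.org.topazproject.mulgara.itql.ItqlHelper=DEBUG
```

If the pub-app uses an XML log4j config instead, the equivalent is a logger element for the same class name with its level set to debug.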
Cheers,
Ronald