[Topaz-dev] [Topaz-pubapp-dev] Mulgara Performance Woes
Life is hard, and then you die
ronald at innovation.ch
Mon Mar 3 15:21:37 PST 2008
On Mon, Mar 03, 2008 at 01:38:05PM -0800, Russell Uman wrote:
> conflating two of ronald's posts in this response...
>
> > Ooops, I just realized that that would lead to a deadlock:
> > because axis is opening a new connection for every request
> > (not a problem from an efficiency standpoint on a gig-e lan,
> > hence why we never changed this), the "begin tx" would
> > succeed but the next query/insert/commit could get stuck in
> > the accept-queue.
> >
> > So, I take it back: maxThreads must indeed be large, and
> > acceptCount probably 0.
>
> in that case, perhaps the better route would be to reduce maxThreads on
> the pub-app side.
Not necessarily. A lot of hits on the pub-app should be served from
the cache without needing to hit mulgara at all, so it makes perfect
sense to have many more threads on the pub-app side (though if the
cache is all memory based, that implies the requests will be purely
cpu-bound, which in turn also means that extra threads don't help
performance, but instead only act as buffers when the number of
requests spikes).
> in that case, if mulgara is not in the middle of a long transaction,
> there should always be threads available on the pub-app.
>
> if mulgara is hung up, the pub-app will run out of threads sooner -
> perhaps (hopefully?) returning 503 to client browsers - and mulgara will
> have an easier time catching up with the queue when it comes back,
> leading to fewer hangs?
As a temporary solution, this might work. Longer term we need to just
clean up faster on timeouts (besides ensuring we don't run into this
problem in the first place).
> i still think that we don't really get out of this problem until we find
> a way to get better cooperation between pub-app and mulgara when mulgara
> is stuck in a long transaction.
Note that as of next version only the write transactions are a problem
- all read-only transactions (which should be the majority) will run
in parallel without being blocked. However, if mulgara is overloaded,
then it's overloaded and everything will slow down of course.
> is there anything else in the timeout train that could be inducing
> abandoned sessions on the mulgara side?
> -connectionTimeout in tomcat (both HTTP and AJP connectors) is 20000ms,
> but that doesn't seem to be an issue.
That is probably the idle-connection timeout and comes into play after
a response has been sent and it's waiting for the next request, so
yes, it should not affect slow responses at all.
> -mod_jk worker timeout is set to 600000ms
Which timeout is this exactly? The connection_pool_timeout? If so,
that is similar to the above in that it only affects idle connections
to tomcat.
> -apache timeout is set to 120s...is there any way the apache timeout
> after 2 minutes is filtering down to the pub-app? usually when apache
> times out we see the action continuing to go on the pub-app side...
Again, which timeout exactly? "Timeout"? If so, yes, this doesn't
affect slow responses either.
> > Well, you build in the knowledge in the app that mulgara only
> > handles a single write-tx at a time and therefore
> > lock/serialize all operations on the app side, but I don't
> > think that's a good idea. I think the best approach is just
> > to make sure the sessions get closed quickly - part of the
> > problem currently is the use of http-sessions to manage state
> > and what appears to be a bug in axis about losing cookies on
> > timed out operations, something that should go away in the
> > next release since we're switching to RMI.
>
> i don't think we can wait until 0.9 to resolve this crisis. what can we
> do in the short term to fix? can we get axis to re-use the same thread?
> can we get the pub-app to refrain from opening a new session after a
> long wait?
In the short term, fix the caching. Mulgara's load was very low
before this upgrade (typically 3 - 4% cpu, IIRC), so this sort of
jump is odd to say the least.
> i know that we can't squeeze any more juice out of mulgara until 0.9.
> however, i do think we should be able to find a way to gracefully return
> 503 to the client when mulgara is slow, rather than piling up and
> abandoning mulgara sessions so that the whole stack is hung.
How about limiting the number of sessions opened on mulgara? That
should be easy enough to add to our wrapper that opens mulgara sessions.
Then set the limit to 5 or 10 or something.
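
Roughly what I have in mind, as a sketch only: a counting semaphore in
front of the call that opens a session. The class and method names here
are made up for illustration, the Supplier stands in for whatever our
wrapper actually calls to open a session, and the wait time is an
arbitrary parameter; none of this is existing Topaz or mulgara API.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical wrapper: caps how many sessions are open at once.
// S stands in for whatever session type the real wrapper hands out.
class LimitedSessionOpener<S> {
    private final Semaphore permits;
    private final Supplier<S> opener;   // assumed: the existing "open a session" call
    private final long maxWaitMillis;

    LimitedSessionOpener(Supplier<S> opener, int maxSessions, long maxWaitMillis) {
        this.opener = opener;
        this.permits = new Semaphore(maxSessions, true); // fair: first come, first served
        this.maxWaitMillis = maxWaitMillis;
    }

    // Returns a session, or throws if no slot frees up within maxWaitMillis.
    S open() {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a session slot", e);
        }
        if (!acquired)
            throw new IllegalStateException("session limit reached");

        boolean opened = false;
        try {
            S session = opener.get();
            opened = true;
            return session;
        } finally {
            if (!opened)
                permits.release();   // don't leak the slot if opening failed
        }
    }

    // Must be called exactly once per session returned by open(), after it is closed.
    void released() {
        permits.release();
    }

    int freeSlots() {
        return permits.availablePermits();
    }
}
```

The caller would catch the "session limit reached" case and turn it
into a 503 instead of queueing up behind a stuck mulgara. The catch is
that every code path that closes a session has to call released(), or
the slots leak and we end up refusing everything.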
Cheers,
Ronald