[Topaz-dev] [Topaz-pubapp-dev] Mulgara Performance Woes

Life is hard, and then you die ronald at innovation.ch
Mon Mar 3 15:21:37 PST 2008


On Mon, Mar 03, 2008 at 01:38:05PM -0800, Russell Uman wrote:
> conflating two of ronald's posts in this response...
> 
> > Ooops, I just realized that that would lead to a deadlock: 
> > because axis is opening a new connection for every request 
> > (not a problem from an efficiency standpoint on a gig-e lan, 
> > hence why we never changed this), the "begin tx" would 
> > succeed but the next query/insert/commit could get stuck in 
> > the accept-queue.
> > 
> > So, I take it back: maxThreads must indeed be large, and 
> > acceptCount probably 0.
> 
> in that case, perhaps the better route would be to reduce maxThreads on
> the pub-app side.

Not necessarily. A lot of hits on the pub-app should be served from
the cache without needing to hit mulgara at all, so it makes perfect
sense to have many more threads on the pub-app side (though if the
cache is all memory based, that implies the requests will be purely
cpu-bound, which in turn also means that extra threads don't help
performance, but instead only act as buffers when the number of
requests spikes).

> in that case, if mulgara is not in the middle of a long transaction,
> there should always be threads available on the pub-app. 
> 
> if mulgara is hung up, the pub-app will run out of threads sooner -
> perhaps (hopefully?) returning 503 to client browsers - and mulgara will
> have an easier time catching up with the queue when it comes back,
> leading to fewer hangs?

As a temporary solution, this might work. Longer term we need to just
clean up faster on timeouts (besides ensuring we don't run into this
problem in the first place).

> i still think that we don't really get out of this problem until we find
> a way to get better cooperation between pub-app and mulgara when mulgara
> is stuck in a long transaction.

Note that as of the next version only the write transactions are a problem
- all read-only transactions (which should be the majority) will run
in parallel without being blocked. However, if mulgara is overloaded,
then it's overloaded and everything will slow down of course.

> is there anything else in the timeout train that could be inducing
> abandoned sessions on the mulgara side?
> -connectionTimeout in tomcat (both HTTP and AJP connectors) is 20000ms,
> but that doesn't seem to be an issue.

That is probably the idle-connection timeout and comes into play after
a response has been sent and it's waiting for the next request, so
yes, it should not affect slow responses at all.

> -mod_jk worker timeout is set to 600000ms

Which timeout is this exactly? The connection_pool_timeout? If so,
that is similar to the above in that it only affects idle connections
to tomcat.

> -apache timeout is set to 120s...is there any way the apache timeout
> after 2 minutes is filtering down to the pub-app? usually when apache
> times out we see the action continuing to go on the pub-app side...

Again, which timeout exactly? "Timeout"? If so, yes, this doesn't
affect slow responses either.

> > Well, you build in the knowledge in the app that mulgara only 
> > handles a single write-tx at a time and therefore 
> > lock/serialize all operations on the app side, but I don't 
> > think that's a good idea. I think the best approach is just 
> > to make sure the sessions get closed quickly - part of the 
> > problem currently is the use of http-sessions to manage state 
> > and what appears to be a bug in axis about losing cookies on 
> > timed out operations, something that should go away in the 
> > next release since we're switching to RMI.
> 
> i don't think we can wait until 0.9 to resolve this crisis. what can we
> do in the short term to fix? can we get axis to re-use the same thread?
> can we get the pub-app to refrain from opening a new session after a
> long wait? 

In the short term, fix the caching. Mulgara's load was very low
before this upgrade (typically 3 - 4% cpu, IIRC), so this sort of
jump is odd to say the least.

> i know that we can't squeeze any more juice out of mulgara until 0.9.
> however, i do think we should be able to find a way to gracefully return
> 503 to the client when mulgara is slow, rather than piling up and
> abandoning mulgara sessions so that the whole stack is hung.

How about limiting the number of sessions opened on mulgara - that
should be easy enough to add to our wrapper that opens mulgara sessions.
Then set the limit to 5 or 10 or something.
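
Something along these lines is what I have in mind; a rough sketch only,
not our actual wrapper. The class name SessionLimiter, its methods and
the wait limit are all made up for illustration; the real wrapper would
call tryOpen() before creating the mulgara session and close() when the
session is released, and the caller would turn a false return into a
503.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of Topaz or Mulgara) that caps the number
 * of sessions open at any one time.
 */
class SessionLimiter {
    private final Semaphore permits;
    private final long waitMillis;

    /**
     * @param maxSessions upper bound on concurrently open sessions, e.g. 5 or 10
     * @param waitMillis  how long a caller may wait for a free slot
     */
    SessionLimiter(int maxSessions, long waitMillis) {
        // Fair semaphore, so waiting callers get slots in arrival order.
        this.permits = new Semaphore(maxSessions, true);
        this.waitMillis = waitMillis;
    }

    /**
     * Try to reserve a session slot.
     * Returns false if no slot became free within waitMillis, or if the
     * thread was interrupted while waiting; the caller should then fail
     * fast (e.g. answer 503) instead of opening another session.
     */
    boolean tryOpen() {
        try {
            return permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    /** Give the slot back; call exactly once per successful tryOpen(). */
    void close() {
        permits.release();
    }

    /** Number of slots currently free. */
    int available() {
        return permits.availablePermits();
    }
}
```

The point of the bounded wait is that when mulgara is stuck, requests
beyond the limit give up quickly rather than piling up more sessions
behind the blocked ones.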


  Cheers,

  Ronald


