[Topaz-dev] second level cache

Pradeep Krishnan pradeepk at soft-point.com
Fri Feb 29 17:26:41 PST 2008


This is triggered by Rich and Russ's performance woes with the 
publishing app.

The idea is not to have an SPI for abstracting out cache 
implementations as Hibernate has done, but instead to use the separation 
that we already have for plugging in various stores via 
org.topazproject.otm.TripleStore implementations. A store can be a 
'wrapping' store that wraps a regular triple-store implementation with a 
caching implementation, e.g. an EhCacheTripleStore or OsCacheTripleStore 
can wrap an ItqlStore and provide get() caching. get() caching is a good 
candidate for peer updates, bootstrapping, etc., since it can be keyed 
off the subject URI, and the insert() and delete() methods can easily 
trigger cache invalidations. These two cache implementations do not 
cache query results. OQL and Criteria queries will still be performed, 
but the get() calls that follow these queries will be served from the 
cache, so there will still be a benefit.
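
To make the wrapping idea concrete, here is a rough sketch. It does not 
use the real org.topazproject.otm.TripleStore signatures: SimpleStore, 
BackingStore and CachingStore are hypothetical names made up for 
illustration, and a ConcurrentHashMap stands in for Ehcache or OSCache.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, much-simplified stand-in for a triple store interface.
// Each subject URI maps to a flat set of predicate/value pairs.
interface SimpleStore {
    Map<String, String> get(String subjectUri);
    void insert(String subjectUri, Map<String, String> properties);
    void delete(String subjectUri);
}

// Plays the role of the wrapped store (e.g. an ItqlStore); counts reads
// so the effect of the cache can be observed.
class BackingStore implements SimpleStore {
    private final Map<String, Map<String, String>> data = new HashMap<>();
    int reads = 0;

    public Map<String, String> get(String subjectUri) {
        reads++;
        Map<String, String> found = data.get(subjectUri);
        return found == null ? null : new HashMap<>(found);
    }

    public void insert(String subjectUri, Map<String, String> properties) {
        data.computeIfAbsent(subjectUri, k -> new HashMap<>()).putAll(properties);
    }

    public void delete(String subjectUri) {
        data.remove(subjectUri);
    }
}

// The 'wrapping' store: caches get() by subject URI and invalidates the
// entry whenever insert() or delete() touches that subject.
class CachingStore implements SimpleStore {
    private final SimpleStore delegate;
    private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();

    CachingStore(SimpleStore delegate) {
        this.delegate = delegate;
    }

    public Map<String, String> get(String subjectUri) {
        Map<String, String> cached = cache.get(subjectUri);
        if (cached != null) {
            return cached;                       // cache hit: no store access
        }
        Map<String, String> loaded = delegate.get(subjectUri);
        if (loaded == null) {
            return null;                         // misses are not cached here
        }
        Map<String, String> frozen = Collections.unmodifiableMap(loaded);
        cache.put(subjectUri, frozen);
        return frozen;
    }

    public void insert(String subjectUri, Map<String, String> properties) {
        delegate.insert(subjectUri, properties);
        cache.remove(subjectUri);                // invalidate after write
    }

    public void delete(String subjectUri) {
        delegate.delete(subjectUri);
        cache.remove(subjectUri);                // invalidate after write
    }
}
```

The application only ever sees a SimpleStore, so wrapping or unwrapping 
the backing store would be a configuration choice. Transaction 
boundaries and peer invalidation are left out of this sketch.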

An alternate/additional strategy would be to have an embedded mulgara 
which works in a 'slave' mode to a master. The embedded mulgara has a 
complete duplicate of the master data, and now that mulgara and OTM are 
JTA enabled, all queries can be satisfied by the embedded mulgara while 
the updates are propagated to the master by OTM. Master-to-slave 
synchronization can be done based on post-commit update notifications or 
on a simple poll and resync at the beginning of every transaction. I am 
guessing the poll and resync would work best, since a change-log can be 
maintained by the master (maybe the filter-resolver can do this?) and 
polls for updates can be fairly quick and can be satisfied out of band 
from the core mulgara database by connecting directly to the resolver.
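
A toy sketch of the poll-and-resync variant follows. Nothing here is 
mulgara or OTM API: Change, Master and Slave are hypothetical classes, 
the 'database' is an in-memory map, and the change-log is a plain list 
indexed by sequence number.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical change-log entry; a null value means the subject was deleted.
final class Change {
    final long seq;
    final String subjectUri;
    final String value;

    Change(long seq, String subjectUri, String value) {
        this.seq = seq;
        this.subjectUri = subjectUri;
        this.value = value;
    }
}

// Stand-in for the master: applies updates and appends them to a change-log.
class Master {
    private final Map<String, String> data = new HashMap<>();
    private final List<Change> log = new ArrayList<>();

    synchronized void update(String subjectUri, String value) {
        if (value == null) {
            data.remove(subjectUri);
        } else {
            data.put(subjectUri, value);
        }
        // Sequence numbers start at 1, so entry 'seq' lives at index seq - 1.
        log.add(new Change(log.size() + 1, subjectUri, value));
    }

    // Returns all changes with a sequence number greater than afterSeq.
    synchronized List<Change> changesSince(long afterSeq) {
        return new ArrayList<>(log.subList((int) afterSeq, log.size()));
    }
}

// Stand-in for the embedded slave: answers queries from its local replica
// and resyncs from the master's change-log when a transaction begins.
class Slave {
    private final Master master;
    private final Map<String, String> replica = new HashMap<>();
    private long lastSeq = 0;

    Slave(Master master) {
        this.master = master;
    }

    void beginTransaction() {
        for (Change c : master.changesSince(lastSeq)) {
            if (c.value == null) {
                replica.remove(c.subjectUri);
            } else {
                replica.put(c.subjectUri, c.value);
            }
            lastSeq = c.seq;
        }
    }

    String query(String subjectUri) {
        return replica.get(subjectUri);          // served locally
    }

    void update(String subjectUri, String value) {
        master.update(subjectUri, value);        // writes go to the master
    }
}
```

In this sketch a slave's own write only becomes visible locally at its 
next beginTransaction(); a real implementation would presumably apply 
it to the local copy within the same JTA transaction.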

But we may not need to use this master/slave strategy yet for the 
plosone hosting if bugs like 808 are fixed (as Ronald pointed out in a 
separate conversation). Maybe Andrae's XA2 with the clustering support 
will be ready by the time plosone's scalability requirements absolutely 
can't be satisfied by caching alone.

I am, however, inclined to implement the get() caching via an 
EhCacheTripleStore for the simple reason that it will make the 
application logic a lot simpler. Even though it is more efficient to 
have a higher level application cache, the complexities of managing such 
a cache are outweighing its benefits (as evidenced by the flurry of 
tickets opened by Russ today on pubapp).

An OTM second level cache however has the drawback that it requires the 
application to build out the 'view' level objects on every request. This 
should be fine as the pubapp is not currently CPU bound and the 
additional latency should not degrade the response time significantly 
(and it is cheaper to upgrade/add new hardware to scale if it comes to 
that).

Cheers,
Pradeep

