[Topaz-dev] second level cache
Pradeep Krishnan
pradeepk at soft-point.com
Fri Feb 29 17:26:41 PST 2008
This is triggered by Rich and Russ's performance woes with the
publishing app.
The idea is not really to have an SPI for abstracting out cache
implementations like Hibernate has done, but instead use the separation
that we have for plugging in various stores using
org.topazproject.otm.TripleStore implementations. So a store can be a
'wrapping' store that wraps a regular triple-store implementation with a
caching implementation. For example, an EhCacheTripleStore or OsCacheTripleStore
implementation can wrap an ItqlStore implementation and provide get()
caching. get() caching is a good candidate for peer-updates and
boot-strapping etc. since it can be keyed off of the subject-uri and
insert() and delete() methods can easily trigger cache invalidations.
These two cache implementations do not cache query results. OQL and
Criteria queries will still be performed - but the get() that follows
these queries will come out of the cache. So there will be that benefit.
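
To make the wrapping idea concrete, here is a minimal sketch. It is only
an illustration under assumptions: SubjectStore is a hypothetical,
much-simplified stand-in for org.topazproject.otm.TripleStore (the real
interface has different signatures), InMemoryStore stands in for
something like ItqlStore, and a plain ConcurrentHashMap stands in for
EhCache or OSCache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for org.topazproject.otm.TripleStore.
interface SubjectStore {
    Map<String, String> get(String subjectUri);
    void insert(String subjectUri, Map<String, String> properties);
    void delete(String subjectUri);
}

// Stand-in for a real backing store; counts reads so cache hits are visible.
class InMemoryStore implements SubjectStore {
    private final Map<String, Map<String, String>> data = new HashMap<>();
    int reads = 0;

    public Map<String, String> get(String subjectUri) {
        reads++;
        Map<String, String> found = data.get(subjectUri);
        return found == null ? null : new HashMap<>(found);
    }

    public void insert(String subjectUri, Map<String, String> properties) {
        data.computeIfAbsent(subjectUri, k -> new HashMap<>()).putAll(properties);
    }

    public void delete(String subjectUri) {
        data.remove(subjectUri);
    }
}

// Wrapping store: get() is cached by subject URI; insert() and delete()
// invalidate the entry. Query results are not cached. This sketch ignores
// transactions and races between concurrent readers and writers.
class CachingStore implements SubjectStore {
    private final SubjectStore delegate;
    private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();

    CachingStore(SubjectStore delegate) {
        this.delegate = delegate;
    }

    public Map<String, String> get(String subjectUri) {
        Map<String, String> cached = cache.get(subjectUri);
        if (cached == null) {
            cached = delegate.get(subjectUri);       // cache miss: go to the store
            if (cached == null) {
                return null;                         // unknown subject, nothing to cache
            }
            cache.put(subjectUri, cached);
        }
        return new HashMap<>(cached);                // copy so callers cannot alter the cache
    }

    public void insert(String subjectUri, Map<String, String> properties) {
        delegate.insert(subjectUri, properties);
        cache.remove(subjectUri);                    // invalidate the stale entry
    }

    public void delete(String subjectUri) {
        delegate.delete(subjectUri);
        cache.remove(subjectUri);                    // invalidate the stale entry
    }
}
```

Because the cache is keyed only by subject URI, a peer-update
notification would just need to carry the URI to drop.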
An alternate/additional strategy would be to have an embedded mulgara
which can work in a 'slave' mode to a master. The embedded mulgara has a
complete duplicate of the master data and now that mulgara and OTM are
JTA enabled all queries can be satisfied by the embedded mulgara and the
updates can be propagated to the master by OTM. Master-to-slave
synchronization can be based on post-commit update notifications or
on a simple poll and resync at the beginning of every transaction. I am
guessing the poll and resync would work best since a change-log can be
maintained by the master (maybe the filter-resolver can do this?) and
polls for updates can be fairly quick and can be satisfied out of band
from the core mulgara database by connecting directly to the resolver.
But we may not need to use this master/slave strategy yet for the
plosone hosting if bugs like 808 are fixed (as Ronald pointed out in a
separate conversation). Maybe Andrae's XA2 with the clustering support
will be ready when plosone's scalability requirements absolutely can't
be satisfied by caching alone.
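
For reference, the poll-and-resync variant of the master/slave idea
could look roughly like the sketch below. Everything in it is
hypothetical: Master, Slave, Change and changesSince() are names made up
for illustration, simple maps stand in for the mulgara databases, and no
mulgara or JTA API is used. It assumes the master keeps an append-only
change log with sequence numbers and the slave replays whatever it has
not yet seen at the start of each transaction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One change-log entry: a put (value != null) or a removal (value == null).
final class Change {
    final long seq;
    final String subjectUri;
    final String value;

    Change(long seq, String subjectUri, String value) {
        this.seq = seq;
        this.subjectUri = subjectUri;
        this.value = value;
    }
}

// Hypothetical master: applies updates and appends each one to a change log.
class Master {
    private final Map<String, String> data = new HashMap<>();
    private final List<Change> log = new ArrayList<>();

    synchronized void put(String subjectUri, String value) {
        data.put(subjectUri, value);
        log.add(new Change(log.size() + 1, subjectUri, value));
    }

    synchronized void remove(String subjectUri) {
        data.remove(subjectUri);
        log.add(new Change(log.size() + 1, subjectUri, null));
    }

    // Entries with a sequence number greater than afterSeq, oldest first.
    // Sequence numbers start at 1, so entry n lives at list index n - 1.
    synchronized List<Change> changesSince(long afterSeq) {
        return new ArrayList<>(log.subList((int) afterSeq, log.size()));
    }
}

// Hypothetical slave: answers queries from a local replica and forwards
// updates to the master; the replica catches up on the next poll.
class Slave {
    private final Master master;
    private final Map<String, String> replica = new HashMap<>();
    private long lastSeq = 0;

    Slave(Master master) {
        this.master = master;
    }

    // Poll and resync: replay every change not yet applied locally.
    void beginTransaction() {
        for (Change c : master.changesSince(lastSeq)) {
            if (c.value == null) {
                replica.remove(c.subjectUri);
            } else {
                replica.put(c.subjectUri, c.value);
            }
            lastSeq = c.seq;
        }
    }

    String query(String subjectUri) {
        return replica.get(subjectUri);
    }

    void update(String subjectUri, String value) {
        master.put(subjectUri, value);
    }
}
```

The poll only has to transfer the tail of the log, which is why it
should stay cheap as long as the slave resyncs often.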
I am however inclined to implement the get() caching via an
EhCacheTripleStore for the simple reason that it will make the
application logic a lot simpler. Even though it is more efficient to
have a higher level application cache, the complexity of managing such
a cache is outweighing its benefits (as evidenced by the flurry of
tickets opened by Russ today on pubapp).
An OTM second level cache however has the drawback that it requires the
application to build out the 'view' level objects on every request. This
should be fine as the pubapp is not currently CPU bound and the
additional latency should not degrade the response time significantly
(and it is cheaper to upgrade/add new hardware to scale if it comes to
that).
Cheers,
Pradeep