Wednesday, December 11, 2013

Hibernate Caching, Performance and Such

Caching

Caching is very important topic when talking about ORMs and all database access in general. Even very basic knowledge goes pretty far setting yourself on the right track.

Hibernate offers different caching strategies.

  • Session Caching
  • Second Level Cache
  • Query Caching

A word of warning! A common expectation is that caching somehow magically fixes your performance issues. It can and will help, but developers need to understand that the main point of optimization still remains with the database itself (indices, design, etc.), how you design your queries and design your database mappings. On the other hand, good understanding of caching is still vital in various situations.

Session Caching

Session caching is always on, and can't be turned off.  These Hibernate operations within the same session will introduce objects to this cache:

  • Save(), Update(), SaveOrUpdate()
  • Get(), Load(), Find(), List(), Iterate(), Filter()

This simply means, that which ever way an object is introduced to a session, it will be in the cache for the duration of that session.

The state of these objects will remain in memory until session.Flush() is called (flush will happen at the end of the session, so you should not call Flush() manually in the general case). Flushing will then force the state to the database. You may also Evict() objects from the cache by their IDs. It is rarely used but may be necessary to effectively manage cache memory in some cases.

All objects are cached by their IDs as cache keys. Consequently, cache is checked first when objects are accessed by their IDs:

  • session.Get(), session.Load(); 
  • traversing relationships: another = myObj.MyRelation.AnotherRelation;
  • with fetch join mapping hints when using session.Get(), session.Load()

Sometimes people get confused about the fact that queries in general do not fetch anything from the session cache (see query caching later). It is only when you access objects explicitly by their IDs that Hibernate will first reach to the session cache for cached objects. The session cache is primarily there to stage all the changes that must be later flushed to the database, and so during a session, all participating objects are held in memory.

One note about object identity; session promises that there is only one instance object with the same ID in the session at any time. This also means that object identity is retained (meaning that you can do obj1 == obj2 to test equality of objects provided that they came from the same session). Personally, I would not use object identity for comparisons because the applicability depends entirely on the used caching strategies, which can change from one configuration to another.

Use sessions correctly to get the benefits.

Second Level Cache

Second level cache is not bound to sessions and is used for caching in between sessions on the application level. Depending on the configured cache provider, this can be also a distributed cache. Not surprisingly, this cache is accessed second (thus 2nd level cache), after first attempting to find objects from the session cache.

Configuration

This cache level must be turned on by configuration from the Session Factory;
 hibernate.cache.use_second_level_cache=true 
One must also define a cache provider that determines the implementation. For example;
hibernate.cache.provider_class = NHibernate.Caches.SysCache.SysCacheProvider, Hibernate.Caches.SysCache
The selection of the cache provider determines the scope of the intended implementation. Typically, you'd use local caching solutions such as one above. In more extreme cases, you would consider distributed caching solutions such as Memcache. The selection does have implications as to what you can and should cache, so beware. Stick to local in memory caches if you do not know.

Further configuration is required to select which objects should be cached and what concurrency strategy they should use, and which cache region they belong to. Consider 2nd level cache for:

  • Objects that are (mostly) read-only. Sometimes called reference data.
  • Objects that we do not care to synchronize with the database religiously.

Good cache candidates usually are the leaf objects in objects graphs, so start looking at those first and see which ones are mostly read-only, or you do not care if they are up to date constantly.

Do not use 2nd level cache for objects that change state frequently because it can become highly counterproductive. 2nd level cache can also be dangerous in systems where there are other applications that write to the database outside of Hibernate. If you care about the state of the data, in these cases 2nd level cache can become a hazard.

Concurrency strategies are used to determine how the objects in the cache should behave in relation to transaction isolation with the database.

  • Transactional - read-mostly data, prevent stale data in concurrent transactions when there are rare updates.
  • Read-Write - read-mostly data, like transactional but not available in clustered environments.
  • Nonstrict-Read-Write - No guarantees of consistency between the cache and database, acceptable to read stale data. Non-critical state of data.
  • Read-Only - Data never changes, reference data.
Note that less restrictive concurrency levels perform much better. If you use distributed caches, avoid restrictive caching strategies.

Storage Design

2nd level cache stores its contents differently from the session cache. It does not store objects, but serialized forms of the objects with the cache key (ID of object). 

First, there is a cost to serializing and deserializing objects from/to the 2nd level cache. Second, you can not rely on object identity like with session caches (hence you should not rely on object identity in any case).

Cache Regions

Cache regions are named configurations that determine cache settings for cached objects (for 2nd level cache) . You can associate each object type with its region in the mapping configuration. The region will then determine things like cache sizes, expiration policies and so forth. See documentation for various cache providers for their cache region configuration.

Conclusions about 2nd Level Cache

Try using your application before attempting to deal with 2nd level caching. There is considerable amount you have to know and get right with 2nd level caching before it is effective for you. 

I have seen people apply it as a band aid for bad mappings, bad database design, sloppy queries and so on. Fix those first and you get much more mileage from your app. The serialization cost can be punitive alone if you try to stuff everything in the cache. Databases are more effective in caching than compromised 2nd level caching. 

If you use it right, you can get good results. Cache those mostly-read objects in leaves of our mapped object graphs, and all your reference data, and you get away from a lot of complicated query tuning. Choose the relationships you traverse often. That's a good trade off.

Query Cache

Query cache is used to cache the results of queries. This cache is turned on per query basis from Criteria queries, and HQL queries by setting selected queries cacheable. Also, configuration is required to turn the query cache on (hibernate.cache.use_query_cache = true)

Query caches must be used with 2nd level cache turned on to be useful!

When cache option is turned on for a query, only the IDs of resulting objects are returned from the database and cached, and the objects are then loaded from 2nd level cache. If  the query results are not not in the 2nd level cache then they get loaded by their IDs from the database one by one,  slowly!!!

If you choose to use the query cache, you MUST configure the 2nd level cache, and all the objects you intend to return from your cached query. Otherwise, you end up with results that you least expected.

No comments: