Programming Cosmos: hibernate

Wednesday, December 11, 2013

Hibernate Caching, Performance and Such

Caching

Caching is very important topic when talking about ORMs and all database access in general. Even very basic knowledge goes pretty far setting yourself on the right track.

Hibernate offers different caching strategies.

Session Caching
Second Level Cache
Query Caching

A word of warning! A common expectation is that caching somehow magically fixes your performance issues. It can and will help, but developers need to understand that the main point of optimization still remains with the database itself (indices, design, etc.), how you design your queries and design your database mappings. On the other hand, good understanding of caching is still vital in various situations.

Session Caching

Session caching is always on, and can't be turned off. These Hibernate operations within the same session will introduce objects to this cache:

Save(), Update(), SaveOrUpdate()
Get(), Load(), Find(), List(), Iterate(), Filter()

This simply means, that which ever way an object is introduced to a session, it will be in the cache for the duration of that session.

The state of these objects will remain in memory until session.Flush() is called (flush will happen at the end of the session, so you should not call Flush() manually in the general case). Flushing will then force the state to the database. You may also Evict() objects from the cache by their IDs. It is rarely used but may be necessary to effectively manage cache memory in some cases.

All objects are cached by their IDs as cache keys. Consequently, cache is checked first when objects are accessed by their IDs:

session.Get(), session.Load();
traversing relationships: another = myObj.MyRelation.AnotherRelation;
with fetch join mapping hints when using session.Get(), session.Load()

Sometimes people get confused about the fact that queries in general do not fetch anything from the session cache (see query caching later). It is only when you access objects explicitly by their IDs that Hibernate will first reach to the session cache for cached objects. The session cache is primarily there to stage all the changes that must be later flushed to the database, and so during a session, all participating objects are held in memory.

One note about object identity; session promises that there is only one instance object with the same ID in the session at any time. This also means that object identity is retained (meaning that you can do obj1 == obj2 to test equality of objects provided that they came from the same session). Personally, I would not use object identity for comparisons because the applicability depends entirely on the used caching strategies, which can change from one configuration to another.

Use sessions correctly to get the benefits.

Second Level Cache

Second level cache is not bound to sessions and is used for caching in between sessions on the application level. Depending on the configured cache provider, this can be also a distributed cache. Not surprisingly, this cache is accessed second (thus 2nd level cache), after first attempting to find objects from the session cache.

Configuration

This cache level must be turned on by configuration from the Session Factory;

hibernate.cache.use_second_level_cache=true

One must also define a cache provider that determines the implementation. For example;

hibernate.cache.provider_class = NHibernate.Caches.SysCache.SysCacheProvider, Hibernate.Caches.SysCache

The selection of the cache provider determines the scope of the intended implementation. Typically, you'd use local caching solutions such as one above. In more extreme cases, you would consider distributed caching solutions such as Memcache. The selection does have implications as to what you can and should cache, so beware. Stick to local in memory caches if you do not know.

Further configuration is required to select which objects should be cached and what concurrency strategy they should use, and which cache region they belong to. Consider 2nd level cache for:

Objects that are (mostly) read-only. Sometimes called reference data.
Objects that we do not care to synchronize with the database religiously.

Good cache candidates usually are the leaf objects in objects graphs, so start looking at those first and see which ones are mostly read-only, or you do not care if they are up to date constantly.

Do not use 2nd level cache for objects that change state frequently because it can become highly counterproductive. 2nd level cache can also be dangerous in systems where there are other applications that write to the database outside of Hibernate. If you care about the state of the data, in these cases 2nd level cache can become a hazard.

Concurrency strategies are used to determine how the objects in the cache should behave in relation to transaction isolation with the database.

Transactional - read-mostly data, prevent stale data in concurrent transactions when there are rare updates.
Read-Write - read-mostly data, like transactional but not available in clustered environments.
Nonstrict-Read-Write - No guarantees of consistency between the cache and database, acceptable to read stale data. Non-critical state of data.
Read-Only - Data never changes, reference data.

Note that less restrictive concurrency levels perform much better. If you use distributed caches, avoid restrictive caching strategies.

Storage Design

2nd level cache stores its contents differently from the session cache. It does not store objects, but serialized forms of the objects with the cache key (ID of object).

First, there is a cost to serializing and deserializing objects from/to the 2nd level cache. Second, you can not rely on object identity like with session caches (hence you should not rely on object identity in any case).

Cache Regions

Cache regions are named configurations that determine cache settings for cached objects (for 2nd level cache) . You can associate each object type with its region in the mapping configuration. The region will then determine things like cache sizes, expiration policies and so forth. See documentation for various cache providers for their cache region configuration.

Conclusions about 2nd Level Cache

Try using your application before attempting to deal with 2nd level caching. There is considerable amount you have to know and get right with 2nd level caching before it is effective for you.

I have seen people apply it as a band aid for bad mappings, bad database design, sloppy queries and so on. Fix those first and you get much more mileage from your app. The serialization cost can be punitive alone if you try to stuff everything in the cache. Databases are more effective in caching than compromised 2nd level caching.

If you use it right, you can get good results. Cache those mostly-read objects in leaves of our mapped object graphs, and all your reference data, and you get away from a lot of complicated query tuning. Choose the relationships you traverse often. That's a good trade off.

Query Cache

Query cache is used to cache the results of queries. This cache is turned on per query basis from Criteria queries, and HQL queries by setting selected queries cacheable. Also, configuration is required to turn the query cache on (hibernate.cache.use_query_cache = true)

Query caches must be used with 2nd level cache turned on to be useful!

When cache option is turned on for a query, only the IDs of resulting objects are returned from the database and cached, and the objects are then loaded from 2nd level cache. If the query results are not not in the 2nd level cache then they get loaded by their IDs from the database one by one, slowly!!!

If you choose to use the query cache, you MUST configure the 2nd level cache, and all the objects you intend to return from your cached query. Otherwise, you end up with results that you least expected.

Tuesday, December 18, 2012

Hibernate Sessions

I have been using Hibernate for a very long time; at least nine years if not more. It is perhaps the best known ORM tool in the Java/.NET world today. There are many alternatives but none have the feature set or maturity of Hibernate. Hibernate is perhaps not the easiest tool to use, partly due to its long history, and partly because the problem it tries to solve is complicated, but you can get started with some basic information quite nicely. All criticism aside, it is still a fine tool for a general purpose ORM approach, and perhaps the best one to use for bigger applications.

Sessions seem like an appropriate place to start when navigating the Hibernate waters, so let us examine that topic.

Sessions

Working with Hibernate is built around sessions. This is the Unit of Work pattern, and to use Hibernate correctly, you must understand this concept well.

The pattern you use to work with Hibernate is as follows:

Open session
Begin transaction
Using the session, query persistent mapped objects.
and/or add new objects to session.
and/or delete queried objects
and/or modify objects queried or added.
Commit transaction
Close session.

This is your logical transaction, the Unit of Work that you perform, per action, in your application. Note that while this is often the same as a database transaction, these concepts are not equivalent. You may, if you so choose, open a session and transaction, query objects, make changes, flush those changes to a database (commit), open another transaction, make more changes, flush (another commit), and so on. One session, many transactions. This allows you to keep tracking all the changes that you are making but stage your database changes in several steps.

The same basic steps in C# code using NHibernate:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
 var customer = session.Load<Customer>(customerId);
 customer.Name = "New Name";

 var newCustomer = new Customer(anotherCustomerId, "Another Customer");
 session.save(newCustomer);

 tx.Commit();
}

What do you get by doing this?

all mapped objects that are introduced to sessions are tracked for their changes.
once the transaction in committed, all those changes will be persisted to the database.
all objects within the session are cached. Objects accessed by their IDs come from the cache if already there. This includes all session.Load/Get calls and objects loaded via relationships by their IDs.
Hibernate can batch your updates to the database. Say you made 100 changes to objects, if you had set your batch size to 100, you will likely update everything in one database round trip. Hibernate is also smart enough to flush changes to the database when it needs to so you don't have to worry when to flush things manually. It is enough just to commit the transactions and close sessions as was explained.

Problems with Sessions

If you deviate from the mentioned session usage pattern, you will run into issues, and you will complicate your life tremendously.

Objects Outside Sessions

Hibernate does not know how to deal with any objects that it is not tracking within a session. You know when you have messed something up with the session management if you run into "non unique object", "not persistent object", "non transient object", "lazy loading", or "no active session" exceptions.

What these kinds of exceptions mean is that you are trying to interact with Hibernate with objects that were not introduced to the session, session was already closed, or the objects were introduced in another session (which now is obviously closed). All the objects that you want Hibernate to track, should be loaded, queried, modified, deleted in the same session. If not, then you must explicitly reintroduce (merge) objects back to the session. If you have lazy loading references, or collections, they can only be accessed within the same active session. Since lazy loading is an important concept to be utilized with Hibernate, it is also perhaps the most common scenario where the problems arise.

Lazy Loading and UI Rendering

People often use Hibernate loaded objects while rendering some type of user interface. A web page is typically rendered by passing some Hibernate objects to the view template engine. The problem with this is that objects may have lazy loading members that are loaded at the time of access only, not when the parent object was originally loaded. When a web page is rendered from the template, in may be that the Hibernate session is already closed because the control has already moved away from the code that programmers write (in a controller for example).

For this type of approach to work, the session must be open, even during page rendering. This is often referenced as the Open Session in View pattern. However, I can't openly recommend this pattern, while it solves the problem. It overlooks the fact that views are often composed from more than one action in real life, and are more naturally represented by individual sessions, one per action. Also, domain/Hibernate models are not often the same thing as view models and it might be better to actually translate domain (Hibernate) models to view models and back again to domain models. I am using the "domain model" here quite liberally, but distinctively as a separate concept from view models.

Many of us who use the MVC (Model View Controller) frameworks for our web application often struggle with the "model" concept. Typical frameworks do not really require anything from the M in the MVC, so developers are often left to come up with their own idea what the M means. Sadly, this will also lead to misuse of tools like Hibernate and more generally to mixing different layer concepts. The M in MVC is the mental model of the user (what user sees on the screen), not an internal representation of the domain (programmer's mental model). The domain model, incidentally, is what you load and manipulate with the help of Hibernate, but it is not the M in MVC. The controller (C) is where the translation between the mental models happens.

Session Infrastructure

Managing sessions means repeating a lot of boilerplate code, unless you use infrastructure. You want to hide session management in common scenarios so you do not need to worry about it a whole lot. Obviously, you must still be aware that all of that machinery is still there under the hood.

One of the best pieces of advice in this regard comes from Ayende@Rahien blog series about odorless and frictionless code.

Note that this is done for .NET MVC, but many frameworks have similar hookup points to do infrastructure work. The action filter code in the example wraps around your action call and repeats the usage pattern boilerplate code that I explained. This is simply an "around advice" or "interceptor" in AOP terms.

It also demonstrates the point about "actions". There can be many actions in a web request, where each action executes a particular job for the page. Think of the actions like little windows or blocks on a web page that typical designs render, each separate from another. Sessions should last only so long as the action does, same with transactions.