Saturday, January 3, 2015

JavaScript Melancholy

Recently there have been many blog posts and articles about the state of JavaScript. In particular, there seems to be some surprise that the landscape has turned into the same morass as every other environment, and in very short order. Perhaps even more so than any other environment, if you look at the NPM package count surpassing that of Java's Maven repository.

Generation JavaScript describes this whole thing quite well.

I am personally not at all surprised about the core point, though I am somewhat surprised at the lightning speed with which things turned into this mess. But maybe I should not be; things move ever faster in this industry, and not everything new is good.
It is a language that as a profession we should be pretty ashamed of letting become so popular; it is for cowboy developers writing cowboy systems and anybody setting out to build new applications on top of this (with the intention of longevity) should strongly reconsider. I think that the motivation of using JS to teach programming is understandable because of how accessible it can be to new people but it is also a mistake because it teaches the wrong message about our industry - it is the antithesis of professionalism. If we were ever in a "race to the bottom" then JS can be considered to have won; it has lowered the barrier to entry so far that we'll be dealing with the consequences of this mistake for decades to come.
--Rob Ashton (Disinterest Curve: JavaScript and Node)

It is quite interesting to see Rob's own shift on the topic if you read his earlier entries.

Software Fashion Industry

Software seems to repeat the same cycle as fashion. There always has to be something new almost every year, and every kid follows the new patterns and garments religiously. I am pretty sure that if you are a JavaScript "professional" these days, you are one of those fashion lovers. Trust me, in a couple of years it will be something else.

Yet, after a while, you notice how the same patterns and approaches repeat. They are rehashed and backed by flamboyant 'opinionated' software that garners the interest of many until the spell clears once again, or people spot the next new thing. There is a background chorus of people who criticize the state of software development, because they have been there and done that already.

Hype Machine

There always seems to be some kind of hype machine going on in this industry that repeatedly does a disservice to itself. I am not so sure these things are always driven by programmers themselves, though we do like the new and shiny a bit too much.

I am not against getting people interested in writing software; people should be. The problem is that those same people get caught in the hype cycle and do not educate themselves. There are a lot of dropouts in that process, but not before they write a lot of bad code that lives for a long time.

There seems to be no sensible collective memory of lessons learned. It is perhaps caused by the obsession with youth and the hacking culture. I am not talking about real Hackers, but the kind of quick-turnaround, copy-pasting, everyone-can-code culture where you spend little time on theory and deeper lessons. I do like the enthusiasm but not necessarily the results.

This is a problem in an industry where there are no prevalent standards or requirements for formal education. The truth is that people need mentors so they can learn and fail safely before their work gets out into the wild. Open source projects are pretty much a free-for-all; in particular, you can start a new one at any time. That can be a great thing, but also a bad one, considering the tidal wave of new and soon-obsolete libraries and frameworks.

Thursday, June 5, 2014

Communities on the Internet, Then and Now

Almost Disconnected

It was the late '80s when I had my first Internet experiences. It was not much by today's standards, but it was mind-blowing to me that you could connect to other computers across the world. Before this, you'd typically dial into modem pools to get access to some fixed computer setup over the telephone network. You could leave a message on some bulletin board, much like an Internet forum today; it was just a pain in the ass to go back, read the updates, and check things again. Later you could do something similar over the Internet, but you still had to call some modem pool to get there. And holy cow, you still had to know IP addresses to find stuff. Getting around was somewhat of a person-to-person, inside-knowledge thing. There was no real sense of community because the interval between exchanges was always too long.

The Golden Age

Well, that was the old way, and not very inclusive to people at large. But before we get to the WWW, there was an interesting period that has gone missing in a way. It is still alive, but people do not necessarily engage that way anymore. It was the world of Internet Relay Chat (IRC), interactive text games, and chat rooms. It was the time of inclusive communities. Think of these as islands where people would meet and communicate. The feel of it was more private and somehow more meaningful, because you were not exposed to millions and millions of people through some flood gate. The second thing was that the mode of communication was often immediate and then forgotten. You could choose to leave your communications on a bulletin board, but you could just as easily choose to talk to someone and have the bits lost for eternity once it was all done. These communities would also have loose gatekeeping policies and some behavioral controls so that nobody would ruin the experience. I so miss the safety, convenience, and inclusiveness of these kinds of communities. You could just pop in and have a chat with your friends in real time, and there would always be someone you knew, at any time.

For me, this was how I learned to love the Internet in real terms. I had the choice of where to go, whom to include, what to record and leave behind, and what not to. That was real freedom, and it fostered trust and a real sense of community. Those bonds are very strong, spanning decades between people from all over the world.

The Curse of WWW

The WWW changed all that. Even then, the Facebooks and Twitters of the world are rather recent inventions. What strikes me is that it took a really long time for me to even get into Facebook or Twitter. The mode of communication had totally changed; the cozy feel of a protected community was somehow lost in it all.

Even Facebook, while inclusive to 'friends', is a very watered-down model of the kind of freedom that used to be available to you. Your circle of friends does not foster trust the same way when you have 100 of them. Further, Facebook is a leaky boat with all of its privacy rules. The stuff that you say does not stay in there, but leaks everywhere through your 'friends'.

I suppose Facebook groups sort of give you this, but the freedom of joining is tainted by all the spam bots and the like. Regardless, even these communities are not protected, due to the leaky privacy rules. At the very least, the privacy defaults are not sane. You will soon find friends who are not part of the community commenting on your timeline about things you posted in the closed community. It's so badly broken.

There are other things, like Twitter. Twitter is a blast horn built for people who collect groupies. The interaction can be more immediate here, but the relative anonymity of the 'follow' function soils the safety aspect of the kind of 'tribalism' that is so natural to people. The groupie thing is not exclusive to Twitter either; you can mimic the same behavior on Facebook by having those 100 "friends". To be fair, Twitter was never built to be a community driver, but it is quite representative of how we are forgetting this side of communication.

People Should Forget and so Should Internet

What still feels wrong is the permanent nature of your communications. Unfortunately for us, we suck, so we put out garbage that leaks out of our heads onto the Internet. If you are engaged at all, you WILL make mistakes. Except now, everyone sees and remembers your crap forever. Not only that, but you have this shadow crew of haters who jump out of thin air because of the leaky boat that knows no bounds. You are in effect always afraid and insecure, and this results in not really speaking your mind or being who you are. Who's watching? You freaking never know, which is unsettling to anyone. I do not really want to worry about that a whole lot, but you can't break free and feel safe in your interactions. Safety is paramount to trust and forming friendships, for example. That would be the core of any community.

True Community Dialogue Lost?

The true tragedy in all this is that true dialogue between people has been lost in some ways. It feels cold and impersonal when your communication is reduced to one-liners that never go anywhere, even in the comment sections, other than a bunch of silly remarks where people try to one-up each other. Another way things have gone funny is that there is no easy flow in and out of interpersonal communication that you can mix and match. It may be paradoxical, but with all the noise and blast-horn garbage, communication is not better or easier. For one, you are always semi-paranoid, so you are deterred.

The nature of connections between people has permanently changed. They are casual and impersonal, never quite leaving a lasting impact or a feeling of engagement. So I sometimes wonder if all the commercialization of communication has done us a disservice. I can see the value in all the information and entertainment, but somehow the human aspect is diminished, and we are fed artificial food.

I guess I just miss the whole meaning of 'community'. To me it is like a lake, with streams coming in and streams leaving, freely, but with people who form the core, who can have meaningful conversations reliably, at any time. All the immediate communication should be forgotten and thrown away; let people record things only when they want to, for others to see when they are not around.

I am unsure whether I really fit into the world of Facebook and all the rest. I suppose it gives some people a sugar high, but I demand more than "Hello Kitty" and "Achievement Stickers".

Sunday, April 27, 2014

Dealing with ASP.NET MVC Device Output Caching Issues

ASP.NET MVC output caching supports variability by device. This is a useful feature to have when building a mobile version of a desktop site, for example. However, there has been a persistent bug in the output cache support. It was fixed for a short time, but it has mostly remained broken and still is even today (MVC 5.1). Instead of waiting for a fix, and having it break again like before, you can work around the issue.

Workaround

Here is a relatively easy and transparent way to get around this bug:
  • Use VaryByCustom output cache parameter for the actions.
  • Use the cache profile support to implicitly inject your VaryByCustom parameter to your output cache.
  • Implement the custom parameter handling in your Global.asax.cs.
Use VaryByCustom output cache parameter for the actions.
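
In its direct form, each cached action carries the custom parameter explicitly. A minimal sketch (the duration here is illustrative):

[OutputCache(Duration = 600, VaryByParam = "*", VaryByCustom = "Device")]
public ActionResult Index()
{
    return View();
}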

This is repetitive, so we want to avoid sprinkling this custom attribute everywhere. We can use the cache profile support in your Web.config to add this attribute implicitly. (Just set the timeouts in your config).
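
A sketch of such a profile in Web.config (the profile name and duration are assumptions for this example):

<system.web>
  <caching>
    <outputCacheSettings>
      <outputCacheProfiles>
        <add name="DeviceAware" duration="600" varyByParam="*" varyByCustom="Device" />
      </outputCacheProfiles>
    </outputCacheSettings>
  </caching>
</system.web>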

So, now your OutputCache annotation becomes:
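
With the hypothetical "DeviceAware" profile above, something like this:

[OutputCache(CacheProfile = "DeviceAware")]
public ActionResult Index()
{
    return View();
}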

Finally, we implement the custom cache handling in the Global.asax.cs:
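
A minimal sketch of the override; the "Device" key matches the profile above, and the mobile check is one reasonable implementation:

public override string GetVaryByCustomString(HttpContext context, string custom)
{
    if (string.Equals(custom, "Device", StringComparison.OrdinalIgnoreCase))
    {
        // One cache entry per device class: mobile browsers get their own variant.
        return context.Request.Browser.IsMobileDevice ? "Mobile" : "Desktop";
    }
    return base.GetVaryByCustomString(context, custom);
}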

Tuesday, April 8, 2014

Should Everyone Learn to Program?

In the last couple of years, there have been calls to "learn how to program", and that "everyone should". Here's my take on that topic.

Motivations

I was thinking about what the root motivations for this call-to-action are:
  1. Technology is suddenly cool.
  2. Geeks are suddenly cool.
  3. Some tech companies are becoming as big as big oil.
  4. Apparent lack of talent.
  5. The jobs pay rather well.
  6. Many other high-paying, low-skilled jobs are vanishing.
  7. Erosion of the middle class, driven by globalization and automation.
There is a definite smell of a bubble in the tech fervor. Investors and companies are paying huge sums for token enterprises that seem promising. I really wonder how much of that is viable in reality. The whole image is kept alive by the huge cash reserves that bigger tech companies now sit on; you'll notice how Google, Facebook, Apple, etc. are gobbling up smaller startups and paying big sums for them - because they can.

Just recently the big tech companies were embroiled in a wage-fixing ring limiting engineer pay. The pay is reasonable, but the lack of talent is quite palpable, so employee poaching is a fear that these companies have. So, what do they do instead? They pay huge sums for companies with promising tech, but all the same, they are equally interested in the software developers and the talent they are acquiring. Poaching by acquisition. It is just as important to vacuum the talent away from your competition before they get to it. The price goes up, up, up. Hence the bubble.

The other end of the equation is the quick erosion of well-paying low-skilled jobs, and even medium-skilled jobs. What is left is creative, high-skilled jobs. What still pays well? Being a doctor, lawyer, and some other limited occupations... and then there is energy, technology, engineering, and programming. Here in Canada, 75% of people make well under $40K per year, but programming jobs get you into the top 10% rather easily, and even much higher.

Dreams and Realities

Politicians and policy makers have caught on to this. Here is the new opportunity to create a large number of well-paying jobs, revive the ailing job market, and then collect the tax revenue. Anyone can build an App and make millions! If only we got every school kid to program!

The well-meaning thought process is idealistic, I give them that. I even favor basic programming becoming part of the school curriculum, as part of general technology education. It is basic knowledge that people should have when they enter this new world.

But that does not guarantee that people will now flock to programming as the new promised land. Even if we had an avalanche of new IT graduates, the problem is that programming itself is not the key skill.

Programming is just a means to an end. Programming solves problems, so your key skill is problem solving, not programming: creativity, critical thinking, organization, concentration, relentless learning about various things. This is all the stuff that comes from "book-worming" all your life. Natural ability is required, as is relentless practice.

If you have a high tolerance for solving problems that are far removed from most real-life activities, so infuriatingly mundane that they defeat even a patient person, and you are OK with that fact, and even enjoy it, then you might have a mind for programming. Stubbornness is a requirement.

There is also a point about the whole "App" thing. You can spend several lifetimes writing Apps without making a buck in the process. You will find far more success in stable jobs at established companies.

Teaching and Schools

The whole problem of teaching programming is quite well explained here: Separating Programming Sheep from Non-programming Goats. The research paper that it is based on crudely concludes that a very large portion of Computer Science students is beyond hope of being taught to program. Mind you, these are CS students, not the general population! Also consider that of those people who could become programmers, many never enter the field in the first place; they may go into other engineering disciplines and so on. We are really looking at a paper-thin portion of the population here.

The fact that most people will never be fit to be programmers is not a big surprise, and why should it be? You can teach me singing all you want, but I will never be a singer no matter how hard I try, and the same is true for most people.

If you want to teach anything to kids, it is creativity, free thinking, problem solving. Those are fluffy, esoteric things that are hard to squeeze into school programs, which means you need creative, top-notch teachers and mentors. The fact is that most good programmers are still self-taught, motivated by their own curiosity and drive. And that is fine, as long as they have creativity, problem-solving, and free-thinking skills.

Career Issues

There are also many problems that plague this field, ageism being one of the major ones. It is hip to be young and cool, but there are not enough programmers around if all you want is the young and hip. I think this is changing, though, and has changed already; try hiring, and you'll find out. And what you will often (not always) find with the young and hip is the Dunning-Kruger effect. You still need experience, you need mentors, you need leaders.

The second problem is the fast pace. Your skills are always on the brink of becoming obsolete. You need to be a lifelong learner, and it needs to be second nature to you. You may be forced to switch jobs to find the things that keep you in business. Any environment that does not allow you to keep up is poison to you. I have seen a number of examples of 'experienced' people who have done nothing relevant in years because they sat in some big corporation doing routine maintenance on an obscure system that nobody cares about. Any smart developer will leave a company that is not investing in their skills.

Third, long hours in the office. I don't know what made developers fair game for ridiculous demands and silly timelines that have nothing to do with reality. Gladly, even this is changing, but there are still bottom-of-the-pit jobs in this industry that expect the moon and the stars. 'Non-techies' often expect that you can throw developers at a problem like monkeys and multiply output by ten. Unfortunately, programming is a creative, high-skill, empirical activity, not a factory production line.

And what happened to women? There are hardly any women in the field. That's a failure in its own right that makes everything more difficult. The field needs women.

The whole pay thing is a powerful driver, but ultimately the pay is more of a consequence of the points I mentioned. It will always be a job for the few, and not a solution to the structural changes that societies will face. Programming is not the easy job that people might expect it to be. You need to be a business expert, subject matter expert, project coordinator, communicator, team player, and then - for heaven's sake - you also need to write code. That is asking a lot, and the pay starts to look reasonable, if not downright modest.

Monday, December 23, 2013

How to Deal with the Monolith

Not all developers deal with big web sites or applications, but many who do know this kind: The Monolith!

Some characteristics:
  • Lots of code; probably hundreds of thousands of lines altogether.
  • Long history, relatively speaking.
  • Project based development approach.
  • No real describable architecture, chaotic organization.
  • Hard to change, slow (and costly) to work with.
  • Virtually irreplaceable.

Anatomy

Behold, the Monolith! No, it does not have to be one website; it can be many, but all tightly interconnected. There could be all kinds of examples, but here is one, and it will demonstrate the anatomy of this monster. In some sense, it is even hard to depict this thing, because there is no clear-cut way to make sense of it in the first place.

Database

Databases, many of them, typically somehow interconnected. It could be database links, shared views, remote queries, or stored procedures talking to other databases. The common denominator here is that at some point in history, people felt that databases are a good place for application features and functionality. In fact, it is often very easy to integrate features and data across databases and then build a stored procedure layer on top of them - all in the name of "databases must have a stored procedure layer" and "code should be close to the data, and therefore a lot of it must live on top of the database engine". DB vendors have built all kinds of features people can use, so crafty developers (often DBAs) get ideas about how to utilize them.

Shared Data Access

Good developers try to layer their software, so they create data-access layers. But since you need to build features based on all sorts of data, you end up able to connect to any database. This is acceptable to a developer who is under deadlines and other pressures, or who simply does not know better. We are also good citizens, so we try to reuse software and depend on other such libraries to do our bidding.

Web Layer and Business Logic

Applications need to do some work, often referred to as business logic. Typically, you'd create some kind of layer for this behavior, but in reality no such thing may exist. More likely, this code is stuffed right into web controls, page backing code, or anything comparable. Also remember the good old database and its stored procedures, which seem to do a lot of this work, too. This code is all over the place, so much so that you have a really hard time understanding what is going on. Over time, people just copy-pasted bits and pieces of functionality wherever it was needed, because there are no rich models or anything resembling a layer that you could utilize.

In the end, you have some kind of pancake where business logic and web layer are all intertwined. There might be some attempt at a Model View Controller (MVC) type of thing, or an alternative like Model View ViewModel (MVVM), but it is there mostly in name only. Say you had to build a mobile view to support mobile browsers; well, good luck building it in the same web app.

How Did It Happen?

I already alluded to this, but the classic Big Ball of Mud document is a pilgrimage destination for anyone who wants to know. The Monolith is really just a version of the shantytown architecture described in BBoM.

So, it is more or less inevitable, and understandably so. Projects are a mainstay of work in any typical organization. Projects have boundaries based on some finite feature set that limits your design space. The focus of work jumps around between largely unrelated projects. The people who do the work keep changing; people come in, people leave. There might be many teams, or no consistent teams at all. These patterns keep repeating year after year, project after project.

Not only that, but to the dismay of software developers, many stakeholders have no stake at all in creating good architecture, nor are they interested in paying for it. There might be little chance that you can even explain to them why it has to be done in the first place. They might even be sympathetic to your concerns, but as the next project rolls in, it is clear that they still expect you to deal (almost exclusively) with the stuff that matters to them.

How do I Survive?

Considering that you have a big web site (or any other big application) and you have this kind of problem, where do you even start? Your approaches are:
  • Rewrite the entire thing
  • Rewrite it piece by piece
  • Surrender

The Big Rewrite

This approach often sounds attractive, but unless you have solid support and a business need from stakeholders and management, it is not going to happen. Usually, if this approach is taken, the system is so bad that it threatens the business in some way. Or, in the happier case, the organization has ample resources and time and takes the project on because of some related business opportunity - but that is rare.

Rewrites may take a long time and are costly, risky projects. You will also have significant difficulty capturing the business rules and old functionality of a system that has been cemented into a messy and incoherent code base. Flipping the switch on an entire system is difficult and requires a lot of strategizing and thought.

Rewrite Piece by Piece

This approach requires taking every opportunity to improve the system. You will spend a significant amount of time in all your projects cleaning things up, because projects are typically the only effective vehicle for this work. This will stretch all project timelines, so you still need management support, but you may have better luck selling your plan because of the piecemeal approach.

To avoid undue costs, you may limit yourself to fixing the parts that are core to the system, or that otherwise cause constant and significant maintenance costs. How do you know what is core? It is usually the part of the system that requires constant work to support the business. This focus may shift over time as the business shifts focus, so it is not the same thing forever!

Surrender

Some systems do not require frequent changes, and may not be worth a rewrite or any kind of fix campaign. In this case, you probably just want to stabilize the system to keep it going. You may also never get the management support you need to do the work. You may not like it, but it is still a scenario that happens.

Strategies

I will review a couple of strategies for doing a piece-by-piece rewrite, which I think is the most common workable scenario for big systems.

Transform from Inside Out

What I mean by this: 
  • Establish modularization
  • Utilize layering
  • Use common infrastructure and frameworks
  • Use proper Object Oriented Programming (e.g., Domain Driven Design).
All this work would be done inside the Monolith application to slowly make it better from the inside out, so that we eventually end up with a workable application.

This may be your first stab at the problem, as it is the obvious choice. I certainly tried it. I would just say that this strategy has its limitations if your system is rather large: if you are looking at a scenario like the one in the anatomy picture, you are not likely to win with this strategy.

The Monolith actually has an inherent gravity that does not allow you to escape into a green garden where you can maintain your newly hatched good software. You may rewrite parts of your system, but you can almost never remove the old code, because of the messy, interconnected nature of The Monolith. You are also still limited by the project nature of the work, which does not give you the room or freedom to roam across large swaths of software to untangle and replace things. In addition, you may attempt to aggressively create 'reusable' code and libraries, forgetting that reusable code creates tight dependencies - roughly the opposite of what you want to achieve.
  • Your new code and libraries simply add more weight. It's still a monolith but now even bigger!
  • You can't isolate yourself from the bad parts very effectively to escape the gravity.

Break the Monolith

I would favor this strategy because it starts to deal with the actual problem. Break your monolith into smaller applications that together form the system. No matter how messy your system is under the hood, on a conceptual level your monolith still has groups of features (as seen by the user) that go together. These are the natural fault lines along which you want to split applications off from the Monolith.

Your challenge is to detect the fault lines, take a scalpel, and cut. When doing so, you will copy code, even duplicate databases, and that is OK. You simply want to create an app that functions largely in isolation, from the database level up. The main rule is: share as little as possible, and prefer copying over sharing.

There are many benefits to this approach:
  • Much smaller and manageable application.
  • You can change it in isolation without breaking something else.
  • You can assign a team to work on it.
  • You can do this gradually, piece by piece.
  • You can choose any technology stack for the app.
The main challenge? Integration with other parts of the system. But note how this is the exact same challenge that the monolith sought to solve! The monolith's approach was simply to share code, share databases, and so on. But that is a bad way to integrate big systems, because you cannot operate, manage, or change things that are too tightly integrated. Every system has bad code, so that is not even the main point here.

Also, you can now circle back and "Transform from Inside Out" each of the small applications you split off. So: first split, then transform. If you have the luxury of rewriting a split-off app, you can do that, too!

Anatomy of an App

So, what should these applications look like? None of this is a big secret, but let's go over it anyway. You break off your first app from the monolith, so what should happen? 
  • Your app should run without the monolith; it does not care if the rest of the system runs or not, and it is happy to do whatever it needs to do without the rest.
  • Your app typically has some data storage, a database if you will. The difference here is that it is not shared with any other app. This data is completely private to the app, and code that lives outside the app never touches it.
  • Your app integrates with the monolith or other apps via asynchronous messaging or web-level services. The integration is governed by contracts: agreements about what goes on the wire and how to talk to the other side. Your app owns the contracts for the integration points it chooses to advertise. There is no other way for another app to get access or visibility to the services.
The point of this approach is to protect the application from getting locked up in ways that prevent us from changing its innards. You do not want to sink back into a situation where you can't change your app because of the way you chose to integrate things.

I would invite you to read Pat Helland's Data on the Outside vs. Data on the Inside to fully understand what needs to happen here to protect your app.


One final point: do not go overboard. Do not create too many micro applications; choose bigger, cohesive collections of features that go together. The size may vary, but when you start to look at a pile of code that is several thousand lines, there might be something worthy of a separate app there. Integration is not the easiest thing in the world to get right either, so do not create unnecessary work and complications by creating applications that can't really function on their own. There should not be that many integration needs if you get the size right, though that also depends on the nature of the application.

Monolith Conquered

Once you have taken this path, you may be looking at something like the picture below later. You have now split your big monolithic monster into manageable chunks that mostly just hum along on their own.

You will copy reference data via messaging, and integrate web features using typical web integration techniques. 

Messaging

Messaging is important for several reasons. It helps you build loosely coupled apps, which is what we are looking for. Here, applications fire one-way events that other applications can subscribe to. I would not favor bidirectional request-response patterns here; we can use the web for that.

For example, if you have an application that handles user profiles and login, you may fire events related to things that users do in that app. 
  • Account Created
  • Account Closed
  • Account Edited
  • User Logged On
  • User Logged Out
... just to give a few examples. Pat Helland's document gives an idea of what the messages should look like on the wire. The main idea is not to leak internal details in your messages, which would again bind you too tightly to other applications. So do not use your internal models on the wire! This is what the contracts are for. The second important thing is to "tell" what your application is doing, rather than telling some other application to do something. This is sometimes called the Hollywood Principle or Inversion of Control.
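
As a sketch of what such a contract might look like (the names here are hypothetical), an event carries only stable public facts, never your internal entity types:

using System;

// Wire contract for the "Account Created" event. Flat, versioned,
// and independent of the publisher's internal domain model.
public class AccountCreatedEvent
{
    public Guid AccountId { get; set; }      // stable public identifier
    public string DisplayName { get; set; }  // data other apps may need
    public DateTime CreatedUtc { get; set; }
    public int Version { get; set; }         // contract version for evolution
}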

If you then require reference data, or your application needs to react to something that another application is doing, you subscribe to the relevant events. Once you receive them, you act on the events, and do what your application chooses to do. You can store data in your internal database, or fire up processes to do work. Regardless, it is up to the app to decide how to react, and it may even send events of its own based on its internal processing.
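
A subscriber on the other side might look roughly like this (again, the names are illustrative); the key point is that it stores its own private copy and decides for itself how to react:

// In another app: react to the event and keep a local, private copy
// of the reference data this app needs. IAccountDirectory is this
// app's own (hypothetical) storage abstraction.
public class AccountCreatedHandler
{
    private readonly IAccountDirectory directory;

    public AccountCreatedHandler(IAccountDirectory directory)
    {
        this.directory = directory;
    }

    public void Handle(AccountCreatedEvent message)
    {
        directory.Add(message.AccountId, message.DisplayName);
    }
}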

This type of integration allows you to create loosely coupled applications, but also allows very rich interaction that often responds well to previously unseen needs.

Web Integration

Often, big web applications show information on the screen from various feature sets. Your front landing page may show some popular articles, forum threads, and other distilled user content. These views may then come from a number of applications that you have split off.

This creates an integration problem that you can solve on the web level.
  • You may pull views on the server side.
  • You can use Ajax to let the browser render the views.
  • Or you can use some hybrid where you pull data from the server, but render on the browser.
There are different challenges to these approaches that you may have to solve, but generally it is all doable with relatively little effort. I would favor integration in the browser if it is at all feasible, because the browser has a sophisticated page loading and caching model that avoids a lot of problems on the server side.
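
As a rough sketch of the browser-side option (the path is hypothetical, and assumes the apps are mapped under one site, for example through a reverse proxy):

<!-- On the landing page: a placeholder owned by the forums app. -->
<div id="popular-threads"></div>

<script>
  // Let the browser fetch the fragment from the app that owns it.
  // jQuery's load() is same-origin, hence the same-site path.
  $("#popular-threads").load("/forums/threads/popular");
</script>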

Try to keep all web artifacts, including JavaScript and style sheets, with the application that owns the web views you are serving. This way, you can make integration very painless and clean.

Web Infrastructure

A couple of words about the web infrastructure you can use to build your apps and integrate them all into one big application. How do you deploy such an application in pieces? A few approaches come to mind:

  1. Deploy to a web container - such as Tomcat (Java) with WARs (web archives), or IIS (.NET) with virtual web apps - to build one web site.
  2. Use a reverse proxy to build a site from a number of servers and web containers in them.
  3. Combine the two so you can scale any app as you need, but let the user see just one web site.

Summary

It is possible to build reasonably sized applications that mostly stand on their own and scale on their own, without suffering from the monolith mess. You can assign teams to the apps, and you can fix the internals in isolation from other applications. Your system stays changeable, and you can respond to the needs of the business.

Note that you can do all of this without sharing code between your apps, and you can choose different technology stacks, programming languages, and solutions. All of it can be made to work together and managed with fewer headaches.

You can take this approach and apply it over time, little by little. You are not forced to do the big rewrite, or take on all problems at the same time.

So, happy monolith slaying!


Wednesday, December 11, 2013

Hibernate Caching, Performance and Such

Caching

Caching is a very important topic when talking about ORMs and database access in general. Even very basic knowledge goes a long way toward setting yourself on the right track.

Hibernate offers different caching strategies.

  • Session Caching
  • Second Level Cache
  • Query Caching

A word of warning! A common expectation is that caching somehow magically fixes your performance issues. It can and will help, but developers need to understand that the main point of optimization still lies with the database itself (indices, design, etc.), how you design your queries, and how you design your database mappings. On the other hand, a good understanding of caching is still vital in various situations.

Session Caching

Session caching is always on and can't be turned off. These Hibernate operations within the same session will introduce objects to this cache:

  • Save(), Update(), SaveOrUpdate()
  • Get(), Load(), Find(), List(), Iterate(), Filter()

This simply means that whichever way an object is introduced to a session, it will be in the cache for the duration of that session.

The state of these objects will remain in memory until session.Flush() is called (the flush will happen at the end of the session, so you should not call Flush() manually in the general case). Flushing forces the state to the database. You may also Evict() objects from the cache by their IDs. Eviction is rarely used, but may be necessary to manage cache memory effectively in some cases.

All objects are cached by their IDs as cache keys. Consequently, the cache is checked first when objects are accessed by their IDs:

  • session.Get(), session.Load(); 
  • traversing relationships: another = myObj.MyRelation.AnotherRelation;
  • with fetch join mapping hints when using session.Get(), session.Load()

Sometimes people get confused by the fact that queries in general do not fetch anything from the session cache (see query caching later). It is only when you access objects explicitly by their IDs that Hibernate will first reach into the session cache for cached objects. The session cache is primarily there to stage all the changes that must later be flushed to the database, and so during a session, all participating objects are held in memory.

One note about object identity: a session promises that there is only one object instance with a given ID in the session at any time. This also means that object identity is retained (you can do obj1 == obj2 to test equality of objects, provided they came from the same session). Personally, I would not use object identity for comparisons, because its applicability depends entirely on the caching strategies in use, which can change from one configuration to another.
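
A small sketch of both points, assuming a mapped Customer entity:

using NHibernate;
using NHibernate.Linq;
using System.Linq;

using (var session = sessionFactory.OpenSession())
{
    var a = session.Get<Customer>(42);   // hits the database
    var b = session.Get<Customer>(42);   // served from the session cache
    bool same = ReferenceEquals(a, b);   // true: one instance per ID per session

    // A query still goes to the database (no query cache involved here), but
    // the rows it returns resolve to the same instances held by the session.
    var viaQuery = session.Query<Customer>().Where(c => c.Id == 42).ToList();
}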

Use sessions correctly to get the benefits.

Second Level Cache

The second level cache is not bound to sessions; it is used for caching between sessions, at the application level. Depending on the configured cache provider, this can also be a distributed cache. Not surprisingly, this cache is accessed second (thus "2nd level" cache), after first attempting to find objects in the session cache.

Configuration

This cache level must be turned on in the Session Factory configuration:
hibernate.cache.use_second_level_cache = true
One must also define a cache provider that determines the implementation. For example:
hibernate.cache.provider_class = NHibernate.Caches.SysCache.SysCacheProvider, NHibernate.Caches.SysCache
The selection of the cache provider determines the scope of the intended implementation. Typically, you'd use a local caching solution such as the one above. In more extreme cases, you would consider distributed caching solutions such as Memcached. The selection has implications for what you can and should cache, so beware. Stick to local in-memory caches if you are not sure.

Further configuration is required to select which objects should be cached, what concurrency strategy they should use, and which cache region they belong to. Consider the 2nd level cache for:

  • Objects that are (mostly) read-only. Sometimes called reference data.
  • Objects that we do not care to synchronize with the database religiously.

Good cache candidates are usually the leaf objects in object graphs, so look at those first and see which ones are mostly read-only, or do not need to be constantly up to date.

Do not use the 2nd level cache for objects that change state frequently; it can become highly counterproductive. The 2nd level cache can also be dangerous in systems where other applications write to the database outside of Hibernate. If you care about the state of the data, the 2nd level cache becomes a hazard in these cases.

Concurrency strategies are used to determine how the objects in the cache should behave in relation to transaction isolation with the database.

  • Transactional - read-mostly data, prevent stale data in concurrent transactions when there are rare updates.
  • Read-Write - read-mostly data, like transactional but not available in clustered environments.
  • Nonstrict-Read-Write - No guarantees of consistency between the cache and database, acceptable to read stale data. Non-critical state of data.
  • Read-Only - Data never changes, reference data.
Note that less restrictive concurrency levels perform much better. If you use distributed caches, avoid restrictive caching strategies.
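
For example, a mostly-read reference entity could be mapped like this (an hbm.xml sketch; the class and region names are illustrative):

<class name="Country" table="Country">
  <cache usage="read-only" region="ReferenceData" />
  <id name="Id" column="CountryId">
    <generator class="assigned" />
  </id>
  <property name="Name" />
</class>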

Storage Design

The 2nd level cache stores its contents differently from the session cache. It does not store objects, but serialized forms of the objects, keyed by the cache key (the ID of the object).

First, there is a cost to serializing and deserializing objects to and from the 2nd level cache. Second, you cannot rely on object identity as you can with session caches (hence you should not rely on object identity in any case).

Cache Regions

Cache regions are named configurations that determine cache settings for cached objects (in the 2nd level cache). You can associate each object type with its region in the mapping configuration. The region then determines things like cache sizes, expiration policies, and so forth. See the documentation for the various cache providers for their cache region configuration.

Conclusions about 2nd Level Cache

Try using your application before attempting to deal with 2nd level caching. There is a considerable amount you have to know and get right with 2nd level caching before it is effective for you.

I have seen people apply it as a band-aid for bad mappings, bad database design, sloppy queries, and so on. Fix those first and you will get much more mileage from your app. The serialization cost alone can be punishing if you try to stuff everything into the cache. Databases are more effective at caching than a compromised 2nd level cache.

If you use it right, you can get good results. Cache the mostly-read objects in the leaves of your mapped object graphs, and all your reference data, and you can avoid a lot of complicated query tuning. Choose the relationships you traverse often. That's a good trade-off.

Query Cache

The query cache is used to cache the results of queries. It is turned on per query, for Criteria and HQL queries, by setting the selected queries cacheable. Configuration is also required to turn the query cache on (hibernate.cache.use_query_cache = true).

The query cache must be used with the 2nd level cache turned on to be useful!

When the cache option is turned on for a query, only the IDs of the resulting objects are returned from the database and cached, and the objects themselves are then loaded from the 2nd level cache. If the resulting objects are not in the 2nd level cache, they get loaded by their IDs from the database one by one - slowly!!!

If you choose to use the query cache, you MUST configure the 2nd level cache, and cache all the objects you intend to return from your cached query. Otherwise, you end up with results you least expected.
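
A minimal sketch of a cacheable Criteria query in NHibernate, assuming the Country entity and its region are configured in the 2nd level cache as above:

var countries = session.CreateCriteria<Country>()
    .SetCacheable(true)                 // cache the IDs this query returns
    .SetCacheRegion("ReferenceData")    // same region as the cached entity
    .List<Country>();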

Tuesday, December 18, 2012

Hibernate Sessions

I have been using Hibernate for a very long time; at least nine years, if not more. It is perhaps the best-known ORM tool in the Java/.NET world today. There are many alternatives, but none have the feature set or maturity of Hibernate. Hibernate is perhaps not the easiest tool to use, partly due to its long history and partly because the problem it tries to solve is complicated, but you can get started quite nicely with some basic information. All criticism aside, it is still a fine tool for a general-purpose ORM approach, and perhaps the best one to use for bigger applications.

Sessions seem like an appropriate place to start when navigating the Hibernate waters, so let us examine that topic.

Sessions

Working with Hibernate is built around sessions. This is the Unit of Work pattern, and to use Hibernate correctly, you must understand this concept well.

The pattern you use to work with Hibernate is as follows:
  • Open session
  • Begin transaction
  • Using the session, query persistent mapped objects.
  • and/or add new objects to session.
  • and/or delete queried objects
  • and/or modify objects queried or added.
  • Commit transaction
  • Close session.
This is your logical transaction, the Unit of Work that you perform, per action, in your application. Note that while this is often the same as a database transaction, these concepts are not equivalent. You may, if you so choose, open a session and transaction, query objects, make changes, flush those changes to a database (commit), open another transaction, make more changes, flush (another commit), and so on. One session, many transactions. This allows you to keep tracking all the changes that you are making but stage your database changes in several steps.

The same basic steps in C# code using NHibernate:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    // Load an existing customer; the session now tracks its changes.
    var customer = session.Load<Customer>(customerId);
    customer.Name = "New Name";

    // Introduce a new object to the session.
    var newCustomer = new Customer(anotherCustomerId, "Another Customer");
    session.Save(newCustomer);

    // Commit flushes all tracked changes to the database.
    tx.Commit();
}

What do you get by doing this?
  • all mapped objects that are introduced to sessions are tracked for their changes.
  • once the transaction is committed, all those changes will be persisted to the database.
  • all objects within the session are cached. Objects accessed by their IDs come from the cache if already there. This includes all session.Load/Get calls and objects loaded via relationships by their IDs.
  • Hibernate can batch your updates to the database. Say you made 100 changes to objects; if you have set your batch size to 100, you will likely update everything in one database round trip (see the configuration sketch below). Hibernate is also smart enough to flush changes to the database when it needs to, so you don't have to worry about when to flush things manually. It is enough to commit the transactions and close sessions as explained.
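
Batching is enabled with a single session-factory property; this sketch uses the NHibernate property name (the value is illustrative, and Java Hibernate calls the equivalent setting hibernate.jdbc.batch_size):

adonet.batch_size = 100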

Problems with Sessions

If you deviate from the mentioned session usage pattern, you will run into issues, and you will complicate your life tremendously.

Objects Outside Sessions

Hibernate does not know how to deal with objects that it is not tracking within a session. You know you have messed up session management when you run into "non unique object", "not persistent object", "non transient object", "lazy loading", or "no active session" exceptions.

What these kinds of exceptions mean is that you are trying to interact with Hibernate using objects that were not introduced to the session, whose session was already closed, or that were introduced in another session (which is now obviously closed). All the objects that you want Hibernate to track should be loaded, queried, modified, and deleted in the same session. If not, you must explicitly reintroduce (merge) objects back into the session. Lazy-loaded references and collections can only be accessed within the same active session. Since lazy loading is an important concept in Hibernate, it is also perhaps the most common scenario where these problems arise.

Lazy Loading and UI Rendering

People often use Hibernate-loaded objects while rendering some type of user interface. A web page is typically rendered by passing some Hibernate objects to the view template engine. The problem is that objects may have lazy-loaded members that are loaded only at the time of access, not when the parent object was originally loaded. When a web page is rendered from the template, it may be that the Hibernate session is already closed, because control has already moved away from the code that programmers write (in a controller, for example).

For this type of approach to work, the session must stay open, even during page rendering. This is often referred to as the Open Session in View pattern. However, while it solves the problem, I can't wholeheartedly recommend it. It overlooks the fact that in real life, views are often composed from more than one action, and are more naturally represented by individual sessions, one per action. Also, domain/Hibernate models are often not the same thing as view models, and it might be better to translate domain (Hibernate) models to view models and back again. I am using "domain model" here quite liberally, but distinctly as a separate concept from view models.

Many of us who use MVC (Model View Controller) frameworks for our web applications struggle with the "model" concept. Typical frameworks do not really require anything from the M in MVC, so developers are often left to come up with their own idea of what the M means. Sadly, this also leads to misuse of tools like Hibernate, and more generally to mixing different layer concepts. The M in MVC is the mental model of the user (what the user sees on the screen), not an internal representation of the domain (the programmer's mental model). The domain model, incidentally, is what you load and manipulate with the help of Hibernate, but it is not the M in MVC. The controller (C) is where the translation between the two models happens.

Session Infrastructure

Managing sessions means repeating a lot of boilerplate code, unless you use infrastructure. You want to hide session management in common scenarios so you do not need to worry about it a whole lot. Obviously, you must still be aware that all of that machinery is still there under the hood.

One of the best pieces of advice in this regard comes from the Ayende@Rahien blog series about odorless and frictionless code.

Note that this is done for .NET MVC, but many frameworks have similar hookup points to do infrastructure work. The action filter code in the example wraps around your action call and repeats the usage pattern boilerplate code that I explained. This is simply an "around advice" or "interceptor" in AOP terms.

It also demonstrates the point about "actions". There can be many actions in a web request, where each action executes a particular job for the page. Think of actions like the little windows or blocks on a web page that typical designs render, each separate from the others. Sessions should last only as long as the action does, and the same goes for transactions.
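
As a rough sketch of what such a session-per-action filter might look like in ASP.NET MVC (the MvcApplication.SessionFactory holder and the item key are hypothetical; see Ayende's series for the full treatment):

using System.Web.Mvc;
using NHibernate;

public class NHibernateSessionAttribute : ActionFilterAttribute
{
    private const string Key = "nh.session";

    public override void OnActionExecuting(ActionExecutingContext filterContext)
    {
        // Open session, begin transaction: the start of the Unit of Work.
        var session = MvcApplication.SessionFactory.OpenSession();
        session.BeginTransaction();
        filterContext.HttpContext.Items[Key] = session;
    }

    public override void OnActionExecuted(ActionExecutedContext filterContext)
    {
        // Commit on success, roll back on error, always dispose:
        // the end of the Unit of Work.
        var session = (ISession)filterContext.HttpContext.Items[Key];
        try
        {
            if (filterContext.Exception == null)
                session.Transaction.Commit();
            else
                session.Transaction.Rollback();
        }
        finally
        {
            session.Dispose();
        }
    }
}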