Friday, September 24, 2010

Thoughts on Message Queues and HTTP

For a couple of years now I have been facing the same dilemma whenever I need asynchronous, loosely coupled communication.

Requirements


  1. Decoupling of publishers and subscribers.
  2. Reliable message delivery.
  3. Ability to connect, disconnect, reconnect clients without losing messages.
  4. Ability to shut down and restart the messaging server without losing messages.
  5. Ability to 'query' or 'filter' messages based on their category.
  6. No low-latency requirement - messages delivered within seconds are good enough.
  7. No high throughput requirement - no need to scale to many thousands of messages per second.


Pub-Sub And Message Queues


Naturally, I would turn to message queuing as a solution. What I found was that while most queue products claim high performance and high throughput, none of the ones I have tried managed to deliver reliability consistently. These products are great if you only need a best-effort, non-guaranteed, asynchronous communication channel. Sadly, once the reliability requirement is added (that is, durable queues and topics), the complexity of the solutions becomes a problem. On top of that, other limitations often come into play - such as performance problems even with modest throughput requirements in the low hundreds of messages per second.

If you look at my requirement list, it is absolutely doable with message queues. The problem is that it is very hard to do well the way message queues do it. The root of the problem is that durable subscriptions are tracked within the server. When message delivery is tracked at the server, things suddenly become very complex. What do you do with subscriptions that are never removed (which happens all the time in dynamic environments)? How do you do clustering, load balancing, and so on? While these things are all doable, it is very hard to do them well.

I have watched ActiveMQ for many years now, and I have been using it almost as long. It is a good product, but it has taken amazingly long for it to become somewhat solid. Even then, it only works reliably in the very basic cases; that is, unreliable, non-durable messaging. The second problem tends to be the spartan all-around support for clients on various platforms - people will come out of the woodwork to argue that there are clients for all kinds of platforms, but the problem is that they are not on par with the Java client. With many clients you still have to deal with very basic issues like reconnects on your own (a little detail that people new to these tools always get wrong). This is not unique to ActiveMQ - HornetQ, RabbitMQ and the rest all suffer from very basic problems when you try to do anything with reliable message delivery.

I have not yet mentioned one feature that I would absolutely like to have: the ability to browse the event streams, anywhere in time, at any time, by any client, forward or backward. Message queues cannot do this. It is possible to browse unconsumed messages in a queue, but since the most common use case in my situation needs a topic, that goes out the window. There is also no memory of previous messages for newly connected clients, other than the often-used 'last message'. In any case, this kind of feature quickly becomes a 'must have' once you have experienced the convenience it gives you. Now, I am not arguing that message queues should do this, because it is perhaps something that queues were never really designed to do. With this kind of requirement we are approaching the other side of this blog entry.

HTTP Messaging


A few years ago, we at my current job implemented something that can be described as a message queue. It is not a real message queue but it can be used for the exact same communication patterns. We implemented it with HTTP, utilizing Atom feeds for updates, with a RESTful interface.

A client can simply query for updates to the events it is interested in. If there are new events, or updates to 'resources', you get an Atom feed as a reply. By default you may get the oldest events stored on the server, and you can browse the Atom feed for more events as you need them, or you can just zoom forward and grab the latest messages. You can also tell the server to send only updates after a certain point, all of which you can track from the Atom feeds. The client simply saves a reference to where it is in consuming the events. This way, the server tracks absolutely nothing - the client decides which messages it considers consumed. It also means that at any given time you can roll back and re-consume the same events if you like.
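To make this concrete, here is a minimal sketch in Scala of what such a client might look like. The endpoint, the 'category' and 'after' parameters, and the feed layout are all assumptions about a hypothetical server; the point is simply that the only state - the position marker - lives on the client.

import java.net.URL
import scala.xml.XML

// A sketch of a polling client for a hypothetical event feed. The server is
// asked only for events after the client's own marker; nothing is tracked
// server-side.
object FeedConsumerSketch {

  def main(args : Array[String]) {
    var marker = ""   // id of the last consumed entry; the client persists this itself

    // Hypothetical endpoint and query parameters.
    val feed = XML.load(new URL("http://example.com/events?category=orders&after=" + marker))

    for (entry <- feed \ "entry") {
      println("consuming " + (entry \ "title").text)
      marker = (entry \ "id").text   // advance the client-side marker
    }
    // Persist the marker and poll again later; polling with an old marker
    // simply replays the same events.
  }
}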

Because it is so simple to implement, it is also very reliable. All HTTP GETs are idempotent - they can be retried - and you don't need transactions. You can implement transaction semantics quite easily yourself. Say you need reliable messaging; from the client's point of view this just means moving the reference point in the Atom feed once you have successfully consumed an Atom entry. If something fails, you can come back and retry from the point you were at before. This is easy to implement, and you can build really sturdy consumers with a few simple rules.
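Sketched in Scala, the rule looks something like this (process and saveMarker are hypothetical helpers): advance the marker only after an entry has been handled successfully, and on failure simply re-issue the same idempotent GET and start again from the saved marker.

import scala.xml.Node

// A sketch of client-side 'transaction semantics': the marker is the only
// piece of bookkeeping, and it moves only after successful processing.
object ReliableConsumerSketch {

  def consume(entries : Seq[Node], savedMarker : String) : String = {
    var marker = savedMarker
    for (entry <- entries) {
      process(entry)                  // may throw; the marker is then left untouched
      marker = (entry \ "id").text    // 'commit' by advancing the client-side marker
      saveMarker(marker)              // persist it, e.g. to a local file or database
    }
    marker
  }

  def process(entry : Node) { /* application-specific handling */ }
  def saveMarker(marker : String) { /* persist the position */ }
}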

There is no 'protocol' other than HTTP GET, PUT, POST and perhaps DELETE. You want events? Do a GET for any combination of events you want. You want to publish events? Just issue a PUT or POST for resource updates. You may also DELETE resources. All the events are Atom-wrapped. Simple.
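The publishing side needs nothing beyond a plain HTTP client. A small Scala sketch, again against a hypothetical endpoint with a made-up payload:

import java.net.{HttpURLConnection, URL}

// A sketch of publishing an event with nothing but plain HTTP.
object PublisherSketch {

  def main(args : Array[String]) {
    // Hypothetical resource; POST a new event, or PUT to update a known resource.
    val connection = new URL("http://example.com/events").openConnection().asInstanceOf[HttpURLConnection]
    connection.setRequestMethod("POST")
    connection.setDoOutput(true)
    connection.setRequestProperty("Content-Type", "application/xml")

    val payload = <event><category>orders</category><body>order 42 shipped</body></event>
    connection.getOutputStream.write(payload.toString.getBytes("UTF-8"))
    println("server replied: " + connection.getResponseCode)
    connection.disconnect()
  }
}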

I am talking about 'resources' here as liberally as I am talking about messages or events. This kind of server implementation can model things as documents or events and they can be consumed in the same manner.

The Bottom Line


For my requirements, HTTP/Atom feeds are a really nice solution. It is all doable with MQs, but it is not worth the hassle. The simplicity of the HTTP/Atom approach is at the core of it all: no client library issues, reliability is easy to achieve, the event feeds are browseable and queryable, and it scales with typical web techniques.

Friday, August 13, 2010

Project Euler in Scala

I recently bought the Programming in Scala book. It is a good book, but after a while you have to get your hands dirty, especially given the depth of the language itself.

Project Euler is a website with all kinds of math and computer programming problems; great for trying things out with a new programming language.

I am not attempting to pull every Scala trick in the solutions. Besides, sometimes the good old way of doing things is more readable to the 'untrained' eye.

So far I have solved the first four problems.

Here they are:

package euler.problem1

/**
 * If we list all the natural numbers below 10 that are multiples of 3 or 5,
 * we get 3, 5, 6 and 9. The sum of these multiples is 23.
 *
 * Find the sum of all the multiples of 3 or 5 below 1000.
 */
object SumOfMultiples {

  def main(args : Array[String]) {
    var sum = 0
    1 to 999 filter (x => x % 3 == 0 || x % 5 == 0) foreach (x => sum += x)
    println(sum)
  }
}


package euler.problem2

/**
 * Each new term in the Fibonacci sequence is generated
 * by adding the previous two terms. By starting with 1 and 2,
 * the first 10 terms will be:
 *
 * 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
 *
 * Find the sum of all the even-valued terms in the
 * sequence which do not exceed four million.
 */
object Fibonacci {

  def main(args : Array[String]) {
    var s = 0
    // Add even terms only while they stay within the four million limit.
    fibonacci(0, 1, x => { if (x <= 4000000 && x % 2 == 0) { s += x }; x <= 4000000 })
    println(s)
  }

  // Calls 'continue' with each new term and recurses for as long as it returns true.
  def fibonacci(x : Int, y : Int, continue : (Int) => Boolean) {
    val z = x + y
    if (continue(z)) {
      fibonacci(y, z, continue)
    }
  }
}


package euler.problem3

/**
 * The prime factors of 13195 are 5, 7, 13 and 29.
 *
 * What is the largest prime factor of the number 600851475143 ?
 */
object PrimeFactor {

  def main(args : Array[String]) {

    var value = 600851475143L
    var factor = 2L

    // Divide out each factor completely before moving to the next candidate;
    // when value reaches 1, 'factor' holds the largest prime factor.
    while (value != 1) {
      if (isFactor(factor, value)) {
        value /= factor
      } else {
        factor += 1
      }
    }

    println(factor)
  }

  def isFactor(factor : Long, value : Long) : Boolean =
    isPrime(factor) && value % factor == 0

  def isPrime(n : Long) : Boolean =
    (2L to Math.sqrt(n).asInstanceOf[Long]) forall (n % _ != 0)

}


package euler.problem4

/**
 * A palindromic number reads the same both ways.
 * The largest palindrome made from the product of two
 * 2-digit numbers is 9009 = 91 × 99.
 *
 * Find the largest palindrome made from the product of
 * two 3-digit numbers.
 */
object Palindrome {

  def main(args : Array[String]) {

    var largest = 0
    // Work downwards through all pairs of 3-digit numbers.
    for (x <- (100 to 999).reverse; y <- (100 to 999).reverse) {
      val prod = x * y
      if (prod > largest) {
        val pStr = String.valueOf(prod)
        if (pStr == pStr.reverse) largest = prod
      }
    }
    println(largest)
  }
}

Friday, July 23, 2010

Rescuing Nokia

The Register writes about Rescuing Nokia. The story originates from a book (promo site in Finnish) written by an old Nokia insider, Juhani Risku.

According to Juhani, Nokia has become fat, weighed down by a middle-management bureaucracy that hinders creativity. The root of the problem, in his view, is the incompetence of its leaders.

Among the most interesting points, the article mentions that Nokia had already been working with touch screens ten years ago but abandoned the development. Many other innovations were ready to be developed into products years ago, but Nokia more or less ignored them, only to watch the competition implement the same features years later. Often it was a complete inability to move innovations into production, caused by the crippling weight of incompetent middle managers and division leaders with no real experience in the matter.

Juhani also states that American-style business management, built around short-sighted arrogance and greed, has hindered innovation. According to him, appointing a CEO from the USA would be a big mistake and yet another setback for a company that is Finnish down to the core. American-style management just does not work there.

Wednesday, July 7, 2010

XML Schema Design Links

Here are my definitive sources for XML Schema design:

Zero, One, or Many Namespaces?
W3C XML Schema: DOs and DON'Ts (Kohsuke)
Reference Model For XML Design (PDF)
W3C XML Schema Design Patterns: Avoiding Complexity (Obasanjo)

These all represent slightly different viewpoints.

For example, these are Obasanjo's guidelines, quoted from his article, where he first alters Kohsuke's originals and then adds his own:

I've altered some of Kohsuke's original guidelines:
  • Do use element declarations, attribute groups, model groups, and simple types.
  • Do use XML namespaces as much as possible. Learn the correct way to use them.
  • Do not try to be a master of XML Schema. It would take months.
  • Do not use complex types and attribute declarations.
  • Do not use notations.
  • Do not use local declarations.
  • Do carefully use substitution groups.
  • Do carefully use a schema without the targetNamespace attribute (aka chameleon schema).
I propose some additional guidelines as well:
  • Do favor key/keyref/unique over ID/IDREF for identity constraints.
  • Do not use default or fixed values especially for types of xs:QName.
  • Do not use type or group redefinition.
  • Do use restriction and extension of simple types.
  • Do use extension of complex types.
  • Do carefully use restriction of complex types.
  • Do carefully use abstract types.
  • Do use elementFormDefault set to qualified and attributeFormDefault set to unqualified.
  • Do use wildcards to provide well defined points of extensibility.
There are still some problems with extension of complex types and validation, namely the need to have xsi:type attributes sprinkled in your XML documents. From a purist point of view, we should not really have validation artifacts in our XML tags.

Thursday, June 17, 2010

Constructor Vs. Setter Injection

Are we supposed to use setters (or properties) or constructors with dependency injection? There have been all kinds of opinions on the matter.

For me the decision is rather simple. Dependency injection should be noninvasive; after all, that is one of the reasons why we utilize it so much. A framework like Spring should not really change your class design one way or the other. Imagine that dependency injection does not exist - then how would you design your classes?

When constructing an object, you should use the constructor for all the dependencies that are required for the object to function. Setters/properties are then reserved for changing default behavior. There should be no requirement to call a set of setters just to make the object functional - that is the constructor's job.
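As a minimal sketch (in Scala, with made-up class names): the required collaborators come in through the constructor, while optional behavior has a sensible default that a setter may override.

// Supporting types, just to keep the sketch self-contained.
class Order
trait OrderRepository { def save(order : Order) }
trait Notifier { def send(order : Order) }

// Required dependencies go through the constructor; the object is fully
// functional without a single setter call.
class OrderService(repository : OrderRepository, notifier : Notifier) {

  private var retries = 3                          // sensible default

  def setRetries(value : Int) { retries = value }  // optional tuning only

  def place(order : Order) {
    repository.save(order)
    notifier.send(order)
  }
}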

True, setter injection can be convenient because you are dealing with named properties. In .NET (with Spring.Net), you can also name the constructor parameters so this is really a non-issue there. Setter injection can also be nice when there are so many parameters in the constructor that things become unwieldy. However, there are always ways around this limitation by providing proper abstractions for the constructor arguments.
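One such abstraction is simply a small parameter object: group related constructor arguments so the constructor stays manageable. A hypothetical Scala sketch:

// Hypothetical example of grouping constructor arguments into focused types
// instead of passing half a dozen loose primitives.
case class Endpoints(primary : String, fallback : String)
case class RetryPolicy(attempts : Int = 3, delayMillis : Long = 1000)

class MessagingClient(endpoints : Endpoints, policy : RetryPolicy = RetryPolicy()) {
  // two descriptive arguments instead of many anonymous ones
}

// new MessagingClient(Endpoints("http://primary", "http://backup"), RetryPolicy(attempts = 5))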

In any case, my strongest argument has to do with OOD and the promise that the injector framework stays transparent to the rest of your program. Do not design your classes in a manner dictated by the dependency injection framework.

Monday, May 10, 2010

Software is Hard, Hardware is Easy

I have been following the latest smart phone race with interest. It seems to have the same underpinnings as the early personal computing trends. It even has one common player; Apple.

Interestingly, Apple has risen to rather enviable prominence with the iPod, the iPhone, and now the iPad. Even their personal computer business has picked up steam in recent years. And somewhat surprisingly, North America has got its mojo back as a place of innovation in mobile computing. Who knew?

The PC story seems to be repeating itself as well, which should be a warning sign for Apple. Hardware is becoming a commodity. There are cameras, plenty of memory, touch screens, GPS and all kinds of gizmos, and every manufacturer has them. The result is that the differentiators are climbing a level of abstraction: Software is King.

Apple's App Store is obviously the thing that gives Apple the lead. The usability of the iPhone is rather good, and it was a small revelation at the time when it first came out. The usability comes from great design, close integration, and tight control of the experience but it has its price; a closed system works as long as there is no viable alternative market.

Google's Android OS is making promising headway and is now exceeding the iPhone's sales figures. There are many manufacturers and a variety of operators, while Apple has tied itself to AT&T alone. That is becoming a sticking point, since the user base within AT&T's realm is saturating (don't get me wrong; it was great while it lasted). Verizon might no longer give Apple a good deal, because it does not have to.

We must not forget BlackBerry. They are still holding steady, but Android might be a problem for them in the long run as well. Still, plenty of innovation should keep the Canadian messaging phone maker in the game for years to come.

The elephant in the room is Nokia. Nobody is talking about them, even though they sell more mobile phones in a day than Apple sells iPhones in a whole quarter. Nokia is very strong in the emerging markets of China, India, South America, Africa and so on. They still produce healthy margins from commodity phones and have access to far cheaper components than any of their competitors. But they are not winning the smartphone race.

Nokia's problem is that Software is Hard, Hardware is Easy. Nokia never was a real software house that could churn out a winning operating system at the scale now required. Sure, Symbian was a passable solution in the past, but right now it just can't keep up. The Ovi app store remains a question mark, and Nokia can't seem to crack the North American market, which is tightly controlled by the operators. Regardless, the other phone makers have a long way to go to catch Nokia, so maybe they still have a chance.

Interesting times.

And Then There was Light

I have been blogging for a few years but only in a closed corporate environment. I have found it to be a useful tool to influence people and get my voice out. Luckily, I have been allowed to say what I wanted and keep my opinions open to everyone.

I thought I would join the open, public blogging slugfest as well. Blogging might be "so yesterday" in 2010, but it is hardly dead, so why not. I'll share my experiences as a programmer and write about a variety of topics in the sphere of technology. I'll spare you my personal life unless it has some relevance to my choice of technology topics.

Let's get started!