This has been kind of a crazy week at work as we have been getting our product MVP out the door. I have learned all kinds of things this week about product design and deployment.
Our product is divided up into applications and services, and we are running on JRuby inside the GlassFish Java application server. Since I come from a .Net background, I haven’t dealt with the need to run inside a separate application server before, unless you consider IIS an application server. We have a developer on our team who comes from a Java background, and he has been a great help setting up profilers and drilling down to find some problematic caching code we had in our service client. We have reduced the number of calls to our services and have been able to get decent page load times once the application server has fully warmed up. Something we have been dealing with is the JVM warm-up time. The JVM starts out interpreting bytecode while collecting statistics about the code, and the JIT compiler eventually compiles the frequently executed paths to native code. The warm-up time of the JVM is fairly lengthy, so we are trading some initial speed for throughput.
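To make the warm-up effect a little more concrete, here is a minimal sketch in plain Java. It is not our application code: the `WarmupDemo` class, the `work` method, and the round and iteration counts are all made up for illustration. It times the same workload over several rounds so you can compare early rounds against later ones. On a typical HotSpot JVM the first rounds tend to be slower, but the actual numbers depend on the JVM, its flags, and the machine, so treat this as a rough demonstration rather than a benchmark.

```java
public class WarmupDemo {
    // Hypothetical CPU-bound workload, only here to give the JIT something to compile.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += ((long) i * i) % 7;
        }
        return sum;
    }

    // Runs the workload `calls` times and returns the elapsed wall-clock time in nanoseconds.
    static long timeRoundNanos(int calls, int n) {
        long start = System.nanoTime();
        long sink = 0;
        for (int c = 0; c < calls; c++) {
            sink += work(n);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the loop cannot be discarded as dead code.
        if (sink == Long.MIN_VALUE) {
            System.out.println("unreachable in practice");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Same work every round; only the JVM's state changes between rounds.
        for (int round = 1; round <= 10; round++) {
            long nanos = timeRoundNanos(200, 10_000);
            System.out.printf("round %d: %d us%n", round, nanos / 1_000);
        }
    }
}
```

Running it with `java WarmupDemo` prints one timing per round. A real application server has far more code to warm up than this loop, which is part of why the effect lasts much longer there.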
Update 7/1/2012 It took me a while to get around to finishing this post, and in the meantime some issues have cropped up that have made me think some hasty decisions were made. Right now we are experiencing an issue where an error occurs and then GlassFish starts returning the wrong response to incoming requests. The error is rare and difficult to reproduce, so we are having problems tracking it down. We have found some references online to what looks like the same issue, and it seems to be a problem in the JRuby stack. I have been working on web applications for ten years and I have never seen this on any of the other platforms (Perl/PHP/.Net) I have worked on.
Update 7/3/2012 It looks like the issue with User A making a request and getting the response for User B is a problem with AJP, the connector protocol we use between Apache and GlassFish, and other people have seen it. We are in the process of switching to the HTTP connector between Apache and GlassFish, as we haven’t been able to reproduce the error using that configuration.
I now believe we over-optimized our deployment stack prematurely. I think we should have put the effort into moving to Ruby 1.9.x rather than JRuby. If we had just used Passenger and Apache, which is a much more common deployment scenario, we would have much more predictable performance. Deployment would be much faster, and I don’t believe we would be seeing the obscure issues we are dealing with now. If we actually started to run into performance issues, it would be much easier to scale by adding more memory to the servers or adding AWS EC2 instances as needed.