Chronon 3 beta: 10x performance improvement
Chronon 3 is all about performance.
We have rewritten our Recorder from the ground up to gain an order-of-magnitude performance improvement! The basic architecture remains the same, but under the hood it's an all-new engine. We turned a Yugo into a Ferrari!
No more memory issues
The biggest complaint from users of Chronon 1 and 2 has been high memory usage and OutOfMemory errors. With Chronon 3, our main goal was to get rid of this. And I am proud to say we have knocked it out of the park with this one!
Chronon 3 makes absolute minimal use of your Java heap. We have shifted most of our code to native. And this time we made sure that literally every single bit of memory usage is accounted for! This means that you no longer have to fiddle with -Xmx values.
Here is a graph showing memory usage when recording all of Eclipse (namespace org.eclipse.**) with Chronon 2 and Chronon 3 (size in megabytes):
Smaller Recording file size
The Recording file size of Chronon 3 should, on average, be 90-95% smaller than that of Chronon 2.
Here is a comparison of recording file size for the startup of Eclipse, recording the entire org.eclipse.** namespace (size in megabytes). Note that Eclipse has a lengthy, CPU-intensive startup, so it's a good stress test.
Chronon 3 pulls out all the stops to make sure your applications retain all of their concurrency. Highly concurrent applications will see an immediate, very noticeable performance boost.
Logless Data Center
With the high performance of Chronon 3, you can have the Chronon recorder running all the time. This means that, for the first time, you can have a Logless Data Center!
Consider this: with Chronon on all the time, you no longer need log files or log statements cluttering up your code. If something breaks, just pull out the Chronon recording and put in Post Execution Logging statements wherever you want.
Getting your hands on the beta
To get the beta, please fill out the form here.
We are opening the beta so you can test the performance of Chronon 3 on your applications.
Since this is still a beta, the recordings made with the Chronon 3 recorder will not open in the Chronon Debugger for now. You will need to use the Chronon Recording Server to record your applications. The only change is to replace the v2 recorder.jar in your Recording Server installation with the v3 one.
If your existing Recording Server evaluation has expired, don't worry, we will give you a new one.
We ask that you provide some contact info so we can stay in touch with you throughout the beta.
Please provide either a phone number or a Skype ID, or anything else (put it in the comments section). If providing a phone number, don't forget to include the country code.
Any spam/incorrect entries will be discarded.
Server Mode in the Chronon Recorder
This week we released Chronon 1.5. The big feature of this release is the inclusion of 'Server Mode' in the Chronon Recorder.
What is the Server Mode?
The Server Mode is designed to allow the Chronon Recorder to be controlled through the Chronon Recording Server.
It includes features such as:
- Ability to do dynamic start/stop of the recorder in a running program
The recorder can stay dormant in your program unless explicitly started from the Recording Server UI.
- Ability to record long running programs.
- Ability to split a recording based on a time interval or when the physical size of the recording gets too large.
- Ability to dynamically modify the set of classes that are being recorded in a running program.
Thus, you can start recording with, say, include=com.package1.** and later decide to record com.package2.** as well, all without the need to stop the program.
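To illustrate the idea of changing the recorded set while a program runs, here is a minimal sketch. It is not the Chronon API: the class IncludeFilter and its methods are hypothetical, and it assumes that a pattern such as com.package1.** means "this package and all of its subpackages".

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical illustration only; not part of Chronon.
final class IncludeFilter {
    // Thread-safe list, so patterns can be added while other threads query it.
    private final List<String> prefixes = new CopyOnWriteArrayList<>();

    // Accepts patterns of the assumed form "com.package1.**".
    void include(String pattern) {
        if (!pattern.endsWith(".**")) {
            throw new IllegalArgumentException("expected a pattern ending in .**: " + pattern);
        }
        // Keep "com.package1." (with the trailing dot) so "com.package10.Foo" does not match.
        prefixes.add(pattern.substring(0, pattern.length() - 2));
    }

    // True if the fully qualified class name falls under any included package.
    boolean shouldRecord(String className) {
        for (String prefix : prefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        IncludeFilter filter = new IncludeFilter();
        filter.include("com.package1.**");
        System.out.println(filter.shouldRecord("com.package2.Foo")); // prints "false"

        // Later, without restarting anything, widen the recorded set.
        filter.include("com.package2.**");
        System.out.println(filter.shouldRecord("com.package2.Foo")); // prints "true"
    }
}
```

The copy-on-write list is one simple choice here because reads (one per class check) vastly outnumber writes (an occasional change from a UI).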
With the addition of the Server mode in the recorder, we now have 2 distinct modes for the Recorder:
- Developer Mode
- Server Mode
The developer mode is the one you are probably familiar with as that is what is used when you record using the Chronon Eclipse plugin. It records the entire program from beginning to end and is meant for short running programs, as is common in development scenarios.
Moving forward, we will probably have each of these 2 modes optimized for its specific use case. There are a lot of optimizations that we want to put in the Recorder that will add a bit of overhead to the instrumentation time. While this is acceptable for long running server programs, it is not as useful if you are going to run your program for only a few minutes from within Eclipse.
A good analogy is the server and client JVMs. While the client JVM is optimized for quick startup and does less optimization, the server JVM is meant for long running programs and does much more aggressive optimization.
We will keep you posted on the specifics of how we proceed with putting optimizations/features in each of these modes of the Chronon Recorder.
Chronon Performance Guide now available
We have published a Chronon Performance Guide to help people understand and tune the various components of their system and Chronon to gain the maximum performance. Of course, we will keep updating this as we update Chronon.
Performance tip number 1: Use 64-bit!
Choosing what to record - Part 1: Controlling Performance
Consider this line of code which reads in the contents of a file.
byte[] contents = readFile(fileName);
readFile() in this case could belong to some third party library, the JDK or may even be a system call.
As far as debugging our application is concerned, we are not worried about what happens inside this method. And that is because we never wrote it in the first place. It is entirely possible that we may not even have the source code to this method. The only thing we care about when we debug our program is what the arguments and return value to this method were, which in this case would be the file name and the returned byte array.
Thus the central observations are -
- You cannot debug what you/your company didn't write.
- Even if you could, you probably cannot change the faulty code because it is controlled by third parties.
We utilize this observation inside the Chronon recorder to achieve performance. The recorder is designed to record only the code of 'your' program. For calls to all methods that reside in a third party library, or the JDK or native method calls, we only record the arguments and return values of those methods, since that is all that is needed to debug your program.
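The idea of recording only what crosses the boundary into unrecorded code can be sketched as follows. This is an illustration under stated assumptions, not how the Chronon recorder is implemented: the class CallSiteRecorder, its recordCall method, and the stand-in readFile are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; not Chronon's implementation.
final class CallSiteRecorder {
    // One entry per call into unrecorded code: method name, argument, return value.
    final List<String> events = new ArrayList<>();

    // Invokes the external method and records only what is visible at the call site.
    <A, R> R recordCall(String methodName, A argument, Function<A, R> externalMethod) {
        R result = externalMethod.apply(argument); // whatever happens inside is not recorded
        events.add(methodName + "(" + describe(argument) + ") -> " + describe(result));
        return result;
    }

    private static String describe(Object value) {
        return value instanceof byte[] ? Arrays.toString((byte[]) value) : String.valueOf(value);
    }

    // Stand-in for a third party readFile(); returns fixed bytes instead of touching the disk.
    static byte[] readFile(String fileName) {
        return new byte[] {1, 2, 3};
    }

    public static void main(String[] args) {
        CallSiteRecorder recorder = new CallSiteRecorder();
        byte[] contents = recorder.recordCall("readFile", "data.bin", CallSiteRecorder::readFile);
        System.out.println(contents.length);        // prints "3"
        System.out.println(recorder.events.get(0)); // prints "readFile(data.bin) -> [1, 2, 3]"
    }
}
```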
This also allows you to control the exact impact the recorder has on the performance of your program. For example, you may have a million-line J2EE application that spends most of its time waiting for the database or inside third party libraries or web servers. In this case the performance impact of the recorder will be extremely low, since the time spent in 'recorded code' is very low. This is also the reason why most web apps can get away with using platforms like Ruby, Python or PHP, all of which are much slower than Java: the time spent in that piece of code is very little.
On the other hand, you can have a 20-line program where all it does is some massive calculation in a tight loop. In this case almost all the time is spent in recorded code, and the recorder has a much larger impact on performance.
Of course, since this is the first version of Chronon and it is not meant for deployment in heavy duty 24x7 production scenarios, the latter case of performance impact is not such a huge problem. That said, we do have plans to dramatically decrease the performance impact of recording in upcoming releases, without the need to exclude code from recording.
In the next few posts I will describe how to tell Chronon what to record and what not to record, and the consequences of excluding code from recording when using the debugger.
Design and Architecture of the Chronon Recorder
The Chronon recorder had directly opposing goals - to collect as much data about your program as possible, while at the same time having the least possible impact on it.
In this post I will try to describe some of the design and architectural decisions I made to achieve that.
The prime design goals of the Chronon recorder were -
- Minimum impact on application responsiveness
Scalability was higher on the list than raw performance. The reason is that with a scalable implementation, even if you hit a performance wall with the recorder, you can always upgrade your machine or reconfigure the recorder to continue recording.
To achieve this scalability, we made the following assumptions about how hardware is progressing:
- CPU cores will keep increasing in number and go down in cost.
- Memory is cheap.
- Due to point 2 above, 64-bit computing is becoming the standard.
- CPU cores aren't getting any faster.
The recorder works by running as a Java agent and instrumenting the bytecode of your Java program in memory, thus not requiring you to make any changes to your code.
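For readers unfamiliar with Java agents, the standard JDK mechanism is the java.lang.instrument package. The following is a minimal sketch of the general shape of such an agent, and it assumes nothing about Chronon's actual implementation. The class names SketchAgent and PassThroughTransformer and the com/package1/ prefix are made up for illustration, and the transformer does no real instrumentation.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical agent entry point. In a real agent jar this class is typically public
// and is named in the Premain-Class attribute of the jar's manifest.
final class SketchAgent {
    // Called by the JVM before the application's main() when started with -javaagent:<jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new PassThroughTransformer());
    }
}

// Sees the bytes of each class as it is loaded, before the JVM defines the class.
final class PassThroughTransformer implements ClassFileTransformer {
    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // Class names arrive in internal form, e.g. "com/package1/Foo".
        if (className == null || !className.startsWith("com/package1/")) {
            return null; // null tells the JVM to leave this class unchanged
        }
        // A real recorder would rewrite the bytecode here, usually with a bytecode library.
        // This sketch hands back an identical copy, so behavior is unchanged.
        return classfileBuffer.clone();
    }
}
```

Because the transformation happens on the in-memory class bytes at load time, neither the source files nor the class files on disk are modified.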
By universal, I mean that the recorder should be able to record any Java app whether it's a J2EE app, Swing/SWT app or any other kind of application.
It should also be platform independent, in line with the Java philosophy of 'Write Once, Run Anywhere'. Thus you can record on any platform, say a Mac, and play back on any other platform, say Windows.
How it all works
The recorder works as follows to achieve its design goals:
- The work done in the instrumented threads of your application is kept to a minimum. This is done to ensure minimum impact on the responsiveness of your application.
- The recording data that is generated by the application threads is stored in a buffer in memory.
- 'Flusher Threads' keep reading chunks of data from this buffer, do some processing on it and save it to disk in a highly packed format, essentially 'flushing' the data generated by the recorder.
So how do our assumptions about hardware help with this? Let's take a look -
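The three steps above follow the familiar producer/consumer pattern. Here is a minimal sketch of that pattern, not the recorder's actual code: the class FlusherSketch and its methods are hypothetical, a bounded queue stands in for the in-memory buffer, and the "flushing" just counts bytes instead of packing them and writing to disk.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not Chronon's implementation.
final class FlusherSketch {
    // Sentinel compared by identity; tells one flusher thread to exit.
    private static final byte[] STOP = new byte[0];

    private final BlockingQueue<byte[]> buffer;
    private final AtomicLong bytesFlushed = new AtomicLong();
    private final Thread[] flushers;

    FlusherSketch(int bufferCapacity, int flusherCount) {
        buffer = new ArrayBlockingQueue<>(bufferCapacity);
        flushers = new Thread[flusherCount];
        for (int i = 0; i < flusherCount; i++) {
            flushers[i] = new Thread(this::flushLoop, "flusher-" + i);
            flushers[i].start();
        }
    }

    // Called on application threads: hand the data off and return as quickly as possible.
    void record(byte[] event) {
        put(event);
    }

    // Sends one STOP per flusher, waits for all of them, and returns the bytes "flushed".
    long shutdownAndCount() {
        for (int i = 0; i < flushers.length; i++) {
            put(STOP);
        }
        try {
            for (Thread flusher : flushers) {
                flusher.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return bytesFlushed.get();
    }

    private void flushLoop() {
        try {
            for (byte[] chunk = buffer.take(); chunk != STOP; chunk = buffer.take()) {
                // A real flusher would process the chunk and write it to disk here.
                bytesFlushed.addAndGet(chunk.length);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void put(byte[] item) {
        try {
            buffer.put(item); // blocks only when the buffer is full
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FlusherSketch sketch = new FlusherSketch(16, 2);
        for (int i = 0; i < 100; i++) {
            sketch.record(new byte[10]);
        }
        System.out.println(sketch.shutdownAndCount()); // prints "1000"
    }
}
```

Note that this sketch blocks the application thread when the buffer is full. That is the situation the larger memory assumption is meant to avoid.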
- If you have more cores than your application has threads, the recorder will do most of its work on those extra cores, inside the flusher threads, and have minimum impact on the performance of your program. This also gives you a hint on how many flusher threads you should use. If you have a single threaded application and a quad core processor, you can tell Chronon to use 3 flusher threads. Similarly, if the application has 2 threads and the CPU has 4 cores, use 2 flusher threads to make use of those 2 extra cores.
- Now there are always going to be applications which use more threads than the number of cores, or which generate data way faster than it can be flushed out. This is where assumption 2 comes in. If you have enough memory, the generated data will have a place to sit while it waits to get flushed out. It is for these cases that we recommend using a 64-bit machine.
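The rule of thumb for the number of flusher threads amounts to simple arithmetic: cores minus application threads. A small sketch follows. The class and method names are hypothetical, and the floor of one flusher thread when there are no spare cores is an assumption of this sketch, not a documented Chronon default.

```java
// Hypothetical helper; not part of Chronon.
final class FlusherCount {
    // Spare cores are the ones not needed by the application's own threads.
    static int suggest(int cores, int applicationThreads) {
        // Assumption: always keep at least one flusher so data still gets written out.
        return Math.max(1, cores - applicationThreads);
    }

    public static void main(String[] args) {
        System.out.println(suggest(4, 1)); // prints "3"
        System.out.println(suggest(4, 2)); // prints "2"

        // On a real machine the core count can be queried from the JVM.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(suggest(cores, 1));
    }
}
```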
What about all the Garbage Collector (GC) issues with using all that extra memory?
It is a well known fact that current JVMs don't handle heap sizes above 2 GB very well. It is possible that, if you have an extremely computationally intensive program, Chronon does use that much memory, or that your application is already reaching the 2 GB limit and Chronon makes it go over that.
To solve this issue we use custom memory management. Thus, even if the data generated by Chronon goes a little high, it won't have a heavy impact on the GC. It is common to see a 2-3 GB Chronon heap shrink to a few hundred megabytes in the blink of an eye, which would ordinarily take many seconds or minutes without our custom memory management. That said, in most development scenarios, the heap sizes won't get anywhere near that high.
But even that ain't enough...
But even after all this you may run into hardware assumption 4 above. This happens when, even though you have enough cores and memory, the application threads of your program are doing so much work that even the small overhead of the recorder directly on those threads affects the responsiveness of your application.
For these situations, Chronon allows you to specify any part of your program to be excluded from recording. So if you have a portion of your program which is doing some heavy computation and which you know doesn't have any errors, or you just don't care about examining it for now, you can exclude it from recording.
We won't go into details of how to configure this right now, but suffice it to say that any part that is excluded runs with absolutely zero overhead, just like it would run without the recorder. For calls to these 'unrecorded' methods, we will record just the input arguments and the return value at the call site, which is usually enough information for debugging purposes.