Chronon Recorder saves the Chronon Recording Server
The Problem
Last week we had a big issue on our hands. Many people trying out the Chronon Recording Server beta reported that they were getting corrupt files when they downloaded a recording from the Recording Server UI. The issue occurred at random: sometimes the files were corrupt and sometimes not. And try as we might, we just weren't able to reproduce the issue on our end. We looked through our code, made a small change to what we thought could have been the cause, and sent out the updated binaries, just hoping the issue was fixed.
But luck wasn't on our side. Our blind, shot-in-the-dark fix did not work, and we were getting overwhelmed with support requests about corrupt downloads. We tried everything possible, like introducing lag, reducing bandwidth and adding more machines, but nothing reproduced the issue on our end.
Solution
Then finally it struck me. We would use Chronon to debug Chronon!
I asked one of our customers to use the Chronon Recorder to record the Chronon Recording Server. The next time he ran into the corrupt download issue, he simply took the Chronon recording of the Chronon Recording Server and sent it over to me. Once we had the recording, it was only a matter of minutes until we drilled down to the root cause and solved the issue!
Conclusion
This was one of the first times we ourselves experienced the power of having a tool like Chronon. Had it not been for the Chronon Recorder, we would still be pulling our hair out trying to reproduce the issue and then debug it. Our customers would have continued to grow unhappier with our product, our credibility would have declined, and eventually our revenues would have suffered. If you have had to support products out in the field, you know how it is when an unhappy customer calls you about an issue and you have to make him jump through all these hoops to figure out the root cause, making him even more unhappy in the process, until he eventually just gives up on your product altogether.
With Chronon, all this just vanishes! In this case the customer just sent us the recording, and no other communication was needed. We have the bug fixed now and tests in place. If you are a company that prides itself on customer support or, like Chronon Systems, pretty much depends on it to drive sales and revenue, then Chronon gives you an invaluable tool that can literally change the public perception of your company. As far-reaching as this may sound, we can now stand behind this statement, since last week we got to experience it for ourselves.
Chronon 1.6 released
The major theme of this release is improving the workflow of downloading a file from the Chronon Recording Server and opening it in Eclipse.
There are two major features serving that purpose:
Open .tar file directly
The 'Open Recording' dialog now allows you to point directly to a .tar file.
It will automatically extract and unpack the .tar file for you.
You no longer have to manually extract the .tar file and create folders to extract and unpack it into.
Specify Source Lookup paths
This has been a big request for some time.
Often you have multiple projects in a workspace that contain the same package and class names. When you open an external recording, you want to tell Chronon which specific project to look in to find the correct class when debugging the recording.
To that end, the 'Open Recording' dialog now allows you to create 'Source Lookup' definitions.
Creating and selecting a Source Lookup definition while opening an external recording instructs the Chronon Eclipse plugin to look only inside the projects you specify when displaying source code.
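The lookup rule can be sketched in a few lines of Java. This is a minimal model under our own assumptions, not the plugin's code: the workspace is represented as a map from hypothetical project names to the relative source paths they contain, and `SourceLookup` and `resolve` are illustrative names. The point is that only the projects listed in the definition are searched, in the order given, so two projects that both contain `com/acme/Foo.java` no longer compete.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of a Source Lookup definition: only the listed projects are searched.
final class SourceLookup {
    // Workspace model: project name -> source files it contains, e.g. "com/acme/Foo.java".
    private final Map<String, Set<String>> workspace;

    SourceLookup(Map<String, Set<String>> workspace) {
        this.workspace = workspace;
    }

    // Maps a binary class name to the source file declaring it; nested classes
    // such as com.acme.Foo$Inner live in their outer class's file.
    static String sourcePath(String className) {
        int dollar = className.indexOf('$');
        String topLevel = dollar >= 0 ? className.substring(0, dollar) : className;
        return topLevel.replace('.', '/') + ".java";
    }

    // Searches only the projects named in the lookup definition, in order, and returns
    // "project/path" for the first match. Other workspace projects are never consulted.
    Optional<String> resolve(String className, List<String> lookupDefinition) {
        String path = sourcePath(className);
        for (String project : lookupDefinition) {
            Set<String> files = workspace.getOrDefault(project, Set.of());
            if (files.contains(path)) {
                return Optional.of(project + "/" + path);
            }
        }
        return Optional.empty();
    }
}
```

Listing a project earlier in the definition gives it priority when several listed projects contain the same class.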
You can read more about these features in our documentation.
So go ahead and upgrade your Chronon installation!
Recording Server now the official way to Record outside Eclipse
We have decided to partially deprecate the developer mode. It is now used only by the Eclipse plugin when recording from within Eclipse.
The Chronon Recording Server is now the official way for Recording outside Eclipse.
The reasons for doing so are:
- The developer mode config files were hard to create because of the many configuration options. The server mode config file requires only a 'name' value.
- The process of creating the config, doing the recording and transferring the recording to the local machine to debug was too manual. This is exactly what the Recording Server was designed to eliminate.
- Developer mode didn't support dynamic start/stop or long running programs. It turns out that people who wanted to record outside Eclipse wanted exactly that: they wanted to skip the long initialization of their web servers, start recording after the web server had initialized, and record for long periods of time. This is exactly what the Recording Server was designed for.
- There was no easy way to organize and view remote recordings. With the Recording Server you can easily view all the recordings for every Java application on every machine.
While the Recording Server may require a bit of an initial setup due to the need to install the Controller process on each machine, even this takes less than a minute and needs to be done only once.
Thus, due to all the benefits described above, we believe the Recording Server is the way forward for recording outside of Eclipse. It removes all the manual steps that were previously required and replaces them with a nice, clean and easy to use GUI.
Server Mode in the Chronon Recorder
This week we released Chronon 1.5. The big feature of this release is the inclusion of 'Server Mode' in the Chronon Recorder.
What is the Server Mode?
The Server Mode is designed to allow the Chronon Recorder to be controlled through the Chronon Recording Server.
It includes features such as:
- Ability to dynamically start/stop the recorder in a running program. The recorder can stay dormant in your program until explicitly started from the Recording Server UI.
- Ability to record long running programs.
- Ability to split a recording at a time interval or when the physical size of the recording gets too large.
- Ability to dynamically modify the set of classes that are being recorded in a running program. Thus, you can start recording with, say, include=com.package1.** and later decide to record com.package2.**, all without stopping the program.
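The class-set change can be pictured as a filter the recorder consults when deciding whether a class should be recorded. The sketch below assumes that `**` matches any number of package segments and `*` matches within a single segment; `IncludeFilter` is a hypothetical name, and actually re-instrumenting classes that are already loaded (for example through `java.lang.instrument` retransformation) is not shown. Replacing the whole pattern list in one volatile write lets the change take effect without stopping the program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical include filter: "**" matches across package segments, "*" within one segment.
final class IncludeFilter {
    // Replaced wholesale on each update, so readers always see a complete, consistent list.
    private volatile List<Pattern> includes = List.of();

    static Pattern toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*' && i + 1 < glob.length() && glob.charAt(i + 1) == '*') {
                regex.append(".*");      // "**": any characters, including dots
                i++;                     // skip the second '*'
            } else if (c == '*') {
                regex.append("[^.]*");   // "*": stays within a single package segment
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Called when the include set is changed at runtime.
    void setIncludes(List<String> globs) {
        List<Pattern> compiled = new ArrayList<>();
        for (String glob : globs) {
            compiled.add(toRegex(glob));
        }
        includes = List.copyOf(compiled);
    }

    boolean isIncluded(String className) {
        for (Pattern p : includes) {
            if (p.matcher(className).matches()) {
                return true;
            }
        }
        return false;
    }
}
```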
Future directions
With the addition of the Server mode in the recorder, we now have 2 distinct modes for the Recorder:
- Developer Mode
- Server Mode
The developer mode is the one you are probably familiar with as that is what is used when you record using the Chronon Eclipse plugin. It records the entire program from beginning to end and is meant for short running programs, as is common in development scenarios.
Moving forward, we will probably optimize each of these 2 modes for its specific use case. There are a lot of optimizations we want to put in the Recorder that will add a bit of overhead to the instrumentation time. While this is acceptable for long running server programs, it is not worthwhile if you are going to run your program for only a few minutes from within Eclipse.
A good analogy is the server and client JVMs. While the client JVM is optimized for quick startup and does less optimization, the server JVM is meant for long running programs and does much more aggressive optimization.
We will keep you posted on the specifics of how we proceed with putting optimizations/features in each of these modes of the Chronon Recorder.
Is the traditional debugger still relevant in 2011?
The traditional debugger as we know it hasn't changed since the dawn of programming; that is to say, it has remained pretty much the same since the 1970s. Let's take a deeper look at some of its fundamental design principles and whether they are still relevant in 2011.
Traditional Debugger
Design Principles
The traditional debugger is designed around the idea that:
- Programs are single threaded
- Flow of execution is sequential
- Bugs are always reproducible.
- Programs run for short periods of time
Implementation
Sequential flow of execution and single threaded-ness
This principle is clearly reflected in the interface of the debugger, which has 'stepping' buttons that let you navigate the execution of your program sequentially. There are no well-defined semantics for what happens when you 'step forward' in one thread with respect to all the other threads.
Reproducible bugs and short runs of a program
The traditional debugger relies on the 'breakpoint' model, which assumes that the person debugging has a well-defined and fully reproducible set of actions. It also assumes that the program doesn't run for very long; otherwise you would have to set a breakpoint and wait hours for it to hit.
Not multithreaded by design
Although most debuggers can stop and show you the stack frames of all the active threads when you hit a breakpoint, that is more of an evolution of the traditional design of showing the stack frames of the single thread the program is assumed to be running on. The rest of the debugging elements are not designed around the fact that program flow is not merely sequential and data is being modified by multiple threads.
But we are in 2011...
None of the assumptions of the traditional debugger hold true anymore in 2011:
- Almost all programs are multi threaded
- Flow of execution is not 100% sequential. Data can be modified by multiple threads at the same time.
- Bugs are becoming increasingly non reproducible due to race conditions and just the increasing complexity of programs.
- Programs run for days, months and even years on servers.
Anybody who has had to debug a multi-threaded program knows that merely showing the active stack frames does not help much in detecting race conditions. Not only that, but just breaking into the program changes the execution and timing of the various threads, often making the bug non-reproducible while debugging.
The 'breakpoint' model is broken for long running server side programs: you can't realistically set a breakpoint and wait for days for it to hit, only to start all over again once you step over a line you didn't intend to.
And that leads us to Log files...
The failure of the debugger to keep up with programs written in the 21st century has led to the rise of logging and huge log files.
Logging is fundamentally broken by its very nature because:
- You are trying to predict, in advance, errors in your program that you don't even know about.
- Since you usually put a logging statement where you think the error might be, you have usually already hardened the code around that area. Thus in real world situations the program usually breaks where there isn't any log statement at all, because the programmer never thought he might encounter an error in that piece of code.
- Long running programs generate enormous log files and you usually have to write another set of programs just to parse through those log files.
- Writing logging statements is a distraction from programming and clutters the code.
Thus the obsolescence of the traditional debugger has led people to code their own custom debugging mechanisms for every program they write.
Chronon - Reinventing the debugger for 2011
When we started designing the Chronon Time Travelling Debugger, we built it with programs of the 21st century in mind. Our assumptions were:
- Programs are inherently multi-threaded
- Flow of execution is not entirely sequential. Data may be changed by other threads. Calls to a method may be interleaved across threads.
- Bugs are tough, if not impossible, to reproduce in a multi-threaded world with race conditions.
- Programs run for (very) long periods of time.
Implementation
Record everything, no need to reproduce
The Chronon Recorder records the entire execution of the Java program. The recording is then subsequently used to debug the program in the time travelling debugger. This ensures that no bugs need to be reproduced.
No breakpoints, built for long running programs
Chronon does away with the concept of breakpoints entirely. You can jump to any point in time of the execution of your program instantly. For example, you might have a recording that is 5 hours long and want to get to an exception that was thrown after 4 hours. Chronon lets you jump to the exception instantly with a click of a button, instead of making you wait 4 hours like your traditional debugger would.
We even came up with the Chronon Recording Server recently, which is specifically designed for long running programs. It takes care of splitting the recording after a predefined time interval or when the physical size of the recording gets too large.
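Splitting on time or size boils down to a rollover rule checked on each write. This is a hypothetical sketch, not the Recording Server's implementation: `RecordingSegmenter` and its parameters are illustrative, and time is passed in explicitly (in milliseconds) so the rule can be tested without a real clock.

```java
import java.time.Duration;

// Hypothetical rollover rule: start a new segment when the current one is older than
// maxDuration, or when the next write would push it past maxSegmentBytes.
final class RecordingSegmenter {
    private final long maxDurationMillis;
    private final long maxSegmentBytes;
    private long segmentStartMillis;
    private long segmentBytes;
    private int segmentIndex;

    RecordingSegmenter(Duration maxDuration, long maxSegmentBytes, long startMillis) {
        this.maxDurationMillis = maxDuration.toMillis();
        this.maxSegmentBytes = maxSegmentBytes;
        this.segmentStartMillis = startMillis;
    }

    // Accounts for a write of `bytes` at time `nowMillis` and returns the segment it lands in.
    int write(long nowMillis, long bytes) {
        boolean timeUp = nowMillis - segmentStartMillis >= maxDurationMillis;
        // A non-empty segment is closed before it would exceed the size cap; a single
        // oversized write still goes into a fresh segment rather than being rejected.
        boolean tooBig = segmentBytes > 0 && segmentBytes + bytes > maxSegmentBytes;
        if (timeUp || tooBig) {
            segmentIndex++;
            segmentStartMillis = nowMillis;
            segmentBytes = 0;
        }
        segmentBytes += bytes;
        return segmentIndex;
    }
}
```

With a one-hour interval and a 1,000-byte cap, three 400-byte writes in quick succession land in segments 0, 0 and 1, and a write an hour after segment 1 started opens segment 2.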
Embraces multi-threaded, non sequential nature of programs
Although Chronon still has the stepping buttons, including a 'step back' button, to allow examining sequential execution of a single thread, the rest of the interface is designed with multithreaded, non-sequential execution in mind.
Showing data independently of threads and then allowing you to jump to any point in time and examine the sequence that led to that particular state embraces both the multi threaded data manipulation as well as the single threaded sequential nature of the execution of the program.
Conclusion
The traditional debugger as we know it is not of much use in 2011. This is the reason people have resorted to log files and custom debugging mechanisms. With Chronon, we have solved a lot of the issues with the traditional debugger and designed it to debug the way modern programs are written and used.
We believe that in 2011, you should not need to litter your code with logging or any other kind of custom debugging mechanism. Our current product and upcoming enhancements are steps in that direction.
Chronon Recording Server Architecture
Here are some details on the architecture of the Chronon Recording Server which recently went into beta.
As shown in the diagram above:
Per Machine:
Each machine being recorded can have a number of JVMs running. Each JVM has a recorder attached to it.
Each machine also has a 'controller' service running on it.
Controller:
The controller is the heart of the communication mechanism in the Recording Server product. A controller service runs on each machine and is controlled by the Recording Server web UI. The web UI talks to the controller, which in turn talks to any of the JVMs being recorded on that machine.
Server + UI:
The 'server' portion and the web-based UI of the server sit on a separate machine and talk to the controller of each machine connected to the Recording Server.
Design implications
This design was chosen because:
Performance
The recordings are stored locally on the machines being recorded. This reduces network traffic.
Fault Tolerance
Any machine can go down without affecting any other machine.
· If any of the machines being recorded goes down, it doesn’t affect the communication or recordings of any other machine.
· If the Recording Server goes down, each of the machines being recorded still continues doing what it was last directed to do, since the recorders are controlled by the Controller service.
Thus all activity, like flushing old recordings or splitting a recording after a time interval, keeps happening as scheduled even when the Recording Server goes down.
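This property comes from the controller keeping the last directive it received and enforcing it locally. The sketch below is hypothetical: it models a single directive (how long to keep finished recordings), and `Controller`, `onDirective` and `tick` are illustrative names rather than the product's API. Losing the server connection only flips a flag; the periodic `tick` keeps flushing old recordings exactly as before.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-machine controller. It remembers the last directive from the Recording
// Server and enforces it locally, so losing the server connection changes nothing.
final class Controller {
    private volatile Duration retention;   // last directive received; null until the first one
    private volatile boolean serverConnected;
    private final Map<String, Long> finishedRecordings = new TreeMap<>(); // name -> end time (ms)

    void onDirective(Duration newRetention) {
        retention = newRetention;
        serverConnected = true;
    }

    // Deliberately does not clear the last directive.
    void onServerLost() {
        serverConnected = false;
    }

    boolean isServerConnected() {
        return serverConnected;
    }

    synchronized void recordingFinished(String name, long endMillis) {
        finishedRecordings.put(name, endMillis);
    }

    // Periodic local housekeeping: flushes recordings older than the retention period.
    synchronized List<String> tick(long nowMillis) {
        List<String> flushed = new ArrayList<>();
        Duration keep = retention;
        if (keep == null) {
            return flushed; // no directive yet, nothing to enforce
        }
        long cutoff = nowMillis - keep.toMillis();
        finishedRecordings.entrySet().removeIf(e -> {
            boolean expired = e.getValue() < cutoff;
            if (expired) {
                flushed.add(e.getKey());
            }
            return expired;
        });
        return flushed;
    }
}
```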
Method Size Limit in Java
Most people don’t know this, but you cannot have methods of unlimited size in Java.
Java has a 64k limit on the size of methods: the bytecode of a single method must be smaller than 64 KB (65,536 bytes).
What happens if I run into this limit?
If you run into this limit, the Java compiler will complain with a message which says something like "Code too large to compile".
You can also run into this limit at runtime if you have a method that is already just below the 64k limit and some tool or library instruments its bytecode, adding to the size of the method and pushing it beyond the 64k limit. In this case you will get a java.lang.VerifyError at runtime.
This is an issue we ran into with the Chronon Recorder: most large programs have at least a few large methods, and adding instrumentation to them would push them past the 64k limit, causing a runtime error in the program.
Before we look into how we solved this problem for Chronon, let's look at the circumstances under which people write such large methods in the first place.
Where do these large methods come from?
· Code generators
As it turns out, most humans don’t in fact write such gigantic methods. We found that most of these large methods were the result of code generators; e.g., the ANTLR parser generator generates some very large methods.
· Initialization Methods
Initialization methods, especially GUI initialization methods where all the layout, attaching of listeners, etc. for every component is done in one large chunk of code, are a common practice and result in a single large method.
· Array initializers
If you have a large array initialized in your code, e.g.:
static final byte largeArray[] = {10, 20, 30, 40, 50, 60, 70, 80, …};
that is translated by the compiler into a method which uses load/store instructions to initialize the array. Thus an array too large can cause this error too, which may seem very mysterious to those who don’t know about this limit.
· Long jsp pages
Since most JSP compilers put all the JSP code in one method, large JSP pages can make you run into these errors too.
Of course, these are only a few common cases, there can be a lot of other reasons why your method size is too large.
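To get a feel for how quickly an array initializer reaches the limit, here is a rough estimator of the static initializer javac generates for a `static final byte[]` initializer: for each element, `dup`, push the index, push the value, `bastore`. The per-instruction byte counts follow the JVM instruction encodings (`iconst_*` 1 byte, `bipush` 2, `sipush` 3, `newarray` 2, `putstatic` 3, `return` 1); the assumptions that javac emits a store for every element and that the class has no other static initialization are ours, so treat the numbers as estimates.

```java
// Rough estimate of the <clinit> bytecode javac emits for
// static final byte[] data = { ... };
final class ArrayInitSize {
    // Bytes needed to push an int constant onto the operand stack.
    static int pushCost(int v) {
        if (v >= -1 && v <= 5) return 1;                             // iconst_m1 .. iconst_5
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) return 2;    // bipush
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) return 3;  // sipush
        return 3;                                                    // ldc_w (assumed worst case)
    }

    static long estimateClinitBytes(byte[] values) {
        long size = pushCost(values.length) + 2;    // array length, newarray T_BYTE
        for (int i = 0; i < values.length; i++) {
            // dup, push index, push value, bastore
            size += 1 + pushCost(i) + pushCost(values[i]) + 1;
        }
        return size + 3 + 1;                         // putstatic, return
    }
}
```

Under these assumptions, `{10, 20, 30}` costs 22 bytes and a 5,000-element array of such values costs 34,875 bytes. Once the index no longer fits in `bipush`, every element costs 7 bytes, so the ceiling is roughly 9,400 elements.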
How do we get around this issue?
If you get this error at compile time, it is usually trivial to split your code into multiple methods. It may be a bit hairy when the method limit is reached due to automated code generation like ANTLR or JSPs, but usually even these tools have provisions for splitting the code into chunks, e.g. jsp:include in the case of JSPs.
Where things get hairy is the second case I talked about earlier: when bytecode instrumentation pushes the size of your methods beyond the 64k limit, resulting in a runtime error. Of course you can still look at the method causing the issue, and go back and split it. However, this may not be possible if the method is inside a third party library.
Thus, for the Chronon Recorder at least, the way we fixed it was to instrument the method and then check the method's size after instrumentation. If the size is above the 64k limit, we go back and 'deinstrument' the method, essentially excluding it from recording. Since both our Recorder and Time Travelling Debugger are built from the ground up to deal with excluded code, this wasn’t an issue while recording or debugging the rest of the code.
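The check-then-fall-back approach can be sketched as follows. This is a simplified model of our own: a method body is a raw `byte[]` of bytecode, the instrumenter is a hypothetical `UnaryOperator<byte[]>`, and a real recorder would operate on whole class files through a bytecode library. The limit itself comes from the JVM specification, which requires a method's `code_length` to be less than 65,536.

```java
import java.util.function.UnaryOperator;

// Simplified model: a method body is its raw bytecode array.
final class MethodSizeGuard {
    // JVM specification: a method's code_length must be less than 65536.
    static final int MAX_CODE_LENGTH = 65_535;

    // Instruments a method body, but keeps the original if the result would be too large
    // ("deinstrumenting" it, i.e. leaving that one method out of the recording).
    static byte[] instrumentOrSkip(byte[] originalCode, UnaryOperator<byte[]> instrumenter) {
        byte[] instrumented = instrumenter.apply(originalCode);
        if (instrumented.length > MAX_CODE_LENGTH) {
            return originalCode;
        }
        return instrumented;
    }
}
```

Only the oversized method is excluded; every other method in the class keeps its instrumentation.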
That said, the method size limit of 64k is too small and not needed in a world of 64 bit machines. I would urge everyone reading this to go vote on this JVM bug so that this issue can be resolved in some future version of the JVM.
Time inside a Time Travelling Debugger
When we were developing Chronon and started using it ourselves, we realized something very intriguing. You see, the various views of Chronon allow you to step not only forward and backward but to any random point in time. For example, using the Variable History view, you can instantly jump to when a variable became ‘null’ or use one of the powerful filters in method history view to jump directly to a particular call of a method.
The problem
Since you are not just stepping forward, it is easy to get lost in time.
For example:
- How does one event relate to the other, did it happen before or after the other?
- Did I just jump forward or backward when I clicked in the variable history view?
- If I did jump backward/forward, by how much did I jump?
- Where am I in the execution of my program? Am I near the beginning, middle or end?
The Solution
Imagine you are a real world time traveler. What is the most important tool in your arsenal?
A clock.
We needed some sort of a clock inside Chronon to solve all the above issues.
Thus we invented the concept of ‘time’.
- It literally shows the current time value.
- The progress bar gives you an idea of how far into the execution of the program you are. If the bar is completely filled up, you are near the end; if it's almost empty, you are near the beginning.
- We also added ‘time bookmarks’ in this view, which act as a checkpoint mechanism for anything interesting you might want to return to in the future.
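The questions listed earlier all reduce to comparing positions on one axis. Assuming, for illustration only, that time is a counter advanced once per recorded event, a hypothetical `Timeline` class can answer them directly: the sign of a jump says forward or backward, its magnitude says how far, `progress()` drives a progress bar, and bookmarks are just labelled times.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model: "time" is a counter that advances once per recorded event,
// running from 0 to endTime.
final class Timeline {
    private final long endTime;
    private final NavigableMap<Long, String> bookmarks = new TreeMap<>();
    private long now;

    Timeline(long endTime) {
        this.endTime = endTime;
    }

    // Jumps to time t; a positive result means we moved forward, a negative one backward,
    // and the magnitude is how far we jumped.
    long jumpTo(long t) {
        if (t < 0 || t > endTime) {
            throw new IllegalArgumentException("time out of range: " + t);
        }
        long delta = t - now;
        now = t;
        return delta;
    }

    // Fraction of the recording before the current time, for a progress bar.
    double progress() {
        return endTime == 0 ? 1.0 : (double) now / endTime;
    }

    void bookmark(String label) {
        bookmarks.put(now, label);
    }

    // Nearest bookmark at or before the current time, or null if there is none.
    Map.Entry<Long, String> previousBookmark() {
        return bookmarks.floorEntry(now);
    }
}
```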
Chronon release 1.2
It has barely been a week since we released update 1.1.1 and we are back again with another update full of more goodies.
Support for Reflection in the recorder
The Chronon recorder will now recognize updates to the fields of your object done using the Java Reflection APIs.
This is especially useful if you use ORM frameworks like Hibernate which use reflection to set the fields of the Java objects. No longer will you see 'null' in those fields, but the actual values.
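Here is what such a write looks like. In this illustrative snippet (the `Customer` class and `hydrate` method are hypothetical), the application's own bytecode never assigns `name`; the value is stored by `Field.set` inside the reflection API, which is how frameworks that use field access can populate entities. A recorder that only watched field assignments in the recorded classes would see nothing happen here.

```java
import java.lang.reflect.Field;

// Illustrative only: a field written through java.lang.reflect rather than by a
// field assignment in the application's own code.
final class ReflectionWriteDemo {
    static final class Customer {
        private String name; // no setter; an ORM-style framework fills it in

        String getName() {
            return name;
        }
    }

    // Mimics what a framework using field access does when it loads an entity.
    static Customer hydrate(String value) {
        try {
            Customer customer = new Customer();
            Field field = Customer.class.getDeclaredField("name");
            field.setAccessible(true);
            field.set(customer, value); // the write happens inside the reflection API
            return customer;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```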
'Copy Value' for variables in the Debugger
If the value of a variable is a string too large to fit in the Eclipse view, you can right-click and select 'Copy Value' to copy a fully formatted version of the string to the clipboard.
This functionality is supported for all views that show the value of a variable, i.e. the Locals view, Variable History view and Current Line view.
And of course, lots more bug fixes in the recorder and debugger. If you were running into deadlocks while recording before, you shouldn't anymore.
So go ahead and update your Chronon installation!
Chronon release 1.1.1
This update brings a ton of improvements. I will list some of the major ones here:
Support for applications with huge number of threads
Until now, you could create only 1024 different threads in your application, after which Chronon would throw an exception.
With this release, if your application is going to create more threads, you can specify that in the recorder config file, e.g.:
maxrecordedthreads = 3000
Of course, you can set this option from within Eclipse too.
Note that the number of threads here does not mean 'the number of threads active at a certain point in time'. It means 'the total number of threads created during the lifetime of your application'. So if your application frequently creates and destroys new threads instead of using a thread pool, this might be useful to you.
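The difference between threads created and threads alive is easy to see in code. In the hypothetical demo below, running 1,500 short tasks on fresh threads creates 1,500 threads over the program's lifetime even though at most one worker is alive at a time, which is the kind of program that would have exceeded the old limit of 1024. Running the same tasks on a fixed pool creates only as many threads as the pool size.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Contrasts "threads created over the application's lifetime" with "threads alive at once".
final class ThreadCountDemo {
    // One brand-new thread per task, started and joined in turn: never more than one
    // worker alive at a time, yet `tasks` threads are created in total.
    static int threadsCreatedPerTask(int tasks) {
        int created = 0;
        for (int i = 0; i < tasks; i++) {
            Thread worker = new Thread(() -> { /* short-lived work */ });
            created++;
            worker.start();
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return created;
    }

    // The same tasks on a fixed pool: the factory is asked for at most `poolSize` threads.
    static int threadsCreatedWithPool(int tasks, int poolSize) {
        AtomicInteger created = new AtomicInteger();
        ThreadFactory factory = runnable -> {
            created.incrementAndGet();
            return new Thread(runnable);
        };
        ExecutorService pool = Executors.newFixedThreadPool(poolSize, factory);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.submit(() -> { /* short-lived work */ });
            }
        } finally {
            pool.shutdown();
            try {
                pool.awaitTermination(1, TimeUnit.MINUTES);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return created.get();
    }
}
```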
Much faster stack traces in the Stack view
The Stack view can now create stack traces much faster and won't crash if the stack trace at a point is extremely deep.
Recorder no longer deadlocks during shutdown
This was a problem for some people who had redirected System.out to a custom logging class that they were also recording. This would cause a deadlock in the recorder. No more. Now we print shutdown messages in a different thread. Since Chronon locks up the rest of the threads while it is persisting data during shutdown, if you have System.out redirected to a custom logging class, the printer thread may be locked while persistence is taking place, but that only means the messages to the console will appear a bit delayed. The recorder will still complete, in the same amount of time as before, but it won't deadlock.
We recommend that everyone update their Chronon installation.
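The general idea of moving console output off the critical path can be sketched with a queue and a dedicated printer thread. This is a hypothetical illustration, not the recorder's code: `ShutdownMessages` and its methods are our own names. The thread doing the shutdown work only enqueues messages and never blocks on the output stream; if the stream is stalled, only the printer thread waits, so messages may appear late but the caller is never held up by them.

```java
import java.io.PrintStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: status messages are handed to a dedicated printer thread, so the
// thread doing the shutdown work never blocks on System.out or whatever it was redirected to.
final class ShutdownMessages {
    private static final String STOP = new String("stop"); // sentinel, compared by identity
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread printer;

    ShutdownMessages(PrintStream out) {
        printer = new Thread(() -> {
            try {
                for (String msg = queue.take(); msg != STOP; msg = queue.take()) {
                    out.println(msg); // may stall if `out` is locked; only this thread waits
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "shutdown-printer");
        printer.setDaemon(true);
        printer.start();
    }

    // Never blocks the caller: the queue is unbounded.
    void log(String message) {
        queue.add(message);
    }

    // Waits for the queued messages to be printed, then stops the printer thread.
    void close() {
        queue.add(STOP);
        try {
            printer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```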