Tools for repeatable research

Tim Daly, one of the developers of Axiom, has a vision for solving the following problem:

Computational science seems to be in a state where people work independently. They develop whole systems from scratch just to support their research work. Many make an effort to distribute their system but fail for lack of interest. Worse yet, the research that gets published often cannot be used by others or verified for correctness because the supporting code is based on a specialized system and gets lost. There is currently no community expectation that software should accompany research results. The end result is a loss of significant scientific wealth.

He is focusing on mathematical research, where the tools to do the research and present it could ideally be packaged onto a single CD and distributed with conference proceedings. The prototype implementation of this idea is called Doyen.

In systems research and network measurement, this might be a bit more difficult. For example, data sets are often too large to fit on a CD and may carry privacy implications. Real-world experiments (e.g., on PlanetLab) are difficult to reproduce. Still, it would be nice to repeat research in systems, as students of Jeanna Matthews have done with “Xen and the Art of Repeated Research”.

These thoughts line up nicely with the need to script the data analysis workflow. My advisor has also been a proponent of releasing working systems, which enables repeated research. For example, you can track the development of my work in DHash by viewing our CVS repository, though the code there is not so easy to build and deploy. Unfortunately, there is rarely time to make the data and analysis tools presentable enough for others to use when paper deadlines are near, and DHash’s code is no exception. I’m hoping that better discipline and experience will refine my own tools to the point where they are eventually useful to others.