When a graduate student completes their PhD, the software they wrote for that degree begins an almost inexorable decline into obscurity. Other things become more important and, unless that code serves as a platform for further research or has spun off into a startup, there is precious little time to maintain and further develop code written for the degree. The code has already served a purpose—namely, advancing the state of the art in the form of research papers and a dissertation.
For example, one student I know wanted to test a new Byzantine Fault Tolerant system against an earlier one and was told that the old one only worked on one specific machine in the corner of the lab (running some ancient release of RedHat). Another student in my group was trying to compare file system synchronization tools and found that the published source for some tools was hopelessly incompatible with modern compilers and libraries; he wound up finding some old RedHat ISOs and testing in a virtual machine.
I’m thinking about this today in the context of some recent activity on the Chord mailing list about code I wrote for my PhD. The entirety of my time as a graduate student was spent hacking on the PDOS Chord/DHash implementation, which has served as the reference implementation of Chord in numerous papers. While I hesitate to declare this implementation dead, it has become clear from numerous mailing list postings that it will be increasingly hard for others to use the implementation if it is not more actively maintained.
Chord and DHash are implemented using the libasync C++ library, an obscure and mostly undocumented creation of Prof. David Mazieres, who wrote it as part of his thesis on a Self-Certifying File System (SFS) ten years ago, back when C++ compilers sucked and the STL didn’t work very portably. libasync offers a lot of nice features such as managing TCP buffering for nonblocking I/O and a comprehensive framework for writing user-level NFS servers; this came in handy for early Chord applications such as the Cooperative File System. As a result, the libasync toolkit was adopted by many PDOS research projects.
Unfortunately, as Prof. Mazieres moved on to other research projects, SFS releases and the libasync codebase languished. The libasync library was fortunately adopted by Max Krohn in the form of sfslite. I switched Chord to sfslite, so that it would be possible to point people at a real release instead of asking them to check out some recent version of SFS from its (now defunct?) CVS server.
As part of sfslite, Max went on to develop tame, a language designed to simplify writing event-driven programs with callbacks using libasync, and as part of my thesis, I started to write code in tame to help test it. By the time sfslite 1.x was released, tame had evolved from v1 to v2, which changed the syntax. Since I never had the time to take the potential stability hit involved in upgrading, Chord still relies on sfslite 0.8, which Max no longer has time to maintain.
Which brings us back to recent activity on the Chord mailing list: as people try to build Chord with ever newer compilers and Linux versions, they run into problems finding a version of sfslite and Chord that build successfully. I still don’t have the time to update the Chord tree to use newer versions of sfslite (though it might not be too hard), nor do I have access to all the resources I used to test it in deployment. So, people are left to patch their own source trees and cobble together working bits.
I wish I had better news for people interested in Chord—there are a lot of useful lessons embedded in our implementation, and many reimplementations that might benefit from those lessons. Perhaps one day soon I will try to write up some of those implementation lessons. Or maybe someone would be willing to take over maintaining Chord. If you volunteer, I’d be happy to help you get started!
2 Comments
Thanks for the insightful post. I am also facing this issue. I’ve spent a fair amount of time hacking on the code that will underpin my dissertation, and I don’t want to see that work go to waste. At the same time, keeping code in a working state as compilers, glibc, etc. advance is next to impossible.
The cleanest solution I’ve found so far is to publish VM images with everything required to build the code (plus the project already installed). This at least gives people something they can download and play with. For example, if you go to http://www.metafuzz.com, we provide a link to a VMware image and instructions for installing. We also have Amazon EC2/Xen images, though they are not publicized at all yet (we just use them for running experiments).
One downside of this is that creating these VMs for me is kind of heavyweight. As a result, the VM image lags behind the current working version of the code. This is a problem because people can then form opinions about the project based on old code without all the latest improvements, bug fixes, etc. For me this is an OK tradeoff, at least for now.
I note the Asbestos project (also from David Mazieres and friends) has a qemu image as part of its build process. So whenever someone makes changes, they can build a redistributable image of the new OS. That’s pretty neat. Ideally such an image would include all the tools and code required to make a new such image; I don’t know whether the Asbestos people have that or not.
A second downside is that my work needs to test other software to be useful. I’ve already found cases, like the vlc media player, where the latest and greatest just won’t play nice with my VM due to requiring new glibc, new sound libraries, or whatever. I don’t have a good solution for this in general.
Maybe that is less of an issue with a Chord implementation? To do an experiment with a new twist on the basic Chord protocol, would it be sufficient to have a bunch of Amazon EC2 images running old distros of Linux with the tweaked software? If so, you could partially address the bit-rot issue by publishing a basic “Chord Node” Amazon EC2 image which others could rebundle, using it as a basis for their own new Chord implementations.
VMs are a great way to let people test out a system, though reproducibly building good VMware appliances is still a somewhat difficult task, as far as I can tell. There are some tools like the Ubuntu VM Builder or VMware Studio, but they are very new. Bryan Ford’s Virtual Executable Archives are a lighter-weight example. Chord only speaks SunRPC, so it isn’t very Web2.0-plug-n-play, but yeah, some sort of VM image might be a good way to make it easier for researchers to use as a basis for comparison and to have a guaranteed functional build system to hack on. Now then, how to make sure it runs on EC2, Emulab, PlanetLab, VMware and qemu? And sadly, I probably won’t have time to do this either.