Thoughts on Systems

Emil Sit

Systems Researchers: Justin Cappos

Justin Cappos received his PhD from the University of Arizona under the supervision of John Hartman. I met Justin several years ago at a PlanetLab Consortium meeting when he was starting to work on Stork, a system to simplify package deployment. He is currently a postdoc at the University of Washington working with Tom Anderson and Arvind Krishnamurthy.

What did you build?
The most relevant / longest-running projects are Stork, San Fermin, and Seattle.

Stork is a package manager that has security and functionality improvements over existing Linux package managers. Some of the advances we made in Stork have been adapted by APT, YUM, YaST and other popular Linux package managers.

San Fermin is a system for aggregating large quantities of data from many computers. San Fermin provides the result faster and with better fault tolerance than existing techniques.

Seattle is an educational testbed built from resources donated by universities all around the world. The universities run a safe, lightweight VM that students from other universities can run code in.

Tell us about what you built it with.
I used Java for San Fermin because I needed to leverage existing Pastry code that was written in Java. I used Python for Stork and Seattle. I found Python to be far superior for large projects (the other languages I’ve used are Java, C, C++, QBASIC, and Pascal). Python has been a dream come true: it’s great for prototyping, easy to learn, and it produces readable code (so long as you impose sensible style constraints).

Perhaps the most useful thing is getting other developers involved. I like to do the initial prototyping myself, but after that it is great to have others helping out.

How did you test your system for correctness?
There are whitebox / unit tests as well as blackbox / integration tests for most parts of the systems. The time we spent building thorough test cases has really paid off because they simplify debugging.
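
To make the whitebox / blackbox distinction concrete, here is a minimal sketch using Python’s standard unittest module: the unit tests call an internal function directly, while the integration test drives the tool only through its command line. The names parse_package_name and storkquery are hypothetical placeholders, not actual Stork or Seattle code.

# A minimal sketch of the whitebox / blackbox split using Python's
# standard unittest module. The names parse_package_name and storkquery
# are hypothetical, not actual Stork or Seattle code.
import shutil
import subprocess
import unittest

def parse_package_name(filename):
    """Hypothetical helper: split 'foo-1.2.rpm' into ('foo', '1.2')."""
    base = filename.rsplit('.', 1)[0]        # drop the extension
    name, _, version = base.rpartition('-')  # split on the last dash
    if not name or not version:
        raise ValueError("malformed package filename: %r" % filename)
    return name, version

class WhiteboxTests(unittest.TestCase):
    """Unit tests: call internal functions directly."""
    def test_simple_name(self):
        self.assertEqual(parse_package_name("foo-1.2.rpm"), ("foo", "1.2"))
    def test_rejects_garbage(self):
        self.assertRaises(ValueError, parse_package_name, "garbage")

class BlackboxTests(unittest.TestCase):
    """Integration tests: drive the installed tool only through its CLI."""
    @unittest.skipUnless(shutil.which("storkquery"), "CLI not installed")
    def test_query_missing_package(self):
        proc = subprocess.run(["storkquery", "no-such-package"],
                              capture_output=True)
        self.assertNotEqual(proc.returncode, 0)

if __name__ == "__main__":
    unittest.main()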

I like to use my systems in real-world environments with outside users, so the system is never done and correctness is an iterative process. If I’m aware of bugs, we fix them. If I’m not aware of bugs, we’re adding features based upon user requests (and therefore may be adding more bugs to fix later). In general, the fact that we have users who rely on the software over long periods of time is a testament to its stability, which is closely tied to correctness.

To answer the question you are really asking more directly: I usually run my code by hand and evaluate it in small / constrained environments (turning these test runs into unit tests). I also follow a philosophy of trying to “detect and fail” as much as possible. I care more about correctness than performance (at least initially), so I add many redundant checks to my code to catch errors as soon as possible.
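
As a rough illustration of the “detect and fail” style (a generic sketch, not code from his systems), the idea is to check arguments and boundary conditions redundantly so a failure is reported where it happens rather than several calls later:

# A generic sketch of the "detect and fail" style: validate inputs and
# boundary conditions aggressively so bugs surface where they happen.
# The function is illustrative and not taken from Stork or Seattle.
def copy_block(data, offset, length):
    """Return length bytes of data starting at offset."""
    # Redundant checks; cheap compared to the debugging time they save.
    if not isinstance(data, bytes):
        raise TypeError("data must be bytes, got %s" % type(data).__name__)
    if not 0 <= offset <= len(data):
        raise ValueError("offset %d out of range [0, %d]" % (offset, len(data)))
    if length < 0 or offset + length > len(data):
        raise ValueError("length %d runs past the end of the data" % length)

    block = data[offset:offset + length]

    # Post-condition: fail immediately if the slice is the wrong size.
    assert len(block) == length, "internal error: short read"
    return block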

I find that if I’m careful and thorough when writing my code, I spend very little time debugging. I probably spend about 30% of my time writing code, 40% writing comments / docs (which I normally write before or during coding), 20% writing test code for individual modules, and about 10% debugging after the fact. I think part of this is that I’m really careful about checking input and boundary conditions, so I can normally pinpoint the exact cause of a failure.

As for problems when writing code, I generally only use standard libraries and code I’ve written myself. I don’t depend on third-party code because I don’t know what level of support it will have. Also, since I usually code in Python, it’s easy for me to add whatever functionality I need.

How did you deploy your system? How big a deployment?
Stork has been deployed on PlanetLab for about 6 years. For the majority of that time it has been deployed on every working PlanetLab node. Stork has managed more than 500,000 VMs and, when I last checked, was used daily by users at around two dozen sites. We initially used AppManager to deploy Stork, but have since been using PlanetLab’s initscript functionality.

San Fermin was deployed on PlanetLab to combine and manage logs from Stork. However, we found that San Fermin did not work well in practice because of difficulties getting Pastry to start reliably and to operate in the presence of non-transitive connectivity. As a result, we mainly ended up using San Fermin as a publication vehicle.

Seattle is currently deployed on more than 1100 computers around the world. We have done our initial deployment by encouraging educators to use our platform in networking and distributed systems classes. Our longer term plans involve using Seattle to build a research testbed of 1 million nodes.

How did you evaluate your system?
With San Fermin (and other systems research I haven’t mentioned), there is fairly clear related work to compare against. In some cases, the biggest challenge has been getting the existing research prototype code from another project to run well enough for comparison.

For the work where I’ve focused on real-world impact over publication impact (Seattle / Stork), evaluation is much more difficult because these aren’t incremental improvements over existing models and systems. These systems break the mold in terms of security and / or functionality, so it’s sometimes difficult to know how to compare them to existing work.
