Thoughts on Systems

Emil Sit

Mar 24, 2006 - 3 minute read - Rants bike cookies privacy security

Boycott Performance Bike

Boycott Performance Bike.

Performance is a company that sells bike components. They’re pretty big and have acquired their former competitors like Nashbar and SuperGo. That’s too bad because I really don’t like Performance. Maybe you shouldn’t either.

In 2001, Kevin Fu and I (along with some other members of the Applied Security Reading Group) were looking into the security of web cookies. We broke some cookie authentication schemes and made some recommendations about how to improve them. Most companies were very receptive and thankful when we contacted them privately and pointed out potential problems. Performance, Inc. did not. Their site in 2001 had several problems, most notably guessable session ids that would allow anyone to access personal information (e.g., passwords, addresses, credit card numbers) about other customers. They delayed for over a month on fixing these problems and suggested simply that I order over the phone if I was worried. That’s not the answer you want to hear from a company with your credit card number: anyone could have stolen as many credit cards as they wanted with a simple Perl script.
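
To make the threat concrete, here is the shape of such an attack on a site with sequential session ids. This is a hypothetical sketch, not Performance’s actual scheme: the URL, cookie name, and id range are all invented.

    #!/usr/bin/perl
    # Hypothetical sketch: if session ids are guessable (say, sequential),
    # then walking the id space walks other customers' account pages.
    # store.example.com and the cookie name are invented for illustration.
    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new;
    for my $id (100_000 .. 100_100) {
        my $resp = $ua->get('http://store.example.com/account',
                            Cookie => "session=$id");
        print "$id looks like a live account\n"
            if $resp->is_success && $resp->content =~ /credit card/i;
    }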

So, instead of making phone orders or checking whether they’ve fixed their problems, I decided to delete all my credit card information from their website (as best I could), take myself off all their mailing lists, and never buy from them again. (It does look like their security scheme is slightly different now, but I don’t know how much better.)

On Monday, I got an unsolicited e-mail from them:

Welcome and thank you for subscribing to our specials email list!

As a subscriber, you will be the first to receive notice of all our special online promotions. Plus, you are now eligible to receive exclusive, online deals not offered to anyone else.

Thanks again, we know you will enjoy receive [sic] our mail. And remember, all online purchases are 100% guaranteed by Performance.

I didn’t subscribe to their list; I can only conclude they went through a list of people who had stopped ordering from them and added them to this specials list. And today, they sent me their latest specials.

I hate that I don’t have control over whether companies send me ads/catalogs and that I have to explicitly tell them not to sell my address, preferences, and who knows what else (“opt out”). I hate that most companies don’t let me tell them to delete information about me. And most of all, I hate those companies that still contact me (and, potentially, share my information) after I’ve told them not to.

I can’t fix privacy laws, but I’ve switched to supporting my local bike store: they don’t send me spam or keep my credit card information online. My suggestion for you? Boycott Performance Bike.

Mar 23, 2006 - 3 minute read - Research data-analysis tools workflow

Difficulties in data analysis

In the course of my research, I tend to do a fair amount of data analysis and reduction. This ranges from simple statistics to in-depth examination of traces. While working on the camera-ready for our NSDI paper, I found myself thinking about Vern Paxson’s IMC paper on Strategies for Sound Internet Measurement (PDF). The paper is filled with good material and I long for the kinds of generalized tools he describes for managing an analysis workflow.

A typical analysis might go something like this:

  • Track down or collect some data.
  • Store it someplace accessible: a local directory or SQL database.
  • Hack up some script or program to grab some numbers.
  • Throw together some graphs in gnuplot.
  • Discover something odd about the data, resulting perhaps in a change to the collection or analysis programs.
  • Repeat, as your data and analysis tools grow in complexity…
  • Assemble your results into a paper.

If you’re disciplined, you’ll be writing down what you do and why in a notebook. You’ll learn never to manually edit any data file to fix problems (because six months from now, you won’t have a clue what edits you made and why). Hopefully, you are storing your source code in a version control system like CVS or darcs.

Section 4 of Vern’s paper talks about this in some detail, recommending that all your analysis flow from data through to results, mediated by a single master script and filled with caching to speed repeated runs. This is an excellent idea, but thus far it involves a fair amount of hacking on your own, tailored to your particular problems.
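
As a sketch of what I mean (the file names and stages are invented), a master script needs only a little machinery to get the caching:

    #!/usr/bin/perl
    # Minimal master-script sketch: every result is derived from the raw
    # data through one entry point, and a stage is re-run only if its
    # output is missing or older than an input (including the tool itself).
    use strict;
    use warnings;

    sub stage {
        my ($target, $deps, @cmd) = @_;
        my $mtime = (stat $target)[9] || 0;
        my $stale = !$mtime || grep { ((stat $_)[9] || 0) > $mtime } @$deps;
        if ($stale) {
            print "rebuilding $target\n";
            system(@cmd) == 0 or die "stage for $target failed: $?";
        }
    }

    # Invented pipeline: raw trace -> parsed table -> statistics -> graph.
    stage('parsed.dat', ['trace.raw', './parse.pl'],
          './parse.pl', 'trace.raw', 'parsed.dat');
    stage('stats.dat', ['parsed.dat', './reduce.pl'],
          './reduce.pl', 'parsed.dat', 'stats.dat');
    stage('fig1.eps', ['stats.dat', 'fig1.gp'], 'gnuplot', 'fig1.gp');

Make gives you the same dependency logic for free, of course; the appeal of a script is that it can also record parameters and metadata along the way.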

Even so, there are still problems. For example:

  • Which version of the tools did I use to generate a particular set of results? I didn’t have any uncommitted changes in my scripts when I ran it, did I? (One cheap guard is sketched just after this list.)
  • Datasets can get pretty large. You probably don’t want to include them in the version control repository you are using for your actual paper; how do you relate versions of the paper to versions of your analysis toolkit and results?
  • What if I am preparing my camera-ready and run some updated scripts? How will I remember to update all the things derived from the old results? (Even supposing that the conclusions didn’t change much.)
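
For the first problem, the master script could refuse to run from a dirty tree and stamp every run with the repository state. A sketch assuming darcs (the results/RUNINFO path is invented):

    #!/usr/bin/perl
    # Sketch: abort analysis runs when the tools have unrecorded changes,
    # and save the exact darcs context alongside the results.
    use strict;
    use warnings;

    # darcs whatsnew exits non-zero when there is nothing unrecorded.
    my $dirty = system('darcs whatsnew -l >/dev/null 2>&1') == 0;
    die "uncommitted changes in the analysis tools; record them first\n"
        if $dirty;

    # Record the patch context this run was generated from.
    system('darcs changes --context > results/RUNINFO') == 0
        or die "could not record darcs context: $?";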

I can see some hope for solving these problems. Systems and network researchers tend to work in Unix environments that have tools and infrastructure available to make this easy—I wouldn’t have a clue how to manage a complex paper with multiple authors using MS Word and Excel, though perhaps it could be done. I’ve heard of nice hacks: for example, for one Roofnet paper, the authors set up the Makefiles to query a centralized SQL database in order to generate graphs. Nick Feamster and David Andersen have started a project called the Datapository (paper) which works in this general direction.
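
I imagine the guts of that Makefile hack look something like this sketch (the schema and connection details are invented; the real setup surely differs):

    #!/usr/bin/perl
    # Sketch: pull plot data straight out of a central SQL database and
    # dump it in gnuplot's whitespace-separated format, so that a
    # Makefile rule can regenerate graphs whenever the paper is rebuilt.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:mysql:database=measurements;host=db.example.org',
                           'reader', 'secret', { RaiseError => 1 });
    my $rows = $dbh->selectall_arrayref(
        'SELECT node_count, median_latency FROM runs ORDER BY node_count');

    open my $out, '>', 'latency.dat' or die "latency.dat: $!";
    print $out "@$_\n" for @$rows;
    close $out;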

However, I’m not aware of any pre-built and general toolkit or framework for hooking together tools to simplify data analysis. But there must be hundreds of little hacks people have developed to get things done. What data analysis and paper writing toolchain have you built that you are most proud of? What was the biggest disaster? Send me e-mail or post a comment. I have some of my own thoughts on how to build such a system, which I’ll try to post soon.

Mar 18, 2006 - 5 minute read - Personal abstraction education math mit programming

Math for people

Steve Yegge’s article, “math for programmers,” has been making the rounds. His thesis is that mathematical breadth in pre-college education would be more valuable than the attempt to provide mathematical depth in a few apparently arbitrary areas (like geometry). He argues that specific math skills taught in grade school (like long division) are not necessarily of tremendous use to the average programmer. He writes:

Which is why I think they’re teaching math wrong. They’re doing it wrong in several ways. They’re focusing on specializations that aren’t proving empirically to be useful to most high-school graduates, and they’re teaching those specializations backwards. You should learn how to count, and how to program, before you learn how to take derivatives and perform integration.

I don’t really agree with Steve. First, programmers or not, the main problem is that people aren’t taught the ability to think clearly and to abstract. Second, depth is necessary for learning, and an early emphasis on breadth would weaken understanding.

From my experience helping my classmates with math from high school through college, I think that many students have one key problem: they don’t know basic abstract mathematical concepts and how they build upon each other. As a result, their knowledge of math consists largely of patterns and rules; their ability to do math rests in their ability to match a given problem to the correct rule. They don’t see that math is a framework of abstractions.

When you learn theoretical math, you start from axioms and definitions. From these you prove theorems. From theorems you develop frameworks. Elementary education starts math with groups and fields: addition, multiplication, and their inverses. It’s probably tough to teach second graders to think abstractly about this, so groups and fields are glossed over in favor of concrete examples: 2 + 2 = 4. With any luck, concrete knowledge results in an ability to manipulate natural numbers. Then rationals or, as they are called in fourth grade, fractions. Soon variables (“x”) and equations are introduced. Of course, variables are just an abstraction of rules that were taught earlier; they reveal yet more structure.

Ideally, you build abstractions to match this structure and to provide intuition about how mathematical objects behave. Eventually you think in terms of those objects, without worrying about the foundation. Without the abstraction though, the concepts start to grow unmanageable. Fractions are something special, not an extension of division. Exponents and logarithms aren’t really just talking about multiplication. Without abstraction, you have to remember how to handle everything separately. Further, the result of missing abstractions gets worse with time: there are a lot of rules and each year they teach you more. Instead of building confidence, they build fear.

Good students abstract. They don’t have to study, memorizing tens of special cases and specific formulas. When I took probability, Professor Rota used to say about statistics, “it’s just balls into boxes.” Bose-Einstein, Maxwell-Boltzmann, it’s “balls into boxes.” The difference is perhaps in whether the balls are the same color or the size of the boxes, but the concept is the same. More generally, you can get pretty far in engineering if you really understand conservation of momentum. But if you don’t understand the underlying abstraction, the key idea, your brain has to work a lot harder.
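
To make the slogan concrete: putting k balls into n boxes, the three classical statistics are just three counting conventions:

    % k balls into n boxes, counted three ways:
    %   Maxwell-Boltzmann: distinguishable balls
    %   Bose-Einstein:     indistinguishable balls
    %   Fermi-Dirac:       indistinguishable balls, at most one per box
    \[
      \underbrace{n^k}_{\text{Maxwell-Boltzmann}} \qquad
      \underbrace{\binom{n+k-1}{k}}_{\text{Bose-Einstein}} \qquad
      \underbrace{\binom{n}{k}}_{\text{Fermi-Dirac}}
    \]

Same balls, same boxes; all that changes is which arrangements count as distinct.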

The problem is that our early education doesn’t emphasize building mental models and frameworks of abstraction. Naturally, computer programmers (and computer scientists) tend to be good at this. (Or perhaps people who are good at this make good programmers.) The concept of abstraction is critical to computer science. Build the right abstractions in your software and you can develop more complex programs. If you build the right math abstractions in your mind, you can solve more complex math problems.

My second thought is that depth is important for learning. Mastery of concepts takes a long time—Peter Norvig has an excellent essay arguing that it takes about ten years to develop expertise in any field. Getting good at math is no different: practice. To learn how to do fractions, you have to do fractions. To learn calculus, you have to do calculus. As much as I love MIT and its firehose style of education, I feel that I retained a lot more of high school because it was repetitive and in-depth. MIT doesn’t always give the repetition and time needed to really internalize the mechanics.

Depth gives you a better understanding of details. In programming, abstraction hides details but it is often necessary to work with those hidden details. When you encounter something counter-intuitive or perhaps simply non-obvious, it helps to be able to dig down and debug the problem. This happens just as much in math or physics: instead of debugging, we “go back to first principles.” Being able to do this helps you find mistakes and solve problems that you’re not too sure about.

Steve is right: programming early is important. But I think I started the other way around: the ability to abstract and think clearly that I got from math helped develop my ability to program and debug. To me, focusing on these skills in math will help much more than learning the names and broad facts about many subdisciplines of math. Once you have these skills, learning something specific will be that much easier, whether or not you are a programmer.

Mar 17, 2006 - 3 minute read - Hacking ergonomics yoga

Workspace hacking

One thing I spend a lot of time doing is tweaking things I use to get them the way I like. For example, this website. But today, I want to comment briefly on the physical infrastructure I have at work.

One side effect of doing yoga is that I have become very aware of how I hold my body and how it feels while I’m working. For example, I used to sit with one leg crossed and my body turned sideways at my desk. I spent most of my time with my head angled down to look at my laptop’s screen. The laptop keyboard is not full-sized so my shoulders rounded forward to bring my arms in, and I would slump into my chair. This couldn’t possibly be good for anyone and no amount of xwrits was going to solve that problem.

Fixing this came partially just from being aware of how I was holding myself. Then I got an external LCD display and set it on a big dictionary and some conference proceedings to get it to a good height. (My old officemate used three reams of paper.) I set up my X server to manage the displays as separate screens (since they have different resolutions) and now I can work with most of my distractions (IM, E-mail) down on my laptop screen while my editor stays comfortably in front of me.
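
The display half of this is just X configuration: roughly the following shape of xorg.conf, as a sketch (the identifiers are made up, and the matching Device and Monitor sections are omitted):

    # Two independent screens (:0.0 and :0.1) rather than one Xinerama
    # canvas, since the two panels have different resolutions.
    Section "ServerLayout"
        Identifier "DualHead"
        Screen  0  "LaptopScreen"   0 0
        Screen  1  "ExternalScreen" RightOf "LaptopScreen"
    EndSection

    Section "Screen"
        Identifier "LaptopScreen"
        Device     "LaptopVideo"
        Monitor    "LaptopPanel"
    EndSection

    Section "Screen"
        Identifier "ExternalScreen"
        Device     "ExternalVideo"
        Monitor    "ExternalLCD"
    EndSection

But the most interesting hack I’ve come up with recently involves my keyboard setup.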

If you want to get a more ergonomic keyboard, there are many options from the old Microsoft ones to fancy Kinesis adjustable ones. There are specialized one-handed entry methods ranging from modifications of keyboards to chorded devices. And while the MIT ATIC lab offers some of these for trial, I was a little bit put off by the fact they all cost hundreds of dollars. Sure, a small price to pay for the health of your arms, but I wondered if there was an easier option that didn’t require lots of mental retraining.

One day, I was helping someone debug some code and wanted to find a way for us to take turns typing at the same monitor without having to keep moving back and forth. A look around my office revealed the answer: an external keyboard. Five years ago, everyone in my group had an IBM external keyboard (including the nipple mouse) to match our massive Thinkpad collection. (Nowadays, half of us have Mac laptops.) I grabbed one of these and a PS2/USB adapter, plugged it into my laptop, and Linux/XOrg set it up as a second keyboard. We could use either one to type.

But, aha! So could I! Instant adjustable split keyboard. Now I use my left hand on the laptop’s built-in keyboard and my right hand on the external one. I can use the built-in mouse and buttons on either keyboard (which means I don’t have to worry about which side of the keyboard to put a mouse on). It took me about an hour to get used to the arrangement but now I can type at full speed with this split keyboard. My arms are much more comfortably spaced. Because I use an external display, I can keep that centered in front of me. And this arrangement can be easily replicated anywhere I can find a USB keyboard.

The only downside of these improvements is that it makes hacking on my laptop alone that much less comfortable. And I used to do it all day long.

Mar 16, 2006 - 1 minute read - Hacking hosting

Colophon, Part 2

My new hosting provider is Nearly Free Speech and I am now running WordPress.

One benefit of WordPress over Typo is the maturity of its plugins and the regular releases of the software itself. Even without much knowledge of PHP or WordPress internals, I think I’ve been able to get my site set up to behave the way I like. I’m using Ultimate Tag Warrior 3, which let me set up the UI side of tags. I wish it were a little easier to specify tags in mtsend, which allows me to use vim to edit my posts instead of using the web interface; I guess it wouldn’t be entirely clear how to hook into xmlrpc.php.

I’m thus far very happy with Nearly Free Speech’s hosting. In addition to more freedom in how I lay out my content (including an entire MySQL process to myself), I get to manage my own DNS, I have ssh shell access, and they even make it easy to hook up a log-parsing front-end, awstats.

Now that I have a framework for publishing in order, maybe it’s time to write something.

Mar 13, 2006 - 1 minute read - Hacking hosting

Migration

It seems that Typo lacks good ping/trackback integration (e.g., for pinging services like Technorati). I’m playing with moving over to Nearly Free Speech for hosting: their prices are good and their infrastructure seems more advanced than what PlanetArgon currently has. There will be some DNS cache incoherency and URL changes as I migrate over and get WordPress working. So far, I miss Typo’s tagging features—WordPress has a plethora of semi-supported-looking plugins but no real story on which is the best way to do it (and which interacts best with mtsend.py).

Feb 2, 2006 - 2 minute read - Personal

Online memories

This morning I’ve been thinking a bit about my early online antics. It’s been fun finding out what’s on the Internet about my old haunts.

I started out using a nice US Robotics modem that my dad had borrowed from work and dialing into local bulletin board systems using Telix for DOS. Mmm. I don’t remember the names of most of them but I found an abandoned-looking website for the Relaynet, a BBS version of Usenet and the first set of discussion forums I participated in. I would dial up each day to the Running Board (a BBS run by Howard Belasco), download packets of messages (in QWK archives), and read them with a specialized offline reader (the Silly Little Mail Reader). There was a whole class of these programs, and little add-ons for managing signatures (called taglines, if memory serves). I think I was a big fan of the discussion of current events (perhaps conference #7) and Star Trek (perhaps conference #45).

Later on, by the time I was in high school, I started to learn about Unix, in part through a SunOS computer my dad had at school and an AIX workstation my high school had won as part of some supercomputing contest. That contest also came with a Class B address block, though we were later downsized. You can also find old posts from me on Usenet from that time when I was trying to figure out AIX.

A friend at school introduced me to MUDs: we were trying to implement a virtual version of our high school and I remember spending two hours a day on the computer in the library one summer hacking on Merc code. I think that’s how I learned C. A (perhaps unfortunate) side effect of this was that I spent a lot of time playing a MUD called HiddenWorlds. Later I would advance to socializing on IRC, but that perhaps is a story for another day.

Good times.

Jan 14, 2006 - 1 minute read - Research

Selling the old office

MIT is selling Technology Square. They bought the building back before the Stata Center was finished, and before they added what is now the Novartis building. The Novartis building is physically attached to the old NE43 and blocked out all natural light from my old office. I remember thinking that there seemed to be a lot of empty biotech lab space being constructed in Cambridge and at least it sounds like the vacancy rate might be falling.

I’m happy to be in my new building, even if it has its issues. Complaining that sunlight causes glare in the afternoons is preferable to not having any sunlight at all!

Jan 2, 2006 - 2 minute read - Hacking dokuwiki moinmoin wiki

Migrating from MoinMoin to DokuWiki

On our webserver, we run a wiki for tracking various administrative bits. Today, we migrated it from MoinMoin 1.3 to DokuWiki. This was not entirely trivial but at the same time not that difficult. The following method doesn’t preserve history, users or attachments, but seems to basically work. The handling of categories could probably use a little work though.

  1. First, find the latest revision of the files and move them to the right place. In sh, this is expressed roughly as:

    MOINWIKI=XXX    # path to your MoinMoin instance
    DOKUWIKI=YYY    # path to your DokuWiki install
    
    cd $MOINWIKI/main/data/pages
    for i in *; do
        # Pick the most recently modified revision file for each page.
        r=$(ls -tr $i/revisions 2>/dev/null | tail -1);
        if [ "$r" ]; then
            cp $i/revisions/$r $DOKUWIKI/data/pages/$i.txt;
        fi
    done
    
  2. Next, you’ll need to handle any categories manually and move the pages into subdirectories for namespaces. This could probably be automated by splitting on the (2f)s in the filenames (MoinMoin’s encoding of “/”), as in the sketch below.
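
    A rough, untested sketch of that automation in Perl; run it from DokuWiki’s data/pages directory after step 1:

    #!/usr/bin/perl
    # Sketch: turn Parent(2f)Child.txt into Parent/Child.txt, which
    # DokuWiki treats as page Child in namespace Parent.
    use strict;
    use warnings;
    use File::Path qw(mkpath);

    for my $file (glob '*.txt') {
        next unless $file =~ /\(2f\)/;
        (my $path = $file) =~ s{\(2f\)}{/}g;
        (my $dir = $path) =~ s{/[^/]+$}{};
        mkpath($dir);
        rename $file, $path or warn "rename $file: $!";
    }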

  3. Since DokuWiki prefers lowercase filenames, go ahead and lowercase all the filenames, e.g.:

    rename '$_ = lc($_)' *.txt
    

    using the handy perl rename script. You can repeat this for each namespace/category.

  4. Hack the markup. Create the following migrate.pl script:

    #!/usr/bin/perl -ni.bak
    
    BEGIN { $readblank = 0; }
    
    $readblank = 1 if /^$/;
    
    # Fix line-endings for Unix
    s/\r$//;
    
    # Ignore all pragmas
    next if !$readblank and /^#/;
    
    # Fix different a href linking styles
    s/\[(http:\S+)\s+(.*?)]/[\[$1|$2\]]/g;
    
    # Fix lists that aren't indented enough
    s/^ \*/  \*/;
    
    # Fix ordered lists
    s/^(\s+)\d+\./$1-/;
    
    # Fix code blocks
    s/^{{{$/<code>/;
    s/^}}}$/<\/code>/;
    
    # Fix monospace/code text
    s/`/''/g;
    s/{{{(.*?)}}}/''$1''/g;
    
    # Fix headers
    s/^= (.*) =$/====== $1 ======/g;
    s/^== (.*) ==$/===== $1 =====/g;
    s/^=== (.*) ===$/==== $1 ====/g;
    
    print;
    

    and run it on each file (find . -name "*.txt" | xargs -n 1 perl migrate.pl).

  5. Make sure your local.php sets $conf['camelcase'] = 1, so the CamelCase links that MoinMoin relies on keep working.

  6. You’ll need to move your attachments from the various attachments subdirectories into DokuWiki’s media directory and fix up the links; a rough sketch of the copying half is below.
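
    An untested sketch (MOINWIKI and DOKUWIKI are the placeholders from step 1, read here from the environment; fixing the in-page links is still manual):

    #!/usr/bin/perl
    # Sketch: copy each page's MoinMoin attachments into a matching,
    # lowercased DokuWiki media namespace.
    use strict;
    use warnings;
    use File::Copy;
    use File::Path qw(mkpath);

    my ($moin, $doku) = ($ENV{MOINWIKI}, $ENV{DOKUWIKI});
    for my $dir (glob "$moin/main/data/pages/*/attachments") {
        my ($page) = $dir =~ m{/pages/([^/]+)/attachments$};
        my $dest = "$doku/data/media/" . lc $page;
        mkpath($dest);
        for my $file (glob "$dir/*") {
            my ($base) = $file =~ m{([^/]+)$};
            copy($file, "$dest/" . lc $base) or warn "copy $file: $!";
        }
    }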

That’s it!