Darren Provine at Rowan University



Miscellaneous Thoughts

While this class focuses primarily on the practical matters of getting work done right now, it would be a mistake to think that it's not part of the continuum that includes other, more theoretical classes.

To take one obvious example, we've used grep to search for strings. But have you stopped to consider how grep finds matches for the regular expressions you enter? If you think about writing a program that scans a line and compares it directly against an arbitrary expression, you'll probably see that it would be very hard to get right. What grep (and sed and the other Unix programs that do regular expression matching) typically does is build a finite-state machine in memory from the regular expression, and then run that machine on the input strings. This is the same conversion discussed in Foundations of Computer Science. It may have seemed of merely academic interest that Deterministic Finite Automata have the same descriptive power as regular expressions -- but it's precisely that equivalence which allows programs like grep and sed to exist, and it's precisely the `purely theoretical' analysis of converting expressions into DFAs that makes it possible for programmers to get such programs written.
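As a rough sketch of the idea (this is not how grep itself is written), here is a hand-built DFA in Python for the regular expression a(b|c)*d. The state numbers, the TRANSITIONS table, and the function name dfa_matches are all invented for this illustration. A real tool would build the table from the expression automatically, and grep also looks for a match anywhere in the line, while this sketch checks the whole string.

```python
# Hand-built DFA for the regular expression a(b|c)*d (illustrative only).
# States: 0 = start, 1 = seen 'a' followed by any run of b/c, 2 = accept.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 1,
    (1, "d"): 2,
}
START = 0
ACCEPTING = {2}


def dfa_matches(text):
    """Return True if the DFA accepts the whole of `text`."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            # No transition for this character: the machine is dead.
            return False
    return state in ACCEPTING


for sample in ["ad", "abcbd", "abx", "a"]:
    print(sample, dfa_matches(sample))
```

Notice that the matching loop knows nothing about the expression. All of the "knowledge" lives in the table, which is why the expression-to-DFA conversion is the part that matters.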

Many students seem to have a sense that the mathematical and theoretical side of computer science somehow doesn't affect them. That sense is a mistake. A better sort algorithm, or a better expression->DFA algorithm, directly affects how quickly and how well you get your work done. The practice of computer science -- what we do in Lab Techniques every day -- depends greatly on the ideas studied in your other courses. There is no separation between them.


Here is a quote from Richard Stallman, founder of the Free Software Foundation, about debugging programs:

If you're trying to debug a program, get into the symbolic debugger as soon as possible. Assuming that you've got a reproducible test case, don't bother wasting any time looking at the cases that don't fail. Just use the debugger to find out what happens in a case that does fail. Use breakpoints, and then stepping, to localize the problem until you see where it's happening. Other approaches that might seem like shortcuts usually end up taking longer. This one is sure and steady, and it will generally find you the problem faster than any other method. But when there is no reproducible test case, then that's a different kind of fish, and you may have to put debugging buffers and save information into your program, so you can get enough information to figure out what happened.
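The "debugging buffers" Stallman mentions for the no-reproducible-test-case situation might look something like the following Python sketch. It is only one possible reading of his remark: the DebugBuffer class and the process function are hypothetical names made up for this example. The program keeps the last few events in memory, so that when a failure finally does happen there is a record of what led up to it.

```python
from collections import deque


class DebugBuffer:
    """Hypothetical debugging buffer: remembers only the most recent events."""

    def __init__(self, size=4):
        # A deque with maxlen silently drops the oldest entry when full.
        self.events = deque(maxlen=size)

    def record(self, message):
        self.events.append(message)

    def dump(self):
        return list(self.events)


def process(items, log):
    """Sum the items, recording each step in the debugging buffer."""
    total = 0
    for i, item in enumerate(items):
        log.record(f"item {i}: {item!r}, total so far {total}")
        total += item  # raises TypeError if an item is not a number
    return total


log = DebugBuffer(size=2)
try:
    process([1, 2, "x"], log)
except TypeError:
    # The buffer shows what the program was doing just before it failed.
    for line in log.dump():
        print(line)
```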


Here's a quote that should make clear the importance of choosing intelligent variable and function names, and writing good documentation when you write the code:
Back in the day when I worked for Megalithic Insurance in downtown Los Angeles, we had mainframe Cobol applications that were 20 or 25 years old. Actually, none of us knew their exact age because they all pre-dated the IT staff that maintained them. The code was actually commented with warnings about "Don't touch these next N lines because we don't know what this subroutine does!" Kinda scary.
The article is at http://www.internetweek.com/story/showArticle.jhtml?articleID=8600224.


Here is a quote from FreeBSD kernel hacker Matt Dillon, which touches on how to write code which will work and which lets you know if it doesn't:
I always document code as I work on it, to make it easier both for me and for anyone else working on the system, and I am not shy about putting assertions in the code for conditions that are supposed to be true. I would much rather hit the assertion and panic early than allow an incorrect assumption to slowly corrupt the system. I started doing this in the 4.X codebase and it greatly contributed to our famed stability in 4.0 and later releases.

Introduced instabilities, either due to bugs or purposeful assertions, typically lasted no more than a few days. The result of this has had a long term stabilizing effect on the codebase. Even now if someone breaks something horribly in the system there's a good chance their breakage will be noticed quickly due to assertions I and others have strewn all over the VM system. Assertions are good.

[...]

This is why I hate bandaids. A bandaid, in the long term, only adds to the instability of a system. The correct solution is to make the code do what it is supposed to do and assert (panic the system) if it does something it isn't supposed to do. You might get a few panics in the short term, but in the long term you solve the problem. Permanently. Bandaids have the effect of causing problems to return and haunt you, sometimes for years. The dirty-cache-page bug was in the system for at least 3 years because of a bandaid.

The whole interview is here.
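To make the bandaid-versus-assertion contrast concrete, here is a small Python sketch. It is not taken from FreeBSD: the reference-count scenario and both function names are assumptions invented for illustration. The first version quietly papers over an impossible condition, and the second stops the moment the condition is violated.

```python
def release_bandaid(refcount):
    """Drop one reference, hiding the case that should never happen."""
    refcount -= 1
    if refcount < 0:
        # Bandaid: pretend nothing went wrong. The real bug (someone
        # released a reference they never held) stays in the system.
        refcount = 0
    return refcount


def release_asserting(refcount):
    """Drop one reference, failing loudly if the count is already zero."""
    assert refcount > 0, "release called with no references held"
    return refcount - 1


print(release_bandaid(0))      # silently returns 0
try:
    release_asserting(0)
except AssertionError as err:
    print("caught early:", err)
```

One caveat for Python in particular: assert statements are skipped when the interpreter runs with the -O option, so checks that must always run should raise an exception explicitly.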

This page's URI: http://elvis.rowan.edu/~kilroy/class/lab_tech/?thoughts
Last modified: Wednesday, 14 January 2004, 2:28:20pm