Miscellaneous Thoughts
While this class focuses primarily on the practical matters of
getting work done right now, it would be a mistake to think that
it's not part of the continuum that includes other, more theoretical
classes.
To take one obvious example, we've used grep to search for
strings. But have you stopped to consider how grep finds matches
for the regular expressions you enter? If you think about writing
a program that scans a line and compares it directly against an arbitrary
expression, you'll probably see that an ad hoc approach quickly becomes unmanageable.
What grep (and sed, and the other Unix programs that do regular
expression matching) does is build a virtual finite-state machine
in memory from the regular expression and then run that machine
on the input strings. The program performs the same conversion discussed
in Foundations of Computer Science. It may have seemed of merely academic
interest that Deterministic Finite Automata have the same descriptive
power as regular expressions, but it's precisely that equivalence which
allows programs like grep and sed to exist, and it's precisely the
'purely theoretical' analysis of converting expressions into DFAs
that makes it possible for programmers to write such programs.
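To make the idea concrete, here is a minimal sketch in Python of a matcher that turns a pattern into a finite-state machine and runs it over a line. It assumes a tiny, hypothetical pattern syntax (literal characters, `.` for any character, and a postfix `*`); the names `parse` and `LazyDFA` are illustrative and are not part of grep. Each set of NFA states the matcher can be in serves as one DFA state, and DFA transitions are computed on first use and cached, which is the subset construction done lazily rather than all at once.

```python
def parse(pattern):
    """Split a pattern into (atom, starred) pairs.

    Hypothetical mini-syntax, far smaller than grep's: literal characters,
    '.' for any character, and a postfix '*' for zero or more repeats.
    """
    items = []
    i = 0
    while i < len(pattern):
        atom = pattern[i]
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        items.append((atom, starred))
        i += 2 if starred else 1
    return items


class LazyDFA:
    """NFA state i means 'the first i pattern items have matched'."""

    def __init__(self, pattern):
        self.items = parse(pattern)
        self.accept = len(self.items)  # all items matched
        self.cache = {}                # (frozenset of NFA states, char) -> frozenset

    def closure(self, states):
        # Follow epsilon moves: a starred item may be skipped entirely.
        stack = list(states)
        seen = set(states)
        while stack:
            s = stack.pop()
            if s < self.accept and self.items[s][1] and s + 1 not in seen:
                seen.add(s + 1)
                stack.append(s + 1)
        return frozenset(seen)

    def step(self, states, ch):
        # One DFA transition; built the first time it is needed, then reused.
        key = (states, ch)
        if key not in self.cache:
            nxt = set()
            for s in states:
                if s == self.accept:
                    continue
                atom, starred = self.items[s]
                if atom == "." or atom == ch:
                    # A starred item may repeat, so stay put; otherwise advance.
                    nxt.add(s if starred else s + 1)
            # Re-add the start state so a match may begin at any position
            # (an unanchored search, as grep does by default).
            nxt.add(0)
            self.cache[key] = self.closure(nxt)
        return self.cache[key]

    def search(self, line):
        """Return True if the pattern matches anywhere in line."""
        states = self.closure({0})
        if self.accept in states:
            return True
        for ch in line:
            states = self.step(states, ch)
            if self.accept in states:
                return True
        return False
```

For example, `LazyDFA("ab*c").search("xxabbbcyy")` is true and `LazyDFA("ab*c").search("abd")` is false. Because the cache maps (state set, character) pairs to state sets, scanning many lines with the same pattern reuses transitions already built, so once the cache is warm each input character costs roughly one table lookup.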
Many students seem to have a sense that the mathematical and theoretical
side of computer science somehow doesn't affect them. That sense is a
mistake. A better sort algorithm, or a better expression-to-DFA algorithm,
directly affects how quickly and how well you get your work done. The
practice of computer science (what we do in Lab Techniques every
day) depends greatly on the thinking that is studied in your other
courses. There is no separation between them.
Here is a quote from Richard Stallman, founder of the Free Software
Foundation, about debugging programs:
If you're trying to debug a program, get into the symbolic debugger as
soon as possible. Assuming that you've got a reproducible test case,
don't bother wasting any time looking at the cases that don't
fail. Just use the debugger to find out what happens in a case that
does fail. Use breakpoints, and then stepping, to localize the problem
until you see where it's happening. Other approaches that might seem
like shortcuts usually end up taking longer. This one is sure and
steady, and it will generally find you the problem faster than any
other method. But when there is no reproducible test case, then that's
a different kind of fish, and you may have to put debugging buffers
and save information into your program, so you can get enough
information to figure out what happened.
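Stallman's last suggestion, keeping debugging buffers in the program, can be as simple as a fixed-size log of recent events that you dump when something goes wrong. The sketch below assumes Python and uses `collections.deque` with `maxlen`, so the oldest records are discarded automatically; `TraceBuffer` and `process` are hypothetical names used only for illustration.

```python
from collections import deque


class TraceBuffer:
    """Keep only the most recent `capacity` trace records (hypothetical helper)."""

    def __init__(self, capacity=100):
        # A deque with maxlen drops its oldest entry when a new one is appended.
        self.records = deque(maxlen=capacity)

    def log(self, event, **details):
        self.records.append((event, details))

    def dump(self):
        """Return the saved records, oldest first."""
        return list(self.records)


trace = TraceBuffer(capacity=3)


def process(items):
    # Hypothetical workload: record the state before each step.
    total = 0
    for i, x in enumerate(items):
        trace.log("item", index=i, value=x, total=total)
        total += x
    return total
```

When a failure shows up that you cannot reproduce on demand, printing `trace.dump()` at that moment shows the last few steps the program took before it went wrong, at the cost of only a bounded amount of memory.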
Here's a quote that should make clear the importance of choosing
meaningful variable and function names, and of writing good documentation
as you write the code:
Back in the day when I worked for Megalithic Insurance in downtown Los
Angeles, we had mainframe Cobol applications that were 20 or 25
years old. Actually, none of us knew their exact age because they
all pre-dated the IT staff that maintained them. The code was
actually commented with warnings about "Don't touch these next N
lines because we don't know what this subroutine does!" Kinda
scary.
The article is at http://www.internetweek.com/story/showArticle.jhtml?articleID=8600224.
Here is a quote from FreeBSD kernel hacker Matt Dillon, which touches
on how to write code that works and that tells you when it
doesn't:
I always document code as I work on it, to make it easier both for
me and for anyone else working on the system, and I am not shy
about putting assertions in the code for conditions that are supposed
to be true. I would much rather hit the assertion and panic early
than allow an incorrect assumption to slowly corrupt the system.
I started doing this in the 4.X codebase and it greatly contributed
to our famed stability in 4.0 and later releases.
Introduced instabilities, either due to bugs or purposeful assertions,
typically lasted no more than a few days. The result of this has
had a long term stabilizing effect on the codebase. Even now if
someone breaks something horribly in the system there's a good
chance their breakage will be noticed quickly due to assertions I
and others have strewn all over the VM system. Assertions are good.
[...]
This is why I hate bandaids. A bandaid, in the long term, only adds
to the instability of a system. The correct solution is to make
the code do what it is supposed to do and assert (panic the system)
if it does something it isn't supposed to do. You might get a few
panics in the short term, but in the long term you solve the problem.
Permanently. Bandaids have the effect of causing problems to return
and haunt you, sometimes for years. The dirty-cache-page bug was
in the system for at least 3 years because of a bandaid.
These excerpts are taken from a longer interview with Dillon.
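Dillon's contrast between assertions and band-aids can be shown in a few lines. In this hypothetical Python sketch (the `Page` class and both unref functions are invented for illustration, not taken from FreeBSD), the band-aid version silently clamps a reference count that has gone negative, while the asserting version stops at the first release of a page that nobody holds. Python removes `assert` statements when run with `python -O`, so this style of check is aimed at catching bugs during development and testing.

```python
class Page:
    """Hypothetical page object with a reference count."""

    def __init__(self):
        self.refcount = 0


def page_unref_bandaid(page):
    # Band-aid: quietly "repairs" a negative count, hiding the bug that
    # caused the extra release so it can come back later.
    page.refcount -= 1
    if page.refcount < 0:
        page.refcount = 0


def page_unref(page):
    # Assertion: releasing a page nobody holds is a bug, so fail right here,
    # close to the cause, instead of letting the bad state spread.
    assert page.refcount > 0, f"unref of page with refcount {page.refcount}"
    page.refcount -= 1
```

A double release through `page_unref` raises `AssertionError` on the second call, pointing straight at the faulty caller; the same double release through `page_unref_bandaid` succeeds silently and leaves no trace of the bug.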