05 June 2007

Danger!

In Scheme and Ruby, procedures that modify variables or data structures are marked with a ! by convention:

  // Java
  static <T>
  void arraySwap(T[] arr, int i0, int i1) {
      T tmp = arr[i0];
      arr[i0] = arr[i1];
      arr[i1] = tmp;
  }

  ;; equivalent Scheme
  (define (vector-swap! v i0 i1)
    (let ((tmp (vector-ref v i0)))
      (vector-set! v i0 (vector-ref v i1))
      (vector-set! v i1 tmp)))

What's the point of this convention? Well, Scheme thinks this kind of behavior is slightly dangerous and needs to be highlighted. A little explanation:

If your variables or data structures can be changed, expressions have different values in different places within a program. The order of operations matters. The order of lines within a procedure matters. Procedures can quietly change global data as a side effect, and this can lead to rather subtle (mis)behavior.
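
Here's a small Java sketch of that point. The names (OrderMatters, takeFirst, demo) are made up for illustration. It just shows the same expression, items.size(), yielding two different values depending on which side of a side-effecting call it sits:

```java
import java.util.ArrayList;
import java.util.List;

class OrderMatters {
    // Hypothetical helper: reads like a harmless query, but it also
    // removes the element it returns (a hidden side effect).
    static int takeFirst(List<Integer> items) {
        return items.remove(0);
    }

    static int[] demo() {
        List<Integer> items = new ArrayList<>(List.of(10, 20, 30));
        int before = items.size();     // 3
        int first = takeFirst(items);  // 10, and the list shrinks
        int after = items.size();      // 2: same expression, new value
        return new int[] { before, first, after };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

Swap the second and third lines of demo() and the results change, which is exactly the "order of lines matters" problem.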

Theorists call this language feature state. The word is intended in the sense of status, as in “the state of affairs” or “state of the Union”. C, C++, Java, Perl, Python, Ruby, and so on are stateful languages.

Maybe it sounds kind of, er, sweeping, or maybe absurd and stupid, to say that state is bad. But you already know this is true, at least sometimes: Global variables = bad. Unobvious side effects = bad. Leaving an operation half-finished when you throw an exception = bad (usually). Global state + threads = migraines.*

But what's the alternative? Well, there are alternatives, but they're beyond the scope of a quick blog post. Just like the boundary between statically typed and dynamically typed code, the boundary between stateful and pure-functional code is getting a lot of attention these days. Interesting times.

What Scheme gets wrong here is that state-related bugs don't necessarily happen at the places where data is modified. In fact, the whole idea of marking the danger spots is a total whiff. The problem with state is that it permeates your program. The danger is everywhere a stateful variable or data structure is used, whether for read or write, directly or indirectly.
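
A rough Java illustration, with hypothetical names throughout (ReadSiteDanger, Report, demo). The only line that would earn a ! in Scheme is names.clear(), but the surprise shows up in count(), which does nothing but read:

```java
import java.util.ArrayList;
import java.util.List;

class ReadSiteDanger {
    // Hypothetical class that only ever reads the list it is given.
    static class Report {
        private final List<String> names;

        Report(List<String> names) {
            this.names = names; // keeps a reference to the caller's list, no copy
        }

        int count() {
            return names.size(); // pure read, yet its value can change
        }
    }

    static int[] demo() {
        List<String> names = new ArrayList<>(List.of("ann", "bob"));
        Report report = new Report(names);
        int first = report.count();   // 2
        names.clear();                // the write, nowhere near Report
        int second = report.count();  // 0: the surprise appears at the read
        return new int[] { first, second };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " then " + r[1]);
    }
}
```

Nothing inside Report is marked as dangerous, and nothing inside it modifies anything, but it is still at the mercy of whoever else holds the list.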

*It's not just that there are race conditions and potential crashes here. You can eliminate those with a single global lock. The problem is that a single global lock often isn't fine-grained enough; it creates a bottleneck. So you go to a fine-grained locking scheme, and that's where the migraines come in.
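
For the curious, a minimal sketch of the single-global-lock approach in Java. GlobalLockDemo, GLOBAL_LOCK, and run are invented names, and this is one way to do it, not the only one. Two threads bump a shared counter, and the lock keeps the total correct:

```java
class GlobalLockDemo {
    // One lock guarding all shared state (the "single global lock").
    static final Object GLOBAL_LOCK = new Object();
    static int total = 0;

    static void increment() {
        synchronized (GLOBAL_LOCK) {
            total++; // read-modify-write, safe only while holding the lock
        }
    }

    // Runs two threads that each call increment() perThread times.
    static int run(int perThread) {
        synchronized (GLOBAL_LOCK) {
            total = 0;
        }
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        synchronized (GLOBAL_LOCK) {
            return total;
        }
    }

    public static void main(String[] args) {
        System.out.println(run(100000));
    }
}
```

The catch is that every piece of shared data in the program would have to go through GLOBAL_LOCK, so threads working on unrelated data still wait on each other.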
