23 March 2006

Domain errors (or, stupid compiler tricks)

Warning: the following blog entry is long and boring, and has a Static Typing Index of 9.4.

A domain error is when you pass an invalid argument to a function. I think domain errors are different from other runtime errors, and I think some PL support for domain checks might be a nice feature.

A domain error should be treated as a bug in the caller, not a runtime error in the callee. I think domain errors and asserts should be treated the same way. (That's not to say I like the way asserts work in, say, C++. I'm just saying they're the same sort of thing. If the check fails, that's a bug.)

Suppose you could set domain restrictions out in front of a procedure rather than explicitly checking them inside the body of a procedure.[Footnote 1] For example:

  // Return a sorted copy of a slice of the given list.
  List getSortedSlice(List source, int start, int stop)
    where source != null, 0 <= start, start <= stop, stop <= source.size()
  {
      List result = new ArrayList(source.subList(start, stop));
      Collections.sort(result);
      return result;
  }

What good is that?

  • The code is easier on the typing fingers than explicit asserts, but it could compile into equally good code. And it looks easier to generate good runtime error messages from it.

  • You could display it in auto-generated API documentation. This means a restriction only needs to be typed once, instead of once in the documentation (to tell the user about it) and again in the code (to enforce it). Nice. I think it's particularly nice for Java-style interfaces.

  • A function's domain restrictions could be automatically propagated to its callers. That is, if f(x) calls g(x), f should inherit g's restrictions on x—without f's author having to type them in again.

    This would cause errors to be caught sooner, closer to the source of the bug, when the stack is shorter. Stack traces are so extremely useful because they let you drill down to find the real source of a bug, which is often far down the stack from where the bug is detected. But it would be even more useful to catch the error at its source.

  • The compiler could try to prove that the arguments to a function call meet the domain restrictions. This is an optimization. The compiled function would have two entry points: one that checks arguments and one that doesn't. For a given call site, if the compiler can statically prove all the restrictions are met, the runtime check becomes redundant and can be optimized away. This may not be as hard as it sounds. If all the restrictions are propagated outward, for example, the runtime checks can be skipped. If the call is sheltered by an earlier call (or assert) in the same function that enforces the same or stronger restrictions, the runtime checks can be skipped. (I think some Java JIT compilers do a weak version of this for array accesses. Come to think of it, I bet they do it for NullPointerException checks, too; I haven't looked.)

  • And if the compiler can prove that an argument will fail a precondition, it can treat that as a compile-time error.[Footnote 2] Talk about catching an error at its source.

It amounts to: catching errors earlier is good. This is why I think some errors should be treated differently from others. Many errors, like “out of memory” or “connection refused”, are essentially unpredictable. But with a little effort, a compiler could probably very often find a way to detect that an assert is going to fail, or a function is going to be called with an invalid argument, or a null reference is going to be used, before it actually happens. It should be allowed to generate code that detects the error early. (Especially if it can boost performance by reducing the number of runtime checks.) If the compiler is able to detect that sort of error at compile time, so much the better. It should be allowed to flag it as a compile error.

I don't think this is the case with Java. I think the language spec requires javac to generate code that waits for the error to happen. Principle of least astonishment, or sheer blockheadedness? You decide.

Bonus cool bit: how much of this information could be obtained without the end user having to type anything? Consider the example above. The first restriction, source != null, is implied by the body of the function. source.subList() is called unconditionally on line 1 of the function; the code can't succeed if source is null. The other three restrictions could be put on List.subList() instead; they would automatically propagate outward to getSortedSlice(). Huh.

[1] You can already say things like this in languages like ML and Haskell that have pattern matching. They're called guards, and they look like this in Haskell:

getSortedSlice start stop source
  | 0 <= start, start <= stop, stop <= length source
    = sort $ take (stop - start) $ drop start $ source

If the guard isn't satisfied, the call fails at runtime. But that isn't really the design purpose of guards, as I understand them. I don't know if there's any Haskell compiler that attempts the sort of shenanigans I'm suggesting. I sort of doubt it.

Closer would be Eiffel, which has preconditions that (for all I know) might be intended to do this sort of thing.

[2] Conventional wisdom is that features like this lull the programmer into a false sense of security. If you can detect only X% of a particular class of errors at compile time, with X<100, you're actually doing the user a disservice. I think I disagree. I mean, the same could be said of unit tests, right? In fact, just running code and having it work instills a sense of confidence that may or may not be warranted. My experience is that all code has bugs, and every bug the compiler catches saves me time.

2 comments:

Mattie said...

A lot of programmers love the idea of Design by Contract. For most languages it involves a preprocessor that helps manage a variety of criteria like precondition/assertions, return validation, etc. Is this something you've played with?

jto said...

No, I haven't. I think DBC goes too far. I just want better error detection/behavior/messages with less code.

You know... this might sound silly, but the first problem I have with DBC is that you normally turn off all the checking for release. (I guess you don't have to, but something makes me uneasy about that. It has to do with the program semantics. Preconditions don't mean anything. Admittedly that's a philosophical quibble.)

As a practical matter, shouldn't the runtime checks stay in? If one fails, shouldn't the system at least stash a minidump somewhere? And shouldn't debug and release builds be as similar in behavior as possible?

By analogy: in Java you're not supposed to use assert to check arguments. Did you know this? Assertions are disabled by default, whereas parameter restrictions are supposed to be checked no matter what. So instead you have to type if (!...) throw new IllegalArgumentException("nice try moron");. Blaah.

Assertions are good. More flavors of assertions would be better. I would like to see:

assert foo == bar // always checked, unless the compiler can prove it

and:

debug assert hugeDir1.diff(hugeDir2).isEmpty(); // for costly checks--disabled by default, easily toggled at runtime using the IDE or some debug tool