Tumbolia: January 2006

27 January 2006

LINQ, intro

Sometime last year, while I wasn't paying attention, Microsoft announced something called LINQ. It stands for Language Integrated Query. You've seen this sort of thing before: you can embed a database query in your code and it gets compiled into something that queries the database and automatically stores the results in local variables. Never really took off.

LINQ is a little different. For one thing, you can query plain old boring in-memory data, including plain-vanilla arrays, with the same syntax. You can query XML documents the same way.

They did this by adding a largeish set of .NET class libraries and several major extensions to the C# and VB languages. Read about it in detail: Word .doc, HTML. Browse some example code.

More to come.

Super bowl pick

So my last set of predictions was horribly, horribly wrong.

The Steelers keep surprising me. Going into the Colts game, I knew their defense was incredibly good at creating interceptions, even on seemingly sensible throws, and stopping the run. What I didn't count on was their ability to (a) stump Peyton by betraying no hint of what they'd do on a given play (it seemed like if this were possible, teams would've been doing it all season long); and (b) keep sacking him all day. This Sunday, they looked more like I had expected. Which is still pretty intimidating.

What was amazing this week was the Steelers offense. The commentators liked the play-calling. I don't know if it was that or something else, but it was awfully impressive.

Antwaan Randle El is a fast, fast man.

I'm going to go way out on a limb and pick the Steelers to win the Super Bowl. If you're Matt Hasselbeck, I just don't see how you're supposed to prepare for the Steelers defense. Plus the Steelers are fun to watch. So by my logic, they can't possibly lose. Looking forward to it.

19 January 2006

random_trial, abbreviated

I just realized that the function random_trial in my previous post could be written like this:

random_trial n target =
    any (>= target) . map length . group . sort . take n . map random_birthday

The meaning is exactly the same. It's very similar to a sh pipeline, only written backwards. You might write something like this:

cat ./random-numbers.txt | awk '{print random_birthday($1)}' | head -n $n | sort | group.sh | awk '{print $NF}' | any-greater-than-or-equal.sh $target

The Haskell function is analogous, only data flows right-to-left.

Haskell

I've been reading about Haskell and trying to write a few little programs. Here's a factorial function in Haskell:

fac n = product [1..n]

Is that cheating? Here's another:

fac n = foldr (*) 1 [2..n]

Here foldr is the "fold" function I mentioned a few days ago.

In fact, product is probably implemented like this:

product = foldr (*) 1

Some first impressions:

Haskell programmers apparently don't use comments. In A Gentle Introduction to Haskell, there's a comment in section 2.4, partly to illustrate the syntax. The next comment is in section 7.2. In the entire tutorial there are just six comments, all of them one-liners. (Contrast the Python tutorial.)

I think part of the reason for this is that Haskell functions are often extremely short—2-5 lines maybe. What are you going to do, comment every one? Also, what those functions do is often very abstract, and hard to explain in English.

On the other hand, Haskell is the first language I've come across in years that has meaningful, attractive support for literate programming. So perhaps the idea is that Haskell programs need more explanation than mere comments can provide.
Man, pure functional programming is terse and powerful.
Infinite data structures are somewhat useful.
Lazy evaluation turns everything inside out. In a procedural language, the expression a(b + c(d + e(f(), g()))) would call f() or g() first. In Haskell, a() is called first. The argument is evaluated only if a() needs it.
Monads are an interesting feature, but ultimately more silly than useful. I haven't written one yet.
Some parts of Haskell's syntax are just awesome. The power of functional programming is strongly tied to syntax that makes it easy to define and call functions. So for example, a little lambda:
```
(lambda (y) (f x (g x y))   ; Lisp
lambda y: f(x, g(x, y))     # Python
{|y| f(x, g(x, y))}         # Ruby
f x . g x                   --Haskell
```

(The . operator is function composition.)

Certainly Haskell can't handle all infinite calculations. For example,
```
Prelude> foldr (&&) False (repeat False)
False :: Bool
(29 reductions, 54 cells)                
Prelude> foldl (&&) False (repeat False)
ERROR - C stack overflow
```
Bummer. As far as I know, there's nothing stopping the Haskell compiler from figuring out that this should evaluate to False. But in a way that would be even worse: whether your program runs instantaneously or overflows the stack would depend on whether the particular compiler you're using is smart enough to prove the result.

I suspect Haskell isn't allowed to be that smart. It's probably required to apply functions before it's allowed to even look at arguments. But I'm not sure.
The deterministic, stateless nature of the language makes it a royal pain to do anything involving random numbers. Obtaining the random-number generator is no problem. But you have to pass it throughout your program wherever it's needed, rather than calling rand() whenever you happen to want a random number.

The lack of mutable data structures has me writing code that's less lazy than the equivalent idiomatic Python. (This is a bad thing.) For example,

def random_trial(n, target):
    """ Generate up to n birthdays.  If at least 'target' people in the
    group have the same birthday, return True.  Otherwise return False.
    """
    birthday_histogram = [0] * 366
    for i in xrange(n):
        bd = random_birthday()
        birthday_histogram[bd] += 1
        if birthday_histogram[bd] >= target:
            return True  # we're all done, bail out right now
    # If we get here, we managed to choose birthdays for all n people
    # without any duplicates.
    return False

Here's the Haskell:

-- Generate up to n birthdays.  If at least 'target' people in the
-- group have the same birthday, return True.  Otherwise return False.
-- ('rs' is an infinite list of pseudorandom numbers, which the caller
-- must provide.)
random_trial n target rs = let
    bs = take n (map random_birthday rs)  -- birthdays
    fs = map length (group (sort bs))  -- frequencies
    in any (>= target) fs

In Python, during the loop, if I reach the victory condition, I detect it immediately and return True. The Python code is O(n). In Haskell, I can't generate a histogram on the fly. I have to do it in a separate step. And the simplest way to obtain the histogram (map length . group . sort) requires a sort, which is at least O(n log n).

Arguably the problem is the algorithm. Hmmm.

14 January 2006

Predictions

Both AFC games are looking like nail-biters right now, but I predict they won't be that close. My picks: Seahawks, Patriots, Colts in a blowout, and I don't know about the other one. I haven't seen the Bears or the Panthers play.

11 January 2006

Thought for the day

“Computer Science in the 1960s to 80s spent a lot of effort making languages which were as powerful as possible. Nowadays we have to appreciate the reasons for picking not the most powerful solution but the least powerful.” —Tim Berners-Lee

Hmmm.

I think I might start getting heavily into computer science, particularly programming language design. If I think I can keep up this level of interest, I might go back to school.

If you know Haskell, check this out: A tutorial on the universality and expressiveness of fold. fold is a lot like Python's built-in reduce() or C++'s std::accumulate(). But Haskell's syntax is such that fold is easy to use. Whether that's good or bad I'll leave to you.

03 January 2006

What debt/GDP means

Two months ago, I posted a graph of U.S. debt/GDP over the past 55 years. Here's my explanation.

The U.S. GDP is the total value of all goods and services produced in the U.S. each year. It measures of the size of the U.S. economy.

So the ratio of national debt to GDP (debt/GDP) is a measure of debt burden—how big the debt is, compared to the economy. This is sort of like looking at how much an individual owes, compared to his or her salary. Right now the number is 0.643, which means the national debt is 64.3% of GDP for the past 12 months.

Observations:

Wow, did we have a lot of debt after WWII. Apparently fighting Nazis isn't cheap.
We never paid it down. The dollar value of the national debt, adjusted for inflation*, hovered around $2 trillion (in 2005 dollars) from 1950 to about 1980. Then it took off upwards. Today it's $8 trillion.
For a while, it looked like we were outgrowing the debt. By 1981, the GDP was three times what it was in 1950, in real terms.
We did not outgrow the debt in the Reagan years. And we're not outgrowing it now. Since 1981, except for 6 years of Clinton/Gingrich deadlock (high taxes, slow spending growth), deficit spending has outstripped GDP growth.

The higher debt/GDP gets, the greater the risk of difficulty financing the debt. This ratio is not now at an all-time high; after WWII it was actually much higher than it is today. But it is clearly headed up. This will only get worse as Medicare and Social Security costs rise.

Republicans who think the U.S. can get rid of the debt with pro-growth policies are wrong. Reagan tried it, and debt/GDP went up 36% in 8 years. You actually have to limit spending.

I think we can afford to, and should, pay down the debt in real terms. (Naturally, I think we can do it by eliminating stupid programs.)

*Using the Consumer Price Index, all items. Again, I don't know what I'm doing here, so your mileage may vary.

Basketball and information graphics

Memphis lost yesterday.

The Game Flow chart on that page is the best use of Flash ever. (They've probably had this for years, but I've never seen it before.) At a glance you can tell that Memphis started out awful, rallied in the second half, and didn't score in the last minute of the game. I think Edward Tufte would approve.

Tumbolia - math is stranger than fiction