19 January 2006

Haskell

I've been reading about Haskell and trying to write a few little programs. Here's a factorial function in Haskell:

fac n = product [1..n]

Is that cheating? Here's another:

fac n = foldr (*) 1 [2..n]

Here foldr is the "fold" function I mentioned a few days ago.

In fact, product is probably implemented like this:

product = foldr (*) 1

Some first impressions:

  • Haskell programmers apparently don't use comments. In A Gentle Introduction to Haskell, there's a comment in section 2.4, partly to illustrate the syntax. The next comment is in section 7.2. In the entire tutorial there are just six comments, all of them one-liners. (Contrast the Python tutorial.)

    I think part of the reason for this is that Haskell functions are often extremely short—2-5 lines maybe. What are you going to do, comment every one? Also, what those functions do is often very abstract, and hard to explain in English.

    On the other hand, Haskell is the first language I've come across in years that has meaningful, attractive support for literate programming. So perhaps the idea is that Haskell programs need more explanation than mere comments can provide.

  • Man, pure functional programming is terse and powerful.

  • Infinite data structures are somewhat useful.

  • Lazy evaluation turns everything inside out. In a procedural language, the expression a(b + c(d + e(f(), g()))) would call f() or g() first. In Haskell, a() is called first. The argument is evaluated only if a() needs it.

  • Monads are an interesting feature, but ultimately more silly than useful. I haven't written one yet.

  • Some parts of Haskell's syntax are just awesome. The power of functional programming is strongly tied to syntax that makes it easy to define and call functions. So for example, a little lambda:

    (lambda (y) (f x (g x y))   ; Lisp
    lambda y: f(x, g(x, y))     # Python
    {|y| f(x, g(x, y))}         # Ruby
    f x . g x                   --Haskell
  • (The . operator is function composition.)

  • Certainly Haskell can't handle all infinite calculations. For example,

    Prelude> foldr (&&) False (repeat False)
    False :: Bool
    (29 reductions, 54 cells)                
    Prelude> foldl (&&) False (repeat False)
    ERROR - C stack overflow

    Bummer. As far as I know, there's nothing stopping the Haskell compiler from figuring out that this should evaluate to False. But in a way that would be even worse: whether your program runs instantaneously or overflows the stack would depend on whether the particular compiler you're using is smart enough to prove the result.

    I suspect Haskell isn't allowed to be that smart. It's probably required to apply functions before it's allowed to even look at arguments. But I'm not sure.

  • The deterministic, stateless nature of the language makes it a royal pain to do anything involving random numbers. Obtaining the random-number generator is no problem. But you have to pass it throughout your program wherever it's needed, rather than calling rand() whenever you happen to want a random number.

  • The lack of mutable data structures has me writing code that's less lazy than the equivalent idiomatic Python. (This is a bad thing.) For example,

    def random_trial(n, target):
        """ Generate up to n birthdays.  If at least 'target' people in the
        group have the same birthday, return True.  Otherwise return False.
        """
        birthday_histogram = [0] * 366
        for i in xrange(n):
            bd = random_birthday()
            birthday_histogram[bd] += 1
            if birthday_histogram[bd] >= target:
                return True  # we're all done, bail out right now
        # If we get here, we managed to choose birthdays for all n people
        # without any duplicates.
        return False

    Here's the Haskell:

    -- Generate up to n birthdays.  If at least 'target' people in the
    -- group have the same birthday, return True.  Otherwise return False.
    -- ('rs' is an infinite list of pseudorandom numbers, which the caller
    -- must provide.)
    random_trial n target rs = let
        bs = take n (map random_birthday rs)  -- birthdays
        fs = map length (group (sort bs))  -- frequencies
        in any (>= target) fs

    In Python, during the loop, if I reach the victory condition, I detect it immediately and return True. The Python code is O(n). In Haskell, I can't generate a histogram on the fly. I have to do it in a separate step. And the simplest way to obtain the histogram (map length . group . sort) requires a sort, which is at least O(n log n).

    Arguably the problem is the algorithm. Hmmm.

3 comments:

Vineet said...

I'm also just starting out with Haskell, so this may not be the best way to do it, but you can indeed build up a histogram as you go. Here I accumulate a map of frequencies of each item in the list, and immediately return True as soon as a count reaches the target value:

import Data.Map

random_trial target bs = random_trial' bs empty
__where
__random_trial' [] _ = False
__random_trial' (b:bs) m =
____let (c,m') = count b m in
______if c >= target then True else random_trial' bs m'
__count b m =
____let newm = insertWith (+) b 1 m in
______(findWithDefault 0 b newm, newm)

(I replaced leading spaces with '_', since the comment system wouldn't let me use "<pre>")

You can test its laziness by running it with a small target value against an infinite stream, e.g. random_trial 2 (cycle [1..5])

jto said...

Thanks, vineet, but I sure hope there's a better way.

Vineet said...

Turns out there is a better way. Same basic idea, but using Data.List.mapAccumL instead of doing the accumulator thing manually:

import Data.Map
import Data.List

histogram = mapAccumL acc empty
__where
____acc m k = let m' = insertWith (+) k 1 m in
______(m', findWithDefault 0 k m')

random_trial target = any (>= target) . snd . histogram