21 February 2006

To do

  • Explore possible APIs for discrete finite graphs, along the lines of the stuff for arrays and hash tables that has become canon in PL design these days. (I guess awk started this trend?) Graphs are everywhere: directory trees with symlinks; PL syntax trees; objects in memory and the pointers that tie them together; network architecture—any relational database with foreign keys is a graph.

  • Come to think of it, what is it exactly about arrays and hash tables that makes them so insanely useful? Arrays: ordered, integer-indexed, random-access in constant time, non-sparse, mutable, growable in constant time on average, efficiently sortable, non-unique collection of values. Hash tables: non-ordered, arbitrarily-indexed, random-access in constant time on average, mutable, growable in constant time on average, unique-per-key collection of key-value pairs. There are nigh-infinite combinations of collection properties like these; why these two combinations in particular?

  • That reminds me: submit a patch for Python that implements a priority_queue type to add to the collections module. It would basically deprecate the heapq module. heapq is just a collection of functions; this would be a class. Unlike heapq, it would accept key=, cmp=, and reversed= keyword arguments that would govern the sort order. This is a really good idea, worth doing. It's the only idea I know of that clearly belongs in collections right away.

    One problem with this is that I've never implemented a Fibonacci heap before. I only know about binary heaps, like the one in heapq, which are slower. For example, merging binary two heaps is O(n); with Fibonacci heaps it's O(1).

No comments: