11 July 2018

Axioms that create

Most of the axioms of ZF set theory are pretty easy to understand. They tell which sets exist, creating a rich universe of objects for set theory to talk about.

  • Null. There is a set with no elements.

  • Pair. Given two sets a and b, there is a set that contains exactly those two sets.

  • Union. Given a set of sets S, there is a set that’s the union of the sets in S. In other words, a set that contains exactly the elements of the elements of S.

Even if you stopped here, you’ve already agreed to the existence of infinitely many sets, and sets of every finite size!

  • Infinity. There is a set that corresponds to the natural numbers. It contains the empty set, and for each set x in it, it also contains a set with one more element, consisting of all x’s elements, plus x itself.

  • Subsets. Given a set S, there is a set of all subsets of S.

  • Comprehension. Given a set S and a function F, there is a set that contains exactly F(x) for all x in S.

  • Separation. Given a set S and a predicate P, there is a set that contains exactly those elements of S that satisfy the predicate.

In the formal language of mathematics, these axioms are mostly phrased quite similarly: “[For all <given ingredients>,] there exists a set A such that, for all x, x is in A if and only if <properties>.” The basic linguistic technology of populating a universe.

The other two axioms of ZF that are not like these. I’m saving those for later.

Exercises

Over the past couple of years, my notebooks are increasingly full of questions. I wonder more, and read even slower.

Stars mark questions that are at least a bit beyond my grasp so far.

  1. There is a mathematical notation for the sets postulated by each of these axioms. For example, Null points to the empty set, which is written ∅. How do you write the others?

  2. Does your favorite programming language have anything like sets? Sure it does—arrays or lists are close enough. How many of the axioms does your language satisfy? How can you write those sets? For example, the null set could be written [] in JavaScript. Most languages I know provide at least Null, Pair, Union, Comprehension and Separation, though not necessarily under those names!

  3. Famously, if there were a “set of all sets”, then you could use Separation to obtain a contradiction. How? (Answer.)

  4. If you apply Union to the null set, you get the null set. (There are no elements of elements of ∅.) Is there any other kind of set that is the union of its own elements?

  5. Which axioms can take a set and prove the existence of a larger set? Do these axioms imply the existence of any sets larger than the one postulated by Infinity? If so, how?

  6. ✻ How is it that these axioms, applied finitely many times, can construct objects such as non-computable numbers?

  7. ✻ What does “exist” mean here?

  8. ✻ The standard version of Infinity is rather particular. Is it possible to write “there is a set that is infinite” in the language of set theory? Is that not sufficient?

  9. ✻ The last two axioms, Comprehension and Separation, require funny ingredients: a “function” and a “predicate”, respectively. This produces a strong sympathetic vibration deep in the computer science part of my brain. It seems like these axioms are ways to turn mathematical expressions into sets. In programming terms, they turn code into data. Can this be made precise? Do the axioms in effect say “There is an oracle that serves as an ‘interpreter’ for mathematical expressions”? Is this like the relationship between lambdas and the functions they compute?

  10. ✻ Why can’t Comprehension and Separation be written in terms of sets? All functions you could productively use with Comprehension are sets, right? (Functions are relations, and relations are sets of ordered pairs, which can be expressed using sets.)

  11. ✻ Why isn’t Separation redundant? Doesn’t Subsets postulate the existence of all subsets of a set? And Separation the existence of a particular subset?

  12. ✻ It’s pretty common in math to say, “Let X be the smallest set satisfying the following conditions:” and provide a list of rules like the axioms above. Why wouldn't this work for defining the universe of sets? That is, “Let V be the smallest class of sets satisfying the following axioms: ...”. Is this kind of approach inexpressible in the language of set theory, or just logically unsound?

10 July 2018

Filling in the picture

When you learned math as a kid, you started from the very concrete: learning to count physical objects.

The foundations of math progress the opposite way, from the vague to the specific. The first thing you need to know is the use of variables as placeholders. Partly this is because variables are powerfully expressive—eventually, you’ll need them to write really interesting statements. But the other reason is that you start out knowing nothing. The less you know, the more gaps there will be in the statements you make. Variables fill gaps.


Now you’re ready to study propositional logic. The axioms of this field are things like,

A ∧ B ⟹ A

“If A and B, then A.”

The variables are placeholders for statements. The purpose of an inane-sounding axiom like this one is to tell precisely what “and” means in logic.


Next is the predicate calculus. Logic had variables that stood for statements; predicate calculus offers several kinds of statements you can write and plug into those logic variables. The variables of predicate calculus stand for two kinds of thing:

  • objects, the actual things we will eventually get around to talking about, like numbers, shapes, points, functions, and so on; and

  • predicates, which are statements about objects: “x is even”, “s is convex”, “p has integer coordinates”, and so on.

But predicate calculus does not by itself say what the objects are. You can take predicate calculus and apply it in a universe where there the objects are points, or where the objects are functions, and it will work equally well in either setting.

An axiom of the predicate calculus:

(∀x . P(x)) ⟹ P(t)

“If for all x, P(x), then P(t).”

This axiom is telling you what “for all” means. x stands for an object; P is a predicate. Note that we are still using the language of logic, but as we proceed, the variables are getting rather more specific, and we’re adding new vocabulary.


Next is set theory. This is a particular kind of universe for you to use predicate calculus with. Whatever we’re talking about in math, we will want to talk about sets of those things: sets of points, sets of numbers. It turns out you’ll also very soon want to talk about sets of sets of things, so set theory lets you build sets of sets, sets of sets of sets, and so on without end. The variables in set theory stand for sets.

An axiom of set theory is:

S . ∃T . ∀x . xTxS ∧ P(x)

“Given a set S and a predicate P, there exists a set that contains exactly the elements of S that satisfy P.”

The surprising thing is that now you’re done.

The universe of sets contains sets that have exactly the same structure as: the integers; the Cartesian plane; the functions on real numbers; and so on. It contains basically everything mathematicians study*, so there's no need to go on adding axioms**. The picture is*** filled in.

*not totally true

**except when there is, or when mathematicians just feel like it

***never

28 June 2018

LR parsers, what are they really

I recently went back to school on parsers. I always had an involuntary mental resistance to the whole field, because the theory, while well-established, is inelegant. Oh, it starts out pretty enough, with the regular languages, but from there it gets gradually worse and worse until you’re reading things like “and if there are any slots with multiple entries in table X, then the grammar isn’t LR(1).” The algorithm as definition.

Plus, I was totally unable to map the workings of LR parsers to anything else in my experience. So, apropos of that, here’s a journal entry from my wanderings in the wilderness:

The stack in an LR parser is a lot like the C stack.

When you reach the end of a production, you return, popping the stack—much like how returning from a C function pops its stack frame. Of course, in C, the return address is on the stack; whereas in an LR parser, you instead use the corresponding stack value to look up the next state in a goto table... rather like returning to a switch statement.

Like the C stack, the LR stack is heterogenous. A typical stack data structure contains elements that are all the same type. While you can certainly write an LR parser where the stack is just a stack of grammar symbols, you can also assign a different type to every terminal and nonterminal in your grammar, even types of different sizes, and the whole scheme still works. In fact, real parsers do use the LR stack to store partially built parse trees, although I don’t know that they support nonterminals of many different types. They probably should.

I think of each state in an LR parser as a quantum superposition of possible parses of the input so far. The next token might rule out some of these possibilities, fully or partially collapsing the waveform. What makes it LR is that at any given point, all the possible parses happen to assign exactly the same types to everything on the stack, by construction. I imagine a plain old C program (maybe a recursive descent parser) throwing up its hands and saying, “well, it's not clear which branch to take, we’d better just take them all” and splitting off half a dozen universes. The situation remains miraculously executable on normal hardware, because of this curious stack discipline: all the possible universes end up doing exactly the same thing with the same data at each step.

LR parsers, and automata generally, feel like debuggers to me. It’s hard to put my finger on it: something about the notion of “stepping”, and considering all the program code and state as mere data. The table generation algorithm is abstract interpretation of… something; but again, I’m not sure what. An Earley parser, maybe? The grammar itself, somehow considered as a program?