31 December 2008

Junk and your unconscious mind

It's wonderfully easy to contribute on the Web, and as an unfortunate side effect of this essential fact, hoaxes, cranks, and general nonsense abound. I'll euphemistically call this stuff “junk”. Because so much of what you see online is junk, smart people such as yourself develop a finely-tuned junk detector. This is fine—in any case it's important to have one if you plan to use the Web for anything serious.

But your junk detector is probabilistic, factoring in grammar, habits of speech and writing, vocabulary, the writer's opinions and personality, whether the page has pictures of kittens—anything but the actual argument itself, because the whole point of the junk detector is to avoid wasting the time of reading it. In other words it's like the worst possible use of ad hominem, a logical fallacy. You guess as much as you can about the author, then judge the value of the page based on that. I see no good way around this. Consequences:

Your main way of evaluating the quality of Web pages is subconscious.

The junk detector is not as accurate as actual critical thought.

False positives mean the reader misses out and the writer fails to connect (making good writing skills more important now than ever before).

False negatives mean you may be duped: the junk detector doesn't protect you from lies, logical fallacies, or really sophisticated forms of “junk”. By the time you decide to read the whole page, the junk detector is done working. Another, smarter junk detector had better kick in!

All of this applies in the non-Web world, too, but the Web is so full of junk, and it's so hard to avoid altogether, that the cheapest possible junk detector is highly rewarding and can instill a false sense of confidence.

Structure

On the Web, alternative reading to whatever you're looking at is never far away. There are even links in most Web pages, forever calling you to random-walk. The result, for the reader, can be a haphazard adventure of reading, interesting at every point but without overall purpose.

The result for writers is that time spent organizing thoughts is usually wasted—nobody wants to read all that. Instead, you write one thought per day in a blog, or contribute to sites like Wikipedia, which generally rejoice in the Web's random-walk nature.

Sometimes it is an unexpected pleasure to open a book and follow the development of a big idea over many chapters. I got this kind of feeling from the mathematics textbooks I mentioned a few days ago.

It's weird for me to even be saying this, because I like that the Web is deeply interconnected and wild. But the Web doesn't seem to generate good content with large-scale structure—the kind of stuff that I find most rewarding to read.

17 December 2008

The School Mathematics Project

JJ had me look at a set of old mathematics textbooks, and I found this.

4.1 Division and repeated subtraction

We can write 7 + 7 + 7 + 7 + 7 + 7 + 7 + 7 + 7 = 7 × 9 = 63.

(a) What is 63 - 7 - 7 - 7 - 7 - 7 - 7 - 7 - 7 - 7?

(b) What is 63 ÷ 7?

(c) Explain the connection between the last two questions.

(d) If you were to work out 65 - 7 - 7 - 7 - 7 - 7 - 7 - 7 - 7 - 7, what would you find? How would you give your answer?

4.2 Division of a whole number by a whole number

Example 11 (Method I)

If you were asked to work out 5489 ÷ 12 by finding out how many times you could subtract 12 from 5489, you wouldn't be very pleased!

5489
-12
5477
-12
5465
-12
5453
-12
5441
-12
5429
-12
5417

This is just the start. It would certainly take a long time. However, as you will have realized, there are quicker ways of doing this division.

(Method II)

    12 ) 5489    Consider 5400. There are more than 400 (but less than 500) twelves in 5400. Let us subtract 400 of them all at once.
         4800    (400 twelves)
          689    Now consider 680. There are more than 50 (but less than 60) twelves in 680. Subtract 50 of these all at once.
          600    (50 twelves)
           89    Finally, we know that there are 7 twelves in 89 which if we subtract them leave us with a remainder of 5.
           84    (7 twelves)
            5

So we have subtracted (400 + 50 + 7) twelves and have 5 left over.

5489 ÷ 12 = 457,   remainder 5.

If we were dividing in order to find the answer to a ‘fair shares’ question, we would write

5489 ÷ 12 = 457 5/12

You will probably have recognized this method. Why?

I'll stop there. What struck me as cool about this is that it takes long division, a complex procedure which most students learn by rote, and at once (a) explains why it works and (b) makes it seem simple and obvious.
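Both methods from the excerpt are short enough to sketch in code. This is only my illustration, not anything from SMP Book C: the function names and the shape of the returned steps are assumptions of mine. Method I subtracts one twelve at a time; Method II subtracts hundreds of twelves, then tens, then single twelves, which is all that long division is doing.

```python
def divide_by_repeated_subtraction(dividend, divisor):
    """Method I: subtract the divisor one at a time, counting the subtractions."""
    if dividend < 0 or divisor <= 0:
        raise ValueError("this sketch handles only dividend >= 0 and divisor > 0")
    count = 0
    while dividend >= divisor:
        dividend -= divisor
        count += 1
    return count, dividend  # (quotient, remainder)


def divide_by_chunks(dividend, divisor):
    """Method II: subtract round chunks (hundreds, tens, ones) of the divisor.

    Returns (quotient, remainder, steps). Each step is a tuple of
    (how many divisors were subtracted, the amount subtracted, what is left).
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("this sketch handles only dividend >= 0 and divisor > 0")

    remaining = dividend
    quotient = 0
    steps = []

    # Find the largest place value worth trying: hundreds for 5489 / 12.
    place = 1
    while divisor * place * 10 <= remaining:
        place *= 10

    while place >= 1:
        # Count how many chunks of this size fit into what is left (0 to 9).
        digit = 0
        while remaining >= (digit + 1) * divisor * place:
            digit += 1
        if digit:
            count = digit * place
            remaining -= count * divisor
            quotient += count
            steps.append((count, count * divisor, remaining))
        place //= 10

    return quotient, remaining, steps
```

Calling `divide_by_chunks(5489, 12)` walks through the same three subtractions as the textbook (400 twelves, 50 twelves, 7 twelves), where Method I needs 457 of them.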

The example is from SMP Book C, published 1969 by Cambridge University Press. JJ has the whole series. They seem quite good, relative to what I recall from grade school. The approach is conversational with a lot of questions. Very few paragraphs are more than a few lines long. There are exercises but no “word problems”. The books are printed in black and red ink. There are no photographs or sidebars. The subject matter is richly mathematical: very little arithmetic, which must have been a separate curriculum; but in the first few books (hard to tell but they appear to be directed at students 12-15 years old) there are chapters about things like relations, directed graphs, symmetry, counting possibilities, why a slide rule works.

The SMP stands for School Mathematics Project, a British nonprofit. They're still making mathematics textbooks.

09 December 2008

The very best of jorendorff?

I like Language Log, but I would like it even better if there were less of it.

Wouldn't it be keen if there were a site where you could enter the URL of any blog, and it would give you back a feed containing only half the entries: the best ones, according to whatever metric of popularity the service could find (links, diggs, whatever)?
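The filtering step itself is tiny. Here is a rough sketch, assuming the hard part is already done and every entry comes with a single popularity score; the `(title, score)` shape and the name `best_half` are made up for illustration, and no such service exists as far as I know.

```python
def best_half(entries):
    """Keep the higher-scoring half of a feed, preserving feed order.

    `entries` is a list of (title, score) pairs. The score stands in for
    whatever popularity metric is available (links, diggs, whatever).
    """
    keep = (len(entries) + 1) // 2  # round up, so a one-post feed keeps its post

    # Rank entry positions by score, best first. sorted() is stable, so tied
    # entries keep their original relative order.
    ranked = sorted(range(len(entries)), key=lambda i: entries[i][1], reverse=True)
    chosen = set(ranked[:keep])

    return [entry for i, entry in enumerate(entries) if i in chosen]
```

Everything interesting is hidden in where the scores come from, which is exactly the part the rest of this post argues about.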

I proposed this on IRC, where mhoye and humph reacted with a definite meh. (Note: All these chat excerpts are edited to give the illusion that there's a single coherent conversation going on.)

<humph> what's popular and what's interesting to me are not often the same for me
<mhoye> humph++
<mhoye> That sounds like a good way to be drowning in mediocrity, for sure.
<mhoye> jorendorff: Apply your theory to popular music.
...
<ted> so your theory is that if you like a blog enough to subscribe, you would like it even more if you only got the absolute best posts?
<jorendorff> ted: my theory is that "absolute best posts" means something
<mhoye> God, no.
<mhoye> See also, "absolute best music", "absolute best paint color."

I failed several times at explaining why I think this. Let me try again here.

Simple ratings systems are common on the Web. Some, like the Slashdot comment ratings (“Score: 5, Insightful” and such) perform very well. Others, like online restaurant guides, are useless. Ratings work when users agree on what's good and what's bad. On Slashdot, the worst posts are pretty content-free. Subjective tastes don't even really enter into it. Restaurants are a different story. In the case of music, mhoye's example, I'm sure any two people can find plenty to disagree on. But:

<jorendorff> mhoye: do you have a favorite band?
<mhoye> Not just one!
<jorendorff> mhoye: I'm struggling to get you guys to engage on any specific example :(
<mhoye> Jorendorff: Ok, here. "Entertainment", by "Gang Of Four".
<jorendorff> mhoye: excellent - what are your favorite songs off that album?
* mhoye picks "I Found That Essence Rare" and "Anthrax"

Both of mhoye's picks are among what the Apple Store calls the “TOP SONGS” from that album. Both are mentioned in Apple's review. Maybe mhoye picked them because they're the best tracks on the album.

Counterexamples abound too. We could settle this scientifically by sampling a blog's audience, having those people rate posts for a while, and seeing how closely their ratings correlate.
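One conventional way to measure "how closely their ratings correlate" is Pearson's correlation coefficient between two readers' ratings of the same posts. The sketch below is only an illustration of that measure under my own assumptions (numeric ratings, the same posts rated by both readers); I have not run any such survey.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of ratings.

    Returns a value from -1 (readers disagree perfectly) to +1 (they agree
    perfectly); values near 0 mean one reader's ratings say little about
    the other's.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two ratings")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)

    if spread_x == 0 or spread_y == 0:
        raise ValueError("a reader who rates every post the same has no correlation")

    return covariance / sqrt(spread_x * spread_y)
```

If most pairs of readers came out strongly positive, "absolute best posts" would mean something for that blog; if the numbers hovered around zero, mhoye and humph would be right.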

Instead, let's play a silly game. See if you can stand to read these two entries from my old writing journal: Zen in space and the swoon. I believe one of those is about as good as I can write and the other is flat-out bad. I furthermore immodestly claim that those are two different things! And I think you might agree with me on which is which. We'll see (if you're willing) in the comments.

21 November 2008

Arms

I am flushing the buffer of old posts. Here is one I delayed posting because it's just too boring. Well, I'm posting it anyway. Sorry.

xkcd has a provocative comic about cryptography.

I imagine many geeks are moderately in favor of gun control but staunchly opposed to cryptography control. The two issues are very similar.

Having a gun lets you do two basic things: intimidate unarmed people; and resist armed or otherwise violent people. (You can of course just shoot people, but the power of a gun starts working at some remove from that eventuality. Most cops never shoot anyone.) Neither ability is necessary unless something has gone wrong; and both abilities can themselves go wrong in spectacular ways.

I believe in a fundamental human right to self defense, and for both moral and pragmatic reasons I tend to prefer individual rights to the common good where they conflict. So ab initio I pretty much have to oppose gun control unless there is a strong reason to think it's pragmatically the only way to go.

But I also believe gun control is an all-or-nothing proposition in practice: imposing a five-day waiting period before someone can buy a handgun, for example, makes no sense at all to me. Measures that are obviously easily circumvented, like the current U.S. background check, also make no sense. These measures seem squarely targeted at established, law-abiding gun sellers and their law-abiding customers. Effectively preventing criminals from having guns would require serious bookkeeping requirements and a tremendous enforcement effort. Judging by the results in the places where that has been tried (Illinois, the UK) it just doesn't seem worth it.

18 November 2008

Recently I learned...

  • Some people are, at this moment, running around the world. As of this writing they're about 16% done.

  • Seattle is farther north than Montréal. It is in fact farther north than all but the northernmost tip of Maine.

  • Someone with a browser history like mine is more likely female than male.

    (My history includes a lot of pages on bugzilla.mozilla.org and developer.mozilla.org, which should peg me as a male nerd with high confidence. But that page only checks whether you've visited the front page of a few very popular sites.)

  • In 1905, the President of the United States threatened to abolish football unless something was done to reduce the number of fatalities. Colleges established a rules committee and made radical changes to the game. Hurdling (jumping feet-first over other players) was banned. Roughing penalties were introduced. Six men were required on the offensive line. The forward pass was added.

    [T]he new rules committee further opened up the game by requiring the offense to advance ten yards in three tries for a first down instead of five yards, which previously had been a great inducement for bruising, battering line play from which no form of mayhem was barred.

    Principles of Coaching Football by Mike Bobo and Spike Dykes, citing 100 Plus Years of Football by Jerry Brondfield, 1975.

    This, in addition to earlier reforms banning things like piling on the ball carrier and the flying wedge (in which offensive players would build up momentum before the play started by charging en masse toward the line), made football the genteel pastime it is today.

I'm starting to learn random stuff by reading source code. Technically I've been doing this for ten years or more, but I have recently made a conscious effort to be more aggressive.

  • lighttpd has a very simple scheme for exploiting multiple CPUs. After binding the server socket to an address, it simply forks a few times. All the worker processes do the same thing: listen on the socket and serve HTTP requests. There's no load balancing and no communication between the parent process and the worker processes.

  • On x86, at least in glibc's implementation, setjmp saves six 32-bit words of state: four callee-save registers (ebx, esi, edi, ebp) and the caller's stack pointer and instruction pointer.
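The lighttpd scheme is simple enough to imitate in a few lines. To be clear, this is not lighttpd's code (lighttpd is written in C); it is a minimal sketch of the same bind-then-fork pattern on a Unix system. The names `prefork` and `handle` and the `max_requests` limit are my own additions; the limit exists only so the workers in this demo eventually exit.

```python
import os
import socket


def handle(conn):
    """Answer one HTTP request with a tiny fixed response."""
    conn.recv(65536)  # read (and ignore) the request
    body = b"served by pid %d\n" % os.getpid()
    header = b"HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body)
    conn.sendall(header + body)
    conn.close()


def prefork(listener, workers, max_requests):
    """Fork `workers` children that all accept() on the same listening socket.

    The kernel hands each incoming connection to one of the waiting
    workers. There is no load balancing and no communication between the
    parent and the workers. Returns the list of child pids.
    """
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child process: serve a few requests, then exit without
            # returning into the parent's code.
            try:
                for _request in range(max_requests):
                    conn, _addr = listener.accept()
                    handle(conn)
            finally:
                os._exit(0)
        pids.append(pid)
    return pids
```

To try it, bind and listen on a socket first, then call something like `prefork(listener, workers=4, max_requests=100)` and wait for the children with `os.waitpid`.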

I also learned a lot last week about David Humphrey's awesome Mozilla project course at Seneca College. It builds on a foundation of C, C++, and UNIX knowledge that Seneca's CS program lays down in required freshman courses. It has a lecture component and a lab. Any time you spend on your project is on top of that. It has a reputation as a killer course.

23 October 2008

Commonwealth of Kentucky v. 141 Internet Domain Names

Bill Poser is annoyed that the state of Kentucky has decided to seize a bunch of domain names on the unlikely theory that they are “gambling devices”.

Among the silly things going on here is the name of the case, which Bill explains in the comments:

Yes, the nominal defendants are the domain names. This is an example of a lawsuit in rem "against a thing". It is the typical form of action in seizure cases. This results in wonderful case names like "United States v. 11 1/4 Dozen Packages of Articles Labeled in Part Mrs. Moffat’s Shoo-Fly Powders for Drunkenness, 40 F. Supp. 208 (W.D.N.Y. 1941)", "United States v. Approximately 64,695 Pounds of Shark Fins, No. 05-56274 (9th Cir. Mar. 17, 2008)", "United States v. Forty Barrels and Twenty Kegs of Coca-Cola 241 U.S. 265 (1916)", and the inimitable "United States v. One Package of Japanese Pessaries 86 F.2d 737 (2nd Cir. 1936)". (The pessaries in question were what we would now call diaphragms. This is the case in which the Court of Appeals for the Second Circuit over-ruled the government's invocation of the Comstock Act and allowed Margaret Sanger to import Japanese contraceptives.)

30 September 2008

Who buys this stuff?

My search continues for something substantial to read from an economist in favor of the bailout. On TV, they all appear to favor it (using vague language and lots of clichés), but on the Internet, they all seem to oppose it (with compelling economic arguments).

I thought I might have found it when I ran across a dire quote from Nouriel Roubini in a newspaper, warning of economic woes to come. Then I went to his web site. It turns out Roubini recently wrote an article entitled “Is Purchasing $700 billion of Toxic Assets the Best Way to Recapitalize the Financial System? No! It is Rather a Disgrace and Rip-Off Benefitting only the Shareholders and Unsecured Creditors of Banks”. Heh! It turns out he's just generally gloomy and has been for years; it didn't start after yesterday's vote.

Reporters have done a bad job with this one. There's the usual ignorance of economics, but it's more than that. They seem to be caught up in the crisis atmosphere. They're not objective. Worse, they never seem to distinguish between Wall Street investors and economists, never mind hysteria and reason.

My Representative

I wrote to my Representative, Jim Cooper, and three days ago, he wrote back:

...I hate the thought of paying ransom to Wall Street, especially when Main Street is struggling. I am furious that our financial situation has been allowed to get this point, and that Treasury is considering bailing out the lenders who helped caused this to occur.

Then he voted for the bailout. According to this morning's USA Today, he said, “It's mainly political fear, the reaction back home. It's the most difficult time for people to be statesmen, 37 days before an election.”

12 August 2008

Squares

Yesterday, apropos of nothing, J. announced that 9 is not the only square number. 4 is, too. Even 1, he added. It turns out he didn't hear the phrase “square number” anywhere. He's just been playing with blocks.

Today I got out some extra blocks and showed him that 16 is a square number, too. He wondered, apparently at random, if 100 was a square number. So we counted out one hundred blocks and as it happens, it is.

He's been watching some math videos. We borrow one from the library each week. Last week's video, on division, explained that any number divided by 1 is itself. I doubt J. has any real conception of what division is and where it applies, but he liked that rule.

Properties of the integers, man. Before you know it he'll be telling me that 7 and 13 can't make any rectangles, except for long skinny ones...
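The block game translates directly into code: a number of blocks makes a rectangle for every way of writing it as width times height, it is a square number if one of those rectangles has equal sides, and it is prime if the only rectangle is the long skinny one. A small sketch, with a function name of my own choosing:

```python
def rectangles(blocks):
    """List every (width, height) rectangle, width <= height, using all the blocks."""
    if blocks < 1:
        raise ValueError("need at least one block")
    shapes = []
    width = 1
    while width * width <= blocks:
        if blocks % width == 0:
            shapes.append((width, blocks // width))
        width += 1
    return shapes


def is_square_number(blocks):
    """True if the blocks can be arranged with equal width and height."""
    return any(width == height for width, height in rectangles(blocks))
```

For example, `rectangles(16)` includes `(4, 4)`, while `rectangles(7)` and `rectangles(13)` each contain only the one-block-wide strip.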

24 July 2008

Last week I learned...

  • When volcanic eruptions created the island of Ferdinandea in 1831, it was quickly claimed by Italy, France, the UK, and Spain. While they were arguing, the little island eroded away.

  • How to put this? Language isn't what I thought it was. (This definitely falls into the category of thought-provoking stuff I won't pretend to understand.)

    A little background. Before your third birthday, you subconsciously achieved a thorough familiarity with the grammar of the language spoken in your home. Never mind that you said I've sawn instead of I've seen. That's small stuff. You knew when to use the and when to use a. You knew which of they run and they runs was right, that big green circle sounds better than green big circle, how to figure out what it means in context, and much more. You will never have conscious understanding of all the syntactic rules you already had subconsciously at three. No one does. Not even people who make a career out of studying exactly that.

    Linguists aren't dumb. Why is it that toddlers are able to do this amazing thing that all the linguists in the world, given several decades to work, can't do?

    Beats me, but there's something else kids do that's even more amazing. They invent grammar.

    I'm tempted to block-quote about a page out of this book I'm reading (Foundations of Language by Ray Jackendoff). It's fascinating stuff. “Derek Bickerton documents in detail that children of a pidgin-speaking community do not grow up speaking the pidgin, but rather use the pidgin as raw material for a grammatically much richer system called a ‘creole’.” If adults could do that, there wouldn't be a pidgin phase. The kids do it. Where does that come from?

    Communities can exist for millennia without developing writing. They don't go without grammatically complex spoken language. Hmmm.

    Even better, there's a school for the deaf in Nicaragua where the kids, unprompted, made up their own sign language. “Besides offering the wonder of a whole language coming out of nowhere, Nicaraguan Sign Language sheds some light on questions about creole. Evidently a community is necessary for language creation, but a common stock of pre-existing raw material is not.” I always assumed the syntax of a language like English comes together incrementally, over thousands of years. Shows what I know. It was probably invented in a single generation.

  • Pirahã, a language spoken by a few hundred people in Brazil, contains, according to Wikipedia, “two very rare sounds, [ɺ͡ɺ̼] and [t͡ʙ̥]”. In case you don't have the fonts I do, that first one looks like two upside-down lowercase rs with a squiggle underneath like a bird in flight, and an arc over the top; and the second one looks like tB with a dot under the B and an arc over the top. I wonder how they're pronounced.

  • In the version of g++ that ships on the Mac these days (GCC 4.0.1), you can get the old-school SGI STL hash_map container by doing #include <ext/hash_map> and using __gnu_cxx::hash_map;. But the GCC guys have already replaced the hash_ containers with newer standards-track containers, unordered_map and friends, which you can get in a more recent libstdc++.

  • In C++, a class's private members are not entirely hidden from code that uses the class. It's possible for public names to collide with private names. For example:

        class A { private: void f(); };  // This method is private, but its name matters...
        class B { public:  void f(); };  // ...because it'll conflict with this one.
        class C : public A, public B {};
    
        C().f();  // Error: request for member ‘f’ is ambiguous.

    Leaky abstractions make me sad. This doesn't seem to come up often in practice, but I think it's one reason STL implementations tend to contain lots of extra underscores. Another reason for that, as Blake Kaplan pointed out to me, is that a standard C++ program can do:

    #define n 3
    #include <vector>

    and the headers should be able to cope with that.

23 July 2008

Stuff I learned recently

  • Twelve thousand years ago, a gigantic dam of solid ice blocked the Clark Fork River, creating Glacial Lake Missoula.

    The lake was almost 2,000 feet deep.

    And periodically the dam would explode, laying waste to parts of what's now Montana, Idaho, Washington, and Oregon.

    Thundering waves and chunks of ice tore away soils and mountainsides, deposited giant ripple marks, created the scablands of eastern Washington and carved the Columbia River Gorge.

  • Mendelssohn's Wedding March has about 50 times more notes in it than I had realized.

  • “Between 1958 and 1992, Russia dumped 18 nuclear reactors into the Arctic Ocean, several of them still fully loaded with nuclear fuel,” writes Scott G. Borgerson. The article also points out that last summer, “[f]or the first time, the Northwest Passage—a fabled sea route to Asia that European explorers sought in vain for centuries—opened for shipping.”

  • Calque is a loanword and loanword is a calque. (Source.)

  • Recent Linux and Windows operating systems implement address space layout randomization. The goal is to prevent certain security attacks that depend on specific code being in predictable memory addresses.

  • According to a 2005 research paper by Richard Haier et al., women's brains have about 10 times the amount of white matter related to general intelligence (that is, in areas whose size correlates with IQ) as men's. Contrariwise, men have about 6.5 times the amount of IQ-correlated gray matter. I find that pretty startling.

    Here are some of Haier's own words on brains and genes.

02 July 2008

What is a noun?

But what about earthquakes and concerts and wars, values and weights and costs, famines and droughts, redness and fairness, days and millennia, functions and purposes, craftsmanship, perfection, enjoyment, and finesse?

—Ray Jackendoff, Foundations of Language: brain, meaning, grammar, evolution

I learned in school that a noun is a word that names a person, place or thing.

A few years after that, the definition changed. In hindsight this seems creepy. It happened twice. I don't remember any explicit discussion or even acknowledgment of the change. We would do nouns one way one year, and when that time came around the next year, we would have different textbooks with a different definition. I remember the changes: first “event” and later “idea” were added to the list ...bringing nouns like earthquakes and purposes in from the cold, I guess. We regret the omission, etc.

I didn't know this until a couple days ago, but linguists apparently consider this whole approach to parts of speech hopelessly, fundamentally broken. Morally bankrupt, in fact. That a child is taught the “person, place or thing” definition approximately once every 12 seconds preys on the linguist's soul. It causes him to make awkward scenes at parties. Even the funny papers are bristling with painful reminders of this horrible truth.

I never noticed before, but there is a problem or two with this whole “person, place or thing” thing. All the most common words for people (you, I, he, she, they) and things (this, that, these, those, it) are pronouns, while all the most common words for places (here, there, in, out, up, down, to, from, and on and on) are adverbs and prepositions. All the other definitions I learned for parts of speech are bogus, too. I learned that “action words” are verbs; but homicide, defenestration, and touchdown are all nouns. (So is pirouette. My wife didn't believe me.) I learned that prepositions tell about relationships, particularly spatial relationships; but proximity and distance are nouns (and cover and surround are verbs!). I learned that words that describe properties of things are adjectives; but weight, beauty, shape, and color are nouns.

So what is the definition of a noun, exactly? Well, I'll tell you. I don't know. Strangely, I don't think linguists like to say! Here's a pretty good near miss by Geoffrey K. Pullum, writing in Language Log:

The way to tell whether a word is a noun in English is to ask questions like: Does it have a plural form (the terrors of childhood)? Does it have a genitive form (terror's effects)? Does it occur with the articles the and a (the terror)? Can you use it as the main or only word in the subject of a clause (Terror rooted me to the spot), or the object of a preposition (war on terror)? And so on. These are grammatical questions. Syntactic and morphological questions. Not semantic ones.

A bit vague, isn't it? That's way above average, though. Here's an honest attempt; it starts with “A noun is a member of a syntactic class…”. Until I edited it, Wikipedia's article on nouns started, “In linguistics, a noun or noun substantive is a lexical category which is defined in terms of how its members combine with other kinds of expressions.”

There's an interesting twist to how all this gets bootstrapped in the toddler brain. All the first words you learn are nouns, words for people and things in your little one-year-old world. You'll be able to put words together into sentences before you master any pronouns. That is, at the time when you're learning the basic grammar of the language, there is a semantic distinction between the nouns you know and all other words. The values and weights and costs come later.

14 June 2008

Barleycorn Bay

See earlier puzzles for, er, something of an explanation.

“Now listen close-like,” said my new friend, “'cos I'm only going to say this once. All the inhabitants of Barleycorn Bay, and I've met them each and every one, are either heroes or vagabonds. Or both. Every one of the heroes is blonde; every one of the vagabonds is a magician, except for any that be Quakers; and all the magicians are nanny goats. Every one that isn't a walrus isn't a ruminant.”

“Isn't a what?” I said.

“And every living soul in Barleycorn Bay that isn't clean-shaven is red-headed, excepting the nanny goats of course. Needless to say,” he added, scratching his beard with a steel hook, “there are no clean-shaven pirates.”

I thought it over for a while. “Is there such a thing,” I wondered aloud, “as a nanny goat that's also a walrus?”

“I reckon there could be,” he replied, “although I've never met one.”

“What about a blonde redhead?”

“Don't be ridiculous.”

01 June 2008

Facts

David Macaulay wrote and illustrated a book entitled Castle. If you have a child four or older, you have a perfect excuse to buy it. Mill is fascinating, too.

That “if Microsoft designed the iPod package” video was commissioned by Microsoft. To my mind, that makes it even cleverer.

In an underground passageway somewhere near Shinjuku, there is a machine that cleans your glasses.

A hot-air balloon that can carry 8 people costs “all your money”. Or if you prefer, about a hundred thousand dollars.

You can write a regular expression that matches only strings of a composite number of x's. (Hint: Too easy for a hint.)
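Spoiler, for anyone who wants one. The trick relies on backreferences, so this is a "regular expression" in the programming-language sense rather than the formal-language one. The sketch below uses Python's `re` module; the helper name is mine.

```python
import re

# (xx+) grabs a block of two or more x's; \1+ then requires that exact
# block to repeat at least once more. A string matches in full only if its
# length is (block length >= 2) times (number of blocks >= 2), which is
# precisely what it means for the length to be composite.
COMPOSITE_XS = re.compile(r"(xx+)\1+")


def is_composite_run(text):
    """True if `text` is nothing but x's and its length is a composite number."""
    return COMPOSITE_XS.fullmatch(text) is not None
```

So `is_composite_run("x" * 9)` is true (three blocks of `xxx`), while `is_composite_run("x" * 7)` is false because no block of two or more x's tiles seven evenly.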

I knew that Galileo discovered the first four of Jupiter's moons, the Galilean moons. I didn't know that the fifth known moon wasn't discovered until almost 300 years later, when Tennesseean E. E. Barnard discovered Amalthea.

And it took me completely by surprise to learn that Richard Feynman ever worked at a computer hardware startup, Thinking Machines Corporation.

28 May 2008

Crowd vs. Committee

I just found this in an old notebook. Apparently I wrote it a couple years ago. Most of it seems to make more sense to me now.

Wisdom of Crowds | Design by Committee
Both: Participants may be biased.
Bias averages out | Bias creates “riders”
Not much work | Lots of work
No consensus required | Seeks consensus. Decisions may be postponed to avoid stirring up trouble.
Minority (“special”) interests can be publicised but are often ignored | Minority interests are not ignored
No experts—skepticism (Presumption is that a random individual is not an expert.) | All experts—openness
Lossy, mass communication (of arguments, etc.) | Tedious explicit communication
Both: No overarching design or uniting vision.
Nobody cares | Possibly competing visions
Simple output. | Unbounded complexity in output.
Immediate feedback. | Long-term, invisible feedback.
Individuals have low individual impact. | Individuals are influential.
Neglecting the topic somehow doesn't matter. | Neglect causes warts (that is, areas where the design is painfully bad —ed.)
Product needn't be understood (markets) | Product is ideas.
Mechanism for approaching a good result exists (market; averaging) | Democracy (voting) and consensus are the only such mechanisms.
Interfaces are well-defined before work starts (ballot; prices) | Interfaces have to be designed.
Individuals can't introduce bureaucracy | Individuals sometimes manage to introduce bureaucracy

23 May 2008

Curlicues

J was writing his sister A's name for her on a piece of construction paper. J is 4 years old and A is 2, and somebody recently taught J that he can decorate his letters with outrageous curlicues. So J says, “Do you want me to put curlicues on it?” And A replies, in her tiny stern voice, “There are no Qs in my name!”

10 May 2008

Firefox 3

Firefox 3 is nearing release. Check out what's new, especially the Awesomebar, which has changed my life.

(Awesomebar itself is the work of superhacker Ed Lee, but it relies on Places, the new bookmarks and history system, 2+ years in the making.)

If you're interested in security, especially the difficulty of giving users correct, actionable security-related info at a glance, read about Firefox's new site identification and malicious site detection features.

06 April 2008

This week I learned...

  • In the children's section of the Nashville public library, the mother lode of folk tales is in the nonfiction section. Aha!

  • There's a whole family of egg-eating snakes that swallow eggs bigger than their heads, squeeze out the insides, and spit out the shell.

  • According to this blog post, native speakers of Chinese are gradually forgetting how to write.

  • According to Jared Diamond, out of 148 species of large, wild, terrestrial herbivorous mammals, only 14 have ever been successfully domesticated. (Guns, Germs, and Steel.)

Every time I go to the Nashville library, I leave feeling like I've just picked somebody's pocket. It's a wonderful library. (I don't know that it's particularly unusual in this regard.)

04 April 2008

This month I learned...

The past three or four weeks are a bit of a blur, but:

  • Just before he died, Beethoven claimed to be working on a Tenth Symphony. Fragments of this were discovered among Beethoven's sketchbooks in the 1980s (!), and musicologist/composer Barry Cooper stitched together a highly speculative, but performable, first movement.

  • I knew that John Harrison invented the first clock that could keep time on a ship and that such clocks cracked the longstanding problem of determining longitude at sea, leading to the first accurate maps. (H1 was his first attempt; his masterpiece, H4, was a 5-inch watch with a diamond-studded movement.) I didn't know that Harrison faced competition from an astronomical method relying on careful on-ship measurements of lunar occlusions of certain stars, huge tables of laboriously pre-calculated data, and maybe four hours of additional calculations to be done on the ship. It was a usability disaster, as one might expect. But at the time, the idea of making a clock run reliably on a pitching, rolling ship apparently seemed even crazier.

  • Bill McCloskey's memoize is a replacement for make in a few lines of Python. The complete source code fits on my screen. This is the coolest hack I've seen all year.

  • The word goodbye comes from the saying “God be with you”. According to the American Heritage dictionary's etymology note, “A letter of 1573 written by Gabriel Harvey contains the first recorded use of goodbye: ‘To requite your gallonde [gallon] of godbwyes, I regive you a pottle of howdyes,’ recalling another contraction that is still used.”

  • According to Wikipedia, the lungfish has the largest genome of any vertebrate. But as of today, Wikipedia does not say anything about the lungfish's lungs! (I usually try to contribute in cases like this, but here I haven't a clue.)

  • On the Mac, if ls -l output has an @ symbol here:

    -rw-r--r--@   1 jason  jason     54838 Sep 27  2007 #tamarin 9-25-07.colloquyTranscript

    then the file has extended attributes. These are used, for example, to mark files as “saved from the web”, triggering a warning if a user tries to open the file.

14 March 2008

A bad guy in a lose-lose situation

I guess there's no point denying that J, my four-year-old, has a bit of an aggressive streak. Paraphrased from memory:

J: I'm going to try and splash him into that swimming pool. He's a bad guy. (The “bad guy” is a toy car.)

If he misses, he's going to be shot out of a cannon that will shoot him so hard, he will crash into the sun, and then he will blow up and his car will blow up.

(J. rolls the car off the table at a mixing bowl; it falls in.)

Me: Hey, you got him into the swimming pool.

J: (casually) Yeah, there's sharks in there.

I'm just glad he has it in for the bad guys.

29 February 2008

This week I learned...

I spent most of this week sick in bed, but I did discover that:

  • According to the Jameel Poverty Action Lab at MIT, the cheapest way to improve attendance in Kenyan schools is mass deworming.

  • There's a guy removing Garfield from Garfield comic strips. The result: “an even better comic about schizophrenia, bipolar disorder, and the empty desperation of modern life”.

  • Guy Steele wrote The TELNET Song. Before webcomics, if hackers wanted to laugh without leaving the net, they had to make their own humor.

Also, I like this poem: “The Trash Can”.

23 February 2008

This week I learned...

  • A shibboleth is language that sends cultural signals beyond its plain meaning. The word comes from a fairly amazing Bible story.

    Now I want a word for language invented to annoy, like “Democrat Party”.

  • Aristotle believed slavery to be “expedient and right”. All the best arguments by learned apologists for slavery in the U.S. South were from his writings, particularly in the Politics.

  • Incidentally, Aristotle thought democracy was a crummy form of government. And, in his ideal society, homeschooling would be banned.

  • There are expressions of the Golden Rule in the scriptures of many religions.

  • I don't understand what this means, exactly, but the x86 instruction set contains an FNOP instruction. The manual describes it as a floating-point no-operation instruction. How this might be different from a regular NOP I don't know.

  • So there's a widely known interview question: write a function to count the number of bits of a 32-bit int that are set to 1. This function is called the population count—that link describes a surprising use for it.

    The last time I tested this, summing 4 lookups into a 256-entry table was fastest for 32-bit integers, faster than the awesomely clever bit-twiddling solution. I shouldn't have been surprised. The lookup table fits easily into cache. The bit-twiddling solution has a lot of dependencies; the CPU can't find any instruction-level parallelism there.

    Anyway, what I learned a week or two ago is that future Intel chips will have an instruction, POPCNT, arriving alongside SSE4.2, that does this in hardware for a whole register at a time. (Someone I mentioned this to commented that he doesn't want to be fired for pronouncing that.)

  • Often when I write these blog entries, I'm still unsure of the significance of some of the things I've just learned. For example, I learned something about Java monitors (or pthreads condition variables, which are the same thing). When thread 1 notifies, waking thread 2 on a separate CPU, the lock associated with the monitor ensures that CPU 1's writes are flushed to main memory and CPU 2 sees them before thread 2 starts running. There's no need for write-barrier magic in Object.notify itself.

  • Apple's Shark profiler has a feature that lets you compare two profiles. But the result is calculated by comparing percentages, not comparing the absolute number of samples. So, for my purposes, useless.

  • The .mshark files produced by Shark are gzipped binary property lists, but the actual samples are stored in there as raw binary data which I haven't tried to decipher.
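
The two population-count strategies mentioned above (four lookups into a 256-entry table versus the bit-twiddling trick) can be sketched roughly as follows. This is only an illustration in Python; the function names are made up for this sketch, and Python timings say nothing about the cache and instruction-level-parallelism effects described above. It just shows that both approaches compute the same count.

```python
# Sketch of two population-count strategies for 32-bit integers.
# Function names are illustrative, not taken from any library.

# 256-entry table: TABLE[b] is the number of 1 bits in the byte b.
TABLE = [bin(b).count("1") for b in range(256)]


def popcount_table(x):
    """Sum four table lookups, one per byte of a 32-bit value."""
    x &= 0xFFFFFFFF
    return (TABLE[x & 0xFF]
            + TABLE[(x >> 8) & 0xFF]
            + TABLE[(x >> 16) & 0xFF]
            + TABLE[(x >> 24) & 0xFF])


def popcount_twiddle(x):
    """Classic parallel bit count; each step depends on the previous one."""
    x &= 0xFFFFFFFF
    x = x - ((x >> 1) & 0x55555555)                  # 2-bit partial sums
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0F                  # 8-bit partial sums
    # Multiplying adds the four byte sums into the top byte.
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24
```

The four table lookups are independent of one another, while every line of the bit-twiddling version needs the result of the line before it.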

08 February 2008

This week I learned...

Special double issue! Two weeks' worth of trivia, including a week spent in Mountain View.

  • Most Japanese streets do not have names.

  • With GNU Radio, you can buy some cheap hardware, plug it in, and your computer becomes a GPS receiver, a garage door opener, an HDTV tuner, an AM/FM radio, a cell phone. This is subversive technology. Hollywood wants regulations that would ban such nonsense.

  • Radio waves bounce off meteor trails. This is actually usable as an alternative to satellite communications, depending on the application. For example, the USDA has a network of solar-powered snow depth sensors that use meteor trails to phone home.

  • On the Mac, Alt+[ types a left double-quote mark and Alt+{ types the right one. Much better than typing &ldquo;.

  • GCC generates a floating-point instruction for isnan(x); it amounts to x != x (NaNs are not equal to themselves). Intel engineers claim integer instructions can be much faster, on x86 at least, due to floating-point exception nastiness.

  • Xavier Leroy has written a provably correct “lightly-optimizing” back-end that compiles a subset of C to PowerPC machine code. Sandrine Blazy and Zaynah Dargaye have hooked it up to a provably correct front-end.

    A little background here. A compiler works by lowering code step by step from relatively high-level internal representations, like parse trees, to successively lower-level representations, until it gets down to machine code. At each level it can apply optimizations: some optimizations, like common subexpression elimination, work at a very high level, and some work at the machine-code level. Stack up enough lowering passes and optimizations, and you've got yourself a compiler. There's considerable interest these days in using formal methods to prove the correctness of program transformations. First you define mathematically what it means for two programs to be equivalent. Then you prove that a given transformation always produces a result that's equivalent to the original. Stack up enough provable lowering passes and optimizations, and you've got yourself a provably correct compiler.

    This kind of work has been done for subsets of Java, but C's pointers and undefined behavior present some nasty problems. Leroy's work is hedged in with disclaimers, but it's still pretty amazing stuff, and an interesting read. For example, he proposes the following sneaky technique. Write your compiler pass however you want, and don't bother proving it correct. Then just verify the correctness of the transformation afterwards (for the particular program being transformed). It's apparently much easier to prove a verifier correct than the transformation code itself. Plus, you can then tweak the transformation without redoing any proof work. The only risk is that your transformation is incorrect, in which case your compiler flunks out with an internal error at compile time.

  • The PDP-11 had probably the nicest of all the widely used CISC instruction sets.

  • emacsclient is my new EDITOR. It connects to my existing Emacs process, if any (I had to put (server-start) in my .emacs file) and loads the file there.

  • Speaking of emacs: M-/ is the autocomplete key. It's moderately smart. You also want to know C-x r SPACE x (save current cursor position in x) and C-x r j x (jump to the position saved in x).

  • When you run a configure script, it generates a config.status file in the build directory. That file is helpful when debugging stupid build problems.

  • In the late 1990s, there were two computer science research projects called Dynamo, both involving dynamic optimization: one at Indiana University and, of course, the awesome one at HP Laboratories.
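
To make the isnan(x) item above concrete, here is a small Python sketch of the two approaches: the floating-point comparison x != x, and a check that only uses integer operations on the bit pattern. The function names are invented for this sketch, and it assumes IEEE 754 doubles; it is not what GCC or Intel's engineers actually emit, just the same idea at a higher level.

```python
import struct


def isnan_float_compare(x):
    """NaN is the only float value that is not equal to itself."""
    return x != x


def isnan_integer_bits(x):
    """Inspect the IEEE 754 double bit pattern using integer operations only."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    # Drop the sign bit. A NaN has all exponent bits set and a nonzero
    # mantissa, so it compares greater than the bit pattern of infinity.
    return (bits & 0x7FFFFFFFFFFFFFFF) > 0x7FF0000000000000
```

The integer version never performs a floating-point comparison, which is the property the Intel claim is about.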

25 January 2008

This week I learned...

  • NestedVM can take any program that GCC can compile and run it in a Java VM. It does this by compiling the program to a MIPS executable and then translating the MIPS machine code to Java bytecode. Now, there isn't any high-level type information in a MIPS binary, so there isn't any in the bytecode. Instead each instruction is translated to something that bangs on some large int arrays that represent virtual memory. (The sbrk system call is implemented using new int[].)

    The paper has sentences like, “The NestedVM runtime fills the role typically assumed by an OS kernel.” :)

    I think the point of this, aside from being cool, is to make C++ code run anywhere Java does. I don't know how many platforms have JVMs but not gcc back-ends, though. (GCC actually has a back end for Java, but it can't handle C++.)

  • So if you know C, you know that && has short-circuit behavior: if the left-hand side is false, the right-hand side doesn't get evaluated. This week I learned that if the right-hand side of && is simple enough and has no side effects, as in x > 0 && x < N, a good compiler emits code that evaluates it anyway, essentially treating the && as &. A conditional branch is slower than a few redundant instructions.

  • Objective-C exceptions on Mac are implemented using setjmp/longjmp. They don't cooperate with C++; if you throw an Objective-C exception across a frame containing C++ objects, the destructors don't get called. This triggered some bugs in Mozilla, which apparently has Cocoa GUI code or something. (Sorry, I don't pay much attention to that stuff. :))

  • If you compile with gcc -g3, then gdb can print expressions that use macros! I knew Jim Blandy had implemented this but I never actually went and dug up the magic to make it work. This will make my life a lot easier, at least for a year or two.

  • The gcc compiler itself uses a garbage collector. I'm told the GC is autogenerated from the source; so the gcc source distribution actually includes a bunch of autogenerated code.

  • gcc -S prints x86 assembly code in AT&T syntax, with the operands reversed relative to Intel's notation. I don't remember if I ever knew this or not. What a pain.

  • And some more about GCC internals, from here.
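
The && item above is about machine code, but the reason the transformation is safe can be shown in a few lines. This is a Python analogy, not compiler output: `and` short-circuits like C's &&, and `&` on two booleans always evaluates both sides. The function names are made up for this sketch.

```python
def in_range_short_circuit(x, n):
    """The right-hand comparison is skipped when the left one is false."""
    return x > 0 and x < n


def in_range_branch_free(x, n):
    """Both comparisons always run; & combines the two booleans."""
    return (x > 0) & (x < n)
```

Because neither comparison has side effects, the two functions agree for every input, which is what lets a compiler trade the conditional branch for a couple of redundant instructions.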

18 January 2008

This week I learned...

  • The Phaistos Disc is a mysterious clay disc, about 3400-3850 years old, discovered in the basement of a Minoan palace. It is imprinted with hieroglyphic symbols. It is the earliest known instance of movable-type printing, which would not be seen again until woodblock printing appeared in China some 1600+ years later.

  • Poseidon was believed to have created the horse. (I didn't even know that Pandora was a Greek legend. In my brain she was curiously detached from any specific culture.)

  • For some reason, Wikipedia's pages on Greek mythology are very often vandalized.

  • And I learned more about SpiderMonkey split objects than any human being should know. But I still don't understand them very well, as that page (which I wrote) indicates.

    I've been learning (and documenting) a lot about SpiderMonkey and its API, which may be why the pickings are so slim otherwise.

07 January 2008

This week I learned...

  • Before desktop computers were widely used in China, telegraph operators there had to memorize every Chinese character's GB 2312 character code.

  • Moleskin is made from cotton, not moles. (In other news, guacamole is made from avocados.)

  • Garbage collection in Erlang is per-process. This seems weird—are messages copied from process to process?—but as that article explains, there are advantages, too.

  • The Haskell standard library contains over 100 operators (that is, functions whose names consist of ASCII symbols, like .| and |. and @?= and @=?). Someone must stop these madmen.

  • I learned a few very basic odds and ends of category theory.

    The book I'm reading (by Benjamin Pierce) offers “injective functions are monic in Set; surjective functions are epic” as a mnemonic, you know, to help you remember monic and epic. This has to be the worst mnemonic of all time. I just don't see anything helpful about it. Sur- means “under”. Epi- means “on top of”. I can never remember the difference between injective and surjective to begin with.

04 January 2008

This week I learned...

  • A chipotle is a jalapeño that has been smoked.

  • There are lots of ways to tile a regular dodecagon with sides of length s using only rhombi with sides of length s. My favorite so far:

    I speculate all such tilings use exactly this many rhombi of each shape—six skinny diamonds, six fat ones, and three squares. It would be really cool if I were wrong. Calculate the area of each shape to see why I think this.

    (Pictured: Melissa & Doug pattern blocks. Great toy.)

  • Basic stuff about the Erlang programming language. If you set aside the concurrency features for a second, Erlang looks like ML without static typing or refs. In a word, yuck.

  • Haskell has concurrency libraries that I should look at (while I'm learning about language-level approaches to concurrent programming).

    Incidentally, if you're a Haskell programmer, see if you can spot the unintentional self-parody in that blog post. Hint: it's in the sentence “So let's do something useful with this, how about a little program that computes primes and fibonacci numbers?”
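
The area calculation suggested in the dodecagon item above can be checked numerically. This sketch assumes the three shapes are rhombi with interior angles of 30° (skinny), 60° (fat), and 90° (square), which matches standard pattern blocks but is not stated in the post; the variable names are made up.

```python
import math

s = 1.0  # side length; any positive value works, since areas scale with s**2


def rhombus_area(angle_degrees, side=s):
    """Area of a rhombus with the given side and interior angle."""
    return side * side * math.sin(math.radians(angle_degrees))


skinny = rhombus_area(30)   # s^2 / 2
fat = rhombus_area(60)      # s^2 * sqrt(3) / 2
square = rhombus_area(90)   # s^2

# Regular dodecagon with side s: area = 3 * (2 + sqrt(3)) * s^2
dodecagon = 3 * (2 + math.sqrt(3)) * s * s

# Six skinny diamonds, six fat ones, and three squares.
tiled = 6 * skinny + 6 * fat + 3 * square
```

The two totals agree. Under the same three-shape assumption, because √3 is irrational the fat rhombi are the only source of the 3√3 term, so any such tiling needs exactly six of them. The area alone only forces (skinny count) + 2 × (square count) = 12, so it does not by itself separate the other two counts.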

This week I started looking for elementary school curriculum materials. My son is four years old. We will probably homeschool him, and I want a head start on this one. Not a lot of luck searching so far. There are a lot of individual lesson plans; for example, PBS has some science lessons. On the other end of the spectrum, I found the What your nth-grader needs to know books and ordered one. We'll see. I really want a variety of textbooks and workbooks.