09 December 2008

The very best of jorendorff?

I like Language Log, but I would like it even better if there were less of it.

Wouldn't it be keen if there were a site where you could enter the URL of any blog, and it would give you back a feed containing only half the entries: the best ones, according to whatever metric of popularity the service could find (links, diggs, whatever)?
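Here is a minimal sketch of the filtering step, just to make the idea concrete. Everything in it is hypothetical: the Post record, the score field (standing in for whatever popularity signal the service could find), and the best_half name are all made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one feed entry; "score" stands in for whatever
// popularity signal the service can find (links, diggs, whatever).
struct Post {
    std::string title;
    int score;
};

// Keep the better-scoring half of a feed (rounding up), preserving the
// original feed order among the entries that survive.
std::vector<Post> best_half(const std::vector<Post>& feed) {
    std::vector<std::size_t> idx(feed.size());
    for (std::size_t i = 0; i < idx.size(); ++i) idx[i] = i;

    // Highest score first; stable_sort keeps earlier posts ahead on ties.
    std::stable_sort(idx.begin(), idx.end(),
                     [&feed](std::size_t a, std::size_t b) {
                         return feed[a].score > feed[b].score;
                     });
    idx.resize((feed.size() + 1) / 2);

    // Put the kept entries back into feed order.
    std::sort(idx.begin(), idx.end());

    std::vector<Post> kept;
    for (std::size_t i : idx) kept.push_back(feed[i]);
    assert(kept.size() <= feed.size());
    return kept;
}
```

The hard part of such a service is obviously finding a score worth trusting, not the sorting.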

I proposed this on IRC, where mhoye and humph reacted with a definite meh. (Note: All these chat excerpts are edited to give the illusion that there's a single coherent conversation going on.)

<humph> what's popular and what's interesting to me are not often the same for me
<mhoye> humph++
<mhoye> That sounds like a good way to be drowning in mediocrity, for sure.
<mhoye> jorendorff: Apply your theory to popular music.
...
<ted> so your theory is that if you like a blog enough to subscribe, you would like it even more if you only got the absolute best posts?
<jorendorff> ted: my theory is that "absolute best posts" means something
<mhoye> God, no.
<mhoye> See also, "absolute best music", "absolute best paint color."

I failed several times at explaining why I think this. Let me try again here.

Simple ratings systems are common on the Web. Some, like the Slashdot comment ratings (“Score: 5, Insightful” and such), perform very well. Others, like online restaurant guides, are useless. Ratings work when users agree on what's good and what's bad. On Slashdot, the worst posts are pretty content-free. Subjective tastes don't even really enter into it. Restaurants are a different story. In the case of music, mhoye's example, I'm sure any two people can find plenty to disagree on. But:

<jorendorff> mhoye: do you have a favorite band?
<mhoye> Not just one!
<jorendorff> mhoye: I'm struggling to get you guys to engage on any specific example :(
<mhoye> Jorendorff: Ok, here. "Entertainment", by "Gang Of Four".
<jorendorff> mhoye: excellent - what are your favorite songs off that album?
* mhoye picks "I Found That Essence Rare" and "Anthrax"

Both of mhoye's picks are among what the Apple Store calls the “TOP SONGS” from that album. Both are mentioned in Apple's review. Maybe mhoye picked them because they're the best tracks on the album.

Counterexamples abound too. We could settle this scientifically by sampling a blog's audience, having those people rate posts for a while, and seeing how closely their ratings correlate.
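One way to put a number on "how closely their ratings correlate" is the Pearson correlation between two readers' ratings of the same posts. This is only a sketch under my own assumptions: the function name is made up, ratings are plain doubles, and Pearson is just one reasonable choice of measure.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation between two readers' ratings of the same posts.
// The result lies in [-1, 1]. If either reader gave every post the same
// rating, the correlation is undefined and this sketch returns 0.
double rating_correlation(const std::vector<double>& a,
                          const std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    if (n == 0) return 0.0;

    double mean_a = 0.0, mean_b = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_a += a[i];
        mean_b += b[i];
    }
    mean_a /= static_cast<double>(n);
    mean_b /= static_cast<double>(n);

    double cov = 0.0, var_a = 0.0, var_b = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double da = a[i] - mean_a;
        const double db = b[i] - mean_b;
        cov += da * db;
        var_a += da * da;
        var_b += db * db;
    }
    if (var_a == 0.0 || var_b == 0.0) return 0.0;
    return cov / std::sqrt(var_a * var_b);
}
```

Values near 1 for most pairs of readers would support the "absolute best posts means something" theory; values scattered around 0 would support mhoye.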

Instead, let's play a silly game. See if you can stand to read these two entries from my old writing journal: Zen in space and the swoon. I believe one of those is about as good as I can write and the other is flat-out bad. I furthermore immodestly claim that those are two different things! And I think you might agree with me on which is which. We'll see (if you're willing) in the comments.

21 November 2008

Arms

I am flushing the buffer of old posts. Here is one I delayed posting because it's just too boring. Well, I'm posting it anyway. Sorry.

xkcd has a provocative comic about cryptography.

I imagine many geeks are moderately in favor of gun control but staunchly opposed to cryptography control. The two issues are very similar.

Having a gun lets you do two basic things: intimidate unarmed people; and resist armed or otherwise violent people. (You can of course just shoot people, but the power of a gun starts working at some remove from that eventuality. Most cops never shoot anyone.) Neither ability is necessary unless something has gone wrong; and both abilities can themselves go wrong in spectacular ways.

I believe in a fundamental human right to self defense, and for both moral and pragmatic reasons I tend to prefer individual rights to the common good where they conflict. So ab initio I pretty much have to oppose gun control unless there is a strong reason to think it's pragmatically the only way to go.

But I also believe gun control is an all-or-nothing proposition in practice: imposing a five-day waiting period before someone can buy a handgun, for example, makes no sense at all to me. Measures that are obviously easily circumvented, like the current U.S. background check, also make no sense. These measures seem squarely targeted at established, law-abiding gun sellers and their law-abiding customers. Effectively preventing criminals from having guns would require serious bookkeeping requirements and a tremendous enforcement effort. Judging by the results in the places where that has been tried (Illinois, the UK) it just doesn't seem worth it.

18 November 2008

Recently I learned...

  • Some people are, at this moment, running around the world. As of this writing they're about 16% done.

  • Seattle is farther north than Montréal. It is in fact farther north than all but the northernmost tip of Maine.

  • Someone with a browser history like mine is more likely female than male.

    (My history includes a lot of pages on bugzilla.mozilla.org and developer.mozilla.org, which should peg me as a male nerd with high confidence. But that page only checks whether you've visited the front page of a few very popular sites.)

  • In 1905, the President of the United States threatened to abolish football unless something was done to reduce the number of fatalities. Colleges established a rules committee and made radical changes to the game. Hurdling (jumping feet-first over other players) was banned. Roughing penalties were introduced. Six men were required on the offensive line. The forward pass was added.

    [T]he new rules committee further opened up the game by requiring the offense to advance ten yards in three tries for a first down instead of five yards, which previously had been a great inducement for bruising, battering line play from which no form of mayhem was barred.

    Principles of Coaching Football by Mike Bobo and Spike Dykes, citing 100 Plus Years of Football by Jerry Brondfield, 1975.

    This, in addition to earlier reforms banning things like piling on the ball carrier and the flying wedge (in which offensive players would build up momentum before the play started by charging en masse toward the line), made football the genteel pastime it is today.

I'm starting to learn random stuff by reading source code. Technically I've been doing this for ten years or more, but I recently have made a conscious effort to be more aggressive.

  • lighttpd has a very simple scheme for exploiting multiple CPUs. After binding the server socket to an address, it simply forks a few times. All the worker processes do the same thing: listen on the socket and serve HTTP requests. There's no load balancing and no communication between the parent process and the worker processes.

  • On x86, at least in glibc's implementation, setjmp saves six 32-bit words of state: four callee-save registers (ebx, esi, edi, ebp) plus the caller's stack pointer and instruction pointer.
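The bind-then-fork scheme described above for lighttpd can be sketched in a few lines of POSIX code. This is not lighttpd's source. The function names, the canned reply, and the loopback address are all my own assumptions, and error handling is minimal.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cstring>
#include <functional>
#include <vector>

// Bind and listen on 127.0.0.1:port (port 0 asks the kernel for a free port).
// Returns the listening descriptor, or -1 on failure.
int make_listener(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Hypothetical worker body: accept connections forever, send a canned reply.
int serve_forever(int listen_fd) {
    static const char reply[] =
        "HTTP/1.0 200 OK\r\nContent-Length: 3\r\n\r\nhi\n";
    for (;;) {
        int conn = accept(listen_fd, nullptr, nullptr);
        if (conn < 0) return 1;
        send(conn, reply, sizeof reply - 1, 0);
        close(conn);
    }
}

// Fork n workers AFTER the socket is bound. Each child inherits listen_fd
// and runs worker(listen_fd); the parent only collects the child pids.
// Nothing here balances load or passes messages between processes.
std::vector<pid_t> prefork(int listen_fd, int n,
                           const std::function<int(int)>& worker) {
    assert(n >= 0);
    std::vector<pid_t> pids;
    for (int i = 0; i < n; ++i) {
        pid_t pid = fork();
        if (pid == 0) _exit(worker(listen_fd));  // child never returns here
        if (pid > 0) pids.push_back(pid);
    }
    return pids;
}
```

A caller would do something like prefork(make_listener(8080), 4, serve_forever) and then wait on the returned pids; the port and worker count are arbitrary. Whichever worker happens to be blocked in accept() when a connection arrives gets it.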

I also learned a lot last week about David Humphrey's awesome Mozilla project course at Seneca College. It builds on a foundation of C, C++, and UNIX knowledge that Seneca's CS program lays down in required freshman courses. It has a lecture component and a lab. Any time you spend on your project is on top of that. It has a reputation as a killer course.

23 October 2008

Commonwealth of Kentucky v. 141 Internet Domain Names

Bill Poser is annoyed that the state of Kentucky has decided to seize a bunch of domain names on the unlikely theory that they are “gambling devices”.

Among the silly things going on here is the name of the case, which Bill explains in the comments:

Yes, the nominal defendants are the domain names. This is an example of a lawsuit in rem "against a thing". It is the typical form of action in seizure cases. This results in wonderful case names like "United States v. 11 1/4 Dozen Packages of Articles Labeled in Part Mrs. Moffat’s Shoo-Fly Powders for Drunkenness, 40 F. Supp. 208 (W.D.N.Y. 1941)", "United States v. Approximately 64,695 Pounds of Shark Fins, No. 05-56274 (9th Cir. Mar. 17, 2008)", "United States v. Forty Barrels and Twenty Kegs of Coca-Cola 241 U.S. 265 (1916)", and the inimitable "United States v. One Package of Japanese Pessaries 86 F.2d 737 (2nd Cir. 1936)". (The pessaries in question were what we would now call diaphragms. This is the case in which the Court of Appeals for the Second Circuit over-ruled the government's invocation of the Comstock Act and allowed Margaret Sanger to import Japanese contraceptives.)

30 September 2008

Who buys this stuff?

My search continues for something substantial to read from an economist in favor of the bailout. On TV, they all appear to favor it (using vague language and lots of clichés), but on the Internet, they all seem to oppose it (with compelling economic arguments).

I thought I might have found it when I ran across a dire quote from Nouriel Roubini in a newspaper, warning of economic woes to come. Then I went to his web site. It turns out Roubini recently wrote an article entitled “Is Purchasing $700 billion of Toxic Assets the Best Way to Recapitalize the Financial System? No! It is Rather a Disgrace and Rip-Off Benefitting only the Shareholders and Unsecured Creditors of Banks”. Heh! It turns out he's just generally gloomy and has been for years; it didn't start after yesterday's vote.

Reporters have done a bad job with this one. There's the usual ignorance of economics, but it's more than that. They seem to be caught up in the crisis atmosphere. They're not objective. Worse, they never seem to distinguish between Wall Street investors and economists, never mind hysteria and reason.

My Representative

I wrote to my Representative, Jim Cooper, and three days ago, he wrote back:

...I hate the thought of paying ransom to Wall Street, especially when Main Street is struggling. I am furious that our financial situation has been allowed to get this point, and that Treasury is considering bailing out the lenders who helped caused this to occur.

Then he voted for the bailout. According to this morning's USA Today, he said, “It's mainly political fear, the reaction back home. It's the most difficult time for people to be statesmen, 37 days before an election.”

12 August 2008

Squares

Yesterday, apropos of nothing, J. announced that 9 is not the only square number. 4 is, too. Even 1, he added. It turns out he didn't hear the phrase “square number” anywhere. He's just been playing with blocks.

Today I got out some extra blocks and showed him that 16 is a square number, too. He wondered, apparently at random, if 100 was a square number. So we counted out one hundred blocks and as it happens, it is.

He's been watching some math videos. We borrow one from the library each week. Last week's video, on division, explained that any number divided by 1 is itself. I doubt J. has any real conception of what division is and where it applies, but he liked that rule.

Properties of the integers, man. Before you know it he'll be telling me that 7 and 13 can't make any rectangles, except for long skinny ones...

24 July 2008

Last week I learned...

  • When volcanic eruptions created the island of Ferdinandea in 1831, it was quickly claimed by Italy, France, the UK, and Spain. While they were arguing, the little island eroded away.

  • How to put this? Language isn't what I thought it was. (This definitely falls into the category of thought-provoking stuff I won't pretend to understand.)

    A little background. Before your third birthday, you subconsciously achieved a thorough familiarity with the grammar of the language spoken in your home. Never mind that you said I've sawn instead of I've seen. That's small stuff. You knew when to use the and when to use a. You knew which of they run and they runs was right, that big green circle sounds better than green big circle, how to figure out what it means in context, and much more. You will never have conscious understanding of all the syntactic rules you already had subconsciously at three. No one does. Not even people who make a career out of studying exactly that.

    Linguists aren't dumb. Why is it that toddlers are able to do this amazing thing that all the linguists in the world, given several decades to work, can't do?

    Beats me, but there's something else kids do that's even more amazing. They invent grammar.

    I'm tempted to block-quote about a page out of this book I'm reading (Foundations of Language by Ray Jackendoff). It's fascinating stuff. “Derek Bickerton documents in detail that children of a pidgin-speaking community do not grow up speaking the pidgin, but rather use the pidgin as raw material for a grammatically much richer system called a ‘creole’.” If adults could do that, there wouldn't be a pidgin phase. The kids do it. Where does that come from?

    Communities can exist for millennia without developing writing. They don't go without grammatically complex spoken language. Hmmm.

    Even better, there's a school for the deaf in Nicaragua where the kids, unprompted, made up their own sign language. “Besides offering the wonder of a whole language coming out of nowhere, Nicaraguan Sign Language sheds some light on questions about creole. Evidently a community is necessary for language creation, but a common stock of pre-existing raw material is not.” I always assumed the syntax of a language like English comes together incrementally, over thousands of years. Shows what I know. It was probably invented in a single generation.

  • Pirahã, a language spoken by a few hundred people in Brazil, contains, according to Wikipedia, “two very rare sounds, [ɺ͡ɺ̼] and [t͡ʙ̥]”. In case you don't have the fonts I do, that first one looks like two upside-down lowercase rs with a squiggle underneath like a bird in flight, and an arc over the top; and the second one looks like tB with a dot under the B and an arc over the top. I wonder how they're pronounced.

  • In the version of g++ that ships on the Mac these days (GCC 4.0.1), you can get the old-school SGI STL hash_map container by doing #include <ext/hash_map> and using __gnu_cxx::hash_map;. But the GCC guys have already replaced the hash_ containers with newer standards-track containers, unordered_map and friends, which you can get in a more recent libstdc++.

  • In C++, a class's private members are not entirely hidden from code that uses the class. It's possible for public names to collide with private names. For example:

        class A { private: void f(); };  // This method is private, but its name matters...
        class B { public:  void f(); };  // ...because it'll conflict with this one.
        class C : public A, public B {};
    
        C().f();  // Error: request for member ‘f’ is ambiguous.

    Leaky abstractions make me sad. This doesn't seem to come up often in practice, but I think it's one reason STL implementations tend to contain lots of extra underscores. Another reason for that, as Blake Kaplan pointed out to me, is that a standard C++ program can do:

    #define n 3
    #include <vector>

    and the headers should be able to cope with that.
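For the private-name collision above, one workaround is to say which base class you mean. A minimal sketch follows; the return values, call_public_f, and demo are made up purely for illustration.

```cpp
#include <cassert>

class A { private: int f() { return 1; } };  // private, but name lookup still finds it
class B { public:  int f() { return 2; } };
class C : public A, public B {};

// Qualifying the call with the base class sidesteps the ambiguity:
// lookup starts in B, so A::f is never considered.
int call_public_f(C& c) {
    return c.B::f();
}

// Tiny usage example.
void demo() {
    C c;
    assert(call_public_f(c) == 2);
}
```

Another option, if you control C, is a using-declaration (public: using B::f;) inside C, which makes the unqualified call resolve to B's method. Either way the abstraction still leaks: the author of C has to know that A has a private f at all.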

23 July 2008

Stuff I learned recently

  • Twelve thousand years ago, a gigantic dam of solid ice blocked the Clark Fork River, creating Glacial Lake Missoula.

    The lake was almost 2,000 feet deep.

    And periodically the dam would explode, laying waste to parts of what's now Montana, Idaho, Washington, and Oregon.

    Thundering waves and chunks of ice tore away soils and mountainsides, deposited giant ripple marks, created the scablands of eastern Washington and carved the Columbia River Gorge.

  • Mendelssohn's Wedding March has about 50 times more notes in it than I had realized.

  • “Between 1958 and 1992, Russia dumped 18 nuclear reactors into the Arctic Ocean, several of them still fully loaded with nuclear fuel,” writes Scott G. Borgerson. The article also points out that last summer, “[f]or the first time, the Northwest Passage—a fabled sea route to Asia that European explorers sought in vain for centuries—opened for shipping.”

  • Calque is a loanword and loanword is a calque. (Source.)

  • Recent Linux and Windows operating systems implement address space layout randomization. The goal is to prevent certain security attacks that depend on specific code being in predictable memory addresses.

  • According to a 2005 research paper by Richard Haier et al., women's brains have about 10 times the amount of white matter related to general intelligence (that is, in areas whose size correlates with IQ) as men's. Contrariwise, men have about 6.5 times the amount of IQ-correlated gray matter. I find that pretty startling.

    Here are some of Haier's own words on brains and genes.

02 July 2008

What is a noun?

But what about earthquakes and concerts and wars, values and weights and costs, famines and droughts, redness and fairness, days and millennia, functions and purposes, craftsmanship, perfection, enjoyment, and finesse?

—Ray Jackendoff, Foundations of Language: brain, meaning, grammar, evolution

I learned in school that a noun is a word that names a person, place or thing.

A few years after that, the definition changed. In hindsight this seems creepy. It happened twice. I don't remember any explicit discussion or even acknowledgment of the change. We would do nouns one way one year, and when that time came around the next year, we would have different textbooks with a different definition. I remember the changes: first “event” and later “idea” were added to the list ...bringing nouns like earthquakes and purposes in from the cold, I guess. We regret the omission, etc.

I didn't know this until a couple days ago, but linguists apparently consider this whole approach to parts of speech hopelessly, fundamentally broken. Morally bankrupt, in fact. That a child is taught the “person, place or thing” definition approximately once every 12 seconds preys on the linguist's soul. It causes him to make awkward scenes at parties. Even the funny papers are bristling with painful reminders of this horrible truth.

I never noticed before, but there is a problem or two with this whole “person, place or thing” thing. All the most common words for people (you, I, he, she, they) and things (this, that, these, those, it) are pronouns, while all the most common words for places (here, there, in, out, up, down, to, from, and on and on) are adverbs and prepositions. All the other definitions I learned for parts of speech are bogus, too. I learned that “action words” are verbs; but homicide, defenestration, and touchdown are all nouns. (So is pirouette. My wife didn't believe me.) I learned that prepositions tell about relationships, particularly spatial relationships; but proximity and distance are nouns (and cover and surround are verbs!). I learned that words that describe properties of things are adjectives; but weight, beauty, shape, and color are nouns.

So what is the definition of a noun, exactly? Well, I'll tell you. I don't know. Strangely, I don't think linguists like to say! Here's a pretty good near miss by Geoffrey K. Pullum, writing in Language Log:

The way to tell whether a word is a noun in English is to ask questions like: Does it have a plural form (the terrors of childhood)? Does it have a genitive form (terror's effects)? Does it occur with the articles the and a (the terror)? Can you use it as the main or only word in the subject of a clause (Terror rooted me to the spot), or the object of a preposition (war on terror)? And so on. These are grammatical questions. Syntactic and morphological questions. Not semantic ones.

A bit vague, isn't it? That's way above average, though. Here's an honest attempt; it starts with “A noun is a member of a syntactic class…”. Until I edited it, Wikipedia's article on nouns started, “In linguistics, a noun or noun substantive is a lexical category which is defined in terms of how its members combine with other kinds of expressions.”

There's an interesting twist to how all this gets bootstrapped in the toddler brain. All the first words you learn are nouns, words for people and things in your little one-year-old world. You'll be able to put words together into sentences before you master any pronouns. That is, at the time when you're learning the basic grammar of the language, there is a semantic distinction between the nouns you know and all other words. The values and weights and costs come later.

14 June 2008

Barleycorn Bay

See earlier puzzles for, er, something of an explanation.

“Now listen close-like,” said my new friend, “'cos I'm only going to say this once. All the inhabitants of Barleycorn Bay, and I've met them each and every one, are either heroes or vagabonds. Or both. Every one of the heroes is blonde; every one of the vagabonds is a magician, except for any that be Quakers; and all the magicians are nanny goats. Every one that isn't a walrus isn't a ruminant.”

“Isn't a what?” I said.

“And every living soul in Barleycorn Bay that isn't clean-shaven is red-headed, excepting the nanny goats of course. Needless to say,” he added, scratching his beard with a steel hook, “there are no clean-shaven pirates.”

I thought it over for a while. “Is there such a thing,” I wondered aloud, “as a nanny goat that's also a walrus?”

“I reckon there could be,” he replied, “although I've never met one.”

“What about a blonde redhead?”

“Don't be ridiculous.”

01 June 2008

Facts

David Macaulay wrote and illustrated a book entitled Castle. If you have a child four or older, you have a perfect excuse to buy it. Mill is fascinating, too.

That “if Microsoft designed the iPod package” video was commissioned by Microsoft. To my mind, that makes it even cleverer.

In an underground passageway somewhere near Shinjuku, there is a machine that cleans your glasses.

A hot-air balloon that can carry 8 people costs “all your money”. Or if you prefer, about a hundred thousand dollars.

You can write a regular expression that matches only strings of a composite number of x's. (Hint: Too easy for a hint.)
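(Spoiler for the regex puzzle.) One possible answer is ^(xx+)\1+$: a group of two or more x's, repeated at least once more. Here is a sketch that checks it against plain trial division using std::regex. The function names are mine, and I'm assuming the default ECMAScript grammar, which supports backreferences.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// True when a string of n x's matches ^(xx+)\1+$, that is, when n can be
// written as (group length >= 2) times (repeat count >= 2).
bool is_composite_by_regex(std::size_t n) {
    static const std::regex composite("^(xx+)\\1+$");
    return std::regex_match(std::string(n, 'x'), composite);
}

// Reference answer by trial division, for comparison.
bool is_composite_by_division(std::size_t n) {
    for (std::size_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return true;
    return false;
}

// Compare the two on small lengths only; a backtracking regex engine
// gets slow quickly as the input grows.
void check_small_lengths() {
    for (std::size_t n = 0; n <= 30; ++n)
        assert(is_composite_by_regex(n) == is_composite_by_division(n));
}
```

Strictly speaking, the backreference means this is not a regular expression in the formal-language sense, which is why it can do arithmetic at all.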

I knew that Galileo discovered the first four of Jupiter's moons, the Galilean moons. I didn't know that the fifth known moon wasn't discovered until almost 300 years later, when Tennessean E. E. Barnard discovered Amalthea.

And it took me completely by surprise to learn that Richard Feynman ever worked at a computer hardware startup, Thinking Machines Corporation.

28 May 2008

Crowd vs. Committee

I just found this in an old notebook. Apparently I wrote it a couple years ago. Most of it seems to make more sense to me now.

Wisdom of Crowds | Design by Committee
Both: Participants may be biased.
Bias averages out | Bias creates “riders”
Not much work | Lots of work
No consensus required | Seeks consensus. Decisions may be postponed to avoid stirring up trouble.
Minority (“special”) interests can be publicised but are often ignored | Minority interests are not ignored
No experts: skepticism (Presumption is that a random individual is not an expert.) | All experts: openness
Lossy, mass communication (of arguments, etc.) | Tedious explicit communication
Both: No overarching design or uniting vision.
Nobody cares | Possibly competing visions
Simple output. | Unbounded complexity in output.
Immediate feedback. | Long-term, invisible feedback.
Individuals have low individual impact. | Individuals are influential.
Neglecting the topic somehow doesn't matter. | Neglect causes warts [that is, areas where the design is painfully bad. Ed.]
Product needn't be understood (markets) | Product is ideas.
Mechanism for approaching a good result exists (market; averaging) | Democracy (voting) and consensus are the only such mechanisms.
Interfaces are well-defined before work starts (ballot; prices) | Interfaces have to be designed.
Individuals can't introduce bureaucracy | Individuals sometimes manage to introduce bureaucracy

23 May 2008

Curlicues

J was writing his sister A's name for her on a piece of construction paper. J is 4 years old and A is 2, and somebody recently taught J that he can decorate his letters with outrageous curlicues. So J says, “Do you want me to put curlicues on it?” And A replies, in her tiny stern voice, “There are no Qs in my name!”

10 May 2008

Firefox 3

Firefox 3 is nearing release. Check out what's new, especially the Awesomebar, which has changed my life.

(Awesomebar itself is the work of superhacker Ed Lee, but it relies on Places, the new bookmarks and history system, 2+ years in the making.)

If you're interested in security, especially the difficulty of giving users correct, actionable security-related info at a glance, read about Firefox's new site identification and malicious site detection features.