Pedantry - Moved to http://pedantry.fistfulofeuros.net
Friday, June 20, 2003
Four Months of my existence
This scatterplot was built over 46618 aligned segments from commercial French-to-Dutch translations. The corpus is full of noise - English and Dutch segments passed in as parts of French documents and segments that were realigned by the translator so that the source no longer matches the target - but I made no effort to filter it out. It took 22 minutes to build the model from the underlying corpus on a Sunfire running Solaris 8. My maximum memory usage was under 400Mb and would have been a third as much if Allegro Lisp had a better garbage collection system. Processing the training data through my model took another hour. I have not yet tested the model using out-of-sample data - technically, I'm cheating - but in this case there are grounds to consider it reasonable behaviour.
In simple terms (denuded of any real value in reverse engineering my algorithm), the horizontal axis is how I predict translators will respond to the data I'm giving them, the vertical axis is how they actually responded. This way, I can predict how useful a translator will find an automatically generated translation before any human being has seen it. My average error is 0.17368491 - I didn't take the mean squared error on this run. I could improve my accuracy by adjusting the axes a little, but that would be data dependent and I couldn't extrapolate my adjustments to new data. The overall predictability holds for both English-to-French (which has a much smaller data set but has half the mean error) and French-to-Dutch bilingual corpora over several dozen translators. Considering that this is a very simple model of as complex and human a process as translation, I think it's pretty damn good.
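For anyone wondering about the difference between an average error and a mean squared error, here is a minimal Python sketch. It assumes the "average error" above is a mean absolute error over (predicted, actual) pairs, which the post doesn't actually say; the function names and the toy numbers are made up for illustration and have nothing to do with the real corpus.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over all pairs."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def mean_squared_error(predicted, actual):
    """Average of (predicted - actual)^2; penalises large misses more heavily."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical toy data: x-axis values (predictions) and y-axis values (responses).
predicted = [0.5, 1.0]
actual = [0.25, 0.5]

mae = mean_absolute_error(predicted, actual)
mse = mean_squared_error(predicted, actual)
```

The two numbers answer slightly different questions: the absolute version says how far off a typical prediction is, while the squared version grows quickly when a few predictions are badly wrong.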
Four months of my life went into that scatterplot. Tonight, I am going to get drunk.
Busy Busy Busy
Sorry, folks, I haven't blogged anything since Tuesday, and I have at least one e-mail about the on-going discussion about the Middle East that merits a considered response.
I've been fighting off vicious nighttime allergies and trying to retrofit a lot of my code for this "virtual corpus" algorithm I stole from Chunyu Kit that seems to help minimise my consing load. I'm also getting useful results from a sort of coverage metric that I came up with while on the toilet Wednesday instead of similarity coefficients like Jaccard measures and cosines. I'm using it in conjunction with a structure discovery algorithm I stole from some guy at Google and reimplemented in Lisp. Unfortunately, it's an inconsistent segmenter. I suspect, though, that there is a way of enhancing his algorithm's usefulness by using more information-theoretic considerations. There is a related non-incremental algorithm for doing just what I need, but Chunyu Kit doesn't post his code on the 'Net, so I can't just steal it, I have to reimplement it. It drove me nuts for a while wondering if I could find a version in English, since my Chinese is so not up to it. Fortunately, he covers the same materials in several other papers. Besides, I think I've got a faster way, although like Kit's algorithm it still implies doing Viterbi alignment. All this just to build a quality enhancement system for a computer aided translation system.
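For readers who haven't met them, the two standard similarity coefficients mentioned above can be sketched in a few lines of Python. This is only an illustration, assuming segments are compared as lists of whitespace-separated tokens; the function names are invented for the example. The coverage metric itself isn't described in the post, so it isn't shown.

```python
import math
from collections import Counter


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the shared vocabulary divided by the size of the combined one."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty segments
    return len(set_a & set_b) / len(union)


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between the two token-count vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty segment is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical segments, tokenised naively on whitespace.
seg_a = "the cat sat".split()
seg_b = "cat sat down".split()

jac = jaccard_similarity(seg_a, seg_b)  # 2 shared tokens out of 4 distinct
cos = cosine_similarity(seg_a, seg_b)
```

Both coefficients are symmetric and ignore word order: Jaccard looks only at which tokens occur, while cosine also takes into account how often they occur.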
If you started nodding off after the words "vicious nighttime allergies", don't worry. I work in an odd field full of odd people and odd ideas. Vincent is probably my only reader who's heard of most of that stuff. (BTW, Vincent, I withdraw everything I said about our speech recognition class. I'm actually using Viterbi alignment on the job. Who knew Compernolle's class would actually be good for something?)
At any rate, between work, being sick, and looking for an apartment I may be off-line for a few days, probably the whole weekend. I'll try to put up some more from Grandpa Dick this weekend if I get a chance, and get around at least to my e-mail. If I'm really sick though, I expect I'll be in bed reading.
Tuesday, June 17, 2003
What the World thinks of America
The above is the title of a programme on BBC2 right now. It is not a brilliant programme. Kegan is talking out of his butt, which is no surprise. The rest of the commentators seem to have little of interest to say, even Clare Short, who sounds like she's trying to run for reelection.
Too much of what is being said is canned noises of various kinds: America is a superpower and can hardly avoid being criticised, People want their countries to be close to America, but not too close, and It's all Bush's fault.
The thing I find interesting is that their poll claims that a majority of Americans do not feel their own culture is the best in the world. It's the only thing they've said so far that surprised me.
The only thing I can think is that maybe the idea that American food is McDonald's, American art is Hollywood, and American literature is Stephen King has actually sunk in. The thing is, it's not even true, and it would be unfortunate if Americans went from trying to defend the worst they have to offer to believing that that's all they have to offer.
It's certainly true that Americans are more aware of global culture than they used to be. When I first lived in the States, foreign food meant Chinese food (and not very authentic Chinese food either - in New Jersey in the 80's it was mostly ketchup Cantonese), foreign films meant inaccessible art films, and foreign literature meant Margaret Atwood.
American public culture isn't crap. You should see Belgian television. McDonald's may be bad for you, but how people here can snarf fries and sugar waffles and not weigh a ton is something I just don't get. And don't get me started on how hard it is to find good hot-and-sour soup or a decent bottle of teriyaki.
The other thing that bothers me is that they've talked about how America's relative prosperity is seen. Unsurprisingly, most of the world seems not to want their countries to copy American economic policy and think their own lands are better than America, while Americans seem to think their system is pretty good and that everyone wants to live in the US.
The simple truth is one thing no one seems to want to say: nearly everyone in the world has no desire to be American or to be like Americans. Nearly everyone wants what America has - or at least what they perceive it to have - namely, money and cool stuff.
Update: Added a sentence to reinforce something that seems to have gotten dropped - that I don't think American culture is all schlock. I said I needed an editor.
Monday, June 16, 2003
Once around the blogosphere
Conscious that Europe is a continent that has brought forth civilisation; that its inhabitants, arriving in successive waves since the first ages of mankind, have gradually developed the values underlying humanism: equality of persons, freedom, respect for reason,
It's a bit Eurocentric, with that "brought forth civilisation" stuff, but that's understandable in a preamble. Now, I suppose, comes the haggling. It's not 1776 anymore, you can't just write good stuff into a constitution and run with it.
Mea culpa: Okay, the US constitution dates to 1787 rather than 1776, and wasn't composed free of debate.
Conjectures and Refutations
(although actually in the opposite order)
You know, I'm always amazed at what I write that gets linked to. It's rarely the stuff I expect. My post on collectivism and responsibility - which I thought wasn't even terribly provocative - seems to have garnered a surprising number of links, from the laudatory (Silentio, The Green[e]house Effect, Alas, a Blog, the watch) to the more critical.
Among the latter is a post from Armed Liberal on Winds of Change, which has missed the whole point that it is not the Palestinian Authority I am exonerating but the Palestinians. I don't know to what degree the PA shares responsibility for the current mess, although the efforts I've seen seeking to lay all the blame on it leave me cold. There's more than enough blame to go around on both sides of the Green Line. My choice of Weber was to make a really somewhat secondary point: without power or any prospect of establishing a state in the Weberian sense, they neither can nor have they any reason to try to rein in terrorists.
Organisations like Hamas - I'm thinking the IRA, ETA, Corsican nationalists, the various European radical leftist groups, even right wing terrorists of the kind Turkey used to suffer a lot from - are almost impossible to defeat by direct intervention when they have even moderate public support. Killing one just makes more volunteer. They run into trouble when the people they claim to be fighting for come to the conclusion that the terrorists are interfering with their basically okay lives. Hamas will have trouble recruiting when the Palestinian public has jobs, schools and decent lives and sees Hamas as a threat to those things. It will not happen sooner, no matter how much force Israel uses. That is the clear lesson of successful anti-terrorist campaigns.
One of the commenters at Winds of Change thinks the PA ought to invite outside parties to come into the West Bank and Gaza to help rid themselves of Hamas who would be, as I pointed out, an even greater threat to a real Palestinian state than to Israel. I don't have a link, but BBC World coverage claims that the PA welcomes Kofi Annan's efforts to bring a UN force into Palestine. The PA seems not to have a problem with outside help in ridding it of alternative organisations, they just have a problem with Israeli "help." I think that's understandable.
Others are less sanguine. I do not imagine that the creation of a Palestinian state will magically end divisions, violence, or terrorism, in Israel or elsewhere. Rather, I think there is no prospect of it even abating over time without some settlement which offers Palestinians dignity, institutions they can call their own and prospects for a much better future.
The big surprise, however, has been the response to my post on how I ended up in Belgium, which I sort of just fired off in response to an e-mail. Jeremy Osner has put up a three part post (1, 2, 3) covering the same sort of ground in his life. The thing I find remarkable in reading it is that he feels he made a mistake in failing to plan for his future in college. Funny thing, I always felt my mistake was to ever have tried to plan for my future in college.
You see, I was fifteen when I graduated from high school, and at that age I already knew what I wanted to do with my life: I was going to be a high energy physicist. Why? Well, I was something of a geek in high school. I was salutatorian at fifteen - I would have been valedictorian except for one senior who took all easy classes her last year because she had already had early admission to Vassar - and I was the smart guy who always screwed up the grading curve. When you're a confirmed geek, not being a geek doesn't seem like an option, so your only hope for self-respect (or to get laid) seems to lie in being an übergeek. This was the mid-80's - nobody had heard of Bill Gates yet - so becoming a computer billionaire was not an option I was aware of. The only road to being an alpha geek was science, and physics was understood to be the hardest and most respected science of them all.
From the perspective of age 32, this all sounds so silly. Even at 15, you couldn't have ever got me to say anything like that, but this description is not too far from my actual reasoning. At fifteen, I felt so much like I urgently had to make these permanent, life-altering choices and the hardest of hard sciences just seemed like an obvious choice.
No one under the age of 25 should have to make life altering decisions. Physics was a big mistake.
I was soon 16, and just about everything in the world mattered more than working hard and getting good grades. I had okay grades the first year, mediocre grades the second year, and the third year I went to France and studied no math or science at all. My fourth year was a disaster, but I passed, barely. If I had been less young and stupid, I would have had the good sense to quit physics and find something I enjoyed, but no. I was stubborn and I fast-talked my way into the Université de Montréal. In the early 90's in Canada, there was a $2000 scholarship for any student with an English language education to go to Quebec and study in French and that pretty much paid for my first semester in Montreal.
Montreal was the most wonderfully liberating place in the world. I was barely 20 and I discovered that my Belgian last name and Alsatian-accented French made me a mildly exotic European. I was free of my parents and free of my Mennonite undergraduate school. I had my own apartment, I set my own hours, I met girls and I could legally drink.
My first semester of graduate physics, I had a GPA of 0.5. That was what it took to get me to do the right thing and stop with physics. I was scared of tensor calculus for years afterwards, only curing myself of my phobia by taking robotics at Stanford years later.