Python Corner Has Moved
Jan 14 2008, 17:43 EST
I never got around to adding an RSS feed or archives to this section of the site so instead I started a python blog at blogspot. A bunch of other python blogs are hosted at blogspot so it couldn't be all bad.

A couple links for non-engineers
Nov 09 2007, 16:31 EST [updated Nov 13 2007, 12:03 EST]
I'm putting these here so I don't lose them. Both are a good introduction for non-software types to get a feel for technical products.

The first is Technical Debt. The second is about bit-rot. The two concepts are related and not something non-software people always understand. If something works why would it ever stop working? If feature N was easy why is feature N+1 hard?

ICFP 2007 Screaming C
Aug 13 2007, 00:46 EDT [updated Aug 13 2007, 02:07 EDT]
This is post is quite geeky. For the general overview about beers and BBQ see my ICFP social post

My Python implementation of the dna muncher was much too slow (25 iterations/sec). The full run requied 2 million iterations so that just wouldn't cut it. A first try at a C-based implementation was even worse (10 it/sec). I knew that using ropes was the way to go but I couldn't _find_ a library. It turns out that "crope" is a more common term than you think so the google search is very cluttered. Mainly I was disappointed that my C chops were so rusty. Thankfully my Python interpreter patches get peer reviewed by people who have forgotten more about compilers than I ever knew (rarely do they have to change my stuff but when they do it is magic).

Some lessons I [re]learned: any call to str* is deadly and the C compiler has no memory. If you repeatedly call strcat(a_string, b_string) it will get slower and slower because it will have to find the end of a_string every time even if it just found it the line before. Assert()s with a call to strlen are equally deadly. You wouldn't write a loop with 10 million iterations but a call to strlen on a 10 meg string is just as expensive.

Even without using ropes I was able to get 500 it/sec out of my C version using regular string by fixing obvious ineffeciencies and using a linked list of char * pointers. At the end of every iteration the new DNA sequence was assembled by smartly NOT moving the biggest unchanged part and memmoving() the smaller chunks around it.

I spent 50+ hours after the contest on a truly decent version which can be downloaded here. It runs at a blistering 200k it/sec - three orders of magnitude faster than my first try. The speed is very compiler (and platform?) dependent. My 1.4Ghz Celeron linux laptop with gcc 4.x runs the full endo.dna sequence in 10 seconds but my 2Ghz Althlon X2 windows desktop with gcc 3.x takes 35 seconds - the same as the cheesy laptop with profiling enabled!

The balls of the program is a refcounted string "chunk" struct coupled with a plain linked list "cat" struct which is just a concatenization of pointers to chunks (I picked refcounting over genrational GC because it fit the problem. The debugging output does a mark and sweep to ensure the numbers match so if that floats your boat turn it on). This actually took me some time to figure out. My first attempt tried to use a single structure to hold both kinds of data but the problem really has two kinds of data. One is a list of "I care about chars X:Y in this chunk and see the next guy for more info" and the other is "I am a string of len X and Y people use me." In the source you'll see the refcount is actually in a variable called "usecnt." Easier to show than to say so here they are:

typedef struct chunk {
  char *chars;
  int len;
  unsigned int usecnt; // how many ->uses pointers point to us, any number
  char *purpose;
} chunk;

typedef struct cat {
  struct cat *next;
  struct chunk *uses;
  char *chars;
  int len;
  char *purpose;
} cat;
The "purpose" attribute is in there for debugging. It is the name of the function that created the cat or chunk. This was very helpful for debugging.

Each cat points to one chunk and usually to the middle of one so we need the "len" attribute to know when to stop reading. New chunks come into existence from the nat() and protect() functions which do transformations on the DNA.

Using these structures catapulted my speed into the 100k it/sec range and took the time for a full run down to 20 seconds. This was after the contest and I can be quite obstinate so I wanted more. Almost none of the time was taken up by strstr(haystack, needle) searches that contestants said needed KPM and other fancy stuff ("premature optimization is the root of all evil"). Most of the time was being taken up by traversing long cat lists, accessing the Nth char of the string, and doing simple bookkeeping in newing and deleting structs.

Keeping a reuse list of already malloc'ed sturctures helped greatly with the bookkeeping overhead (this I learned from reading Hettinger's code). Instead of free'ing structs keep the last N around and just reinitialize them and return them like they were brand new.

Performance really suffered when the linked list of cats grew ("real" ropes use binary trees so I was paying a heavier penalty). From the beginning I had capped the length of cats by calling "do over" whenever the list grew too big: just copy everything into one big chunk referenced by one cat and then move on. But "new_cat" was getting called 80 million times in a full run so keeping the number of cats small or the speed of "new_cat" low was a big priority.

Trimming time for a call to "new_cat" was a bad deal. You can do it easily by not setting everything to NULL/0 or leaving out the "purpose" entry but that greatly hurts assertions and debugabilty (we are re-using structs so not initializing them means that most still point to valid structures). So the answer is to drop the number of calls to "new_cat" which is a little trickier.

Operations on big strings are expensive and correspondingly operations on small strings are cheap. To keep the number of cats down it is cheaper to copy a one byte string onto an existing chunk that it is to malloc a whole new cat that references a single byte in another existing chunk. By trial and error I found that anything in the size 10k to 100k is cheaper to strcat() than add an item to the cat linked list. This somewhat complicated optimization brought the cost of new_cat down from 80M calls to less than 50M.

Finally I changed my "do over" routine to only attack the small chunks instead of all of them. Because it only composited small cats and chunks it was much cheaper to run. Because of that I could run it more often to keep the list of cats even smaller. In a standard endo run of 1.9M iterations it gets called only 101 times but makes the overall speed much faster (and, umm, not coredump).

This approach is pointed at a particular problem and will have drawbacks if used more generally. It never frees a chunk even if only 1 byte of it is used and it weighs 10 megs. For this task which has a high chunk turnover the wastage memory never gets above three times over the minimum (60M for 20M worth of real data). You can keep the ratio tighter by tuning the cat concatenation but it wasn't needed for this problem so I didn't do it (it hurt speed).

My final and fastest product would not have been ideal for the contest even if I had finished it in time. The optimizations meant that I didn't keep track of the original position of DNA in the trace so it is less useful in investigating the problem. My only goal was to go into gear head mode and make an interpreter that would be slick as shit. I'm frankly thrilled that it can do a 10 second run on my Celeron laptop but unfortunately I know five guys that could cut that number in half with a few tweaks. And I'm adult enough to know I will never match them. Such is life.

Errata: Valgrind is your friend. I cannot overemphasize this enough. gprof is your friend, sometimes. In order to get library calls included in the output you will need to write thin wrapper functions. My source includes a bunch of wrap_str* and wrap_mem* functions for that purpose.

See also ICFP social stuff.

ICFP 2007
Aug 10 2007, 23:40 EDT [updated Aug 10 2007, 23:48 EDT]
Like past years bobsalive and I did the ICFP contest right. Our purchase list consisted of Omaha steaks, Akona coffee, Yeungling larger, and Jefferson's Reserve burboun (and a liberal sprinkling of eggs and bacon). Grads students may kick our ass in the final tally but we'll eat and drink better getting there. We've done the ICFP as a BBQ/beers/hang out for the past five years with varying levels of seriousness and it always turns out to be a good time.

Bob is a singer who dances - an Electrical Engineer who does some programming. I'm a dancer who sings - a Computer Engineer who has an oscillascope but doesn't use it. The straight programming aspect falls on my shoulders with Bob as a gut check to keep me from doing anything stupid. The easiest thing to do wrong is to misinterpret the problem so two heads are much much better than one.

Unfortunately for us this year's task favored large teams (like last year). It had a strong "hunt the wumpus" component where following clues meant as much as decent programming. We found the first clue by chance: our image rendering program did the "composite" operation back-to-front so the wrong bitmap was displayed. The organizers had left a clue that was supposed to be immediately over written but our error made sure it was pushed to the top every time.

Unfortunately like last year Python just wouldn't cut it for the implementation. Some years the problem lends itself to rapid development and flexibility and python shines. Some years the problem requires bare knuckle speed and C is the only way to go. We had a working implementation in just a couple hours (good) but it only ran the simulated language at 25 iterations per second (bad). The provided program was 7MB and required 2 million iterations to run so our version took hours to do one run. Other teams were claiming 20k-40k iterations per second so I knew we could do better.

I rewrote the interpreter in C and it ran at -- 10 iterations per second! And it was buggy. Sh*t. I managed to up that to 250 iterations per second but our weekend was essentially over. We had a whole day left but were out of steam. The "Sealab Technicians" finished at 150 out of the 350 teams that actually submitted scores.

To add insult to injury bobsalive smashed up my car Monday morning while leaving his Jeep unscathed. He honked his horn and shouted "screw you and the horse you rode in on!" or maybe it was "sorry, there isn't usually a car there. Let me know about the door."

Will we do it again next year? Hell yeah.

This was the social post about the ICFP. I'll have a couple follow ups about the geekier bits. One will be a contest HOWTO and the other will describe the implementation of the interpreter I did after the contest that runs at a blazing 200k iterations per second (1000 times faster than my first attempt). I don't write C much anymore and my ego wouldn't allow me to stop with a half assed implementation.

Here are two of my older writeups (I don't do them every year). 2002 and 2004.

Recruiting Horror Stories
May 24 2007, 19:12 EDT
The.codist reminisces about bad recruiters. My personal "bad recruiter" story happened to me around 1998. The recruiter was clearly a slime ball. First he demanded I sign a contract with him that said if I quit my job in the first three months before his commission vested I would have to pay him the commission out of my own pocket. I explained to him I would sign no such thing because it removed his incentive to find me a job I could stand for more than three months. Finding me a job I wouldn't immediately hate seemed like a pretty low bar for him to clear.

After we got through that argument (I was young, today I would have already walked out) he asked me if I "could pass a drug screening test." He didn't ask if I did drugs he just wanted to know if I could pass a tox-screen. To drive the point home he then added "because if not we can run down to CVS and get you something to flush your system." I had long hair back then (it was the ninties!) so he repeated the question a few times.

I did end up getting a job through him and started my first non-contract gig out of college at MusicBlvd.com (which is long since defunct). They were growing like crazy so the HR paperwork was very spotty: I had to fill out a W-2 right away but it was more than a month before I was asked to sign a random drug screening consent form (and I never had to sign a non-compete). I laughed and laughed. You see the company was heavilly staffed with would-be musicians and the executives were would-be music moguls. If there had been a company-wide "piss test day" most of customer service, a chunk of tech and marketing, and the entire executive team would have shown up positive for a swath of drugs with slight variations based on the employee's salary.

This was the go-go nineties so recreational drug use was the least of the company's problems. There was a culture of playing office; it was concentrated in the non-tech divisions but even tech suffered from it somewhat. True story: there were enough conference rooms in the building for fully half of the employees to be in conference at any given time and you had to book a week in advance to get a room. A year later musicblvd.com was redirecting to cdnow.com, a year after that cdnow.com was redirecting to bn.com, and today both will send you to amazon.com. Who would have guessed?

A quick trip to Microcenter
May 11 2007, 20:12 EDT
I was in the car going to Microcenter when I realized I was a walking faux pas. Like going to a concert wearing the band's T-shirt I was wearing a Python baseball cap and a VA Research (RedHat circa 1995) T-shirt[1]. Oh well, I don't think anyone held it against me.

The only thing I needed to buy was a copy of MS Office (I can't get the publisher's templates to work with OpenOffice). Being a guy in a computer T-shirt and hat [and birkenstocks] I couldn't avoid browsing a bit[2]. I picked up two USB Wifi adapters to de-wire a desktop and my X-Box. 2.5" hard drive enclosures were on sale for $8 so I bought a couple of those for some old laptop drives I have in a drawer. Finally I bought a spanky new leather laptop bag. My old bag is seven years old and held together with zip ties.

So it's back to getting some chapters into a format the publisher wants. I've been doing them in plain text on my linux laptop which has some advantages: I can use my regular editor [emacs] and I have a little script to test code blocks.

[1] I really need to do laundry. Many of my friends have never seen me in a T-shirt (or jeans!).
[2] I've never been in a Fry's but Microcenter is the best computer store I've ever been in. Tons of high end home-brew parts and a decent book selection.

Python TODO List
Apr 09 2007, 14:13 EDT [updated Apr 11 2007, 21:16 EDT]
Brett Cannon just posted his TODO list which has prompted me to update mine [update Doug Napoleone makes it a trifecta]. The list is depressingly similar to the one I posted four months ago. If anything it is now longer. Quiting my company took far more time than anyone envisioned but I'm done as of last week and have nothing but "free" time. This TODO represents 90% of my whole life TODO for the next three months.
  • Book book book. Short term goal is one chapter a week, accelerating to two.
  • Class Decorator PEP and Patch. I haven't worked on it since PyCon and the patch is now out of date. I also need to check in the PEP
  • str.partition() patch. Woefully out of date. Still needs more tests
  • probstat[1] upgrade. It was one of my first C/Python modules and needs to be brought up to snuff.
  • EuroPython talk. I need to think of one, propose it, write it, and practice it in front of Boston-PIG. My PyCon talk sucked[2] mainly because of nerves and lack of live practice (mirrors don't count!).
  • Move and improve the blog. This is a tiny [400 line] custom Twisted app that runs on company servers. The last time the process was restarted was three years ago when I added the Python corner (103 minutes of CPU consumed since). Only the last X python posts are reachable via navigation, the rest you have to guess (e.g. jackdied.com/python/1).
  • Book, book, book. I can't start my next venture until I get this one out of the way so it needs to be done.
  • [1] The first user submitted patch for probstat was from John Hunter who later went on to publish the hugely useful and widely used matplotlib. It was also his first fore [ed - first first? redundant] into open source.
    [2] A few strangers did thank me for the talk and asked for the slides. Most of the feedback was from friends and was translated into "you'll do better next time" or "hey, no one booed" (spoken as "that was OK for your first time" or the obtuse " the content was good"). For me it was a perfect storm of nerves - my fight or flight reflex was already pegged at maximum from my day job at my company . Add that I was in the middle of quitting and doing transition work and lawyer stuff. Add that I hadn't spoken in public in seven years. It was a bucket of suck as evidenced by the fact that it was one of only three talks that no PyCon attendee listed as their favorite (err, one of six talks with zero favorites. You can probably include the ten with one vote as ballot stuffing too. Regardless, it was in the lowest quarter.). Next time I'll kill, I promise.

Python in demand at Digg
Mar 09 2007, 15:09 EST [updated Mar 09 2007, 21:43 EST]
From the current listing for Senior Systems Engineer (which looks like a sys-admin job to me) is this requirement:
  • Proficiency in scripting/programming in any of python, perl, python, C, or C++.
[emphasis mine].

update: from the digg comments (someone else noticed)

    python, eggs and spam. python, ham, python, eggs, python and spam. python, python, bacon, python, eggs, python and spam.

Meetup.com trading on my good name!
Mar 08 2007, 20:23 EST

and yes, I was vanity googling...

Back from PyCon
Mar 03 2007, 13:28 EST [updated Mar 03 2007, 13:57 EST]
Flight was delayed before leaving Houston due to weather in Boston. Got in around midnight. It could have been worse.

A good con. Four days of sprinting was too long. The population of sprinters dropped by about one third a day so on the last day there was almost no one.

I left my car parked on the street and it wasn't towed, which is nice. There was at least some snow in Boston but not enough for a "snow emergency." If an emergency is declared they tow all the cars on one side of the street to make plowing easier (you have to move to a muni parking lot or school playground for a day). Technically you have to move your car every 48 hours but in Somerville this is rarely enforced. An alternate theory is that I have an angel on my shoulder in the form of the ladies down at the office of Traffic and Parking. I'll write up the full story some day but the short version goes like this:

    Me: I'm here to pay my tickets.
    Ladies: You total due is ... oh my god.
    Me: [present $2700 in cash]
    Ladies: Why aren't you yelling?
    Me: I can, would it help?
    Ladies: No .. but why are you smiling?
    Me: It can't hurt.
    Ladies: I have a niece about your age...
I'd like to say thanks to all the sponsoring orgs. The PSF and organizers who did the yoemans work. Also to EWT, CCP games, Tummy.com, and Google who helped make the con social by sending contingents and sponsoring off-book events. Also to the Twisted and Canonical guys who had devs with strong opinions that made post-dinner converstations more interesting.

My post-con TODO list includes getting some of the Boston Twisted guys into Sunday bowling and dropping by the Tabblo office with a case of beer in exchange for the nickel tour.

PyCon Sprints Day 2
Feb 27 2007, 22:17 EST [updated Feb 28 2007, 14:48 EST]
After reading hundreds of emails in dozens of threads spread over several years I've written the class decorator PEP and it is _short_. It is amazing how little was said using so many words.

Those damned googlers implemented nonlocals which somehow broke my (not yet checked in) patch. It doesn't conflict with my patch so I'll have to figure out where the bad interaction is. Or maybe I changed something after I had all the tests passing. Dunno.

Tomorrow I'll publish the PEP, post the patch to sourceforge and then update PEP 306:"How to Change Python's Grammar" which is years out of date. So are many of the comments in compile.c (referencing removed functions) but I'm not familiar enouogh with the guts of that to clean it up.

Update I was working with an old version of my patch. I had three copies of the svn tree around and just forgot which one was freshest. Problem resolved.

PyCon Sprints Day 1
Feb 26 2007, 22:06 EST
I slept in to catch up on all the rest I missed during the conference proper. Most of the python-devs are working on py3k issues. The twisted guys are doing their own thing. Django has about ten guys sprinting locally and some more telecommuting in. OLPC has a about eight people though I'm not sure how many work on the project versus hobbiests just pitching in for the day. Jython has at least one core developer and several attendees pitching in.

I worked on class decorators. The patch is finally done but I still have to write the PEP. I don't expect any new resistance to the PEP (the arguments have been hashed out numerous times) so 2.6 and 3.0 should hopefully have class decorators by the end of the week. Finally! It is two years after I first submitted a patch but this time there is much broader support.

PyCon Conference notes
Feb 26 2007, 21:54 EST [updated Feb 26 2007, 22:12 EST]
Attendence was 593 this year, up 40%+. I can't say much about the content of the conference because I didn't go to many talks. I did learn a lot from just hanging out in the halls.

My talk "Writing your own python types in C" went OK. 150 people is more than I've spoken in front of in a long time, if ever. Some technical glitches and being the first talk on the first day compounded my nervousness. Next year I'll do a better job.

Lots of companies were here recruiting though with varying degrees of umph. When all the googlers are hanging out in the EWT suite it is a sign that maybe google should have one too. The CCP guys donated some beer on one of the nights but failed to mention they had an open tab at the bar for the whole week (they forgot because they were hanging out in the EWT suite as well). Tummy.com organized pool side pizzas on the last night.

At least for the companies I just mentioned this was all very non rivalrous (heck, the Tummy party included leftover beer from EWT). Many of the individuals representing the companies are python-devs and knew each other before they started their current gig. Mainly it is that orgs using python know that broadly supportive is a better strategy than trying to fight over developers in a zero-sum game. I was just surprised that most of the sponsoring companies spent thousands of dollars sending representatives and printing T-shirts and then stopped short of spending an extra few hundred a day on food and beer. Go figure (though the free T-shirts were popular. I don't understand that phenomenon).

Sprints started this morning, I promise to blog more about that than I have the conference.

Oh, and the Bowling BoF was a bust. Roberto de Almeda was the only one to sign up. Hopefully next year I'll be a sponsoring bowling dotcom and can do more to promote the idea than put up a wiki page.

YouTube runs on python
Dec 12 2006, 13:01 EST
So says Guido in a post on python-dev.
    And I just found out (after everyone else probably :-) that YouTube is almost entirely written in Python. (And now I can rub shoulders with the developers since they're all Googlers now... :-)
I certainly didn't know that.

My Python TODO list
Dec 08 2006, 14:10 EST [updated Dec 08 2006, 14:12 EST]
  • Spend a few hours fleshing out my PyCon talk "Writing Your Own Python Types in C." I already know what I want to say but I need to figure out how best to say it. A day of trial an error should give me some ideas before I sit down and write the first version.
  • Class Decorator PEP and patch. A couple weeks ago I promised the BDFL I would do this one.
  • str.partition() patch. I did a first pass months ago but I have to go back and finish writing tests for the places that don't have them. The behavior of rpartition() was also changed after I did the first pass so I'll have to recheck those few place where it is used. I'll also have to remove some parenthesis to fit the style guide. I always use them in "If"s which is non-standard.
  • My probstat extension module needs some love. It was written several years ago and while it does work the code isn't very pythonic. It was written as a wrapper around a generic implementation. No one asked for the PerlXS bindings or that it be ported to PHP so I've started a rewrite to make it CPython from the ground up.

Archives
  • None Yet

0.03 seconds
jackdied.com 2003-2007