29 September 2004

Research, Intellectual Property, and Innovation

There is an interesting article (“Grant Givers Turn More Demanding”) in today’s Wall Street Journal about the way that grant donors for medical research are forcing research scientists to collaborate with other grant recipients to work more efficiently toward effective treatments for targeted conditions. This apparently is a shocking change of pace for medical researchers, who are used to the rather slow process of making a discovery, confirming it with further tests, writing a journal article, submitting it for peer review, revising the article to address criticisms, resubmitting it, waiting for acceptance, and waiting for it to be printed. Only then does the information become available to other researchers. This obviously isn’t an ideal model for rapid development and discovery.

This ties in a way with an article a friend pointed out about how the bubble proved small teams tend to be more efficient for accomplishing tasks. I think the combination of these two suggests there is certainly a way to do the small team concept the wrong way. That is, with the balkanization of small teams comes a huge reduction in efficiency. While the small teams may be very productive internally, the lack of intellectual discourse on the subject can become a substantial obstacle to boosting that productivity to the next level and a huge obstacle to innovation. You get a vast amount of duplicated work where there isn’t sufficient communication and shared effort.

The open source community tends to have an aversion to this sort of secrecy as well. There was a great deal of anguished discussion in the FreeBSD project when development was taking place in a Perforce environment that wasn’t accessible to some of the developer community. This discussion, and the fallout of having “owners” of subject areas within the project, wound up driving talented developers away to form their own projects when the “owners” weren’t receptive to outside input and work on “their” property that was held in quasi-secrecy. It is interesting to note that by keeping their projects more open, some of these forks have wound up producing a lot more and being more innovative than their slower-moving cousins.

There is probably a middle ground somewhere which rewards the discoverers appropriately, but doesn’t derail the innovation by having one discoverer try to solve the whole problem from beginning to end. It is good to see some important medical research moving in this direction though, by having multiple small teams of interested parties building off each other’s work. It is probably a model that could be emulated with a great deal of success elsewhere.

28 September 2004

WordPress Filters Slowly

For a while now I’ve noticed that the front page on my site has kind of a long load time. It’s visible at the bottom of the page source and tends to be just over a second. Serving one page a second is not really acceptable for any sort of load; the server would peg its CPU before managing to saturate even the measly DSL line we serve from.

After wrestling with the mess that is PHP in the FreeBSD ports system, I got a profiling tool installed to see what was going on. The results aren’t too pretty.

       Real         User         System              secs/   cumm
%Time  (excl/cumm)  (excl/cumm)  (excl/cumm)  Calls  call    s/call  Memory Usage  Name
100.0  0.00 1.64    0.00 1.21    0.00 0.19        1  0.0001  1.6418             0  main
 56.0  0.00 0.92    0.00 0.76    0.00 0.06       46  0.0001  0.0200             0  apply_filters
 52.7  0.00 0.87    0.00 0.71    0.00 0.05        5  0.0003  0.1732             0  the_content
 49.2  0.00 0.81    0.00 0.66    0.00 0.05        5  0.0000  0.1616             0  textile
 39.1  0.64 0.64    0.62 0.62    0.02 0.02      705  0.0009  0.0009             0  preg_replace

The Textile filter is taking a whopping 0.81 wall clock seconds for the 5 calls to it. It is also responsible for a significant number of the preg_replace calls. The solution to this is not to filter every time you display something, but to save it pre-filtered first. There is even a nice database column “post_content_filtered” for this use. Too bad it’s not used right now.

Given the nasty performance characteristics of these WordPress filters, it’s small wonder that blog sites tend to be easily “slashdotted” off the air. While one should certainly avoid premature optimization, this seems like a place where optimization is a bit overdue. Nothing like one more pet project…


  1. Save a copy of the filtered text to “post_content_filtered” on each saved change to the post

  2. Rebuild this cached filtered text if “post_content_filtered” is empty when we look at it

  3. Provide a mechanism to invalidate the cache, or in simpler terms, empty the filtered copies so the next time they’re looked at they’ll be refiltered by the previous item.

This approach seems to be in line with other suggestions and at least one other implementation. Deep in my heart I wish the trigger was if post_content_filtered was “null” in the SQL sense rather than just an empty string, but that would require a schema change and probably won’t be accepted by the WordPress folks. The main integration concern, I think, is going to be properly filtering for some uses of “the_content” and not others. There will probably be a “the_content_static_filter” or some such for things that want to be processed and added to the cache, and then the normal late filter, like the existing one, for everything else.
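
The three steps above can be sketched roughly as follows. This is Python rather than WordPress’s PHP, with a plain dict standing in for the posts table. The function names and the toy filter are made up for illustration; only the “post_content” and “post_content_filtered” column names come from WordPress.

```python
import re


def slow_filter(text):
    """Stand-in for an expensive filter such as Textile: *word* -> <strong>word</strong>."""
    return re.sub(r"\*(.+?)\*", r"<strong>\1</strong>", text)


def save_post(posts, post_id, raw):
    """Step 1: store the filtered copy alongside the raw text on every save."""
    posts[post_id] = {
        "post_content": raw,
        "post_content_filtered": slow_filter(raw),
    }


def get_filtered(posts, post_id):
    """Step 2: rebuild the cached copy if it is empty when we look at it."""
    row = posts[post_id]
    if row["post_content_filtered"] == "":
        row["post_content_filtered"] = slow_filter(row["post_content"])
    return row["post_content_filtered"]


def invalidate_all(posts):
    """Step 3: empty every cached copy so step 2 refilters it on the next view."""
    for row in posts.values():
        row["post_content_filtered"] = ""
```

With this shape, displaying a post costs one lookup unless the cache was just emptied, and changing the filter only requires calling the invalidation step once.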

Interaction with the RSS feed and comments should probably be addressed here at some point, but I imagine I’ll figure that out as I go along. I already know that the Textile2 filter is not being applied to the RSS feed, so there may be some adapting in order to fix that anyway. I have to admit I’m really surprised by the lack of commentary on this issue when I did some cursory searching online. Either my search terms were poor, or typical bloggers don’t really care about performance much.

05 September 2004

Patch to add Nikon "MakerNote" Support to jhead

About a year and a half ago, I wrote the author of jhead with a nice patch that fetched the extra EXIF information in the “MakerNote” section supplied by Nikon digital cameras. One would think that obtaining this extra information, above and beyond what is specified in the EXIF standard tags would be useful. Especially given that Nikon has around a 15% share of the digital camera market, and a larger percentage of the digital SLR market. Instead I got a curt reply from the author…

My philosophy has been not to implement maker note. If I start including maker note, the program becumes MUCH bigger, because each camera has its own set of maker notes. Plus, these will contain fields that other cameras don’t even have. And that makes testing and maintaining the program harder.

The solution is not to buy cameras that include lots of proprietary tags.

I can understand not wanting feature creep in the program. As far as compatibility goes, I had gone to some pains not to create conflicts with working files; frankly, the code needs a quick automated test set anyway. And that last part, well, that’s great advice. I shouldn’t buy a camera that includes any information above and beyond the standard. Knowing what lens was used for a given picture is not at all valuable information I’d want to have when viewing pictures. Thanks for that!

As it turns out, now 16 months later, the most recent version of jhead does have support for “MakerNote” information but only for Canon. The inner conspiracy theorist in me wants to think this is part of yet another Canon vs. Nikon squabble, but never attribute to malice what can be adequately explained by apathy.

So, here is a patch I’ve written that adds back the support I once had. It is more involved than the perfunctory Canon support that the author has written, since Nikon has an “EXIF-in-a-EXIF” format for their MakerNote data and I tried to reuse the EXIF parser rather than copy the code. Let me know if this code proves useful to you.

D100 I

  • Lens

  • Focus Mode

  • Auto-focus Position

  • White Balance Name

  • White Balance Bias

  • Flash Setting

  • Flash Metering Mode

  • Noise Reduction

  • JPEG Image Sharpening
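
To illustrate the parser-reuse idea mentioned above, here is a rough sketch in Python (the actual patch is against jhead’s C code and is not reproduced here). It assumes the Nikon MakerNote block starts with a 10-byte vendor prefix (“Nikon”, a NUL, then four more bytes) followed by an ordinary TIFF-style header and directory, so the same directory parser used for the outer EXIF data can be pointed at the inner block. The function names are hypothetical, and the prefix length is an assumption to check against real files.

```python
import struct

NIKON_PREFIX_LEN = 10  # assumed: b"Nikon\0" plus 2 version bytes and 2 padding bytes


def parse_ifd(tiff):
    """Parse the first directory of a TIFF-structured byte string.

    Returns {tag: (type, count, raw 4-byte value-or-offset field)}.
    Offsets are relative to the start of `tiff`.
    """
    order = tiff[0:2]
    if order == b"II":
        endian = "<"  # little-endian ("Intel")
    elif order == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF header")
    magic, ifd_offset = struct.unpack(endian + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    (count,) = struct.unpack(endian + "H", tiff[ifd_offset:ifd_offset + 2])
    entries = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, typ, n = struct.unpack(endian + "HHI", tiff[start:start + 8])
        entries[tag] = (typ, n, tiff[start + 8:start + 12])
    return entries


def parse_nikon_makernote(blob):
    """Skip the vendor prefix, then reuse the ordinary directory parser."""
    if not blob.startswith(b"Nikon\0"):
        raise ValueError("not a Nikon MakerNote in the assumed layout")
    return parse_ifd(blob[NIKON_PREFIX_LEN:])
```

The point of the design is that `parse_nikon_makernote` adds only a prefix check; everything else is the parser that already exists for the outer EXIF block.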

This patch is licensed under a Creative Commons License.

04 September 2004

DSL Strangelove

(or how I learned to stop worrying and love the RBOC)

I’ve spent a few hours shopping for high-speed internet service for our home again. It’s not that the existing service is particularly broken; it’s just that right now we’re paying $66 per month for 512k/256k DSL with one static IP address. This class of service is available for less than half the price with any other phone company in the state. As such, I try to look around from time to time to see if anyone new happens to be offering service in our area. A better deal is bound to pop up sooner or later. Shopping for this might be easier if I didn’t want to host my own content, like this page you are looking at right now, but hosting elsewhere costs more and puts some limits on what I can do.

The folks at Comcast refuse to offer a static IP address at any price. They also are a bit reluctant to give information about prices for any of their services on their web site, which leads me to think that they are embarrassingly high. There is a surcharge for the internet access if you don’t get cable TV service from them, and frankly they can peel our DirecTiVo from our cold, dead hands. They want $56 per month for 3M/256k. There’s also an install fee, since there isn’t Comcast coax to our house right now.

Our friendly neighborhood Bell operating company has recently upgraded to offering better bandwidth than they did previously. There is now 1.5M/256k service for $55 per month. Unfortunately they still want an additional $20 per month for a single static IP address. As I said earlier, the reliability of the connection has been fairly good versus the anecdotal evidence I’ve heard from Comcast customers. However, when it does go down, they tend not to fess up to it on their support page, which is rather annoying coming from a position where I’m required to report and quantify that sort of downtime.

Isomedia is a local company that bought my parents’ dialup ISP, Connect Northwest, some time ago. They advertised pretty heavily on local radio for a good long while and have been offering service nationally of late. They’re offering 1.5M/384k for $50 per month, with a static IP for an extra $5. Or at least that is what they think they’re offering; the salesman I spoke to on the phone couldn’t get information about supported speeds on my line. Odds are that this problem was, in fact, CenturyTel’s fault. He suggested that I call CenturyTel and find out from them what the line will support. Isomedia’s reviews seem pretty favorable online.

Is there no one else?
Somehow there are only these three choices for land-based non-dialup internet here. Of course, the lack of any significant competition goes a long way toward explaining why CenturyTel and Comcast get away with charging such exorbitant prices for their high-speed internet services. In a conversation I had a year or two ago with a sales tech at Blarg, he told a story about the lineworkers from CenturyTel intentionally destroying competitors’ equipment at the CO. Pretty appalling stuff.

It looks like Isomedia will be a $10 per month savings with significantly more bandwidth at least on the download side. The downside will be the coordination required to get all my resources moved over. I’ll have to have our DNS root records updated and arrange some backup hosting for DNS and email while the switchover happens. Which makes me wonder if the hassle involved will be worth the $10…
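
For what it’s worth, the monthly numbers quoted above can be lined up with a few lines of Python. This is only a back-of-the-envelope sketch: it ignores install fees and the cable TV surcharge, skips any provider that won’t sell a static IP, and assumes the Bell company’s new offer is the CenturyTel entry. The function and variable names are made up for illustration.

```python
CURRENT_MONTHLY = 66  # current 512k/256k DSL with one static IP

# (provider, base price per month, static IP surcharge or None if not offered)
PLANS = [
    ("Comcast", 56, None),     # 3M/256k, no static IP at any price
    ("CenturyTel", 55, 20),    # 1.5M/256k
    ("Isomedia", 50, 5),       # 1.5M/384k
]


def compare(plans, current):
    """Return (provider, monthly total, savings vs. current), cheapest first.

    Plans without a static IP option are left out, since hosting needs one.
    """
    rows = []
    for name, base, surcharge in plans:
        if surcharge is None:
            continue
        total = base + surcharge
        rows.append((name, total, current - total))
    return sorted(rows, key=lambda row: row[1])
```

By these figures Isomedia comes to $55 per month, or $11 less than the current bill, which is close to the rough $10 quoted above.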

01 September 2004

Surprise Pictures

A staircase into the wilderness

I’ve added some pictures to the Photo Gallery from our hike on the Surprise Creek trail in the Alpine Lakes Wilderness. There are also pictures of Mocha in her cast, as well as a variety of Seattle landmarks that I was photographing for contribution to the Wikipedia project.

The wooden stairs, along with a lot of the other work on this trail, were quite impressive. The hike offered a very nice change of pace from a work week trapped in a cubicle. One should never underestimate the positive power of not being within range of a cell phone tower. At the end of the hike is Surprise Lake, which is open for camping. The sun was setting and the rain was starting to fall, so we didn’t make it all the way, but it’s good to have something to work towards. The hike also served as a good test of how far I can press my repaired knee before it acts up.