25 February 2007

Data Validation Rant

I just signed up to renew my ACM membership for the first time since I was a student back at WWU. I have to admit to feeling kind of bad about having finally caved in to the direct mail they've been bombarding me with, but it did offer a reasonable discount. So I'm busy filling out their online form that asks for all sorts of information they can use to direct additional marketing at me and I click the "continue" button, but alas I cannot continue.


So why can I not continue? Everything I entered is correct; I double-checked. It turns out I put a dash in my ZIP+4 code, and for them that's an error. The US Postal Service disagrees, and in fact insists that xxxxx-xxxx is the "standard format" for a ZIP code. One would hope that the ACM, of all organizations, could figure out how to validate data without looking stupid, but apparently that's too much to ask.
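For what it's worth, here is a rough sketch of the forgiving approach: accept the ZIP code with or without the dash, then store it in the xxxxx-xxxx form. The function name normalize_zip is made up for illustration, and treating stray spaces the same as dashes is my own assumption about what users type. It says nothing about how the ACM's form actually works.

```python
import re

def normalize_zip(raw):
    """Return 'xxxxx' or 'xxxxx-xxxx', or None if the input can't be a ZIP code.

    Hypothetical helper: spaces and dashes are simply ignored on input.
    """
    digits = re.sub(r"[\s-]", "", raw)  # drop whitespace and dashes
    if re.fullmatch(r"[0-9]{5}", digits):
        return digits
    if re.fullmatch(r"[0-9]{9}", digits):
        # Re-insert the dash so the stored value is in ZIP+4 form.
        return digits[:5] + "-" + digits[5:]
    return None
```

With this, "98225-1234" and "982251234" both come out as "98225-1234", and only input that can't be a ZIP code at all gets rejected.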


This is a real frustration in lots of other online settings as well. There are quite a few online merchants that refuse to process your payment unless you leave the spaces out of your credit card number. They display stern warnings like "Important: Please do not put spaces or dashes between credit card numbers." If they can't even figure out how to remove extraneous spaces or dashes from the information I give them, should I be trusting them to get the rest of my order correct and to keep my personal information from being stolen? Dealing with that input is maybe 4 lines of code if you're trying to make it hard. It's usually not the merchant's fault; they're just buying some third-party service. Any service that merchants have to pay for that can't manage to get this right should be publicly shamed for its lack of competence.
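To back up the "4 lines" claim, here is a minimal sketch of stripping spaces and dashes from a card number before checking it. The name clean_card_number and the 12 to 19 digit length range are assumptions made for the example, not any payment processor's actual rules, and a real service would also run a checksum.

```python
import re

def clean_card_number(raw):
    """Return the card number as bare digits, or None if it isn't plausible.

    Hypothetical helper; the 12-19 digit range is an assumed sanity bound.
    """
    digits = re.sub(r"[\s-]", "", raw)  # remove spaces and dashes
    if re.fullmatch(r"[0-9]{12,19}", digits):
        return digits
    return None
```

The user can type "4111 1111 1111 1111" or "4111-1111-1111-1111" just as it appears on the card, and the code gets "4111111111111111" either way. No stern warning required.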


Email addresses are also apparently a tricky beast for people who are building online tools. There are a lot of characters that are valid in an email address that various online services tend to choke on. There are numerous examples online disguised as "how-tos" that are actually "how-not-to-dos". The example I'm picking on in this case doesn't accept addresses with a + in them. Addresses like learn+to+read@rfc2822.int are in fact perfectly legitimate email addresses, as are addresses with hyphens, underscores, carets, tildes, equals signs and a lot of other somewhat obscure characters. Mail servers seem to deal successfully with these things; one would hope most web application writers could cope with them too.
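One pragmatic option is a deliberately loose check that only rejects input that can't possibly be an address, and leaves the real verdict to the mail server (say, by sending a confirmation message). The sketch below is not the RFC grammar. The function name looks_like_email is invented for illustration, and requiring a dot in the domain is my own assumption, which would turn away something like user@localhost.

```python
def looks_like_email(address):
    """Loose sanity check: something@domain.tld with no whitespace.

    Hypothetical helper; it does not restrict which characters may
    appear in the local part, so +, -, _, ^, ~ and = all pass.
    """
    address = address.strip()
    if any(ch.isspace() for ch in address):
        return False
    # Split on the last '@' so the domain is whatever follows it.
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        return False
    labels = domain.split(".")
    # Assumed rule: at least two non-empty dot-separated labels.
    return len(labels) >= 2 and all(labels)
```

An address like learn+to+read@rfc2822.int passes, while "nope", "@example.com" and "a b@example.com" do not. Anything stricter than this tends to be where the plus signs start getting rejected.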


Come on, all you folks who are building tools for the web: it's not hard to get these things right by just being a little bit more pragmatic about it.