by jared on July 07, 2011
Ever wonder what goes on behind the scenes of our website? Not just the gears behind the shiny stuff: the colorful web design, the zigzag of buttons and boxes and widgets. I'm talking about the boring bits - the aggregation and pruning of data that serves as the site's backbone. The oft-overlooked life of the bleary-eyed programmer toiling away in his cubicle. Ever wonder about this guy? No?
It's okay. I don't blame you. Admittedly, my life isn't all that glamorous. My primary job is to process data and try to get it into structured form. In other words, given a list of words or names, I try to have the software sort them into meaningful categories, like Specialty, Clinical Interests, or Medical School.
It's a little like teaching a robot to pick up colored legos from their respective buckets - if the buckets had been sorted by a two-year-old. At best, most are in the right bucket, but a fair number of red legos are in the blue bucket and most of the yellows are in the green. At worst, the lego pieces are strewn all over the floor - Medical School, Residency, Dental School, Fellowship - all in the same pile.
Here's an example of what I mean. This time there are just two buckets: Last Name and Credential. You'd think it would be pretty simple to distinguish someone's name from his credentials, you know - like Dr. Smith vs. "MD". Pretty straightforward stuff.
You'd be surprised.
Try looking up "Larry Keyser" on the Centers for Medicare & Medicaid Services website. Can't find him? Neither can I. But wait, that doesn't make sense - here's his profile. What's going on here?
When I review the data file again, I discover the problem: the data source mixed this provider's last name and credentials. According to the source, this doctor's last name is "Keyser Optometrist". Now I need to tell the software to take the green lego "Optometrist" out of the red bucket with all the other last names. And it's hard enough to get robots to recognize colors.
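To make that concrete, here is a minimal sketch of this kind of cleanup. It is an illustration only, not the site's actual code: the credential list and the function name `split_last_name` are made up for the example, and a real list would be far longer.

```python
# Hypothetical, hand-made vocabulary of credential words (lowercase).
KNOWN_CREDENTIALS = {"md", "do", "od", "dds", "optometrist"}


def split_last_name(raw_last_name):
    """Peel trailing credential words off a last-name field.

    Returns a (last_name, credentials) tuple, where credentials is a list
    of the words that were removed, in their original order.
    """
    words = raw_last_name.split()
    credentials = []
    # Walk backwards from the end of the field. Always keep at least one
    # word as the name, so a surname like "Do" is not mistaken for "DO".
    while len(words) > 1 and words[-1].strip(".,").lower() in KNOWN_CREDENTIALS:
        credentials.insert(0, words.pop().strip(".,"))
    return " ".join(words), credentials


last_name, credentials = split_last_name("Keyser Optometrist")
print(last_name, credentials)
```

Only words at the end of the field are checked, and the last remaining word is never removed. That keeps the green lego out of the red bucket without throwing away a real surname that happens to look like a credential.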
I am the programmer (one of them anyway) who works on issues like these, tirelessly trying to perfect the ghastly data in the public databases we work with. But I'm more than willing - every little improvement we make helps push us closer toward our goal of bringing transparency to healthcare. That's what makes it all worth it.