by jared on August 26, 2011
As you know, our primary focus at DocSpot has been to connect you with individual health care providers. This week, I had hoped to unveil a new service that would allow you to search for hospitals, but the final touch-ups have taken me longer than I expected. Sometimes the smallest segments of a product can take the longest amount of time. Such is the nature of development.
In this case, I discovered that one of our sources of data was not as tidy as we had thought. Since we deal with publicly available data, we don't expect everything to be nicely sorted and packaged for us. That's what our specialized "robots" are for. However, there are certain times when the data proves to be incorrigible, and we must either reject it as a primary source or dispose of it altogether.
I had relatively high expectations for Medicare's "Providers of Service" list; although publicly available, it is not free. At first glance, it seemed polished and straightforward to integrate. Then, when I ran some diagnostics, I met the worst nightmare of any engineer tasked with data management: duplicates. Multiple hospitals shared the same name and address but carried different data. I had no idea which profile was correct, and the data's documentation gave no indication of how to resolve the issue, let alone any mention of possible redundancies.
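For the curious, here is a minimal sketch of the kind of diagnostic that surfaces this problem. It is not our actual code: the field names (`name`, `address`, `beds`) and the sample records are made up for illustration, and the real file's layout differs. The idea is simply to group records by a normalized name and address, then flag any group with more than one record.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't hide a match."""
    return " ".join(text.lower().split())


def find_duplicates(records):
    """Return {(name, address): [records]} for every key shared by 2+ records."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize(rec["name"]), normalize(rec["address"]))
        groups[key].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# Hypothetical sample data, not taken from the Medicare file.
records = [
    {"name": "Example Hospital", "address": "1 Main St", "beds": 120},
    {"name": "example  hospital", "address": "1 Main St", "beds": 95},
    {"name": "Other Clinic", "address": "9 Oak Ave", "beds": 40},
]

duplicates = find_duplicates(records)
for key, recs in duplicates.items():
    print(key, "appears", len(recs), "times")
```

A check like this only tells you that duplicates exist. It says nothing about which record in each group is the right one.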
So, as engineers are wont to do, I started looking for patterns. I found a reference number that might link one duplicate to the next, a date that seemed to indicate when the profile was last updated, a code that suggested a hospital had been shut down, and a category that appeared to single out duplicate entries. In the end, the relationships seemed too arbitrary, and I still hadn't rooted out all the redundancies. One pair in particular, two profiles for Broughton Hospital in North Carolina, seemed to mock my efforts: although the two profiles differed by one or two data points, they matched on every single metric I used to tell duplicates apart.
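To make the dead end concrete, here is a rough sketch of that style of heuristic, under assumptions of my own choosing: each record has a hypothetical `closed` flag and a `last_updated` date, and the rule is "prefer open hospitals, then the most recent update." None of these names come from the Medicare file. When two records tie on every signal, the function returns both, which is exactly the situation described above.

```python
from datetime import date


def pick_candidates(group):
    """Apply the heuristics to one duplicate group.

    Returns the surviving records; more than one means the tie is unresolved.
    """
    # Prefer records not marked closed; fall back to the whole group if all are closed.
    open_records = [r for r in group if not r["closed"]] or group
    # Among those, keep only the most recently updated.
    latest = max(r["last_updated"] for r in open_records)
    return [r for r in open_records if r["last_updated"] == latest]


# Hypothetical groups for illustration only.
resolvable = [
    {"id": "A", "closed": True, "last_updated": date(2011, 3, 1), "beds": 100},
    {"id": "B", "closed": False, "last_updated": date(2010, 6, 1), "beds": 110},
]
tied = [
    {"id": "C", "closed": False, "last_updated": date(2011, 1, 15), "beds": 200},
    {"id": "D", "closed": False, "last_updated": date(2011, 1, 15), "beds": 210},
]

print([r["id"] for r in pick_candidates(resolvable)])
print([r["id"] for r in pick_candidates(tied)])
```

The first group resolves to a single record, while the second comes back with both records still standing. No amount of extra rules helps when the only differences are in the data you are trying to validate.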
After almost giving up on this rich source of data, I finally discovered another Medicare file (in a completely different section of their website) that identifies the unique entries in the problematic source. Problem solved. The question remains: will there be yet another set of finishing touches? Time will tell. Such is the nature of product development. In the meantime, keep checking our blog for updates, and let us know what you would like to see in our upcoming hospital product.