Saturday, August 18, 2012
Why I think online access to the biomedical literature is one of the most important advances in science & technology of the past 30 years
It's the 200th anniversary of the venerable New England Journal of Medicine, and the journal has been running a series of articles noting seminal advances it has published over the last two centuries, as well as forward-looking prognostications about the future of medicine. It made me think of the most important advances I have witnessed in my 20 years as an oncologist, and I am not restricting the list to therapeutics. Examples that immediately come to mind for me, in completely random order, are filgrastim, 5-HT3 receptor antagonists, imatinib, rituximab, PET imaging, cloud computing, and social media. We all tend to be drawn to top-ten lists, be it the top articles from 200 years of the NEJM, the top clinical papers of 2012 so far, the most clinically important genomic discoveries affecting patient care, or the top hospitals in the U.S. (a painful subject for us at Hopkins ever since that…ahem…medical center in Boston nabbed the top spot). But IMHO, thinking back over the last 20-30 years, you would be hard-pressed to identify a series of advances with greater impact and a more amazing evolution than what has occurred in information retrieval, especially search, as applied to the biomedical literature.
In 1980, when I was working in a lab as an undergraduate and needed to learn something about H2 receptors (we were studying the effects of cimetidine on lymphocyte blastogenesis, of all crazy things), I went to the intimidating medical library at the University of Virginia and pulled down the massive bound volumes of Index Medicus, which was a catalogue of all - yes, that's all - of the published biomedical literature. You would find the article you were looking for (sort of, since this was in the era before hyperlinks), then go to the stacks, locate the bound copy of the journal, and trudge back to the copy center, where you paid $0.05/page to copy your reference. And you couldn't be sure your copy would be legible, since the binding was often so tight that every page had a large, blurry vertical patch along the inner edge where you couldn't press the book flat against the glass.
Things were marginally better when I was in medical school and residency in the 1980s, since there was a librarian who could run searches for you. For those of you too young to know what I am talking about, here's how it worked. You decided what terms you wanted to search ("breast cancer" AND "thiotepa" NOT "intrathecal"), submitted your request to the medical librarian, who may or may not have been any help in refining it (often not), paid the fee for the search, then…waited and waited. Sometimes it was two weeks before you got the results, which were usually printed on huge sheaves of continuous computer paper (the kind with the holes down the sides) by a dot-matrix printer in faded grayish ink. Half the time - probably 80% of the time - you then realized that the terms you had searched weren't right and most of the references were irrelevant, but you were either too broke to pay for another search or didn't have another two weeks to kill waiting for the results.
Then the 1990s came, and with the growth of the Internet, online searching became a reality. But it was nothing like today, when searching is real-time and free of charge. There were several ways to access the MEDLINE database, but many carried either an annual fee or a per-search fee, so unless you had an unlimited institutional account, you had to be judicious about how much searching you did. You also often needed front-end software such as Internet Grateful Med, which had its own learning curve and quirks. You still had to go to the library to read the full text of any article you retrieved, since full text online was not a reality in the early days. Storing results was another issue, since formats were not standardized and PDFs were just beginning to penetrate biomedicine.
I remember how amazed I was when the PubMed database became totally free to use, having lived through the hassles of the previous eras. The fact that you can now construct and execute an endless series of online searches with instantaneous results using a web browser still astounds me today. Even better, access is not restricted to health professionals. Patients and families now have access to the same literature database as I do, and this dissemination and democratization of knowledge, in my opinion, has done more to improve patient engagement and enable true patient-centered care than almost anything else. That is why I say the evolution of information retrieval for the biomedical literature is one of the greatest advances affecting medicine over the past 30 years. Of course, there are still some critical barriers to overcome, such as full-text access for all, better use of metadata in searching, and the all-important semantic deficiencies of the web; i.e., you can search by a given term, but if you don't really know what that term means, the computer often can't help you, since humans are still needed to interpret the information retrieved. (For more on the idea of the semantic web, see here.) Considering how far we've come since I was an undergrad in a lab in 1980 trying to teach myself pharmacology, I am pretty optimistic it won't take another 30 years to achieve this level of data integration and reusability.