On Informatics

Google as Diagnostic Aid

December 5th, 2006

I was recently alerted by another member of the department to an article right up my alley – it reports the results of a recent study on using the Internet (Google, specifically) to assist doctors in making diagnoses.

Before we go any further, let’s just be clear that the article doesn’t suggest Google should replace doctors in the decision-making process; in fact, it makes it quite clear that a doctor’s specialized knowledge is necessary to form the search query well in the first place, as well as to sift through the results when they are returned. In this sense, Google’s usefulness to decision support is really as something of a brainstorming tool – it allows a trained physician to consider possibilities, sometimes unusual, that may not have been at the forefront of their thoughts.

I found the editorial even more interesting than the results of the study itself, if that is possible. In particular, the editorial points out a significant difficulty with using the web to generate these sorts of results – namely, its anarchy. Because web pages are created in a variety of formats (HTML, CSS, PDF, DOC, XML, etc.), it is very difficult for software to understand the data itself. Computers do not speak or understand the way we do; they simply index. Something like HTML only tells the computer how to display information; it doesn’t impart anything about what the information means (this is, incidentally, one reason why search engines often return bizarre results: they currently have no way of knowing better).

To deal with this difficulty, there has been an ongoing push (15+ years) by the founder of the web (Tim Berners-Lee) and others to move to what they have dubbed “The Semantic Web.” In short, this effort is intended to embed in every web document hidden information that indicates what each piece of information means, as well as what type of information it is. As an imaginary example, take this text:

'Googling for a diagnosis—use of Google as a diagnostic aid: internet based study,' Tang and Ng.

Google does not understand that ‘Googling for a diagnosis…’ is the title, nor does it understand Tang and Ng to be authors. Using semantic web principles, however, one would simply code the page to be something like this:

<title>'Googling for a diagnosis—use of Google as a diagnostic aid: internet based study,'</title> <author>Tang</author> and <author>Ng</author>.

The person viewing the page, though, would not see all of this information; they would see the text as in the first example.
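To make the idea concrete, here is a minimal Python sketch of what a program could do with the tagged version. The `title` and `author` tags are the imaginary ones from the example above, the wrapping `citation` element and the function names are my own assumptions, and the title is abbreviated; this is an illustration, not how any real search engine works.

```python
import xml.etree.ElementTree as ET

# The imaginary markup from the example, wrapped in a hypothetical
# <citation> root element so that it is well-formed XML.
# The title is abbreviated here for readability.
MARKUP = (
    "<citation>"
    "<title>Googling for a diagnosis</title> "
    "<author>Tang</author> and <author>Ng</author>."
    "</citation>"
)


def extract_fields(markup):
    """Return the labeled pieces of information as a dictionary."""
    root = ET.fromstring(markup)
    return {
        "title": root.findtext("title"),
        "authors": [author.text for author in root.findall("author")],
    }


def visible_text(markup):
    """Return what a human reader sees: the text with the tags stripped."""
    root = ET.fromstring(markup)
    return "".join(root.itertext())


print(extract_fields(MARKUP))
print(visible_text(MARKUP))  # Googling for a diagnosis Tang and Ng.
```

The same string serves both audiences: the reader gets the plain sentence, while the program can ask for the title or the authors by name instead of guessing.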

It seems easy enough, but in fact making such a large-scale switch is extremely difficult. Among other things, how does one identify which standardized words to use? OWL (Web Ontology Language) is the part of the semantic web effort that defines shared vocabularies, giving a consistent way of naming individual pieces of information. It couples with RDF (Resource Description Framework), which attaches hidden, machine-readable statements about a document and the things it describes, to paint a more precise picture of a web page for search engines and other tools.
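As a rough illustration of the RDF idea, statements about a document can be thought of as (subject, predicate, object) triples whose predicates come from an agreed vocabulary. The sketch below uses plain Python tuples rather than a real RDF library; the predicate names come from the Dublin Core element set, the article URL is the one linked at the end of this post, and the `objects` helper is a hypothetical name of my own.

```python
# Dublin Core element namespace, a widely used shared vocabulary.
DC = "http://purl.org/dc/elements/1.1/"

# URL of the study, as listed in the links at the end of the post.
ARTICLE = "http://www.bmj.com/cgi/content/full/333/7579/1143"

# Each statement is a (subject, predicate, object) triple.
# The title is abbreviated here for readability.
TRIPLES = [
    (ARTICLE, DC + "title", "Googling for a diagnosis"),
    (ARTICLE, DC + "creator", "Tang"),
    (ARTICLE, DC + "creator", "Ng"),
]


def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(TRIPLES, ARTICLE, DC + "creator"))  # ['Tang', 'Ng']
```

Because the predicate is a shared identifier rather than a word chosen by each page author, two tools that have never seen each other's pages can still agree on what "creator" means.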

Although the theories are good, they have taken a long time to catch on, not least because it is so difficult to change the fundamental nature of a system as vast and varied as the web. Proponents point out that, if all the information on the web is consistently presented, individual pages and even pieces of information within a page can be shared across pages and applications. In other words, rather than linking to an outside page, information can be pulled into an existing page to create a new page that is a mix of many; databases, for instance, could be built simply by plucking data from multiple sources in real time. Detractors point out that such a requirement goes against the nature of the web by imposing severe rigidity on the way pages are structured, and that such rigidity could inhibit new developments. They also express unease at the idea that it would make the web easier to index not only for beneficial entities (Google), but also for hostile ones (government censors).
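The proponents' "plucking data from multiple sources" scenario can be sketched in a few lines. Everything here is hypothetical: the two "pages" are in-memory lists standing in for consistently labeled web sources, and the `id` key and `merge_sources` function are assumptions made for the illustration.

```python
from collections import defaultdict

# Two hypothetical sources that label their data consistently.
PAGE_A = [{"id": "study-1", "title": "Googling for a diagnosis"}]
PAGE_B = [
    {"id": "study-1", "authors": ["Tang", "Ng"]},
    {"id": "editorial-1", "title": "Editorial"},
]


def merge_sources(*sources):
    """Combine records that share an id into a single record per id."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            merged[record["id"]].update(record)
    return dict(merged)


combined = merge_sources(PAGE_A, PAGE_B)
print(combined["study-1"])
```

The merge only works because both sources use the same field names and identifiers, which is exactly the rigidity the detractors worry about.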

It’s an extremely interesting discussion with implications and applications for several branches of informatics (text mining, information retrieval, and decision support, among others), as well as for the very nature of the web itself.

Googling for a diagnosis—use of Google as a diagnostic aid: internet based study
http://www.bmj.com/cgi/content/full/333/7579/1143

Editorial
http://www.bmj.com/cgi/content/full/333/7579/1131

Semantic Web
http://www.w3.org/2001/sw/
http://www.semanticweb.org/

[ links suggested by Sarah Lopez ]
