Clinical NLP at 2012 NAACL Human Language Technology Conference

I livetweeted presentations about clinical and biomedical natural language processing and computational linguistics at last week’s meeting of the North American Chapter of the Association for Computational Linguistics (NAACL, rhymes with “tackle”): Human Language Technologies, in Montreal. This blog post embeds those tweets and adds lots of editorial and tutorial material (“Editorial”? Wait. How about “Editutorial”? I think I just coined a term!).

My goal was threefold:

  • Leverage social media content I went to some effort to create (the tweets).
  • Summarize current state-of-the-art clinical NLP research and directions.
  • Make it understandable to readers who are not computational linguists or NLP engineers.

By the way, what’s the difference between clinical NLP and biomedical NLP? Take, for example, the alarming, but misleading, headline “Will Watson Kill Your Career?” It has a great quote (my emphases):

“New information is added to Watson in several different ways. On the patient side, the healthcare provider adds their electronic records to the system. On the evidence side, it happens by accessing the latest medical sources such as journal articles or Web-based data and clinical trials.”

Clinical NLP? The patient side. Biomedical NLP? The evidence side. The former is free text about specific patients, such as is found in transcription systems and electronic health records. The latter is free text about biological theories and medical research. You’ll see this division in how I classify the tweeted papers below.

To date, most NLP research and application has focused on the biomedical evidence side. As NLP becomes more practical and as electronic free text about (and by!) patients explodes, computational linguistics (the theory) and natural language processing (the engineering) inevitably will shift toward the patient side. We’ll need to combine both kinds of human language technology–patient and evidence–to create a virtuous cycle. Mine patient data to create and test theories that will, in turn, come back to the point of care to improve patient care.

And, given my blog’s EHR-and-workflow brand, just think of all the complicated and interesting healthcare workflow issues! :-)

By the way, sometimes I’ll editorialize, or explain ideas and terminology. What the speaker presented and my elaboration should be clear from the context. I’ll try to signal lengthy tangential discussions of my own thoughts about the subject at hand. Where they exist, I’ll provide links to related work. Where illuminating, I’ll quote an abstract or relevant paragraph or two. I tend not to name names (unless I follow them on Twitter, in which case I’ll often provide that link). I’d rather talk about ideas. I provide links to each paper associated with each presentation, where you can find the who and the where.

This blog post ended up being a lot longer than I intended or planned. But, the more I dug the more I found, and the more I found the more I dug. Current clinical NLP research and developments reflect a remarkable amount of accumulated knowledge, tools, community, ambition, momentum and, ideally, critical mass.

I expect great things. I hope the following conveys this excitement.

Before I livetweet I like to forewarn and apologize to any folks following me who are not interested in whatever I’m about to flood their tweetstream.

A photo to set the stage, so to speak! Note the highest paper review scores ever and trends toward diversity of clinical and biological topics.

You can see my format: “Listening to…” then title then link to actual paper. By the way, is my custom URL shortener and tweet archiver. Since tweets are limited to 140 characters in length, long URLs need to be converted to short URLs. Otherwise the entire tweet could be taken up by just a URL! And that’s no good.

I really liked this talk. The authors applied machine translation technology, the same techniques that allow you to read web pages in foreign languages, to classify transcriptions of patients retelling simple stories.

You can think of retelling a story as similar to translating a story from its original words into the patient’s own words. The bigger the difference, the larger the translation distance, and, possibly, the larger the cognitive impairment.

They showed that their automatic scoring system was usually as good as human-judged comparisons of original to retold stories.
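To make the “translation distance” idea concrete, here is a minimal sketch of my own. It is not the authors’ method: they used machine translation techniques, whereas this swaps in a much cruder stand-in, a word-level edit distance between the original story and the retelling. The function name and the example sentences are made up for illustration.

```python
def word_edit_distance(source: str, retelling: str) -> int:
    """Count the word insertions, deletions and substitutions needed to
    turn the source story into the retelling (Levenshtein over words)."""
    a = source.lower().split()
    b = retelling.lower().split()
    prev = list(range(len(b) + 1))  # distance from empty source prefix
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete a source word
                            curr[j - 1] + 1,      # insert a retold word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


# Hypothetical story and retellings, for illustration only.
original = "the dog chased the cat up the tree"
close_retelling = "the dog chased a cat up the tree"
distant_retelling = "a cat was in a tree"
```

A retelling that stays close to the story gets a small distance and one that drifts gets a larger one. Whether a larger distance means anything clinically is exactly what the paper had to establish; this sketch only shows the kind of quantity being measured.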

The reason I liked this presentation so much is that it’s different. It’s not about extracting knowledge from genetics research papers or combing patient records for symptoms and diagnoses (those are impressive, but they are more plentiful). It’s about measuring something about patient-uttered language and helping diagnose potential medical problems. It’s using computational linguistics to create a virtual medical instrument, just as a stethoscope or X-ray machine is a medical instrument, to better see (or hear) potential medical conditions.

Here “Listening to…” precedes the title of a paper about constructing ontologies from medical research text using a system called NELL, for the Never Ending Language Learner. I presume there’s some, possibly implicit, connection to the similarly titled movie with learning of language a central theme. (Of course, there’s also the Never Ending Story…)

The combined ontologies were:

  • Gene Ontology, describing gene attributes
  • NCBI Taxonomy for model organisms
  • Chemical Entities of Biological Interest, small chemical compounds
  • Sequence Ontology, describing biological sequences
  • Cell Type Ontology
  • Human Disease Ontology

See next tweet.

From the paper:

“BioNELL shows 51% increase over NELL in the precision of a learned lexicon of chemical compounds, and 45% increase for a category of gene names. Importantly, when BioNELL and NELL learn lexicons of similar size, BioNELL’s lexicons have both higher precision and recall.” (p. 18)

This time NLP is used to mine textual data about adverse drug events. Same or similar events can be described in different ways. To get a handle on the total number or frequency of different kinds of drug reactions we need to lump together similar events. From the paper’s intro (my emphases):

“When an automatic system is able to identify that different linguistic expressions convey the same or similar meanings, this is a positive point for several applications. For instance, when documents referring to muscle pain or cephalgia are searched, information retrieval system can also take advantage of the synonyms, like muscle ache or headache, to return more relevant documents and in this way to increase the recall. This is also a great advantage for systems designed for instance for text mining, terminology structuring and alignment, or for more specific tasks such as pharmacovigilance….”

”…if this semantic information is available, reports from the phramacovigilance databanks and mentionning similar adverse events can be aggregated: the safety signal is intensified and the safety regulation process is improved.”
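Here is a small sketch, mine and not the paper’s, of why lumping synonyms together “intensifies the safety signal.” The synonym table is a hand-written assumption containing only the two pairs mentioned in the quote, and the reports are invented. The paper is about finding such equivalences automatically, which this sketch does not attempt.

```python
from collections import Counter

# Hypothetical, hand-written synonym table (term -> canonical term).
SYNONYMS = {
    "muscle ache": "muscle pain",
    "cephalgia": "headache",
}


def canonical(term: str) -> str:
    """Map a reported term to its canonical form, if one is known."""
    cleaned = term.strip().lower()
    return SYNONYMS.get(cleaned, cleaned)


def aggregate(reports: list[str]) -> Counter:
    """Count adverse event reports after merging known synonyms."""
    return Counter(canonical(report) for report in reports)


# Invented reports, for illustration only.
reports = ["Muscle ache", "muscle pain", "cephalgia", "headache", "nausea"]
counts = aggregate(reports)
```

Without the table, “muscle ache” and “muscle pain” each show up once. With it, they count as two reports of the same event.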

Back to clinical NLP. Understanding the order of patient clinical events is crucial to reasoning about diagnosis and management. I also like this paper a lot. In fact, folks at #NAACL2012 must have too, since the presenter got to present twice on the same topic, once to the main conference and once in the BioNLP workshop.

Extracting this information from EHR free text is complicated by inconsistencies and by the fact that events are not mentioned in the same order in which they originally occurred.

Since it is difficult or maybe impossible to determine the exact date upon which an event happened, we need fuzzier labels. Helpfully, most events are described relative to an admission date included at the top of clinical notes.

Next tweet…

(To repeat my tweet:)

The fuzzy categories used were:

  • “way before admission,
  • before admission,
  • on admission,
  • after admission,
  • after discharge”
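To pin down what these bins mean, here is a sketch of my own. It assumes something the paper explicitly does not have, namely a resolved date for each event, and it assumes a 30-day cutoff for “way before admission” (the paper’s actual definition may differ). The real system has to predict the bin from the text itself.

```python
from datetime import date

WAY_BEFORE_DAYS = 30  # assumed cutoff, not taken from the paper


def time_bin(event: date, admission: date, discharge: date) -> str:
    """Assign an event date to one of the five coarse time bins."""
    if event > discharge:
        return "after discharge"
    if event > admission:
        return "after admission"
    if event == admission:
        return "on admission"
    if (admission - event).days > WAY_BEFORE_DAYS:
        return "way before admission"
    return "before admission"


# Invented admission and discharge dates, for illustration only.
admission = date(2012, 6, 1)
discharge = date(2012, 6, 5)
```

Note the design choice that an event on the day of discharge still counts as “after admission.” That is my assumption too.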

I found this paper particularly interesting because I’ve written about process mining, which builds flowcharts from event data in EHR logs. These flowcharts represent typical temporal ordering of events. I think I’ve actually seen a paper about process mining applied to “timestamps” gleaned from clinical free text…(note to self, look and put link here; found this, but not it). I suspect that future uses of process mining in healthcare will combine fine grained event log data from EHRs with coarse grained time-bin-like data from free text.

Another temporal reasoning from clinical free text paper. I think this topic is especially important because it’s relevant to combing patient records to construct typical patient scenarios. Such information could be invaluable to creating care pathways and guidelines. These, in turn, are relevant to the EMR workflow management systems and EHR business process management suites I write about in this blog.

A medical condition’s assertion status refers to:

  • Negation Resolution (”There is no evidence for…”)
  • Temporal Grounding (current or historical condition)
  • Condition Attribution (experienced by the patient or someone else, such as family member)
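The three dimensions above can be sketched with simple trigger phrases, loosely in the spirit of trigger-based approaches such as ConText. This is my own toy version: the trigger lists are tiny, invented for illustration, and only phrases appearing before the condition are checked. It is not the algorithm from the paper or from ConText itself.

```python
# Hypothetical trigger phrases; real systems use far larger lists.
NEGATION_TRIGGERS = ("no evidence for", "no evidence of", "denies", "without")
HISTORICAL_TRIGGERS = ("history of", "in the past")
OTHER_EXPERIENCER_TRIGGERS = ("mother", "father", "family history")


def assertion_status(sentence: str, condition: str):
    """Return negation, temporal and attribution flags for a condition,
    or None if the condition is not mentioned in the sentence."""
    text = sentence.lower()
    position = text.find(condition.lower())
    if position < 0:
        return None
    before = text[:position]  # only look at text preceding the condition
    return {
        "negated": any(t in before for t in NEGATION_TRIGGERS),
        "historical": any(t in before for t in HISTORICAL_TRIGGERS),
        "experiencer": ("other"
                        if any(t in before for t in OTHER_EXPERIENCER_TRIGGERS)
                        else "patient"),
    }


status = assertion_status("There is no evidence for pneumonia.", "pneumonia")
```

Substring matching like this is easy to fool, which is part of why assertion status is a research topic rather than a solved problem.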

I found some interesting related material:

A knowledge-based approach to medical records retrieval (Demner-Fushman et al., 2011)

ConText: An Algorithm for Identifying Contextual Features from Clinical Text (Chapman et al. 2007) (5M: 11th of 33 papers)

(plus found this) A Review of Negation in Clinical Texts

When livetweeting conferences, I like to signal the beginning and ending of tweetable events. If someone follows in realtime, they may decide to have a coffee break too, and not worry about missing anything. This may sound a little odd, but I know this is true, because I frequently listen to tweets at conferences I wish I was attending. Things “kicking off,” breaks for lunch, etc. provide mundane but useful local color.

By the way, I met @McCogley at the main conference during a “tweetup” I organized. Tweetups are gatherings of people on Twitter who use Twitter to suggest a time and place to rendezvous and talk about, well, Twitter of course, but also anything else in common (after all, we presumably share an interest in computational linguistics and natural language processing).

By the (second) way, while none of the papers I’m livetweeting are about Twitter, there were some such at NAACL. Whenever I see a paper about Twitter, I try to look up the authors on Twitter, to see if they are active and to follow them. At #NAACL2012 there were six papers/posters about analyzing tweets. There were 14 co-authors. I could identify seven Twitter accounts (searching on Twitter and Google, confirming that name and affiliation matched, also taking into account NLP content in profiles, tweets, followers or followees). Three accounts actively tweet. Of course folks may have accounts I didn’t find. And you don’t need a Twitter account to read someone’s tweets. But still.

In this tweet I corrected the earlier incorrect #NAACL12 hashtag and indicated the coffee break is over.

The keynote is starting, so it’s worth setting the stage again, with a photo. The title makes me think it will be very interesting. What’s an “NLP Ecosystem”? Guess I’m in the right place to find out. By the way, there’s apparently no paper to which to link.

I found similar slides from a presentation given at The Second International Workshop on Web Science and Information Exchange in the Medical Web. I consulted these slides several times below to elaborate what might be otherwise somewhat cryptic tweets (expanding acronyms, providing additional context).

OK! Let’s start out with what almost seems like the proverbial elephant in the room. If clinical natural language processing is so great, has so much potential, why hasn’t it realized more of that potential by now? Reminds me of the old rejoinder, if you’re so smart, why aren’t you rich? That rejoinder really applies, too. (Mind you, the following is me ruminating.) If computational linguistics and natural language processing applied to medical data and processes is such a smart thing to do, why aren’t people doing so and making a lot of money at that?

Well, as a matter of fact, speech recognition-based dictation and NLP-based coding are burgeoning, though still nascent, industries. (This is still me ruminating.) But, relative to the potential promise of clinical NLP, the question “Why has clinical NLP had so little impact on clinical care,” is spot on.

“Sharing data is difficult” (quoting my tweet)

We need a bit of an explanation here. Modern computational linguistics is not your grandfather’s computational linguistics. In the old days (when I got my virtual graduate degree in computational linguistics) natural language processing systems were created by consulting linguistic theories and/or artificial intelligence researchers and writing programs to represent, process, and reason about natural language.

Today, natural language processing systems are created by “training” them on large amounts of “annotated” text.

What’s annotated text?

From Natural Language Annotation for Machine Learning (my links and emphases):

“It is not enough to simply provide a computer with a large amount of data and expect it to learn to speak–the data has to be prepared in such a way that the computer can more easily find patterns and inferences. This is usually done by adding relevant metadata to a dataset. Any metadata tag used to mark up elements of the dataset is called an annotation over the input. However, in order for the algorithms to learn efficiently and effectively, the annotation done on the data must be accurate, and relevant to the task the machine is being asked to perform. For this reason, the discipline of language annotation is a critical link in developing intelligent human language technologies.”

What’s training on annotated text?

“Machine learning is the name given to the area of Artificial Intelligence concerned with the development of algorithms which learn or improve their performance from experience or previous encounters with data. They are said to learn (or generate) a function that maps a particular input data to the desired output. For our purposes, the “data” that a machine learning (ML) algorithm encounters is natural language, most often in the form of text, and typically annotated with tags that highlight the specific features that are relevant to the learning task. As we will see, the annotation schemas discussed above, for example, provide rich starting points as the input data source for the machine learning process (the training phase).

When working with annotated datasets in natural language processing, there are typically three major types of machine learning algorithms that are used:

  • Supervised learning - Any technique that generates a function mapping from inputs to a fixed set of labels (the desired output). The labels are typically metadata tags provided by humans who annotate the corpus for training purposes.
  • Unsupervised learning - Any technique that tries to find structure from an input set of unlabeled data.
  • Semi-supervised learning - Any technique that generates a function mapping from inputs of both labeled data and unlabeled data; a combination of both supervised and unsupervised learning.”
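To make “supervised learning” from annotated text concrete, here is a toy sketch of my own (it is not from the book). It trains a bag-of-words naive Bayes classifier with add-one smoothing on four invented snippets whose “present”/“absent” labels stand in for human annotations. The function names and the data are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count labels and per-label words from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denominator = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented, hand-labeled snippets standing in for an annotated corpus.
annotated = [
    ("patient denies chest pain", "absent"),
    ("no evidence of pneumonia", "absent"),
    ("patient reports chest pain", "present"),
    ("findings consistent with pneumonia", "present"),
]
model = train(annotated)
```

Four examples is absurdly few, of course. The point is only that the labels (the annotations) are what the algorithm learns from, which is why shared annotated text matters so much.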

“Treebanks” are databases of free text marked up as syntax trees. Here’s an example of treebank annotation guidelines for biomedical text: grueling!

Sharing databases of free text marked up in a standard way has been incredibly important to the progress of natural language technology to date. By sharing annotated text computational linguists don’t have to reinvent the wheel and can focus on creating better NLP machine learning techniques. Shared databases of annotated free text also play an important role in comparing these techniques. Contests between centers of NLP excellence to see who can do better and then to learn from each other’s successes would not be possible without shared annotated natural language text.

Which gets us to the number one reason (in my opinion, and others’ as well) that computational linguistics on the “patient side” has not taken off. Privacy and security concerns such as those encoded into law by HIPAA make it difficult to share, and learn from (in both the computerized and professional senses of “learn”), annotated free text.

Lack of annotation standards contributes too. But, even the creation of desirable annotation standards runs afoul of privacy concerns. After all, how do you create data standards without data?

OK, back to the presentation.

All of this (the above-described issues), and more, leads to a perception that clinical NLP is too expensive. You need to employ a Ph.D. in computational linguistics to reinvent the wheel for each healthcare organization.

Which brings us to: is there any way to use technology to reduce the expense of NLP?

iDash is one such initiative. You can read more about it at the link in the tweet. But there is also a brief communication in JAMIA: iDASH: integrating data for analysis, anonymization, and sharing. Here’s the abstract (my emphasis):

iDASH (integrating data for analysis, anonymization, and sharing) is the newest National Center for Biomedical Computing funded by the NIH. It focuses on algorithms and tools for sharing data in a privacy-preserving manner. Foundational privacy technology research performed within iDASH is coupled with innovative engineering for collaborative tool development and data-sharing capabilities in a private Health Insurance Portability and Accountability Act (HIPAA)-certified cloud. Driving Biological Projects, which span different biological levels (from molecules to individuals to populations) and focus on various health conditions, help guide research and development within this Center. Furthermore, training and dissemination efforts connect the Center with its stakeholders and educate data owners and data consumers on how to share and use clinical and biological data. Through these various mechanisms, iDASH implements its goal of providing biomedical and behavioral researchers with access to data, software, and a high-performance computing environment, thus enabling them to generate and test new hypotheses.

One way to create sharable clinical text is to de-identify it. This means to remove any material from the text that explicitly identifies a patient. By the way, de-identification is not the same as anonymization (a point made by a later presenter, who has written a review of de-identification of clinical free text). The latter means to make it impossible for anyone to figure out who the patient was. De-identification (in at least some opinions) does not go that far.
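Here is a deliberately naive sketch of de-identification, my own and nothing to do with iDASH’s actual tools. It replaces a few explicit identifiers using regular expressions. The three patterns (US-style dates, phone numbers, and a made-up “MRN” record number format) are assumptions for illustration, and real de-identification covers many more identifier types, including names.

```python
import re

# Hypothetical patterns; real systems handle many more identifier types.
PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
    (re.compile(r"\bMRN:?\s*\d+\b", re.IGNORECASE), "[MRN]"),
]


def deidentify(text: str) -> str:
    """Replace explicit identifiers matched by PATTERNS with placeholders."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


# Invented note text, for illustration only.
note = "Seen on 06/08/2012, MRN: 123456, call 555-123-4567."
scrubbed = deidentify(note)
```

Notice what this does not do: a rare diagnosis plus a small-town hospital could still point to one person. That gap is the difference between de-identifying and anonymizing.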

NLP systems typically rely on complicated “pipelines” starting from the original free text and passing through further stages of processing (see below for examples). Setting up the individual software systems for the individual pipeline steps and then connecting them all up to work together correctly is difficult and expensive. So, why not put them all together in a virtual machine, which folks can download and use almost immediately (after just a bit of configuration to handle their specific needs)?

You can think of an NLP system for free text in an EHR as a set of NLP subsystems among which information must flow correctly in order for the entire NLP system to work correctly. Just as the human body has organs, such as heart, lungs, and kidneys, each of which has a specialized purpose and all must work together, NLP systems have subsystems, NLP organs, to continue the medical analogy.

There are NLP modules that determine where words begin and end. There are modules that find where sentences begin and end. Others locate beginnings and endings of paragraphs or clinical subsections. There are entity recognizers and relations between entities recognizers. There are event recognizers and text entailers (recognizing/inferring what text implies, even if not stated explicitly). All of these subsystems must work together via workflows between them. The order of this workflow is sometimes referred to as a “pipeline”.


Above is an example of an NLP “pipeline” from the recent NIH workshop on “Natural Language Processing: State of the Art, Future Directions and Applications for Enhancing Clinical Decision-Making.” One of the problems with NLP pipelines is that mistakes made in earlier modules and steps tend to propagate forward, causing a lot of problems. If you can’t figure out whether a string is a noun (“Part-of-Speech Tagging” in the diagram), you’re unlikely to recognize which real-world entity it refers to (“Named Entity Recognition”). And if you can’t do that, well…
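A toy version of such a pipeline, as a sketch of my own: a sentence splitter feeds a tokenizer, which feeds a dictionary-lookup “entity recognizer.” All three stages are simplistic stand-ins (regular expressions and a hypothetical two-entry lexicon), not the modules of any real clinical NLP system.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Stage 1: split on whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Stage 2: split a sentence into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def tag_entities(tokens: list[str], lexicon: dict[str, str]):
    """Stage 3: label each token by dictionary lookup ("O" means no entity)."""
    return [(token, lexicon.get(token.lower(), "O")) for token in tokens]


def run_pipeline(text: str, lexicon: dict[str, str]):
    """Chain the stages; each one consumes the previous one's output."""
    return [tag_entities(tokenize(s), lexicon) for s in split_sentences(text)]


# Hypothetical lexicon and note text, for illustration only.
lexicon = {"pneumonia": "CONDITION", "azithromycin": "DRUG"}
result = run_pipeline("Patient has pneumonia. Started azithromycin.", lexicon)
```

Even here you can see error propagation: an abbreviation such as “Dr.” followed by a space would make the first stage cut a sentence in two, and every later stage would inherit the mistake.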

For example, entities such as patients, conditions, drugs, etc. must be recognized. Co-reference must be detected: This “heart attack” and that “MI” refer to the same clinical entity. Three-year-old “John Smith” and his 73-year-old grandfather “John Smith” do not refer to the same entity. In fact, the latter JS2 has a grandfather relation to the former JS1: grandfather(JS2, JS1). And the fact that JS2 is the grandfather of JS1 implies (entails) that JS1 is the grandson of JS2.
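The grandfather example can be written down as a tiny sketch (mine, purely illustrative). Facts are (relation, subject, object) triples, and a hypothetical table of inverse relations lets us derive what a fact entails. The single entry assumes a male grandchild, as in the example.

```python
# Hypothetical inverse-relation table (assumes a male grandchild).
INVERSE = {"grandfather": "grandson"}


def entail(facts: set) -> set:
    """Return the given facts plus the inverse facts they entail."""
    derived = set(facts)
    for relation, subject, obj in facts:
        if relation in INVERSE:
            derived.add((INVERSE[relation], obj, subject))
    return derived


# grandfather(JS2, JS1) entails grandson(JS1, JS2).
facts = {("grandfather", "JS2", "JS1")}
closure = entail(facts)
```

Keeping JS1 and JS2 as distinct identifiers, even though both are named “John Smith,” is the co-reference half of the problem. The table lookup is the entailment half.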

We could keep going, from words to sentences to semantics on to pragmatics and discourse. I think this is exactly where clinical NLP needs to go if EHRs are to become truly useful but unobtrusive helpmates to their users. I touch on this at the end of this blog post, in an epilogue. However, practically speaking, due to the nature of the traditional NLP pipeline, earlier stages of processing need to be mastered before later stages. (Hmm. Ontology recapitulates phylogeny?)

I’m starting to ramble, so back to the presentation!

eHost is described in the poster paper (my emphases) A Prototype Tool Set to Support Machine-Assisted Annotation. eHost stands for Extensible Human Oracle Suite of Tools.

Manually annotating clinical document corpora to generate reference standards for Natural Language Processing (NLP) systems or Machine Learning (ML) is a time-consuming and labor-intensive endeavor. Although a variety of open source annotation tools currently exist, there is a clear opportunity to develop new tools and assess functionalities that introduce efficiencies into the process of generating reference standards. These features include: management of document corpora and batch assignment, integration of machine-assisted verification functions, semi-automated curation of annotated information, and support of machine-assisted preannotation. The goals of reducing annotator workload and improving the quality of reference standards are important considerations for development of new tools. An infrastructure is also needed that will support large-scale but secure annotation of sensitive clinical data as well as crowdsourcing which has proven successful for a variety of annotation tasks. We introduce the Extensible Human Oracle Suite of Tools (eHOST) that provides such functionalities that when coupled with server integration offer an end-to-end solution to carry out small or large scale as well as crowd sourced annotation projects.

I’m impressed with the goals of eHOST and the described vehicle for achieving them.

TextVect is described in more detail [I’ve inserted the bracketed material]:

TextVect is a tool for extracting features from textual documents. It allows for segmentation of documents into paragraphs, sentences, entities, or tokens and extraction of lexical, syntactic, and semantic features for each of these segments. These features are useful for various machine-learning tasks such as text classification, assertion classification, and relation identification. TextVect enables users to access these features without installation of the many necessary text processing and NLP tools.

Use of this tool involves three stages as shown in Fig below: segmentation, feature selection, and classification. First, the user specifies the segment of text for which to generate the features: document, paragraph or section, utterance, or entity/snippet. Second, the user selects the types of features to extract from the specified text segment. Third, the user can download the vector of features for training a classifier. Currently, TextVect extracts the following features:

  • unigrams and bigrams [One and two-word sequences. An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram"); size 3 is a "trigram”. Wikipedia]
  • POS tags [POS stands for Part-of-Speech: noun, verb, adjective]
  • UMLS concepts [Unified Medical Language System: “key terminology, classification and coding standards, and associated resources to promote creation of more effective and interoperable biomedical information systems and services, including EHRs"]
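For the first kind of feature, here is a minimal sketch (my own, not TextVect’s code) of turning a text segment into unigram and bigram counts. The function names are assumptions, and tokenization is just lowercasing and splitting on whitespace.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple]:
    """Return all runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def feature_vector(text: str) -> Counter:
    """Count unigrams and bigrams in a whitespace-tokenized segment."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, 1) + ngrams(tokens, 2))


features = feature_vector("No evidence of pneumonia")
```

Bigrams are what let a classifier see “no evidence” as a unit rather than two unrelated words, which matters a great deal for things like negation.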

There’s also an IE tool (IE means Information Extraction, not Internet Explorer). I wasn’t able to find a link to KOS-IE cross indexed to iDash, but KOS likely means Knowledge Organization System, such as whose purposes are described here (though not in a BioNLP context):

  • “translation of the natural language of authors, indexers, and users into a vocabulary that can be used for indexing and retrieval
  • ensuring consistency through uniformity in term format and in the assignment of terms
  • indicating semantic relationships among terms
  • supporting browsing by providing consistent and clear hierarchies in a navigation system
  • supporting retrieval”

I found the following paper about cKASS:

Using cKASS to facilitate knowledge authoring and sharing for syndromic surveillance

The introduction (no abstract, my emphasis):

Mining text for real-time syndromic surveillance usually requires a comprehensive knowledge base (KB), which contains detailed information about concepts relevant to the domain, such as disease names, symptoms, drugs and radiology findings. Two such resources are the Biocaster Ontology (1) and the Extended Syndromic Surveillance Ontology (ESSO) (2). However, both these resources are difficult to manipulate, customize, reuse and extend without knowledge of ontology development environments (like Protege) and Semantic Web standards (like RDF and OWL). The cKASS software tool provides an easy-to-use, adaptable environment for extending and modifying existing syndrome definitions via a web-based Graphical User Interface, which does not require knowledge of complex, ontology-editing environments or semantic web standards. Further, cKASS allows for–indeed encourages–the sharing of user-defined syndrome definitions, with collaborative features that will enhance the ability of the surveillance community to quickly generate new definitions in response to emerging threats.

I found a description of Common Evaluation Workbench Requirements and this description of its Background and Business Case (brackets and emphasis are mine):

“Many medical natural language extraction systems can extract, classify, and encode clinical information. In moving from development to use, we want to ensure that we have our finger on the pulse of our system’s performance and that we have an efficient process in place for making outcome-driven changes to the system and measuring whether those changes contributed to the desired improvements.

Our goal is to develop a quality improvement cycle and a tool to support that cycle of assessing, improving, and tracking performance of an [Information Extraction] system’s output.

The workbench should

  1. be compatible with any NLP system that is willing to generate output in a standard output
  2. compare two annotation sets against each other
  3. produce standard evaluation metrics
  4. interface with at least one manual annotation tool
  5. allow exploration of the annotations by drilling down from the confusion matrix to reports to individual annotations and their attributes and relationships
  6. provide a mechanism for categorizing errors by error type
  7. provide the option to track performance over time for a system or annotator
  8. allows user to record types of changes made between versions of annotated input so that changes in performance over time are linked to specific changes in guidelines (in the case of human annotations) or changes in the system (in the case of automated annotations)”
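Requirements 2 and 3 (compare two annotation sets, produce standard metrics) can be sketched in a few lines. This is my own illustration, not the workbench’s code. It assumes each annotation is a (start, end, label) tuple and counts only exact matches as correct.

```python
def evaluate(reference: set, system: set) -> dict:
    """Compare system annotations to a reference set using exact matching."""
    true_pos = len(reference & system)
    false_pos = len(system - reference)
    false_neg = len(reference - system)
    precision = true_pos / (true_pos + false_pos) if system else 0.0
    recall = true_pos / (true_pos + false_neg) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented annotations: the system mislabels the second span.
reference = {(0, 9, "CONDITION"), (20, 32, "DRUG"), (40, 45, "CONDITION")}
system = {(0, 9, "CONDITION"), (20, 32, "CONDITION"), (40, 45, "CONDITION")}
scores = evaluate(reference, system)
```

Exact matching is strict: the mislabeled span counts as both a false positive and a false negative. Drilling down from numbers like these to the individual disagreements is what requirements 5 and 6 are about.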

See next tweet…

I found the next section especially interesting. It deals with workflow and usability! I usually focus on EHR workflow. That’s sort of this blog’s brand. My Twitter account is even @EHRworkflow. I’ve a degree in Industrial Engineering where I studied workflow (and stochastic processes, dynamic programming, and mathematical optimization, all relevant to modern computational linguistics, but that’s another blog post).

So, just look at the above tweet! Tasks, workflow, cognitive load… Even 60-page annotation guidelines ring a bell (for example, I’ve written about the 200 pages about EHR workflow, in one EHR user manual, instructing EHR users about what to click, what to click on, what to click on…etc. )

Here’s an interesting notion. Apparently there’s been some success getting lots of people to annotate free text over the Web. Might it work for medical record text?

Obviously there are, again, HIPAA-related issues. But perhaps these can be dealt with.

See next tweet…

For information at

Annotation Admin is a web-based service for managing annotation projects. The tool allows you to

  1. create an annotation schema comprised of entities, attributes, and relationships
  2. create user profiles for annotators
  3. assign annotators to annotation tasks
  4. define annotation tasks
  5. determine batch size
  6. sync with annotation tool (currently syncs with eHOST) to send schema and batches to valid annotators and to collect the annotations when finished
  7. keep track of progress of annotation project

The following is my reaction to the above, in light of this blog’s theme: workflow automation in healthcare:

For EHRs to fulfill their full potential they will need to not just interoperate with a wide variety of other systems, but manage those interactions as well. Some of those key interactions will be with descendants of the kinds of NLP tools being discussed here. Workflow engines and process definitions and process mining will be the glue to connect it all up to operate efficiently, effectively and satisfactorily for EHR and NLP system users.

Back to the presentation…

If you want to learn more, there’s a free workshop this fall at the link in the tweet.

This was a good conclusion slide so I tweeted it.

In light of “More demand for EHR data”

“NLP has potential to extend value of narrative clinical reports”

Key developments include

  • Common annotation conventions (annotations needed to learn)
  • Privacy algorithms (to de-identify, anonymize)
  • Shared datasets (to compare and improve NLP systems)
  • Hosted environments (deliver tools, manage workflows, etc.)

Two of the best questions were about meaningful use and capturing data about users to improve usability.

Stages of meaningful use will likely drive demand for clinical NLP. There’s too much relevant information locked in EHR free text.

(Slight tangent ahead…)

Now, if you’ve read a few of my blog posts or follow me on Twitter you’ll know I am not a fan of keyboards (one way to create free text). Contrary to complaints about traditional EHR user interfaces, with all those dropdown menus and tiny little checkboxes, a well-designed (read, well-designed workflow) EHR point-and-click oriented user interface can outperform alternatives for routine patient care. If you’re a pediatrician, which is faster? Saying “otitis media” and then making sure it was recognized correctly? Or just touching a big button labeled “otitis media”, which is, as the military say, fire-and-forget.

However, EHRs really need to be multimodal, accepting input in whatever form users prefer. If you like to type (yuck) by all means have at it. Or, as automated speech recognition gets better and better, dictate. As non-routine phrases become routine, I still see both styles of data entry giving way to quicker (and not-requiring-post-editing) fire-and-forget of canned text strings (increasingly mapped to standardized codes).

In the meantime, there is an ocean of legacy free text to sail and we need means to traverse it. So, yes, meaningful use will drive demand for clinical NLP.

What about “instrumenting” annotation software to capture information about user behavior and using machine learning to improve usability (which is how I interpreted the question)? Since I’ve suggested doing similar for EHRs, I might as well extend the suggestion to annotation software as well. One approach is to use process mining to learn process maps from time-stamped user event data. Cool stuff, that. Can’t wait.
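To make “learning a process map from time-stamped user event data” a little more concrete, here’s a minimal sketch of the simplest building block I know of in process mining: counting which action directly follows which within a session. Everything in it is an assumption for illustration. The event log, the session ids and the action names are made up, and real process mining tools do a great deal more than this.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log from an annotation tool:
# (session id, ISO timestamp, user action). All values are invented.
EVENTS = [
    ("s1", "2012-06-08T09:00:00", "open_note"),
    ("s1", "2012-06-08T09:00:40", "highlight_span"),
    ("s1", "2012-06-08T09:01:05", "assign_label"),
    ("s1", "2012-06-08T09:01:30", "save"),
    ("s2", "2012-06-08T10:00:00", "open_note"),
    ("s2", "2012-06-08T10:00:20", "highlight_span"),
    ("s2", "2012-06-08T10:00:50", "undo"),
    ("s2", "2012-06-08T10:01:10", "highlight_span"),
    ("s2", "2012-06-08T10:01:40", "assign_label"),
    ("s2", "2012-06-08T10:02:00", "save"),
]


def directly_follows(events):
    """Count how often action a is immediately followed by action b in a session."""
    by_session = defaultdict(list)
    for session, stamp, action in events:
        by_session[session].append((datetime.fromisoformat(stamp), action))

    counts = Counter()
    for steps in by_session.values():
        steps.sort()  # order each session's events by timestamp
        actions = [action for _, action in steps]
        counts.update(zip(actions, actions[1:]))
    return counts


if __name__ == "__main__":
    for (a, b), n in sorted(directly_follows(EVENTS).items()):
        print(f"{a} -> {b}: {n}")
```

The resulting counts are the edges of a crude process map. A rare edge such as `highlight_span -> undo` is the kind of thing a usability person might want to look at.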

Here’s another paper about using natural language processing of biomedical text to match abbreviations (such as MAR) with their expanded definitions (Mixed Anti-globulin Reaction test/not text!). Well, it’s by the same authors as the first paper on this topic.
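For non-NLP readers, here’s a toy sketch of the abbreviation-matching idea. To be clear, this is not the authors’ method. It is a deliberately naive heuristic of my own for illustration: find an all-caps short form in parentheses, then check whether the initials of the words just before it spell that short form. The function name and the sample sentence are hypothetical.

```python
import re


def find_abbreviation_pairs(text):
    """Return (short form, long form) pairs for patterns like 'long form (SF)'.

    Naive heuristic: the short form must equal the initials of the
    len(short form) words immediately before the opening parenthesis.
    """
    pairs = []
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        short = match.group(1)
        words_before = text[:match.start()].split()
        candidate = words_before[-len(short):]
        initials = "".join(word[0].upper() for word in candidate)
        if initials == short:
            pairs.append((short, " ".join(candidate)))
    return pairs


if __name__ == "__main__":
    sentence = "Samples were screened with a Mixed Anti-globulin Reaction (MAR) test."
    print(find_abbreviation_pairs(sentence))
```

Real biomedical text breaks this heuristic constantly (skipped stop words, letters taken from the middle of words, short forms defined before long forms), which is part of why papers keep getting written about it.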




Break for lunch, spelling #NAACL2012 correctly this time!

Interesting work comparing English vs Swedish taxonomies of certainty and negation in clinical text.

“[A]nnotators for the English data set assigned values to four attributes for the instance of pneumonia:

  • Existence(yes, no): whether the disorder was ever present
  • AspectualPhase(initiation, continuation, culmination, unmarked): the stage of the disorder in its progression
  • Certainty(low, moderate, high, unmarked): amount of certainty expressed about whether the disorder exists
  • MentalState(yes, no): whether an outward thought or feeling about the disorder’s existence is mentioned

In the Swedish schema, annotators assigned values to two attributes:

  • Polarity(positive, negative): whether a disorder mention is in the positive or negative polarity, i.e., affirmed (positive) or negated (negative)
  • Certainty(possibly, probably, certainly): gradation of certainty for a disorder mention, to be assigned with a polarity value.”
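If it helps to see the two schemas side by side as data, here’s a small sketch that writes each one down as attribute names with allowed values (taken from the quote above) and checks whether a given annotation fits. The `validate` function and the sample annotation are my own illustrative assumptions, not anything from the paper.

```python
# Attribute names and allowed values as listed in the quoted schemas.
ENGLISH_SCHEMA = {
    "Existence": {"yes", "no"},
    "AspectualPhase": {"initiation", "continuation", "culmination", "unmarked"},
    "Certainty": {"low", "moderate", "high", "unmarked"},
    "MentalState": {"yes", "no"},
}

SWEDISH_SCHEMA = {
    "Polarity": {"positive", "negative"},
    "Certainty": {"possibly", "probably", "certainly"},
}


def validate(annotation, schema):
    """Return a list of problems; an empty list means the annotation fits."""
    problems = []
    for attribute, allowed in schema.items():
        if attribute not in annotation:
            problems.append(f"missing {attribute}")
        elif annotation[attribute] not in allowed:
            problems.append(f"{attribute}={annotation[attribute]!r} not allowed")
    for attribute in annotation:
        if attribute not in schema:
            problems.append(f"unknown attribute {attribute}")
    return problems


if __name__ == "__main__":
    # Hypothetical annotation of one disorder mention.
    swedish_style = {"Polarity": "negative", "Certainty": "probably"}
    print(validate(swedish_style, SWEDISH_SCHEMA))
    print(validate(swedish_style, ENGLISH_SCHEMA))
```

Running the same annotation against both schemas shows why comparison takes real work: both have an attribute called Certainty, but the value sets don’t line up.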

The best thing about comparing English vs Swedish taxonomies of uncertainty and negation? Getting to go to Sweden to do so! Jag är väldigt säker på! (I am very certain! By the way, these were my thoughts, though enthusiasm for visiting Sweden was indeed expressed.)

Back to “Listening to…”

Automatic de-identification of textual documents in the electronic health record: a review of recent research

Multiple methods were combined and compared to alternatives.

The hybrid approach described did very well. Most of the strings it said were patient names were indeed patient names (precision) and most of the patient names in the corpus were detected (recall).
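For readers who like to see the arithmetic, here’s a minimal sketch of how precision and recall are computed for this kind of task. The character offsets below are invented. In a real evaluation, `GOLD` would come from human annotators and `PREDICTED` from the de-identification system, and matching rules (exact span versus partial overlap) vary by study. I assume exact matches only.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall over exact-match items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: of everything flagged, how much was right?
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: of everything that should have been flagged, how much was found?
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (start, end) character offsets of patient names in one note.
GOLD = {(0, 10), (25, 36), (50, 58), (70, 80), (101, 112)}
PREDICTED = {(0, 10), (25, 36), (50, 58), (90, 95)}

if __name__ == "__main__":
    p, r = precision_recall(PREDICTED, GOLD)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Here three of the four flagged spans are real names, and three of the five real names were found. For de-identification, recall is usually the number people worry about most, since a missed name is a privacy leak.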

“Coreference resolution is the task of determining linguistic expressions that refer to the same real-world entity in natural language” (See later tweet)

“Active Learning (AL) is a popular approach to selecting unlabeled data for annotation (Settles, 2010) that can potentially lead to drastic reductions in the amount of annotation that is necessary for training an accurate statistical classifier.”
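One common flavor of active learning is uncertainty sampling: ask humans to annotate the examples the current model is least sure about. Here’s a minimal sketch of just the selection step for a yes/no classifier. The sentences and probabilities are made up, `toy_model` stands in for whatever classifier you are training, and I’m not claiming this is the selection strategy used in the paper.

```python
def select_for_annotation(unlabeled, predict_proba, batch_size):
    """Pick the examples whose predicted probability is closest to 0.5."""
    ranked = sorted(unlabeled, key=lambda example: abs(predict_proba(example) - 0.5))
    return ranked[:batch_size]


# Hypothetical model output: probability that the sentence asserts pneumonia.
TOY_SCORES = {
    "no evidence of pneumonia": 0.05,
    "possible pneumonia": 0.55,
    "pneumonia confirmed on x-ray": 0.97,
    "cannot rule out pneumonia": 0.40,
}


def toy_model(sentence):
    """Stand-in for a trained classifier's probability estimate."""
    return TOY_SCORES[sentence]


if __name__ == "__main__":
    for sentence in select_for_annotation(TOY_SCORES, toy_model, batch_size=2):
        print(sentence)
```

The two hedged sentences get sent to the annotators. The confident ones wait, which is where the savings in annotation effort are supposed to come from.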

As noted, behind a paywall, which is too bad! But, here’s the abstract:

“Coreference resolution is the task of determining linguistic expressions that refer to the same real-world entity in natural language. Research on coreference resolution in the general English domain dates back to 1960s and 1970s. However, research on coreference resolution in the clinical free text has not seen major development. The recent US government initiatives that promote the use of electronic health records (EHRs) provide opportunities to mine patient notes as more and more health care institutions adopt EHR. Our goal was to review recent advances in general purpose coreference resolution to lay the foundation for methodologies in the clinical domain, facilitated by the availability of a shared lexical resource of gold standard coreference annotations, the Ontology Development and Information Extraction (ODIE) corpus.”

The penultimate paper is up! I didn’t prepend a “Listening to…” because the paper title was so long and I couldn’t figure out how to shorten it without changing its meaning: computational linguists ought to study this, after getting a Twitter account :-))

Protein interactions!

Suppose you wanted to look up all the papers that mention a specific disease? Could you do so? Not unless alternative ways to refer to the same disease are somehow aggregated (recall aggregating adverse drug events?).
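Here’s a small sketch of what “aggregating” means in practice: map every known way of saying a disease to one concept, then search by concept instead of by string. The concept ids, synonym lists and paper titles are all hypothetical. A real system would draw synonyms from a curated terminology and not from a hand-typed dictionary like mine.

```python
import re

# Hypothetical concept ids mapped to alternative surface forms.
CONCEPT_SYNONYMS = {
    "CONCEPT:heart_attack": ["heart attack", "myocardial infarction"],
    "CONCEPT:otitis_media": ["otitis media", "middle ear infection"],
}


def mentions(text, concept_id):
    """True if any synonym of the concept appears in the text as whole words."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE)
        for term in CONCEPT_SYNONYMS[concept_id]
    )


def find_papers(titles, concept_id):
    """Return the titles that mention the concept under any of its names."""
    return [title for title in titles if mentions(title, concept_id)]


# Made-up paper titles for illustration.
TITLES = [
    "Aspirin after myocardial infarction",
    "Heart attack risk in shift workers",
    "Antibiotics for otitis media in children",
]

if __name__ == "__main__":
    print(find_papers(TITLES, "CONCEPT:heart_attack"))
```

A plain string search for “heart attack” would have missed the first title. Searching by concept finds both.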

Back to the problem of annotation here. This time, since the data is in medical research text, there are no HIPAA issues.

I’ve livetweeted events before. Each time I do it a bit differently. The first time I didn’t even have a smartphone Twitter client, so I texted to a special number. Since then I livetweeted a health IT conference and a government workshop. Only recently did I realize I could embed tweets from Twitter as I did above (No more copy and paste. Now, Twitter, don’t go away!). There are advantages and disadvantages. It’s a lot of work. I only do it when I am really interested in the subject. It keeps me focused during presentations so that I don’t miss the perfect slide bullet or speaker quote to summarize the entire presentation. What’s really fun, though, and what I don’t show above, are all the retweets of my tweets by others, as well as others finding the #NAACL2012 hashtag and chipping in with their own, often funny, comments.

I walked out of the #NAACL2012 BioNLP workshop and this is what I saw!


Computational linguistics and natural language processing (the former the theory and the latter the engineering) are about to transform healthcare. At least some people think so. There’s certainly a lot of buzz in health IT traditional and social media about medical speech recognition and clinical language understanding.

Coverage can be pretty superficial. Watson will, or won’t, replace clinicians. Siri will, or won’t, replace traditional EHR user interfaces. It comes with the territory. CL and NLP are full of dauntingly abstract concepts and complicated statistical mathematics. However, there is an idea, among philosophers, that science is really just common sense formalized. If so, maybe the science of CL/NLP can be “re-common-sense-ized”, at least for the purpose of looking under the hood of what makes these clever language machines possible.

Looking further ahead, where I’d really like to see clinical NLP go is toward conversational EHRs. A bit like Siri, or at least the way Siri is portrayed in ads, only a lot more so. To get there EHRs will need to become intelligent systems, not just converting compressions and rarefactions of air molecules into transcribed tokens to be passed on to pipelines and become ICD-9 or -10 codes. They will need to “understand” the ebb and flow of medical workflow and, like the hyper-competent operating room nurse, do the right thing at the right time with the right person for the right reason, without having to be explicitly told or triggered to do so. This is where this blog’s brand, EHR+plus+workflow, comes together with thinking and language.

Thanks for reading! I learned a lot!



  1. Wendy Chapman
    Posted July 8, 2012 at 3:44 am

    Chuck, I just read your blog of live tweeting from BioNLP in Montreal and thoroughly enjoyed it. Thanks for your summaries and all of the supplementary material between them. BTW, I am in Stockholm now - we are working on translating the terms in ConText to Swedish and evaluating its performance on Swedish clinical texts. Very fun! I hope to see you at the iDASH/ShARe Annotation Workshop Sept 29th.

    Wendy Chapman

  2. chuckwebster
    Posted July 8, 2012 at 2:06 pm


    Thank you so much for your comments! It was a lot of fun and I was impressed with the progress of computational linguistics and natural language processing in medicine and healthcare.

    If I can make it, I’ll be there!

