I Interview New M*Modal Website(!) on Future of Language and Workflow in Healthcare

Short (well, memorable!) Link: http://ehr.bz/mmodalinterview

Last week, during #HIMSS13, I tweeted out individual questions and answers from the following interview with [wait for it!] the new M*Modal website at MModal.com. Here is the combined interview. I’ve included the original tweets so you can retweet answers to individual questions….

I’m going to try something a little different this week. I’ll talk to a website! I usually submit geeky questions about workflow or language technology to an industry expert, then top it off with a One-Minute Interview (on YouTube) embedded in the resulting blog post.

I recently interviewed M*Modal’s Chief Scientist Juergen Fritsch, Ph.D. Like any good interview, it left me wanting more. As smart as Juergen is, using a whole-is-greater-than-the-parts logic, M*Modal must be even smarter than he is. But I can’t interview 12,000 people. So I decided to have a conversation with MModal.com, M*Modal’s new website.


By the way, I’m aware of, and concerned about, walking the fine line between education and marketing (and have written about it). I am not endorsing any M*Modal product or service. However, I’ve written hundreds of thousands of words (and 15,000 tweets!) about workflow tech in healthcare. I look for confirmation wherever I can find it. :) I certainly endorse the combination of workflow technology and language technology to help make EHRs and health IT systems more usable and useful. M*Modal is a leader in this area and I appreciate their cooperation in increasing public understanding of both workflow tech and language tech opportunities.

My interviews? They’re more like conversations in which I talk almost as much as the person (or, in this case, website) I’m talking to. I’ll mention earlier blog posts, quote from Wikipedia, even textbooks. I eventually do get to the point. I don’t think my interviewees mind. I’m not like Larry King, who reputedly never read the books before he interviewed their authors (“So, what’s your book about?”). I’m more like Brian Lamb on C-SPAN (“On page 582 you write, [Brian reads a couple of paragraphs]. What did you mean by that?”)

So, MModal.com, thank you for agreeing to this interview. Silence. Hmm.

I searched MModal.com for “workflow”. I got 165 hits. I looked at each instance and context. Using the most interesting material I created 10 “answers.” Then I wrote the questions.

Let’s try again…

1. MModal.com, in a nutshell, in words people who aren’t rocket scientists or computational linguists can understand, what problem are you trying to solve?

“Physicians are natural storytellers. They prefer to document the complete patient story by simply speaking and naturally capturing the full narrative. With Electronic Health Records (EHR) it’s not that simple. Clinicians have to change their behavior and use point-and-click into various templates that just can’t tell the whole story. Using EHRs, collaboration remains difficult, prone to errors and incomplete. Speech-based narrative documentation is workflow-friendly and permits the whole story to be told, and easily and more completely passed along, creating a much more collaborative sharing of intelligence from doctor to doctor.”

Nicely put! EHR usability is a big issue these days. “Clickorrhea” does seem part of the problem. Got it.

2. From your unique perspective, what is the connection between language tech and workflow tech?

“This is an absolute dead-on question, I’m so happy you asked it. The important connection is if we would just do speech-to-text transcription we wouldn’t affect anything. We’d just be creating a piece of text, without being able to drive actions. Ultimately we want to drive that action in the workflow – for example, have a physician create that order for a new medication. We want to make sure follow up happens and facilitate the workflow that enables that process from beginning to end. Also, healthcare is all about collaboration among providers. There is a lot of patient handoff and effective coordination of care doesn’t happen nearly as much as it should, and it only happens if proper workflow processes are in place. If we’re not trying to get involved in that process and drive more effective workflow processes, we’re not being successful in affecting change.”

(You’re right, MModal.com doesn’t have “This is an absolute dead-on question, I’m so happy you asked it.” on it anywhere. That would be a remarkable feat of dynamic natural language generation and extrasensory perception now, wouldn’t it? It’s Juergen’s answer to question 8 in that recent interview. However, the interview is noted on MModal.com, with a link to the full interview on my blog.)

3. I’m especially interested in how workflow technology, combined with language technology, can improve efficiency and user experience. Could you expand a bit on those themes?

(”Certainly” I faintly hear.)

“By extracting, aggregating, analyzing and presenting clinical information based on business intelligence, M*Modal imaging solutions make sure that the right information is available at the right time for game-changing workflow management. Based on semantic understanding, M*Modal technology dynamically reacts to what is said and what is known from priors to automatically initiate a unique, information-driven, situationally-appropriate workflow. This content-based, real-time, corrective and pre-emptive physician feedback and decision support not only enhance efficiency and user experience, but also support downstream processes like compliance, coding and quality reporting.”

Wow! Now this is a lot more technical! However, I wrote a paper a couple of years ago about using event processing and workflow engines to improve “EHR Productivity.” Let me go back and reread that…. OK…. yes, I do think we are speaking of similar ideas. I wrote about the use of structured EHR data to trigger EHR workflows, not unstructured free text, but it’s a similar idea. Let me tease this apart.

  • Speech recognition turns sounds into free text.
  • Natural language processing turns free text into structured data.
  • Semantic understanding figures out what it means and which workflows to trigger.
  • So, based on what the physician says, within moments after it’s said, asking for clarification if necessary, tasks are automatically queued, executed, tracked, etc.

Am I right?

And then the strangest thing happened. The MModal.com webpage refreshed and the following text appeared:

“In principle you’re right, except that this is not a sequential process where one technology works on the output of the previous one. Instead, we have tightly integrated speech recognition, natural language processing and semantic understanding in a way that they complement each other. For example, speech recognition accuracy is improved by leveraging some of the semantic understanding that would indicate that a physician is talking about patient problems, rather than patient medications. When you adequately combine all the technologies mentioned above, you get more out of it than just the sum of their individual capabilities.”

MModal.com has some seriously wicked tech to pull that off! Both natural language processing *and* natural language understanding, not to mention remote extrasensory perception!
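That “not a sequential process” point can be made concrete with a toy sketch. Everything below is invented for illustration (the function names, the word lists, the tie-breaking rule); the real architecture is proprietary. The idea it demonstrates is the feedback loop: a downstream semantic stage feeds a hint back into speech recognition instead of only consuming its output.

```python
# Hypothetical sketch of a non-sequential speech-understanding loop.
# All names and data here are invented for illustration.

def recognize(audio, semantic_hint=None):
    """Toy recognizer: chooses among acoustically plausible words,
    letting a semantic hint break the tie (the 'feedback loop')."""
    candidates = {"problems": ["pneumonia", "pneumonic"],
                  "medications": ["penicillin"]}
    if semantic_hint in candidates:
        return candidates[semantic_hint][0]
    return "pneumonia"  # default best acoustic guess

def classify_section(context_words):
    """Toy semantic stage: guesses whether the physician is dictating
    problems or medications from nearby words."""
    if "prescribed" in context_words:
        return "medications"
    return "problems"

# Semantic understanding informs a (re-)recognition pass, rather than
# running strictly downstream of it.
context = ["patient", "diagnosed", "with"]
hint = classify_section(context)
word = recognize(audio=None, semantic_hint=hint)
print(hint, word)  # problems pneumonia
```

The point of the sketch: recognition accuracy improves because the recognizer knows it is in a “problems” context, which is exactly the example the website gives.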

4. I’m a visual kinda guy. At a high level, what does your language and workflow platform look like?


Thanks. Let’s unwind the workflow from the moment a physician says something to the moment it helps someone.

  • Real-Time Speech Recognition
  • Cloud
  • Automated transcription
  • Human post-editing?
  • Cloud
  • Natural language processing
  • Cloud
  • (Then, in parallel, no particular order)
    • Insert data into EHR
    • Submit codes to billing
    • Distribute management reports
    • Analyze data to improve effectiveness and efficiency

Speech Understanding, the small cloud on the upper right, is sort of a label for the entire cloud, including speech recognition, natural language understanding, and workflow orchestration, right?

How did I do?

“Very well!! And as I said in my previous comment, Speech Understanding represents a tight integration of various technologies, using them in a non-linear way.”

MModal.com is even diplomatic! That requires remarkable discourse processing technology.
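The “in parallel, no particular order” fan-out at the end of that flow can be sketched in a few lines. The task names follow the diagram above, but the functions and their behavior are invented for this sketch.

```python
# Illustrative fan-out after transcription/NLP completes.
# Task names mirror the diagram; implementations are stand-ins.
from concurrent.futures import ThreadPoolExecutor

def insert_into_ehr(doc):    return f"EHR <- {doc}"
def submit_codes(doc):       return f"billing <- codes({doc})"
def distribute_reports(doc): return f"reports <- {doc}"
def analyze(doc):            return f"analytics <- {doc}"

def fan_out(document):
    """Downstream steps run in parallel, in no particular order,
    as the diagram suggests."""
    steps = [insert_into_ehr, submit_codes, distribute_reports, analyze]
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(step, document) for step in steps]
        return [f.result() for f in futures]

results = fan_out("note-123")
print(results)
```

Nothing here depends on ordering between the four steps, which is the workflow-orchestration point: once the document exists, EHR insertion, coding, report distribution, and analytics are independent branches.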

5. It would be great if we could actually track a hypothetical phrase, from beginning to end, through what NLP engineers call a “linguistic pipeline.”

“While we could provide an example of that, it would look fairly generic and like any other NLP pipeline you may have seen before. The core differentiator of M*Modal’s speech understanding technology is that we don’t run a sequential pipeline, but that we have feedback loops and non-linear interactions between the individual stages of speech recognition, NLP, etc.”

“[F]eedback loops and non-linear interactions”, yes I’ve read about this. Speech and language understanding is a complex mixture of data-driven, bottom-up processing and context-driven, top-down processing. (Just think of how many times you don’t actually “hear” what’s said, but know it nonetheless purely from context.)

6. About that sub-cloud labeled “Workflow Orchestration”… Are we talking “workflow orchestration” in the same sense it is used in the workflow automation and business process management community?

From Wikipedia:

Workflow engines may also be referred to as Workflow Orchestration Engines.

“The workflow engines mainly have three functions:

    • Verification of the current status: Check whether the command is valid in executing a task.
    • Determine the authority of users: Check if the current user is permitted to execute the task.
    • Executing condition script: After passing the previous two steps, workflow engine begins to evaluate condition script in which two processes are carried out, if the condition is true, workflow engine execute the task, and if execution successfully complete, it returns the success, if not, it reports the error to trigger and roll back the change.

Workflow engine is the core technique for task allocation software application, such as BPM in which workflow engine allocates task to different executors with communicating data among participants. A workflow engine can execute any arbitrary sequence of steps. For example, a workflow engine can be used to execute a sequence of steps which compose a healthcare data analysis.”
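The three functions in that Wikipedia quote make a nice minimal sketch. Everything below (task names, roles, the transition table) is invented for illustration; it just shows status verification, authority checking, and condition evaluation with rollback, in that order.

```python
# A minimal sketch of the three workflow-engine functions quoted above.
# Task names, roles, and conditions are invented for illustration.

class WorkflowEngine:
    def __init__(self, valid_transitions, permissions):
        self.valid_transitions = valid_transitions  # status -> allowed tasks
        self.permissions = permissions              # role -> allowed tasks

    def execute(self, status, user_role, task, condition, action):
        # 1. Verify current status: is this task valid right now?
        if task not in self.valid_transitions.get(status, []):
            return "error: invalid in current status"
        # 2. Determine the authority of the user.
        if task not in self.permissions.get(user_role, []):
            return "error: not permitted"
        # 3. Evaluate the condition script; execute, or report and roll back.
        if not condition():
            return "error: condition false"
        try:
            action()
            return "success"
        except Exception:
            return "error: rolled back"

engine = WorkflowEngine(
    valid_transitions={"dictated": ["transcribe"], "transcribed": ["distribute"]},
    permissions={"transcriptionist": ["transcribe"],
                 "admin": ["transcribe", "distribute"]},
)
result = engine.execute("dictated", "transcriptionist", "transcribe",
                        condition=lambda: True, action=lambda: None)
print(result)  # success
```

A real engine would persist state, allocate tasks to queues, and notify participants, but the gatekeeping logic is recognizably this shape.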

Do you use a workflow engine? Could you describe what we discussed earlier in terms of this engine?

“We do use a workflow engine in various of our solutions. In the case of clinical documentation services, it is used to orchestrate the processing, proofreading and distribution of millions of clinical documents per year, involving tens of thousands of users. In the case of coding and clinical documentation improvement workflows, it is used to orchestrate intricate workflows involving a combination of technology and humans, with lots of different users with different roles.”

That’s fantastic! I think healthcare needs more true workflow technology, such as what you describe. I increasingly often prepend “Workflow engine sighting” to links I tweet from @EHRworkflow.

7. But I’d like to shift gears now, over to the computational linguistics and natural language processing side. Computational linguistics, the science behind the NLP engineering, includes conversation (discourse) and achieving goals (pragmatics), not just sounds, syntax, and semantics. Where do you see medical language technology going in this regard?

“Again, you hit it dead-on – in the past, people have ignored the pragmatics aspect. At M*Modal we have been focused on pragmatics since the very beginning. Where it’s all going is being able to understand the content of speech, using semantics and syntax to understand what people are really talking about. You are absolutely right that without pragmatics we’d never be able to accomplish what we’re trying to with NLP technology.”

8. I picked up a copy of Introduction to Pragmatics. It was a great review, since the last graduate course in pragmatics that I took was so long ago. And I read it! (I’m planning a blog post about the importance of pragmatics to EHR and HIT interoperability and usability.)

At the end of the book, in the summary, was this:

“Who could doubt that the world of artificial intelligence will soon bring us electronic devices with which we can hold a colloquial natural-language conversation? The problem, of course, is pragmatics. Not to slight the difficulties involved in teaching a computer to use syntax, morphology, phonology, and semantics sufficiently well to maintain a natural-sounding conversation, because these difficulties are indeed immense; but they may well be dwarfed by the difficulties inherent in teaching a computer to make inferences about the discourse model and intentions of a human interlocutor. For one thing, the computer not only needs to have a vast amount of information about the external world available (interpreting I’m cold to mean “close the window” requires knowing that air can be cold, that air comes in through open windows, that cold air can cause people to feel cold, etc.), but also must have a way of inferring how much of that knowledge is shared with its interlocutor.”


“Thus, the computer needs, on the one hand, an encyclopedic amount of world knowledge, and on the other hand, some way of calculating which portions of that knowledge are likely to be shared and which cannot be assumed to be shared – as well as an assumption (which speakers take for granted) that I will similarly have some knowledge that it doesn’t. Beyond all this, it needs rules of inference that will allow it to take what has occurred in the discourse thus far, a certain amount of world knowledge, and its beliefs about how much of that world knowledge we share, and calculate the most likely interpretation for what I have uttered, as well as to construct its own utterances with some reasonable assumptions about how my own inferencing processes are likely to operate and what I will most likely have understood it to have intended. These processes are the subject of pragmatics research.”

In his recent interview, Juergen said “At M*Modal we have been focused on pragmatics since the very beginning”. Could you expand on his comments?

You would be justified in suspecting that the answer to this question is not to be found on MModal.com. However, Wikipedia says “Pragmatics is a subfield of linguistics which studies the ways in which context contributes to meaning.”

“Context” occurs 48 times on MModal.com. For example:

  • “Healthcare Challenges and Context-Enabled Speech”
  • “the real context and meaning behind a physician’s observations”
  • “Context-specific patient information — from prior reports, EHRs, RIS, PACS, lab values, pathology reports, etc.”
  • “providing real understanding of context and meaning in the narrative – not simply term matching or tagging”
  • “combine […workflow management…] with Natural Language Understanding to bring context to text”
  • “Enabling physicians to populate the EHRs with color, context and reasoning without changing their established workflow”
  • “context-aware content that is codified to standardized medical lexicons, such as SNOMED®-CT, ICD, RadLex®, LOINC, and others”

I love the connection between context and workflow. I’ve written about that too. But my point here is: if pragmatics is about context, and M*Modal is about context, then M*Modal is about pragmatics too. I won’t go any further into the subject of the importance of pragmatics to healthcare workflow. I’m planning a future blog post about the import of discourse, reference, speech acts, implicature, intent, inference, relevance, etc. to EHR interoperability and usability.

In our interview, when Juergen said “You are absolutely right that without pragmatics we’d never be able to accomplish what we’re trying to with NLP technology,” what did he mean?

“The context of any natural language statement is extremely important for the correct semantic understanding. It is not sufficient to identify a key clinical concept like ‘pneumonia’ in a statement like ‘Two months ago, the patient was diagnosed with pneumonia, which turned out to be a mis-diagnosis.’ Pragmatics (context, really) informs us that the statement is about the patient, that it is about something that occurred 2 months ago, and that it was a false diagnosis. Without a level of pragmatics, we would completely misinterpret that statement.”
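That pneumonia example shows exactly what raw concept spotting misses. Here is a toy illustration: the attributes attached below (negation, timing) are what a context-aware system must recover. The rules are invented for this sketch and would never survive real clinical text, but they make the failure mode of plain term matching visible.

```python
# Toy context-aware extraction: plain term matching would report
# "pneumonia" as a current diagnosis; attaching negation and timing
# attributes avoids the misinterpretation. Rules invented for the sketch.
import re

def extract(statement):
    finding = {"concept": None, "negated": False, "when": "current"}
    if re.search(r"\bpneumonia\b", statement, re.I):
        finding["concept"] = "pneumonia"
    # Negation/assertion cues: the diagnosis was retracted.
    if re.search(r"mis-?diagnosis|ruled out", statement, re.I):
        finding["negated"] = True
    # Temporal cue: this happened in the past, not now.
    m = re.search(r"(\w+) months? ago", statement, re.I)
    if m:
        finding["when"] = f"{m.group(1)} months ago"
    return finding

s = ("Two months ago, the patient was diagnosed with pneumonia, "
     "which turned out to be a mis-diagnosis.")
print(extract(s))
# {'concept': 'pneumonia', 'negated': True, 'when': 'Two months ago'}
```

Simple term matching or tagging would have confidently coded this patient as having pneumonia, which is the point of the answer above.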

9. By the way, while the web page didn’t come up in response to my “workflow” query, I stumbled across an M*Modal developer certification program. Which leads me to my final question. All of this workflow technology and language technology for improving efficiency and user experience? How do I, as a developer (and I are one), harness what you have created?

HCIT vendors can take advantage of M*Modal’s free Partner Certification Program. M*Modal Fluency Direct speech-enables electronic health records (EHRs) and other clinical documentation systems by verbally driving actions normally associated with point-and-click, templated environments.

    • No cost to certify or for yearly recertification
    • Access to product development engineers
    • Access to product development documentation
    • Onsite engineering-focused, peer-to-peer training session
    • Featured on program website
    • Allowed to use a specialized certified logo
    • Co-marketing and marketing opportunities
    • Signage for tradeshows
    • Product labels and specialized documentation

How to Get Started

M*Modal has made certification as simple and smooth as possible. The certification process consists of an onsite Speech Enablement Workshop at no cost to the vendor. To get started, vendors simply visit www.mmodal.com/certification and register or email us at certification@mmodal.com. We will follow up with you and provide additional information that will prepare you for the certification workshop.

Well now! I have to admit you nailed that last question. You even used bullet points. I’ve never, ever, had an interviewee who (er, which, that, you tell me, you’ve got all the grammar rules!) did that before.

I appreciate all the time you’ve spent with me. I hope I didn’t put too much of a strain on the web server. If anyone has any follow up questions, are you on Twitter?

Cool! I already follow you.

Well, that was my interview, about the future of language and workflow, with the MModal.com website. I’m sure you’ll agree that it’s remarkable.
