Marc Andrews Observations On Demand

Trends in Enterprise Use of Information

UIMA Becomes a True Standard

Standardization around unstructured analytics is finally becoming a reality. At IBM, we have been pushing for a common framework that makes it easier for organizations to combine the various technologies that may be required to interpret and gain meaning from unstructured information. Last year, we began working with several vendors to drive adoption of UIMA as just such a framework. And a few weeks ago (sorry for the late post, but it has been a crazy month on the road), we made another announcement to solidify UIMA as an open platform and an industry standard.

This announcement was really about two key things. First, we submitted the UIMA specification to OASIS to establish UIMA as a formal industry standard and to provide an open process for further enhancement and refinement of this standard. Second, we worked with the Apache Software Foundation to establish an incubator project for developing UIMA-based software, and contributed our implementation of the UIMA framework to the open source community as part of this project. This will enable other organizations to leverage and build solutions on top of UIMA without relying on IBM. We also expect this to spur further innovation around unstructured analytics by opening up the evolution of the UIMA framework to the Apache community, which has a track record of producing some of the most innovative open source projects.

For an independent analyst view, and some additional commentary on what this means for the market, you can check out the Next Steps for UIMA article in Hurwitz & Associates' recent newsletter. Mike Ferguson also has a recent post that discusses some of the value of unstructured analytics, with a specific focus on how it is driving the emergence of a new BI market around unstructured information in general.

December 05, 2006 in Unstructured Analytics

IBM talks about importance of unstructured information to future of business intelligence

At the IBM Information On Demand conference this week, IBM talked about its view of the future of business intelligence. There is a great write-up of this on SearchDataManagement.com. The high-level message, from Karen Parrish, IBM's vice president of BI solutions, is that the future of business intelligence will combine structured and unstructured information, using text analytics to turn free text into information that can be stored in a dynamic data warehouse.

She also talked about the importance of analyzing data in real-time, delivering BI to mobile devices, democratization of BI, and making use of semantics and enterprise search to analyze all kinds of content, including text, voice and even images.  There are some good examples in this article on the various types of capabilities that are already starting to become available, and that were demonstrated at the IOD conference.

October 20, 2006 in Business Intelligence, Unstructured Analytics

Linking Structured Application Data to Supporting Unstructured Information Through MDM

Companies are increasingly realizing that much of the information users need to complete an activity is not contained within the application or underlying database used to manage that business process. Much of the critical supporting information exists in unstructured formats, which is a major reason for the recent growth of the content management market. Wayne Kernochan touches on a very important point in his recent article, Who will leverage semi-structured and unstructured data? (warning: this article requires registering on the site, so I will repeat much of what he says here), which was prompted by the acquisitions of Documentum and FileNet by platform vendors EMC and IBM.

This article talks about the importance of creating linkages between structured data in enterprise applications and unstructured content across the organization. As he points out, for customers the ultimate "end game" behind these acquisitions is enabling organizations to relate all of their unstructured information to the mission-critical relational data used in their business processes. By creating and maintaining relationships between the different types of data stored across the organization, companies can more effectively surface that information to users as needed. That information can then be used to improve productivity and make better business decisions.

IT is accustomed to viewing data management through the lens of the enterprise relational database managing structured relational data. In point of fact, by some estimates, the proportion of data in the typical enterprise that is semi-structured or unstructured is approaching 90% and rising. Much of that data is related in some way to key enterprise data. In other words, even if most vital enterprise data sits in enterprise business-critical applications, the related data -- customer pictures, X-rays, security tapes, email, news footage -- that would enhance the enterprise's ability to interact with customers and suppliers, brand the company more effectively and allow effective response to Sarbanes-Oxley legal discovery is sitting outside the enterprise-application data stores. Thus, the enterprise badly needs better ways of managing its semi-structured and unstructured data -- and relating it to mission-critical relational data.

But, how are organizations to establish links between all of their unstructured information and their application data?

Well, this is where master data management (MDM) solutions would seem to be a perfect fit.  MDM technologies were developed for the specific purpose of pulling together information from different repositories for use in operational systems and applications.  Kernochan suggests that the best place to start is to use these master data management efforts to "classify semi/unstructured data related to customer data as part of master data."

I agree with his suggestion, but would take it a step further. While MDM systems initially focused on aggregating and maintaining links to all relevant customer information, they are now being used to capture and create linkages across data relevant to other types of business objects. Colin White does a good job of pointing this out in When Will People Realize that MDM is More than Just CDI?

This is just another example highlighting his point.  Companies should start thinking about MDM technologies as something that could be applied more broadly - as a mechanism to maintain associations between all types of application data and unstructured information.
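As a rough illustration of Kernochan's suggestion, here is a minimal Python sketch of an MDM-style cross-reference that links documents to master customer records. It assumes some upstream analytics have already extracted organization names from each document; the master IDs, name variants and document names are all hypothetical, and real MDM products use far more sophisticated matching than this exact lookup.

```python
import re
from collections import defaultdict

# Hypothetical master data: master customer ID -> known (normalized) name variants.
MASTER_CUSTOMERS = {
    "CUST-001": {"acme corporation", "acme corp"},
    "CUST-002": {"globex inc", "globex"},
}


def normalize(name: str) -> str:
    # Lowercase and drop punctuation so "ACME Corp." matches "acme corp".
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


def link_documents(documents: dict) -> dict:
    """documents maps a document ID to the organization names extracted from it."""
    links = defaultdict(set)
    for doc_id, names in documents.items():
        for name in names:
            key = normalize(name)
            for master_id, variants in MASTER_CUSTOMERS.items():
                if key in variants:
                    links[master_id].add(doc_id)
    return dict(links)


extracted = {
    "email-17.msg": ["ACME Corp."],
    "contract-4.pdf": ["Globex, Inc.", "Acme Corporation"],
}
links = link_documents(extracted)
print(sorted(links["CUST-001"]))  # ['contract-4.pdf', 'email-17.msg']
```

The point of the sketch is the shape of the data: the master record becomes the hub, and each unstructured item hangs off it by ID, which is the same pattern whether the master object is a customer, a product or a supplier.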

September 12, 2006 in Content Management, Master Data Management, Unstructured Analytics

UIMA Used to Fight Crime - More Real World UIMA Examples

Consider this Part 3 of Real World UIMA Examples! There is a great UIMA article in Byte and Switch about a pilot project at the Cape Breton Regional Police (that's in Nova Scotia, Canada, for those of you wondering), which was demonstrated at the Canadian Chief of Police Conference last month and is now receiving interest from police departments across Canada.

They are using UIMA to analyze digitized surveillance videos, audiotapes of interviews and interrogations, voice clips, images of vehicle licenses and police reports, and the like.

The goal is to feed in police data and produce information that solves a crime -- or at least helps unravel it.

"The program provides detectives with timelines, linkage analysis, and disclosures," says Burke. As data from tapes, videos, and even voice messages enters the system, it will acquire date and time stamps and be parsed so that further analysis can tie information to specific individuals and situations. Data items also will be annotated with the date when each was shown to prosecutors, enabling vastly improved police record keeping.

The New York City Police Department (yes, NYPD Blue!) has also begun using UIMA. They are only in the initial stages of truly leveraging UIMA, with the primary focus on basic search capabilities, but they selected a UIMA-based platform so that they can extend their solution to capture the kinds of buried knowledge mentioned above. They started off by implementing a real-time Crime Information "Warehouse" that enables them to gather, share and act on information more quickly. By transforming the way they use information, the NYPD is able to redeploy resources in response to crime patterns and trends, and to solve crimes and apprehend criminals more quickly. You can read the NYPD case study for more details.

September 08, 2006 in Enterprise Search, Unstructured Analytics

Hurwitz on Evolution of Enterprise Search

Fern Halper, from Hurwitz & Associates, just published a great article on how Enterprise Search Evolves. She touches on two very important points in it.

The first is that enterprise search is not just about finding web sites... and it is not even just about finding unstructured information anymore. Enterprise search is about providing a way to find all relevant information across the organization, from both structured and unstructured sources, and about surfacing knowledge, not just finding documents.

The first half of that point, spanning structured and unstructured sources, is not necessarily new, but it is an important distinction when looking at the various enterprise search solutions on the market. Some have been designed with this in mind... others are bolting on these capabilities in various ways (although I have to admit they are doing this fairly quickly and in clever ways as they recognize the needs of enterprise customers). However, the second half, about enterprise search moving toward finding facts and knowledge, represents a more significant shift.

The second, and more important, point highlighted by Halper is that text analytics will play a greater role in how enterprise users find the knowledge and facts buried across their organization in order to complete their tasks, make better decisions and innovate their business to create competitive advantage. This article is definitely worth a read to see how text analytics and enterprise search are converging, and it even includes a good example. Halper is trying to help companies understand how they might use this new functionality and what value it could deliver.

For a more specific and detailed overview of the value of text analytics, you should also check out her previous article entitled Patterns for Success - Options for Analyzing Unstructured Information. 

It's also interesting to see further touch points in both of these articles around how the worlds of business intelligence and search are starting to converge.  I blogged about the Convergence of BI and Search back in May and am continuing to see a lot of activity and inquiries from customers and analysts in this space.  I think this is an exciting area that is just getting started.

September 01, 2006 in Business Intelligence, Enterprise Search, Unstructured Analytics

Real World UIMA Examples - Part 2

Here is another set of real world examples to help people understand how companies are leveraging UIMA to extract knowledge from unstructured information and drive business innovation.  This installment will focus on customer care solutions. What you'll find in the following paragraphs are examples of organizations that are using this technology to better understand customer issues, improve customer service and use these uncovered insights to their business advantage.

Let's start off with Daksh. Daksh is the fastest-growing business process outsourcing services provider in India, with the largest part of its business focused on customer care services. You've probably heard that when you call a company's 800 number for support, you may be routed to someone in India... well, these are the guys who are likely picking up the phone on the other end. Daksh provides outsourced customer service for some of the largest financial services, technology, e-commerce, telecommunications, insurance, hospitality and airline companies in the world. By the way, if you're wondering who these companies are, Daksh won't tell you, because most of them don't want everyone knowing they outsource their customer service!

So, what is Daksh doing to improve customer service for these organizations? Well, they started out by automating the analysis of customer surveys. They had over 20 employees going through thousands of surveys every week for just one of their customers. They have applied UIMA-enabled analytics to automatically identify the products or services referenced and categorize the types of concerns. They are even determining customer sentiment from the tone of what customers have written to come up with a customer satisfaction score. In initial tests, Daksh was able to attain 85% accuracy and automate the workload of 60 subject matter experts.
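The survey scoring step can be pictured with a very simple stand-in. The post does not say how Daksh's analytics compute sentiment, so this Python sketch swaps in a basic word-list (lexicon) approach; the word lists and the 1-to-5 scale are hypothetical.

```python
import re

# Hypothetical sentiment lexicon; not Daksh's actual analytics.
POSITIVE = {"helpful", "quick", "friendly", "resolved", "great"}
NEGATIVE = {"rude", "slow", "unresolved", "waited", "terrible"}


def satisfaction_score(survey_text: str) -> int:
    """Map the balance of positive vs. negative cue words onto a 1..5 score."""
    words = re.findall(r"[a-z]+", survey_text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 3  # no sentiment cues: neutral midpoint
    balance = (pos - neg) / (pos + neg)  # ranges from -1.0 to 1.0
    return round(3 + 2 * balance)


print(satisfaction_score("The agent was friendly and quick, but I waited 20 minutes."))  # 4
```

A real system would also have to handle negation ("not helpful") and domain vocabulary, which is exactly where trained annotators earn their keep over a word list like this one.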

It's important to note that UIMA-based analytics can be applied not only to text, but also to audio and other types of unstructured information. Daksh is taking advantage of this capability by recording its agents' calls, with UIMA enabling them to send the audio through a combination of analyses: first a speech-to-text engine, and then a set of pattern detectors. They are applying this technology to better assess the performance of their call center agents. For example, in processing reservations for a major car rental agency, they found that certain agents were making bookings that led to higher pick-up rates. They analyzed the unstructured information in the call notes and recorded audio transcripts to identify patterns common to the better agents. They are now using that knowledge to create best practices and improve overall performance.

Daksh actually has several other projects leveraging UIMA, but I think that is enough to get your creative juices flowing around what is possible, and I want to save some time to talk about a few other companies that are doing some innovative things.

The next example I'd like to mention is how BlueCross BlueShield of Tennessee (BCBS TN) is using this same technology to create a more complete, single view of its customers. You may have seen them mentioned in an IBM press release or in a ComputerWorld article. They were building a data warehouse to create a single view of all the different health care providers they work with. However, customer service concerns were buried in free-form text in call center notes. The types of services provided and geographies served could only be determined by extracting this information from government regulation web sites and web subscription services. And the terms of their existing agreements were hidden in contracts scanned into a content management system. They are leveraging UIMA to extract all of this knowledge from the various unstructured content sources and funnel it into a provider dashboard that can be used both to improve customer service AND to help their sales agents better negotiate agreement renewals with the providers.

I'll close with an example from the telecommunications industry. One of Japan's premier mobile communications companies, providing voice and data communications to millions in Japan, is analyzing customer complaints and questions in real time, as they are entered by agents or received via e-mail, and automatically matching them to candidate questions in order to generate FAQs. The FAQs and related support information can be disseminated to call center operators much more quickly, enabling them to dramatically improve customer service and reduce call center wait times. Imagine how much happier you would be if you had to wait a few minutes less the next time you call your cell phone company!
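The post doesn't describe how the carrier matches incoming questions to FAQ candidates. As a hedged sketch of the general idea, this Python example scores each hypothetical FAQ by token overlap (Jaccard similarity) and accepts the best match only above an assumed threshold.

```python
import re


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    # Size of the overlap divided by size of the combined vocabulary.
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical candidate FAQ questions.
FAQS = [
    "How do I reset my voicemail password?",
    "Why is my bill higher this month?",
    "How do I turn on international roaming?",
]


def best_match(inquiry: str, faqs=FAQS, threshold: float = 0.2):
    """Return the most similar FAQ, or None if nothing clears the threshold."""
    scored = [(jaccard(tokens(inquiry), tokens(q)), q) for q in faqs]
    score, question = max(scored)
    return question if score >= threshold else None


print(best_match("I forgot my voicemail password, how can I reset it?"))
```

Token overlap is about the crudest similarity measure available; it is used here only because it is easy to follow, and it ignores synonyms and word order entirely.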

Hopefully this provides some additional insight into how organizations are leveraging UIMA and unstructured analytics in general to extract knowledge from the various types of content they are capturing, and use that information to drive business innovation.

July 22, 2006 in Unstructured Analytics

Real World UIMA Examples - Part 1

Sorry it has taken me so long to follow up on my promise to provide some real world examples of how companies are leveraging UIMA to drive real business value.  April has been a busy month and seems to have just flown by!  I will start off with some life sciences companies that are extracting knowledge from unstructured information and adding semantics to improve research and discovery around new drug development, clinical trials and patent infringement.  One company is even leveraging these capabilities to better share information with doctors and patients worldwide to give them a better chance at surviving what are often life threatening diseases.

I'll start off with Memorial Sloan-Kettering Cancer Center. They are working with IBM to develop a web-accessible data warehouse that will enable clinicians and researchers to more efficiently use the information they are gathering to facilitate research on a new cancer taxonomy, while conforming to HIPAA requirements. This is really becoming an "information" warehouse through the inclusion of searchable concepts from their text-based pathology reports. These concepts are automatically extracted by a set of text analytics developed by IBM on top of the UIMA framework. As an example, the system can automatically identify the particular body part where the occurrence was found and which side of the body, along with the preliminary diagnosis and the grade and size of the occurrence. This information is extracted from the unstructured pathology reports and turned into a structured set of metadata that enables more efficient queries and more detailed reports for deeper analysis.
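To show what turning a pathology report into structured metadata might look like, here is a deliberately tiny, rule-based Python sketch. It is not the IBM annotators described above; the body-part vocabulary, regular expressions and field names are assumptions for illustration only.

```python
import re

# Hypothetical, tiny vocabulary and patterns; real clinical annotators are far richer.
BODY_PARTS = ("breast", "lung", "kidney", "ovary")
SIDE = re.compile(r"\b(left|right|bilateral)\b", re.IGNORECASE)
SIZE = re.compile(r"(\d+(?:\.\d+)?)\s*(cm|mm)\b", re.IGNORECASE)
GRADE = re.compile(r"\bgrade\s+(\d|I{1,3})\b", re.IGNORECASE)


def extract_metadata(report: str) -> dict:
    """Pull a few structured fields out of free-text report wording."""
    lowered = report.lower()
    meta = {"body_part": next((p for p in BODY_PARTS if p in lowered), None)}
    side = SIDE.search(report)
    meta["side"] = side.group(1).lower() if side else None
    size = SIZE.search(report)
    meta["size"] = (float(size.group(1)), size.group(2).lower()) if size else None
    grade = GRADE.search(report)
    meta["grade"] = grade.group(1).upper() if grade else None
    return meta


report = "Left breast, excisional biopsy: invasive ductal carcinoma, grade 2, 1.8 cm."
print(extract_metadata(report))
```

Once fields like these exist as columns, the warehouse can answer questions such as "all left-side occurrences over 1 cm" with an ordinary query instead of a full-text search.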

Another great example is Mayo Clinic, which used UIMA as the basis for a system to extract more detailed knowledge from approximately 20 million clinical notes. UIMA provided the flexibility to easily combine a series of home-grown annotators with others from the open source community and some developed by IBM as part of the implementation project.

At two large pharmaceutical companies (I'm not sure if I'm allowed to mention their names, so I'm erring on the safe side), IBM's OmniFind search and text analytics platform is being used to incorporate UIMA-compliant annotators that identify chemical names, along with other entities such as drug and disease names. Once identified, the chemical names are converted into their chemical structures using name-to-structure programs. This produces SMILES strings, compact text notations for the chemical structures, which can be used for computational calculations and as input to other applications. These technologies were used to analyze millions of patents and Medline abstracts to generate a large database of molecular structures derived from the text of those documents. This effectively makes the scientific literature and patents searchable by structure and substructure, and researchers can even perform similarity analysis!

The combined technologies for reading and processing molecular structures will allow researchers to build large databases from previously inaccessible literature, relevant to patents, pharmaceuticals, publishing, health care and environmental science, to cite just a few areas. In addition to chemical entities, annotators have been created to identify and extract proteins, genes, cell lines, cell types and a host of other life sciences entities. The combined data can be used for co-occurrence analysis, automated classification, visualization and OLAP analysis through some additional tools available from IBM Research.
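Co-occurrence analysis, in its simplest form, counts how often two extracted entities show up in the same document. This Python sketch assumes the annotators have already produced a set of typed entities per document; the entities themselves are made-up examples, not results from the projects above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-document entity sets, as an annotator pipeline might produce them.
docs = [
    {"gene:BRCA1", "disease:breast cancer", "chemical:tamoxifen"},
    {"gene:BRCA1", "disease:breast cancer"},
    {"chemical:tamoxifen", "disease:breast cancer"},
]


def cooccurrence(doc_entities):
    """Count, for every pair of entities, the number of documents containing both."""
    counts = Counter()
    for entities in doc_entities:
        # Sorting first makes each pair's key independent of set iteration order.
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return counts


for pair, n in cooccurrence(docs).most_common(2):
    print(n, pair)
```

Raw counts like these are usually just the starting point; they tend to be normalized (for example, against how common each entity is overall) before anyone reads meaning into a strong pairing.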

This has gotten to be a long blog entry, so I will wrap up with a quick reference to the IFPMA Clinical Trials Portal, which provides doctors and patients with the ability to search for ongoing and completed trials being performed by all of the different pharmaceutical, academic and drug research organizations.  This enables people to find trials relevant to their disease or symptoms without having to know all of the various technical jargon that may be used to describe these things in clinical trials documentation.  There is a great article on this in Technology Review if you want to learn more.

I'll try to follow up this entry with some examples from another industry.  Check back in to see what other types of solutions can be addressed by combining unstructured and structured information.

May 03, 2006 in Unstructured Analytics

Why Do I Need OmniFind for UIMA?

If UIMA is now open source, what value does OmniFind provide and why would I need it? This is a question I am often asked, so I thought I would try to provide an answer in my blog that I could just refer people to.

So, let’s get back to what UIMA is. Remember that at its roots, UIMA is just a framework for enabling different analysis engines (“annotators”) to work together. And while IBM has made an implementation of that framework available to the open source community, that implementation still only provides a basic runtime for passing a pre-built UIMA object through a set of hard-wired annotators, producing a final UIMA object that contains the analysis results. Organizations still have additional work to do before they can use UIMA-compliant analytics as part of a production solution. For example, how do you create the initial UIMA object, and what do you do with the final, resulting UIMA object? Not to mention how you will manage multiple processes for different sets of content.
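To make the "basic runtime" idea concrete, here is a minimal Python sketch of the pattern: a shared analysis structure is created, passed through a fixed list of annotators, and comes out carrying their results. This is not the Apache UIMA API (which is Java-based and centers on the CAS, or Common Analysis Structure); the class and function names here are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    kind: str   # annotation type, e.g. "Token" or "Money"
    begin: int  # start offset in the document text
    end: int    # end offset (exclusive)


@dataclass
class AnalysisStructure:
    """Hypothetical stand-in for UIMA's CAS: document text plus annotations."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]


# An annotator reads the shared structure and adds its own annotations to it.
Annotator = Callable[[AnalysisStructure], None]


def token_annotator(cas: AnalysisStructure) -> None:
    for m in re.finditer(r"\w+", cas.text):
        cas.annotations.append(Annotation("Token", m.start(), m.end()))


def money_annotator(cas: AnalysisStructure) -> None:
    for m in re.finditer(r"\$\d+(?:\.\d{2})?", cas.text):
        cas.annotations.append(Annotation("Money", m.start(), m.end()))


def run_pipeline(text: str, annotators: List[Annotator]) -> AnalysisStructure:
    # The runtime only creates the structure and hands it to each annotator in
    # a fixed order; all of the actual analysis lives in the annotators.
    cas = AnalysisStructure(text)
    for annotate in annotators:
        annotate(cas)
    return cas


result = run_pipeline("Refund of $25.00 requested", [token_annotator, money_annotator])
print([result.covered_text(a) for a in result.annotations if a.kind == "Money"])
```

Everything outside `run_pipeline` (where the text comes from, and what happens to `result` afterwards) is left to the developer, which is exactly the gap the rest of this post is about.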

OmniFind provides organizations with a commercially supported platform for processing unstructured information through UIMA-compliant analytics. The first thing it does is allow organizations to define and manage different collections of content and associate those collections with different UIMA processes. This is important because, as I discussed in my previous post, different content typically requires different analytics. And you may want to do different things with the analysis results from those different collections. In addition, OmniFind provides a broad set of crawlers, enabling you to automatically pull content from a variety of sources. These include not only web pages and file servers, but also databases, enterprise content management systems, collaborative applications, mainframe systems and several other content sources. OmniFind also does all of the up-front work of parsing over 200 different file formats to create a text version that can be processed by most annotators. Not to mention the fact that it creates the UIMA object for you, which is required before initiating any UIMA-based process (remember that the open source UIMA implementation requires the developer to create this UIMA object).
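Associating collections with different processes boils down to a configuration that routes each collection's content to its own set of analytics. This Python sketch is a hypothetical illustration of that routing, not OmniFind's configuration format; the collection names and toy analytics are assumptions.

```python
from typing import Callable, Dict, List

# Each toy "analytic" takes document text and returns a list of findings.
Analytic = Callable[[str], List[str]]


def find_part_numbers(text: str) -> List[str]:
    # Toy warranty analytic: words shaped like "PN-1234".
    return [w for w in text.split() if w.startswith("PN-")]


def find_contract_terms(text: str) -> List[str]:
    # Toy legal analytic: flags a few hard-coded contract keywords.
    keywords = {"termination", "renewal", "indemnity"}
    words = (w.strip(".,;:").lower() for w in text.split())
    return [w for w in words if w in keywords]


# Hypothetical configuration: each named collection gets its own analytics.
COLLECTIONS: Dict[str, List[Analytic]] = {
    "warranty_claims": [find_part_numbers],
    "legal_documents": [find_contract_terms],
}


def process(collection: str, text: str) -> List[str]:
    """Run every analytic configured for the collection and gather the findings."""
    findings: List[str] = []
    for analytic in COLLECTIONS[collection]:
        findings.extend(analytic(text))
    return findings


print(process("warranty_claims", "Pump PN-4471 failed after 3 days"))
print(process("legal_documents", "Renewal is automatic unless termination notice is given."))
```

Keeping the routing in data rather than code means a new collection, or a new annotator for an existing one, is a configuration change rather than a rewrite.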

Another feature OmniFind provides is a full set of basic language processing facilities, such as language identification, native tokenization, stemming, and part-of-speech detection for over 20 languages. OmniFind also has a few other non-trivial "platform" features built in, such as chunking for better handling of large documents, error handling, logging and other add-ons that are critical for a production system.

Finally, at the end of the process, OmniFind provides facilities for handling the resulting UIMA object after it has gone through all of its processing. Since OmniFind is also a search engine, it automatically takes all of the analysis results and sends them into a searchable index. But this is not just a typical keyword index. OmniFind can index XML spans and supports both parametric and semantic queries (in addition to standard keyword queries), enabling users to search for the entities, entity relationships and facts identified during the UIMA process. This opens up a whole new level of access to the knowledge buried in unstructured information that has been processed through OmniFind. But OmniFind recognizes that search is not the only application for which organizations may want to leverage UIMA-based processing. So, OmniFind also provides a JDBC “consumer” that allows you to map the analysis results directly to fields in a database. In this way, you can use the analysis results to expand the schema of your database or data warehouse, and create new types of reports that take advantage of this newly generated knowledge.
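The idea behind a database "consumer" is a mapping from analysis results to table columns. As a stand-in for a JDBC consumer, this Python sketch uses the standard library's sqlite3 module with an in-memory database; the table layout and sample results are hypothetical and do not reflect OmniFind's actual mapping configuration.

```python
import sqlite3

# Hypothetical analysis results for one document: (entity type, covered text).
results = [("Product", "Model X200"), ("Problem", "battery overheating"), ("Product", "AC adapter")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_entities (doc_id TEXT, entity_type TEXT, entity_text TEXT)")


def store_results(conn, doc_id, results):
    # Each annotation becomes one row; this column mapping plays the role
    # of the consumer's configuration.
    conn.executemany(
        "INSERT INTO doc_entities (doc_id, entity_type, entity_text) VALUES (?, ?, ?)",
        [(doc_id, etype, text) for etype, text in results],
    )
    conn.commit()


store_results(conn, "note-001", results)
rows = conn.execute(
    "SELECT entity_type, COUNT(*) FROM doc_entities GROUP BY entity_type ORDER BY entity_type"
).fetchall()
print(rows)  # [('Problem', 1), ('Product', 2)]
```

Once the results sit in a table, the final query is just ordinary SQL reporting, which is the point: existing BI tools can consume the new knowledge without knowing it came from text.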

Thus, OmniFind is not just for searching text analysis results... it provides organizations with a robust platform for processing unstructured information through UIMA-compliant analytics, for search or other applications.

IBM has also published a document that talks more about OmniFind as a text analytics platform.

March 19, 2006 in Unstructured Analytics

UIMA - What is it Good For?

UIMA (pronounced you-ee-muh) stands for Unstructured Information Management Architecture and is a framework for extracting knowledge from unstructured data through text analytics and other content processing mechanisms. It has been gaining traction since IBM first introduced it to the market just over a year ago, and has picked up significant momentum since last August, when IBM announced its intent to open source its implementation of the framework. While there have been a lot of articles about UIMA, there continues to be some confusion about exactly what UIMA is and what it can be used for. This will be one of my first attempts (of many, I'm sure) to provide some clarity.

To start off with, UIMA does not provide actual text analytics or content processing capabilities. It provides a common model for plugging in different modules (called "annotators") that perform these functions. Why is this important? Well, the fact is that there are tons of different components out there to help extract meaning from and interpret unstructured data. These annotators may support different analysis capabilities (e.g. anything from basic word root extraction to entity extraction and relationship detection), or they may be relevant to different specializations (e.g. industry-specific terminology, application-specific domains, language support, etc.). So, companies will find that different analytics are required for different sets of content. For example, the analytics most relevant for life sciences applications, such as clinical trials or drug research, will not be appropriate for fraud detection in financial services. And even within a single organization, you will most likely want to use a different set of analytics on warranty claims data than on marketing information, legal documents or general intranet content. Thus, it is important that all of these annotators use a common framework and speak a common language, so that they can interoperate with each other and be more easily plugged into enterprise applications. This is the goal of UIMA: to enable people to more easily leverage all of the text analytics capabilities out there, and re-use custom-developed capabilities, to deliver applications that provide significant business value and generate new insights.

So, what are these applications that UIMA can enable?  Well, once again, it is important to note what UIMA does not do.  It does not provide actual search, business intelligence, data mining, reporting or other research capabilities.  However, it does empower these applications to leverage knowledge buried within unstructured data.  Organizations can use UIMA to generate and extract structured data, which can be sent to a database or data warehouse, enabling deeper reporting and analysis of information that previously could not have been easily surfaced in its unstructured format -- for example, processing technician comments or call center notes to report on the underlying root cause of a problem, including conditions that may have led to the problem and actions that were taken.
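As an illustration of generating structured data from call center notes, here is a hedged Python sketch that pulls a root cause, a triggering condition and a corrective action out of a note into a record that could be loaded into a warehouse. The cue-phrase patterns are simplistic assumptions; production annotators rely on much richer linguistic analysis.

```python
import re
from typing import Dict, Optional

# Hypothetical cue-phrase patterns, one per field of the output record.
PATTERNS = {
    "root_cause": re.compile(r"\bcaused by ([^.;]+)", re.IGNORECASE),
    "condition": re.compile(r"\b(?:when|after) ([^.;]+)", re.IGNORECASE),
    "action": re.compile(r"\b((?:replaced|reset|updated) [^.;]+)", re.IGNORECASE),
}


def extract_record(note: str) -> Dict[str, Optional[str]]:
    """Turn one free-text note into a flat record; missing fields become None."""
    record: Dict[str, Optional[str]] = {}
    for field_name, pattern in PATTERNS.items():
        match = pattern.search(note)
        record[field_name] = match.group(1).strip() if match else None
    return record


note = ("Fan noise reported. Caused by a worn bearing. Occurs after long idle periods. "
        "Technician replaced the fan assembly.")
print(extract_record(note))
```

Each record maps naturally onto a row with `root_cause`, `condition` and `action` columns, so root-cause reporting across thousands of notes becomes a GROUP BY rather than a reading exercise.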

UIMA can also be used to provide additional metadata for indexing by search engines. This makes it possible to go beyond standard keyword searching to find concepts and facts (I'll talk more about this in the future). Finally, UIMA can send the extracted knowledge directly to business processes and applications. For example, a rules engine could be used to automatically notify users or kick off special business processes based on certain findings. Or applications could use UIMA for real-time processing of text entered by a user or received from a service, to better determine what actions should be taken.
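The rules-engine scenario might look something like this minimal Python sketch, where each rule inspects an extracted finding and names a follow-up process to start. The finding types, confidence threshold and action names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    entity_type: str
    value: str
    confidence: float


@dataclass
class Rule:
    name: str
    condition: Callable[[Finding], bool]
    action: str  # hypothetical name of the business process to kick off


RULES = [
    Rule("safety-escalation",
         lambda f: f.entity_type == "SafetyIssue" and f.confidence >= 0.8,
         "notify_quality_team"),
    Rule("churn-risk", lambda f: f.entity_type == "CancelIntent", "route_to_retention_desk"),
]


def evaluate(findings: List[Finding]) -> List[str]:
    """Return one 'action: value' entry for every rule that fires on a finding."""
    triggered = []
    for f in findings:
        for rule in RULES:
            if rule.condition(f):
                triggered.append(f"{rule.action}: {f.value}")
    return triggered


findings = [
    Finding("SafetyIssue", "brake pedal sticks", 0.92),
    Finding("Product", "model X200", 0.99),
    Finding("CancelIntent", "wants to close account", 0.75),
]
print(evaluate(findings))
```

Keeping the rules separate from the extraction means business users can adjust thresholds or add triggers without touching the annotators themselves.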

This should provide a good introductory foundation for what UIMA does. In future posts, I'll try to provide examples of some more specific applications that can benefit from this technology.

March 14, 2006 in Unstructured Analytics

