How To Get Started With Knowledge Mining In Azure

Acting on unstructured information can be a challenge. This is especially true when that information is locked inside of documents, stored on a drive or hidden in an archive somewhere.

It can be a frustratingly difficult job to search and analyze the data within documents, and the more there are, the harder that task becomes. If only you could easily draw from this combined information, you might be able to make better decisions, improve your business processes, identify new opportunities and mitigate organizational risks.

The ironic truth is that the more unstructured information you have:

  1. The more likely there’s value, in the aggregate, to be mined from it.
  2. The more difficult it will be to get at that value through purely human means.

[Figure: amount of unstructured data graph]

If your organization finds itself with thousands of documents shuffled away on shared drives or employees’ computers, you may well be sitting on a goldmine of untapped value – value that’s locked away inside of documents, spreadsheets, presentations and other unstructured data sources.

There’s good news, however: Knowledge Mining can allow you to get at that data with a little help from AI and machine learning.

How Knowledge Mining in Azure Works

Azure Cognitive Search is a cloud-based search service that provides built-in Knowledge Mining capabilities. This service allows you to turn your unstructured data into searchable content through a combination of integrated indexing and cognitive (i.e. “smart”) services.

Azure Cognitive Search works in three key phases:

1.   Upload and Extract Information

It all begins by uploading documents to the Azure cloud and then extracting information from them through “document cracking”. This process uses intelligent services and custom code to read meaningful information locked inside data sources, such as text, images and metadata. For scanned documents, optical character recognition reliably and quickly transforms images into text.

Both unstructured data (PDFs, images, videos, documents, spreadsheets, presentations, emails, audio files, etc.) and structured data (e.g. spreadsheets, databases) can be used as data sources in Azure Knowledge Mining.
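As a rough sketch of how this phase might be configured through the service's REST API, the snippet below builds the JSON definitions for a blob storage data source and an indexer that cracks documents and pulls out embedded images so they can be passed to OCR. The resource names (`docs-datasource`, `docs-index`, `docs-indexer`, the `documents` container) and the connection string placeholder are assumptions for illustration, and nothing is sent to Azure here.

```python
import json


def build_data_source(name, container, connection_string):
    """JSON definition for a blob storage data source (names are hypothetical)."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }


def build_indexer(name, data_source_name, index_name, skillset_name=None):
    """JSON definition for an indexer that cracks documents from the data source."""
    indexer = {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": index_name,
        "parameters": {
            "configuration": {
                # Extract both the text content and the file metadata.
                "dataToExtract": "contentAndMetadata",
                # Also extract embedded images so an OCR skill can read them.
                "imageAction": "generateNormalizedImages",
            }
        },
    }
    if skillset_name:
        # Optional link to an enrichment skillset (phase 2).
        indexer["skillsetName"] = skillset_name
    return indexer


data_source = build_data_source(
    "docs-datasource", "documents", "<storage-connection-string>"
)
indexer = build_indexer("docs-indexer", data_source["name"], "docs-index")
print(json.dumps(indexer, indent=2))
```

In a real project these payloads would be sent to the search service (or created with an SDK); keeping them as plain dictionaries makes the configuration easy to review and version.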

2.   Enrich Information

To find patterns and provide an understanding of the data, pre-trained AI models developed by teams at Microsoft are applied. These cognitive services include:

  • Vision skills – face detection, tag extraction, celebrity recognition
  • Language skills – key phrase extraction, language detection, sentiment analysis, location entity extraction
  • Speech skills – text to speech, speech to text, speech translation
  • Customized AI models – specific to an organization or its industry (i.e. “built to suit” by your developers or a partner like T4G)

The result is a searchable index populated with enriched, structured data.
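To make the enrichment step more concrete, here is a minimal sketch of a skillset definition that chains three of the built-in language skills: language detection, key phrase extraction and entity recognition. The skill type names follow the service's REST API as we understand it and should be checked against the current documentation; the skillset name and the target field names are assumptions.

```python
import json


def skill(odata_type, inputs, outputs, **extra):
    """Build one skill definition; inputs map names to source paths,
    outputs map names to target names in the enriched document."""
    definition = {
        "@odata.type": odata_type,
        "context": "/document",
        "inputs": [{"name": n, "source": s} for n, s in inputs.items()],
        "outputs": [{"name": n, "targetName": t} for n, t in outputs.items()],
    }
    definition.update(extra)
    return definition


skillset = {
    "name": "docs-skillset",  # hypothetical name
    "description": "Detect language, then extract key phrases and entities",
    "skills": [
        skill(
            "#Microsoft.Skills.Text.LanguageDetectionSkill",
            {"text": "/document/content"},
            {"languageCode": "language"},
        ),
        skill(
            "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
            # The detected language feeds the later skills.
            {"text": "/document/content", "languageCode": "/document/language"},
            {"keyPhrases": "keyPhrases"},
        ),
        skill(
            "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
            {"text": "/document/content", "languageCode": "/document/language"},
            {
                "persons": "persons",
                "organizations": "organizations",
                "locations": "locations",
            },
            categories=["Person", "Organization", "Location"],
        ),
    ],
}

print(json.dumps(skillset, indent=2))
```

Each skill's output becomes a node in the enriched document, which is why the second and third skills can read `/document/language` produced by the first.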

3.   Search Information

At this point, users can begin to explore the data using a customizable, responsive, faceted search UI, quickly finding useful information that was previously much harder to get at.

The search tool can be very useful, and we’ve found that this initial process can be completed quite quickly. However, that’s not where Knowledge Mining ends: search is only the first step.
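Behind a faceted search UI, each user action typically becomes a search request with facets and filters. The sketch below builds the body of such a request in the shape the service's search REST endpoint accepts (`search`, `facets`, `filter`, `top`, `count`). The field names `organizations` and `locations` are assumptions, and are treated here as string collection fields in the index.

```python
def facet_filter(selected):
    """Turn the user's selected facet values into an OData filter.

    `selected` maps a (hypothetical) string collection field to one chosen value.
    """
    clauses = []
    for field, value in selected.items():
        escaped = value.replace("'", "''")  # OData escapes quotes by doubling
        clauses.append(f"{field}/any(v: v eq '{escaped}')")
    return " and ".join(clauses)


def build_search_request(text, facet_fields, selected=None, top=10):
    """Build the JSON body for a faceted full-text search request."""
    body = {
        "search": text,
        # Ask for the ten most frequent values of each facet field.
        "facets": [f"{field},count:10" for field in facet_fields],
        "top": top,
        "count": True,  # include the total number of matching documents
    }
    if selected:
        body["filter"] = facet_filter(selected)
    return body


request = build_search_request(
    "merger",
    ["organizations", "locations"],
    selected={"organizations": "O'Brien Ltd"},
)
print(request)
```

The response to a request like this carries both the matching documents and the facet counts, which is what lets the UI show how many results sit behind each refinement.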

The Sky’s the Limit – Iterate and Elevate

After that first key step of setting up Azure Cognitive Search, the index can be improved and the UI extended to meet organizational needs. This is one way to extract more value from the underlying documents and is what we call the “iterate” opportunity.

Search Evolved – Iterate

Through imaginative design coupled with advanced technical skill, we’ve seen exciting user experiences come alive from what initially began as a straightforward search:

  • interactive relationship graphs between entities (e.g. people and organizations)
  • infinitely navigable relationship paths among the various meaningful concepts that were extracted from the underlying documents
  • intuitive ways of relating business-critical concepts across contexts and timeframes

In many cases, the resulting search becomes a user-empowering tool that breaks down the boundaries between previously unrelated documents and provides results that better reflect the actual meaning that was locked within them. What that custom search tool ultimately looks like really depends on your needs and data.
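One simple way a relationship graph between entities could be derived is by counting how often two entities appear in the same document. The sketch below does this for a few made-up enriched documents; the field names `persons` and `organizations`, and the sample data, are assumptions rather than output from a real index.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count how often each pair of entities appears in the same document.

    Returns a Counter keyed by (entity_a, entity_b), with each pair in
    alphabetical order so that (a, b) and (b, a) are the same edge.
    """
    edges = Counter()
    for doc in documents:
        entities = sorted(
            set(doc.get("persons", [])) | set(doc.get("organizations", []))
        )
        for a, b in combinations(entities, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical enriched documents, as they might come back from the index.
docs = [
    {"persons": ["Ada Smith"], "organizations": ["Contoso"]},
    {"persons": ["Ada Smith", "Raj Patel"], "organizations": ["Contoso"]},
    {"persons": ["Raj Patel"], "organizations": ["Fabrikam"]},
]

edges = cooccurrence_edges(docs)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

The edge weights can then drive a graph visualization, with heavier edges suggesting stronger relationships worth exploring.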

[Figure: JFK Files – an interactive Cognitive Search experiment by Microsoft]

Beyond Search – Elevate

The other, parallel option is to “elevate” the solution. When we extract information from the documents, we feed it not only to the search index but also to a “Knowledge Store” in Azure. This makes the mined data easily available to solutions outside of the search, including machine learning and other applied data science techniques. Data that was originally locked inside of documents can be cross-referenced against or added to existing systems and datastores to uncover insights that help improve efficiencies, raise margins and lower costs.
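As an illustration of how the Knowledge Store is wired up, the sketch below builds the `knowledgeStore` section that can be attached to a skillset definition, projecting enriched data into tables in a storage account. The table names, key names and source paths (such as `/document/tableprojection`) are assumptions; in practice the source usually points at a node shaped earlier in the skillset.

```python
def table_projection(table_name, key_name, source):
    """One table projection: which enriched node goes into which table."""
    return {
        "tableName": table_name,
        "generatedKeyName": key_name,
        "source": source,
    }


def build_knowledge_store(connection_string, tables):
    """The knowledgeStore section of a skillset, with a single projection group."""
    return {
        "storageConnectionString": connection_string,
        "projections": [
            # Tables in the same group share generated keys, so they can be joined.
            {"tables": tables, "objects": [], "files": []}
        ],
    }


knowledge_store = build_knowledge_store(
    "<storage-connection-string>",
    [
        table_projection(
            "minedDocuments", "DocumentId", "/document/tableprojection"
        ),
        table_projection(
            "minedKeyPhrases",
            "KeyPhraseId",
            "/document/tableprojection/keyPhrases/*",
        ),
    ],
)
print(knowledge_store)
```

Once the data lands in tables like these, tools outside of the search service can read it directly.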

The opportunities around the elevation of your solution are unbounded and depend on the needs and means at hand. Whether it’s machine learning, advanced dashboards like Power BI, your CRM (to enrich your client information), or your existing analytics practice, anything is possible.
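To give a flavour of what enriching client information might involve, the sketch below joins mined rows against CRM accounts by organization name and adds a document count and an average sentiment to each account. The record shapes, field names and sample values are all hypothetical, and a real solution would need more careful entity matching than simple name normalization.

```python
from collections import defaultdict


def normalize(name):
    """Lower-case and collapse whitespace so near-identical names match."""
    return " ".join(name.lower().split())


def enrich_accounts(crm_accounts, mined_rows):
    """Attach document counts and average sentiment to each CRM account."""
    by_org = defaultdict(list)
    for row in mined_rows:
        by_org[normalize(row["organization"])].append(row)

    enriched = []
    for account in crm_accounts:
        rows = by_org.get(normalize(account["name"]), [])
        scores = [row["sentiment"] for row in rows]
        enriched.append(
            {
                **account,
                "document_count": len(rows),
                # None signals that no documents mention this account.
                "avg_sentiment": sum(scores) / len(scores) if scores else None,
            }
        )
    return enriched


# Hypothetical data: CRM accounts and rows mined from documents.
crm_accounts = [{"id": 1, "name": "Contoso"}, {"id": 2, "name": "Fabrikam"}]
mined_rows = [
    {"organization": "contoso ", "sentiment": 0.5},
    {"organization": "Contoso", "sentiment": 1.0},
]

enriched = enrich_accounts(crm_accounts, mined_rows)
for account in enriched:
    print(account)
```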

Moving Forward with Knowledge Mining in Azure

Perhaps you already have an intuitive sense of how Knowledge Mining could help your organization. You have a document set in mind or a question you think you can answer. If you’re still unsure whether or not you are a good candidate for Knowledge Mining or Azure Cognitive Search, consider the following statements about your situation:

  • Document data – you have unstructured data sources in the form of Word documents, PDFs, images, audio, etc.
  • Critical mass – you have enough data to reveal trends and feed an algorithm
  • Understood format – the documents’ format, whether common or proprietary, is understood well enough to allow attributes, information and metadata to be extracted
  • Unique/Proprietary – the information is owned only by you and can provide a competitive advantage
  • Spread over time, space and circumstance – there is enough of a meaningful spread of data to allow machines to find correlations and causalities over time or across circumstances.
  • Humans could mine it – if a human is able to pull meaning from the documents, this is a good indication that a machine can be trained to do the same.

If any of these statements ring true about your organization, Knowledge Mining may be the key to unlocking otherwise wasted potential within your documents. It’s certainly worth a discussion about the opportunity that Knowledge Mining in Azure might represent for you.

What a Knowledge Mining Engagement Looks Like

The sequence of a typical Knowledge Mining project is as follows:

  1. Discovery, to determine business goals and identify use cases for unstructured data
  2. Document upload and rapid creation of the initial search app on Azure Cognitive Search
  3. Selection and rollout, on a priority basis, of advanced extraction and search deliverables
  4. Iterate – Continuous evaluation and improvement of search and data extraction
  5. Elevate – Identify and deliver strategic opportunities for the data “beyond search”

As a go-to Microsoft partner for Knowledge Mining using Azure Cognitive Search, T4G’s applied data team can help kick-start your Knowledge Mining project. If you’d like to explore the potential benefits of Knowledge Mining to your organization, please reach out. We’d be happy to start a conversation about this and other applied data opportunities. We can help you elevate your AI capabilities beyond just search to extract real value for your business.

Get in touch today to find out more.

Taylor Bastien

Taylor is a Solutions Architect at T4G, working closely with clients to bring their true needs into focus and forming the right team of professionals to deliver quality solutions. He takes a strategic view of each client’s challenges, helping them to make informed technical investments. When he’s not on the clock, he enjoys staying fit, learning languages, and spending time with his family.