The Definitive Guide to Knowledge Mining Using Azure Cognitive Search

Just how might the insights buried within thousands of documents be unearthed? The answer is Knowledge Mining.

If the future belongs to intelligent machines and smart algorithms, then the path to success starts by harnessing data, the very lifeblood of machine learning and AI. What most people don’t realize is that as much as 80% of the data humans create is locked inside unstructured formats like Word documents, PDFs, e-mails, webpages and audio files.

Consider the amount of unstructured information your organization has been accumulating over recent years. What if you could access all of that data, search through it and make it available to analytics or machine learning experts to work their magic on? It’s difficult to exaggerate the scale of this opportunity.

  • Imagine a machine learning model trained using every key purchase decision you’ve made in the past several years, alongside the inputs and eventual outcomes. Couldn’t that model help you make wiser, more profitable decisions today?
  • Imagine being able to host all of your client-facing support manuals, maintenance and repair guides in a faceted search, including scans of old print manuals from bygone days. What if an intelligent bot had access to all of that knowledge and used it to answer customer or employee questions on your behalf?
  • Imagine thousands of recordings from a call centre, transcribed, analyzed for sentiment and tagged with keywords related to each call’s contents – all available to search, report on and analyze.

All of these scenarios, and many more, are possible through Knowledge Mining using Azure Cognitive Search.

What is Knowledge Mining?

Knowledge mining allows you to pull useful data from unstructured information – i.e. info stored in documents, e-mails, images, audio – and build a searchable index with a rich, faceted search UI on top of it. The search immediately gives your users access to every indexed document, and can provide insights into the keywords, concepts and metadata of those files.

For instance, we can analyze correspondence, call centre recordings or online comments to determine the sentiment of those involved. We can extract the locations or people mentioned in a document. All of that information can then be turned into filtering options in the search tool.
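To make this concrete, here is a minimal sketch of how extracted attributes such as sentiment, locations and people might be declared as filterable, facetable fields in an index definition. The index name and field names (`docs-index`, `sentiment`, `locations`, `people`) are hypothetical and chosen for illustration only; the attribute names (`key`, `searchable`, `filterable`, `facetable`) follow the index schema of the Azure Cognitive Search REST API as we understand it. The sketch only builds the JSON payload and does not call Azure.

```python
import json


def make_field(name, edm_type, **attrs):
    """Build one field entry for an index definition."""
    field = {"name": name, "type": edm_type}
    field.update(attrs)
    return field


# Hypothetical index: all names below are illustrative assumptions.
index_definition = {
    "name": "docs-index",
    "fields": [
        make_field("id", "Edm.String", key=True),
        make_field("content", "Edm.String", searchable=True),
        # Extracted attributes, exposed as filters and facets in the search UI.
        make_field("sentiment", "Edm.String", filterable=True, facetable=True),
        make_field("locations", "Collection(Edm.String)",
                   searchable=True, filterable=True, facetable=True),
        make_field("people", "Collection(Edm.String)",
                   searchable=True, filterable=True, facetable=True),
    ],
}


def facetable_fields(definition):
    """List the field names that can drive facets in the search UI."""
    return [f["name"] for f in definition["fields"] if f.get("facetable")]


if __name__ == "__main__":
    print(json.dumps(index_definition, indent=2))
    print("Facets available:", facetable_fields(index_definition))
```

Each field marked facetable can become a filtering option in the search tool, which is how extracted sentiment or place names end up as clickable refinements.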

At its core, Knowledge Mining is about finding real value in existing information.

As we build the index to drive the search, we can store that same information in a structured format in Azure Storage making it readily available for analytics, machine learning and any number of applied data uses.

The bottom line is that you’ll have taken unstructured, document information and given it structure that makes it “machine-friendly”.

Benefits

Consider the following: knowledge workers spend half of their time either looking through or creating unstructured information. No matter the organization, that’s a lot of time, experience and know-how locked away in thousands of files. It’s a lot of value often just out of reach. Between 60% and 80% of the data generated is unstructured, and much of it can only be interpreted manually by an expert.

You can gain a renewed return on your investment in knowledge content. Those documents were created with a purpose in mind, but we can now extend their useful, profitable lifespan by consolidating and homogenizing the unstructured information they contain and transforming it into something much more useful to both humans and automated machine processes.

You can gain economies of scale by auditing and consolidating your unstructured data and take control of these now unified sources of value.

Initial Approach

The initial approach is simple enough:

Discover, Consolidate, Index, Iterate

Request a Free Consultation

Not sure where to begin? Our experts will guide your team through an interactive introduction to Knowledge Mining on Azure, using real-life use cases and demos to bring it to life.

Am I a Good Candidate?

  • Organizations across all verticals can benefit from Knowledge Mining.
  • There are many direct and indirect benefits, starting with consolidating your knowledge and making it searchable.
  • What other benefits you can draw from Knowledge Mining depends on the goals you have and the information you’re holding. You may be sitting on a goldmine and not even know it.

Ask Yourself Some Questions

Identifying an opportunity can be a tricky thing. However, it usually boils down to solving a pain point, smoothing out an uneven or difficult process or aggregating and transforming raw information into something more practical and accessible.

Some questions can help guide you towards your solution:

Do you have repeated document outputs from current or past business processes?

Have you collected documentation in various formats across business units, subsidiaries or acquisitions?

Are some of your key documents floating around on shared drives, SharePoint, local machines or e-mail attachments?

Does your staff spend altogether too much time tracking down reference documentation, manuals, reports, guides or other documents or files?

Do you have digital documents (e.g. Word, PDF), scanned forms or images, video or audio files that you believe contain valuable information, but have no means of collecting it into a database?

Are you holding on to regulatory or housekeeping documents that sit parallel to your core business, and whose combined information might reveal trends or opportunities for your operation?

Answering “Yes” to any of these questions is usually a sign that Azure Cognitive Search, or Knowledge Mining more broadly, could help you achieve otherwise unattainable gains.

Search

If you have challenges finding information or must look in different places to find all the info you need, consolidate your knowledge assets in one place and make them searchable. Azure Cognitive Search’s full-text engine is built on Apache Lucene, and on top of it we can extract key attributes into facets and filters that you can use to slice and dice your search results, navigate across content by tag or keyword and truly own your own knowledge.
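As an illustration of slicing results with facets and filters, the sketch below assembles the JSON body of a search request. The helper name `build_search_request` and the `locations` field are hypothetical; the `search`, `facets`, `filter` and `top` parameters and the OData `any()` filter syntax follow the Azure Cognitive Search REST API as we understand it. Nothing is sent over the network here.

```python
import json


def build_search_request(text, facet_fields, location=None, top=10):
    """Assemble the JSON body for a search query with facets and an optional filter."""
    body = {
        "search": text,
        # Ask for up to 10 buckets per facet field.
        "facets": [f"{name},count:10" for name in facet_fields],
        "top": top,
    }
    if location is not None:
        # OData filter over a string collection; single quotes are escaped by doubling.
        safe = location.replace("'", "''")
        body["filter"] = f"locations/any(l: l eq '{safe}')"
    return body


if __name__ == "__main__":
    request = build_search_request(
        "repair manual", ["locations", "people"], location="Halifax"
    )
    print(json.dumps(request, indent=2))
```

The facet buckets returned with the results are what a search UI would render as refinement options; selecting one simply adds a corresponding filter to the next request.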

Once that initial search is built, it can be seen as a living application: it can be continually improved and refined by extracting additional document attributes or adding support for new file types.

Structuring the Unstructured

If you have a critical mass of information locked away in paragraph text, images, audio or video files, we can help you collect all of that information together and bring transparency to it through a unified database on the Azure cloud platform. As part of building the index for Azure Cognitive Search, we can populate a cloud database in parallel that will hold the text, metadata and other attributes of your information. This flexible format will allow you to set about querying your data, building dashboards on top of it or wrapping applications around what was previously largely inaccessible, unstructured — but no less valuable — information.
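One way this parallel store can be populated is through the knowledge store feature of Azure Cognitive Search, which projects enriched content into Azure Storage tables. The following is a rough sketch of such a configuration, assuming table projections; the table names, source paths and the placeholder connection string are hypothetical, and the property names (`storageConnectionString`, `projections`, `tables`, `tableName`, `generatedKeyName`, `source`) reflect the skillset definition as we understand it.

```python
import json


def knowledge_store_config(connection_string, table_specs):
    """Build a knowledge store section that projects enriched data into tables.

    table_specs is a list of (table_name, source_path) pairs.
    """
    return {
        "storageConnectionString": connection_string,
        "projections": [
            {
                "tables": [
                    {
                        "tableName": name,
                        "generatedKeyName": f"{name}Id",
                        "source": source,
                    }
                    for name, source in table_specs
                ],
                "objects": [],
                "files": [],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical tables and enrichment paths, for illustration only.
    config = knowledge_store_config(
        "<your-storage-connection-string>",
        [
            ("Documents", "/document/tableprojection"),
            ("KeyPhrases", "/document/tableprojection/keyPhrases/*"),
        ],
    )
    print(json.dumps(config, indent=2))
```

Once the tables exist in Azure Storage, they can be queried or connected to reporting tools like any other tabular source.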

Data-Driven Innovation

By having your data go from opaque to structured, to fully accessible and transparent, Knowledge Mining opens up powerful opportunities for data-driven innovation. Build dashboards and interactive reports in Power BI. Build applications on top of and around your data. Surface your search within Teams pages or integrate it fully with Teams to give better context to the documents we’re mining.

As our data-driven world becomes more of a reality, having your knowledge assets as true data is a natural progression. By transforming them into a universally accessible format in the Knowledge Store, you can feed your data into advanced machine learning and deep learning processes. Train a machine learning model to make better decisions. Predict possible futures and minimize decision risk through trend analysis. The possibilities are only limited by your knowledge and vision.

If you’re starting to see opportunities for Knowledge Mining within your organization, the path to achieving a first win may be shorter than you think. The key is to focus on quick wins within a broader, strategic plan.

Knowledge Mining Quick Start

This four-week engagement will get you up and running with Azure Cognitive Search, unlocking vital information lying latent in your unstructured documents. Co-funding opportunities are available for qualified customers.

How do I get started?

So, you’re considering getting started with Knowledge Mining? Great decision. In this short write-up, we’ll try our best to guide you through some simple steps to get you going quickly.

The beginning of Knowledge Mining with Azure Cognitive Search is fairly standard. It typically starts with a “Proof-of-Value” (POV) rollout. Once that has been established, the doors of opportunity open up to a broad array of possibilities that reflect your needs and means. The key to remember throughout is that this is a living process: as we progress, we will adjust course and optimize our efforts to match your priorities and best possible outcomes.

Initial Proof of Value

These first steps will establish the baseline features of your search, will help you get your information organized and will allow us to define a roadmap for future enhancement. User feedback throughout is key, as are the strategic priorities of your team and organization.

The POV project is not meant to be a production system (though it might become that) but rather prioritizes our ability to answer key questions and establish the longer-term plan in which we can have confidence.

Step 1. Discovery

We begin with analysis. In one or more conversations, we will define the inputs, outputs and organizational situation to set us up for success. We will identify the types of information (i.e. documents) that will go into the search and what kinds of results will best suit your users. We will identify stakeholders who will be involved in answering questions from the design and development team and who will be involved in testing the solution on your behalf.

The main outputs for this conversation will be:

  • A plan for delivering the initial POV, along with any missing inputs
  • A new working partnership between our two teams for being successful going forward

Step 2. Identify and Consolidate Knowledge Assets

Knowing what we have and getting it ready to go is a critical early step. While the first version of the POV may not contain every feature or every document type, we aim to capture a representative set of documents across the various types that will be used in the full production solution.

Since the size of the document set will impact Azure usage pricing, we normally recommend selecting a representative subset of the overall document set for the initial POV. Once the documents are selected, they are uploaded to Azure.

Step 3. Development and Indexing

This is where the rubber meets the road. We initialize and configure the search index, develop any required custom skills and customize the search user interface to match your requirements.

Custom skills are self-contained pieces of logic that are used to extract or enrich data from the documents we’re indexing. Examples include:

  • A skill for tagging all documents with the customers/clients that they relate to.
  • A skill that tags documents with important terms related to your business, like products, materials, technologies, etc.
  • A skill that augments the information in the index using data from your CRM, allowing us to include customer information in the search.

The need for custom skills varies. You may not need any or you may defer this type of work to future releases.
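To give a sense of what a custom skill involves, here is a minimal sketch of the first example above: tagging documents with the customers they mention. It assumes the custom Web API skill contract as we understand it, in which the indexer sends a batch of records under `values`, each with a `recordId` and `data`, and expects a response of the same shape. The function names, the `text` input, the `customers` output and the customer list are all hypothetical, and the simple substring matching stands in for whatever matching logic a real skill would use. In practice this logic would sit behind an HTTP endpoint such as an Azure Function.

```python
import json


def tag_customers(text, known_customers):
    """Return the known customer names mentioned in the text (case-insensitive)."""
    lowered = text.lower()
    return [name for name in known_customers if name.lower() in lowered]


def run_skill(request_body, known_customers):
    """Process one custom-skill request batch and build the response body."""
    results = []
    for record in request_body.get("values", []):
        text = record.get("data", {}).get("text", "")
        results.append({
            # The recordId must be echoed back so results can be matched to inputs.
            "recordId": record["recordId"],
            "data": {"customers": tag_customers(text, known_customers)},
            "errors": [],
            "warnings": [],
        })
    return {"values": results}


if __name__ == "__main__":
    # Hypothetical customer list and request payload, for illustration only.
    customers = ["Contoso", "Fabrikam", "Northwind"]
    request = {
        "values": [
            {"recordId": "1", "data": {"text": "Service report for Contoso, cc Fabrikam."}},
            {"recordId": "2", "data": {"text": "Internal memo."}},
        ]
    }
    print(json.dumps(run_skill(request, customers), indent=2))
```

The `customers` output could then be mapped to a facetable index field, which is how a custom skill ends up as a new filter in the search interface.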

T4G has designed and developed a generic, responsive search UI that we use as a foundation for Knowledge Mining rollouts. We would customize that interface to match any specific needs you might have, including adding search facets to match data from custom skills. This too is on an as-needed basis. In some instances, the out-of-the-box interface is more than enough, given that it already includes full text search and facets around document type, author and entities extracted using Cognitive Skills.

Once principal development is complete, we put the tool through quality assurance (QA) testing and hand it off to your stakeholders for user acceptance testing (UAT). From there, it’s a short path to production use, if appropriate.

Step 4. Iterate / Elevate

Once the initial POV is in use, we set about evaluating the results and preparing a strategy for delivering an enhanced or more complete feature set. This is what we call the “iterate” step. We will monitor user and organizational needs and incrementally improve the tool and index over time. As required, we may add new document types, new Custom Skills or new search features. The goal is to take the initial POV and transform it over time into a highly optimized tool that directly benefits your business.

In some cases, we may decide to elevate our Knowledge Mining efforts by going “beyond search”. Thanks to the structured data we’ve put in the Knowledge Store, we can build any number of solutions:

  • Interactive reports and dashboards in Power BI
  • Fully custom web, mobile and desktop applications
  • Machine learning or other intelligent data services such as trained chatbots
  • Integration into your broader human workflow, integrating with Office 365 or Teams, etc.
  • Content or insights to drive your digital marketing forward

If you’re ready to begin your journey into Knowledge Mining using Azure Cognitive Search, please reach out to the T4G team. We welcome the chance to discuss the opportunity with you and set out a plan to take you from here to a successful POV in but a few short weeks.
