Just how might the insights buried within thousands of documents be unearthed? The answer is Knowledge Mining.
If the future belongs to intelligent machines and smart algorithms, then the path to success starts by harnessing data, the very lifeblood of machine learning and AI. What most people don’t realize is that, by commonly cited estimates, roughly 80% of the data humans create is locked inside unstructured formats like Word documents, PDFs, e-mails, webpages and audio files.
Consider the amount of unstructured information your organization has been accumulating over recent years. What if you could access all of that data, search through it and make it available to analytics or machine learning experts to work their magic on? It’s difficult to exaggerate the scale of this opportunity.
All of these scenarios, and many more, are possible through Knowledge Mining using Azure Cognitive Search.
Knowledge mining allows you to pull useful data from unstructured information – i.e. info stored in documents, e-mails, images, audio – and build a searchable index with a rich, faceted search UI on top of it. The search immediately gives your users access to every indexed document, and can provide insights into the keywords, concepts and metadata of those files.
As we build the index to drive the search, we can store that same information in a structured format in Azure Storage, making it readily available for analytics, machine learning and any number of applied data uses.
The bottom line is that you’ll have taken unstructured, document information and given it structure that makes it “machine-friendly”.
Identifying an opportunity can be a tricky thing. However, it usually boils down to solving a pain point, smoothing out an uneven or difficult process or aggregating and transforming raw information into something more practical and accessible.
Some questions can help guide you towards your solution:
Do you have repeated document outputs from current or past business processes?
Have you collected documentation in various formats across business units, subsidiaries or acquisitions?
Are some of your key documents floating around on shared drives, SharePoint, local machines or e-mail attachments?
Does your staff spend altogether too much time tracking down reference documentation, manuals, reports, guides or other documents or files?
Do you have digital documents (e.g. Word, PDF), scanned forms or images, video or audio files that you believe contain valuable information, but no means of collecting that information into a database?
Are you holding on to regulatory or housekeeping documents that sit parallel to your core business, and might the combined information they contain reveal trends or opportunities for your operation?
Answering “Yes” to any of these questions is usually a sign that Azure Cognitive Search, or Knowledge Mining more broadly, could help you achieve otherwise unattainable gains.
If you have challenges finding information or must look in different places to find all the info you need, consolidate your knowledge assets in one place and make them searchable. Azure Cognitive Search supports Apache Lucene query syntax and analyzers, and with it we can extract key attributes into facets and filters that you can use to slice and dice your search results, navigate across content by tag or keyword and truly own your own knowledge.
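To make the idea of facets and filters a little more concrete, here is a minimal sketch of how a faceted query against an Azure Cognitive Search index might be assembled. It only builds the request URL and JSON body and sends nothing over the network. The service name, index name and the `author` and `fileType` fields are hypothetical, and the `2020-06-30` API version is an assumption; your own index schema will determine which fields are facetable and filterable.

```python
import json

API_VERSION = "2020-06-30"  # assumed REST API version; adjust to your service


def odata_eq(field, value):
    """Build an OData equality clause, doubling single quotes in the value."""
    escaped = value.replace("'", "''")
    return f"{field} eq '{escaped}'"


def build_faceted_search_request(service, index, text, facet_fields, filters=None, top=10):
    """Return (url, body) for a search request; nothing is sent."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/search?api-version={API_VERSION}"
    )
    body = {
        "search": text,
        # Ask for up to 10 buckets per facet field
        "facets": [f"{field},count:10" for field in facet_fields],
        "top": top,
        "count": True,
    }
    if filters:
        # Each facet value the user has selected narrows the result set
        body["filter"] = " and ".join(odata_eq(f, v) for f, v in filters.items())
    return url, body


# Hypothetical service, index and field names, for illustration only
url, body = build_faceted_search_request(
    "my-service", "documents", "safety manual",
    facet_fields=["author", "fileType"],
    filters={"fileType": "pdf"},
)
print(url)
print(json.dumps(body, indent=2))
```

In a real rollout, the body would be sent as a POST with an `api-key` header, and the facet counts in the response would drive the filter panel in the search UI.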
Once that initial search is built, it can be seen as a living application: it can be continually improved and refined by extracting additional document attributes or adding support for new file types.
If you have a critical mass of information locked away in paragraph text, images, audio or video files, we can help you collect all of that information together and bring transparency to it through a unified database on the Azure cloud platform. As part of building the index for Azure Cognitive Search, we can populate a cloud database in parallel that will hold the text, metadata and other attributes of your information. This flexible format will allow you to set about querying your data, building dashboards on top of it or wrapping applications around what was previously largely inaccessible, unstructured — but no less valuable — information.
By having your data go from opaque to structured, to fully accessible and transparent, Knowledge Mining opens up powerful opportunities for data-driven innovation. Build dashboards and interactive reports in Power BI. Build applications on top of and around your data. Surface your search within Teams pages or integrate it fully with Teams to give better context to the documents we’re mining.
As our data-driven world becomes more of a reality, having your knowledge assets as true data is a natural progression. By transforming them into a universally accessible format in the Knowledge Store, you can feed your data into advanced machine learning and deep learning processes. Train a machine learning model to make better decisions. Predict possible futures and minimize decision risk through trend analysis. The possibilities are only limited by your knowledge and vision.
If you’re starting to see opportunities for Knowledge Mining within your organization, the path to achieving a first win may be shorter than you think. The key is to focus on quick wins within a broader, strategic plan.
So, you’re considering getting started with Knowledge Mining? Great decision. In this short write-up, we’ll try our best to guide you through some simple steps to get you going quickly.
The beginning of Knowledge Mining with Azure Cognitive Search is fairly standard. It typically starts with a “Proof-of-Value” (POV) rollout. Once that has been established, the doors of opportunity open up to a broad array of possibilities that reflect your needs and means. The key to remember throughout is that this is a living process: as we progress, we will adjust course and optimize our efforts to match your priorities and best possible outcomes.
We begin with analysis. In one or more conversations, we will define the inputs, outputs and organizational situation to set us up for success. We will identify the types of information (i.e. documents) that will go into the search and what kinds of results will best suit your users. We will identify stakeholders who will be involved in answering questions from the design and development team and who will be involved in testing the solution on your behalf.
The main outputs for this conversation will be:
A plan for delivering the initial POV, along with any missing inputs
A new working partnership between our two teams for being successful going forward
Knowing what we have and getting it ready to go is a critical early step. While the first version of the POV may not contain all features or every document type, we aim to capture a representative set of documents across the various types that will be used in the full production solution.
Since the size of the document set will impact Azure usage pricing, we normally recommend selecting a representative subset of the overall document set for the initial POV. Once the documents are selected, they are uploaded to Azure.
This is where the rubber meets the road. We initialize and configure the search index, develop any required custom skills and customize the search user interface to match your requirements.
Custom skills are self-contained pieces of logic that are used to extract or enrich data from the documents we’re indexing. Examples would be
The need for custom skills varies. You may not need any or you may defer this type of work to future releases.
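For readers curious about what a custom skill looks like in practice, here is a minimal sketch of the core logic. Azure Cognitive Search calls a custom Web API skill with a JSON payload containing a `values` array of records and expects a `values` array back, with each `recordId` echoed in the response. The skill below, which pulls reference codes such as "INV-2041" out of document text, is purely hypothetical: the pattern, the `text` input name and the `referenceCodes` output name are assumptions chosen for illustration, and hosting (for example behind an HTTP endpoint) is left out.

```python
import re

# Hypothetical pattern: 2-5 capital letters, a hyphen, then 3-6 digits
REFERENCE_PATTERN = re.compile(r"\b[A-Z]{2,5}-\d{3,6}\b")


def run_custom_skill(request):
    """Process a custom-skill request payload and return the response payload."""
    results = []
    for record in request.get("values", []):
        output = {
            "recordId": record["recordId"],  # must be echoed back unchanged
            "data": {},
            "errors": [],
            "warnings": [],
        }
        text = record.get("data", {}).get("text")
        if text is None:
            output["errors"].append({"message": "Missing 'text' input."})
        else:
            # De-duplicate and sort so the output is stable
            codes = sorted(set(REFERENCE_PATTERN.findall(text)))
            output["data"]["referenceCodes"] = codes
        results.append(output)
    return {"values": results}


# Hypothetical request, shaped like the payload an indexer would send
example = {"values": [{"recordId": "1", "data": {"text": "See INV-2041 and PO-778."}}]}
print(run_custom_skill(example))
```

The extracted values could then be mapped to an index field and surfaced as a search facet.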
T4G has designed and developed a generic, responsive search UI that we use as a foundation for Knowledge Mining rollouts. We would customize that interface to match any specific needs you might have, including adding search facets to match data from custom skills. This too is on an as-needed basis. In some instances, the out-of-the-box interface is more than enough, given that it already includes full-text search and facets around document type, author and entities extracted using Cognitive Skills.
Once principal development is complete, we put the tool through quality assurance (QA) testing and hand it off to your stakeholders for user acceptance testing (UAT). From there, it’s a short path to production use, if appropriate.
Once the initial POV is in use, we set to evaluating the result and preparing a strategy for delivering an enhanced or more complete feature set. This is what we call the “iterate” step. We will monitor user and organizational needs and incrementally improve the tool and index over time. As required, we may add new document types, new Custom Skills or new search features. The goal is to take the initial POV and transform it over time into a highly optimized tool that directly benefits your business.
In some cases, we may decide to elevate our Knowledge Mining efforts by going “beyond search”. Thanks to the structured data we’ve put in the Knowledge Store, we can build any number of solutions:
If you’re ready to begin your journey into Knowledge Mining using Azure Cognitive Search, please reach out to the T4G team. We welcome the chance to discuss the opportunity with you and set out a plan to take you from here to a successful POV in but a few short weeks.