Whether seeking to engage customers, investigate legal action, assess risk, uncover marketing intelligence, or manage and enforce regulations, every industry wants its data readily available for actionable insight. Organizations must be able to identify and extract data and insight from the billions of documents stored across content repositories, file shares and email servers.
For unstructured data, the status quo is insufficient
This year’s Gartner Data and Analytics Summit brought together more than 5,000 data leaders and over 150 exhibitors from all over the globe. Nearly every exhibitor at the Summit spoke to their tools’ ability to work with data, including unstructured data.
Great news — sort of.
What we learned is that organizations make working with unstructured files slower and more difficult than it has to be.
In fact, the most prevalent approach to working with unstructured files shared by exhibitors and attendees was to ask customers to convert all of their unstructured files to PDFs and then run those files through optical character recognition (OCR) engines. This approach is inefficient and incomplete, introduces OCR-based errors, and is certainly not fast.
Imagine the time it takes a large financial institution or global commercial enterprise to convert billions of emails, spreadsheets, PowerPoint presentations, Word documents and more to PDF, and then to run them all through an OCR engine. Even after that, the organization is only back at the beginning of the identification and data preparation process necessary for downstream cognitive processes.
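To put a rough number on that, here is a back-of-the-envelope sketch in Python. The function name `pipeline_days` and every input value are hypothetical, chosen only for illustration; they are not measurements from any vendor or customer. The point is simply that per-file conversion and OCR costs multiply quickly at this scale.

```python
def pipeline_days(num_docs, convert_secs_per_doc, ocr_secs_per_doc, workers):
    """Estimate wall-clock days for a convert-to-PDF-then-OCR pipeline.

    Assumes work is spread evenly across `workers` with no overhead,
    which is optimistic for a real deployment.
    """
    total_secs = num_docs * (convert_secs_per_doc + ocr_secs_per_doc)
    return total_secs / workers / 86_400  # 86,400 seconds per day


# Hypothetical inputs, not measurements: 1 billion files, 1 second to
# convert each file, 2 seconds to OCR it, and 1,000 parallel workers.
days = pipeline_days(1_000_000_000, 1.0, 2.0, 1_000)
print(f"{days:.1f} days")  # prints "34.7 days"
```

Under these assumed figures, the conversion and OCR steps alone take more than a month before any identification or data preparation begins.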