Content Extraction

Extract and analyze file text and metadata with the Document Filters SDK

Document Filters uses deep inspection technology to extract and analyze all of the text and metadata in a file, including hidden content. With Document Filters, software developers can enable their applications to extract and process content from hundreds of file formats without requiring the source application.

  • Extract all text and metadata from over 550 file formats including Word, Excel, PowerPoint, PDF, AutoCAD, ZIP, MSG, Visio and hundreds more
  • Extract hidden information such as tracked changes, comments, notes, annotations and embedded links
  • Perform optical character recognition (OCR) of document images to extract textual data
  • Extract contents of packaged, archived, compressed, and other container files
  • Deploy it your way: Document Filters runs natively on 27 platforms, and its flexible APIs let you integrate with your application in the language of your choice
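
To illustrate what extracting the contents of container files involves, here is a minimal sketch that walks a ZIP archive and descends into any nested archives. It uses only the Python standard library and is not the Document Filters API; the names `walk_container` and `make_zip`, and the `!/` separator for nested paths, are hypothetical and chosen for this example only.

```python
import io
import zipfile


def walk_container(data: bytes, prefix: str = ""):
    """Yield (path, payload) for every file in a ZIP, descending into nested ZIPs.

    Illustrative only: a real extraction SDK handles many more container types.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories carry no content of their own
            payload = archive.read(info)
            path = prefix + info.filename
            if zipfile.is_zipfile(io.BytesIO(payload)):
                # The member is itself a ZIP-based container: recurse into it.
                yield from walk_container(payload, prefix=path + "!/")
            else:
                yield path, payload


def make_zip(members: dict) -> bytes:
    """Build an in-memory ZIP from a {name: bytes} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in members.items():
            archive.writestr(name, content)
    return buffer.getvalue()


if __name__ == "__main__":
    inner = make_zip({"notes.txt": b"hidden note"})
    outer = make_zip({"readme.txt": b"hello", "inner.zip": inner})
    for path, payload in walk_container(outer):
        print(path, len(payload), "bytes")
```

Because formats such as DOCX and XLSX are themselves ZIP-based, this sketch would also descend into them and list their internal parts, which hints at why a dedicated filter per format is needed to turn those parts back into readable text.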