Document Filters

An SDK to identify, extract and view hundreds of file formats

Hyland’s Document Filters SDK gives software developers the tools they need to embed rich document processing into their applications. A single toolkit provides all you need to:

  • extract content, including tracked changes and hidden content, from 550+ formats;

  • convert documents to high-fidelity renditions in PDF, SVG, HTML5, XML, PostScript or image formats; and

  • manipulate, annotate, redact and markup content, all out of the box.

Document Filters is one of the few SDKs to run natively on 22 platforms, from mobile to mainframe, Android to AIX. Our APIs give you the choice of language to integrate with your application, including Java, C#, C/C++ or Python.

Document Conversion in 15 Lines or Less

Explore how easy it is to use Document Filters in your applications with the C# samples below.

Extracting subfiles from a container

class Program
{
    // Extract all subfiles from a container
    static void Run(string sourceFile, string outputDirectory)
    {
        // Create and initialize an instance of the Document Filters API
        // (LICENSE_KEY is a placeholder for your Document Filters license key)
        var docFilters = new DocumentFilters();
        docFilters.Initialize(LICENSE_KEY);

        // Load and open the source document ready for processing
        var doc = docFilters.GetExtractor(sourceFile);
        doc.Open(isys_docfilters.IGR_BODY_AND_META);

        // Extract each subfile
        for (var subfile = doc.GetFirstSubFile(); subfile != null; subfile = doc.GetNextSubFile()) {
            subfile.CopyTo(System.IO.Path.Combine(outputDirectory, subfile.Name));
        }
    }
}
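Subfile names come from inside the container, so a name such as ../notes.txt, or an absolute path, would make Path.Combine point outside outputDirectory. The following is a minimal sketch of a guard you could apply before calling CopyTo. SubfilePaths and SafeOutputPath are hypothetical helpers for illustration, not part of the Document Filters API; the check relies only on standard .NET System.IO calls.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of the Document Filters API.
public static class SubfilePaths
{
    // Resolve a subfile name under outputDirectory, rejecting names that would
    // escape it (for example "../notes.txt" or an absolute path).
    public static string SafeOutputPath(string outputDirectory, string subfileName)
    {
        // Normalize the root and make sure it ends with a separator, so that
        // "/out" does not accidentally match a sibling such as "/out-old".
        string root = Path.GetFullPath(outputDirectory);
        if (!root.EndsWith(Path.DirectorySeparatorChar.ToString(), StringComparison.Ordinal))
        {
            root += Path.DirectorySeparatorChar;
        }

        // Path.GetFullPath collapses any ".." segments in the combined path;
        // Path.Combine returns subfileName unchanged if it is already rooted.
        string candidate = Path.GetFullPath(Path.Combine(root, subfileName));
        if (!candidate.StartsWith(root, StringComparison.Ordinal))
        {
            throw new InvalidOperationException(
                $"Subfile name would be written outside the output directory: {subfileName}");
        }
        return candidate;
    }
}
```

In the extraction loop you would then call subfile.CopyTo(SubfilePaths.SafeOutputPath(outputDirectory, subfile.Name)). If subfile names can include folders, create the parent folder first with Directory.CreateDirectory.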

Converting a Word document to PDF

class Program
{
    // Create a multi-page PDF from a source file
    static void Run(string sourceFile, string outputFile)
    {
        // Create and initialize an instance of the Document Filters API
        var docFilters = new DocumentFilters();
        docFilters.Initialize(LICENSE_KEY);

        // Load the source document ready for processing
        var doc = docFilters.GetExtractor(sourceFile);
        doc.Open(isys_docfilters.IGR_BODY_AND_META | isys_docfilters.IGR_FORMAT_IMAGE);

        // Create a PDF canvas; dispose it when done so the output file is closed
        using (var canvas = docFilters.MakeOutputCanvas(outputFile, isys_docfilters.IGR_DEVICE_PDF, "")) {
            // Render each page of the document into the PDF
            for (int pageIndex = 0; pageIndex < doc.GetPageCount(); pageIndex++) {
                canvas.RenderPage(doc.GetPage(pageIndex));
            }
        }
    }
}

Converting a Word document to PNG

class Program
{
    // Create one PNG per page
    static void Run(string sourceFile, string outputDirectory)
    {
        // Create and initialize an instance of the Document Filters API
        var docFilters = new DocumentFilters();
        docFilters.Initialize(LICENSE_KEY);

        // Load the source document ready for processing
        var doc = docFilters.GetExtractor(sourceFile);
        doc.Open(isys_docfilters.IGR_BODY_AND_META | isys_docfilters.IGR_FORMAT_IMAGE);

        // Render each page to its own PNG
        for (int pageIndex = 0; pageIndex < doc.GetPageCount(); pageIndex++) {
            var pngPath = System.IO.Path.Combine(outputDirectory, $"page-{pageIndex + 1}.png");
            using (var canvas = docFilters.MakeOutputCanvas(pngPath, isys_docfilters.IGR_DEVICE_IMAGE_PNG, "")) {
                canvas.RenderPage(doc.GetPage(pageIndex));
            }
        }
    }
}

Why Document Filters?

Leverage Document Filters to:


Extract every piece of value in a file: Unique ‘deep inspection’ capability lets you extract and analyze all of the text and metadata in a file – even what was previously hidden (e.g., tracked changes, comments, notes, annotations and embedded web links).



Seamlessly view the content in a high-quality format: View, render and even manipulate the extracted content in near pixel-perfect high definition. Create renditions using your preferred output format, including HTML5, PDF, multi-page TIFF, PNG and SVG.



Choose the language that works for you: C/C++, Java, C#, VB.NET and Python are supported out-of-the-box, and the library can be called from any language that can bind to a C API.




Identify the true nature of content: Intelligent file identification means the source content is accurately identified for filtering, without simply relying on the filename extension.
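Content-based identification typically starts from a file's signature (its "magic number") rather than its name. The sketch below shows that general technique for a handful of well-known signatures. It is an illustration only, not Document Filters' own identification logic, which covers far more formats; FormatSniffer is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration; not part of the Document Filters API.
public static class FormatSniffer
{
    // Leading-byte signatures ("magic numbers") and the format each one indicates.
    private static readonly (byte[] Signature, string Format)[] Signatures =
    {
        (new byte[] { 0x25, 0x50, 0x44, 0x46, 0x2D }, "PDF"),                    // "%PDF-"
        (new byte[] { 0x50, 0x4B, 0x03, 0x04 }, "ZIP"),                          // "PK\x03\x04", also used by DOCX, XLSX, PPTX
        (new byte[] { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 }, "OLE2"), // legacy DOC, XLS, PPT, MSG
        (new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }, "PNG"),
    };

    // Longest signature length, i.e. how many leading bytes need to be read.
    private static readonly int HeaderLength = Signatures.Max(s => s.Signature.Length);

    // Identify a format from leading bytes only; the file name plays no part.
    public static string Identify(byte[] header)
    {
        foreach (var (signature, format) in Signatures)
        {
            if (header.Length >= signature.Length &&
                header.Take(signature.Length).SequenceEqual(signature))
            {
                return format;
            }
        }
        return "Unknown";
    }

    // Read up to HeaderLength bytes from the start of the file and identify it.
    public static string IdentifyFile(string path)
    {
        var header = new byte[HeaderLength];
        int total = 0;
        using (var stream = File.OpenRead(path))
        {
            // Stream.Read may return fewer bytes than requested, so keep reading.
            while (total < header.Length)
            {
                int read = stream.Read(header, total, header.Length - total);
                if (read == 0) break; // end of file
                total += read;
            }
        }
        return Identify(header.Take(total).ToArray());
    }
}
```

For example, a PDF saved as report.txt is still reported as "PDF", because only the leading bytes are inspected. A signature alone is often not enough: DOCX, XLSX and plain ZIP archives share the same "PK" header, so a fuller identifier has to look inside the container as well.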




Deploy to virtually any platform: Document Filters runs natively on 22 platforms, including Windows, macOS, Linux, FreeBSD, Solaris, HP-UX and AIX. It does not require a bloated runtime, which means lightning-fast performance and easier deployment.



The ideal OEM partner technology

Document Filters is the ideal OEM partner technology for processing unstructured content outside of native applications. Today, this technology is a key catalyst for content mining and intelligence gathering across a wide range of business applications, including:

  • E-discovery
  • Data loss prevention
  • Text analytics
  • Enterprise content management
  • Email archival
  • Enterprise search

As an embeddable set of components, Document Filters serves as the 'intelligence' inside solutions from many global ISVs and SaaS vendors, driving content gathering and mining within their applications.

To learn more about Document Filters, check out our FAQs.

Discover how you can leverage Document Filters

To learn more about Document Filters and request a trial, contact us.


Architecture diagram for Document Filters


Product summary: Document Filters
