Disabling Loading of External DTDs in XML Documents Read by JAXP

August 2, 2013

I had a specific problem I set out to solve, but the technique described in this article is more general: by setting other features and properties, it can be used to customize the behaviour of JAXP document builder factories that are used by code you cannot change.

Problem

I first saw the problem with JAXP wanting to read a DTD document referenced from an XML document when I created a mapping from one XML format to another using the Mule DataMapper.
Later, I encountered the same problem when using Java code generated by Altova’s MapForce.

In the case of the Mule DataMapper, setting the appropriate feature on the JAXP document builder factory turned out to be very difficult, if not impossible. Running a patched version of Mule did not appeal to me either.
In the case of the MapForce-generated code, I did not want to modify the generated code, in case I have to regenerate it at some later point in time.

Background

As described in the Java API documentation of the newInstance() method in the DocumentBuilderFactory class, JAXP uses the following procedure to determine which class to create an instance of when a document builder factory is requested:

  • Examine the value of the system property javax.xml.parsers.DocumentBuilderFactory.
    If set, it should contain the fully qualified name of the class to create.
  • Examine the jaxp.properties file in the lib directory of the Java runtime environment used.
    Using the same key as for the above mentioned system property, JAXP tries to retrieve a value that again is to be a fully qualified name of the document builder factory class to use.
  • Use the Services API to try to determine the fully qualified name of the document builder factory class to use.
    For details, see the Java API documentation for the java.util.ServiceLoader class.
  • If none of the above yields a class, fall back to the platform default document builder factory.
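To see which class the lookup procedure above ends up choosing in a given environment, a small probe can help. The following is only a sketch; the class name FactoryLookupDemo and its method are made up for illustration and are not part of JAXP.

```java
import javax.xml.parsers.DocumentBuilderFactory;

/** Hypothetical probe that reports what the JAXP lookup currently resolves to. */
final class FactoryLookupDemo {
    /** Name of the system property examined first by the JAXP lookup. */
    static final String FACTORY_PROPERTY =
        "javax.xml.parsers.DocumentBuilderFactory";

    /** Returns the name of the factory class JAXP instantiates right now. */
    static String resolvedFactoryClassName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(final String[] args) {
        /* Null here means the lookup continued with the later steps. */
        System.out.println("System property: "
            + System.getProperty(FACTORY_PROPERTY));
        System.out.println("Resolved factory: " + resolvedFactoryClassName());
    }
}
```

Running the probe before and after setting the system property shows whether the property actually takes effect in the environment at hand.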

Since I do not have control over the XML fragments that are received and did not want to resort to string manipulation of incoming XML data, I did some research and found the following two features in the Xerces XML parser that Java uses:

  • http://apache.org/xml/features/nonvalidating/load-external-dtd
    If set to false and if validation is turned off, there will be no attempts at loading external DTD documents when parsing XML.
  • http://xml.org/sax/features/validation
    If set to false, no validation will take place when parsing XML.
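When the code that creates the document builder factory can be changed, the two features can be set directly on the factory. The sketch below assumes the Xerces-based parser bundled with the JDK; the class name ExternalDtdFeatureDemo and the sample XML are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical example that sets the two features on a factory directly. */
final class ExternalDtdFeatureDemo {
    static final String LOAD_EXTERNAL_DTD_FEATURE =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";
    static final String VALIDATION_FEATURE =
        "http://xml.org/sax/features/validation";

    /** Parses the supplied XML without trying to load an external DTD. */
    static Document parseIgnoringExternalDtd(final String inXml)
        throws Exception {
        final DocumentBuilderFactory theFactory =
            DocumentBuilderFactory.newInstance();
        /* A parser that does not know a feature rejects it with an exception. */
        theFactory.setFeature(LOAD_EXTERNAL_DTD_FEATURE, false);
        theFactory.setFeature(VALIDATION_FEATURE, false);

        final DocumentBuilder theBuilder = theFactory.newDocumentBuilder();
        return theBuilder.parse(new ByteArrayInputStream(
            inXml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(final String[] args) throws Exception {
        final String theXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<!DOCTYPE greeting SYSTEM \"idontexist.dtd\">"
            + "<greeting>hello</greeting>";
        final Document theDocument = parseIgnoringExternalDtd(theXml);
        System.out.println(theDocument.getDocumentElement().getTagName());
    }
}
```

The rest of this article deals with the case where this direct approach is not possible because the factory is created by code that cannot be changed.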

Solution

Given that Java system properties can be manipulated programmatically at runtime, I came up with the idea of dynamically replacing the current document builder factory with a wrapper that sets certain features on new instances of the wrapped document builder factory. Setting those features is said to cause references to DTD documents to be ignored when parsing XML that contains such references.

A Test

I implemented a JUnit 4 test to expose the problem. In the test, I use JAXP to read an XML file on the classpath into an XML document object:

package com.ivan.xml;

import java.io.FileNotFoundException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.junit.Assert;
import org.junit.Test;
import org.w3c.dom.Document;

/**
 * Tests the WrappingDocumentBuilderFactory class.
 * 
 * @author Ivan Krizsan
 */
public class WrappingDocumentBuilderFactoryTest {
    /* Constant(s): */
    /** Base path to test files. */
    private static final String TEST_BASE = "/com/ivan/xml/";
    /** XML document on the classpath to read. */
    private static final String XML_DOCUMENT = TEST_BASE + "command-sheet.xml";

    /**
     * Tests reading an XML document that references an unavailable DTD
     * without applying the wrapping document builder factory.
     * 
     * @throws Exception If error occurs. A FileNotFoundException is
     * expected.
     */
    @Test(expected = FileNotFoundException.class)
    public void testReadDocumentWithout() throws Exception {
        final Document theXmlDocument = readXml();
        Assert.assertNotNull(theXmlDocument);
    }

    /**
     * Tests reading an XML document that references an unavailable DTD
     * while applying the wrapping document builder factory.
     * 
     * @throws Exception If error occurs. Indicates test failure.
     */
    @Test()
    public void testReadDocumentWith() throws Exception {
        WrappingDocumentBuilderFactory.activate();
        final Document theXmlDocument = readXml();
        WrappingDocumentBuilderFactory.deactivate();

        Assert.assertNotNull(theXmlDocument);
    }

    /**
     * Reads an XML document on the classpath using DOM.
     * 
     * @return The read DOM document object.
     * @throws Exception If error occurs reading XML document.
     */
    private Document readXml() throws Exception {
        Document theXmlDocument = null;

        final InputStream theXmlDocumentStream =
            getClass().getResourceAsStream(XML_DOCUMENT);

        final DocumentBuilderFactory theXmlDocumentBuilderFactory =
            DocumentBuilderFactory.newInstance();
        final DocumentBuilder theXmlDocumentBuilder =
            theXmlDocumentBuilderFactory.newDocumentBuilder();

        theXmlDocument = theXmlDocumentBuilder.parse(theXmlDocumentStream);

        return theXmlDocument;
    }
}

Note:

  • The first test is expected to throw a FileNotFoundException.
    This is the exception I saw when running the test from within Eclipse.
  • The second test method is identical to the first except for the calls to the static methods activate and deactivate on the WrappingDocumentBuilderFactory class.
    The call to activate enables the document builder factory wrapper and the call to deactivate disables it again.
    In production code, the code that reads the XML document should be placed in a try-block and the call to deactivate in the corresponding finally-block, to ensure that the wrapper is disabled regardless of the outcome of trying to read the XML document.
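The try/finally pattern from the last note can be sketched independently of the wrapper. The helper below is hypothetical (the class TemporarySystemProperty is not part of the code in this article); it sets a system property, runs an action and restores the previous value whatever the outcome of the action.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper that applies a system property only during a call. */
final class TemporarySystemProperty {
    /**
     * Sets the property, runs the action and restores the previous
     * value in a finally-block, even if the action throws an exception.
     */
    static <T> T callWith(final String inName, final String inValue,
        final Callable<T> inAction) throws Exception {
        final String thePreviousValue = System.getProperty(inName);
        System.setProperty(inName, inValue);
        try {
            return inAction.call();
        } finally {
            if (thePreviousValue != null) {
                System.setProperty(inName, thePreviousValue);
            } else {
                System.clearProperty(inName);
            }
        }
    }
}
```

With the wrapper from this article, the same shape applies: call activate before the try-block, read the XML document inside it and call deactivate in the finally-block.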

The above tests use an XML file in the com.ivan.xml package on the classpath:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE PDMessage SYSTEM "idontexist.dtd"[]>
<tns:CommandSheet
    xmlns:tns="http://www.example.com/SpaceWarGame"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <commands>
        <move>
            <point x="3.14159E0" y="3.14159E0"/>
        </move>
        <jump>
            <destination x="3.14159E0" y="3.14159E0"/>
        </jump>
        <attack>
            <targetLocation x="3.14159E0" y="3.14159E0"/>
        </attack>
        <transmit>
            <text>String</text>
        </transmit>
    </commands>
</tns:CommandSheet>

Note:

  • The second line in the XML file contains:
    <!DOCTYPE PDMessage SYSTEM "idontexist.dtd"[]>

    This is the line that caused a FileNotFoundException to be thrown when I tried to read the XML file using JAXP.
    I have deliberately referenced a DTD file that does not exist in my environment.

The Wrapper

The document builder factory wrapper class is implemented like this:

package com.ivan.xml;

import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.validation.Schema;

/**
 * JAXP document builder factory that disables loading of external DTDs
 * when reading XML documents using document builders created by this
 * factory.
 * When activated, this document builder factory will wrap the current
 * document builder factory and become the current document builder
 * factory, delegating to the previous document builder factory.
 * When deactivated, the previous document builder factory will be
 * restored as the default document builder factory.
 *
 * @author Ivan Krizsan
 */
public class WrappingDocumentBuilderFactory extends DocumentBuilderFactory {
    /* Constant(s): */
    /** Class logger. */
    private static final Logger LOGGER = Logger
        .getLogger(WrappingDocumentBuilderFactory.class.getCanonicalName());
    /** JAXP document builder factory system property name. */
    private static final String DOCUMENTBUILDERFACTORY_SYSTEM_PROPERTY_NAME =
        "javax.xml.parsers.DocumentBuilderFactory";
    /** Name of load external DTD feature. */
    private static final String LOADEXTERNALDTD_FEATURE =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";
    /**
     * Xerces validation feature. Turn off to be able to deactivate
     * DTD loading.
     */
    private static final String XML_VALIDATION_FEATURE =
        "http://xml.org/sax/features/validation";

    /* Class variable(s): */
    /** Previous document builder factory class. */
    protected static Class<?> sPreviousDocumentBuilderFactoryClass = null;
    /** Previous value of document builder factory system property. */
    protected static String sPreviousDocumentBuilderFactorySystemPropertyValue =
        null;

    /* Instance variable(s): */
    /** Document builder factory to delegate to. */
    protected DocumentBuilderFactory mDocumentBuilderFactoryDelegateBuilderFactory;

    /**
     * Activates this document builder factory wrapper.
     */
    public static void activate() {
        /* 
         * Save value of current JAXP document builder factory
         * system property.
         */
        sPreviousDocumentBuilderFactorySystemPropertyValue =
            System.getProperty(DOCUMENTBUILDERFACTORY_SYSTEM_PROPERTY_NAME);

        /* 
         * Save previous document builder factory class.
         * Using JAXP to request an instance is easier than trying to
         * find the class name ourselves.
         */
        sPreviousDocumentBuilderFactoryClass =
            DocumentBuilderFactory.newInstance().getClass();

        LOGGER.info("Previous document builder factory class: " +
            sPreviousDocumentBuilderFactoryClass.getCanonicalName());

        /*
         * Set the JAXP document builder factory system property to the
         * name of this class, causing this class to be the first choice
         * when requesting document builder factories from JAXP.
         * getName() returns the name in the form expected when a class
         * is loaded by name.
         */
        System.setProperty(DOCUMENTBUILDERFACTORY_SYSTEM_PROPERTY_NAME,
            WrappingDocumentBuilderFactory.class.getName());

        LOGGER.info("Enabled document builder factory wrapper. " +
            "Delegating to: " +
            sPreviousDocumentBuilderFactoryClass.getCanonicalName());
    }

    /**
     * Deactivates this document builder factory wrapper.
     */
    public static void deactivate() {

        if (sPreviousDocumentBuilderFactoryClass != null) {
            /* 
             * Restore previous JAXP document builder factory system
             * property value.
             */
            if (sPreviousDocumentBuilderFactorySystemPropertyValue != null) {
                System.setProperty(DOCUMENTBUILDERFACTORY_SYSTEM_PROPERTY_NAME,
                    sPreviousDocumentBuilderFactorySystemPropertyValue);
            } else {
                System
                    .clearProperty(DOCUMENTBUILDERFACTORY_SYSTEM_PROPERTY_NAME);
            }
            /* Clears variables to indicate that wrapper is deactivated. */
            sPreviousDocumentBuilderFactorySystemPropertyValue = null;
            sPreviousDocumentBuilderFactoryClass = null;

            LOGGER.info("Deactivated document builder factory wrapper.");
        } else {
            LOGGER
                .info("Document builder factory wrapper not active, do nothing.");
        }
    }

    /**
     * Default constructor.
     * Sets features in the delegate document builder factory so as to
     * disable loading of external DTDs.
     */
    public WrappingDocumentBuilderFactory() {
        if (sPreviousDocumentBuilderFactoryClass != null) {
            /* Create delegate document builder factory. */
            try {
                mDocumentBuilderFactoryDelegateBuilderFactory =
                    (DocumentBuilderFactory) sPreviousDocumentBuilderFactoryClass
                        .newInstance();

                /* Set features to turn off loading of external DTDs. */
                mDocumentBuilderFactoryDelegateBuilderFactory.setFeature(
                    LOADEXTERNALDTD_FEATURE, false);
                mDocumentBuilderFactoryDelegateBuilderFactory.setFeature(
                    XML_VALIDATION_FEATURE, false);
            } catch (final Exception theException) {
                LOGGER.log(Level.SEVERE,
                    "Error occurred creating wrapping document "
                        + "builder factory instance", theException);
            }
        } else {
            throw new Error("No previous document builder factory class set.");
        }
    }

    /*
     * The following are delegate methods that ensure that all information
     * is conveyed to the delegate.
     */

    @Override
    public DocumentBuilder newDocumentBuilder()
        throws ParserConfigurationException {
        return mDocumentBuilderFactoryDelegateBuilderFactory
            .newDocumentBuilder();
    }

    @Override
    public void setNamespaceAware(final boolean inAwareness) {
        mDocumentBuilderFactoryDelegateBuilderFactory
            .setNamespaceAware(inAwareness);
    }

    @Override
    public void setValidating(final boolean inValidating) {
        mDocumentBuilderFactoryDelegateBuilderFactory
            .setValidating(inValidating);
    }

    @Override
    public void setIgnoringElementContentWhitespace(final boolean inWhitespace) {
        mDocumentBuilderFactoryDelegateBuilderFactory
            .setIgnoringElementContentWhitespace(inWhitespace);
    }

    @Override
    public void setExpandEntityReferences(final boolean inExpandEntityRef) {
        mDocumentBuilderFactoryDelegateBuilderFactory
            .setExpandEntityReferences(inExpandEntityRef);
    }

    @Override
    public void setIgnoringComments(final boolean inIgnoreComments) {
        mDocumentBuilderFactoryDelegateBuilderFactory
            .setIgnoringComments(inIgnoreComments);
    }

    @Override
    public void setCoalescing(final boolean inCoalescing) {
        mDocumentBuilderFactoryDelegateBuilderFactory
            .setCoalescing(inCoalescing);
    }

    @Override
    public boolean isNamespaceAware() {
        return mDocumentBuilderFactoryDelegateBuilderFactory.isNamespaceAware();
    }

    @Override
    public boolean isValidating() {
        return mDocumentBuilderFactoryDelegateBuilderFactory.isValidating();
    }

    @Override
    public boolean isIgnoringElementContentWhitespace() {
        return mDocumentBuilderFactoryDelegateBuilderFactory
            .isIgnoringElementContentWhitespace();
    }

    @Override
    public boolean isExpandEntityReferences() {
        return mDocumentBuilderFactoryDelegateBuilderFactory
            .isExpandEntityReferences();
    }

    @Override
    public boolean isIgnoringComments() {
        return mDocumentBuilderFactoryDelegateBuilderFactory
            .isIgnoringComments();
    }

    @Override
    public boolean isCoalescing() {
        return mDocumentBuilderFactoryDelegateBuilderFactory.isCoalescing();
    }

    @Override
    public void setAttribute(final String inName, final Object inValue)
        throws IllegalArgumentException {
        mDocumentBuilderFactoryDelegateBuilderFactory.setAttribute(inName,
            inValue);
    }

    @Override
    public Object getAttribute(final String inName)
        throws IllegalArgumentException {
        return mDocumentBuilderFactoryDelegateBuilderFactory
            .getAttribute(inName);
    }

    @Override
    public void setFeature(final String inName, final boolean inValue)
        throws ParserConfigurationException {
        mDocumentBuilderFactoryDelegateBuilderFactory.setFeature(inName,
            inValue);
    }

    @Override
    public boolean getFeature(final String inName)
        throws ParserConfigurationException {
        return mDocumentBuilderFactoryDelegateBuilderFactory.getFeature(inName);
    }

    @Override
    public Schema getSchema() {
        return mDocumentBuilderFactoryDelegateBuilderFactory.getSchema();
    }

    @Override
    public void setSchema(final Schema inSchema) {
        mDocumentBuilderFactoryDelegateBuilderFactory.setSchema(inSchema);
    }

    @Override
    public void setXIncludeAware(final boolean inState) {
        mDocumentBuilderFactoryDelegateBuilderFactory.setXIncludeAware(inState);
    }

    @Override
    public boolean isXIncludeAware() {
        return mDocumentBuilderFactoryDelegateBuilderFactory.isXIncludeAware();
    }
}

Note:

  • When activated, the wrapper saves two things: the value of the system property containing the name of the class to use when instantiating document builder factories, and the class of the document builder factory that JAXP currently creates.
    If the system property is not set, the saved property value is null while the saved class is the one JAXP found through the remaining lookup steps; otherwise both refer to the same class.
  • When deactivating the wrapper, the system property is cleared if the saved property value was null.
    A null value cannot be inserted in the map that contains the system properties, which is why the property is either set or cleared.
  • In the WrappingDocumentBuilderFactory constructor, a new instance of the previously used document builder factory is created and the two features discussed earlier are set to false on the instance.
    The wrapper delegates all the work to an instance of the previously used document builder factory. The two features are set to prevent attempts at loading DTD documents when parsing XML.
  • The constructor of the wrapper class is followed by a large number of methods that just delegate to an instance of the previous document builder factory.
    The wrapper must make sure that this delegate instance is in the proper state, otherwise there will be unexpected results when trying to parse XML.
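The note about null values can be checked in isolation. The class below is a hypothetical demonstration, not part of the wrapper: it shows that System.setProperty rejects a null value with a NullPointerException and that System.clearProperty is the way to return a property to the unset state.

```java
/** Hypothetical demonstration of restoring a possibly unset system property. */
final class NullSystemPropertyDemo {
    /** Returns true if setting the property to null is rejected. */
    static boolean setPropertyRejectsNull(final String inName) {
        try {
            System.setProperty(inName, null);
            return false;
        } catch (final NullPointerException theException) {
            /* Null values are not allowed as system property values. */
            return true;
        }
    }

    /** Restores a saved value, clearing the property if the value was null. */
    static void restore(final String inName, final String inPreviousValue) {
        if (inPreviousValue != null) {
            System.setProperty(inName, inPreviousValue);
        } else {
            System.clearProperty(inName);
        }
    }

    public static void main(final String[] args) {
        final String theName = "demo.null.property";
        System.out.println("Null rejected: " + setPropertyRejectsNull(theName));
        System.setProperty(theName, "temporary");
        restore(theName, null);
        System.out.println("After restore: " + System.getProperty(theName));
    }
}
```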

If we now run the unit tests, both should pass.
