Disabling Loading of External DTDs in XML Documents Read by JAXP

August 2, 2013

I set out to solve a specific problem, but the technique described in this article is more general. By setting other features and properties, it can be used to customize the behaviour of JAXP document builder factories that are used by code you cannot change.

Problem

I first saw the problem with JAXP wanting to read a DTD document referenced from an XML document when I created a mapping from one XML format to another using the Mule DataMapper.
Later, I encountered the same problem when using Java code generated by Altova’s MapForce.

In the case of the Mule DataMapper, setting the appropriate feature on the JAXP document builder factory turned out to be very difficult, if not impossible. Running a patched version of Mule did not appeal to me either.
In the case of the MapForce-generated code, I did not want to modify the generated code, in case I have to regenerate it at some later point in time.

Background

As described in the Java API documentation of the newInstance() method in the DocumentBuilderFactory class, JAXP uses the following procedure to determine which class to create an instance of when a document builder factory is requested:

  • Examine the value of the system property javax.xml.parsers.DocumentBuilderFactory.
    If set, it should contain the fully qualified name of the class to create.
  • Examine the jaxp.properties file in the lib directory of the Java runtime environment used.
    Using the same key as the system property above, JAXP tries to retrieve a value that again should be the fully qualified name of the document builder factory class to use.
  • Use the Services API to try to determine the fully qualified name of the document builder factory class to use.
    For details, see the Java API documentation for the java.util.ServiceLoader class.
  • If none of the above yields a class name, fall back to the platform default document builder factory.
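
To see the outcome of this lookup in a given environment, a minimal sketch like the following can help. The class name FactoryLookupDemo and the method describeFactory are made up for illustration. It only prints the value of the system property and the class that JAXP actually selects, and it does not inspect jaxp.properties or the service providers directly.

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper class, used only for illustration.
class FactoryLookupDemo {
    static final String FACTORY_PROPERTY = "javax.xml.parsers.DocumentBuilderFactory";

    /** Describes the system property value and the factory class JAXP resolves to. */
    static String describeFactory() {
        // null when the system property is not set
        String configured = System.getProperty(FACTORY_PROPERTY);
        // Runs the lookup procedure described above
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return "property=" + configured + ", class=" + factory.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describeFactory());
    }
}
```

If the property is not set, the two values differ: the property is null while the class is whatever the later lookup steps produced.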

Since I do not have control over the XML fragments that are received and did not want to resort to string manipulation of incoming XML data, I did some research and found the following two features available in the Xerces XML parser that Java uses:

  • http://apache.org/xml/features/nonvalidating/load-external-dtd
    If set to false and if validation is turned off, there will be no attempts at loading external DTD documents when parsing XML.
  • http://xml.org/sax/features/validation
    If set to false, no validation will take place when parsing XML.
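
When you do control the code that creates the factory, the two features can be set directly. The following is a minimal sketch under that assumption; the class name DtdFeaturesDemo, the method parseIgnoringDtd and the sample document are made up for illustration. The load-external-dtd feature is specific to Xerces, so a different parser implementation may reject it with a ParserConfigurationException.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for illustration.
class DtdFeaturesDemo {
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    /** Parses the XML without loading external DTDs and returns the root element name. */
    static String parseIgnoringDtd(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(VALIDATION, false);
            factory.setFeature(LOAD_EXTERNAL_DTD, false);
            Document document = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return document.getDocumentElement().getNodeName();
        } catch (Exception e) {
            // Includes ParserConfigurationException for parsers without these features
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The DTD name is made up and is assumed not to exist.
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE root SYSTEM \"no-such-file-for-demo.dtd\">\n"
            + "<root/>";
        System.out.println(parseIgnoringDtd(xml));
    }
}
```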

Solution

Given that Java system properties can be manipulated programmatically at runtime, I came up with the idea of dynamically replacing the current document builder factory with a wrapper that sets certain features on new instances of the document builder factory. Setting those features is supposed to cause references to DTD documents to be ignored when parsing XML that contains such references.

A Test

I implemented a JUnit 4 test to expose the problem. In the test, I use JAXP to read an XML file on the classpath into an XML document object:

Note:

  • The first test is expected to throw a FileNotFoundException.
    This is the exception I saw when running the test from within Eclipse.
  • The second test method is identical to the first except for the calls to the static methods activate and deactivate on the WrappingDocumentBuilderFactory class.
    The call to activate enables the document builder factory wrapper and the call to deactivate consequently disables the document builder factory wrapper.
    In production code, the call to deactivate should be placed in a finally-block and the call to the code that reads the XML document should be placed in a try-block, to ensure that the wrapper is disabled, regardless of the outcome of trying to read the XML document.
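
The behaviour the first test expects can be reproduced without JUnit. The following is a rough sketch, not the test from this article: the class name MissingDtdDemo and the sample document are made up, and the document is parsed from a string rather than from a file on the classpath. With a default-configured factory, the parser tries to load the referenced DTD and fails.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for illustration.
class MissingDtdDemo {
    // Made-up sample document; the DTD it references is assumed not to exist.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE root SYSTEM \"no-such-file-for-demo.dtd\">\n"
        + "<root/>";

    /** Returns the exception thrown by a default-configured parse, or null on success. */
    static Exception parseWithDefaults() {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return null;
        } catch (Exception e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // Prints the exception raised by the attempt to load the DTD, if any
        System.out.println(parseWithDefaults());
    }
}
```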

The tests above use an XML file in the com.ivan.xml package on the classpath:

Note:

  • The second line in the XML file contains:

    This is the line that causes a FileNotFoundException to be thrown when I try to read the XML file using JAXP.
    I have deliberately given the DTD file a name that does not exist in my environment.

The Wrapper

The document builder factory wrapper class is implemented like this:

Note:

  • When being activated, the wrapper saves the value of the system property containing the name of the class to use when instantiating document builder factories and the class of the current type of document builder factory that is created.
    If the system property is not set, these two pieces of data do not refer to the same class, otherwise they do.
  • When deactivating the wrapper, the system property value is cleared if the property value saved was null.
    A null value cannot be inserted into the map that holds the system properties, which is why the property is cleared rather than set to null.
  • In the WrappingDocumentBuilderFactory constructor, a new instance of the previously used document builder factory is created and the two features discussed earlier are set to false on the instance.
    The wrapper delegates all the work to an instance of the previously used document builder factory. The two features are set to prevent attempts at loading DTD documents when parsing XML.
  • The constructor of the wrapper class is followed by a large number of methods that just delegate to an instance of the previous document builder factory.
    The wrapper must make sure that this delegate instance is in the proper state, otherwise there will be unexpected results when trying to parse XML.
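
The notes above can be condensed into the following sketch. It is not the original listing: the names WrapperSketch, WrappingFactory and rootNameViaJaxp are made up, only a few of the delegating methods are shown, and the delegate is created with DocumentBuilderFactory.newInstance(String, ClassLoader), which is an assumption about how to re-create the previous factory rather than a description of the original code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;

// Hypothetical names throughout; a sketch of the wrapper idea, not the original class.
class WrapperSketch {
    static final String FACTORY_PROPERTY = "javax.xml.parsers.DocumentBuilderFactory";

    /** JAXP instantiates this by name, so it is public with a public no-arg constructor. */
    public static class WrappingFactory extends DocumentBuilderFactory {
        private static String savedProperty;     // property value before activation, may be null
        private static String previousClassName; // factory class JAXP used before activation
        private static boolean active;
        private final DocumentBuilderFactory delegate;

        public WrappingFactory() {
            // While active, name the previous class explicitly: a plain newInstance()
            // would read the system property and recurse into this wrapper.
            delegate = active
                ? DocumentBuilderFactory.newInstance(previousClassName, null)
                : DocumentBuilderFactory.newInstance();
            try {
                delegate.setFeature("http://xml.org/sax/features/validation", false);
                delegate.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException("Parser does not support the features", e);
            }
        }

        public static synchronized void activate() {
            if (active) return;
            savedProperty = System.getProperty(FACTORY_PROPERTY);
            previousClassName = DocumentBuilderFactory.newInstance().getClass().getName();
            System.setProperty(FACTORY_PROPERTY, WrappingFactory.class.getName());
            active = true;
        }

        public static synchronized void deactivate() {
            if (!active) return;
            // System properties cannot hold null, so an unset property is cleared instead.
            if (savedProperty == null) System.clearProperty(FACTORY_PROPERTY);
            else System.setProperty(FACTORY_PROPERTY, savedProperty);
            active = false;
        }

        @Override public DocumentBuilder newDocumentBuilder() throws ParserConfigurationException {
            return delegate.newDocumentBuilder();
        }
        @Override public void setAttribute(String name, Object value) {
            delegate.setAttribute(name, value);
        }
        @Override public Object getAttribute(String name) {
            return delegate.getAttribute(name);
        }
        @Override public void setFeature(String name, boolean value)
                throws ParserConfigurationException {
            delegate.setFeature(name, value);
        }
        @Override public boolean getFeature(String name) throws ParserConfigurationException {
            return delegate.getFeature(name);
        }
        @Override public void setNamespaceAware(boolean aware) {
            delegate.setNamespaceAware(aware);
        }
        @Override public boolean isNamespaceAware() {
            return delegate.isNamespaceAware();
        }
        // The remaining setters and getters would delegate in the same way.
    }

    /** Stands in for code you cannot change: it asks JAXP for a factory by itself. */
    static String rootNameViaJaxp(String xml) {
        WrappingFactory.activate();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        } finally {
            WrappingFactory.deactivate(); // always restore the previous setting
        }
    }
}
```

Note that the system property is global to the JVM, so while the wrapper is active every caller of DocumentBuilderFactory.newInstance() receives it, not only the code you are targeting.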

If we now run the unit tests, both should pass.
