Most of the Java stream classes support sequential input and output, but
the sequence of XML input is not always directly related to the sequence
of the required output. Moreover, the processing required for any part of
the input may be influenced by content that occurs later in the document.
The package ca.gorman.io provides helper classes that support the use
of Java stream classes in XML applications that require nonsequential
processing.
A WriterStack can be used to reverse the order of lines from an input file. Before writing each line to the WriterStack, a CharArrayWriter is pushed on the WriterStack. After all lines have been read, each CharArrayWriter is popped and copied to the final output. The method that writes the line to the WriterStack has no knowledge of the actual destination.
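The line-reversal technique described above can be sketched with JDK classes alone. This is only an illustration of the idea, not the ca.gorman.io API: a plain java.util.Deque stands in for the WriterStack, and the names ReverseLinesSketch, writeLine, and reverseLines are hypothetical.

```java
import java.io.BufferedReader;
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the WriterStack idea, built only on the JDK.
class ReverseLinesSketch {

    // Writes one line to whatever Writer it is given. It has no knowledge
    // of where the text finally ends up.
    static void writeLine(Writer target, String line) throws IOException {
        target.write(line);
        target.write('\n');
    }

    static String reverseLines(String input) {
        Deque<CharArrayWriter> stack = new ArrayDeque<>();
        StringWriter out = new StringWriter();
        try (BufferedReader in = new BufferedReader(new StringReader(input))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Push a fresh buffer, then write the line into the top buffer.
                stack.push(new CharArrayWriter());
                writeLine(stack.peek(), line);
            }
            // Pop each buffer and copy it to the final output (last in, first out).
            while (!stack.isEmpty()) {
                stack.pop().writeTo(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(reverseLines("one\ntwo\nthree\n"));
    }
}
```

Running main prints the three lines in the order three, two, one. The real WriterStack presumably hides the stack behind a single Writer, which this sketch does not attempt.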
A Resequencer (implemented here by a ResequencingWriter) can be used as a substitute for random access to the output stream. When data is required at a particular point in the output stream, but is not available, a place marker is written instead. The value of the place marker can be set at any time before or after it is written. When the Resequencer is closed, all of the markers are replaced by their corresponding values. The final result is the same as if the output stream (which is a Writer) had been written with random access.
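The place-marker mechanism can be illustrated with a small self-contained sketch. It assumes nothing about the actual Resequencer or ResequencingWriter interfaces: the class PlaceMarkerSketch and its methods write, writeMarker, setValue, and close are hypothetical, and the result is returned as a String rather than sent to a Writer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the place-marker idea; not the ca.gorman.io API.
class PlaceMarkerSketch {

    // A marker remembers only its key; the value is looked up at close time.
    private static final class Marker {
        final String key;
        Marker(String key) { this.key = key; }
    }

    private final List<Object> parts = new ArrayList<>();     // String or Marker
    private final Map<String, String> values = new HashMap<>();

    void write(String text) { parts.add(text); }

    void writeMarker(String key) { parts.add(new Marker(key)); }

    // May be called before or after the marker itself is written.
    void setValue(String key, String value) { values.put(key, value); }

    // Replaces every marker by its value (empty if never set) and returns the text.
    String close() {
        StringBuilder out = new StringBuilder();
        for (Object part : parts) {
            if (part instanceof Marker) {
                out.append(values.getOrDefault(((Marker) part).key, ""));
            } else {
                out.append((String) part);
            }
        }
        return out.toString();
    }

    static String demo() {
        PlaceMarkerSketch r = new PlaceMarkerSketch();
        r.write("Total lines: ");
        r.writeMarker("count");          // the count is not known yet
        r.write("\n");
        int count = 0;
        for (String line : new String[] {"alpha", "beta", "gamma"}) {
            r.write(line + "\n");
            count++;
        }
        r.setValue("count", Integer.toString(count));
        return r.close();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

The header line reports a count that only becomes known after the body has been written, which is the situation that would otherwise call for random access to the output.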
Example | Java code | Input | Output |
---|---|---|---|
Using WriterStack to reverse the order of lines in a file | StackLines.java | lines.txt | stacklines.txt |
Using ResequencingWriter to support nonsequential output to a Writer | ResequencingWriterDemo.java | lines.txt | resequencingdemo.txt |
Java supports regular expression pattern matching on
java.lang.CharSequence, an interface that is implemented by
java.lang.String and by java.nio.CharBuffer.
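As a brief illustration, the standard java.util.regex classes accept any CharSequence, so the same matching code works on a String and on a CharBuffer. The class and method names in this sketch are hypothetical; only JDK calls are used.

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharSequenceRegexSketch {

    // Collects every run of digits found in the input, whatever its concrete type.
    static List<String> findNumbers(CharSequence input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "a1 b22 c333";
        System.out.println(findNumbers(text));                   // String input
        System.out.println(findNumbers(CharBuffer.wrap(text)));  // CharBuffer input
    }
}
```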
The pattern-action rules operate in a manner very similar to the rules in
awk and perl, where a pattern is followed by a
code block that is to be executed on the input that matches the pattern.
The principal differences are:
Package ca.gorman.util.scan extends pattern matching to include
the application of multiple pattern-action rules to a CharBuffer
or a Reader, producing output to an Appendable or
Writer. The package can be used by itself, or with the XML
parsing package.
The reference implementation is based on java.util.regex, but
can handle multiple patterns in the same pass, and is sufficiently powerful
and flexible to do recursive-descent parsing of an input stream.
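One way to apply several pattern-action rules in a single pass with java.util.regex alone is sketched below. It is not the ca.gorman.util.scan implementation: the class PatternActionSketch, its rule and scan methods, and the policy of trying rules in registration order and copying unmatched characters through are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.BiConsumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of single-pass pattern-action scanning.
class PatternActionSketch {

    private static final class Rule {
        final Pattern pattern;
        final BiConsumer<Matcher, StringBuilder> action;
        Rule(Pattern pattern, BiConsumer<Matcher, StringBuilder> action) {
            this.pattern = pattern;
            this.action = action;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    PatternActionSketch rule(String regex, BiConsumer<Matcher, StringBuilder> action) {
        rules.add(new Rule(Pattern.compile(regex), action));
        return this;
    }

    String scan(CharSequence input) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (Rule r : rules) {
                // Try each rule anchored at the current position.
                Matcher m = r.pattern.matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > pos) {
                    r.action.accept(m, out);   // run the action on the matched text
                    pos = m.end();
                    matched = true;
                    break;                     // first matching rule wins
                }
            }
            if (!matched) {
                out.append(input.charAt(pos)); // no rule applies: copy one character
                pos++;
            }
        }
        return out.toString();
    }

    static String demo() {
        return new PatternActionSketch()
            .rule("\\d+", (m, out) -> out.append('<').append(m.group()).append('>'))
            .rule("[a-z]+", (m, out) -> out.append(m.group().toUpperCase(Locale.ROOT)))
            .scan("abc 42 de7");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints ABC <42> DE<7>
    }
}
```

An action in a design like this could itself call scan on a nested region, which is roughly how a rule-based scanner can be extended toward recursive-descent parsing.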
Example | Java code | Input | Output |
---|---|---|---|
Implementing a Four-Function Calculator with Pattern Rules (Still In Development) | Calculator.java | Not Available Yet | Not Available Yet |
JUnit test for recognizing nested parentheses. (This will be replaced later by an example.) | NestedParenthesesTest.java | Input is part of test | Output is part of test |
GXPARSE allows a programmer to use a sequential processing paradigm (like SAX) while treating elements and other structures as single objects (like DOM). It also provides easy access to structural information (like DOM).
The Elements example illustrates the basic principles of
parsing with GXPARSE.
Idrefs is a more elaborate example showing how to do more
complex processing while still working in a stream-processing paradigm.
A Resequencer is used to support the processing of ID and IDREF attributes
without the need for complex programming. A WriterStack is used to simplify
programming by allowing temporary redirection of output. An ElementMapper
makes it unnecessary to test for the name of every element.
Taglist uses WriterStacks and a Resequencer to print summary
information about an XML document.
Example | Java code | Input | Output |
---|---|---|---|
Translate XML to plain text, giving some elements special handling and passing the rest through a single handler | Elements.java | example.xml | elements.txt |
Handle ID and IDREF attributes using Resequencer, WriterStack, and ElementMapper as described above | Idrefs.java | example.xml | idrefs.txt |
List element tags with element character count | Taglist.java | example.xml | taglist.txt |
The following examples are taken from the docbook-to-HTML translator that
is used to produce the HTML version of the GXPARSE user manual. They show
how a docbook itemizedlist with listitem members
is translated to an HTML unordered list (UL) with LI elements.
Because of the large number of elements, elements are mapped to classes instead of to methods. In this mapping, namespaces map to packages instead of to classes. It is not necessary to specify a handler for every element if the unspecified elements can all be handled in the same way by a default handler.
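The idea of one handler class per element, with a shared default for everything else, can be sketched as follows. This does not use the GXPARSE ElementMapper and makes no claim about how it resolves classes: the ElementHandler interface, the explicit registry, and the handler bodies are hypothetical, and only the Element_ naming is borrowed from the file names listed below.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical handler contract: turn an element's content into output text.
interface ElementHandler {
    String handle(String content);
}

// Used for any element that has no handler class of its own.
class DefaultElementHandler implements ElementHandler {
    public String handle(String content) { return content; }
}

class Element_itemizedlist implements ElementHandler {
    public String handle(String content) { return "<UL>" + content + "</UL>"; }
}

class Element_listitem implements ElementHandler {
    public String handle(String content) { return "<LI>" + content + "</LI>"; }
}

// Discards the content of the element.
class Element_author implements ElementHandler {
    public String handle(String content) { return ""; }
}

class ElementClassMapperSketch {

    // Explicit registry from element name to handler class (an assumption;
    // a real mapper might derive the class from the element name instead).
    private static final Map<String, Supplier<ElementHandler>> HANDLERS = new HashMap<>();
    static {
        HANDLERS.put("itemizedlist", Element_itemizedlist::new);
        HANDLERS.put("listitem", Element_listitem::new);
        HANDLERS.put("author", Element_author::new);
    }

    static ElementHandler handlerFor(String elementName) {
        Supplier<ElementHandler> s = HANDLERS.get(elementName);
        return s != null ? s.get() : new DefaultElementHandler();
    }

    static String demo() {
        String items = handlerFor("listitem").handle("one")
                     + handlerFor("listitem").handle("two");
        return handlerFor("itemizedlist").handle(items);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints <UL><LI>one</LI><LI>two</LI></UL>
    }
}
```

With this arrangement the dispatching code never tests element names itself, and adding support for a new element means adding one class and one registry entry.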
Description | Java code |
---|---|
Setting up and running the parser | Transform.java |
Invoking the ElementMapper | DocbookListener.java |
Processing itemizedlist | Element_itemizedlist.java |
Processing listitem | Element_listitem.java |
Processing an element that has no declared handler | DefaultHandler.java |
Discarding the content of an element | Element_author.java |