Table-Aware DOM Processing

7.3.2 Table-Aware DOM Processing

VOTable documents consist of a hierarchy of RESOURCE, DEFINITIONS, COOSYS, TABLE elements and so on. The methods described in the previous subsection effectively flatten this into a list of TABLE elements. If you need more of the document's structure than the tables that can be extracted from it (for instance, which RESOURCE a given table belongs to), you will need to examine the XML itself. The usual way of doing this for an XML document in Java is to obtain a DOM (Document Object Model) for it. The DOM is an API defined by the W3C which represents the document as a tree of elements and attributes that can be navigated using methods like getFirstChild and getParentNode.
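The navigation methods named above can be tried out with the JDK alone, before STIL is involved. The following is a minimal sketch: DomWalk and its helper methods are hypothetical names invented for this illustration, and the XML is parsed with a plain DocumentBuilder, so the result is an ordinary DOM rather than a STIL one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical helper class: walks a DOM using only the standard W3C API. */
final class DomWalk {

    /** Parses an XML string into a standard (non-STIL) DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Returns the first element child of a node, skipping whitespace text and comments. */
    static Element firstChildElement(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Builds a path such as "VOTABLE/RESOURCE/TABLE" by following getParentNode upwards. */
    static String pathOf(Element el) {
        StringBuilder path = new StringBuilder(el.getTagName());
        for (Node p = el.getParentNode();
             p != null && p.getNodeType() == Node.ELEMENT_NODE;
             p = p.getParentNode()) {
            path.insert(0, ((Element) p).getTagName() + "/");
        }
        return path.toString();
    }
}
```

Since STIL's elements implement the same Element interface, traversal code written this way should also work on a VOElement tree.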

STIL provides you with a DOM which can be viewed exactly like a standard one (it implements the DOM API) but has some special features.

All elements in it are instances of the VOElement class (which itself implements the DOM Element interface). This provides a few convenience methods such as getChildrenByName which can be useful but don't do anything that you couldn't do with the Element interface alone.
Some of the elements, depending on their tag name, are instances of specialised subclasses of VOElement which provide methods specific to their rôle in a VOTable document. For instance every GROUP element in the tree is represented by a GroupElement; this class has a method getFields which returns all the FIELD elements associated with that group (this method examines its FIELDref children and locates their FIELD elements elsewhere in the DOM). The various specific element types are not considered in detail here; see the javadocs for the subclasses of VOElement.
The most important of these special element subclasses is TableElement. A TableElement can provide the table data stored within it, and to access these data you don't need to know whether they are stored in TABLEDATA, FITS or BINARY form.
Full ID/ref cross-referencing is supported for elements which have ID attributes in the VOTable specification - this is required so that for instance FIELDref elements can access their FIELDs, and TABLE elements can define their structure by reference to previously defined ones. If you need to locate cross-references by hand you can use the getElementById method.
In most cases, the DOM you acquire will not contain the bulk data in the VOTable XML. Specifically, the children of TABLEDATA elements (potentially very many TR and TD elements) and of STREAM elements (long Base64-encoded strings containing FITS or binary data) will be absent. User code inspecting the DOM is rarely interested in these elements themselves, only in the table data they represent, and these data can be obtained from the corresponding TABLE element.
The DOM is modifiable - that is you can add, remove and relocate nodes within it in the standard ways permitted by the DOM API.
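What the convenience methods and the ID/ref cross-referencing described above save you from can be seen by doing the same work with the plain DOM API. This JDK-only sketch uses hypothetical names (RefResolver, childrenByName, indexIds, fieldsOfGroup), not STIL methods. It builds its own ID index because, under the W3C DOM specification, an attribute named ID is not of ID type unless a DTD or schema declares it so, so Document.getElementById cannot be relied on for a DOM parsed without validation. STIL's DOM does this cross-referencing for you.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helpers approximating VOElement conveniences with the plain DOM API. */
final class RefResolver {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Direct element children with the given tag name (cf. getChildrenByName). */
    static List<Element> childrenByName(Element parent, String name) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && ((Element) n).getTagName().equals(name)) {
                out.add((Element) n);
            }
        }
        return out;
    }

    /** Maps each ID attribute value to the element carrying it. */
    static Map<String, Element> indexIds(Document doc) {
        Map<String, Element> byId = new HashMap<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (el.hasAttribute("ID")) {
                byId.put(el.getAttribute("ID"), el);
            }
        }
        return byId;
    }

    /** Follows a GROUP's FIELDref children to the FIELD elements they reference. */
    static List<Element> fieldsOfGroup(Element group, Map<String, Element> byId) {
        List<Element> fields = new ArrayList<>();
        for (Element ref : childrenByName(group, "FIELDref")) {
            Element target = byId.get(ref.getAttribute("ref"));
            if (target != null && target.getTagName().equals("FIELD")) {
                fields.add(target);
            }
        }
        return fields;
    }
}
```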

To acquire this DOM you will use a VOElementFactory, usually feeding a File, URL or InputStream to one of its makeVOElement methods. The bulk-data-free DOM mentioned above is possible because the VOElementFactory processes the XML document using SAX, building a DOM as it goes along; when it reaches the elements that carry bulk data, it interprets their data on the fly and stores them in a form which can be accessed efficiently later, rather than inserting those elements into the DOM. SAX (Simple API for XML) is an event-driven processing model which, unlike DOM, does not require memory that scales with the size of the document. In this way all but the weirdest VOTable documents can be turned into a DOM of very modest size. This gives you all the benefits of a DOM (full access to the hierarchical structure) without the disadvantage usually associated with DOM-based VOTable processing (a potentially huge memory footprint). Of course, in order to be accessed later, the data extracted from a stream of TR elements or from the inline content of a STREAM element have to be stored somewhere. Where they are put is determined by the VOElementFactory's StoragePolicy (see Section 4).
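The split between a small structural tree and separately stored bulk data can be illustrated with the JDK's own SAX parser. This sketch is not STIL's implementation: StreamingSplitter is a hypothetical handler that records structural element names in an outline list while diverting TR/TD content into a separate row list, which stands in for whatever storage a StoragePolicy would choose. Here the rows are simply kept in memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical SAX handler (not STIL code): keeps a compact outline of the
 * structural elements and diverts TR/TD content into a separate row store.
 */
final class StreamingSplitter extends DefaultHandler {
    final List<String> outline = new ArrayList<>();     // structure only
    final List<List<String>> rows = new ArrayList<>();  // bulk data, kept apart
    private List<String> currentRow;
    private StringBuilder cellText;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("TR")) {
            currentRow = new ArrayList<>();
        } else if (qName.equals("TD")) {
            cellText = new StringBuilder();
        } else {
            outline.add(qName);  // only non-bulk elements enter the outline
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (cellText != null) {
            cellText.append(ch, start, length);  // cell text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("TD")) {
            if (currentRow != null) {
                currentRow.add(cellText.toString().trim());
            }
            cellText = null;
        } else if (qName.equals("TR")) {
            if (currentRow != null) {
                rows.add(currentRow);
            }
            currentRow = null;
        }
    }

    /** Streams the document through a fresh handler and returns it. */
    static StreamingSplitter run(String xml) {
        StreamingSplitter handler = new StreamingSplitter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler;
    }
}
```

The outline gets one entry per structural element however many rows the table has; only the row store grows with the data.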

If for some reason you want to work with a full DOM containing the TABLEDATA or STREAM children, you can parse the document to produce a DOM Document or Element as usual (e.g. using a DocumentBuilder) and feed that to one of the VOElementFactory's makeVOElement methods instead.
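Parsing with a DocumentBuilder as described above yields a DOM that keeps every TR and TD node, which is what makes such a DOM large. In this sketch FullDom is a hypothetical helper; it parses namespace-aware, since VOTable 1.1 and later documents typically declare an XML namespace, and counts the TD elements retained in the tree. Passing the resulting Document on to a makeVOElement method is deliberately left out so that the example runs with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical helper: builds a complete DOM, bulk data included. */
final class FullDom {

    /** Parses XML into a full DOM that keeps every TABLEDATA descendant. */
    static Document parseFull(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);  // match elements by namespace and local name
            return dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Counts TD elements in any namespace; this number grows with the table size. */
    static int countCells(Document doc) {
        return doc.getElementsByTagNameNS("*", "TD").getLength();
    }
}
```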

Having obtained your DOM, the easiest way to access the data of a TABLE element is to locate the relevant TableElement in the tree and turn it into a StarTable using the VOStarTable adapter class. You can interrogate the resulting object for its data and metadata in the usual way as described in Section 2. This StarTable may or may not provide random access (isRandom may return true or false), depending on how the data were obtained. If they came as a binary stream from a remote URL, it may only be possible to read rows sequentially from start to finish, but if they were in TABLEDATA form it will be possible to access cells in any order. If you need random access for a table and you don't have it (or don't know whether you do), use the methods described in Section 2.3.4.
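The general idea of obtaining random access from a table that can only be read once, from start to finish, can be sketched without STIL. SequentialTable and RandomAccessCopy are hypothetical types, not STIL interfaces: the copy reads the source in a single pass and afterwards serves cells in any order. This sketch holds every row in memory; it illustrates the idea only and is not a description of how the methods in Section 2.3.4 store their data.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a table whose rows can only be read sequentially. */
interface SequentialTable {
    Iterator<Object[]> rows();
}

/** Hypothetical helper: copies a one-pass table so its cells can be read in any order. */
final class RandomAccessCopy {
    private final List<Object[]> rows = new ArrayList<>();

    RandomAccessCopy(SequentialTable source) {
        // Single pass over the sequential source; it is not consulted again.
        for (Iterator<Object[]> it = source.rows(); it.hasNext(); ) {
            rows.add(it.next());
        }
    }

    long getRowCount() {
        return rows.size();
    }

    /** Random access to one cell, by row and column index. */
    Object getCell(long irow, int icol) {
        return rows.get(Math.toIntExact(irow))[icol];
    }
}
```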

It is possible to access the table data directly (without making it into a StarTable) by using the getData method of the TableElement, but in this case you need to work a bit harder to extract some of the data and metadata in useful forms. See the TabularData documentation for details.

One point to note about VOElementFactory's parsing is that it is not restricted to elements named in the VOTable standard, so a document which does not conform to the standard can still be processed as a VOTable if parts of it contain VOTable-like structures.
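As a loose illustration of picking VOTable-like structures out of a document that is not itself a VOTable, the following JDK-only sketch finds every TABLE element by local name, whatever elements surround it and whatever namespace it is in. EmbeddedTables is a hypothetical helper; it does not reproduce VOElementFactory's parsing, which also builds the specialised element classes described earlier.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: locates TABLE elements anywhere in an arbitrary XML document. */
final class EmbeddedTables {

    /** Returns the name attribute of each TABLE element, in document order. */
    static List<String> tableNames(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            doc = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        // "*" matches any namespace, so only the local name has to be TABLE.
        NodeList tables = doc.getElementsByTagNameNS("*", "TABLE");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < tables.getLength(); i++) {
            names.add(((Element) tables.item(i)).getAttribute("name"));
        }
        return names;
    }
}
```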

Here is an example of using this approach to navigate a possibly complex VOTable document. This program locates the third TABLE child of the first RESOURCE element and prints out its column names followed by its table data.

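    import java.io.File;
    import java.io.IOException;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;
    import org.xml.sax.SAXException;
    import uk.ac.starlink.table.RowSequence;
    import uk.ac.starlink.table.StarTable;
    import uk.ac.starlink.votable.TableElement;
    import uk.ac.starlink.votable.VOElement;
    import uk.ac.starlink.votable.VOElementFactory;
    import uk.ac.starlink.votable.VOStarTable;

    void printThirdTable( File votFile ) throws IOException, SAXException {

        // Create a tree of VOElements from the given XML file.
        VOElement top = new VOElementFactory().makeVOElement( votFile );

        // Find the first RESOURCE element using standard DOM methods.
        NodeList resources = top.getElementsByTagName( "RESOURCE" );
        if ( resources.getLength() == 0 ) {
            throw new IOException( "No RESOURCE element in " + votFile );
        }
        Element resource = (Element) resources.item( 0 );

        // Locate the third TABLE child of this resource using one of the
        // VOElement convenience methods.
        VOElement vResource = (VOElement) resource;
        VOElement[] tables = vResource.getChildrenByName( "TABLE" );
        if ( tables.length < 3 ) {
            throw new IOException( "First RESOURCE has only " + tables.length
                                 + " TABLE children" );
        }
        TableElement tableEl = (TableElement) tables[ 2 ];

        // Turn it into a StarTable so we can access its data.
        StarTable starTable = new VOStarTable( tableEl );

        // Write out the column name for each of its columns.
        int nCol = starTable.getColumnCount();
        for ( int iCol = 0; iCol < nCol; iCol++ ) {
            String colName = starTable.getColumnInfo( iCol ).getName();
            System.out.print( colName + "\t" );
        }
        System.out.println();

        // Iterate through its data rows, printing out each element.
        // Close the sequence afterwards to release any underlying resources.
        RowSequence rSeq = starTable.getRowSequence();
        try {
            while ( rSeq.next() ) {
                Object[] row = rSeq.getRow();
                for ( int iCol = 0; iCol < nCol; iCol++ ) {
                    System.out.print( row[ iCol ] + "\t" );
                }
                System.out.println();
            }
        }
        finally {
            rSeq.close();
        }
    }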

Versions of STIL prior to V2.0 worked somewhat differently to this - they produced a tree structure representing the VOTable document which resembled, but wasn't, a DOM (it didn't implement the W3C DOM API). The current approach is more powerful and in some cases less fiddly to use.

STIL - Starlink Tables Infrastructure Library
Starlink User Note 252
STIL web page: http://www.starlink.ac.uk/stil/
Author email: m.b.taylor@bristol.ac.uk