Table objects


4.3 Table objects

The tables read by the tread function and produced by operating on them within JyStilts have a number of methods defined on them. These are explained below.

First, a number of special methods are defined which allow a table to behave in Python like a sequence of rows:

__iter__: This special method means that the table can be treated as an iterable, so that for instance "for row in table:" will iterate over all rows.
__len__ (random-access tables only): This special method means that you can use the expression "len(table)" to count the number of rows. This method is not available for tables with sequential access only.
__getitem__ (random-access tables only): Returns a row at a given index in the table. This special method means that you can use indexing expressions like "table[3]" or "table[0:10]" to obtain the row or rows corresponding to a given row index or slice. This method is not available for tables with sequential access only.
__add__, __mul__, __rmul__: These special methods allow the addition and multiplication operators "+" and "*" to be used with the sense of concatenation. Thus "table1+table2" will produce a new table with the rows of table1 followed by the rows of table2. Note that this will only work if both tables have compatible columns. Similarly, "table*3" would produce a table like table but with all its rows repeated three times.
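The sequence behaviour listed above rests on ordinary Python special methods. The following is a minimal sketch in plain Python, not JyStilts code: ListTable is a hypothetical stand-in class that simply keeps its rows in a list, and it is only meant to show which special method supports which expression. In particular, this stand-in returns a plain list for a slice, which says nothing about what a real JyStilts table returns.

```python
class ListTable(object):
    """Hypothetical stand-in for a random-access table (not part of JyStilts)."""

    def __init__(self, rows):
        self._rows = [tuple(row) for row in rows]

    def __iter__(self):            # enables "for row in table:"
        return iter(self._rows)

    def __len__(self):             # enables "len(table)"
        return len(self._rows)

    def __getitem__(self, key):    # enables "table[3]" and "table[0:10]"
        return self._rows[key]

    def __add__(self, other):      # "table1+table2": rows of table1, then table2
        return ListTable(self._rows + list(other))

    def __mul__(self, count):      # "table*3": all rows repeated three times
        return ListTable(self._rows * count)

    __rmul__ = __mul__             # "3*table" behaves the same way


t1 = ListTable([("a", 1.0), ("b", 2.0)])
t2 = ListTable([("c", 3.0)])
combined = t1 + t2                 # three rows: a, b, c
tripled = t1 * 3                   # six rows: a, b, a, b, a, b
```

A sequential-only table would correspond to a class like this with __len__ and __getitem__ left out, so that only iteration is possible.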

In all of these cases, each row object that is accessed is a tuple of the column values for that row of the table. The tuple items (table cells) may be accessed using a key which is a numeric index or slice in the usual way, or with a key which is a column name, or one of the ColumnInfo objects returned by columns().
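To make the row access rules concrete, here is a small sketch of a tuple that accepts either a numeric index, a slice, or a column name as its key. NamedRow is a hypothetical class written for this illustration and is not the class JyStilts uses. It matches non-numeric keys on their string form, which relies on the fact noted in this section that str(column) gives the column name.

```python
class NamedRow(tuple):
    """Hypothetical row tuple that also accepts column names as keys."""

    def __new__(cls, values, names):
        row = super(NamedRow, cls).__new__(cls, values)
        row._names = list(names)
        return row

    def __getitem__(self, key):
        if isinstance(key, (int, slice)):
            # Numeric index or slice: ordinary tuple behaviour
            return tuple.__getitem__(self, key)
        # Anything else is looked up by its string form (the column name)
        return tuple.__getitem__(self, self._names.index(str(key)))


row = NamedRow(("19433000+4003190", 295.875, 40.055286), ["id", "ra", "dec"])
first_cell = row[0]        # cell by numeric index
ra_value = row["ra"]       # cell by column name
position = row[1:3]        # cells by slice
```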

Sometimes, the result of a table operation will be a table which does not have random access. For such tables you can iterate over the rows, but not get their row values by indexing. Non-random-access tables are also peculiar in that getRowCount returns a negative value. To take a table which may not have random access and make it capable of random access, use the random filter: "table=table.cmd_random()".
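When only one row is needed from a table that may lack random access, iteration alone is enough. The sketch below assumes nothing about the table beyond its being iterable; nth_row and SequentialRows are hypothetical names introduced here, with SequentialRows acting as a stand-in for a sequential-only table.

```python
from itertools import islice


def nth_row(table, index):
    """Return row number `index` using iteration only (no random access needed)."""
    for row in islice(table, index, index + 1):
        return row
    raise IndexError("row index out of range: %d" % index)


class SequentialRows(object):
    """Hypothetical stand-in for a sequential-only table: it defines __iter__ only."""

    def __init__(self, rows):
        self._rows = rows

    def __iter__(self):
        return iter(self._rows)


seq = SequentialRows([("a", 1), ("b", 2), ("c", 3)])
second = nth_row(seq, 1)   # works even though seq[1] would fail
```

Each call reads the table from the start, so for repeated lookups the cmd_random() approach described above is likely to be the better choice.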

To a large extent it is possible to duplicate the functions of the various STILTS commands by writing your own Python code based on these Python-friendly table access methods. Note however that such Python-based processing is likely to be much slower than the STILTS equivalents. If performance is important to you, you should in most cases use the cmd_* methods and related STILTS facilities for table processing.

Second, some additional utility methods are defined:

count_rows(): Returns the number of rows in the table in the most efficient way possible. If the table is random-access or otherwise knows its row count without further calculation, that value is returned. Otherwise, the rows are iterated over without reading, which may take some time but should be much more efficient than iterating over the table as an iterable, since the row cell data itself is not retrieved.
columns(): Returns a tuple of the column descriptors for the table. Each item in the tuple is an instance of the ColumnInfo class; useful methods include getName(), getUnitString() and getUCD(). str(column) will return its name.
coldata(key): Returns a sequence of the values for the given column. The sequence will have the same number of elements as the number of rows in the table. The key argument may be either an integer column index (if negative, counts backwards from the end), or the column name or info object. The returned value will always be iterable (has __iter__), but will only be indexable (has __len__ and __getitem__) if the table is random access.
parameters(): Returns a name-to-value mapping of the table parameters (per-table metadata). This does not include all the available information about those parameters; for instance, unit and UCD information is not included. For more detailed information, use the StarTable methods. Note that as currently implemented, changing the values in the returned mapping will not change the actual table parameter values.
write(location=None, fmt=None): Outputs the table. The optional location argument gives a filename or writable file object, and the optional fmt argument gives a format, one of the options listed in Section 5.1.1. If location is not supplied, output is to standard output, so in an interactive session it will be printed to the terminal. If fmt is not supplied, an attempt will be made to guess a suitable format based on the location.
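The key rules given for coldata (integer index, negative index counting from the end, or column name or info object) can be sketched as a small lookup function. This is an illustration of the rules as stated above, not the JyStilts implementation; resolve_column is a hypothetical helper, and it assumes that an info object's string form is its column name, as described for columns().

```python
def resolve_column(names, key):
    """Map a coldata-style key to a column index (illustrative only)."""
    names = list(names)
    if isinstance(key, int):
        # Negative integers count backwards from the last column
        index = key + len(names) if key < 0 else key
        if not 0 <= index < len(names):
            raise IndexError("no such column: %r" % (key,))
        return index
    # Column name, or an object whose str() is the column name
    return names.index(str(key))


names = ["id", "ra", "dec", "jmag", "hmag", "kmag"]
first = resolve_column(names, 0)       # first column
last = resolve_column(names, -1)       # last column
by_name = resolve_column(names, "ra")  # column found by name
```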

Third, a set of cmd_* methods corresponding to the STILTS filters are available; these are described in Section 4.4.

Fourth, a set of mode_* methods corresponding to the STILTS output modes are available; these are described in Section 4.5.

Finally, tables are also instances of the StarTable interface defined by STIL, which is the table I/O layer underlying STILTS. The full documentation can be found in the user manual and javadocs on the STIL page. All the Java methods can be used from JyStilts, but in most cases there are more pythonic equivalents provided, as described above.

Here are some examples of these methods in use:

   >>> import stilts
   >>> xsc = stilts.tread('/data/table/2mass_xsc.xml')  # read table
   >>> xsc.mode_count()                                 # show rows/column count
   columns: 6   rows: 1646844
   >>> print xsc.columns()                              # full info on columns
   (id(String), ra(Double)/degrees, dec(Double)/degrees, jmag(Double)/mag, hmag(Double)/mag, kmag(Double)/mag)
   >>> print [str(col) for col in xsc.columns()]        # column names only
   ['id', 'ra', 'dec', 'jmag', 'hmag', 'kmag']
   >>> row = xsc[1000000]                               # examine millionth row
   >>> print row
   (u'19433000+4003190', 295.875, 40.055286, 14.449, 13.906, 13.374)
   >>> print row[0]                                     # cell by index
   19433000+4003190
   >>> print row['ra'], row['dec']                      # cells by col name
   295.875 40.055286
   >>> print len(xsc)                                   # count rows, maybe slow
   1646844
   >>> print xsc.count_rows()                           # count rows efficiently
   1646844L
   >>> print (xsc+xsc).count_rows()                     # concatenate
   3293688L
   >>> print (xsc*10000).count_rows()
   16468440000L
   >>> for row in xsc:                  # select rows using python commands
   ...     if row[4] - row[3] > 3.0:
   ...         print row[0]
   ... 
   11165243+2925509
   20491597+5119089
   04330238+0858101
   01182715-1013248
   11244075+5218078
   >>>                                  # same thing using stilts (50x faster)
   >>> (xsc.cmd_select('hmag - jmag > 3.0')
   ...     .cmd_keepcols('id')
   ...     .write())
   +------------------+
   | id               |
   +------------------+
   | 11165243+2925509 |
   | 20491597+5119089 |
   | 04330238+0858101 |
   | 01182715-1013248 |
   | 11244075+5218078 |
   +------------------+

The following are all ways to obtain the value of a given cell in the table from the previous example.

    xsc.getCell(99, 0)
    xsc[99][0]
    xsc[99]['id']
    xsc.coldata(0)[99]
    xsc.coldata('id')[99]

Some of these methods may be more efficient than others. Note that none of these methods will work if the table has sequential-only access.


STILTS - Starlink Tables Infrastructure Library Tool Set
Starlink User Note 256
STILTS web page: http://www.starlink.ac.uk/stilts/
Author email: m.b.taylor@bristol.ac.uk
Mailing list: topcat-user@jiscmail.ac.uk