The usage of tmatch1 is

   stilts <stilts-flags> tmatch1 matcher=<matcher-name> params=<match-params>
                                 tuning=<tuning-params> values=<expr-list>
                                 action=identify|keep0|keep1|wide2|wideN
                                 progress=none|log|time|profile
                                 runner=parallel|parallel<n>|parallel-all|sequential|classic|partest
                                 ifmt=<in-format> istream=true|false icmd=<cmds>
                                 ocmd=<cmds>
                                 omode=out|meta|stats|count|checksum|cgi|discard|topcat|samp|plastic|tosql|gui
                                 out=<out-table> ofmt=<out-format>
                                 [in=]<table>

If you don't have the stilts script installed,
write "java -jar stilts.jar" instead of "stilts" - see Section 3.
The available <stilts-flags> are listed in Section 2.1.
For programmatic invocation, the Task class for this command is
uk.ac.starlink.ttools.task.TableMatch1.
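As an illustration of the usage pattern above, the following Python sketch assembles a tmatch1 command line as an argument list and prints it in shell-quoted form. The helper name tmatch1_command, the file names survey.fits and dedup.fits, the column names RA and DEC and the tolerance of 2 are all hypothetical. The command is only printed, not executed.

```python
import shlex


def tmatch1_command(in_table, out_table, values, params,
                    action="identify", matcher="sky", launcher=("stilts",)):
    """Return the argument list for a tmatch1 invocation (nothing is run)."""
    return [
        *launcher,
        "tmatch1",
        f"in={in_table}",
        f"matcher={matcher}",
        f"params={params}",
        f"values={values}",   # several expressions, separated by spaces
        f"action={action}",
        f"out={out_table}",
    ]


# Hypothetical table names and column names.
args = tmatch1_command("survey.fits", "dedup.fits", "RA DEC", 2, action="keep1")
print(shlex.join(args))

# Without the stilts script, launch the jar file instead.
jar_args = tmatch1_command("survey.fits", "dedup.fits", "RA DEC", 2,
                           launcher=("java", "-jar", "stilts.jar"))
print(shlex.join(jar_args))
```

Passing the arguments as a list (for instance to subprocess.run) avoids a second round of shell quoting for values that contain spaces.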
Parameter values are assigned on the command line as explained in Section 2.3. They are as follows:
action = identify|keep0|keep1|wide2|wideN
(Match1Type)
Determines the form of the table which will be output
as a result of the match.
  identify: The output table is the same as the input table except that
    it contains two additional columns, GroupID and GroupSize,
    following the input columns.
    Each group of rows which matched is assigned a unique integer,
    recorded in the GroupID column,
    and the size of each group is recorded in the GroupSize column.
    Rows which don't match any others (singles) have null values in
    both these columns.
  keep0: The result is a new table containing only "single" rows,
    that is ones which don't match any other rows in the table.
    Any other rows are thrown out.
  keep1: The result is a new table in which only one row
    (the first in the input table order)
    from each group of matching ones is retained.
    A subsequent intra-table match with the same criteria
    would therefore show no matches.
  wideN: The result is a new "wide" table consisting of matched rows in
    the input table stacked next to each other.
    Only groups of exactly N rows in the input table are used to
    form the output table; each row of the output table consists of
    the columns of the first group member, followed by the columns of
    the second group member and so on.
    The output table therefore has
    N times as many columns as the input table.
    The column names in the new table have _1, _2, ...
    appended to them to avoid duplication.
[Default: identify]
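The effect of the different action values can be sketched with a small Python model. This is not the STILTS implementation: the function apply_action and its arguments are hypothetical, the grouping is taken as given rather than computed by a matcher, and it is assumed that keep1 retains unmatched rows alongside the first row of each group.

```python
from collections import defaultdict


def apply_action(rows, group_ids, action):
    """Model the tmatch1 output for rows whose match groups are already known.

    rows      -- list of dicts, one per input row
    group_ids -- per-row group label, or None for rows that matched nothing
    """
    groups = defaultdict(list)          # group label -> row indices, input order
    for i, gid in enumerate(group_ids):
        if gid is not None:
            groups[gid].append(i)

    if action == "identify":
        # Input columns followed by GroupID and GroupSize (null for singles).
        return [
            {**row,
             "GroupID": gid,
             "GroupSize": len(groups[gid]) if gid is not None else None}
            for row, gid in zip(rows, group_ids)
        ]
    if action == "keep0":
        # Only rows that match no other row.
        return [row for row, gid in zip(rows, group_ids) if gid is None]
    if action == "keep1":
        # First row of each group; assumption: singles are retained too.
        firsts = {members[0] for members in groups.values()}
        return [row for i, row in enumerate(rows)
                if group_ids[i] is None or i in firsts]
    if action.startswith("wide"):
        n = int(action[4:])
        wide_rows = []
        for members in groups.values():
            if len(members) == n:       # only groups of exactly N rows
                wide = {}
                for k, i in enumerate(members, start=1):
                    for name, value in rows[i].items():
                        wide[f"{name}_{k}"] = value
                wide_rows.append(wide)
        return wide_rows
    raise ValueError(f"unknown action: {action}")


rows = [{"id": c} for c in "abcdef"]
group_ids = [1, None, 1, 2, 2, 1]       # one group of three, one pair, one single

for action in ("identify", "keep0", "keep1", "wide2"):
    print(action, apply_action(rows, group_ids, action))
```

In this example wide2 yields a single output row built from the pair (d, e); the group of three is ignored because its size is not exactly 2.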
icmd = <cmds>
(ProcessingStep[])
Specifies processing to be performed on the input table
as specified by parameter in,
before any other processing has taken place.
The value of this parameter is one or more of the filter
commands described in Section 6.1.
If more than one is given, they must be separated by
semicolon characters (";").
This parameter can be repeated multiple times on the same
command line to build up a list of processing steps.
The sequence of commands given in this way
defines the processing pipeline which is performed on the table.
Commands may alternatively be supplied in an external file,
by using the indirection character '@'.
Thus a value of "@filename" causes the file filename
to be read for a list of filter commands to execute.
The commands in the file
may be separated by newline characters and/or semicolons,
and lines which are blank or which start with a
'#' character are ignored.
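The rules for command files can be sketched as follows. This is an illustrative Python reading of the rules above, not the STILTS parser: the function names are hypothetical, semicolons inside quoted strings are not handled, and treating leading whitespace before '#' as insignificant is an assumption. The filter commands and column names in the sample file are examples only.

```python
def read_filter_commands(text):
    """Split command text into individual filter commands."""
    commands = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                    # blank lines and comment lines are ignored
        # Simplification: a ';' inside a quoted string would be split here too.
        for part in stripped.split(";"):
            if part.strip():
                commands.append(part.strip())
    return commands


def expand_cmds(value, read_file):
    """Resolve one icmd/ocmd value, following '@filename' indirection."""
    if value.startswith("@"):
        return read_filter_commands(read_file(value[1:]))
    return read_filter_commands(value)


# A hypothetical command file, held in memory for the example.
files = {
    "prep.txt": "# keep bright sources only\n"
                "select 'MAG < 20'; keepcols 'RA DEC MAG'\n"
                "\n"
                "sort MAG\n",
}

print(expand_cmds("@prep.txt", files.__getitem__))
print(expand_cmds("head 10; tail 5", files.__getitem__))
```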
ifmt = <in-format>
(String)
Specifies the format of the input table
as specified by parameter in.
The known formats are listed in Section 5.1.1.
This flag can be used if you know what format your
table is in.
If it has the special value (auto) (the default),
then an attempt will be
made to detect the format of the table automatically.
This cannot always be done correctly however, in which case
the program will exit with an error explaining which
formats were attempted.
This parameter is ignored for scheme-specified tables.
[Default: (auto)]
in = <table>
(StarTable)
The location of the input table.
This may take one of the following forms:
  A filename.
  A URL.
  The special value "-", meaning standard input.
    In this case the input format must be given explicitly
    using the ifmt parameter.
    Note that not all formats can be streamed in this way.
  A scheme specification of the form
    :<scheme-name>:<scheme-args>.
  A system command line with either a "<" character at the start,
    or a "|" character at the end ("<syscmd" or "syscmd|").
    This executes the given pipeline and reads from its
    standard output.
    This will probably only work on unix-like systems.
istream = true|false
(Boolean)
If set true, the input table specified by the in parameter
will be read as a stream.
It is necessary to give the ifmt parameter in this case.
Depending on the required operations and processing mode,
this may cause the read to fail (sometimes it is necessary
to read the table more than once).
It is not normally necessary to set this flag;
in most cases the data will be streamed automatically
if that is the best thing to do.
However it can sometimes result in less resource usage when
processing large files in certain formats (such as VOTable).
This parameter is ignored for scheme-specified tables.
[Default: false]
matcher = <matcher-name>
(MatchEngine)
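Defines the nature of the matching that will be performed.
The value supplied for this parameter determines the meanings
of the values required by the params, values* and tuning
parameter(s).
[Default: sky]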
ocmd = <cmds>
(ProcessingStep[])
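Specifies processing to be performed on the output table,
after all other processing has taken place.
The value of this parameter is one or more of the filter
commands described in Section 6.1.
If more than one is given, they must be separated by
semicolon characters (";").
Commands may alternatively be supplied in an external file,
by using the indirection character '@'.
Thus a value of "@filename" causes the file filename
to be read for a list of filter commands to execute.
The commands in the file
may be separated by newline characters and/or semicolons,
and lines which are blank or which start with a
'#' character are ignored.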
ofmt = <out-format>
(String)
Specifies the format in which the output table will be written.
If it has the special value "(auto)" (the default),
then the output filename will be
examined to try to guess what sort of file is required,
usually by looking at the extension.
If it's not obvious from the filename what output format is
intended, an error will result.
This parameter must only be given if omode
has its default value of "out".
[Default: (auto)]
omode = out|meta|stats|count|checksum|cgi|discard|topcat|samp|plastic|tosql|gui
(ProcessingMode)
The mode in which the result table will be output.
The default mode is out, which means that
the result will be written as a new table to disk or elsewhere,
as determined by the out and ofmt parameters.
However, there are other possibilities, which correspond
to uses to which a table can be put other than outputting it,
such as displaying metadata, calculating statistics,
or populating a table in an SQL database.
For some values of this parameter, additional parameters
(<mode-args>) are required to determine the exact behaviour.
Possible values are
out
meta
stats
count
checksum
cgi
discard
topcat
samp
plastic
tosql
gui
Use the help=omode flag
or see Section 6.4 for more information.
[Default: out]
out = <out-table>
(TableConsumer)
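The location of the output table.
This is usually a filename to write to.
If it is equal to the special value "-" (the default),
the output table will be written to standard output.
This parameter must only be given if omode
has its default value of "out".
[Default: -]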
params = <match-params>
(String[])
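Determines the parameters of this match.
It may contain zero or more values; the values that are required
depend on the match type selected by the matcher parameter.
If it contains multiple values, they must be separated by spaces;
values which contain a space can be 'quoted' or "quoted".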
progress = none|log|time|profile
(String)
The options are:
  none: no progress is shown
  log: progress information is shown
  time: progress information and some time profiling
    information are shown
  profile: progress information and limited time/memory profiling
    information are shown
[Default: log]
runner = parallel|parallel<n>|parallel-all|sequential|classic|partest
(RowRunner)
Selects the threading implementation.
The options are currently:
  parallel: uses multithreaded implementation for large tables,
    with default parallelism, which is the smaller of 6
    and the number of available processors
  parallel<n>: uses multithreaded implementation for large tables,
    with parallelism given by the supplied value <n>
  parallel-all: uses multithreaded implementation for large tables,
    with a parallelism given by the number of
    available processors
  sequential: uses multithreaded implementation
    but with only a single thread
  classic: uses legacy sequential implementation
  partest: uses multithreaded implementation even when tables are small
The parallel* options should normally run faster than
sequential or classic
(which are provided mainly for testing purposes),
at least for large matches
and where multiple processing cores are available.
The default value "parallel"
is currently limited to a parallelism of 6,
since larger values yield diminishing returns given that
some parts of the matching algorithms run sequentially
(Amdahl's Law), and using too many threads
can sometimes end up doing more work
or impacting on other operations on the same machine.
But you can experiment with other concurrencies,
e.g. "parallel16" to run on 16 cores (if available)
or "parallel-all" to run on all available cores.
The value of this parameter should make no difference to the
matching results. If you notice any discrepancies please report them.
[Default: parallel]
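The thread counts implied by the runner options, and the diminishing returns attributed to Amdahl's Law, can be illustrated with a short Python sketch. The function names are hypothetical and the parallel fraction of 0.8 is an invented figure for illustration, not a measured property of any matcher. The partest option is left out because the text above does not state its thread count.

```python
import os
import re

DEFAULT_LIMIT = 6   # cap applied by the plain "parallel" option


def thread_count(runner, n_proc=None):
    """Number of threads implied by a runner value, per the descriptions above."""
    n_proc = n_proc or os.cpu_count() or 1
    if runner == "parallel":
        return min(DEFAULT_LIMIT, n_proc)
    if runner == "parallel-all":
        return n_proc
    if runner in ("sequential", "classic"):
        return 1
    match = re.fullmatch(r"parallel(\d+)", runner)
    if match:
        return int(match.group(1))
    raise ValueError(f"thread count not defined in this sketch: {runner}")


def amdahl_speedup(threads, parallel_fraction):
    """Amdahl's Law: speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# On a hypothetical 16-core machine, with 80% of the work parallelisable.
for runner in ("sequential", "parallel", "parallel16", "parallel-all"):
    threads = thread_count(runner, n_proc=16)
    print(f"{runner:12s} threads={threads:2d} "
          f"speedup={amdahl_speedup(threads, 0.8):.1f}")
```

Under these assumed numbers, going from 6 to 16 threads raises the ideal speedup only from about 3 to about 4, which is the kind of diminishing return the default limit is meant to reflect.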
tuning = <tuning-params>
(String[])
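Tuning values for the matching process, if appropriate.
It may contain zero or more values; the values that are permitted
depend on the match type selected by the matcher parameter.
If it contains multiple values, they must be separated by spaces;
values which contain a space can be 'quoted' or "quoted".
If this optional parameter is not supplied, sensible defaults
will be chosen.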
values = <expr-list>
(String[])
Defines the values from the input table which are used to
determine whether a match has occurred.
Exactly what values are required is determined by the kind of match
as determined by matcher.
Depending on the kind of match, the number and type of
the values required will be different.
Multiple values should be separated by whitespace;
if whitespace occurs within a single value it must be
'quoted' or "quoted".
Elements of the expression list are commonly just column
names, but may be algebraic expressions calculated from
zero or more columns as explained in Section 10.
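The whitespace and quoting rule for expression lists can be illustrated with Python's shlex module, used here as a stand-in for the STILTS tokenizer. The two may differ in detail, for example in how backslashes are treated. The column names and expressions are hypothetical.

```python
import shlex


def split_expressions(value):
    """Split a values/params/tuning string into its separate elements."""
    return shlex.split(value)


# Two plain column names.
print(split_expressions("RA DEC"))

# A third element containing spaces must be quoted, with either quote style.
print(split_expressions("RA DEC 'ERR_A + ERR_B'"))
print(split_expressions('X Y "sqrt(VX*VX + VY*VY)"'))
```

When such a string is typed at a shell prompt, the whole value usually needs an outer layer of quoting as well, so that the shell passes it to stilts as a single argument.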