The usage of tmatchn is

   stilts <stilts-flags> tmatchn nin=<count> ifmtN=<in-format> inN=<tableN>
                                 icmdN=<cmds> ocmd=<cmds>
                                 omode=out|meta|stats|count|checksum|cgi|discard|topcat|samp|plastic|tosql|gui
                                 out=<out-table> ofmt=<out-format>
                                 multimode=pairs|group iref=<table-index>
                                 matcher=<matcher-name> params=<match-params>
                                 tuning=<tuning-params> valuesN=<expr-list>
                                 joinN=default|match|nomatch|always
                                 fixcols=none|dups|all suffixN=<label>
                                 progress=none|log|time|profile
                                 runner=parallel|parallel<n>|parallel-all|sequential|classic|partest

If you don't have the stilts script installed, write "java -jar stilts.jar" instead of "stilts" - see Section 3.
The available <stilts-flags> are listed in Section 2.1.
For programmatic invocation, the Task class for this command is uk.ac.starlink.ttools.task.TableMatchN.
Parameter values are assigned on the command line as explained in Section 2.3. They are as follows:
fixcols = none|dups|all
(Fixer)
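Determines how input columns are renamed before use in the output table. The choices are:
none: columns are not renamed
dups: columns which would otherwise have duplicate names in the output will be renamed to indicate which table they came from
all: all columns will be renamed to indicate which table they came from
If columns are renamed, the new names are determined by the suffix* parameters.
[Default: dups]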
icmdN = <cmds>
(ProcessingStep[])
Specifies processing to be performed on input table #N, as specified by parameter inN, before any other processing has taken place.
The value of this parameter is one or more of the filter
commands described in Section 6.1.
If more than one is given, they must be separated by
semicolon characters (";").
This parameter can be repeated multiple times on the same
command line to build up a list of processing steps.
The sequence of commands given in this way
defines the processing pipeline which is performed on the table.
Commands may alternatively be supplied in an external file, by using the indirection character '@'.
Thus a value of "@filename" causes the file filename to be read for a list of filter commands to execute.
The commands in the file may be separated by newline characters and/or semicolons, and lines which are blank or which start with a '#' character are ignored.
A backslash character '\' at the end of a line joins it with the following line.
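As a rough illustration of the two ways of supplying filter commands, the sketch below gives table 1 a semicolon-separated list inline and reads the commands for table 3 from a file via '@'. The table names, the column names (ID, RA, DEC, MAG) and the file name c-filters.txt are assumptions made for the example, as are the matcher settings; select, keepcols and sort stand in for whichever filter commands from Section 6.1 are wanted. The command is only printed unless the STILTS variable is set to a real launcher.

```shell
# Hypothetical command file for table 3.
# Blank lines and lines starting with '#' are ignored when it is read.
printf '%s\n' \
    '# keep bright rows, then sort them' \
    'select MAG<20' \
    '' \
    'sort MAG' > c-filters.txt

# Dry run by default: prints the command instead of running it.
# Set STILTS=stilts (or STILTS="java -jar stilts.jar") to execute it.
filtered_match() {
    ${STILTS:-echo stilts} tmatchn nin=3 matcher=sky params=1 \
        in1=a.fits icmd1='select MAG<20; keepcols "ID RA DEC MAG"' values1='RA DEC' \
        in2=b.fits values2='RA DEC' \
        in3=c.fits icmd3=@c-filters.txt values3='RA DEC' \
        out=matched.fits
}
filtered_match
```

Keeping longer pipelines in a file avoids the shell quoting that inline commands need once expressions contain spaces or quote characters.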
ifmtN = <in-format>
(String)
Specifies the format of input table #N, as specified by parameter inN.
The known formats are listed in Section 5.1.1.
This flag can be used if you know what format your
table is in.
If it has the special value (auto) (the default), then an attempt will be made to detect the format of the table automatically.
This cannot always be done correctly, however, in which case the program will exit with an error explaining which formats were attempted.
This parameter is ignored for scheme-specified tables.
[Default: (auto)]
inN = <tableN>
(StarTable)
The location of input table #N. Besides a plain filename or URL, the value may take one of the following forms:
The special value "-", meaning standard input. In this case the input format must be given explicitly using the ifmtN parameter. Note that not all formats can be streamed in this way.
A scheme specification of the form :<scheme-name>:<scheme-args>.
A system command line with either a "<" character at the start, or a "|" character at the end ("<syscmd" or "syscmd|"). This executes the given pipeline and reads from its standard output. This will probably only work on unix-like systems.
iref = <table-index>
(Integer)
If multimode=pairs, this parameter gives the index of the table in the input table list which is to serve as the reference table (the one which must be matched by other tables).
Ignored in other modes.
Row ordering in the output table is usually tidiest if the default setting of 1 is used (i.e. if the first input table is used as the reference table).
[Default: 1]
joinN = default|match|nomatch|always
(MultiJoinType)
The default behaviour is that a row will appear in the output table if it represents a match of rows from two or more of the input tables. This can be altered on a per-input-table basis however by choosing one of the non-default options below:
match: Rows are included only if they contain an entry from input table N.
nomatch: Rows are included only if they do not contain an entry from input table N.
always: Rows are included if they contain an entry from input table N (overrides any match and nomatch settings of other tables).
default: Input table N has no special effect on whether rows are included.
[Default: default]
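To make the per-table options concrete, here is a hedged sketch of a three-table match in which every row of the reference table is kept, whether or not it has counterparts, by setting join1=always. The file names and the RA/DEC column names are hypothetical, and matcher=sky is assumed to take a single params value giving a maximum separation (understood here to be in arcseconds). The command is printed rather than run unless the STILTS variable is set.

```shell
# File and column names below are assumptions for illustration only.
# Dry run by default; set STILTS=stilts (or "java -jar stilts.jar") to execute.
keep_all_of_table1() {
    ${STILTS:-echo stilts} tmatchn nin=3 multimode=pairs iref=1 \
        matcher=sky params=2 \
        in1=ref.fits values1='RA DEC' join1=always \
        in2=opt.fits values2='RA DEC' \
        in3=ir.fits  values3='RA DEC' \
        out=ref-with-matches.fits
}
keep_all_of_table1
```

Tables 2 and 3 are left at the default setting, so they have no special effect on which rows are included.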
matcher = <matcher-name>
(MatchEngine)
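Defines the nature of the matching that will be performed.
The value supplied for this parameter determines the meanings of the values required by the params, values* and tuning parameter(s).
[Default: sky]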
multimode = pairs|group
(String)
Defines what is meant by a multi-table match. There are two possibilities:
pairs: Each output row corresponds to a single row of the reference table (see parameter iref) and contains entries from other tables which are pair matches to that. If a reference table row matches multiple rows from one of the other tables, only the best one is included.
group: Each output row corresponds to a group of entries from the input tables which are mutually linked by pair matches between them. This means that although you can get from any entry to any other entry via one or more pair matches, there is no guarantee that any given entry is a direct pair match with any other given entry. No table has privileged status in this case. If there are multiple entries from a given table in the match group, an arbitrary one is chosen for inclusion (there is no unique way to select the best). See Section 7.2 for more discussion.
Note that which rows actually appear in the output is also influenced by the joinN parameter.
[Default: pairs]
nin = <count>
(Integer)
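The number of input tables for this task. For each of the input tables N there will be associated parameters ifmtN, inN and icmdN.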
ocmd = <cmds>
(ProcessingStep[])
Specifies processing to be performed on the output table, after all other processing has taken place. The value is one or more filter commands, given in the same way as for icmdN.
Commands may alternatively be supplied in an external file, by using the indirection character '@'.
Thus a value of "@filename" causes the file filename to be read for a list of filter commands to execute.
The commands in the file may be separated by newline characters and/or semicolons, and lines which are blank or which start with a '#' character are ignored.
A backslash character '\' at the end of a line joins it with the following line.
ofmt = <out-format>
(String)
Specifies the format in which the output table will be written.
If it has the special value "(auto)" (the default), then the output filename will be examined to try to guess what sort of file is required, usually by looking at the extension.
If it's not obvious from the filename what output format is intended, an error will result.
This parameter may only be given if omode has its default value of "out".
[Default: (auto)]
omode = out|meta|stats|count|checksum|cgi|discard|topcat|samp|plastic|tosql|gui
(ProcessingMode)
The mode in which the result table will be output.
The default mode is out, which means that the result will be written as a new table to disk or elsewhere, as determined by the out and ofmt parameters.
However, there are other possibilities, which correspond to uses to which a table can be put other than outputting it, such as displaying metadata, calculating statistics, or populating a table in an SQL database.
For some values of this parameter, additional parameters (<mode-args>) are required to determine the exact behaviour.
Possible values are: out, meta, stats, count, checksum, cgi, discard, topcat, samp, plastic, tosql, gui.
Use the help=omode flag or see Section 6.4 for more information.
[Default: out]
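As an example of an output mode other than out, the sketch below uses omode=count to report the size of the match result instead of writing a table, which is a quick way to compare the two multimode settings on the same inputs. No out or ofmt value is given, since those apply only to omode=out. The file names, the RA/DEC column names and the matcher settings are assumptions for illustration, and the commands are printed rather than run unless the STILTS variable is set.

```shell
# $1 is the multimode value to try: pairs or group.
# Dry run by default; set STILTS=stilts (or "java -jar stilts.jar") to execute.
count_matches() {
    ${STILTS:-echo stilts} tmatchn nin=3 multimode="$1" \
        matcher=sky params=2 \
        in1=a.fits values1='RA DEC' \
        in2=b.fits values2='RA DEC' \
        in3=c.fits values3='RA DEC' \
        omode=count
}
count_matches pairs
count_matches group
```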
out = <out-table>
(TableConsumer)
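The location of the output table. With the default value "-", the table is written to standard output.
This parameter may only be given if omode has its default value of "out".
[Default: -]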
params = <match-params>
(String[])
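Determines the parameters of this match. The values that are required depend on the match type selected by the matcher parameter.
If it contains multiple values, they must be separated by spaces; values which contain a space can be 'quoted' or "quoted".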
progress = none|log|time|profile
(String)
Determines whether information on the progress of the match is reported as it runs.
The options are:
none: no progress is shown
log: progress information is shown
time: progress information and some time profiling information are shown
profile: progress information and limited time/memory profiling information are shown
[Default: log]
runner = parallel|parallel<n>|parallel-all|sequential|classic|partest
(RowRunner)
Selects the threading implementation. The options are currently:
parallel: uses multithreaded implementation for large tables, with default parallelism, which is the smaller of 6 and the number of available processors
parallel<n>: uses multithreaded implementation for large tables, with parallelism given by the supplied value <n>
parallel-all: uses multithreaded implementation for large tables, with a parallelism given by the number of available processors
sequential: uses multithreaded implementation but with only a single thread
classic: uses legacy sequential implementation
partest: uses multithreaded implementation even when tables are small
The parallel* options should normally run faster than sequential or classic (which are provided mainly for testing purposes), at least for large matches and where multiple processing cores are available.
The default value "parallel" is currently limited to a parallelism of 6, since larger values yield diminishing returns given that some parts of the matching algorithms run sequentially (Amdahl's Law), and using too many threads can sometimes end up doing more work or impacting on other operations on the same machine.
But you can experiment with other concurrencies, e.g. "parallel16" to run on 16 cores (if available) or "parallel-all" to run on all available cores.
The value of this parameter should make no difference to the matching results. If you notice any discrepancies please report them.
[Default: parallel]
suffixN = <label>
(String)
If the fixcols parameter is set so that input columns are renamed for insertion into the output table, this parameter determines how the renaming is done.
It gives a suffix which is appended to all renamed columns from table N.
[Default: _N]
tuning = <tuning-params>
(String[])
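Tuning values for the matching process. The values that are permitted depend on the match type selected by the matcher parameter.
If it contains multiple values, they must be separated by spaces; values which contain a space can be 'quoted' or "quoted".
If this optional parameter is not supplied, sensible defaults will be chosen.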
valuesN = <expr-list>
(String[])
Defines the values from table N which are used to determine whether a match has occurred. Exactly what values are required is determined by the kind of match selected by matcher.
Depending on the kind of match, the number and type of the values required will be different.
Multiple values should be separated by whitespace; if whitespace occurs within a single value it must be 'quoted' or "quoted".
Elements of the expression list are commonly just column names, but may be algebraic expressions calculated from zero or more columns as explained in Section 10.
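Because elements of the expression list may be expressions and not only bare column names, tables that store coordinates differently can still be matched directly. The sketch below assumes, purely for illustration, that table 2 holds right ascension in hours in a column RA_HOURS, so it is scaled to degrees inline. It also sets fixcols=all with explicit suffixes so that every output column shows which table it came from. All file names, column names and suffix labels are hypothetical, and matcher=sky is assumed to take RA and Dec in degrees with params as a maximum separation in arcseconds. The command is printed rather than run unless the STILTS variable is set.

```shell
# Dry run by default; set STILTS=stilts (or "java -jar stilts.jar") to execute.
expr_match() {
    ${STILTS:-echo stilts} tmatchn nin=3 matcher=sky params=1.5 \
        in1=opt.fits   values1='RA DEC'          suffix1=_opt \
        in2=radio.fits values2='RA_HOURS*15 DEC' suffix2=_radio \
        in3=xray.fits  values3='RA DEC'          suffix3=_xray \
        fixcols=all out=combined.fits
}
expr_match
```

The single quotes keep each expression list together as one argument and stop the shell from interpreting the '*' character.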