8.3.33 knn

Plots a Discrete Kernel Density Estimate giving a smoothed frequency of data values along the horizontal axis, using an adaptive (K-Nearest-Neighbours) smoothing kernel. This is a generalisation of a histogram in which the bins are always 1 pixel wide, and a smoothing kernel is applied to each bin. The width and shape of the kernel may be varied.

The K-Nearest-Neighbour figure gives the number of points to search for in each direction when determining the width of the kernel used to smooth each bin. Upper and lower limits for the kernel width are also supplied; if the two limits are equal, the result is equivalent to a fixed-width kernel.

Note this is not a true Kernel Density Estimate, since, for performance reasons, the smoothing is applied to the (pixel-width) bins rather than to each data sample. The deviation from a true KDE caused by this quantisation will be at the pixel level, hence in most cases not visually apparent.
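
The bin-then-smooth approach described above can be sketched as follows. This is a minimal illustration in Python, not the STILTS implementation: the function name, the fixed kernel half-width in pixels and the parabolic kernel weights are all assumptions made for the example.

```python
def binned_kde(samples, lo, hi, n_pixels, half_width):
    """Histogram samples into pixel-wide bins, then smooth the bins.

    half_width is a fixed kernel half-width in pixels (hypothetical
    parameter; the knn layer chooses the width adaptively instead).
    """
    # Step 1: accumulate samples into one bin per pixel column.
    bins = [0.0] * n_pixels
    scale = n_pixels / (hi - lo)
    for x in samples:
        i = int((x - lo) * scale)
        if 0 <= i < n_pixels:
            bins[i] += 1.0

    # Step 2: build a discrete parabolic kernel whose weights sum to 1.
    offsets = list(range(-half_width, half_width + 1))
    raw = [1.0 - (d / (half_width + 1)) ** 2 for d in offsets]
    total = sum(raw)
    kernel = [v / total for v in raw]

    # Step 3: spread each bin's count (not each sample) over its neighbours.
    smooth = [0.0] * n_pixels
    for i, count in enumerate(bins):
        if count == 0.0:
            continue
        for d, k in zip(offsets, kernel):
            j = i + d
            if 0 <= j < n_pixels:
                smooth[j] += count * k
    return bins, smooth
```

Because the kernel is applied once per occupied bin rather than once per sample, the cost of the smoothing step depends on the plot width in pixels and not on the number of rows in the table.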

Usage Overview:

   layerN=knn colorN=<rrggbb>|red|blue|... transparencyN=0..1
              sidewaysN=true|false knnN=<number> symmetricN=true|false
              minsmoothN=+<width>|-<count> maxsmoothN=+<width>|-<count>
              kernelN=square|linear|epanechnikov|cos|cos2|gauss3|gauss6
              cumulativeN=none|forward|reverse
              normaliseN=none|area|unit|maximum|height fillN=solid|line|semi
              thickN=<pixels> xN=<num-expr> weightN=<num-expr> inN=<table>
              ifmtN=<in-format> istreamN=true|false icmdN=<cmds>

All the parameters listed here affect only the relevant layer, identified by the suffix N.

Example:

   stilts plot2plane layer1=knn in1=rrlyrae.fits x1=p1

colorN = <rrggbb>|red|blue|...       (Color)
The color of plotted data, given by name or as a hexadecimal RGB value.

The standard plotting colour names are red, blue, green, grey, magenta, cyan, orange, pink, yellow, black, light_grey, white. However, many other common colour names (too many to list here) are also understood. The list currently contains those colour names understood by most web browsers, from AliceBlue to YellowGreen, listed e.g. in the Extended color keywords section of the CSS3 standard.

Alternatively, a six-digit hexadecimal number RRGGBB may be supplied, optionally prefixed by "#" or "0x", giving red, green and blue intensities, e.g. "ff00ff", "#ff00ff" or "0xff00ff" for magenta.
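
As an illustration of the three accepted hexadecimal spellings, the following Python sketch converts such a value to red, green and blue intensities. The function name is hypothetical, and colour names such as red or AliceBlue are not handled here.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_rgb(text):
    """Parse 'ff00ff', '#ff00ff' or '0xff00ff' into an (r, g, b) tuple."""
    digits = text.strip()
    # Drop the optional "#" or "0x" prefix.
    for prefix in ("#", "0x"):
        if digits.startswith(prefix):
            digits = digits[len(prefix):]
            break
    if len(digits) != 6 or any(c not in HEX_DIGITS for c in digits):
        raise ValueError("expected six hexadecimal digits: %r" % text)
    value = int(digits, 16)
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF
```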

[Default: red]

cumulativeN = none|forward|reverse       (Cumulation)
If set to forward or reverse, the plotted histogram bars are calculated cumulatively: each bin includes the counts from all previous bins, working up (forward) or down (reverse) the independent axis.

Note that setting cumulative plotting may not make much sense with some other parameter values, for instance averaging aggregation modes.

For reasons of backward compatibility, the values true and false may be used as aliases for forward and none.

The available options are none, forward and reverse.

[Default: none]
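
The effect of the cumulation setting on a list of bin values can be sketched as below. The function name is hypothetical and the code only illustrates the running-sum behaviour described above.

```python
def cumulate(bins, mode="none"):
    """Return bin values accumulated according to the cumulation mode."""
    if mode == "none":
        return list(bins)
    if mode == "forward":
        # Each bin includes all bins at lower positions on the axis.
        ordered = list(bins)
    elif mode == "reverse":
        # Each bin includes all bins at higher positions on the axis.
        ordered = list(reversed(bins))
    else:
        raise ValueError("unknown cumulation mode: %r" % mode)

    result = []
    running = 0
    for value in ordered:
        running += value
        result.append(running)
    if mode == "reverse":
        result.reverse()
    return result
```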

fillN = solid|line|semi       (FillMode)
How the density function is represented.

The available options are solid, line and semi.

[Default: semi]

icmdN = <cmds>       (ProcessingStep[])
Specifies processing to be performed on the layer N input table as specified by parameter inN. The value of this parameter is one or more of the filter commands described in Section 6.1. If more than one is given, they must be separated by semicolon characters (";"). This parameter can be repeated multiple times on the same command line to build up a list of processing steps. The sequence of commands given in this way defines the processing pipeline which is performed on the table.

Commands may alternatively be supplied in an external file, by using the indirection character '@'. Thus a value of "@filename" causes the file filename to be read for a list of filter commands to execute. The commands in the file may be separated by newline characters and/or semicolons, and lines which are blank or which start with a '#' character are ignored. A backslash character '\' at the end of a line joins it with the following line.
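
The file syntax just described can be illustrated with a short Python sketch. The function name is hypothetical, and two details are assumptions rather than documented behaviour: lines are joined before comments are recognised, and semicolons inside quoted strings are not treated specially.

```python
def read_filter_commands(text):
    """Split the contents of a command file into individual commands."""
    # Join any line ending in a backslash with the line that follows it.
    joined = []
    pending = ""
    for line in text.splitlines():
        if line.endswith("\\"):
            pending += line[:-1]
            continue
        joined.append(pending + line)
        pending = ""
    if pending:
        joined.append(pending)

    commands = []
    for line in joined:
        stripped = line.strip()
        # Blank lines and lines starting with '#' are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        # Commands on one line are separated by semicolons.
        for part in stripped.split(";"):
            part = part.strip()
            if part:
                commands.append(part)
    return commands
```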

ifmtN = <in-format>       (String)
Specifies the format of the input table as specified by parameter inN. The known formats are listed in Section 5.1.1. This flag can be used if you know what format your table is in. If it has the special value (auto) (the default), then an attempt will be made to detect the format of the table automatically. This cannot always be done correctly however, in which case the program will exit with an error explaining which formats were attempted. This parameter is ignored for scheme-specified tables.

[Default: (auto)]

inN = <table>       (StarTable)
The location of the input table. This may take one of several forms.

Whichever form is used, compressed data in one of the supported compression formats (gzip, Unix compress or bzip2) will be decompressed transparently.

istreamN = true|false       (Boolean)
If set true, the input table specified by the inN parameter will be read as a stream. It is necessary to give the ifmtN parameter in this case. Depending on the required operations and processing mode, this may cause the read to fail (sometimes it is necessary to read the table more than once). It is not normally necessary to set this flag; in most cases the data will be streamed automatically if that is the best thing to do. However it can sometimes result in less resource usage when processing large files in certain formats (such as VOTable). This parameter is ignored for scheme-specified tables.

[Default: false]

kernelN = square|linear|epanechnikov|cos|cos2|gauss3|gauss6       (Kernel1dShape)
The functional form of the smoothing kernel. The functions listed refer to the unscaled shape; all kernels are normalised to give a total area of unity.

The available options are square, linear, epanechnikov, cos, cos2, gauss3 and gauss6.

[Default: epanechnikov]
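
To illustrate the distinction between unscaled shape and normalised kernel, the Python sketch below evaluates each named shape on a grid of pixel offsets and rescales the weights to sum to one. The formulas are the conventional definitions of these shapes and are assumed here rather than taken from the STILTS source; the width parameter and the function name are likewise hypothetical.

```python
import math

# name -> (extent in units of the width, unscaled shape function).
# The Gaussians are assumed to be truncated at 3 and 6 sigma.
SHAPES = {
    "square": (1.0, lambda x: 1.0),
    "linear": (1.0, lambda x: 1.0 - abs(x)),
    "epanechnikov": (1.0, lambda x: 1.0 - x * x),
    "cos": (1.0, lambda x: math.cos(0.5 * math.pi * x)),
    "cos2": (1.0, lambda x: math.cos(0.5 * math.pi * x) ** 2),
    "gauss3": (3.0, lambda x: math.exp(-0.5 * x * x)),
    "gauss6": (6.0, lambda x: math.exp(-0.5 * x * x)),
}


def discrete_kernel(name, width):
    """Return kernel weights, one per pixel offset, summing to 1.

    width is the kernel scale length in pixels and must be positive.
    """
    extent, shape = SHAPES[name]
    reach = int(math.floor(extent * width))
    raw = [shape(d / width) for d in range(-reach, reach + 1)]
    total = sum(raw)
    return [v / total for v in raw]
```

On a discrete grid, "total area of unity" becomes "weights sum to one", so smoothing with any of these kernels redistributes the bin contents without changing their total.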

knnN = <number>       (Double)
Sets the number of nearest neighbours to count away from a sample point to determine the width of the smoothing kernel at that point. For the symmetric case this is the number of nearest neighbours summed over both directions, and for the asymmetric case it is the number in a single direction.

The threshold is actually the weighted total of samples; for unweighted (weight=1) bins that is equivalent to the number of samples.

[Default: 100]
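
The nearest-neighbour search over bins can be sketched as follows. This is an illustrative Python reading of the description above, not the STILTS algorithm: the function name is hypothetical, and the choice to exclude the starting bin from the total is an assumption.

```python
def knn_extents(bins, index, k, symmetric=True):
    """Return (left, right) kernel extents in bins around bins[index].

    bins holds the weighted sample total of each pixel-wide bin, so for
    unweighted data the threshold k is simply a number of samples.
    """
    n = len(bins)

    def reach(direction):
        # Step outwards in one direction until the weighted total of
        # the bins passed reaches k, or the edge of the plot is hit.
        total, steps, i = 0.0, 0, index + direction
        while 0 <= i < n and total < k:
            total += bins[i]
            steps += 1
            i += direction
        return steps

    if not symmetric:
        # Asymmetric case: k neighbours are sought on each side separately.
        return reach(-1), reach(+1)

    # Symmetric case: grow both sides together until the total summed
    # over both directions reaches k, giving equal extents.
    total, steps = 0.0, 0
    while total < k and (index - steps > 0 or index + steps < n - 1):
        steps += 1
        if index - steps >= 0:
            total += bins[index - steps]
        if index + steps < n:
            total += bins[index + steps]
    return steps, steps
```

In dense regions the threshold is reached after a few bins and the kernel is narrow; in sparse regions the search runs further and the kernel widens.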

maxsmoothN = +<width>|-<count>       (BinSizer)
Fixes the maximum size of the smoothing kernel. This functions as an upper limit on the distance that is otherwise determined by searching for the K nearest neighbours at each sample point.

If the supplied value is a positive number it is interpreted as a fixed width in the data coordinates of the X axis (if the X axis is logarithmic, the value is a fixed factor). If it is a negative number, then it will be interpreted as the approximate number of smoothing widths that fit in the width of the visible plot (i.e. plot width / smoothing width). If the value is zero, no smoothing is applied.

When setting this value graphically, you can use either the slider to adjust the bin count or the numeric entry field to fix the bin width.

[Default: -100]
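
The sign convention for this value can be illustrated with a small Python sketch. The function and its arguments are hypothetical; in particular, the treatment of a negative count on a logarithmic axis (dividing the axis range into equal factors) is an assumption, not documented behaviour.

```python
def smoothing_width(value, axis_min, axis_max, log_axis=False):
    """Convert a +<width>|-<count> setting to a smoothing width.

    On a linear axis the result is a width in data units; on a
    logarithmic axis it is a multiplicative factor.
    """
    if value == 0:
        # Zero means no smoothing: zero width, or a factor of one.
        return 1.0 if log_axis else 0.0
    if value > 0:
        # Positive: a fixed width (or fixed factor) given directly.
        return float(value)
    # Negative: approximate number of widths across the visible plot.
    count = -value
    if log_axis:
        return (axis_max / axis_min) ** (1.0 / count)
    return (axis_max - axis_min) / count
```

With the default of -100 and a visible X range of 0 to 50, this sketch gives a width of 0.5 data units.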

minsmoothN = +<width>|-<count>       (BinSizer)
Fixes the minimum size of the smoothing kernel. This functions as a lower limit on the distance that is otherwise determined by searching for the K nearest neighbours at each sample point.

If the supplied value is a positive number it is interpreted as a fixed width in the data coordinates of the X axis (if the X axis is logarithmic, the value is a fixed factor). If it is a negative number, then it will be interpreted as the approximate number of smoothing widths that fit in the width of the visible plot (i.e. plot width / smoothing width). If the value is zero, no smoothing is applied.

When setting this value graphically, you can use either the slider to adjust the bin count or the numeric entry field to fix the bin width.

[Default: 0]
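
The way the minimum and maximum limits constrain the nearest-neighbour distance can be sketched as a simple clamp. The function name is hypothetical, all three arguments are assumed to be in the same units, and behaviour when the minimum exceeds the maximum is not covered.

```python
def clamp_width(knn_width, min_width, max_width):
    """Limit a nearest-neighbour kernel width to [min_width, max_width]."""
    return max(min_width, min(knn_width, max_width))


# With equal limits every bin gets the same width: a fixed-width kernel.
fixed = [clamp_width(w, 3.0, 3.0) for w in (0.5, 3.0, 40.0)]
```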

normaliseN = none|area|unit|maximum|height       (Normalisation)
Defines how, if at all, the bars of histogram-like plots are normalised or otherwise scaled vertically.

Note that some of the normalisation options may not make much sense with some other parameter values, for instance averaging aggregation modes.

The available options are none, area, unit, maximum and height.

[Default: none]
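
One plausible reading of these option names is sketched below in Python. The meanings assigned to each mode are assumptions based on the names (area: total area of one; unit: value per unit of bin width; maximum: tallest bar of one; height: total height of one), and the function and its arguments are hypothetical.

```python
def normalise(bins, bin_width, mode="none"):
    """Scale bin values vertically; bins must contain a non-zero total."""
    if mode == "none":
        factor = 1.0
    elif mode == "area":
        # Assumed: sum of (height * width) over all bars becomes 1.
        factor = 1.0 / (sum(bins) * bin_width)
    elif mode == "unit":
        # Assumed: heights are expressed per unit of the data axis.
        factor = 1.0 / bin_width
    elif mode == "maximum":
        # Assumed: the tallest bar is scaled to 1.
        factor = 1.0 / max(bins)
    elif mode == "height":
        # Assumed: the heights of all bars sum to 1.
        factor = 1.0 / sum(bins)
    else:
        raise ValueError("unknown normalisation mode: %r" % mode)
    return [b * factor for b in bins]
```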

sidewaysN = true|false       (Boolean)
When set to the default value of false, the quantity being accumulated is on the horizontal axis and the frequency is represented vertically as usual. If set true the quantity accumulated is on the vertical axis, and the frequency is represented horizontally, so that the chart is displayed reflected in the X=Y line.

[Default: false]

symmetricN = true|false       (Boolean)
If true, the nearest neighbour search is carried out in both directions, and the kernel is symmetric. If false, the nearest neighbour search is carried out separately in the positive and negative directions, and the kernel width is accordingly different in the positive and negative directions.

[Default: true]

thickN = <pixels>       (Integer)
Thickness of plotted line in pixels.

[Default: 2]

transparencyN = 0..1       (Double)
Transparency with which components are plotted, in the range 0 (opaque) to 1 (invisible). The value is 1-alpha.

[Default: 0]
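
The relation between this parameter and the more familiar alpha (opacity) value is simple arithmetic, sketched here in Python with a hypothetical helper name.

```python
def transparency_to_alpha(transparency):
    """Convert transparency (0 opaque, 1 invisible) to alpha (opacity)."""
    if not 0.0 <= transparency <= 1.0:
        raise ValueError("transparency must lie in the range 0..1")
    return 1.0 - transparency
```

For example, a transparency of 0.25 corresponds to an alpha of 0.75.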

weightN = <num-expr>       (String)
Weighting of data points. If supplied, each point contributes a value to the histogram equal to the data value multiplied by this coordinate. If not supplied, the effect is the same as supplying a fixed value of one.

The value is a numeric algebraic expression based on column names as described in Section 10.
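
The effect of a weight on bin accumulation can be sketched as below. This Python illustration assumes that each point adds its weight to the bin it falls in, with a missing weight treated as one; the function name and arguments are hypothetical.

```python
def weighted_bins(xs, lo, hi, n_bins, weights=None):
    """Accumulate per-point weights into equal-width bins over [lo, hi)."""
    if weights is None:
        # No weight supplied: every point contributes a value of one.
        weights = [1.0] * len(xs)
    bins = [0.0] * n_bins
    scale = n_bins / (hi - lo)
    for x, w in zip(xs, weights):
        i = int((x - lo) * scale)
        if 0 <= i < n_bins:
            bins[i] += w
    return bins
```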

xN = <num-expr>       (String)
Horizontal coordinate.

The value is a numeric algebraic expression based on column names as described in Section 10.


STILTS - Starlink Tables Infrastructure Library Tool Set
Starlink User Note 256
STILTS web page: http://www.starlink.ac.uk/stilts/
Author email: m.b.taylor@bristol.ac.uk
Mailing list: topcat-user@jiscmail.ac.uk