8.3.32 `kde`

Plots a Discrete Kernel Density Estimate giving a smoothed frequency of data values along the horizontal axis, using a fixed-width smoothing kernel. This is a generalisation of a histogram in which the bins are always 1 pixel wide, and a smoothing kernel is applied to each bin. The width and shape of the kernel may be varied.

This is suitable for cases where the division into discrete bins done by a normal histogram is unnecessary or troublesome.

Note this is not a true Kernel Density Estimate, since, for performance reasons, the smoothing is applied to the (pixel-width) bins rather than to each data sample. The deviation from a true KDE caused by this quantisation will be at the pixel level, hence in most cases not visually apparent.
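The quantisation described above can be illustrated with a minimal Python sketch. It is not the STILTS implementation: the function name `binned_kde`, its arguments, and the way the kernel is sampled at integer pixel offsets are assumptions made purely for illustration. Samples are first counted into one-pixel bins, and the kernel is then applied to the bins rather than to the individual samples.

```python
import math


def binned_kde(samples, lo, hi, npix, kernel, half_width_pix):
    """Approximate KDE: count samples into npix one-pixel bins, then smooth the bins.

    kernel is a function of a scaled offset u in [-1, 1] (hypothetical convention).
    """
    bins = [0.0] * npix
    scale = npix / (hi - lo)
    for x in samples:
        i = math.floor((x - lo) * scale)  # pixel index of this sample
        if 0 <= i < npix:
            bins[i] += 1.0

    # Kernel weights at integer pixel offsets, normalised so they sum to 1.
    offsets = list(range(-half_width_pix, half_width_pix + 1))
    weights = [kernel(k / (half_width_pix + 1)) for k in offsets]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Spread each bin's content over neighbouring pixels.
    out = [0.0] * npix
    for i, content in enumerate(bins):
        if content == 0.0:
            continue
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < npix:
                out[j] += content * w
    return out


def epanechnikov(u):
    """Unscaled parabolic kernel, zero outside |u| <= 1."""
    return max(0.0, 1.0 - u * u)


smoothed = binned_kde([5.0, 5.2, 5.4], 0.0, 10.0, 100, epanechnikov, 5)
```

Because every sample is moved to the centre of its pixel before smoothing, the result can differ from a true KDE only at the scale of one pixel, as noted above.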

A weighting may be applied to the calculated levels by supplying the weight coordinate. In this case you can choose how these weights are aggregated in each pixel bin using the combine parameter. The result is something like a smoothed version of the corresponding weighted histogram. Note that some combinations of the available parameters (e.g. a normalised cumulative median-aggregated KDE) may not make much visual sense.

Usage Overview:

   layerN=kde colorN=<rrggbb>|red|blue|... transparencyN=0..1
              smoothN=+<width>|-<count> combineN=sum|sum-per-unit|count|...
              kernelN=square|linear|epanechnikov|cos|cos2|gauss3|gauss6
              cumulativeN=none|forward|reverse
              normaliseN=none|area|unit|maximum|height fillN=solid|line|semi
              thickN=<pixels> xN=<num-expr> weightN=<num-expr> inN=<table>
              ifmtN=<in-format> istreamN=true|false icmdN=<cmds>

All the parameters listed here affect only the relevant layer, identified by the suffix N.

Example:

   stilts plot2plane ymin=0 layer1=kde in1=rrlyrae.fits x1=p1

colorN = <rrggbb>|red|blue|... (Color)

The color of plotted data, given by name or as a hexadecimal RGB value.

The standard plotting colour names are red, blue, green, grey, magenta, cyan, orange, pink, yellow, black, light_grey, white. However, many other common colour names (too many to list here) are also understood. The list currently contains those colour names understood by most web browsers, from AliceBlue to YellowGreen, listed e.g. in the Extended color keywords section of the CSS3 standard.

Alternatively, a six-digit hexadecimal number RRGGBB may be supplied, optionally prefixed by "#" or "0x", giving red, green and blue intensities, e.g. "ff00ff", "#ff00ff" or "0xff00ff" for magenta.

[Default: red]
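As an illustration of the hexadecimal forms accepted above, the following Python sketch converts an RRGGBB string with an optional "#" or "0x" prefix into red, green and blue intensities. The function name `parse_rgb` is hypothetical, and colour names are deliberately not handled here.

```python
import string


def parse_rgb(text):
    """Return (red, green, blue) in 0..255 from 'RRGGBB', '#RRGGBB' or '0xRRGGBB'."""
    s = text.strip()
    if s.startswith("#"):
        s = s[1:]
    elif s.lower().startswith("0x"):
        s = s[2:]
    if len(s) != 6 or any(c not in string.hexdigits for c in s):
        raise ValueError("not a six-digit hexadecimal colour: %r" % text)
    value = int(s, 16)
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF
```

All three spellings of magenta given above map to the same triple of intensities.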

combineN = sum|sum-per-unit|count|... (Combiner)

Defines how values contributing to the same bin are combined together to produce the value assigned to that bin, and hence its height. The bins in this case are 1 pixel wide, so they have little physical significance. This means that while some combination modes, such as sum-per-unit and mean, make sense, others, such as sum, do not.

The combined values are those given by the weight coordinate, but if no weight is supplied, a weighting of unity is assumed.

The available options are:

sum: the sum of all the combined values per bin
sum-per-unit: the sum of all the combined values per unit of bin size
count: the number of non-blank values per bin (weight is ignored)
count-per-unit: the number of non-blank values per unit of bin size (weight is ignored)
mean: the mean of the combined values
median: the median
q1: first quartile
q3: third quartile
min: the minimum of all the combined values
max: the maximum of all the combined values
stdev: the sample standard deviation of the combined values
stdev_pop: the population standard deviation of the combined values
hit: 1 if any values present, NaN otherwise (weight is ignored)

[Default: sum-per-unit]
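The difference between the combination modes can be made concrete with a short Python sketch. This is an illustration under stated assumptions, not the STILTS code: the function name `combine` and its `bin_width` argument are hypothetical, blank values are represented as NaN, and empty bins are assumed to yield NaN for the statistical modes. The quartile modes are omitted because the interpolation convention is not stated above.

```python
import math
import statistics


def combine(values, mode, bin_width=1.0):
    """Combine the weight values that fall in one bin (illustrative only)."""
    vals = [v for v in values if not math.isnan(v)]  # drop blank values
    n = len(vals)
    nan = float("nan")
    if mode == "sum":
        return float(sum(vals))
    if mode == "sum-per-unit":
        return sum(vals) / bin_width
    if mode == "count":
        return float(n)
    if mode == "count-per-unit":
        return n / bin_width
    if mode == "hit":
        return 1.0 if n else nan
    if n == 0:
        return nan  # assumption: no data gives no value
    if mode == "mean":
        return float(statistics.mean(vals))
    if mode == "median":
        return float(statistics.median(vals))
    if mode == "min":
        return float(min(vals))
    if mode == "max":
        return float(max(vals))
    if mode == "stdev":
        # sample standard deviation needs at least two values
        return statistics.stdev(vals) if n > 1 else nan
    if mode == "stdev_pop":
        return statistics.pstdev(vals)
    raise ValueError("unknown combine mode: %r" % mode)
```

Note how `sum` depends on the bin size while `sum-per-unit` divides it out, which is why the latter is the more meaningful choice when the bins are single pixels.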

cumulativeN = none|forward|reverse (Cumulation)

If set to forward or reverse, the histogram bars plotted are calculated cumulatively: each bin includes the counts from all previous bins, working up (forward) or down (reverse) the independent axis.

Note that setting cumulative plotting may not make much sense with some other parameter values, for instance averaging aggregation modes.

For reasons of backward compatibility, the values true and false may be used as aliases for forward and none.

The available options are:

none: The value plotted for each bin uses the samples accumulated in that bin.
forward: The value plotted for each bin uses the samples accumulated all the way from negative infinity to that bin.
reverse: The value plotted for each bin uses the samples accumulated all the way from positive infinity to that bin.

[Default: none]
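The three cumulation options can be sketched in Python as follows. The function name `cumulate` is hypothetical, and the sketch operates on a plain list of per-bin values ordered from low to high along the independent axis.

```python
from itertools import accumulate

# Backward-compatible aliases mentioned above.
ALIASES = {"true": "forward", "false": "none"}


def cumulate(bins, mode="none"):
    """Return per-bin values accumulated according to the cumulation mode."""
    mode = ALIASES.get(mode, mode)
    if mode == "none":
        return list(bins)
    if mode == "forward":
        # running total from the low end of the axis
        return list(accumulate(bins))
    if mode == "reverse":
        # running total from the high end of the axis
        return list(accumulate(reversed(bins)))[::-1]
    raise ValueError("unknown cumulation mode: %r" % mode)
```

For a sum-like aggregation the last forward value and the first reverse value both equal the total over all bins.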

fillN = solid|line|semi (FillMode)

How the density function is represented.

The available options are:

solid: area between level and axis is filled with solid colour
line: level is marked by a wiggly line
semi: level is marked by a wiggly line, and area below it is filled with a transparent colour

[Default: semi]

icmdN = <cmds> (ProcessingStep[])

Specifies processing to be performed on the layer N input table as specified by parameter inN. The value of this parameter is one or more of the filter commands described in Section 6.1. If more than one is given, they must be separated by semicolon characters (";"). This parameter can be repeated multiple times on the same command line to build up a list of processing steps. The sequence of commands given in this way defines the processing pipeline which is performed on the table.

Commands may alternatively be supplied in an external file, by using the indirection character '@'. Thus a value of "@filename" causes the file filename to be read for a list of filter commands to execute. The commands in the file may be separated by newline characters and/or semicolons, and lines which are blank or which start with a '#' character are ignored. A backslash character '\' at the end of a line joins it with the following line.
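The file syntax described above can be illustrated with a small Python sketch. The function name `read_filter_commands` is hypothetical, and the sketch makes two simplifying assumptions that the text does not confirm: line joining is done before comments are recognised, and semicolons are never protected by quoting.

```python
def read_filter_commands(text):
    """Split the content of a command file into individual filter commands."""
    # A trailing backslash joins a line with the following one.
    joined = []
    pending = ""
    for line in text.splitlines():
        if line.endswith("\\"):
            pending += line[:-1]
            continue
        joined.append(pending + line)
        pending = ""
    if pending:
        joined.append(pending)

    commands = []
    for line in joined:
        stripped = line.strip()
        # Blank lines and lines starting with '#' are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        # Commands may be separated by semicolons as well as newlines.
        for part in stripped.split(";"):
            part = part.strip()
            if part:
                commands.append(part)
    return commands
```

The returned list preserves the order of the commands, which matters because they form a processing pipeline.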

ifmtN = <in-format> (String)

Specifies the format of the input table as specified by parameter inN. The known formats are listed in Section 5.1.1. This flag can be used if you know what format your table is in. If it has the special value (auto) (the default), then an attempt will be made to detect the format of the table automatically. This cannot always be done correctly however, in which case the program will exit with an error explaining which formats were attempted. This parameter is ignored for scheme-specified tables.

[Default: (auto)]

inN = <table> (StarTable)

The location of the input table. This may take one of the following forms:

A filename.
A URL.
The special value "-", meaning standard input. In this case the input format must be given explicitly using the ifmtN parameter. Note that not all formats can be streamed in this way.
A scheme specification of the form :<scheme-name>:<scheme-args>.
A system command line with either a "<" character at the start, or a "|" character at the end ("<syscmd" or "syscmd|"). This executes the given pipeline and reads from its standard output. This will probably only work on unix-like systems.

In any case, compressed data in one of the supported compression formats (gzip, Unix compress or bzip2) will be decompressed transparently.

istreamN = true|false (Boolean)

If set true, the input table specified by the inN parameter will be read as a stream. It is necessary to give the ifmtN parameter in this case. Depending on the required operations and processing mode, this may cause the read to fail (sometimes it is necessary to read the table more than once). It is not normally necessary to set this flag; in most cases the data will be streamed automatically if that is the best thing to do. However it can sometimes result in less resource usage when processing large files in certain formats (such as VOTable). This parameter is ignored for scheme-specified tables.

[Default: false]

kernelN = square|linear|epanechnikov|cos|cos2|gauss3|gauss6 (Kernel1dShape)

The functional form of the smoothing kernel. The functions listed refer to the unscaled shape; all kernels are normalised to give a total area of unity.

The available options are:

square: Uniform value: f(x)=1, |x|=0..1
linear: Triangle: f(x)=1-|x|, |x|=0..1
epanechnikov: Parabola: f(x)=1-x*x, |x|=0..1
cos: Cosine: f(x)=cos(x*pi/2), |x|=0..1
cos2: Cosine squared: f(x)=cos^2(x*pi/2), |x|=0..1
gauss3: Gaussian truncated at 3.0 sigma: f(x)=exp(-x*x/2), |x|=0..3
gauss6: Gaussian truncated at 6.0 sigma: f(x)=exp(-x*x/2), |x|=0..6

[Default: epanechnikov]
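The unscaled shapes listed above can be written down directly in Python, together with a numerical normalisation to unit area. This is a sketch for illustration only: the names `KERNELS`, `kernel_area` and `normalised_kernel` are hypothetical, and the area is estimated with a simple midpoint rule rather than analytically.

```python
import math

# name -> (unscaled shape f(x), extent: f is taken as zero for |x| > extent)
KERNELS = {
    "square": (lambda x: 1.0, 1.0),
    "linear": (lambda x: 1.0 - abs(x), 1.0),
    "epanechnikov": (lambda x: 1.0 - x * x, 1.0),
    "cos": (lambda x: math.cos(x * math.pi / 2), 1.0),
    "cos2": (lambda x: math.cos(x * math.pi / 2) ** 2, 1.0),
    "gauss3": (lambda x: math.exp(-x * x / 2), 3.0),
    "gauss6": (lambda x: math.exp(-x * x / 2), 6.0),
}


def kernel_area(name, steps=20000):
    """Area under the unscaled shape over [-extent, extent] (midpoint rule)."""
    f, extent = KERNELS[name]
    h = 2.0 * extent / steps
    return sum(f(-extent + (i + 0.5) * h) for i in range(steps)) * h


def normalised_kernel(name):
    """Return a function with the same shape, scaled to a total area of unity."""
    f, extent = KERNELS[name]
    area = kernel_area(name)
    return lambda x: f(x) / area if abs(x) <= extent else 0.0
```

For example, the unscaled epanechnikov shape has area 4/3, so the normalised kernel peaks at 0.75.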

normaliseN = none|area|unit|maximum|height (Normalisation)

Defines how, if at all, the bars of histogram-like plots are normalised or otherwise scaled vertically.

Note that some of the normalisation options may not make much sense with some other parameter values, for instance averaging aggregation modes.

The available options are:

none: No normalisation is performed.
area: The total area of histogram bars is normalised to unity. For cumulative plots, this behaves like height.
unit: Histogram bars are scaled by the inverse of the bin width in data units. For cumulative plots, this behaves like none.
maximum: The height of the tallest histogram bar is normalised to unity. For cumulative plots, this behaves like height.
height: The total height of histogram bars is normalised to unity.

[Default: none]
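One possible reading of the normalisation options is sketched below in Python. The function name `normalise` and its arguments are hypothetical, the input is taken to be the list of non-cumulative bin values, and the handling of all-zero data (left unscaled) is an assumption.

```python
def normalise(bins, mode, bin_width, cumulative=False):
    """Scale per-bin values according to the normalisation mode (illustrative)."""
    if cumulative:
        # For cumulative plots some modes fall back to others, as listed above.
        mode = {"area": "height", "unit": "none", "maximum": "height"}.get(mode, mode)
    total = sum(bins)
    if mode == "none":
        scale = 1.0
    elif mode == "area":
        # total area = sum of (height * bin width)
        scale = 1.0 / (total * bin_width) if total else 1.0
    elif mode == "unit":
        scale = 1.0 / bin_width
    elif mode == "maximum":
        peak = max(bins, default=0.0)
        scale = 1.0 / peak if peak else 1.0
    elif mode == "height":
        scale = 1.0 / total if total else 1.0
    else:
        raise ValueError("unknown normalisation mode: %r" % mode)
    return [b * scale for b in bins]
```

With `area`, the scaled values multiplied by the bin width sum to one; with `height`, the scaled values themselves sum to one.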

smoothN = +<width>|-<count> (BinSizer)

Configures the smoothing width for kernel density estimation. This is the characteristic width of the kernel function to be convolved with the density to produce the visible plot.

If the supplied value is a positive number it is interpreted as a fixed width in the data coordinates of the X axis (if the X axis is logarithmic, the value is a fixed factor). If it is a negative number, then it will be interpreted as the approximate number of smoothing widths that fit in the width of the visible plot (i.e. plot width / smoothing width). If the value is zero, no smoothing is applied.

When setting this value graphically, you can use either the slider to adjust the bin count or the numeric entry field to fix the bin width.

[Default: -100]
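The sign convention for the smoothing value can be sketched in Python as follows. The function name `smoothing_width` and its arguments are hypothetical. Since the count is described only as approximate, any rounding the tool may apply is not modelled here.

```python
def smoothing_width(smooth, xmin, xmax, log_axis=False):
    """Translate a smoothN-style value into a width (or factor on a log axis).

    Returns None when no smoothing is requested.
    """
    if smooth > 0:
        # Positive: a fixed width in data units, or a fixed factor if logarithmic.
        return float(smooth)
    if smooth == 0:
        return None
    count = -smooth  # approximate number of widths across the visible plot
    if log_axis:
        # width is a multiplicative factor: factor ** count == xmax / xmin
        return (xmax / xmin) ** (1.0 / count)
    return (xmax - xmin) / count
```

Under this reading, the default of -100 on a linear axis spanning 0 to 10 gives a smoothing width of 0.1 data units, and the width follows the visible range as the plot is zoomed.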

thickN = <pixels> (Integer)

Thickness of plotted line in pixels.

[Default: 2]

transparencyN = 0..1 (Double)

Transparency with which components are plotted, in the range 0 (opaque) to 1 (invisible). The value is 1-alpha.

[Default: 0]

weightN = <num-expr> (String)

Weighting of data points. If supplied, each point contributes a value to the histogram equal to the data value multiplied by this coordinate. If not supplied, the effect is the same as supplying a fixed value of one.

The value is a numeric algebraic expression based on column names as described in Section 10.

xN = <num-expr> (String)

Horizontal coordinate.

The value is a numeric algebraic expression based on column names as described in Section 10.

STILTS - Starlink Tables Infrastructure Library Tool Set
Starlink User Note 256
STILTS web page: http://www.starlink.ac.uk/stilts/
Author email: m.b.taylor@bristol.ac.uk
Mailing list: topcat-user@jiscmail.ac.uk
