summaryrefslogtreecommitdiff
path: root/gsl-1.9/doc/statistics.texi
diff options
context:
space:
mode:
Diffstat (limited to 'gsl-1.9/doc/statistics.texi')
-rw-r--r--gsl-1.9/doc/statistics.texi741
1 files changed, 741 insertions, 0 deletions
diff --git a/gsl-1.9/doc/statistics.texi b/gsl-1.9/doc/statistics.texi
new file mode 100644
index 0000000..c289278
--- /dev/null
+++ b/gsl-1.9/doc/statistics.texi
@@ -0,0 +1,741 @@
+@cindex statistics
+@cindex mean
+@cindex standard deviation
+@cindex variance
+@cindex estimated standard deviation
+@cindex estimated variance
+@cindex t-test
+@cindex range
+@cindex min
+@cindex max
+
+This chapter describes the statistical functions in the library. The
+basic statistical functions include routines to compute the mean,
+variance and standard deviation. More advanced functions allow you to
+calculate absolute deviations, skewness, and kurtosis as well as the
+median and arbitrary percentiles. The algorithms use recurrence
+relations to compute average quantities in a stable way, without large
+intermediate values that might overflow.
+
+The functions are available in versions for datasets in the standard
+floating-point and integer types. The versions for double precision
+floating-point data have the prefix @code{gsl_stats} and are declared in
+the header file @file{gsl_statistics_double.h}. The versions for integer
+data have the prefix @code{gsl_stats_int} and are declared in the header
+file @file{gsl_statistics_int.h}.
+
+@menu
+* Mean and standard deviation and variance::
+* Absolute deviation::
+* Higher moments (skewness and kurtosis)::
+* Autocorrelation::
+* Covariance::
+* Weighted Samples::
+* Maximum and Minimum values::
+* Median and Percentiles::
+* Example statistical programs::
+* Statistics References and Further Reading::
+@end menu
+
+@node Mean and standard deviation and variance
+@section Mean, Standard Deviation and Variance
+
+@deftypefun double gsl_stats_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function returns the arithmetic mean of @var{data}, a dataset of
+length @var{n} with stride @var{stride}. The arithmetic mean, or
+@dfn{sample mean}, is denoted by @math{\Hat\mu} and defined as,
+@tex
+\beforedisplay
+$$
+{\Hat\mu} = {1 \over N} \sum x_i
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+\Hat\mu = (1/N) \sum x_i
+@end example
+
+@end ifinfo
+@noindent
+where @math{x_i} are the elements of the dataset @var{data}. For
+samples drawn from a gaussian distribution the variance of
+@math{\Hat\mu} is @math{\sigma^2 / N}.
+@end deftypefun
+
+@deftypefun double gsl_stats_variance (const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function returns the estimated, or @dfn{sample}, variance of
+@var{data}, a dataset of length @var{n} with stride @var{stride}. The
+estimated variance is denoted by @math{\Hat\sigma^2} and is defined by,
+@tex
+\beforedisplay
+$$
+{\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - {\Hat\mu})^2
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+\Hat\sigma^2 = (1/(N-1)) \sum (x_i - \Hat\mu)^2
+@end example
+
+@end ifinfo
+@noindent
+where @math{x_i} are the elements of the dataset @var{data}. Note that
+the normalization factor of @math{1/(N-1)} results from the derivation
+of @math{\Hat\sigma^2} as an unbiased estimator of the population
+variance @math{\sigma^2}. For samples drawn from a gaussian distribution
+the variance of @math{\Hat\sigma^2} itself is @math{2 \sigma^4 / N}.
+
+This function computes the mean via a call to @code{gsl_stats_mean}. If
+you have already computed the mean then you can pass it directly to
+@code{gsl_stats_variance_m}.
+@end deftypefun
+
+@deftypefun double gsl_stats_variance_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
+This function returns the sample variance of @var{data} relative to the
+given value of @var{mean}. The function is computed with @math{\Hat\mu}
+replaced by the value of @var{mean} that you supply,
+@tex
+\beforedisplay
+$$
+{\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - mean)^2
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+\Hat\sigma^2 = (1/(N-1)) \sum (x_i - mean)^2
+@end example
+@end ifinfo
+@end deftypefun
+
+@deftypefun double gsl_stats_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n})
+@deftypefunx double gsl_stats_sd_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
+The standard deviation is defined as the square root of the variance.
+These functions return the square root of the corresponding variance
+functions above.
+@end deftypefun
+
+@deftypefun double gsl_stats_variance_with_fixed_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
+This function computes an unbiased estimate of the variance of
+@var{data} when the population mean @var{mean} of the underlying
+distribution is known @emph{a priori}. In this case the estimator for
+the variance uses the factor @math{1/N} and the sample mean
+@math{\Hat\mu} is replaced by the known population mean @math{\mu},
+@tex
+\beforedisplay
+$$
+{\Hat\sigma}^2 = {1 \over N} \sum (x_i - \mu)^2
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+\Hat\sigma^2 = (1/N) \sum (x_i - \mu)^2
+@end example
+@end ifinfo
+@end deftypefun
+
+
+@deftypefun double gsl_stats_sd_with_fixed_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
+This function calculates the standard deviation of @var{data} for a
+fixed population mean @var{mean}. The result is the square root of the
+corresponding variance function.
+@end deftypefun
+
+@node Absolute deviation
+@section Absolute deviation
+
+@deftypefun double gsl_stats_absdev (const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function computes the absolute deviation from the mean of
+@var{data}, a dataset of length @var{n} with stride @var{stride}. The
+absolute deviation from the mean is defined as,
+@tex
+\beforedisplay
+$$
+absdev = {1 \over N} \sum |x_i - {\Hat\mu}|
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+absdev = (1/N) \sum |x_i - \Hat\mu|
+@end example
+
+@end ifinfo
+@noindent
+where @math{x_i} are the elements of the dataset @var{data}. The
+absolute deviation from the mean provides a more robust measure of the
+width of a distribution than the variance. This function computes the
+mean of @var{data} via a call to @code{gsl_stats_mean}.
+@end deftypefun
+
+@deftypefun double gsl_stats_absdev_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
+This function computes the absolute deviation of the dataset @var{data}
+relative to the given value of @var{mean},
+@tex
+\beforedisplay
+$$
+absdev = {1 \over N} \sum |x_i - mean|
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+absdev = (1/N) \sum |x_i - mean|
+@end example
+
+@end ifinfo
+@noindent
+This function is useful if you have already computed the mean of
+@var{data} (and want to avoid recomputing it), or wish to calculate the
+absolute deviation relative to another value (such as zero, or the
+median).
+@end deftypefun
+
+@node Higher moments (skewness and kurtosis)
+@section Higher moments (skewness and kurtosis)
+
+@deftypefun double gsl_stats_skew (const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function computes the skewness of @var{data}, a dataset of length
+@var{n} with stride @var{stride}. The skewness is defined as,
+@tex
+\beforedisplay
+$$
+skew = {1 \over N} \sum
+ {\left( x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^3
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+skew = (1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^3
+@end example
+
+@end ifinfo
+@noindent
+where @math{x_i} are the elements of the dataset @var{data}. The skewness
+measures the asymmetry of the tails of a distribution.
+
+The function computes the mean and estimated standard deviation of
+@var{data} via calls to @code{gsl_stats_mean} and @code{gsl_stats_sd}.
+@end deftypefun
+
+@deftypefun double gsl_stats_skew_m_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}, double @var{sd})
+This function computes the skewness of the dataset @var{data} using the
+given values of the mean @var{mean} and standard deviation @var{sd},
+@tex
+\beforedisplay
+$$
+skew = {1 \over N}
+ \sum {\left( x_i - mean \over sd \right)}^3
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+skew = (1/N) \sum ((x_i - mean)/sd)^3
+@end example
+
+@end ifinfo
+@noindent
+These functions are useful if you have already computed the mean and
+standard deviation of @var{data} and want to avoid recomputing them.
+@end deftypefun
+
+@deftypefun double gsl_stats_kurtosis (const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function computes the kurtosis of @var{data}, a dataset of length
+@var{n} with stride @var{stride}. The kurtosis is defined as,
+@tex
+\beforedisplay
+$$
+kurtosis = \left( {1 \over N} \sum
+ {\left(x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^4
+ \right)
+ - 3
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+kurtosis = ((1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^4) - 3
+@end example
+
+@end ifinfo
+@noindent
+The kurtosis measures how sharply peaked a distribution is, relative to
+its width. The kurtosis is normalized to zero for a gaussian
+distribution.
+@end deftypefun
+
+@deftypefun double gsl_stats_kurtosis_m_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}, double @var{sd})
+This function computes the kurtosis of the dataset @var{data} using the
+given values of the mean @var{mean} and standard deviation @var{sd},
+@tex
+\beforedisplay
+$$
+kurtosis = {1 \over N}
+ \left( \sum {\left(x_i - mean \over sd \right)}^4 \right)
+ - 3
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+kurtosis = ((1/N) \sum ((x_i - mean)/sd)^4) - 3
+@end example
+
+@end ifinfo
+@noindent
+This function is useful if you have already computed the mean and
+standard deviation of @var{data} and want to avoid recomputing them.
+@end deftypefun
+
+@node Autocorrelation
+@section Autocorrelation
+
+@deftypefun double gsl_stats_lag1_autocorrelation (const double @var{data}[], const size_t @var{stride}, const size_t @var{n})
+This function computes the lag-1 autocorrelation of the dataset @var{data}.
+@tex
+\beforedisplay
+$$
+a_1 = {\sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i-1} - \Hat\mu)
+\over
+\sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i} - \Hat\mu)}
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+a_1 = @{\sum_@{i = 1@}^@{n@} (x_@{i@} - \Hat\mu) (x_@{i-1@} - \Hat\mu)
+ \over
+ \sum_@{i = 1@}^@{n@} (x_@{i@} - \Hat\mu) (x_@{i@} - \Hat\mu)@}
+@end example
+@end ifinfo
+@end deftypefun
+
+
+@deftypefun double gsl_stats_lag1_autocorrelation_m (const double @var{data}[], const size_t @var{stride}, const size_t @var{n}, const double @var{mean})
+This function computes the lag-1 autocorrelation of the dataset
+@var{data} using the given value of the mean @var{mean}.
+
+@end deftypefun
+
+@node Covariance
+@section Covariance
+@cindex covariance, of two datasets
+
+@deftypefun double gsl_stats_covariance (const double @var{data1}[], const size_t @var{stride1}, const double @var{data2}[], const size_t @var{stride2}, const size_t @var{n})
+This function computes the covariance of the datasets @var{data1} and
+@var{data2} which must both be of the same length @var{n}.
+@tex
+\beforedisplay
+$$
+covar = {1 \over (n - 1)} \sum_{i = 1}^{n} (x_{i} - \Hat x) (y_{i} - \Hat y)
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+covar = (1/(n - 1)) \sum_@{i = 1@}^@{n@} (x_i - \Hat x) (y_i - \Hat y)
+@end example
+@end ifinfo
+@end deftypefun
+
+@deftypefun double gsl_stats_covariance_m (const double @var{data1}[], const size_t @var{stride1}, const double @var{data2}[], const size_t @var{stride2}, const size_t @var{n}, const double @var{mean1}, const double @var{mean2})
+This function computes the covariance of the datasets @var{data1} and
+@var{data2} using the given values of the means, @var{mean1} and
+@var{mean2}. This is useful if you have already computed the means of
+@var{data1} and @var{data2} and want to avoid recomputing them.
+@end deftypefun
+
+
+@node Weighted Samples
+@section Weighted Samples
+
+The functions described in this section allow the computation of
+statistics for weighted samples. The functions accept an array of
+samples, @math{x_i}, with associated weights, @math{w_i}. Each sample
+@math{x_i} is considered as having been drawn from a Gaussian
+distribution with variance @math{\sigma_i^2}. The sample weight
+@math{w_i} is defined as the reciprocal of this variance, @math{w_i =
+1/\sigma_i^2}. Setting a weight to zero corresponds to removing a
+sample from a dataset.
+
+@deftypefun double gsl_stats_wmean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function returns the weighted mean of the dataset @var{data} with
+stride @var{stride} and length @var{n}, using the set of weights @var{w}
+with stride @var{wstride} and length @var{n}. The weighted mean is defined as,
+@tex
+\beforedisplay
+$$
+{\Hat\mu} = {{\sum w_i x_i} \over {\sum w_i}}
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+\Hat\mu = (\sum w_i x_i) / (\sum w_i)
+@end example
+@end ifinfo
+@end deftypefun
+
+
+@deftypefun double gsl_stats_wvariance (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function returns the estimated variance of the dataset @var{data}
+with stride @var{stride} and length @var{n}, using the set of weights
+@var{w} with stride @var{wstride} and length @var{n}. The estimated
+variance of a weighted dataset is defined as,
+@tex
+\beforedisplay
+$$
+\Hat\sigma^2 = {{\sum w_i} \over {(\sum w_i)^2 - \sum (w_i^2)}}
+ \sum w_i (x_i - \Hat\mu)^2
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+\Hat\sigma^2 = ((\sum w_i)/((\sum w_i)^2 - \sum (w_i^2)))
+ \sum w_i (x_i - \Hat\mu)^2
+@end example
+
+@end ifinfo
+@noindent
+Note that this expression reduces to an unweighted variance with the
+familiar @math{1/(N-1)} factor when there are @math{N} equal non-zero
+weights.
+@end deftypefun
+
+@deftypefun double gsl_stats_wvariance_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean})
+This function returns the estimated variance of the weighted dataset
+@var{data} using the given weighted mean @var{wmean}.
+@end deftypefun
+
+@deftypefun double gsl_stats_wsd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
+The standard deviation is defined as the square root of the variance.
+This function returns the square root of the corresponding variance
+function @code{gsl_stats_wvariance} above.
+@end deftypefun
+
+@deftypefun double gsl_stats_wsd_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean})
+This function returns the square root of the corresponding variance
+function @code{gsl_stats_wvariance_m} above.
+@end deftypefun
+
+@deftypefun double gsl_stats_wvariance_with_fixed_mean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, const double @var{mean})
+This function computes an unbiased estimate of the variance of weighted
+dataset @var{data} when the population mean @var{mean} of the underlying
+distribution is known @emph{a priori}. In this case the estimator for
+the variance replaces the sample mean @math{\Hat\mu} by the known
+population mean @math{\mu},
+@tex
+\beforedisplay
+$$
+\Hat\sigma^2 = {{\sum w_i (x_i - \mu)^2} \over {\sum w_i}}
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+\Hat\sigma^2 = (\sum w_i (x_i - \mu)^2) / (\sum w_i)
+@end example
+@end ifinfo
+@end deftypefun
+
+@deftypefun double gsl_stats_wsd_with_fixed_mean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, const double @var{mean})
+The standard deviation is defined as the square root of the variance.
+This function returns the square root of the corresponding variance
+function above.
+@end deftypefun
+
+@deftypefun double gsl_stats_wabsdev (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function computes the weighted absolute deviation from the weighted
+mean of @var{data}. The absolute deviation from the mean is defined as,
+@tex
+\beforedisplay
+$$
+absdev = {{\sum w_i |x_i - \Hat\mu|} \over {\sum w_i}}
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+absdev = (\sum w_i |x_i - \Hat\mu|) / (\sum w_i)
+@end example
+@end ifinfo
+@end deftypefun
+
+@deftypefun double gsl_stats_wabsdev_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean})
+This function computes the absolute deviation of the weighted dataset
+@var{data} about the given weighted mean @var{wmean}.
+@end deftypefun
+
+@deftypefun double gsl_stats_wskew (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function computes the weighted skewness of the dataset @var{data}.
+@tex
+\beforedisplay
+$$
+skew = {{\sum w_i ((x_i - xbar)/\sigma)^3} \over {\sum w_i}}
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+skew = (\sum w_i ((x_i - xbar)/\sigma)^3) / (\sum w_i)
+@end example
+@end ifinfo
+@end deftypefun
+
+@deftypefun double gsl_stats_wskew_m_sd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}, double @var{wsd})
+This function computes the weighted skewness of the dataset @var{data}
+using the given values of the weighted mean and weighted standard
+deviation, @var{wmean} and @var{wsd}.
+@end deftypefun
+
+@deftypefun double gsl_stats_wkurtosis (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function computes the weighted kurtosis of the dataset @var{data}.
+
+@tex
+\beforedisplay
+$$
+kurtosis = {{\sum w_i ((x_i - xbar)/sigma)^4} \over {\sum w_i}} - 3
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+kurtosis = ((\sum w_i ((x_i - xbar)/sigma)^4) / (\sum w_i)) - 3
+@end example
+@end ifinfo
+@end deftypefun
+
+@deftypefun double gsl_stats_wkurtosis_m_sd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}, double @var{wsd})
+This function computes the weighted kurtosis of the dataset @var{data}
+using the given values of the weighted mean and weighted standard
+deviation, @var{wmean} and @var{wsd}.
+@end deftypefun
+
+@node Maximum and Minimum values
+@section Maximum and Minimum values
+
+The following functions find the maximum and minimum values of a
+dataset (or their indices). If the data contains @code{NaN}s then a
+@code{NaN} will be returned, since the maximum or minimum value is
+undefined. For functions which return an index, the location of the
+first @code{NaN} in the array is returned.
+
+@deftypefun double gsl_stats_max (const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function returns the maximum value in @var{data}, a dataset of
+length @var{n} with stride @var{stride}. The maximum value is defined
+as the value of the element @math{x_i} which satisfies @c{$x_i \ge x_j$}
+@math{x_i >= x_j} for all @math{j}.
+
+If you want instead to find the element with the largest absolute
+magnitude you will need to apply @code{fabs} or @code{abs} to your data
+before calling this function.
+@end deftypefun
+
+@deftypefun double gsl_stats_min (const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function returns the minimum value in @var{data}, a dataset of
+length @var{n} with stride @var{stride}. The minimum value is defined
+as the value of the element @math{x_i} which satisfies @c{$x_i \le x_j$}
+@math{x_i <= x_j} for all @math{j}.
+
+If you want instead to find the element with the smallest absolute
+magnitude you will need to apply @code{fabs} or @code{abs} to your data
+before calling this function.
+@end deftypefun
+
+@deftypefun void gsl_stats_minmax (double * @var{min}, double * @var{max}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function finds both the minimum and maximum values @var{min},
+@var{max} in @var{data} in a single pass.
+@end deftypefun
+
+@deftypefun size_t gsl_stats_max_index (const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function returns the index of the maximum value in @var{data}, a
+dataset of length @var{n} with stride @var{stride}. The maximum value is
+defined as the value of the element @math{x_i} which satisfies
+@c{$x_i \ge x_j$}
+@math{x_i >= x_j} for all @math{j}. When there are several equal maximum
+elements then the first one is chosen.
+@end deftypefun
+
+@deftypefun size_t gsl_stats_min_index (const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function returns the index of the minimum value in @var{data}, a
+dataset of length @var{n} with stride @var{stride}. The minimum value
+is defined as the value of the element @math{x_i} which satisfies
+@c{$x_i \ge x_j$}
+@math{x_i >= x_j} for all @math{j}. When there are several equal
+minimum elements then the first one is chosen.
+@end deftypefun
+
+@deftypefun void gsl_stats_minmax_index (size_t * @var{min_index}, size_t * @var{max_index}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
+This function returns the indexes @var{min_index}, @var{max_index} of
+the minimum and maximum values in @var{data} in a single pass.
+@end deftypefun
+
+@node Median and Percentiles
+@section Median and Percentiles
+
+The median and percentile functions described in this section operate on
+sorted data. For convenience we use @dfn{quantiles}, measured on a scale
+of 0 to 1, instead of percentiles (which use a scale of 0 to 100).
+
+@deftypefun double gsl_stats_median_from_sorted_data (const double @var{sorted_data}[], size_t @var{stride}, size_t @var{n})
+This function returns the median value of @var{sorted_data}, a dataset
+of length @var{n} with stride @var{stride}. The elements of the array
+must be in ascending numerical order. There are no checks to see
+whether the data are sorted, so the function @code{gsl_sort} should
+always be used first.
+
+When the dataset has an odd number of elements the median is the value
+of element @math{(n-1)/2}. When the dataset has an even number of
+elements the median is the mean of the two nearest middle values,
+elements @math{(n-1)/2} and @math{n/2}. Since the algorithm for
+computing the median involves interpolation this function always returns
+a floating-point number, even for integer data types.
+@end deftypefun
+
+@deftypefun double gsl_stats_quantile_from_sorted_data (const double @var{sorted_data}[], size_t @var{stride}, size_t @var{n}, double @var{f})
+This function returns a quantile value of @var{sorted_data}, a
+double-precision array of length @var{n} with stride @var{stride}. The
+elements of the array must be in ascending numerical order. The
+quantile is determined by the @var{f}, a fraction between 0 and 1. For
+example, to compute the value of the 75th percentile @var{f} should have
+the value 0.75.
+
+There are no checks to see whether the data are sorted, so the function
+@code{gsl_sort} should always be used first.
+
+The quantile is found by interpolation, using the formula
+@tex
+\beforedisplay
+$$
+\hbox{quantile} = (1 - \delta) x_i + \delta x_{i+1}
+$$
+\afterdisplay
+@end tex
+@ifinfo
+
+@example
+quantile = (1 - \delta) x_i + \delta x_@{i+1@}
+@end example
+
+@end ifinfo
+@noindent
+where @math{i} is @code{floor}(@math{(n - 1)f}) and @math{\delta} is
+@math{(n-1)f - i}.
+
+Thus the minimum value of the array (@code{data[0*stride]}) is given by
+@var{f} equal to zero, the maximum value (@code{data[(n-1)*stride]}) is
+given by @var{f} equal to one and the median value is given by @var{f}
+equal to 0.5. Since the algorithm for computing quantiles involves
+interpolation this function always returns a floating-point number, even
+for integer data types.
+@end deftypefun
+
+
+@comment @node Statistical tests
+@comment @section Statistical tests
+
+@comment FIXME, do more work on the statistical tests
+
+@comment -@deftypefun double gsl_stats_ttest (const double @var{data1}[], double @var{data2}[], size_t @var{n1}, size_t @var{n2})
+@comment -@deftypefunx Statistics double gsl_stats_int_ttest (const double @var{data1}[], double @var{data2}[], size_t @var{n1}, size_t @var{n2})
+
+@comment The function @code{gsl_stats_ttest} computes the t-test statistic for
+@comment the two arrays @var{data1}[] and @var{data2}[], of lengths @var{n1} and
+@comment -@var{n2} respectively.
+
+@comment The t-test statistic measures the difference between the means of two
+@comment datasets.
+
+@node Example statistical programs
+@section Examples
+Here is a basic example of how to use the statistical functions:
+
+@example
+@verbatiminclude examples/stat.c
+@end example
+
+The program should produce the following output,
+
+@example
+@verbatiminclude examples/stat.out
+@end example
+
+
+Here is an example using sorted data,
+
+@example
+@verbatiminclude examples/statsort.c
+@end example
+
+This program should produce the following output,
+
+@example
+@verbatiminclude examples/statsort.out
+@end example
+
+@node Statistics References and Further Reading
+@section References and Further Reading
+
+The standard reference for almost any topic in statistics is the
+multi-volume @cite{Advanced Theory of Statistics} by Kendall and Stuart.
+
+@itemize @asis
+@item
+Maurice Kendall, Alan Stuart, and J. Keith Ord.
+@cite{The Advanced Theory of Statistics} (multiple volumes)
+reprinted as @cite{Kendall's Advanced Theory of Statistics}.
+Wiley, ISBN 047023380X.
+@end itemize
+
+@noindent
+Many statistical concepts can be more easily understood by a Bayesian
+approach. The following book by Gelman, Carlin, Stern and Rubin gives a
+comprehensive coverage of the subject.
+
+@itemize @asis
+@item
+Andrew Gelman, John B. Carlin, Hal S. Stern, Donald B. Rubin.
+@cite{Bayesian Data Analysis}.
+Chapman & Hall, ISBN 0412039915.
+@end itemize
+
+@noindent
+For physicists the Particle Data Group provides useful reviews of
+Probability and Statistics in the ``Mathematical Tools'' section of its
+Annual Review of Particle Physics.
+
+@itemize @asis
+@item
+@cite{Review of Particle Properties}
+R.M. Barnett et al., Physical Review D54, 1 (1996)
+@end itemize
+
+@noindent
+The Review of Particle Physics is available online at
+the website @uref{http://pdg.lbl.gov/}.
+
+