diff options
Diffstat (limited to 'gsl-1.9/doc/statistics.texi')
-rw-r--r-- | gsl-1.9/doc/statistics.texi | 741 |
1 files changed, 741 insertions, 0 deletions
diff --git a/gsl-1.9/doc/statistics.texi b/gsl-1.9/doc/statistics.texi new file mode 100644 index 0000000..c289278 --- /dev/null +++ b/gsl-1.9/doc/statistics.texi @@ -0,0 +1,741 @@ +@cindex statistics +@cindex mean +@cindex standard deviation +@cindex variance +@cindex estimated standard deviation +@cindex estimated variance +@cindex t-test +@cindex range +@cindex min +@cindex max + +This chapter describes the statistical functions in the library. The +basic statistical functions include routines to compute the mean, +variance and standard deviation. More advanced functions allow you to +calculate absolute deviations, skewness, and kurtosis as well as the +median and arbitrary percentiles. The algorithms use recurrence +relations to compute average quantities in a stable way, without large +intermediate values that might overflow. + +The functions are available in versions for datasets in the standard +floating-point and integer types. The versions for double precision +floating-point data have the prefix @code{gsl_stats} and are declared in +the header file @file{gsl_statistics_double.h}. The versions for integer +data have the prefix @code{gsl_stats_int} and are declared in the header +file @file{gsl_statistics_int.h}. + +@menu +* Mean and standard deviation and variance:: +* Absolute deviation:: +* Higher moments (skewness and kurtosis):: +* Autocorrelation:: +* Covariance:: +* Weighted Samples:: +* Maximum and Minimum values:: +* Median and Percentiles:: +* Example statistical programs:: +* Statistics References and Further Reading:: +@end menu + +@node Mean and standard deviation and variance +@section Mean, Standard Deviation and Variance + +@deftypefun double gsl_stats_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function returns the arithmetic mean of @var{data}, a dataset of +length @var{n} with stride @var{stride}. The arithmetic mean, or +@dfn{sample mean}, is denoted by @math{\Hat\mu} and defined as, +@tex +\beforedisplay +$$ +{\Hat\mu} = {1 \over N} \sum x_i +$$ +\afterdisplay +@end tex +@ifinfo + +@example +\Hat\mu = (1/N) \sum x_i +@end example + +@end ifinfo +@noindent +where @math{x_i} are the elements of the dataset @var{data}. For +samples drawn from a gaussian distribution the variance of +@math{\Hat\mu} is @math{\sigma^2 / N}. +@end deftypefun + +@deftypefun double gsl_stats_variance (const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function returns the estimated, or @dfn{sample}, variance of +@var{data}, a dataset of length @var{n} with stride @var{stride}. The +estimated variance is denoted by @math{\Hat\sigma^2} and is defined by, +@tex +\beforedisplay +$$ +{\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - {\Hat\mu})^2 +$$ +\afterdisplay +@end tex +@ifinfo + +@example +\Hat\sigma^2 = (1/(N-1)) \sum (x_i - \Hat\mu)^2 +@end example + +@end ifinfo +@noindent +where @math{x_i} are the elements of the dataset @var{data}. Note that +the normalization factor of @math{1/(N-1)} results from the derivation +of @math{\Hat\sigma^2} as an unbiased estimator of the population +variance @math{\sigma^2}. For samples drawn from a gaussian distribution +the variance of @math{\Hat\sigma^2} itself is @math{2 \sigma^4 / N}. + +This function computes the mean via a call to @code{gsl_stats_mean}. If +you have already computed the mean then you can pass it directly to +@code{gsl_stats_variance_m}. +@end deftypefun + +@deftypefun double gsl_stats_variance_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}) +This function returns the sample variance of @var{data} relative to the +given value of @var{mean}. The function is computed with @math{\Hat\mu} +replaced by the value of @var{mean} that you supply, +@tex +\beforedisplay +$$ +{\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - mean)^2 +$$ +\afterdisplay +@end tex +@ifinfo + +@example +\Hat\sigma^2 = (1/(N-1)) \sum (x_i - mean)^2 +@end example +@end ifinfo +@end deftypefun + +@deftypefun double gsl_stats_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n}) +@deftypefunx double gsl_stats_sd_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}) +The standard deviation is defined as the square root of the variance. +These functions return the square root of the corresponding variance +functions above. +@end deftypefun + +@deftypefun double gsl_stats_variance_with_fixed_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}) +This function computes an unbiased estimate of the variance of +@var{data} when the population mean @var{mean} of the underlying +distribution is known @emph{a priori}. In this case the estimator for +the variance uses the factor @math{1/N} and the sample mean +@math{\Hat\mu} is replaced by the known population mean @math{\mu}, +@tex +\beforedisplay +$$ +{\Hat\sigma}^2 = {1 \over N} \sum (x_i - \mu)^2 +$$ +\afterdisplay +@end tex +@ifinfo + +@example +\Hat\sigma^2 = (1/N) \sum (x_i - \mu)^2 +@end example +@end ifinfo +@end deftypefun + + +@deftypefun double gsl_stats_sd_with_fixed_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}) +This function calculates the standard deviation of @var{data} for a +fixed population mean @var{mean}. The result is the square root of the +corresponding variance function. +@end deftypefun + +@node Absolute deviation +@section Absolute deviation + +@deftypefun double gsl_stats_absdev (const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function computes the absolute deviation from the mean of +@var{data}, a dataset of length @var{n} with stride @var{stride}. The +absolute deviation from the mean is defined as, +@tex +\beforedisplay +$$ +absdev = {1 \over N} \sum |x_i - {\Hat\mu}| +$$ +\afterdisplay +@end tex +@ifinfo + +@example +absdev = (1/N) \sum |x_i - \Hat\mu| +@end example + +@end ifinfo +@noindent +where @math{x_i} are the elements of the dataset @var{data}. The +absolute deviation from the mean provides a more robust measure of the +width of a distribution than the variance. This function computes the +mean of @var{data} via a call to @code{gsl_stats_mean}. +@end deftypefun + +@deftypefun double gsl_stats_absdev_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}) +This function computes the absolute deviation of the dataset @var{data} +relative to the given value of @var{mean}, +@tex +\beforedisplay +$$ +absdev = {1 \over N} \sum |x_i - mean| +$$ +\afterdisplay +@end tex +@ifinfo + +@example +absdev = (1/N) \sum |x_i - mean| +@end example + +@end ifinfo +@noindent +This function is useful if you have already computed the mean of +@var{data} (and want to avoid recomputing it), or wish to calculate the +absolute deviation relative to another value (such as zero, or the +median). +@end deftypefun + +@node Higher moments (skewness and kurtosis) +@section Higher moments (skewness and kurtosis) + +@deftypefun double gsl_stats_skew (const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function computes the skewness of @var{data}, a dataset of length +@var{n} with stride @var{stride}. The skewness is defined as, +@tex +\beforedisplay +$$ +skew = {1 \over N} \sum + {\left( x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^3 +$$ +\afterdisplay +@end tex +@ifinfo + +@example +skew = (1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^3 +@end example + +@end ifinfo +@noindent +where @math{x_i} are the elements of the dataset @var{data}. The skewness +measures the asymmetry of the tails of a distribution. + +The function computes the mean and estimated standard deviation of +@var{data} via calls to @code{gsl_stats_mean} and @code{gsl_stats_sd}. +@end deftypefun + +@deftypefun double gsl_stats_skew_m_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}, double @var{sd}) +This function computes the skewness of the dataset @var{data} using the +given values of the mean @var{mean} and standard deviation @var{sd}, +@tex +\beforedisplay +$$ +skew = {1 \over N} + \sum {\left( x_i - mean \over sd \right)}^3 +$$ +\afterdisplay +@end tex +@ifinfo + +@example +skew = (1/N) \sum ((x_i - mean)/sd)^3 +@end example + +@end ifinfo +@noindent +These functions are useful if you have already computed the mean and +standard deviation of @var{data} and want to avoid recomputing them. +@end deftypefun + +@deftypefun double gsl_stats_kurtosis (const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function computes the kurtosis of @var{data}, a dataset of length +@var{n} with stride @var{stride}. The kurtosis is defined as, +@tex +\beforedisplay +$$ +kurtosis = \left( {1 \over N} \sum + {\left(x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^4 + \right) + - 3 +$$ +\afterdisplay +@end tex +@ifinfo + +@example +kurtosis = ((1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^4) - 3 +@end example + +@end ifinfo +@noindent +The kurtosis measures how sharply peaked a distribution is, relative to +its width. The kurtosis is normalized to zero for a gaussian +distribution. +@end deftypefun + +@deftypefun double gsl_stats_kurtosis_m_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}, double @var{sd}) +This function computes the kurtosis of the dataset @var{data} using the +given values of the mean @var{mean} and standard deviation @var{sd}, +@tex +\beforedisplay +$$ +kurtosis = {1 \over N} + \left( \sum {\left(x_i - mean \over sd \right)}^4 \right) + - 3 +$$ +\afterdisplay +@end tex +@ifinfo + +@example +kurtosis = ((1/N) \sum ((x_i - mean)/sd)^4) - 3 +@end example + +@end ifinfo +@noindent +This function is useful if you have already computed the mean and +standard deviation of @var{data} and want to avoid recomputing them. +@end deftypefun + +@node Autocorrelation +@section Autocorrelation + +@deftypefun double gsl_stats_lag1_autocorrelation (const double @var{data}[], const size_t @var{stride}, const size_t @var{n}) +This function computes the lag-1 autocorrelation of the dataset @var{data}. +@tex +\beforedisplay +$$ +a_1 = {\sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i-1} - \Hat\mu) +\over +\sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i} - \Hat\mu)} +$$ +\afterdisplay +@end tex +@ifinfo + +@example +a_1 = @{\sum_@{i = 1@}^@{n@} (x_@{i@} - \Hat\mu) (x_@{i-1@} - \Hat\mu) + \over + \sum_@{i = 1@}^@{n@} (x_@{i@} - \Hat\mu) (x_@{i@} - \Hat\mu)@} +@end example +@end ifinfo +@end deftypefun + + +@deftypefun double gsl_stats_lag1_autocorrelation_m (const double @var{data}[], const size_t @var{stride}, const size_t @var{n}, const double @var{mean}) +This function computes the lag-1 autocorrelation of the dataset +@var{data} using the given value of the mean @var{mean}. + +@end deftypefun + +@node Covariance +@section Covariance +@cindex covariance, of two datasets + +@deftypefun double gsl_stats_covariance (const double @var{data1}[], const size_t @var{stride1}, const double @var{data2}[], const size_t @var{stride2}, const size_t @var{n}) +This function computes the covariance of the datasets @var{data1} and +@var{data2} which must both be of the same length @var{n}. +@tex +\beforedisplay +$$ +covar = {1 \over (n - 1)} \sum_{i = 1}^{n} (x_{i} - \Hat x) (y_{i} - \Hat y) +$$ +\afterdisplay +@end tex +@ifinfo + +@example +covar = (1/(n - 1)) \sum_@{i = 1@}^@{n@} (x_i - \Hat x) (y_i - \Hat y) +@end example +@end ifinfo +@end deftypefun + +@deftypefun double gsl_stats_covariance_m (const double @var{data1}[], const size_t @var{stride1}, const double @var{data2}[], const size_t @var{stride2}, const size_t @var{n}, const double @var{mean1}, const double @var{mean2}) +This function computes the covariance of the datasets @var{data1} and +@var{data2} using the given values of the means, @var{mean1} and +@var{mean2}. This is useful if you have already computed the means of +@var{data1} and @var{data2} and want to avoid recomputing them. +@end deftypefun + + +@node Weighted Samples +@section Weighted Samples + +The functions described in this section allow the computation of +statistics for weighted samples. The functions accept an array of +samples, @math{x_i}, with associated weights, @math{w_i}. Each sample +@math{x_i} is considered as having been drawn from a Gaussian +distribution with variance @math{\sigma_i^2}. The sample weight +@math{w_i} is defined as the reciprocal of this variance, @math{w_i = +1/\sigma_i^2}. Setting a weight to zero corresponds to removing a +sample from a dataset. + +@deftypefun double gsl_stats_wmean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function returns the weighted mean of the dataset @var{data} with +stride @var{stride} and length @var{n}, using the set of weights @var{w} +with stride @var{wstride} and length @var{n}. The weighted mean is defined as, +@tex +\beforedisplay +$$ +{\Hat\mu} = {{\sum w_i x_i} \over {\sum w_i}} +$$ +\afterdisplay +@end tex +@ifinfo + +@example +\Hat\mu = (\sum w_i x_i) / (\sum w_i) +@end example +@end ifinfo +@end deftypefun + + +@deftypefun double gsl_stats_wvariance (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function returns the estimated variance of the dataset @var{data} +with stride @var{stride} and length @var{n}, using the set of weights +@var{w} with stride @var{wstride} and length @var{n}. The estimated +variance of a weighted dataset is defined as, +@tex +\beforedisplay +$$ +\Hat\sigma^2 = {{\sum w_i} \over {(\sum w_i)^2 - \sum (w_i^2)}} + \sum w_i (x_i - \Hat\mu)^2 +$$ +\afterdisplay +@end tex +@ifinfo + +@example +\Hat\sigma^2 = ((\sum w_i)/((\sum w_i)^2 - \sum (w_i^2))) + \sum w_i (x_i - \Hat\mu)^2 +@end example + +@end ifinfo +@noindent +Note that this expression reduces to an unweighted variance with the +familiar @math{1/(N-1)} factor when there are @math{N} equal non-zero +weights. +@end deftypefun + +@deftypefun double gsl_stats_wvariance_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}) +This function returns the estimated variance of the weighted dataset +@var{data} using the given weighted mean @var{wmean}. +@end deftypefun + +@deftypefun double gsl_stats_wsd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) +The standard deviation is defined as the square root of the variance. +This function returns the square root of the corresponding variance +function @code{gsl_stats_wvariance} above. +@end deftypefun + +@deftypefun double gsl_stats_wsd_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}) +This function returns the square root of the corresponding variance +function @code{gsl_stats_wvariance_m} above. +@end deftypefun + +@deftypefun double gsl_stats_wvariance_with_fixed_mean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, const double @var{mean}) +This function computes an unbiased estimate of the variance of weighted +dataset @var{data} when the population mean @var{mean} of the underlying +distribution is known @emph{a priori}. In this case the estimator for +the variance replaces the sample mean @math{\Hat\mu} by the known +population mean @math{\mu}, +@tex +\beforedisplay +$$ +\Hat\sigma^2 = {{\sum w_i (x_i - \mu)^2} \over {\sum w_i}} +$$ +\afterdisplay +@end tex +@ifinfo + +@example +\Hat\sigma^2 = (\sum w_i (x_i - \mu)^2) / (\sum w_i) +@end example +@end ifinfo +@end deftypefun + +@deftypefun double gsl_stats_wsd_with_fixed_mean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, const double @var{mean}) +The standard deviation is defined as the square root of the variance. +This function returns the square root of the corresponding variance +function above. +@end deftypefun + +@deftypefun double gsl_stats_wabsdev (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function computes the weighted absolute deviation from the weighted +mean of @var{data}. The absolute deviation from the mean is defined as, +@tex +\beforedisplay +$$ +absdev = {{\sum w_i |x_i - \Hat\mu|} \over {\sum w_i}} +$$ +\afterdisplay +@end tex +@ifinfo + +@example +absdev = (\sum w_i |x_i - \Hat\mu|) / (\sum w_i) +@end example +@end ifinfo +@end deftypefun + +@deftypefun double gsl_stats_wabsdev_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}) +This function computes the absolute deviation of the weighted dataset +@var{data} about the given weighted mean @var{wmean}. +@end deftypefun + +@deftypefun double gsl_stats_wskew (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function computes the weighted skewness of the dataset @var{data}. +@tex +\beforedisplay +$$ +skew = {{\sum w_i ((x_i - xbar)/\sigma)^3} \over {\sum w_i}} +$$ +\afterdisplay +@end tex +@ifinfo + +@example +skew = (\sum w_i ((x_i - xbar)/\sigma)^3) / (\sum w_i) +@end example +@end ifinfo +@end deftypefun + +@deftypefun double gsl_stats_wskew_m_sd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}, double @var{wsd}) +This function computes the weighted skewness of the dataset @var{data} +using the given values of the weighted mean and weighted standard +deviation, @var{wmean} and @var{wsd}. +@end deftypefun + +@deftypefun double gsl_stats_wkurtosis (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function computes the weighted kurtosis of the dataset @var{data}. + +@tex +\beforedisplay +$$ +kurtosis = {{\sum w_i ((x_i - xbar)/sigma)^4} \over {\sum w_i}} - 3 +$$ +\afterdisplay +@end tex +@ifinfo + +@example +kurtosis = ((\sum w_i ((x_i - xbar)/sigma)^4) / (\sum w_i)) - 3 +@end example +@end ifinfo +@end deftypefun + +@deftypefun double gsl_stats_wkurtosis_m_sd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}, double @var{wsd}) +This function computes the weighted kurtosis of the dataset @var{data} +using the given values of the weighted mean and weighted standard +deviation, @var{wmean} and @var{wsd}. +@end deftypefun + +@node Maximum and Minimum values +@section Maximum and Minimum values + +The following functions find the maximum and minimum values of a +dataset (or their indices). If the data contains @code{NaN}s then a +@code{NaN} will be returned, since the maximum or minimum value is +undefined. For functions which return an index, the location of the +first @code{NaN} in the array is returned. + +@deftypefun double gsl_stats_max (const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function returns the maximum value in @var{data}, a dataset of +length @var{n} with stride @var{stride}. The maximum value is defined +as the value of the element @math{x_i} which satisfies @c{$x_i \ge x_j$} +@math{x_i >= x_j} for all @math{j}. + +If you want instead to find the element with the largest absolute +magnitude you will need to apply @code{fabs} or @code{abs} to your data +before calling this function. +@end deftypefun + +@deftypefun double gsl_stats_min (const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function returns the minimum value in @var{data}, a dataset of +length @var{n} with stride @var{stride}. The minimum value is defined +as the value of the element @math{x_i} which satisfies @c{$x_i \le x_j$} +@math{x_i <= x_j} for all @math{j}. + +If you want instead to find the element with the smallest absolute +magnitude you will need to apply @code{fabs} or @code{abs} to your data +before calling this function. +@end deftypefun + +@deftypefun void gsl_stats_minmax (double * @var{min}, double * @var{max}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function finds both the minimum and maximum values @var{min}, +@var{max} in @var{data} in a single pass. +@end deftypefun + +@deftypefun size_t gsl_stats_max_index (const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function returns the index of the maximum value in @var{data}, a +dataset of length @var{n} with stride @var{stride}. The maximum value is +defined as the value of the element @math{x_i} which satisfies +@c{$x_i \ge x_j$} +@math{x_i >= x_j} for all @math{j}. When there are several equal maximum +elements then the first one is chosen. +@end deftypefun + +@deftypefun size_t gsl_stats_min_index (const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function returns the index of the minimum value in @var{data}, a +dataset of length @var{n} with stride @var{stride}. The minimum value +is defined as the value of the element @math{x_i} which satisfies +@c{$x_i \ge x_j$} +@math{x_i >= x_j} for all @math{j}. When there are several equal +minimum elements then the first one is chosen. +@end deftypefun + +@deftypefun void gsl_stats_minmax_index (size_t * @var{min_index}, size_t * @var{max_index}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) +This function returns the indexes @var{min_index}, @var{max_index} of +the minimum and maximum values in @var{data} in a single pass. +@end deftypefun + +@node Median and Percentiles +@section Median and Percentiles + +The median and percentile functions described in this section operate on +sorted data. For convenience we use @dfn{quantiles}, measured on a scale +of 0 to 1, instead of percentiles (which use a scale of 0 to 100). + +@deftypefun double gsl_stats_median_from_sorted_data (const double @var{sorted_data}[], size_t @var{stride}, size_t @var{n}) +This function returns the median value of @var{sorted_data}, a dataset +of length @var{n} with stride @var{stride}. The elements of the array +must be in ascending numerical order. There are no checks to see +whether the data are sorted, so the function @code{gsl_sort} should +always be used first. + +When the dataset has an odd number of elements the median is the value +of element @math{(n-1)/2}. When the dataset has an even number of +elements the median is the mean of the two nearest middle values, +elements @math{(n-1)/2} and @math{n/2}. Since the algorithm for +computing the median involves interpolation this function always returns +a floating-point number, even for integer data types. +@end deftypefun + +@deftypefun double gsl_stats_quantile_from_sorted_data (const double @var{sorted_data}[], size_t @var{stride}, size_t @var{n}, double @var{f}) +This function returns a quantile value of @var{sorted_data}, a +double-precision array of length @var{n} with stride @var{stride}. The +elements of the array must be in ascending numerical order. The +quantile is determined by the @var{f}, a fraction between 0 and 1. For +example, to compute the value of the 75th percentile @var{f} should have +the value 0.75. + +There are no checks to see whether the data are sorted, so the function +@code{gsl_sort} should always be used first. + +The quantile is found by interpolation, using the formula +@tex +\beforedisplay +$$ +\hbox{quantile} = (1 - \delta) x_i + \delta x_{i+1} +$$ +\afterdisplay +@end tex +@ifinfo + +@example +quantile = (1 - \delta) x_i + \delta x_@{i+1@} +@end example + +@end ifinfo +@noindent +where @math{i} is @code{floor}(@math{(n - 1)f}) and @math{\delta} is +@math{(n-1)f - i}. + +Thus the minimum value of the array (@code{data[0*stride]}) is given by +@var{f} equal to zero, the maximum value (@code{data[(n-1)*stride]}) is +given by @var{f} equal to one and the median value is given by @var{f} +equal to 0.5. Since the algorithm for computing quantiles involves +interpolation this function always returns a floating-point number, even +for integer data types. +@end deftypefun + + +@comment @node Statistical tests +@comment @section Statistical tests + +@comment FIXME, do more work on the statistical tests + +@comment -@deftypefun double gsl_stats_ttest (const double @var{data1}[], double @var{data2}[], size_t @var{n1}, size_t @var{n2}) +@comment -@deftypefunx Statistics double gsl_stats_int_ttest (const double @var{data1}[], double @var{data2}[], size_t @var{n1}, size_t @var{n2}) + +@comment The function @code{gsl_stats_ttest} computes the t-test statistic for +@comment the two arrays @var{data1}[] and @var{data2}[], of lengths @var{n1} and +@comment -@var{n2} respectively. + +@comment The t-test statistic measures the difference between the means of two +@comment datasets. + +@node Example statistical programs +@section Examples +Here is a basic example of how to use the statistical functions: + +@example +@verbatiminclude examples/stat.c +@end example + +The program should produce the following output, + +@example +@verbatiminclude examples/stat.out +@end example + + +Here is an example using sorted data, + +@example +@verbatiminclude examples/statsort.c +@end example + +This program should produce the following output, + +@example +@verbatiminclude examples/statsort.out +@end example + +@node Statistics References and Further Reading +@section References and Further Reading + +The standard reference for almost any topic in statistics is the +multi-volume @cite{Advanced Theory of Statistics} by Kendall and Stuart. + +@itemize @asis +@item +Maurice Kendall, Alan Stuart, and J. Keith Ord. +@cite{The Advanced Theory of Statistics} (multiple volumes) +reprinted as @cite{Kendall's Advanced Theory of Statistics}. +Wiley, ISBN 047023380X. +@end itemize + +@noindent +Many statistical concepts can be more easily understood by a Bayesian +approach. The following book by Gelman, Carlin, Stern and Rubin gives a +comprehensive coverage of the subject. + +@itemize @asis +@item +Andrew Gelman, John B. Carlin, Hal S. Stern, Donald B. Rubin. +@cite{Bayesian Data Analysis}. +Chapman & Hall, ISBN 0412039915. +@end itemize + +@noindent +For physicists the Particle Data Group provides useful reviews of +Probability and Statistics in the ``Mathematical Tools'' section of its +Annual Review of Particle Physics. + +@itemize @asis +@item +@cite{Review of Particle Properties} +R.M. Barnett et al., Physical Review D54, 1 (1996) +@end itemize + +@noindent +The Review of Particle Physics is available online at +the website @uref{http://pdg.lbl.gov/}. + + |