gsl-1.9/doc/statistics.texi


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741

@cindex statistics
@cindex mean
@cindex standard deviation
@cindex variance
@cindex estimated standard deviation
@cindex estimated variance
@cindex t-test
@cindex range
@cindex min
@cindex max

This chapter describes the statistical functions in the library.  The
basic statistical functions include routines to compute the mean,
variance and standard deviation.  More advanced functions allow you to
calculate absolute deviations, skewness, and kurtosis as well as the
median and arbitrary percentiles.  The algorithms use recurrence
relations to compute average quantities in a stable way, without large
intermediate values that might overflow. 

The functions are available in versions for datasets in the standard
floating-point and integer types.  The versions for double precision
floating-point data have the prefix @code{gsl_stats} and are declared in
the header file @file{gsl_statistics_double.h}.  The versions for integer
data have the prefix @code{gsl_stats_int} and are declared in the header
file @file{gsl_statistics_int.h}. 

@menu
* Mean and standard deviation and variance::  
* Absolute deviation::          
* Higher moments (skewness and kurtosis)::  
* Autocorrelation::             
* Covariance::                  
* Weighted Samples::            
* Maximum and Minimum values::  
* Median and Percentiles::      
* Example statistical programs::  
* Statistics References and Further Reading::  
@end menu

@node Mean and standard deviation and variance
@section Mean, Standard Deviation and Variance

@deftypefun double gsl_stats_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the arithmetic mean of @var{data}, a dataset of
length @var{n} with stride @var{stride}.  The arithmetic mean, or
@dfn{sample mean}, is denoted by @math{\Hat\mu} and defined as,
@tex
\beforedisplay
$$
{\Hat\mu} = {1 \over N} \sum x_i
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\mu = (1/N) \sum x_i
@end example

@end ifinfo
@noindent
where @math{x_i} are the elements of the dataset @var{data}.  For
samples drawn from a gaussian distribution the variance of
@math{\Hat\mu} is @math{\sigma^2 / N}.
@end deftypefun

@deftypefun double gsl_stats_variance (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the estimated, or @dfn{sample}, variance of
@var{data}, a dataset of length @var{n} with stride @var{stride}.  The
estimated variance is denoted by @math{\Hat\sigma^2} and is defined by,
@tex
\beforedisplay
$$
{\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - {\Hat\mu})^2
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\sigma^2 = (1/(N-1)) \sum (x_i - \Hat\mu)^2
@end example

@end ifinfo
@noindent
where @math{x_i} are the elements of the dataset @var{data}.  Note that
the normalization factor of @math{1/(N-1)} results from the derivation
of @math{\Hat\sigma^2} as an unbiased estimator of the population
variance @math{\sigma^2}.  For samples drawn from a gaussian distribution
the variance of @math{\Hat\sigma^2} itself is @math{2 \sigma^4 / N}.

This function computes the mean via a call to @code{gsl_stats_mean}.  If
you have already computed the mean then you can pass it directly to
@code{gsl_stats_variance_m}.
@end deftypefun

@deftypefun double gsl_stats_variance_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
This function returns the sample variance of @var{data} relative to the
given value of @var{mean}.  The function is computed with @math{\Hat\mu}
replaced by the value of @var{mean} that you supply,
@tex
\beforedisplay
$$
{\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - mean)^2
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\sigma^2 = (1/(N-1)) \sum (x_i - mean)^2
@end example
@end ifinfo
@end deftypefun

@deftypefun double gsl_stats_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n})
@deftypefunx double gsl_stats_sd_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
The standard deviation is defined as the square root of the variance.
These functions return the square root of the corresponding variance
functions above.
@end deftypefun

@deftypefun double gsl_stats_variance_with_fixed_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
This function computes an unbiased estimate of the variance of
@var{data} when the population mean @var{mean} of the underlying
distribution is known @emph{a priori}.  In this case the estimator for
the variance uses the factor @math{1/N} and the sample mean
@math{\Hat\mu} is replaced by the known population mean @math{\mu},
@tex
\beforedisplay
$$
{\Hat\sigma}^2 = {1 \over N} \sum (x_i - \mu)^2
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\sigma^2 = (1/N) \sum (x_i - \mu)^2
@end example
@end ifinfo
@end deftypefun


@deftypefun double gsl_stats_sd_with_fixed_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
This function calculates the standard deviation of @var{data} for a
fixed population mean @var{mean}.  The result is the square root of the
corresponding variance function.
@end deftypefun

@node Absolute deviation
@section Absolute deviation

@deftypefun double gsl_stats_absdev (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the absolute deviation from the mean of
@var{data}, a dataset of length @var{n} with stride @var{stride}.  The
absolute deviation from the mean is defined as,
@tex
\beforedisplay
$$
absdev  = {1 \over N} \sum |x_i - {\Hat\mu}|
$$
\afterdisplay
@end tex
@ifinfo

@example
absdev  = (1/N) \sum |x_i - \Hat\mu|
@end example

@end ifinfo
@noindent
where @math{x_i} are the elements of the dataset @var{data}.  The
absolute deviation from the mean provides a more robust measure of the
width of a distribution than the variance.  This function computes the
mean of @var{data} via a call to @code{gsl_stats_mean}.
@end deftypefun

@deftypefun double gsl_stats_absdev_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
This function computes the absolute deviation of the dataset @var{data}
relative to the given value of @var{mean},
@tex
\beforedisplay
$$
absdev  = {1 \over N} \sum |x_i - mean|
$$
\afterdisplay
@end tex
@ifinfo

@example
absdev  = (1/N) \sum |x_i - mean|
@end example

@end ifinfo
@noindent
This function is useful if you have already computed the mean of
@var{data} (and want to avoid recomputing it), or wish to calculate the
absolute deviation relative to another value (such as zero, or the
median).
@end deftypefun

@node Higher moments (skewness and kurtosis)
@section Higher moments (skewness and kurtosis)

@deftypefun double gsl_stats_skew (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the skewness of @var{data}, a dataset of length
@var{n} with stride @var{stride}.  The skewness is defined as,
@tex
\beforedisplay
$$
skew = {1 \over N} \sum 
 {\left( x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^3
$$
\afterdisplay
@end tex
@ifinfo

@example
skew = (1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^3
@end example

@end ifinfo
@noindent
where @math{x_i} are the elements of the dataset @var{data}.  The skewness
measures the asymmetry of the tails of a distribution.

The function computes the mean and estimated standard deviation of
@var{data} via calls to @code{gsl_stats_mean} and @code{gsl_stats_sd}.
@end deftypefun

@deftypefun double gsl_stats_skew_m_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}, double @var{sd})
This function computes the skewness of the dataset @var{data} using the
given values of the mean @var{mean} and standard deviation @var{sd},
@tex
\beforedisplay
$$
skew = {1 \over N}
     \sum {\left( x_i - mean \over sd \right)}^3
$$
\afterdisplay
@end tex
@ifinfo

@example
skew = (1/N) \sum ((x_i - mean)/sd)^3
@end example

@end ifinfo
@noindent
These functions are useful if you have already computed the mean and
standard deviation of @var{data} and want to avoid recomputing them.
@end deftypefun

@deftypefun double gsl_stats_kurtosis (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the kurtosis of @var{data}, a dataset of length
@var{n} with stride @var{stride}.  The kurtosis is defined as,
@tex
\beforedisplay
$$
kurtosis = \left( {1 \over N} \sum 
 {\left(x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^4 
 \right) 
 - 3
$$
\afterdisplay
@end tex
@ifinfo

@example
kurtosis = ((1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^4)  - 3
@end example

@end ifinfo
@noindent
The kurtosis measures how sharply peaked a distribution is, relative to
its width.  The kurtosis is normalized to zero for a gaussian
distribution.
@end deftypefun

@deftypefun double gsl_stats_kurtosis_m_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}, double @var{sd})
This function computes the kurtosis of the dataset @var{data} using the
given values of the mean @var{mean} and standard deviation @var{sd},
@tex
\beforedisplay
$$
kurtosis = {1 \over N}
  \left( \sum {\left(x_i - mean \over sd \right)}^4 \right) 
  - 3
$$
\afterdisplay
@end tex
@ifinfo

@example
kurtosis = ((1/N) \sum ((x_i - mean)/sd)^4) - 3
@end example

@end ifinfo
@noindent
This function is useful if you have already computed the mean and
standard deviation of @var{data} and want to avoid recomputing them.
@end deftypefun

@node Autocorrelation
@section Autocorrelation

@deftypefun double gsl_stats_lag1_autocorrelation (const double @var{data}[], const size_t @var{stride}, const size_t @var{n})
This function computes the lag-1 autocorrelation of the dataset @var{data}.
@tex
\beforedisplay
$$
a_1 = {\sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i-1} - \Hat\mu)
\over
\sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i} - \Hat\mu)}
$$
\afterdisplay
@end tex
@ifinfo

@example
a_1 = @{\sum_@{i = 1@}^@{n@} (x_@{i@} - \Hat\mu) (x_@{i-1@} - \Hat\mu)
       \over
       \sum_@{i = 1@}^@{n@} (x_@{i@} - \Hat\mu) (x_@{i@} - \Hat\mu)@}
@end example
@end ifinfo
@end deftypefun


@deftypefun double gsl_stats_lag1_autocorrelation_m (const double @var{data}[], const size_t @var{stride}, const size_t @var{n}, const double @var{mean})
This function computes the lag-1 autocorrelation of the dataset
@var{data} using the given value of the mean @var{mean}.

@end deftypefun

@node Covariance
@section Covariance
@cindex covariance, of two datasets

@deftypefun double gsl_stats_covariance (const double @var{data1}[], const size_t @var{stride1}, const double @var{data2}[], const size_t @var{stride2}, const size_t @var{n})
This function computes the covariance of the datasets @var{data1} and
@var{data2} which must both be of the same length @var{n}.
@tex
\beforedisplay
$$
covar = {1 \over (n - 1)} \sum_{i = 1}^{n} (x_{i} - \Hat x) (y_{i} - \Hat y)
$$
\afterdisplay
@end tex
@ifinfo

@example
covar = (1/(n - 1)) \sum_@{i = 1@}^@{n@} (x_i - \Hat x) (y_i - \Hat y)
@end example
@end ifinfo
@end deftypefun

@deftypefun double gsl_stats_covariance_m (const double @var{data1}[], const size_t @var{stride1}, const double @var{data2}[], const size_t @var{stride2}, const size_t @var{n}, const double @var{mean1}, const double @var{mean2})
This function computes the covariance of the datasets @var{data1} and
@var{data2} using the given values of the means, @var{mean1} and
@var{mean2}.  This is useful if you have already computed the means of
@var{data1} and @var{data2} and want to avoid recomputing them.
@end deftypefun


@node Weighted Samples
@section Weighted Samples

The functions described in this section allow the computation of
statistics for weighted samples.  The functions accept an array of
samples, @math{x_i}, with associated weights, @math{w_i}.  Each sample
@math{x_i} is considered as having been drawn from a Gaussian
distribution with variance @math{\sigma_i^2}.  The sample weight
@math{w_i} is defined as the reciprocal of this variance, @math{w_i =
1/\sigma_i^2}.  Setting a weight to zero corresponds to removing a
sample from a dataset.

@deftypefun double gsl_stats_wmean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the weighted mean of the dataset @var{data} with
stride @var{stride} and length @var{n}, using the set of weights @var{w}
with stride @var{wstride} and length @var{n}.  The weighted mean is defined as,
@tex
\beforedisplay
$$
{\Hat\mu} = {{\sum w_i x_i} \over {\sum w_i}}
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\mu = (\sum w_i x_i) / (\sum w_i)
@end example
@end ifinfo
@end deftypefun


@deftypefun double gsl_stats_wvariance (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the estimated variance of the dataset @var{data}
with stride @var{stride} and length @var{n}, using the set of weights
@var{w} with stride @var{wstride} and length @var{n}.  The estimated
variance of a weighted dataset is defined as,
@tex
\beforedisplay
$$
\Hat\sigma^2 = {{\sum w_i} \over {(\sum w_i)^2 - \sum (w_i^2)}} 
                \sum w_i (x_i - \Hat\mu)^2
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\sigma^2 = ((\sum w_i)/((\sum w_i)^2 - \sum (w_i^2))) 
                \sum w_i (x_i - \Hat\mu)^2
@end example

@end ifinfo
@noindent
Note that this expression reduces to an unweighted variance with the
familiar @math{1/(N-1)} factor when there are @math{N} equal non-zero
weights.
@end deftypefun

@deftypefun double gsl_stats_wvariance_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean})
This function returns the estimated variance of the weighted dataset
@var{data} using the given weighted mean @var{wmean}.
@end deftypefun

@deftypefun double gsl_stats_wsd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
The standard deviation is defined as the square root of the variance.
This function returns the square root of the corresponding variance
function @code{gsl_stats_wvariance} above.
@end deftypefun

@deftypefun double gsl_stats_wsd_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean})
This function returns the square root of the corresponding variance
function @code{gsl_stats_wvariance_m} above.
@end deftypefun

@deftypefun double gsl_stats_wvariance_with_fixed_mean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, const double @var{mean})
This function computes an unbiased estimate of the variance of weighted
dataset @var{data} when the population mean @var{mean} of the underlying
distribution is known @emph{a priori}.  In this case the estimator for
the variance replaces the sample mean @math{\Hat\mu} by the known
population mean @math{\mu},
@tex
\beforedisplay
$$
\Hat\sigma^2 = {{\sum w_i (x_i - \mu)^2} \over {\sum w_i}}
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\sigma^2 = (\sum w_i (x_i - \mu)^2) / (\sum w_i)
@end example
@end ifinfo
@end deftypefun

@deftypefun double gsl_stats_wsd_with_fixed_mean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, const double @var{mean})
The standard deviation is defined as the square root of the variance.
This function returns the square root of the corresponding variance
function above.
@end deftypefun

@deftypefun double gsl_stats_wabsdev (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the weighted absolute deviation from the weighted
mean of @var{data}.  The absolute deviation from the mean is defined as,
@tex
\beforedisplay
$$
absdev = {{\sum w_i |x_i - \Hat\mu|} \over {\sum w_i}}
$$
\afterdisplay
@end tex
@ifinfo

@example
absdev = (\sum w_i |x_i - \Hat\mu|) / (\sum w_i)
@end example
@end ifinfo
@end deftypefun

@deftypefun double gsl_stats_wabsdev_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean})
This function computes the absolute deviation of the weighted dataset
@var{data} about the given weighted mean @var{wmean}.
@end deftypefun

@deftypefun double gsl_stats_wskew (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the weighted skewness of the dataset @var{data}.
@tex
\beforedisplay
$$
skew = {{\sum w_i ((x_i - xbar)/\sigma)^3} \over {\sum w_i}}
$$
\afterdisplay
@end tex
@ifinfo

@example
skew = (\sum w_i ((x_i - xbar)/\sigma)^3) / (\sum w_i)
@end example
@end ifinfo
@end deftypefun

@deftypefun double gsl_stats_wskew_m_sd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}, double @var{wsd})
This function computes the weighted skewness of the dataset @var{data}
using the given values of the weighted mean and weighted standard
deviation, @var{wmean} and @var{wsd}.
@end deftypefun

@deftypefun double gsl_stats_wkurtosis (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the weighted kurtosis of the dataset @var{data}.

@tex
\beforedisplay
$$
kurtosis = {{\sum w_i ((x_i - xbar)/sigma)^4} \over {\sum w_i}} - 3
$$
\afterdisplay
@end tex
@ifinfo

@example
kurtosis = ((\sum w_i ((x_i - xbar)/sigma)^4) / (\sum w_i)) - 3
@end example
@end ifinfo
@end deftypefun

@deftypefun double gsl_stats_wkurtosis_m_sd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}, double @var{wsd})
This function computes the weighted kurtosis of the dataset @var{data}
using the given values of the weighted mean and weighted standard
deviation, @var{wmean} and @var{wsd}.
@end deftypefun

@node Maximum and Minimum values
@section Maximum and Minimum values

The following functions find the maximum and minimum values of a
dataset (or their indices).  If the data contains @code{NaN}s then a
@code{NaN} will be returned, since the maximum or minimum value is
undefined.  For functions which return an index, the location of the
first @code{NaN} in the array is returned.

@deftypefun double gsl_stats_max (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the maximum value in @var{data}, a dataset of
length @var{n} with stride @var{stride}.  The maximum value is defined
as the value of the element @math{x_i} which satisfies @c{$x_i \ge x_j$}
@math{x_i >= x_j} for all @math{j}.

If you want instead to find the element with the largest absolute
magnitude you will need to apply @code{fabs} or @code{abs} to your data
before calling this function.
@end deftypefun

@deftypefun double gsl_stats_min (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the minimum value in @var{data}, a dataset of
length @var{n} with stride @var{stride}.  The minimum value is defined
as the value of the element @math{x_i} which satisfies @c{$x_i \le x_j$}
@math{x_i <= x_j} for all @math{j}.

If you want instead to find the element with the smallest absolute
magnitude you will need to apply @code{fabs} or @code{abs} to your data
before calling this function.
@end deftypefun

@deftypefun void gsl_stats_minmax (double * @var{min}, double * @var{max}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function finds both the minimum and maximum values @var{min},
@var{max} in @var{data} in a single pass.
@end deftypefun

@deftypefun size_t gsl_stats_max_index (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the index of the maximum value in @var{data}, a
dataset of length @var{n} with stride @var{stride}.  The maximum value is
defined as the value of the element @math{x_i} which satisfies 
@c{$x_i \ge x_j$}
@math{x_i >= x_j} for all @math{j}.  When there are several equal maximum
elements then the first one is chosen.
@end deftypefun

@deftypefun size_t gsl_stats_min_index (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the index of the minimum value in @var{data}, a
dataset of length @var{n} with stride @var{stride}.  The minimum value
is defined as the value of the element @math{x_i} which satisfies
@c{$x_i \ge x_j$}
@math{x_i >= x_j} for all @math{j}.  When there are several equal
minimum elements then the first one is chosen.
@end deftypefun

@deftypefun void gsl_stats_minmax_index (size_t * @var{min_index}, size_t * @var{max_index}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the indexes @var{min_index}, @var{max_index} of
the minimum and maximum values in @var{data} in a single pass.
@end deftypefun

@node Median and Percentiles
@section Median and Percentiles

The median and percentile functions described in this section operate on
sorted data.  For convenience we use @dfn{quantiles}, measured on a scale
of 0 to 1, instead of percentiles (which use a scale of 0 to 100).

@deftypefun double gsl_stats_median_from_sorted_data (const double @var{sorted_data}[], size_t @var{stride}, size_t @var{n})
This function returns the median value of @var{sorted_data}, a dataset
of length @var{n} with stride @var{stride}.  The elements of the array
must be in ascending numerical order.  There are no checks to see
whether the data are sorted, so the function @code{gsl_sort} should
always be used first.

When the dataset has an odd number of elements the median is the value
of element @math{(n-1)/2}.  When the dataset has an even number of
elements the median is the mean of the two nearest middle values,
elements @math{(n-1)/2} and @math{n/2}.  Since the algorithm for
computing the median involves interpolation this function always returns
a floating-point number, even for integer data types.
@end deftypefun

@deftypefun double gsl_stats_quantile_from_sorted_data (const double @var{sorted_data}[], size_t @var{stride}, size_t @var{n}, double @var{f})
This function returns a quantile value of @var{sorted_data}, a
double-precision array of length @var{n} with stride @var{stride}.  The
elements of the array must be in ascending numerical order.  The
quantile is determined by the @var{f}, a fraction between 0 and 1.  For
example, to compute the value of the 75th percentile @var{f} should have
the value 0.75.

There are no checks to see whether the data are sorted, so the function
@code{gsl_sort} should always be used first.

The quantile is found by interpolation, using the formula
@tex
\beforedisplay
$$
\hbox{quantile} = (1 - \delta) x_i + \delta x_{i+1}
$$
\afterdisplay
@end tex
@ifinfo

@example
quantile = (1 - \delta) x_i + \delta x_@{i+1@}
@end example

@end ifinfo
@noindent
where @math{i} is @code{floor}(@math{(n - 1)f}) and @math{\delta} is
@math{(n-1)f - i}.

Thus the minimum value of the array (@code{data[0*stride]}) is given by
@var{f} equal to zero, the maximum value (@code{data[(n-1)*stride]}) is
given by @var{f} equal to one and the median value is given by @var{f}
equal to 0.5.  Since the algorithm for computing quantiles involves
interpolation this function always returns a floating-point number, even
for integer data types.
@end deftypefun


@comment @node Statistical tests
@comment @section Statistical tests

@comment FIXME, do more work on the statistical tests

@comment -@deftypefun double gsl_stats_ttest (const double @var{data1}[], double @var{data2}[], size_t @var{n1}, size_t @var{n2})
@comment -@deftypefunx Statistics double gsl_stats_int_ttest (const double @var{data1}[], double @var{data2}[], size_t @var{n1}, size_t @var{n2})

@comment The function @code{gsl_stats_ttest} computes the t-test statistic for
@comment the two arrays @var{data1}[] and @var{data2}[], of lengths @var{n1} and
@comment -@var{n2} respectively.

@comment The t-test statistic measures the difference between the means of two
@comment datasets.

@node Example statistical programs
@section Examples
Here is a basic example of how to use the statistical functions:

@example
@verbatiminclude examples/stat.c
@end example

The program should produce the following output,

@example
@verbatiminclude examples/stat.out
@end example


Here is an example using sorted data,

@example
@verbatiminclude examples/statsort.c
@end example

This program should produce the following output,

@example
@verbatiminclude examples/statsort.out
@end example

@node Statistics References and Further Reading
@section References and Further Reading

The standard reference for almost any topic in statistics is the
multi-volume @cite{Advanced Theory of Statistics} by Kendall and Stuart.

@itemize @asis
@item
Maurice Kendall, Alan Stuart, and J. Keith Ord.
@cite{The Advanced Theory of Statistics} (multiple volumes)
reprinted as @cite{Kendall's Advanced Theory of Statistics}.
Wiley, ISBN 047023380X.
@end itemize

@noindent
Many statistical concepts can be more easily understood by a Bayesian
approach.  The following book by Gelman, Carlin, Stern and Rubin gives a
comprehensive coverage of the subject.

@itemize @asis
@item
Andrew Gelman, John B. Carlin, Hal S. Stern, Donald B. Rubin.
@cite{Bayesian Data Analysis}.
Chapman & Hall, ISBN 0412039915.
@end itemize

@noindent
For physicists the Particle Data Group provides useful reviews of
Probability and Statistics in the ``Mathematical Tools'' section of its
Annual Review of Particle Physics. 

@itemize @asis
@item
@cite{Review of Particle Properties}
R.M. Barnett et al., Physical Review D54, 1 (1996)
@end itemize

@noindent
The Review of Particle Physics is available online at
the website @uref{http://pdg.lbl.gov/}.