Diffstat (limited to 'gsl-1.9/statistics/TODO')
-rw-r--r-- | gsl-1.9/statistics/TODO | 102 |
1 file changed, 102 insertions, 0 deletions
diff --git a/gsl-1.9/statistics/TODO b/gsl-1.9/statistics/TODO
new file mode 100644
index 0000000..ca5a984
--- /dev/null
+++ b/gsl-1.9/statistics/TODO
@@ -0,0 +1,102 @@
* From: James Theiler <jt@lanl.gov>
To: John Lamb <J.D.Lamb@btinternet.com>
Cc: gsl-discuss@sources.redhat.com
Subject: Re: Collecting statistics for time dependent data?
Date: Thu, 9 Dec 2004 14:18:36 -0700 (MST)

On Thu, 9 Dec 2004, John Lamb wrote:

] Raimondo Giammanco wrote:
] > Hello,
] >
] > I was wondering if there is a way to compute "running" statistics with
] > gsl.
] >
] Yes, you can do it, but there's nothing in GSL that does it, and it's
] easy enough that you don't need GSL. Something like this (untested):
]
] /* Assumes the caller has already incremented *n to count x. */
] double update_mean( double* mean, int* n, double x ){
]   if( *n == 1 )
]     *mean = x;
]   else
]     *mean = (1 - (double)1 / *n ) * *mean + x / *n;
]   return *mean;
] }
]
] will work, and you can derive a similar method for updating the
] variance using the usual textbook formula:
]
] var[x] = (1/n) sum_i x_i^2 - mean(x)^2
]
] I don't know if there is a method that avoids the rounding errors. I
] don't know why so many textbooks repeat this formula without the
] slightest warning that it can go so badly wrong.
]
]

Stably updating mean and variance is remarkably nontrivial. There was
a series of papers in Comm ACM that discussed the issue; the final one
(that I know of) refers back to the earlier ones, and it can be found
in D.H.D. West, "Updating mean and variance estimates: an improved
method", Comm ACM 22:9, 532 (1979) [* I see Luke Stras just sent this
reference! *]. I'll just copy out the pseudocode since the paper is
old enough that it might not be easy to find.
This, by the way, is
generalized for weighted data, so it assumes that you get a weight and
a data value (W_i and X_i) that you use to update the estimates XBAR
and S2:

  SUMW = W_1
  M = X_1
  T = 0
  For i=2,3,...,n
  {
     Q = X_i - M
     TEMP = SUMW + W_i
     R = Q*W_i/TEMP
     M = M + R
     T = T + R*SUMW*Q
     SUMW = TEMP
  }
  XBAR = M
  S2 = T*n/((n-1)*SUMW)

jt

-- 
James Theiler            Space and Remote Sensing Sciences
MS-B244, ISR-2, LANL     Los Alamos National Laboratory
Los Alamos, NM 87545     http://nis-www.lanl.gov/~jt


* Look at STARPAC ftp://ftp.ucar.edu/starpac/ and Statlib
http://lib.stat.cmu.edu/ for more ideas.

* Try using the Kahan summation formula to improve accuracy for the
NIST tests (see Brian for details; below is a sketch of the algorithm).

  sum = x(1)
  c = 0

  DO i = 2, 1000000, 1
    y = x(i) - c
    t = sum + y
    c = (t - sum) - y
    sum = t
  ENDDO

* Prevent incorrect use of unsorted data for quartile calculations
by using a typedef for sorted data (?)

* Rejection of outliers

* Time series: autocorrelation, cross-correlation, smoothing (moving
average), detrending, various econometric things. Integrated
quantities (area under the curve). Interpolation of noisy data/fitting
(maybe add that to the existing interpolation code). What about
missing data and gaps?

  There is a new GNU package called gretl, which does econometrics.

* Statistical tests (equal means, equal variance, etc.).
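To make John Lamb's warning about the textbook variance formula concrete, here is a minimal C sketch that compares it with a two-pass computation (compute the mean first, then average the squared deviations) on data carrying a large offset. The two-pass method is a standard alternative swapped in for illustration, not something proposed in this thread; the names var_textbook and var_two_pass are hypothetical, and both functions return the population variance (dividing by n) and assume n > 0.

```c
#include <assert.h>
#include <stddef.h>

/* Textbook one-pass formula quoted in the thread:
   var = (1/n) sum_i x_i^2 - mean^2   (population variance, n > 0). */
double var_textbook(const double *x, size_t n)
{
    double s = 0.0, s2 = 0.0, mean;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        s += x[i];
        s2 += x[i] * x[i];
    }
    mean = s / n;
    /* Subtracts two large, nearly equal numbers: prone to cancellation. */
    return s2 / n - mean * mean;
}

/* Two-pass alternative: find the mean first, then average the squared
   deviations, so only small, already-centred values are squared. */
double var_two_pass(const double *x, size_t n)
{
    double s = 0.0, ss = 0.0, mean;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        s += x[i];
    mean = s / n;
    for (i = 0; i < n; i++) {
        double d = x[i] - mean;   /* deviation from the mean */
        ss += d * d;
    }
    return ss / n;
}
```

For the four values 1e9+4, 1e9+7, 1e9+13 and 1e9+16 the exact population variance is 22.5. var_two_pass recovers it exactly, while var_textbook subtracts two quantities near 1e18, where adjacent doubles are 128 apart, so the significant digits cancel and the result does not match. The two-pass method needs the whole data set in memory, which is why a streaming update like West's is the better fit for running statistics.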
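West's weighted update, as copied out by James Theiler, translates almost line for line into C. The following is a minimal sketch, assuming positive weights and using SUMW in the TEMP line; the struct west_acc and the functions west_init, west_add, west_mean and west_variance are hypothetical names, not part of GSL. Starting from a zeroed state instead of seeding with X_1 and W_1 lets one update rule handle every observation: with SUMW = 0, the first call effectively sets M to X_1 and leaves T at 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accumulator (not a GSL API) for the weighted running mean
   and variance update of West (1979), transcribed from the pseudocode. */
typedef struct {
    size_t n;     /* number of observations added so far */
    double sumw;  /* SUMW: running sum of weights */
    double m;     /* M: running weighted mean (XBAR) */
    double t;     /* T: running weighted sum of squared deviations */
} west_acc;

/* Empty state; the first west_add() then reproduces the pseudocode's
   initialisation SUMW = W_1, M = X_1, T = 0. */
void west_init(west_acc *a)
{
    a->n = 0;
    a->sumw = 0.0;
    a->m = 0.0;
    a->t = 0.0;
}

/* Add observation x with weight w (w is assumed to be positive). */
void west_add(west_acc *a, double x, double w)
{
    double q = x - a->m;        /* Q = X_i - M */
    double temp = a->sumw + w;  /* TEMP = SUMW + W_i */
    double r = q * w / temp;    /* R = Q*W_i/TEMP */
    a->m += r;                  /* M = M + R */
    a->t += r * a->sumw * q;    /* T = T + R*SUMW*Q (old SUMW) */
    a->sumw = temp;             /* SUMW = TEMP */
    a->n++;
}

double west_mean(const west_acc *a)
{
    return a->m;                /* XBAR = M */
}

/* S2 = T*n/((n-1)*SUMW); only defined once n >= 2. */
double west_variance(const west_acc *a)
{
    double n;

    assert(a->n >= 2);
    n = (double) a->n;
    return a->t * n / ((n - 1.0) * a->sumw);
}
```

Call west_init once, then west_add for each (X_i, W_i) pair as data arrive; west_mean can be read at any time and west_variance once at least two points have been added. With all weights equal to 1, T is the running sum of squared deviations and S2 reduces to the usual sample variance T/(n-1).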
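The Kahan summation sketch in the TODO list can be written as a small C function. This is a minimal sketch under a hypothetical name, kahan_sum (not a GSL function); it follows the Fortran-style loop step by step, seeding the sum with the first element and carrying the compensation term c between iterations.

```c
#include <assert.h>
#include <stddef.h>

/* Compensated (Kahan) summation of x[0..n-1], following the TODO sketch.
   c holds the negated low-order part that was rounded away last step. */
double kahan_sum(const double *x, size_t n)
{
    double sum, c = 0.0;
    size_t i;

    assert(x != NULL || n == 0);
    if (n == 0)
        return 0.0;

    sum = x[0];                  /* sum = x(1) */
    for (i = 1; i < n; i++) {    /* DO i = 2, n */
        double y = x[i] - c;     /* add back what was lost previously */
        double t = sum + y;      /* low-order bits of y may be lost here */
        c = (t - sum) - y;       /* part actually added minus y:
                                    the negated lost bits */
        sum = t;
    }
    return sum;
}
```

For example, adding ten copies of 1e-16 to 1.0 leaves a naive double-precision sum at exactly 1.0, whereas the compensated sum retains the small terms. One caution: value-unsafe optimisations such as GCC's -ffast-math allow the compiler to simplify (t - sum) - y to zero, which silently turns this back into naive summation, so the function should be compiled with standard floating-point semantics.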