Diffstat (limited to 'gsl-1.9/statistics/TODO')
-rw-r--r--  gsl-1.9/statistics/TODO  |  102
1 files changed, 102 insertions, 0 deletions
diff --git a/gsl-1.9/statistics/TODO b/gsl-1.9/statistics/TODO
new file mode 100644
index 0000000..ca5a984
--- /dev/null
+++ b/gsl-1.9/statistics/TODO
@@ -0,0 +1,102 @@
+* From: James Theiler <jt@lanl.gov>
+To: John Lamb <J.D.Lamb@btinternet.com>
+Cc: gsl-discuss@sources.redhat.com
+Subject: Re: Collecting statistics for time dependent data?
+Date: Thu, 9 Dec 2004 14:18:36 -0700 (MST)
+
+On Thu, 9 Dec 2004, John Lamb wrote:
+
+] Raimondo Giammanco wrote:
+] > Hello,
+] >
+] > I was wondering if there is a way to compute "running" statistics with
+] > gsl.
+] >
+] Yes, you can do it, but there's nothing in GSL that does it and it's easy
+] enough that you don't need GSL. Something like (untested)
+]
+] void update_mean( double* mean, int* n, double x ){
+]   if( *n == 1 )
+]     *mean = x;
+]   else
+]     *mean = (1 - (double)1 / *n ) * *mean + x / *n;
+] }
+]
+] will work and you can derive a similar method for updating the variance
+] using the usual textbook formula.
+]
+] var[x] = (1/n) sum x^2_i - mean(x)^2
+]
+] I don't know if there is a method that avoids the rounding errors. I
+] don't know why so many textbooks repeat this formula without the
+] slightest warning that it can go so badly wrong.
+]
+]
+
+Stably updating mean and variance is remarkably nontrivial. There was
+a series of papers in Comm ACM that discussed the issue; the final one
+(that I know of) refers back to the earlier ones, and it can be found
+in D.H.D. West, Updating mean and variance estimates: an improved
+method, Comm ACM 22:9, 532 (1979) [* I see Luke Stras just sent this
+reference! *]. I'll just copy out the pseudocode since the paper is
+old enough that it might not be easy to find. This, by the way, is
+generalized for weighted data, so it assumes that you get a weight and
+a data value (W_i and X_i) that you use to update the estimates XBAR
+and S2:
+
+ SUMW = W_1
+ M = X_1
+ T = 0
+ For i=2,3,...,n
+ {
+ Q = X_i - M
+ TEMP = SUMW + W_i
+ R = Q*W_i/TEMP
+ M = M + R
+ T = T + R*SUMW*Q
+ SUMW = TEMP
+ }
+ XBAR = M
+ S2 = T*n/((n-1)*SUMW)
+
+
+
+jt
+
+--
+James Theiler Space and Remote Sensing Sciences
+MS-B244, ISR-2, LANL Los Alamos National Laboratory
+Los Alamos, NM 87545 http://nis-www.lanl.gov/~jt
+
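+A minimal C rendering of the update above (the function and variable
+names here are made up for illustration and are not existing GSL API):
+
+#include <stdio.h>
+
+/* One step of West's update: fold in observation x with weight w.
+   Starting from sumw = mean = t = 0, the first call reproduces the
+   initialisation SUMW = W_1, M = X_1, T = 0 from the pseudocode. */
+static void west_update(double *sumw, double *mean, double *t,
+                        double w, double x)
+{
+    double q = x - *mean;
+    double temp = *sumw + w;
+    double r = q * w / temp;
+    *mean += r;
+    *t += r * (*sumw) * q;
+    *sumw = temp;
+}
+
+int main(void)
+{
+    double x[] = { 2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0 };
+    size_t n = sizeof x / sizeof x[0];
+    double sumw = 0, mean = 0, t = 0;
+
+    for (size_t i = 0; i < n; i++)
+        west_update(&sumw, &mean, &t, 1.0, x[i]);   /* unit weights */
+
+    /* Unbiased variance, as in the pseudocode: S2 = T*n/((n-1)*SUMW) */
+    double s2 = t * n / ((n - 1) * sumw);
+    printf("mean = %g  variance = %g\n", mean, s2);
+    return 0;
+}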
+
+* Look at STARPAC ftp://ftp.ucar.edu/starpac/ and Statlib
+http://lib.stat.cmu.edu/ for more ideas
+
+* Try using the Kahan summation formula to improve accuracy for the
+NIST tests (see Brian for details; below is a sketch of the algorithm).
+
+ sum = x(1)
+ c = 0
+
+ DO i = 2, 1000000, 1
+ y = x(i) - c
+ t = sum + y
+ c = (t - sum) - y
+ sum = t
+ ENDDO
+
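+A C version of the same sketch (kahan_sum is a hypothetical name, not
+an existing GSL function; it assumes n >= 1):
+
+#include <stddef.h>
+
+/* Compensated (Kahan) summation: c carries the low-order bits that
+   would otherwise be lost when y is folded into the running sum. */
+double kahan_sum(const double *x, size_t n)
+{
+    double sum = x[0];
+    double c = 0.0;
+    for (size_t i = 1; i < n; i++) {
+        double y = x[i] - c;
+        double t = sum + y;
+        c = (t - sum) - y;
+        sum = t;
+    }
+    return sum;
+}
+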
+* Prevent incorrect use of unsorted data for quartile calculations
+using a typedef for sorted data (?)
+
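+One way this could look (illustrative only; none of these names are
+existing GSL types or functions): make a distinct type that can only be
+obtained by sorting, so quantile routines cannot accidentally be handed
+raw, unsorted data.
+
+#include <stdlib.h>
+#include <string.h>
+
+typedef struct { double *data; size_t n; } sorted_data;   /* hypothetical */
+
+static int cmp_double(const void *a, const void *b)
+{
+    double da = *(const double *)a, db = *(const double *)b;
+    return (da > db) - (da < db);
+}
+
+/* The only constructor copies and sorts the input, so a sorted_data is
+   sorted by construction (error checking omitted in this sketch). */
+sorted_data sorted_data_from(const double *x, size_t n)
+{
+    sorted_data s;
+    s.data = malloc(n * sizeof *s.data);
+    memcpy(s.data, x, n * sizeof *s.data);
+    qsort(s.data, n, sizeof *s.data, cmp_double);
+    s.n = n;
+    return s;
+}
+
+/* Quantile routines then take a sorted_data rather than a bare double*. */
+double sorted_data_median(sorted_data s)
+{
+    return (s.n % 2) ? s.data[s.n / 2]
+                     : 0.5 * (s.data[s.n / 2 - 1] + s.data[s.n / 2]);
+}
+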
+* Rejection of outliers
+
+* Time series. Autocorrelation, cross-correlation, smoothing (moving
+average), detrending, various econometric things. Integrated
+quantities (area under the curve). Interpolation of noisy data/fitting
+-- maybe add that to the existing interpolation stuff. What about
+missing data and gaps?
+
+ There is a new GNU package called gretl which does econometrics
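+
+A sketch of one small piece of this (sample autocorrelation at a given
+lag; the name and signature are invented for illustration, not proposed
+API):
+
+#include <stddef.h>
+
+/* Sample autocorrelation of x[0..n-1] at the given lag, using the
+   usual 1/n normalisation of the autocovariances (the 1/n factors
+   cancel in the ratio). */
+double autocorrelation(const double *x, size_t n, size_t lag)
+{
+    double mean = 0.0;
+    for (size_t i = 0; i < n; i++)
+        mean += x[i];
+    mean /= n;
+
+    double c0 = 0.0, clag = 0.0;
+    for (size_t i = 0; i < n; i++)
+        c0 += (x[i] - mean) * (x[i] - mean);
+    for (size_t i = 0; i + lag < n; i++)
+        clag += (x[i] - mean) * (x[i + lag] - mean);
+
+    return clag / c0;
+}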
+
+* Statistical tests (equal means, equal variance, etc.).
+