Let $X_1, X_2, \ldots, X_n$ be randomly distributed points on the unit interval, and let $N_{x,x+d}$ be the number of these points contained in the interval $(x, x+d)$. The scan statistic $N_d$ is the maximum number of points in a window of length $d$; that is, $N_d = \sup_x N_{x,x+d}$. This statistic is used to test for the presence of nonrandom clustering. We say that $m$ points form an $m{:}d$ clump if they are all contained in some interval of length $d$. Let $Y$ denote the number of $m{:}d$ clumps. In this article we show how to compute the lower-order moments of $Y$, and we use these moments to obtain approximations and bounds for the distribution of the scan statistic $N_d$. Our approximations use the method of moments to approximate the distribution of $Y$. Because $N_d \ge m$ exactly when $Y \ge 1$ (apart from an event of probability zero), an approximation to $P(Y \ge 1)$ is also an approximation to $P(N_d \ge m)$. We try two basic types of method-of-moments approximations: one based on a simple Markov chain model, and several based on compound Poisson approximations. Our results compare favorably with other approximations and bounds in the literature. In particular, our approximations MC2 and CPG2, which use only the first two moments of $Y$, do quite well and should be generally useful. We calculate the moments of $Y$ using recursions given by Huffer. We give explicit general formulas for the first two moments of $Y$ and show how the computer programs of Lin may be used to calculate the third and fourth moments.
Journal of the American Statistical Association, 92(440), pp. 1466–1475
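The definitions of $N_d$ and of $m{:}d$ clumps can be made concrete for a single sample. The sketch below is illustrative only; `scan_statistic` and `clump_count` are hypothetical helper names. It assumes clumps are counted as runs of $m$ consecutive order statistics with $X_{(i+m-1)} < X_{(i)} + d$, which is one natural reading of "the number of $m{:}d$ clumps". The paper's exact counting convention may differ, but under either reading $N_d \ge m$ exactly when at least one clump exists.

```python
import bisect
import random


def scan_statistic(points, d):
    """Largest number of points in any open window (x, x + d).

    Any window can be slid left until it starts just before some sorted
    point X_(i), so it suffices to count the points in [X_(i), X_(i) + d).
    """
    xs = sorted(points)
    best = 0
    for i, x in enumerate(xs):
        # j is the number of points < x + d, so xs[i:j] lie in [x, x + d)
        j = bisect.bisect_left(xs, x + d)
        best = max(best, j - i)
    return best


def clump_count(points, m, d):
    """Count i with X_(i+m-1) < X_(i) + d.

    ASSUMPTION: clumps are runs of m consecutive order statistics; the
    paper's counting convention for Y may differ from this one.
    """
    xs = sorted(points)
    return sum(1 for i in range(len(xs) - m + 1) if xs[i + m - 1] < xs[i] + d)


if __name__ == "__main__":
    rng = random.Random(2024)
    sample = [rng.random() for _ in range(20)]
    nd = scan_statistic(sample, 0.1)
    # N_d >= m exactly when at least one m:d clump exists
    assert all((nd >= m) == (clump_count(sample, m, 0.1) >= 1) for m in range(1, 21))
    print(nd)
```

Both functions use the same comparison, `x + d`, so the equivalence also holds exactly in floating point. This is why the tail of $N_d$ can be read off the distribution of the clump count.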
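A two-moment compound Poisson fit can be sketched as follows. This is not claimed to be the paper's CPG2 construction. As an assumption, we model $Y = \sum_{k=1}^{K} G_k$ with $K \sim \text{Poisson}(\lambda)$ and i.i.d. geometric cluster sizes $P(G = g) = p(1-p)^{g-1}$, $g \ge 1$. Then $E[Y] = \lambda/p$ and $\operatorname{Var}(Y) = \lambda E[G^2] = \lambda(2-p)/p^2$. This gives $p = 2/(1 + \operatorname{Var}(Y)/E[Y])$ and $\lambda = p\,E[Y]$, and hence $P(N_d \ge m) = P(Y \ge 1) \approx 1 - e^{-\lambda}$. The moments themselves are taken as inputs; Huffer's recursions are not reproduced here. The function names are hypothetical.

```python
import math


def fit_poisson_geometric(mean, var):
    """Match E[Y] and Var(Y) with a Poisson number of geometric clusters.

    ASSUMPTION: geometric cluster sizes on {1, 2, ...}; requires var >= mean.
    Returns (lam, p) with E[Y] = lam / p and Var(Y) = lam * (2 - p) / p**2.
    """
    if mean <= 0:
        raise ValueError("mean must be positive")
    ratio = var / mean
    if ratio < 1:
        raise ValueError("this model needs var >= mean")
    p = 2.0 / (1.0 + ratio)  # ratio == 1 gives p == 1, i.e. plain Poisson
    return p * mean, p


def prob_scan_at_least_m(mean, var):
    """Approximate P(N_d >= m) = P(Y >= 1) by 1 - P(K = 0) = 1 - exp(-lam)."""
    lam, _ = fit_poisson_geometric(mean, var)
    return 1.0 - math.exp(-lam)


def pmf(lam, p, ymax):
    """P(Y = y) for y = 0..ymax via the Panjer recursion for compound Poisson."""
    g = [0.0] + [p * (1 - p) ** (j - 1) for j in range(1, ymax + 1)]
    f = [math.exp(-lam)]  # clusters have size >= 1, so Y = 0 iff K = 0
    for y in range(1, ymax + 1):
        f.append(lam / y * sum(j * g[j] * f[y - j] for j in range(1, y + 1)))
    return f


if __name__ == "__main__":
    lam, p = fit_poisson_geometric(2.0, 6.0)
    print(lam, p, prob_scan_at_least_m(2.0, 6.0))
```

When $\operatorname{Var}(Y) = E[Y]$ the fit reduces to a plain Poisson approximation. Overdispersion, which arises because clumps tend to occur in overlapping groups, is absorbed by the geometric cluster size.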
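A simple Markov chain model can also be fitted from two moments. The following is one possible sketch, not the paper's MC2. It assumes $Y = Z_1 + \cdots + Z_M$, where the $Z_i$ form a stationary 0/1 Markov chain with $P(Z_i = 1) = \pi$ and lag-$k$ correlation $\rho^k$. If clumps are counted as runs of consecutive order statistics, then $M = n - m + 1$. Under this model, $E[Y] = M\pi$ and $\operatorname{Var}(Y) = \pi(1-\pi)\bigl(M + 2\sum_{k=1}^{M-1}(M-k)\rho^k\bigr)$. The transition probability $P(Z_{i+1}=1 \mid Z_i=0) = \pi(1-\rho)$ then gives $P(Y = 0) = (1-\pi)\bigl(1 - \pi(1-\rho)\bigr)^{M-1}$. The fit finds $\rho \in [0, 1)$ by bisection. Negative dependence is not modelled and is clamped to $\rho = 0$, which is our simplification. All names are hypothetical.

```python
def markov_var(M, pi, rho):
    """Var(Z_1 + ... + Z_M) for a stationary 0/1 chain with lag-k correlation rho**k."""
    s = sum((M - k) * rho ** k for k in range(1, M))
    return pi * (1 - pi) * (M + 2 * s)


def fit_markov(mean, var, M, tol=1e-12):
    """Return (pi, rho) matching E[Y] = M * pi and Var(Y) = markov_var(M, pi, rho)."""
    pi = mean / M
    if not 0 < pi < 1:
        raise ValueError("need 0 < mean < M")
    if var <= markov_var(M, pi, 0.0):
        return pi, 0.0  # ASSUMPTION: clamp; negative dependence not modelled
    if var >= M * M * pi * (1 - pi):  # the limit of markov_var as rho -> 1
        raise ValueError("variance too large for this model")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # markov_var is increasing in rho on [0, 1]
        mid = (lo + hi) / 2
        if markov_var(M, pi, mid) < var:
            lo = mid
        else:
            hi = mid
    return pi, (lo + hi) / 2


def markov_prob_no_clump(mean, var, M):
    """Approximate P(Y = 0) = P(N_d < m) under the fitted chain."""
    pi, rho = fit_markov(mean, var, M)
    b = pi * (1 - rho)  # P(Z_{i+1} = 1 | Z_i = 0)
    return (1 - pi) * (1 - b) ** (M - 1)


if __name__ == "__main__":
    print(markov_prob_no_clump(2.4, 4.0, 8))
```

The appeal of this construction is that $P(Y = 0)$ has a closed form once $\pi$ and $\rho$ are fixed. The only numerical work is the one-dimensional search for $\rho$.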