RunningStats.jl

A Julia package for streaming statistics computation using Welford’s algorithm and Chan’s parallel merging. Compute means, covariances, correlations, and variances without storing historical data in memory.

Key Features

Memory-efficient: O(p²) space complexity, where p is the number of variables (columns), independent of sample count
Numerically stable: Uses Welford’s algorithm to avoid catastrophic cancellation
Streaming updates: Process data in batches or one sample at a time
Parallel merging: Combine statistics from independent data streams
Exact results: Mathematically identical to standard batch computations, and equal to them in practice up to floating-point rounding

Quick Start

using RunningStats

# Create estimator and process data in batches
estimator = WelfordEstimate()
update_batch!(estimator, randn(1000, 3))

# Get all statistics at once
stats = get_statistics(estimator)
println("Processed $(stats.count) samples")
println("Means: $(stats.mean)")
println("Correlations:")
display(stats.correlation)

Installation

using Pkg
Pkg.add(url="https://github.com/anthony-meza/RunningStats.jl")

Usage

Basic Streaming Updates

using RunningStats

# Create estimator and process data incrementally
estimator = WelfordEstimate()

# Process data in batches (most common)
batch1 = randn(1000, 3)
stats = update_batch!(estimator, batch1)
println("Processed $(stats.count) samples, mean = $(round.(stats.mean, digits=3))")

# Continue with more data - no memory accumulation!
batch2 = randn(500, 3) 
update_batch!(estimator, batch2)
final_stats = get_statistics(estimator)

Single Sample Updates

# Process one sample at a time
estimator = WelfordEstimate()
for i in 1:1000
    sample = randn(3)  # 3-dimensional sample
    update_single!(estimator, sample)
end
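
The per-sample update can be sketched in plain Julia as follows. This is a minimal illustration of Welford's recursion and not the package's implementation: `SketchEstimate`, `sketch_update!`, and `sketch_cov` are hypothetical names introduced here, and the layout (one sample per row) is assumed from the examples above.

```julia
using Statistics

# Hypothetical illustration type; NOT the package's WelfordEstimate.
mutable struct SketchEstimate
    n::Int                 # number of samples seen so far
    mu::Vector{Float64}    # running mean
    M2::Matrix{Float64}    # running sum of outer products of deviations
end

SketchEstimate(p::Int) = SketchEstimate(0, zeros(p), zeros(p, p))

# One step of Welford's recursion for a single p-dimensional sample.
function sketch_update!(est::SketchEstimate, x::AbstractVector{<:Real})
    est.n += 1
    delta = x .- est.mu            # deviation from the old mean
    est.mu .+= delta ./ est.n      # move the mean toward the new sample
    delta2 = x .- est.mu           # deviation from the updated mean
    est.M2 .+= delta * delta2'     # rank-one update of the co-moment matrix
    return est
end

# Sample covariance (divides by n - 1, matching Statistics.cov).
sketch_cov(est::SketchEstimate) = est.M2 ./ (est.n - 1)

data = randn(1000, 3)
est = SketchEstimate(3)
for row in eachrow(data)
    sketch_update!(est, row)
end
```

Only the count, a length-p mean, and a p×p matrix are kept between samples, which is where the O(p²) memory footprint comes from.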

Parallel/Distributed Computing

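# Process data chunks independently, then merge
chunk1 = randn(1000, 3)   # example data; in practice each chunk comes from a separate worker or file
chunk2 = randn(500, 3)

estimator1 = WelfordEstimate()
estimator2 = WelfordEstimate()

update_batch!(estimator1, chunk1)
update_batch!(estimator2, chunk2)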

# Merge results (two approaches)
combined = merge_estimate(estimator1, estimator2)    # Creates new estimator
merge_estimate!(estimator1, estimator2)              # Merges into estimator1
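
To show what a merge has to compute, here is a self-contained sketch of the pairwise combination rule from Chan et al. It does not use the package: `sketch_summary` and `sketch_merge` are hypothetical helpers, and the chunk data are made up for the example.

```julia
using Statistics

# Summarize one chunk: count, mean vector, and co-moment matrix
# (sum of outer products of deviations from the chunk mean).
function sketch_summary(X::AbstractMatrix{<:Real})
    n = size(X, 1)
    mu = vec(mean(X, dims=1))
    D = X .- mu'                      # center each column
    return (n = n, mu = mu, M2 = D' * D)
end

# Combine two summaries without revisiting the underlying data.
function sketch_merge(a, b)
    n = a.n + b.n
    delta = b.mu .- a.mu
    mu = a.mu .+ delta .* (b.n / n)
    # The correction term accounts for the difference between the chunk means.
    M2 = a.M2 .+ b.M2 .+ (delta * delta') .* (a.n * b.n / n)
    return (n = n, mu = mu, M2 = M2)
end

chunk1 = randn(1000, 3)
chunk2 = randn(500, 3) .+ 2.0         # shifted, so the chunk means differ
merged = sketch_merge(sketch_summary(chunk1), sketch_summary(chunk2))

merged_cov = merged.M2 ./ (merged.n - 1)
reference_cov = cov(vcat(chunk1, chunk2))   # batch result on the concatenated data
```

In this sketch `merged_cov` should agree with `reference_cov` up to rounding, even though the two chunks were summarized independently.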

Available Statistics

stats = get_statistics(estimator)

# Access individual components
println("Sample count: $(stats.count)")
println("Means: $(stats.mean)")
println("Covariance matrix: $(stats.covariance)")
println("Correlation matrix: $(stats.correlation)")  
println("Variances: $(stats.variance)")

# Or get them individually
cov_matrix = get_covariance(estimator)
corr_matrix = get_correlation(estimator)
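
The variance and correlation fields are related to the covariance matrix in the usual way. The sketch below shows that relationship using only the standard library; the covariance here comes from `cov` as a stand-in for a streaming estimate, and whether the package derives its fields exactly this way is an assumption.

```julia
using Statistics, LinearAlgebra

data = randn(500, 3)
C = cov(data)                  # stand-in for a streaming covariance estimate

variances = diag(C)            # variances sit on the diagonal
sd = sqrt.(variances)          # per-variable standard deviations
R = C ./ (sd * sd')            # correlation: covariance scaled by sd_i * sd_j
```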

Performance & Accuracy

Numerical Accuracy

RunningStats agrees with the standard Julia functions to within floating-point rounding error:

using Statistics
data = randn(10000, 5)

# Standard approach 
julia_mean = mean(data, dims=1)[:]
julia_cov = cov(data)

# Streaming approach
estimator = WelfordEstimate()
update_batch!(estimator, data)
stats = get_statistics(estimator)

# Verify agreement to within floating-point tolerance
@assert maximum(abs.(julia_mean - stats.mean)) < 1e-14
@assert maximum(abs.(julia_cov - stats.covariance)) < 1e-14

Complexity

When to Use RunningStats

Use when:

Standard Julia may be better when:

API Reference

Types

Core Functions

Statistics Access

Merging Functions

Options

Algorithm Details

For detailed mathematical background including Welford’s recursion formulas and Chan’s parallel merging algorithm, see the mathematical documentation.
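
For orientation, the standard textbook forms of the two recursions are summarized below. The notation is chosen here and may differ from the package's documentation: x_n is the n-th sample, μ the running mean, C the co-moment matrix, Σ the sample covariance, and A, B label two independently processed streams.

```latex
\begin{aligned}
% Welford: single-sample update
\mu_n &= \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n}, \\
C_n &= C_{n-1} + (x_n - \mu_{n-1})(x_n - \mu_n)^{\top}, \\
\Sigma_n &= \frac{C_n}{n-1}, \\[6pt]
% Chan et al.: merging two partial results A and B
n &= n_A + n_B, \qquad \delta = \mu_B - \mu_A, \\
\mu &= \mu_A + \frac{n_B}{n}\,\delta, \\
C &= C_A + C_B + \frac{n_A n_B}{n}\,\delta\delta^{\top}.
\end{aligned}
```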

Key insight: Traditional variance computation σ² = E[X²] - (E[X])² suffers from catastrophic cancellation when the terms are nearly equal. Welford’s algorithm avoids this by computing deviations from the running mean, maintaining numerical stability even for ill-conditioned data.
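
The cancellation problem is easy to reproduce. The following sketch compares the one-pass E[X²] - (E[X])² formula with a scalar Welford recursion on data with a large offset; `naive_variance` and `welford_variance` are illustrative functions written for this example, not part of the package.

```julia
using Statistics

# One-pass textbook formula: accumulate sum and sum of squares, then subtract.
function naive_variance(x)
    n = length(x)
    s, s2 = 0.0, 0.0
    for v in x
        s += v
        s2 += v^2
    end
    return (s2 - s^2 / n) / (n - 1)
end

# Scalar Welford recursion: accumulate squared deviations from the running mean.
function welford_variance(x)
    n = 0
    mu = 0.0
    M2 = 0.0
    for v in x
        n += 1
        delta = v - mu          # deviation from the old mean
        mu += delta / n
        M2 += delta * (v - mu)  # uses the updated mean
    end
    return M2 / (n - 1)
end

x = 1e9 .+ randn(10_000)        # unit-scale noise on top of a large offset

reference = var(x)              # two-pass result from Statistics
naive_err = abs(naive_variance(x) - reference)
welford_err = abs(welford_variance(x) - reference)
```

With an offset this large, the two sums in `naive_variance` are around 1e22 while their difference should be around 1e4, so most significant digits are lost in the subtraction. The Welford version works with deviations of order one and stays close to the reference.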

References

  1. Welford, B. P. (1962). “Note on a method for calculating corrected sums of squares and products.” Technometrics, 4(3), 419-420.

  2. Chan, T. F., Golub, G. H., & LeVeque, R. J. (1983). “Algorithms for computing the sample variance: Analysis and recommendations.” The American Statistician, 37(3), 242-247.

  3. Knuth, D. E. (1998). The Art of Computer Programming, Volume 2: Seminumerical Algorithms (3rd ed.). Addison-Wesley.

License

MIT License - see LICENSE file for details.