StochasticSplats — Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting

Abstract

3D Gaussian Splatting (3DGS) represents scenes with tens of thousands of anisotropic Gaussians that are "splatted" onto the image plane and alpha-blended front-to-back. The mandatory depth sort is the Achilles heel of classic 3DGS:

𝒪(N log N) CPU/GPU work per frame
branch-divergent fragment shaders
"popping" artifacts when the sort order changes under camera motion
paradoxically, lower-resolution renders are not cheaper because sort cost is scene-, not pixel-, bound

Core Idea

StochasticSplats drops the sort entirely. Each pixel receives K Monte-Carlo samples of the volume-rendering equation, choosing splats probabilistically instead of deterministically front-to-back. The estimator is:

Ĉ(x) = (1/K) * Σ[k=1 to K] (σ(g_{i_k}, x) / p(i_k | x)) * T_{k-1}

T_{k-1} = Π[j<k] (1 - α_{i_j})

where i_k is the splat drawn for the k-th sample and p(i_k | x) ∝ the expected contribution of splat i_k to pixel x. This is an unbiased estimator of classical alpha compositing; its variance falls as 1/K.
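As a minimal sketch of why an importance-weighted estimator of this kind is unbiased, and why matching p to the contribution matters, the Python below treats the per-splat contributions to one pixel as known numbers f_i. That is an assumption made purely for illustration: in a renderer they are not available up front. The function names and example values are hypothetical, not taken from the paper. The code computes the exact mean and variance of (1/K) Σ f(i_k) / p(i_k) for a given sampling distribution and draws one random estimate.

```python
import random


def exact_mean_and_variance(contributions, probs, k):
    """Exact mean and variance of (1/k) * sum_j f[i_j] / p[i_j], with i_j drawn from probs.

    Unbiasedness requires probs[i] > 0 wherever contributions[i] != 0.
    """
    ratios = [f / p for f, p in zip(contributions, probs)]
    mean = sum(p * r for p, r in zip(probs, ratios))
    second_moment = sum(p * r * r for p, r in zip(probs, ratios))
    # Variance of a single sample, divided by k for the average of k samples.
    return mean, (second_moment - mean * mean) / k


def sample_estimate(contributions, probs, k, rng):
    """Draw k splat indices from probs and average the weighted contributions."""
    indices = rng.choices(range(len(contributions)), weights=probs, k=k)
    return sum(contributions[i] / probs[i] for i in indices) / k


# Hypothetical contributions of four splats to one pixel (they sum to 0.6).
f = [0.30, 0.05, 0.15, 0.10]
uniform = [0.25] * 4
proportional = [x / sum(f) for x in f]

print(exact_mean_and_variance(f, uniform, k=4))
print(exact_mean_and_variance(f, proportional, k=4))
print(sample_estimate(f, uniform, k=4, rng=random.Random(0)))
```

Both distributions give the same mean (the plain sum of the contributions), so the estimate is unbiased either way. Only the variance changes: it halves when k doubles and vanishes, up to rounding, when p is exactly proportional to the contribution.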

1. Introduction


2. Mathematical Analysis

2.1 Complexity

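Let N be the splat count, P the pixel count, K the number of samples per pixel, and M the average number of splats shaded per pixel in the sorted pipeline.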

Pipeline          Expected operations
Sorted splats     P⋅M shade + N log N sort
StochasticSplats  P⋅K shade (no sort)

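The break-even K* is where the two costs match, P⋅K* = P⋅M + N log N, which gives:

K* = M + (N log N) / P

For typical scenes (N≈60k, P≈1M, M≈40) the sort term N log N / P is roughly 1 (0.95 with a base-2 logarithm), so K* ≈ 41. Empirically a visually clean render needs 4–8 spp, well below the break-even, so stochastic wins.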

2.2 Variance Bound

Assuming per-pixel transmittance T and bounded per-splat contribution σ ≤ σ_max:

Var[Ĉ] ≤ (σ_max² / K) * E[T]

The render-time speed/quality trade-off is therefore explicit and tunable through K.
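To make the tuning concrete, the bound can be inverted to get a sample budget for a target variance. The sketch below is a worked rearrangement of the inequality above, not a procedure from the paper. The function names are hypothetical, and treating the upper bound as the budget is a conservative assumption.

```python
import math


def variance_bound(sigma_max, mean_transmittance, k):
    """Upper bound on Var[C_hat]: sigma_max^2 * E[T] / k."""
    return sigma_max ** 2 * mean_transmittance / k


def samples_for_variance(sigma_max, mean_transmittance, target_variance):
    """Smallest k whose bound does not exceed target_variance (at least 1)."""
    k = math.ceil(sigma_max ** 2 * mean_transmittance / target_variance)
    return max(1, k)


# Example values are illustrative only.
print(variance_bound(1.0, 0.5, 4))
print(samples_for_variance(1.0, 0.5, 0.0625))
```

Because the bound scales as 1/K, halving the target variance doubles the required sample count.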

3. Implementation

Re-use the GPU depth buffer by keeping only the nearest hit per stratum ("stochastic transparency").
Importance-sample splats with a BVH built once after training.
Single-pass GLSL shader; differentiable via re-parameterisation for training/fine-tuning.
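The stochastic-transparency idea in the first item can be sketched on the CPU for a single pixel. This is an illustration in plain Python, not the paper's GLSL shader, and it rests on several assumptions: scalar (grayscale) colours, a black background, independent coin flips instead of stratified samples, and hypothetical function names. Each sample keeps a splat with probability equal to its alpha, and a min-depth test (the role of the depth buffer) retains the nearest survivor, so the splats can be visited in any order.

```python
import random


def composite_sorted(splats):
    """Reference: front-to-back alpha compositing of (depth, alpha, color) splats."""
    color, transmittance = 0.0, 1.0
    for _, alpha, c in sorted(splats, key=lambda s: s[0]):
        color += transmittance * alpha * c
        transmittance *= 1.0 - alpha
    return color


def composite_stochastic(splats, k, rng):
    """Sorting-free estimate: average k samples, each showing its nearest surviving splat."""
    total = 0.0
    for _ in range(k):
        nearest_depth, nearest_color = float("inf"), 0.0  # black background
        for depth, alpha, c in splats:  # visited in arbitrary order
            # A splat survives with probability alpha; the depth test keeps the nearest.
            if rng.random() < alpha and depth < nearest_depth:
                nearest_depth, nearest_color = depth, c
        total += nearest_color
    return total / k


# Illustrative splats, deliberately not in depth order.
splats = [(2.0, 0.5, 1.0), (1.0, 0.5, 0.2), (3.0, 0.8, 0.6)]
print(composite_sorted(splats))
print(composite_stochastic(splats, k=20000, rng=random.Random(0)))
```

A splat is the visible one in a sample exactly when it survives and every nearer splat does not, which happens with probability α_i Π (1 - α_j) over nearer splats j. That is its weight in classical compositing, so the average converges to the sorted result.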

4. Results

Resolution 1280×720, RTX 4090, 60k splats.

Samples/pixel   Time (ms)   PSNR↑   DSSIM↓
?               2.7         28.7    0.043
?               2.9         30.1    0.033
?               3.3         32.8    0.018
?               5.6         34.2    0.011
?               6.5         36.9    0.006
Sorted 3DGS     14.7        33.4    0.015

StochasticSplats is 4–5 × faster than depth-sorted splats at comparable quality (3.3 ms at 32.8 dB PSNR vs 14.7 ms at 33.4 dB) and eliminates popping artifacts.

5. Ablations

No importance sampling → 2 × variance.
No depth-buffer early-out → 40% slower.
A lower-resolution render (640 × 360) takes 1.8 ms at 4 spp; sorted 3DGS barely speeds up (13.2 ms), confirming that the sort is the bottleneck.

6. Discussion & Limitations

High-frequency transparency (foliage) needs ≥8 spp.
Mobile GPUs: random sampling incurs divergent memory access; a hashed-grid sampler could help.
Training-time gradients have extra variance; we use five-sample antithetic pairs to stabilise.
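The text does not say how the antithetic pairs are built, so the following is only a generic illustration of the technique. It assumes "five-sample antithetic pairs" means five pairs (u, 1 - u) of uniform random numbers, and it uses a simple stand-in integrand rather than an actual gradient. All names are hypothetical. The sketch compares the empirical variance of five antithetic pairs with that of ten independent samples, so both estimators evaluate the integrand the same number of times.

```python
import random


def integrand(u):
    """Stand-in for a quantity that varies smoothly with the random number u."""
    return u * u


def independent_estimate(f, n_samples, rng):
    """Plain Monte-Carlo average over independent uniform samples."""
    return sum(f(rng.random()) for _ in range(n_samples)) / n_samples


def antithetic_estimate(f, n_pairs, rng):
    """Average over pairs (u, 1 - u); the errors of the two halves partly cancel."""
    total = 0.0
    for _ in range(n_pairs):
        u = rng.random()
        total += 0.5 * (f(u) + f(1.0 - u))
    return total / n_pairs


def empirical_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


rng = random.Random(0)
trials = 2000
var_independent = empirical_variance(
    [independent_estimate(integrand, 10, rng) for _ in range(trials)]
)
var_antithetic = empirical_variance(
    [antithetic_estimate(integrand, 5, rng) for _ in range(trials)]
)
print(var_independent, var_antithetic)
```

For a monotone integrand like this one, the two halves of each pair are negatively correlated, which is what lowers the variance at an equal evaluation count. The gain for real gradients depends on how closely they behave this way.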

7. Conclusion

StochasticSplats reframes 3D Gaussian Splatting as Monte-Carlo volume rendering:

removes depth sorting,
exposes a continuous speed-quality dial,
attains real-time framerates while matching classical alpha compositing in expectation.

Future work: space–time sampling for motion-blurred splats and hardware ray-query support for even lower variance.
