Gaussian Splatting in 3D

The idea behind 3D Gaussian splatting is to represent the scene as a mixture of three-dimensional Gaussians, that is, the three-dimensional extension of the one-dimensional Gaussian. Each three-dimensional Gaussian is defined by a covariance matrix $\Sigma$ (in world coordinates) and is centered at the point (mean) $\mu$:

\begin{displaymath}
G(\mathbf{x}) = e^{-\frac{1}{2} \left( \mathbf{x} - \mu \right)^{\top} \Sigma^{-1} \left( \mathbf{x} - \mu \right) }
\end{displaymath} (9.97)
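
As an illustration of equation (9.97), the following is a minimal NumPy sketch that evaluates the unnormalized Gaussian at a point. The function name `gaussian_3d` and the numerical values are assumptions chosen for the example, not part of the original method.

```python
import numpy as np

def gaussian_3d(x, mu, sigma):
    """Unnormalized 3D Gaussian: exp(-0.5 (x - mu)^T Sigma^{-1} (x - mu))."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Solve Sigma * y = d rather than forming the explicit inverse.
    y = np.linalg.solve(sigma, d)
    return float(np.exp(-0.5 * d @ y))

# Hypothetical example: axis-aligned Gaussian with standard deviations 0.2, 0.1, 0.3.
mu = np.array([0.0, 0.0, 5.0])
sigma = np.diag([0.04, 0.01, 0.09])

value_at_mean = gaussian_3d(mu, mu, sigma)
value_one_sigma = gaussian_3d(mu + np.array([0.2, 0.0, 0.0]), mu, sigma)
```

With these values the Gaussian equals 1 at the mean and $e^{-1/2}$ one standard deviation away along the first axis.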

To be drawn, this Gaussian must first be transformed into camera coordinates through a rigid transformation $\mathbf{W}$ and then projected into image coordinates. Since the perspective projection of a 3D Gaussian is not exactly a Gaussian, one can instead approximate it by drawing a two-dimensional Gaussian in image space. In 2D, the covariance $\Sigma'$ becomes

\begin{displaymath}
\Sigma' = \mathbf{J} \mathbf{W} \Sigma \mathbf{W}^{\top} \mathbf{J}^{\top}
\end{displaymath} (9.98)

where $\mathbf{W}$ is only the rotational part of the transformation and $\mathbf{J}$ is the Jacobian of the perspective projection, used as a local approximation and evaluated at the rotated and translated point $(x,y,z)^{\top}$ in camera coordinates. For example, in the case of a pinhole camera projection:
\begin{displaymath}
\mathbf{J} = \begin{bmatrix}
k_u / z & 0 & - \frac{k_u x}{ z^2 } \\
0 & k_v / z & - \frac{k_v y}{ z^2 } \\
\end{bmatrix}
\end{displaymath} (9.99)

The matrix $\Sigma'$ is therefore of size $2 \times 2$ (ZPvBG01) and can be treated as the covariance matrix of a 2D Gaussian.
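
The projection of the covariance in equations (9.98) and (9.99) can be sketched as follows. This is only an illustrative example: the function names, the translation vector `t`, and the convention that a world point maps to camera coordinates as `W @ mu + t` are assumptions made here for the sake of a runnable snippet.

```python
import numpy as np

def pinhole_jacobian(p_cam, k_u, k_v):
    """Jacobian (2x3) of the pinhole projection at the camera-space point p_cam."""
    x, y, z = p_cam
    return np.array([
        [k_u / z, 0.0, -k_u * x / z**2],
        [0.0, k_v / z, -k_v * y / z**2],
    ])

def project_covariance(sigma, W, t, mu, k_u, k_v):
    """Approximate 2D covariance: J W Sigma W^T J^T."""
    # Assumed convention: camera point = rotation * world point + translation.
    p_cam = W @ mu + t
    J = pinhole_jacobian(p_cam, k_u, k_v)
    return J @ W @ sigma @ W.T @ J.T

# Hypothetical setup: camera aligned with the world frame, Gaussian 5 units ahead.
W = np.eye(3)
t = np.zeros(3)
mu = np.array([0.0, 0.0, 5.0])
sigma = np.diag([0.04, 0.01, 0.09])

sigma_2d = project_covariance(sigma, W, t, mu, k_u=500.0, k_v=500.0)
```

In this configuration the point lies on the optical axis, so the depth variance does not contribute and `sigma_2d` is the diagonal matrix with entries 400 and 100 (in squared pixels).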

In (KKLD23), a further step is taken. Since it is difficult to optimize a covariance matrix directly while keeping it valid (it must be positive semi-definite), the authors exploit the fact that $\Sigma$ describes an ellipsoid, which admits a minimal parameterization, instead of using all the entries of the matrix as unknowns. The idea is to use a scaling matrix $\mathbf{S}$ (3 DOF) and a rotation matrix $\mathbf{R}$ (another 3 DOF, typically represented by a quaternion, see section A.3):

\begin{displaymath}
\Sigma = \mathbf{R} \mathbf{S} \mathbf{S}^{\top} \mathbf{R}^{\top}
\end{displaymath} (9.100)

thus parameterizing the covariance of each Gaussian with 6 DOF. Note that, since $\mathbf{S}$ is diagonal, $\mathbf{S} \mathbf{S}^{\top} = \diag \left( s_x^2, s_y^2, s_z^2\right)$.
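
The factorization in equation (9.100) can be illustrated with a short NumPy sketch. The function names and the quaternion ordering $(w, x, y, z)$ are assumptions made for this example; the convention used in section A.3 may differ.

```python
import numpy as np

def quat_to_rotation(q):
    """Rotation matrix from a quaternion (w, x, y, z); the input is normalized first."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def covariance_from_scale_rotation(scale, q):
    """Sigma = R S S^T R^T, symmetric positive semi-definite by construction."""
    R = quat_to_rotation(q)
    S = np.diag(scale)
    M = R @ S
    return M @ M.T

# Hypothetical example: scales (0.2, 0.1, 0.3), rotation of 90 degrees about the z axis.
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
sigma = covariance_from_scale_rotation([0.2, 0.1, 0.3], q)
```

Because the matrix is built as $\mathbf{M}\mathbf{M}^{\top}$, any values of the six parameters give a valid covariance, which is what makes this form convenient for unconstrained optimization. Here the rotation swaps the first two axes, so `sigma` is diagonal with entries 0.01, 0.04, 0.09.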

Finally, each Gaussian can be associated with an RGB color or with spherical harmonics (SH) coefficients, in addition to an opacity parameter $\alpha$, which plays a role similar to the one it has in NeRF. In practice, the Gaussians are rendered from the nearest to the farthest until the accumulated opacity saturates.
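
The front-to-back rendering order can be sketched for a single pixel as follows. This is a simplified illustration under stated assumptions: the per-Gaussian values `alphas` are taken as the effective opacities already evaluated at the pixel, and the function name and the threshold `min_transmittance` are hypothetical choices for this example.

```python
import numpy as np

def composite_front_to_back(colors, alphas, depths, min_transmittance=1e-4):
    """Blend per-Gaussian colors at one pixel, nearest first, with early stop."""
    order = np.argsort(depths)  # nearest Gaussian first
    pixel = np.zeros(3)
    transmittance = 1.0  # fraction of light not yet absorbed
    for i in order:
        pixel += transmittance * alphas[i] * np.asarray(colors[i], dtype=float)
        transmittance *= 1.0 - alphas[i]
        if transmittance < min_transmittance:
            break  # opacity saturated: farther Gaussians are not visible
    return pixel, transmittance

# Hypothetical example: blue (far), red (near), green (middle, fully opaque).
colors = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)
alphas = np.array([0.5, 0.5, 1.0])
depths = np.array([3.0, 1.0, 2.0])

pixel, remaining = composite_front_to_back(colors, alphas, depths)
```

In this example the red Gaussian contributes half of the pixel, the opaque green one absorbs the rest, and the blue one behind it is never reached.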

Paolo Medici
2025-10-22