
Tuesday, January 23, 2024

Shotgun Review in Fitting Metrics

In machine learning and statistical modeling, evaluating the quality of a model's predictions is fundamental: you want to know whether the neural network or polynomial function captured the observed data. Two commonly used metrics are Mean Squared Error (MSE) and Chi-squared ($\chi^2$). While both measure goodness of fit, they differ in their approaches and applications. In this post I'm going to shotgun review the differences, emphasizing how $\chi^2$ incorporates the standard deviation of the observed data points.

Mean Squared Error (MSE): Basic staple in ML

MSE is widely used in machine learning and statistics for assessing model accuracy. It is defined as:

\begin{equation} \text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y_i})^2 \label{eq:mse} \end{equation}

where $y_i$ are the observed values, $\hat{y_i}$ are the predicted values, and $N$ is the number of observations. MSE is favored for its simplicity and effectiveness, especially when the goal is accurate prediction across a diverse dataset. Keep in mind that if you're dealing with datasets in the tens of thousands of data points, a metric like MSE makes evaluation very straightforward.
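The definition above is a one-liner in practice. Here is a minimal sketch with NumPy, using small made-up observed/predicted values for illustration:

```python
import numpy as np

def mse(y_obs, y_pred):
    """Mean squared error between observed and predicted values."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_obs - y_pred) ** 2)

# Hypothetical data for illustration
y = [10.0, 20.0, 30.0]
y_hat = [12.0, 19.0, 29.0]
print(mse(y, y_hat))  # (4 + 1 + 1) / 3 = 2.0
```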

Chi-Squared ($\chi^2$): Fit quality using data point uncertainties

\begin{equation} \chi^2 = \sum_{i=1}^{N} \left( \frac{y_i - f(x_i)}{\sigma_i} \right)^2 \label{eq:chi} \end{equation}

Chi-squared is useful when the standard deviations or uncertainties of the observed values are known and vary from point to point. In $\chi^2$, each term of the sum is weighted by the inverse variance of the corresponding observation, making the metric sensitive to model fit relative to data point accuracy. One way to think about what $\chi^2$ tells us is to compare the numerator and denominator of each term: if the difference in the numerator is large while the uncertainty is small, we get a large value, indicating poor predictive ability of the fit. If the difference is small compared to the uncertainty, the fit could be too aggressive (i.e., overfitted). Just to idle on this a bit more, let's put in some actual numbers:

\begin{align*} \chi^2 &= \left( \frac{10 - 12}{2} \right)^2 + \left( \frac{20 - 19}{3} \right)^2 + \left( \frac{30 - 29}{4} \right)^2 \\ &= 1 + 0.1111 + 0.0625 \\ &\approx 1.1736 \end{align*}

Upon inspection you can see how individual terms change based on the numerator and denominator values.
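The same worked example can be reproduced in a few lines of NumPy; the numbers below are the ones from the sum above, with $\sigma_i$ of 2, 3, and 4:

```python
import numpy as np

def chi_squared(y_obs, y_model, sigma):
    """Chi-squared: squared residuals weighted by per-point uncertainty."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_model = np.asarray(y_model, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return np.sum(((y_obs - y_model) / sigma) ** 2)

# Same numbers as the worked example above
y = [10.0, 20.0, 30.0]
f_x = [12.0, 19.0, 29.0]
sig = [2.0, 3.0, 4.0]
print(chi_squared(y, f_x, sig))  # ≈ 1.1736
```

Note that, unlike MSE, dividing by $N$ is optional; the related reduced chi-squared divides by the degrees of freedom instead.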

Propagating Uncertainty in Model Parameters

Both MSE and $\chi^2$ provide an average perspective on the quality of fit. However, an important aspect of model evaluation is understanding the uncertainty in the fit parameters themselves, such as the weights in a neural network or the coefficients of a polynomial. This is particularly meaningful when there is error/uncertainty on the observed values. To capture this we have to use uncertainty propagation, which is crucial because:

  • In models with higher parameter uncertainty, predictions might be less reliable, even if the overall fit quality (as measured by MSE or $\chi^2$) is good.
  • Techniques like error propagation, confidence intervals, and Bayesian methods can be used to quantify the uncertainty in model parameters.
  • Understanding parameter uncertainty helps in assessing the model's predictive power and robustness, particularly in scenarios where predictions are used for critical decision-making.
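As a concrete sketch of the second bullet, NumPy's `polyfit` can propagate known per-point uncertainties into a covariance matrix for the fitted coefficients when given weights $w_i = 1/\sigma_i$. The data below are synthetic (a made-up line plus noise), purely for illustration:

```python
import numpy as np

# Synthetic noisy data with known per-point uncertainties (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
sigma = np.full_like(x, 0.5)
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma)

# Weighted linear fit; cov='unscaled' keeps the covariance consistent
# with the supplied absolute uncertainties (weights w = 1/sigma).
coeffs, cov = np.polyfit(x, y, deg=1, w=1.0 / sigma, cov='unscaled')
param_err = np.sqrt(np.diag(cov))  # 1-sigma uncertainty on slope and intercept

print(coeffs)     # fitted [slope, intercept], close to the true [2.0, 1.0]
print(param_err)  # standard errors on the fitted parameters
```

The diagonal of the covariance matrix gives the variance of each parameter, so reporting the fit as slope ± error directly reflects how the observational uncertainties propagate into the model.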

Summary

Both MSE and $\chi^2$ are valuable for assessing model prediction quality, but their applicability usually depends on the nature of the data and the modeling objectives. In ML, $\chi^2$ seems rarely used (I personally haven't seen it), whereas MSE is a standard loss function. Propagating uncertainty into the model parameters further adds to the complexity of model evaluation, but it can be very important in assessing model reliability.



