The functional form of galaxy and halo luminosity and mass functions
Submitted to MNRAS, 2026
Recommended citation: A. Ford, H. Desmond, D.J. Bartlett and P.G. Ferreira (2026). "The functional form of galaxy and halo luminosity and mass functions." arXiv:2604.23236.
Abstract
The galaxy luminosity and stellar mass functions (LF, SMF) and the halo mass function (HMF) are fundamental quantities in astrophysics and crucial inputs to a range of astrophysical and cosmological analyses. They are typically parametrised by fitting functions chosen “by eye” to match observed or simulated data. We apply symbolic regression, specifically the Exhaustive Symbolic Regression (ESR) algorithm, to automate the search for optimal LF, SMF and HMF functional forms. ESR enumerates every function, up to a maximum complexity, that can be built from a user-defined basis set of operators. It scores each one by its description length, an approximation to the Bayesian evidence that balances accuracy against complexity. We find many functions that outperform the Schechter and double Schechter functions for the LF and SMF, and the Press–Schechter and Warren/Tinker functions for the HMF. By additionally imposing “physicality checks” on the functions’ extrapolation and integration properties, we identify the optimal low-complexity functional forms in terms of accuracy, simplicity and behaviour beyond the data range. As well as providing drop-in replacements for literature LF, SMF and HMF fitting functions and identifying robust behaviour shared across well-fitting functions, we present a framework with which symbolic regression can be used to automate the discovery of optimal functions for any astrophysical dataset.
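For orientation, the literature baselines named in the abstract can be written down compactly. Below is a minimal sketch, in standard per-dex form, of the Schechter and double Schechter functions and the Press–Schechter multiplicity function f(σ) (with the conventional δ_c ≈ 1.686). It also includes a toy version of "physicality checks" on extrapolation and integration. The abstract does not specify the paper's actual checks, so `passes_checks`, its three criteria, and the thresholds `pad`, `rtol` and the grid spacing are hypothetical illustrations only. All function names here are likewise assumptions for illustration.

```python
import math

LN10 = math.log(10.0)


def schechter_log(logm, phi_star, logm_star, alpha):
    """Schechter function per dex: ln(10) * phi* * x**(alpha + 1) * exp(-x), with x = M / M*."""
    x = 10.0 ** (logm - logm_star)
    return LN10 * phi_star * x ** (alpha + 1.0) * math.exp(-x)


def double_schechter_log(logm, phi1, alpha1, phi2, alpha2, logm_star):
    """Sum of two Schechter components sharing one characteristic mass M*."""
    x = 10.0 ** (logm - logm_star)
    return LN10 * math.exp(-x) * (phi1 * x ** (alpha1 + 1.0) + phi2 * x ** (alpha2 + 1.0))


def press_schechter_f(sigma, delta_c=1.686):
    """Press-Schechter multiplicity f(sigma) = sqrt(2/pi) * nu * exp(-nu**2 / 2), nu = delta_c / sigma."""
    nu = delta_c / sigma
    return math.sqrt(2.0 / math.pi) * nu * math.exp(-0.5 * nu * nu)


def mass_density(phi, lo, hi, n=4000):
    """Trapezoid estimate of the integral of M * phi(log10 M) d(log10 M) over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        logm = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * 10.0 ** logm * phi(logm)
    return total * h


def passes_checks(phi, data_lo, data_hi, pad=3.0, rtol=0.01):
    """Hypothetical physicality checks on a per-dex mass function phi(log10 M)."""
    # 1. Finite and non-negative everywhere we extrapolate to (NaN also fails here).
    n_grid = int(round((data_hi - data_lo + 2.0 * pad) / 0.05))
    for i in range(n_grid + 1):
        value = phi(data_lo - pad + 0.05 * i)
        if not (0.0 <= value < math.inf):
            return False
    # 2. Non-increasing beyond the highest-mass bin.
    tail = [phi(data_hi + 0.1 * k) for k in range(1, 31)]
    if any(b > a for a, b in zip(tail, tail[1:])):
        return False
    # 3. Mass density converges when either integration limit is pushed further out.
    rho = mass_density(phi, data_lo - pad, data_hi + pad)
    rho_low = mass_density(phi, data_lo - 2.0 * pad, data_hi + pad)
    rho_high = mass_density(phi, data_lo - pad, data_hi + 2.0 * pad)
    return abs(rho_low - rho) <= rtol * abs(rho) and abs(rho_high - rho) <= rtol * abs(rho)
```

For example, `passes_checks(lambda lm: schechter_log(lm, 1e-3, 10.7, -1.3), 8.0, 12.0)` accepts a Schechter function with α > −2, whose exponential cut-off keeps the tail falling and the mass density finite. A power law that rises towards high mass is rejected by the tail check. The convergence test only flags gross divergences (for instance α ≤ −2 at the low-mass end). It is a heuristic, not a proof of integrability.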
Comparison of the best ESR, Schechter and Bernardi fits to the LF (left) and SMF (right) data, for both the Sérsic (red) and cmodel (blue) photometries. The upper panels show the data and fits. The middle panels show the uncertainty-normalised residuals. The lower panels show the per-bin $\Delta$NLL (negative log-likelihood) contributions relative to the best ESR function; its own curve would be a flat line at 0 by construction and is not shown. The error bars in the upper panels show the asymmetric 68 per cent Poisson confidence interval ($16^\mathrm{th}$–$84^\mathrm{th}$ percentiles) on the count in each bin, converted to $\log\phi$; these are typically very small. The middle panels use the symmetric Gaussian approximation to the Poisson uncertainties, $\sigma_{\log\phi} = 1/(\ln 10\,\sqrt{N})$, where $N$ is the count in the bin.
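The uncertainty conventions in the caption can be made concrete with a short sketch. It rests on three assumptions not stated in the source. First, the 16th–84th percentile interval is read as the percentiles of a Poisson distribution whose mean equals the observed count. Second, the per-bin ΔNLL is taken to use a Poisson likelihood in the expected counts. Third, since log φ = log10(N / (V Δlog M)) for a fixed survey volume V and bin width, a count k maps to a log φ offset of log10(k / N). The function names and the "expected counts" inputs are hypothetical.

```python
import math

LN10 = math.log(10.0)


def poisson_logpmf(k, mu):
    """log P(X = k) for X ~ Poisson(mu), with mu > 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def poisson_percentile(mu, q):
    """Smallest integer k with P(X <= k) >= q for X ~ Poisson(mu)."""
    # Start ~10 sigma below the mean; the probability mass skipped there is negligible.
    k = max(0, int(mu - 10.0 * math.sqrt(mu) - 10.0))
    cdf = 0.0
    while True:
        cdf += math.exp(poisson_logpmf(k, mu))
        if cdf >= q:
            return k
        k += 1


def log_phi_errors(count):
    """Asymmetric (down, up) offsets in log10(phi) from the 16th and 84th Poisson percentiles."""
    lo = poisson_percentile(count, 0.16)
    hi = poisson_percentile(count, 0.84)
    down = math.log10(count / lo) if lo > 0 else math.inf
    up = math.log10(hi / count)
    return down, up


def gaussian_sigma_log_phi(count):
    """Symmetric approximation sigma_{log phi} = 1 / (ln(10) * sqrt(N))."""
    return 1.0 / (LN10 * math.sqrt(count))


def per_bin_delta_nll(counts, expected_model, expected_ref):
    """Per-bin Poisson NLL of a model minus that of a reference model (e.g. the best fit)."""
    return [
        -poisson_logpmf(n, mu_m) + poisson_logpmf(n, mu_r)
        for n, mu_m, mu_r in zip(counts, expected_model, expected_ref)
    ]
```

For well-populated bins (say N = 10^4) the asymmetric offsets from `log_phi_errors` agree with `gaussian_sigma_log_phi` to within a few per cent. This is why the symmetric approximation is adequate for the residual panels. Passing the reference model as both arguments to `per_bin_delta_nll` returns zeros, mirroring the flat line at 0 for the best ESR function.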
