Exhaustive Symbolic Regression
Published in IEEE Transactions on Evolutionary Computation 28, 950, 2023
Recommended citation: D.J. Bartlett, H. Desmond and P.G. Ferreira (2023). "Exhaustive Symbolic Regression." In IEEE Transactions on Evolutionary Computation 28, 950.
Abstract
Symbolic Regression (SR) algorithms attempt to learn analytic expressions which fit data accurately and in a highly interpretable manner. Conventional SR suffers from two fundamental issues which we address here. First, these methods search the space stochastically (typically using genetic programming) and hence do not necessarily find the best function. Second, the criteria used to select the equation optimally balancing accuracy with simplicity have been variable and subjective. To address these issues we introduce Exhaustive Symbolic Regression (ESR), which systematically and efficiently considers all possible equations (made with a given basis set of operators and up to a specified maximum complexity) and is therefore guaranteed to find the true optimum (if parameters are perfectly optimised) and a complete function ranking subject to these constraints. We implement the minimum description length principle as a rigorous method for combining accuracy and simplicity into a single objective. To illustrate the power of ESR we apply it to a catalogue of cosmic chronometers and the Pantheon+ sample of supernovae to learn the Hubble rate as a function of redshift, finding $\sim$40 functions (out of 5.2 million trial functions) that fit the data more economically than the Friedmann equation. These low-redshift data therefore do not uniquely prefer the expansion history of the standard model of cosmology. We make our code and full equation sets publicly available.
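The idea of an exhaustive search ranked by a description-length score can be illustrated with a toy sketch. This is not the ESR code or its API: the basis set, the tuple representation of expressions, the helper names (`enumerate_trees`, `evaluate`, `description_length`) and the data are all assumptions made for illustration. The score is a simplified, parameter-free stand-in for a minimum description length objective (a Gaussian negative log-likelihood plus k log n, where k is the number of nodes and n the number of distinct symbols used), and no parameter optimisation or removal of duplicate equations is attempted.

```python
import math

# Hypothetical toy basis set (an assumption, not the ESR default).
LEAVES = ("x", "1")
UNARY = ("sq", "inv")
BINARY = ("+", "*")


def enumerate_trees(k):
    """Yield every expression tree with exactly k nodes, as nested tuples."""
    if k == 1:
        for leaf in LEAVES:
            yield (leaf,)
        return
    for op in UNARY:
        for child in enumerate_trees(k - 1):
            yield (op, child)
    for op in BINARY:
        # Split the remaining k - 1 nodes between the two children.
        for left_size in range(1, k - 1):
            for left in enumerate_trees(left_size):
                for right in enumerate_trees(k - 1 - left_size):
                    yield (op, left, right)


def evaluate(tree, x):
    """Evaluate a tree at a single float x."""
    head = tree[0]
    if head == "x":
        return x
    if head == "1":
        return 1.0
    if head == "sq":
        return evaluate(tree[1], x) ** 2
    if head == "inv":
        return 1.0 / evaluate(tree[1], x)
    a, b = evaluate(tree[1], x), evaluate(tree[2], x)
    return a + b if head == "+" else a * b


def size(tree):
    """Number of nodes in the tree (the toy complexity measure)."""
    return 1 + sum(size(child) for child in tree[1:])


def symbols(tree):
    """Set of distinct basis symbols appearing in the tree."""
    found = {tree[0]}
    for child in tree[1:]:
        found |= symbols(child)
    return found


def description_length(tree, xs, ys, sigma):
    """Simplified score: Gaussian negative log-likelihood + k * log(n)."""
    try:
        nll = sum((y - evaluate(tree, x)) ** 2 for x, y in zip(xs, ys))
        nll /= 2.0 * sigma**2
    except (ZeroDivisionError, OverflowError):
        return math.inf
    return nll + size(tree) * math.log(len(symbols(tree)))


# Made-up, noise-free data drawn from y = x^2 + 1.
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [x**2 + 1.0 for x in xs]

MAX_COMPLEXITY = 5
ranking = sorted(
    (description_length(tree, xs, ys, sigma=0.1), tree)
    for k in range(1, MAX_COMPLEXITY + 1)
    for tree in enumerate_trees(k)
)
best_score, best = ranking[0]
print(best, best_score)
```

Because every tree up to the maximum complexity is visited, `ranking` is a complete ordering of the candidates under this toy score, which is the property the abstract attributes to an exhaustive search. The number of trees grows very quickly with complexity, so a naive enumerator like this one is only practical for small limits.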
Code
ESR is publicly available.
Pareto front of solutions found by different SR algorithms for the feynman_I_6_2a benchmark dataset. By construction, ESR finds the optimum solution at a given complexity, whereas other methods have an unknown (high) probability of failing to find it. The vertical dashed line indicates a break in the x-axis.
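As a minimal sketch of what a Pareto front means in this context, the function below keeps only the candidates for which no other candidate is both simpler (or equally simple) and more accurate. The function name `pareto_front` and the (complexity, loss) values are hypothetical and are not taken from the benchmark in the figure.

```python
def pareto_front(candidates):
    """Return the (complexity, loss) pairs not dominated by any other pair.

    A pair is kept only if its loss is strictly lower than that of every
    candidate with equal or lower complexity.
    """
    front = []
    best_loss = float("inf")
    # Sorting orders by complexity first, then by loss within a complexity.
    for complexity, loss in sorted(candidates):
        if loss < best_loss:
            front.append((complexity, loss))
            best_loss = loss
    return front


# Hypothetical (complexity, loss) pairs for illustration only.
candidates = [(1, 5.0), (2, 3.0), (2, 4.0), (3, 3.5), (4, 1.0), (5, 1.0)]
front = pareto_front(candidates)
print(front)
```

An algorithm that misses the best function at some complexity would place a point above this front at that complexity, which is the failure mode the caption contrasts with an exhaustive search.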
