Machine Learning in Astronomy

Over the last few months, I have been thinking about how astronomy uses machine learning. As someone who came up through observational and computational astrophysics before spending time in industry ML, I find myself increasingly opinionated about when the field reaches for ML wisely, and when it doesn’t.

When Not to Use ML

The most common mistake I see is applying ML to problems that already have good physical models. If you can write down the likelihood, use it. A Gaussian process fits your light-curve residuals better than a neural network and tells you something about the underlying covariance structure. A least-squares period fit to pulsar timing residuals is interpretable in a way that a learned embedding never will be.
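
To make that concrete, here is a minimal sketch of a GP fit to light-curve residuals: synthetic data, a squared-exponential kernel, and hyperparameters fixed by hand rather than by maximizing the marginal likelihood, which is what you would do in practice. The point is that the numbers you end up with, an amplitude and a timescale, are themselves the covariance structure.

```python
# Minimal GP regression on synthetic light-curve residuals.
# Hyperparameters are fixed by hand here for brevity; in practice you
# would maximize the GP marginal likelihood over (amp, tau).
import numpy as np

rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0.0, 10.0, 40))                      # observation times [days]
resid = 0.3 * np.sin(2.0 * np.pi * t / 3.0) + 0.05 * rng.standard_normal(40)
sigma = 0.05                                                  # per-point uncertainty

def sq_exp_kernel(t1, t2, amp, tau):
    """Squared-exponential covariance: amp^2 * exp(-dt^2 / (2 tau^2))."""
    dt = t1[:, None] - t2[None, :]
    return amp**2 * np.exp(-0.5 * (dt / tau) ** 2)

amp, tau = 0.3, 1.5                                           # covariance amplitude and timescale
K = sq_exp_kernel(t, t, amp, tau) + sigma**2 * np.eye(t.size)
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))

# Predictive mean on a fine grid: the smooth model of the residuals.
t_grid = np.linspace(0.0, 10.0, 200)
mean_pred = sq_exp_kernel(t_grid, t, amp, tau) @ alpha
print(f"amp = {amp}, tau = {tau} d, max |predicted residual| = {np.abs(mean_pred).max():.3f}")
```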

There’s also a subtler failure mode: using ML as a substitute for understanding your data. I spent time working with Gaia XP spectra and the temptation is real — you have 336 coefficients per source and millions of sources, so you reach for UMAP and start looking for clusters. But if you haven’t thought carefully about normalization, about what the BP/RP system actually resolves near H-alpha at R~50, or about which features drive the clustering (stellar type, reddening, or actual emission?), you’ll find structure that doesn’t mean what you think it means. The algorithm will always give you an answer. That’s the danger.
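
For illustration only, here is the shape of that pipeline with the normalization step made explicit. The arrays are random stand-ins for a real XP coefficient table, and the final check, correlating the embedding axes with colour, is one cheap way to ask whether the structure you found is just colour or reddening in disguise.

```python
# Illustrative UMAP pipeline with the normalization step spelled out.
# The arrays below are random stand-ins for a real XP sample; the
# coefficient count matches the figure quoted in the text above.
import numpy as np
import umap

rng = np.random.default_rng(1)
n_sources, n_coeffs = 5000, 336
coeffs = rng.standard_normal((n_sources, n_coeffs))           # stand-in for XP coefficients
flux_g = np.exp(rng.normal(0.0, 1.0, n_sources))              # overall brightness scale
bp_rp = rng.normal(1.0, 0.5, n_sources)                       # colour, for the sanity check

# Normalize out overall brightness so the embedding responds to spectral
# shape rather than apparent magnitude -- the step that is easy to skip.
X = coeffs / flux_g[:, None]

embedding = umap.UMAP(n_neighbors=30, min_dist=0.1, random_state=0).fit_transform(X)

# Sanity check: if the map is mostly a colour (or reddening) sequence,
# the "clusters" are not new physics.
for axis in range(2):
    r = np.corrcoef(embedding[:, axis], bp_rp)[0, 1]
    print(f"axis {axis}: correlation with BP-RP = {r:+.2f}")
```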

Similarly, I’m skeptical of ML for small, well-characterized samples. If you have thirty long-period transients and you want to classify them, a decision tree with physically motivated features probably outperforms a neural network, and you’ll be able to explain to a referee why it makes the cuts it does.
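
A sketch of what that looks like, with hypothetical feature names and a random stand-in for the feature table: a depth-limited tree, leave-one-out validation so every object in the small sample gets a fair test, and a printout of the cuts.

```python
# Shallow decision tree on physically motivated features, validated with
# leave-one-out. The feature table and labels here are random stand-ins;
# swap in your own measurements.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
features = ["peak_abs_mag", "rise_time_days", "decline_rate", "host_offset_kpc"]
df = pd.DataFrame(rng.standard_normal((30, 4)), columns=features)
df["label"] = (df["rise_time_days"] > 0).astype(int)          # placeholder classes

X, y = df[features].to_numpy(), df["label"].to_numpy()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
print("leave-one-out accuracy:", cross_val_score(tree, X, y, cv=LeaveOneOut()).mean())

# Fit on everything and print the cuts -- the part you can defend to a referee.
tree.fit(X, y)
print(export_text(tree, feature_names=features))
```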

When to Use ML

ML earns its keep when the problem is genuinely high-dimensional and the physics doesn’t give you a tractable likelihood. Searching for anomalies in LSST-scale photometric surveys. Rapid transient classification when you need a decision in seconds for follow-up triggering. Neural posterior estimation for gravitational wave parameter inference, where the likelihood is expensive to evaluate. Emulating simulations — replacing a costly N-body or stellar evolution call with a surrogate that runs in microseconds.
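
The emulator case is the easiest to sketch. Here `expensive_simulation` is a stand-in for the slow physics code, and the surrogate is a plain scikit-learn MLP chosen for brevity rather than as a recommendation; any regressor that is fast at prediction time would do.

```python
# Replace a costly simulation call with a fast learned surrogate.
# `expensive_simulation` is a placeholder for a slow physics code that
# maps parameters to a summary statistic.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def expensive_simulation(theta):
    """Placeholder for the slow code: parameters -> scalar summary."""
    return np.sin(theta[0]) * np.exp(-theta[1]) + 0.1 * theta[0] * theta[1]

rng = np.random.default_rng(0)
thetas = rng.uniform([-3.0, 0.0], [3.0, 2.0], size=(2000, 2))   # training design
ys = np.array([expensive_simulation(th) for th in thetas])       # slow part, done once

surrogate = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
)
surrogate.fit(thetas, ys)

# At prediction time the surrogate is cheap enough to sit inside an MCMC
# or optimization loop where the real code could not.
theta_new = np.array([[1.2, 0.5]])
print(surrogate.predict(theta_new), expensive_simulation(theta_new[0]))
```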

The other legitimate use case is when you want to be surprised. UMAP and clustering methods are exploratory tools. I’ve found them useful exactly once for scientific discovery: when I was genuinely uncertain what structure existed in a population and wanted an unbiased look before imposing any cuts. Used that way — as a first pass, not a final answer — dimensionality reduction can surface things you wouldn’t have found by hypothesis-testing your priors one at a time.

The honest version of this principle: ML is a good choice when the alternative is not a clean physical model, but rather an astronomer’s intuition that is itself implicit, inconsistent, and hard to audit. In that regime, a well-validated classifier is actually more interpretable than the status quo, because at least you can measure its failure modes.
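
That last claim is cheap to act on. A sketch, on synthetic data with a random forest standing in for whatever classifier you actually use: the held-out confusion matrix and per-class report are the audit that informal intuition never provides.

```python
# Measure a classifier's failure modes on a held-out set.
# Synthetic data and a random forest serve as placeholders here.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Which classes get mistaken for which, and how often -- the part you
# can actually report and track over time.
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```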

— Tony