Research

I am interested in applying machine learning (neural networks, CNNs, contrastive learning, generative AI, transformers) to large astronomical datasets. A key ingredient in studying cosmology and galaxy evolution is knowing the distances to galaxies, and estimating these distances is the primary focus of my research.

JAX Neural Network to Predict Galactic Neon Lights (Published in MNRAS)

If you've ever seen images of spiral galaxies, you might have noticed glowing red or pink spots sprinkled throughout. These galactic neon lights are actually clouds of hydrogen gas lit up by the intense radiation of young, massive stars. The process is similar to how neon signs work, except that the gas is excited by starlight instead of electricity!

M51 galaxy

If starlight is responsible for these glowing regions, can a neural network predict them from the surrounding stellar light? In this project, we trained a neural network implemented in JAX on the DESI Early Data Release to predict the strengths of galaxies' emission lines (the "neon lights") from their starlight.
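The core idea, regressing emission-line strengths on the stellar continuum, can be sketched in plain JAX as below. This is a minimal illustration under stated assumptions, not the published model: the layer sizes, learning rate, plain gradient descent, and the synthetic arrays standing in for DESI continua and line strengths are all hypothetical.

```python
import jax
import jax.numpy as jnp


def init_mlp(key, sizes):
    """He-initialize weights and zero biases for a fully connected network."""
    keys = jax.random.split(key, len(sizes) - 1)
    params = []
    for k, n_in, n_out in zip(keys, sizes[:-1], sizes[1:]):
        w = jax.random.normal(k, (n_in, n_out)) * jnp.sqrt(2.0 / n_in)
        params.append((w, jnp.zeros(n_out)))
    return params


def predict(params, x):
    """Map stellar continua (batch, n_pix) to line strengths (batch, n_lines)."""
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
    w, b = params[-1]
    return x @ w + b  # linear output layer for regression


def mse_loss(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)


@jax.jit
def sgd_step(params, x, y):
    lr = 1e-3  # hypothetical learning rate
    loss, grads = jax.value_and_grad(mse_loss)(params, x, y)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss


# Synthetic stand-in data: 256 "galaxies", 100 continuum pixels, 3 emission lines.
k_x, k_w, k_init = jax.random.split(jax.random.PRNGKey(0), 3)
x = jax.random.normal(k_x, (256, 100))
y = x @ (0.1 * jax.random.normal(k_w, (100, 3)))  # toy linear "truth"

params = init_mlp(k_init, [100, 64, 3])
losses = []
for _ in range(300):
    params, loss = sgd_step(params, x, y)
    losses.append(float(loss))
```

Plain JAX keeps the sketch dependency-free; a real training loop would more likely use a library such as Optax for the optimizer and mini-batches of real spectra.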

When the light from a galaxy is split into its constituent wavelengths, it produces a spectrum. Most of the light comes from stars, which produce a smooth spectrum (the continuum), while the neon lights from the gas appear as spikes on top of it. These spikes are the emission lines. This project shows that the two components are strongly correlated. This is not surprising: emission lines are produced when the gas is ionized by light from the stars. In addition, the starlight encodes the galaxy's star-formation history, which is what determines the content of the gas.
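To make the "smooth continuum plus spikes" picture concrete, here is a toy spectrum around H-alpha (rest wavelength about 6562.8 Å) built from a linear continuum and a Gaussian emission line. The line strength is recovered by fitting the continuum away from the line and summing the excess flux. All numbers are made up for illustration, and real spectral fitting is considerably more involved.

```python
import jax.numpy as jnp

# Toy wavelength grid around H-alpha, in Angstrom (0.5 A spacing).
wave = jnp.linspace(6400.0, 6700.0, 601)
dwave = wave[1] - wave[0]

# Hypothetical line parameters: integrated flux, center, Gaussian width.
true_flux, center, sigma = 5.0, 6562.8, 3.0
offset = wave - center  # work in offsets to keep the polynomial fit well conditioned

continuum = 1.0 + 2e-4 * offset  # smooth, slowly varying "starlight"
line = true_flux / (sigma * jnp.sqrt(2.0 * jnp.pi)) * jnp.exp(-0.5 * (offset / sigma) ** 2)
spectrum = continuum + line  # the emission line is a spike on top of the continuum

# Fit a straight-line continuum using only pixels far from the line.
far = jnp.abs(offset) > 10 * sigma
coeffs = jnp.polyfit(offset[far], spectrum[far], 1)
continuum_fit = jnp.polyval(coeffs, offset)

# Line strength = excess flux above the fitted continuum, summed over wavelength.
measured_flux = jnp.sum(spectrum - continuum_fit) * dwave
```

Because the continuum in this toy is exactly linear, `measured_flux` comes out very close to `true_flux`; with real spectra the continuum is shaped by the stellar populations and must be modeled much more carefully.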

Interactive UMAP visualization

To visualize this correlation, I created an interactive plot using Dash and Plotly. The top-left panel shows a 2D UMAP projection of the starlight for 1,000 galaxies, color-coded by H-alpha emission strength. You can explore an interactive version at this link (it may take a moment to load).

Estimating Distances to Galaxies from Space-based Images Using Semi-Supervised Deep Learning (Talk at AstroAI 2024)

In this project, I developed a semi-supervised deep learning algorithm to improve distance estimates of far-away galaxies (redshifts > 0.3).

Distances to galaxies are a key ingredient in understanding how the universe and galaxies formed and evolved. However, measuring distances reliably requires spectroscopy, a resource-intensive technique. A more efficient (but less accurate) approach is to estimate distances from galaxy images. Given reliable spectroscopic measurements for a fraction of the imaged galaxies, machine learning algorithms can be trained to estimate the distances of the remaining galaxies.

Upcoming space-based observatories, such as the Nancy Grace Roman Space Telescope, will provide high-resolution images of hundreds of millions of distant galaxies. In this project, we curated a catalog of ~100,000 Hubble Space Telescope (HST) images, ~20,000 of which have reliable distance labels, to test deep learning algorithms in preparation for future observatories.
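The page does not describe the model in detail, so the following is only a sketch of one common semi-supervised pattern, assuming a shared encoder: every image, labeled or not, contributes a reconstruction loss, while only the labeled subset contributes a redshift regression loss. The linear encoder/decoder, the loss weight `alpha`, the function names, and the random toy data are all hypothetical.

```python
import jax
import jax.numpy as jnp


def init_params(key, n_pix, n_latent):
    """Hypothetical linear encoder/decoder plus a scalar redshift head."""
    k_enc, k_dec, k_head = jax.random.split(key, 3)
    return {
        "enc": jax.random.normal(k_enc, (n_pix, n_latent)) / jnp.sqrt(n_pix),
        "dec": jax.random.normal(k_dec, (n_latent, n_pix)) / jnp.sqrt(n_latent),
        "head": jax.random.normal(k_head, (n_latent,)) / jnp.sqrt(n_latent),
    }


def semi_supervised_loss(params, images, z_spec, has_label, alpha=1.0):
    """Reconstruction loss on all images + redshift loss on labeled images only."""
    latent = jnp.tanh(images @ params["enc"])  # shared encoder sees every image
    recon = latent @ params["dec"]             # decoder: unlabeled images still give signal
    z_pred = latent @ params["head"]           # regression head
    recon_loss = jnp.mean((recon - images) ** 2)
    # Mask out unlabeled galaxies; their z_spec entries are finite placeholders.
    sq_err = jnp.where(has_label, (z_pred - z_spec) ** 2, 0.0)
    reg_loss = jnp.sum(sq_err) / jnp.maximum(jnp.sum(has_label), 1)
    return recon_loss + alpha * reg_loss


# Toy batch: 100 flattened "images" of 64 pixels, first 20% labeled
# (roughly the ~20,000 / ~100,000 ratio of the HST catalog).
k_img, k_z, k_init = jax.random.split(jax.random.PRNGKey(42), 3)
images = jax.random.normal(k_img, (100, 64))
has_label = jnp.arange(100) < 20
z_spec = jnp.where(has_label, jax.random.uniform(k_z, (100,), minval=0.3, maxval=3.0), 0.0)
params = init_params(k_init, n_pix=64, n_latent=8)
loss, grads = jax.value_and_grad(semi_supervised_loss)(params, images, z_spec, has_label)
```

Unlabeled galaxies use a finite placeholder (0.0) rather than NaN, because `jnp.where` still propagates gradients through the masked branch and a NaN there would poison the update.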

Our results show that a semi-supervised approach, which also makes use of unlabeled images, outperforms fully supervised methods. For bright galaxies, compared to predictions from traditional methods, our method reduces the bias by 87%, the normalized median absolute deviation (NMAD) by 20%, and the outlier fraction by 47%.
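For reference, these quantities are usually computed on normalized residuals Δz = (z_pred − z_spec)/(1 + z_spec). The sketch below uses widely used photometric-redshift conventions (mean bias, NMAD = 1.4826 × median absolute deviation, outliers at |Δz| > 0.15); these are assumptions, and the exact definitions and thresholds used in the talk may differ.

```python
import jax.numpy as jnp


def photoz_metrics(z_pred, z_spec, outlier_threshold=0.15):
    """Common photo-z quality metrics computed on normalized residuals."""
    dz = (z_pred - z_spec) / (1.0 + z_spec)
    bias = jnp.mean(dz)  # some works use the median instead
    nmad = 1.4826 * jnp.median(jnp.abs(dz - jnp.median(dz)))
    outlier_frac = jnp.mean((jnp.abs(dz) > outlier_threshold).astype(jnp.float32))
    return {"bias": bias, "nmad": nmad, "outlier_frac": outlier_frac}


def relative_reduction(baseline, new):
    """Fractional improvement of `new` over `baseline`, e.g. 0.87 for 87%."""
    return (abs(baseline) - abs(new)) / abs(baseline)


# Toy example: four galaxies at z_spec = 1, one of them a catastrophic outlier.
z_spec = jnp.array([1.0, 1.0, 1.0, 1.0])
z_pred = jnp.array([1.02, 1.0, 0.98, 1.6])
metrics = photoz_metrics(z_pred, z_spec)
```

In this toy case the single outlier dominates the mean bias and sets the outlier fraction to 0.25, while the NMAD, being median-based, stays small; this robustness is why NMAD and the outlier fraction are typically reported side by side.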

Semi-supervised model architecture
Neighbor comparison in latent space

The top row shows galaxies that traditional methods mispredict but our method predicts correctly: traditional methods ignore morphology, whereas our semi-supervised latent space captures it.
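A neighbor comparison like the one in the figure can be sketched as a nearest-neighbor search in the learned latent space. The version below assumes cosine similarity as the distance measure and uses `jax.lax.top_k`; the function name and the toy 2D embeddings are illustrative, not taken from the project.

```python
import jax
import jax.numpy as jnp


def latent_neighbors(latent, k=5):
    """Indices of the k most similar galaxies (cosine similarity), excluding self."""
    unit = latent / jnp.linalg.norm(latent, axis=1, keepdims=True)
    sim = unit @ unit.T
    # Exclude each galaxy from its own neighbor list.
    sim = jnp.where(jnp.eye(latent.shape[0], dtype=bool), -jnp.inf, sim)
    _, idx = jax.lax.top_k(sim, k)  # top_k works along the last axis
    return idx


# Toy embeddings: two pairs of galaxies pointing in similar latent directions.
toy = jnp.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
idx = latent_neighbors(toy, k=1)
```

With real embeddings one might then compare each galaxy's predicted redshift with the spectroscopic redshifts of its latent-space neighbors, to check whether morphologically similar galaxies also sit at similar redshifts.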