Artificial Intelligence
1 Dec 2021

Generating astronomical catalogues from wide-field imaging projects using XAI

Fig. 1: The Euclid space telescope.

Upcoming astronomical surveys such as ESA’s Euclid mission (Fig. 1) [1] and the Nancy Grace Roman Space Telescope [2] will produce an extraordinary amount of astrophysical data. For instance, Euclid will survey one third of the sky, producing a catalogue of 1.5 billion resolved galaxies to study the distance–redshift relation and the geometry of our Universe. While the main science goal of the Euclid mission is to study high-redshift galaxies at large distances, the survey will also produce a very large number of high-resolution images containing astrophysical sources from the nearby Universe. In particular, Euclid will enable an unprecedented view into the low-mass galaxy population. Characterising these galaxies and their structural components, such as star clusters, star-formation regions, disks and bulges, will be of great importance for determining the properties of the nearby galaxy population. However, given the vast amount of data generated by these surveys, evaluating the images manually is not feasible.

Project overview

To address this challenge, we investigate interpretable machine learning methods for generating astronomical catalogues and evaluate whether they can reliably serve as the basis of scientific studies. The main approach is to cross-reference data from new surveys with existing catalogues, yielding a small, labelled initial data set on which machine learning models are trained. These models are then used to generate larger catalogues by classifying the remaining, unlabelled data.
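The cross-referencing step can be illustrated with a minimal sketch. It assumes that both the new detections and the reference catalogue are given as (RA, Dec) pairs in degrees, that a match is any reference source within a fixed angular tolerance, and that unmatched detections are treated as negatives. The function names, the 0.5 arcsec tolerance and the coordinates are hypothetical and chosen only for illustration; a real pipeline would more likely use a spatial index than this brute-force loop.

```python
import math

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (arcsec) between two sky positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for the small separations used here.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def label_by_crossmatch(detections, reference, max_sep_arcsec=0.5):
    """Label a detection 1 if a reference source lies within max_sep_arcsec, else 0."""
    labels = []
    for ra, dec in detections:
        matched = any(
            angular_separation_arcsec(ra, dec, ref_ra, ref_dec) <= max_sep_arcsec
            for ref_ra, ref_dec in reference
        )
        labels.append(1 if matched else 0)
    return labels

# Hypothetical coordinates in degrees, for illustration only.
reference_gcs = [(187.7059, 12.3911), (187.7102, 12.3950)]
new_detections = [
    (187.70591, 12.39111),  # a small fraction of an arcsec from the first reference source
    (187.7200, 12.4000),    # far from both reference sources
    (187.7102, 12.3950),    # identical to the second reference source
]
labels = label_by_crossmatch(new_detections, reference_gcs)
print(labels)
```

The resulting labels, together with features or image cutouts of the matched sources, would form the initial training set; the tolerance has to be chosen with the astrometric accuracy of both catalogues in mind.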

As a proof of concept, we explored the feasibility of various machine learning methods to aid the search for extragalactic globular clusters (GCs) in extensive databases. We used archival Hubble Space Telescope data in the F475W and F850LP bands of 141 early-type galaxies in the Fornax and Virgo galaxy clusters [3,4]. Using existing GC catalogues [5,6] to label the data, we obtained an extensive data set of 84929 sources, 18556 of which are GCs, and trained several machine learning methods on both image data and tabular data containing physically relevant features extracted from the images. We found that the evaluated machine learning models are capable of producing catalogues of similar quality to the existing ones, which were constructed using mixture modelling and structural fitting. Beyond performance metrics, we demonstrated how interpretable methods such as LIME [7] can be used to better understand model predictions, recovering that magnitudes, colours, and sizes are important properties for identifying GCs (Fig. 2).
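The idea behind a LIME-style explanation can be sketched as a local linear surrogate: perturb one source's features, query the classifier on the perturbed copies, weight each copy by its proximity to the original, and fit a weighted linear model whose coefficients indicate which features push the prediction up or down. The sketch below is a simplified illustration under stated assumptions, not the LIME library and not the models used in the study: `black_box` is a made-up stand-in classifier, and the feature names (`m`, `ci`, `area`), perturbation scales and kernel width are hypothetical.

```python
import math
import random

def black_box(m, ci, area):
    """Stand-in classifier (an assumption, not the study's model):
    returns a pseudo-probability that a source is a GC."""
    score = -1.5 * (m - 23.0) - 4.0 * abs(ci - 1.0) - 0.05 * (area - 20.0)
    return 1.0 / (1.0 + math.exp(-score))

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def explain_locally(predict, instance, scales, n_samples=1000,
                    kernel_width=1.3, seed=0):
    """Fit a proximity-weighted linear surrogate around `instance`.

    Returns one coefficient per feature, in units of that feature's scale.
    """
    rng = random.Random(seed)
    names = list(instance)
    k = len(names) + 1  # intercept plus one coefficient per feature
    xtwx = [[0.0] * k for _ in range(k)]
    xtwy = [0.0] * k
    for _ in range(n_samples):
        # Gaussian perturbation, expressed in units of the per-feature scale.
        z = [rng.gauss(0.0, 1.0) for _ in names]
        perturbed = {name: instance[name] + zi * scales[name]
                     for name, zi in zip(names, z)}
        y = predict(**perturbed)
        # Exponential kernel: perturbed samples close to the instance count more.
        w = math.exp(-sum(zi * zi for zi in z) / (2.0 * kernel_width ** 2))
        row = [1.0] + z
        # Accumulate the weighted normal equations (X^T W X) beta = X^T W y.
        for i in range(k):
            xtwy[i] += w * row[i] * y
            for j in range(k):
                xtwx[i][j] += w * row[i] * row[j]
    beta = solve_linear_system(xtwx, xtwy)
    return dict(zip(names, beta[1:]))  # drop the intercept

# Hypothetical faint, extended source and hypothetical perturbation scales.
source = {"m": 22.0, "ci": 1.6, "area": 60.0}
scales = {"m": 0.5, "ci": 0.1, "area": 5.0}
coefs = explain_locally(black_box, source, scales)
for name, c in sorted(coefs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>5s}: {c:+.4f}")
```

A negative coefficient means that increasing the feature lowers the predicted GC probability in the neighbourhood of this source. The LIME library differs from this sketch in several respects (for example in how it samples and discretises tabular features and regularises the surrogate), so the code conveys the principle rather than the exact procedure.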

Fig. 2: Explanations obtained using LIME for (left) two sources that have not been classified as GCs and (right) two sources that have been classified as GCs. For instance, the source shown in the top left has not been classified as a GC since it is too dim and concentrated (low CI and high m values), while the source shown in the bottom left is too extended (high CI and high area values).

References

[1] Laureijs, R., Amiaux, J., Arduini, S., Augueres, J. L., Brinchmann, J., Cole, R., ... & Cresci, G. (2011). Euclid definition study report. arXiv preprint arXiv:1110.3193.

[2] Spergel, D., Gehrels, N., Baltay, C., Bennett, D., Breckinridge, J., Donahue, M., ... & Zhao, F. (2015). Wide-field infrared survey telescope-astrophysics focused telescope assets WFIRST-AFTA 2015 report. arXiv preprint arXiv:1503.03757.

[3] Côté, P., Blakeslee, J. P., Ferrarese, L., Jordán, A., Mei, S., Merritt, D., ... & West, M. J. (2004). The ACS Virgo Cluster Survey. I. Introduction to the survey. The Astrophysical Journal Supplement Series, 153(1), 223.

[4] Jordán, A., Blakeslee, J. P., Côté, P., Ferrarese, L., Infante, L., Mei, S., ... & West, M. J. (2007). The ACS Fornax Cluster Survey. I. Introduction to the survey and data reduction procedures. The Astrophysical Journal Supplement Series, 169(2), 213.

[5] Jordán, A., Peng, E. W., Blakeslee, J. P., Côté, P., Eyheramendy, S., Ferrarese, L., ... & West, M. J. (2008). The ACS Virgo Cluster Survey. XVI. Selection procedure and catalogs of globular cluster candidates. The Astrophysical Journal Supplement Series, 180(1), 54.

[6] Jordán, A., Peng, E. W., Blakeslee, J. P., Côté, P., Eyheramendy, S., & Ferrarese, L. (2015). The ACS Fornax Cluster Survey. XI. Catalog of Globular Cluster Candidates. The Astrophysical Journal Supplement Series, 221(1), 13.

[7] Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135-1144).

Outcome

Artificial Intelligence Peer reviewed article
Evaluating the feasibility of interpretable machine learning for globular cluster detection
Dold, Dominik and Fahrion, Katja
Astronomy & Astrophysics
(2022)
Advanced Concepts Team