Knowledge discovery workflows in the exploration of complex astronomical datasets

Raffaele D'Abrusco (Harvard-Smithsonian Center for Astrophysics), Giuseppina Fabbiano (Harvard-Smithsonian Center for Astrophysics), Giuseppe Longo (Dipartimento di Fisica, Universita' di Napoli), Omar Laurino (Smithsonian Astrophysical Observatory)


Abstract

The understanding of the physical mechanisms that regulate the behavior of all astronomical sources requires
a multi-wavelength picture of the emission of a large number of sources belonging to several different families of astronomical
objects. The impressive production
of data from large-area surveys observing in several different spectral ranges has propelled the federation of massive
and complex datasets that increasingly fit this description and serve that goal. Nonetheless, as the availability of such
data grows, traditional data analysis techniques appear less and less able to do justice to the intrinsically
peculiar information these datasets contain.
Knowledge discovery techniques, while relatively new to Astronomy, have been successfully applied in several other disciplines
to identify patterns in extremely complex datasets. The concerted use of different unsupervised and supervised
machine learning techniques, in particular, can be a powerful approach to answering specific questions involving high-dimensional
datasets and degenerate observables.
In this talk, I shall first review the most relevant applications of clustering techniques to astronomical problems
known in the literature. Then, I shall present a data-driven methodology, developed by my collaborators and me, based on the judicious
combination of clustering techniques and pattern recognition algorithms. Unsupervised clustering, on its own, is effective for
the discovery of unknown correlations and hidden patterns, as traced in the data by spontaneous aggregations in their
observable spaces. Moreover, clustering provides an interesting way to fine-tune regression and classification algorithms
to specific problems, so as to exploit efficiently the information contained in the data. To this end, I shall describe
the results of applying our original method to distinct problems in the field of extragalactic Astronomy.
Finally, based on my personal experience, I will suggest a practical roadmap for the successful deployment and adoption
of advanced knowledge discovery techniques in the general astronomical community.
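The general idea of using unsupervised clustering to specialise a supervised regressor can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not detail: it swaps in plain k-means for the clustering step and ordinary linear least squares as the per-cluster regressor, assumes NumPy and purely numeric feature vectors (for example, photometric colours), and all function names (`kmeans`, `fit_cluster_experts`, `predict`) are hypothetical.

```python
import numpy as np


def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns centroids (k, d) and labels (n,)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its members; keep it if it has none.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign(X, centroids)


def _linear_fit(X, y):
    """Least-squares coefficients w for y ~ X @ w[:-1] + w[-1]."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def fit_cluster_experts(X, y, k, seed=0):
    """Cluster the feature space, then fit one linear regressor per cluster."""
    centroids, labels = kmeans(X, k, seed=seed)
    global_w = _linear_fit(X, y)
    coefs = []
    for j in range(k):
        members = labels == j
        # Too few members for a well-posed fit: fall back to the global regressor.
        if members.sum() <= X.shape[1]:
            coefs.append(global_w)
        else:
            coefs.append(_linear_fit(X[members], y[members]))
    return centroids, np.array(coefs)


def predict(X, centroids, coefs):
    """Route each object to its nearest cluster and apply that cluster's regressor."""
    A = np.hstack([X, np.ones((len(X), 1))])
    return np.einsum("ij,ij->i", A, coefs[assign(X, centroids)])


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups obeying different linear laws.
    rng = np.random.default_rng(1)
    Xa = rng.uniform(-1, 1, size=(50, 2))
    Xb = rng.uniform(-1, 1, size=(50, 2)) + 10.0
    X = np.vstack([Xa, Xb])
    y = np.concatenate([2 * Xa[:, 0] + 1, -Xb[:, 1] + 3])
    centroids, coefs = fit_cluster_experts(X, y, k=2)
    print(predict(np.array([[0.5, -0.2], [10.3, 9.8]]), centroids, coefs))
```

Here each object is routed to a single cluster by hard nearest-centroid assignment; a soft, probabilistic assignment (for instance, weighting each regressor by a mixture-model membership probability) would be a natural alternative when clusters overlap in the observable space.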


Paper ID: I02



ADASS XXI Conference Poster