Research – Guillermo Cabrera-Vives

Artificial Intelligences for Variable Astronomical Sources

Over the last few years, astronomy has become a data-driven science. We have moved from the megabyte to the gigabyte and then to the terabyte regime in less than a decade. Thanks to the special characteristics of the skies in Chile (low humidity, high peaks and plains, low light pollution, and a large number of clear nights), most of the major observatories are being installed here. Survey telescopes, such as the Large Synoptic Survey Telescope (LSST), scan a large area of the sky in order to observe a large number of objects, including galaxies, stars, and asteroids, among others. We are applying prediction models to variable astronomical objects in order to characterize them and to detect previously unseen sources. In particular, we are applying different neural network architectures to the classification and novelty detection of light curves and sequences of images.

A.I. to support diagnosis for patients with cancer

Many machine learning models have been developed for patients with cancer, but integrating and validating them in clinical processes has proven to be hard. We are evaluating the applicability of state-of-the-art predictive models in a real medical scenario, as well as developing new predictive models to aid radiologists and oncologists.

Biased Data

Supervised machine learning usually relies on labels created by human annotators. These labels, called the ground truth, are assumed to be correct or to contain only random white-noise errors. However, real data is never perfect, and the labeling process can be systematically biased by the quality of the observed data. For example, when labeling images, human labels can be biased by the image resolution, leading all annotators to agree on an incorrect label. As a result, the variance of the labels can be very small while the estimated labels are still wrong.
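The following minimal sketch illustrates why agreement among annotators does not guarantee a correct label. It rests on a deliberately simplified, hypothetical annotation model (the function names, the 5% random flip rate, and the rule that low resolution makes class 1 look like class 0 are all assumptions made for illustration, not a description of any real dataset or of our models).

```python
import random
import statistics


def simulate_annotations(true_label, low_resolution, rng, n_annotators=20, noise=0.05):
    """Simulate binary votes from several annotators for a single image.

    Hypothetical model: at low resolution a class-1 object looks like
    class 0 to every annotator (systematic bias). On top of that, each
    annotator independently flips the perceived label with probability
    `noise` (random error).
    """
    perceived = 0 if (low_resolution and true_label == 1) else true_label
    return [
        1 - perceived if rng.random() < noise else perceived
        for _ in range(n_annotators)
    ]


def majority_vote(votes):
    """Return the label chosen by more than half of the annotators."""
    return int(2 * sum(votes) > len(votes))


rng = random.Random(42)
true_label = 1

votes_sharp = simulate_annotations(true_label, low_resolution=False, rng=rng)
votes_blurry = simulate_annotations(true_label, low_resolution=True, rng=rng)

for name, votes in [("sharp", votes_sharp), ("blurry", votes_blurry)]:
    print(
        name,
        "majority label:", majority_vote(votes),
        "vote variance:", round(statistics.pvariance(votes), 3),
    )
```

In this toy setting both sets of votes have similarly small variance, yet the consensus on the blurry image disagrees with the true label, so averaging over more annotators would not correct the error.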

Deep domain adaptation

Deep learning methods for classification are more accurate than feature engineering approaches, but they require more labeled data to reach that performance. We are exploring new ways of training a deep learning architecture on a labeled source dataset and adapting it to an unlabeled target dataset with a different distribution. This problem is particularly relevant for astronomy and genomic data, as new datasets are constantly being created and labeling them is expensive.
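The source-to-target setting described above can be illustrated with a toy example. The sketch below is not a deep model and is not the method we are developing: it swaps in simple moment matching (aligning the mean and standard deviation of one feature) and a threshold classifier as a stand-in, purely to show how a model trained on labeled source data can fail on shifted target data and recover once the unlabeled target features are aligned. All data values and function names are hypothetical, and the alignment only works here because the toy target has the same class balance as the source.

```python
import statistics


def fit_threshold(xs, ys):
    """Train a 1-D classifier: the midpoint between the two class means."""
    mean0 = statistics.mean([x for x, y in zip(xs, ys) if y == 0])
    mean1 = statistics.mean([x for x, y in zip(xs, ys) if y == 1])
    return (mean0 + mean1) / 2


def align_to_source(target_xs, source_xs):
    """Rescale target features to the source mean and standard deviation.

    Uses no target labels, so it fits the unsupervised adaptation setting.
    """
    s_mu, s_sd = statistics.mean(source_xs), statistics.pstdev(source_xs)
    t_mu, t_sd = statistics.mean(target_xs), statistics.pstdev(target_xs)
    return [(x - t_mu) / t_sd * s_sd + s_mu for x in target_xs]


def accuracy(xs, ys, threshold):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [int(x > threshold) for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


# Labeled source data (hypothetical values).
source_x = [-1.0, 0.0, 1.0, 3.0, 4.0, 5.0]
source_y = [0, 0, 0, 1, 1, 1]

# Target data: same objects seen through a shifted, rescaled "instrument".
# The target labels are used only to evaluate, never to train or align.
target_x = [2.0 * x + 10.0 for x in source_x]
target_y = list(source_y)

threshold = fit_threshold(source_x, source_y)
acc_before = accuracy(target_x, target_y, threshold)
acc_after = accuracy(align_to_source(target_x, source_x), target_y, threshold)

print("target accuracy without adaptation:", acc_before)
print("target accuracy after alignment:", acc_after)
```

Deep domain adaptation methods pursue the same goal in a learned feature space rather than on raw inputs, which matters when the shift between datasets is more complex than a change of offset and scale.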

Deep learning models for identifying cancer pathways

We are analysing single-cell RNA sequencing (scRNA-seq) data to identify specific pathways associated with different types of cancer. We found that regulatory T cells show a unique genetic signature across these cancer types. Although we also identified a set of other cell types, our focus is on T cells because they modulate the immune system, controlling immune responses. We are currently developing deep learning models that integrate the different cell types across different types of cancer through a semi-supervised learning approach.