Department of Computer Science at RPTU in Kaiserslautern

Naghmeh Ghanooni

(Research group of Prof. Kloft)
hosted by PhD Program in CS @ TU KL

"Deep Extreme SNP prediction"

(MPI-SWS talk in cooperation with the Department of Computer Science)

Single Nucleotide Polymorphisms (SNPs) are the simplest and most common sequence variants in human DNA, each affecting a single base. A DNA sequence consists of a chain of four nucleotide bases: A, C, G, and T. An SNP is a difference in a single nucleotide at a given position on the paired chromosomes of an individual. For example, a cytosine (C) nucleotide may be replaced by a thymine (T) nucleotide at a certain position of the DNA. SNPs occur on average once in every 1000 nucleotides. Thus, a person's genome contains on average 4 to 5 million SNPs, which can be unique or shared between individuals. Some SNPs are associated with diseases such as diabetes or cancer, while others only account for genetic differences or traits. Measuring and studying the SNPs of individuals is therefore of great importance to the study of human health: it might help us find the disease-inducing genes that are inherited within families, or even predict a person's response to certain medicines.
In this project, we analyse how to predict the outcome of thousands of specific SNPs starting only from phenotypical image data: we implement a deep learning approach that extracts features from retinal fundus images from the UK Biobank in order to predict all the SNPs on a chromosome. Since SNPs number in the millions, our model can be categorized as a novel application of extreme multi-label classification. There are some differences, however: the SNP prediction problem involves both multi-label and multi-class classification. The multi-class aspect comes from the fact that each SNP can have two possible alleles and each chromosome has two copies in the human genome, resulting in 3 classes: 0, 1, and 2 (e.g. CC, CT or TC, and TT). The problem is also multi-label because each sample can exhibit more than one SNP.
With a view to providing a better qualitative understanding of the genetic factors driving disease and other biological phenomena, we are also interested in mining all the genetic differences (in terms of SNPs) between specific pairs of individuals, beyond the well-known ones that give rise to well-understood heritable traits such as the colour of the skin or the eyes.
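
The three-class encoding mentioned in the abstract (0, 1, and 2 for e.g. CC, CT or TC, and TT) can be illustrated with a minimal Python sketch. It assumes that each class is simply the number of copies of the variant allele in a two-letter genotype; the function names `encode_genotype` and `encode_sample` and the example data are hypothetical and are not taken from the speaker's implementation.

```python
from typing import List


def encode_genotype(genotype: str, alt_allele: str) -> int:
    """Count the copies of the variant allele in a two-letter genotype.

    Returns 0, 1, or 2, which is assumed here to be the class label
    for one SNP (e.g. CC -> 0, CT or TC -> 1, TT -> 2 when alt_allele is T).
    """
    if len(genotype) != 2:
        raise ValueError(f"expected a two-letter genotype, got {genotype!r}")
    return sum(1 for base in genotype if base == alt_allele)


def encode_sample(genotypes: List[str], alt_alleles: List[str]) -> List[int]:
    """Encode all SNPs of one sample: one 3-class label per SNP (multi-label)."""
    if len(genotypes) != len(alt_alleles):
        raise ValueError("genotypes and alt_alleles must have the same length")
    return [encode_genotype(g, a) for g, a in zip(genotypes, alt_alleles)]


# Hypothetical sample with four SNPs, all with variant allele T.
labels = encode_sample(["CC", "CT", "TC", "TT"], ["T", "T", "T", "T"])
print(labels)  # [0, 1, 1, 2]
```

Each sample thus yields a vector of labels, one per SNP (the multi-label aspect), and each entry takes one of three values (the multi-class aspect).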


Time: Monday, 24.08.2020, 15:30
Place: https://bbb.rlp.net/b/mid-wdt-qt2
