Learning associations between genetic variation and human diseases from large biobank datasets using statistical methods
Modelling hidden structure between genomic variations and human diseases
The relationship between genomic variation and human disease has been of great interest to researchers for decades. A large number of results have already been obtained using standard statistical methods such as GWAS. However, with next-generation sequencing data accumulating at an unprecedented rate and large-scale biobank datasets being established, more advanced statistical tools are needed to mine these data further and to gain deeper insight into disease mechanisms and the relationships between diseases.
My work involves developing new statistical tools to analyse large-scale phenotype data and applying these algorithms to biobank datasets to cluster diseases. I am currently building a new Bayesian statistical model based on the widely used machine learning tool "LDA". In the near future, I will analyse UK Biobank phenotype data with the new algorithm and, based on the results, carry out various downstream analyses.