Introduction

The UK Biobank dataset is the world’s largest collection of human genetic and phenotypic data, covering about half a million people. We use this dataset to construct accurate genotype-to-phenotype predictive models based on modern machine learning methods. Genotype-to-phenotype prediction is a central problem in modern human genetics, with potentially important applications to automated disease diagnostics.

Working with the UK Biobank data at scale poses a significant computational challenge. It requires substantial infrastructure for the storage and analysis of predictive models, as well as solving the software engineering problems that arise in a scientific codebase. Our computations run on “Zhores”, the first Russian energy-efficient petaflops supercomputer, designed specifically for problems in machine learning and data-based modelling. Our teamwork is supported by DATASKAI, a platform and framework developed at the CDISE IoT Center for applying a data-driven approach to a broad variety of industrial and scientific problems.