Mariame Gnéré Coulibaly's project will focus on developing a machine learning (ML) tool to predict the bacterial hosts of phages. The first objective is to develop an ML tool that identifies bacteria-phage pairs from CRISPR (clustered regularly interspaced short palindromic repeats) spacer sequences found in bacterial genomes; these spacers are fragments of the genomes of phages that previously infected the bacteria. The second objective is to develop an ML tool that identifies bacteria-phage pairs from sequence and methylation information (i.e., the addition of methyl groups to nucleotides). For these first two objectives, algorithms such as neural networks with attention mechanisms and similarity predictors based on string kernels will be developed and tested.
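To illustrate what a string-kernel similarity predictor might look like, the sketch below implements a normalized k-mer spectrum kernel, one common string kernel, and uses it to compare a CRISPR spacer with a candidate phage fragment. This is only a minimal sketch: the project does not specify which kernel will be used, and the function names, the value of k, and the example sequences are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a nucleotide sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Normalized k-mer spectrum kernel (cosine similarity of k-mer counts).

    Hypothetical helper: returns a value in [0, 1], where 1 means the two
    sequences have proportional k-mer profiles and 0 means they share no k-mer.
    """
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # Inner product over the k-mers present in sequence a.
    dot = sum(n * counts_b[kmer] for kmer, n in counts_a.items())
    norm = sqrt(
        sum(n * n for n in counts_a.values())
        * sum(n * n for n in counts_b.values())
    )
    # Sequences shorter than k have no k-mers; define their similarity as 0.
    return dot / norm if norm else 0.0


# Made-up sequences, for illustration only.
spacer = "ACGTACGT"
phage_fragment = "ACGTACGT"
unrelated = "CCCCCCCC"

similarity_match = spectrum_kernel(spacer, phage_fragment)
similarity_unrelated = spectrum_kernel(spacer, unrelated)
```

A kernel of this kind can be passed to any kernel method, or thresholded directly as a similarity score between a spacer and a phage genome.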
The third objective is to develop a multi-view algorithm that combines the approaches from the first two objectives, with one view for CRISPR information and a second view for methylation patterns.
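One simple way to combine two views is late fusion, in which each view scores candidate hosts independently and the scores are then averaged with per-view weights. The sketch below assumes this strategy purely for illustration; the project may well use a different multi-view method, and the function name, view names, weights, and scores are all hypothetical.

```python
def fuse_views(
    view_scores: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> list[tuple[str, float]]:
    """Rank candidate hosts by a weighted average of per-view scores.

    view_scores maps a view name to {host name: score}. A host missing from
    a view is scored only on the views that do cover it, with the weights
    renormalized over those views.
    """
    hosts = set().union(*(scores.keys() for scores in view_scores.values()))
    fused = {}
    for host in hosts:
        available = [view for view in view_scores if host in view_scores[view]]
        total_weight = sum(weights[view] for view in available)
        if total_weight > 0:
            fused[host] = (
                sum(weights[view] * view_scores[view][host] for view in available)
                / total_weight
            )
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for a single phage, for illustration only.
scores = {
    "crispr": {"host_A": 0.9, "host_B": 0.2},
    "methylation": {"host_A": 0.5, "host_B": 0.6},
}
ranking = fuse_views(scores, weights={"crispr": 0.5, "methylation": 0.5})
```

Late fusion keeps each view's model independent, which makes it easy to handle phages for which only one type of information is available.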
The resulting tools are expected to benefit the microbiology and virology research communities by enabling the identification of new bacteria-phage pairs from microbiota samples.