Through PRACE SHAPE, NSilico teamed up with computational experts from CINES (France) and ICHEC (Ireland) to address a key problem, the rapid alignment of short DNA sequences to reference genomes, by deploying the Smith-Waterman algorithm on an emerging many-core technology: the Intel Xeon Phi co-processor.
The project, entitled “High performance computation for short read alignment”, investigated high performance computational techniques for the analysis of ribosomal RNA, which forms part of the machinery that cells use to translate the genetic information in an organism’s DNA into protein. Next generation sequencing techniques are enabling the capture of vast amounts of data on the ribosomal RNA characteristics of cells in varying conditions. However, the reads for such RNA fragments are shorter than those typically encountered in sequencing projects, whereas most alignment algorithms are optimised for longer reads.
“The SHAPE project has been a very successful collaboration between NSilico and the PRACE partners involved. NSilico has benefited from domain expertise from PRACE in first identifying a bioinformatics codebase with real potential to be deployed on cutting-edge many-core hardware. It has also since gained invaluable insights into the optimisation and parallelisation work involved in porting the code to the Intel Xeon Phi. Next steps are already being discussed on testing and deployment of the code with the release of the next generation ‘Knights Landing’ hardware, as well as potential incorporation into NSilico’s in-house bioinformatics pipelines,” says Paul Walsh of NSilico.
Example output from the Smith-Waterman sequence alignment algorithm.
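For readers unfamiliar with Smith-Waterman, the following minimal C sketch shows the dynamic-programming recurrence behind such output: it computes the best local alignment score of a short read against a reference. The function name `sw_score` and the scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions; the SSW library used in the project has its own scoring scheme, affine gap penalties and SIMD vectorisation, none of which is reproduced here.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative scoring parameters (assumptions, not the SSW defaults). */
enum { MATCH = 2, MISMATCH = -1, GAP = -2 };

static int max4(int a, int b, int c, int d) {
    int m = a > b ? a : b;
    if (c > m) m = c;
    return d > m ? d : m;
}

/* Best local alignment score of `read` against `ref` using the
   Smith-Waterman recurrence with a linear gap penalty.
   Only two rows of the DP matrix are kept, so memory is O(strlen(ref)).
   Returns -1 if memory allocation fails. */
int sw_score(const char *read, const char *ref) {
    size_t m = strlen(read), n = strlen(ref);
    int *prev = calloc(n + 1, sizeof *prev);  /* row i-1 */
    int *curr = calloc(n + 1, sizeof *curr);  /* row i   */
    if (!prev || !curr) { free(prev); free(curr); return -1; }

    int best = 0;
    for (size_t i = 1; i <= m; i++) {
        curr[0] = 0;  /* local alignment: first column is always 0 */
        for (size_t j = 1; j <= n; j++) {
            int diag = prev[j - 1] + (read[i - 1] == ref[j - 1] ? MATCH : MISMATCH);
            int up   = prev[j] + GAP;      /* read base aligned to a gap */
            int left = curr[j - 1] + GAP;  /* reference base aligned to a gap */
            curr[j] = max4(0, diag, up, left);  /* 0 lets an alignment restart */
            if (curr[j] > best) best = curr[j];
        }
        int *tmp = prev; prev = curr; curr = tmp;  /* reuse the old row */
    }
    free(prev);
    free(curr);
    return best;
}
```

For example, `sw_score("ACGT", "TTACGTTT")` returns 8, because the read matches four consecutive reference bases exactly; the clamp at 0 is what makes the alignment local rather than global.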
The project team adopted two approaches to optimise and parallelise the SSW Smith-Waterman library: the first using modern SIMD intrinsics, the second using OpenMP. The OpenMP parallelisation work has produced a code that shows good parallel performance on standard x86 processors and promising results on Xeon Phi many-core hardware. While the resulting SSW library achieves only limited performance gains on the current generation of the Xeon Phi, as expected, it has been refactored so that it can readily take advantage of next-generation hardware such as the Xeon Phi “Knights Landing” with its upcoming AVX-512 features.
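Because each short read is aligned independently, OpenMP parallelisation of this kind of workload can be pictured as a parallel loop over a batch of reads. The sketch below is a hypothetical illustration, not the project's code: `align_batch` and `sw_local_score` are invented names, the scorer is a compact linear-gap Smith-Waterman that keeps a single DP row updated in place, and its scoring values are assumptions. Compiling with `-fopenmp` (GCC or Clang) enables threading; without it the pragma is ignored and the loop runs serially with identical results.

```c
#include <stdlib.h>
#include <string.h>

/* Compact Smith-Waterman score with linear gaps (illustrative scores:
   match +2, mismatch -1, gap -2). One DP row is updated in place;
   `diag` carries H[i-1][j-1]. Returns -1 if allocation fails. */
static int sw_local_score(const char *a, const char *b) {
    size_t m = strlen(a), n = strlen(b);
    int *h = calloc(n + 1, sizeof *h);
    if (!h) return -1;
    int best = 0;
    for (size_t i = 1; i <= m; i++) {
        int diag = 0;                              /* H[i-1][0] */
        for (size_t j = 1; j <= n; j++) {
            int up = h[j];                         /* H[i-1][j], not yet overwritten */
            int s = diag + (a[i - 1] == b[j - 1] ? 2 : -1);
            if (up - 2 > s) s = up - 2;
            if (h[j - 1] - 2 > s) s = h[j - 1] - 2; /* H[i][j-1] */
            if (s < 0) s = 0;
            diag = up;                             /* becomes H[i-1][j-1] for j+1 */
            h[j] = s;
            if (s > best) best = s;
        }
    }
    free(h);
    return best;
}

/* Align every read in a batch against one reference and store each score.
   Iterations share no state, so OpenMP can split them across threads;
   schedule(dynamic) hands out reads one at a time to balance uneven lengths. */
void align_batch(const char *const *reads, size_t nreads,
                 const char *ref, int *scores) {
    #pragma omp parallel for schedule(dynamic)
    for (long k = 0; k < (long)nreads; k++)
        scores[k] = sw_local_score(reads[k], ref);
}
```

Aligning the reads `ACGT`, `AAAA` and `ACTT` against `TTACGTTT` yields scores 8, 2 and 6. A dynamic schedule is a reasonable default when read lengths vary; for uniformly short reads a static schedule may perform just as well.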
The results of the project were presented during the SHAPE parallel track of PRACEdays14.
Title: High performance computation for short read alignment
Leader: Dr Paul Walsh, NSilico Life Science Ltd, Ireland
Collaborators: Dr Simon Wong, Irish Centre for High-End Computing (ICHEC), Ireland | Mr Xiangwu Lu, NSilico Life Science Ltd, Ireland | Dr Tristan Cabel, Mr Gabriel Hautreux, Mr Eric Boyer, CINES, France | Nicolas Mignerey, GENCI, France
Research field: Medicine and Life Sciences
Resource awarded: 100,000 core hours on MareNostrum @ BSC, Spain | 20,000 MIC hours on MareNostrum Hybrid Nodes @ BSC, Spain
NSilico is an Ireland-based developer of integrated molecular diagnostics and of sequence data management and analytic tools for the life sciences and healthcare industries. The company’s offerings are based upon a unique and unrivalled blend of biological, computing, software development and clinical experience and expertise. Currently, the company has two product offerings: Simplicity™, a cloud-based bioinformatics research pipeline tool, and SimplicityEHR™, for cancer care management. Simplicity™ is NSilico’s lead product and is one of the most comprehensive, easy-to-use, cloud-based software-as-a-service products for the automatic annotation, analysis and visualisation of high-throughput sequencing data. It is scalable and customisable to user needs, and allows the automated and rapid extraction and reporting of high-value information that aids the discovery of biomarkers and genetic profiles through the creation of rich, publication-standard reports. Its usability and power help to dramatically reduce research time cycles.