Faster Genome Analysis Enabling Clinical Application, Population-Scale Translational Research
Genome analysis pipelines are getting faster. These computational advances promise to alleviate the notorious analysis bottleneck that has challenged clinical adoption of genome sequencing. To achieve widespread clinical relevance, time to results must be cut significantly, and to facilitate the next wave of understanding about the genetic origins of disease, these analysis pipelines […]
An Automated Solution

To overcome the challenges of analyzing these large amounts of data, White and his team developed a computational pipeline called "Churchill." By applying novel computational techniques, the fully automated Churchill can analyze a whole genome in 77 minutes. Churchill's developers predict that the platform's speed will have a "major impact" on clinical diagnostic sequencing. Churchill's algorithm was licensed to Columbus-based GenomeNext for commercialization as a secure software-as-a-service offering.

"Accuracy and speed are extremely important even if you are dealing with one sample," says James Hirmas, CEO of GenomeNext. "If it takes two days to get through the sequencing and then two weeks of analysis to determine the pathologic variant, that is too long to be relevant for a critically ill newborn."

According to a Jan. 20 article in Genome Biology, Churchill's performance was validated using the Genome in a Bottle Consortium reference sample. Churchill demonstrated high overall sensitivity (99.7 percent), accuracy (99.9 percent), and diagnostic effectiveness (99.7 percent), the highest of the three pipelines assessed. The other pipelines tested were the Genome Analysis Toolkit-Queue (using scatter-gather parallelization) and HugeSeq (using chromosomal parallelization). The developers say Churchill's deterministic performance "sets an NGS analysis standard of 100 percent reproducibility, without sacrificing data quality."

"We aren't naive to think that other groups aren't trying to do this and they may achieve comparable speed in the future," Hirmas tells DTET. "So the issue is quality. The hidden dark secret of genome analysis tools is determinism and reproducibility."

Churchill divides the genome into thousands of smaller regions and runs them in parallel.
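The basic idea of subregion parallelization can be sketched in a few lines. This is a minimal illustration, not Churchill's actual implementation: the chromosome set, subregion size, and function names are all hypothetical, and threads stand in for the cluster nodes or processes a real pipeline would use.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative inputs: two GRCh37 chromosome lengths and an arbitrary
# fixed subregion width (Churchill's real parameters may differ).
CHROM_LENGTHS = {"chr20": 63_025_520, "chr21": 48_129_895}
SUBREGION_SIZE = 10_000_000

def make_subregions(chrom_lengths, size):
    """Split each chromosome into subregions with fixed boundaries."""
    regions = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, size):
            regions.append((chrom, start, min(start + size, length)))
    return regions

def process_subregion(region):
    """Placeholder for per-region alignment, deduplication, and variant calling."""
    chrom, start, end = region
    return f"{chrom}:{start}-{end}"

regions = make_subregions(CHROM_LENGTHS, SUBREGION_SIZE)
# Fan the independent subregions out to workers; because each worker sees a
# fixed, deterministic slice of the genome, results are reproducible.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(process_subregion, regions))
```

Because the subregion boundaries are fixed rather than load-dependent, the same input always yields the same work units, which is the property behind the "100 percent reproducibility" claim.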
While this sounds obvious, development was "challenging." White says that central to Churchill's parallelization strategy is a novel deterministic algorithm that enables division of the workflow across many genomic regions with fixed boundaries, or 'subregions.'

"This division of work, if naively implemented, would have major drawbacks: read pairs spanning subregional boundaries would be permanently separated, leading to incomplete deduplication, and variants on boundary edges would be lost," White writes in Genome Biology. "To overcome this challenge, Churchill utilizes both an artificial chromosome, where interchromosomal or boundary-spanning read pairs are processed, and overlapping subregional boundaries, which together maintain data integrity and enable significant performance improvements."

Churchill's speed is also highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources. This, the developers say, demonstrates Churchill's utility for population-scale genomic analysis. Churchill identified 41.2 million variants in the dataset, with 34.4 million variant sites in common between Churchill's analysis and the 1000 Genomes Project's. Analyzing the 1,088 low-coverage whole-genome samples cost approximately $12,000 in total, inclusive of data storage and processing, White says.

Hirmas tells DTET that the company's platform is well suited to both clinical laboratories and research entities engaging in large-scale genomic studies. Sequencing, Hirmas explains, is run in batch jobs, and it is more economical, depending on instrument size, to run 20 or even 50 samples in a tube. While 50 samples awaiting analysis doesn't approach the thousands of genomes associated with population-scale genomics, even 50 genomes may be problematic for a lab if each takes two weeks to analyze.
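The two fixes White describes can be sketched as follows. This is a hedged illustration of the concepts only: the overlap size, function names, and string labels are hypothetical, and real pipelines operate on aligned reads in BAM files rather than bare positions.

```python
# Illustrative overlap margin added at each subregion edge so variants
# near a boundary are still called by at least one worker (value is arbitrary).
OVERLAP = 1_000

def subregion_with_overlap(chrom_length, start, end, overlap=OVERLAP):
    """Pad a fixed-boundary subregion so boundary-edge variants are not lost."""
    return (max(0, start - overlap), min(chrom_length, end + overlap))

def route_read_pair(mate1_pos, mate2_pos, start, end):
    """Keep a pair in the subregion only if both mates land inside it.
    Boundary-spanning (or interchromosomal) pairs go to the artificial
    chromosome, so deduplication still sees both mates together instead of
    permanently separating them."""
    if start <= mate1_pos < end and start <= mate2_pos < end:
        return "subregion"
    return "artificial_chromosome"
```

Routing every boundary-spanning pair to one shared bucket keeps each subregion's work independent while preserving the mate-pair information that deduplication depends on.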
The provision of fast genome analysis solutions as a cloud service is expected to accelerate clinical adoption of whole-exome and whole-genome sequencing and to bring the technology within reach of smaller laboratories. Genome analysis as a service eliminates many of the upfront costs and ongoing overhead expenses tied to in-house analysis development. Labs can get tests up and running faster without the capital outlay required to procure computing infrastructure, and they don't have to assemble hard-to-find bioinformatics teams. These commercial systems are also scalable: laboratories have access to the computational power they need when volumes are high, but aren't carrying the overhead of on-site equipment capacity when testing volumes are low. Finally, despite the uncertainty of added regulation of sequencing-based testing and evolving security policies, GenomeNext and other emerging software-as-a-service genome analysis companies are building their systems to meet security and other laboratory regulations. With these services, for instance, clinical laboratories can lock down their analysis pipeline to meet CLIA and College of American Pathologists requirements.