Computational Pipeline Can Analyze 1,000 Genomes a Day

May 12, 2015 | Clinical Diagnostics Insider, Diagnostic Testing and Emerging Technologies, Special Focus-dtet

A genome computational pipeline has achieved a throughput of as many as 1,000 genomes per day, a speed that will enable population-scale genomics. As part of the Intel Heads In The Clouds Challenge, GenomeNext and Nationwide Children’s Hospital (Columbus, Ohio) were challenged to analyze, in one week, a complete population dataset compiled by the 1000 Genomes Consortium. The 1000 Genomes Project is the largest publicly available dataset of genomic sequences, with whole-genome and whole-exome samples from 2,504 individuals from around the world.

All 5,008 samples were analyzed on GenomeNext’s genomic sequence analysis platform, operated on the Amazon Web Services Cloud and powered by Intel processors. The system achieved “unprecedented throughput,” with as many as 1,000 genome samples completed per day. The analysis of 1,000 genomes generated close to 100 TB of result files. The results correlated to a high degree with the original analysis performed by the 1000 Genomes Consortium, and the new analysis may also have uncovered additional variants.

“The successful completion of this proof-of-concept not only sets a groundbreaking timeframe for the analysis of a massive quantity of genomic data, but demonstrates the utility of the GenomeNext solution, eliminating the sequence analysis computational bottlenecks, enabling researchers and clinicians to keep pace with processing the magnitude of genomic data analysis required for population-scale genomics,” said James Hirmas, CEO of GenomeNext, in a statement.

For more information on how improved genome analysis tools will accelerate adoption of sequencing-based technology in clinical settings, please see the Special Focus section on page 8.
