Disease-Causing Variants Often Occur Outside of ‘High-Confidence’ Sequence Areas
A significant proportion of known genetic disease-causing variants lie outside of regions that can be sequenced with high confidence, according to a study published March 2 in Genome Medicine. Nearly 20 percent of many medically important genes may be sequenced inaccurately with current technology. Genomic region, variant type, read depth, and analytical pipeline all affect the accuracy of variant calls, the authors say, which highlights the need to improve technical benchmarks in clinical genomics.
“We hope by highlighting and scrutinizing the challenging areas of the genome, we can optimize our pipelines for greater consensus and, at the very least, provide transparency regarding our confidence level in every call,” write the authors, led by Rachel Goldfeder, a Ph.D. candidate at Stanford University. “The good news is that, in this case, 77 percent of the donor’s genome was reliably sequenced using current methods. The challenge now is to focus our efforts on the other 23 percent—namely, on regions of the genome that remain elusive. Only then can we realize the full potential of precision medicine.”
The researchers used the U.S. National Institute of Standards and Technology reference genome, which had been previously sequenced with five different sequencing technologies. Data from these five technologies were previously combined to identify genomic areas of agreement, but a reliable consensus was achieved for just 77 percent of the genome. In the present study, the researchers assessed how these “high confidence” areas of the donor’s genome overlap with 3,300 known disease-causing genes in the ClinVar and OMIM (Online Mendelian Inheritance in Man) databases. Additionally, the researchers mapped the high-confidence areas to gene regions of high medical relevance, as designated by the American College of Medical Genetics and Genomics (ACMG) list of 56 medically actionable genes.
Overall, only 74.6 percent of the exonic bases in ClinVar and OMIM genes and 82.1 percent of the exonic bases in ACMG-reportable genes are found in high-confidence regions. Of the 3,300 ClinVar/OMIM genes, 593 have less than 50 percent of their total exonic base pairs in high-confidence regions. Similarly, only 990 genes in the genome are found entirely within high-confidence regions.
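The percentages above come down to interval overlap: what share of a gene's exonic bases falls inside high-confidence regions. The following Python sketch shows one way such a figure could be computed. It is not the study's pipeline; the function names, the half-open (start, end) coordinate convention, and the toy coordinates are all assumptions made for illustration, and everything is assumed to sit on a single chromosome.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(exons, confident):
    """Count exonic bases that lie inside any high-confidence interval."""
    exons = merge_intervals(exons)
    confident = merge_intervals(confident)
    total = 0
    i = j = 0
    while i < len(exons) and j < len(confident):
        lo = max(exons[i][0], confident[j][0])
        hi = min(exons[i][1], confident[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if exons[i][1] < confident[j][1]:
            i += 1
        else:
            j += 1
    return total


def fraction_in_confident(exons, confident):
    """Fraction of exonic bases covered by high-confidence intervals."""
    exon_total = sum(end - start for start, end in merge_intervals(exons))
    if exon_total == 0:
        return 0.0
    return covered_bases(exons, confident) / exon_total


# Hypothetical toy coordinates, not taken from the study.
toy_exons = [(100, 200), (300, 400)]
toy_confident = [(150, 350)]
print(fraction_in_confident(toy_exons, toy_confident))
```

In this toy case half of the exonic bases fall in the high-confidence interval. A gene would count as "entirely within high-confidence regions" when the fraction equals 1.0.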
“The knowledge that nearly one fifth of each gene, for which laboratory directors are recommended to provide clinical reporting for every patient undergoing clinical exome or genome sequencing, would not reach consensus across different chemistries and pipelines, is sobering,” write the authors. “But it is a call to arms for those interested in clinical grade technical accuracy for genome sequencing. … In contrast with the lack of immediate personal implication of a false call in a discovery cohort study, a false call on a clinical report could have immediate detrimental consequences in the life of an individual, family, or disease community.”
The researchers identified 39,301 loci where the benchmark data contain a high-confidence homozygous reference call, but at least one sequencing technology incorrectly called a variant. For whole-exome sequencing (WES), poor read depth was the main limit on sensitivity, with 95 percent of false negative variants (FNVs) falling within regions having a read coverage of less than 10. For whole-genome sequencing (WGS), by contrast, most FNVs resulted from filtering during variant calling because they lay within difficult-to-sequence or difficult-to-call regions.
The study found that the majority of disease-causing mutations identified to date fall within easy-to-sequence areas, generally defined as stretches of unique DNA or less repetitive regions. More than 90 percent of 35 bp sequences in high-confidence regions are unique to one location in the genome compared to 47.5 percent of 35 bp sequences in low-confidence regions.
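The uniqueness figures above rest on a simple idea: slide a 35 bp window along the sequence and ask how many of the resulting windows occur only once. The Python sketch below illustrates that idea under stated simplifications. It counts uniqueness only within the string it is given rather than across a whole genome, it ignores the reverse-complement strand, and the function name and toy sequences are hypothetical. It is not the method used in the study.

```python
from collections import Counter


def unique_kmer_fraction(sequence, k=35):
    """Fraction of k-length windows that occur exactly once in `sequence`.

    Simplification: uniqueness is judged within `sequence` only, and the
    reverse-complement strand is not considered.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    unique = sum(1 for kmer in kmers if counts[kmer] == 1)
    return unique / len(kmers)


# Hypothetical toy sequences with a small k so the effect is visible.
repetitive = "ACGTACGT"      # the window "ACGT" occurs twice
non_repetitive = "ACGTTGCA"  # every 4 bp window is distinct
print(unique_kmer_fraction(repetitive, k=4))
print(unique_kmer_fraction(non_repetitive, k=4))
```

The repetitive toy sequence scores lower than the non-repetitive one, which mirrors the gap between low-confidence and high-confidence regions reported above: a short read drawn from a repeated window cannot be placed at a single location.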
“The challenges of repetitive, paralogous sequence and structural variation complicate the analysis of clinical WGS and WES data,” explain the authors. “Not only is short-read sequencing prone to false negative or false positive variant calls due to systematic sequencing errors, but the repetitive nature of the genome introduces global mapping and local alignment challenges.”
Takeaway: For whole-exome and -genome sequencing to be clinically meaningful, improvement is needed in the technical benchmarks around sequencing. This includes development of better means to characterize more challenging parts of the genome, where a substantial portion of disease-causing variants lie.