AI in Genomics: The Next Generation
How artificial intelligence and machine learning tools stand to benefit clinical labs throughout the genomics pathway
Over the past 50 years, clinical lab genomics has evolved from analog evaluation to the domination of digital “big data.”1 It’s not only analysis techniques that have advanced, but also the datasets themselves, whose growing scale and complexity are beginning to surpass the capabilities of both conventional analyses and the humans undertaking them. To overcome potential barriers to effective, timely, and reproducible results, attention is turning to emerging technologies such as artificial intelligence (AI) and machine learning (ML). Because these tools can parse large-scale, complex, and heterogeneous datasets, they are promising candidates for the new computational approaches needed to analyze the next generations of clinical laboratory genomic data.
Potential across the pipeline
The volume of research investigating both genomics and AI has increased dramatically over the past three decades.2 Although AI technologies’ predictive and pattern-recognizing abilities may make them invaluable in the post-processing analysis and interpretation phases of the genomic data pipeline—and indeed, this is where most emerging activity is currently happening, from noninvasive prediction of embryo ploidy during IVF3 to point-of-care screening for genetic syndromes in children4—their utility may extend even further. “The potential of AI lies in the analysis of large amounts of data, not only to examine genomic data more quickly and in greater depth, but also to integrate different data sources to complement this analysis,” explains a team of genomic experts from PHG Foundation, a non-profit think tank that works to inform policy to make science, in particular genomics, work for health.
Prior to sequencing, AI could aid the collection and preparation of large, heterogeneous, and complex datasets containing many different attributes, measurements, and types of data and information. The technologies may also help during sequencing and processing: some pipelines already rely partly on ML to accelerate data processing,5 and AI methods are being used to improve the accuracy of variant identification.6 Despite these early applications, progress is still needed before AI’s potential can be fully realized. “The use of AI is, at present, limited, mostly to research contexts,” PHG Foundation says. “The existing and emerging applications of AI are resource and pathway optimization, annotation of variant data, and prediction of the functional consequences of variants.”
The current early stage of AI use in genomics may pose a challenge in determining where these technologies can be suitably applied. “Because the vast majority of AI activity in genomics is within the research phase, we suspect that there are lots of aspects of genomics that are not suitable for AI/ML, at least in its current form,” PHG Foundation says. One area in which they believe caution is needed is the use of AI for clinical genomics result feedback. “We’re sure that companies are developing AI chatbots for sharing genetic test results and that this will help address a bottleneck, because genetic counseling services are in high demand. But, given the complexity of genomic information, its implications for patients and their families, and the difficulty we know patients have in understanding risk, it seems like it might dehumanize the clinical interaction and lead to increased likelihood of misunderstanding and confusion.”
Untangling the web of issues
Alongside its growing promise and capabilities in clinical genomics, AI presents a network of interconnected problems for clinical laboratories and the wider health systems in which they work, which PHG Foundation calls the AI web of issues. Broadly, these include data, ethical, legal, regulatory, infrastructural, and communication challenges. Although the clinical laboratory has a part to play in resolving these issues, it cannot do so in isolation, because many of the same problems must also be resolved across the wider health system. “Managing these issues system-wide will require collaborative and multidisciplinary working,” PHG Foundation says. “To support the implementation of AI, the needs of AI systems and users need to be considered when designing wider health data systems.”
One key area for collaboration is the development of effective healthcare infrastructure that facilitates AI use in genomic medicine. “Technology development will only get us so far if data infrastructure development does not also keep pace,” PHG Foundation explains. “For example, in the UK, data infrastructure is recognized as an issue that is integral to the delivery of a digital NHS [National Health Service], and is something the government aims to address in a recently launched consultation.7 Establishing robust system-wide health infrastructure will not only support data sharing within the health system, but also have a positive impact on issues relevant to AI, such as the quality of training datasets and the establishment of data standards.”
Analyzing AI adoption
Labs adopting AI technologies into their genomic workflows face several considerations, many of which overlap with general AI principles. PHG Foundation highlights the following areas labs should consider when implementing these technologies to prevent or mitigate possible issues:
- Datasets. “The performance and accuracy of a machine learning model is highly dependent on the quality and reliability of the training data. Healthcare datasets are noisy, complex, heterogeneous, poorly annotated, and generally unstructured.”
- Education. Labs need to provide staff with training on AI tools and their integration into current genomic systems to facilitate the effective adoption of AI.
- Transparency. “Many AI algorithms, particularly deep learning models, operate as ‘black boxes,’ making it difficult to understand how decisions are made. This lack of transparency can hinder trust among clinicians and patients, who need to understand the basis for clinical decisions.”
- Accountability. “There are challenges around determining accountability for errors or adverse outcomes resulting from AI-driven decisions. It raises questions about whether responsibility lies with the developers, the laboratory, or the clinicians using the technology.”
- Bias. “AI/ML models can inadvertently perpetuate or amplify biases present in training data. This can lead to disparities in diagnostic accuracy and treatment recommendations for different populations.”
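As a rough illustration of the bias point, a lab could compare a model's accuracy across patient groups before relying on it. The sketch below is a minimal example under stated assumptions, not a validated audit method: the group names, labels, and records are hypothetical toy data, and the helper functions `accuracy_by_group` and `largest_gap` are invented here for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: fraction of correct predictions}.

    Each record is a (group, true_label, predicted_label) tuple.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        if truth == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def largest_gap(per_group):
    """Accuracy difference between the best- and worst-served groups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Hypothetical toy data: (population group, true label, model prediction).
records = [
    ("group_a", "pathogenic", "pathogenic"),
    ("group_a", "benign", "benign"),
    ("group_a", "benign", "benign"),
    ("group_a", "pathogenic", "pathogenic"),
    ("group_b", "pathogenic", "benign"),
    ("group_b", "benign", "benign"),
    ("group_b", "pathogenic", "pathogenic"),
    ("group_b", "benign", "pathogenic"),
]

per_group = accuracy_by_group(records)
gap = largest_gap(per_group)
print(per_group, gap)
```

A large gap between groups would not by itself prove that a model is biased, but it is the kind of signal that would prompt a closer look at how well each population is represented in the training data.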
Although these issues must be addressed before AI can truly make the jump from research to clinical use in medical genomics, the area stands to gain much. “It is a matter of getting the data in the right form—compatible with the AI algorithm being deployed, correcting for technical biases, and having a sufficient amount of data for AI to derive inferences from,” PHG Foundation says. “The adoption of an AI algorithm also relies on there being value over and above any existing non-AI bioinformatic tool. But there is potential in terms of the analysis of large datasets and in understanding the complex links between genomics, environment, behavior, and disease.”
References:
1. S Aradhya et al. Applications of artificial intelligence in clinical laboratory genomics. Am J Med Genet C Semin Med Genet. 2023;193(3):e32057. doi:10.1002/ajmg.c.32057.
2. PHG Foundation. Artificial intelligence for genomic medicine. May 20, 2020. https://www.phgfoundation.org/wp-content/uploads/2024/02/Artifical-intelligence-for-genomic-medicine.pdf.
3. J Barnes et al. A non-invasive artificial intelligence approach for the prediction of human blastocyst ploidy: retrospective model development and validation study. Lancet Digit Health. 2023;5(1):e28–e40. doi:10.1016/S2589-7500(22)00213-8.
4. AR Porras et al. Development and evaluation of a machine learning-based point-of-care screening tool for genetic syndromes in children: a multinational retrospective study. Lancet Digit Health. 2021;3(10):e635–e643. doi:10.1016/S2589-7500(21)00137-0.
5. Illumina. DRAGEN sets new standard for data accuracy in PrecisionFDA benchmark data. Optimizing variant calling performance with Illumina machine learning and DRAGEN graph. January 12, 2022. https://www.illumina.com/science/genomics-research/articles/dragen-shines-again-precisionfda-truth-challenge-v2.html.
6. ND Olson et al. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet. 2023;24(7):464–483. doi:10.1038/s41576-023-00590-0.
7. GOV.UK. Protecting and enhancing the security and resilience of UK data infrastructure. December 14, 2023.