The role of bioinformatics in PGTM: streamlining precision with next-generation sequencing.

Preimplantation genetic testing for monogenic diseases (PGTM) offers a powerful tool for families carrying known genetic risks, enabling them to select embryos free from specific inherited conditions. While traditional methods like PCR and FISH have been used for this purpose, the advent of next-generation sequencing (NGS) has revolutionized PGTM, providing unprecedented resolution and throughput. However, this technological leap brings a data deluge, making bioinformatics an indispensable component of a reliable and actionable PGTM workflow. This post explores the critical role of bioinformatics in modern PGTM, detailing data analysis pipelines, variant interpretation tools, and how integrated software solutions streamline the data-intensive aspects of NGS-based testing.

What is Bioinformatics? The Foundation of Genomic Interpretation

Bioinformatics is the interdisciplinary field that combines biology, computer science, and statistics to analyze and interpret biological data, particularly large datasets generated by genomic technologies. In the context of genomics, bioinformatics provides the methods and tools to process raw sequencing data, identify genetic variations, annotate their functional consequences, and ultimately, derive relevant insights. Without robust bioinformatics pipelines, the vast data output from NGS would be essentially unusable, akin to having a powerful microscope but no way to focus the image.

The Bioinformatics Pipeline: A Step-by-Step Journey from Raw Data to Insight

The core of NGS-based PGTM is the bioinformatics pipeline – a carefully orchestrated series of computational steps that transform raw sequencing reads into an interpretable report. While specific implementations may vary, a typical pipeline includes the following stages:

  1. Sample Preparation and Sequencing (Brief Overview): Following DNA extraction from biopsied trophectoderm cells, libraries are prepared and sequenced using an NGS platform such as Illumina™ or Element™. This stage generates millions or billions of short DNA sequences (reads).
  2. Raw Data Quality Control (QC): This crucial initial step assesses the quality of the raw sequencing data. Metrics such as read quality scores (Phred scores), base call accuracy, and read length distribution are evaluated with tools like FastQC, and reads failing predefined quality thresholds are trimmed or discarded with tools like Trimmomatic, ensuring that subsequent analyses are performed on high-quality data (a runnable sketch of steps 2 through 4 follows this list). This is fundamental: poor-quality input invariably leads to unreliable results.
  3. Alignment/Mapping: The filtered reads are then aligned to a reference human genome (e.g., GRCh38) using algorithms implemented in software like BWA or Bowtie2. This process determines the genomic location from which each read originated, essentially creating a digital reconstruction of the embryo's genome. Accurate alignment is paramount for reliable variant calling.
  4. Variant Calling: This stage identifies differences (variants) between the embryo's genome and the reference genome. Sophisticated algorithms, implemented in tools like GATK, FreeBayes, and SAMtools, analyze the aligned reads to detect single nucleotide polymorphisms (SNPs), insertions and deletions (indels), and, increasingly, copy number variations (CNVs). The output is typically a Variant Call Format (VCF) file, a standardized format for representing genomic variants.
  5. Variant Filtering: Not all called variants are genuine; some are artifacts of sequencing or alignment. Variant filtering applies stringent criteria (e.g., minimum read depth, quality scores, allele balance) to remove false positives and enrich for true variants (a simple hard-filter example appears after this list). This step is critical for minimizing false-positive and false-negative results, which is particularly important in the high-stakes context of PGTM.
  6. Variant Annotation: This is where raw variants are transformed into meaningful information. Annotation involves adding layers of information to each variant, including:
     
    • Gene Affected: Identifying the gene(s) in which the variant occurs.
    • Functional Consequence: Predicting the variant's impact on protein function (e.g., missense, nonsense, frameshift, splice site alteration).
    • Population Frequency: Determining how common the variant is in various populations, using databases like gnomAD. Rare variants are often more likely to be pathogenic.
    • Disease Associations: Checking if the variant has been previously reported in association with specific diseases, using resources like ClinVar, HGMD, and OMIM.
    • In Silico Predictions: Using algorithms like SIFT, PolyPhen-2, and CADD to predict the deleteriousness of the variant.

Tools like ANNOVAR, VEP (Variant Effect Predictor), and SnpEff are commonly used for variant annotation.
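
To make steps 2 through 4 concrete, the sketch below chains the open-source tools named above (FastQC, Trimmomatic, BWA, SAMtools, GATK) from Python using the standard subprocess module. The sample name, file paths, reference genome, thread count, and trimming thresholds are placeholders, and the exact command-line flags (including whether Trimmomatic is installed as a `trimmomatic` wrapper or invoked as a Java jar) depend on the installed tool versions; treat this as an illustrative outline, not a validated clinical pipeline.

```python
"""Minimal sketch of an NGS secondary-analysis flow: QC -> trim -> align -> call.

All sample names, paths, and parameter values are illustrative placeholders,
not validated clinical settings.
"""
import subprocess

SAMPLE = "embryo_01"            # hypothetical sample identifier
REF = "GRCh38.fa"               # reference genome (BWA index plus .fai/.dict for GATK)
R1 = f"{SAMPLE}_R1.fastq.gz"
R2 = f"{SAMPLE}_R2.fastq.gz"

def run(cmd: str) -> None:
    """Run one pipeline step and fail loudly if the tool returns an error."""
    print(f"[pipeline] {cmd}")
    subprocess.run(cmd, shell=True, check=True)

# Step 2a: raw-read QC report (FastQC writes its reports into qc/).
run(f"fastqc {R1} {R2} -o qc/")

# Step 2b: adapter/quality trimming (paired-end mode; thresholds are examples only).
run(
    f"trimmomatic PE {R1} {R2} "
    f"{SAMPLE}_R1.trim.fq.gz {SAMPLE}_R1.unpaired.fq.gz "
    f"{SAMPLE}_R2.trim.fq.gz {SAMPLE}_R2.unpaired.fq.gz "
    "SLIDINGWINDOW:4:20 MINLEN:36"
)

# Step 3: alignment to the reference, then coordinate sorting and indexing.
run(
    f"bwa mem -t 8 {REF} {SAMPLE}_R1.trim.fq.gz {SAMPLE}_R2.trim.fq.gz "
    f"| samtools sort -o {SAMPLE}.sorted.bam -"
)
run(f"samtools index {SAMPLE}.sorted.bam")

# Step 4: small-variant calling, producing a VCF for downstream filtering.
run(f"gatk HaplotypeCaller -R {REF} -I {SAMPLE}.sorted.bam -O {SAMPLE}.vcf.gz")
```

A production pipeline would typically also include duplicate marking, base-quality recalibration, and per-step QC checkpoints between these commands.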
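
Step 5 can be illustrated with a deliberately simple hard-filter pass over the resulting VCF. The thresholds below (minimum depth, minimum QUAL, an allele-balance window for heterozygous calls) are arbitrary examples rather than clinically validated cut-offs, and the parser assumes a single-sample VCF whose FORMAT field includes GT, AD, and DP, as GATK typically emits. Real pipelines usually rely on the caller's own filtering machinery (for example GATK's hard filters or VQSR) rather than ad-hoc scripts; this sketch only shows what "depth, quality, allele balance" look like in practice.

```python
"""Toy hard-filter pass over a single-sample VCF (illustrative thresholds only)."""
import gzip

MIN_DEPTH = 20           # example minimum read depth
MIN_QUAL = 30.0          # example minimum variant quality score
AB_RANGE = (0.25, 0.75)  # example allele-balance window for heterozygous calls

def passes_filters(line: str) -> bool:
    """Apply simple depth/quality/allele-balance filters to one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    qual = float(cols[5]) if cols[5] != "." else 0.0
    sample = dict(zip(cols[8].split(":"), cols[9].split(":")))  # FORMAT keys -> sample values

    dp = sample.get("DP", "0")
    depth = int(dp) if dp.isdigit() else 0
    ad = sample.get("AD", "0,0").split(",")
    alt_reads = int(ad[1]) if len(ad) > 1 and ad[1].isdigit() else 0
    allele_balance = alt_reads / depth if depth else 0.0

    gt = sample.get("GT", "./.").replace("|", "/")
    balance_ok = AB_RANGE[0] <= allele_balance <= AB_RANGE[1] if gt in ("0/1", "1/0") else True

    return qual >= MIN_QUAL and depth >= MIN_DEPTH and balance_ok

with gzip.open("embryo_01.vcf.gz", "rt") as vcf, open("embryo_01.filtered.vcf", "w") as out:
    for line in vcf:
        if line.startswith("#") or passes_filters(line):
            out.write(line)
```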

Variant Interpretation: Navigating the Complexity of PGTM

While the pipeline identifies and annotates variants, interpretation is the crucial step that determines their clinical significance. This is where the power of bioinformatics tools, combined with expert judgment, is essential. Key considerations in PGTM variant interpretation include:

  • Known Pathogenic Variants: Databases like ClinVar and HGMD provide curated information on variants previously classified as pathogenic for specific genetic disorders. This is the first point of reference.
  • Inheritance Patterns: The bioinformatics pipeline, coupled with family history information, helps determine if the variant follows the expected inheritance pattern (autosomal dominant, autosomal recessive, X-linked). This is vital for accurate risk assessment.
  • De Novo Mutations: In some cases, particularly for dominant disorders, identifying de novo mutations (those not present in the parents) is crucial.
  • Mosaicism: PGTM requires careful assessment for mosaicism, where a variant is present in only a subset of the biopsied cells. Sensitive variant calling and filtering, coupled with appropriate read-depth analysis, are critical for detecting low-level mosaicism (see the allele-fraction sketch after this list).
  • ACMG/AMP Guidelines: The American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) guidelines provide a standardized framework for classifying variants into five categories: Pathogenic, Likely Pathogenic, Variant of Uncertain Significance (VUS), Likely Benign, and Benign. This framework is widely adopted in genetics.
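
Because mosaicism usually surfaces as a skewed variant allele fraction (VAF), the small helper below illustrates the idea: compute the VAF from reference and alternate read counts and flag calls that fall well below the roughly 50% expected for a constitutional heterozygous variant. The depth and VAF thresholds are illustrative only; real mosaicism assessment requires validated, assay-specific cut-offs, adequate coverage, and expert review.

```python
"""Illustrative variant-allele-fraction (VAF) check for potential mosaicism.

Thresholds are examples only; clinical mosaicism calling requires validated,
assay-specific cut-offs and sufficient read depth.
"""

def classify_vaf(ref_reads: int, alt_reads: int,
                 het_low: float = 0.35, mosaic_low: float = 0.10) -> str:
    depth = ref_reads + alt_reads
    if depth < 50:  # example minimum depth before the VAF is considered informative
        return "insufficient depth"
    vaf = alt_reads / depth
    if vaf >= het_low:
        return f"consistent with a constitutional heterozygous call (VAF={vaf:.2f})"
    if vaf >= mosaic_low:
        return f"possible mosaic variant (VAF={vaf:.2f}) - flag for manual review"
    return f"likely noise or artifact (VAF={vaf:.2f})"

# Example: 180 reference reads and 40 alternate reads at the target position.
print(classify_vaf(ref_reads=180, alt_reads=40))  # VAF ~0.18 -> possible mosaic
```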

Software Solutions: Streamlining the PGTM Workflow

The data-intensive nature of NGS-based PGTM necessitates sophisticated software solutions to manage, analyze, and interpret the vast amounts of data. These solutions offer several key advantages:

  • Automation: Automated pipelines reduce manual intervention, minimizing the risk of human error and improving reproducibility.
  • Scalability: NGS platforms generate massive datasets, and software solutions must be able to handle this scale efficiently. Cloud-based solutions offer significant advantages in this regard.
  • Reduced Turnaround Time: Automation and efficient data processing accelerate the analysis, providing faster results for clinicians and families.
  • Data Management: Laboratory Information Management Systems (LIMS) and dedicated bioinformatics platforms provide secure and organized storage and retrieval of patient data.
  • Reporting: Software tools facilitate the generation of clear, concise, and clinically actionable reports, summarizing the findings and variant classifications (a minimal illustration follows this list).
  • Integration: Ideally, software solutions integrate all stages of the pipeline, from raw data QC to variant interpretation and reporting, providing a seamless workflow.
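
To ground the reporting point, here is a small, hypothetical stub that turns classified variants into a plain-text summary a reviewer could check. The data class, field names, and wording are invented for illustration (the CFTR entry shown is the well-known p.Phe508del variant); real PGTM reports follow laboratory-specific, regulated templates and sign-off procedures.

```python
"""Hypothetical report stub: summarize classified variants for reviewer sign-off."""
from dataclasses import dataclass

@dataclass
class ClassifiedVariant:
    gene: str
    hgvs: str             # variant description, e.g. c.1521_1523del
    zygosity: str
    classification: str   # one of the five ACMG/AMP categories

def build_summary(sample_id: str, variants: list[ClassifiedVariant]) -> str:
    """Return a plain-text summary listing pathogenic / likely pathogenic findings."""
    lines = [f"PGTM variant summary - sample {sample_id}", "-" * 40]
    reportable = [v for v in variants
                  if v.classification in ("Pathogenic", "Likely Pathogenic")]
    if not reportable:
        lines.append("No pathogenic or likely pathogenic variants detected in the targeted region(s).")
    for v in reportable:
        lines.append(f"{v.gene} {v.hgvs} ({v.zygosity}): {v.classification}")
    return "\n".join(lines)

# Illustrative input only - not real patient data.
print(build_summary("embryo_01", [
    ClassifiedVariant("CFTR", "c.1521_1523del", "heterozygous", "Pathogenic"),
]))
```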

The Advantages of NGS and Bioinformatics-Driven PGTM

Switching to an NGS-based PGTM workflow, powered by robust bioinformatics, offers compelling advantages over older technologies:

  • Higher Resolution: NGS provides single-base resolution, enabling the detection of a wider range of variants, including SNPs, indels, and CNVs, in a single assay. This is a significant advantage over methods like PCR, which are limited to targeting specific, pre-defined regions.
  • Increased Throughput: NGS allows for the simultaneous analysis of multiple genes or even the entire exome or genome, making it suitable for screening for a wide range of genetic conditions.
  • Improved Accuracy: When combined with a well-validated bioinformatics pipeline, NGS offers high sensitivity and specificity, reducing the risk of false-positive and false-negative results.
  • Comprehensive Analysis: NGS can identify unexpected variants, including those not previously associated with the condition being tested for, potentially providing valuable information for family planning.
  • Future-proofing: An NGS-based platform is adaptable to evolving knowledge and new genetic discoveries. As new genes and variants are identified, the existing data can be re-analyzed without the need for additional testing.

Conclusion: Embracing the Future of PGTM

Bioinformatics is not simply an adjunct to NGS-based PGTM; it is the engine that drives it. The ability to accurately and efficiently process, analyze, and interpret the vast data generated by NGS is what makes this technology so powerful. By adopting a modern, bioinformatics-driven PGTM workflow, laboratories and clinicians can provide families with the most accurate and comprehensive genetic information available, empowering them to make informed reproductive decisions. The shift to NGS represents a significant advancement in precision medicine, and a robust bioinformatics infrastructure is the key to unlocking its full potential in PGTM. The reduced cost of NGS, its scalable throughput, and its comprehensive variant detection make transitioning from older technologies a scientifically sound and practical decision.
