Quantifying rare somatic mutations is essential for researching cancer formation, progression, and treatment. For example, identifying low-frequency mutations in circulating tumor DNA can aid in detecting minimal residual disease and predicting cancer recurrence earlier than traditional methods. Furthermore, detecting these mutations can guide personalized approaches, helping target the specific genetic alterations driving tumor growth.
The introduction of unique molecular identifiers in the adapter barcodes used in these workflows has reduced the contribution of sequencing errors during rare mutation detection. However, several factors are often overlooked: suboptimal adapter choice, index design, and low levels of index crosstalk in those barcodes caused by the limitations of current chemical synthesis methods. With researchers pushing to detect mutations at variant allele frequencies (VAF) well below 0.1%, these factors can become a source of interference, potentially leading to misassignment of sequencing reads and data misinterpretation.
In this document, we review several aspects that help meet the specifications required for sensitive applications and that are implemented in Revvity’s NEXTFLEX® Adapters.
Considerations related to adapter design
Full length, ligation-based adapter
Adapter ligation technology has long been used for its high coverage uniformity, precise strand information, and reliable library preparation. Ligation-based methods for barcode addition can simplify the workflow by eliminating the need for an additional PCR step, reducing the risk of contamination and errors. These methods are often critical for degraded samples, as they do not rely on the efficiency of amplification. Revvity’s UDI-UMI barcodes are full length and are added to the insert by ligation, enabling PCR-free workflows. They are compatible with any TruSeq® style library prep kit.
Unique Dual Index
Unique Dual Indexes (UDIs) provide a unique two-index signature (i5 and i7) for sample identification. UDIs were introduced in the field to mitigate the problem of index hopping, which is particularly prevalent in instruments with patterned flow cells, such as the NovaSeq™ 6000 system. There are a few features that need to be considered about UDIs for best results:
- The number of distinct barcodes increases exponentially with length. On the other hand, longer indexes require more sequencing cycles, which slightly reduces the overall throughput for the actual genomic regions of interest; therefore, the ideal length is a balance between multiplexing and sequencing efficiency. The current set of adapters uses 10-base indexes, the shortest length that allows the set to be expanded up to 1,536 different barcodes, making them a suitable option when ultra-high multiplexing is required.
- For many applications, 6- to 8-base indexes are common. Longer barcodes allow a higher Hamming distance, a measure of the number of different or mismatched bases between two indexes of equal length. This makes long indexes more tolerant to sequencing errors (such as miscalls or insertions/deletions). If the index is too short, sequencing errors could result in incorrect sample assignment, especially if the barcodes differ by only one nucleotide. Our barcodes are designed to have a Hamming distance ≥ 3 between any two indexes in the set (the example pair below differs at exactly three positions):
Index 1: GATTACAATT
Index 2: GAATACGATA
- Homopolymers (runs of identical nucleotides) in barcode indexes are known to increase the likelihood of base-calling errors. This is because sequencers such as Illumina instruments can struggle to accurately determine the length of such repetitive sequences. Homopolymers can also introduce biases during PCR amplification steps, where repetitive sequences might amplify at different rates, leading to uneven representation of certain barcodes in the sequencing output. To minimize these effects, our barcodes do not contain homopolymers longer than 2 bases.
- Our barcodes are colour-balanced even at low multiplexing. Illumina's sequencing-by-synthesis technology involves the use of different fluorescence signals for each base. If index sequences are not balanced across different nucleotide types (A, T, C, G), especially in the initial cycles, the signals can be skewed. This results in issues such as reduced cluster identification and data loss. These issues are particularly prominent with low-plex pools, where fewer barcodes are used, as high-plex pools are naturally diverse.
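The design rules listed above (minimum Hamming distance, homopolymer length, and per-cycle base balance) can be checked computationally for any candidate index set. The following is a minimal Python sketch, not Revvity's design pipeline; the function names are hypothetical, and the only sequences used are the two example indexes shown above. What counts as "balanced" depends on the instrument chemistry, so the last function only reports base fractions per cycle and leaves the interpretation to the user.

```python
import re
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length indexes differ."""
    if len(a) != len(b):
        raise ValueError("indexes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_hamming(indexes):
    """Smallest Hamming distance between any two indexes in the set."""
    return min(hamming(a, b) for a, b in combinations(indexes, 2))


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of identical bases in a sequence."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


def base_fractions_per_cycle(indexes):
    """Fraction of each base observed at every index cycle across the pool."""
    n = len(indexes)
    return [
        {base: count / n for base, count in Counter(column).items()}
        for column in zip(*indexes)
    ]


# The two example indexes from the text above.
index_set = ["GATTACAATT", "GAATACGATA"]

print(min_pairwise_hamming(index_set))               # 3
print([longest_homopolymer(s) for s in index_set])   # [2, 2]
for cycle, fractions in enumerate(base_fractions_per_cycle(index_set), start=1):
    print(cycle, fractions)
```

For the example pair, the distance is 3 and neither index contains a run longer than 2 bases, which is consistent with the rules stated above. With only two indexes, several cycles show a single base, which illustrates why balance is harder to achieve in low-plex pools.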
Unique Molecular Identifier
A Unique Molecular Identifier (UMI) is a short, random or quasi-random sequence of nucleotides that is added to each molecule during library preparation. Its purpose is to uniquely tag individual molecules before amplification, allowing for the identification and correction of PCR errors and duplicates during data analysis.
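To illustrate how UMIs support duplicate identification and error correction, here is a simplified Python sketch. It assumes reads are already reduced to hypothetical (UMI, position, sequence) tuples with equal-length sequences per family, groups them by UMI and position, and takes a per-base majority vote. Production tools also account for errors in the UMI itself, base qualities, and alignment details, none of which are modeled here.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Collapse reads sharing a UMI and mapping position into one consensus.

    reads: iterable of (umi, position, sequence) tuples (hypothetical layout).
    Returns a dict mapping (umi, position) to the majority-vote sequence.
    """
    families = defaultdict(list)
    for umi, position, sequence in reads:
        families[(umi, position)].append(sequence)

    consensus = {}
    for key, sequences in families.items():
        # Majority vote at each base position within the family.
        consensus[key] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*sequences)
        )
    return consensus


# Hypothetical reads: three PCR copies of one molecule (one copy carries an
# error at the fourth base) plus a single read from a second molecule.
reads = [
    ("ACGTACGTA", 100, "TTGACC"),
    ("ACGTACGTA", 100, "TTGACC"),
    ("ACGTACGTA", 100, "TTGTCC"),
    ("GGCATTCAG", 100, "TTGACC"),
]

for key, sequence in umi_consensus(reads).items():
    print(key, sequence)
```

In this toy input, four reads collapse to two original molecules, and the isolated error in the third copy is outvoted by the other two members of its family.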
The most important consideration about UMIs is the number of possible combinations they offer. Revvity UMIs are 9 bases long, which means they provide 4^9 = 262,144 combinations, ideal for capturing the complexity of targeted sequencing studies. Longer UMIs offer higher diversity, but at the expense of additional sequencing cycles.
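The trade-off between UMI length and diversity can be made concrete with a short calculation. The Python sketch below computes the number of possible UMIs for a given length and, under the simplifying assumption that UMIs are drawn uniformly at random (real UMI pools may be quasi-random or biased), the probability that at least two of n molecules at the same position receive the same UMI. The function names are illustrative only.

```python
def umi_space(length: int) -> int:
    """Number of distinct sequences of the given length over A, C, G, T."""
    return 4 ** length


def collision_probability(n_molecules: int, length: int) -> float:
    """Probability that at least two of n molecules share a UMI.

    Assumes each UMI is drawn independently and uniformly at random,
    which is a simplification of real UMI pools.
    """
    space = umi_space(length)
    if n_molecules > space:
        return 1.0
    p_all_unique = 1.0
    for i in range(n_molecules):
        p_all_unique *= (space - i) / space
    return 1.0 - p_all_unique


print(umi_space(9))  # 262144

# Collision probability for different numbers of molecules sharing one position.
for n in (10, 100, 1000):
    print(n, f"{collision_probability(n, 9):.4f}")
```

The collision probability grows with the number of distinct molecules that share the same mapping position, which is why deeper or more complex libraries benefit from longer UMIs.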
Considerations related to the manufacturing process
Quality requirements for indexed barcodes are higher than ever. Revvity has adopted several procedures to ensure the quality of our adapter barcodes matches the needs of low-frequency variant detection studies:
- Traditional purification methods such as HPLC are effective at increasing the amount of the desired full-length product by separating it from shorter, incomplete sequences. However, they are incapable of reducing cross-contamination that can occur in co-synthesized oligos. Even low levels of cross-contamination can lead to barcode misalignment. In addition to HPLC, Revvity uses a proprietary method to minimize the risk of contamination occurring when multiple oligos are synthesized together. This method has been shown to reduce index crosstalk (mixing of sequences) to levels as low as 0.01%.
- Barcode purity is determined by sequencing: libraries are prepared using a collection of synthetic spike-ins (one per barcode). Library reads associated with each spike-in should match only the barcode associated with that spike-in. No mismatches are allowed.
- Automated plating. Our barcodes are arrayed column-wise in 96-well plates using state-of-the-art automated liquid handlers. The provided volumes are optimized for seamless integration with manual and automated workflows.
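The spike-in purity check described in the list above amounts to computing, for each spike-in, the fraction of its reads that carry a barcode other than the expected one. The Python sketch below shows one way this could be tabulated. It is an illustration under stated assumptions, not Revvity's QC procedure: the barcode names and read counts are invented, and the nested-dictionary input layout is hypothetical.

```python
def crosstalk_rates(counts):
    """Fraction of reads per spike-in that carry an unexpected barcode.

    counts[expected_barcode][observed_barcode] = number of reads from the
    spike-in assigned to `expected_barcode` that were observed with
    `observed_barcode` (hypothetical input layout).
    """
    rates = {}
    for expected, observed in counts.items():
        total = sum(observed.values())
        mismatched = total - observed.get(expected, 0)
        rates[expected] = mismatched / total if total else float("nan")
    return rates


# Invented example counts, for illustration only.
counts = {
    "BC01": {"BC01": 99_990, "BC02": 10},
    "BC02": {"BC02": 50_000},
}

for barcode, rate in crosstalk_rates(counts).items():
    print(f"{barcode}: {rate:.4%} of reads carry an unexpected barcode")
```

In this invented example, the first spike-in shows 10 mismatched reads out of 100,000 (0.01%), and the second shows none.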
Conclusion
As a result of their design and manufacturing processes, libraries generated using the NEXTFLEX® Barcodes have a consistent percentage of usable reads and a low level of mis-assigned reads, which facilitates the detection of low-frequency variants and other rare events.
References:
- Enzymatic Methods for Mutation Detection in Cancer Samples and Liquid Biopsies. https://doi.org/10.3390/ijms24020923
- Achieving robust somatic mutation detection with deep learning models derived from reference data sets of a cancer sample. https://doi.org/10.1186/s13059-021-02592-9
- Calibration-free NGS quantitation of mutations below 0.01% VAF. https://doi.org/10.1038/s41467-021-26308-6
- Sequencing error profiles of Illumina sequencing instruments. https://doi.org/10.1093/nargab/lqab019