Navigating the complexities of miRNA sequencing data analysis.

Next-generation sequencing (NGS) has emerged as a powerful tool for studying small RNA. Small RNAs, typically less than 200 nucleotides in length, predominantly consist of non-coding RNAs engaged in cellular regulatory mechanisms. Many are even shorter, such as microRNAs (miRNAs), with a characteristic length of 18-30 nucleotides. These miRNAs exhibit altered expression profiles in different disease states, making them potential non-invasive biomarkers for diagnosing and monitoring various conditions, including cancer.

Analysis of the data obtained for miRNAs requires specialized approaches. This blog describes some of the challenges involved and presents popular pipelines used to address them.

We can distinguish four stages in miRNA analysis:

  • Preprocessing of the raw reads
  • Alignment to the genome and annotated transcriptome
  • Quantitation
  • Normalization of the expression data.

The main difference between miRNAs and other RNAs is their short length, typically 18–30 nucleotides. With fewer nucleotides per read, the probability that a read maps to multiple locations in the genome increases, a phenomenon known as multi-mapping. In addition, mapping algorithms rely on sequence matching, so with such short molecules even a single sequencing error can have a disproportionately large effect on mapping.
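To give a rough feel for why read length matters, the sketch below estimates how many purely chance matches a read of a given length would have in a genome. It assumes a uniform random genome of about 3.1 billion bases (an approximation of the human genome) and counts every sequence within a given number of substitutions as a match. The function name and parameters are illustrative only and do not come from any aligner.

```python
from math import comb

def expected_random_hits(read_length, genome_size=3.1e9, max_mismatches=0):
    """Expected chance matches for one read in a uniform random genome (toy model)."""
    # Number of distinct sequences within `max_mismatches` substitutions of the read.
    neighbours = sum(comb(read_length, i) * 3**i for i in range(max_mismatches + 1))
    # Both strands are searched, so there are roughly 2 * genome_size start positions.
    return 2 * genome_size * neighbours / 4**read_length

for length in (18, 22, 30, 50):
    exact = expected_random_hits(length)
    one_mm = expected_random_hits(length, max_mismatches=1)
    print(f"{length} nt: exact={exact:.3g}, up to 1 mismatch={one_mm:.3g}")
```

Under this toy model, chance matches fall off steeply as reads get longer, and tolerating a single mismatch multiplies them by 1 + 3 × (read length). Real genomes are not random: repeats and paralogous miRNA families make multi-mapping considerably more common than this estimate suggests.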

The small RNA population is diverse, encompassing miRNAs, siRNAs, piRNAs, and tRNA fragments, each with different biogenesis pathways and functions. Many small RNAs share similar sequences, making it difficult to distinguish between closely related molecules, especially when studying non-human samples for which well-curated, high-quality miRNA databases may not exist. Furthermore, small RNAs frequently undergo post-transcriptional modifications such as methylation (2′-O-methylation, for example, has been reported frequently in plant miRNAs), which can affect library preparation efficiency and subsequent analysis.

When analyzing samples of human or mouse origin, we recommend exceRpt, a comprehensive, freely available pipeline that addresses all stages of miRNA analysis. This tool effectively manages the variable contamination and often poor-quality data typical of low-input small RNA-seq samples, such as those obtained from extracellular preparations. exceRpt is also fully capable of processing data from more standard cellular preparations.

For those working with samples of a different origin, or who prefer to control each step of the analysis, some of the most popular tools are described below.

Preprocessing of the raw reads

Due to the small insert size, adapters used during library preparation can constitute a significant portion of the read. Standard trimming tools may not effectively remove adapters from such short reads, leading to poor-quality data and false-positive results. To address this, specialized tools like Cutadapt and Trimmomatic, both free to download, are commonly used. 
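As an illustration of what 3' adapter trimming does, here is a minimal pure-Python sketch. It only handles exact matches (dedicated tools such as Cutadapt also tolerate sequencing errors), and the adapter sequence, the `min_overlap` default, and the 18-30 nucleotide length filter are assumptions chosen for the example. Always use the adapter sequence documented for your own library preparation kit.

```python
def trim_3p_adapter(read, adapter, min_overlap=5):
    """Remove a 3' adapter (exact match only), including a partial adapter at the read end."""
    # Case 1: the full adapter occurs in the read; keep everything before it.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends with a prefix of the adapter; try the longest prefix first.
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    return read  # no adapter found

def keep_insert(insert, min_len=18, max_len=30):
    """Length filter matching the typical miRNA size range."""
    return min_len <= len(insert) <= max_len

# Example adapter for illustration only; check your kit's documentation.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"
INSERT = "TGAGGTAGTAGGTTGTATAGTT"  # 22-nt insert used as a toy example

read = INSERT + ADAPTER + "ACGTACG"
trimmed = trim_3p_adapter(read, ADAPTER)
print(trimmed, keep_insert(trimmed))
```

Note how much of the raw read is adapter: in this example fewer than half of the sequenced bases belong to the insert, which is why untrimmed small RNA reads rarely align.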

Trimming instructions for NEXTFLEX® Small RNA v4, based on Cutadapt commands, are available on our website under the Resources tab.

Mapping and Annotation

Mapping short reads to the human genome is problematic, as indicated above. To improve mapping accuracy, aligners optimized for small RNA data are employed. Bowtie2 is a popular choice because it allows precise control over alignment parameters. To handle multi-mapped reads, tools like the STAR aligner, with parameters adjusted for small RNA, can be used.

One of the most intricate challenges in analyzing data obtained from biofluids is that small RNAs come from various sources, including human cells and microorganisms such as bacteria, viruses, and fungi. Traditional approaches to overcoming this problem involve first mapping reads to the human genome or miRBase, then mapping the unmapped portion to microbial genomes (or vice versa).
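The sequential strategy can be sketched as follows. This is a toy model in which "mapping" is reduced to exact membership in a set of reference sequences. A real workflow would run an aligner at each step, and all names and sequences here are hypothetical.

```python
def assign_sequentially(reads, references):
    """Assign each read to the first reference (in the given order) that contains it."""
    assignments = {}
    remaining = list(reads)
    for name, sequences in references:
        still_unmapped = []
        for read in remaining:
            if read in sequences:
                assignments[read] = name
            else:
                still_unmapped.append(read)
        # Only reads that failed this step are passed on to the next reference.
        remaining = still_unmapped
    for read in remaining:
        assignments[read] = "unmapped"
    return assignments

# Hypothetical toy references; "SHARED" stands for a sequence present in both.
human = {"HUMAN_ONLY", "SHARED"}
microbial = {"MICROBE_ONLY", "SHARED"}
reads = ["HUMAN_ONLY", "MICROBE_ONLY", "SHARED", "NOVEL"]

human_first = assign_sequentially(reads, [("human", human), ("microbial", microbial)])
microbe_first = assign_sequentially(reads, [("microbial", microbial), ("human", human)])
print(human_first["SHARED"], microbe_first["SHARED"])
```

The read shared by both references changes label depending on which reference is searched first, which is why the order of the mapping steps should be chosen deliberately and reported.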

Accurately annotating miRNAs and separating them from other small RNAs, such as tRNA fragments or degradation products, requires specific tools like miRDeep2. Overlapping features in annotation databases cause ambiguity: reads mapped to overlapping regions can be marked as ambiguous and excluded from analysis, leading to potential data loss.
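The effect of overlapping annotations on counting can be illustrated with a small sketch. It assumes all reads and features lie on the same chromosome and strand, uses half-open coordinates, and the feature names and positions are invented for the example. The `__ambiguous` and `__no_feature` labels echo the convention used by htseq-count, but the logic here is a simplification, not that tool's implementation.

```python
from collections import Counter

def count_reads(alignments, features):
    """Count reads per feature; reads overlapping several features are set aside."""
    counts = Counter()
    for r_start, r_end in alignments:
        # Half-open intervals overlap when each one starts before the other ends.
        hits = [name for name, f_start, f_end in features
                if r_start < f_end and f_start < r_end]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts

# Hypothetical annotation with a miRNA overlapping a tRNA fragment.
features = [("miRNA_x", 100, 122), ("tRF_y", 115, 150)]
alignments = [(100, 112), (100, 121), (130, 148), (300, 320)]

counts = count_reads(alignments, features)
print(dict(counts))
```

Comparing the `__ambiguous` and `__no_feature` totals against the assigned counts is a quick way to see how much data a given annotation and counting policy discards.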

Also, please note that during miRNA analysis, a substantial portion of reads is often mapped outside annotated expressed regions, which classical methods are not designed to analyze. This can result in valuable data being overlooked. 

Quantitation 

Quantifying miRNA expression requires distinguishing between miRNA family members and isomiRs (miRNA variants). Tools like isomiRage and SeqBuster extend the analysis by detecting and quantifying isomiRs; several other options are also available.
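To make the idea of an isomiR concrete, the sketch below describes a read by how far its ends are shifted relative to the canonical mature miRNA within its precursor. It only handles templated variants (reads that match the precursor exactly). Non-templated additions and internal substitutions, which dedicated tools also report, are returned as `None`. The precursor, coordinates, and function name are hypothetical.

```python
def classify_isomir(read, precursor, mature_start, mature_end):
    """Return 5' and 3' offsets of a read relative to precursor[mature_start:mature_end]."""
    pos = precursor.find(read)
    if pos == -1:
        return None  # not a templated variant (mismatch or non-templated addition)
    five_prime = pos - mature_start                 # > 0: starts later, < 0: starts earlier
    three_prime = pos + len(read) - mature_end      # > 0: extended, < 0: trimmed
    return {
        "five_prime_offset": five_prime,
        "three_prime_offset": three_prime,
        "canonical": five_prime == 0 and three_prime == 0,
    }

# Toy precursor: invented flanks around a 22-nt mature sequence.
MATURE = "TGAGGTAGTAGGTTGTATAGTT"
PRECURSOR = "GGGA" + MATURE + "TTAGGG"
START, END = 4, 4 + len(MATURE)

for read in (MATURE, MATURE[:-1], "GAGGTAGTAGGTTGTATAGTTT", MATURE + "A"):
    print(read, classify_isomir(read, PRECURSOR, START, END))
```

A shift at the 5' end is worth flagging separately in any real analysis, because it changes the seed region and can therefore change the set of targets.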

Normalization of Expression Data

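Normalization is critical for accurate expression analysis. Traditional methods like Reads Per Million (RPM) may not account for compositional biases, where a few highly abundant miRNAs dominate the library and distort the apparent levels of all others. Instead, methods like the Trimmed Mean of M-values (TMM) implemented in edgeR, or the median-of-ratios method used by DESeq2, are preferred.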
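Compositional bias is easiest to see with a small example. The sketch below compares RPM with size factors computed by a median-of-ratios scheme, the general approach DESeq2 uses (TMM in edgeR is a related but different trimmed-mean method and is not reproduced here). The counts and miRNA names are invented, and the code is a simplified illustration rather than either package's implementation.

```python
from math import exp, log
from statistics import median

def rpm(counts):
    """Scale each sample (column) to reads per million of its own total."""
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {name: [1e6 * c / totals[j] for j, c in enumerate(row)]
            for name, row in counts.items()}

def median_of_ratios_size_factors(counts):
    """Per-sample size factor: median ratio of each count to its geometric mean."""
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean per miRNA, using only miRNAs detected in every sample.
    log_means = {name: sum(log(c) for c in row) / n_samples
                 for name, row in counts.items() if all(c > 0 for c in row)}
    return [exp(median([log(counts[name][j]) - m for name, m in log_means.items()]))
            for j in range(n_samples)]

# Hypothetical counts for two samples: only miR-e truly changes.
counts = {
    "miR-a": [100, 100],
    "miR-b": [200, 200],
    "miR-c": [300, 300],
    "miR-d": [400, 400],
    "miR-e": [1000, 9000],
}

rpm_values = rpm(counts)
size_factors = median_of_ratios_size_factors(counts)
print("RPM for miR-a:", rpm_values["miR-a"])
print("size factors:", size_factors)
```

In this example the raw counts of miR-a are identical in both samples, yet RPM makes it look five-fold lower in the second sample because miR-e inflates that library's total. The median-of-ratios size factors stay close to 1 for both samples, so the unchanged miRNAs remain unchanged after normalization.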

References:
  1. Kozomara, A., Birgaoanu, M., & Griffiths-Jones, S. (2019). miRBase: from microRNA sequences to function. Nucleic Acids Res., 47(D1), D155–D162.
  2. Chen, L., Heikkinen, L., Wang, C., et al. (2019). Trends in the development of miRNA bioinformatics tools. Briefings in Bioinformatics, 20(5), 1836–1852.
  3. Schmartz, G.P., Kern, F., Fehlmann, T., et al. (2021). Encyclopedia of tools for the analysis of miRNA isoforms. Briefings in Bioinformatics, 22(4), bbaa346.
  4. Li, J., Kho, A.T., Chase, R.P., et al. (2020). COMPSRA: a COMprehensive Platform for Small RNA-Seq data Analysis. Sci. Rep., 10, 4552.
  5. Zayakin, P. (2024). sRNAflow: A Tool for the Analysis of Small RNA-Seq Data. Non-Coding RNA, 10, 6.
     
