Revealing the genome's secrets: advances and applications of next-generation sequencing


Sequencing technology refers to the processes and techniques used to determine the order of nucleotides in a given DNA or RNA sample. The field of DNA sequencing has advanced remarkably since its inception in the 1970s. Next-generation sequencing (NGS) refers to a group of high-throughput sequencing technologies that have emerged in the last two decades and have significantly reduced the cost and time required to sequence large amounts of DNA or RNA. The advent of modern sequencing technologies has revolutionized biology and medicine by providing unprecedented access to the genetic information of organisms. Today, sequencing is used in a wide range of applications, changing the way we approach healthcare, forensics, and agriculture. In this article, we will explore the history of DNA sequencing technologies, their applications, and the basis of the procedure, with some tips to ensure the quality of your sequencing data.

From Sanger to NGS – Development of sequencing technologies

The first DNA sequencing methods were developed in the 1970s by Frederick Sanger and, independently, by Walter Gilbert together with Allan Maxam. The Sanger sequencing method, which is based on chain-terminating nucleotides, gave scientists the capacity to read the nucleotide sequence of a DNA molecule. For more than three decades, this revolutionary technology was the go-to method for DNA sequencing, leading to several groundbreaking achievements, including the Human Genome Project. Sanger sequencing services are still offered by numerous DNA sequencing core facilities for applications where high throughput is not necessary. The most frequent applications are individual sequencing reactions that use a particular DNA primer on a specific template, for example to verify your plasmid constructs or PCR products.


In the 1990s, a new technology known as pyrosequencing was developed. The method detects the light released when a nucleotide is added to a growing DNA strand, which allows real-time monitoring of DNA synthesis. This technology enabled a faster and less expensive procedure, making DNA sequencing more accessible to researchers. Further, this technique laid the groundwork for one of the first high-throughput DNA sequencing instruments. In the early 2000s, the development of next-generation sequencing (NGS) technologies revolutionized the field of genetics. These technologies allowed massively parallel sequencing of DNA, which dramatically increased the speed and throughput of sequencing. Illumina, 454 Life Sciences, and Ion Torrent are some of the prominent companies that commercialized NGS technologies. In addition to the development of new sequencing platforms, there have been significant improvements in the library preparation and data analysis steps of sequencing.


NGS technologies made sequencing of the human genome more affordable and accessible than ever before. The Human Genome Project, which took over a decade and cost billions of dollars, was completed in 2003. Today, a human genome can be sequenced in a matter of days at a fraction of the cost. NGS technologies continue to evolve rapidly and are reshaping the life sciences and beyond. Recent examples, such as the development of widely available clinical testing, the monitoring of viral spread, and the development of innovative vaccines during the COVID-19 pandemic, have only been possible thanks to NGS-derived technologies.

Transforming research and clinical practices: NGS applications

The downstream applications of sequencing are wide-ranging, spanning from fundamental research to clinical applications. High-throughput sequencing has transformed the field of genomics and enabled comprehensive analysis of genetic variation, gene expression, epigenetics, and microbiomes. These advances have opened new avenues for understanding biology and improving human health.


- Genome assembly and annotation: Sequencing data can be used to reconstruct whole genomes or transcriptomes, which can provide valuable insights into gene regulation, evolution, and diversity. Genome assembly involves the reconstruction of a complete genome from the short DNA fragments generated by sequencing. Once the genome has been assembled, it can be annotated to identify genes, regulatory regions, and other functional elements. Genome assembly and annotation are critical for understanding the genetic basis of diseases and for identifying new drug targets.

- Variant detection and analysis: Variant calling involves the identification of differences between the sequenced sample and a reference genome, such as single nucleotide polymorphisms (SNPs), insertions, deletions, and structural variants. These variants can be compared between individuals or populations to identify genetic differences and patterns of inheritance. Variant analysis can provide valuable information for genetic research, population genetics, and personalized medicine.

- Epigenomics: Sequencing technology allows the study of epigenetic modifications, such as DNA methylation, histone modifications, and chromatin accessibility. Epigenetic modifications play a critical role in gene regulation, development, and disease, and sequencing-based epigenomics approaches have enabled comprehensive profiling of these modifications on a genome-wide scale.

- Metagenomics: NGS can be used to study complex microbial communities, such as those found in the human gut or soil. Metagenomics involves the sequencing of all the genetic material in a sample, including DNA from multiple organisms. The resulting data provide insights into microbial diversity, function, and interactions, and have applications in fields such as ecology, agriculture, and medicine.

- Transcriptomics: RNA sequencing (RNA-seq) has become a popular method for transcriptomics analysis due to its high sensitivity and dynamic range. RNA-seq data allow the analysis of gene expression, alternative splicing, and non-coding RNA in a tissue or cell type. Transcriptomics can provide valuable insights into cellular processes, disease mechanisms, and drug targets.

- Clinical applications: Sequencing-based approaches can provide comprehensive and accurate genetic information that can inform diagnosis, treatment decisions, and disease risk assessment. These data can therefore be used for clinical applications such as cancer genomics, infectious disease diagnosis, and genetic screening.
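
To make the genome assembly idea above more concrete, the following is a minimal Python sketch of greedy overlap merging on a few short, error-free fragments. The function names (overlap, greedy_assemble), the min_len parameter, and the example fragments are hypothetical and chosen only for illustration. Production assemblers rely on overlap or de Bruijn graphs and also have to handle sequencing errors, repeats, and reverse complements, none of which is covered here.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical toy fragments that tile a 13-base sequence.
fragments = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(fragments))  # ['ATGGCGTACCTGA']
```

Fragments that share no sufficiently long overlap are returned as separate contigs, which mirrors the gaps that remain in real assemblies when coverage is too low.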


From sample to data: nucleic acid sequencing workflow

The DNA sequencing workflow involves several steps that collectively generate nucleic acid sequence data (Figure 1). The basic procedure involves:

1. Sample preparation: The first step in the sequencing workflow involves extracting DNA or RNA from the organism of interest. Nucleic acids can be isolated from blood, tissue, cells, plants, or other sources using commercially available kits. As we discussed in a previous article, the correct handling and storage of your input material will influence nucleic acid integrity. We also recommend including a quality control step: the quality and quantity of the nucleic acids directly affect the quality of the sequencing data, so it is important to ensure that the DNA or RNA is intact and free from contaminants.


2. Library construction: Library construction involves the preparation of nucleic acid fragments for sequencing. In this step, the DNA or RNA is fragmented into smaller pieces, and RNA is reverse transcribed into cDNA. Next, adapters are added to the ends of the DNA fragments. The adapters serve as binding sites for the sequencing machine and allow the fragments to be amplified by PCR to create a library of millions to billions of amplified fragments. The size and quality of the DNA fragments and the efficiency of the adapter ligation will affect the quality of the sequencing data, so quality control after library preparation is critical to ensure that the library is suitable for sequencing. We recommend three main quality control (QC) steps before you sequence your library. First, determine the concentration of the nucleic acid fragments to ensure that your library contains enough DNA for sequencing. Second, analyze the size distribution of the DNA fragments to ensure that the library contains a broad range of fragment sizes without any bias towards specific sizes. Third, assess adapter dimer contamination. Adapter dimers form when sequencing adapters ligate to each other rather than to the target DNA fragments, and they lead to poor sequencing performance. A quantitative PCR (qPCR) assay can be used to quantify the amount of adapter dimer in the library.


Figure 1. Nucleic acid sequencing workflow. The basic procedure involves sample preparation, library construction, sequencing, and data analysis. QC is recommended after each step to ensure the accuracy of your sequencing results.
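
For the concentration check in the library QC described above, sequencing facilities often ask for the library molarity rather than the mass concentration. The following minimal Python sketch converts one into the other, assuming a double-stranded DNA library and the commonly used approximation of 660 g/mol per base pair. The function name library_molarity_nm and the example values are hypothetical, and the required loading concentration depends on your platform.

```python
# Approximate average mass of one double-stranded DNA base pair in g/mol.
# This is a widely used rule of thumb, not an exact value for any given library.
AVG_BP_MASS_G_PER_MOL = 660.0


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/µL) into molarity (nM)."""
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    # ng/µL -> g/L is a factor of 1e-3, and mol/L -> nmol/L is a factor of 1e9,
    # which gives the combined factor of 1e6 used below.
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


# Hypothetical example: 2 ng/µL library with a mean fragment size of 400 bp
# (including adapters).
print(round(library_molarity_nm(2.0, 400), 2))  # 7.58
```

The mean fragment size used here is the one obtained from the size distribution analysis, so the two QC measurements complement each other.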

3. Sequencing: Once the library has been constructed, the DNA fragments can be sequenced using a variety of platforms and methods. The choice of the sequencing platform will depend on factors such as the length and complexity of the DNA fragments, the depth of coverage required, and the budget available. The output of the sequencing machine is a series of short reads that represent the nucleic acid sequence of the fragments in the library. The most important quality control steps after sequencing typically involve evaluating the quality of the raw sequencing data and could be considered part of the data analysis. The following are some of the most important processes for short-read NGS:

a. Quality assessment of raw data. You can assess the quality of the raw sequencing data using tools such as FastQC. These tools provide information on base quality scores, sequence accuracy, and other quality metrics.

b. Removal of low-quality reads, adapter sequences, and PCR duplicates. You should remove reads with low base quality scores, adapter sequences, and PCR duplicates introduced during library preparation using quality filtering and adapter trimming tools such as Trimmomatic, Cutadapt, and Picard. Failure to remove adapter sequences can result in increased error rates and poor mapping rates.

c. Alignment to a reference genome. If a reference genome is available, the sequencing reads are aligned to it to identify potential alignment issues such as mismatches, insertions, or deletions.

d. Evaluation of sequencing depth and coverage. It is important to assess the depth and coverage of sequencing to ensure that sufficient coverage has been achieved for downstream analysis.
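
The quality filtering and depth checks listed above can be illustrated with a short Python sketch. It assumes that quality strings use the Phred+33 encoding, where a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means a 1% chance that a base call is wrong. The function names, the mean-quality threshold of 20, and the toy reads are hypothetical. In practice you would use established tools such as those named above rather than custom code.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality: str) -> float:
    """Mean Phred score of a read, or 0.0 for an empty quality string."""
    scores = phred_scores(quality)
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(reads, min_mean_q: float = 20.0):
    """Keep (read_id, sequence, quality) tuples whose mean quality is high enough."""
    return [r for r in reads if mean_quality(r[2]) >= min_mean_q]


def mean_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


# Hypothetical toy reads: 'I' encodes Q40, '5' encodes Q20, '!' encodes Q0.
reads = [
    ("read1", "ACGT", "IIII"),  # mean Q40, kept
    ("read2", "ACGT", "!!5I"),  # mean Q15, removed
    ("read3", "ACGT", "5555"),  # mean Q20, kept
]
print([r[0] for r in filter_reads(reads)])  # ['read1', 'read3']

# Hypothetical run: one million 150-base reads on a 5 Mb genome.
print(mean_depth(1_000_000, 150, 5_000_000))  # 30.0
```

The depth calculated this way is only an average. Actual coverage varies along the genome, which is why depth is usually also evaluated per position after alignment.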


4. Data analysis and interpretation: The analysis pipeline will depend on the type of sequencing data generated and the research question being addressed. A basic analysis comprises several computational steps, such as quality control (see above), read trimming, read alignment to a reference genome, variant calling, and annotation. For example, you can put the called variants into context and interpret them in light of your research question. The data are then cross-referenced with larger databases to identify significant deviations in the analyzed samples. Advanced analysis techniques may include de novo assembly of the genome, metagenomic analysis, or transcriptome analysis.
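
As an illustration of the variant calling step mentioned above, here is a deliberately simplified Python sketch that counts the bases observed at each reference position and reports positions where a non-reference base dominates. It assumes gapless alignments given as (start, read) pairs with 0-based start positions. The function names, thresholds, and example data are hypothetical. Dedicated variant callers additionally model base qualities, mapping qualities, insertions and deletions, and diploid genotypes.

```python
from collections import Counter


def pileup(reference: str, alignments: list[tuple[int, str]]) -> list[Counter]:
    """Count the bases observed at each reference position."""
    columns = [Counter() for _ in reference]
    for start, read in alignments:
        for i, base in enumerate(read):
            pos = start + i
            if 0 <= pos < len(reference):
                columns[pos][base] += 1
    return columns


def call_snps(reference, alignments, min_depth=3, min_alt_fraction=0.8):
    """Return (position, ref_base, alt_base, depth) for well-supported mismatches."""
    calls = []
    for pos, counts in enumerate(pileup(reference, alignments)):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call at this position
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_alt_fraction:
            calls.append((pos, reference[pos], base, depth))
    return calls


# Hypothetical example: three overlapping reads that all carry an A
# at position 3, where the reference has a T.
reference = "ACGTACGT"
alignments = [(0, "ACGAAC"), (1, "CGAACG"), (2, "GAACGT")]
print(call_snps(reference, alignments))  # [(3, 'T', 'A', 3)]
```

The minimum depth threshold shows why the coverage evaluation in step 3 matters: positions covered by too few reads cannot be called reliably.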


A revolution in science

NGS has enabled many groundbreaking discoveries in genetics and genomics, including the sequencing of the human genome and the identification of disease-causing mutations. NGS technology has also revolutionized other fields, such as microbiology, ecology, and forensic science. For example, metagenomic sequencing, which involves sequencing all the genetic material in a complex mixture of organisms, has revealed the incredible diversity of microbes in our environment and their potential roles in human health and disease. In forensics, DNA sequencing can be used to identify individuals, track the origin of a sample, or even reconstruct a crime scene.

BioEcho offers a wide range of products for nucleic acid extraction that are adapted to different sample types. Our simple, short workflows deliver high-quality DNA and RNA samples for your sequencing applications and leave you more time to assess the quality of your libraries and sequencing data.

Stay tuned for our next articles, in which we will examine different high-throughput DNA sequencing techniques such as whole genome sequencing (WGS), whole exome sequencing (WES), and targeted region sequencing (TRS), along with their applications. We will also review different sequencing platforms and discuss which one could best fit your needs.



References

(1) Sanger, F., Nicklen, S. and Coulson, A.R., 1977. DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, 74(12), pp.5463-5467.

(2) Slatko, B.E., Gardner, A.F. and Ausubel, F.M., 2018. Overview of next generation sequencing technologies. Current Protocols in Molecular Biology, 122(1).

 




Author: Dr. Laura Torres Benito

Laura is a passionate scientific communicator with extensive experience as a researcher in several fields, including human genetics, biotechnology, and neuroscience. Since joining BioEcho in 2022, she has enjoyed creating appealing content and material for our customers and interested parties. Laura likes practicing yoga, cooking wholesome Mediterranean food, and playing the piano. She also loves spending weekends in nature, rock climbing and hiking.
