Algorithms for DNA-related analysis – Introduction

Before discussing algorithms, I will cover some terms that come up frequently in DNA analysis. I am not a biology student, but anyone who uses algorithms for DNA mapping and sequencing analysis needs to understand the commonly used terms around DNA:

  • Amino acids
  • Proteins
  • Nucleic acids
  • Genome
  • Transcription
  • Translation
  • DNA replication

Amino acids

Amino acids are chemical compounds that are the building blocks of proteins and also act as intermediates in metabolism. A protein's characteristics depend on its precise amino acid content and on the order of those amino acids (its sequence), so each protein has its own amino acid content and sequence. The chemical properties of the amino acids also determine the protein's biological activity. To understand a protein's structure and stability (protein folding), it is essential to first understand amino acid structure and chemistry.

Proteins are built from 20 standard amino acids. The human body can synthesise 11 of them; the remaining 9, the essential amino acids, must be supplied through food. Unlike fat and starch, excess amino acids are not stored by the body, so they need to be available through food daily.

Recent findings on insulin resistance in patients with type 2 diabetes suggest that lipids and branched-chain amino acids (BCAAs) work together to promote metabolic disease. A BCAA-related metabolic signature has been reported to predict the incidence, progression and remission of diabetes and insulin resistance.

Proteins

Proteins are complex molecules made up of smaller units, amino acids, joined in a long chain. Proteins can be categorised into types based on their function; a few of these types are:

  1. Antibodies – bind to foreign particles such as viruses to help protect the body
  2. Enzymes – carry out chemical reactions in cells and assist in the formation of new molecules by reading the genetic information stored in DNA
  3. Messenger proteins – transmit signals between cells, tissues and organs
  4. Structural components – provide structure and support for cells
  5. Transport/storage proteins – bind and carry atoms and small molecules around the body

Proteins can be synthesised in two ways:

  1. Biosynthesis – proteins are synthesised in the cytoplasm from instructions encoded in genes
  2. Chemical synthesis – proteins are synthesised chemically using peptide synthesis

A simple, practical example is detecting antibodies against HIV in a patient's blood to determine whether the patient is infected.

Nucleic acids

Nucleic acids are essential molecules of life. They include DNA (deoxyribonucleic acid) and RNA (ribonucleic acid), both of which are made from nucleotides. Each nucleotide consists of a 5-carbon sugar, a nitrogenous base and one or more phosphate groups. Nucleic acids are classified as DNA or RNA by their sugar component: if the sugar is deoxyribose, the molecule is DNA; if it is ribose, the molecule is RNA.

DNA consists of two long strands of nucleotides that are anti-parallel (they run in opposite directions) and are held together by complementary base pairing: A pairs with T, and C pairs with G.
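To make complementary pairing and the anti-parallel orientation concrete, here is a minimal Python sketch. The names `COMPLEMENT` and `reverse_complement` are illustrative, not taken from any particular library, and it assumes the input contains only the letters A, C, G and T (no ambiguity codes such as N). Because the strands run in opposite directions, the partner strand read 5' to 3' is the complement of each base taken in reverse order.

```python
# Assumption: input uses only A, C, G, T (upper or lower case); no ambiguity codes.
# Watson-Crick base pairing: A <-> T, C <-> G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the partner strand of a DNA sequence, read 5' to 3'.

    The strands are anti-parallel, so we complement every base and then
    reverse the result to read the partner in its own 5' to 3' direction.
    """
    return strand.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCGT"))  # ACGCAT
```

Applying the function twice returns the original strand, which is a quick sanity check that the pairing rules are symmetric.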

RNA plays an important role in protein synthesis and is commonly divided into three main types:

  1. Transfer RNA (tRNA)
  2. Messenger RNA (mRNA)
  3. Ribosomal RNA (rRNA)

DNA and RNA carry the sequences of genetic instructions that are essential for building and maintaining cells, organs and whole organisms.

Genome

A genome is the complete set of genetic material (DNA) of an organism, and it contains all the information needed to build and maintain that organism. The human genome is estimated to contain more than 3 billion DNA base pairs, and a copy of it is found in every cell that has a nucleus. A genome contains the set of instructions needed to build cells. Each one will have two types of

Genome sequencing and matching help analysts predict and match similar DNA. Sequencing means decoding the genetic sequence, that is, determining the order of the four bases written as the letters A, C, G and T. With newer technologies, sequencing has become much cheaper than it was a decade ago.
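Once a sequence is written as a string of A, C, G and T, basic analyses become ordinary string operations. The sketch below, with hypothetical helper names `base_composition` and `find_occurrences`, counts each base and finds exact matches of a short pattern by naive sliding-window search. It is meant only as an illustration of "matching"; practical read-mapping tools typically rely on indexed methods rather than this brute-force scan.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Count how many times each of the four DNA letters appears."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def find_occurrences(genome: str, pattern: str) -> list:
    """Naive exact matching: return every 0-based start position of pattern.

    Hypothetical helper for illustration. It slides the pattern along the
    genome one base at a time, so the cost is O(len(genome) * len(pattern)).
    """
    genome, pattern = genome.upper(), pattern.upper()
    if not pattern:
        return []
    m = len(pattern)
    return [i for i in range(len(genome) - m + 1) if genome[i:i + m] == pattern]


genome = "ACGTACGTTACG"
print(base_composition(genome))         # {'A': 3, 'C': 3, 'G': 3, 'T': 3}
print(find_occurrences(genome, "ACG"))  # [0, 4, 9]
```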

Scientists use genome composition to study the evolutionary history of genomes by comparing the proportions of repetitive and non-repetitive DNA.
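Measuring repetitive DNA properly requires dedicated repeat-annotation tools and curated repeat libraries. As a very rough illustration only, the sketch below uses a hypothetical `repetitive_fraction` helper: it counts a window as "repetitive" when its k-mer (substring of length k, with k = 4 chosen arbitrarily as an assumption) occurs more than once in the sequence, and reports the fraction of such windows.

```python
from collections import Counter


def repetitive_fraction(seq: str, k: int = 4) -> float:
    """Crude proxy: fraction of k-mer windows whose k-mer occurs more than once.

    Hypothetical helper; k=4 is an arbitrary assumption. Real repeat
    annotation is far more involved than counting repeated substrings.
    """
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not windows:
        return 0.0
    counts = Counter(windows)
    repeated = sum(1 for w in windows if counts[w] > 1)
    return repeated / len(windows)


print(repetitive_fraction("ACGTACGTACGT"))  # 1.0  (a tandem repeat of ACGT)
print(repetitive_fraction("ACGTTGCA"))      # 0.0  (no 4-mer repeats)
```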

Transcription

Transcription is the process of creating an mRNA (messenger RNA) copy of a DNA gene sequence. The mRNA then leaves the cell nucleus and enters the cytoplasm, where ribosomes carry out protein synthesis according to the instructions encoded in the mRNA.

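DNA has two strands: the sense (coding) strand and the antisense (template) strand. Unlike DNA, mRNA is single-stranded. It is complementary to the antisense (template) strand, so its sequence matches the sense strand with uracil (U) in place of thymine (T).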
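The relationship between the two DNA strands and the mRNA can be sketched in a few lines of Python. This is a simplified model under stated assumptions: every strand is written 5' to 3', the whole input is transcribed (no promoters or termination signals), and the function names are hypothetical. It shows that pairing against the template strand and simply swapping T for U on the coding strand give the same mRNA.

```python
# Assumption: all strands are given 5' to 3' and contain only A, C, G, T.
# Pairing used when reading the DNA template:
# DNA A -> RNA U, DNA T -> RNA A, DNA C -> RNA G, DNA G -> RNA C.
TEMPLATE_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe_from_template(template: str) -> str:
    """Build mRNA (5' to 3') from the template (antisense) strand.

    The template is read in the opposite direction, so each base is paired
    and the result is reversed.
    """
    return template.upper().translate(TEMPLATE_TO_RNA)[::-1]


def transcribe_from_coding(coding: str) -> str:
    """Shortcut: mRNA has the coding (sense) strand's sequence with U for T."""
    return coding.upper().replace("T", "U")


coding = "ATGGCC"
template = "GGCCAT"  # reverse complement of the coding strand
print(transcribe_from_template(template))  # AUGGCC
print(transcribe_from_coding(coding))      # AUGGCC
```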

Transcription proceeds in the following steps:

  1. Pre-initiation
  2. Initiation
  3. Promoter clearance
  4. Elongation
  5. Termination

There is also a process called reverse transcription, in which RNA is transcribed into DNA (the opposite of DNA-to-RNA transcription). This behaviour is usually found in retroviruses such as HIV.

Translation

Translation is the process of decoding the instructions carried by messenger RNA (mRNA) to direct protein synthesis. The mRNA is read three bases (one codon) at a time, and with the help of ribosomes and transfer RNA (tRNA) each codon is converted into an amino acid, so the gene sequence becomes an amino acid sequence that in turn forms a protein.

Translation is done in three steps:

  1. Initiation
  2. Elongation
  3. Termination

The detailed steps from DNA to protein are:

  1. DNA is transcribed into mRNA, which carries the genetic information
  2. The mRNA leaves the cell nucleus and enters the cytoplasm
  3. The mRNA carries the genetic instructions from the chromosomes to the ribosomes
  4. Ribosomes assemble on the mRNA and translate the genetic information into a sequence of amino acids, which are delivered by tRNA
  5. A protein is formed from the resulting amino acid sequence
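The initiation, elongation and termination steps can be mimicked with a codon lookup table. The sketch below builds the standard genetic code (the one used by most organisms) from a compact 64-letter string in one-letter amino acid codes, with `*` marking stop codons. It assumes translation begins at the first AUG and ends at the first stop codon or when fewer than three bases remain; it ignores everything the ribosome and tRNA actually do physically. The names `CODON_TABLE` and `translate` are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code in one-letter amino acid codes. Codons are ordered
# UUU, UUC, UUA, UUG, UCU, ..., GGG (first base outermost); '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(mrna: str) -> str:
    """Translate mRNA into a one-letter amino acid sequence.

    Assumptions (simplified model):
    - Initiation: start at the first AUG (methionine).
    - Elongation: read one codon (3 bases) at a time.
    - Termination: stop at the first stop codon or when < 3 bases remain.
    Raises KeyError if a codon contains letters other than A, C, G, U.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUUUUAAGG"))  # MAF  (AUG GCC UUU then stop UAA)
```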

DNA replication

DNA replication is the process of producing two identical DNA helices from one original DNA helix. Every living organism needs this process to:

  1. Build and regulate cells from the encoded genetic information
  2. Transmit genetic information from one generation to the next

The replicated helices are called daughters, and the original DNA helix is called the parent. The parent helix separates into two strands, and a complementary strand is synthesised for each separated strand, forming two daughters. This process is called semi-conservative because each daughter contains one strand from the parent and one newly synthesised complementary strand.
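The semi-conservative pattern is easy to see in a toy model that tags each strand with the round in which it was made. This sketch is a simplification under stated assumptions: the two strands of a helix are written aligned base by base (top 5' to 3', bottom 3' to 5' underneath), and enzymes, replication forks and copying errors are ignored. `Strand`, `new_partner` and `replicate` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple

# Watson-Crick pairing, applied position by position.
# Assumption: the partner strand is written aligned underneath (3' to 5'),
# so no reversal is needed here.
PAIR = str.maketrans("ACGT", "TGCA")


class Strand(NamedTuple):
    seq: str         # bases, aligned position by position with the partner
    generation: int  # 0 = strand from the original parent helix


Helix = Tuple[Strand, Strand]


def new_partner(template: Strand, generation: int) -> Strand:
    """Synthesise a new strand that pairs base by base with the template."""
    return Strand(template.seq.translate(PAIR), generation)


def replicate(helix: Helix, generation: int) -> List[Helix]:
    """Separate a helix and give each old strand a newly made partner."""
    top, bottom = helix
    return [(top, new_partner(top, generation)),
            (new_partner(bottom, generation), bottom)]


helices: List[Helix] = [(Strand("ATGC", 0), Strand("TACG", 0))]
for generation in (1, 2):
    helices = [d for h in helices for d in replicate(h, generation)]

for top, bottom in helices:
    print(top.seq, bottom.seq, (top.generation, bottom.generation))
# ATGC TACG (0, 2)
# ATGC TACG (2, 1)
# ATGC TACG (1, 2)
# ATGC TACG (2, 0)
```

After two rounds all four helices have identical sequences, every helix pairs one older strand with one strand made in the latest round, and only two of them still carry a strand from the original parent (generation 0).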

By studying DNA replication, researchers have been able to relate it to certain disease behaviours. A recent example is the study of DNA replication stress in human cancers, which are characterised by genomic instability. These studies showed that oncogene-induced DNA replication stress increases genomic instability in human cancers (for example, more deletions at common fragile sites), and as a result the genome copy number changes.

In the next part, I’ll discuss various algorithms and tools used for DNA analysis.