Wednesday, June 13, 2007

Introduction to Bioinformatics

In the last few decades, advances in molecular biology and in the equipment available for research in this field have allowed increasingly rapid sequencing of large portions of the genomes of several species. To date, the genomes of various model organisms have been completely sequenced, including mouse, Saccharomyces cerevisiae (yeast), Caenorhabditis elegans, Drosophila, Arabidopsis thaliana (plant), rice and Escherichia coli, along with the human genome through the Human Genome Project.
Bioinformatics can be defined as the application of information technology to biology. It aids in the chromosomal localization of genes, rapid searching for sequences similar to ones of interest, gene identification and searching of the scientific literature. In recent years there has been huge growth in the amount of biological data being generated, and the human genome sequence is now essentially available. Achieving all this sequencing is the easy part; the difficult task is interpreting this vast array of sequence data and determining its functions. This should ultimately help us to understand how humans and other organisms work. A core aim of bioinformatics is to store biological data in databases and to provide computing tools to access and analyse this data.

Bioinformatics is a relatively new field that integrates biology with information science. As the number and size of databases of biochemical and genetic information have increased over the last few years, it has become important to organize the information and to develop software to understand its significance. This requires the use of computer systems and the development of an interface that makes the data available and usable to scientists all over the world.

Comparing homologous genes within or between organisms (comparative genomics), particularly if they are distantly related by evolution, helps in analysing sequences for intron/exon boundaries, functionally important parts of a translated protein, and regions of expression control (e.g. promoter sites). Preliminary analysis of genomic regions in different organisms has shown conservation of homologous genes in the same relative areas (synteny), although the gene order may vary. If a gene of interest has been found and chromosomally located in one organism, synteny can point to a chromosomal location for investigation in another organism. It should be noted that it is often easier to find a gene in a compact genome, e.g. that of the Fugu fish, than in the human, mouse or zebrafish genome. This is because the Fugu fish has essentially the same number of genes as the human genome, but generally has far less ‘junk’ DNA between genes and shorter introns.

Popular sequence databases, such as GenBank, DDBJ, and EMBL, have been growing at exponential rates. This growth has created a need for the storage, organization and indexing of sequence information. Computer and information science has been applied to this mass of biological data to produce what is often called the field of bioinformatics or computational molecular biology.
Although bioinformatics is a recent term for the field, the building of databases, the development of algorithms, and their use for biological discovery through sequence analysis have been underway for more than 30 years. The growth in sequence data in recent years has created great demand and impetus for the development, annotation, curation, and analysis of sequence information. The importance and use of bioinformatics have increased dramatically, and it is now an invaluable tool in all disciplines of biology and its applications.

Bioinformatics is needed to handle the enormous amount of data being generated by researchers identifying the lengthy DNA sequences of humans, plants, animals, and microorganisms (life's blueprint), along with other biological data. DNA and protein sequences are particularly amenable to computer analysis, since they can be represented as strings of letters, which computers handle very well. A DNA sequence is a string over an alphabet of 4 letters (A, C, G and T), and a protein sequence is a string over an alphabet of 20 letters, each of which represents an amino acid. Stored digitally, in computers worldwide, are trillions of pieces of information generated by emerging technologies in molecular biology.
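
To make the "strings of letters" idea concrete, here is a minimal Python sketch. It treats a sequence as an ordinary string and runs a few simple checks on it. The function names (is_valid, gc_content, reverse_complement) are illustrative choices and do not come from any particular library, and the protein alphabet assumes the 20 standard one-letter amino acid codes.

```python
# Alphabets for the two sequence types described above.
DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard one-letter codes


def is_valid(sequence, alphabet):
    """Return True if every letter of the sequence belongs to the alphabet."""
    return set(sequence.upper()) <= alphabet


def gc_content(dna):
    """Fraction of G and C letters in a DNA string (0.0 for an empty string)."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


def reverse_complement(dna):
    """Complement each base (A<->T, C<->G), then reverse the string."""
    complement = str.maketrans("ACGT", "TGCA")
    return dna.upper().translate(complement)[::-1]


if __name__ == "__main__":
    seq = "ATGC"
    print(is_valid(seq, DNA_ALPHABET))   # True
    print(gc_content(seq))               # 0.5
    print(reverse_complement(seq))       # GCAT
```

Because the sequences are plain strings, ordinary string operations (counting, slicing, reversing) are enough for these basic analyses. Real tools must also handle ambiguity codes and very large files, which this sketch ignores.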

A major task in computational molecular biology is to "decipher" the information contained in biological sequences. Since the nucleotide sequence of a genome contains all the information necessary to produce a functional organism, we should in theory be able to duplicate this decoding using computers. There is now a bottleneck between the large volume of sequence data that is readily generated and the understanding of its meaning. This bottleneck needs to be overcome to advance our understanding and enable breakthroughs in medicine, agriculture, and environmental sciences.
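
One very small, well-understood piece of this decoding is translating a protein-coding DNA sequence into amino acids. The Python sketch below does that under some assumptions: it uses the standard genetic code, reads from the first letter in a single reading frame, and stops at the first stop codon. The names CODON_TABLE and translate are illustrative and are not taken from a specific library.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA into a protein string, codon by codon.

    Reading starts at the first letter and stops at the first stop codon.
    Trailing letters that do not fill a whole codon are ignored.
    Letters other than A, C, G, T raise a KeyError in this sketch.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))  # MA
```

Translation is the easy part of the decoding. Locating genes, predicting what a protein does, and finding regulatory regions are far harder, and those problems make up the bottleneck described above.
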
Bioinformatics is an emerging scientific discipline representing the combined power of biology, mathematics, and computers. It is the application of computer technology to the management of biological information. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.
