Molecular Biology and Genetics
Genome size. The most recent estimate of the genome size of D. melanogaster is just under 200 Mb [5]. About one-third of this is heterochromatic, located on the Y chromosome (in males) and in the chromosomal regions surrounding the centromeres. About half of the heterochromatin consists of satellite DNA of very simple sequence; the other half consists of complex sequences that are largely, but not entirely, composed of transposable elements.
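The proportions above can be turned into approximate sizes with a few lines of arithmetic. The sketch below is illustrative only: it takes the rounded figures from the text at face value (200 Mb as an upper bound, one-third, one-half), and the function and key names are hypothetical. The label "euchromatin" for the non-heterochromatic remainder is standard usage but does not appear in the passage itself.

```python
# Rough partition of the D. melanogaster genome, using the rounded figures
# quoted in the text ("just under 200 Mb", "about one-third", "about half").
GENOME_MB = 200.0                 # upper bound; the estimate is "just under" this
HETEROCHROMATIN_FRACTION = 1 / 3  # "about one-third is heterochromatic"
SATELLITE_SHARE = 0.5             # "about half" of the heterochromatin


def partition_genome(total_mb=GENOME_MB):
    """Return approximate sizes (in Mb) of the main sequence classes."""
    heterochromatin = total_mb * HETEROCHROMATIN_FRACTION
    satellite = heterochromatin * SATELLITE_SHARE
    complex_het = heterochromatin - satellite  # largely transposable elements
    euchromatin = total_mb - heterochromatin   # the remaining two-thirds
    return {
        "euchromatin": euchromatin,
        "heterochromatin": heterochromatin,
        "satellite": satellite,
        "complex_heterochromatin": complex_het,
    }


if __name__ == "__main__":
    for name, size in partition_genome().items():
        print(f"{name:>24}: ~{size:.0f} Mb")
```

Because every input is an approximation, the results should be read as order-of-magnitude guides (roughly 130 Mb versus roughly 65 Mb), not as measured values.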
Sequences. D. melanogaster is one of the few multicellular organisms to have its genome "completely" sequenced. The sequence was first published in 2000 [6] as the result of a novel collaboration between a commercial company and the academic sector (see [7]). Since then, researchers in Berkeley have improved and revised this sequence, and the current version (R5) is available from GenBank under the following accession numbers: AE014134, AE013599, AE014296, AE014297, AE014135 and AE014298. The complete mitochondrial genome sequence is available as GenBank:U37541 {note 1}.
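For readers who want to retrieve these records programmatically, the following minimal sketch builds download URLs for the accession numbers listed above using the NCBI E-utilities `efetch` endpoint. The helper names (`efetch_url`, `all_urls`) are hypothetical, the choice of FASTA output is an assumption, and the code only constructs URLs; it does not contact NCBI. The records served today may be later versions than the R5 release described here.

```python
from urllib.parse import urlencode

# Accession numbers quoted in the text: six chromosome-arm records
# and the mitochondrial genome.
ARM_ACCESSIONS = [
    "AE014134", "AE013599", "AE014296",
    "AE014297", "AE014135", "AE014298",
]
MITO_ACCESSION = "U37541"

# NCBI E-utilities endpoint for fetching full records.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="fasta"):
    """Build an efetch URL for one nucleotide accession."""
    query = urlencode({
        "db": "nuccore",      # NCBI nucleotide database
        "id": accession,
        "rettype": rettype,   # "fasta" for sequence, "gb" for a flat file
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


def all_urls():
    """Return one URL per accession mentioned in the text."""
    return [efetch_url(acc) for acc in ARM_ACCESSIONS + [MITO_ACCESSION]]


if __name__ == "__main__":
    for url in all_urls():
        print(url)
```

The chromosome-arm records are tens of megabases each, so anyone actually downloading them should respect NCBI's usage guidelines on request rate.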
There are over 765,000 other sequence records for D. melanogaster available from GenBank as well as over 31,000 protein sequences available from the protein sequence databank UniProt.
Despite this wealth of data, the genome sequence of D. melanogaster is not yet complete. Sequencing of the complex portion of the heterochromatin is still in progress in Berkeley <http://www.dhgp.org/>, and the simple-sequence satellite DNA will probably never be fully sequenced.
In addition to D. melanogaster, genomic sequences have been determined for 11 other members of the family Drosophilidae, including D. simulans, D. yakuba and D. erecta, which are closely related sibling species of D. melanogaster [8].
What distinguishes the genomic sequence of D. melanogaster from that of many other species is the depth and detail of its annotation. Within the sequenced chromosome arms there are now only seven sequence gaps (all numbers are from Release 5.5 of January 2008), and 15,186 genes have been located on the genome. The great majority of these (14,146) encode proteins; the remainder encode non-coding RNA species (e.g., transfer RNAs, ribosomal RNAs, small nucleolar RNAs, microRNAs). The annotation of this genome is the responsibility of FlyBase, the community database for Drosophila genetics and genomics.
About 9,000 of the 15,000 or so genes of D. melanogaster have been identified by experimentally induced mutant alleles. In recent years these mutations have been induced using genetically engineered transposable elements, and large-scale screens that aim to disrupt every gene in this species are underway <http://flypush.imgen.bcm.tmc.edu/pscreen/>.
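The gene counts quoted in the two preceding paragraphs imply a few derived figures that are easy to check. The short sketch below does that arithmetic; the constant and function names are hypothetical, and the mutant-allele percentage is only as precise as the rounded "about 9,000" and "15,000 or so" it is built from.

```python
# Release 5.5 (January 2008) annotation counts, as quoted in the text.
TOTAL_GENES = 15_186
PROTEIN_CODING = 14_146

# Rounded figures from the text for genes with induced mutant alleles.
APPROX_MUTANT_GENES = 9_000
APPROX_TOTAL_GENES = 15_000


def summarise_counts():
    """Derive simple summary figures from the quoted gene counts."""
    noncoding = TOTAL_GENES - PROTEIN_CODING  # genes encoding non-coding RNAs
    return {
        "noncoding_rna_genes": noncoding,
        "protein_coding_pct": round(100 * PROTEIN_CODING / TOTAL_GENES, 1),
        "mutant_allele_pct": round(100 * APPROX_MUTANT_GENES / APPROX_TOTAL_GENES),
    }


if __name__ == "__main__":
    for key, value in summarise_counts().items():
        print(f"{key}: {value}")
```

On these figures, a little over a thousand annotated genes encode non-coding RNAs, and roughly three in five genes have at least one experimentally induced mutant allele.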
