• Phylogenomics: Molecular Evolution in the Genomics Era

      Seetharam, Arun Somwarpet (2012-10-19)
      Evolutionary studies in recent years have been transformed by the development of new, powerful techniques for investigating the mechanisms and events of molecular evolution. The large collections of complete genomes now available in the public domain offer great advantages for genome-scale evolutionary studies. Phylogenomics, a term often used to describe the use of genome-scale data to infer species phylogeny or to predict protein function through evolutionary history, has benefited greatly from the revolutionary progress in DNA sequencing technology. In the present work, we developed and applied several phylogenomic methods to large genome-scale data sets. In the first study, we applied Singular Value Decomposition (SVD) analysis to reexamine current evolutionary relationships among 12 Drosophila species using the predicted proteins from whole genomes. An SVD analysis of unfiltered whole genomes (193,622 predicted proteins) produced the currently accepted Drosophila phylogeny at higher dimensions, except for the generally accepted, but difficult to discern, sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values, or when the resolution was reduced by using fewer dimensions. In the second study, we simulated restriction enzyme digestions on 21 sequenced genomes of various Drosophila species. Using the fragments generated by simulated digestion at the predicted targets of 16 Type IIB restriction enzymes, we sampled a large and effectively arbitrary selection of loci from these genomes. The resulting fragments were then used to compare organisms: the distance between each pair of genomes was calculated by counting the number of fragments shared between the two genomes.
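The shared-fragment comparison described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not give the exact distance formula, so a Jaccard-style distance (one minus shared fragments over all distinct fragments) is assumed here, and the function names, the toy recognition motif, and the flank length are hypothetical. Only one strand is scanned, which is a further simplification.

```python
import re
from itertools import combinations


def excise_fragments(genome, motif_regex, flank):
    """Collect fragments made of each motif match plus `flank` bases on both sides.

    This loosely mimics an enzyme that cuts on both sides of its recognition
    site. The motif and flank length are illustrative assumptions.
    """
    fragments = set()
    # A lookahead lets finditer report overlapping recognition sites as well.
    for match in re.finditer(f"(?=({motif_regex}))", genome):
        start = match.start(1) - flank
        end = match.end(1) + flank
        # Skip sites too close to a sequence end to yield a full fragment.
        if start >= 0 and end <= len(genome):
            fragments.add(genome[start:end])
    return fragments


def shared_fragment_distance(frags_a, frags_b):
    """Assumed Jaccard-style distance: 1 - |shared| / |union|."""
    union = frags_a | frags_b
    if not union:
        return 0.0
    return 1.0 - len(frags_a & frags_b) / len(union)


def pairwise_distances(fragment_sets):
    """Distance for every unordered pair of genomes, keyed by (name_a, name_b)."""
    names = sorted(fragment_sets)
    return {
        (a, b): shared_fragment_distance(fragment_sets[a], fragment_sets[b])
        for a, b in combinations(names, 2)
    }


# Toy sequences that differ only near the second recognition site.
toy_genomes = {
    "species_1": "TTTTACGTAAAAGGGGACGTCCCC",
    "species_2": "TTTTACGTAAAAGGGGACGTTTCC",
}
toy_fragments = {
    name: excise_fragments(seq, "ACGT", flank=2) for name, seq in toy_genomes.items()
}
toy_distances = pairwise_distances(toy_fragments)
```

In this toy case each sequence yields two fragments, one of which is shared, so the assumed distance is 1 - 1/3. A distance table of this kind, computed once per enzyme, is the sort of input a standard tree-building method would take.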
Phylogenetic trees were then generated for each enzyme using this distance measure, and the consensus was calculated. The consensus tree obtained agrees well with the currently accepted tree for these Drosophila species. We conclude that multi-locus sub-genomic representation combined with next-generation sequencing, especially for individuals and species without previous genome characterization, can improve studies of comparative genomics and the building of accurate phylogenetic trees. The third study utilized the relatively new Daphnia genome in an attempt to identify 40 orthologous groups of C2H2 zinc-finger proteins (ZFPs) that were previously determined to be well conserved in bilaterians. We identified 58 C2H2 ZFP genes in Daphnia that belong to these 40 distinct families. The Daphnia genome appears to be relatively efficient with respect to these well-conserved C2H2 ZFPs, since only 7 of the 40 gene families have more than one identified member. Worms have a comparable number, 6. In flies and humans, C2H2 ZFP gene expansions are more common, since these organisms display 15 and 24 multi-member families, respectively. In contrast, only three of the well-conserved C2H2 ZFP families have expanded in Daphnia relative to Drosophila, and in two of these cases, just one additional gene was found. The KLF/SP family in Daphnia, however, is significantly larger than that of Drosophila, and many of the additional members found in Daphnia appear to correspond to KLF 1/2/4 homologs, which are absent in Drosophila but present in vertebrates. The last study was conducted to investigate the conservation and distribution of 38 C2H2 ZNF gene families in eukaryotes. We combined two popular approaches for homolog detection, Reciprocal Best Hit (RBH) and hidden Markov model (HMM) profile search, on a diverse set of complete genomes of 124 eukaryotic species ranging from excavates to humans. We succeeded in identifying 3,675 genes as distinct members of the 38 C2H2 gene families.
This largely automated technique is much faster than manual methods and is able to detect homologs accurately and efficiently among a diverse set of organisms. Our analysis of the 38 evolutionarily conserved C2H2 ZNF gene families revealed a stepwise appearance of ZNF families, agreeing well with the phylogenetic relationship of the organisms compared and their presumed stepwise increase in complexity.
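As an illustration of the Reciprocal Best Hit criterion mentioned above, the following Python sketch pairs genes from two genomes when each is the other's top-scoring hit. It is a minimal sketch under stated assumptions, not the pipeline used in the thesis: the hit lists are assumed to be (query, subject, score) tuples already parsed from some pairwise similarity search, the function names are hypothetical, and the HMM profile search that the study combined with RBH is not shown.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject.

    `hits` is an iterable of (query, subject, score) tuples; this input
    format is an assumption made for illustration.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _score) in best.items()}


def reciprocal_best_hits(hits_a_vs_b, hits_b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(hits_a_vs_b)
    best_ba = best_hits(hits_b_vs_a)
    return sorted(
        (gene_a, gene_b)
        for gene_a, gene_b in best_ab.items()
        if best_ba.get(gene_b) == gene_a
    )


# Toy example: a2's best hit is b2, but b2's best hit is a1, so only
# the pair (a1, b1) satisfies the reciprocal criterion.
toy_a_vs_b = [("a1", "b1", 200), ("a1", "b2", 150), ("a2", "b2", 90)]
toy_b_vs_a = [("b1", "a1", 195), ("b2", "a1", 150), ("b2", "a2", 80)]
toy_pairs = reciprocal_best_hits(toy_a_vs_b, toy_b_vs_a)
```

Ties in score are resolved here by keeping the first hit seen, which is a simplification; a production workflow would need an explicit tie-breaking rule.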