Please use this identifier to cite or link to this item:
http://hdl.handle.net/10484/4582
| Title: | Phylogenomics: Molecular Evolution in the Genomics Era |
| Authors: | Seetharam, Arun Somwarpet |
| Issue Date: | 19-Oct-2012 |
| Abstract: | Evolutionary studies in recent years have been transformed by the development of new,
powerful techniques for investigating the mechanisms and events of molecular evolution.
The large collections of complete genomes now available in the public domain offer
great advantages to genome-scale evolutionary studies. Phylogenomics, a term often used to
describe the use of genome-scale data to infer species phylogeny or to predict protein function
through evolutionary history, has benefited greatly from the revolutionary progress in DNA
sequencing technology. In the present study, we developed and applied several phylogenomic
methods to large genome-scale data sets.
In the first study, we applied Singular Value Decomposition (SVD) analysis to reexamine
current evolutionary relationships among 12 Drosophila species using the predicted
proteins from whole genomes. An SVD analysis of the unfiltered whole genomes (193,622
predicted proteins) produced the currently accepted Drosophila phylogeny at higher dimensions,
except for the generally accepted, but difficult to discern, sister relationship between D. erecta
and D. yakuba. Also, in accordance with previous studies, many sequences appear to support
alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when
approximately 55% to 95% of the proteins were removed using a filter based on projection
values, or when resolution was reduced by using fewer dimensions.
In the second study, we simulated restriction enzyme digestions on 21 sequenced
genomes of various Drosophila species. Using the fragments generated by simulated digestion
at the predicted targets of 16 Type IIB restriction enzymes, we sampled a large and effectively
arbitrary selection of loci from these genomes. The resulting fragments were then used to
compare organisms and to calculate the pairwise distance between genomes by
counting the number of fragments shared by the two genomes. Phylogenetic trees were
then generated for each enzyme using this distance measure, and the consensus was calculated.
The consensus tree obtained agrees well with the currently accepted tree for these Drosophila
species. We conclude that multi-locus sub-genomic representation combined with
next-generation sequencing, especially for individuals and species without previous genome
characterization, can improve studies of comparative genomics and the building of accurate
phylogenetic trees.
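The shared-fragment comparison described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the recognition pattern, the flank length, the function names, and the use of a Jaccard-style distance are all assumptions made for illustration, since the abstract only states that distances were derived from counts of shared fragments. The sketch handles a single enzyme; the per-enzyme tree building and the consensus step are not shown.

```python
import re

# Hypothetical Type IIB-style site: a fixed motif with 6 unspecified bases,
# excised together with FLANK bases on each side (values are illustrative).
SITE = "CGA[ACGT]{6}TGC"
FLANK = 10

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def digest(genome, site=SITE, flank=FLANK):
    """Return the set of fragments excised around every site on both strands."""
    # The lookahead lets overlapping sites be reported as separate fragments.
    pattern = re.compile("(?=([ACGT]{%d}%s[ACGT]{%d}))" % (flank, site, flank))
    fragments = set()
    for strand in (genome, reverse_complement(genome)):
        for match in pattern.finditer(strand):
            frag = match.group(1)
            # Keep one canonical orientation so both strands give the same key.
            fragments.add(min(frag, reverse_complement(frag)))
    return fragments


def shared_fragment_distance(frags_a, frags_b):
    """Jaccard-style distance: 1 - shared / distinct fragments (an assumption)."""
    union = frags_a | frags_b
    if not union:
        return 0.0
    return 1.0 - len(frags_a & frags_b) / len(union)
```

Under these assumptions, two genomes that yield identical fragment sets have distance 0.0 and two genomes that share no fragments have distance 1.0; a matrix of such pairwise values per enzyme could then be passed to any standard distance-based tree method.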
The third study used the relatively new Daphnia genome to identify members of 40
orthologous groups of C2H2 zinc-finger proteins (ZFPs) that were previously determined to be well
conserved in bilaterians. We identified 58 C2H2 ZFP genes in Daphnia that belong to these 40
distinct families. The Daphnia genome appears to be relatively efficient with respect to these
well-conserved C2H2 ZFPs, since only 7 of the 40 gene families have more than one identified
member. Worms have a comparable number, 6. In flies and humans, C2H2 ZFP gene
expansions are more common, since these organisms display 15 and 24 multi-member families,
respectively. In contrast, only three of the well-conserved C2H2 ZFP families have expanded in
Daphnia relative to Drosophila, and in two of these cases, just one additional gene was found.
The KLF/SP family in Daphnia, however, is significantly larger than that of Drosophila, and
many of the additional members found in Daphnia appear to correspond to KLF 1/2/4 homologs,
which are absent in Drosophila but present in vertebrates.
The last study investigated the conservation and distribution of 38 C2H2
ZNF gene families in eukaryotes. We combined two popular approaches for homolog detection,
Reciprocal Best Hit (RBH) and hidden Markov model (HMM) profile search, and applied them to
the complete genomes of a diverse set of 124 eukaryotic species ranging from excavates to humans. We
succeeded in identifying 3,675 genes as distinct members of the 38 C2H2 gene families. This
largely automated technique is much faster than manual methods and is able to detect homologs
accurately and efficiently among a diverse set of organisms. Our analysis of the 38 evolutionarily
conserved C2H2 ZNF gene families revealed a stepwise appearance of ZNF families, agreeing
well with the phylogenetic relationship of the organisms compared and their presumed stepwise
increase in complexity. |
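For readers unfamiliar with the Reciprocal Best Hit criterion named in the abstract, the following minimal Python sketch shows the general idea. It is an illustration only, not the pipeline used in the thesis: the function names and the input layout (a dictionary mapping query and subject identifiers to a similarity score, such as a bit score from a sequence search) are hypothetical, and the HMM profile search step is not shown.

```python
def best_hits(scores):
    """Map each query to its single highest-scoring subject.

    `scores` maps (query, subject) pairs to a similarity score
    (hypothetical layout, e.g. parsed from a tabular search report).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Toy scores: a1 and b1 pick each other; a2 also prefers b1 but is not chosen back.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b1"): 120}
b_vs_a = {("b1", "a1"): 190, ("b2", "a2"): 40}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy input only the pair a1 and b1 satisfies the criterion. Ties are resolved here by keeping the first top-scoring hit encountered, which is a simplification.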
| URI: | http://hdl.handle.net/10484/4582 |
| In Collections: | Biology |
Items in Sycamore Scholars are protected by copyright, with all rights reserved, unless otherwise indicated.