In biology, phylogeny [fahy-loj-uh-nee] refers to the evolutionary relationships among groups of organisms (e.g. species, populations). The term ‘phylogenetics’ derives from the Greek terms ‘phyle’ and ‘phylon,’ denoting ‘tribe’ and ‘race,’ and the term ‘genetikos,’ denoting ‘relative to birth,’ from ‘genesis’ (‘origin’). The result of a phylogenetic study is a hypothesis about the evolutionary history of taxonomic groups. Phylogenetic analyses have become essential in researching the evolutionary tree of life. The overall goal of the National Science Foundation’s Assembling the Tree of Life activity (AToL) is to resolve evolutionary relationships for large groups of organisms throughout the history of life; the research often involves large teams working across institutions and disciplines.

Taxonomy, the classification, identification, and naming of organisms, is usually richly informed by phylogenetics, but remains methodologically and logically distinct. The degree to which taxonomy depends on phylogenies differs between schools of taxonomy: ‘numerical taxonomy’ ignored phylogeny altogether, trying to represent the similarity between organisms instead; ‘phylogenetic systematics’ tries to reproduce phylogeny in its classification without loss of information; ‘evolutionary taxonomy’ tries to find a compromise between them in order to represent stages of evolution.

Evolution is regarded as a branching process, whereby populations are altered over time and may split into separate branches, hybridize, or terminate by extinction. This may be visualized in a phylogenetic tree, a hypothesis of the order in which evolutionary events are assumed to have occurred. The scientific methods of phylogenetics are often grouped under the term ‘cladistics.’ ‘Phenetics,’ popular in the mid-20th century but now largely obsolete, uses distance matrix-based methods to construct trees based on overall similarity, which is often assumed to approximate phylogenetic relationships. Ultimately, there is no way to measure whether a particular phylogenetic hypothesis is accurate unless the true relationships among the taxa being examined are already known (which may happen with bacteria or viruses under laboratory conditions). The best result an empirical phylogeneticist can hope to attain is a tree with branches that are well supported by the available evidence.
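
As a rough illustration of the distance-matrix approach, the Python sketch below clusters taxa by overall similarity using UPGMA, one classic phenetic algorithm. The text does not name a specific method, so that choice is an assumption, as are the taxon names and sequences; distance here is simply the proportion of sites at which two sequences differ.

```python
from itertools import combinations

def hamming(a, b):
    """Proportion of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def upgma(seqs):
    """Cluster taxa by overall similarity; returns a nested-tuple tree."""
    names = list(seqs)
    # Active clusters: id -> (subtree, number of leaves in it)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id)
    dist = {(i, j): hamming(seqs[names[i]], seqs[names[j]])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop every distance that involved one of the merged clusters.
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree

# Made-up sequences for four hypothetical taxa.
example = {"A": "AAAAAAAA", "B": "AAAAAAAT", "C": "AAAATTTT", "D": "AATTTTTT"}
print(upgma(example))
```

With these sequences, A and B are grouped first, then C and D, and the two pairs are joined last. The result reflects similarity only, which is exactly why it may or may not match the true branching order.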

Several potential pitfalls have been identified. Certain characters are more likely to evolve convergently than others; logically, such characters should be given less weight in the reconstruction of a tree. Weights in the form of a model of evolution can be inferred from sets of molecular data, so that maximum likelihood can be used to analyze them. For morphological data, unfortunately, the only objective way to determine convergence is by constructing a tree, a somewhat circular method but still a useful one. Further refinement can be achieved by weighting changes in one direction higher than changes in another; for instance, the presence of thoracic wings almost guarantees placement among the pterygote insects because, although wings are often lost secondarily, there is no evidence that they have been gained more than once.
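
Direction-dependent weighting can be sketched with Sankoff-style weighted parsimony, in which each kind of change has its own cost. The example below is a minimal sketch, not a method taken from the text: the two candidate trees, the taxon set, and the cost values (a wing gain costing five times a wing loss) are all illustrative assumptions.

```python
INF = float("inf")

def sankoff(tree, leaf_states, cost):
    """Return {state: minimum cost of this subtree if its root has that state}.

    tree        -- a leaf name (str) or a tuple of subtrees
    leaf_states -- observed state (0 or 1 here) for every leaf
    cost        -- cost[a][b] is the price of changing from state a to state b
    """
    states = range(len(cost))
    if isinstance(tree, str):
        # A leaf can only have its observed state.
        return {s: (0 if s == leaf_states[tree] else INF) for s in states}
    child_tables = [sankoff(child, leaf_states, cost) for child in tree]
    # For each state at this node, pick the cheapest state for every child.
    return {
        s: sum(min(cost[s][t] + table[t] for t in states) for table in child_tables)
        for s in states
    }

# Character: wings absent (0) or present (1). Fleas are secondarily wingless.
wings = {"silverfish": 0, "mayfly": 1, "flea": 0, "beetle": 1}

# Two hypothetical candidate trees for the same four taxa.
tree_a = ("silverfish", ("mayfly", ("flea", "beetle")))
tree_b = (("silverfish", "mayfly"), ("flea", "beetle"))

equal = [[0, 1], [1, 0]]        # gain and loss cost the same
asymmetric = [[0, 5], [1, 0]]   # assumed weights: gain = 5, loss = 1

for name, tree in (("tree_a", tree_a), ("tree_b", tree_b)):
    # Index [0] assumes the common ancestor of all four taxa was wingless.
    print(name, sankoff(tree, wings, equal)[0], sankoff(tree, wings, asymmetric)[0])
```

Under equal costs both trees need two changes and cannot be told apart. With gains weighted more heavily, tree_a (one gain, one loss) scores 6 while tree_b (two independent gains) scores 10, so the weighting favours the tree in which wings arise only once.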

In general, organisms can inherit genes in two ways: vertical gene transfer and horizontal gene transfer. Vertical gene transfer is the passage of genes from parent to offspring. Horizontal (also called lateral) gene transfer occurs when genes jump between unrelated organisms, a common phenomenon especially in prokaryotes (bacteria and archaea, the simplest living things, whose cells lack a nucleus). A good example is acquired antibiotic resistance: gene exchange between various bacteria has led to multi-drug-resistant bacterial species. Horizontal gene transfer has complicated the determination of phylogenies, and inconsistencies have been reported among specific groups of organisms depending on the genes used to construct evolutionary trees. The only way to determine which genes have been acquired vertically and which horizontally is to parsimoniously assume that the largest set of genes that have been inherited together have been inherited vertically; this requires analyzing a large number of genes.
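
The "largest set of genes inherited together" idea can be sketched as a simple vote among gene trees: genes whose trees share the most common topology are presumed vertical, and the rest are flagged for a closer look. This is a deliberately simplified sketch; the gene names, taxa, and trees are hypothetical, trees are compared as rooted nested tuples, and a disagreeing gene tree is only a candidate, since discordance can have causes other than horizontal transfer.

```python
from collections import Counter

def clades(tree):
    """Return (leaf set, set of all clades) for a rooted nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the group of all leaves below this node
    return leaves, found

def topology(tree):
    """Order-independent fingerprint of a tree: the set of its clades."""
    return frozenset(clades(tree)[1])

def split_by_majority(gene_trees):
    """Split genes into those matching the most common topology and the rest."""
    counts = Counter(topology(t) for t in gene_trees.values())
    majority, _ = counts.most_common(1)[0]
    vertical = [g for g, t in gene_trees.items() if topology(t) == majority]
    candidates = [g for g in gene_trees if g not in vertical]
    return vertical, candidates

# Hypothetical gene trees for four taxa A-D.
gene_trees = {
    "gene1": (("A", "B"), ("C", "D")),
    "gene2": (("B", "A"), ("D", "C")),  # same topology, children swapped
    "gene3": (("A", "B"), ("C", "D")),
    "gene4": (("A", "C"), ("B", "D")),  # conflicts with the other three
}
print(split_by_majority(gene_trees))
```

Here gene1 to gene3 agree and form the presumed vertical set, while gene4 is flagged. With only four genes the vote is weak, which illustrates why the approach needs a large number of genes.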

Owing to the development of advanced sequencing techniques in molecular biology, it has become feasible to gather large amounts of data (DNA or amino acid sequences) to infer phylogenetic hypotheses. For example, it is not rare to find studies with character matrices based on whole mitochondrial genomes (~16,000 nucleotides, in many animals). However, simulations have shown that it is more important to increase the number of taxa in the matrix than to increase the number of characters, because the more taxa there are, the more accurate and more robust is the resulting phylogenetic tree. This may be partly due to the breaking up of long branches. Another important factor that affects the accuracy of tree reconstruction is whether the data analyzed actually contain a useful ‘phylogenetic signal,’ a term that is used generally to denote whether a character evolves slowly enough to have the same state in closely related taxa as opposed to varying randomly. Tests for phylogenetic signal exist. In general, the more data that is available when constructing a tree, the more accurate and reliable the resulting tree will be.
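
One simple way to test for phylogenetic signal is a permutation test: count the changes a character needs on the tree, then check how often randomly shuffled states do as well. The sketch below uses Fitch parsimony on a strictly bifurcating tree. It is one possible test under stated assumptions, not necessarily the kind the text has in mind, and the tree, states, and number of permutations are made up for illustration.

```python
import random

def fitch_score(tree, states):
    """Return (candidate root states, minimum number of changes) for one
    character on a bifurcating nested-tuple tree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (left_set, left_cost), (right_set, right_cost) = (
        fitch_score(child, states) for child in tree
    )
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one extra change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1

def permutation_p_value(tree, states, n_perm=999, seed=0):
    """Share of shuffles needing no more changes than the observed data."""
    rng = random.Random(seed)
    observed = fitch_score(tree, states)[1]
    taxa, values = list(states), list(states.values())
    as_good = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # break any link between state and tree position
        if fitch_score(tree, dict(zip(taxa, values)))[1] <= observed:
            as_good += 1
    # Add one to numerator and denominator to count the observed data itself.
    return observed, (as_good + 1) / (n_perm + 1)

# Hypothetical tree and a character that tracks it perfectly.
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
states = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1, "f": 1, "g": 1, "h": 1}
print(permutation_p_value(tree, states))
```

A small p-value means the character changes far less often on the tree than chance would predict, i.e. closely related taxa tend to share a state. A character that varies randomly across the tips would give a p-value near 1.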

Missing data is no more detrimental than simply having less data, although its impact is greatest when most of the missing data is in a small number of taxa. The fewer characters that have missing data, the better; concentrating the missing data across a small number of character states produces a more robust tree. Because many characters are embryological, soft-tissue, or molecular characters that (at best) hardly ever fossilize, and because the interpretation of fossils is more ambiguous than that of living taxa, extinct taxa almost invariably have higher proportions of missing data than living ones. Despite these limitations, the inclusion of fossils is invaluable, as they can provide information in sparse areas of trees, breaking up long branches and constraining intermediate character states; thus, fossil taxa contribute as much to tree resolution as modern taxa. Fossils can also constrain the age of lineages and thus demonstrate how consistent a tree is with the stratigraphic record (rock layers).
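
Because the effect of missing data depends on how it is distributed, it can help to tabulate the gaps per taxon and per character before building a tree. The sketch below does that for a small character matrix; the matrix, the taxon names, and the use of "?" for a missing entry are assumptions made for illustration.

```python
def missing_fraction(matrix, missing="?"):
    """Return (fraction missing per taxon, fraction missing per character).

    matrix maps each taxon name to a string of character states, all of the
    same length; `missing` marks an unscored character.
    """
    per_taxon = {taxon: row.count(missing) / len(row) for taxon, row in matrix.items()}
    n_chars = len(next(iter(matrix.values())))
    per_character = [
        sum(row[i] == missing for row in matrix.values()) / len(matrix)
        for i in range(n_chars)
    ]
    return per_taxon, per_character

# Hypothetical matrix: two living taxa and one fossil in which the last two
# (say, soft-tissue) characters could not be scored.
matrix = {"extant_A": "0110", "extant_B": "0111", "fossil_C": "01??"}
per_taxon, per_character = missing_fraction(matrix)
print(per_taxon)
print(per_character)
```

In this example all of the missing entries sit in the fossil taxon and in the last two characters, the pattern the paragraph above describes for extinct taxa.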

During the late 19th century, Ernst Haeckel’s recapitulation theory, or ‘biogenetic fundamental law,’ was widely accepted. It was often expressed as ‘ontogeny recapitulates phylogeny,’ i.e. the development of an organism successively mirrors the adult stages of successive ancestors of the species it belongs to. This theory has long been rejected. In fact, ontogeny itself evolves: the phylogenetic history of a species cannot be read directly from its ontogeny, as Haeckel thought would be possible. Characters from ontogeny can nevertheless be (and have been) used as data for phylogenetic analyses; the more closely related two species are, the more apomorphies (derived traits) their embryos share.
