Half-sibling reconstruction is the task of determining maternal and paternal sibling relationships from the observed genotypes of same-generation individuals in a population. Knowledge of how populations are structured allows biologists to understand the mating habits of different species, how threatened a population is, and how best to protect threatened or endangered species. This thesis examines the problem of half-sibling reconstruction and presents an accurate and fast heuristic for reconstructing half-siblings. The heuristic reconstructs half-sibling relationships with high accuracy on large biological populations, where existing algorithms fail because of their running time. In addition to identifying and discussing some of the major problems with half-sibling reconstruction, we prove that even the task of determining whether a half-sibling reconstruction obeys the laws of genetic inheritance is NP-complete.
SibJoin: A Fast Heuristic for Half-Sibling Reconstruction
Kinship inference is the task of identifying genealogically related individuals. Questions of kinship are important for determining mating structures, particularly in endangered populations. Although many solutions exist for reconstructing full-sibling relationships, few exist for half-siblings. We present SibJoin, a clustering heuristic grounded in Mendelian genetics, which is reasonably accurate and thousands of times faster than existing algorithms. We also identify issues with partition distance, the traditional method for assessing the quality of estimated sibship partitions. We prefer an information-theoretic alternative called variation of information, which takes into account the degree to which misplaced individuals harm sibship structures.
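As an illustration of the variation of information measure mentioned above, the following is a minimal sketch using its standard definition, VI(A, B) = H(A | B) + H(B | A), for two partitions of the same individuals. The function name, the label-list representation, and the choice of natural logarithm are assumptions made for this example and are not taken from the SibJoin implementation. It compares a single pair of partitions; a half-sibling reconstruction would presumably be scored on its maternal and paternal partitions separately.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Variation of information (in nats) between two partitions.

    Each argument is a sequence of group labels, one per individual,
    listed in the same individual order. Label names are arbitrary:
    only the grouping matters. (Hypothetical helper for illustration.)
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same individuals")
    n = len(labels_a)
    if n == 0:
        return 0.0

    size_a = Counter(labels_a)              # group sizes in partition A
    size_b = Counter(labels_b)              # group sizes in partition B
    overlap = Counter(zip(labels_a, labels_b))  # sizes of pairwise intersections

    vi = 0.0
    for (a, b), n_ab in overlap.items():
        p_ab = n_ab / n
        # The two log terms contribute to H(B | A) and H(A | B) respectively.
        vi -= p_ab * (log(n_ab / size_a[a]) + log(n_ab / size_b[b]))
    return vi


# Two true sibships of two individuals each, versus an estimate that
# merges everyone into one group.
true_groups = [0, 0, 1, 1]
merged_estimate = [0, 0, 0, 0]
vi_merged = variation_of_information(true_groups, merged_estimate)
```

A value of zero means the two partitions are identical up to relabeling; larger values mean more information is lost or gained when moving from one partition to the other.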