Genetic linkage studies based on pedigree data have limited resolution, due to the relatively small number of segregations. Disequilibrium mapping, which uses population associations to infer the location of a disease mutation, provides one possible strategy for narrowing the candidate region. The coalescent process provides a model for the ancestry of a sample of disease alleles. Recombination events between disease locus and marker may be placed on this ancestral phylogeny. These events define the recombinant classes, the sets of sampled disease copies descending from the meiosis at which a given recombination occurred. This underlying coancestry induces dependence among the marker haplotypes in the sample, the ones within any recombinant class necessarily carrying the same marker allele. In fact, the number and sizes of the recombinant classes determine allelic associations and the likelihoods for recombination fractions.
We examine inferences of the age of a monophyletic variant (disease) allele within a general population coalescent. The effect of conditioning on this monophyletic origin greatly reduces age estimates; the disease alleles cannot be treated as a random sample of the same size from the population. We then present results on the probability distributions for the sizes and number of recombinant classes, as a function of the length and shape of the ancestral tree, which is, in turn, affected by the pattern of population growth and the age of the disease allele. For example, the shorter ancestral trees of rapidly growing variant populations have tip branches that are a larger proportion of the total tree. These tip branches are thus more likely than the root branches to carry a recombination event, and the recombinant classes will be relatively small. Simulations of populations growing at realistic rates, however, suggest that tip branches of the disease ancestry never dominate the ancestral tree. The underlying ancestry thus results in recombinant classes of size greater than one and dependence among disease copies, even in rapidly growing variant populations. This dependence in the marker alleles on sampled disease haplotypes controls the power of disequilibrium mapping.
Linkage disequilibrium (LD) is a powerful tool for disease mapping and tracing the history of genetic variants in human populations. LD is known to be affected by a variety of population and genetic factors, including changes in population size. Here we present data on LD between 15 microsatellite loci in 2 population samples of European origin (Hutterites and Ashkenazi Jews), and 2 population samples of African origin (Cameroun and Gambia), respectively. Because of their different demographic histories and ethnic backgrounds, these populations allow the comparison of background LD as a function of population history. LD is evaluated by using the P value generated by Fisher's exact test for all possible pairs of loci. We used the P value as an indicator of the strength of LD, rather than to indicate a formal level of significance. The distance between adjacent loci is 1.3 cM on average and ranges between 0 cM and 5 cM. The largest degree of LD, as estimated by the average P value over all loci, was found in the Hutterites. Interestingly, the Cameroun population also showed a relatively large extent of LD, higher than in the Ashkenazi and Gambia populations. The expected inverse relationship between LD and genetic distance was observed only in the 2 populations of European origin, although a few loci with low heterozygosities (<0.6) failed to show LD over very short distances (<1 cM). The latter attests to the importance of assessing the power of detecting LD even when using microsatellite markers. These results will be useful for planning disequilibrium mapping studies in these populations.
Many current mapping approaches focus on the use of linkage disequilibrium for the fine localization of disease loci. Of critical importance in this respect are i) the conditions under which disequilibrium may arise in a population and ii) how this disequilibrium is maintained over successive generations. Many existing approaches to these questions rely on analytical or simulation models that repeatedly sample parameters from predefined distributions.
In contrast to existing methods, our approach was to create a population simulation program that generates a virtual representation of every individual and makes no assumptions beyond Mendelian inheritance and random mating. Many relevant parameters regarding population structure, disease model, etc. can be specified in our program (POPSIM), and the output of families and triplets is delivered in linkage file format.
As an application of our simulation approach, we studied the influences of recombination and drift on populations of 10, 50, 100, 1,000 and 2,500 thousand individuals, with an initial population frequency of a founder haplotype of 0.8, over 200 generations. The gamma distribution, which has previously been used by other authors to model linkage disequilibrium, correctly mirrors the distribution of allele frequencies in the population if the population size exceeds approximately 100,000. In smaller populations, however, genetic drift prevails, and this phenomenon could not be modeled by the gamma distribution. The marker distance over which linkage disequilibrium is effectively maintained over 200 generations is at or below 0.5 cM, since it is again genetic drift, not recombination, that determines the founder haplotype frequency. This direct population simulation approach may be a useful tool for theoretical geneticists in the evaluation of mapping strategies in different population settings and in power estimations of linkage disequilibrium studies.
A mutation's high frequency is due to an initial founder effect when mutation-bearing chromosomes descend from a single ancestral chromosome present in the initial founding population. This ancestral chromosome has been scrambled through successive meiotic recombinations except for the markers surrounding the mutation. The region of the remaining ancestral haplotype is correlated to the time of occurrence of the mutation and could be used as a genetic clock: the smaller the remaining haplotype, the older the mutation. If g generations have elapsed since the most recent common ancestor (MRCA), the probability that no recombination occurred in a fraction t on each side of the mutation is (1-t)^g, and the mean size of the remaining original haplotype on each side of a mutation is simply 1/(g+1). Deriving g given t, knowing the distribution of t conditional on g, is a typical Bayesian problem. While the probability of survival of a mutation is highly dependent on population demographic history such as rapid expansions and bottlenecks, the mean size of the remaining haplotype is by contrast only a function of the number of generations from the MRCA. Generation number prior probabilities are uneven except in small intervals and could be estimated using stochastic models. In practice, a crude but convenient estimation of g is 1/t. For instance, a mutation will be estimated to be approximately 100 generations old if t=0.01.
The age of a private Gypsy mutation, the C283Y mutation in the gamma-sarcoglycan gene, has been estimated using this Bayesian approach to linkage disequilibrium. Computation using the dedicated application ABEL suggested that the most likely number of generations is 110 (95% confidence interval: 60-200). Assuming that a generation is represented by 20 years, this would indicate that the C283Y mutation in the gamma-sarcoglycan gene is at least 1200 years old and predates the commonly accepted date of migration of Gypsies out of Northern India.
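The haplotype-clock relations in this abstract can be checked numerically. The sketch below is only an illustration and is not the ABEL implementation: the function names are hypothetical, the prior on g is assumed flat over 1 to g_max (the abstract notes that realistic priors are uneven), and the likelihood of a preserved fraction t on one side is taken as the density g(1-t)^(g-1) obtained by differentiating the survival probability (1-t)^g.

```python
def survival(t, g):
    """P(no recombination within fraction t on one side after g generations)."""
    return (1.0 - t) ** g


def mean_preserved(g, steps=20_000):
    """Midpoint-rule integral of survival(t, g) over t in [0, 1].

    This is the mean preserved fraction and should approach 1 / (g + 1).
    """
    h = 1.0 / steps
    return sum(survival((i + 0.5) * h, g) for i in range(steps)) * h


def posterior_over_g(t, g_max=1000):
    """Posterior over g = 1..g_max given an observed preserved fraction t.

    Assumption: flat prior on g; likelihood density g * (1 - t)**(g - 1).
    """
    weights = [g * (1.0 - t) ** (g - 1) for g in range(1, g_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


post = posterior_over_g(0.01)
# Index 0 corresponds to g = 1, so shift by one to recover g.
mode = 1 + max(range(len(post)), key=post.__getitem__)
```

Under these assumptions the posterior for t = 0.01 peaks at about 100 generations, in line with the crude estimate g = 1/t given above.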
Theoretically, recently admixed populations between genetically differentiated groups are likely to have high levels of association between genes that are even as much as 10 to 20 cM apart. The transmission/disequilibrium test proposed by Spielman et al. (1993) can be a powerful test of linkage in the presence of association and therefore should be a useful test of linkage in admixed populations. One difficulty is that the degree of association between a marker and a disease gene depends upon the differences in allele frequencies in the subpopulations at the two loci. Hence, the choice of marker is important. One strategy that may improve the power of the TDT is to combine marker alleles based on information about the founding populations, thereby collapsing the marker into one with fewer alleles. We have examined the consequences of collapsing a microsatellite into a biallelic marker and have found that the TDT for the collapsed 2-allele marker with the largest difference in allele frequencies in the subpopulations can often be more powerful than for the original microsatellite. To demonstrate the strategy we considered a recent data set published by Jorde et al. (1995) which provides frequency estimates for 30 microsatellites in 243 Africans, Asians and Europeans.
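One way to build the collapsed 2-allele marker with the largest frequency difference is sketched below. The function name and the allele frequencies are hypothetical and are not taken from the Jorde et al. data. The sketch relies on the fact that the frequency difference of any group of alleles is the sum of the per-allele differences, so the maximum is reached by grouping exactly those alleles that are more frequent in the first founding population.

```python
def collapse_to_biallelic(freq_pop1, freq_pop2):
    """Collapse a multiallelic marker into two allele groups.

    Alleles more frequent in population 1 form collapsed allele A; all
    remaining alleles form collapsed allele B. Returns the alleles in
    group A and the frequency of A in each population.
    """
    alleles = sorted(set(freq_pop1) | set(freq_pop2))
    group_a = [a for a in alleles
               if freq_pop1.get(a, 0.0) > freq_pop2.get(a, 0.0)]
    p1 = sum(freq_pop1.get(a, 0.0) for a in group_a)
    p2 = sum(freq_pop2.get(a, 0.0) for a in group_a)
    return group_a, p1, p2


# Hypothetical microsatellite allele frequencies in two founding populations.
pop1 = {"100": 0.5, "102": 0.3, "104": 0.1, "106": 0.1}
pop2 = {"100": 0.1, "102": 0.2, "104": 0.4, "106": 0.3}
group_a, p1, p2 = collapse_to_biallelic(pop1, pop2)
```

In this made-up example the collapsed allele combines the two alleles that are more common in the first population, and its frequency differs by 0.5 between the populations, more than any single original allele does.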
There are few tools available for simulation and power analyses of methods for linkage disequilibrium mapping of complex disease genes in genetically isolated populations. POPSIM (L.A.S.) is a program which generates inbred populations with user-determined characteristics, including a disease model, number of meiotic steps to separate cases, the exact pedigree structure of one or more founding generations, and then for each generation the mean and variance of the number of offspring per mating and proportion of immigrants. The program reports each replicate's population size, proportion of genes inherited from common founders, numbers of ill cases, and for cases and parents both the original haplotypes (with each ancestral allele numbered and the presence of disease alleles designated) and re-coded haplotypes with a user-determined number of alleles per marker.
As an illustrative example we generated replicates based on genealogical data from a real population, and achieved a similar population size and relatedness to key founders. For a disease with lambda(sibs) = 2.27, using a simple measure of total lengths of shared ancestral segments around each marker locus (one 200 cM chromosome per replicate, 5 cM map), for 6 simulated populations with 100 randomly-selected case and 100 control subjects, and subjects in each group separated by 5 or more meiotic steps, maximum sharing was greater on all case chromosomes than on control chromosomes, and was within 10 cM of the disease locus. Methods such as these will facilitate studies to determine the power of LD mapping methods to detect disease loci with re-coded markers, to compare power of linkage and/or LD mapping analyses, and to explore the effects of population and disease parameters on power.
Association studies are one of the major strategies for identifying genetic factors underlying complex traits. In samples of related individuals, conventional statistical procedures are not valid for testing association, and maximum likelihood (ML) methods have to be used, but they are computationally demanding and are not necessarily robust to violations of their assumptions. Estimating Equations (EE) offer an alternative to ML methods for estimating association parameters in correlated data. We have studied through simulations the behavior of EE in a large range of practical situations, including samples of nuclear families of varying sizes and mixtures of related and unrelated individuals. For a quantitative phenotype, the power of the EE test was comparable to that of a conventional ML test and close to the power expected in a sample of unrelated individuals. For a binary phenotype, the power of the EE test decreased with the degree of clustering, as did the power of the ML test. This result might be partly explained by a modeling of the correlations between responses that is less efficient than that in the quantitative case. In small samples (<50 families), the variance of the EE association parameter tended to be underestimated, leading to an inflation of the type I error. The heterogeneity of cluster size induced a slight loss of efficiency of the EE estimator, by comparison with the balanced samples. The major advantages of the EE technique are its computational simplicity and its great flexibility, easily allowing investigation of gene-gene and gene-environment interactions. It constitutes a powerful tool for testing genotype-phenotype association in related individuals.
The transmission disequilibrium test (TDT) of Spielman et al. (1993) tests for distortion in the segregation ratio from parents to offspring to map genes for complex diseases. In their papers describing the TDT, Spielman et al. have advocated analysis of parent-offspring trios in which the offspring are affected. Based on such trios, they formulated a McNemar or symmetry test statistic comparing the number of times that heterozygous parents pass one marker allele to the affected offspring to the number of times heterozygous parents pass the other marker allele to those offspring.
Spielman et al. counseled that if evidence for transmission distortion is obtained, the same analysis should be done on parent-offspring trios in which the offspring are unaffected to guard against detecting actual meiotic segregation distortion rather than the presence of a disease gene. They argued that the presence of a disease gene should be concluded only if distortion of transmission to unaffected offspring is insignificant or in the opposite direction as distortion of transmission to affected offspring.
We suggest that a generally more powerful approach is to use information on transmissions to both affected and unaffected offspring from the start. For a biallelic marker, this requires only that the McNemar statistic be replaced by a 2 x 2 table test for heterogeneity; here, rows correspond to affected and unaffected offspring and columns to the transmitted allele. The reason for the increased power of the new test is that given transmission distortion for affected offspring in favor of one allele, there should be transmission distortion for unaffected offspring in favor of the other allele, unless actual meiotic segregation distortion is present. We compare the power of the new and original TDT tests under a variety of genetic models and sampling designs, and describe situations in which the new test is more powerful. We also describe simple extensions of the new test to deal with markers with multiple alleles and families with both affected and unaffected offspring.
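The two statistics contrasted in this abstract can be sketched for a biallelic marker as follows. The counts are hypothetical, and the Pearson chi-square is used here as one common choice of 2 x 2 heterogeneity test; the abstract does not say which form the authors used.

```python
def mcnemar_tdt(n_m1, n_m2):
    """Original TDT (McNemar form) on transmissions to affected offspring.

    n_m1, n_m2: times heterozygous parents transmitted M1 or M2.
    Approximately chi-square with 1 df when there is no distortion.
    """
    return (n_m1 - n_m2) ** 2 / (n_m1 + n_m2)


def heterogeneity_chi2(aff_m1, aff_m2, unaff_m1, unaff_m2):
    """Pearson chi-square (1 df) for the 2 x 2 table.

    Rows: affected / unaffected offspring. Columns: transmitted allele.
    """
    n = aff_m1 + aff_m2 + unaff_m1 + unaff_m2
    det = aff_m1 * unaff_m2 - aff_m2 * unaff_m1
    rows = (aff_m1 + aff_m2) * (unaff_m1 + unaff_m2)
    cols = (aff_m1 + unaff_m1) * (aff_m2 + unaff_m2)
    return n * det ** 2 / (rows * cols)


# Hypothetical transmission counts from heterozygous parents.
tdt_stat = mcnemar_tdt(60, 40)
het_stat = heterogeneity_chi2(60, 40, 45, 55)
```

With these made-up counts, transmissions to unaffected offspring are distorted in the opposite direction to those to affected offspring, which is the situation in which the heterogeneity statistic gains over the affected-only statistic.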
Linkage disequilibrium (LD) is frequently observed between disease genes and closely linked markers. In narrowing disease gene locations, therefore, LD testing has become common. It has also been proposed that LD testing with the dense maps currently available could be used for genome screening. The most common approach to testing for LD between a diallelic marker locus and a disease locus involves a case-control comparison of allele frequencies, using a chi2 test on 1 degree of freedom (df). For a marker with m alleles (m>2) the obvious extension is to perform a chi2 test on m-1 df. Using this approach, it is not clear what type of marker is likely to have the most power for detecting LD. Markers with many alleles will be very polymorphic, and might therefore be expected to be more useful. However, the number of df of the chi2 test increases with the number of marker alleles, and this will have a negative effect on power. The net effect of increasing the number of marker alleles is therefore not obvious.
We have used deterministic calculations to investigate this question, assuming initial complete LD, with no genetic drift, migration, or mutation. The expected power of the chi2 test was calculated for various recombination fractions (theta) and sample sizes, from 1 to 100 generations after the initial mutation, for markers with numbers of alleles varying from 2 to 10. For theta's consistent with those commonly used in genome screens, using a marker with more than two alleles results in a substantial increase in power, particularly when moderately large sample sizes are used. Using a 10 cM genome screen and a sample size of 75 disease and 75 normal chromosomes, 20 generations after the initial mutation the power to detect LD at theta=0.05 using a diallelic marker is 61%, compared to 88% using a marker with six alleles, assuming equifrequent alleles. For smaller theta's (e.g. theta=0.01), there is little increase in power when multiallelic rather than diallelic markers are used.
These results suggest two approaches to genome screens using LD tests. A fine grid of markers can be used, in which case diallelic markers and smaller sample sizes are close to optimal. If markers are more widely spaced, larger sample sizes and markers with more than two alleles are preferable.
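A deterministic calculation of this kind might look like the sketch below. It is a reconstruction under stated assumptions rather than the authors' code: disease chromosomes start in complete LD with one marker allele, LD decays by a factor (1-theta) per generation, normal chromosomes keep m equifrequent alleles, and the test is carried out at the 5% level. The function names are hypothetical. Exact power is computed only for the 1 df case, because a noncentral chi2 distribution with more df is not available in the Python standard library.

```python
import math


def disease_allele_freqs(m, theta, g):
    """Marker allele frequencies on disease chromosomes after g generations.

    Index 0 is the allele initially in complete LD with the mutation.
    """
    decay = (1.0 - theta) ** g
    background = 1.0 / m
    ancestral = background + (1.0 - background) * decay
    other = background * (1.0 - decay)
    return [ancestral] + [other] * (m - 1)


def noncentrality(m, theta, g, n_disease, n_normal):
    """Noncentrality parameter of the case-control chi2 test on m - 1 df."""
    q = 1.0 / m  # allele frequency on normal chromosomes (assumed equifrequent)
    scale = n_disease * n_normal / (n_disease + n_normal)
    lam = 0.0
    for p_i in disease_allele_freqs(m, theta, g):
        pooled = (n_disease * p_i + n_normal * q) / (n_disease + n_normal)
        lam += scale * (p_i - q) ** 2 / pooled
    return lam


def power_1df(lam, z_crit=1.959964):
    """Power of a 1 df chi2 test at the 5% level (assumed) given noncentrality lam."""
    def phi(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    root = math.sqrt(lam)
    return (1.0 - phi(z_crit - root)) + phi(-z_crit - root)


lam2 = noncentrality(2, 0.05, 20, 75, 75)
lam6 = noncentrality(6, 0.05, 20, 75, 75)
power2 = power_1df(lam2)
```

Under these assumptions the diallelic case gives a power of about 0.61, in line with the 61% quoted above. The six-allele marker has a larger noncentrality parameter, which has to be weighed against its 5 df.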
The transmission/disequilibrium test (TDT) detects linkage between a complex disease and a genetic marker, when allelic association is also present. In this situation, parents heterozygous for marker alleles (M1 and M2) will transmit one allele to affected offspring preferentially. The TDT uses the fact that if marker and disease are not linked, the number X of transmissions of M1 from heterozygous parents to n affected offspring has a binomial distribution with mean n/2 and variance n/4. For the test, marker genotypes of parents and offspring are determined, and linkage is inferred if X departs significantly from its mean.
If parental genotypes are not available, the TDT cannot be used. In such cases, however, genotype information might be available for unaffected sibs. We have derived a test (the sib TDT, or S-TDT) which uses this information. In effect, the frequency of M1 in affected sibs is compared with that in unaffected sibs. The comparison is complicated because the non-independence of sibs must be taken into account. To do this, the test uses the fact that if marker and disease loci are unlinked, the number Y of M1 alleles among affected sibs has a mean and variance given by the hypergeometric distribution. The mean M and variance V depend on the number of affected and unaffected sibs, and on the total number of sibs of each genotype, in each family. Linkage is inferred if Y departs significantly from its mean.
If the data include (1) some families suitable only for the TDT, (2) some suitable only for S-TDT and (3) some suitable for either, separate tests are undesirable. We have derived a way to combine the TDT and the S-TDT into a single test, the combined TDT, or C-TDT. (The third group of families is pooled with the first, and the unaffected offspring are ignored.) The test statistic is X+Y, with mean n/2+M, variance n/4+V, if linkage is absent. Linkage is inferred if X+Y departs significantly from its mean. Thus the S-TDT and C-TDT generalize the TDT to cases where parental genotypes are not available.
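The S-TDT and C-TDT moments can be sketched as below. The abstract does not give explicit formulas for M and V, so this sketch derives them from the stated null model by treating the affected sibs in each family as a random draw, without replacement, from that sibship (the standard finite-population sampling mean and variance). The function names and data are hypothetical, and each family needs at least one affected sib, at least one unaffected sib, and sibs that do not all share the same genotype.

```python
import math


def stdt_family(affected, unaffected):
    """Return (Y, mean, variance) for one sibship under no linkage.

    affected, unaffected: lists holding each sib's number of M1 alleles
    (0, 1 or 2).
    """
    sibs = affected + unaffected
    t, a, u = len(sibs), len(affected), len(unaffected)
    y = sum(affected)
    mean_count = sum(sibs) / t
    pop_var = sum(x * x for x in sibs) / t - mean_count ** 2
    # Sampling a of t sibs without replacement: finite-population variance.
    return y, a * mean_count, a * u * pop_var / (t - 1)


def z_score(observed, mean, variance):
    """Standardized deviation of the observed count from its null mean."""
    return (observed - mean) / math.sqrt(variance)


def combined_tdt(x, n, families):
    """C-TDT z score: X + Y against mean n/2 + M and variance n/4 + V."""
    y = m = v = 0.0
    for affected, unaffected in families:
        fy, fm, fv = stdt_family(affected, unaffected)
        y, m, v = y + fy, m + fm, v + fv
    return z_score(x + y, n / 2 + m, n / 4 + v)


# Hypothetical data: one sibship, plus 30 of 50 TDT transmissions carrying M1.
y, mean, var = stdt_family([2, 1], [0, 1])
z = combined_tdt(30, 50, [([2, 1], [0, 1])])
```

The z score is referred to the standard normal distribution, matching the statement that linkage is inferred when X+Y departs significantly from its mean.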
Most methods for analysis of linkage disequilibrium use only a single marker or at most two flanking markers. Power has been shown to be higher when multiple markers are analyzed simultaneously using a multiple two-point approach (Terwilliger, 1995). However, no statistically rigorous method is currently available to perform a true multipoint analysis using complete haplotype information. Houwen et al. (1994) suggested identification of shared haplotype segments among affected individuals, but no well defined statistic was provided. We propose a likelihood ratio statistic for the analysis of extended haplotypes. This will enable us to compare haplotype analysis to multiple two-point analysis (and even single marker analysis) in order to determine under what biological and evolutionary circumstances each approach is optimal, with respect to both power and fine scale localization. Since haplotype analysis requires additional model assumptions, parameter estimates may perhaps be less precise, but the additional information used from extended haplotypes should increase the power of the test.
When only a single marker or two flanking markers are analyzed, our method is analogous to Terwilliger's, but the extension to multiple markers is conceptually different. A bi-directional branching process originating at the assumed trait locus is used to analyze all markers jointly on haplotypes. The branching in each direction is modeled as a hidden Markov chain with true states "ancestral" and "non-ancestral"; its transition probabilities are functions of the inter-locus recombination fractions, marker allele frequencies and coalescence time of the mutation in the affected sample population. The likelihood is computed assuming each possible ancestral haplotype and weighted by their population frequencies. The resulting likelihood is numerically maximized over a heterogeneity parameter, a time parameter, the unknown map position of the trait locus and eventually over mutation rates. Software is in preparation and will be made available.
Association studies are assuming an increasingly prominent role in mapping the genes that predispose to complex human diseases. Family-based association methods such as the haplotype relative risk, the transmission disequilibrium test, and affected family-based controls are effective tools for eliminating false positive mapping results due to population stratification. However, requiring genotype data for parents of affected individuals limits the applicability of these methods for late-onset diseases.
We propose several family-based test statistics for association based on a discordant-sib-pair (DSP) design that use data from at least one affected and one unaffected sib, but do not require parental data. The first set of tests contrasts the alleles present in DSPs. Specifically, we count either a) all alleles or b) those alleles not present in both sibs, and compute Pearson or likelihood ratio statistics for heterogeneity. Due to dependence among the alleles, we use permutation tests to determine the level of statistical significance. A permutation test of symmetry among the alleles of the paired sibs also will be presented. The second set of tests contrasts genotypes for the DSPs using standard tests of symmetry and marginal homogeneity. Posited genetic disease models, prior hypotheses regarding an associated allele, and the presence of multiple affected or unaffected sibs are easily incorporated into these tests.
Initial results of computer simulations conducted under several disease models, sampling designs, and association patterns suggest that these methods provide a useful approach for detecting disease-marker association in late-onset diseases. We also shall describe an application of these methods to a set of Alzheimer disease families typed for the ApoE polymorphism.
Identifying loci influencing quantitative traits can play an important role in understanding the etiology of complex human genetic diseases. Traditional methods for mapping quantitative traits in human populations focus on allele-sharing in relative pairs and can have little power to detect trait loci of small effect. For dichotomous traits, the transmission/disequilibrium test (TDT) can be more powerful than allele-sharing tests when there is population association. With this in mind, it seems reasonable that tests could be developed that take advantage of associations between marker and trait alleles to glean a more powerful test for locating quantitative trait loci. Here we propose a test that, like the TDT for dichotomous traits, can be a powerful test for linkage in the presence of association. Also like the TDT, the test uses family data and focuses on marker-allele transmission from heterozygous parents. We base our test on the difference between the mean phenotypic value among children of heterozygous parents that transmit a particular marker allele and the mean phenotypic value among children of heterozygous parents that do not transmit that marker allele. The statistic is summed over marker alleles and is general for any number of alleles. We consider two estimates of variance and show that for large samples either gives an approximately valid chi-square test. In addition, we propose a simple Monte Carlo test that is appropriate for small samples or when other assumptions may be violated since it is always valid. Using computer simulation, we examine the robustness of the chi-square approximation and explore the power of these tests.
Power to detect linkage by the affected sib-pair (ASP) test and transmission/disequilibrium test (TDT) critically depends upon the magnitudes of (Ps-.5) and (Pt-.5), respectively. Ps denotes the probability of ASP allele "sharing" or the probability that a randomly ascertained parent of an ASP transmitted the same marker allele (identical-by-descent) to both affected sibs. Pt denotes the probability of parental allele "transmission" or the probability that a particular marker allele (e.g. allele A) was transmitted to an individual affected child by a randomly ascertained and informative (A/non-A) parent. Assuming linkage between a biallelic marker and biallelic disease locus, I have demonstrated that Ps=.5+(Ls)(Ms)(Rs) and Pt=.5+(Lt)(Mt)(Rt). In these expressions, the factors Ls and Lt depend only on the recombination fraction (theta) between marker and disease locus, the factors Ms and Mt depend on marker allele frequency (m) and disequilibrium (delta) between marker and disease locus, and the factors Rs and Rt depend only on the frequency of disease-causing allele D and the three penetrances of the disease locus genotypes. Based on analysis of the expressions for Ps and Pt, I will present several major findings including: (1) Disequilibrium (delta) increases the magnitudes of both (Ps-.5) and (Pt-.5); (2) The value of Ps for a completely polymorphic marker equals Ps for a biallelic marker in equilibrium with the disease locus; (3) Previous analytic investigations of TDT power such as the analysis by Risch and Merikangas (Science 273:1516-17, 1996) are special cases of a more general framework provided by expressions for Ps, Pt, and the proportion (H/F) of ascertained parents who are informative at the marker. This general framework can be used to compare the power of the TDT and ASP test for genome scans or for tests of a single candidate gene.
In the second round of a genome scan it is often the case that a region of interest is saturated with tightly linked markers for follow-up testing. If each marker is used to test for linkage to a disease gene separately then multiple testing has occurred and should be accounted for. It has been suggested that a standard Bonferroni correction is appropriate when applying the TDT at multiple marker loci since the tests are largely independent (Spielman and Ewens 1996, Risch and Merikangas 1996). When the tests at different markers are correlated, as may be the case when the markers are linked and associated, this correction leads to a conservative test and hence a loss of power. To circumvent this problem, we propose a Monte Carlo procedure that provides a global multilocus TDT for linkage to markers in the region. Using computer simulations, we examine the properties of these two procedures. When the tests at different marker loci are independent, the Monte Carlo procedure and the test using the Bonferroni correction have the same power. However, when dependencies exist the Monte Carlo test is more powerful, although the increase is nominal.
The transmission/disequilibrium test (TDT) first proposed by Spielman et al. (1993) has been an important tool in the search for genes involved in complex diseases. One problem with the TDT is the need for parental data which is often unavailable for late onset diseases. As a remedy for this problem, Spielman and Ewens (personal communication) propose a permutation procedure which utilizes unaffected siblings as surrogates for missing parents. Their statistic compares the number of times a particular allele occurs in affected siblings with the number of times it occurs in unaffected siblings. To estimate a p-value, they permute genotypes within families, which under the null hypothesis of no linkage or no association leads to pseudosamples that have the same distribution as the original sample. For a multiallele marker, the test is applied to each allele separately and a Bonferroni correction for multiple alleles is used. An alternative statistic that was mentioned by Spielman and Ewens considers all alleles simultaneously. A simulation study with various genetic models was used to compare the power of these two procedures. In addition we compared the power of these tests with the TDT assuming that parental data were available. The simulations revealed that the test that considers all marker alleles simultaneously is more powerful than the test that uses the Bonferroni correction and surprisingly was only slightly less powerful than the TDT. Sampling strategies were also investigated with respect to power.
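A within-family permutation test of this kind can be sketched for a single allele as follows. This is an illustration under assumptions, not the procedure of Spielman and Ewens as published: the function names, the two-sided comparison against the exact permutation mean, and the add-one p-value estimate are choices made here. Because the total allele count in each family is fixed under permutation, comparing affected with unaffected counts is equivalent to examining the affected count alone.

```python
import random


def permutation_pvalue(families, n_perm=2000, seed=0):
    """Monte Carlo p-value for one marker allele.

    families: list of (affected, unaffected) pairs, each a list of the
    number of copies of the allele (0, 1 or 2) carried by each sib.
    Genotypes are shuffled among sibs within each family.
    """
    rng = random.Random(seed)
    # Exact permutation mean of the allele count among affected sibs.
    expected = sum(len(aff) * sum(aff + un) / len(aff + un)
                   for aff, un in families)
    observed = abs(sum(sum(aff) for aff, _ in families) - expected)
    hits = 0
    for _ in range(n_perm):
        count = 0
        for aff, un in families:
            sibs = aff + un  # new list, so the input is not modified
            rng.shuffle(sibs)
            count += sum(sibs[:len(aff)])
        if abs(count - expected) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 12 discordant sib pairs, allele only in the affected sib.
p_assoc = permutation_pvalue([([2], [0])] * 12)
# Hypothetical data with identical sib genotypes carries no information.
p_null = permutation_pvalue([([1], [1])] * 5)
```

For a multiallele marker, the per-allele p-values from such a test would be multiplied by the number of alleles to apply the Bonferroni correction described above.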
A variety of genetic studies require the establishment or assessment of kinship between two individuals. For example, in gene mapping studies using affected sibling pairs it is necessary to assure that the stated sibling relationship between a pair of individuals is true when parents are unavailable, as is usually the case for late age-at-onset phenotypes.
For any pair of regular relatives (those whose parents are not inbred) the kinship can be defined by the Cotterman k-coefficients, k2, k1 and k0, where ki is the probability of sharing i alleles identical by descent (IBD). For each such class, and consequently for each pair of individuals, the probability (likelihood) of sharing i alleles identical by state (IBS) can be calculated as functions of the allele frequency moments at any polymorphic locus. Thus, given genotype data at a number of loci the likelihood of a given kinship can be computed. We have studied the power to distinguish between full siblings, half-siblings and unrelateds for a pair of individuals genotyped at 50, 100 and 200 polymorphic markers with 4, 6, 8 and 10 equifrequent alleles. We have also assessed how the use of marker loci mapped at resolutions of 1, 2 or 5 cM, as opposed to the use of unlinked loci, affects the power. These evaluations show that to correctly classify a pair as full siblings, half-siblings or unrelateds with greater than 95% power requires a minimum of 100 markers with 4 equifrequent alleles. Although the use of loci with greater heterozygosity increases power the increase is small if the baseline of 4 equifrequent allele markers is used. Markers which map close to one another decrease power, as expected. These results suggest that the genotype data generated in sibpair studies are sufficient to discriminate between specific close relationships. The estimation of accurate kinship coefficients is, however, more difficult.
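The likelihood of a stated relationship can be sketched as below. Note the substitution: the abstract works with IBS sharing counts, whereas this sketch uses the full genotype-pair likelihood given the k-coefficients, which is a closely related but not identical calculation. It also assumes Hardy-Weinberg proportions and unlinked loci, and the function names and example data are hypothetical.

```python
import math

# Cotterman coefficients (k0, k1, k2) for the three relationships compared.
K = {
    "full sibs": (0.25, 0.5, 0.25),
    "half sibs": (0.5, 0.5, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def genotype_prob(g, freq):
    """Hardy-Weinberg probability of the unordered genotype g = (a, b)."""
    a, b = g
    return freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]


def joint_prob_ibd1(g1, g2, freq):
    """P(g1, g2 | one allele IBD), summing over the shared allele x and
    the two non-shared alleles y and z."""
    s1, s2 = tuple(sorted(g1)), tuple(sorted(g2))
    total = 0.0
    for x in freq:
        for y in freq:
            if tuple(sorted((x, y))) != s1:
                continue
            for z in freq:
                if tuple(sorted((x, z))) == s2:
                    total += freq[x] * freq[y] * freq[z]
    return total


def pair_likelihood(g1, g2, freq, k):
    """Likelihood of one locus given k = (k0, k1, k2)."""
    k0, k1, k2 = k
    p0 = genotype_prob(g1, freq) * genotype_prob(g2, freq)
    p1 = joint_prob_ibd1(g1, g2, freq)
    same = tuple(sorted(g1)) == tuple(sorted(g2))
    p2 = genotype_prob(g1, freq) if same else 0.0
    return k0 * p0 + k1 * p1 + k2 * p2


def classify(loci, freq):
    """Most likely relationship; loci is a list of (g1, g2) genotype pairs."""
    def loglik(k):
        return sum(math.log(pair_likelihood(g1, g2, freq, k))
                   for g1, g2 in loci)
    return max(K, key=lambda name: loglik(K[name]))


# Hypothetical marker with 4 equifrequent alleles.
freq = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
lik_full = pair_likelihood((1, 2), (1, 2), freq, K["full sibs"])
```

A pair with identical genotypes at many such loci is classified as full sibs, while a pair with no alleles in common at any locus is classified as unrelated, since both other relationships require some IBD sharing.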
With the advance of the human genome project, attention is being paid to the possibility of large scale, genome-wide association studies. Because of the large volume of genotyping required by such studies, it is important to consider efficiencies that can be brought to bear on this problem, both in terms of study design and analysis. The greatest efficiency can be obtained by DNA pooling, whereby only a small number of DNA pools are genetically characterized rather than the large number of individuals underlying these pools.
Here we consider study designs based on nuclear families - affecteds with parents, or affected and unaffected sibs without parents. We consider statistics based on either pooled DNA (e.g. affected children forming one pool, parents another; or affected sibs forming one pool, unaffected sibs another) or individual genotyping. For sibships without parents and individual genotyping, we introduce a novel disequilibrium statistic (the sibship-based disequilibrium or SD test) which is both powerful and robust.
For sibships without parents, the power of affected-unaffected sib pairs is about half that of singletons with parents, and increases proportionately with the number of unaffected sibs. Designs with two affected sibs are generally superior to those with single affecteds, especially when the disease allele frequency is low.
Pooled studies require statistics based on overall allele frequencies; these tests may be sensitive to population stratification and have inflated type-1 errors, although the power is typically comparable for pooled and unpooled data. Therefore, we recommend a two-stage procedure. First, tests should be performed based on DNA pooling to identify interesting loci for follow-up by individual genotyping (second stage). However, even for this second stage, we show that genotyping efficiency can still be enhanced by some sample pooling with no loss of power or robustness.
In this paper we consider the genetic analysis of quantitative traits using a very high density map of biallelic markers and cases and controls defined and ascertained from "thresholds" associated with a relevant trait distribution. Emphasis is on the identification of genomic regions likely to harbor loci whose allelic variation influences variation in a quantitative trait in the population at large. We consider the power of studies designed to detect such loci, as well as issues concerning the false positive rates of such studies, the degree to which admixture and population stratification can influence false positive rates, and the role that the evolution of the polymorphisms comprising the high density map has on the ability to detect trait-influencing loci. Sample size guidelines are offered, as are strategies for minimizing the effort needed to genotype relevant individuals. Ultimately, we attempt to address the question as to whether or not the identification of loci with minor to moderate effects on quantitative traits is possible without family data.