MAPMAKER/EXP Tutorial/Reference Manual 3.0


Automatic Error Detection

Our sample data set contains over 14,000 genotypes, and despite our best efforts in the lab, it almost certainly contains a few errors. In fact, if even 99% of the genotypes are correct, we would be well ahead of the average often reported in the literature.

In a recent Genomics article (Genomics 14: 604-610), we proposed a new method for dealing with the possibility of genotyping error in these sorts of data sets, a method incorporated into this version of MAPMAKER. Because the Genomics article provides a comprehensive discussion, we only summarize it briefly here.

Essentially, the probability of error is now included in MAPMAKER's underlying genetic model: mistypings are treated as rare events, as are crossovers in small regions. The a priori probability of error is set using the "error probability" command, discussed in the reference section; the default is a 1% chance per individual genotype. MAPMAKER can thus weigh the evidence in the entire data set and generate maximum-likelihood maps that are reasonably correct even when the data contain a small number of isolated errors. Moreover, when the "error detection" option is turned on, all of MAPMAKER's three-point and multipoint data analysis commands can take it into account when searching for map orders and distances. However, not all commands display the candidate errors (although error detection does affect their results).

As a helpful side effect, given a multipoint map order and distances, we are able to calculate a posteriori (i.e., in light of all available raw data) the probability that each individual genotype is right or wrong. These numbers are presented as a "LOD of error", and represent on a log scale the strength of the evidence that a genotype is mistyped. For typical data sets, double-checking all genotypes with a LOD-error of about 1.0 or greater (usually a small fraction of the data set) will correct the vast majority of the errors. Note that MAPMAKER does not calculate LOD-error values for markers at the end of an order (simply because, without flanking markers on both sides, there is minimal power to tell recombination from mistyping).
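The trade-off between "recombination" and "mistyping" can be illustrated with a deliberately simplified sketch. It is not MAPMAKER's algorithm and is not expected to reproduce its LOD-error values. It assumes a backcross-style locus with only two genotype classes, flanking genotypes that are themselves correct, Haldane's map function, and a log10 likelihood ratio standing in for the LOD of error (the Genomics article gives the actual definition). The function names are hypothetical.

```python
import math

def haldane_r(d_cm):
    """Recombination fraction for a distance in cM (Haldane's map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def toy_error_evidence(d_left_cm, d_right_cm, prior_error=0.01):
    """Evidence that a middle genotype differing from both flanks is mistyped.

    Toy model (an assumption, not MAPMAKER's model): two genotype classes,
    flanking genotypes agree with each other and are taken as correct.
    Both distances must be greater than zero.
    """
    r1 = haldane_r(d_left_cm)
    r2 = haldane_r(d_right_cm)

    # Genotype is correct: a crossover is needed on each side of the locus.
    like_correct = r1 * r2
    # Genotype is mistyped: no crossover is needed on either side.
    like_error = (1.0 - r1) * (1.0 - r2)

    # Posterior probability of a mistyping, weighting by the a priori error rate.
    p_error = prior_error * like_error
    p_correct = (1.0 - prior_error) * like_correct
    posterior = p_error / (p_error + p_correct)

    # Log10 likelihood ratio, used here as a stand-in for a "LOD of error".
    log10_ratio = math.log10(like_error / like_correct)
    return posterior, log10_ratio

if __name__ == "__main__":
    for left, right in [(2.0, 2.0), (10.0, 10.0), (30.0, 30.0)]:
        post, lod = toy_error_evidence(left, right)
        print(f"{left:5.1f} cM / {right:5.1f} cM  posterior={post:.3f}  log10 ratio={lod:.2f}")
```

In this sketch the evidence for a mistyping grows as the flanking markers get closer, because an apparent double crossover over a short interval becomes less plausible than a single error.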

As a quick example of the use of this method, we briefly turn the "error detection" option "on", and then re-display the map shown on the previous pages. Here, you can see that candidate mistypings are printed next to the locus in question, with an indication of the individual number, the flanking marker genotypes, and the LOD of error (only errors with LODs greater than a certain minimum, selected with the "error thresholds" command, are displayed). Note that the map distances are also somewhat shrunken, particularly near the locus that had an apparent mistyping.

Of course, in actual use we would leave the error detection feature on at all times while constructing maps (also, we would enable error detection for three-point, as well as multipoint analyses, using the "triplet errors" option). This has the unfortunate effect of slowing down most analyses (by roughly a factor of three with F2 intercross data, for example). A more detailed discussion of these features is provided in the Reference manual.

81> sequence order1
sequence #29= chrom12: order1

82> map
===============================================================================
Map:
  Markers          Distance
   65  T002        17.5 cM
  140  M062         5.2 cM
    3  L016        10.0 cM
   53  L058         3.4 cM
  283  A064         1.1 cM
   21  L041         7.3 cM
   85  T051         3.7 cM
  127  M027        14.3 cM
  188  T064      ----------
                   62.6 cM   9 markers   log-likelihood= -86.06
===============================================================================

83> error detection on
'error detection' is on.

84> map
===============================================================================
Map:
                              Apriori
  Markers          Distance     Prob   Candidate Errors
   65  T002        17.6 cM
  140  M062         5.2 cM     1.0%    -
    3  L016        10.2 cM     1.0%    -
   53  L058         3.3 cM     1.0%    -
  283  A064         1.1 cM     1.0%    -
   21  L041         7.3 cM     1.0%    -
   85  T051         3.2 cM     1.0%    -
  127  M027        13.8 cM     1.0%    [#35 H-A-H 1.77]
  188  T064      ----------
                   61.7 cM   9 markers   log-likelihood= -87.09
===============================================================================

85> error detection off
'error detection' is off.
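If the map output is saved to a text file, the candidate errors can be collected for re-checking in the lab with a short script. The sketch below is not part of MAPMAKER. It assumes each candidate appears as "[#individual flanking-genotypes LOD]" directly after the a priori probability, as in the session above, and that at most one candidate is listed per locus. The function name and the 1.0 cutoff (taken from the rule of thumb given earlier) are illustrative choices.

```python
import re

# One map row carrying a candidate error, e.g.
#   127  M027  13.8 cM  1.0%  [#35 H-A-H 1.77]
# The layout is assumed from the session shown above.
ROW = re.compile(
    r"(\d+)\s+(\S+)\s+[\d.]+\s+cM\s+[\d.]+%\s+"   # marker number, name, distance, prior
    r"\[#(\d+)\s+(\S+)\s+([\d.]+)\]"              # individual, flanking genotypes, LOD
)

def candidate_errors(map_text, min_lod=1.0):
    """Return (locus, individual, genotypes, lod) tuples with lod >= min_lod."""
    hits = []
    for m in ROW.finditer(map_text):
        _number, locus, individual, genotypes, lod = m.groups()
        if float(lod) >= min_lod:
            hits.append((locus, int(individual), genotypes, float(lod)))
    return hits

SAMPLE = """
   85  T051         3.2 cM     1.0%    -
  127  M027        13.8 cM     1.0%    [#35 H-A-H 1.77]
  188  T064      ----------
"""

if __name__ == "__main__":
    for locus, individual, genotypes, lod in candidate_errors(SAMPLE):
        print(f"re-check individual {individual} at {locus} ({genotypes}, LOD {lod})")
```

For the session above, this would flag individual 35 at locus M027 for double-checking.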

