From: softlib.ca.rice.edu
Last Mod: May 30, 1995
fastlink 3.0p

What does the output of ILINK and LODSCORE mean?


This file describes the output that the programs ILINK and LODSCORE print to the screen. For the rest of the text we describe things in terms of ILINK because the output for LODSCORE is very similar. The need for this document was suggested by Marcy Speer (Duke).

ILINK uses the GEMINI optimization procedure to find a locally optimal value of the theta vector of recombination fractions. If you use the default scripts produced by lcp, the initial guess for theta is 0.1 in every dimension. GEMINI scores each candidate theta by its pedigree likelihood and searches for theta vectors with a higher likelihood.

The GEMINI procedure runs for multiple iterations, and each iteration corresponds to one line of output. Each iteration includes multiple likelihood function evaluations and has two phases. In Phase I, GEMINI seeks to improve the current best theta. In Phase II, GEMINI estimates the gradient of the likelihood at the current best theta vector. In the first iteration, Phase I only evaluates the likelihood at the initial candidate theta.
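To make Phase II concrete, here is a minimal sketch of a gradient estimate by forward differences. This text does not say which scheme GEMINI actually uses, so the function name, the step size h, and the toy objective are all assumptions chosen for illustration. The sketch only shows why a gradient estimate needs several function evaluations per iteration.

```python
def forward_difference_gradient(f, theta, h=1e-4):
    """Estimate the gradient of f at theta by forward differences.

    Uses one evaluation of f at theta plus one more per dimension.
    This is a generic technique, not necessarily what GEMINI does.
    """
    base = f(theta)
    grad = []
    for i in range(len(theta)):
        shifted = list(theta)
        shifted[i] += h  # perturb one component at a time
        grad.append((f(shifted) - base) / h)
    return grad


def toy_objective(theta):
    """Hypothetical smooth stand-in for -2*log(likelihood).

    It is NOT a pedigree likelihood; its minimum is at 0.2 in every component.
    """
    return sum((t - 0.2) ** 2 for t in theta)


# Gradient estimate at the default starting point of 0.1 in every dimension.
gradient = forward_difference_gradient(toy_objective, [0.1, 0.1])
```

With a two-dimensional theta this estimate uses three evaluations of the objective, which is consistent with each iteration containing several likelihood function evaluations.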

When ILINK prints out a line such as: maxcensor can be reduced to -32767, it has completed the first likelihood function evaluation. On long runs, the time taken to reach this point, multiplied by the expected number of function evaluations, gives a rough estimate of the total running time. A reasonable rough estimate for the number of function evaluations is 10 * (number of dimensions of the theta vector). The number of dimensions of the theta vector is one fewer than the number of loci in most cases. If maletheta and femaletheta are allowed to differ (sexdif is set to 1), then the number of dimensions doubles to 2 * (number of loci - 1). Estimating other parameters (with fitmodel set to true) can also increase the number of dimensions.
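The arithmetic above can be sketched as follows. The function names are hypothetical, the sketch ignores the extra dimensions that fitmodel can add, and it assumes that every likelihood function evaluation takes about as long as the first one, which may not hold in practice.

```python
def theta_dimensions(num_loci, sexdif=False):
    """Number of dimensions of the theta vector (ignoring fitmodel)."""
    if num_loci < 2:
        raise ValueError("need at least two loci")
    dims = num_loci - 1
    # With sexdif set, maletheta and femaletheta are estimated separately.
    return 2 * dims if sexdif else dims


def estimated_evaluations(num_loci, sexdif=False):
    """Rough rule of thumb: 10 evaluations per dimension of theta."""
    return 10 * theta_dimensions(num_loci, sexdif)


def estimated_runtime_seconds(num_loci, first_eval_seconds, sexdif=False):
    """Rough running time, assuming all evaluations cost the same as the first."""
    return estimated_evaluations(num_loci, sexdif) * first_eval_seconds
```

For example, with 4 loci and a first evaluation that took 120 seconds, estimated_runtime_seconds(4, 120.0) gives 3600.0 seconds, or about an hour. Setting sexdif doubles that estimate.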

After each iteration, ILINK prints out one line with four pieces of information:

ITERATION is a positive integer showing the number of the iteration just completed.

T is an indication of the step size that the GEMINI procedure takes in updating theta. A very small T sometimes indicates that GEMINI tried many updates, each of which requires a likelihood function evaluation, so the iteration probably took longer than average.

NFE is a positive integer giving the cumulative number of likelihood function evaluations completed through the end of that iteration.

F is a scaled representation of -2log(likelihood) at the current best theta. Because of the minus sign, maximizing the likelihood corresponds to minimizing F, so the value of F decreases until it reaches a local minimum.

After the last printed iteration, ILINK in FASTLINK does one more likelihood function evaluation for the purpose of computing Ott's generalized LODSCORE, which shows up in final.dat (transferred to final.out by the default pedin scripts). Ott's generalized LODSCORE compares -2log(likelihood) at the locally optimal theta to -2log(likelihood) at a theta that is 0.5 in every component (i.e. each locus unlinked to all the rest). In the LINKAGE version of ILINK, more likelihood function evaluations are done after the last printed iteration line, but these likelihood function evaluations are unnecessary (see paper2.ps from the FASTLINK distribution for more details).
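As an illustration of how two -2log(likelihood) values can be compared, the sketch below converts their difference into a base-10 log likelihood ratio. It assumes natural logarithms and unscaled values. Whether the number reported in final.dat is computed in exactly this way is not stated here, so treat the function (a hypothetical name) as a worked identity rather than a description of the program.

```python
import math


def lod_from_minus_two_log_likelihoods(f_optimum, f_unlinked):
    """Base-10 log likelihood ratio from two -2*ln(likelihood) values.

    f_optimum  : -2*ln(likelihood) at the locally optimal theta
    f_unlinked : -2*ln(likelihood) at theta = 0.5 in every component

    log10(L_opt / L_unlinked) = (f_unlinked - f_optimum) / (2 * ln 10)
    """
    return (f_unlinked - f_optimum) / (2.0 * math.log(10.0))
```

A positive result means the locally optimal theta has a higher likelihood than the unlinked theta. A result of 3.0 corresponds to a likelihood ratio of 1000 to 1.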

Some users run ILINK and LODSCORE with execution scripts that do not delete the output file outfile.dat upon termination. The file outfile.dat mainly stores the values of certain variables at each iteration; these variables are of interest only to those who wish to modify the code. Of interest to users is the last item in outfile.dat, which describes the condition under which LODSCORE or ILINK terminated. This condition is a code stored in the variable idg and takes one of 8 values:

 1: Maximum possible accuracy reached
 2: Search direction no longer downhill
 3: Accumulation of rounding error prevents further progress
 4: All significant differences lost through cancellation in conditioning
 5: Specified tolerance on normalized gradient met
 6: Specified tolerance on gradient met
 7: Maximum number of iterations reached
 8: Excessive cancellation in gradient

Under all circumstances it should be emphasized that if ILINK or LODSCORE is used with only a single starting theta, the output value is only a local optimum and not necessarily a global optimum. It is a good idea to try several different starting thetas. It is perfectly valid to compare the local optima from different starting points and choose the one that gives the best (lowest) value of -2*log(likelihood); the more starting points tried, the more likely that the best value will be a global optimum.
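Comparing runs from several starting points amounts to picking the run with the lowest -2*log(likelihood). A minimal sketch follows; the tuple layout and the function name are assumptions, and the numbers are illustrative placeholders, not output from a real run.

```python
def best_local_optimum(runs):
    """Return the run with the lowest -2*log(likelihood).

    Each run is a tuple: (starting theta, final theta, final F value).
    """
    if not runs:
        raise ValueError("no runs to compare")
    return min(runs, key=lambda run: run[2])


# Placeholder values for three hypothetical runs with a two-dimensional theta.
runs = [
    ((0.1, 0.1), (0.12, 0.08), 251.7),
    ((0.3, 0.3), (0.25, 0.31), 254.2),
    ((0.05, 0.4), (0.11, 0.09), 251.3),
]

start_theta, final_theta, f_value = best_local_optimum(runs)
```

Here the third run is chosen because its F value is the smallest. The chosen run is still only the best local optimum found so far.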

If ILINK or LODSCORE exits with condition 5 or 6, the output value can reasonably be trusted as a local optimum.

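If ILINK or LODSCORE exits with condition 7, the output values should not be trusted, because the search stopped at the iteration limit. The source code must be modified to increase iterationMultiple, which is #defined in gemdefs.h, and the program must then be recompiled and rerun.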

If ILINK or LODSCORE exits with condition 1, 2, 3, 4, or 8, the situation is more nebulous, but it is a good idea to run more experiments to test how robust the output values are. Try starting from different initial thetas. One might also try increasing the constant tol in gemdefs.h. Increasing tol relaxes the convergence criteria, so that ILINK and LODSCORE may come close to a local optimum in cases where a smaller tol causes problems. If increasing tol helps, then one should:

   find the local optimum with the higher tol
   reset tol to its previous value 
   restart the program with the first local optimum as the initial value
This experiment will test whether the initial local optimum can be improved by more precise calculations.
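The termination codes and the advice given for them can be collected in a small lookup, sketched below. The messages are copied from the list above. The function name and the wording of the advice strings are assumptions, and the sketch does not read outfile.dat, since its exact layout is not described here.

```python
TERMINATION_MESSAGES = {
    1: "Maximum possible accuracy reached",
    2: "Search direction no longer downhill",
    3: "Accumulation of rounding error prevents further progress",
    4: "All significant differences lost through cancellation in conditioning",
    5: "Specified tolerance on normalized gradient met",
    6: "Specified tolerance on gradient met",
    7: "Maximum number of iterations reached",
    8: "Excessive cancellation in gradient",
}


def advice_for(idg):
    """Summarize what to do for a given termination code idg."""
    if idg not in TERMINATION_MESSAGES:
        raise ValueError("idg must be between 1 and 8")
    if idg in (5, 6):
        return "trust as a local optimum"
    if idg == 7:
        return "do not trust; increase iterationMultiple in gemdefs.h"
    # Codes 1, 2, 3, 4, and 8: results need further checking.
    return "check robustness; try other starting thetas or a larger tol"
```

For example, advice_for(7) returns the reminder to increase iterationMultiple, while advice_for(2) suggests further experiments.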

ILINK and LODSCORE do not allow the theta values to reach 0.0. Therefore, if one of the locally optimal thetas is reported as close to 0.0, the situation ought to be explored further using LINKMAP or MLINK, which allow arbitrarily small values of theta.

