| LECTURE 6: FAMILY-BASED ASSOCIATION ANALYSIS AND OTHER TOPICS |
| linkage analysis | association (linkage disequilibrium) analysis |
| pedigree data | population data |
| current "in action" recombination events | ancestral recombination events |
| wider span (10 cM?) | narrower span (1 cM?) |
| only the location matters; allelic heterogeneity does not cause any problem | the allele is important; allelic heterogeneity will cause problems |
Family-based association analysis is a converging point of both. It can be viewed from the perspective of linkage analysis as well as from the perspective of linkage disequilibrium analysis.
From linkage's perspective: it enhances the linkage signal by knowing which phase is more likely.
[COMMENT: there is a well known result that if a pedigree
has only one (affected) offspring, this pedigree could
not provide any linkage signal. Why? Because of the problem
of unknown phase (is the disease allele on the same haplotype
as marker allele "a" or marker allele "A"?), both phases
are assigned a 50% probability in the "LOD" calculation.
On the other hand, the most well-known design for family-based
association contains only one affected offspring. This pedigree
potentially could contribute to the linkage signal because
two phases are assigned different probabilities.
]
From association's perspective: it deals with the problem of population heterogeneity/stratification.
If a family-based association result is significant, both
ancestral and current recombination contribute to the signal.
Without transmission bias in the current recombination (i.e.,
no linkage), there would be no significant result in
family-based association analysis.
Similarly, without linkage disequilibrium between the
marker allele and the disease allele (hitchhiking), there
would be no significant result in family-based association analysis.
Both linkage and linkage disequilibrium ("AND") are required!
So, in some sense, if a family-based association analysis on trio data and a population-based association analysis lead to different conclusions, this does not necessarily mean that something is wrong!
Example: suppose we have (50 parents in total, or 100 alleles)
10 parents whose genotype is homozygous aa
20 parents whose genotype is homozygous AA
15 heterozygous parents who transmit A and do not transmit a
5 heterozygous parents who transmit a and do not transmit A
If we ignore the pairing, there are 35 "A" alleles and 15 "a" alleles in the transmitted ("case") group, and 25 "A" and 25 "a" alleles in the untransmitted ("control") group. Using the chi-square test (without continuity correction), p-value = 0.0412, odds ratio = 2.333 (95% CI is (1.03, 5.30)).
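The unpaired calculation above can be checked with a few lines of Python. This is only a sketch: the helper names `chi2_2x2` and `odds_ratio_ci` are made up for this illustration, the chi-square is the Pearson statistic without continuity correction, and the confidence interval is assumed to be the usual Wald interval on the log odds ratio.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (a*d)/(b*c) with a Wald 95% interval on the log scale (assumed method)."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Rows: transmitted ("case"), untransmitted ("control"); columns: allele A, allele a.
unpaired_stat, unpaired_p = chi2_2x2(35, 15, 25, 25)
unpaired_or, unpaired_lo, unpaired_hi = odds_ratio_ci(35, 15, 25, 25)
print(f"chi-square = {unpaired_stat:.3f}, p = {unpaired_p:.4f}")
print(f"OR = {unpaired_or:.3f}, 95% CI = ({unpaired_lo:.2f}, {unpaired_hi:.2f})")
```

Only the standard library is used; `math.erfc` gives the one-degree-of-freedom chi-square tail directly, so no statistics package is needed.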
If we respect the pairing, all homozygous parents (the diagonal cells) are uninformative and are thrown away. We only look at the two off-diagonal numbers:
| | untransmitted allele is a | untransmitted allele is A |
| transmitted allele is a | 10 | 5 |
| transmitted allele is A | 15 | 20 |
The McNemar test calculates: TDT = (5-15)^2/(5+15) = 100/20 = 5. Under the null hypothesis, TDT follows the chi-square distribution with one degree of freedom. The p-value for TDT = 5 is 0.025347 (1 - pchisq(5, df=1) in "R").
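As a cross-check of the arithmetic, the same statistic and p-value can be computed in Python. This is a minimal sketch; the function name `tdt` and its argument names are chosen here for illustration only.

```python
import math

def tdt(n_transmit_a, n_transmit_A):
    """TDT / McNemar statistic from the two discordant counts of heterozygous parents.

    n_transmit_a: heterozygous parents who transmit a (and not A)
    n_transmit_A: heterozygous parents who transmit A (and not a)
    """
    stat = (n_transmit_a - n_transmit_A) ** 2 / (n_transmit_a + n_transmit_A)
    # Chi-square upper tail with 1 degree of freedom: erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

tdt_stat, tdt_p = tdt(5, 15)
print(f"TDT = {tdt_stat:.1f}, p = {tdt_p:.6f}")
```

Note that the homozygous counts (10 and 20) never enter the function, which is the sense in which those parents are discarded.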
The TDT (transmission disequilibrium test, something like a "linkage and linkage disequilibrium test") is a re-invention of the McNemar test. The two are identical.
In general, pairing/matching reduces the sample size, and thus reduces the power of the test.
Three alleles
| untransmitted allele is a | untransmitted allele is b | untransmitted allele is A | |
| transmitted allele is a | 5 | 10 | |
| transmitted allele is b | 25 | 5 | |
| transmitted allele is A | 15 | 30 |
The simplicity and ease of use of the TDT in parent(s)-affected-offspring trios/pairs is lost if other pedigree members are available. There are two schools of thought on how to deal with this situation.
1. Correcting the simplest TDT
see: TG Schulze, FJ McMahon (2002), "Genetic association mapping at the crossroads: which test and why? overview and practical guidelines", American Journal of Medical Genetics, 114:1-11.
| name | publication | type of data |
| TDTg | Bickeböller & Clerget-Darpoux (1995) | trio |
| GTDT (in SAS) | Rice et al (1995) | trio |
| ETDT | Sham & Curtis (1995) | trio |
| TDTLIKE | Terwilliger (1995) | trio, affected-sib-pair, general-ped |
| AFBAC | Thomson (1995) | trio, affected-sib-pair |
| Tmhet | Spielman & Ewens (1996) | trio, affected-sib-pair, discordant-sib-pair, general-ped |
| MC-Tm | Kaplan et al (1997) | trio |
| (in SAGE) | Cleves et al. (1997) | trio, affected-sib-pair |
| (in SIBASSOC) | Curtis (1997) | discordant-sib-pair |
| TSU, TSP | Martin et al (1997) | trio, affected-sib-pair |
| S-TDT | Spielman & Ewens (1998) | discordant-sib-pair, general-ped |
| TDT, THT, THS | Risch & Teng (1998) | trio, affected-sib-pair, discordant-sib-pair (DNA pooling) |
| DAT | Boehnke & Langefeld (1998) | discordant-sib-pair |
| SDT | Horvath & Laird (1998) | discordant-sib-pair |
| LRT/TAT | Weinberg (1998) | trio |
| TDTG | Xiong et al (1998) | trio |
| RC-TDT | Knapp (1999) | trio |
| 1-TDT | Sun et al (1999) | trio, affected-sib-pair, discordant-sib-pair |
| TDS | Teng & Risch (1999) | trio, affected-sib-pair, discordant-sib-pair |
| (logistic regression extension to TDT) | Waldman et al (1999) | trio, affected-sib-pair |
| EM-LRT | Weinberg (1999) | trio |
| PDT | Martin et al. (2000) | general-ped |
| FBAT | Rabinowitz & Laird (2000) | general-ped |
| S-TDT, C-TDT | Ho & Bailey-Wilson (2000) | trio, discordant-sib-pair, general-ped |
| XS-TDT | Horvath et al (2000) | trio, affected-sib-pair, discordant-sib-pair |
| HWSE | Lunetta et al (2000) | trio, affected-sib-pair |
| excess sharing/TSP | Wicks (2000) | trio, affected-sib-pair |
2. Using a linkage program but adding linkage disequilibrium as an extra parameter
If the simplicity of the TDT is lost when other pedigree members are included, why do we still use the TDT? Why not use a linkage analysis program (since the TDT is, after all, meant to detect BOTH linkage and linkage disequilibrium)?
see: HHH Goring, JD Terwilliger (2000), "Linkage analysis in the presence of errors IV: joint pseudomarker analysis of linkage and/or linkage disequilibrium on a mixture of pedigree and singletons when the mode of inheritance cannot be accurately specified", American Journal of Human Genetics, 66:1310-1327.
| Interactions |
From Hwang et al (1995), "Association study of transforming growth factor alpha (TGF alpha) TaqI polymorphism and oral clefts: indication of gene-environment interaction in a population-based sample of infants with birth defects", American Journal of Epidemiology, 141:629-636.
| | non-smoking | | smoking | |
| | a | A (risk allele) | a | A (risk allele) |
| case | 36 | 7 | 13 | 13 |
| control | 167 | 34 | 69 | 11 |
Odds ratio due to having the risk allele (not exposed to the
environmental risk factor, i.e. non-smoking):
ORg = (7*167)/(36*34) = 0.955.
Odds ratio due to exposure to the environmental risk
factor (without carrying the risk allele):
ORe = (13*167)/(36*69) = 0.874.
Odds ratio due to both the risk allele and the environmental
risk factor:
ORge = (13*167)/(36*11) = 5.48, with
95% CI (2.27, 13.21).
Using the cases only, the 2-by-2-by-2 table is reduced to a 2-by-2 table:
| | non-smoking | smoking |
| non risk allele | 36 | 13 |
| risk allele | 7 | 13 |
Odds ratio = 5.14, 95% CI (1.68, 15.71).
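The odds ratios in this section can be reproduced with a short Python sketch. The dictionary layout and the helper names `odds_ratio_ci` and `or_vs_reference` are assumptions made for this illustration, and the intervals are assumed to be Wald intervals on the log odds ratio, which gives values close to the quoted ones (small differences in the last digit can come from rounding).

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (a*d)/(b*c) with a Wald interval on the log scale (assumed method)."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (odds_ratio,
            math.exp(math.log(odds_ratio) - z * se),
            math.exp(math.log(odds_ratio) + z * se))

# Counts from the Hwang et al. (1995) table, keyed by (smoking, allele).
case = {("no", "a"): 36, ("no", "A"): 7, ("yes", "a"): 13, ("yes", "A"): 13}
control = {("no", "a"): 167, ("no", "A"): 34, ("yes", "a"): 69, ("yes", "A"): 11}
reference = ("no", "a")  # non-smoking, no risk allele

def or_vs_reference(group):
    """Case/control odds in `group` relative to the reference group."""
    return odds_ratio_ci(case[group], control[group],
                         case[reference], control[reference])

or_g = or_vs_reference(("no", "A"))    # risk allele only
or_e = or_vs_reference(("yes", "a"))   # smoking only
or_ge = or_vs_reference(("yes", "A"))  # risk allele and smoking

# Case-only 2-by-2 table: allele by smoking among cases.
or_case_only = odds_ratio_ci(case[("no", "a")], case[("yes", "a")],
                             case[("no", "A")], case[("yes", "A")])

for label, (est, lo, hi) in [("ORg", or_g), ("ORe", or_e),
                             ("ORge", or_ge), ("case-only", or_case_only)]:
    print(f"{label}: {est:.3f} (95% CI {lo:.2f}, {hi:.2f})")
```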
Since the TDT for one environment has only two independent counts (the number of parents who transmit A but not a, and the number of parents who transmit a but not A), the data for two environmental conditions form a 2-by-2 table:
| | transmit A, not transmit a | transmit a, not transmit A |
| non-smoking | 15 | 5 |
| smoking | 10 | 8 |
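The notes do not say which test to apply to this table. One natural choice, assumed here purely for illustration, is a TDT within each environment plus a Pearson chi-square test (no continuity correction) of whether the transmission ratio differs between the two environments. The helper names below are hypothetical.

```python
import math

def chi2_tail_1df(x):
    """Upper-tail probability of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

def tdt(n_transmit_A, n_transmit_a):
    """TDT / McNemar statistic and p-value from the two transmission counts."""
    stat = (n_transmit_A - n_transmit_a) ** 2 / (n_transmit_A + n_transmit_a)
    return stat, chi2_tail_1df(stat)

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, chi2_tail_1df(stat)

# (transmit A not a, transmit a not A) for each environment, from the table above.
counts = {"non-smoking": (15, 5), "smoking": (10, 8)}

# TDT within each environment.
per_env = {env: tdt(*pair) for env, pair in counts.items()}

# Assumed heterogeneity test: does the transmission ratio differ by environment?
het_stat, het_p = chi2_2x2(*counts["non-smoking"], *counts["smoking"])

for env, (stat, p) in per_env.items():
    print(f"{env}: TDT = {stat:.3f}, p = {p:.4f}")
print(f"heterogeneity chi-square = {het_stat:.3f}, p = {het_p:.3f}")
```

With counts this small, an exact test might be preferable to the chi-square approximation; the sketch only shows how the table would be used.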
Further reading: Q Yang, MJ Khoury (1997), "Evolving methods in genetic epidemiology. III. Gene-environment interaction in epidemiologic research", Epidemiologic Reviews, 19:33-43.
| Other Topics Not Covered |