LECTURE 6: FAMILY-BASED ASSOCIATION ANALYSIS AND OTHER TOPICS

Differences Between Linkage Analysis and Association Analysis

linkage analysis | association (linkage disequilibrium) analysis
pedigree data | population data
current ("in action") recombination events | ancestral recombination events
wider span (10 cM?) | narrower span (1 cM?)
only the location matters; allelic heterogeneity does not cause any problem | the allele itself matters; allelic heterogeneity causes problems

... And Their Co-Appearance in Family-Based Association Analysis

Family-based association analysis is the point where the two converge. It can be viewed from the perspective of linkage analysis as well as from that of linkage disequilibrium analysis.

From linkage's perspective: it enhances the linkage signal by knowing which phase is more likely.

[COMMENT: there is a well-known result that a pedigree with only one (affected) offspring cannot provide any linkage signal. The reason is the unknown phase (is the disease allele on the same haplotype as marker allele "a" or marker allele "A"?): both phases are assigned a 50% probability in the "LOD" calculation.

On the other hand, the best-known design for family-based association contains only one affected offspring. Such a pedigree can potentially contribute to the linkage signal because the two phases are assigned different probabilities. ]
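
The phase argument in the comment above can be illustrated with a minimal sketch. It assumes a deliberately simplified model that is not part of the original notes: a single informative meiosis, where the offspring is a non-recombinant under one phase and a recombinant under the other. The function name `lod_one_meiosis` and the 80/20 phase weights are hypothetical, chosen only for illustration.

```python
import math

def lod_one_meiosis(theta, w):
    """LOD score for a single informative meiosis (simplified model).

    theta: recombination fraction between marker and disease locus.
    w:     probability of the phase under which the offspring is a
           non-recombinant (0.5 when the phase is completely unknown).
    """
    likelihood = w * (1.0 - theta) + (1.0 - w) * theta
    likelihood_null = 0.5  # at theta = 0.5 the likelihood equals 0.5 for any w
    return math.log10(likelihood / likelihood_null)

thetas = (0.0, 0.1, 0.3)
lod_unknown_phase = [lod_one_meiosis(t, 0.5) for t in thetas]   # phases 50/50
lod_weighted_phase = [lod_one_meiosis(t, 0.8) for t in thetas]  # hypothetical 80/20

print("phase 50/50:", [round(x, 3) for x in lod_unknown_phase])
print("phase 80/20:", [round(x, 3) for x in lod_weighted_phase])
```

With equal phase weights the likelihood does not depend on theta, so the LOD is zero everywhere. With unequal weights (as linkage disequilibrium would provide) the LOD is nonzero and largest at theta = 0.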

From association's perspective: it deals with the problem of population heterogeneity/stratification.

If a family-based association result is significant, both ancestral and current recombination contribute to the signal.
Without transmission bias in the current recombination (i.e., no linkage), there would be no significant result in family-based association analysis.
Similarly, without linkage disequilibrium between the marker allele and the disease allele (hitchhiking), there would be no significant result in family-based association analysis.
Both linkage and linkage disequilibrium ("AND") are required!

So if a family-based association analysis of trio data and a population-based association analysis lead to different conclusions, this does not necessarily mean that something is wrong: the family-based test requires linkage as well as linkage disequilibrium.

Various Forms of Family-Based Association Analysis

See: JD Terwilliger, J Ott (1992), "A haplotype-based 'haplotype relative risk' approach to detecting allelic associations", Human Heredity, 42:337-346.

Example: suppose we have 50 parents in total (100 alleles):
10 parents whose genotype is homozygous aa
20 parents whose genotype is homozygous AA
15 heterozygous (Aa) parents who transmit A and leave out a
5 heterozygous (Aa) parents who transmit a and leave out A

If we ignore the pairing, there are 35 "A" alleles and 15 "a" alleles in the transmitted ("case") group, and 25 "A" and 25 "a" alleles in the untransmitted ("control") group. Using the chi-square test, p-value = 0.0412 and odds ratio = 2.333 (95% CI (1.03, 5.30)).
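
The unpaired calculation can be reproduced with a short sketch. It assumes a Pearson chi-square without continuity correction and a Wald confidence interval on the log odds ratio, since the notes do not say which variants were used. The helper names are illustrative.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def two_by_two(a, b, c, d):
    """Pearson chi-square (no continuity correction), odds ratio and
    Wald 95% CI for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se)
    return chi2, chi2_sf_1df(chi2), odds_ratio, (lower, upper)

# Parent counts from the example
n_aa, n_AA, n_A_not_a, n_a_not_A = 10, 20, 15, 5

# Allele counts when the pairing is ignored
trans_A = n_AA + n_A_not_a      # transmitted A
trans_a = n_aa + n_a_not_A      # transmitted a
untrans_A = n_AA + n_a_not_A    # untransmitted A
untrans_a = n_aa + n_A_not_a    # untransmitted a

chi2, p_value, odds_ratio, ci = two_by_two(trans_A, trans_a, untrans_A, untrans_a)
print(f"chi2 = {chi2:.4f}, p = {p_value:.4f}, OR = {odds_ratio:.3f}, "
      f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```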

Matched Case-Control Analysis: McNemar Tests

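If we respect the pairing (each parent's transmitted allele is matched with that same parent's untransmitted allele), the homozygous parents on the diagonal are uninformative and are thrown away. We only look at the two off-diagonal numbers, 5 and 15:

                        | untransmitted allele is a | untransmitted allele is A
transmitted allele is a | 10 | 5
transmitted allele is A | 15 | 20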

The McNemar test calculates TDT = (5-15)^2/(5+15) = 100/20 = 5. Under the null hypothesis, TDT follows the chi-square distribution with one degree of freedom. The p-value for TDT = 5 is 0.025347 (1 - pchisq(5, df=1) in R).
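
As a cross-check outside R, the same statistic and p-value can be sketched with the Python standard library. The closed form erfc(sqrt(x/2)) is the chi-square upper tail for one degree of freedom. The function name `tdt` is illustrative.

```python
import math

def tdt(n_a_not_A, n_A_not_a):
    """McNemar-type statistic from the two discordant counts:
    heterozygous parents transmitting a (not A) and transmitting A (not a)."""
    stat = (n_a_not_A - n_A_not_a) ** 2 / (n_a_not_A + n_A_not_a)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square upper tail, 1 df
    return stat, p_value

stat, p_value = tdt(5, 15)
print(f"TDT = {stat:.3f}, p = {p_value:.6f}")
```

Only the heterozygous parents enter the calculation. The 30 homozygous parents contribute nothing to the statistic.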

The name TDT (transmission disequilibrium test, something like "linkage and linkage disequilibrium test") is a re-invention of the McNemar test. The two are exactly identical.

In general, pairing/matching reduces the sample size and thus reduces the power of the test.

Three alleles

                        | untransmitted allele is a | untransmitted allele is b | untransmitted allele is A
transmitted allele is a | -  | 5  | 10
transmitted allele is b | 25 | -  | 5
transmitted allele is A | 15 | 30 | -
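In this example, the multi-allele TDT = (5-25)^2/(5+25) + (10-15)^2/(10+15) + (5-30)^2/(5+30) = 400/30 + 25/25 + 625/35 = 32.19047. The statistic sums one independent McNemar-type term for each of the three allele pairs, so it should follow the chi-square distribution with 3 degrees of freedom (m(m-1)/2 for m alleles). The corresponding p-value is about 4.8 x 10^-7.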
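
The pairwise sum can be written as a short loop over allele pairs. This is a sketch, with the table stored as a dictionary keyed by (transmitted, untransmitted) allele. Because the p-value depends on the degrees of freedom of the reference distribution, the tail probability is evaluated for both 2 and 3 degrees of freedom so the two can be compared. The helper `chi2_sf` covers only the closed-form cases needed here.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1, 2, 3 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2.0))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("only df = 1, 2, 3 are implemented in this sketch")

# counts[(transmitted, untransmitted)] from heterozygous parents
counts = {
    ("a", "b"): 5,  ("a", "A"): 10,
    ("b", "a"): 25, ("b", "A"): 5,
    ("A", "a"): 15, ("A", "b"): 30,
}
alleles = ["a", "b", "A"]

stat = 0.0
for i, x in enumerate(alleles):
    for y in alleles[i + 1:]:
        n_xy, n_yx = counts[(x, y)], counts[(y, x)]
        stat += (n_xy - n_yx) ** 2 / (n_xy + n_yx)  # one McNemar term per pair

p_by_df = {df: chi2_sf(stat, df) for df in (2, 3)}
print(f"statistic = {stat:.5f}")
for df, p in p_by_df.items():
    print(f"df = {df}: p = {p:.3e}")
```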

Pedigree Data More Complicated Than Parents-Affected-Offspring Trios/Pairs

The simplicity and ease of use of the TDT in parent(s)-affected-offspring trios/pairs is lost when other pedigree members are available. There are two schools of thought on how to deal with this situation.

1. Correcting the simplest TDT

See: TG Schulze, FJ McMahon (2002), "Genetic association mapping at the crossroads: which test and why? overview and practical guidelines", American Journal of Medical Genetics, 114:1-11.

name | publication | type of data
TDTg | Bickeböller & Clerget-Darpoux (1995) | trio
GTDT (in SAS) | Rice et al (1995) | trio
ETDT | Sham & Curtis (1995) | trio
TDTLIKE | Terwilliger (1995) | trio, affected-sib-pair, general-ped
AFBAC | Thomson (1995) | trio, affected-sib-pair
Tmhet | Spielman & Ewens (1996) | trio, affected-sib-pair, discordant-sib-pair, general-ped
MC-Tm | Kaplan et al (1997) | trio
(in SAGE) | Cleves et al (1997) | trio, affected-sib-pair
(in SIBASSOC) | Curtis (1997) | discordant-sib-pair
TSU, TSP | Martin et al (1997) | trio, affected-sib-pair
S-TDT | Spielman & Ewens (1998) | discordant-sib-pair, general-ped
TDT, THT, THS | Risch & Teng (1998) | trio, affected-sib-pair, discordant-sib-pair (DNA pooling)
DAT | Boehnke & Langefeld (1998) | discordant-sib-pair
SDT | Horvath & Laird (1998) | discordant-sib-pair
LRT/TAT | Weinberg (1998) | trio
TDTG | Xiong et al (1998) | trio
RC-TDT | Knapp (1999) | trio
1-TDT | Sun et al (1999) | trio, affected-sib-pair, discordant-sib-pair
TDS | Teng & Risch (1999) | trio, affected-sib-pair, discordant-sib-pair
(logistic regression extension to TDT) | Waldman et al (1999) | trio, affected-sib-pair
EM-LRT | Weinberg (1999) | trio
PDT | Martin et al (2000) | general-ped
FBAT | Rabinowitz & Laird (2000) | general-ped
S-TDT, C-TDT | Ho & Bailey-Wilson (2000) | trio, discordant-sib-pair, general-ped
XS-TDT | Horvath et al (2000) | trio, affected-sib-pair, discordant-sib-pair
HWSE | Lunetta et al (2000) | trio, affected-sib-pair
excess sharing/TSP | Wicks (2000) | trio, affected-sib-pair

2. Using linkage program but adding linkage disequilibrium as an extra parameter

If the simplicity of the TDT is lost when other pedigree members are included, why do we still use the TDT? Why not use a linkage analysis program (since the TDT is, after all, designed to detect BOTH linkage and linkage disequilibrium)?

See: HHH Goring, JD Terwilliger (2000), "Linkage analysis in the presence of errors IV: joint pseudomarker analysis of linkage and/or linkage disequilibrium on a mixture of pedigree and singletons when the mode of inheritance cannot be accurately specified", American Journal of Human Genetics, 66:1310-1327.

Interactions

Detect Gene-Environment Interaction With 2-by-2-by-2 Tables

From Hwang et al (1995), "Association study of transforming growth factor alpha (TGF alpha) TaqI polymorphism and oral clefts: indication of gene-environment interaction in a population-based sample of infants with birth defects", American Journal of Epidemiology, 141:629-636.

        | non-smoking: a | non-smoking: A (risk allele) | smoking: a | smoking: A (risk allele)
case    | 36  | 7  | 13 | 13
control | 167 | 34 | 69 | 11

Each odds ratio below is relative to non-smokers who do not carry the risk allele.
Odds ratio for carrying the risk allele (not exposed to the environmental risk factor, i.e., non-smoking): ORg = (7*167)/(36*34) = 0.955.
Odds ratio for exposure to the environmental risk factor (not carrying the risk allele): ORe = (13*167)/(36*69) = 0.874.
Odds ratio for both the risk allele and the environmental risk factor: ORge = (13*167)/(36*11) = 5.48, with 95% CI (2.27, 13.21).
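
The three odds ratios share one reference group (non-smoking, allele a), which a short sketch makes explicit. The confidence interval is assumed to be a Wald interval on the log odds ratio, because the notes do not state the method. The function and variable names are illustrative.

```python
import math

def odds_ratio_ci(case_x, control_x, case_ref, control_ref, z=1.96):
    """Odds ratio of group x versus the reference group, with a Wald CI."""
    odds_ratio = (case_x * control_ref) / (case_ref * control_x)
    se = math.sqrt(1 / case_x + 1 / control_x + 1 / case_ref + 1 / control_ref)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)

# (cases, controls) for each smoking / allele combination, from the table
table = {
    ("non-smoking", "a"): (36, 167),  # reference group
    ("non-smoking", "A"): (7, 34),
    ("smoking", "a"): (13, 69),
    ("smoking", "A"): (13, 11),
}
ref = table[("non-smoking", "a")]

or_g = odds_ratio_ci(*table[("non-smoking", "A")], *ref)   # gene only
or_e = odds_ratio_ci(*table[("smoking", "a")], *ref)       # environment only
or_ge = odds_ratio_ci(*table[("smoking", "A")], *ref)      # gene and environment

for name, (est, lower, upper) in (("ORg", or_g), ("ORe", or_e), ("ORge", or_ge)):
    print(f"{name} = {est:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Neither factor alone raises the odds (both estimates are below 1), while the joint odds ratio is well above 1. This is the pattern that suggests an interaction.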

Detect Gene-Environment Interaction With Case-Samples-Only Design

Using the cases only, the 2-by-2-by-2 table is reduced to a 2-by-2 table:

                | non-smoking | smoking
non-risk allele | 36 | 13
risk allele     | 7  | 13

Odds ratio = 5.14, 95% CI (1.68, 15.71).
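
A sketch of the case-only calculation follows, again assuming a Wald interval. As an arithmetic check that is not part of the original notes, it also uses the control counts from the 2-by-2-by-2 table: the ratio of the case-only odds ratio to the corresponding control-only odds ratio equals ORge/(ORg*ORe). The case-only odds ratio therefore estimates the interaction only if genotype and smoking are independent in controls, which is the standard assumption of the design. A real case-only study would not have the control counts.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (a*d)/(b*c) of the table [[a, b], [c, d]] with a Wald CI."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)

# Rows: non-risk allele, risk allele. Columns: non-smoking, smoking.
case_or, case_lower, case_upper = odds_ratio_ci(36, 13, 7, 13)
control_or, _, _ = odds_ratio_ci(167, 69, 34, 11)   # used only for the check

# Stratified odds ratios relative to non-smoking non-carriers
or_g = (7 * 167) / (36 * 34)
or_e = (13 * 167) / (36 * 69)
or_ge = (13 * 167) / (36 * 11)
interaction_ratio = or_ge / (or_g * or_e)

print(f"case-only OR = {case_or:.2f}, 95% CI = ({case_lower:.2f}, {case_upper:.2f})")
print(f"control-only OR = {control_or:.3f}")
print(f"case OR / control OR = {case_or / control_or:.3f}")
print(f"ORge / (ORg * ORe)   = {interaction_ratio:.3f}")
```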

Detect Gene-Environment Interaction in TDT

For a single environment, the TDT has only two independent counts: the number of parents who transmit A but not a, and the number of parents who transmit a but not A. For two environmental conditions, the counts therefore form a 2-by-2 table:

            | transmit A, not a | transmit a, not A
non-smoking | 15 | 5
smoking     | 10 | 8

Odds ratio = 2.4, 95% CI = (0.61, 9.49), so the interaction is not significant.
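
The table can be examined in two ways, sketched here under the same assumptions as before (chi-square tail with one degree of freedom for each TDT, Wald interval for the odds ratio): a separate TDT within each environment, and the odds ratio comparing transmission between environments. The names are illustrative.

```python
import math

def tdt(n_A, n_a):
    """TDT statistic and 1-df chi-square p-value from the two transmission counts."""
    stat = (n_A - n_a) ** 2 / (n_A + n_a)
    return stat, math.erfc(math.sqrt(stat / 2.0))

# (transmit A not a, transmit a not A) among heterozygous parents
transmissions = {"non-smoking": (15, 5), "smoking": (10, 8)}

per_environment = {env: tdt(*counts) for env, counts in transmissions.items()}
for env, (stat, p) in per_environment.items():
    print(f"{env}: TDT = {stat:.3f}, p = {p:.4f}")

# Odds ratio of the 2-by-2 table: does transmission differ between environments?
(a, b), (c, d) = transmissions["non-smoking"], transmissions["smoking"]
odds_ratio = (a * d) / (b * c)
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci = (math.exp(math.log(odds_ratio) - 1.96 * se),
      math.exp(math.log(odds_ratio) + 1.96 * se))
print(f"OR = {odds_ratio:.1f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The confidence interval for the odds ratio includes 1, which is why the difference in transmission between the two environments is not significant.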

Further reading: Q Yang, MJ Khoury (1997), "Evolving methods in genetic epidemiology. III. Gene-environment interaction in epidemiologic research", Epidemiologic Reviews, 19:33-43.

Other Topics Not Covered