# What does effect size mean in GWAS?

Yao Yao on November 6, 2019

## It’s the Effect Size, Stupid

"It’s the Effect Size, Stupid" is the classic article on effect size; a Google search for the term will usually turn it up.

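$$\operatorname{ES} = \frac{\operatorname{mean}(X_e) - \operatorname{mean}(X_c)}{\operatorname{SD}(X)}$$

• $\operatorname{ES}$ is the effect size
• $\operatorname{SD}$ is the standard deviation
• $X_e$ and $X_c$ are the experimental and control groups
• As I see it, this simply uses the standard deviation as the unit for quantifying the difference between the two groups (compare the Gaussian distribution and the Z-score)

• Which of your two groups counts as experimental and which as control is your own call, so consider taking the absolute value
• $\operatorname{SD}(X)$ has to be estimated; see the article for details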
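As a rough illustration of the standardized mean difference, here is a minimal Python sketch. It assumes the standard deviation is estimated as the pooled sample SD of the two groups, which is only one of several possible estimators; the function name and the sample numbers are made up for the example.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(experimental, control):
    """Return (mean_e - mean_c) / pooled SD (assumed estimator of SD)."""
    n_e, n_c = len(experimental), len(control)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = (
        (n_e - 1) * variance(experimental) + (n_c - 1) * variance(control)
    ) / (n_e + n_c - 2)
    return (mean(experimental) - mean(control)) / sqrt(pooled_var)


# Hypothetical measurements for the two groups.
experimental = [5.0, 6.0, 7.0, 8.0, 9.0]
control = [3.0, 4.0, 5.0, 6.0, 7.0]

es = standardized_mean_difference(experimental, control)
es_swapped = standardized_mean_difference(control, experimental)
print(es, es_swapped, abs(es_swapped))
```

Swapping the two groups only flips the sign, which is why taking the absolute value is a reasonable convention when the experimental/control labels are arbitrary.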

## It’s NOT the Only Effect Size

• Regression coefficient (e.g. $\beta$ in $X_e = \beta X_c + \epsilon$)
• Pearson correlation coefficient (i.e. Pearson’s $r$)
• Odds ratio (see Explaining Odds Ratios)
  • The exposure can be the genotype
  • The outcome can be the phenotype
• Cohen’s $d$ effect size
• Cohen’s $f^2$ effect size
• Glass’s $\delta$ effect size
• Hedges’ $g$ effect size
• Cramer’s $V$ effect size (a.k.a. Cramer’s $\varphi$)
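To make the odds ratio entry concrete, the sketch below computes it from a 2×2 table with the exposure read as carrying an allele and the outcome as having the disease. The function name and all counts are hypothetical.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds of the outcome among the exposed divided by the odds among the unexposed."""
    odds_exposed = exposed_cases / exposed_controls
    odds_unexposed = unexposed_cases / unexposed_controls
    return odds_exposed / odds_unexposed


# Hypothetical counts: carriers vs. non-carriers of an allele,
# split by diseased (cases) and healthy (controls).
or_value = odds_ratio(
    exposed_cases=30,
    exposed_controls=70,
    unexposed_cases=10,
    unexposed_controls=90,
)
print(or_value)
```

An odds ratio of 1 means the exposure makes no difference to the odds of the outcome; values above 1 mean the outcome is more likely among the exposed.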

## WTF is Effect Size?

Effect size is a statistical concept that measures the strength of the relationship between two variables on a numeric scale.

In statistics, an effect size is a quantitative measure of the magnitude of a phenomenon.

• If your phenomenon is the difference between two groups, then the standardized mean difference is the magnitude of that difference
• If your phenomenon is the correlation between two groups, then the Pearson correlation coefficient is the magnitude of that correlation
• And so on: as long as your two sets of data constitute a phenomenon, the effect size measures the magnitude of that phenomenon
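For the correlation case, a minimal sketch of Pearson's $r$ written out by hand might look like this. The data are invented, and the helper name `pearson_r` is just a label for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations (proportional to the covariance).
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (dev_x * dev_y)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(r)
```

Here the magnitude of the phenomenon is read off as $|r|$, which always lies between 0 and 1.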

## Effect Size in GWAS

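In GWAS you actually observe two things: genotypes (or SNP alleles) and phenotypes. Split each into experimental and control and you end up with 4 groups of data (assuming the SNP is bi-allelic and the phenotype takes only two values). See CMU: Genomes and Complex Diseases: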

• The standardized mean difference clearly does not fit
• The Pearson correlation coefficient does not seem to fit either
• The odds ratio looks like it would work
• ……

• Here the phenotype is continuous, but the discrete (categorical) case is similar
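For a continuous phenotype, one common way to get a per-SNP effect size is the regression coefficient mentioned earlier. The sketch below assumes an additive model, with the genotype coded as the number of copies of one allele (0, 1, or 2), and fits a simple least-squares slope. The coding, the function name, and the data are all assumptions made for illustration.

```python
def allele_effect(dosages, phenotypes):
    """Least-squares slope of phenotype on allele dosage (assumed additive model)."""
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_p = sum(phenotypes) / n
    # slope = sum of cross-deviations / sum of squared genotype deviations
    s_gp = sum((g - mean_g) * (p - mean_p) for g, p in zip(dosages, phenotypes))
    s_gg = sum((g - mean_g) ** 2 for g in dosages)
    return s_gp / s_gg


# Hypothetical individuals: allele count at one SNP and a measured trait.
dosages = [0, 0, 1, 1, 2, 2]
phenotypes = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]

beta = allele_effect(dosages, phenotypes)
print(beta)
```

Under this coding, the slope is the expected change in the phenotype per additional copy of the allele.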

Penetrance is the probability of developing a particular disease given a particular genotype, i.e. $P(\text{Disease} \vert \text{Allele})$. Penetrance is sometimes also counted as a kind of effect size, which makes sense given the definition.
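As a small illustration, penetrance can be estimated from counts as the fraction of affected individuals within each genotype group. The genotype labels and counts below are hypothetical.

```python
def penetrance(affected, total):
    """Estimate P(Disease | Genotype) as the affected fraction within one genotype group."""
    return affected / total


# Hypothetical (affected, total) counts per genotype.
counts = {
    "AA": (5, 100),
    "Aa": (20, 100),
    "aa": (60, 100),
}

penetrances = {
    genotype: penetrance(affected, total)
    for genotype, (affected, total) in counts.items()
}
print(penetrances)
```

Comparing these conditional probabilities across genotypes gives a sense of how strongly the genotype shifts disease risk, which is the sense in which penetrance behaves like an effect size.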