Genome-wide association study: Difference between revisions

Revision as of 02:37, 25 January 2008 by 163.1.210.36 (no edit summary). Latest revision as of 03:10, 9 January 2025 by Zenomonoz (undid revision 1268297510 by 80.200.232.89).
{{short description|Study of genetic variants in different individuals}}
{{Cleanup|date=July 2007}}
{{cs1 config|name-list-style=vanc}}
A '''genome-wide association study''' (GWAS) is an examination of genetic variation across the human genome, designed to identify genetic associations with observable traits, such as blood pressure or weight, or why some people get a disease or condition.
{{Use dmy dates|date=October 2020}}
In genomics, a '''genome-wide association study''' ('''GWA study''', or '''GWAS''') is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases, but can equally be applied to any other genetic variants and any other organisms.


Figure (Manhattan plot of a GWAS): a Manhattan plot depicting several strongly associated risk loci. Each dot represents a tested variant, with the X-axis showing genomic location and the Y-axis showing the strength of association. This example is taken from a GWA study investigating kidney stones, so the peaks indicate genetic variants that are found more often in individuals with kidney stones.
The completion of the ] in 2003 made it possible to find the genetic contributions to common diseases and analyze whole-genome samples for genetic variations that contribute to their onset.


When applied to human data, GWA studies compare the DNA of participants having varying ]s for a particular trait or disease. These participants may be people with a disease (cases) and similar people without the disease (controls), or they may be people with different phenotypes for a particular trait, for example blood pressure. This approach is known as phenotype-first, in which the participants are classified first by their clinical manifestation(s), as opposed to ]. Each person gives a sample of DNA, from which millions of ] are read using ]s. If there is significant statistical evidence that one type of the variant (one ]) is more frequent in people with the disease, the variant is said to be ''associated'' with the disease. The associated SNPs are then considered to mark a region of the human genome that may influence the risk of disease.
These studies require two groups of participants: people with the disease and similar people without. After samples are obtained from each participant, their complete set of DNA is scanned into computers. The computers survey each participant's genome for selected markers of genetic variation, which are called single nucleotide polymorphisms, or SNPs.


GWA studies investigate the entire genome, in contrast to methods that specifically test a small number of pre-specified genetic regions. Hence, GWAS is a ''non-candidate-driven'' approach, in contrast to '']''. GWA studies identify SNPs and other variants in DNA associated with a disease, but they cannot on their own specify which genes are causal.<ref name="pmid20647212">{{cite journal | vauthors = Manolio TA | title = Genomewide association studies and assessment of the risk of disease | journal = The New England Journal of Medicine | volume = 363 | issue = 2 | pages = 166–76 | date = July 2010 | pmid = 20647212 | doi = 10.1056/NEJMra0905980 | doi-access = free }}</ref><ref name="pmid18349094">{{cite journal | vauthors = Pearson TA, Manolio TA | title = How to interpret a genome-wide association study | journal = JAMA | volume = 299 | issue = 11 | pages = 1335–44 | date = March 2008 | pmid = 18349094 | doi = 10.1001/jama.299.11.1335 }}</ref><ref>{{cite web |url=http://www.genome.gov/20019523 |title=Genome-Wide Association Studies |publisher = ] }}</ref>
If genetic variations are more frequent in people with the disease, the variations are said to be "associated" with the disease. The associated genetic variations are then considered pointers to the region of the human genome where the disease-causing problem resides.


The first successful GWAS published in 2002 studied myocardial infarction.<ref>{{cite journal | vauthors = Ozaki K, Ohnishi Y, Iida A, Sekine A, Yamada R, Tsunoda T, Sato H, Sato H, Hori M, Nakamura Y, Tanaka T | display-authors = 6 | title = Functional SNPs in the lymphotoxin-alpha gene that are associated with susceptibility to myocardial infarction | journal = Nature Genetics | volume = 32 | issue = 4 | pages = 650–4 | date = December 2002 | pmid = 12426569 | doi = 10.1038/ng1047 | url = https://www.nature.com/articles/ng1047z | s2cid = 21414260 }}</ref> This study design was then implemented in the landmark GWA 2005 study investigating patients with ], and found two SNPs with significantly altered ] compared to healthy controls.<ref name="pmid15761122">{{cite journal | vauthors = Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, Henning AK, SanGiovanni JP, Mane SM, Mayne ST, Bracken MB, Ferris FL, Ott J, Barnstable C, Hoh J | display-authors = 6 | title = Complement factor H polymorphism in age-related macular degeneration | journal = Science | volume = 308 | issue = 5720 | pages = 385–9 | date = April 2005 | pmid = 15761122 | pmc = 1512523 | doi = 10.1126/science.1109557 | bibcode = 2005Sci...308..385K }}</ref> {{As of|2017}}, over 3,000 human GWA studies have examined over 1,800 diseases and traits, and thousands of SNP associations have been found.<ref>{{cite web |url=http://www.ebi.ac.uk/gwas/downloads |title=GWAS Catalog: The NHGRI-EBI Catalog of published genome-wide association studies |work=European Molecular Biology Laboratory |access-date=2017-04-18 }}</ref> Except in the case of rare ]s, these associations are very weak, but while each individual association may not explain much of the risk, they provide insight into critical genes and pathways and can be important when considered ].
==Why are these a good idea?==


==Background==
Humans differ in genetic makeup by only 0.1%, but that small part of the genome contains the key differences that can determine a person’s susceptibility to disease. GWA studies allow researchers to identify genetic factors relevant to research and clinical care in many areas, including asthma, cancer, diabetes, heart disease and mental illness.
Any two ]s differ in millions of different ways. There are small variations in the individual nucleotides of the genomes (]) as well as many larger variations, such as ], ] and ]s. Any of these may cause alterations in an individual's traits, or ], which can be anything from disease risk to physical properties such as height.<ref name="Strachan">{{cite book | vauthors = Strachan T, Read A | title = Human Molecular Genetics | url = https://archive.org/details/humanmolecularge00stra_254 | url-access = limited | edition = 4th | publisher = Garland Science | pages = –495 | isbn = 978-0-8153-4149-9| year = 2011 }}</ref> Around the year 2000, prior to the introduction of GWA studies, the primary method of investigation was through inheritance studies of ] in families. This approach had proven highly useful towards ].<ref>{{cite web |url=http://www.omim.org |title=Online Mendelian Inheritance in Man |access-date=2011-12-06 |url-status=dead |archive-url=https://web.archive.org/web/20111205231931/http://www.omim.org/ |archive-date=5 December 2011}}</ref><ref name="Strachan" /><ref name="pmid11565063">{{cite journal | vauthors = Altmüller J, Palmer LJ, Fischer G, Scherb H, Wjst M | title = Genomewide scans of complex human diseases: true linkage is hard to find | journal = American Journal of Human Genetics | volume = 69 | issue = 5 | pages = 936–50 | date = November 2001 | pmid = 11565063 | pmc = 1274370 | doi = 10.1086/324069 }}</ref> However, for common and complex diseases the results of genetic linkage studies proved hard to reproduce.<ref name="Strachan" /><ref name="pmid11565063"/> A suggested alternative to linkage studies was the ] study. This study type asks if the ] of a ] is found more often than expected in individuals with the phenotype of interest (e.g. with the disease being studied). 
Early calculations on statistical power indicated that this approach could be better than linkage studies at detecting weak genetic effects.<ref name="pmid8801636">{{cite journal | vauthors = Risch N, Merikangas K | title = The future of genetic studies of complex human diseases | journal = Science | volume = 273 | issue = 5281 | pages = 1516–7 | date = September 1996 | pmid = 8801636 | doi = 10.1126/science.273.5281.1516 | bibcode = 1996Sci...273.1516R | s2cid = 5228523 }}</ref>


In addition to the conceptual framework several additional factors enabled the GWA studies. One was the advent of ], which are repositories of human genetic material that greatly reduced the cost and difficulty of collecting sufficient numbers of biological specimens for study.<ref name="pmid17550341">{{cite journal | vauthors = Greely HT | title = The uneasy ethical and legal underpinnings of large-scale genomic biobanks | journal = Annual Review of Genomics and Human Genetics | volume = 8 | pages = 343–64 | date = 2007 | pmid = 17550341 | doi = 10.1146/annurev.genom.7.080505.115721 | doi-access = free }}</ref> Another was the ], which, from 2003 identified a majority of the common SNPs interrogated in a GWA study.<ref name="pmid14685227">{{cite journal | vauthors = ((The International HapMap Project)), Gibbs RA, Belmont JW, Hardenbol P, Willis TD, Yu F, Yang H, Ch'Ang LY, Huang W | title = The International HapMap Project | journal = Nature | volume = 426 | issue = 6968 | pages = 789–96 | date = December 2003 | pmid = 14685227 | doi = 10.1038/nature02168 | url = https://deepblue.lib.umich.edu/bitstream/2027.42/62838/1/nature02168.pdf | hdl = 2027.42/62838 | bibcode = 2003Natur.426..789G | s2cid = 4387110 | hdl-access = free }}</ref> The ] identified by HapMap project also allowed the focus on the subset of SNPs that would describe most of the variation. Also the development of the methods to genotype all these SNPs using ] was an important prerequisite.<ref name="pmid7569999">{{cite journal | vauthors = Schena M, Shalon D, Davis RW, Brown PO | title = Quantitative monitoring of gene expression patterns with a complementary DNA microarray | journal = Science | volume = 270 | issue = 5235 | pages = 467–70 | date = October 1995 | pmid = 7569999 | doi = 10.1126/science.270.5235.467 | bibcode = 1995Sci...270..467S | s2cid = 6720459 }}</ref>
==What are the challenges?==


==Methods==
As people have migrated and intermarried over generations, it has become more difficult to separate biological effects from population history. For example, if many people with tuberculosis moved to Colorado, a study that did not properly correct for population stratification might wrongly conclude that people in Colorado are biologically inclined to tuberculosis.
Figure: The allele count of each measured SNP is evaluated, in this case with a chi-squared test, to identify variants associated with the trait in question. The numbers in this example are taken from a 2007 study of coronary artery disease (CAD) that showed that individuals with the G-allele of SNP1 (''rs1333049'') were overrepresented amongst CAD patients.<ref name="WTCCC" />


==What is an example of a successful GWA Study?==


The most common approach of GWA studies is the case-control setup, which compares two large groups of individuals, one healthy control group and one case group affected by a disease. All individuals in each group are typically genotyped at common known SNPs. The exact number of SNPs depends on the genotyping technology, but is typically one million or more.<ref name="pmid23300413">{{cite journal | vauthors = Bush WS, Moore JH | title = Chapter 11: Genome-wide association studies | journal = PLOS Computational Biology | volume = 8 | issue = 12 | pages = e1002822 | date = 2012 | pmid = 23300413 | pmc = 3531285 | doi = 10.1371/journal.pcbi.1002822 | veditors = Lewitter F, Kann M | bibcode = 2012PLSCB...8E2822B | doi-access = free }}</ref> For each of these SNPs it is then investigated whether the allele frequency is significantly altered between the case and the control group.<ref name="pmid21293453">{{cite journal | vauthors = Clarke GM, Anderson CA, Pettersson FH, Cardon LR, Morris AP, Zondervan KT | title = Basic statistical analysis in genetic case-control studies | journal = Nature Protocols | volume = 6 | issue = 2 | pages = 121–33 | date = February 2011 | pmid = 21293453 | pmc = 3154648 | doi = 10.1038/nprot.2010.182 }}</ref> In such setups, the fundamental unit for reporting effect sizes is the odds ratio. The odds ratio is the ratio of two odds, which in the context of GWA studies are the odds of being a case for individuals having a specific allele and the odds of being a case for individuals who do not have that same allele.
In 2005 it was learned through GWA Studies that age-related macular degeneration is associated with variation in the gene for complement factor H, which produces a protein that regulates inflammation.


'''Example''': suppose that there are two alleles, T and C. The number of individuals in the case group having allele T is represented by 'A' and the number of individuals in the control group having allele T is represented by 'B'. Similarly, the number of individuals in the case group having allele C is represented by 'X' and the number of individuals in the control group having allele C is represented by 'Y'. In this case the odds ratio for allele T is A:B (meaning 'A to B', in standard odds terminology) divided by X:Y, which in mathematical notation is simply (A/B)/(X/Y).
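The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name and the allele counts are hypothetical, and the parameter names simply mirror the letters A, B, X and Y used above.

```python
def allele_odds_ratio(a, b, x, y):
    """Odds ratio for allele T, following the notation in the example.

    a: case alleles that are T      b: control alleles that are T
    x: case alleles that are C      y: control alleles that are C
    """
    if min(b, x, y) <= 0 or a < 0:
        raise ValueError("counts must be positive for the ratio to be defined")
    # (A/B) / (X/Y) is algebraically the same as (A*Y) / (B*X).
    return (a * y) / (b * x)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
odds_ratio = allele_odds_ratio(a=600, b=400, x=400, y=600)
print(odds_ratio)
```

An odds ratio above 1 means allele T is relatively more common among cases. Swapping the roles of T and C gives the reciprocal value.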
In 2007 the Wellcome Trust Case-Control Consortium (WTCCC) carried out genome-wide association studies for coronary heart disease, type 1 diabetes, type 2 diabetes, rheumatoid arthritis, Crohn's disease, bipolar disorder and hypertension. This study was successful in uncovering many new genes underlying these diseases.


When the allele frequency in the case group is higher than in the control group, the odds ratio is higher than 1, and vice versa for lower allele frequency. Additionally, a p-value for the significance of the odds ratio is typically calculated using a simple chi-squared test. Finding odds ratios that are significantly different from 1 is the objective of the GWA study because this shows that a SNP is associated with disease.<ref name="pmid21293453" /> Because so many variants are tested, it is standard practice to require the p-value to be lower than {{val|5|e=-8}} to consider a variant significant.
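As a rough sketch of how a single SNP could be tested against this threshold, the following Python code applies a Pearson chi-squared test (one degree of freedom, no continuity correction) to a 2×2 table of allele counts. The choice of this particular test variant, the function names and the counts are assumptions made for illustration. Dedicated tools such as PLINK or SNPTEST would be used in practice.

```python
import math

# Threshold quoted in the text for declaring a variant significant.
GENOME_WIDE_THRESHOLD = 5e-8


def chi_squared_2x2(a, b, x, y):
    """Pearson chi-squared statistic and p-value for a 2x2 allele-count table.

    a, b: T alleles in cases and controls; x, y: C alleles in cases and controls.
    """
    n = a + b + x + y
    denom = (a + b) * (x + y) * (a + x) * (b + y)
    if denom == 0:
        raise ValueError("every row and column must contain at least one allele")
    chi2 = n * (a * y - b * x) ** 2 / denom
    # For one degree of freedom the upper tail of the chi-squared distribution
    # equals erfc(sqrt(chi2 / 2)), so no statistics library is needed.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def is_genome_wide_significant(p_value, threshold=GENOME_WIDE_THRESHOLD):
    return p_value < threshold


# Hypothetical counts for one SNP.
chi2, p_value = chi_squared_2x2(a=600, b=400, x=400, y=600)
print(chi2, p_value, is_genome_wide_significant(p_value))
```

A table with identical allele frequencies in cases and controls gives a statistic of 0 and a p-value of 1, which is a quick sanity check for the formula.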


'''Variations on the case-control approach'''. A common alternative to case-control GWA studies is the analysis of quantitative phenotypic data, e.g. height or ] concentrations or even ]. Likewise, alternative statistics designed for ] or ] penetrance patterns can be used.<ref name="pmid21293453" /> Calculations are typically done using ] such as SNPTEST and PLINK, which also include support for many of these alternative statistics.<ref name="WTCCC" /><ref name="pmid17701901">{{cite journal | vauthors = Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC | display-authors = 6 | title = PLINK: a tool set for whole-genome association and population-based linkage analyses | journal = American Journal of Human Genetics | volume = 81 | issue = 3 | pages = 559–75 | date = September 2007 | pmid = 17701901 | pmc = 1950838 | doi = 10.1086/519795 }}</ref> GWAS focuses on the effect of individual SNPs. However, it is also possible that complex interactions among two or more SNPs (]) might contribute to complex diseases. Due to the potentially exponential number of interactions, detecting statistically significant interactions in GWAS data is both computationally and statistically challenging. 
This task has been tackled in existing publications that use algorithms inspired by data mining.<ref>{{cite journal | vauthors = Llinares-López F, Grimm DG, Bodenham DA, Gieraths U, Sugiyama M, Rowan B, Borgwardt K | title = Genome-wide detection of intervals of genetic heterogeneity associated with complex traits | journal = Bioinformatics | volume = 31 | issue = 12 | pages = i240-9 | date = June 2015 | pmid = 26072488 | pmc = 4559912 | doi = 10.1093/bioinformatics/btv263 }}</ref> Moreover, researchers try to integrate GWA data with other biological data such as ] to extract more informative results.<ref>{{cite journal | vauthors = Ayati M, Erten S, Chance MR, Koyutürk M | title = MOBAS: identification of disease-associated protein subnetworks using modularity-based scoring | journal = EURASIP Journal on Bioinformatics & Systems Biology | volume = 2015 | issue = 1 | pages = 7 | date = December 2015 | pmid = 28194175 | pmc = 5270451 | doi = 10.1186/s13637-015-0025-6 | doi-access = free }}</ref><ref>{{cite book |chapter=Assessing the Collective Disease Association of Multiple Genomic Loci |publisher=ACM |title=Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics |date=2015-01-01 |location=New York, NY, USA |isbn=978-1-4503-3853-0 |pages=376–385 |series=BCB '15 |doi=10.1145/2808719.2808758 | vauthors = Ayati M, Koyutürk M |s2cid=5942777 }}</ref> Despite the previously perceived challenge posed by the vast number of SNP combinations, a 2024 study unveiled complete epistatic maps at gene-level resolution in the plant ''Arabidopsis thaliana''.<ref>Carré C, Carluer JB, Chaux C, Estoup-Streiff C, Roche N, Hosy E, Mas A, Krouk G (March 2024). "Next-Gen GWAS: full 2D epistatic interaction maps retrieve part of missing heritability and improve phenotypic prediction". ''Genome Biology''. doi:10.1186/s13059-024-03202-0. <nowiki>PMID 38523316</nowiki>. S2CID 146570</ref>
A key step in the majority of GWA studies is the imputation of genotypes at SNPs not on the genotype chip used in the study.<ref>{{cite journal | vauthors = Marchini J, Howie B | title = Genotype imputation for genome-wide association studies | journal = Nature Reviews Genetics | volume = 11 | issue = 7 | pages = 499–511 | date = July 2010 | pmid = 20517342 | doi = 10.1038/nrg2796 | s2cid = 1465707 }}</ref> This process greatly increases the number of SNPs that can be tested for association, increases the power of the study, and facilitates meta-analysis of GWAS across distinct cohorts. Genotype imputation is carried out by statistical methods that combine the study's genotype data with a reference panel of haplotypes, which typically have been densely genotyped using whole-genome sequencing. These methods take advantage of sharing of haplotypes between individuals over short stretches of sequence to impute alleles. Existing software packages for genotype imputation include IMPUTE2,<ref>{{cite journal | vauthors = Howie B, Marchini J, Stephens M | title = Genotype imputation with thousands of genomes | journal = G3 | volume = 1 | issue = 6 | pages = 457–70 | date = November 2011 | pmid = 22384356 | pmc = 3276165 | doi = 10.1534/g3.111.001198 }}</ref> Minimac, Beagle<ref>{{cite journal | vauthors = Browning BL, Browning SR | title = A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals | journal = American Journal of Human Genetics | volume = 84 | issue = 2 | pages = 210–23 | date = February 2009 | pmid = 19200528 | pmc = 2668004 | doi = 10.1016/j.ajhg.2009.01.005 }}</ref> and MaCH.<ref>{{cite journal | vauthors = Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR | title = MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes | journal = Genetic Epidemiology | volume = 34 | issue = 8 | pages = 816–34 | date = December 2010 | pmid = 21058334 | pmc = 3175618 | doi = 10.1002/gepi.20533 }}</ref>
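The idea of borrowing alleles from shared haplotypes can be illustrated with a deliberately simplified Python sketch. It is a toy nearest-haplotype vote, not the method used by IMPUTE2, Minimac, Beagle or MaCH, which rely on far more sophisticated probabilistic models. The function name, the `?` marker for untyped sites and the sequences are all hypothetical.

```python
from collections import Counter


def impute_missing(observed, reference_panel, missing="?"):
    """Fill untyped sites in one haplotype from the closest reference haplotypes.

    observed: string of alleles, with `missing` at sites the chip did not type.
    reference_panel: list of fully typed haplotype strings of the same length.
    """
    typed_sites = [i for i, allele in enumerate(observed) if allele != missing]

    def mismatches(reference):
        # Count disagreements only at the sites that were actually genotyped.
        return sum(1 for i in typed_sites if reference[i] != observed[i])

    best = min(mismatches(ref) for ref in reference_panel)
    closest = [ref for ref in reference_panel if mismatches(ref) == best]

    imputed = list(observed)
    for i, allele in enumerate(observed):
        if allele == missing:
            # Majority vote among the best-matching reference haplotypes.
            votes = Counter(ref[i] for ref in closest)
            imputed[i] = votes.most_common(1)[0][0]
    return "".join(imputed)


# Hypothetical five-site region: sites 2 and 5 were not on the genotyping chip.
panel = ["ACGTA", "ACGTA", "TCGAA", "TGCAT"]
print(impute_missing("A?GT?", panel))
```

The sketch treats the whole region as one block. Real methods instead allow the best-matching reference haplotype to change along the chromosome and report a probability for each imputed allele.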

In addition to the calculation of association, it is common to take into account any variables that could potentially confound the results. Sex, age, and ancestry are common examples of confounding variables. Moreover, it is also known that many genetic variations are associated with the geographical and historical populations in which the mutations first arose.<ref name="pmid18758442">{{cite journal | vauthors = Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, Indap A, King KS, Bergmann S, Nelson MR, Stephens M, Bustamante CD | title = Genes mirror geography within Europe | journal = Nature | volume = 456 | issue = 7218 | pages = 98–101 | date = November 2008 | pmid = 18758442 | pmc = 2735096 | doi = 10.1038/nature07331 | bibcode = 2008Natur.456...98N }}</ref> Because of this association, studies must take account of the geographic and ethnic background of participants by controlling for what is called population stratification. If they do not do so, the studies can produce false positive results.<ref>{{cite journal | vauthors = Charney E | title = Genes, behavior, and behavior genetics | journal = Wiley Interdisciplinary Reviews. Cognitive Science | volume = 8 | issue = 1–2 | pages = e1405 | date = January 2017 | pmid = 27906529 | doi = 10.1002/wcs.1405 | hdl-access = free | hdl = 10161/13337 }}</ref>
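How stratification can create a false positive is easiest to see with numbers. The Python sketch below uses two hypothetical populations in which the allele has no effect (the odds ratio is 1 within each), yet pooling them produces an apparent association. The stratified estimate uses the Mantel–Haenszel odds ratio, which is chosen here purely for illustration. Actual GWA analyses commonly adjust for ancestry in other ways, for example with principal components or mixed models. All names and counts are assumptions.

```python
def odds_ratio(case_t, control_t, case_c, control_c):
    """Crude odds ratio for allele T from one 2x2 allele-count table."""
    return (case_t * control_c) / (control_t * case_c)


def mantel_haenszel_odds_ratio(strata):
    """Odds ratio pooled across strata, so strata with different allele
    frequencies and case fractions are not mixed into one table.

    strata: list of (case_t, control_t, case_c, control_c) tuples.
    """
    numerator = 0.0
    denominator = 0.0
    for case_t, control_t, case_c, control_c in strata:
        n = case_t + control_t + case_c + control_c
        numerator += case_t * control_c / n
        denominator += control_t * case_c / n
    return numerator / denominator


# Hypothetical allele counts. Population 1 has a high T frequency and many
# cases; population 2 has a low T frequency and few cases.
population_1 = (160, 80, 40, 20)
population_2 = (20, 40, 80, 160)

pooled = tuple(sum(counts) for counts in zip(population_1, population_2))

print(odds_ratio(*population_1), odds_ratio(*population_2))      # within each population
print(odds_ratio(*pooled))                                       # naive pooled analysis
print(mantel_haenszel_odds_ratio([population_1, population_2]))  # stratified analysis
```

In this constructed example the naive pooled odds ratio is well above 1 even though neither population shows any association, while the stratified estimate stays at 1.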

After odds ratios and P-values have been calculated for all SNPs, a common approach is to create a Manhattan plot. In the context of GWA studies, this plot shows the negative logarithm of the P-value as a function of genomic location. Thus the SNPs with the most significant association stand out on the plot, usually as stacks of points because of haploblock structure. Importantly, the P-value threshold for significance is corrected for multiple testing issues. The exact threshold varies by study,<ref name="pmid24473445">{{cite journal | vauthors = Wittkowski KM, Sonakya V, Bigio B, Tonn MK, Shic F, Ascano M, Nasca C, Gold-Von Simson G | title = A novel computational biostatistics approach implies impaired dephosphorylation of growth factor receptors as associated with severity of autism | journal = Translational Psychiatry | volume = 4 | issue = 1 | pages = e354 | date = January 2014 | pmid = 24473445 | pmc = 3905234 | doi = 10.1038/tp.2013.124 }}</ref> but the conventional genome-wide significance threshold is {{val|5|e=-8}}, chosen to account for the hundreds of thousands to millions of SNPs tested.<ref name="pmid23300413" /><ref name="pmid21293453"/><ref>{{cite journal | vauthors = Barsh GS, Copenhaver GP, Gibson G, Williams SM | title = Guidelines for genome-wide association studies | journal = PLOS Genetics | volume = 8 | issue = 7 | pages = e1002812 | date = July 2012 | pmid = 22792080 | pmc = 3390399 | doi = 10.1371/journal.pgen.1002812 | doi-access = free }}<!--| access-date = 21 August 2014 --></ref> GWA studies typically perform the first analysis in a discovery cohort, followed by validation of the most significant SNPs in an independent validation cohort.<ref name="pmid33875891">{{cite journal| vauthors = Smith SM, Douaud G, Chen W, Hanayik T, Alfaro-Almagro F, Sharp K, Elliott LT | title=An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank. | journal=Nat Neurosci | year= 2021 | volume= 24| issue= 5| pages= 737–745| pmid=33875891 | doi=10.1038/s41593-021-00826-4 | pmc= 7610742| doi-access=free }}</ref>
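The quantities behind such a plot are simple to compute. The Python sketch below turns per-SNP results into plot coordinates and derives a significance line. It assumes, as one common reading that the text itself does not spell out, that {{val|5|e=-8}} corresponds to a Bonferroni-style correction of 0.05 over about one million independent tests. The function names, chromosome lengths and P-values are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate near alpha."""
    return alpha / n_tests


def manhattan_points(results, chrom_lengths):
    """Convert (chromosome, position, p_value) rows into (x, y) plot points.

    x is the position on a single axis with chromosomes laid end to end,
    y is -log10(p_value), so stronger associations sit higher on the plot.
    chrom_lengths: dict mapping chromosome name to length, in plotting order.
    """
    offsets = {}
    running_total = 0
    for chrom, length in chrom_lengths.items():
        offsets[chrom] = running_total
        running_total += length
    return [
        (offsets[chrom] + position, -math.log10(p_value))
        for chrom, position, p_value in results
    ]


# Hypothetical toy genome with two short chromosomes and three tested SNPs.
lengths = {"1": 1000, "2": 800}
results = [("1", 100, 1e-3), ("1", 900, 2e-9), ("2", 50, 1e-8)]

threshold = bonferroni_threshold(0.05, 1_000_000)
significance_line = -math.log10(threshold)
for x, y in manhattan_points(results, lengths):
    print(x, round(y, 2), y > significance_line)
```

Feeding the resulting points to any scatter-plot routine, with a horizontal line at the significance level, gives the familiar skyline shape.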

==Results==
] region and their association to ] levels. This type of plot is similar to the Manhattan plot in the lead section, but for a more limited section of the genome. The ] is visualized with colour scale and the ] level is given by the left Y-axis. The dot representing the rs73015013 SNP (in the top-middle) has a high Y-axis location because this SNP explains some of the variation in LDL-cholesterol.<ref name="pmid21829380">{{cite journal | vauthors = Sanna S, Li B, Mulas A, Sidore C, Kang HM, Jackson AU, Piras MG, Usala G, Maninchedda G, Sassu A, Serra F, Palmas MA, Wood WH, Njølstad I, Laakso M, Hveem K, Tuomilehto J, Lakka TA, Rauramaa R, Boehnke M, Cucca F, Uda M, Schlessinger D, Nagaraja R, Abecasis GR | display-authors = 6 | title = Fine mapping of five loci associated with low-density lipoprotein cholesterol detects variants that double the explained heritability | journal = PLOS Genetics | volume = 7 | issue = 7 | pages = e1002198 | date = July 2011 | pmid = 21829380 | pmc = 3145627 | doi = 10.1371/journal.pgen.1002198 | veditors = Gibson G | doi-access = free }}</ref>|450x450px]]


Attempts have been made at creating comprehensive catalogues of SNPs that have been identified from GWA studies.<ref name="pmid19474294">{{cite journal | vauthors = Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA | title = Potential etiologic and functional implications of genome-wide association loci for human diseases and traits | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 106 | issue = 23 | pages = 9362–7 | date = June 2009 | pmid = 19474294 | pmc = 2687147 | doi = 10.1073/pnas.0903103106 | bibcode = 2009PNAS..106.9362H | doi-access = free }}</ref> As of 2009, SNPs associated with diseases are numbered in the thousands.<ref name="pmid19161620">{{cite journal | vauthors = Johnson AD, O'Donnell CJ | title = An open access database of genome-wide association results | journal = BMC Medical Genetics | volume = 10 | pages = 6 | date = January 2009 | pmid = 19161620 | pmc = 2639349 | doi = 10.1186/1471-2350-10-6 | doi-access = free }}</ref>

The first GWA study, conducted in 2005, compared 96 patients with ''']''' (ARMD) with 50 healthy controls.<ref name="pmid15761120">{{cite journal | vauthors = Haines JL, Hauser MA, Schmidt S, Scott WK, Olson LM, Gallins P, Spencer KL, Kwan SY, Noureddine M, Gilbert JR, Schnetz-Boutaud N, Agarwal A, Postel EA, Pericak-Vance MA | title = Complement factor H variant increases the risk of age-related macular degeneration | journal = Science | volume = 308 | issue = 5720 | pages = 419–21 | date = April 2005 | pmid = 15761120 | doi = 10.1126/science.1110359 | bibcode = 2005Sci...308..419H | s2cid = 32716116 | doi-access = free }}</ref> It identified two SNPs with significantly altered allele frequency between the two groups. These SNPs were located in the gene encoding ], which was an unexpected finding in the research of ARMD. The findings from these first GWA studies have subsequently prompted further functional research towards therapeutical manipulation of the complement system in ARMD.<ref name="pmid21860027">{{cite journal | vauthors = Fridkis-Hareli M, Storek M, Mazsaroff I, Risitano AM, Lundberg AS, Horvath CJ, Holers VM | title = Design and development of TT30, a novel C3d-targeted C3/C5 convertase inhibitor for treatment of human complement alternative pathway-mediated diseases | journal = Blood | volume = 118 | issue = 17 | pages = 4705–13 | date = October 2011 | pmid = 21860027 | pmc = 3208285 | doi = 10.1182/blood-2011-06-359646 }}</ref>

Another landmark publication in the history of GWA studies was the ''']''' (WTCCC) study, the largest GWA study ever conducted at the time of its publication in 2007. The WTCCC included 14,000 cases of seven common diseases (~2,000 individuals for each of ], ], ], ], ], ], and ]) and 3,000 shared controls.<ref name="WTCCC">{{cite journal | author = Wellcome Trust Case Control Consortium, Burton PR | title = Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls | journal = Nature | volume = 447 | issue = 7145 | pages = 661–78 | date = June 2007 | pmid = 17554300 | pmc = 2719288 | doi = 10.1038/nature05911 | bibcode = 2007Natur.447..661B }}</ref> This study was successful in uncovering many genes associated with these diseases.<ref name="WTCCC" /><ref>{{cite press release |url=http://www.wtccc.org.uk/info/070606.shtml |title=Largest ever study of genetics of common diseases published today |publisher=Wellcome Trust Case Control Consortium |date=2007-06-06 |access-date=2008-06-19 |archive-date=4 June 2008 |archive-url=https://web.archive.org/web/20080604120405/http://www.wtccc.org.uk/info/070606.shtml |url-status=dead }}</ref>

Since these first landmark GWA studies, there have been two general trends.<ref name="pmid19373277">{{cite journal | vauthors = Ioannidis JP, Thomas G, Daly MJ | title = Validating, augmenting and refining genome-wide association signals | journal = Nature Reviews Genetics | volume = 10 | issue = 5 | pages = 318–29 | date = May 2009 | pmid = 19373277 | doi = 10.1038/nrg2544 | s2cid = 6463743 | pmc = 7877552 }}</ref> One has been towards larger and larger sample sizes. Since 2018, several genome-wide association studies have reached a total sample size of over 1 million participants, including 1.1 million in a genome-wide study of educational attainment,<ref>{{cite journal | vauthors = Lee JJ, Wedow R, Okbay A, Kong E, Maghzian O, Zacher M, Nguyen-Viet TA, Bowers P, Sidorenko J, Karlsson Linnér R, etal | title = Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals | journal = Nature Genetics | volume = 50 | issue = 8 | pages = 1112–1121 | date = July 2018 | pmid = 30038396 | pmc = 6393768 | doi = 10.1038/s41588-018-0147-3 | url = http://man.ac.uk/C0BbE7 }}</ref> followed by another in 2022 with 3 million individuals<ref>{{cite journal | vauthors = Okbay A, Wu Y, Wang N, Jayashankar H, Bennett M, Nehzati SM, Sidorenko J, Kweon H, Goldman G, Gjorgjieva T, Jiang Y, Hicks B, Tian C, Hinds DA, Ahlskog R, Magnusson PK, Oskarsson S, Hayward C, Campbell A, Porteous DJ, Freese J, Herd P, Watson C, Jala J, Conley D, Koellinger PD, Johannesson M, Laibson D, Meyer MN, Lee JJ, Kong A, Yengo L, Cesarini D, Turley P, Visscher PM, Beauchamp JP, Benjamin DJ, Young AI | display-authors = 6 | title = Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals | journal = Nature Genetics | volume = 54 | issue = 4 | pages = 437–449 | date = April 2022 | pmid = 35361970 | doi = 10.1038/s41588-022-01016-z | pmc = 9005349 | hdl = 11368/3026010 | hdl-access = free 
}}</ref> and a study of ] containing 1.3 million individuals.<ref>{{cite journal | vauthors = Jansen PR, Watanabe K, Stringer S, Skene N, Bryois J, Hammerschlag AR, de Leeuw CA, Benjamins JS, Muñoz-Manchado AB, Nagel M, Savage JE, Tiemeier H, White T, Tung JY, Hinds DA, Vacic V, Wang X, Sullivan PF, van der Sluis S, Polderman TJ, Smit AB, Hjerling-Leffler J, Van Someren EJ, Posthuma D | display-authors = 6 | title = Genome-wide analysis of insomnia in 1,331,010 individuals identifies new risk loci and functional pathways | journal = Nature Genetics | volume = 51 | issue = 3 | pages = 394–403 | date = March 2019 | pmid = 30804565 | doi = 10.1038/s41588-018-0333-3|biorxiv=10.1101/214973 | doi-access = free | hdl = 1871.1/08af5d9e-8621-41f1-97c5-e77a1063495f | hdl-access = free }}</ref> The reason is the drive towards reliably detecting risk-SNPs that have smaller ]s and lower allele frequency. Another trend has been towards the use of more narrowly defined phenotypes, such as ], ] or similar biomarkers.<ref name="Kathiresan_2009" /><ref name="Strawbridge_2011" /> These are called ''intermediate phenotypes'', and their analyses may be of value to functional research into biomarkers.<ref name="pmid19901186">{{cite journal | vauthors = Danesh J, Pepys MB | title = C-reactive protein and coronary disease: is there a causal link? | journal = Circulation | volume = 120 | issue = 21 | pages = 2036–9 | date = November 2009 | pmid = 19901186 | doi = 10.1161/CIRCULATIONAHA.109.907212 | doi-access = free }}</ref>

A variation of GWAS uses participants that are first-degree ''relatives'' of people with a disease. This type of study has been named '''genome-wide association study by proxy''' (''GWAX'').<ref>{{cite journal | vauthors = Liu JZ, Erlich Y, Pickrell JK | title = Case-control association mapping by proxy using family history of disease | journal = Nature Genetics | volume = 49 | issue = 3 | pages = 325–331 | date = March 2017 | pmid = 28092683 | doi = 10.1038/ng.3766 | s2cid = 5598845 }}</ref>

A central point of debate on GWA studies has been that most of the SNP variations found by GWA studies are associated with only a small increased risk of the disease, and have only a small predictive value. The median odds ratio is 1.33 per risk-SNP, with only a few showing odds ratios above 3.0.<ref name="pmid20647212" /><ref name="pmid20300123">{{cite journal | vauthors = Ku CS, Loy EY, Pawitan Y, Chia KS | title = The pursuit of genome-wide association studies: where are we now? | journal = Journal of Human Genetics | volume = 55 | issue = 4 | pages = 195–206 | date = April 2010 | pmid = 20300123 | doi = 10.1038/jhg.2010.19 | doi-access = free }}</ref> These magnitudes are considered small because they do not explain much of the heritable variation. This ] variation is estimated from heritability studies based on ] twins.<ref name="pmid18987709">{{cite journal | vauthors = Maher B | title = Personal genomes: The case of the missing heritability | journal = Nature | volume = 456 | issue = 7218 | pages = 18–21 | date = November 2008 | pmid = 18987709 | doi = 10.1038/456018a | doi-access = free }}</ref> For example, it is known that 40% of variance in depression can be explained by hereditary differences, but GWA studies only account for a minority of this variance.<ref name="pmid18987709"/>
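To make the size of such effects concrete, the following minimal sketch computes an allelic odds ratio and an approximate 95% confidence interval from a 2×2 table of allele counts. The counts are hypothetical and were chosen only so that the result lands near the median value quoted above; the function name is illustrative and the interval uses the standard log-odds (Woolf) approximation.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Return (odds ratio, CI low, CI high) for a 2x2 allele-count table."""
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Woolf's approximation for the standard error of log(OR).
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, low, high


# Hypothetical allele counts: the risk allele makes up 25% of case alleles
# and 20% of control alleles.
estimate, low, high = allelic_odds_ratio(1000, 3000, 800, 3200)
print(f"OR = {estimate:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

Even with a confidence interval that excludes 1, a difference of this size in allele frequency between cases and controls says little about any one individual's risk.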

==Clinical applications and examples==
A challenge for future GWA studies is to apply the findings in a way that accelerates ] and diagnostics development, including better integration of genetic studies into the drug-development process and a focus on the role of genetic variation in maintaining health as a blueprint for designing new ] and ].<ref name="pmid19759611">{{cite journal | vauthors = Iadonato SP, Katze MG | title = Genomics: Hepatitis C virus gets personal | journal = Nature | volume = 461 | issue = 7262 | pages = 357–8 | date = September 2009 | pmid = 19759611 | doi = 10.1038/461357a | bibcode = 2009Natur.461..357I | s2cid = 7602652 | doi-access = free }}{{closed access}}</ref> Several studies have looked into the use of risk-SNP markers as a means of directly improving the accuracy of ]. Some have found that the accuracy of prognosis improves,<ref name="pmid20837927">{{cite journal | vauthors = Muehlschlegel JD, Liu KY, Perry TE, Fox AA, Collard CD, Shernan SK, Body SC | title = Chromosome 9p21 variant predicts mortality after coronary artery bypass graft surgery | journal = Circulation | volume = 122 | issue = 11 Suppl | pages = S60–5 | date = September 2010 | pmid = 20837927 | pmc = 2943860 | doi = 10.1161/CIRCULATIONAHA.109.924233 }}</ref> while others report only minor benefits from this use.<ref name="pmid20159871">{{cite journal | vauthors = Paynter NP, Chasman DI, Paré G, Buring JE, Cook NR, Miletich JP, Ridker PM | title = Association between a literature-based genetic risk score and cardiovascular events in women | journal = JAMA | volume = 303 | issue = 7 | pages = 631–7 | date = February 2010 | pmid = 20159871 | pmc = 2845522 | doi = 10.1001/jama.2010.119 }}</ref> Generally, a problem with this direct approach is the small magnitude of the effects observed. A small effect ultimately translates into poor separation of cases and controls and thus only a small improvement in prognostic accuracy. 
An alternative application is therefore the potential for GWA studies to elucidate ].<ref name="pmid20522751">{{cite journal | vauthors = Couzin-Frankel J | title = Major heart disease genes prove elusive | journal = Science | volume = 328 | issue = 5983 | pages = 1220–1 | date = June 2010 | pmid = 20522751 | doi = 10.1126/science.328.5983.1220 | bibcode = 2010Sci...328.1220C }}{{closed access}}</ref>
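The link between a small effect and poor separation of cases and controls can be illustrated with a short calculation. The sketch below estimates the area under the ROC curve (AUC) obtained when a single SNP is used as the only predictor. It rests on stated assumptions that are not taken from the cited studies: control genotypes follow Hardy–Weinberg proportions, the risk is multiplicative per allele, and the allele frequency (0.2) and odds ratio (1.33) are illustrative values.

```python
def genotype_frequencies(risk_allele_freq):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of the risk allele."""
    q = 1 - risk_allele_freq
    return [q * q, 2 * risk_allele_freq * q, risk_allele_freq ** 2]


def single_snp_auc(risk_allele_freq, odds_ratio):
    """AUC when risk-allele count is the only predictor (ties count as 0.5)."""
    controls = genotype_frequencies(risk_allele_freq)
    # Under a per-allele multiplicative model, case genotype frequencies are
    # the control frequencies reweighted by odds_ratio ** (allele count).
    weights = [f * odds_ratio ** g for g, f in enumerate(controls)]
    total = sum(weights)
    cases = [w / total for w in weights]
    auc = 0.0
    for g_case, f_case in enumerate(cases):
        for g_ctrl, f_ctrl in enumerate(controls):
            if g_case > g_ctrl:
                auc += f_case * f_ctrl
            elif g_case == g_ctrl:
                auc += 0.5 * f_case * f_ctrl
    return auc


auc = single_snp_auc(0.2, 1.33)
print(f"AUC for one SNP with OR 1.33: {auc:.3f}")
```

An AUC of 0.5 corresponds to random guessing, and under these assumptions a single variant of typical effect size moves the value only slightly above that baseline.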

=== Hepatitis C treatment ===
One such success is related to identifying the genetic variant associated with response to anti-] virus treatment. For genotype 1 hepatitis C treated with ] or ] combined with ], a GWA study<ref name="pmid19684573">{{cite journal | vauthors = Ge D, Fellay J, Thompson AJ, Simon JS, Shianna KV, Urban TJ, Heinzen EL, Qiu P, Bertelsen AH, Muir AJ, Sulkowski M, McHutchison JG, Goldstein DB | title = Genetic variation in IL28B predicts hepatitis C treatment-induced viral clearance | journal = Nature | volume = 461 | issue = 7262 | pages = 399–401 | date = September 2009 | pmid = 19684573 | doi = 10.1038/nature08309 | bibcode = 2009Natur.461..399G | s2cid = 1707096 }}</ref> has shown that SNPs near the human ] gene, encoding interferon lambda 3, are associated with significant differences in response to the treatment. A later report demonstrated that the same genetic variants are also associated with the natural clearance of the genotype 1 hepatitis C virus.<ref name="pmid19759533">{{cite journal | vauthors = Thomas DL, Thio CL, Martin MP, Qi Y, Ge D, O'Huigin C, Kidd J, Kidd K, Khakoo SI, Alexander G, Goedert JJ, Kirk GD, Donfield SM, Rosen HR, Tobler LH, Busch MP, McHutchison JG, Goldstein DB, Carrington M | title = Genetic variation in IL28B and spontaneous clearance of hepatitis C virus | journal = Nature | volume = 461 | issue = 7265 | pages = 798–801 | date = October 2009 | pmid = 19759533 | pmc = 3172006 | doi = 10.1038/nature08463 | bibcode = 2009Natur.461..798T }}</ref> These major findings facilitated the development of personalized medicine and allowed physicians to customize medical decisions based on the patient's genotype.<ref>{{cite journal | vauthors = Lu YF, Goldstein DB, Angrist M, Cavalleri G | title = Personalized medicine and human genetic diversity | journal = Cold Spring Harbor Perspectives in Medicine | volume = 4 | issue = 9 | pages = a008581 | date = July 2014 | pmid = 25059740 | pmc = 4143101 | doi = 10.1101/cshperspect.a008581 }}</ref>

=== eQTL, LDL and cardiovascular disease ===
The goal of elucidating pathophysiology has also led to increased interest in the association between risk-SNPs and the ] of nearby genes, the so-called ] (eQTL) studies.<ref name="pmid20562444">{{cite journal | vauthors = Folkersen L, van't Hooft F, Chernogubova E, Agardh HE, Hansson GK, Hedin U, Liska J, Syvänen AC, Paulsson-Berne G, Paulssson-Berne G, Franco-Cereceda A, Hamsten A, Gabrielsen A, Eriksson P | title = Association of genetic risk variants with expression of proximal genes identifies novel susceptibility genes for cardiovascular disease | journal = Circulation: Cardiovascular Genetics | volume = 3 | issue = 4 | pages = 365–73 | date = August 2010 | pmid = 20562444 | doi = 10.1161/CIRCGENETICS.110.948935 | doi-access = free }}</ref> The reason is that GWAS studies identify risk-SNPs, but not risk-genes, and specification of genes is one step closer towards actionable ]. As a result, major GWA studies by 2011 typically included extensive eQTL analysis.<ref name="Bown_2011" /><ref name="pmid21378988">{{cite journal | title = A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease | journal = Nature Genetics | volume = 43 | issue = 4 | pages = 339–44 | date = March 2011 | pmid = 21378988 | doi = 10.1038/ng.782 | author1 = Coronary Artery Disease (C4D) Genetics Consortium | s2cid = 39712343 | url = https://zenodo.org/record/3443929 }}{{Dead link|date=March 2024 |bot=InternetArchiveBot |fix-attempted=yes }}{{closed access}}</ref><ref name="Johnson_2011" /> One of the strongest eQTL effects observed for a GWA-identified risk SNP is the SORT1 locus.<ref name="Kathiresan_2009"/> Functional follow up studies of this locus using ] and ] have shed light on the metabolism of ]s, which have important clinical implications for ].<ref name="Kathiresan_2009"/><ref name="pmid21462369">{{cite journal | vauthors = Dubé JB, Johansen CT, Hegele RA | title = Sortilin: an unusual suspect in cholesterol metabolism: 
from GWAS identification to in vivo biochemical analyses, sortilin has been identified as a novel mediator of human lipoprotein metabolism | journal = BioEssays | volume = 33 | issue = 6 | pages = 430–7 | date = June 2011 | pmid = 21462369 | doi = 10.1002/bies.201100003 }}{{closed access}}</ref><ref name="pmid21311327">{{cite journal | vauthors = Bauer RC, Stylianou IM, Rader DJ | title = Functional validation of new pathways in lipoprotein metabolism identified by human genetics | journal = Current Opinion in Lipidology | volume = 22 | issue = 2 | pages = 123–8 | date = April 2011 | pmid = 21311327 | doi = 10.1097/MOL.0b013e32834469b3 | s2cid = 24020035 }}{{closed access}}</ref>
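In its simplest form, an eQTL test regresses the expression level of a gene on the number of copies of an allele that each individual carries. The following minimal sketch fits that regression by ordinary least squares on made-up data; the dosages, expression values and function name are hypothetical, and real analyses typically also adjust for covariates and test many SNP–gene pairs.

```python
def eqtl_regression(dosages, expression):
    """Least-squares slope and intercept of expression on allele dosage."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx  # change in expression per extra allele copy
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: allele counts (0, 1 or 2) and normalized expression
# of a nearby gene in six individuals.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]

slope, intercept = eqtl_regression(dosages, expression)
print(f"expression changes by {slope:.2f} units per allele copy")
```

A slope that differs clearly from zero suggests that the variant, or one correlated with it, influences expression of the gene, which is what helps move from a risk-SNP towards a candidate gene.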

=== Atrial fibrillation ===
A ] published in 2018 reported 70 new loci associated with atrial fibrillation. Different variants were identified that are associated with ] coding genes, such as ] and ], ] or ], which are involved in cardiac conduction regulation, in ] modulation and cardiac development. New genes were also identified that are involved in ] (]) or associated with alteration of ] communication (]).<ref name="Roselli">{{cite journal |vauthors= Roselli C, Chafin M, Weng L|title=Multi-ethnic genome-wide association study for atrial fibrillation.|journal= Nature Genetics|volume= 50|issue=9|pages=1225–1233|year=2018|pmid=29892015|pmc=6136836|doi=10.1038/s41588-018-0133-9}}</ref>

=== Schizophrenia ===
Research using a High-Precision Protein Interaction Prediction (HiPPIP) computational model discovered 504 new protein-protein interactions (PPIs) associated with genes linked to schizophrenia.<ref>{{cite journal | vauthors = Ganapathiraju MK, Thahir M, Handen A, Sarkar SN, Sweet RA, Nimgaonkar VL, Loscher CE, Bauer EM, Chaparala S | title = Schizophrenia interactome with 504 novel protein-protein interactions | journal = npj Schizophrenia | volume = 2 | pages = 16012 | date = 2016-04-27 | pmid = 27336055 | pmc = 4898894 | doi = 10.1038/npjschz.2016.12}}</ref><ref name="psychcentral.com 2016">{{cite web | title=New Schizophrenia Study Focuses on Protein-Protein Interactions | website=psychcentral.com | date=May 3, 2016 | url=https://psychcentral.com/news/2016/05/03/new-schizophrenia-study-focuses-on-protein-protein-interactions/102733.html | archive-url=https://web.archive.org/web/20200111000732/https://psychcentral.com/news/2016/05/03/new-schizophrenia-study-focuses-on-protein-protein-interactions/102733.html | archive-date=January 11, 2020 | url-status=dead | access-date=April 22, 2023}}</ref><ref>{{cite journal | vauthors = Ganapathiraju M, Chaparala S, Lo C | title = F200. Elucidating The Role of Cilia in Neuropsychiatric Diseases Through Interactome Analysis. 
| journal = Schizophrenia Bulletin | date = April 2018 | volume = 44 | issue = suppl_1 | pages = S298-9 | doi = 10.1093/schbul/sby017.731 | pmc = 5887623 }}</ref> While the evidence supporting the genetic basis of schizophrenia is not controversial, one study found that 25 candidate schizophrenia genes discovered from GWAS had little association with schizophrenia, demonstrating that GWAS alone may be insufficient to identify candidate genes.<ref>{{cite journal | vauthors = Johnson EC, Border R, Melroy-Greif WE, de Leeuw CA, Ehringer MA, Keller MC | title = No Evidence That Schizophrenia Candidate Genes Are More Associated With Schizophrenia Than Noncandidate Genes | journal = Biological Psychiatry | volume = 82 | issue = 10 | pages = 702–708 | date = November 2017 | pmid = 28823710 | pmc = 5643230 | doi = 10.1016/j.biopsych.2017.06.033 }}</ref>

=== Pain susceptibility ===
Another group of researchers conducted a joint analysis of GWAS summary statistics from seventeen pain susceptibility traits in the ] and identified 99 genome-wide significant risk loci, of which 34 were new. Leave-one-trait-out meta-analyses grouped these loci into four categories: loci associated with nearly all pain-related traits, loci associated with a single trait, loci associated with multiple forms of musculoskeletal pain, and loci associated with headache-related pain.

Moreover, 664 genes were mapped to the 99 loci by genomic proximity, ] and chromatin interaction, where 15% of these genes showed differential expression in individuals with acute or ] compared to healthy individuals.<ref>{{Cite journal |last1=Mocci |first1=Evelina |last2=Ward |first2=Kathryn |last3=Perry |first3=James A. |last4=Starkweather |first4=Angela |last5=Stone |first5=Laura S. |last6=Schabrun |first6=Siobhan M. |last7=Renn |first7=Cynthia |last8=Dorsey |first8=Susan G. |last9=Ament |first9=Seth A. |date=2023 |title=Genome wide association joint analysis reveals 99 risk loci for pain susceptibility and pleiotropic relationships with psychiatric, metabolic, and immunological traits |journal=PLOS Genetics |volume=19 |issue=10 |pages=e1010977 |doi=10.1371/journal.pgen.1010977 |doi-access=free |issn=1553-7404 |pmid=37844115|pmc=10602383 }}</ref>

==Conservation applications==
] level GWA studies may be used to identify ] genes and so help evaluate the ability of species to adapt to changing environmental conditions as the global ].<ref>{{cite journal | vauthors = Willi Y, Kristensen TN, Sgrò CM, Weeks AR, Ørsted M, Hoffmann AA | title = Conservation genetics as a management tool: The five best-supported paradigms to assist the management of threatened species | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 119 | issue = 1 | pages = e2105076119 | date = January 2022 | pmid = 34930821 | doi = 10.1073/pnas.2105076119 | doi-access = free | pmc = 8740573 | bibcode = 2022PNAS..11905076W }}</ref> This could help determine ] risk for species and could therefore be an important tool for ] planning. Using GWA studies to identify adaptive genes could also help elucidate the relationship between neutral and adaptive ].

==Agricultural applications==
=== Plant growth stages and yield components ===
GWA studies are an important tool in plant breeding. With large genotyping and phenotyping datasets, GWAS are powerful for analyzing the complex inheritance of traits that are important yield components, such as the number of grains per spike, the weight of each grain and plant structure. A GWA study in spring wheat revealed a strong correlation of grain production with booting date, biomass and number of grains per spike.<ref>{{cite journal | vauthors = Turuspekov Y, Baibulatova A, Yermekbayev K, Tokhetova L, Chudinov V, Sereda G, Ganal M, Griffiths S, Abugalieva S | display-authors = 6 | title = GWAS for plant growth stages and yield components in spring wheat (Triticum aestivum L.) harvested in three regions of Kazakhstan | journal = BMC Plant Biology | volume = 17 | issue = Suppl 1 | pages = 190 | date = November 2017 | pmid = 29143598 | pmc = 5688510 | doi = 10.1186/s12870-017-1131-2 | doi-access = free | bibcode = 2017BMCPB..17S.190T }}</ref> GWA studies have also been used successfully to study the genetic architecture of complex traits in rice.<ref>{{cite journal | vauthors = Zhao K, Tung CW, Eizenga GC, Wright MH, Ali ML, Price AH, Norton GJ, Islam MR, Reynolds A, Mezey J, McClung AM, Bustamante CD, McCouch SR | display-authors = 6 | title = Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa | journal = Nature Communications | volume = 2 | issue = 1 | pages = 467 | date = September 2011 | pmid = 21915109 | pmc = 3195253 | doi = 10.1038/ncomms1467 | bibcode = 2011NatCo...2..467Z }}</ref>

=== Plant pathogens ===
The emergence of ]s has posed serious threats to plant health and biodiversity. Identifying wild types that are naturally resistant to certain pathogens could therefore be of vital importance, as could predicting which ]s are associated with the resistance. GWA studies are a powerful tool for detecting relationships between particular variants and resistance to the ], which is beneficial for developing new pathogen-resistant cultivars.<ref>{{cite journal | vauthors = Bartoli C, Roux F | title = Genome-Wide Association Studies In Plant Pathosystems: Toward an Ecological Genomics Approach | language = en | journal = Frontiers in Plant Science | volume = 8 | pages = 763 | date = 2017 | pmid = 28588588 | pmc = 5441063 | doi = 10.3389/fpls.2017.00763 | doi-access = free }}</ref>

=== Chicken ===
The first GWA study in chickens was done by Abasht and Lamont<ref>{{cite journal | vauthors = Abasht B, Lamont SJ | title = Genome-wide association analysis reveals cryptic alleles as an important factor in heterosis for fatness in chicken F2 population | journal = Animal Genetics | volume = 38 | issue = 5 | pages = 491–498 | date = October 2007 | pmid = 17894563 | doi = 10.1111/j.1365-2052.2007.01642.x }}</ref> in 2007. It examined the fatness trait in an F2 population described previously. Significantly associated SNPs were found on 10 chromosomes (1, 2, 3, 4, 7, 8, 10, 12, 15 and 27).

==Limitations==
GWA studies have several issues and limitations that can be addressed through proper quality control and study setup. Lack of well-defined case and control groups, insufficient sample size, and inadequate control for ] are common problems.<ref name="pmid18349094" /> On the statistical issue of multiple testing, it has been noted that "the GWA approach can be problematic because the massive number of statistical tests performed presents an unprecedented potential for ] results".<ref name="pmid18349094"/> This is why all modern GWAS use a ]. In addition to easily correctable problems such as these, some more subtle but important issues have surfaced. A high-profile GWA study that investigated individuals with very long life spans to identify SNPs associated with longevity is an example of this.<ref name="pmid20595579">{{cite journal | vauthors = Sebastiani P, Solovieff N, Puca A, Hartley SW, Melista E, Andersen S, Dworkis DA, Wilk JB, Myers RH, Steinberg MH, Montano M, Baldwin CT, Perls TT | title = Genetic signatures of exceptional longevity in humans | journal = Science | volume = 2010 | date = July 2010 | issue = 5987 | pmid = 20595579 | doi = 10.1126/science.1190532 | url = https://escholarship.org/uc/item/9d16w2s0 }}{{Retracted|doi=10.1126/science.333.6041.404-a|pmid=21778381|http://retractionwatch.com/2010/11/11/science-plays-two-a-retraction-and-concern-issued-about-genetics-papers/ ''Retraction Watch''|http://retractionwatch.com/2011/07/21/sebastiani-group-retracts-genetics-of-aging-study-from-science/ ''Retraction Watch''|http://retractionwatch.com/2012/01/18/sebastiani-and-perls-longevity-work-finds-a-new-home-in-plos-one-following-science-retraction/ ''Retraction Watch''|intentional=yes}}{{closed access}}</ref> The publication came under scrutiny because of a discrepancy between the type of ] in the case and control groups, which caused several SNPs to be falsely highlighted as associated with longevity.<ref name=MacArthur>{{Cite magazine | title = Serious flaws 
revealed in "longevity genes" study | url = https://www.wired.com/wiredscience/2010/07/serious-flaws-revealed-in-longevity-genes-study/ | vauthors = MacArthur D | magazine = Wired | access-date = 2011-12-07 | date = 2010-07-08 }}</ref> The study was subsequently ],<ref name="pmid21778381">{{cite journal | vauthors = Sebastiani P, Solovieff N, Puca A, Hartley SW, Melista E, Andersen S, Dworkis DA, Wilk JB, Myers RH, Steinberg MH, Montano M, Baldwin CT, Perls TT | title = Retraction | journal = Science | volume = 333 | issue = 6041 | pages = 404 | date = July 2011 | pmid = 21778381 | doi = 10.1126/science.333.6041.404-a }}{{closed access}}</ref> but a modified manuscript was later published.<ref>{{cite journal | vauthors = Sebastiani P, Solovieff N, Dewan AT, Walsh KM, Puca A, Hartley SW, Melista E, Andersen S, Dworkis DA, Wilk JB, Myers RH, Steinberg MH, Montano M, Baldwin CT, Hoh J, Perls TT | title = Genetic signatures of exceptional longevity in humans | journal = PLOS ONE| volume = 7 | issue = 1 | pages = e29848 | date = 2012-01-18 | pmid = 22279548 | pmc = 3261167 | doi = 10.1371/journal.pone.0029848 | doi-access = free | bibcode = 2012PLoSO...729848S }}</ref> Now, many GWAS control for genotyping array. If there are substantial differences between groups on the type of genotyping array, as with any confounder, GWA studies could result in a false positive. Another consequence is that such studies are unable to detect the contribution of very rare mutations not included in the array or able to be imputed.<ref>{{Cite journal |vauthors=Tam V, Patel N, Turcotte M, Bossé Y, Paré G, Meyre D |date=May 2019 |title=Benefits and limitations of genome-wide association studies |url=http://www.nature.com/articles/s41576-019-0127-1 |journal=Nature Reviews Genetics |language=en |volume=20 |issue=8 |pages=467–484 |doi=10.1038/s41576-019-0127-1 |issn=1471-0056 |pmid=31068683|s2cid=148570302 }}</ref>
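The multiple-testing concern quoted in this section can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, one million independent tests (a round figure, not a property of any particular study) and shows how many false positives a conventional 0.05 cutoff would be expected to produce, along with the corresponding Bonferroni-corrected threshold, which matches the widely used genome-wide significance level of 5×10<sup>−8</sup>.

```python
def expected_false_positives(p_threshold, n_tests):
    """Expected number of null tests that pass the threshold by chance."""
    return p_threshold * n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


N_TESTS = 1_000_000  # assumed number of independent tests (illustrative)

naive_hits = expected_false_positives(0.05, N_TESTS)
threshold = bonferroni_threshold(0.05, N_TESTS)

print(f"expected chance hits at p < 0.05: {naive_hits:.0f}")
print(f"Bonferroni-corrected threshold: {threshold:.1e}")
```

The Bonferroni correction is conservative when tests are correlated, as neighbouring SNPs usually are, so it is shown here only as the simplest way to see why a very stringent threshold is needed.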

Additionally, GWA studies identify candidate risk variants for the population in which the analysis is performed, and with most GWA studies historically stemming from European databases, there is a lack of translation of the identified risk variants to non-European populations.<ref>{{cite journal | vauthors = Rosenberg NA, Huang L, Jewett EM, Szpiech ZA, Jankovic I, Boehnke M | title = Genome-wide association studies in diverse populations | journal = Nature Reviews Genetics | volume = 11 | issue = 5 | pages = 356–66 | date = May 2010 | pmid = 20395969 | pmc = 3079573 | doi = 10.1038/nrg2760 }}</ref> For instance, GWA studies for diseases like ] have been conducted primarily in Caucasian populations, which does not give adequate insight into other ethnic populations, including African Americans or ]. Alternative strategies suggested involve ].<ref>{{cite journal | vauthors = Sham PC, Cherny SS, Purcell S, Hewitt JK | title = Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data | journal = American Journal of Human Genetics | volume = 66 | issue = 5 | pages = 1616–30 | date = May 2000 | pmid = 10762547 | pmc = 1378020 | doi = 10.1086/302891 }}</ref><ref>{{cite book | vauthors = Borecki IB |chapter=Linkage and Association Studies|date=2006 |work=eLS|publisher=John Wiley & Sons, Ltd. |doi=10.1038/npg.els.0005483|isbn=9780470015902 |title=Encyclopedia of Life Sciences }}</ref> More recently, the rapidly decreasing price of complete genome ] has also provided a realistic alternative to ]-based GWA studies. 
High-throughput sequencing does have potential to side-step some of the shortcomings of non-sequencing GWA.<ref name="pmid21670730">{{cite journal | vauthors = Visscher PM, Goddard ME, Derks EM, Wray NR | title = Evidence-based psychiatric genetics, AKA the false dichotomy between common and rare variant hypotheses | journal = Molecular Psychiatry | volume = 17 | issue = 5 | pages = 474–85 | date = May 2012 | pmid = 21670730 | doi = 10.1038/mp.2011.65 | author-link2 = Michael E. Goddard | doi-access = free }}{{closed access}}</ref> Cross-trait ] can inflate estimates of genetic phenotype similarity.<ref>{{Cite journal |last1=Border |first1=Richard |last2=Athanasiadis |first2=Georgios |last3=Buil |first3=Alfonso |last4=Schork |first4=Andrew J. |last5=Cai |first5=Na |last6=Young |first6=Alexander I. |last7=Werge |first7=Thomas |last8=Flint |first8=Jonathan |last9=Kendler |first9=Kenneth S. |last10=Sankararaman |first10=Sriram |last11=Dahl |first11=Andy W. |last12=Zaitlen |first12=Noah A. |date=2022 |title=Cross-trait assortative mating is widespread and inflates genetic correlation estimates |journal=Science |language=en |volume=378 |issue=6621 |pages=754–761 |doi=10.1126/science.abo2059 |issn=0036-8075 |pmc=9901291 |pmid=36395242|bibcode=2022Sci...378..754B }}</ref>

===Fine-mapping===
Genotyping arrays designed for GWAS rely on ] to provide coverage of the entire genome by genotyping a subset of variants. Because of this, the reported associated variants are unlikely to be the actual causal variants. Associated regions can contain hundreds of variants spanning large regions and encompassing many different genes, making the biological interpretation of GWAS loci more difficult. Fine-mapping is a process to refine these lists of associated variants to a credible set most likely to include the causal variant.

Fine-mapping requires all variants in the associated region to have been genotyped or imputed (dense coverage), very stringent quality control resulting in high-quality genotypes, and sample sizes large enough to separate highly correlated signals. There are several different methods to perform fine-mapping, and all of them produce, for each variant in the locus, a posterior probability that the variant is causal. Because the requirements are often difficult to satisfy, examples of these methods being applied more generally remain limited.
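As a rough illustration of how a posterior probability and a credible set can be derived, the sketch below uses one of the simplest approaches: approximate Bayes factors computed from summary statistics, combined under the assumptions that exactly one variant in the locus is causal and that all variants are equally likely a priori. The variant names, effect sizes, standard errors and the prior standard deviation of 0.2 are all hypothetical, and dedicated fine-mapping tools relax these assumptions.

```python
import math


def log_approx_bayes_factor(beta, se, prior_sd=0.2):
    """Log approximate Bayes factor in favour of association."""
    v = se ** 2          # sampling variance of the effect estimate
    w = prior_sd ** 2    # assumed prior variance of the true effect
    shrink = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1 - shrink) + 0.5 * z * z * shrink


def posterior_probabilities(log_bfs):
    """Posterior causal probabilities, assuming one causal variant."""
    top = max(log_bfs)  # subtract the maximum for numerical stability
    weights = [math.exp(value - top) for value in log_bfs]
    total = sum(weights)
    return [weight / total for weight in weights]


def credible_set(names, probs, coverage=0.95):
    """Smallest top-ranked set whose probabilities sum to the coverage."""
    ranked = sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)
    chosen, cumulative = [], 0.0
    for name, prob in ranked:
        chosen.append(name)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical summary statistics for four variants in one locus.
names = ["snp_a", "snp_b", "snp_c", "snp_d"]
betas = [0.084, 0.081, 0.030, 0.010]
ses = [0.015, 0.015, 0.015, 0.015]

log_bfs = [log_approx_bayes_factor(b, s) for b, s in zip(betas, ses)]
probs = posterior_probabilities(log_bfs)
selected = credible_set(names, probs)
print(dict(zip(names, (round(p, 3) for p in probs))), selected)
```

When two variants have nearly identical association statistics, as the first two do here, neither reaches the coverage level alone and both end up in the credible set, which mirrors the difficulty of separating highly correlated signals.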

== See also ==
{{Portal|Biology}}
* ]
* ]
* ]
* ]
* ]
* ]
* ]
* ]
* ]
* ]
* ]
* ]
* ]
* ]
* ] tool designed to help interpret the results generated from a genome wide association study

== References ==
{{Reflist|30em|refs=

<ref name="Bown_2011">{{cite journal | vauthors = Bown MJ, Jones GT, Harrison SC, Wright BJ, Bumpstead S, Baas AF, Gretarsdottir S, Badger SA, Bradley DT, Burnand K, Child AH, Clough RE, Cockerill G, Hafez H, Scott DJ, Futers S, Johnson A, Sohrabi S, Smith A, Thompson MM, van Bockxmeer FM, Waltham M, Matthiasson SE, Thorleifsson G, Thorsteinsdottir U, Blankensteijn JD, Teijink JA, Wijmenga C, de Graaf J, Kiemeney LA, Assimes TL, McPherson R, Folkersen L, Franco-Cereceda A, Palmen J, Smith AJ, Sylvius N, Wild JB, Refstrup M, Edkins S, Gwilliam R, Hunt SE, Potter S, Lindholt JS, Frikke-Schmidt R, Tybjærg-Hansen A, Hughes AE, Golledge J, Norman PE, van Rij A, Powell JT, Eriksson P, Stefansson K, Thompson JR, Humphries SE, Sayers RD, Deloukas P, Samani NJ | display-authors = 6 | title = Abdominal aortic aneurysm is associated with a variant in low-density lipoprotein receptor-related protein 1 | journal = American Journal of Human Genetics | volume = 89 | issue = 5 | pages = 619–27 | date = November 2011 | pmid = 22055160 | pmc = 3213391 | doi = 10.1016/j.ajhg.2011.10.002 }}</ref>
<!--unused<ref name="Ehret_2011">{{cite journal | vauthors = Ehret GB, Munroe PB, Rice KM, Bochud M, Johnson AD, Chasman DI, Smith AV, Tobin MD, Verwoert GC, Hwang SJ, Pihur V, Vollenweider P, O'Reilly PF, Amin N, Bragg-Gresham JL, Teumer A, Glazer NL, Launer L, Zhao JH, Aulchenko Y, Heath S, Sõber S, Parsa A, Luan J, Arora P, Dehghan A, Zhang F, Lucas G, Hicks AA, Jackson AU, Peden JF, Tanaka T, Wild SH, Rudan I, Igl W, Milaneschi Y, Parker AN, Fava C, Chambers JC, Fox ER, Kumari M, Go MJ, van der Harst P, Kao WH, Sjögren M, Vinay DG, Alexander M, Tabara Y, Shaw-Hawkins S, Whincup PH, Liu Y, Shi G, Kuusisto J, Tayo B, Seielstad M, Sim X, Nguyen KD, Lehtimäki T, Matullo G, Wu Y, Gaunt TR, Onland-Moret NC, Cooper MN, Platou CG, Org E, Hardy R, Dahgam S, Palmen J, Vitart V, Braund PS, Kuznetsova T, Uiterwaal CS, Adeyemo A, Palmas W, Campbell H, Ludwig B, Tomaszewski M, Tzoulaki I, Palmer ND, Aspelund T, Garcia M, Chang YP, O'Connell JR, Steinle NI, Grobbee DE, Arking DE, Kardia SL, Morrison AC, Hernandez D, Najjar S, McArdle WL, Hadley D, Brown MJ, Connell JM, Hingorani AD, Day IN, Lawlor DA, Beilby JP, Lawrence RW, Clarke R, Hopewell JC, Ongen H, Dreisbach AW, Li Y, Young JH, Bis JC, Kähönen M, Viikari J, Adair LS, Lee NR, Chen MH, Olden M, Pattaro C, Bolton JA, Köttgen A, Bergmann S, Mooser V, Chaturvedi N, Frayling TM, Islam M, Jafar TH, Erdmann J, Kulkarni SR, Bornstein SR, Grässler J, Groop L, Voight BF, Kettunen J, Howard P, Taylor A, Guarrera S, Ricceri F, Emilsson V, Plump A, Barroso I, Khaw KT, Weder AB, Hunt SC, Sun YV, Bergman RN, Collins FS, Bonnycastle LL, Scott LJ, Stringham HM, Peltonen L, Perola M, Vartiainen E, Brand SM, Staessen JA, Wang TJ, Burton PR, Soler Artigas M, Dong Y, Snieder H, Wang X, Zhu H, Lohman KK, Rudock ME, Heckbert SR, Smith NL, Wiggins KL, Doumatey A, Shriner D, Veldre G, Viigimaa M, Kinra S, Prabhakaran D, Tripathy V, Langefeld CD, Rosengren A, Thelle DS, Corsi AM, Singleton A, Forrester T, Hilton G, McKenzie CA, Salako T, Iwai 
N, Kita Y, Ogihara T, Ohkubo T, Okamura T, Ueshima H, Umemura S, Eyheramendy S, Meitinger T, Wichmann HE, Cho YS, Kim HL, Lee JY, Scott J, Sehmi JS, Zhang W, Hedblad B, Nilsson P, Smith GD, Wong A, Narisu N, Stančáková A, Raffel LJ, Yao J, Kathiresan S, O'Donnell CJ, Schwartz SM, Ikram MA, Longstreth WT, Mosley TH, Seshadri S, Shrine NR, Wain LV, Morken MA, Swift AJ, Laitinen J, Prokopenko I, Zitting P, Cooper JA, Humphries SE, Danesh J, Rasheed A, Goel A, Hamsten A, Watkins H, Bakker SJ, van Gilst WH, Janipalli CS, Mani KR, Yajnik CS, Hofman A, Mattace-Raso FU, Oostra BA, Demirkan A, Isaacs A, Rivadeneira F, Lakatta EG, Orru M, Scuteri A, Ala-Korpela M, Kangas AJ, Lyytikäinen LP, Soininen P, Tukiainen T, Würtz P, Ong RT, Dörr M, Kroemer HK, Völker U, Völzke H, Galan P, Hercberg S, Lathrop M, Zelenika D, Deloukas P, Mangino M, Spector TD, Zhai G, Meschia JF, Nalls MA, Sharma P, Terzic J, Kumar MV, Denniff M, Zukowska-Szczechowska E, Wagenknecht LE, Fowkes FG, Charchar FJ, Schwarz PE, Hayward C, Guo X, Rotimi C, Bots ML, Brand E, Samani NJ, Polasek O, Talmud PJ, Nyberg F, Kuh D, Laan M, Hveem K, Palmer LJ, van der Schouw YT, Casas JP, Mohlke KL, Vineis P, Raitakari O, Ganesh SK, Wong TY, Tai ES, Cooper RS, Laakso M, Rao DC, Harris TB, Morris RW, Dominiczak AF, Kivimaki M, Marmot MG, Miki T, Saleheen D, Chandak GR, Coresh J, Navis G, Salomaa V, Han BG, Zhu X, Kooner JS, Melander O, Ridker PM, Bandinelli S, Gyllensten UB, Wright AF, Wilson JF, Ferrucci L, Farrall M, Tuomilehto J, Pramstaller PP, Elosua R, Soranzo N, Sijbrands EJ, Altshuler D, Loos RJ, Shuldiner AR, Gieger C, Meneton P, Uitterlinden AG, Wareham NJ, Gudnason V, Rotter JI, Rettig R, Uda M, Strachan DP, Witteman JC, Hartikainen AL, Beckmann JS, Boerwinkle E, Vasan RS, Boehnke M, Larson MG, Järvelin MR, Psaty BM, Abecasis GR, Chakravarti A, Elliott P, van Duijn CM, Newton-Cheh C, Levy D, Caulfield MJ, Johnson T | display-authors = 6 | title = Genetic variants in novel pathways influence blood pressure and 
cardiovascular disease risk | journal = Nature | volume = 478 | issue = 7367 | pages = 103–9 | date = September 2011 | pmid = 21909115 | pmc = 3340926 | doi = 10.1038/nature10405 | bibcode = 2011Natur.478..103T }}</ref>-->

<!-- Not in use
<ref name="Ikram_2010">{{cite journal | vauthors = Ikram MK, Sim X, Xueling S, Jensen RA, Cotch MF, Hewitt AW, Ikram MA, Wang JJ, Klein R, Klein BE, Breteler MM, Cheung N, Liew G, Mitchell P, Uitterlinden AG, Rivadeneira F, Hofman A, de Jong PT, van Duijn CM, Kao L, Cheng CY, Smith AV, Glazer NL, Lumley T, McKnight B, Psaty BM, Jonasson F, Eiriksdottir G, Aspelund T, Harris TB, Launer LJ, Taylor KD, Li X, Iyengar SK, Xi Q, Sivakumaran TA, Mackey DA, Macgregor S, Martin NG, Young TL, Bis JC, Wiggins KL, Heckbert SR, Hammond CJ, Andrew T, Fahy S, Attia J, Holliday EG, Scott RJ, Islam FM, Rotter JI, McAuley AK, Boerwinkle E, Tai ES, Gudnason V, Siscovick DS, Vingerling JR, Wong TY | display-authors = 6 | title = Four novel Loci (19q13, 6q24, 12q24, and 5q14) influence the microcirculation in vivo | journal = PLOS Genetics | volume = 6 | issue = 10 | pages = e1001184 | date = October 2010 | pmid = 21060863 | pmc = 2965750 | doi = 10.1371/journal.pgen.1001184 | veditors = McCarthy MI }}</ref>
Not in use-->

<ref name="Johnson_2011">{{cite journal | vauthors = Johnson T, Gaunt TR, Newhouse SJ, Padmanabhan S, Tomaszewski M, Kumari M, Morris RW, Tzoulaki I, O'Brien ET, Poulter NR, Sever P, Shields DC, Thom S, Wannamethee SG, Whincup PH, Brown MJ, Connell JM, Dobson RJ, Howard PJ, Mein CA, Onipinla A, Shaw-Hawkins S, Zhang Y, Davey Smith G, Day IN, Lawlor DA, Goodall AH, Fowkes FG, Abecasis GR, Elliott P, Gateva V, Braund PS, Burton PR, Nelson CP, Tobin MD, van der Harst P, Glorioso N, Neuvrith H, Salvi E, Staessen JA, Stucchi A, Devos N, Jeunemaitre X, Plouin PF, Tichet J, Juhanson P, Org E, Putku M, Sõber S, Veldre G, Viigimaa M, Levinsson A, Rosengren A, Thelle DS, Hastie CE, Hedner T, Lee WK, Melander O, Wahlstrand B, Hardy R, Wong A, Cooper JA, Palmen J, Chen L, Stewart AF, Wells GA, Westra HJ, Wolfs MG, Clarke R, Franzosi MG, Goel A, Hamsten A, Lathrop M, Peden JF, Seedorf U, Watkins H, Ouwehand WH, Sambrook J, Stephens J, Casas JP, Drenos F, Holmes MV, Kivimaki M, Shah S, Shah T, Talmud PJ, Whittaker J, Wallace C, Delles C, Laan M, Kuh D, Humphries SE, Nyberg F, Cusi D, Roberts R, Newton-Cheh C, Franke L, Stanton AV, Dominiczak AF, Farrall M, Hingorani AD, Samani NJ, Caulfield MJ, Munroe PB | display-authors = 6 | title = Blood pressure loci identified with a gene-centric array | journal = American Journal of Human Genetics | volume = 89 | issue = 6 | pages = 688–700 | date = December 2011 | pmid = 22100073 | pmc = 3234370 | doi = 10.1016/j.ajhg.2011.10.013 }}</ref>

<ref name="Kathiresan_2009">{{cite journal | vauthors = Kathiresan S, Willer CJ, Peloso GM, Demissie S, Musunuru K, Schadt EE, Kaplan L, Bennett D, Li Y, Tanaka T, Voight BF, Bonnycastle LL, Jackson AU, Crawford G, Surti A, Guiducci C, Burtt NP, Parish S, Clarke R, Zelenika D, Kubalanza KA, Morken MA, Scott LJ, Stringham HM, Galan P, Swift AJ, Kuusisto J, Bergman RN, Sundvall J, Laakso M, Ferrucci L, Scheet P, Sanna S, Uda M, Yang Q, Lunetta KL, ], de Bakker PI, O'Donnell CJ, Chambers JC, Kooner JS, Hercberg S, Meneton P, Lakatta EG, Scuteri A, Schlessinger D, Tuomilehto J, Collins FS, Groop L, Altshuler D, Collins R, Lathrop GM, Melander O, Salomaa V, Peltonen L, Orho-Melander M, Ordovas JM, Boehnke M, Abecasis GR, Mohlke KL, Cupples LA | display-authors = 6 | title = Common variants at 30 loci contribute to polygenic dyslipidemia | journal = Nature Genetics | volume = 41 | issue = 1 | pages = 56–65 | date = January 2009 | pmid = 19060906 | pmc = 2881676 | doi = 10.1038/ng.291 }}</ref>

<ref name="Strawbridge_2011">{{cite journal | vauthors = Strawbridge RJ, ], Prokopenko I, Barker A, Ahlqvist E, Rybin D, Petrie JR, Travers ME, Bouatia-Naji N, Dimas AS, Nica A, Wheeler E, Chen H, Voight BF, Taneera J, Kanoni S, Peden JF, Turrini F, Gustafsson S, Zabena C, Almgren P, Barker DJ, Barnes D, Dennison EM, Eriksson JG, Eriksson P, Eury E, Folkersen L, Fox CS, Frayling TM, Goel A, Gu HF, Horikoshi M, Isomaa B, Jackson AU, Jameson KA, Kajantie E, Kerr-Conte J, Kuulasmaa T, Kuusisto J, Loos RJ, Luan J, Makrilakis K, Manning AK, Martínez-Larrad MT, Narisu N, Nastase Mannila M, Ohrvik J, Osmond C, Pascoe L, Payne F, Sayer AA, Sennblad B, Silveira A, Stancáková A, Stirrups K, Swift AJ, Syvänen AC, Tuomi T, van 't Hooft FM, Walker M, Weedon MN, Xie W, Zethelius B, Ongen H, Mälarstig A, Hopewell JC, Saleheen D, Chambers J, Parish S, Danesh J, Kooner J, Ostenson CG, Lind L, Cooper CC, Serrano-Ríos M, Ferrannini E, Forsen TJ, Clarke R, Franzosi MG, Seedorf U, Watkins H, Froguel P, Johnson P, Deloukas P, Collins FS, Laakso M, Dermitzakis ET, Boehnke M, McCarthy MI, Wareham NJ, Groop L, Pattou F, Gloyn AL, Dedoussis GV, Lyssenko V, Meigs JB, Barroso I, Watanabe RM, Ingelsson E, Langenberg C, Hamsten A, Florez JC | display-authors = 6 | title = Genome-wide association identifies nine common variants associated with fasting proinsulin levels and provides new insights into the pathophysiology of type 2 diabetes | journal = Diabetes | volume = 60 | issue = 10 | pages = 2624–34 | date = October 2011 | pmid = 21873549 | pmc = 3178302 | doi = 10.2337/db11-0415 }}</ref>

}}

== External links ==
{{Commons category|Genome-wide association studies}}
* {{Dead link|date=December 2022 |bot=InternetArchiveBot |fix-attempted=yes }}
* — a central database of summary-level genetic association findings
* {{cite web | vauthors = Barrett J |url=http://www.genomesunzipped.org/2010/07/how-to-read-a-genome-wide-association-study.php |title=How to read a genome-wide association study |publisher=Genomes Unzipped |date=18 July 2010}}
* {{Webarchive|url=https://web.archive.org/web/20180226002915/http://www.wikigenes.org/e/art/e/185.html |date=26 February 2018 }} — by Bennett SN, Caporaso, NE, ''et al.''
* — whole genome association analysis toolset
* Impact of functional information on understanding variation. ]

{{Personal genomics}}

{{Good article}}

{{DEFAULTSORT:Genome-Wide Association Study}}


Study of genetic variants in different individuals

In genomics, a genome-wide association study (GWA study, or GWAS) is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases, but can equally be applied to any other genetic variants and any other organisms.

Manhattan plot of a GWAS
An illustration of a Manhattan plot depicting several strongly associated risk loci. Each dot represents a SNP, with the X-axis showing genomic location and Y-axis showing association level. This example is taken from a GWA study investigating kidney stone disease, so the peaks indicate genetic variants that are found more often in individuals with kidney stones.

When applied to human data, GWA studies compare the DNA of participants having varying phenotypes for a particular trait or disease. These participants may be people with a disease (cases) and similar people without the disease (controls), or they may be people with different phenotypes for a particular trait, for example blood pressure. This approach is known as phenotype-first, in which the participants are classified first by their clinical manifestation(s), as opposed to genotype-first. Each person gives a sample of DNA, from which millions of genetic variants are read using SNP arrays. If there is significant statistical evidence that one type of the variant (one allele) is more frequent in people with the disease, the variant is said to be associated with the disease. The associated SNPs are then considered to mark a region of the human genome that may influence the risk of disease.

GWA studies investigate the entire genome, in contrast to methods that specifically test a small number of pre-specified genetic regions. Hence, GWAS is a non-candidate-driven approach, in contrast to gene-specific candidate-driven studies. GWA studies identify SNPs and other variants in DNA associated with a disease, but they cannot on their own specify which genes are causal.

The first successful GWAS, published in 2002, studied myocardial infarction. This study design was then implemented in the landmark 2005 GWA study investigating patients with age-related macular degeneration, which found two SNPs with significantly altered allele frequency compared to healthy controls. As of 2017, over 3,000 human GWA studies had examined over 1,800 diseases and traits, and thousands of SNP associations had been found. Except in the case of rare genetic diseases, these associations are very weak. Although each individual association may not explain much of the risk, the associations provide insight into critical genes and pathways and can be important when considered in aggregate.

Background

GWA studies typically identify common variants with small effect sizes (lower right).

Any two human genomes differ in millions of different ways. There are small variations in the individual nucleotides of the genomes (SNPs) as well as many larger variations, such as deletions, insertions and copy number variations. Any of these may cause alterations in an individual's traits, or phenotype, which can be anything from disease risk to physical properties such as height. Around the year 2000, prior to the introduction of GWA studies, the primary method of investigation was through inheritance studies of genetic linkage in families. This approach had proven highly useful for single-gene disorders. However, for common and complex diseases the results of genetic linkage studies proved hard to reproduce. A suggested alternative to linkage studies was the genetic association study. This study type asks whether an allele of a genetic variant is found more often than expected in individuals with the phenotype of interest (e.g. with the disease being studied). Early calculations of statistical power indicated that this approach could be better than linkage studies at detecting weak genetic effects.

In addition to the conceptual framework, several other factors enabled GWA studies. One was the advent of biobanks, which are repositories of human genetic material that greatly reduced the cost and difficulty of collecting sufficient numbers of biological specimens for study. Another was the International HapMap Project, which, from 2003, identified a majority of the common SNPs interrogated in a GWA study. The haploblock structure identified by the HapMap project also allowed studies to focus on the subset of SNPs that would describe most of the variation. The development of methods to genotype all these SNPs using genotyping arrays was also an important prerequisite.

Methods

Example calculation illustrating the methodology of a case-control GWA study. The allele count of each measured SNP is evaluated—in this case with a chi-squared test—to identify variants associated with the trait in question. The numbers in this example are taken from a 2007 study of coronary artery disease (CAD) that showed that the individuals with the G-allele of SNP1 (rs1333049) were overrepresented amongst CAD-patients.
Illustration of a simulated genotype by phenotype regression for a single SNP. Each dot represents an individual. A GWAS of a continuous trait essentially consists of repeating this analysis at each SNP.

The most common approach of GWA studies is the case-control setup, which compares two large groups of individuals, one healthy control group and one case group affected by a disease. All individuals in each group are typically genotyped at common known SNPs. The exact number of SNPs depends on the genotyping technology, but is typically one million or more. For each of these SNPs it is then investigated whether the allele frequency is significantly altered between the case and the control group. In such setups, the fundamental unit for reporting effect sizes is the odds ratio. The odds ratio is the ratio of two odds, which in the context of GWA studies are the odds of being a case for individuals having a specific allele and the odds of being a case for individuals who do not have that same allele.

Example: suppose that there are two alleles, T and C. The number of individuals in the case group having allele T is represented by 'A' and the number of individuals in the control group having allele T is represented by 'B'. Similarly, the number of individuals in the case group having allele C is represented by 'X' and the number of individuals in the control group having allele C is represented by 'Y'. In this case the odds ratio for allele T is A:B (meaning 'A to B', in standard odds terminology) divided by X:Y, which in mathematical notation is simply (A/B)/(X/Y).
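The odds ratio defined above can be sketched in a few lines of Python. The function name and the counts below are hypothetical and chosen only for illustration; they are not taken from any study.

```python
def allele_odds_ratio(a, b, x, y):
    """Odds ratio for allele T, computed as (A/B) / (X/Y).

    a: individuals in the case group having allele T
    b: individuals in the control group having allele T
    x: individuals in the case group having allele C
    y: individuals in the control group having allele C
    """
    if b <= 0 or x <= 0 or y <= 0:
        raise ValueError("b, x and y must be positive to form the ratio")
    return (a / b) / (x / y)


# Hypothetical counts, for illustration only.
odds_ratio = allele_odds_ratio(a=600, b=400, x=400, y=600)
print(f"odds ratio for allele T: {odds_ratio:.2f}")
```

With these made-up counts, allele T is more common among cases than among controls, so the odds ratio is above 1.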

When the allele frequency in the case group is much higher than in the control group, the odds ratio is higher than 1, and vice versa for lower allele frequency. Additionally, a P-value for the significance of the odds ratio is typically calculated using a simple chi-squared test. Finding odds ratios that are significantly different from 1 is the objective of the GWA study because this shows that a SNP is associated with disease. Because so many variants are tested, it is standard practice to require the P-value to be lower than 5×10⁻⁸ to consider a variant significant.
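As a minimal sketch of the chi-squared test mentioned above, the following Python computes the Pearson statistic for a 2×2 table of allele counts and converts it to a P-value. It assumes one degree of freedom and no continuity correction, and the counts are hypothetical. The conversion uses the identity that the upper tail of a chi-squared distribution with one degree of freedom equals erfc(√(χ²/2)).

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # conventional genome-wide significance level


def chi_squared_2x2(a, b, x, y):
    """Pearson chi-squared statistic for a 2x2 allele-count table.

    Rows are alleles (T: a, b; C: x, y); columns are case and control.
    """
    n = a + b + x + y
    margins = (a + b) * (x + y) * (a + x) * (b + y)
    if margins == 0:
        raise ValueError("every row and column needs a non-zero total")
    return n * (a * y - b * x) ** 2 / margins


def p_value_1df(chi2):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts, for illustration only.
chi2 = chi_squared_2x2(a=600, b=400, x=400, y=600)
p_value = p_value_1df(chi2)
print(f"chi-squared = {chi2:.1f}, P = {p_value:.2e}")
print("genome-wide significant:", p_value < GENOME_WIDE_THRESHOLD)
```

In practice this calculation is repeated for every tested SNP, which is why the stringent threshold is needed.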

There are several variations on the case-control approach. A common alternative to case-control GWA studies is the analysis of quantitative phenotypic data, e.g. height, biomarker concentrations or even gene expression. Likewise, alternative statistics designed for dominant or recessive penetrance patterns can be used. Calculations are typically done using bioinformatics software such as SNPTEST and PLINK, which also include support for many of these alternative statistics. GWAS focuses on the effect of individual SNPs. However, it is also possible that complex interactions among two or more SNPs (epistasis) contribute to complex diseases. Due to the potentially exponential number of interactions, detecting statistically significant interactions in GWAS data is both computationally and statistically challenging. This task has been tackled in existing publications that use algorithms inspired by data mining. Moreover, researchers try to integrate GWA data with other biological data, such as protein-protein interaction networks, to extract more informative results. Despite the previously perceived challenge posed by the vast number of SNP combinations, a 2024 study reported complete epistatic maps at gene-level resolution in the plant Arabidopsis thaliana.

Full 2D epistatic interaction maps point to epistatic signal
Zoom in a full epistatic map for an Arabidopsis phenotype
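For the quantitative-trait analysis described above, a single-SNP test amounts to regressing the phenotype on the genotype. The sketch below fits an ordinary least-squares line, assuming an additive coding in which each genotype is the count (0, 1 or 2) of one allele. The coding, the function name and the data are illustrative assumptions, not the procedure of any particular software package.

```python
def snp_regression(dosages, phenotypes):
    """Least-squares slope and intercept of phenotype on allele dosage (0, 1 or 2)."""
    if len(dosages) != len(phenotypes) or not dosages:
        raise ValueError("need one phenotype per genotyped individual")
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP shows no variation in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx  # estimated change in phenotype per extra allele copy
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Simulated data for six individuals, for illustration only.
dosages = [0, 0, 1, 1, 2, 2]
phenotypes = [1.0, 1.2, 1.6, 1.8, 2.1, 2.3]
slope, intercept = snp_regression(dosages, phenotypes)
print(f"effect per allele: {slope:.3f}, intercept: {intercept:.3f}")
```

A GWAS of a continuous trait repeats a fit of this kind at each SNP and tests whether the slope differs from zero.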

A key step in the majority of GWA studies is the imputation of genotypes at SNPs not on the genotype chip used in the study. This process greatly increases the number of SNPs that can be tested for association, increases the power of the study, and facilitates meta-analysis of GWAS across distinct cohorts. Genotype imputation is carried out by statistical methods that combine the genotype data with a reference panel of haplotypes, which typically have been densely genotyped using whole-genome sequencing. These methods take advantage of the sharing of haplotypes between individuals over short stretches of sequence to impute alleles. Existing software packages for genotype imputation include IMPUTE2, Minimac, Beagle and MaCH.

In addition to the calculation of association, it is common to take into account any variables that could potentially confound the results. Sex, age, and ancestry are common examples of confounding variables. Moreover, it is also known that many genetic variations are associated with the geographical and historical populations in which the mutations first arose. Because of this association, studies must take account of the geographic and ethnic background of participants by controlling for what is called population stratification. Otherwise, the studies can produce false positive results.
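The effect of population stratification can be illustrated with a toy example. In the synthetic counts below, the allele has no association with disease within either of two hypothetical populations, yet pooling them produces a large odds ratio because allele frequency and disease frequency both differ between the populations. The Mantel-Haenszel estimator is used here as one simple stratified analysis; it is an illustrative choice, and large studies more commonly adjust for ancestry in other ways.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a table (case_T, control_T, case_C, control_C)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across strata of 2x2 tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Synthetic counts (case_T, control_T, case_C, control_C) per population.
strata = [
    (800, 160, 200, 40),   # population 1: allele T common, many cases
    (40, 200, 160, 800),   # population 2: allele T rare, few cases
]

per_stratum = [odds_ratio(*table) for table in strata]
pooled_table = tuple(sum(column) for column in zip(*strata))
naive_or = odds_ratio(*pooled_table)
adjusted_or = mantel_haenszel_or(strata)

print("odds ratio within each population:", per_stratum)
print(f"naive pooled odds ratio: {naive_or:.2f}")
print(f"stratum-adjusted odds ratio: {adjusted_or:.2f}")
```

The naive pooled estimate suggests a strong association, while the stratified estimate shows none, which is the kind of false positive that controlling for stratification is meant to prevent.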

After odds ratios and P-values have been calculated for all SNPs, a common approach is to create a Manhattan plot. In the context of GWA studies, this plot shows the negative logarithm of the P-value as a function of genomic location. Thus the SNPs with the most significant association stand out on the plot, usually as stacks of points because of haploblock structure. Importantly, the P-value threshold for significance is corrected for multiple testing issues. The exact threshold varies by study, but the conventional genome-wide significance threshold is 5×10⁻⁸, chosen so that a result remains significant in the face of hundreds of thousands to millions of tested SNPs. GWA studies typically perform the first analysis in a discovery cohort, followed by validation of the most significant SNPs in an independent validation cohort.
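The quantities behind a Manhattan plot can be sketched as follows. The SNP names, positions and P-values are hypothetical. The Bonferroni calculation is shown only as one common way of motivating the conventional threshold: a 0.05 significance level divided by roughly one million independent tests gives 5×10⁻⁸.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # conventional genome-wide significance level


def manhattan_height(p_value):
    """Y-axis value of a Manhattan plot: the negative base-10 log of the P-value."""
    return -math.log10(p_value)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical results: (SNP name, chromosome, position, P-value).
results = [
    ("snp_1", 1, 1_200_345, 0.21),
    ("snp_2", 9, 22_125_503, 3.0e-12),
    ("snp_3", 19, 11_200_000, 4.0e-6),
]

heights = {snp: manhattan_height(p) for snp, _chrom, _pos, p in results}
hits = [snp for snp, _chrom, _pos, p in results if p < GENOME_WIDE_THRESHOLD]

print("plot heights:", {snp: round(h, 2) for snp, h in heights.items()})
print("genome-wide significant:", hits)
print("Bonferroni level for 1,000,000 tests:", bonferroni_threshold(0.05, 1_000_000))
```

Plotting each height against chromosome and position, with a horizontal line at the threshold height, gives the familiar figure.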

Results

Regional association plot, showing individual SNPs in the LDL receptor region and their association to LDL-cholesterol levels. This type of plot is similar to the Manhattan plot in the lead section, but for a more limited section of the genome. The haploblock structure is visualized with colour scale and the association level is given by the left Y-axis. The dot representing the rs73015013 SNP (in the top-middle) has a high Y-axis location because this SNP explains some of the variation in LDL-cholesterol.
Relationship between the minor allele frequency and the effect size of genome wide significant variants in a GWAS of height.

Attempts have been made at creating comprehensive catalogues of SNPs that have been identified from GWA studies. As of 2009, SNPs associated with diseases are numbered in the thousands.

One of the first GWA studies, conducted in 2005, compared 96 patients with age-related macular degeneration (ARMD) with 50 healthy controls. It identified two SNPs with significantly altered allele frequency between the two groups. These SNPs were located in the gene encoding complement factor H, which was an unexpected finding in the research of ARMD. The findings from these first GWA studies have subsequently prompted further functional research towards therapeutic manipulation of the complement system in ARMD.

Another landmark publication in the history of GWA studies was the Wellcome Trust Case Control Consortium (WTCCC) study, the largest GWA study ever conducted at the time of its publication in 2007. The WTCCC included 14,000 cases of seven common diseases (~2,000 individuals for each of coronary heart disease, type 1 diabetes, type 2 diabetes, rheumatoid arthritis, Crohn's disease, bipolar disorder, and hypertension) and 3,000 shared controls. This study was successful in uncovering many genes associated with these diseases.

Since these first landmark GWA studies, there have been two general trends. One has been towards larger and larger sample sizes. By 2018, several genome-wide association studies had reached a total sample size of over 1 million participants, including 1.1 million in a genome-wide study of educational attainment, followed by another in 2022 with 3 million individuals, and a study of insomnia containing 1.3 million individuals. The reason is the drive towards reliably detecting risk-SNPs that have smaller effect sizes and lower allele frequency. Another trend has been towards the use of more narrowly defined phenotypes, such as blood lipids, proinsulin or similar biomarkers. These are called intermediate phenotypes, and their analyses may be of value to functional research into biomarkers.

A variation of GWAS uses participants that are first-degree relatives of people with a disease. This type of study has been named genome-wide association study by proxy (GWAX).

A central point of debate on GWA studies has been that most of the SNP variations found by GWA studies are associated with only a small increased risk of the disease, and have only a small predictive value. The median odds ratio is 1.33 per risk-SNP, with only a few showing odds ratios above 3.0. These magnitudes are considered small because they do not explain much of the heritable variation. This heritable variation is estimated from heritability studies based on monozygotic twins. For example, it is known that 40% of variance in depression can be explained by hereditary differences, but GWA studies only account for a minority of this variance.

Clinical applications and examples

A challenge for future GWA studies is to apply the findings in a way that accelerates drug and diagnostics development, including better integration of genetic studies into the drug-development process and a focus on the role of genetic variation in maintaining health as a blueprint for designing new drugs and diagnostics. Several studies have looked into the use of risk-SNP markers as a means of directly improving the accuracy of prognosis. Some have found that the accuracy of prognosis improves, while others report only minor benefits from this use. Generally, a problem with this direct approach is the small magnitudes of the effects observed. A small effect ultimately translates into a poor separation of cases and controls and thus only a small improvement of prognosis accuracy. An alternative application is therefore the potential for GWA studies to elucidate pathophysiology.

Hepatitis C treatment

One such success is the identification of the genetic variant associated with response to anti-hepatitis C virus treatment. For genotype 1 hepatitis C treated with pegylated interferon-alpha-2a or pegylated interferon-alpha-2b combined with ribavirin, a GWA study has shown that SNPs near the human IL28B gene, encoding interferon lambda 3, are associated with significant differences in response to the treatment. A later report demonstrated that the same genetic variants are also associated with the natural clearance of the genotype 1 hepatitis C virus. These major findings facilitated the development of personalized medicine and allowed physicians to customize medical decisions based on the patient's genotype.

eQTL, LDL and cardiovascular disease

The goal of elucidating pathophysiology has also led to increased interest in the association between risk-SNPs and the gene expression of nearby genes, the so-called expression quantitative trait loci (eQTL) studies. The reason is that GWA studies identify risk-SNPs, but not risk-genes, and specification of genes is one step closer towards actionable drug targets. As a result, major GWA studies by 2011 typically included extensive eQTL analysis. One of the strongest eQTL effects observed for a GWA-identified risk SNP is the SORT1 locus. Functional follow-up studies of this locus using small interfering RNA and gene knock-out mice have shed light on the metabolism of low-density lipoproteins, which has important clinical implications for cardiovascular disease.

Atrial fibrillation

A meta-analysis completed in 2018 identified 70 new loci associated with atrial fibrillation. It identified different variants associated with genes encoding transcription factors, such as TBX3, TBX5, NKX2-5 or PITX2, which are involved in the regulation of cardiac conduction, in ion channel modulation and in cardiac development. It also identified new genes involved in tachycardia (CASQ2) or associated with altered communication between cardiac muscle cells (PKP2).

Schizophrenia

Research using a High-Precision Protein Interaction Prediction (HiPPIP) computational model discovered 504 new protein-protein interactions (PPIs) associated with genes linked to schizophrenia. While the evidence supporting the genetic basis of schizophrenia is not controversial, one study found that 25 candidate schizophrenia genes discovered from GWAS had little association with schizophrenia, demonstrating that GWAS alone may be insufficient to identify candidate genes.

Pain susceptibility

Researchers conducted a joint analysis of GWAS summary statistics from seventeen pain susceptibility traits in the UK Biobank and revealed 99 genome-wide significant risk loci, of which 34 were new. Using leave-one-trait-out meta-analyses, these loci were grouped into four categories: loci associated with nearly all pain-related traits, loci associated with a single trait, loci associated with multiple forms of skeletomuscular pain, and loci associated with headache-related pain.

Moreover, 664 genes were mapped to the 99 loci by genomic proximity, eQTLs and chromatin interaction, and 15% of these genes showed differential expression in individuals with acute or chronic pain compared to healthy individuals.

Conservation applications

Population-level GWA studies may be used to identify adaptive genes to help evaluate the ability of species to adapt to changing environmental conditions as the global climate becomes warmer. This could help determine extirpation risk for species and could therefore be an important tool for conservation planning. Utilizing GWA studies to determine adaptive genes could help elucidate the relationship between neutral and adaptive genetic diversity.

Agricultural applications

Plant growth stages and yield components

GWA studies are an important tool in plant breeding. With large genotyping and phenotyping data, GWAS are powerful in analyzing complex inheritance modes of traits that are important yield components, such as number of grains per spike, weight of each grain and plant structure. In a study of spring wheat, GWAS revealed a strong correlation of grain production with booting data, biomass and number of grains per spike. GWA studies have also been successful in studying the genetic architecture of complex traits in rice.

Plant pathogens

The emergence of plant pathogens has posed serious threats to plant health and biodiversity. For this reason, identification of wild types that have natural resistance to certain pathogens could be of vital importance. Furthermore, it is necessary to predict which alleles are associated with the resistance. GWA studies are a powerful tool for detecting relationships between certain variants and resistance to a plant pathogen, which is beneficial for developing new pathogen-resistant cultivars.

Chicken

The first GWA study in chickens was done by Abasht and Lamont in 2007. This GWA study examined the fatness trait in a previously described F2 population. Significantly associated SNPs were found on 10 chromosomes (1, 2, 3, 4, 7, 8, 10, 12, 15 and 27).

Limitations

GWA studies have several issues and limitations that can be addressed through proper quality control and study setup. Lack of well-defined case and control groups, insufficient sample size, and inadequate control for population stratification are common problems. On the statistical issue of multiple testing, it has been noted that "the GWA approach can be problematic because the massive number of statistical tests performed presents an unprecedented potential for false-positive results". This is why modern GWAS use a very low P-value threshold. In addition to easily correctable problems such as these, some more subtle but important issues have surfaced. A high-profile GWA study that investigated individuals with very long life spans to identify SNPs associated with longevity is an example of this. The publication came under scrutiny because of a discrepancy between the type of genotyping array in the case and control group, which caused several SNPs to be falsely highlighted as associated with longevity. The study was subsequently retracted, but a modified manuscript was later published. Many GWAS now control for genotyping array. As with any confounder, substantial differences between groups in the type of genotyping array can lead to false positives. Another limitation of array-based studies is that they are unable to detect the contribution of very rare mutations that are neither included in the array nor able to be imputed.

Additionally, GWA studies identify candidate risk variants for the population in which the analysis is performed, and with most GWA studies historically drawing on European databases, the identified risk variants often do not translate to non-European populations. For instance, GWA studies of diseases like Alzheimer's disease have been conducted primarily in Caucasian populations, which does not give adequate insight into other ethnic populations, including African Americans or East Asians. Alternative strategies that have been suggested involve linkage analysis. More recently, the rapidly decreasing price of complete genome sequencing has also provided a realistic alternative to genotyping array-based GWA studies. High-throughput sequencing has the potential to side-step some of the shortcomings of non-sequencing GWA studies. Cross-trait assortative mating can inflate estimates of genetic phenotype similarity.

Fine-mapping

Genotyping arrays designed for GWAS rely on linkage disequilibrium to provide coverage of the entire genome by genotyping a subset of variants. Because of this, the reported associated variants are unlikely to be the actual causal variants. Associated regions can contain hundreds of variants spanning large regions and encompassing many different genes, making the biological interpretation of GWAS loci more difficult. Fine-mapping is a process to refine these lists of associated variants to a credible set most likely to include the causal variant.

Fine-mapping requires all variants in the associated region to have been genotyped or imputed (dense coverage), very stringent quality control resulting in high-quality genotypes, and sample sizes large enough to separate out highly correlated signals. There are several different methods to perform fine-mapping, and all methods produce a posterior probability that a variant in that locus is causal. Because the requirements are often difficult to satisfy, there are still limited examples of these methods being more generally applied.
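Given per-variant posterior probabilities, a credible set can be assembled by ranking variants and accumulating probability until a target coverage is reached. The sketch below assumes, for simplicity, a single causal variant per locus, so that the posteriors sum to 1, and a 95% coverage target. Both are illustrative assumptions, and the variant names and probabilities are made up. How the posteriors themselves are computed differs between fine-mapping methods and is not shown.

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest set of top-ranked variants whose posteriors sum to at least `coverage`.

    posteriors: dict mapping variant name to posterior probability of being causal.
    Returns the chosen variant names and the probability they cover.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for variant, probability in ranked:
        chosen.append(variant)
        total += probability
        if total >= coverage:
            break
    return chosen, total


# Hypothetical posteriors for five variants in one associated region.
posteriors = {"var1": 0.60, "var2": 0.25, "var3": 0.08, "var4": 0.04, "var5": 0.03}
variants, covered = credible_set(posteriors)
print("95% credible set:", variants)
```

When many variants are highly correlated, the posteriors are spread thinly and the credible set stays large, which is one reason dense coverage and large samples matter.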

See also

References

  1. Manolio TA (July 2010). "Genomewide association studies and assessment of the risk of disease". The New England Journal of Medicine. 363 (2): 166–76. doi:10.1056/NEJMra0905980. PMID 20647212.
  2. Pearson TA, Manolio TA (March 2008). "How to interpret a genome-wide association study". JAMA. 299 (11): 1335–44. doi:10.1001/jama.299.11.1335. PMID 18349094.
  3. "Genome-Wide Association Studies". National Human Genome Research Institute.
  4. Ozaki K, Ohnishi Y, Iida A, Sekine A, Yamada R, Tsunoda T, et al. (December 2002). "Functional SNPs in the lymphotoxin-alpha gene that are associated with susceptibility to myocardial infarction". Nature Genetics. 32 (4): 650–4. doi:10.1038/ng1047. PMID 12426569. S2CID 21414260.
  5. Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, et al. (April 2005). "Complement factor H polymorphism in age-related macular degeneration". Science. 308 (5720): 385–9. Bibcode:2005Sci...308..385K. doi:10.1126/science.1109557. PMC 1512523. PMID 15761122.
  6. "GWAS Catalog: The NHGRI-EBI Catalog of published genome-wide association studies". European Molecular Biology Laboratory. Retrieved 18 April 2017.
  7. Bush WS, Moore JH (2012). Lewitter F, Kann M (eds.). "Chapter 11: Genome-wide association studies". PLOS Computational Biology. 8 (12): e1002822. Bibcode:2012PLSCB...8E2822B. doi:10.1371/journal.pcbi.1002822. PMC 3531285. PMID 23300413.
  8. Strachan T, Read A (2011). Human Molecular Genetics (4th ed.). Garland Science. pp. 467–495. ISBN 978-0-8153-4149-9.
  9. "Online Mendelian Inheritance in Man". Archived from the original on 5 December 2011. Retrieved 6 December 2011.
  10. Altmüller J, Palmer LJ, Fischer G, Scherb H, Wjst M (November 2001). "Genomewide scans of complex human diseases: true linkage is hard to find". American Journal of Human Genetics. 69 (5): 936–50. doi:10.1086/324069. PMC 1274370. PMID 11565063.
  11. Risch N, Merikangas K (September 1996). "The future of genetic studies of complex human diseases". Science. 273 (5281): 1516–7. Bibcode:1996Sci...273.1516R. doi:10.1126/science.273.5281.1516. PMID 8801636. S2CID 5228523.
  12. Greely HT (2007). "The uneasy ethical and legal underpinnings of large-scale genomic biobanks". Annual Review of Genomics and Human Genetics. 8: 343–64. doi:10.1146/annurev.genom.7.080505.115721. PMID 17550341.
  13. The International HapMap Project, Gibbs RA, Belmont JW, Hardenbol P, Willis TD, Yu F, Yang H, Ch'Ang LY, Huang W (December 2003). "The International HapMap Project" (PDF). Nature. 426 (6968): 789–96. Bibcode:2003Natur.426..789G. doi:10.1038/nature02168. hdl:2027.42/62838. PMID 14685227. S2CID 4387110.
  14. Schena M, Shalon D, Davis RW, Brown PO (October 1995). "Quantitative monitoring of gene expression patterns with a complementary DNA microarray". Science. 270 (5235): 467–70. Bibcode:1995Sci...270..467S. doi:10.1126/science.270.5235.467. PMID 7569999. S2CID 6720459.
  15. Wellcome Trust Case Control Consortium, Burton PR (June 2007). "Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls". Nature. 447 (7145): 661–78. Bibcode:2007Natur.447..661B. doi:10.1038/nature05911. PMC 2719288. PMID 17554300.
  16. Clarke GM, Anderson CA, Pettersson FH, Cardon LR, Morris AP, Zondervan KT (February 2011). "Basic statistical analysis in genetic case-control studies". Nature Protocols. 6 (2): 121–33. doi:10.1038/nprot.2010.182. PMC 3154648. PMID 21293453.
  17. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. (September 2007). "PLINK: a tool set for whole-genome association and population-based linkage analyses". American Journal of Human Genetics. 81 (3): 559–75. doi:10.1086/519795. PMC 1950838. PMID 17701901.
  18. Llinares-López F, Grimm DG, Bodenham DA, Gieraths U, Sugiyama M, Rowan B, Borgwardt K (June 2015). "Genome-wide detection of intervals of genetic heterogeneity associated with complex traits". Bioinformatics. 31 (12): i240-9. doi:10.1093/bioinformatics/btv263. PMC 4559912. PMID 26072488.
  19. Ayati M, Erten S, Chance MR, Koyutürk M (December 2015). "MOBAS: identification of disease-associated protein subnetworks using modularity-based scoring". EURASIP Journal on Bioinformatics & Systems Biology. 2015 (1): 7. doi:10.1186/s13637-015-0025-6. PMC 5270451. PMID 28194175.
  20. Ayati M, Koyutürk M (1 January 2015). "Assessing the Collective Disease Association of Multiple Genomic Loci". Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics. BCB '15. New York, NY, USA: ACM. pp. 376–385. doi:10.1145/2808719.2808758. ISBN 978-1-4503-3853-0. S2CID 5942777.
  21. Carré C, Carluer JB, Chaux C, Estoup-Streiff C, Roche N, Hosy E, Mas A, Krouk G (March 2024). "Next-Gen GWAS: full 2D epistatic interaction maps retrieve part of missing heritability and improve phenotypic prediction". Genome Biology. 25 (1): 76. doi:10.1186/s13059-024-03202-0. PMID 38523316.
  22. Carré C, Carluer JB, Chaux C, Estoup-Streiff C, Roche N, Hosy E, Mas A, Krouk G (25 March 2024). "Next-Gen GWAS: full 2D epistatic interaction maps retrieve part of missing heritability and improve phenotypic prediction". Genome Biology. 25 (1): 76. doi:10.1186/s13059-024-03202-0. ISSN 1474-760X. PMC 10962106. PMID 38523316.
  23. Marchini J, Howie B (July 2010). "Genotype imputation for genome-wide association studies". Nature Reviews Genetics. 11 (7): 499–511. doi:10.1038/nrg2796. PMID 20517342. S2CID 1465707.
  24. Howie B, Marchini J, Stephens M (November 2011). "Genotype imputation with thousands of genomes". G3. 1 (6): 457–70. doi:10.1534/g3.111.001198. PMC 3276165. PMID 22384356.
  25. Browning BL, Browning SR (February 2009). "A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals". American Journal of Human Genetics. 84 (2): 210–23. doi:10.1016/j.ajhg.2009.01.005. PMC 2668004. PMID 19200528.
  26. Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR (December 2010). "MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes". Genetic Epidemiology. 34 (8): 816–34. doi:10.1002/gepi.20533. PMC 3175618. PMID 21058334.
  27. Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, Indap A, King KS, Bergmann S, Nelson MR, Stephens M, Bustamante CD (November 2008). "Genes mirror geography within Europe". Nature. 456 (7218): 98–101. Bibcode:2008Natur.456...98N. doi:10.1038/nature07331. PMC 2735096. PMID 18758442.
  28. Charney E (January 2017). "Genes, behavior, and behavior genetics". Wiley Interdisciplinary Reviews. Cognitive Science. 8 (1–2): e1405. doi:10.1002/wcs.1405. hdl:10161/13337. PMID 27906529.
  29. Wittkowski KM, Sonakya V, Bigio B, Tonn MK, Shic F, Ascano M, Nasca C, Gold-Von Simson G (January 2014). "A novel computational biostatistics approach implies impaired dephosphorylation of growth factor receptors as associated with severity of autism". Translational Psychiatry. 4 (1): e354. doi:10.1038/tp.2013.124. PMC 3905234. PMID 24473445.
  30. Barsh GS, Copenhaver GP, Gibson G, Williams SM (July 2012). "Guidelines for genome-wide association studies". PLOS Genetics. 8 (7): e1002812. doi:10.1371/journal.pgen.1002812. PMC 3390399. PMID 22792080.
  31. Smith SM, Douaud G, Chen W, Hanayik T, Alfaro-Almagro F, Sharp K, Elliott LT (2021). "An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank". Nature Neuroscience. 24 (5): 737–745. doi:10.1038/s41593-021-00826-4. PMC 7610742. PMID 33875891.
  32. Sanna S, Li B, Mulas A, Sidore C, Kang HM, Jackson AU, et al. (July 2011). Gibson G (ed.). "Fine mapping of five loci associated with low-density lipoprotein cholesterol detects variants that double the explained heritability". PLOS Genetics. 7 (7): e1002198. doi:10.1371/journal.pgen.1002198. PMC 3145627. PMID 21829380.
  33. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA (June 2009). "Potential etiologic and functional implications of genome-wide association loci for human diseases and traits". Proceedings of the National Academy of Sciences of the United States of America. 106 (23): 9362–7. Bibcode:2009PNAS..106.9362H. doi:10.1073/pnas.0903103106. PMC 2687147. PMID 19474294.
  34. Johnson AD, O'Donnell CJ (January 2009). "An open access database of genome-wide association results". BMC Medical Genetics. 10: 6. doi:10.1186/1471-2350-10-6. PMC 2639349. PMID 19161620.
  35. Haines JL, Hauser MA, Schmidt S, Scott WK, Olson LM, Gallins P, Spencer KL, Kwan SY, Noureddine M, Gilbert JR, Schnetz-Boutaud N, Agarwal A, Postel EA, Pericak-Vance MA (April 2005). "Complement factor H variant increases the risk of age-related macular degeneration". Science. 308 (5720): 419–21. Bibcode:2005Sci...308..419H. doi:10.1126/science.1110359. PMID 15761120. S2CID 32716116.
  36. Fridkis-Hareli M, Storek M, Mazsaroff I, Risitano AM, Lundberg AS, Horvath CJ, Holers VM (October 2011). "Design and development of TT30, a novel C3d-targeted C3/C5 convertase inhibitor for treatment of human complement alternative pathway-mediated diseases". Blood. 118 (17): 4705–13. doi:10.1182/blood-2011-06-359646. PMC 3208285. PMID 21860027.
  37. "Largest ever study of genetics of common diseases published today" (Press release). Wellcome Trust Case Control Consortium. 6 June 2007. Archived from the original on 4 June 2008. Retrieved 19 June 2008.
  38. Ioannidis JP, Thomas G, Daly MJ (May 2009). "Validating, augmenting and refining genome-wide association signals". Nature Reviews Genetics. 10 (5): 318–29. doi:10.1038/nrg2544. PMC 7877552. PMID 19373277. S2CID 6463743.
  39. Lee JJ, Wedow R, Okbay A, Kong E, Maghzian O, Zacher M, Nguyen-Viet TA, Bowers P, Sidorenko J, Karlsson Linnér R, et al. (July 2018). "Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals". Nature Genetics. 50 (8): 1112–1121. doi:10.1038/s41588-018-0147-3. PMC 6393768. PMID 30038396.
  40. Okbay A, Wu Y, Wang N, Jayashankar H, Bennett M, Nehzati SM, et al. (April 2022). "Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals". Nature Genetics. 54 (4): 437–449. doi:10.1038/s41588-022-01016-z. hdl:11368/3026010. PMC 9005349. PMID 35361970.
  41. Jansen PR, Watanabe K, Stringer S, Skene N, Bryois J, Hammerschlag AR, et al. (March 2019). "Genome-wide analysis of insomnia in 1,331,010 individuals identifies new risk loci and functional pathways". Nature Genetics. 51 (3): 394–403. bioRxiv 10.1101/214973. doi:10.1038/s41588-018-0333-3. hdl:1871.1/08af5d9e-8621-41f1-97c5-e77a1063495f. PMID 30804565.
  42. Kathiresan S, Willer CJ, Peloso GM, Demissie S, Musunuru K, Schadt EE, et al. (January 2009). "Common variants at 30 loci contribute to polygenic dyslipidemia". Nature Genetics. 41 (1): 56–65. doi:10.1038/ng.291. PMC 2881676. PMID 19060906.
  43. Strawbridge RJ, Dupuis J, Prokopenko I, Barker A, Ahlqvist E, Rybin D, et al. (October 2011). "Genome-wide association identifies nine common variants associated with fasting proinsulin levels and provides new insights into the pathophysiology of type 2 diabetes". Diabetes. 60 (10): 2624–34. doi:10.2337/db11-0415. PMC 3178302. PMID 21873549.
  44. Danesh J, Pepys MB (November 2009). "C-reactive protein and coronary disease: is there a causal link?". Circulation. 120 (21): 2036–9. doi:10.1161/CIRCULATIONAHA.109.907212. PMID 19901186.
  45. Liu JZ, Erlich Y, Pickrell JK (March 2017). "Case-control association mapping by proxy using family history of disease". Nature Genetics. 49 (3): 325–331. doi:10.1038/ng.3766. PMID 28092683. S2CID 5598845.
  46. Ku CS, Loy EY, Pawitan Y, Chia KS (April 2010). "The pursuit of genome-wide association studies: where are we now?". Journal of Human Genetics. 55 (4): 195–206. doi:10.1038/jhg.2010.19. PMID 20300123.
  47. Maher B (November 2008). "Personal genomes: The case of the missing heritability". Nature. 456 (7218): 18–21. doi:10.1038/456018a. PMID 18987709.
  48. Iadonato SP, Katze MG (September 2009). "Genomics: Hepatitis C virus gets personal". Nature. 461 (7262): 357–8. Bibcode:2009Natur.461..357I. doi:10.1038/461357a. PMID 19759611. S2CID 7602652.
  49. Muehlschlegel JD, Liu KY, Perry TE, Fox AA, Collard CD, Shernan SK, Body SC (September 2010). "Chromosome 9p21 variant predicts mortality after coronary artery bypass graft surgery". Circulation. 122 (11 Suppl): S60–5. doi:10.1161/CIRCULATIONAHA.109.924233. PMC 2943860. PMID 20837927.
  50. Paynter NP, Chasman DI, Paré G, Buring JE, Cook NR, Miletich JP, Ridker PM (February 2010). "Association between a literature-based genetic risk score and cardiovascular events in women". JAMA. 303 (7): 631–7. doi:10.1001/jama.2010.119. PMC 2845522. PMID 20159871.
  51. Couzin-Frankel J (June 2010). "Major heart disease genes prove elusive". Science. 328 (5983): 1220–1. Bibcode:2010Sci...328.1220C. doi:10.1126/science.328.5983.1220. PMID 20522751.
  52. Ge D, Fellay J, Thompson AJ, Simon JS, Shianna KV, Urban TJ, Heinzen EL, Qiu P, Bertelsen AH, Muir AJ, Sulkowski M, McHutchison JG, Goldstein DB (September 2009). "Genetic variation in IL28B predicts hepatitis C treatment-induced viral clearance". Nature. 461 (7262): 399–401. Bibcode:2009Natur.461..399G. doi:10.1038/nature08309. PMID 19684573. S2CID 1707096.
  53. Thomas DL, Thio CL, Martin MP, Qi Y, Ge D, O'Huigin C, Kidd J, Kidd K, Khakoo SI, Alexander G, Goedert JJ, Kirk GD, Donfield SM, Rosen HR, Tobler LH, Busch MP, McHutchison JG, Goldstein DB, Carrington M (October 2009). "Genetic variation in IL28B and spontaneous clearance of hepatitis C virus". Nature. 461 (7265): 798–801. Bibcode:2009Natur.461..798T. doi:10.1038/nature08463. PMC 3172006. PMID 19759533.
  54. Lu YF, Goldstein DB, Angrist M, Cavalleri G (July 2014). "Personalized medicine and human genetic diversity". Cold Spring Harbor Perspectives in Medicine. 4 (9): a008581. doi:10.1101/cshperspect.a008581. PMC 4143101. PMID 25059740.
  55. Folkersen L, van't Hooft F, Chernogubova E, Agardh HE, Hansson GK, Hedin U, Liska J, Syvänen AC, Paulsson-Berne G, Franco-Cereceda A, Hamsten A, Gabrielsen A, Eriksson P (August 2010). "Association of genetic risk variants with expression of proximal genes identifies novel susceptibility genes for cardiovascular disease". Circulation: Cardiovascular Genetics. 3 (4): 365–73. doi:10.1161/CIRCGENETICS.110.948935. PMID 20562444.
  56. Bown MJ, Jones GT, Harrison SC, Wright BJ, Bumpstead S, Baas AF, et al. (November 2011). "Abdominal aortic aneurysm is associated with a variant in low-density lipoprotein receptor-related protein 1". American Journal of Human Genetics. 89 (5): 619–27. doi:10.1016/j.ajhg.2011.10.002. PMC 3213391. PMID 22055160.
  57. Coronary Artery Disease (C4D) Genetics Consortium (March 2011). "A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease". Nature Genetics. 43 (4): 339–44. doi:10.1038/ng.782. PMID 21378988. S2CID 39712343.
  58. Johnson T, Gaunt TR, Newhouse SJ, Padmanabhan S, Tomaszewski M, Kumari M, et al. (December 2011). "Blood pressure loci identified with a gene-centric array". American Journal of Human Genetics. 89 (6): 688–700. doi:10.1016/j.ajhg.2011.10.013. PMC 3234370. PMID 22100073.
  59. Dubé JB, Johansen CT, Hegele RA (June 2011). "Sortilin: an unusual suspect in cholesterol metabolism: from GWAS identification to in vivo biochemical analyses, sortilin has been identified as a novel mediator of human lipoprotein metabolism". BioEssays. 33 (6): 430–7. doi:10.1002/bies.201100003. PMID 21462369.
  60. Bauer RC, Stylianou IM, Rader DJ (April 2011). "Functional validation of new pathways in lipoprotein metabolism identified by human genetics". Current Opinion in Lipidology. 22 (2): 123–8. doi:10.1097/MOL.0b013e32834469b3. PMID 21311327. S2CID 24020035.
  61. Roselli C, Chafin M, Weng L (2018). "Multi-ethnic genome-wide association study for atrial fibrillation". Nature Genetics. 50 (9): 1225–1233. doi:10.1038/s41588-018-0133-9. PMC 6136836. PMID 29892015.
  62. Ganapathiraju MK, Thahir M, Handen A, Sarkar SN, Sweet RA, Nimgaonkar VL, Loscher CE, Bauer EM, Chaparala S (27 April 2016). "Schizophrenia interactome with 504 novel protein-protein interactions". npj Schizophrenia. 2: 16012. doi:10.1038/npjschz.2016.12. PMC 4898894. PMID 27336055.
  63. "New Schizophrenia Study Focuses on Protein-Protein Interactions". psychcentral.com. 3 May 2016. Archived from the original on 11 January 2020. Retrieved 22 April 2023.
  64. Ganapathiraju M, Chaparala S, Lo C (April 2018). "F200. Elucidating The Role of Cilia in Neuropsychiatric Diseases Through Interactome Analysis". Schizophrenia Bulletin. 44 (suppl_1): S298-9. doi:10.1093/schbul/sby017.731. PMC 5887623.
  65. Johnson EC, Border R, Melroy-Greif WE, de Leeuw CA, Ehringer MA, Keller MC (November 2017). "No Evidence That Schizophrenia Candidate Genes Are More Associated With Schizophrenia Than Noncandidate Genes". Biological Psychiatry. 82 (10): 702–708. doi:10.1016/j.biopsych.2017.06.033. PMC 5643230. PMID 28823710.
  66. Mocci E, Ward K, Perry JA, Starkweather A, Stone LS, Schabrun SM, Renn C, Dorsey SG, Ament SA (2023). "Genome wide association joint analysis reveals 99 risk loci for pain susceptibility and pleiotropic relationships with psychiatric, metabolic, and immunological traits". PLOS Genetics. 19 (10): e1010977. doi:10.1371/journal.pgen.1010977. ISSN 1553-7404. PMC 10602383. PMID 37844115.
  67. Willi Y, Kristensen TN, Sgrò CM, Weeks AR, Ørsted M, Hoffmann AA (January 2022). "Conservation genetics as a management tool: The five best-supported paradigms to assist the management of threatened species". Proceedings of the National Academy of Sciences of the United States of America. 119 (1): e2105076119. Bibcode:2022PNAS..11905076W. doi:10.1073/pnas.2105076119. PMC 8740573. PMID 34930821.
  68. Turuspekov Y, Baibulatova A, Yermekbayev K, Tokhetova L, Chudinov V, Sereda G, et al. (November 2017). "GWAS for plant growth stages and yield components in spring wheat (Triticum aestivum L.) harvested in three regions of Kazakhstan". BMC Plant Biology. 17 (Suppl 1): 190. Bibcode:2017BMCPB..17S.190T. doi:10.1186/s12870-017-1131-2. PMC 5688510. PMID 29143598.
  69. Zhao K, Tung CW, Eizenga GC, Wright MH, Ali ML, Price AH, et al. (September 2011). "Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa". Nature Communications. 2 (1): 467. Bibcode:2011NatCo...2..467Z. doi:10.1038/ncomms1467. PMC 3195253. PMID 21915109.
  70. Bartoli C, Roux F (2017). "Genome-Wide Association Studies In Plant Pathosystems: Toward an Ecological Genomics Approach". Frontiers in Plant Science. 8: 763. doi:10.3389/fpls.2017.00763. PMC 5441063. PMID 28588588.
  71. Abasht B, Lamont SJ (October 2007). "Genome-wide association analysis reveals cryptic alleles as an important factor in heterosis for fatness in chicken F2 population". Animal Genetics. 38 (5): 491–498. doi:10.1111/j.1365-2052.2007.01642.x. PMID 17894563.
  72. Sebastiani P, Solovieff N, Puca A, Hartley SW, Melista E, Andersen S, Dworkis DA, Wilk JB, Myers RH, Steinberg MH, Montano M, Baldwin CT, Perls TT (July 2010). "Genetic signatures of exceptional longevity in humans". Science. 2010 (5987). doi:10.1126/science.1190532. PMID 20595579. (Retracted, see doi:10.1126/science.333.6041.404-a, PMID 21778381, Retraction Watch)
  73. MacArthur D (8 July 2010). "Serious flaws revealed in "longevity genes" study". Wired. Retrieved 7 December 2011.
  74. Sebastiani P, Solovieff N, Puca A, Hartley SW, Melista E, Andersen S, Dworkis DA, Wilk JB, Myers RH, Steinberg MH, Montano M, Baldwin CT, Perls TT (July 2011). "Retraction". Science. 333 (6041): 404. doi:10.1126/science.333.6041.404-a. PMID 21778381.
  75. Sebastiani P, Solovieff N, Dewan AT, Walsh KM, Puca A, Hartley SW, Melista E, Andersen S, Dworkis DA, Wilk JB, Myers RH, Steinberg MH, Montano M, Baldwin CT, Hoh J, Perls TT (18 January 2012). "Genetic signatures of exceptional longevity in humans". PLOS ONE. 7 (1): e29848. Bibcode:2012PLoSO...729848S. doi:10.1371/journal.pone.0029848. PMC 3261167. PMID 22279548.
  76. Tam V, Patel N, Turcotte M, Bossé Y, Paré G, Meyre D (May 2019). "Benefits and limitations of genome-wide association studies". Nature Reviews Genetics. 20 (8): 467–484. doi:10.1038/s41576-019-0127-1. ISSN 1471-0056. PMID 31068683. S2CID 148570302.
  77. Rosenberg NA, Huang L, Jewett EM, Szpiech ZA, Jankovic I, Boehnke M (May 2010). "Genome-wide association studies in diverse populations". Nature Reviews Genetics. 11 (5): 356–66. doi:10.1038/nrg2760. PMC 3079573. PMID 20395969.
  78. Sham PC, Cherny SS, Purcell S, Hewitt JK (May 2000). "Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data". American Journal of Human Genetics. 66 (5): 1616–30. doi:10.1086/302891. PMC 1378020. PMID 10762547.
  79. Borecki IB (2006). "Linkage and Association Studies". Encyclopedia of Life Sciences. John Wiley & Sons, Ltd. doi:10.1038/npg.els.0005483. ISBN 9780470015902.
  80. Visscher PM, Goddard ME, Derks EM, Wray NR (May 2012). "Evidence-based psychiatric genetics, AKA the false dichotomy between common and rare variant hypotheses". Molecular Psychiatry. 17 (5): 474–85. doi:10.1038/mp.2011.65. PMID 21670730.
  81. Border R, Athanasiadis G, Buil A, Schork AJ, Cai N, Young AI, Werge T, Flint J, Kendler KS, Sankararaman S, Dahl AW, Zaitlen NA (2022). "Cross-trait assortative mating is widespread and inflates genetic correlation estimates". Science. 378 (6621): 754–761. Bibcode:2022Sci...378..754B. doi:10.1126/science.abo2059. ISSN 0036-8075. PMC 9901291. PMID 36395242.
