Phylogenetic autocorrelation: Difference between revisions

Article snapshot taken from Wikipedia with Creative Commons Attribution-ShareAlike license.
Revision as of 22:27, 26 July 2016 (Douglas R. White; Resources) compared with latest revision as of 14:01, 2 December 2024 (Interaccoonale; Resources: Edit via Wikiplus)
(34 intermediate revisions by 21 users not shown)
{{short description|Problem of drawing inferences from cross-cultural data}}
'''Phylogenetic autocorrelation''', also known as '''Galton's problem''' after Sir Francis Galton, who described it, is the problem of drawing inferences from cross-cultural data, due to the statistical phenomenon now called autocorrelation. The problem is now recognized as a general one that applies to all nonexperimental studies and to some experimental designs as well. It is most simply described as the problem of external dependencies in making statistical estimates when the elements sampled are not statistically independent.
Asking two people in the same household whether they watch TV, for example, does not give you statistically independent answers. The sample size, ''n'', for independent observations in this case is one, not two. Once proper adjustments are made that deal with external dependencies, then the axioms of probability theory concerning statistical independence will apply. These axioms are important for deriving measures of variance, for example, or tests of statistical significance.


==Origin==
In 1888, Galton was present when Sir Edward Tylor presented a paper at the Royal Anthropological Institute. Tylor had compiled information on institutions of marriage and descent for 350 cultures and examined the associations between these institutions and measures of societal complexity. Tylor interpreted his results as indications of a general evolutionary sequence, in which institutions change focus from the maternal line to the paternal line as societies become increasingly complex.
Galton disagreed, pointing out that similarity between cultures could be due to borrowing, to common descent, or to evolutionary development; he maintained that without controlling for borrowing and common descent one cannot make valid inferences regarding evolutionary development. Galton's critique has become the eponymous ''Galton's Problem'',<ref>Stocking, George W. Jr. (1968). "Edward Burnett Tylor." ''International Encyclopedia of the Social Sciences.'' David L. Sills, editor, New York, Macmillan Company: v.16, pp.&nbsp;170–177.</ref>{{rp|175}} as named by Raoul Naroll,<ref>{{Cite journal |author=Raoul Naroll |year=1961 |title=Two solutions to Galton's Problem |journal=Philosophy of Science |volume=28 |pages=15–29 |doi=10.1086/287778 |s2cid=121671403 |author-link=Raoul Naroll }}</ref><ref>{{Cite journal |author=Raoul Naroll |year=1965 |title=Galton's problem: The logic of cross cultural research |journal=Social Research |volume=32 |pages=428–451 |author-link=Raoul Naroll }}</ref> who proposed the first statistical solutions.


By the early 20th century unilineal evolutionism was abandoned, and along with it the drawing of direct inferences from correlations to evolutionary sequences. Galton's criticisms proved equally valid, however, for inferring functional relations from correlations. The problem of autocorrelation remained.


==Solutions==
Statistician William S. Gosset in 1914 developed methods of eliminating spurious correlation due to how position in time or space affects similarities. Today's election polls have a similar problem: the closer the poll to the election, the less independently individuals make up their minds, and the greater the unreliability of the polling results, especially the margin of error or confidence limits. The effective ''n'' of independent cases from their sample drops as the election nears. Statistical significance falls with lower effective sample size.


The problem arises in sample surveys when sociologists want to reduce the travel time to do their interviews, and hence they divide their population into local clusters and sample the clusters randomly, then sample again within the clusters. If they interview ''n'' people in clusters of size ''m'', the effective sample size (efs) would have a lower limit of {{nowrap|1 + (''n'' − 1) / ''m''}} if everyone in each cluster were identical. When there are only partial similarities within clusters, the ''m'' in this formula has to be lowered accordingly. A formula of this sort is {{nowrap|1 + ''d'' (''n'' − 1)}} where ''d'' is the intraclass correlation for the statistic in question.<ref>{{Cite web |url=http://faculty.smu.edu/slstokes/stat3370/deff.pdf |title=Sample Size and Design Effect |access-date=2006-11-01 |archive-url=https://web.archive.org/web/20060414134125/http://faculty.smu.edu/slstokes/stat3370/deff.pdf |archive-date=2006-04-14 |url-status=dead }}</ref> In general, estimation of the appropriate efs depends on the statistic estimated, for example the mean, chi-square, correlation, or regression coefficient, and their variances.


For cross-cultural studies, Murdock and White<ref>{{Cite journal
| journal = Ethnology
| volume = 9
| pages = 329&ndash;369
| url = http://repositories.cdlib.org/imbs/socdyn/wp/Standard_Cross-Cultural_Sample
}}</ref>
estimated the size of patches of similarities in their sample of 186 societies. The four variables they tested – language, economy, political integration, and descent – had patches of similarities that varied from size three to size ten. A very crude rule of thumb might be to divide ''n'' by the square root of the similarity-patch size, so that the effective sample sizes are about 107 and 58 for patches of size three and ten, respectively. Again, statistical significance falls with lower effective sample size.
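
The rule of thumb above is simple arithmetic and can be checked directly. The following is a minimal sketch, not a method taken from Murdock and White; the function name `crude_effective_n` is a hypothetical label chosen for illustration, and the only inputs are the sample size (186) and the patch sizes (three and ten) quoted in the text.

```python
import math


def crude_effective_n(n, patch_size):
    """Crude effective sample size: n divided by the square root of the patch size."""
    return n / math.sqrt(patch_size)


# Sample of 186 societies, similarity patches of size three and size ten.
for patch in (3, 10):
    print(patch, int(crude_effective_n(186, patch)))
```

A patch size of three leaves an effective sample of roughly 107 societies and a patch size of ten roughly 58, so larger patches of similarity cost more independent cases.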


In modern analysis, spatial lags have been modelled in order to estimate the influence of globalization on modern societies.<ref>{{cite journal | author = Jahn, Detlef | year = 2006 | title = Globalization as ''Galton's Problem'': The Missing Link in the Analysis of the Diffusion Patterns in Welfare State Development | url = http://intersci.ss.uci.edu/pw/DetlefJahn2006.pdf | journal = International Organization | volume = 60 | issue = 2 | pages = 401–431 | doi = 10.1017/s0020818306060127 | s2cid = 154976704 }}</ref>


Spatial autocorrelation, or spatial dependence, is a fundamental concept in geography. Methods developed by geographers that measure and control for spatial autocorrelation<ref>Cliff, A.D., and J.K. Ord. 1973. ''Spatial Autocorrelation''. London: Pion Press.</ref><ref>Cliff, A.D. and J.K. Ord. 1981. ''Spatial Processes''. London: Pion Press.</ref> do far more than reduce the effective ''n'' for tests of significance of a correlation. One example is the complicated hypothesis that "the presence of gambling in a society is directly proportional to the presence of a commercial money and to the presence of considerable socioeconomic differences and is inversely related to whether or not the society is a nomadic herding society."<ref>{{cite journal | last = Pryor | first = Frederick | title = The Diffusion Possibility Method: A More General and Simpler Solution to Galton's Problem | journal = American Ethnologist | volume = 3 | issue = 4 | pages = 731–749 | publisher = American Anthropological Association | doi = 10.1525/ae.1976.3.4.02a00100 | year = 1976 }}</ref> Tests of this hypothesis in a sample of 60 societies failed to reject the null hypothesis. Autocorrelation analysis, however, showed a significant effect of socioeconomic differences.<ref>{{Cite journal |author=Malcolm M. Dow, Michael L. Burton, Douglas R. White, and Karl P. Reitz |year=1984 |title=Galton's problem as network autocorrelation |journal=American Ethnologist |volume=11 |issue=4 |pages=754–770 |doi=10.1525/ae.1984.11.4.02a00080 |s2cid=143111431 |url=https://www.researchgate.net/publication/246285163}}</ref>


How prevalent is autocorrelation among the variables studied in cross-cultural research? A test by Anthon Eff on 1700 variables in the cumulative database for the Standard Cross-Cultural Sample, published in ''World Cultures'', measured Moran's I for spatial autocorrelation (distance), linguistic autocorrelation (common descent), and autocorrelation in cultural complexity (mainline evolution). "The results suggest that ... it would be prudent to test for spatial and phylogenetic autocorrelation when conducting regression analyses with the Standard Cross-Cultural Sample."<ref>{{Cite journal |author=E. Anthon Eff |year=2004 |url=http://www.mtsu.edu/~eaeff/downloads/SAS2001.pdf |title=Does Mr. Galton still have a Problem? Autocorrelation in the Standard Cross-Cultural Sample |journal=World Cultures |volume=15 |issue=2 |pages=153–170}}</ref>
The use of autocorrelation tests in exploratory data analysis is illustrated, showing how all variables in a given study can be evaluated for nonindependence of cases in terms of distance, language, and cultural complexity. The methods for estimating these autocorrelation effects are then explained and illustrated for ordinary least squares regression, again using the Moran's I significance measure of autocorrelation.
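
To make the Moran's I statistic mentioned above concrete, here is a minimal sketch of its standard definition, I = (n / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights. This is not Eff's code: the function names and the toy "chain" weight matrix, in which each case is linked only to its immediate neighbours, are assumptions made for illustration. In a real analysis the weights would encode geographic distance, language relatedness, or similarity in cultural complexity.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square weight matrix (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of cases.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    sum_sq = sum(d * d for d in dev)
    return (n / total_weight) * (cross / sum_sq)


def chain_weights(n):
    """Hypothetical weights: case i is a neighbour of cases i-1 and i+1 only."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


smooth = morans_i([1, 2, 3, 4, 5], chain_weights(5))       # neighbours are alike
alternating = morans_i([1, 5, 1, 5, 1], chain_weights(5))  # neighbours differ
print(smooth, alternating)
```

Values that change smoothly along the chain give a positive I, while values that alternate between neighbours give a negative I; values with no spatial pattern give an I near −1/(n − 1).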


==Resources==

A public server, reachable externally at http://SocSciCompute.ss.uci.edu {{Webarchive|url=https://web.archive.org/web/20160220092715/http://socscicompute.ss.uci.edu/ |date=2016-02-20 }}, offers ethnographic data, variables, and tools for inference, with R scripts by Dow (2007) and Eff and Dow (2009), in an NSF-supported Galaxy (http://getgalaxy.org) framework (https://www.xsede.org). Instructors, students, and researchers can use it to do "CoSSci Galaxy" cross-cultural research modeling with controls for Galton's problem, using Standard Cross-Cultural Sample variables at https://web.archive.org/web/20160402201432/https://dl.dropboxusercontent.com/u/9256203/SCCScodebook.txt.


==Opportunities==
In anthropology, where Tylor's problem was first recognized by the statistician Galton in 1889, it is still not widely recognized that there are standard statistical adjustments for the problem of patches of similarity in observed cases, and opportunities for new discoveries using autocorrelation methods. Some cross-cultural researchers (see, e.g., Korotayev and de Munck 2003)<ref>{{Cite journal |url=https://www.academia.edu/3116534 |title=''Galton's Asset'' and ''Flower's Problem'': Cultural Networks and Cultural Units in Cross-Cultural Research |author=Andrey Korotayev and Victor de Munck |journal=American Anthropologist |year=2003 |volume=105 |issue=2 |pages=353–358 |doi=10.1525/aa.2003.105.2.353}}</ref>
have begun to realize that evidence of diffusion, historical origin, and other sources of similarity among related societies or individuals should be renamed Galton's Opportunity and Galton's Asset rather than Galton's Problem. Researchers now routinely use longitudinal, cross-cultural, and regional variation analysis to analyze all the competing hypotheses: functional relationships, diffusion, common historical origin, multilineal evolution, co-adaptation with environment, and complex social interaction dynamics.<ref>{{cite journal |author1=Mace, Ruth |author2=Pagel, Mark | year = 1994 | title = The Comparative Method in Anthropology | journal = Current Anthropology | volume = 35 | issue = 5 | pages = 549–564 | doi = 10.1086/204317 | s2cid = 146297584 }}</ref>


==Controversies== ==Controversies==
{{pov|section|date=November 2017}}
Within anthropology, the problem of phylogenetic autocorrelation is often given as a cause to reject comparative studies altogether. Since the problem is a general one, common to the sciences and statistical inference generally, this particular criticism of cross-cultural or comparative studies – and there are many – is one that, logically speaking, amounts to a rejection of science and statistics altogether. Any data collected and analyzed by ethnographers, for example, is equally subject to autocorrelation, understood in its most general sense. A critique of the anticomparative critique is not limited to statistical comparison since it would apply as well to the analysis of text. That is, the analysis and use of text in argumentation is subject to critique as to the evidential basis of inference. Reliance purely on rhetoric is no protection against critique as to the validity of argument and its evidentiary basis.

There is little doubt, however, that the community of cross-cultural researchers has been remiss in ignoring autocorrelation. Expert investigation of this question shows results that "strongly suggest that the extensive reporting of naïve chi-square independence tests using cross-cultural data sets over the past several decades has led to incorrect rejection of null hypotheses at levels much higher than the expected 5% rate."<ref name="Dow">{{Cite journal |author=Malcolm M. Dow |year=1993 |title=Saving the theory: on chi-square tests with cross-cultural survey data |journal=Cross-Cultural Research |volume=27 |issue=3–4 |pages=247–276 |doi=10.1177/106939719302700305 |s2cid=122509821 }}</ref>{{rp|247}} The investigator concludes that "Incorrect theories that have been 'saved' by naïve chi-square tests with comparative data may yet be more rigorously tested another day."<ref name="Dow"/>{{rp|270}} Once again, the adjusted variance of a cluster sample is given as one multiplied by 1 + ''d'' (''k'' + 1) where ''k'' is the average size of a cluster, and a more complicated correction is given for the variance of contingency table correlations with ''r'' rows and ''c'' columns. Since this critique was published in 1993, along with others like it, more authors have begun to adopt corrections for Galton's problem, but the majority in the cross-cultural field have not. Consequently, a large proportion of published results that rely on naive significance tests and that adopt the ''P'' < 0.05 rather than a ''P'' < 0.005 standard are likely to be in error because they are more susceptible to Type I error, which is to reject the null hypothesis when it is true.

Some cross-cultural researchers reject the seriousness of the problem of autocorrelation because, they argue, estimates of correlations and means may be unbiased even if autocorrelation, weak or strong, is present. Without investigating autocorrelation, however, they may still mis-estimate statistics dealing with relationships among variables. In regression analysis, for example, examining the patterns of autocorrelated residuals may give important clues to third factors that may affect the relationships among variables but that have not been included in the regression model. Second, if there are clusters of similar and related societies in the sample, measures of variance will be underestimated, leading to spurious statistical conclusions, for example, exaggerating the statistical significance of correlations. Third, the underestimation of variance makes it difficult to test for replication of results from two different samples, as the two sets of results will more often be judged significantly different.
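
The claim that naive significance tests reject true null hypotheses too often when cases come in clusters can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reproduction of Dow's analysis: two variables are generated independently of each other, each with a within-cluster correlation `rho`, and a naive Fisher z test of their Pearson correlation is applied as if all cases were independent. The cluster count, cluster size, value of `rho`, and all function names are hypothetical choices.

```python
import math
import random


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def clustered_sample(rng, n_clusters, cluster_size, rho):
    """Unit-variance values whose within-cluster correlation is rho."""
    values = []
    for _ in range(n_clusters):
        shared = rng.gauss(0.0, 1.0)       # component common to the cluster
        for _ in range(cluster_size):
            own = rng.gauss(0.0, 1.0)      # component unique to one case
            values.append(math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * own)
    return values


def naive_rejection_rate(rho, n_clusters=20, cluster_size=5, reps=2000, seed=1):
    """Share of simulated data sets in which a naive 5% test rejects a true null."""
    rng = random.Random(seed)
    n = n_clusters * cluster_size
    rejections = 0
    for _ in range(reps):
        x = clustered_sample(rng, n_clusters, cluster_size, rho)
        y = clustered_sample(rng, n_clusters, cluster_size, rho)  # independent of x
        z = math.atanh(pearson_r(x, y)) * math.sqrt(n - 3)  # treats all n as independent
        if abs(z) > 1.96:
            rejections += 1
    return rejections / reps


print("independent cases:", naive_rejection_rate(0.0))
print("clustered cases:  ", naive_rejection_rate(0.8))
```

With independent cases the rejection rate should stay close to the nominal 5%; with strongly clustered cases it should come out several times higher, even though the two variables are unrelated by construction.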





==Further reading==
*{{cite journal | author = Dow, M. M. | year = 2007 | title = Galton's Problem as multiple network autocorrelation effects | url = http://intersci.ss.uci.edu/pdf/DowM.M.2007.pdf | journal = Cross-Cultural Research | volume = 41 | issue = 4 | pages = 336–363 | doi = 10.1177/1069397107305452 | s2cid = 143230639 }}
*Eff, E. Anthon and Malcolm M. Dow. 2009. "How to Deal with Missing Data and Galton's Problem in Cross-Cultural Survey Research: A Primer for R." Structure and Dynamics: eJournal of Anthropological and Related Sciences 3(3):223–252. https://escholarship.org/uc/item/7cm1f10b
*Oztan, B. Tolga. 2016. Evolution of Cooperation: Comparative Study of Kinship Behavior. PhD Thesis, UC Irvine. Mathematical Behavioral Sciences. http://intersci.ss.uci.edu/pdf/latest/thesisJan2Tolga2015.pdf (extensive treatment of Dow–Eff solution to Galton's problem).
*IntersciWiki. 2007. (including software and tutorial)
*IntersciWiki. 2009. (bibliography)
*{{cite journal | author = Student (W. S. Gosset) | year = 1914 | title = The elimination of spurious correlation due to position in time or space | journal = Biometrika | volume = 10 | issue = 1 | pages = 179–181 | doi = 10.2307/2331746 | jstor = 2331746 | url = https://zenodo.org/record/1449460 }}
*{{cite journal | author = Tylor, Edward B. | authorlink = Edward Tylor | year = 1889 | title = On a Method of Investigating the Development of Institutions Applied to the Laws of Marriage and Descent | journal = Journal of the Royal Anthropological Institute | volume = 18 | issue = 3 | pages = 245–72 | doi = 10.2307/2842423 | jstor = 2842423 | hdl = 2027/hvd.32044097779680 | hdl-access = free }}
*{{cite journal | author = Witkowski, Stanley | year = 1974 | title = Galton's opportunity - hologeistic study of historical processes | journal = Behavior Science Research | volume = 9 | issue = 1 | pages = 11–15 | doi = 10.1177/106939717400900105 | s2cid = 144398651 }}

==References==
{{reflist|24em}}



Latest revision as of 14:01, 2 December 2024

Problem of drawing inferences from cross-cultural data

Phylogenetic autocorrelation, also known as Galton's problem after Sir Francis Galton, who described it, is the problem of drawing inferences from cross-cultural data, due to the statistical phenomenon now called autocorrelation. The problem is now recognized as a general one that applies to all nonexperimental studies and to some experimental designs as well. It is most simply described as the problem of external dependencies in making statistical estimates when the elements sampled are not statistically independent. Asking two people in the same household whether they watch TV, for example, does not give you statistically independent answers. The sample size, n, for independent observations in this case is one, not two. Once proper adjustments are made that deal with external dependencies, then the axioms of probability theory concerning statistical independence will apply. These axioms are important for deriving measures of variance, for example, or tests of statistical significance.

Origin

In 1888, Galton was present when Sir Edward Tylor presented a paper at the Royal Anthropological Institute. Tylor had compiled information on institutions of marriage and descent for 350 cultures and examined the associations between these institutions and measures of societal complexity. Tylor interpreted his results as indications of a general evolutionary sequence, in which institutions change focus from the maternal line to the paternal line as societies become increasingly complex. Galton disagreed, pointing out that similarity between cultures could be due to borrowing, could be due to common descent, or could be due to evolutionary development; he maintained that without controlling for borrowing and common descent one cannot make valid inferences regarding evolutionary development. Galton's critique has become the eponymous Galton's Problem, as named by Raoul Naroll, who proposed the first statistical solutions.

By the early 20th century, unilineal evolutionism had been abandoned, and along with it the drawing of direct inferences from correlations to evolutionary sequences. Galton's criticisms proved equally valid, however, for inferring functional relations from correlations. The problem of autocorrelation remained.

Solutions

In 1914, the statistician William S. Gosset developed methods of eliminating spurious correlation that arises because position in time or space affects similarities. Today's election polls have a similar problem: the closer the poll is to the election, the less independently individuals make up their minds, and the less reliable the polling results become, especially the margin of error or confidence limits. The effective n of independent cases in the sample drops as the election nears. Statistical significance falls with lower effective sample size.

The problem arises in sample surveys when sociologists want to reduce the travel time needed for their interviews: they divide the population into local clusters, sample the clusters randomly, and then sample again within the clusters. If they interview n people in clusters of size m, the effective sample size (efs) would have a lower limit of 1 + (n − 1) / m, or roughly the number of clusters n / m, if everyone in each cluster were identical. When there are only partial similarities within clusters, the m in this formula has to be lowered accordingly. A standard formula of this sort is the design effect, 1 + d (m − 1), where d is the intraclass correlation for the statistic in question and m is the cluster size; the effective sample size is n divided by this factor. In general, estimation of the appropriate efs depends on the statistic being estimated, for example a mean, chi-square, correlation, or regression coefficient, and their variances.
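The cluster-sampling adjustment can be sketched numerically. This is a minimal illustration, assuming the textbook design-effect form 1 + d(m − 1) with equal-sized clusters and an intraclass correlation between 0 and 1; the function names and the sample numbers (200 interviews in clusters of 10) are hypothetical and not taken from any study cited here.

```python
def design_effect(icc: float, cluster_size: float) -> float:
    """Variance inflation factor 1 + d(m - 1) for equal-sized clusters.

    icc is the intraclass correlation d; cluster_size is m.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("this sketch assumes an intraclass correlation in [0, 1]")
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return 1.0 + icc * (cluster_size - 1.0)


def effective_sample_size(n: int, icc: float, cluster_size: float) -> float:
    """Nominal sample size n divided by the design effect."""
    return n / design_effect(icc, cluster_size)


if __name__ == "__main__":
    n, m = 200, 10  # hypothetical survey: 200 interviews, clusters of 10
    for d in (0.0, 0.1, 0.5, 1.0):
        print(f"d = {d}: effective n = {effective_sample_size(n, d, m):.1f}")
```

With d = 0 the effective sample size equals n, and with d = 1 (everyone in a cluster identical) it falls to n / m, the number of clusters.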

For cross-cultural studies, Murdock and White estimated the size of patches of similarities in their sample of 186 societies. The four variables they tested (language, economy, political integration, and descent) had patches of similarities that varied from size three to size ten. A very crude rule of thumb might be to divide n by the square root of the similarity-patch size, so that the effective sample sizes are about 107 and 58 for patches of size three and ten, respectively. Again, statistical significance falls with lower effective sample size.
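The crude rule of thumb for the 186-society sample can be checked with a few lines of arithmetic. This is only a sketch of that rule (n divided by the square root of the patch size); the function name is hypothetical.

```python
import math


def crude_effective_n(n: int, patch_size: float) -> float:
    """Rule-of-thumb effective sample size: n / sqrt(patch size)."""
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    return n / math.sqrt(patch_size)


if __name__ == "__main__":
    for patch in (3, 10):
        print(f"patch size {patch}: effective n is about {crude_effective_n(186, patch):.1f}")
```

For n = 186 this gives roughly 107 for patches of size three and between 58 and 59 for patches of size ten.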

In modern analyses, spatial lags have been modelled in order to estimate the effect of globalization on modern societies.

Spatial dependency, or autocorrelation, is a fundamental concept in geography. Methods developed by geographers that measure and control for spatial autocorrelation do far more than reduce the effective n for tests of significance of a correlation. One example is the complicated hypothesis that "the presence of gambling in a society is directly proportional to the presence of a commercial money and to the presence of considerable socioeconomic differences and is inversely related to whether or not the society is a nomadic herding society." Tests of this hypothesis in a sample of 60 societies failed to reject the null hypothesis. Autocorrelation analysis, however, showed a significant effect of socioeconomic differences.

How prevalent is autocorrelation among the variables studied in cross-cultural research? A test by Anthon Eff on 1700 variables in the cumulative database for the Standard Cross-Cultural Sample, published in World Cultures, measured Moran's I for spatial autocorrelation (distance), linguistic autocorrelation (common descent), and autocorrelation in cultural complexity (mainline evolution). "The results suggest that ... it would be prudent to test for spatial and phylogenetic autocorrelation when conducting regression analyses with the Standard Cross-Cultural Sample." The study illustrates the use of autocorrelation tests in exploratory data analysis, showing how all variables in a given study can be evaluated for nonindependence of cases in terms of distance, language, and cultural complexity. It then explains the methods for estimating these autocorrelation effects and illustrates them for ordinary least squares regression, again using Moran's I as the measure of autocorrelation.
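Moran's I itself is straightforward to compute once a matrix of pairwise weights is chosen. The following is a minimal sketch of the usual formula, I = (n / S0) · Σ w_ij z_i z_j / Σ z_i², where z are deviations from the mean and S0 is the sum of the weights. The weight matrix here is a hypothetical adjacency chain of four societies, not the distance or language weights used in the study, and the sketch does not compute a significance level.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square matrix of pairwise weights.

    Diagonal entries of the weight matrix are ignored.
    """
    n = len(values)
    if n < 2 or len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix matching values")
    mean = sum(values) / n
    z = [v - mean for v in values]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    s0 = sum(weights[i][j] for i, j in pairs)   # total weight
    denom = sum(d * d for d in z)               # sum of squared deviations
    if s0 == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    num = sum(weights[i][j] * z[i] * z[j] for i, j in pairs)
    return (n / s0) * (num / denom)


if __name__ == "__main__":
    # Hypothetical chain: society 0 borders 1, 1 borders 2, 2 borders 3.
    chain = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    print(morans_i([1, 1, 0, 0], chain))  # neighbours tend to be alike
    print(morans_i([1, 0, 1, 0], chain))  # neighbours always differ
```

Positive values indicate that neighbouring cases resemble each other more than chance would suggest; negative values indicate that neighbours tend to differ.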

When autocorrelation is present, it can often be removed, yielding unbiased estimates of regression coefficients and their variances, by constructing a respecified dependent variable that is "lagged" by weightings of the dependent variable at other locations, where the weights are the degree of relationship. This lagged dependent variable is endogenous, and estimation requires either two-stage least squares or maximum likelihood methods.
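The construction of the lagged variable can be sketched as follows. In the usual spatial-lag form the model is y = ρWy + Xβ + ε, and the code below builds only the Wy term. Row-normalizing the weights so that each case's lag is a weighted average of the other cases is a common convention assumed here, and the function names and example data are hypothetical. Estimating ρ and β still requires two-stage least squares or maximum likelihood, which this sketch does not attempt.

```python
def row_normalize(weights):
    """Zero the diagonal and scale each row of the weight matrix to sum to 1."""
    normalized = []
    for i, row in enumerate(weights):
        cleaned = [0.0 if i == j else float(w) for j, w in enumerate(row)]
        total = sum(cleaned)
        # A case with no related cases gets an all-zero row (lag of 0).
        normalized.append([w / total for w in cleaned] if total else cleaned)
    return normalized


def spatial_lag(y, weights):
    """Return Wy: for each case, the weighted average of y over related cases."""
    w = row_normalize(weights)
    return [sum(wij * yj for wij, yj in zip(row, y)) for row in w]


if __name__ == "__main__":
    # Hypothetical chain of three societies: 0 borders 1, 1 borders 2.
    chain = [
        [0, 1, 0],
        [1, 0, 1],
        [0, 1, 0],
    ]
    print(spatial_lag([2.0, 4.0, 6.0], chain))
```

Each entry of the result is the average value of the dependent variable among that society's neighbours, which then enters the regression as an additional, endogenous regressor.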

Resources

A public server, accessible externally at http://SocSciCompute.ss.uci.edu (archived 2016-02-20 at the Wayback Machine), offers ethnographic data, variables and tools for inference, with R scripts by Dow (2007) and Eff and Dow (2009), in an NSF-supported Galaxy (http://getgalaxy.org) framework (https://www.xsede.org). It allows instructors, students and researchers to do "CoSSci Galaxy" cross-cultural research modeling (archived 2016-02-20 at the Wayback Machine) with controls for Galton's problem, using Standard Cross-Cultural Sample variables listed at https://web.archive.org/web/20160402201432/https://dl.dropboxusercontent.com/u/9256203/SCCScodebook.txt.

Opportunities

In anthropology, where the problem in Tylor's analysis was first recognized by the statistician Galton in 1888, it is still not widely recognized that there are standard statistical adjustments for the problem of patches of similarity in observed cases, or that autocorrelation methods offer opportunities for new discoveries. Some cross-cultural researchers (see, e.g., Korotayev and de Munck 2003) have begun to argue that evidence of diffusion, historical origin, and other sources of similarity among related societies or individuals should be renamed Galton's Opportunity and Galton's Asset rather than Galton's Problem. Researchers now routinely use longitudinal, cross-cultural, and regional variation analysis to evaluate all the competing hypotheses: functional relationships, diffusion, common historical origin, multilineal evolution, co-adaptation with environment, and complex social interaction dynamics.

Controversies

The neutrality of this article is disputed (November 2017). Relevant discussion may be found on the talk page.

Within anthropology, the problem of phylogenetic autocorrelation is often given as a reason to reject comparative studies altogether. Since the problem is a general one, common to the sciences and to statistical inference generally, this particular criticism of cross-cultural or comparative studies (and there are many) is one that, logically speaking, amounts to a rejection of science and statistics altogether. Any data collected and analyzed by ethnographers, for example, are equally subject to autocorrelation, understood in its most general sense. A critique of the anticomparative critique is not limited to statistical comparison, since it applies as well to the analysis of text. That is, the analysis and use of text in argumentation is subject to critique as to the evidential basis of inference. Reliance purely on rhetoric is no protection against critique as to the validity of an argument and its evidentiary basis.

There is little doubt, however, that the community of cross-cultural researchers has been remiss in ignoring autocorrelation. Expert investigation of this question shows results that "strongly suggest that the extensive reporting of naïve chi-square independence tests using cross-cultural data sets over the past several decades has led to incorrect rejection of null hypotheses at levels much higher than the expected 5% rate." The investigator concludes that "Incorrect theories that have been 'saved' by naïve chi-square tests with comparative data may yet be more rigorously tested another day." Once again, the adjusted variance of a cluster sample is given as the unadjusted variance multiplied by 1 + d (k − 1), where k is the average size of a cluster, and a more complicated correction is given for the variance of contingency table correlations with r rows and c columns. Since this critique and others like it were published in 1993, more authors have begun to adopt corrections for Galton's problem, but the majority in the cross-cultural field have not. Consequently, a large proportion of published results that rely on naive significance tests and that adopt the P < 0.05 rather than the P < 0.005 standard are likely to be in error, because they are more susceptible to type I error, which is to reject the null hypothesis when it is true.

Some cross-cultural researchers reject the seriousness of the problem of autocorrelation because, they argue, estimates of correlations and means may be unbiased even if autocorrelation, weak or strong, is present. Without investigating autocorrelation, however, they may still mis-estimate statistics dealing with relationships among variables. First, in regression analysis, examining the patterns of autocorrelated residuals may give important clues to third factors that affect the relationships among variables but have not been included in the regression model. Second, if there are clusters of similar and related societies in the sample, measures of variance will be underestimated, leading to spurious statistical conclusions, for example by exaggerating the statistical significance of correlations. Third, the underestimation of variance makes it difficult to test for replication of results from two different samples, as the hypothesis that the results are similar will more often be rejected.


References

  1. Stocking, George W. Jr. (1968). "Edward Burnett Tylor." International Encyclopedia of the Social Sciences. David L. Sills, editor, New York, Macmillan Company: v.16, pp. 170–177.
  2. Raoul Naroll (1961). "Two solutions to Galton's Problem". Philosophy of Science. 28: 15–29. doi:10.1086/287778. S2CID 121671403.
  3. Raoul Naroll (1965). "Galton's problem: The logic of cross cultural research". Social Research. 32: 428–451.
  4. "Sample Size and Design Effect" (PDF). Archived from the original (PDF) on 2006-04-14. Retrieved 2006-11-01.
  5. George P. Murdock and Douglas R. White (1969). "Standard cross-cultural sample". Ethnology. 9: 329–369.
  6. Jahn, Detlef (2006). "Globalization as Galton's Problem: The Missing Link in the Analysis of the Diffusion Patterns in Welfare State Development" (PDF). International Organization. 60 (2): 401–431. doi:10.1017/s0020818306060127. S2CID 154976704.
  7. Cliff, A.D., and J.K. Ord. 1973. Spatial Autocorrelation. London: Pion Press.
  8. Cliff, A.D. and J.K. Ord. 1981. Spatial Processes. London: Pion Press.
  9. Pryor, Frederick (1976). "The Diffusion Possibility Method: A More General and Simpler Solution to Galton's Problem". American Ethnologist. 3 (4). American Anthropological Association: 731–749. doi:10.1525/ae.1976.3.4.02a00100.
  10. Malcolm M. Dow, Michael L. Burton, Douglas R. White, and Karl P. Reitz (1984). "Galton's problem as network autocorrelation". American Ethnologist. 11 (4): 754–770. doi:10.1525/ae.1984.11.4.02a00080. S2CID 143111431.
  11. E. Anthon Eff (2004). "Does Mr. Galton still have a Problem? Autocorrelation in the Standard Cross-Cultural Sample" (PDF). World Cultures. 15 (2): 153–170.
  12. Anselin, Luc. 1988. Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers.
  13. Andrey Korotayev and Victor de Munck (2003). "Galton's Asset and Flower's Problem: Cultural Networks and Cultural Units in Cross-Cultural Research". American Anthropologist. 105 (2): 353–358. doi:10.1525/aa.2003.105.2.353.
  14. Mace, Ruth; Pagel, Mark (1994). "The Comparative Method in Anthropology". Current Anthropology. 35 (5): 549–564. doi:10.1086/204317. S2CID 146297584.
  15. Malcolm M. Dow (1993). "Saving the theory: on chi-square tests with cross-cultural survey data". Cross-Cultural Research. 27 (3–4): 247–276. doi:10.1177/106939719302700305. S2CID 122509821.
