
Psychometrics

{{short description|Theory and technique of psychological measurement}}
{{distinguish|text=], the measurement of the heat and water vapor properties of air}}
{{For|other uses of this term and similar terms|Psychometry (disambiguation)}}
{{About|the theory and technique of measurement of psychological attributes|research design and methodology in psychology|Psychological statistics|the mathematical modeling of psychological theories and phenomena|Mathematical psychology}}<!-- definitions loosely taken from the three articles and https://dictionary.apa.org/ -->
{{Psychology sidebar|applied}}
'''Psychometrics''' is a field of study within ] concerned with the theory and technique of ]. Psychometrics generally covers specialized fields within psychology and education devoted to testing, measurement, assessment, and related activities.<ref>{{Cite web|url=http://www.ncme.org/ncme/NCME/Resource_Center/Glossary/NCME/Resource_Center/Glossary1.aspx?hkey=4bb87415-44dc-4088-9ed9-e8515326a061#anchorP|title=Glossary1|date=22 July 2017|archive-url=https://web.archive.org/web/20170722194028/http://www.ncme.org/ncme/NCME/Resource_Center/Glossary/NCME/Resource_Center/Glossary1.aspx?hkey=4bb87415-44dc-4088-9ed9-e8515326a061#anchorP |access-date=28 June 2022|archive-date=2017-07-22 }}</ref> Psychometrics is concerned with the objective measurement of ] that cannot be directly observed. Examples of latent constructs include ], ], ], and ].<ref name=":0">{{cite book|last1=Tabachnick|first1=B.G.|title=Using Multivariate Analysis|last2=Fidell|first2=L.S.|publisher=Allyn and Bacon|year=2001|isbn=978-0-321-05677-1|location=Boston}}{{Page needed|date=November 2010}}</ref> The levels of individuals on nonobservable latent variables are ] through ] based on what is observed from individuals' responses to items on tests and scales.<ref name=":0" />


Practitioners are described as psychometricians, although not all who engage in psychometric research go by this title. Psychometricians usually possess specific qualifications, such as degrees or certifications, and most are ] with advanced graduate training in psychometrics and measurement theory. In addition to traditional academic institutions, practitioners also work for organizations such as the ] and ]. Some psychometric researchers focus on the construction and validation of assessment instruments, including ], ], and ] or ] ]. Others focus on research relating to measurement theory (e.g., ], ]) or specialize as ] professionals.

== Historical foundation ==
Psychological testing has come from two streams of thought: the first, from ], ], and ], on the measurement of individual differences, and the second, from ], ], ], and ] and their psychophysical measurements of a similar construct. The research of the second group led to the development of ] and standardized testing.<ref name="Kaplan, R.M. 2010">Kaplan, R.M., & Saccuzzo, D.P. (2010). ''Psychological Testing: Principles, Applications, and Issues.'' (8th ed.). Belmont, CA: Wadsworth, Cengage Learning.</ref>


=== Victorian stream ===
Charles Darwin was the inspiration behind Francis Galton, a scientist who advanced the development of psychometrics. In 1859, Darwin published his book '']''. Darwin described the role of natural selection in the emergence, over time, of different populations of species of plants and animals. The book showed how individual members of a ] differ among themselves and how they possess characteristics that are more or less adaptive to their environment. Those with more adaptive characteristics are more likely to survive to procreate and give rise to another generation. Those with less adaptive characteristics are less likely. These ideas stimulated Galton's interest in the study of human beings and how they differ one from another and how to measure those differences.


Galton wrote a book entitled '']'', which was first published in 1869. The book described different characteristics that people possess and how those characteristics make some more "fit" than others. Today these differences, such as sensory and motor functioning (reaction time, visual acuity, and physical strength), are important domains of scientific psychology. Much of the early theoretical and applied work in psychometrics was undertaken in an attempt to measure ]. Galton, often referred to as "the father of psychometrics," devised and included mental tests among his ] measures. ], a pioneer in the field of psychometrics, went on to extend Galton's work. Cattell coined the term ''mental test'', and is responsible for research and knowledge that ultimately led to the development of modern tests.<ref name = "kap">Kaplan, R.M., & Saccuzzo, D.P. (2010). ''Psychological testing: Principles, applications, and issues'' (8th ed.). Belmont, CA: Wadsworth, Cengage Learning.</ref>


=== German stream ===
Galton wrote a book entitled "Hereditary Genius" about different characteristics that people possess and how those characteristics make them more "fit" than others. Today these differences, such as sensory and motor functioning (reaction time, visual acuity, and physical strength) are important domains of scientific psychology. Much of the early theoretical and applied work in psychometrics was undertaken in an attempt to measure ]. ], often referred to as "the father of psychometrics," devised and included mental tests among his ] measures. James McKeen Cattell, who is considered a pioneer of psychometrics went on to extend Galton's work. Cattell also coined the term ''mental test'', and is responsible for the research and knowledge which ultimately led to the development of modern tests. (Kaplan & Saccuzzo, 2010)
The origin of psychometrics also has connections to the related field of ]. Around the same time that Darwin, Galton, and Cattell were making their discoveries, Herbart was also interested in "unlocking the mysteries of human consciousness" through the scientific method.<ref name = "kap"/> Herbart was responsible for creating mathematical models of the mind, which were influential in educational practices for years to come.


] built upon Herbart's work and tried to prove the existence of a psychological threshold, saying that a minimum stimulus was necessary to activate a ]. After Weber, ] expanded upon the knowledge he gleaned from Herbart and Weber to devise the law that the strength of a sensation grows as the logarithm of the stimulus intensity. A follower of Weber and Fechner, ] is credited with founding the science of psychology. It is Wundt's influence that paved the way for others to develop psychological testing.<ref name = "kap"/>


=== 20th century ===
In 1936, the psychometrician ], founder and first president of the Psychometric Society, developed and applied a theoretical approach to measurement referred to as the ], an approach that has close connections to the psychophysical theory of ] and ]. In addition, Spearman and Thurstone both made important contributions to the theory and application of ], a statistical method developed and used extensively in psychometrics.<ref>Nunnally, J., & Bernstein, I. H. (1994). ''Psychometric theory'' (3rd ed.). New York: McGraw-Hill.</ref> In the late 1950s, ] made a historical and epistemological assessment of the impact of statistical thinking on psychology during the previous few decades: "in the last decades, the specifically psychological thinking has been almost completely suppressed and removed, and replaced by a statistical thinking. Precisely here we see the cancer of testology and testomania of today."<ref>] (1960) ''Das zweite Buch: Lehrbuch der Experimentellen Triebdiagnostik''. Huber, Bern und Stuttgart, 2nd edition. Ch.27, From the Spanish translation, B)II ''Las condiciones estadisticas'', p.396. Quotation: {{quotation|el pensamiento psicologico especifico, en las ultima decadas, fue suprimido y eliminado casi totalmente, siendo sustituido por un pensamiento estadistico. Precisamente aqui vemos el cáncer de la testología y testomania de hoy.}}</ref>


More recently, psychometric theory has been applied in the measurement of ], ], and ]s, and ]. These latent constructs cannot truly be measured, and much of the research and science in this discipline has been developed in an attempt to measure these constructs as close to the true score as possible.


Figures who made significant contributions to psychometrics include ], Henry F. Kaiser, ], ], ], ], ], ], ], ], ], and ].


== Definition of measurement in the social sciences ==
The definition of measurement in the social sciences has a long history. A current widespread definition, proposed by ], is that measurement is "the assignment of numerals to objects or events according to some rule." This definition was introduced in a 1946 '']'' article in which Stevens proposed four ].<ref name="Stevens 1946">{{cite journal|last=Stevens|first=S. S.|author-link=Stanley Smith Stevens|date=7 June 1946|title=On the Theory of Scales of Measurement|journal=]|volume=103|issue=2684|pages=677–680|bibcode=1946Sci...103..677S|doi=10.1126/science.103.2684.677|pmid=17750512|s2cid=4667599}}</ref> Although widely adopted, this definition differs in important respects from the more classical definition of measurement adopted in the physical sciences, namely that scientific measurement entails "the estimation or discovery of the ratio of some magnitude of a quantitative attribute to a unit of the same attribute" (p.&nbsp;358)<ref>{{cite journal|last1=Michell|first1=Joel|title=Quantitative science and the definition of measurement in psychology|journal=British Journal of Psychology|date=August 1997|volume=88|issue=3|pages=355–383|doi=10.1111/j.2044-8295.1997.tb02641.x}}</ref>


Indeed, Stevens's definition of measurement was put forward in response to the British Ferguson Committee, whose chair, A. Ferguson, was a physicist. The committee was appointed in 1932 by the British Association for the Advancement of Science to investigate the possibility of quantitatively estimating sensory events. Although its chair and other members were physicists, the committee also included several psychologists. The committee's report highlighted the importance of the definition of measurement. While Stevens's response was to propose a new definition, which has had considerable influence in the field, this was by no means the only response to the report. Another, notably different, response was to accept the classical definition, as reflected in the following statement:
: "For example, an employer wanting someone for a role requiring consistent attention to repetitive detail will probably not want to give that job to someone who is very creative and gets bored easily."<ref>Psychometric Assessments. '''' University of Melbourne.</ref>


:Measurement in psychology and physics are in no sense different. Physicists can measure when they can find the operations by which they may meet the necessary criteria; psychologists have to do the same. They need not worry about the mysterious differences between the meaning of measurement in the two sciences (Reese, 1943, p. 49).<ref>Reese, T.W. (1943). The application of the theory of physical measurement to the measurement of psychological magnitudes, with three experimental examples. ''Psychological Monographs, 55'', 1–89. {{doi|10.1037/h0061367}}</ref>


These divergent responses are reflected in alternative approaches to measurement. For example, methods based on ] are typically employed on the premise that numbers, such as raw scores derived from assessments, are measurements. Such approaches implicitly entail Stevens's definition of measurement, which requires only that numbers are ''assigned'' according to some rule. The main research task, then, is generally considered to be the discovery of associations between scores, and of factors posited to underlie such associations.<ref>{{Cite web|url=http://www.assessmentpsychology.com/psychometrics.htm|title=Psychometrics|website=Assessmentpsychology.com|access-date=28 June 2022}}</ref>


On the other hand, when measurement models such as the ] are employed, numbers are not assigned based on a rule. Instead, in keeping with Reese's statement above, specific criteria for measurement are stated, and the goal is to construct procedures or operations that provide data that meet the relevant criteria. Measurements are estimated based on the models, and tests are conducted to ascertain whether the relevant criteria have been met.{{citation needed|date=March 2015}}


== Instruments and procedures ==
The first psychometric instruments were designed to measure ].<ref>{{cite book |title=Massachusetts General Hospital comprehensive clinical psychiatry |date=2016 |location=London |isbn=978-0323295079 |page=73 |edition=Second |url=https://books.google.com/books?id=y5nTBgAAQBAJ&pg=PA74 |access-date=31 October 2021|last1=Stern |first1=Theodore A. |last2=Fava |first2=Maurizio |last3=Wilens |first3=Timothy E. |last4=Rosenbaum |first4=Jerrold F. }}</ref> One early approach to measuring intelligence was the test developed in France by ] and ]. That test was known as the {{ill|Test Binet-Simon|fr}}. The French test was adapted for use in the U.S. by ] of Stanford University, and named the ].


Another major focus in psychometrics has been on ]ing. There has been a range of theoretical approaches to conceptualizing and measuring personality, though there is no widely agreed upon theory. Some of the better-known instruments include the ], the ] (or "Big 5") and tools such as ] and the ]. Attitudes have also been studied extensively using psychometric approaches.{{citation needed|date=March 2015}}<ref>{{Cite book |title=The Gale Encyclopedia of Psychology |publisher=Gale |year=2022 |isbn=9780028683867 |editor-last=Longe |editor-first=Jacqueline L. |edition=4th |volume=2 |location=Farmington Hills, Michigan |pages=1000}}</ref> An alternative method involves the application of unfolding measurement models, the most general being the Hyperbolic Cosine Model (Andrich & Luo, 1993).<ref>Andrich, D. & Luo, G. (1993). A hyperbolic cosine latent trait model for unfolding ] single-stimulus responses. Applied Psychological Measurement, 17, 253–276.</ref>


== Theoretical approaches ==
Psychometricians have developed a number of different measurement theories. These include ] (CTT) and ] (IRT).<ref>Embretson, S.E., & Reise, S.P. (2000). ''Item Response Theory for Psychologists''. Mahwah, NJ: Erlbaum.</ref><ref>Hambleton, R.K., & Swaminathan, H. (1985). ''Item Response Theory: Principles and Applications.'' Boston: Kluwer-Nijhoff.</ref> An approach that seems mathematically to be similar to IRT but also quite distinctive, in terms of its origins and features, is represented by the ] for measurement. The development of the Rasch model, and the broader class of models to which it belongs, was explicitly founded on requirements of measurement in the physical sciences.<ref>Rasch, G. (1960/1980). ''Probabilistic models for some intelligence and attainment tests''. Copenhagen, Danish Institute for Educational Research, expanded edition (1980) with foreword and afterword by B.D. Wright. Chicago: The University of Chicago Press.</ref>
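
To make this concrete, the dichotomous Rasch model for a right/wrong item can be written in a single line (the notation here is illustrative and not quoted from the sources cited above): the probability that person ''n'' answers item ''i'' correctly depends only on the difference between the person's ability <math>\beta_n</math> and the item's difficulty <math>\delta_i</math>, both expressed on the same scale,

:<math>\Pr\{X_{ni}=1\} = \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}}.</math>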


Psychometricians have also developed methods for working with large matrices of correlations and covariances. Techniques in this general tradition include: ],<ref>Thompson, B.R. (2004). ''Exploratory and Confirmatory Factor Analysis: Understanding Concepts and Applications.'' American Psychological Association.</ref> a method of determining the underlying dimensions of data. One of the main challenges faced by users of factor analysis is a lack of consensus on appropriate procedures for ].<ref name="Zwick1986">{{cite journal |last1=Zwick |first1=William R. |last2=Velicer |first2=Wayne F. |title=Comparison of five rules for determining the number of components to retain. |journal=Psychological Bulletin |date=1986 |volume=99 |issue=3 |pages=432–442 |doi=10.1037/0033-2909.99.3.432}}</ref> A usual procedure is to stop factoring when ] drop below one because the original sphere shrinks. The lack of the cutting points concerns other multivariate methods, also.<ref>{{Cite book |last=Singh |first=Manoj Kumar |url=https://books.google.com/books?id=wodCEAAAQBAJ&dq=A+usual+procedure+is+to+stop+factoring+when+eigenvalues+drop+below+one+because+the+original+sphere+shrinks.&pg=PA107 |title=Introduction to Social Psychology |date=2021-09-11 |publisher=K.K. Publications |language=en}}</ref>
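
As a minimal sketch of the eigenvalue-greater-than-one stopping rule described above, the eigenvalues of a correlation matrix can be inspected directly; the matrix below is invented for illustration and is not taken from any cited source.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical correlation matrix for five observed variables.
R = np.array([
    [1.00, 0.62, 0.55, 0.10, 0.05],
    [0.62, 1.00, 0.58, 0.08, 0.07],
    [0.55, 0.58, 1.00, 0.12, 0.09],
    [0.10, 0.08, 0.12, 1.00, 0.45],
    [0.05, 0.07, 0.09, 0.45, 1.00],
])

# The eigenvalues of a correlation matrix sum to the number of variables;
# the usual rule retains only factors whose eigenvalue exceeds one.
eigenvalues = np.linalg.eigvalsh(R)[::-1]   # sorted largest first
n_factors = int(np.sum(eigenvalues > 1.0))
print(eigenvalues.round(2), "-> retain", n_factors, "factors")
</syntaxhighlight>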


]<ref>] (1992). ''Multidimensional Scaling.'' Krieger.</ref> is a method for finding a simple representation for data with a large number of latent dimensions. ] is an approach to finding objects that are like each other. Factor analysis, multidimensional scaling, and cluster analysis are all multivariate descriptive methods used to distill simpler structures from large amounts of data.
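
A similar minimal sketch can illustrate the cluster-analysis idea: items are grouped so that items that correlate highly end up together. The correlation matrix is again invented, and the use of SciPy's hierarchical clustering is an illustrative choice rather than a method prescribed by the sources cited in this article.

<syntaxhighlight lang="python">
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical correlation matrix among five test items.
R = np.array([
    [1.00, 0.62, 0.55, 0.10, 0.05],
    [0.62, 1.00, 0.58, 0.08, 0.07],
    [0.55, 0.58, 1.00, 0.12, 0.09],
    [0.10, 0.08, 0.12, 1.00, 0.45],
    [0.05, 0.07, 0.09, 0.45, 1.00],
])

# Treat "1 - correlation" as a dissimilarity and cluster the items,
# so that items measuring similar things fall into the same group.
dissimilarity = squareform(1 - R, checks=False)
tree = linkage(dissimilarity, method="average")
groups = fcluster(tree, t=2, criterion="maxclust")
print(groups)  # e.g. [1 1 1 2 2]: items 1-3 form one cluster, items 4-5 another
</syntaxhighlight>
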
Psychometrics is applied widely in educational assessment to measure abilities in domains such as reading, writing, and mathematics. The main approaches in applying tests in these domains have been Classical Test Theory and the more recent Item Response Theory and ] measurement models. These latter approaches permit joint scaling of persons and assessment items, which provides a basis for mapping of developmental continua by allowing descriptions of the skills displayed at various points along a continuum. Such approaches provide powerful information regarding the nature of developmental growth within various domains.


More recently, ]<ref>Kaplan, D. (2008). ''Structural Equation Modeling: Foundations and Extensions'', 2nd ed. Sage.</ref> and ] represent more sophisticated approaches to working with large ]. These methods allow statistically sophisticated models to be fitted to data and tested to determine if they are adequate fits. Because at a granular level psychometric research is concerned with the extent and nature of multidimensionality in each of the items of interest, a relatively new procedure known as bi-factor analysis<ref>DeMars, C. E. (2013). A tutorial on interpreting bi-factor model scores. ''International Journal of Testing, 13'', 354–378. http://dx.doi.org/10.1080/15305058.2013.799067</ref><ref>Reise, S. P. (2012). The rediscovery of bi-factor modeling. ''Multivariate Behavioral Research, 47'', 667–696. http://dx.doi.org/10.1080/00273171.2012.715555</ref><ref>Rodriguez, A., Reise, S. P., & Haviland, M. G. (2016). Evaluating bifactor models: Calculating and interpreting statistical indices. ''Psychological Methods, 21'', 137–150. http://dx.doi.org/10.1037/met0000045</ref> can be helpful. Bi-factor analysis can decompose "an item's systematic variance in terms of, ideally, two sources, a general factor and one source of additional systematic variance."<ref>Schonfeld, I.S., Verkuilen, J. & Bianchi, R. (2019). An exploratory structural equation modeling bi-factor analytic approach to uncovering what burnout, depression, and anxiety scales measure. ''Psychological Assessment, 31'', 1073–1079. http://dx.doi.org/10.1037/pas0000721 p. 1075</ref>


=== Key concepts ===
Key concepts in classical test theory are ] and ]. A reliable measure is one that measures a construct consistently across time, individuals, and situations. A valid measure is one that measures what it is intended to measure. Reliability is necessary, but not sufficient, for validity.


Both reliability and validity can be assessed statistically. Consistency over repeated measures of the same test can be assessed with the Pearson correlation coefficient, and is often called ''test-retest reliability.''<ref name="gifted.uconn">{{cite web|url=http://www.gifted.uconn.edu/Siegle/research/Instrument+Reliability+and+Validity/Reliability.htm|title=Home – Educational Research Basics by Del Siegle|website=www.gifted.uconn.edu|date=17 February 2015}}</ref> Similarly, the equivalence of different versions of the same measure can be indexed by a ], and is called ''equivalent forms reliability'' or a similar term.<ref name="gifted.uconn"/>
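
For instance, test-retest reliability as just described reduces to a single correlation between two administrations of the same test; the scores below are invented purely for illustration.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical scores of eight examinees on the same test, two weeks apart.
time1 = np.array([12, 15, 9, 20, 17, 11, 14, 18])
time2 = np.array([13, 14, 10, 19, 18, 12, 13, 17])

# Test-retest reliability: the Pearson correlation between the two administrations.
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(round(r_test_retest, 3))
</syntaxhighlight>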


Internal consistency, which addresses the homogeneity of a single test form, may be assessed by correlating performance on two halves of a test, which is termed ''split-half reliability''; the value of this ] for two half-tests is adjusted with the ] to correspond to the correlation between two full-length tests.<ref name="gifted.uconn"/> Perhaps the most commonly used index of reliability is ], which is equivalent to the ] of all possible split-half coefficients. Other approaches include the ], which is the ratio of variance of measurements of a given target to the variance of all targets.
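
A minimal sketch of these internal-consistency indices follows; the item responses are invented, and the formulas are simply the standard Spearman–Brown correction and coefficient alpha rather than code drawn from the cited sources.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical responses of six examinees to four test items (rows = persons).
X = np.array([
    [1, 2, 1, 2],
    [3, 3, 2, 3],
    [2, 2, 2, 1],
    [4, 3, 4, 4],
    [2, 1, 2, 2],
    [3, 4, 3, 3],
])

# Split-half reliability: correlate odd-item and even-item half scores,
# then step the half-test correlation up with the Spearman-Brown formula.
odd, even = X[:, 0::2].sum(axis=1), X[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]
split_half = 2 * r_half / (1 + r_half)

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score).
k = X.shape[1]
alpha = (k / (k - 1)) * (1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))

print(round(split_half, 3), round(alpha, 3))
</syntaxhighlight>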


There are a number of different forms of validity. ] refers to the extent to which a test or scale predicts a sample of behavior, i.e., the criterion, that is "external to the measuring instrument itself."<ref>Nunnally, J.C. (1978). ''Psychometric theory'' (2nd ed.). New York: McGraw-Hill.</ref> That external sample of behavior can be many things, including another test; college grade point average, as when the high school SAT is used to predict performance in college; and even behavior that occurred in the past, for example, when a test of current psychological symptoms is used to predict the occurrence of past victimization (which would accurately represent postdiction). When the criterion measure is collected at the same time as the measure being validated the goal is to establish '']''; when the criterion is collected later the goal is to establish '']''. A measure has '']'' if it is related to measures of other constructs as required by theory. '']'' is a demonstration that the items of a test do an adequate job of covering the domain being measured. In a personnel selection example, test content is based on a defined statement or set of statements of knowledge, skill, ability, or other characteristics obtained from a '']''.


] models the relationship between ]s and responses to test items. Among other advantages, IRT provides a basis for obtaining an estimate of the location of a test-taker on a given latent trait as well as the standard error of measurement of that location. For example, a university student's knowledge of history can be deduced from his or her score on a university test and then be compared reliably with a high school student's knowledge deduced from a less difficult test. Scores derived by classical test theory do not have this characteristic, and assessment of actual ability (rather than ability relative to other test-takers) must be assessed by comparing scores to those of a "norm group" randomly selected from the population. In fact, all measures derived from classical test theory are dependent on the sample tested, while, in principle, those derived from item response theory are not.
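
As an illustration of how a location and its standard error can be obtained under the simplest item response model (the dichotomous Rasch model written out earlier), the sketch below uses invented item difficulties and one invented response pattern; it is a bare-bones Newton–Raphson routine, not a production IRT estimator.

<syntaxhighlight lang="python">
import numpy as np

def rasch_p(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# Invented item difficulties and one examinee's right/wrong responses.
b = np.array([-1.5, -0.5, 0.0, 0.8, 1.6])
x = np.array([1, 1, 1, 0, 1])

# Newton-Raphson maximum-likelihood estimate of the examinee's location (theta).
theta = 0.0
for _ in range(25):
    p = rasch_p(theta, b)
    gradient = np.sum(x - p)           # first derivative of the log-likelihood
    information = np.sum(p * (1 - p))  # Fisher information at the current theta
    theta += gradient / information

# Standard error of measurement at the estimated location.
p = rasch_p(theta, b)
se = 1.0 / np.sqrt(np.sum(p * (1 - p)))
print(round(theta, 2), round(se, 2))
</syntaxhighlight>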


== Standards of quality ==
The considerations of ] and ] typically are viewed as essential elements for determining the ] of any test. However, professional and practitioner associations frequently have placed these concerns within broader contexts when developing ] and making overall judgments about the quality of any test as a whole within a given context. A consideration of concern in many applied research settings is whether or not the metric of a given psychological inventory is meaningful or arbitrary.<ref> {{webarchive|url=https://web.archive.org/web/20060510182915/http://psychology.tamu.edu/Faculty/blanton/bj.2006.arbitrary.pdf |date=2006-05-10 }} ''American Psychologist, 61''(1), 27–41.</ref>


=== Testing standards ===
In 2014, the American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME) published a revision of the '']'',<ref>{{cite web|url=http://www.apa.org/science/standards.html#overview|title=The Standards for Educational and Psychological Testing|website=apa.org}}</ref> which describes standards for test development, evaluation, and use. The ''Standards'' cover essential topics in testing including validity, reliability/errors of measurement, and fairness in testing. The book also establishes standards related to testing operations including test design and development, scores, scales, norms, score linking, cut scores, test administration, scoring, reporting, score interpretation, test documentation, and rights and responsibilities of test takers and test users. Finally, the ''Standards'' cover topics related to testing applications, including ], workplace testing and ], ], and testing in ] and public policy.


=== Evaluation standards ===
In the field of ], and in particular ], the ]<ref>{{Cite web|url=http://www.wmich.edu/evalctr/jc/|archiveurl=https://web.archive.org/web/20091015044732/http://www.wmich.edu/evalctr/jc/|url-status=dead|title=Joint Committee on Standards for Educational Evaluation|archive-date=15 October 2009|access-date=28 June 2022}}</ref> has published three sets of standards for evaluations. ''The Personnel Evaluation Standards''<ref>Joint Committee on Standards for Educational Evaluation. (1988). '' {{webarchive|url=https://web.archive.org/web/20051212001638/http://www.wmich.edu/evalctr/jc/PERSTNDS-SUM.htm |date=2005-12-12 }}'' Newbury Park, CA: Sage Publications.</ref> was published in 1988, ''The Program Evaluation Standards'' (2nd edition)<ref>Joint Committee on Standards for Educational Evaluation. (1994). '' {{webarchive|url=https://web.archive.org/web/20060222025348/http://www.wmich.edu/evalctr/jc/PGMSTNDS-SUM.htm |date=2006-02-22 }}'' Newbury Park, CA: Sage Publications.</ref> was published in 1994, and ''The Student Evaluation Standards''<ref>Committee on Standards for Educational Evaluation. (2003). '' {{webarchive|url=https://web.archive.org/web/20060524144621/http://www.wmich.edu/evalctr/jc/briefing/ses/ |date=2006-05-24 }}'' Newbury Park, CA: Corwin Press.</ref> was published in 2003.


Each publication presents and elaborates a set of standards for use in a variety of educational settings. The standards provide guidelines for designing, implementing, assessing, and improving the identified form of evaluation.<ref>{{cite journal|author=E. Cabrera-Nguyen|year=2010|title=Author guidelines for reporting scale development and validation results in the Journal of the Society for Social Work and Research|journal=Journal of the Society for Social Work and Research|volume=1|issue=2|pages=99–103|url=http://www.academia.edu/2395969/Author_guidelines_for_reporting_scale_development_and_validation_results_in_the_Journal_of_the_Society_for_Social_Work_and_Research}}</ref> Each of the standards has been placed in one of four fundamental categories to promote educational evaluations that are proper, useful, feasible, and accurate. In these sets of standards, validity and reliability considerations are covered under the accuracy topic. For example, the student accuracy standards help ensure that student evaluations will provide sound, accurate, and credible information about student learning and performance.


== Controversy and criticism ==
Because psychometrics is based on ] measured through ], there has been controversy about some psychometric measures.<ref>{{cite book|last1=Tabachnick|first1=B.G.|title=Using Multivariate Analysis|last2=Fidell|first2=L.S.|publisher=Allyn and Bacon|year=2001|isbn=978-0-321-05677-1|location=Boston}}</ref>{{Page needed|date=November 2010}} Critics, including practitioners in the ], have argued that such definition and quantification is difficult, and that such measurements are often misused by laymen, such as with personality tests used in employment procedures. The ''Standards for Educational and Psychological Testing'' gives the following statement on ]: "validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests".<ref name="1999standards">American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999) ''Standards for educational and psychological testing''. Washington, DC: American Educational Research Association.</ref> Simply put, a test is not valid unless it is used and interpreted in the way it is intended.<ref>{{Cite book|last=Bandalos|first=Deborah L.|url=https://www.worldcat.org/oclc/1015955756|title=Measurement theory and applications for the social sciences|date=2018|isbn=978-1-4625-3215-5|location=New York|pages=261|oclc=1015955756}}</ref>


Two types of tools used to measure ] are ] and ]. Examples of such tests include the ] (BFI), ] (MMPI-2), ], ],<ref>{{Cite journal|vauthors=Aleksandrowicz JW, Klasa K, Sobański JA, Stolarska D|year=2009|title=KON-2006 Neurotic Personality Questionnaire|url=http://www.archivespp.pl/uploads/images/2009_11_1/21_p_Archives_1_09.pdf|journal=Archives of Psychiatry and Psychotherapy|volume=1|pages=21–22}}</ref> and ]. Some of these tests are helpful because they have adequate ] and ], two factors that make tests consistent and accurate reflections of the underlying construct. The ] (MBTI), however, has questionable validity and has been the subject of much criticism. Psychometric specialist ] wrote of the measure: "Most personality psychologists regard the MBTI as little more than an elaborate Chinese fortune cookie."<ref>{{cite book|last=Hogan|first=Robert|title=Personality and the fate of organizations|date=2007|publisher=]|isbn=978-0-8058-4142-8|location=Mahwah, NJ|page=28|oclc=65400436|author-link=Robert Hogan (psychologist)}}</ref>


] noted in '']'' (1957) that, "correlational psychology, though fully as old as experimentation, was slower to mature. It qualifies equally as a discipline, however, because it asks a distinctive type of question and has technical methods of examining whether the question has been properly put and the data properly interpreted." He would go on to say, "The correlation method, for its part, can study what man has not learned to control or can never hope to control&nbsp;... A true federation of the disciplines is required. Kept independent, they can give only wrong answers or no answers at all regarding certain important problems."<ref>{{Cite journal|last=Cronbach|first=L. J.|date=1957|title=The two disciplines of scientific psychology.|journal=American Psychologist|volume=12|issue=11|pages=671–684|doi=10.1037/h0043943|via=EBSCO}}</ref>


== Non-human: animals and machines ==
Psychometrics addresses ''human'' abilities, attitudes, traits, and educational evolution. Notably, the study of behavior, mental processes, and abilities of non-human ''animals'' is usually addressed by ], or with a continuum between non-human animals and the rest of animals by ]. Nonetheless, there are some advocates for a more gradual transition between the approach taken for humans and the approach taken for (non-human) animals.<ref name="Humphreys">{{Cite journal |author=Humphreys, L.G. |year=1987 |title=Psychometrics considerations in the evaluation of intraspecies differences in intelligence |journal=Behav Brain Sci |volume=10 |issue=4 |pages=668–669 |doi=10.1017/s0140525x0005514x}}</ref><ref name="Eysenck">{{Cite journal |author=Eysenck, H.J. |year=1987 |title=The several meanings of intelligence |journal=Behav Brain Sci |volume=10 |issue=4 |pages=663 |doi=10.1017/s0140525x00055060}}</ref><ref name="Locurto">{{Cite journal |author1=Locurto, C. |author2=Scanlon, C. |name-list-style=amp |year=1987 |title=Individual differences and spatial learning factor in two strains of mice |journal=Behav Brain Sci |volume=112 |pages=344–352}}</ref><ref name="king1997five">{{Cite journal |author1=King, James E |author2=Figueredo, Aurelio Jose |name-list-style=amp |year=1997 |title=The five-factor model plus dominance in chimpanzee personality |journal=Journal of Research in Personality |volume=31 |issue=2 |pages=257–271 |doi=10.1006/jrpe.1997.2179}}</ref>


The evaluation of abilities, traits and learning evolution of ''machines'' has been mostly unrelated to the case of humans and non-human animals, with specific approaches in the area of ]. A more integrated approach, under the name of ], has also been proposed.<ref name="upsycho">{{Cite journal |author1=J. Hernández-Orallo |author2=D.L. Dowe |author3=M.V. Hernández-Lloreda |year=2013 |title=Universal Psychometrics: Measuring Cognitive Abilities in the Machine Kingdom |journal=Cognitive Systems Research |volume=27 |pages=50–74 |hdl=10251/50244 |doi=10.1016/j.cogsys.2013.06.001 |s2cid=26440282 |url=https://riunet.upv.es/bitstream/10251/50244/3/upsycho.pdf |hdl-access=free}}</ref><ref>{{Cite book |last=Hernández-Orallo |first=José |url=https://www.cambridge.org/core/books/measure-of-all-minds/DC3DFD0C1D5B3A3AD6F56CD6A397ABCA |title=The Measure of All Minds: Evaluating Natural and Artificial Intelligence |date=2017 |publisher=] |isbn=978-1-107-15301-1 |location=Cambridge}}</ref>


== See also ==

{{portal|border=no|Psychology}}
{{columns-list|colwidth=30em|
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
*]
}}


== References ==
{{Reflist}}


=== Bibliography ===
{{refbegin}}
*{{Cite journal | doi = 10.1177/014662169301700307 | author1 = Andrich, D. | author2 = Luo, G. | name-list-style = amp | year = 1993 | title = A hyperbolic cosine model for unfolding dichotomous single-stimulus responses | url = http://apm.sagepub.com/content/19/3/269.full.pdf | journal = Applied Psychological Measurement | volume = 17 | issue = 3 | pages = 253–276 | citeseerx = 10.1.1.1003.8107 | s2cid = 120745971 }}
*Michell, J. (1999). ''Measurement in Psychology''. Cambridge: Cambridge University Press. {{doi|10.1017/CBO9780511490040}}
*Rasch, G. (1960/1980). ''Probabilistic models for some intelligence and attainment tests''. Copenhagen: Danish Institute for Educational Research, expanded edition (1980) with foreword and afterword by B.D. Wright. Chicago: The University of Chicago Press.
*Reese, T.W. (1943). The application of the theory of physical measurement to the measurement of psychological magnitudes, with three experimental examples. ''Psychological Monographs, 55'', 1–89. {{doi|10.1037/h0061367}}
*{{Cite journal | doi = 10.1126/science.103.2684.677 | author = Stevens, S. S. | year = 1946 | title = On the theory of scales of measurement | journal = Science | volume = 103 | issue = 2684 | pages = 677–80 | pmid = 17750512 | bibcode = 1946Sci...103..677S }}
*{{Cite journal | author = Thurstone, L.L. | year = 1927 | title = A law of comparative judgement | journal = Psychological Review | volume = 34 | issue = 4 | pages = 278–286 | doi = 10.1037/h0070288 }}
*Thurstone, L.L. (1929). The Measurement of Psychological Value. In T.V. Smith and W.K. Wright (Eds.), ''Essays in Philosophy by Seventeen Doctors of Philosophy of the University of Chicago''. Chicago: Open Court.
*Thurstone, L.L. (1959). ''The Measurement of Values''. Chicago: The University of Chicago Press.
*{{Cite journal | author = S.F. Blinkhorn | year = 1997 | title = Past imperfect, future conditional: fifty years of test theory | journal = British Journal of Mathematical and Statistical Psychology | volume = 50 | issue = 2| pages = 175–185 | doi = 10.1111/j.2044-8317.1997.tb01139.x | author-link = Steve Blinkhorn }}
*Thurstone, L.L. (1959). ''The Measurement of Values''. Chicago: The University of Chicago Press.
*{{Cite web|url=https://www.linkedin.com/pulse/cambridge-just-told-me-big-data-doesnt-work-yet-david-sanford/|title=Cambridge just told me Big Data doesn't work yet|last=Sanford|first=David|date=18 November 2017|website=LinkedIn}}
*http://www.services.unimelb.edu.au/careers/student/interviews/test.html .''Psychometric Assessments'' University of Melbourne.
*{{Cite journal | author = ] | year = 1997 | title = Past imperfect, future conditional: fifty years of test theory | url = | journal = Br. J. Math. Statist. Psychol | volume = 50 | issue = 2| pages = 175–185 | doi = 10.1111/j.2044-8317.1997.tb01139.x }}
{{refend}} {{refend}}


== Further reading ==
*{{cite book|author=Robert F. DeVellis|title=Scale Development: Theory and Applications|url=https://books.google.com/books?id=48ACCwAAQBAJ|year=2016|publisher=SAGE Publications|isbn=978-1-5063-4158-3}}
*{{cite book|author=Borsboom, Denny|title=Measuring the Mind: Conceptual Issues in Contemporary Psychometrics|location=Cambridge|publisher=Cambridge University Press|isbn=978-0-521-84463-5|year=2005|title-link=Measuring the Mind}}
*{{cite book|author1=Leslie A. Miller|author2=Robert L. Lovler|title=Foundations of Psychological Testing: A Practical Approach|url=https://books.google.com/books?id=8EYdCAAAQBAJ|year=2015|publisher=SAGE Publications|isbn=978-1-4833-6927-3}}
*{{cite book|author=Roderick P. McDonald|title=Test Theory: A Unified Treatment|url=https://books.google.com/books?id=_feqA2RdyOoC|year=2013|publisher=Psychology Press|isbn=978-1-135-67530-1}}
*{{cite book|author=Paul Kline|title=The Handbook of Psychological Testing|url=https://books.google.com/books?id=lm2RxaKaok8C|year=2000|publisher=Psychology Press|isbn=978-0-415-21158-1}}
*{{cite book|author1=Rush AJ Jr|author2=First MB|author3=Blacker D|title=Handbook of Psychiatric Measures|year=2008|publisher=American Psychiatric Publishing|url=https://books.google.com/books?id=ddyZUcaGaRIC|isbn=978-1-58562-218-4|oclc=85885343}}
*{{cite book|author=Ann C Silverlake|title=Comprehending Test Manuals: A Guide and Workbook|url=https://books.google.com/books?id=zl8PDQAAQBAJ|year=2016|publisher=Taylor & Francis|isbn=978-1-351-97086-0}}

== External links ==
{{wikiversity}} {{wikiversity}}
{{Wiktionary}} {{Wiktionary}}
*, May 5, 2006, ''NY Times''. "Psychometrics, one of the most obscure, esoteric and cerebral professions in America, is now also one of the hottest."


{{Library resources box {{Library resources box
|by=no |by=no
|onlinebooks=no |onlinebooks=no
|others=no |others=no
|about=yes |about=yes
|label=psychometrics}} |label=psychometrics}}




Historical foundation

Psychological testing has come from two streams of thought: the first, from Darwin, Galton, and Cattell, concerned with the measurement of individual differences, and the second, from Herbart, Weber, Fechner, and Wundt, concerned with psychophysical measurement. The second stream of work led to the development of experimental psychology and standardized testing.

Victorian stream

Charles Darwin was the inspiration behind Francis Galton, a scientist who advanced the development of psychometrics. In 1859, Darwin published his book On the Origin of Species. Darwin described the role of natural selection in the emergence, over time, of different populations of species of plants and animals. The book showed how individual members of a species differ among themselves and how they possess characteristics that are more or less adaptive to their environment. Those with more adaptive characteristics are more likely to survive to procreate and give rise to another generation. Those with less adaptive characteristics are less likely. These ideas stimulated Galton's interest in the study of human beings and how they differ one from another and how to measure those differences.

Galton wrote a book entitled Hereditary Genius, first published in 1869. The book described different characteristics that people possess and how those characteristics make some more "fit" than others. Today these differences, such as sensory and motor functioning (reaction time, visual acuity, and physical strength), are important domains of scientific psychology. Much of the early theoretical and applied work in psychometrics was undertaken in an attempt to measure intelligence. Galton, often referred to as "the father of psychometrics", devised and included mental tests among his anthropometric measures. James McKeen Cattell, a pioneer in the field of psychometrics, extended Galton's work. Cattell coined the term mental test, and is responsible for research and knowledge that ultimately led to the development of modern tests.

German stream

The origin of psychometrics also has connections to the related field of psychophysics. Around the same time that Darwin, Galton, and Cattell were making their discoveries, Herbart was also interested in "unlocking the mysteries of human consciousness" through the scientific method. Herbart was responsible for creating mathematical models of the mind, which were influential in educational practices for years to come.

E.H. Weber built upon Herbart's work and tried to prove the existence of a psychological threshold, saying that a minimum stimulus was necessary to activate a sensory system. After Weber, G.T. Fechner expanded upon the knowledge he gleaned from Herbart and Weber, to devise the law that the strength of a sensation grows as the logarithm of the stimulus intensity. A follower of Weber and Fechner, Wilhelm Wundt is credited with founding the science of psychology. It is Wundt's influence that paved the way for others to develop psychological testing.
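The law Fechner arrived at, now usually called the Weber–Fechner law, is commonly written as S = k · ln(I / I₀), where S is the magnitude of the sensation, I is the physical intensity of the stimulus, I₀ is the threshold intensity below which nothing is sensed, and k is a constant that depends on the sensory modality. The logarithmic form follows from integrating Weber's observation that the just-noticeable difference in intensity is an approximately constant proportion of the intensity already present.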

20th century

The psychometrician L. L. Thurstone, founder and first president of the Psychometric Society, developed and applied a theoretical approach to measurement referred to as the law of comparative judgment (Thurstone, 1927), an approach that has close connections to the psychophysical theory of Ernst Heinrich Weber and Gustav Fechner. In addition, Spearman and Thurstone both made important contributions to the theory and application of factor analysis, a statistical method developed and used extensively in psychometrics. In the late 1950s, Leopold Szondi made a historical and epistemological assessment of the impact of statistical thinking on psychology during the previous few decades: "in the last decades, the specifically psychological thinking has been almost completely suppressed and removed, and replaced by a statistical thinking. Precisely here we see the cancer of testology and testomania of today."

More recently, psychometric theory has been applied in the measurement of personality, attitudes, beliefs, and academic achievement. These latent constructs cannot be observed directly, and much of the research and science in this discipline has been developed in an attempt to measure them as close to the true score as possible.

Figures who made significant contributions to psychometrics include Karl Pearson, Henry F. Kaiser, Carl Brigham, L. L. Thurstone, E. L. Thorndike, Georg Rasch, Eugene Galanter, Johnson O'Connor, Frederic M. Lord, Ledyard R Tucker, Louis Guttman, and Jane Loevinger.

Definition of measurement in the social sciences

The definition of measurement in the social sciences has a long history. A currently widespread definition, proposed by Stanley Smith Stevens, is that measurement is "the assignment of numerals to objects or events according to some rule." This definition was introduced in a 1946 Science article in which Stevens proposed four levels of measurement: nominal, ordinal, interval, and ratio. Although widely adopted, this definition differs in important respects from the more classical definition of measurement adopted in the physical sciences, namely that scientific measurement entails "the estimation or discovery of the ratio of some magnitude of a quantitative attribute to a unit of the same attribute" (Michell, 1997, p. 358).
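Stevens's four levels, together with the transformations that leave each scale's structure intact and the statistics he regarded as permissible at each level, can be summarized in a small illustrative sketch (the example variables below are hypothetical and are not taken from Stevens's article):

    # Illustrative summary of Stevens's (1946) levels of measurement.
    # Example variables are hypothetical; the permissible statistics follow Stevens's summary table,
    # and each level also permits the statistics of the levels listed before it.
    STEVENS_LEVELS = {
        "nominal": {"example": "diagnostic category",
                    "admissible_transformation": "any one-to-one relabelling",
                    "permissible_statistics": ["mode", "frequency counts"]},
        "ordinal": {"example": "rank on a rating scale",
                    "admissible_transformation": "any order-preserving function",
                    "permissible_statistics": ["median", "percentiles"]},
        "interval": {"example": "scaled test score",
                     "admissible_transformation": "positive linear transform a*x + b",
                     "permissible_statistics": ["mean", "standard deviation"]},
        "ratio": {"example": "reaction time in milliseconds",
                  "admissible_transformation": "multiplication by a positive constant",
                  "permissible_statistics": ["geometric mean", "coefficient of variation"]},
    }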

Indeed, Stevens's definition of measurement was put forward in response to the British Ferguson Committee, whose chair, A. Ferguson, was a physicist. The committee was appointed in 1932 by the British Association for the Advancement of Science to investigate the possibility of quantitatively estimating sensory events. Although its chair and other members were physicists, the committee also included several psychologists. The committee's report highlighted the importance of the definition of measurement. While Stevens's response was to propose a new definition, which has had considerable influence in the field, this was by no means the only response to the report. Another, notably different, response was to accept the classical definition, as reflected in the following statement:

Measurement in psychology and physics are in no sense different. Physicists can measure when they can find the operations by which they may meet the necessary criteria; psychologists have to do the same. They need not worry about the mysterious differences between the meaning of measurement in the two sciences (Reese, 1943, p. 49).

These divergent responses are reflected in alternative approaches to measurement. For example, methods based on covariance matrices are typically employed on the premise that numbers, such as raw scores derived from assessments, are measurements. Such approaches implicitly entail Stevens's definition of measurement, which requires only that numbers are assigned according to some rule. The main research task, then, is generally considered to be the discovery of associations between scores, and of factors posited to underlie such associations.

On the other hand, when measurement models such as the Rasch model are employed, numbers are not assigned based on a rule. Instead, in keeping with Reese's statement above, specific criteria for measurement are stated, and the goal is to construct procedures or operations that provide data that meet the relevant criteria. Measurements are estimated based on the models, and tests are conducted to ascertain whether the relevant criteria have been met.
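For example, the dichotomous Rasch model specifies the probability that person n answers item i correctly as P(X_ni = 1) = exp(θ_n − b_i) / (1 + exp(θ_n − b_i)), where θ_n is the person's location and b_i the item's location on the same latent continuum. The criterion built into the model is that of invariant comparison: comparisons between persons should not depend on which items were used, and comparisons between items should not depend on which persons attempted them. Fit statistics are then used to check whether observed responses approximate this requirement closely enough for the estimates to be treated as measurements.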

Instruments and procedures

The first psychometric instruments were designed to measure intelligence. One early approach to measuring intelligence was the test developed in France by Alfred Binet and Theodore Simon, known as the Binet–Simon test. The French test was adapted for use in the United States by Lewis Terman of Stanford University, and named the Stanford–Binet IQ test.

Another major focus in psychometrics has been on personality testing. There has been a range of theoretical approaches to conceptualizing and measuring personality, though there is no widely agreed-upon theory. Some of the better-known instruments include the Minnesota Multiphasic Personality Inventory, measures based on the Five-Factor Model (or "Big 5"), and tools such as the Personality and Preference Inventory and the Myers–Briggs Type Indicator. Attitudes have also been studied extensively using psychometric approaches. An alternative method involves the application of unfolding measurement models, the most general being the Hyperbolic Cosine Model (Andrich & Luo, 1993).

Theoretical approaches

Psychometricians have developed a number of different measurement theories. These include classical test theory (CTT) and item response theory (IRT). An approach that seems mathematically to be similar to IRT but also quite distinctive, in terms of its origins and features, is represented by the Rasch model for measurement. The development of the Rasch model, and the broader class of models to which it belongs, was explicitly founded on requirements of measurement in the physical sciences.
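As a minimal sketch of the bookkeeping involved (the item difficulties and person location below are made up for illustration and do not come from any published instrument), the dichotomous Rasch item response function and the expected raw score it implies can be computed as follows:

    import math

    # Hypothetical item difficulties (in logits) and one person's ability estimate.
    item_difficulties = [-1.5, -0.5, 0.0, 0.8, 1.6]
    theta = 0.4

    def rasch_probability(theta, difficulty):
        """Probability of a correct response under the dichotomous Rasch model."""
        return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

    probabilities = [rasch_probability(theta, b) for b in item_difficulties]
    expected_raw_score = sum(probabilities)  # total score the model predicts (a CTT-style quantity)

    print([round(p, 3) for p in probabilities], round(expected_raw_score, 2))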

Psychometricians have also developed methods for working with large matrices of correlations and covariances. Techniques in this general tradition include factor analysis, a method of determining the underlying dimensions of data. One of the main challenges faced by users of factor analysis is a lack of consensus on appropriate procedures for determining the number of latent factors. A common rule of thumb is to stop factoring when eigenvalues drop below one, on the grounds that a factor with an eigenvalue below one accounts for less variance than a single standardized original variable. The absence of agreed cut-off points affects other multivariate methods as well.
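A small sketch of the eigenvalue-greater-than-one rule mentioned above (often called the Kaiser criterion), applied to a synthetic correlation matrix rather than to real test data:

    import numpy as np

    # Hypothetical correlation matrix for five items; items 1-3 and items 4-5
    # were constructed to form two loosely separated clusters.
    R = np.array([
        [1.00, 0.55, 0.50, 0.10, 0.05],
        [0.55, 1.00, 0.45, 0.12, 0.08],
        [0.50, 0.45, 1.00, 0.05, 0.10],
        [0.10, 0.12, 0.05, 1.00, 0.60],
        [0.05, 0.08, 0.10, 0.60, 1.00],
    ])

    eigenvalues = np.linalg.eigvalsh(R)[::-1]     # eigenvalues, largest first
    n_factors = int(np.sum(eigenvalues > 1.0))    # retain factors with eigenvalue > 1
    print(np.round(eigenvalues, 3), n_factors)

On this toy matrix the rule suggests two factors, matching the two built-in clusters; on real data, different stopping rules (scree plots, parallel analysis) can disagree, which is the lack of consensus referred to above.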

Multidimensional scaling is a method for finding a simple representation for data with a large number of latent dimensions. Cluster analysis is an approach for finding objects that are similar to one another. Factor analysis, multidimensional scaling, and cluster analysis are all multivariate descriptive methods used to distill simpler structures from large amounts of data.

More recently, structural equation modeling and path analysis represent more sophisticated approaches to working with large covariance matrices. These methods allow statistically sophisticated models to be fitted to data and tested to determine whether they fit adequately. Because at a granular level psychometric research is concerned with the extent and nature of multidimensionality in each of the items of interest, a relatively new procedure known as bi-factor analysis can be helpful. Bi-factor analysis can decompose "an item's systematic variance in terms of, ideally, two sources, a general factor and one source of additional systematic variance."

Key concepts

Key concepts in classical test theory are reliability and validity. A reliable measure is one that measures a construct consistently across time, individuals, and situations. A valid measure is one that measures what it is intended to measure. Reliability is necessary, but not sufficient, for validity.

Both reliability and validity can be assessed statistically. Consistency over repeated measures of the same test can be assessed with the Pearson correlation coefficient, and is often called test-retest reliability. Similarly, the equivalence of different versions of the same measure can be indexed by a Pearson correlation, and is called equivalent forms reliability or a similar term.
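For instance, with made-up scores from ten examinees tested on two occasions, test-retest reliability is simply the Pearson correlation between the two sets of scores (statistics.correlation requires Python 3.10 or later):

    import statistics

    # Hypothetical scores for the same ten examinees on two administrations of a test.
    time_1 = [12, 15, 9, 20, 17, 11, 14, 18, 10, 16]
    time_2 = [13, 14, 10, 19, 18, 10, 15, 17, 11, 15]

    test_retest_r = statistics.correlation(time_1, time_2)  # Pearson r
    print(round(test_retest_r, 3))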

Internal consistency, which addresses the homogeneity of a single test form, may be assessed by correlating performance on two halves of a test, which is termed split-half reliability; the value of this Pearson product-moment correlation coefficient for two half-tests is adjusted with the Spearman–Brown prediction formula to correspond to the correlation between two full-length tests. Perhaps the most commonly used index of reliability is Cronbach's α, which is equivalent to the mean of all possible split-half coefficients. Other approaches include the intra-class correlation, which is the ratio of variance of measurements of a given target to the variance of all targets.
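A compact sketch of those internal-consistency indices on a made-up matrix of item scores, using the standard formulas (Spearman–Brown stepped-up reliability 2r / (1 + r) for a doubled test length, and Cronbach's α = k/(k − 1) · (1 − sum of item variances / variance of total scores)):

    import numpy as np

    # Hypothetical item scores: rows are six examinees, columns are five items.
    X = np.array([
        [3, 4, 3, 5, 4],
        [2, 2, 3, 2, 3],
        [5, 4, 5, 4, 5],
        [1, 2, 1, 2, 2],
        [4, 3, 4, 4, 3],
        [3, 3, 2, 3, 3],
    ], dtype=float)

    k = X.shape[1]
    item_variances = X.var(axis=0, ddof=1)
    total_variance = X.sum(axis=1).var(ddof=1)
    cronbach_alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

    # Rough split-half reliability: correlate odd-item and even-item half scores,
    # then step up to full test length with the Spearman-Brown formula.
    half_1 = X[:, 0::2].sum(axis=1)
    half_2 = X[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(half_1, half_2)[0, 1]
    spearman_brown = 2 * r_half / (1 + r_half)

    print(round(cronbach_alpha, 3), round(spearman_brown, 3))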

There are a number of different forms of validity. Criterion-related validity refers to the extent to which a test or scale predicts a sample of behavior, i.e., the criterion, that is "external to the measuring instrument itself." That external sample of behavior can be many things including another test; college grade point average as when the high school SAT is used to predict performance in college; and even behavior that occurred in the past, for example, when a test of current psychological symptoms is used to predict the occurrence of past victimization (which would accurately represent postdiction). When the criterion measure is collected at the same time as the measure being validated the goal is to establish concurrent validity; when the criterion is collected later the goal is to establish predictive validity. A measure has construct validity if it is related to measures of other constructs as required by theory. Content validity is a demonstration that the items of a test do an adequate job of covering the domain being measured. In a personnel selection example, test content is based on a defined statement or set of statements of knowledge, skill, ability, or other characteristics obtained from a job analysis.

Item response theory models the relationship between latent traits and responses to test items. Among other advantages, IRT provides a basis for obtaining an estimate of the location of a test-taker on a given latent trait as well as the standard error of measurement of that location. For example, a university student's knowledge of history can be deduced from his or her score on a university test and then be compared reliably with a high school student's knowledge deduced from a less difficult test. Scores derived by classical test theory do not have this characteristic, and assessment of actual ability (rather than ability relative to other test-takers) must be assessed by comparing scores to those of a "norm group" randomly selected from the population. In fact, all measures derived from classical test theory are dependent on the sample tested, while, in principle, those derived from item response theory are not.
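As a sketch of how such a location and its standard error might be obtained under the dichotomous Rasch model (the item difficulties and responses below are hypothetical; operational testing programs use dedicated IRT software and more elaborate estimation methods), the person parameter can be estimated by Newton–Raphson iteration on the log-likelihood, with the standard error taken from the Fisher information:

    import math

    # Hypothetical calibrated item difficulties (logits) and one person's scored responses.
    difficulties = [-1.2, -0.4, 0.3, 0.9, 1.7]
    responses = [1, 1, 1, 0, 0]  # must not be all 0s or all 1s for the maximum-likelihood estimate to exist

    def p_correct(theta, b):
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    theta = 0.0
    for _ in range(100):
        probs = [p_correct(theta, b) for b in difficulties]
        score = sum(x - p for x, p in zip(responses, probs))  # first derivative of the log-likelihood
        information = sum(p * (1.0 - p) for p in probs)       # Fisher information at the current theta
        step = score / information
        theta += step
        if abs(step) < 1e-8:
            break

    information = sum(p * (1.0 - p) for p in (p_correct(theta, b) for b in difficulties))
    standard_error = 1.0 / math.sqrt(information)             # standard error of measurement at theta
    print(round(theta, 3), round(standard_error, 3))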

Standards of quality

The considerations of validity and reliability typically are viewed as essential elements for determining the quality of any test. However, professional and practitioner associations frequently have placed these concerns within broader contexts when developing standards and making overall judgments about the quality of any test as a whole within a given context. A consideration of concern in many applied research settings is whether or not the metric of a given psychological inventory is meaningful or arbitrary.

Testing standards

In 2014, the American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME) published a revision of the Standards for Educational and Psychological Testing, which describes standards for test development, evaluation, and use. The Standards cover essential topics in testing including validity, reliability/errors of measurement, and fairness in testing. The book also establishes standards related to testing operations including test design and development, scores, scales, norms, score linking, cut scores, test administration, scoring, reporting, score interpretation, test documentation, and rights and responsibilities of test takers and test users. Finally, the Standards cover topics related to testing applications, including psychological testing and assessment, workplace testing and credentialing, educational testing and assessment, and testing in program evaluation and public policy.

Evaluation standards

In the field of evaluation, and in particular educational evaluation, the Joint Committee on Standards for Educational Evaluation has published three sets of standards for evaluations. The Personnel Evaluation Standards was published in 1988, The Program Evaluation Standards (2nd edition) was published in 1994, and The Student Evaluation Standards was published in 2003.

Each publication presents and elaborates a set of standards for use in a variety of educational settings. The standards provide guidelines for designing, implementing, assessing, and improving the identified form of evaluation. Each of the standards has been placed in one of four fundamental categories to promote educational evaluations that are proper, useful, feasible, and accurate. In these sets of standards, validity and reliability considerations are covered under the accuracy topic. For example, the student accuracy standards help ensure that student evaluations will provide sound, accurate, and credible information about student learning and performance.

Controversy and criticism

Because psychometrics is based on latent psychological processes measured through correlations, there has been controversy about some psychometric measures. Critics, including practitioners in the physical sciences, have argued that such definition and quantification is difficult, and that such measurements are often misused by laymen, such as when personality tests are used in employment procedures. The Standards for Educational and Psychological Testing gives the following statement on test validity: "validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests". Simply put, a test is not valid unless it is used and interpreted in the way it is intended.

Two types of tools used to measure personality traits are objective tests and projective measures. Examples of such tests include the Big Five Inventory (BFI), the Minnesota Multiphasic Personality Inventory (MMPI-2), the Rorschach inkblot test, the Neurotic Personality Questionnaire KON-2006, and the Eysenck Personality Questionnaire. Some of these tests are helpful because they have adequate reliability and validity, two factors that make tests consistent and accurate reflections of the underlying construct. The Myers–Briggs Type Indicator (MBTI), however, has questionable validity and has been the subject of much criticism. Psychometric specialist Robert Hogan wrote of the measure: "Most personality psychologists regard the MBTI as little more than an elaborate Chinese fortune cookie."

Lee Cronbach noted in American Psychologist (1957) that, "correlational psychology, though fully as old as experimentation, was slower to mature. It qualifies equally as a discipline, however, because it asks a distinctive type of question and has technical methods of examining whether the question has been properly put and the data properly interpreted." He would go on to say, "The correlation method, for its part, can study what man has not learned to control or can never hope to control ... A true federation of the disciplines is required. Kept independent, they can give only wrong answers or no answers at all regarding certain important problems."

Non-human: animals and machines

Psychometrics addresses human abilities, attitudes, traits, and educational evolution. Notably, the study of behavior, mental processes, and abilities of non-human animals is usually addressed by comparative psychology, or by evolutionary psychology when a continuum between non-human animals and humans is assumed. Nonetheless, there are some advocates for a more gradual transition between the approach taken for humans and the approach taken for (non-human) animals.

The evaluation of abilities, traits and learning evolution of machines has been mostly unrelated to the case of humans and non-human animals, with specific approaches in the area of artificial intelligence. A more integrated approach, under the name of universal psychometrics, has also been proposed.

See also

References

  1. "Glossary1". 22 July 2017. Archived from the original on 2017-07-22. Retrieved 28 June 2022.
  2. ^ Tabachnick, B.G.; Fidell, L.S. (2001). Using Multivariate Analysis. Boston: Allyn and Bacon. ISBN 978-0-321-05677-1.
  3. Kaplan, R.M., & Saccuzzo, D.P. (2010). Psychological Testing: Principles, Applications, and Issues. (8th ed.). Belmont, CA: Wadsworth, Cengage Learning.
  4. ^ Kaplan, R.M., & Saccuzzo, D.P. (2010). Psychological testing: Principles, applications, and issues (8th ed.). Belmont, CA: Wadsworth, Cengage Learning.
  5. Nunnally, J., & Berstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
  6. Leopold Szondi (1960) Das zweite Buch: Lehrbuch der Experimentellen Triebdiagnostik. Huber, Bern und Stuttgart, 2nd edition. Ch.27, From the Spanish translation, B)II Las condiciones estadisticas, p.396. Quotation:

    el pensamiento psicologico especifico, en las ultima decadas, fue suprimido y eliminado casi totalmente, siendo sustituido por un pensamiento estadistico. Precisamente aqui vemos el cáncer de la testología y testomania de hoy.

  7. Stevens, S. S. (7 June 1946). "On the Theory of Scales of Measurement". Science. 103 (2684): 677–680. Bibcode:1946Sci...103..677S. doi:10.1126/science.103.2684.677. PMID 17750512. S2CID 4667599.
  8. Michell, Joel (August 1997). "Quantitative science and the definition of measurement in psychology". British Journal of Psychology. 88 (3): 355–383. doi:10.1111/j.2044-8295.1997.tb02641.x.
  9. Reese, T.W. (1943). The application of the theory of physical measurement to the measurement of psychological magnitudes, with three experimental examples. Psychological Monographs, 55, 1–89. doi:10.1037/h0061367
  10. "Psychometrics". Assessmentpsychology.com. Retrieved 28 June 2022.
  11. Stern, Theodore A.; Fava, Maurizio; Wilens, Timothy E.; Rosenbaum, Jerrold F. (2016). Massachusetts General Hospital comprehensive clinical psychiatry (Second ed.). London. p. 73. ISBN 978-0323295079. Retrieved 31 October 2021.
  12. Longe, Jacqueline L., ed. (2022). The Gale Encyclopedia of Psychology. Vol. 2 (4th ed.). Farmington Hills, Michigan: Gale. p. 1000. ISBN 9780028683867.
  13. Andrich, D. & Luo, G. (1993). A hyperbolic cosine latent trait model for unfolding dichotomous single-stimulus responses. Applied Psychological Measurement, 17, 253–276.
  14. Embretson, S.E., & Reise, S.P. (2000). Item Response Theory for Psychologists. Mahwah, NJ: Erlbaum.
  15. Hambleton, R.K., & Swaminathan, H. (1985). Item Response Theory: Principles and Applications. Boston: Kluwer-Nijhoff.
  16. Rasch, G. (1960/1980). Probabilistic models for some intelligence and attainment tests. Copenhagen, Danish Institute for Educational Research, expanded edition (1980) with foreword and afterword by B.D. Wright. Chicago: The University of Chicago Press.
  17. Thompson, B.R. (2004). Exploratory and Confirmatory Factor Analysis: Understanding Concepts and Applications. American Psychological Association.
  18. Zwick, William R.; Velicer, Wayne F. (1986). "Comparison of five rules for determining the number of components to retain". Psychological Bulletin. 99 (3): 432–442. doi:10.1037/0033-2909.99.3.432.
  19. Singh, Manoj Kumar (2021-09-11). Introduction to Social Psychology. K.K. Publications.
  20. Davison, M.L. (1992). Multidimensional Scaling. Krieger.
  21. Kaplan, D. (2008). Structural Equation Modeling: Foundations and Extensions, 2nd ed. Sage.
  22. DeMars, C. E. (2013). A tutorial on interpreting bi-factor model scores. International Journal of Testing, 13, 354–378. http://dx.doi.org/10.1080/15305058.2013.799067
  23. Reise, S. P. (2012). The rediscovery of bi-factor modeling. Multivariate Behavioral Research, 47, 667–696. http://dx.doi.org/10.1080/00273171.2012.715555
  24. Rodriguez, A., Reise, S. P., & Haviland, M. G. (2016). Evaluating bifactor models: Calculating and interpreting statistical indices. Psychological Methods, 21, 137–150. http://dx.doi.org/10.1037/met0000045
  25. Schonfeld, I.S., Verkuilen, J. & Bianchi, R. (2019). An exploratory structural equation modeling bi-factor analytic approach to uncovering what burnout, depression, and anxiety scales measure. Psychological Assessment, 31, 1073–1079. http://dx.doi.org/10.1037/pas0000721 p. 1075
  26. ^ "Home – Educational Research Basics by Del Siegle". www.gifted.uconn.edu. 17 February 2015.
  27. Nunnally, J.C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
  28. Blanton, H., & Jaccard, J. (2006). Arbitrary metrics in psychology. Archived 2006-05-10 at the Wayback Machine American Psychologist, 61(1), 27–41.
  29. "The Standards for Educational and Psychological Testing". apa.org.
  30. "Joint Committee on Standards for Educational Evaluation". Archived from the original on 15 October 2009. Retrieved 28 June 2022.
  31. Joint Committee on Standards for Educational Evaluation. (1988). The Personnel Evaluation Standards: How to Assess Systems for Evaluating Educators. Archived 2005-12-12 at the Wayback Machine Newbury Park, CA: Sage Publications.
  32. Joint Committee on Standards for Educational Evaluation. (1994). The Program Evaluation Standards, 2nd Edition. Archived 2006-02-22 at the Wayback Machine Newbury Park, CA: Sage Publications.
  33. Committee on Standards for Educational Evaluation. (2003). The Student Evaluation Standards: How to Improve Evaluations of Students. Archived 2006-05-24 at the Wayback Machine Newbury Park, CA: Corwin Press.
  34. E. Cabrera-Nguyen (2010). "Author guidelines for reporting scale development and validation results in the Journal of the Society for Social Work and Research". Academia.edu. 1 (2): 99–103.
  35. Tabachnick, B.G.; Fidell, L.S. (2001). Using Multivariate Analysis. Boston: Allyn and Bacon. ISBN 978-0-321-05677-1.
  36. American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999) Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
  37. Bandalos, Deborah L. (2018). Measurement theory and applications for the social sciences. New York. p. 261. ISBN 978-1-4625-3215-5. OCLC 1015955756.
  38. Aleksandrowicz JW, Klasa K, Sobański JA, Stolarska D (2009). "KON-2006 Neurotic Personality Questionnaire" (PDF). Archives of Psychiatry and Psychotherapy. 1: 21–22.
  39. Hogan, Robert (2007). Personality and the fate of organizations. Mahwah, NJ: Lawrence Erlbaum Associates. p. 28. ISBN 978-0-8058-4142-8. OCLC 65400436.
  40. Cronbach, L. J. (1957). "The two disciplines of scientific psychology". American Psychologist. 12 (11): 671–684. doi:10.1037/h0043943 – via EBSCO.
  41. Humphreys, L.G. (1987). "Psychometrics considerations in the evaluation of intraspecies differences in intelligence". Behav Brain Sci. 10 (4): 668–669. doi:10.1017/s0140525x0005514x.
  42. Eysenck, H.J. (1987). "The several meanings of intelligence". Behav Brain Sci. 10 (4): 663. doi:10.1017/s0140525x00055060.
  43. Locurto, C. & Scanlon, C (1987). "Individual differences and spatial learning factor in two strains of mice". Behav Brain Sci. 112: 344–352.
  44. King, James E & Figueredo, Aurelio Jose (1997). "The five-factor model plus dominance in chimpanzee personality". Journal of Research in Personality. 31 (2): 257–271. doi:10.1006/jrpe.1997.2179.
  45. J. Hernández-Orallo; D.L. Dowe; M.V. Hernández-Lloreda (2013). "Universal Psychometrics: Measuring Cognitive Abilities in the Machine Kingdom" (PDF). Cognitive Systems Research. 27: 50–74. doi:10.1016/j.cogsys.2013.06.001. hdl:10251/50244. S2CID 26440282.
  46. Hernández-Orallo, José (2017). The Measure of All Minds: Evaluating Natural and Artificial Intelligence. Cambridge: Cambridge University Press. ISBN 978-1-107-15301-1.
