Educational data mining: Difference between revisions


Revision as of 08:34, 3 October 2011 by AnomieBOT (minor edit): Dating maintenance tags: {{Cleanup-section}}
Revision as of 03:07, 4 October 2011 by 166.248.0.81: Removed off-topic inappropriate long section on one researcher's project, recently added (Tag: section blanking)
Line 59:
	In 2010, the Association for Computing Machinery's KDD Cup was held, with a data set and topic in the domain of educational data mining. The data set was provided by the Pittsburgh Science of Learning Center DataShop, and consisted of over a million data points from students using Cognitive Tutor educational software. 600 teams competed for over $8,000 in prize money donated by Facebook. The winners used random forest, Bayesian network, and feature generation techniques to accurately predict the performance of over half a million unseen student responses.

	== Cluster analysis in education data mining ==
	{{cleanup-section|date=October 2011}}
	<!-- merged from ] -->
	In educational research, the data used for clustering can describe students or parents, or consist of attributes such as sex or test score. Clustering is an important method for understanding grouping or streaming and assessing its utility<ref name="Kumar">Kumar, Prepublication chapter from a book</ref> in educational research. Cluster analysis in educational research can be used for data exploration,<ref name="Finch">Finch, H. (2005). "Comparison of distance measures in cluster analysis with dichotomous data". ''Journal of Data Science'', 3, 85-100</ref> cluster confirmation,<ref name="Huberty">Huberty, C. J., Jordan, E. M., & Brandt, W. C. (2005). "Cluster analysis in higher education research". In J. C. Smart (Ed.), ''Higher Education: Handbook of Theory and Research'' (Vol. 20, pp. 437-457). Great Britain: Springer.</ref> and hypothesis testing.<ref name="Huberty" /> Data exploration is used when there is little information about which schools or students will be grouped together;<ref name="Finch" /> it aims to discover meaningful clusters of units based on measures on a set of response variables. Cluster confirmation is used to confirm previously reported cluster results.<ref name="Huberty" /> Hypothesis testing is used for arranging cluster structure.<ref name="Huberty" />

	===Example of cluster analysis in educational research===
	In 2002, Hattie used cluster analysis in the project 'Schools Like Mine'<ref name="Hattie">Hattie (2002). "Schools Like Mine: Cluster Analysis of New Zealand Schools". ''Technical Report 14, Project asTTle''. University of Auckland.</ref> to compare students' achievement in literacy and numeracy by the type of school they attended. 2707 majority and minority students in New Zealand were classified into different clusters according to school size, student ethnicity, region, size of civil jurisdiction and socioeconomic status for comparison. The clusters in this research were calculated across five dimensions: decile, region, size, minority and rurality. Each school was placed into one of twenty clusters that are used in the asTTle software{{clarify|date=December 2010}} as a basis for comparing student achievement. The results showed that socioeconomic status alone is inadequate for describing schools. By clustering schools, Hattie suggested that school type had no significant relation to school performance.

	===Common cluster techniques in educational research===
	All cluster techniques have two basic concerns: first, the measurement of similarity between individual profiles; and second, the use of that measure to form the groups or clusters. Bennett<ref name="Bennett75">Bennett, S. N. (1975). Cluster analysis in educational research: A non-statistical introduction. Research Intelligence, 1, 64-70.</ref> described Iterative Relocation as the most important cluster technique in behavioral and educational research. It has been adopted at Lancaster to create typologies of pupils based on personality and behavioral items, in order to identify types of students,<ref name="Entwistle">Entwistle, N. J., & Brennan, T. (1971). "Types of successful students". ''British Journal of Educational Psychology'', 41, 268-276.</ref> and to isolate the skills considered to be important for certain grades of technologist in industry. Several similarity coefficients are available, and the one chosen will depend upon the type of data collected.

	First, the number of groups (N) needs to be decided. This is often an arbitrary decision, and the initial groupings are random. The analysis then proceeds by computing the group profile (or group centroid) of each group, which consists of the cumulative frequencies of all variables measured. Each individual is then compared with each of the group centroids. A number of formulae are available for measuring this similarity. Among others, Wishart<ref name="Bennett75" /> has found the error sum of squares, a measure of dissimilarity, to be one of the most successful coefficients for continuous data. Individuals are relocated to the group they most resemble, which alters the composition of the groups, and the group centroids are recalculated; a new iteration cycle then commences. This sequence of comparison and relocation continues until all individuals are in the group whose central profile is most similar to their own. The solution is then said to be stable. The analysis is then continued by reducing the number of groups by one (N-1). This is achieved by a fusion process in which a measure of dissimilarity (error sum of squares) between all pairs of group centroids is calculated. The two most similar groups are then combined, reducing the number of groups by one. The group centroids are recalculated, and the comparison and relocation steps are repeated until the solution is stable at the N-1 group level. This process can continue until the two-group level is reached, at which point the analysis is complete.
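
The relocation and fusion cycle described above can be sketched in Python. This is a minimal illustration, not the procedure used in the cited studies: it assumes squared Euclidean distance as the dissimilarity measure, takes the mean profile as the group centroid, and fuses the two groups whose centroids are closest. The function names (`relocate`, `fuse`, `iterative_relocation`) and the sample points are hypothetical.

```python
def centroid(points):
    """Mean profile of a group: one average value per variable (assumed)."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance, used here as the dissimilarity measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def group_centroids(points, labels):
    """Centroid of every group that currently has at least one member."""
    return {
        g: centroid([p for p, lab in zip(points, labels) if lab == g])
        for g in sorted(set(labels))
    }


def relocate(points, labels):
    """Move each individual to its most similar centroid until stable."""
    labels = list(labels)
    while True:
        cents = group_centroids(points, labels)
        new = [min(cents, key=lambda g: sq_dist(p, cents[g])) for p in points]
        if new == labels:  # no individual moved: the solution is stable
            return labels
        labels = new


def fuse(points, labels):
    """Combine the two groups whose centroids are most similar."""
    cents = group_centroids(points, labels)
    groups = list(cents)
    pairs = [(g, h) for i, g in enumerate(groups) for h in groups[i + 1:]]
    a, b = min(pairs, key=lambda gh: sq_dist(cents[gh[0]], cents[gh[1]]))
    return [a if lab == b else lab for lab in labels]


def iterative_relocation(points, labels):
    """Return the stable labelling found at each group level, down to two."""
    solutions = {}
    while True:
        labels = relocate(points, labels)
        k = len(set(labels))
        solutions[k] = list(labels)
        if k <= 2:
            return solutions
        labels = fuse(points, labels)


# Hypothetical profiles of six individuals and an initial four-group split.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
solutions = iterative_relocation(points, [0, 0, 1, 2, 2, 3])
```

Here `solutions` maps each group level to its stable labelling, so the four-, three- and two-group solutions can be compared side by side. In practice the starting labels would be assigned at random, as the text describes; a fixed start is used here only to keep the example reproducible.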

	===Advantages of cluster analysis===
	Frisvad of BioCentrum-DTU said that cluster analysis is a good way to review data quickly, especially if the objects are classified into many groups.<ref name="Frisvad">Frisvad. Based on H.C. Romesburg: ''Cluster analysis for researchers'', Lifetime Learning Publications, Belmont, CA, 1984, and P.H.A. Sneath and R.R. Sokal: ''Numerical Taxonomy'', Freeman, San Francisco, CA, 1973. http://www2.imm.dtu.dk/courses/27411/doc/Lection10/Cluster%20analysis.pdf</ref> In the 'Schools Like Mine' example,<ref name="Hattie" /> 23 clusters of schools with different properties were clearly distinguished. It is easy for users to assign or nominate themselves to the cluster they would most like to compare with in a school cluster database,<ref name="Hattie" /> because each cluster is clearly named in understandable terms.

	Cluster analysis provides a simple profile of individuals.<ref name="Hattie" /> It takes a number of analysis units, each described by a set of characteristics and attributes (in this example school size, student ethnicity, region, size of civil jurisdiction and socioeconomic status), and suggests how groups of units can be formed such that units within a group are similar in some respect and unlike those in other groups.<ref name="Huberty" />

	===Disadvantages of cluster analysis===
	An object can be assigned to one cluster only.<ref name="Hattie" /> For example, in 'Schools Like Mine', schools are automatically assigned to one of the first twenty-two clusters. However, if schools want to compare themselves with integrated schools, they have to manually assign themselves to cluster twenty-three. Data-driven clustering may not represent reality, because once a school is assigned to a cluster, it cannot be assigned to another one. Some schools may have more than one significant property or fall on the edge of two clusters.<ref name="Hattie"/>

	Clustering may have detrimental effects on teachers who work in low-decile schools, students who are educated in them, and parents who support them, by telling them that the schools are classified as ineffective when in fact many are doing well in unique respects that the clusters do not adequately capture.<ref name="Hattie" />

	In ] clustering methods, several analyses are often required before the number of clusters can be determined.<ref name="Cornish">Cornish, (2007). Cluster Analysis. ''Mathematics Learning Support Chapter 3.1''.</ref> The result can also be very sensitive to the choice of initial cluster centres.<ref name="Cornish" />

	===Solution to problems of cluster analysis in educational research===
	Hattie stated that although cluster analysis provides an easy way to compare schools, no particular variable should be taken as a "short cut" for judging school quality.<ref name="Hattie" /> To overcome the unit reassignment issue, some researchers suggest a nonhierarchical cluster method, which allows units to be reassigned from one cluster to another. This operates through an iterative partitioning k-means algorithm, where k denotes the number of clusters.<ref name="Huberty" /> Nevertheless, to conduct a k-means analysis, the number of clusters needs to be specified at the start. This limits the exploratory power of cluster analysis.

	Cluster analysis must be used very carefully when classifying schools into groups, because results are heavily influenced by partial sampling, the choice of clustering criteria and compositional variables, and cluster labeling. Like assigning schools to different bands, clustering may bring about unnecessary comparisons and inappropriate discrimination among schools, thereby adversely affecting students.<ref name="Hattie" />

	== References ==

Revision as of 03:07, 4 October 2011


Educational Data Mining (EDM) is an emerging discipline concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students and the settings in which they learn. A key area of EDM is mining computer logs of student performance. Another key area is mining enrollment data. Key uses of EDM include predicting student performance and studying learning in order to recommend improvements to current educational practice. EDM can be considered one of the learning sciences, as well as an area of data mining. A related field is learning analytics.

EDM methods

The types of EDM methods are related to those found in data mining in general, but with some differences based on the unique features of educational data.

Ryan Baker classifies the areas of EDM as follows:

Prediction
Clustering
Relationship mining
- Association rule mining
- Correlation mining
- Sequential pattern mining
- Causal data mining
Distillation of data for human judgment
Discovery with models

Baker and Kalina Yacef claim that discovery with models is particularly prominent in EDM, as compared to data mining in general. In discovery with models, a model of a phenomenon is developed through any process that can be validated in some fashion (most commonly, prediction or knowledge engineering), and this model is then used as a component in another analysis, such as prediction or relationship mining.
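
The two-stage idea behind discovery with models can be illustrated with a small Python sketch. Everything in it is hypothetical: the guess rule (a fast, wrong answer), the 3-second threshold, the student logs and the post-test scores are made up for illustration, and a real model would first need to be validated as described above. Stage one applies a knowledge-engineered model to each logged response; stage two uses the model's output as a variable in a correlation analysis.

```python
from math import sqrt


def is_guess(response_seconds, correct, threshold=3.0):
    """Stage 1 model (assumed rule): a fast, wrong answer counts as a guess."""
    return response_seconds < threshold and not correct


def guess_rate(log):
    """Share of a student's logged responses that the model flags as guesses."""
    return sum(is_guess(seconds, correct) for seconds, correct in log) / len(log)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up logs: (response time in seconds, answered correctly) per response.
logs = {
    "s1": [(1.0, False), (2.0, False), (8.0, True), (9.0, True)],
    "s2": [(7.0, True), (6.5, True), (1.5, False), (10.0, True)],
    "s3": [(9.0, True), (8.0, True), (7.5, False), (12.0, True)],
}
post_test = {"s1": 40, "s2": 65, "s3": 90}  # made-up post-test scores

# Stage 2: the model's output becomes a component of a relationship analysis.
students = sorted(logs)
rates = [guess_rate(logs[s]) for s in students]
scores = [post_test[s] for s in students]
r = pearson(rates, scores)
```

With these toy numbers the guess rate falls as the post-test score rises, so `r` is negative. The point of the sketch is the structure (a model feeding a second analysis), not the particular rule or the size of the correlation.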

Applications

A list of the primary applications of EDM is provided by Cristobal Romero and Sebastian Ventura. In their taxonomy, the areas of EDM application are:

Analysis and visualization of data
Providing feedback for supporting instructors
Recommendations for students
Predicting student performance
Student modeling
Detecting undesirable student behaviors
Grouping students
Social network analysis
Developing concept maps
Constructing courseware
Planning and scheduling

Publication Venues

A considerable amount of EDM work is published at the peer-reviewed International Conference on Educational Data Mining, organized by the International Educational Data Mining Society.

1st International Conference on Educational Data Mining (2008) -- Montreal, Canada
2nd International Conference on Educational Data Mining (2009) -- Cordoba, Spain
3rd International Conference on Educational Data Mining (2010) -- Pittsburgh, USA
4th International Conference on Educational Data Mining (2011) -- Eindhoven, Netherlands

EDM papers are also published in the Journal of Educational Data Mining (JEDM).

Many EDM papers are also published at related conferences, such as Artificial Intelligence in Education, Intelligent Tutoring Systems, and User Modeling, Adaptation, and Personalization.

The use of Educational Data Mining in the KDD Cup

In 2010, the Association for Computing Machinery's KDD Cup was held, with a data set and topic in the domain of educational data mining. The data set was provided by the Pittsburgh Science of Learning Center DataShop, and consisted of over a million data points from students using Cognitive Tutor educational software. 600 teams competed for over $8,000 in prize money donated by Facebook. The winners used random forest, Bayesian network, and feature generation techniques to accurately predict the performance of over half a million unseen student responses.

References

"EducationalDataMining.org". 2010. Retrieved 2011-01-16.
R. Baker, K. Yacef (2010). "The State of Educational Data Mining in 2009: A Review and Future Visions". Journal of Educational Data Mining. 1(1): 3–17.
C. Romero, S. Ventura, E. Garcia (2008). "Data Mining in Course Management Systems: MOODLE Case Study and Tutorial". Computers & Education. 51(1): 368–384.
R. Baker (2010) Data Mining for Education. In McGaw, B., Peterson, P., Baker, E. (Eds.) International Encyclopedia of Education (3rd edition), vol. 7, pp. 112-118. Oxford, UK: Elsevier.
C. Romero, S. Ventura (2010). "Educational Data Mining: A Review of the State-of-the-Art". IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews. 40(6): 601–618.
http://www.educationaldata.mining.org/EDM2008
http://www.educationaldata.mining.org/EDM2009
http://www.educationaldata.mining.org/EDM2010
http://www.educationaldata.mining.org/EDM2011
http://www.educationaldatamining.org/JEDM/
