Revision as of 01:20, 19 January 2018
This is a draft of a potential Signpost article, and should not be interpreted as a finished piece. Its content is subject to review by the editorial team and ultimately by JPxG, the editor in chief. Please do not link to this draft as it is unfinished and the URL will change upon publication. If you would like to contribute and are familiar with the requirements of a Signpost article, feel free to be bold in making improvements!
Last revised 01:20, 19 January 2018 (UTC) by Eddie891
Recent research
(Your article's descriptive subtitle here)
By ...
A monthly overview of recent academic research about Misplaced Pages and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
Determining Quality of Articles in Polish Misplaced Pages Based on Linguistic Features
This article focuses on the 1.2 million unassessed articles in the Polish Misplaced Pages, and considers "over 100 linguistic features to determine the quality of Misplaced Pages articles in Polish language." The authors conclude that linguistic features are valuable for automatically determining the quality of Polish-language articles. Precision is higher when the whole text of an article is taken into account: in that case their model reaches over 93% classification precision, using features such as the relative number of unique nouns and verbs (unique, third person, impersonal). When only the lead section of an article is considered, the relative quantities of common words, locatives, vocatives and third-person words are the most significant for determining quality. Using the resulting quality models, the authors assessed 500,000 randomly chosen unevaluated articles from the Polish Misplaced Pages. According to their results, about 4-5% of the assessed articles could be considered high-quality articles by the community.
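To make the two quantities in this summary concrete, here is a minimal sketch in Python of a "relative number of unique words" feature and of classification precision. It is an illustration only, not the authors' implementation: the function names, the "high" label and the whitespace-free token lists are assumptions, and the sketch counts all tokens, whereas the paper's features depend on part-of-speech information (nouns, verbs, grammatical person) that would require a Polish morphological tagger.

```python
def relative_unique(tokens):
    """Share of distinct tokens (case-insensitive) among all tokens.

    Hypothetical stand-in for a 'relative number of unique words'
    feature; returns 0.0 for an empty text.
    """
    if not tokens:
        return 0.0
    return len({t.lower() for t in tokens}) / len(tokens)


def precision(predicted, actual, positive="high"):
    """Fraction of articles predicted as `positive` that truly are.

    `positive` is an assumed label name for the high-quality class.
    Returns 0.0 when nothing was predicted as positive.
    """
    flagged = [a for p, a in zip(predicted, actual) if p == positive]
    if not flagged:
        return 0.0
    return sum(a == positive for a in flagged) / len(flagged)
```

Under these assumptions, a reported precision of "over 93%" would mean that more than 93 out of every 100 articles the model labels as a given class actually belong to it; it says nothing by itself about how many articles of that class the model misses.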
Briefly
Conferences and events
See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.
Other recent publications
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.
- "..."
- "..."
- "..."
- "..."
- "..."
References
- Lewoniewski, Włodzimierz; Węcel, Krzysztof; Abramowicz, Witold (2018-01-03). "Determining Quality of Articles in Polish Misplaced Pages Based on Linguistic Features". doi:10.20944/preprints201801.0017.v1.
- Supplementary references:
Discuss this story (5 February 2018 issue)
- This paper from 2011 is a much better analysis of talk page dynamics than that Carnegie Mellon paper. 185.13.106.213 (talk) 07:01, 5 February 2018 (UTC)
- Possibly, but it is based upon a data dump from 2008. Carnegie Mellon has a whole department dedicated to WP. I've been there. Ironically, most do not edit. Barbara (WVS) ✐ ✉ 14:16, 5 February 2018 (UTC)
- @Barbara (WVS): Note how the 2011 paper isolated authority citations and opinion changes ("alignment moves") as the primary features (beyond the writing parties, their semantic assertions, etc.) of talk pages. While the CMU paper says as much in section 2 on page 1027, they proceed to focus solely on authority claims in section 5.2.3 on page 1030, along with the other features in section 5.2, but ignore the crucial instance of participants coming into agreement with others. That's a really stark omission and I am sure their analysis would have been stronger if they included it. Do you know the authors? If so, please suggest that if it makes sense to you. 213.86.87.228 (talk) 18:36, 5 February 2018 (UTC)