Revision as of 14:35, 2 June 2008
- Moved to subpage by User:Anonymous Dissident from Misplaced Pages:Village pump (proposals).
- Subpage renamed for accuracy and neutral point of view by User:Geometry guy and User:Maxim.
- New proposal formulated - old discussion archived to talk page.
Alternative: a new proposal
This entire page was 12 hours old before I even knew it existed, by which time there were misunderstandings and raging arguments taking place. I have read what has been said, and believe there is net support for at least the principle of evening up (to a greater or lesser extent) the geographical coverage of Misplaced Pages. However, there are many legitimate concerns, and I have taken these on board and now present an amended proposal for the community's consideration.
Proposal
The executive summary of my proposal is this: bot automation driven by WikiProjects, operating within community-defined guidelines.
Here is the meat of it.
1. A new WikiProject is created to coordinate the activities of the bot. This allows for a central group of volunteers to assist with the generic tasks involved in making this project work, and gives a centralised place for questions to be asked, and new proposals and requests to be made.
2. Before beginning work on a new country, the relevant WikiProjects are contacted. These will include the country WikiProjects, continental WikiProjects, and subject-based WikiProjects. We will seek some volunteers - if no, or insufficient volunteers for a country can be obtained, we ignore this country for the time being.
3. Together with the WikiProjects, a collection of sources will be obtained. The default will be an amalgamation of the US GNS data and the census data of various countries obtainable from the following list of resources: http://www.census.gov/main/www/stat_int.html. If census data is not available, unreliable, or incomplete, work on the country will be suspended until it is, or until other, reliable sources can be found to add this kind of data.
4a. Once source collection has occurred, the bot will be tuned to output lists similar in format to those already being created, but with the addition of population data, and hopefully other elements such as elevation data etc. The output will be separated into subpages, with a subpage devoted to those places the bot is unable to reconcile between databases.
4b. The bot will not upload any data for places where the census data indicates that the dwelling is too small, with size to be determined here by community consensus (not voting!). More on this below.
5. Data will be checked, as per the old proposal, by human editors to ensure correct spellings, check for disambiguation, etc. In the case where the bot cannot automatically reconcile data from the existing sources, human editors must add a reference to their corrections to indicate how they reconciled the data (looking at an atlas, for instance). Most data reconciliation failures are likely to be failures to correlate census data to coordinate data. These references will ultimately also be uploaded by the bot.
6. Once the project agrees that the data has been checked, and is ready for upload, relevant parties (such as New Page Patrol) will be given notice of an upload - I propose 30 mins notice - and the bot will automatically create the articles according to a template agreed with the WikiProjects. The articles will include all the above data, and all the references to it.
7. The bot will watchlist the articles to prevent flooding Special:UnwatchedPages and create a list of articles it created - this list will be posted to allow the WikiProject volunteers to watch the pages that they helped to create.
8. When they are first determined, the relevant notability policy will state specifically the initial minimum standards for "inherent notability" of villages, including global standards and any national exceptions. The initial specifics may be more narrow (such as minimum three or four independent reliable sources, and minimum 50% of population of capital city as determined by a specified benchmark source); over time the minima can be ratcheted down to broaden them slowly until the community and WikiProjects indicate when to stop. The bot's new articles will always observe the current notability standard strictly. Added by JJB 14:13, 2 June 2008 (UTC)
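Steps 4a and 4b above can be pictured with a minimal sketch. It is only an illustration under stated assumptions, not the bot's actual code: the record layout, the name-based matching rule, and the names `Place`, `reconcile` and `select_for_upload` are all hypothetical, and real reconciliation between GNS-style coordinate data and census data would need far more care (transliteration, duplicate names, renamings).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    """One candidate settlement (hypothetical record layout)."""
    name: str
    lat: float
    lon: float
    population: Optional[int] = None  # filled in only when a census match exists


def reconcile(gazetteer, census):
    """Join coordinate rows to census populations by normalised name.

    gazetteer: iterable of (name, lat, lon) tuples (assumed coordinate source)
    census: dict mapping place name -> population (assumed census extract)
    Returns (matched, unreconciled); the second list is what would go to the
    subpage for places that need a human editor.
    """
    lookup = {name.strip().casefold(): pop for name, pop in census.items()}
    matched, unreconciled = [], []
    for name, lat, lon in gazetteer:
        pop = lookup.get(name.strip().casefold())
        if pop is None:
            unreconciled.append(Place(name, lat, lon))
        else:
            matched.append(Place(name, lat, lon, pop))
    return matched, unreconciled


def select_for_upload(matched, min_population):
    """Step 4b: keep only places at or above the agreed size limit."""
    return [p for p in matched if p.population >= min_population]
```

Keeping the unreconciled places in their own list mirrors the separate subpage in step 4a, so that nothing the bot could not match is silently dropped before the human check in step 5.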
What use is this?
The advantages of the above are that, although a little slower, we end up with more than one-line stubs, and because countries can be worked on in parallel by multiple WikiProjects on their own subpage within a separate WikiProject (the new one proposed above), the speed factor is also maintained. Thus there is an increase in quality with a minor cost in speed compared to the old proposals. By involving the WikiProjects in the way described, we ensure that there is sufficient interest in the articles, we obtain new and useful sources, and we ensure that there is someone to watchlist the pages afterwards.
The difference, therefore, between this and the old proposal is the increase in quality, and breadth of sourcing. These will not be single-sourced articles, and we will be able to devote our time to finding new and reliable sources of data. The WikiProjects also end up with a series of extra articles that they wanted, in the format they wanted. An example of how the project has already been moving in this direction is a discussion I have had with a member of WikiProject Russia, who is collating a list of sourced data in a database, and we want to help them by uploading the data when it is complete.
Other points
This proposal will probably drastically reduce the number of articles created, but I hope people will understand that this proposal by its very nature will not yield a good estimate of the number of articles created. It will be nowhere near the predictions of the first proposal, however.
I also hope the community will understand that an example is difficult to give, since I would have to first go and collect the data and sources for an entire country to create a handful of articles. This would not be in the interest of the articles in question. The rough layout of the articles created under this proposal would not be significantly different to the original - there would be an infobox, categories and text. The text would be more substantial given the additional sources, and the external links currently in the article would not exist in this new iteration.
Onwards to discussion...
I believe this proposal will allay the legitimate fears of vandalism, unsure notability and low quality of stubs. The one point, beyond acceptance of the proposal as a whole, that still needs to be considered is point 4b. The easiest automatic criterion is size. There is no need to have a permanent, everlasting limit - a limit that can later be reviewed if it is found to be inadequate is probably best, so that we introduce articles slowly. My suggestion is that the community pick a percentage representing the lowest size of town/village to be included - the percentage would be "as a percentage of the capital city of the country". So if you picked 50%, all dwellings that had a population greater than half the population of the capital city would be included by the bot.
The reason for doing this is that it is fairer than selecting a fixed number, like 30,000, since less developed countries will not necessarily have reached the levels of urbanisation that we consider.
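As a rough illustration of the percentage rule described above, here is a small sketch. The function names `relative_cutoff` and `qualifies` are hypothetical, and the strict "greater than" comparison is taken from the 50% example; the community could equally settle on a different rule.

```python
def relative_cutoff(capital_population, percent):
    """Population limit expressed as a percentage of the capital's population."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    return capital_population * percent / 100


def qualifies(population, capital_population, percent):
    """True if a place is strictly larger than the agreed share of the capital."""
    return population > relative_cutoff(capital_population, percent)
```

For a capital of 1,000,000 people and a 50% setting, the limit is 500,000, so a town of 600,000 would be included and a town of exactly 500,000 would not. Because the limit scales with each country's capital, lowering the percentage later is a one-number change.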
This proposal should satisfy most of those "on the fence" for the previous proposal, should continue to garner support from those supporting it, and may even address some of the concerns of those who opposed. But let's not make the following discussion divisive. I beg, no more straw polls, no more "voting" - let's just talk about this rationally.
Let the games begin! Fritzpoll (talk) 11:53, 2 June 2008 (UTC)
Straw Poll
- This is not a vote, even though it looks like one. However, if a significant number of people say "yes without reservations" we can move ahead faster, if a significant number say "no, it's just a bad idea," we can speedy-cancel it.
I like it as it stands
- Sign here if you like the proposal as it stands.
- In before the zot! JJB 14:13, 2 June 2008 (UTC)
I like it with reservations
- Sign here, and put your reservations in the discussion sections below.
- davidwr/(talk)/(contribs)/(e-mail) 14:03, 2 June 2008 (UTC)
- Colchicum (talk) 14:19, 2 June 2008 (UTC)
I have serious reservations that cause me to vote no
- Sign here, and put your reservations in the discussion sections below.
No, it's just a bad idea
- Sign here if you think the whole concept of bot-generated or bot-assisted mass-article generation for places stubs is a bad idea.
Oppose strongly: encourages proliferation of articles about non-notable places, reinforces notions of "inherent notability", just plain wrong. Kww (talk) 14:18, 2 June 2008 (UTC)
Discussion
So we just scrap the previous discussions because you didn't see it? Seems strange, at the least. I still oppose the whole plan, per the reasons I already gave. Thanks for the opportunity to voice my opinion, again. IvoShandor (talk) 13:41, 2 June 2008 (UTC)
- No, they're on the talk page - people coming to the page afresh won't see the adjusted proposals, and at well over 200K, an archive seemed appropriate Fritzpoll (talk) 13:43, 2 June 2008 (UTC)
Better. However, I still oppose the concept that an arbitrary population size makes a location notable. Locations are notable because things are there or happen there that get discussed directly and in detail by independent third-party sources. That's it. Nothing else counts at all. One-liners, inclusions in lists of locations, or being in an atlas don't confer notability. Direct, detailed discussion in third-party sources is the only criterion that matters at all.
As for balance, I wouldn't mind if someone created a bot that searched for bot-created geographic locations that have never been edited by humans and deleted them. That would repair the error that was made when bots created innumerable articles for non-notable US locations.Kww (talk) 13:45, 2 June 2008 (UTC)
- I agree with archiving. I'm adding some more specific subsections below. --Alvestrand (talk) 13:49, 2 June 2008 (UTC)
- Kww, I think the criteria for notability are going to be set by the participating WikiProjects. The criteria for Russia may differ from the criteria for Egypt, and none, one, or both may use population as a criterion, depending on the will of the editors involved in those projects. davidwr/(talk)/(contribs)/(e-mail) 13:56, 2 June 2008 (UTC)
Size limits
I would like to see the bot operate with an absolute size limit for population, and move that limit downwards over time. As an initial number, let me toss out 10,000 - that would make sure there aren't two million entries created in the first round, and increase the consistent coverage of Misplaced Pages. --Alvestrand (talk) 13:49, 2 June 2008 (UTC)
- Assuming I read the revised proposal correctly, there is no "first round" - but rather lots of "first rounds," one for each country or other area covered by a WikiProject. The "first round" of country A may include all places no matter how small, if that is the will of the editors. Another country may have a population cut-off of 100,000 if that is the will of the relevant WikiProjects' editors. davidwr/(talk)/(contribs)/(e-mail) 13:58, 2 June 2008 (UTC)
- I think I was suggesting that whatever the community at large decides is suitable is what the bot will do. If the community wishes to defer these decisions to the Wikiprojects, then it can. Fritzpoll (talk) 13:59, 2 June 2008 (UTC)
Relation to existing stubs
There are many place stubs with low quality, and lots of information (census data, references, maps) that FritzpollBot can do better than a human editor. I suggest that for existing places, FritzPollbot place a subpage of the form /AutoGenerated with what it would have generated, and a human editor can "pull up" the stuff that he likes. --Alvestrand (talk) 13:49, 2 June 2008 (UTC)
- Yes, although actually I'd prefer to see the bot (in a month or two, when this first part is going swimmingly) start to stuff many of the existing stubs into internal sections of its own content, after its one-line lead (rather than mess with subpages, which are widely deprecated in mainspace). It would be able to handle most oddities like additional photos or maps, the stub template itself, and other markup. E.g. if stub says only "Obscure village Y was the birthplace of famous person X" with stub template, the text becomes new section 1 with a neutral title like "Details". The bot could defer complex cases back to the projects. JJB 14:20, 2 June 2008 (UTC)
- Better, when going through the data before the article-creation run, editors could be able to signal that they would like to see an autogenerated version (created at, say, Wikiproject:FritzpollBot/Foobar/Foo, for the article named Foo in country Foobar) - these can then replace or merge into the current versions where necessary. It'd be a bit of a waste of time in my view to create thousands of repeat articles in cases where we already know we wouldn't benefit from it. Pfainuk talk 14:27, 2 June 2008 (UTC)
Global coordination
I think we should have a Wikiproject page for the bot itself, where global issues are addressed. This should link to other participating WikiProjects.
There should also be a "single place to go" to see all upcoming scheduled runs of the bot, with advanced notice of at least 24 hours for large runs and at least 30 minutes for runs larger than a handful of articles.
This page should also have an emergency-stop switch that would stop the bot and suspend all upcoming runs.
davidwr/(talk)/(contribs)/(e-mail) 14:10, 2 June 2008 (UTC)
- The WikiProject idea is point 1 of the new proposal. Advance notice is also suggested, though the timing is open to debate. Good idea to include a schedule within the new WikiProject page. A note to me would prevent all upcoming runs as the bot is supervised, and not fully automatic. A stop switch can be added with relative ease - a block by an admin would have the same effect. Fritzpoll (talk) 14:14, 2 June 2008 (UTC)
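For what it's worth, the stop switch discussed above could look something like the following sketch. This is an assumption about one possible design, not a description of how FritzpollBot works: `run_batch`, `create_article` and `stop_requested` are hypothetical names, and how the flag is actually read (for example from a wiki page that any editor can change) is left to the caller.

```python
def run_batch(titles, create_article, stop_requested):
    """Create articles one at a time, checking the stop switch before each.

    titles: iterable of article titles queued for this run
    create_article: callable that performs one upload (hypothetical)
    stop_requested: callable returning True once someone has hit the switch
    Returns the list of titles that were actually created.
    """
    created = []
    for title in titles:
        if stop_requested():
            break  # abandon the rest of the run; nothing is half-written
        create_article(title)
        created.append(title)
    return created
```

Checking the flag before every single upload, rather than once per run, means a stop request takes effect after at most one more article.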
Thoughts
I might support this (notability has never seemed a problem to me), but I still have questions. 1) Could you please elaborate how it is possible to make sure that all the produced pages will be watchlisted by somebody with enough expertise to detect misinformation in the future? It is still not very clear. Authors normally know something about the places they write about, randomly assigned volunteers don't. 2) What is wrong with the suggestion to create lists with redirects instead of stubs? They are much easier to patrol. 3) How will it be possible to clear the backlog of orphaned stubs? And if we are going to create lists anyway (which may be useful for other reasons, but is an artificial and wrong way to clear the backlog), why do we need the stubs? Colchicum (talk) 14:16, 2 June 2008 (UTC)
- Well, because we would seek volunteers from WikiProjects associated with the articles being created and get them heavily involved in the process, they can watchlist the articles. That's one reason why, if there are no volunteers from areas of the encyclopaedia that would be interested in such articles, I don't think we should create them until such volunteers exist. Lists with redirects are perfectly acceptable, but I think from a content point of view, this could be decided by the WikiProjects - how much are they willing to maintain, etc. Certainly, there is no technical objection to this. The lists are purely to administer the bot's operation, so clearly shouldn't count towards clearing the backlog, as ultimately they may well be removed. I think discussion on a country-by-country basis will allow a decision on how to "add to the web" - I see the bot as more of a tool for people to use as they wish - sort of a builder doing the work of some architects, each of whom wants it done slightly differently. I'm not sure if I answered everything - let me know if I didn't. Fritzpoll (talk) 14:23, 2 June 2008 (UTC)
Italy experience
I've a similar experience to Llywrch's Ethiopia experience reported above - this year I've spent a lot of time sorting out the place articles in Italy. Which as countries go should be as good as it gets - a highly developed, top 10 economy with one of the most active Wikipedias at it.wikipedia.org, Internet availability is not an issue, massive emigration to anglophone countries and a popular holiday destination to ensure lots of English speakers are interested in it, plus in the top 5 for length of recorded history and archaeology. Everything is in your favour, it must have some of the most notable territory on the planet if you define notability by the availability of sources. And yet....
...notability for Italian settlements seems to dry up somewhere around the 2000 population mark, even on it.wiki where they have all the "home team" advantages. On a consistent basis anyway, that's just an observation of what seems to "work" out in the field. It gets pretty patchy below that, and often the smaller villages work better merged into a larger administrative area, often the most notable thing in an area doesn't fit neatly into a village in any case. There are advantages in having a broad picture in a single article rather than lots of microarticles. But for the rest of the world, I'd suggest a mental model of somewhere in the 5000-10,000 population mark as being the lower limit for consistent notability, which is nice because that typically corresponds to a unit of municipal administration - the township, commune, municipality, district or whatever. In Italy the equivalent level is the comune - the population of 60m people is spread over 8,101 comuni, an average of 7,400 people per comune, although that's distorted a bit by the likes of Rome and Milan, I guess the median is about 6000 people. That's the level of local administration that has a mayor, town hall etc, and is the level you should be aiming for - we (now, finally!) have an article for all 8,101 of those. Each of those comuni typically has a couple of subdivisions (frazioni, località and the like, approximating to wards or parishes) so there must be 40,000+ of those - but en.wiki only has about 200 of those, and it.wiki has about 1700 but a lot of those are either- I'd guess that long term the "right" number would be somewhere around 1000 out of 40,000+. However one of my long-term aims would be to create redirects for all 40,000-odd frazioni to the appropriate comune article.
Honestly, the comune articles just seem to "work", whereas going down to frazione level tends not to - there are real synergies in not fragmenting the information too finely, and information about the frazioni is still on Misplaced Pages, it just gets a bit more context from being in the comune article. And that's just from the perspective of Italy, which has so many advantages in proving notability from WP:RS. Perhaps the sensible thing would be to go down to township level for countries like Italy, and down to the step above (counties/provinces/the unit covering 50,000-100,000 people) for countries where WP:RS are more scanty. That way every point on earth is covered by a Misplaced Pages article, but that article has a chance of being notable in its own right. The debate is not about comprehensiveness, but about the granularity of that comprehensiveness. Obviously in almost every case (perhaps barring the likes of Nauru?) the granularity should be below "country" level, but personally I think "village" level is too fine a cut.
And I don't like using geographical entities such as "village" - it doesn't guarantee the WP:CSB comprehensive coverage of the surface of the Earth we want, that you get if you use administrative areas. Governments tend to make sure that all points of their territory are covered by an administrative division, but that's not the case if you rely on geographical points such as "villages". In particular not if you rely on Maplandia Level 3 data. To take a random example, I was in the province of Asti a few months ago. The territory of that province is covered by 118 comuni, which are listed at Category:Communes_of_the_Province_of_Asti. Compare that with Maplandia's coverage of Asti - just in the A's it lists only Asti itself out of 6 comuni beginning with A, and Aie, which I guess is the suburb of Bruno (comune population 379) that Google shows as Borgo Aie some way to the north of where Maplandia has it - even it.wiki doesn't have an article. So a hit rate of 1 in 6, in one of the most developed regions of western Europe! However at first glance their provinces at Level 2 look OK. So another reason to concentrate on getting comprehensiveness at Level 2 level, you can probably hope to get that reasonably "right", but Level 3 will need country specialists.
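The "1 in 6" comparison above is the sort of check that could be run mechanically against any official list of administrative units. A minimal sketch follows, with hypothetical names (`coverage`) and exact-string matching as a simplifying assumption; real data would need the name clean-up described elsewhere in this thread.

```python
def coverage(official, gazetteer):
    """Compare a gazetteer's place names against an official list.

    Returns (hit_rate, missing, extras) where hit_rate is the share of
    official units found in the gazetteer, missing is the official units
    the gazetteer lacks, and extras is gazetteer entries that are not
    official units at all.
    """
    official_set = set(official)
    if not official_set:
        raise ValueError("official list must not be empty")
    gazetteer_set = set(gazetteer)
    found = official_set & gazetteer_set
    missing = sorted(official_set - found)
    extras = sorted(gazetteer_set - official_set)
    return len(found) / len(official_set), missing, extras


# Asti and Aie come from the comment above; the other five names are
# placeholders standing in for the remaining comuni beginning with A.
official_a = ["Asti", "comune_2", "comune_3", "comune_4", "comune_5", "comune_6"]
gazetteer_a = ["Asti", "Aie"]
hit_rate, missing, extras = coverage(official_a, gazetteer_a)
```

Here `hit_rate` is 1/6 and `extras` contains only "Aie", which is the pattern reported for the A's in Asti: most official units absent, plus an entry that is not a comune.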
And I would echo Llywrch's comments about the amount of work needed to produce a "clean" list of articles - even in Italy where the government provides handy lists of comuni in Excel format, it still took a lot of work to knock the articles on it.wiki and en.wiki into shape. Despite all the work that had been done on Italian places, there were a few articles missing because people had seen blue links from articles on everything from Chicago supermarkets to Latvian goddesses. And that's before you get onto the typos in the official list, variations in accent use, renamings, a morass of local dialect names for places, bilingual names, and of course WP:COMMONNAMEs in English. The data needs a LOT of cleaning to get it "right", even when it's from an authoritative government source with no problems about transliteration from a non-Roman alphabet.
So as someone who's done thousands of edits to "place" articles in a country where WP:RS are as abundant as anywhere, I would beg you to:
- Use administrative divisions - the only way to ensure comprehensive coverage of the Earth's surface
- Using arbitrary "places" or "villages" doesn't help WP:CSB, as they aren't comprehensive
- Don't use arbitrary population limits, aim to get 100% coverage of a single administrative level.
- WP:RS may mean that "Level 2" divisions are what you should aim for with comprehensive coverage on Misplaced Pages - counties/provinces or similar
- Even in Western Europe, you should only go down to township/commune level for comprehensive coverage
- Don't use Maplandia Level 3 - the data sucks
- Articles at the parish/ward level only "work" for a relatively small percentage of places, and the township article is improved by having the parishes all in the same place
- Be great if you can make redirects of the parishes
- Please tag the Talk page with an appropriate country Project - much easier to have that done at the start.
- Beware misleading blue links - a check of the appropriate Category:Townships in... can help you catch those
But as long as you don't go mad going down to village level, I fully support the intent, albeit not quite the letter of how the proposal is currently phrased - good luck. FlagSteward (talk) 14:28, 2 June 2008 (UTC)
- Thanks for some of the tips. Using census data should help achieve many of the aims you specify. Maplandia will not be used in this new proposal. Redirects are possible if the WikiProject desires them. Tagging the talk page is also no problem. Misleading blue links will be picked up by the last stage, where the uploaded lists are checked. Thank you for your thoughtful response - I hope some consensus on what depth to aim for can be achieved. Fritzpoll (talk) 14:35, 2 June 2008 (UTC)