Revision as of 11:46, 3 June 2008
New proposal formulated - old discussion archived to talk page.
Alternative: a new proposal
This entire page was 12 hours old before I even knew it existed, by which time there were misunderstandings and raging arguments taking place. I have read what has been said, and believe there is net support for at least the principle of evening up (to a greater or lesser extent) the geographical coverage of Wikipedia. However, there are many legitimate concerns, and I have taken these on board and now present an amended proposal for the community's consideration.
Proposal
The executive summary of my proposal is this: bot automation driven by WikiProjects, operating within community-defined guidelines.
Here is the meat of it.
1. A new WikiProject is created to coordinate the activities of the bot. This allows for a central group of volunteers to assist with the generic tasks involved in making this project work, and gives a centralised place for questions to be asked and for new proposals and requests to be made.
2. Before beginning work on a new country, the relevant WikiProjects are contacted. These will include the country WikiProjects, continental WikiProjects, and subject-based WikiProjects. We will seek some volunteers - if no, or insufficient volunteers for a country can be obtained, we ignore this country for the time being.
3. Together with the WikiProjects, a collection of sources will be obtained. The default will be an amalgamation of the US GNS data and the census data of various countries obtainable from the following list of resources: http://www.census.gov/main/www/stat_int.html. If census data is unavailable, unreliable, or incomplete, work on the country will be suspended until this is remedied, or until other reliable sources can be found to add this kind of data.
4a. Once source collection has occurred, the bot will be tuned to output lists similar in format to those already being created, but with the addition of population data, and hopefully other elements such as elevation data. The output will be separated into subpages, with a subpage devoted to those places the bot is unable to reconcile between databases.
4b. The bot will not upload any data for places where the census data indicates that the dwelling is too small, with size to be determined here by community consensus (not voting!). More on this below.
5. Data will be checked, as per the old proposal, by human editors to ensure correct spellings, check for disambiguation, etc. In the case where the bot cannot automatically reconcile data from the existing sources, human editors must add a reference to their corrections to indicate how they reconciled the data (looking at an atlas, for instance). Most data reconciliation failures are likely to be failures to correlate census data to coordinate data. These references will ultimately also be uploaded by the bot.
6. Once the project agrees that the data has been checked, and is ready for upload, relevant parties (such as New Page Patrol) will be given notice of an upload - I propose 30 mins notice - and the bot will automatically create the articles according to a template agreed with the WikiProjects. The articles will include all the above data, and all the references to it.
7. The bot will watchlist the articles to prevent flooding Special:UnwatchedPages and create a list of articles it created - this list will be posted to allow the WikiProject volunteers to watch the pages that they helped to create.
8. When they are first determined, the relevant notability policy will state specifically the initial minimum standards for "inherent notability" of villages, including global standards and any national exceptions. The initial specifics may be more narrow (such as minimum three or four independent reliable sources, and minimum 50% of population of capital city as determined by a specified benchmark source); over time the minima can be ratcheted down to broaden them slowly until the community and WikiProjects indicate when to stop. The bot's new articles will always observe the current notability standard strictly. Added by JJB 14:13, 2 June 2008 (UTC)
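To make steps 4a, 4b and 5 a little more concrete, here is a minimal sketch of how the sorting might work. It assumes, purely for illustration, that the coordinate data and the census data have already been loaded into simple dictionaries keyed by place name. The `Place` class, the `reconcile` function, the field names and the sample figures are all hypothetical; this is not the bot's actual code or real data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Place:
    name: str
    lat: float
    lon: float
    population: Optional[int] = None  # None when no census match was found


def reconcile(
    gns: Dict[str, Tuple[float, float]],
    census: Dict[str, int],
    min_population: int,
) -> Tuple[List[Place], List[Place], List[Place]]:
    """Sort places into (ready, too_small, unreconciled).

    ready        -> matched in both sources and at or above the size limit
    too_small    -> matched, but below the community-agreed limit (step 4b)
    unreconciled -> no census match; goes to a subpage for human review (4a, 5)
    """
    ready, too_small, unreconciled = [], [], []
    for name, (lat, lon) in gns.items():
        population = census.get(name)
        if population is None:
            unreconciled.append(Place(name, lat, lon))
        elif population < min_population:
            too_small.append(Place(name, lat, lon, population))
        else:
            ready.append(Place(name, lat, lon, population))
    return ready, too_small, unreconciled


# Made-up example data, not taken from any real database.
gns = {"Alpha": (1.0, 2.0), "Beta": (3.0, 4.0), "Gamma": (5.0, 6.0)}
census = {"Alpha": 50000, "Beta": 200}
ready, too_small, unreconciled = reconcile(gns, census, min_population=1000)
```

A real run would need fuzzier matching than exact name equality (alternative spellings, transliterations), which is exactly why the unreconciled subpage and the human check in step 5 exist.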
What use is this?
The advantages to the above are that, although a little slower, we end up with more than one-line stubs, and because countries can be worked on in parallel by multiple WikiProjects on their own subpage within a separate WikiProject (the new one proposed above), the speed factor is also maintained. Thus there is an increase in quality with a minor cost in speed compared to the old proposals. By involving the WikiProjects in the way described, we ensure that there is sufficient interest in the articles, we obtain new and useful sources, and we ensure that there is someone to watchlist the pages afterwards.
The difference, therefore, between this and the old proposal is the increase in quality, and breadth of sourcing. These will not be single-sourced articles, and we will be able to devote our time to finding new and reliable sources of data. The WikiProjects also end up with a series of extra articles that they wanted, in the format they wanted. An example of how the project has already been moving in this direction is a discussion I have had with a member of WikiProject Russia, who is collating a list of sourced data in a database, and we want to help them by uploading the data when it is complete.
Other points
This proposal will probably drastically reduce the number of articles created, but I hope people will understand that this proposal by its very nature will not yield a good estimate of the number of articles created. It will be nowhere near the predictions of the first proposal, however.
I also hope the community will understand that an example is difficult to give, since I would have to first go and collect the data and sources for an entire country to create a handful of articles. This would not be in the interest of the articles in question. The rough layout of the articles created under this proposal would not be significantly different to the original - there would be an infobox, categories and text. The text would be more substantial given the additional sources, and the external links currently in the article would not exist in this new iteration.
Onwards to discussion...
I believe this proposal will quell the legitimate fears of vandalism, uncertain notability and low quality of stubs. Beyond acceptance of the proposal itself, the one point that needs to be considered is point 4b. The easiest automatic criterion is size. There is no need to have a permanent, everlasting limit - a limit that can later be reviewed if it is found to be inadequate is probably best, so that we introduce articles slowly. My suggestion is that the community pick a percentage representing the lowest size of town/village to be included - the percentage would be "as a percentage of the capital city of the country". So if you picked 50%, all dwellings that had a population greater than half the population of the capital city would be included by the bot.
The reason for doing this is that it is fairer than selecting a fixed number, like 30,000, since less developed countries will not necessarily have reached the levels of urbanisation that we consider.
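For anyone who wants to try out different percentages, here is a minimal sketch of the criterion as described above. The function names and the population figures are made up for illustration and do not refer to any real country.

```python
from typing import Dict, List


def population_threshold(capital_population: int, percent: float) -> float:
    """Smallest population, as a share of the capital's, that qualifies."""
    return capital_population * percent / 100


def places_to_include(
    populations: Dict[str, int], capital: str, percent: float
) -> List[str]:
    """Names of places whose population is strictly greater than the threshold."""
    threshold = population_threshold(populations[capital], percent)
    return sorted(name for name, pop in populations.items() if pop > threshold)


# Hypothetical country: every figure here is invented.
populations = {
    "Capital City": 1_000_000,
    "Town A": 600_000,
    "Town B": 400_000,
}
included = places_to_include(populations, "Capital City", percent=50)
# included == ["Capital City", "Town A"]; Town B falls below the 500,000 cut-off
```

Lowering `percent` admits smaller places, so the limit can be reviewed later without changing anything else.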
This proposal should satisfy most of those "on the fence" for the previous proposal, should continue to garner support from those supporting it, and may even address some of the concerns of those who opposed. But let's not make the following discussion divisive. I beg, no more straw polls, no more "voting" - let's just talk about this rationally.
Let the games begin! Fritzpoll (talk) 11:53, 2 June 2008 (UTC)
Straw Poll
- This is not a vote, even though it looks like one. However, if a significant number of people say "yes without reservations" we can move ahead faster, if a significant number say "no, it's just a bad idea," we can speedy-cancel it.
I like it as it stands
- Sign here if you like the proposal as it stands.
- In before the zot! JJB 14:13, 2 June 2008 (UTC)
- Like I stated before there is nothing wrong with stubs. Britannica is mostly stubs! Zginder 2008-06-02T15:03Z (UTC)
- The stubs created by this bot are better than most stubs written by humans. I, for one, welcome our new robot overlords! Plrk (talk) 15:09, 2 June 2008 (UTC)
- Support. Even if flawed, this proposal will set things in motion, while eventual mistakes will be fixed (Wikipedia can do that). --Qyd (talk) 15:19, 2 June 2008 (UTC)
- Support - Would give a kick-start to certain places with deficiencies. MRM (talk) 15:28, 2 June 2008 (UTC)
- Absolutely strongest possible support. The world is not ending folks, it is merely getting added to Wikipedia. Uniformly, human assisted-ly, using Wiki-botcode. Good freeking grief, get over yourselves. — Preceding unsigned comment added by Keeper76 (talk • contribs)
- Support - There's enough human intervention involved here for common sense to prevail on notability where it's needed. I'd like to see the functionality I mention below included. Pfainuk talk 15:57, 2 June 2008 (UTC)
- They're already on my notepad to be implemented Fritzpoll (talk) 15:59, 2 June 2008 (UTC)
- Support. I did have reservations, but the proposal in its current form addressed them all fairly well.—Ëzhiki (Igels Hérissonovich Ïzhakoff-Amursky) • (yo?); 16:13, 2 June 2008 (UTC)
- Support. To those who identify the lack of human input as a problem with this proposal, note the following. I am a human being (or at least my cat seems to think I am.) I am personally compiling a list of the Canadian municipalities that still don't have articles, sticking only to incorporated municipalities for which a bot can easily extract objective source data from the last Canadian census and excluding any place that doesn't meet that requirement. I am personally verifying all of these redlinks to ensure that they aren't just misspelled links to articles that we do already have (and even if we do end up with some accidental duplicates, it takes what, three whole seconds to type #REDIRECT ] and hit save?) I am personally providing Fritzpoll with the relevant and valid Canadian sources. And I am personally committed to ensuring that every article gets reviewed afterward, either by myself or by another WP:CWNB colleague, to ensure that it got done correctly, gets corrected quickly if anything goes wrong, and gets expanded whenever possible with content of the type that only a human can add. So there's human oversight every step of the way — at least for Canadian municipalities, the bot simply won't have the ability to do anything that somebody from WP:CWNB, be it me or someone else, doesn't personally give Fritzpoll the go-ahead to do with it. So I don't see what the problem is. Bearcat (talk) 16:25, 2 June 2008 (UTC)
- This is exactly the kind of work ethic I would like to see enshrined in the proposal itself for all countries that would have articles created. In particular, the articles need to be assured of expansion before they get generated. Ryan Reich (talk) 16:51, 2 June 2008 (UTC)
- Support - People from other countries are more likely to make an article for their city on a Wikipedia in their language. Finding and translating these would take forever. This bot would help people get these articles on the English Wikipedia. Jkasd 16:38, 2 June 2008 (UTC)
- Support. I think this is a fantastic idea, though I understand and accept the concerns of those members below. KV5 • Squawk box • Fight on! 16:39, 2 June 2008 (UTC)
- yes without reservations - it will create uniformity and consistency. It also adds the potential to add more editors. Kingturtle (talk) 17:10, 2 June 2008 (UTC)
- Support As I supported the first proposal and strongly believe all inhabited places are inherently notable. Davewild (talk) 17:17, 2 June 2008 (UTC)
- Strong Support - I have long wanted to see just such a project and I'm very glad that people are stepping forward to do it. Two comments. First, my observation is that people are *far* more likely to expand a stub than to start an article. Yes, all these articles would probably be created eventually, but I suspect the hand-created ones would usually lack the type of info this bot initially adds. I think the reluctance to start an article compared to adding to a stub is even stronger when the person isn't a native speaker. Any grammatical/spelling errors in the added info from non-native speakers can be fixed by people who know the language but not the subject matter. Secondly, as far as vandalism goes, I don't think it will be a real problem. What is the difference between an article that gets viewed 1000 times/minute and vandalism gets fixed after 5 minutes, compared with an article that gets viewed 1000 times/month and gets fixed after 5 months? The nature of fixing vandalism is that X% of the readers will notice the error, and Y% of those X people will fix it. I suspect that vandalism will be easier to detect in these stub articles and if vandals are using the "random page" function to find articles to vandalize, this would turn out to be a plus. Yeah, not enough of a plus to justify creating stub articles alone, but I just don't see the vandalism problem on wikipedia changing much because of these stubs. Wrs1864 (talk) 17:21, 2 June 2008 (UTC)
- First, just because people are more likely to expand a stub than start an article doesn't mean that all the place stubs that would be created will be likely to get expanded. Speculation of future notability is not supported by the notability policy, and if the stubs are created merely with geographic data, they will not initially claim any notability. Second, your attitude towards vandalism is a little cavalier: in the case of a mass-produced set of an unknown number of articles, perhaps hundreds of thousands, X (as in X%) would be extremely, unusually small. Yet the vandals would not necessarily be as arbitrary as the police: directed misinformation campaigns, based on some kind of political rivalry, racism, or whatever, could directly target locations that might take months to be noticed. This is not something to be brushed off. The only way to combat this is to ensure that human editors are personally invested in the articles created and the notable information placed there. Please comment on my proposal for this below. Ryan Reich (talk) 18:10, 2 June 2008 (UTC)
- Support seems almost too conservative, but it will be a good start. EJF (talk) 17:25, 2 June 2008 (UTC)
- Support Agree with EJF--I would have supported even something more radical. This will be a fantastic project. Mangostar (talk) 17:56, 2 June 2008 (UTC)
- Strongest support Yes, this is going to be a great project, and would, to some extent, reduce the systemic bias in Wikipedia.--Dwaipayan (talk) 17:59, 2 June 2008 (UTC)
- Support Even in this revamped and weakened version of the proposal, it is still something that should be done. I can find no policy issues involved since any published atlas meets the requirements as a secondary source as defined in WP:PSTS. The use of multiple data sets meets the requirements of WP:RS. I find fault with many of the arguments that seem to boil down to WP:NOEFFORT or WP:WHOCARES. I believe that this project reduces significant barriers to getting notable, encyclopedic information into article space because new editors, or non-native English speaking editors are more likely to edit an existing article than start a new article. I also believe that a consistent starting point for these articles, with a common format, layout, title convention, and proper category tagging will go a long way toward improving the overall quality of the entire encyclopedia. Jim Miller (talk) 18:13, 2 June 2008 (UTC)
- In my reading of the footnotes in WP:PSTS, an atlas is definitely not a secondary source, as it does not provide any kind of analysis or interpretation. It is simply a collection of facts. It also satisfies the criteria of the University of Nevada (Reno) reference as being a collection of "raw research data" (though it is not an original document). Further information that tends to appear in an atlas, such as cultural data, is a secondary source, but the mere geographic facts are not. I think that placing these contents of atlases in the category of secondary sources is contrary to the intent of the term as providing a document of human interest in the facts, and that allowing articles based purely on rote information would diminish Wikipedia to the status of a mere repository of information. However, see my proposal for how we can implement this project in a way that does justice to the encyclopedia and to the places, and also encourages participation by new editors. Ryan Reich (talk) 18:29, 2 June 2008 (UTC)
- Then we will disagree. An atlas is not a mere listing of facts, in this case listings of demarcations determined by surveyors to determine boundaries. I would argue that cartography is inherently analytical, and that atlas publishers undergo a defined editorial process of their publications. I have read your other proposal and don't find any merit in your insistence that a human generated stub is any more notable than a bot generated stub. Further, we should not be setting any precedent that would define stricter standards for creating articles than those standards used to justify deleting articles. Any stub that would survive the already low deletion criteria of AfD should not be prevented from creation. Jim Miller (talk) 00:01, 3 June 2008 (UTC)
- An atlas may not, as a whole, be a mere listing of facts, but if all you are using from it is the bare facts, then you haven't used the portion which is a secondary source. The statistics themselves are merely a primary source; I suppose we do disagree on this point, but I think it is the letter of WP:NOT#Wikipedia is not an indiscriminate collection of information, and certainly the spirit, that articles replicating only cartographic data do not belong here, regardless of whether they otherwise meet the notability criteria. And if you think that my proposal would simply replace bot-generated stubs with human-generated ones, then you didn't read it. The point is to replace a stub containing no notability claims (hence barely deserving the name of stub, since it doesn't meet the inclusion criteria) with an article that is at least stub-quality and draws from sources testifying to human discourse on the subject; perhaps, from the other parts of the atlas that a bot can't possibly process. My criteria are not restrictive; I merely advocate that we adhere to the notability guidelines, no more than AfD would ask. Many articles are deleted for lack of notability claims. The specific point I'm making in this debate is that statistical data alone is not a notability claim. Ryan Reich (talk) 01:36, 3 June 2008 (UTC)
- Support This is a well-thought out and useful proposal. Tuf-Kat (talk) 18:25, 2 June 2008 (UTC)
- Support, it sounds a fine idea and I expect the creation of these stubs to act as a catalyst to increase input from existing editors with knowledge of the country, as well as helping recruit new editors. This should help a great deal with the serious problem that is our US/European bias. Tim Vickers (talk) 18:41, 2 June 2008 (UTC)
- Support - Why wouldn't I support expanding Wikipedia's coverage of notable topics? These stubs will be a great asset. Okiefromokla 19:00, 2 June 2008 (UTC)
- Support - Why not? Stubs can always be expanded, and this paves the way for interested writers. Der Wohltemperierte Fuchs 19:01, 2 June 2008 (UTC)
- Support. Although I think this proposal is a little restrictive/conservative and is too bureaucratic. IMO, this bot should operate like any other upload bot. Certainly, groups of Wikipedians going through its uploads is useful, but this shouldn't be used in policy to limit the activity of the bot before-the-fact. Ultimately, we need articles on all designated geographic locations regardless of population size (rather meaningless, IMO - many geographic locations are unpopulated but significant). The purpose of using sources such as US GNS is that significance is inherent in geographic data from these. Every action on Wikipedia is reversible and I trust the bot maintainer to be responsible. --Oldak Quill 19:16, 2 June 2008 (UTC)
- Support, and I supported the original proposal.-gadfium 19:18, 2 June 2008 (UTC)
- Support if Fritzpoll implements all minor improvements he promised (see his reply to concerns below).Biophys (talk) 19:45, 2 June 2008 (UTC)
- Support. I like cheese. JKBrooks85 (talk) 19:48, 2 June 2008 (UTC)
- Support. It is about time, I think this bot is long over due. -theoneintraining (talk) 20:03, 2 June 2008 (UTC)
- Support without reservations. The articles would be consistent and uniform to start with and would give a good starting point for other editors who wish to expand on those articles. Places are already notable. -Pparazorback (talk) 20:03, 2 June 2008 (UTC)
- Support, as before... agree with the above comments about it being a good place to start. Alex Muller 20:23, 2 June 2008 (UTC)
- Support - I actually have some reservations (I'm sure you have read at least some of them), but per the proposal, we can work these things out within the WikiProject. I support the general idea of the project. Good luck! -- Ynhockey 20:26, 2 June 2008 (UTC)
- Support - I'm surprised this bot wasn't created long ago... and I think Keeper76 said it well. jj137 (talk) 20:30, 2 June 2008 (UTC)
- Support per my previous comments on the talk page. El Greco 21:27, 2 June 2008 (UTC)
- Support. I was strongly in opposition to the original proposal, but this new proposal resolves those concerns to a sufficient degree that I support it. By using multiple types of information from several databases and by delegating much of the work to human-assisted work from WikiProjects, this looks to be an excellent proposal. Another reason why I support is this: Blindly creating two million stubs is a process that is unlikely to adapt, hard to control, and has a huge potential to do damage; this new proposal could be customized and adapted quite easily, and thus is unlikely to do harm. The key here is adaptability, because there will always be unforeseen challenges with any proposal of this scale. Pyrospirit (talk · contribs) 21:35, 2 June 2008 (UTC)
- Support, for the most part this new proposal resolves my concerns in that it establishes more than bare coordinates as sources. I'm less concerned about lower population limits; sometimes such stubs should be created for completeness.--Prosfilaes (talk) 21:38, 2 June 2008 (UTC)
- Support, although: would it not be possible to hide these places from "random article" until someone else than the bot has edited them at least once?--Aqwis (talk – contributions) 21:51, 2 June 2008 (UTC)
- Support; I think the new proposal takes the original idea and gives it a higher quality. I think this project will do a good job of creating articles about locations that wouldn't otherwise get articles, because the people who live there may not be part of Wikipedia/even have access to it. As these places are often underrepresented, I think this project will do a very good job of filling out the ranks of Wikipedia's articles on places. -- Natalya 23:44, 2 June 2008 (UTC)
- Support. As per previous discussion all my thoughts still stand. While this modified version will reduce the speed of article creation, that was never a goal (e.g. to be the fastest bot in the world or whatever) it will hopefully lead to increase in quality. Calaka (talk) 01:32, 3 June 2008 (UTC)
- Support. This proposal is awesome. Wrad (talk) 01:34, 3 June 2008 (UTC)
- Support As I have noted before. The U.S. has articles on every little place that were bot generated (not saying every article; just saying a similar bot ran a similar task on a smaller scale without all of the fuss). The same respect should be given to the rest of the world in some sort of fashion. §hep • ¡Talk to me! 02:12, 3 June 2008 (UTC)
- Support as per previous straw poll. Eyes will have to be kept after it, but I believe it will benefit the project overall. — Huntster (t • @ • c) 02:25, 3 June 2008 (UTC)
- Support A great way for Wikipedia to improve its coverage of the world's towns and cities. Captain panda 03:00, 3 June 2008 (UTC)
- Support. PrinceOfCanada (talk) 03:05, 3 June 2008 (UTC)
- Support As you say, start with large towns, cities, and see how it goes (I foresee up to a million pages within a year or two is within reason). I think this is a great effort, as mentioned it helps with infobox standardization enormously. I am for a more or less fixed population limit. (perhaps established cities, towns, 1000+ or 5000+; I assume that is still a large quantity of articles.) Danski14 03:07, 3 June 2008 (UTC)
- Support The original proposal was flawed but this one is much better, and it will be of a great help to many countries --TheJosh (talk) 03:19, 3 June 2008 (UTC)
- Strong support Informative, accurate and sourced articles should be welcomed with open arms. Predictions that these articles will somehow attract the vandalism bogeyman, nationalist edit wars, wikidust, and outright misinformation apply no more than they do with any other article. Perhaps even less so. Wikiprojects will be organized and mobilized, and this will produce high quality stubs. --NickPenguin(contribs) 03:25, 3 June 2008 (UTC)
- Strong support per my comment before -- penubag (talk) 04:19, 3 June 2008 (UTC)
- Support Fritzpoll has met some of the concerns I raised in my earlier edit. I have compiled a collection of information on several hundred Ethiopian settlements that I would like to share with Fritzpoll so the bot could add those articles; an example of the minimum the bot would create with this data is Softu. (And hopefully the bot can update all of the Ethiopian articles when the results of the 2007 Ethiopian census are published.) -- llywrch (talk) 04:27, 3 June 2008 (UTC)
- Support per my archived comments... wherever they went to... -- Ned Scott 06:22, 3 June 2008 (UTC)
- Support As per archived comments. Lugnuts (talk) 07:16, 3 June 2008 (UTC)
I like it with reservations
- Add your reservations to the discussion sections below, and add a brief signed comment here, linking to the relevant section(s) if you wish.
- davidwr/(talk)/(contribs)/(e-mail) 14:03, 2 June 2008 (UTC)
- Colchicum (talk) 14:19, 2 June 2008 (UTC)
- Ryan Reich (talk) 16:25, 2 June 2008 (UTC) (See my comments at #Inherent notability)
- LouScheffer (talk) 16:26, 2 June 2008 (UTC) - I like it, but percent of capital city is not the right metric
- Badagnani - place name redlinks (of all sizes of settlement) need to be turned blue; the proposal is good but all places (including Chinese counties, Italian comuni, U.S. census-designated places, etc.) should have articles. But this should be done in a methodical way, starting with the actual higher-level administrative division articles (such as List of administrative divisions of Shanxi) and making articles (bluelinks) of all the redlinks. Keep in mind that distinctions between cities, counties, districts, comuni, etc. can be complex. For the Districts of Vietnam, I believe User:Blnguyen used a bot to create stubs for all the nation's districts, which I think took about 2 weeks, including re-checking and dabbing (an example is Ham Thuan Bac). Badagnani (talk) 02:00, 3 June 2008 (UTC)
- Pseudomonas(talk) 16:59, 2 June 2008 (UTC)
- Support with reservations per comments based on extensive experience by Llywrch (Ethiopian stub creation) and FlagSteward (Italian stub creation). --A. B. 19:10, 2 June 2008 (UTC)
- Though I love the general idea... the metric suggested is absurd. Half the size of the capital? That's a joke! Let's look at some examples:
- France: the ONLY city listed would be Paris.
- Brazil: Eleven cities would be listed, solely because São Paulo is not the capital city--instead, the much smaller Brasilia is. (If it were, Brazil would only have two city articles: São Paulo and Rio de Janeiro.)
- Russia: the ONLY city listed would be Moscow.
- USA: Sixty-one different cities would be listed, as Washington, D.C. is relatively small for a country's capital city.
- So, Stockton, California would get an article; Saint Petersburg would not. I love the idea, but this metric is ridiculous. Let's just pick a number--I'd say a population over 1,000 should suffice--and be done with it. But what a great idea for a bot in general. Matt Yeager ♫ (Talk?) 22:28, 2 June 2008 (UTC)
- The absurdness mainly comes from the "Half" and "Capital": 5-10 percent of largest city gives better results. However, I agree with Matt and others, that the metric is not good: cutoffs are essential if percentages are used. See WP:Village pump (proposals)/FritzpollBot#Size limits for more details. Other than that, the proposal addresses, at least in principle, all of my earlier reservations. Geometry guy 22:37, 2 June 2008 (UTC)
- Per Matt Yeager's excellent points. Relations to capitals aren't the way to go here. Absolute numbers may be arbitrary, but they're probably less biased. --Bfigura 00:40, 3 June 2008 (UTC)
- My reservation: How long is it going to take to create one article, from start to finish? SYSS Mouse (talk) 02:27, 3 June 2008 (UTC)
- I like it in general. It's good to get the WikiProjects closely involved. But the capital city metric needs to be discussed further. Zagalejo^^^ 02:48, 3 June 2008 (UTC)
- After opposing, I'm now giving the bot my cautious support for three reasons. First: Fritzpoll is a trustworthy, intelligent editor, not some zealot bent on shoving 2 million micro-stubs here; he has shown himself willing to listen to the many suggestions that are sure to come about from such a large undertaking. Second: while the bot alone cannot judge notability, this will be a deliberative, interactive process, done in close consultation on a country-by-country basis. Some countries with more complex administrative systems, or which are already sufficiently covered here, might be skipped entirely, while the bot would be very useful indeed for the 36,780 Communes of France. Third: if something goes awry, the process can be stopped or even reversed. At bottom it's a matter of trust, and I am confident the bot will be used wisely, so I endorse its implementation, at least on an experimental basis. Biruitorul 06:23, 3 June 2008 (UTC)
- Support, but the metric used should be administrative divisions, as Badagnani suggested above. Doing it based on population doesn't make sense to me, even if done on a scaled basis. Population bears no relation to notability, in my mind. You can get some highly populated districts where there is no reliable written source in any language, and then some which are well-reported on yet quite small in population.--Aervanath lives in the Orphanage 09:18, 3 June 2008 (UTC)
- Support, but the 50% idea is rather limiting. Many countries have had a disproportionate urbanisation over the last few decades. Take some of the established countries: Germany would only have Berlin and Hamburg. Britain will have a similar situation. Agathoclea (talk) 10:58, 3 June 2008 (UTC)
- Comment to all - yes, the suggested metric is a bit daft. The 50% was plucked out as an example, and I just wanted to show that I was open to the idea of some metric. Let the discussion lower down the page determine what criteria need to be met, and let's look past my ill thought-through suggestion relating to percentages of capital cities :) (hides in shame and ignominy) Fritzpoll (talk) 11:22, 3 June 2008 (UTC)
I have serious reservations that cause me to vote no
- Sign here, and put your reservations in the discussion sections below.
Serious reservations -- I don't see how you can expect us to support your proposal without some concrete examples of the bot's output. The samples I saw earlier were pretty bad -- most of them should probably be deleted. Pete Tillman (talk) 22:39, 2 June 2008 (UTC)
- The problem here is that the new proposal means getting a WikiProject involved to gather sources, then me adjusting the bot to fuse the sources, outputting the lists, having the lists checked and then using the amended lists to create the articles. This is, necessarily, a long process and one where I will have trouble getting people to engage if it is on a very speculative basis of this "might" happen. I do appreciate your concerns, but I hope that what you disliked most about the earlier samples was their lack of content and not the layout of the articles, etc. In a sense, the bot has proved itself technically capable of creating articles with the information we give it. The problem was, from the earlier discussion, that it didn't have much information to go on. By involving human editors, it will have more information, and output the data in a format similar to the one created by the previous version of the bot - that is, text, infobox, sources. The only differences will be the extra information, extra sources, and no external links. On a separate note, I would think that the bot's new WikiProject would probably first off add a task of cleaning those articles up to a higher standard per the discussions here and elsewhere. If you have any concerns that I haven't addressed, please ask Fritzpoll (talk) 11:30, 3 June 2008 (UTC)
No, it's just a bad idea
- Sign here if you think the whole concept of bot-generated or bot-assisted mass-article generation for places stubs is a bad idea.
- Oppose strongly: encourages proliferation of articles about non-notable places, reinforces notions of "inherent notability", just plain wrong. Kww (talk) 14:18, 2 June 2008 (UTC)
- There are no "non-notable places". Any geographic place is notable; the difference is in what sources can or can't be brought to bear on an article. Bearcat (talk) 15:43, 2 June 2008 (UTC)
- Please read my thoughts at #Inherent notability. I want to build a consensus that this idea is wrong and unnecessary. Ryan Reich (talk) 16:54, 2 June 2008 (UTC)
- Bearcat, I absolutely and totally disagree. Boven Bolivia is an excellent example of a "non-notable" place: an old farmhouse that even its neighbors don't know by that name. Places are notable if, and only if, they have been covered directly and in detail by multiple reliable sources. Single specks on a map or single lines in an atlas don't cut it, and no wikiproject can come up with rules that shortcut it. Kww (talk) 18:52, 2 June 2008 (UTC)
- Being 'covered directly and in detail by multiple reliable sources' is verifiability, not notability. Notability is a subjective extension of verifiability. What is considered "notable" or not changes between each person. The article Boven Bolivia is potentially useful to those travelling in, mapping or writing about Bonaire. No one is able to conceive the potential uses of any one article, but this is what notability claims to do. Verifiability is the only concrete policy of inclusion we've got to stick to and since this bot necessarily uses verifiable sources, verifiability is not an issue with the uploads of this bot.--Oldak Quill 19:32, 2 June 2008 (UTC)
- Your position is troublingly broad. You appear to be claiming that everything is notable because it is notable to someone. This is emphatically not the standard we have ever used on Misplaced Pages, nor the one that is adopted by WP:Notability. That policy specifically restricts notability to the contents of secondary sources, those which transcend mere verifiability to achieve some measure of importance by way of having been discussed by people. This is not a very restrictive principle, yet you insist that it be weakened to the point of uselessness. Taking "verifiability" as our only inclusion criterion would lead to Misplaced Pages becoming a repository of indiscriminate facts, an outcome which is also specifically forbidden by policy (it is also not a directory). I do not ask that every new article be gripping; all I want is to be assured that it is relevant. Ryan Reich (talk) 19:59, 2 June 2008 (UTC)
- No, I reject "notability" (the concept) because it changes from person to person. The foundation of WP:N is verifiability and its inclusion in other sources. I disagree with "significance" being brought into inclusion because significance changes between context, time and place. The reason Misplaced Pages doesn't turn into a 'repository of indiscriminate facts' isn't due to our article inclusion policy (or any other single policy) but due to a number of policies relating to what articles should be and how they should be written. Rejecting article inclusion policy based upon "notability" is entirely independent of whether Misplaced Pages is a repository or an encyclopedia and whether the facts contained in articles are indiscriminate or not. --Oldak Quill 20:24, 2 June 2008 (UTC)
- Don't confuse the Misplaced Pages jargon "notable" with the English word of the same spelling. The English word means "something that I care about"; here, I am arguing only that we must follow the prescriptions of WP:N. You should read, very carefully, WP:N#General notability guideline, in particular the clarification of "presumed" concerning what Misplaced Pages is not. Certainly, notability requires verifiability (of the notability claim: that's in the clarification of "reliable"), but it is more than that. It also requires more than passing factual mention (see the clarification of "significant coverage"). I have already argued that an atlas, at least for the purposes to which FritzpollBot will put it, is not a source (as clarified in "sources"). Indeed, other than verifiability, the only part of the standard satisfied by the stubs that would be created is "independence of the source". Finally, you cannot reject the concept of "notability" and still keep Misplaced Pages. The concept is the core of what we are; it's as basic to the encyclopedia as the US Constitution is to the United States. Ryan Reich (talk) 20:41, 2 June 2008 (UTC)
- Thank you for your considered reply. Notability has been reined-in from a catch-all term for anything considered not-significant to something more concrete and measurable. The policy WP:N as-it-stands is fairly objective but still contains many variables that are open to interpretation and the policy is thrown around a lot in matters of people not considering a subject significant or coverage-worthy. The defining feature of WP:N (compared to WP:V) is depth of coverage and independent sources. Ignoring independent sources here, "depth of coverage" is one such variable in the policy that is open to interpretation. WP:N defines "significant coverage" as "sources address the subject directly in detail, and no original research is needed to extract the content." The policy does not define what it means by "in detail" and this definition of "significant coverage" does not preclude the use of geography databases. The datasources that this bot intends to use give the subjects more than a passing mention and provide enough information to compile a reasonable stub article which concisely overviews the subject (population, location, &c.). Just to your last point: I would suggest that Misplaced Pages is based on no single policy and certainly it is not based on an article inclusion policy. Starting from WP:Five pillars (just one place to start), I have no need to agree with or use WP:N to edit Misplaced Pages. WP:N has redundancy in several other policies and those parts which aren't redundant are subjective and undefined. --Oldak Quill 21:17, 2 June 2008 (UTC)
- (<-) Hopefully the notability guidelines are not too prescriptive, or else they would inevitably prosecute the innocent. However, I believe that even as they are written, widespread propagation of purely statistical geographic data into otherwise empty stubs doesn't meet these guidelines. It doesn't even meet the standard of the first pillar: Misplaced Pages is an encyclopedia. Among the things that it is not (which are said right in WP:Five pillars) are: an indiscriminate collection of information, a web directory, or a collection of source documents. An indiscriminate directory of all towns in the world, containing only definitional data, would seem to be at least two and a half of these (technically, as I've said, an atlas isn't actually a source document, but this particular kind of data is no improvement over what is found in a source document). Furthermore, WP:No original research is a policy, and its definition of primary and secondary sources clearly (see the footnotes) places raw geographic data in the category of a primary source. You can still disagree that WP:N's prescription that notability is determined by secondary sources alone is binding, but if you do, then you still run up against the WP:NOT objections. Finally, if you do disagree, then you will have to explain why you consider that this extremely broad principle you are advocating is "common sense" justifying the "occasional exception". WP:N may not be absolutely binding like a policy, but you can't use this fact as a shield. Ryan Reich (talk) 22:40, 2 June 2008 (UTC)
- You need to go reread WP:N, OldakQuill. Every phrase I used is a quote from WP:N, not WP:RS. Kww (talk) 20:15, 2 June 2008 (UTC)
- When I used the word "notability" I referred to the concept rather than the policy as-it-stands. Parts of WP:N are drawn from WP:V. Notability is not merely defined as 'covered directly and in detail by multiple reliable sources' and this is more a reflection of WP:V and WP:RS. According to WP:N, "notability" may only be indicated by coverage, but coverage isn't the same as notability. Notability is an abstract concept about the significance of article-subjects and whether they should be included in Misplaced Pages. Coverage in sources (essentially verifiability) is only an indicator of notability. There are no true measures of "notability" and what is "notable" or not ultimately comes down to the individual and what they feel is significant. --Oldak Quill 20:32, 2 June 2008 (UTC)
- Boven Bolivia was created by a person, not a bot. A bot will only do what it's programmed to do, and with a population threshold, it would never create a stub for Boven Bolivia. JD Lambert 19:36, 2 June 2008 (UTC)
- Oppose: this kind of thing leads to a proliferation of substubs that turn out not to have been worth clicking to. Worthwhile articles are created by intelligent humans; until an intelligent human wants to make an article, let that article continue not to exist. I'm already sick of such nonsense elsewhere. -- Hoary (talk) 14:44, 2 June 2008 (UTC)
- Oppose. I for one do believe in the inherent notability of locations -- in fact I believe it is even addressed in the notability policy -- in other words the article on Alsask, Saskatchewan has as much right to exist on Misplaced Pages as the article on New York City. HOWEVER, I oppose strongly the idea of articles being created by bot. Bots are, by definition, dumb robots, so there's no way for this thing to know if an article already exists for a place, especially places that might rightly be written about as part of larger articles i.e. metro areas, rural municipalities, etc. I have no objection to a bot creating a list of articles that need to be done (I'm not sure how that would work, though), but humans should ultimately create new articles, not a bot. Plus, any attempt to create some sort of limit to notability (such as setting a minimum population) would create a dangerous notability precedent that would violate WP:NPOV. 23skidoo (talk) 14:50, 2 June 2008 (UTC)
- No problem in principle with your opposing, but I am a bit confused. The bot I describe will do everything you say you want - it uses human input at every step, and human editors decide what articles they want created. It doesn't overwrite articles (skips anything blue-linked), and lets human editors check in advance that these existing articles don't need to be disambiguated. When human editors then decide what articles they want to create, it does the grunt work for them. So, oppose if you will, but I'm a little confused about why you are opposing this particular bot. Fritzpoll (talk) 14:55, 2 June 2008 (UTC)
- In fact, WP:Notability specifically does not address "inherent notability", though it links to an essay (meaning, non-binding opinion) on the subject. Please read my comments at #Inherent notability about why I think this concept is contrary to the principles of Misplaced Pages and is not necessary to improve it. Ryan Reich (talk) 16:59, 2 June 2008 (UTC)
- Oppose Still have no clue why this is needed. Countering systemic bias is a completely weak justification, bordering on political correctness. It seems to be just common sense that if a place article is required, it will be created eventually in the same way all other articles are, there is no need to have a stub sitting there waiting for it to happen. Setting off this bot would create a dangerous precedent with people queueing up with lists of inherently notable things that have no articles. If the issue is standardisation, or helping users, then what seems to have completely passed the proposers by is the alternate approach, to support the manual method of article creation by documenting all of the above in a manual of style or instruction guide. Coding and running the bot just seems to be a massive waste of effort given the number of things that need to be done to already existing articles, given the high likelihood that most of these articles will never get touched. Anyone who blindly thinks all the stubs will get expanded eventually clearly hasn't toured the rest of the already created parts of the wiki enough. The proposal still basically has the look and feel of a 'can be done' not 'must be done' hobby project, desperately looking for a reason to exist. MickMacNee (talk) 15:07, 2 June 2008 (UTC)
- Fine to oppose for reasons relating to the bot, but please don't imply anything about my motivation for this. A group of editors who have created hundreds, if not thousands of articles like this by hand, and wanted them to be created, asked for a bot to do the work. I wrote one in response to this - I didn't just sit down and try to write something for a laugh and then try to force everyone to let me use it. Fritzpoll (talk) 15:12, 2 June 2008 (UTC)
- If these 'thousands of articles' are simply of the same content as the proposed bot stubs, then that was a bad idea too, and also seems to be effort looking for a justification. Get offended all you want, but it's a fact that every deleted page or aborted project on wikipedia started out as a good idea in somebody's mind. It's a basic fact of wikipedia. MickMacNee (talk) 15:26, 2 June 2008 (UTC)
- Oppose, as per existing comments, If an article needs to be made, someone will make it. The generation of stubs for non-notable locations is not really a good idea - it seems that this is being done just to put the stubs there - I fail to see how this will help anything. If people have actual information on these, then this will eventually get made. The only possible thing this can do is to lower the effort required to make a page in the first place. Surely you could simply make a wikiproject where if people have information that they wish to put into wiki a separate web-page could be created where users select the item they are looking for and a pre-pressed article is returned?? This would seem more logical, at least to me. Having a bot generate two million articles that no-one might ever look at seems like a waste of time - I'm all for ignoring efficiency but I'm also all for not doing things without a really good reason - and I fail to see it here. Alternatively if the idea is to make census data available, isn't that what the census website is doing? Is there actually a need to mirror that? I would be for a bot that include this information into existing articles - no problems. User A1 (talk) 15:35, 2 June 2008 (UTC)
- It isn't going to make two million articles. It isn't going to make them automatically. It's only going to create the pages that members of WikiProjects want to have created. If you read the actual proposal, and ignore this irritating page title (!) you'll see that this thing is essentially people-driven with a bot doing the grunt work of extracting data from sources, listing it for analysis by humans and then only adding the ones that people want to create Fritzpoll (talk) 15:39, 2 June 2008 (UTC)
- reply I read the proposal. All human editors are doing is slowing the rate at which the two million articles are generated by the introduction of a bottleneck. Given 20 volunteers at 5 min per bot-generated article, and assuming a wikipedia time of 4hrs/day/volunteer (is the bot operating in parallel or do all volunteers see the same bot upload proposals?), that is 48 articles per editor per day, which gives 960 total bot articles per day, or 350,400 articles per year. At that rate it is 5.7 years to 2 million articles. User A1 (talk) 15:58, 2 June 2008 (UTC)
- No, the additional restriction on notability or population size will heavily restrict the number created, as I suggest following the inclusion of census data. It will thus never reach 2 million. Probably not even 1 million. Fritzpoll (talk) 16:10, 2 June 2008 (UTC)
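The throughput estimate in User A1's reply can be reproduced with a few lines of arithmetic. This is only a sketch of that back-of-the-envelope calculation: the function name is invented for illustration, the inputs are the figures quoted in the reply, and a 365-day working year is assumed because that is what the quoted 350,400 implies.

```python
def years_to_target(target, volunteers, minutes_per_article, hours_per_day):
    """Return (articles per editor per day, per day, per year, years to target)."""
    per_editor_per_day = hours_per_day * 60 / minutes_per_article
    per_day = per_editor_per_day * volunteers
    per_year = per_day * 365  # assumes volunteers review every day of the year
    return per_editor_per_day, per_day, per_year, target / per_year


per_editor, per_day, per_year, years = years_to_target(
    target=2_000_000, volunteers=20, minutes_per_article=5, hours_per_day=4
)
print(per_editor, per_day, per_year, round(years, 1))  # 48.0 960.0 350400.0 5.7
```

Changing `target` to a smaller figure, as suggested by the population restriction in the reply above, shortens the estimate proportionally.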
Qualified oppose- I still think it's preferable for people, rather than a bot, to go about creating these articles, which should happen on an as-needed basis, rather than automatically. However, I am relieved by the shape of the new proposal. I like the fact that if this goes through, consultation with editors working on individual countries will happen first. Perhaps the bot will even skip over certain countries if coverage thereof is judged to be complete. I only hope the promise of careful consultation will be honoured, and we will not rush into this, but do it with careful deliberation. Biruitorul 15:49, 2 June 2008 (UTC)
- Absolutely: if no one wants the articles, they don't get created. If the source data is incomplete, they don't get created. Any violation of the proposals as they stand should result in the bot being immediately blocked, and the operator (i.e. me) being sanctioned in some way. The bot is just a tool - computer code is not intelligent enough to be let loose on this kind of thing unattended. Think of this bot like a can opener: you use a can opener to open the tin, because it's difficult, though not impossible, to get inside the can without it. What you don't want is your can opener deciding what you should have for dinner. Fritzpoll (talk) 15:54, 2 June 2008 (UTC)
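The gating rules described in this thread (skip anything that already exists, create nothing that editors have not asked for, create nothing whose source data is incomplete) can be sketched as a simple filter. This is not FritzpollBot's actual code: the `Candidate` record, the `select_for_creation` function, the place names, and the definition of "complete" (a population figure plus at least one source) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Candidate:
    title: str
    population: Optional[int] = None
    sources: List[str] = field(default_factory=list)
    approved: bool = False  # set by a human reviewer, never by the bot


def select_for_creation(candidates: List[Candidate], existing_titles: Set[str]) -> List[str]:
    """Return the titles that pass every gate, in input order."""
    selected = []
    for c in candidates:
        if c.title in existing_titles:
            continue  # already blue-linked: never overwrite
        if not c.approved:
            continue  # no human editor asked for this article
        if c.population is None or not c.sources:
            continue  # source data incomplete (assumed definition)
        selected.append(c.title)
    return selected


# Hypothetical candidate list, as it might look after human review.
candidates = [
    Candidate("Alpha", 5200, ["census"], approved=True),
    Candidate("Beta", 800, ["census"], approved=False),
    Candidate("Gamma", None, ["atlas"], approved=True),
    Candidate("Delta", 12000, ["census", "gazetteer"], approved=True),
]
print(select_for_creation(candidates, existing_titles={"Delta"}))  # ['Alpha']
```

The point of the design is that the code never decides what is wanted; it only refuses to act when a human decision or a piece of data is missing.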
- Oppose - while every place is notable, the stub articles are worthless if nobody is willing to fill them with useful information Towel401 (talk) 16:06, 2 June 2008 (UTC)
- I was hoping that by including people from the WikiProjects who wanted the articles created, they would want them created for a reason - namely expansion. I guess I did mention this in my proposal above, but looking at it now, it is kinda long-winded! Fritzpoll (talk) 16:09, 2 June 2008 (UTC)
- Every place is potentially notable, and certainly has a lot of reason to be notable, but for the purposes of Misplaced Pages, that has to be determined by reliable secondary sources. The idea that places can be inherently notable is a subversion of the principles of Misplaced Pages. Since a determination of inherent notability is part of the proposal, I want to start a discussion at #Inherent notability to the effect that the idea should not and does not need to exist. Ryan Reich (talk) 17:03, 2 June 2008 (UTC)
- Oppose, although this variation of the proposal is better than the last one. I still don't like the idea of a bot writing our articles, as it takes away the need for human editors. Even with population, elevation info and such, the articles are still going to be rather stubby, and most of them will never be expanded or improved. Juliancolton 16:44, 2 June 2008 (UTC)
- For reasons I have already given. seresin ( ¡? ) 19:19, 2 June 2008 (UTC)
- Strong oppose— there are already issues with special:random; do we really want more problems? Furthermore, there would be far too many stubs, and this would affect the Five-million and Ten million pool. In short, the cons outweigh the pros, and I believe we shouldn't have an automated bot doing the work we should be doing, albeit tedious. Why not have bots write the rest of Misplaced Pages for us? --Mizu onna sango15/水女珊瑚15 19:52, 2 June 2008 (UTC)
- Could you clarify what you mean by 'issues with special:random'? If you mean too many articles in one subject appear when using this tool, I would say that this isn't an issue. Having a larger proportion of articles about one subject isn't an argument for limiting more articles being created in that subject. The effect of increasing the proportion of geography articles on Misplaced Pages will be reversed as Misplaced Pages grows and extends. Furthermore, I imagine that some subjects (geography and science) will always have more articles and will always be more linked to by Special:Random. If anything, a disproportionate number of articles about one subject is more a call to other subjects to extend their coverage to make up. Special:Random is a useful tool, but its results and how this reflects Misplaced Pages's make-up should not influence our inclusion policy. In a year's time Special:Random may be able to select random articles in a particular category (or exclude particular categories) and this wouldn't be a problem. The 5M and 10M pools are essentially just fun and again shouldn't be used to affect our inclusion policy. --Oldak Quill 20:11, 2 June 2008 (UTC)
- Would you be happy if special random had a 1 in 10 chance of producing a place stub? (I actually think the chance would be higher with the predicted numbers). MickMacNee (talk) 22:06, 2 June 2008 (UTC)
I've said this to other editors who don't like the idea of a bot. Can I just say that the idea of the bot creating a few articles automatically is so that many editors such as myself no longer have to spend all our time creating geo stubs and trying to address the huge bias on here, but can focus on quality, on building up stubs to start class articles. I dedicate a lot of time to wikipedia and creating new articles but if I could solely spend my editing time on here writing the articles that exist, things would be looking a lot better.
Perhaps the millions of articles thing was overly ambitious. It would take over a year to create that many and to expand each and every one of them would be a difficult task indeed, time which you or I haven't got. What we can do however is do several thousand at a time and get the wikiprojects involved so we can aim to get a team working at expanding a sensible number of the most notable articles, e.g. towns with a population over 1000 that could quite feasibly be expanded and not remain permastubs. Ideally I'd love to have full and detailed articles on everywhere, but the huge problem is access to knowledge. Realistically if we could get many onto here like this, it would give us a firm basis to build upon sensibly. I think the new proposal has a lot of positive points. I agree bots are stupid; even Mr Fritz, the bot programmer, is the first to say this. But if it is used in the right way and coordinated and regulated closely it can be a very powerful and efficient tool in setting up a foundation to build upon as of course we write the articles!! I have spent many weeks alone trying to add infoboxes and refs to the geo articles which already exist by country, and the biggest problem by far is lack of consistency and general shoddiness of starting them. Some editors, don't get me wrong, can start articles correctly and get things off to a good start, but the majority are not done in a manner expected of wikipedia and it takes weeks to sort out the mess. But if we have a bank of articles under the whim of wikiprojects, along with the editors like myself who work on geo articles, we can try to build the best we can, which will help people in the most efficient and consistent way we can. Whatever is thought of me, I would rather not create 2 million perma stubs either and am here to build an encyclopedia of the highest quality and depth. It should be done in stages but we need to start from somewhere. Best regards ♦Blofeld of SPECTRE♦ 20:08, 2 June 2008 (UTC)
- I agree with the large part of the point you are trying to make: the bot will relieve the tedium for human editors who want to populate geography articles with basic information, freeing their efforts to expand these stubs. My core reservation is that the stubs will be created without any plan for further expansion, but merely the hope that they will attract some interest. This bot should be the mechanical core of a much larger organizational effort centered around the national WikiProjects to attach editors to these new stubs and flesh them out. This effort needs some advance planning, namely, the location before creating the stubs of sufficient reliable secondary sources to establish notability claims for each town. Any article for which this cannot be done is an immediate candidate for being what you don't want: a permastub. The community needs to come to a consensus around this point before this bot can be run. Ryan Reich (talk) 20:21, 2 June 2008 (UTC)
- The more I think about this, the more I don't like it. Even creating only lists isn't all that great. What's wrong with just letting Misplaced Pages expand at a natural rate? The Volapük Misplaced Pages was heavily criticised (and IIRC almost deleted and restarted) for using bots just like this to create thousands of stubs (I think they were even geographical places too). Why would we want to bring that here? Misplaced Pages has content based on what its users want, because it's created by its users. This will likely lead to a systemic bias toward English speaking areas. I think that as a problem, it's overhyped. I would venture a guess that the Arabic Misplaced Pages is biased toward the Middle East and the Russian Misplaced Pages is biased toward Russia. It's perfectly natural and, while it does need to be addressed somewhat, using a bot like this as some sort of full frontal assault against bias isn't a very good idea. We also have to consider how this will affect Misplaced Pages's image. We generally make a press release for every X millionth article, how is it going to look when we release 2 of those a few months apart and the articles are bot created stubs? Yes, we'll have 4 million articles, but only half will be created in the real wiki tradition. Storden, Minnesota (a rambot article picked entirely at random) is a good example of what's likely to happen to most of these articles - Since it was created almost 6 years ago, 17 of its 21 edits have been by bots, only 1 human edit has actually added any new text, 1 sentence about a highway, and even that wasn't done until earlier this year. If someone wants thousands of automatically generated and maintained articles about towns, that's a fine idea for a website, but not for Misplaced Pages. Mr.Z-man 20:12, 2 June 2008 (UTC)
- Oppose much per Mr.Z-man, directly above. From where I'm standing, this seems too much like a solution in search of a problem. This project is always expanding, and as interest in creating articles on tiny villages grows, they will be created. We don't need a bot creating stubs for hundreds, thousands, millions of articles that will likely never be touched again. Systemic bias is a problem, nobody can deny that, but it is nowhere near the problem some would make it out to be. I do not support the concept of this bot; we should focus on vastly improving the quality of articles, not quantity. - auburnpilot talk 21:00, 2 June 2008 (UTC)
- Oppose because this will just encourage deletionists and inclusionists to have yet another forum for blazing rows. If a place is worth including then it is worth real work. Fiddle Faddle (talk) 23:30, 2 June 2008 (UTC)
- Oppose per just about all the comments above, particularly concerns over non-notability, and do we really want WP to have a ton of stuff generated by bots? The whole point of WP as I recall was that it would be user-generated, not computer-generated. --Bookgrrl /lookee here 01:33, 3 June 2008 (UTC)
- If bots can write decent articles (which I think is the case here), then let them have at it! I don't care if I'm reading machine-written text as long as it's useful. If people can think of other ways to use bots to flesh out other areas of the 'pedia, I'd support it there too. As always, we should focus on content, not contributors (human or otherwise). Mangostar (talk) 01:38, 3 June 2008 (UTC)
- Oppose - I want there to be articles on most towns in the world to make this encyclopedia representative of the world, but adding several million articles of questionable notability is not acceptable. If we were creating lists of towns in different states in a country, that would make sense, as some towns are very small and others are very notable and require a whole article. This does make it seem like notability is inherited, meaning that because they are towns or villages, they need a whole article, and that isn't the case; in many cases, only a little will be able to be reliably sourced, so what we will have to do, in addition to all the work we currently have to do to improve wikipedia, is merge millions of stub articles into lists. Perhaps if we could create a bot to create lists of towns, that would be acceptable, but this should only be done for something whose notability is completely unquestionable. Judgesurreal777 (talk) 02:40, 3 June 2008 (UTC)
- Oppose Any information that is worth adding to an encyclopedia ought to be added by a human. The information that is suggested as possible bot-loaded material is not bias-free/neutral/unproblematic - particularly for the countries that are likely to be underrepresented. This is a truly bad idea. Also, we don't need a plethora of useless, probably unreliable, misleadingly incomplete sub-stubs. Pinkville (talk) 02:54, 3 June 2008 (UTC)
- Oppose The number of errors will likely be overwhelming, tens of thousands at least, maybe hundreds of thousands. Either way, thousands of these stubs, under-attended, could become battlegrounds for nationalistic edit wars and who knows what else. Lastly, as Pinkville hints, no bot can interpret the notability guidelines. This is not a helpful notion. However, it could be very helpful in a separate geo-wiki project. Gwen Gale (talk) 03:05, 3 June 2008 (UTC)
- The bot doesn't have to interpret the notability guidelines. Per the proposal, humans will. The bot will make no decisions about what should and should not be added. Humans will ask for the data, the bot will simply save them time. If they want the articles, presumably it's so that they can expand them. Trust me, I know that a bot can't just create the articles, which is why this bot has never, ever, been proposed by me as a simple "read data, create article" type bot. Fritzpoll (talk) 11:17, 3 June 2008 (UTC)
- Oppose Ideally, this is the free encyclopedia that any human can edit. Creating an article should have a rationale, someone should care about it beyond its existence as a line of census data. Let's extend this a little bit - every star in the universe is inherently notable, every species of life on Earth is inherently notable, every dropped wrench in orbit around the Earth is notable - are we going to create stubs for each? Franamax (talk) 03:55, 3 June 2008 (UTC)
- Every star, every species, every will eventually have a wikipedia article that is deemed to be notable. Perhaps in 10 years? 20? 100? Do not be surprised by the eventual creation of all these things in time to come, either by human/robot/space alien/etc hands. Things change. They do not stay as they are. Peace!Calaka (talk) 04:22, 3 June 2008 (UTC)
- Absolutely, and I look forward to showing my grandchildren my edits to Misplaced Pages when it was only on computers. The question though is today - do we achieve a purpose by creating a million stubs? And do we help the world of search by putting a WP stub into the top-three Google results, even though it is uninformative? The articles we create go straight to the top of search rankings, shouldn't they actually say something useful? Franamax (talk) 04:30, 3 June 2008 (UTC)
- Ahh I was not aware that google automatically indexed newly created wiki articles on the first page. I assumed that the newly created articles arrived at the top page after several search results of the topic in google and consequential clicks of that link. Calaka (talk) 04:46, 3 June 2008 (UTC)
- Google works from a complex algorithm which I don't pretend to understand. It revolves to a large degree on the number of incoming and outgoing links to the page, and I think to the site. Misplaced Pages scores way up there. For instance the obscure Order of the Dogwood which I recently created - two weeks ago, searching that phrase on google would give you pages of links related to Terry Fox and Nancy Greene, now those are all gone, subsumed into the Misplaced Pages link - we're the go-to click. Please anyone correct me if I'm wrong, but this is serious stuff here. Franamax (talk) 06:20, 3 June 2008 (UTC)
- Oppose per what everyone else has said (info added by humans, not computers). Fin©™ 10:05, 3 June 2008 (UTC)
- Strong Oppose. In a nutshell, you want to create a bot that creates articles that no one finds interesting enough to create. I cannot oppose this idea more strongly. I'm not strictly a creationist per se, but in most cases I believe that if someone wants to create an article on a subject that interests them, then they should be allowed to, so long as it's not nonsense or other such vandalism, but if no one wants to create an article on a topic, that means it doesn't need to exist. ☯Ferdia O'Brien /(C) 10:11, 3 June 2008 (UTC)
- My reply to Gwen above covers your point about "interesting enough to create" - the bot is not deciding what articles to create - human beings are, per the proposal as described Fritzpoll (talk) 11:17, 3 June 2008 (UTC)
- Strong oppose - per what I had said previously. Creates a rather untenable set of circumstances. Sephiroth BCR 10:44, 3 June 2008 (UTC)
I'm a Luddite
- Sign here if you're a Luddite.
- Dude, mechanical looms are just wrong :) However, I do favour the human touch in almost every aspect of Misplaced Pages, with due care to each and every thing we do. It takes a lot longer, but we're more sure of the results. Remember, there's no deadline. Franamax (talk) 04:02, 3 June 2008 (UTC)
- I would suggest that the creation of the rambot stubs spurred a lot of human editing from Americans who started discovering Misplaced Pages in search engine results for their hometown or places in their local area. Furthermore, for newcomers, making a small addition to something that's there will often be less daunting than creating a new page, especially when they're unfamiliar with stylistic issues (not to mention autoconfirmed requirements for creating new pages that exist nowadays). Would we not see the same consequences from these stubs as the rambot stubs? There is the issue of limited returns from non-English speakers, I suppose. --bainer (talk) 05:31, 3 June 2008 (UTC)
- (edit conflict) If each and every page that the bot produces has to be checked by a human before being created as an article, doesn't this satisfy your requirement? Yes, there is no deadline, but that doesn't mean that the work has to be repetitive and tedious, which is what creating these stubs by hand would be.--Aervanath's signature is boring 05:34, 3 June 2008 (UTC)
- Aervanath, to the first, do you mean that we only require an editor to say with a straight face "oh yes, twenty stubs added per minute and I personally verified each one"? That's a recent-ongoing bone of contention of which you may not be aware. Repetitive and tedious - there's my point, if it doesn't gladden the heart, why do it?
- Bainer, I take the point on encouraging contributions to expand articles, which I think is also the thrust of Aerv's comment, let's just put them out there. But to what purpose? One million more places for people to say "This town is sooo boring" or "My firend Paul is teh gay"? We end up having to massage every new editor's contribs up to scratch. The counterpoint might be, why not stub every group of four teenagers who play musical instruments? (And as far as non-English - non-issue, we have the equal responsibility to patrol all these articles for quality). Franamax (talk) 06:04, 3 June 2008 (UTC)
- Responding to your first point, from what I understand about the way the bot will operate, the articles will not be created until the relevant WikiProject has reviewed the information and deemed it worthy of its own article. So, yes, the articles will not be personally verified at the very instant of creation, but the bot-created article will have been pre-verified by a human editor.
- As for the vandalism, I guess there's a fundamental philosophical difference there. I don't think we should let the fear of vandalism stop us from improving the 'pedia. That feels too much like giving in to them and letting the vandals determine what we do and don't do on Misplaced Pages. The anti-vandal bots and RC patrollers do a great job of reverting vandalism very quickly, and I don't think these new articles will impact that stellar record.--Aervanath lives in the Orphanage 10:09, 3 June 2008 (UTC)
Discussion
So we just scrap the previous discussions because you didn't see it? Seems strange, at the least. I still oppose the whole plan, per the reasons I already gave. Thanks for the opportunity to voice my opinion, again. IvoShandor (talk) 13:41, 2 June 2008 (UTC)
- No, they're on the talk page - people coming to the page afresh won't see the adjusted proposals, and at well over 200K, an archive seemed appropriate Fritzpoll (talk) 13:43, 2 June 2008 (UTC)
Better. However, I still oppose the concept that an arbitrary population size makes a location notable. Locations are notable because things are there or happen there that get discussed directly and in detail by independent third-party sources. That's it. Nothing else counts at all. One liners, inclusions in lists of locations, or being in an atlas doesn't confer notability. Direct, detailed discussion in third-party sources is the only criteria that matters at all.
As for balance, I wouldn't mind if someone created a bot that searched for bot-created geographic locations that have never been edited by humans and deleted them. That would repair the error that was made when bots created innumerable articles for non-notable US locations.Kww (talk) 13:45, 2 June 2008 (UTC)
- I agree with archiving. I'm adding some more specific subsections below. --Alvestrand (talk) 13:49, 2 June 2008 (UTC)
- Kww, I think the criteria for notability is going to be set by the participating WikiProjects. The criteria for Russia may differ from the criteria for Egypt, and none, one, or both may use population as a criterion, depending on the will of the editors involved in those projects. davidwr/(talk)/(contribs)/(e-mail) 13:56, 2 June 2008 (UTC)
- No wikiproject should set a criterion for inclusion that is below the general standard, only above. Generally, the project guidelines are shortcuts: if an album meets WP:MUSIC, it is exceedingly rare that it doesn't also meet the general notability guideline. Projects can't override WP:N to suit their particular fancy, they can only provide additional criteria. To do otherwise allows us to have Pokemonpedia, because a small group of editors decides that "included in the Pokemon Index" is sufficient to be notable. This is going down the exact same bad path: appearing on three maps, or in three databases, may make information verifiable, but does not make it notable. I've got no problem with someone saying that towns below 10,000 are generally not notable, but that is not the same as saying that towns above 10,000 are notable. Many towns are completely non-notable, and have no reason to be included in an encyclopedia. Atlases, sure, but encyclopedias, no. Kww (talk) 15:13, 2 June 2008 (UTC)
- I think what I was trying to aim for by having a community discussion was to establish the community's guidelines for how notability should be determined. I said population as a suggestion to get the ball rolling, and because it's easy. However, we should be open to any other suggestions for defining it. I certainly wouldn't want WikiProjects overriding our notability guidelines - I just wanted to establish what these guidelines should be if the bot is to run. My proposal is a flexible thing - that's how I hope to reach consensus Fritzpoll (talk) 15:20, 2 June 2008 (UTC)
The watchlisting bit should be removed. Special:Unwatchedpages is already pretty useless, adding them to the bot's watchlist is just a waste of server space and could be misleading if the page is ever made useful in the future. Mr.Z-man 19:18, 2 June 2008 (UTC)
Size limits
I would like to see the bot operate with an absolute size limit for population, and move that limit downwards over time. As an initial number, let me toss out 10,000 - that would make sure there aren't two million entries created in the first round, and increase the consistent coverage of Misplaced Pages. --Alvestrand (talk) 13:49, 2 June 2008 (UTC)
- Assuming I read the revised proposal correctly, there is no "first round" - but rather lots of "first rounds," one for each country or other area covered by a WikiProject. The "first round" of country A may include all places no matter how small, if that is the will of the editors. Another country may have a population cut-off of 100,000 if that is the will of the relevant WikiProjects' editors. davidwr/(talk)/(contribs)/(e-mail) 13:58, 2 June 2008 (UTC)
- I think I was suggesting that whatever the community at large decides is suitable is what the bot will do. If the community wishes to defer these decisions to the Wikiprojects, then it can. Fritzpoll (talk) 13:59, 2 June 2008 (UTC)
- I like the idea of having individual WikiProjects decide what the threshold should be. Resolute, Nunavut appears on many world maps as a dot at the far northern edge of Canada. These same maps may omit many other cities with a population of 100,000. Our own article on this hamlet of 229 is pretty good. So why is it included on so many global maps? Because it's a hub for people living, working and exploring within a sparsely populated area of thousands of square kilometers. On the other hand, we had the whole brouhaha of the Gnaa, Nigeria article in 2006. Initially started as an exercise in trolling by a trolling group also known as GNAA, it was almost impossible to kill because we didn't want to delete articles about towns in underserved areas. Eventually it got deleted -- see the talk page discussion at Misplaced Pages talk:Articles for deletion/Gnaa, Nigeria (3rd nomination); what was initially thought to be a town of 6,559 based on Falling Rain Genomics data really turned out to be farmland when viewed on Google Earth. Even Falling Rain's data made the same point once it was properly interpreted. No Nigerian Wikipedian would have advocated the retention of the Gnaa article but probably most Canadians would advocate keeping the Resolute article. --A. B. 18:27, 2 June 2008 (UTC)
- Disagree with size requirements. Niue has around 20 settlements on the island, with about 2000 people. In this case, most of the population emigrated to New Zealand for work. However, prior to the emigration, the locations were often much larger. This is one of the big problems with trying to impose size limitations. When generating articles for US locations, which often have much shorter histories, the bar was set at a population of 3. I could accept using that standard elsewhere as well. John Carter (talk) 19:11, 2 June 2008 (UTC)
I don't strongly favour fixed population limits and like the idea to get guidance from WikiProjects. However, I do think we need a community-wide approximate default, and that variations around this default (both wholesale changes to the default, and individual exceptions) should be based on the availability of reliable secondary sources. I see the need for percentages for smaller countries, but a percentage doesn't work well across the spectrum from Niue to China: cutoffs are needed. I also think that percentage of total population may be a safer measure than percentage of capital city. I would propose something like the following (the specific figures are just to make the principle easier to understand):
- For countries with a population over 100000, the minimum* population allowed should be 1000;
- For countries with a population between 2000 and 100000, the minimum* population allowed should be 1 per cent of population;
- For countries with a population of less than 2000, the minimum* population allowed should be 20.
The * is to emphasise that this can be an approximate default. From the point of view of notability, it is important to get the figure "20" right, and this is best left to WikiProjects to demonstrate their sources. From the point of view of not flooding the encyclopedia, it is important to get the figure "1000" right. (My belief is that 1000 is a safe figure here and will result in less than 200000 new stubs, but a higher figure is fine with me: 10000 is certainly safe.) The percentage scale interpolates between these figures in a way that treats medium-sized countries fairly. Geometry guy 22:29, 2 June 2008 (UTC)
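- For anyone who wants to check the arithmetic, the three tiers above can be written as a small function. This is only a sketch: the function name is made up for illustration and is not part of FritzpollBot, and the figures (1000, 1 per cent, 20) are the illustrative ones from the proposal, not agreed values.

```python
def default_min_population(country_population: int) -> int:
    """Approximate default minimum settlement population for a country.

    Hypothetical helper implementing the three illustrative tiers:
      over 100000            -> 1000
      from 2000 to 100000    -> 1 per cent of the country's population
      under 2000             -> 20
    """
    if country_population < 0:
        raise ValueError("population cannot be negative")
    if country_population > 100_000:
        return 1_000
    if country_population >= 2_000:
        # 1 per cent, rounded down to a whole number of people
        return country_population // 100
    return 20


if __name__ == "__main__":
    # Population figures as quoted in this discussion
    for name, pop in [("Russia", 142_000_000), ("Finland", 5_300_000), ("Niue", 2_000)]:
        print(name, default_min_population(pop))
```

Note that the tiers join up at the boundaries: 1 per cent of 100000 is 1000 and 1 per cent of 2000 is 20, so there is no jump in the threshold as a country's population crosses either cutoff.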
- I like the idea of having individual WikiProjects decide what the criterion for inclusion should be, but I don't think that it has to be population size. So no approximate default, please. It is possible that some projects will prefer to have only the articles that are not orphaned, others will prefer to have only the articles with interwikis, others will prefer tables with redirects etc. Colchicum (talk) 22:38, 2 June 2008 (UTC)
- It is only an approximate default minimum that I want: WikiProjects can freely ask for many fewer articles to be created. What we need to protect against is WikiProjects generating too many articles without sufficient justification. Geometry guy 22:41, 2 June 2008 (UTC)
- Well, yes, but this approach is flawed. E.g. Russia has a population about 30 times larger than that of Finland. Now do you really think that a settlement with a population of 1,000 (or 10,000, or 100) should be notable enough in Finland and not notable 30 km from there in Russia? Why is it the size of the country that matters? Colchicum (talk) 23:17, 2 June 2008 (UTC)
- I think you are contributing in write-only mode, which is a pet hate of mine. Russia, population 142 million, Finland, population 5.3 million. Both much bigger than 100000, so, according to my proposal, the approximate minimum size would be the same (I suggest 1000). So your comment supports my approach, and my modification of the initial proposal. Thanks, Geometry guy 00:03, 3 June 2008 (UTC)
- What about France and Monaco? I still think that a settlement is always notable enough if the article has interwikis or links pointing to it, regardless of the population size. And I don't understand why Fritzpoll thinks that a country's development can be measured by its population. Cf. Andorra vs. Sudan. Colchicum (talk) 10:03, 3 June 2008 (UTC)
Relation to existing stubs
There are many place stubs with low quality, and lots of information (census data, references, maps) that FritzpollBot can do better than a human editor. I suggest that for existing places, FritzPollbot place a subpage of the form /AutoGenerated with what it would have generated, and a human editor can "pull up" the stuff that he likes. --Alvestrand (talk) 13:49, 2 June 2008 (UTC)
- Yes, although actually I'd prefer to see the bot (in a month or two, when this first part is going swimmingly) start to stuff many of the existing stubs into internal sections of its own content, after its one-line lead (rather than mess with subpages, which are widely deprecated in mainspace). It would be able to handle most oddities like additional photos or maps, the stub template itself, and other markup. E.g. if stub says only "Obscure village Y was the birthplace of famous person X" with stub template, the text becomes new section 1 with a neutral title like "Details". The bot could defer complex cases back to the projects. JJB 14:20, 2 June 2008 (UTC)
- Better, when going through the data before the article-creation run, editors could be able to signal that they would like to see an autogenerated version (created at, say, Wikiproject:FritzpollBot/Foobar/Foo, for the article named Foo in country Foobar) - these can then replace or merge into the current versions where necessary. It'd be a bit of a waste of time in my view to create thousands of repeat articles in cases where we already know we wouldn't benefit from it. Pfainuk talk 14:27, 2 June 2008 (UTC)
- For the record, this can be easily done, and I'm perfectly happy to implement it Fritzpoll (talk) 14:57, 2 June 2008 (UTC)
- This sounds like a good idea, but think this can be done most simply in articlespace. How would an uncreated article get linked to an autogenerated draft, and how would a "random" user discover this draft in order to benefit from it? I think this would just double or triple the number of pages necessary to add these articles, and I think there are other ways to improve this proposal. --NickPenguin(contribs) 03:11, 3 June 2008 (UTC)
Global coordination
I think we should have a Wikiproject page for the bot itself, where global issues are addressed. This should link to other participating WikiProjects.
There should also be a "single place to go" to see all upcoming scheduled runs of the bot, with advanced notice of at least 24 hours for large runs and at least 30 minutes for runs larger than a handful of articles.
This page should also have an emergency-stop switch that would stop the bot and suspend all upcoming runs.
davidwr/(talk)/(contribs)/(e-mail) 14:10, 2 June 2008 (UTC)
- The WikiProject idea is point 1 of the new proposal. Advanced notice is also suggested, though the timing is open to debate. Good idea to include a schedule within the new WikiProject page. A note to me would prevent all upcoming runs as the bot is supervised, and not fully automatic. A stop switch can be added with relative ease - a block by an admin would have the same effect. Fritzpoll (talk) 14:14, 2 June 2008 (UTC)
Thoughts
I might support this (notability has never seemed a problem to me), but I still have questions. 1) Could you please elaborate how it is possible to make sure that all the produced pages will be watchlisted by somebody with enough expertise to detect misinformation in the future? It is still not very clear. Authors normally know something about the places they write about, randomly assigned volunteers don't. 2) What is wrong with the suggestion to create lists with redirects instead of stubs? They are much easier to patrol. 3) How will it be possible to clear the backlog of orphaned stubs? And if we are going to create lists anyway (which may be useful for other reasons, but is an artificial and wrong way to clear the backlog), why do we need the stubs? Colchicum (talk) 14:16, 2 June 2008 (UTC)
- Well, because we would seek volunteers from WikiProjects associated with the articles being created and get them heavily involved in the process, they can watchlist the articles. That's one reason why, if there are no volunteers from areas of the encyclopaedia that would be interested in such articles, I don't think we should create them until such volunteers exist. Lists with redirects are perfectly acceptable, but I think from a content point of view, this could be decided by the WikiProjects - how much are they willing to maintain, etc. Certainly, there is no technical objection to this. The lists are purely to administer the bot's operation, so clearly shouldn't count towards clearing the backlog, as ultimately they may well be removed. I think discussion on a country-by-country basis will allow a decision on how to "add to the web" - I see the bot as more of a tool for people to use as they wish - sort of a builder doing the work of some architects, each of whom wants it done slightly differently. I'm not sure if I answered everything - let me know if I didn't Fritzpoll (talk) 14:23, 2 June 2008 (UTC)
- 1) It will be necessary to write some script which would enable the volunteers to watchlist the pages (selected on the basis of the regional categorization they like) quickly.
- 2) As to lists vs. stubs -- ok if the relevant projects are explicitly, fairly and without bias informed of both options and the pros and cons raised in this discussion.
- 3) I suggest to notify WP:O and to create a tag that would place the article in a special subcategory of Category:Orphaned articles for bot-created orphans, so that your bot could immediately tag the pages it creates as such if they are orphaned. We shouldn't clutter the list of human-created orphans. Colchicum (talk) 14:57, 2 June 2008 (UTC)
- 2 and 3 are simple to do, and will be done. 1)....well, I can certainly output a list based on a user's query and then they can copy and paste the list into their watchlist. Sound good? Fritzpoll (talk) 15:00, 2 June 2008 (UTC)
- I have created {{geo-orphan}} to tag the orphaned ones. The tag will place articles in Category:Orphaned articles about a place--Aervanath lives in the Orphanage 05:55, 3 June 2008 (UTC)
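- On point 1 above (outputting a list that a volunteer can paste into their watchlist), here is a rough sketch of what that output step could look like. Everything in it is an assumption for illustration: the `Stub` record, the `watchlist_text` function and the placeholder titles are invented here and are not taken from the bot's actual code. It only relies on the raw watchlist editor accepting one page title per line.

```python
from typing import Iterable, NamedTuple, Optional


class Stub(NamedTuple):
    """Hypothetical record for one article the bot has created."""
    title: str
    country: str
    region: str


def watchlist_text(stubs: Iterable[Stub], country: str,
                   region: Optional[str] = None) -> str:
    """Return the matching titles, sorted, one per line, ready to paste."""
    titles = sorted(
        s.title
        for s in stubs
        if s.country == country and (region is None or s.region == region)
    )
    return "\n".join(titles)


if __name__ == "__main__":
    # Placeholder data only; not real articles
    created = [
        Stub("Foo", "Foobar", "North"),
        Stub("Bar", "Foobar", "South"),
        Stub("Baz", "Elsewhere", "North"),
    ]
    print(watchlist_text(created, "Foobar"))
```

A volunteer would ask for, say, everything in one country or one region, and paste the printed block into their raw watchlist. Filtering on the bot's own records rather than on categories is just the simplest thing to sketch; selecting by regional category, as suggested above, would work the same way.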
Italy experience
I've a similar experience to Llywrch's Ethiopia experience reported above - this year I've spent a lot of time sorting out the place articles in Italy. Which as countries go should be as good as it gets - a highly developed, top 10 economy with one of the most active Wikipedias at it.wikipedia.org, Internet availability is not an issue, massive emigration to anglophone countries and a popular holiday destination to ensure lots of English speakers are interested in it, plus in the top 5 for length of recorded history and archaeology. Everything is in your favour, it must have some of the most notable territory on the planet if you define notability by the availability of sources. And yet....
...notability for Italian settlements seems to dry up somewhere around the 2000 population mark, even on it.wiki where they have all the "home team" advantages. On a consistent basis anyway, that's just an observation of what seems to "work" out in the field. It gets pretty patchy below that, and often the smaller villages work better merged into a larger administrative area, often the most notable thing in an area doesn't fit neatly into a village in any case. There are advantages in having a broad picture in a single article rather than lots of microarticles. But for the rest of the world, I'd suggest a mental model of somewhere in the 5000-10,000 population mark as being the lower limit for consistent notability, which is nice because that typically corresponds to a unit of municipal administration - the township, commune, municipality, district or whatever. In Italy the equivalent level is the comune - the population of 60m people is spread over 8,101 comuni, an average of 7,400 people per comune, although that's distorted a bit by the likes of Rome and Milan, I guess the median is about 6000 people. That's the level of local administration that has a mayor, town hall etc, and is the level you should be aiming for - we (now, finally!) have an article for all 8101 of those. Each of those comuni typically has a couple of subdivisions (frazioni, località and the like, approximating to wards or parishes) so there must be 40,000+ of those - but en.wiki only has about 200 of those, and it.wiki has about 1700 but a lot of those are either- I'd guess that long term the "right" number would be somewhere around 1000 out of 40,000+. However one of my long-term aims would be to create redirects for all 40,000-odd frazioni to the appropriate comune article.
Honestly, the comune articles just seem to "work", whereas going down to frazione level tends not to - there are real synergies in not fragmenting the information too finely, and information about the frazioni is still on Misplaced Pages, it just gets a bit more context from being in the comune article. And that's just from the perspective of Italy, which has so many advantages in proving notability from WP:RS. Perhaps the sensible thing would be to go down to township level for countries like Italy, and down to the step above (counties/provinces/the unit covering 50,000-100,000 people) for countries where WP:RS are more scanty. That way every point on earth is covered by a Misplaced Pages article, but that article has a chance of being notable in its own right. The debate is not about comprehensiveness, but about the granularity of that comprehensiveness. Obviously in almost every case (perhaps barring the likes of Nauru?) the granularity should be below "country" level, but personally I think "village" level is too fine a cut.
And I don't like using geographical entities such as "village" - it doesn't guarantee the WP:CSB comprehensive coverage of the surface of the Earth we want, that you get if you use administrative areas. Governments tend to make sure that all points of their territory are covered by an administrative division, but that's not the case if you rely on geographical points such as "villages". In particular not if you rely on Maplandia Level 3 data. To take a random example, I was in the province of Asti a few months ago. The territory of that province is covered by 118 comuni, which are listed at Category:Communes_of_the_Province_of_Asti. Compare that with Maplandia's coverage of Asti - just in the A's it lists only Asti itself out of 6 comuni beginning with A, and Aie, which I guess is the suburb of Bruno (comune population 379) that Google shows as Borgo Aie some way to the north of where Maplandia has it - even it.wiki doesn't have an article. So a hit rate of 1 in 6, in one of the most developed regions of western Europe! However at first glance their provinces at Level 2 look OK. So another reason to concentrate on getting comprehensiveness at Level 2 level, you can probably hope to get that reasonably "right", but Level 3 will need country specialists.
And I would echo Llywrch's comments about the amount of work needed to produce a "clean" list of articles - even in Italy where the government provides handy lists of comuni in Excel format, it still took a lot of work to knock the articles on it.wiki and en.wiki into shape. Despite all the work that had been done on Italian places, there were a few articles missing because people had seen blue links from articles on everything from Chicago supermarkets to Latvian goddesses. And that's before you get onto the typos in the official list, variations in accent use, renamings, a morass of local dialect names for places, bilingual names, and of course WP:COMMONNAMEs in English. The data needs a LOT of cleaning to get it "right", even when it's from an authoritative government source with no problems about transliteration from a non-Roman alphabet.
So as someone who's done 1000's of edits to "place" articles in a country where WP:RS are as abundant as anywhere, I would beg you to:
- Use administrative divisions - the only way to ensure comprehensive coverage of the Earth's surface
- Using arbitrary "places" or "villages" doesn't help WP:CSB, as they aren't comprehensive
- Don't use arbitrary population limits, aim to get 100% coverage of a single administrative level.
- WP:RS may mean that "Level 2" divisions are what you should aim for with comprehensive coverage on Misplaced Pages - counties/provinces or similar
- Even in Western Europe, you should only go down to township/commune level for comprehensive coverage
- Don't use Maplandia Level 3 - the data sucks
- Articles at the parish/ward level only "work" for a relatively small percentage of places, and the township article is improved by having the parishes all in the same place
- Be great if you can make redirects of the parishes
- Please tag the Talk page with an appropriate country Project - much easier to have that done at the start.
- Beware misleading blue links - a check of the appropriate Category:Townships in... can help you catch those
But as long as you don't go mad going down to village level, I fully support the intent, albeit not quite the letter of how the proposal is currently phrased - good luck. FlagSteward (talk) 14:28, 2 June 2008 (UTC)
- Thanks for some of the tips. Using census data should help achieve many of the aims you specify. Maplandia will not be used in this new proposal. Redirects are possible if the WikiProject desires them. Tagging the talk page is also no problem. Misleading blue links will be picked up by the last stage, where the uploaded lists are checked. Thank you for your thoughtful response - I hope some consensus on what depth to aim for can be achieved Fritzpoll (talk) 14:35, 2 June 2008 (UTC)
- I would also prefer using administrative/governmental/political (whatever you want to call them) divisions to determine which articles to create or not. Political divisions are more likely to reflect the history of a place (and therefore notability) more accurately than simple population figures.--Aervanath lives in the Orphanage 06:33, 3 June 2008 (UTC)
Precedent
Is there any precedent for robotically adding geographic stubs, even on a small scale? If so, can we learn lessons from it? How about non-geographic stubs? davidwr/(talk)/(contribs)/(e-mail) 15:03, 2 June 2008 (UTC)
- User:Rambot (lots of geography stubs), and User:Polbot (some US politicians, and then lots of animals and plant species), are the two I can remember. There must be others as well. Carcharoth (talk) 15:28, 2 June 2008 (UTC)
- From what I've heard, Rambot did it without the human intervention that is proposed for this bot. Not sure though Fritzpoll (talk) 15:31, 2 June 2008 (UTC)
- According to this, Rambot added ~30000 articles in an 8-day period. With 86400 seconds in a day, 8 days, and 30000 articles to create, this works out to 23.04 seconds per article. (86400 X 8 = 691200 seconds / 30000 = 23.04) I think it's safe to assume this was done without human intervention. Thingg 16:13, 2 June 2008 (UTC)
- Well, I can do simple AWB jobs at 6 articles per minute, checking for errors and skipping problems on the way. Give me 23.04 seconds and I have enough time to correct a small error on each edit. :-) Unfortunately I can't do that 24/7 :-) Seriously, though, all bot tasks involve human intervention and human oversight: the bots are error tested, the operation is monitored by the operator, and the bot flags up exceptions when human intervention is needed (or skips and writes to a log). What makes a task a bot task is that most of the time the bot autosaves the edit, without a human check before each save. In the case of this bot, my understanding is that, after much careful preparation, human intervention and error testing, the actual run to generate the stubs will mostly involve autosaving like any other bot task. But please correct me if I am wrong. Geometry guy 21:52, 2 June 2008 (UTC)
- You've nailed it exactly, much more succinctly than I could Fritzpoll (talk) 11:39, 3 June 2008 (UTC)
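For anyone who wants to check the rate arithmetic in this thread, here is a minimal sketch. The inputs (about 30000 articles, 8 days, and the AWB pace of 6 articles per minute) are the figures quoted above; the variable names are only illustrative.

```python
# Figures quoted in the thread above; nothing here is measured independently.
SECONDS_PER_DAY = 86_400
days = 8
articles = 30_000

total_seconds = SECONDS_PER_DAY * days            # 691200 seconds in 8 days
seconds_per_article = total_seconds / articles     # average gap between saves
articles_per_minute = 60 / seconds_per_article     # the same rate, per minute

# The semi-automated AWB pace mentioned above, for comparison.
awb_articles_per_minute = 6
awb_seconds_per_article = 60 / awb_articles_per_minute

print(f"Rambot: {seconds_per_article:.2f} s per article")
print(f"AWB:    {awb_seconds_per_article:.2f} s per article")
```

The catch, as noted above, is that the bot's average only holds if it runs around the clock for all 8 days, which a human operator cannot do.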
How to handle blue-links
I would recommend that instead of "just skipping" blue-links, blue-links can be logged with the blue-link-article sizes, and checked for "#redirect." Redirect- and too-small articles - which might turn out to be garbage articles or stubs of inferior quality to the bot-generated version - can then be visited by hand and if appropriate, replaced by or merged with a bot-created version. davidwr/(talk)/(contribs)/(e-mail) 16:01, 2 June 2008 (UTC)
- Yeah, sorry - forgot to mention this. As a rollover from the old proposal, this logging mechanism will still be in place (as it took long enough to write it) Fritzpoll (talk) 16:03, 2 June 2008 (UTC)
- All blue-linked articles should eventually be checked by hand to make sure they aren't something completely unrelated. The town of Big River Dam (England) might be a blue-link for the dam, not the town. This can be partially automated by checking to see if the blue-link is in a "cities and towns in " or similar category, but even that will have false positives and false negatives. davidwr/(talk)/(contribs)/(e-mail) 16:04, 2 June 2008 (UTC)
- Also, a lot of this will hopefully be avoided by human checking - but the bot will still catch human errors using the logging described Fritzpoll (talk) 16:06, 2 June 2008 (UTC)
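A rough sketch of the blue-link logging discussed in this section, for illustration only. It is not the bot's actual code: the function name, the 500-byte threshold and the category hints are all assumptions, and the page text and category list are taken as plain inputs rather than fetched from the wiki.

```python
import re
from dataclasses import dataclass

# Matches wikitext that begins with "#redirect" in any capitalisation.
REDIRECT_RE = re.compile(r"^\s*#redirect\b", re.IGNORECASE)


@dataclass
class BlueLinkReport:
    title: str
    size_bytes: int
    is_redirect: bool
    in_place_category: bool
    needs_review: bool


def triage_blue_link(title, wikitext, categories, min_size=500,
                     place_category_hints=("cities and towns in",)):
    """Log one existing article and decide whether a human should look at it.

    min_size and place_category_hints are hypothetical defaults; a real run
    would need values agreed with the relevant WikiProject.
    """
    size = len(wikitext.encode("utf-8"))
    is_redirect = bool(REDIRECT_RE.match(wikitext))
    in_place_category = any(
        hint in category.lower()
        for category in categories
        for hint in place_category_hints
    )
    # Redirects, very small pages, and pages outside the place categories
    # are the ones most likely to be about something else or to be poor stubs.
    needs_review = is_redirect or size < min_size or not in_place_category
    return BlueLinkReport(title, size, is_redirect, in_place_category, needs_review)
```

As pointed out above, the category check will still give false positives and false negatives, so the report is only a way to order the manual checking, not a replacement for it.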
Talk pages
The bot should create talk pages with the appropriate wikiprojects filled in. It should also add a first section noting the original stub was created by a bot. davidwr/(talk)/(contribs)/(e-mail) 16:01, 2 June 2008 (UTC)
- Yes, again, consultation with the WikiProjects here is essential. A small tag from the Bot's WikiProject should suffice as the bot notification? Fritzpoll (talk) 16:05, 2 June 2008 (UTC)
Name conflicts and disambiguations
If a country has two places with the same name, we need to flag that for human intervention and/or have an automated way of handling it. It's not inconceivable that two towns in two different counties or provinces will have the same name. If the country's naming convention is "townname, countryname" that will be a conflict. This will also be a problem for blue-links, where one of the places already has an article and the other does not. davidwr/(talk)/(contribs)/(e-mail) 16:08, 2 June 2008 (UTC)
- As people examine the lists of places, these should become apparent. It's why the bot will list places in alphabetical order by country - the duplicate names end up next to each other. Disambiguation then is a breeze, existing articles can be moved to the disambiguated name, and ta-da! Then the bot doesn't have to try to do this either Fritzpoll (talk) 16:36, 2 June 2008 (UTC)
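If it helps the people checking the lists, duplicates could also be flagged automatically rather than spotted by eye. This is a minimal sketch assuming each place is a dict with "country", "name" and "region" keys; that record layout and the function names are assumptions for illustration, not the bot's real data format.

```python
from collections import defaultdict


def sorted_listing(places):
    """Order places by country, then name, so duplicate names sit together."""
    return sorted(
        places,
        key=lambda p: (p["country"], p["name"].casefold(), p["region"]),
    )


def find_name_conflicts(places):
    """Return groups of places that share a name within the same country."""
    groups = defaultdict(list)
    for place in places:
        key = (place["country"], place["name"].casefold())
        groups[key].append(place)
    # Only names that occur more than once need disambiguating by hand.
    return {key: group for key, group in groups.items() if len(group) > 1}
```

A human still has to decide the disambiguated titles and move any existing article, as described above; the sketch only points at where that is needed.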
Percent of capital city is not the right metric
The size of capital cities depends strongly on whether the city is also the commerce center (such as London, Paris, Tokyo, Moscow) or just a government center (USA, Germany before unification). The difference can be more than an order of magnitude (Bonn has 300K, Berlin 3M people). Better, perhaps, would be a percentage of the largest city. LouScheffer (talk) 16:32, 2 June 2008 (UTC)
- Good point, and I support percentage of the largest city as the right choice for the project. Maybe start at 25%?? Cheers, Pete Tillman (talk) 22:13, 2 June 2008 (UTC)
- I don't think population should be the metric at all. Political divisions (as I said above) are more likely to reflect the notability of a particular location. My preferred limit would be creating articles down to (at most) the 2nd-to-lowest administrative level. In most cases, I'm sure even that far may be too low for notability purposes.--Aervanath lives in the Orphanage 06:41, 3 June 2008 (UTC)
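For the population-based option, here is a small sketch of how much the choice of reference city matters. The round figures for Bonn and Berlin and the 25% starting fraction are the ones quoted in this thread; the function name is just for illustration.

```python
def population_cutoff(reference_population, fraction=0.25):
    """Minimum population for inclusion, as a fraction of a reference city."""
    return fraction * reference_population


# Round figures quoted above: Bonn about 300K, Berlin about 3M.
populations = {"Bonn": 300_000, "Berlin": 3_000_000}

# Cutoff if the reference is a government-only capital (Bonn before unification).
cutoff_capital = population_cutoff(populations["Bonn"])

# Cutoff if the reference is the largest city instead.
cutoff_largest = population_cutoff(max(populations.values()))

print(cutoff_capital, cutoff_largest)
```

The two cutoffs differ by the same factor of ten as the cities themselves, which is the objection to using the capital. It says nothing about whether population is the right metric at all, which is the separate point made about administrative divisions.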
Inherent notability
Since a lot of people, including the proposer, are prepared to jump on the bandwagon of "inherent notability" of places, I want to state my arguments that such a concept should not exist. Here is the comment I left at the old discussion:
- Here is a comment on "inherent notability" which I left at Misplaced Pages talk:Notability (Places and transportation)#Inhabited places: Regardless of whether it is the community's norm (and regardless of whether it is already done in other policies), I do not think it is a good idea to declare, merely by administrative fiat, that a broad class of topics is inherently notable. This amounts to an abdication of responsibility for the quality of the encyclopedia. Consider how notability can be proved absent such a declaration: given a topic, one consults first primary sources which document it, and then secondary sources which, in commenting on the primary sources, relate it to other ideas and provide the crucial aspect of notability: human commentary. This is why Misplaced Pages is a tertiary source, merely weaving together the contents of secondary sources as a unified testament to human interest in a subject. The insistence on secondary sources is at the core of two of our core policies: no original research, and notability. By requiring that we rely on secondary sources, we force our articles to incorporate only documented facts and opinions, preventing us from being what we can never successfully be: a forum for the original publication of new ideas. But it also ensures that we only write about topics which have been demonstrated to matter outside themselves: that's what the existence of secondary sources (commentary) proves. That's the basis for our notability criteria. By declaring something to have inherent notability, we give license to circumvent secondary sources and, therefore, sacrifice true notability. Essentially, a permissive notability policy is original research.
In addition to this, I have also said that in my eyes, a place article whose only contents are rote information like coordinates and population (or even elevation, etc.) can be considered to be no more than a dictionary definition of the place, and should be treated like we treat dictionary definitions of words: deleted. The principle, here and in the above paragraph, is that the only information which is notable according to WP:Notability is that which comes from secondary sources, and therefore represents some kind of human interest in the subject.
On the other hand, geographic locations represent a large body of formally similar subjects, of each of which there is potentially a lot to say; it is dishonest to claim that two similar towns, one with an article and one without, truly differ in notability; they more probably differ in the amount of expertise our editors have about them. The question here is how to identify two locations as being "similar", and how to provide for all similar locations the kind of minimal acceptable notable information that we know must exist. Since this information is not rote, by definition including it will require human intervention, but there is still a place for a bot to collect the basic rote data and form the stubs for each article (which must then be raised to acceptable standards by people). Since the resources available to contribute to articles in various countries or regions vary (that is, the resources of the geographic WikiProjects), the determination of "granularity" of coverage must be made on this basis. The observations made in #Italy experience about the utility of administrative regions seem to me to be the key to reconciling these requirements.
Here is my suggestion:
- Article creation will proceed on a country-by-country basis, but it doesn't have to be sequential.
- For each country, the Wikiproject's willing members will be tallied and the number will be compared to a list of administrative regions of the country, at various levels of granularity. The generation of articles will proceed at the finest level of granularity for which there is sufficient membership to ensure some kind of individual attention to each article.
- The members of the WikiProject will gather reliable secondary sources on these regions. Once a sufficient guarantee of notability is obtained for each region, the bot can generate stubs for all of them in the format currently proposed (coordinates, population, etc.)
- After the stubs are created, the WikiProject will go to work on them, adding truly notable information from their sources.
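To make the second point of the list above concrete, here is one possible reading of "the finest level of granularity for which there is sufficient membership", written as a sketch. The articles-per-volunteer figure is an assumption that each WikiProject would have to set for itself, and the two Italian counts are the approximate ones given in #Italy experience.

```python
def finest_feasible_level(levels, volunteers, articles_per_volunteer):
    """Pick the most fine-grained administrative level the volunteers can cover.

    levels maps a level name to the number of regions at that level.
    articles_per_volunteer is a hypothetical workload limit, not a proposed rule.
    """
    capacity = volunteers * articles_per_volunteer
    feasible = {name: count for name, count in levels.items() if count <= capacity}
    if not feasible:
        return None  # not even the coarsest listed level can be covered
    # More regions at a level means finer granularity.
    return max(feasible, key=feasible.get)


# Approximate counts quoted in the Italy discussion above.
italy_levels = {"comune": 8101, "frazione": 40000}

print(finest_feasible_level(italy_levels, volunteers=50, articles_per_volunteer=200))
```

A result of None would mean starting at a coarser level than any listed, or recruiting more members first.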
The ordering of points 3 and 4 is important: we should not encourage the indiscriminate creation of articles for which there is no guarantee of notability! However, the services of a bot in doing work that no human would want or be able to do are invaluable in facilitating the discriminating creation of articles whose expansion is certain. This method is a lot of work, to be sure, but it will do the most justice to each country according to the maximum extent of the resources we have available to us. Once this "first pass" is completed (hopefully within my lifetime :) we can move on to finer granularities; quite possibly, this project will attract attention to the geographical WikiProjects and there will be more participants the second time around.
I admit that this will not counter the systemic bias we already have to the same degree that the original, contentious proposal would have. We will not achieve the same granularity all over the world. Like I said in the first discussion, this is an inevitable consequence of the inherent bias in the world concerning reliable, usable, and available sources for English-speaking readers and editors, and the inherent bias on en.wikipedia of editor expertise. However, it will combat this bias to the fullest extent possible under these circumstances and provide a fair shake for all parts of the world. By covering each country by administrative regions, we create a framework for further expansion and a convenient measure of our coverage. As the editor bias is alleviated, the bias in sources will itself lessen (because a broader experience among editors will open up more sources to us, and of course, over time more sources will appear), and the bot can be run again to achieve greater coverage.
We can never achieve good encyclopedic content from a fully mechanical process. The bot should be used to make our work easier, not to replace it entirely. Just having articles named after these places is not enough: the articles should be on each place, and that takes human participation. Ryan Reich (talk) 16:21, 2 June 2008 (UTC)
- Hmmm...implementable. If consensus is reached that this is the notability requirement I allude to requiring in the body of the proposal, then it will be implemented Fritzpoll (talk) 16:55, 2 June 2008 (UTC)
- I completely agree with Ryan: inherent notability is a dubious concept for a source-based encyclopaedia. Notability, as it presently exists in Misplaced Pages, encapsulates the more important ideas that we need to be able to say something about the subject, and that this "something" needs to be verifiable through secondary sources. We need a human to determine whether sufficient source materials are available to make this possible. Jakew (talk) 17:07, 2 June 2008 (UTC)
- Although I certainly wouldn't (and don't) support having a separate article on every named community in existence regardless of its actual legal status or availability of sources, my view (at least as pertains to places in Canada, with the acknowledgement that it may not easily translate to other places due to differing structures of government) is that any place that is legally incorporated as a municipality (city, town, etc.) should have an article, and smaller communities or neighbourhoods within that municipality should be redirected to the municipality unless and until there are sufficient sources for the communities as independent topics. As far as I'm concerned, if a place is incorporated as a municipality, then properly sourced census data and/or the municipality's own website are, in and of themselves, sufficient sources to justify the creation of at least a stub, although obviously additional content needs to be properly sourced as well. I do, however, absolutely oppose any notion that any notability standard beyond legally incorporated as a municipality by the laws of its province or territory can or should be used to distinguish "notable" from "non-notable" municipalities. Unincorporated places are a different story, certainly, but at least for Canada the rule needs to be that if it's incorporated as a municipality, then it definitely gets an article even if it has a population of two. Bearcat (talk) 17:17, 2 June 2008 (UTC)
- Essentially, you are proposing that government warrant of incorporation is a secondary source (hence, sufficient to establish notability). I don't think this is the right standard for an encyclopedia, because it is indiscriminate. The use of the word "notable" is unfortunate in this context because it seems to imply that a non-notable town is without value, whereas what it really implies is that the town is simply not the subject of enough external recognition to fill its own article. You are advocating that every incorporated town get an article, even if only the fact that it is incorporated is in there; thus, the Misplaced Pages geography articles will constitute an atlas. If this is what you want, then I agree with MickMacNee below that we need an entirely different Wikiatlas or whatever for this sort of thing. As he says, and I agree, "knowledge" is not "information" alone, and an encyclopedia demands more. Ryan Reich (talk) 17:41, 2 June 2008 (UTC)
- No, I'm stating that those are starter sources which are, in and of themselves, sufficient to justify the start of an article. No incorporated municipality in Canada is ever going to be entirely unreferenceable beyond that — by the very nature of being a municipality, media sources will exist: mayors get interviewed in newspapers and on TV. Events happen. Famous people are born there. Museums and community festivals take place. And on, and so forth. The sources simply are not a problem. But people are much more likely to add new content and sources to an article that already exists than they are to create and properly format a new one from scratch. The fact that the process of starting a brand new article from scratch on an incorporated municipality is kind of time-consuming and exhausting is precisely why not all Canadian municipalities are already done, which is why there should be a way to automate the parts of the process that can be simply filled in by a bot, such as filling in a basic infobox. There isn't a single incorporated municipality of any size on earth whose article is entirely unexpandable beyond what a bot can automate, but the article needs to start somewhere, and there's no reason why census-data-plus-municipal-website should be insufficient as a start. Bearcat (talk) 21:24, 2 June 2008 (UTC)
- The example of the US towns generated by Rambot suggests that you are overly optimistic. Examples are floating around this page of articles that, in fact, have never received human attention in the more than six years of their existence. A reasonable application of the notability standard would conclude that they do not meet it (and would have concluded it upon creation; the intervening years simply show that what you hope could come to pass, in fact, does not), and the same will be true of Canadian towns, African towns, or farm houses in the Caribbean. If, as you claim, the sources can always be found, then it is your responsibility to find them when the articles go up; otherwise, WP:N is not visibly satisfied, WP:V is certainly not satisfied, and the article should be deleted. My proposal is simply that the bot be made available to the Wikiprojects to generate templates for facilitating the start of articles, exactly as you want, and that some organizational effort be made so that articles be started in parallel at a particular level of granularity across each country, and with notability claims as we require of everything else. I have also said elsewhere that simply because people may be more likely to improve some stubs than to create the article themselves does not mean that if we create every possible stub, that most of them are likely to be improved, and in the mean time, their existence is a policy violation, a vandalism problem, and a blight. In short: I approve of the automated part of the process; I simply disagree with you about where in the process that automation should occur. Ryan Reich (talk) 21:39, 2 June 2008 (UTC)
- I'd like to note that I started to identify and list Canadian municipalities that are still redlinked on Misplaced Pages yesterday, and the process has already resulted in two new, well-referenced and well-formatted articles about Canadian municipalities — Bristol, Quebec and Sutton, Quebec — getting created by human editors today. So this process also serves to help editors identify and work on missing topics in advance, as well. And it's every Wikipedian's responsibility equally to add content and sources to any individual article. Nobody has any special responsibility for any individual article beyond ensuring that the content that they do personally add is properly sourced. A clearly-defined and organized list that exists for a WikiProject to collectively review for inclusion of further sources, yeah, absolutely. That's already one of the inherent results of this project as proposed in the first place. But it's not my personal responsibility to go out of my way to hunt down unsourced articles on Misplaced Pages just for the sake of adding sources to them. My only responsibility is to properly reference the content that I do add to the articles I choose to work on. Bearcat (talk) 21:47, 2 June 2008 (UTC)
- Also, there's a big difference between "unexpanded" and "unexpandable". The former does not automatically imply the latter. Bearcat (talk) 22:04, 2 June 2008 (UTC)
- We agree entirely, but are somehow still arguing. We agree that the process of automatically making available to editors a list of needed place articles facilitates their creation. We agree that when that process is started, the resulting articles are well-sourced and filled with nontrivial facts. I apologize for suggesting that it is your personal responsibility to find these sources for every such article; the point I wanted to make was that the success you demonstrate with Bristol, Quebec, for example, is the result of a human editor stepping forward to take responsibility for an article and substantiate the supposition that notable facts exist about this town. You say that your only responsibility is to properly reference the content you add to the articles that you choose to work on; consider, then, that the responsibility of the users of this bot is to do this for every article it creates. Your examples strengthen my belief that the use of this bot must be coordinated with interested editors who will do for the stubs it creates exactly what you have done with Bristol and Sutton. And do this before the stubs are unleashed. Ryan Reich (talk) 23:17, 2 June 2008 (UTC)
- If editors were going through geographic redlinks that systematically, the bot wouldn't be necessary in the first place, because the articles would all already be done. But they're not. Bearcat (talk) 04:53, 3 June 2008 (UTC)
- Step 4 just isn't going to happen in my opinion, and will be ruined by the fact that there are going to be enough people in every project who conclude that every stub proposed by the bot should be created through 'inherent notability', completely nullifying the human intervention element being held up above as the difference between the dumb automatic creation of an atlas and the assisted creation of an encyclopaedia of knowledge. The core issue is that some people have misunderstood the purpose of wikipedia, and clearly don't see the issue that 'X is a place in region Y, which has a population of Z' is information, but is not knowledge. Inherent notability of places is appropriate to a wikiatlas, not wikipedia, in the same way that words are inherently notable for wiktionary. I think a completely new wiki project on the lines of wikimapia is what is needed here, not this tremendous diversion of effort away from the real jobs that need doing here. MickMacNee (talk) 17:25, 2 June 2008 (UTC)
- In order to make it possible, we need to establish a recognized official policy on "inherent notability" (hopefully, that there is none). Then, this policy can be brought to bear on articles created as "nullification" attempts and they can be properly deleted or trans-wikied. What you say about a separate Wikiproject is right, but just because there is a separate project for all places doesn't mean that we shouldn't have a place here for those that deserve real articles under our actual standards. Just like book has an article despite being merely a word, so should possibly tens of thousands of towns have articles despite millions of others not having enough notable information on them to warrant one. What I want to accomplish here is a determination that this is the correct standard, and agreement that what I've proposed is the correct way to implement it with FritzpollBot. Ryan Reich (talk) 17:52, 2 June 2008 (UTC)
People over at WP:Fiction have been struggling with the same inherent notability concept, and I will again state my opinion that it is not only wrong, it is inherently dangerous. Things achieve notability by being described, directly and in detail, by reliable, independent, third-party sources. No other criteria can force inclusion. Attempts to decree that something is inherently notable are a shortcut that leads to controversy and strange divisions. Why are towns notable, but not shopping malls? Television episodes? Asteroids? Stars? Garage-rock bands with a really cool MySpace page? It's an effort to force an issue that cannot be supported by logic, and as such will inevitably cause trouble. Kww (talk) 19:02, 2 June 2008 (UTC)
- All notability decisions lead to strange decisions and controversy. Asteroids are inherently notable; we have articles on all numbered asteroids, I believe. Inherent notability just means that we have that many fewer complex decisions on AfD, and reduces some systemic bias by reducing the quick deletion of inherently notable things that don't have their sources on Google in English.--Prosfilaes (talk) 21:36, 2 June 2008 (UTC)
- You are arguing by assertion. Asteroids aren't inherently notable. One group of editors decided that they were. As for complex decisions ... isn't that why we have editors? Kww (talk) 23:44, 2 June 2008 (UTC)
- I think Ryan Reich is correct on his fundamental assertion that there is no such thing as inherent notability. As he says, notability is confirmed by coverage in multiple verifiable secondary sources. However, I think this is merely a problem of semantics. Certain categories of articles WILL be notable, no matter what. For example, we have an article for every country in the world. Now, are these inherently notable, under Ryan's definition? No, nothing is. They are notable because every country in the world has been written about in multiple verifiable secondary sources. However, because we can reasonably assume that every country in the world has had things written about them, then if we somehow discover a country which doesn't yet have an article, we can go ahead and create the stub, even though we may not have the proper sourcing that a finished article would have. Even in the extreme case that verifiable secondary sources don't even exist yet, I highly doubt that anyone's going to argue that an independent country is not notable enough for the 'pedia. Why? Because we know that for something like an independent country, there will very soon be sources about it. That is what the various notability guidelines are trying to do. They are taking the notability policy and applying it to groups of articles, as opposed to one article at a time. So, while there may be no such thing as inherent notability, there are certainly groups of articles that we can assume are notable right from the get-go. (inserted) The challenge for the community is now to decide at what level we are willing to assume notability.--Aervanath lives in the Orphanage 11:01, 3 June 2008 (UTC)
You can find someone to note most any placename
I'm not too keen on flooding the internet namespace with placename articles, considering how difficult it is already becoming to get past all the content-less directory pages out there. But if we do this mechanically, we will in practice take one of the big (semi-)governmental listings, which is going to lead us into two different kinds of errors that these directories contain.
But let's try a different angle for a second. By using newspapers, I can source the existence of a lot of small places. For instance, in the vicinity of my present location, I can identify each subdivision (or in some cases clusters of subdivisions) as a named community. They are certainly real, and the newspapers refer to them by name. Are they notable places? Well, they don't fit into the traditional town-centered model of place-ness, so the post office for instance doesn't generally assign them post offices with names. The census isn't that fine-grained.
OK, so we go for the government-like sources. Well, using census designated places is a problem, because they do not reflect real borders. I've been working on Columbia, Maryland, whose legal existence, like some other new towns, is peculiar. It would be possible to go through real estate records and determine quite precisely what the boundaries of Columbia are, because it is covenants that determine whether or not any given property is really in Columbia. The CDP map of Columbia is quite "inaccurate" insofar as it shows several areas outside Columbia (e.g. Holiday Hills and Allview) that are geographically distinct; it also assigns part of Clarksville to Columbia, for reasons that aren't at all apparent to me. Meanwhile, the post office keeps trying to reduce the granularity of "place-ness". Their theory of how large Silver Spring, Maryland is, for example, is quite inflated. There are also some blatant mistakes in their assignment of secondary names (e.g. they have Colesville and Cloverly swapped).
My sense is that a mechanical translation of these data into articles is going to generate a great deal of argument. If we go with the Census, we will get data, but we'll also get a lot of "places" whose geography is a debatable artifact of statistical collecting convenience. If we use the list of geographic place names, we will produce a ton of stubs, and there will undoubtedly be argument as to whether many of them actually exist. In the meantime, the process is likely to generate a lot of cluttery misinformation. Mangoe (talk) 19:22, 2 June 2008 (UTC)
- Regarding CDPs, you could argue that at the time of the census the borders of the CDP were whatever the Census Bureau said they were, deed records notwithstanding. If deed records of land parcels 1-100 were part of ABCville, but the census counted all land between The River and The Road, regardless of what parcel it was in, the census-bureau definition would be the one to use, with a notation that it was current as of the last time the census bureau used it. You could also write an article about ABCville but call it "ABCville, a community outside Metropolis, parts of which are part of the CDP by the same name" then use the deed records to define the boundaries. davidwr/(talk)/(contribs)/(e-mail) 19:32, 2 June 2008 (UTC)
- Those boundaries were wherever the census said they were (and indeed, they provide maps), but the problem is not as to the existence of these boundaries, but as to their meaning. Take a look at the article on Columbia. By any standard, it is a notable place, census or no census, as it represents an important social experiment. The problem with identifying it with/as the CDP, though, is that the latter includes large areas which aren't part of the experiment! Or to be more accurate, places that aren't part of the experiment. Allview is "in" Columbia according to the post office and the census, but it isn't part of the planned community; it predates it by some six or seven years.
- To put it in other words: I think it is going to be difficult to find geographical authorities for these places that amount to more than lists of names and some sort of coordinates. The census provides more data, but it is not a geographical authority, and its geographic divisions are founded in statistical efficacy, not whether other people consider such-and-such to be a place. The situation for the post office is worse, since zip code maps are figments of back deduction from delivery routes. They both work OK in places where there is a lot of empty space between "places", but they don't work that well when their granularity starts to fall below normal senses of "place". Even semi-rural density produces problems. How big is Laurel, Maryland, anyway? Well, it is incorporated, at least; but the post office assigns it to four counties, and the census makes similar divisions. The problem is, of the four CDPs outside Laurel, only one of them is a real place (Maryland City). To the degree that North Laurel is a real place, it mostly doesn't have anything to do with the CDP (most of which is either Scaggsville or Hammond Village or any of a bunch of old developments); I've never heard anyone refer to "West Laurel" at all. And in two years the CDPs are going to move; I'm already seeing that the intermediate dataset leaves out a lot of places.
- Yes, one could research all this, which is rather the point. Bot-generated articles are by their nature unresearched, and therefore fairly cry out for thoughtful correction. Or thoughtless correction, for that matter. Mangoe (talk) 21:51, 2 June 2008 (UTC)
- Thanks for the thoughtful posts. Another problem with (for example) the RAMBOT-generated articles on CDPs is that they're hard to correct, as some editors are convinced the CDPs are sacrosanct. See, for example, Talk:Village of Oak Creek, Arizona#Proper name? to see how tedious such a correction (by a knowledgeable local resident, me) can be. Oy. Pete Tillman (talk) 22:23, 2 June 2008 (UTC)
- It's not a correction, though; it's merely one POV.--Prosfilaes (talk) 22:32, 2 June 2008 (UTC)
- Sorry, but the census doesn't agree. Please read these:
- "Census Designated Place (CDP) Program for the 2010 Census--Proposed Criteria". United States Census Bureau.
- "Participant Statistical Areas Program (PSAP): Criteria for the 2010 Census and Beyond". United States Census Bureau.
- They intend to correct their geography to match the places the locals recognize, and they're taking off all the size standards. It is possible that each of the developments I've mentioned will be listed separately, as by my reading of the proposal, each of them is a distinct, countable, named neighborhood. Interestingly, they take one of the cases I've mentioned as a problem example. Mangoe (talk) 11:07, 3 June 2008 (UTC)
Diversity of data types
My reservation, and it is a small one, is that I'd still like to see more data types integrated than just population and co-ordinates. Depending on region and country, there are potentially many data sets that could be integrated (and referenced) to produce a valuable resource. Economic data, political, religious, linguistic, geographical, for a start, plus there may be local websites that have pages for many towns in a region. This could make the difference between an article being a copy-paste from an existing gazetteer and a new and useful synthesis. Pseudomonas(talk) 17:06, 2 June 2008 (UTC)
- Absolutely agree - the more the better. That's why I need editors familiar with the topics to be able to provide the sources, or to find them, or to know where they are. Only when we've exhausted everything would we fuse the data together for checking. Fritzpoll (talk) 17:07, 2 June 2008 (UTC)
- In that case I thoroughly look forward to seeing the results; this could be a great opportunity for integrating data that would otherwise be near impossible to get in one place. Pseudomonas(talk) 17:40, 2 June 2008 (UTC)
Proper sourcing, no spam
The end result should be articles with proper sourcing and no spam. Every claim in the article should exist at a reliable published source whose page is indicated in the References (or Source) section; and no link should be in the article that is not warranted by the information at that link. WAS 4.250 (talk) 17:28, 2 June 2008 (UTC)
Also, are you going to clean up the example articles that you have already created that violated this "Proper sourcing, no spam" standard? Or perhaps someone already did? Could links to those articles be placed here for accountability and verification? WAS 4.250 (talk) 17:28, 2 June 2008 (UTC)
- The first job of the new WikiProject will be to clean up the existing articles, adjust the source to point at a more precise location, and remove the Encarta links that I know you feel strongly about. I am personally a little tied up with this page and the others being spawned, but I will see if Blofeld et al. can fix these now. Alternatively, you can remove the Encarta links yourself! But seriously, this will be done in due course. Under the new proposal, there would be more than one source, so sourcing would be even better. Best wishes Fritzpoll (talk) 11:46, 3 June 2008 (UTC)
Point 8: Clarification?
I'm not sure I understand point 8.
- When they are first determined, the relevant notability policy will state specifically the initial minimum standards for "inherent notability" of villages, including global standards and any national exceptions. The initial specifics may be more narrow (such as minimum three or four independent reliable sources, and minimum 50% of population of capital city as determined by a specified benchmark source); over time the minima can be ratcheted down to broaden them slowly until the community and WikiProjects indicate when to stop. The bot's new articles will always observe the current notability standard strictly.
You're saying we should have a base global standard for "inherent notability" and then fine-tune it to national cases. I don't think anyone would argue against this, but what that base global standard should be is the point of contention. Your suggestions for initial criteria seem problematic. For instance, there is no city in Hungary or Mexico with anywhere near 50% of the population of the capital. Maybe I've misunderstood this point entirely.
More importantly, if it will always be broadened to include more locations, why bother creating a global standard to begin with? TheMightyQuill (talk) 18:56, 2 June 2008 (UTC)
- I've already started what I hope will be a discussion on "inherent notability" above. It would be better if conversation on the subject were centralized there. Ryan Reich (talk) 17:45, 2 June 2008 (UTC)
Auto-draft, human-patrol, auto-move
Some Wikiprojects may want you to do a run with the output in a separate namespace, maybe wikiprojectname/FritzpollBotresults/articlename, patrol the articles, then robo-move articles successfully patrolled. As part of the patrol process, articles can be rejected, moved to a human-assistance-required namespace, or marked "ready to move." Please consider giving this capability to the bot. davidwr/(talk)/(contribs)/(e-mail) 17:19, 2 June 2008 (UTC)
- As I understand it, that's pretty much what's going to happen. The data are going to be placed in a WikiProject's subpage, and human editors are going to check over each article before it gets created in mainspace.--Aervanath lives in the Orphanage 06:55, 3 June 2008 (UTC)
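The auto-draft, human-patrol, auto-move cycle described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how FritzpollBot actually works: the `Status` values, the `Draft` record and the `plan_moves` helper are all hypothetical names, and only the subpage pattern wikiprojectname/FritzpollBotresults/articlename comes from the suggestion itself.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Hypothetical patrol outcomes, taken from the three options in the suggestion."""
    DRAFT = "draft"                       # freshly generated, not yet patrolled
    READY = "ready to move"               # a human has approved the draft
    NEEDS_HELP = "human assistance required"
    REJECTED = "rejected"


@dataclass
class Draft:
    title: str
    status: Status = Status.DRAFT


def draft_page(project: str, title: str) -> str:
    """Build the WikiProject subpage name where the bot would park a draft."""
    return f"{project}/FritzpollBotresults/{title}"


def plan_moves(project: str, drafts: list[Draft]) -> list[tuple[str, str]]:
    """Return (source, destination) pairs for drafts a human marked ready.

    Unpatrolled, rejected and needs-help drafts are left where they are.
    """
    return [
        (draft_page(project, d.title), d.title)
        for d in drafts
        if d.status is Status.READY
    ]


drafts = [
    Draft("Nowhereville", Status.READY),
    Draft("Anywhere Township", Status.NEEDS_HELP),
    Draft("ABCville"),  # still unpatrolled
]
moves = plan_moves("WikiProject Example", drafts)
```

The point of the sketch is that nothing reaches mainspace unless a human has explicitly set the "ready to move" state; the default for a new draft is to stay in the project subpage.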
Why not a new Wikimedia sister project, a la Wikispecies?
If the bulk of these articles is simply going to consist of interpolated database info, then it would seem to me that creating a new whole wiki specially designed for this purpose would be more ideal. Locations with Misplaced Pages articles could be interwiki-linked, and perhaps a bot could create soft interwiki redirects to all locations which exist in that sister project but do not currently have articles here. Nothing would stop human editors with an interest from creating articles here, and this would alleviate any concerns of having tens of thousands, if not millions, of new, unwatched, and likely unmaintained articles here. Girolamo Savonarola (talk) 17:37, 2 June 2008 (UTC)
- Wikimedia projects are not intended to be content forks and are supposed to present a different type of information (i.e. Misplaced Pages includes encyclopedia articles, Wiktionary includes dictionary entries...). Wikispecies was not set up to include encyclopedia articles or anything approaching Misplaced Pages-like content. For what it's worth, Wikispecies was set up with significant community opposition (hence the charter to limit it to taxonomy) and the project has failed. Who would it benefit to create similar projects with overlapping purpose but separate user-bases, policy and governance? Who would it benefit to remove so much content from Misplaced Pages when Misplaced Pages is not limited in size and does not face the same problems as paper encyclopedias? The benefit of Misplaced Pages containing such a broad array of information is that this content is subject to the same policies (it is consistent), there is no requirement to browse out of the site to get to articles about particular things, and articles are exposed to a much larger user-base (that is, articles are seen by more people and problems are more likely to be ironed out). Content forks create redundancy, create inconsistency and divide the user-base so that articles have more limited exposure. --Oldak Quill 19:50, 2 June 2008 (UTC)
- You miss my point - I am suggesting that what will currently be the only content of these bot-articles (ie - statistics) would perhaps be better suited as the main corpus of an atlas-like sister project. Any human editors that would like to initiate Misplaced Pages articles on any of these places will still be able to, as they always have been. Girolamo Savonarola (talk) 20:26, 2 June 2008 (UTC)
- It depends how statistics are presented. If statistics are presented as prose and infoboxes in an encyclopedia article (like many geography and science articles), this content should be in Misplaced Pages. This bot intends to start prose articles, not just tables of statistics. --Oldak Quill 20:38, 2 June 2008 (UTC)
- If all they are is statistics, then you are just putting lipstick on a pig. Misplaced Pages is not a collection of statistics, and if you can't find real prose, not just window dressing, to put in these articles, then this content should not be in Misplaced Pages. Note that although this policy article does advocate using infoboxes, it specifically does so as readability devices only. The main point is that statistics must be presented in context, which these stubs will not do. Ryan Reich (talk) 21:06, 2 June 2008 (UTC)
- (xpost) Prose is not our sole standard for an article, nor should it be. My point is that while these all may be (and ideally should be) encyclopedia articles, the lack of information besides statistics indicates that there may be a more useful, informative, and accessible way to present the data, prose or not. Anyone wishing to actively create articles on these places, where the article would consist of other real-world data from reliable secondary sources, is of course welcome to do so. But creating bot-written encyclopedia articles of prose statistics for the sake of creating them is not a goal worth pursuing independently, in my opinion, no. If there is a concern about preserving the information on-wiki, then surely this information could all be just as easily rendered into tables within list articles concerning the local regions, and with the much greater benefit of having regional comparisons quickly at hand.
- In short, if we want to create a geographical register, why not create a geographical register rather than trying to force-feed one into Misplaced Pages? Girolamo Savonarola (talk) 21:12, 2 June 2008 (UTC)
- (edit conflict) "Lipstick on a pig" certainly describes the Rambot-generated prose text in articles like Storden, Minnesota, mentioned previously. I'd rather see the auto-generated prose kept to a bare minimum and just keep the numbers in the infobox. — Andrwsc (talk · contribs) 21:14, 2 June 2008 (UTC)
Interwikis - another asset, even for stubs
It is possible that at least some of these exist on other language wikis. With interwiki links, even a sub-stub could be very useful if it links to an article in another language with which a reader is somewhat familiar (or which could be machine translated). (A lot of locations in Africa have decent articles in French even where the articles here are atrocious, for instance.) It might be worth doing some sort of query of the wiki of the relevant language for all the titles of the articles to be created, so interwikis could be added automatically where appropriate. Mangostar (talk) 17:53, 2 June 2008 (UTC)
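A minimal sketch of the matching step suggested above, assuming the list of planned titles and the set of titles already existing on each other-language wiki have been obtained separately (how they are fetched is not shown). The function name `interwiki_links` and the exact-title matching rule are assumptions for illustration; real place names often differ between languages, so a human check would still be needed.

```python
def interwiki_links(planned_titles, titles_by_lang):
    """Map each planned title to interwiki link text for wikis that already have it.

    planned_titles: iterable of article titles the bot intends to create.
    titles_by_lang: dict of language code -> set of titles existing on that wiki.
    Matching is by exact title only, which is a simplifying assumption.
    """
    links = {}
    for title in planned_titles:
        links[title] = [
            f"[[{lang}:{title}]]"
            for lang, existing in sorted(titles_by_lang.items())
            if title in existing
        ]
    return links


# Illustrative data only; these sets are made up for the example.
existing = {"fr": {"Kati", "Koulikoro"}, "de": {"Kati"}}
result = interwiki_links(["Kati", "Nowhereville"], existing)
```

Titles with an empty list have no exact match elsewhere and would simply get no interwiki links added.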
Llywrch's must-read, expert assessment of the issues
Llywrch has been deeply involved in this sort of thing, improving our coverage of places in Ethiopia. Even handled at the Wikiproject level, there are many problems as well as payoffs (or as Llywrch put it in his edit summary: "warning: there be monsters -- & also treasures -- here"). Please see his now-archived comments in response to the first proposal:
--A. B. 18:44, 2 June 2008 (UTC)
400,000 larger articles vs. 2 million stubs
We should consider who's going to monitor 2 million more articles for vandalism, errors, etc. Merging stub-type data on all these places into fewer, bigger articles might make for better ongoing monitoring and quality control. For instance, an article on Anywhere Township would have a section for Nowhereville and each of the 4 other little hamlets in the township. The 5 hamlets each have a redirect to the township article. --A. B. 19:15, 2 June 2008 (UTC)
- As a reader, I prefer navigating and using smaller-scope stub articles about particular things than big list articles (containing several stub articles). Articles per item facilitates easier and more dynamic browsing. Stub articles are useful to the reader and present a lot of information in a compact area. Traditional encyclopedia articles and dictionary entries are more similar to stubs (and similarly contain concise and compact information) than to 32kb articles. 32kb articles are preferable, but it is untrue to say that stubs are useless or difficult. Furthermore, I feel more ready to expand a stub article than an item in a list article. --Oldak Quill 19:56, 2 June 2008 (UTC)
- While this is also my biggest concern, it was claimed above that "This proposal will probably drastically reduce the number of articles created <...>. It will be nowhere near the predictions of the first proposal." Colchicum (talk) 20:01, 2 June 2008 (UTC)
Ensuring quality articles before their creation
Perhaps each geographical WikiProject that wants to use this bot should have a data check and implementation plan for review by others before unleashing the Sorcerer's Apprentice-bot to create hundreds of articles. --A. B. 19:18, 2 June 2008 (UTC)
- This is exactly what I have proposed myself. This kind of quality assurance is necessary before undertaking such a large and complicated project. It is all the more necessary because Misplaced Pages policy mandates that articles make notability claims; for a bot to create multitudes of articles without notability claims would constitute a systematic violation of this policy. Ryan Reich (talk) 19:44, 2 June 2008 (UTC)
- Plenty of other categories, such as school-stubs and living-things-stubs, have only boilerplate references or no references at all. davidwr/(talk)/(contribs)/(e-mail) 20:02, 2 June 2008 (UTC)
- Two responses: 1. WP:OTHERSTUFF. 2. Maybe they should be deleted, depending on the notability claims they make. It's one thing to have unsourced claims; it's another to have no claims and no sources. There's nothing to say that a systematic violation of policy hasn't already happened elsewhere; I just don't want it to happen here. Ryan Reich (talk) 20:10, 2 June 2008 (UTC)
Underlying Issues
I Oppose the idea, but for different reasons, which i think need to be separate from the main straw poll, because it is part of what i oppose. I think that Fritzpoll has done the community a service by creating this bot, and bringing the proposal forward, not to mention modifying it dramatically, and taking a lot of flak for making the idea a possibility. So, Thank you, Fritzpoll. Because of the amazing and quick blow-up that happened yesterday, however, i have to oppose: It clearly shows that there are severe underlying issues which need to be resolved:
- Inherent Notability: Ryan Reich has made good points about this, and i agree that it is a dubious concept that cannot be brushed under the rug at this point.
- Scope: It seems that, as Girolamo Savonarola has suggested, the capacity of this bot is better suited to the creation of a new project, holding data about all these places, names, villages, hamlets, whatever.
- Behaviour: I was appalled by some of the comments made in the other discussion. As i pointed out there, i saw hectoring and badgering going on, as the majority of Opposers were argued against. I also saw Opposers talking about "horrible" imaginations, "inherently repulsive" ideas, and something being "awful, awful, AWFUL". Not the good, temperate language we ought to expect (and, to be truthful, usually find) in WP.
- Ownership: It seems to me that the reason for the behaviour was an attitude behind it, one of, "We own this project, and therefore...." Not from one side or the other only, i hasten to add. Now, i may only have made a thousand or so edits, but my understanding is that i am as valued an editor (though maybe not as valuable) as he who has made twenty or thirty thousand; let me tell you, that the idea that the fewer edits we lesser mortals have the less our contribution is valued or we are respected is a sure way to drive us away. And that is not within the goal of this project, is it?
I think that these issues are far more important to resolve now than the question of whether we let a bot create articles. Perhaps i could be accused of being overdramatic, so i won't say that the future of WP depends on it, but i will say that in any organisation the huge blow-ups are almost always not actually about the apparent trigger. FritzpollBot is our trigger only, not the most important thing we must focus on. Cheers, Lindsay 20:04, 2 June 2008 (UTC)
- Clearly, I agree, and thanks to Lindsay for saying what needed to be said. This is an opportunity to settle a long-standing policy debate over "inherent notability", hopefully in the direction that we do not relax our standards just to gain some more page titles. The question of FritzpollBot also needs to be settled, and the precedent set by the consensus (eventually) reached will certainly form an important part of this discussion, but the bot is not the biggest issue here. Ryan Reich (talk) 21:17, 2 June 2008 (UTC)
- What if the bot were created as a tool for regional WikiProjects to do with as they would, with no global concerted effort to actually run the bot worldwide? This seems to be in the revised plan already. The main wikiproject would live on to support the bot as a technical tool, not to decide what articles got created or when. davidwr/(talk)/(contribs)/(e-mail) 22:43, 2 June 2008 (UTC)
- This is the direction I've also been going recently, but we still need to settle the issue of whether the raw stubs the bot creates are suitable articles. Otherwise, the geographic wikiprojects may well just do what was originally proposed, each on its own. Ryan Reich (talk) 23:23, 2 June 2008 (UTC)
- In regards to creating a WikiProject to deal with this upcoming flood, Misplaced Pages:WikiProject_Cities exists, and would likely be a good centralized location for discussion when the bot begins to run. --NickPenguin(contribs) 00:30, 3 June 2008 (UTC)
Is systemic bias bad?
A good deal of this discussion is centred on systemic (or systematic) bias. In general there is an underlying assumption that systemic bias is a BAD THING. The argument is phrased along the lines of "why should a one-horse town in the United States be notable and an equivalent town in Africa not?" It seems at times as though Misplaced Pages:No systemic bias has become the sixth pillar of Misplaced Pages. I believe there is also a (mostly) unconscious but still significant association of systemic bias with racial or ethnic prejudice. But the purpose of an encyclopaedia is to inform, not to document, and in order for an article to inform it must first be read. Now, consider two villages: village X in a remote part of England and village Y in a remote part of Nepal.
- The probability of students in X being asked to write about their home town is close to 1.
- The probability of students in X being asked to write about Y is close to 0.
Of course, the converse is also true. I dearly hope that one day all citizens of Y will have access to broadband internet and that Nepalese Misplaced Pages will grow to the size of English Wiki, but the usefulness of an article on X in Nepalese Wiki will still be close to 0. This is the nature of systemic bias and it is why systemic bias is a GOOD THING.
A paper encyclopaedia is biased towards what the editors want us to know. Misplaced Pages is biased towards what we ourselves want to know. Correction of that bias by creating articles about what we don't want to know serves no educational purpose while creating the potential for harm (accidental or deliberate misinformation etc.). Nobody on English Wiki currently wants to know about village Y. Blofeld of SPECTRE, even if he wants to know about every village in the world, doesn't want to know about village Y because he is unaware of its existence. When the day comes that people do want to know about it, it will be time for a human to create an article. But for 90%+ of village Ys that day will never come.
The current proposals are in conformity with most of the above; nonetheless I feel that these points need to be made. Scolaire (talk) 21:00, 2 June 2008 (UTC)
- I hope you don't mind that I replaced all your "systematic"s with "systemic"s. Systematic bias is quite bad, amounting as it does to a conspiracy to skew the contents of the encyclopedia; systemic bias is "merely" an imbalance in the encyclopedia caused by the system itself. Ryan Reich (talk) 21:13, 2 June 2008 (UTC)
- I don't mind too much. My link was meant to be red, though—Misplaced Pages:No systemic bias is neither a policy nor a guideline. Scolaire (talk) 21:28, 2 June 2008 (UTC)
- Sorry, I missed the irony. Ryan Reich (talk) 22:14, 2 June 2008 (UTC)
- If an article does not exist, readers cannot access it and the usefulness of the encyclopedia on this topic is zero. If a stub exists on a very obscure topic (like a small Nepalese village) then even if only one reader accesses this article per year, the utility of the encyclopedia on this topic will be greater than zero. Consequently, having articles on obscure topics will increase the utility of Misplaced Pages, even if these articles are not frequently accessed and the resulting increase in usefulness is not very large. As the result is a net positive, the small magnitude of the positive effect is not an argument against this course of action. Tim Vickers (talk) 21:38, 2 June 2008 (UTC)
- And if fewer than one reader reads (as opposed to accesses) the article? And on what basis can you assume that a majority or even a substantial minority of bot-created articles will be read? Scolaire (talk) 21:46, 2 June 2008 (UTC)
- Even if only 1/10 of the articles are read, this still only changes the magnitude of the positive effect. What your argument fails to do is show that there is a negative effect on our readers from the creation of these articles. As this makes Misplaced Pages more useful, rather than less useful, then we should do it. Tim Vickers (talk) 21:58, 2 June 2008 (UTC)
- "Misplaced Pages is biased towards what we ourselves want to know. Correction of that bias by creating articles about what we don't want to know serves no educational purpose while creating the potential for harm (accidental or deliberate misinformation etc.)." That's the negative effect. Scolaire (talk) 22:03, 2 June 2008 (UTC)
- So you are arguing that more people will be misinformed than informed by this project? That certainly seems a tenuous argument. Mangostar (talk) 22:16, 2 June 2008 (UTC)
- I would not by any means be the only person to argue that! But in fact my argument is what it says: systemic bias in many cases is natural and right, and anybody undertaking a project of this nature has to take cognizance of that fact. Scolaire (talk) 22:27, 2 June 2008 (UTC)
- As someone mentioned above, there is nothing more infuriating than clicking a supposed blue link to find a bare skeleton of information. Creating articles because we think one person might read them seems a real stretch. And I also don't get this accessibility to editing argument; we all as editors should probably recognise there are mere weeks between an editor who can (meaningfully) improve a stub and one who can create an article. As I said, this project is more appropriately changed to an MOS for article creation, or new article copyediting. MickMacNee (talk) 22:01, 2 June 2008 (UTC)
- I disagree wholeheartedly. As someone who works on a lot of articles about the developing world (lately it's been Cambodia), I am most frustrated to see red links. Even a tiny skeleton of information can provide the basis for further research. For instance, if I am researching a small town of a certain name, I would want to know whether there is one town XYZ in Cambodia, or multiple towns XYZ (which this project could tell me), so that I don't go mangling several towns into one article. I might also want to know the geographic location of a town, which could help me match up different transliterations of one town's name, or distinguish between similarly named towns, as the case might be. This project would also allow me to do this. Even a directory of towns would help identify major population centers in a given area. For the article Ratanakiri, for instance, it took ages to verify which towns outside the provincial capital were big enough to deserve mention in the province article. This project would also assist readers and editors in this respect. Mangostar (talk) 22:13, 2 June 2008 (UTC)
- And as a sidenote, if you don't like being surprised by stubs, you can have their links be a different color by changing your preferences. Then you can just pretend they are redlinks, which is apparently better?... Mangostar (talk) 22:18, 2 June 2008 (UTC)
- The point is that you would be doing that in preparation to create an article with more than that basic information. Having a mass information bank hoping that others will come and edit is unnecessary. The same could be said for companies, people etc. etc.; there are databases of basic information all over about everything, and wikipedia goes further than being a basic store of information. What is wrong with posting the resources and basic article template at a MOS page if the problem is not knowing where to find this information? MickMacNee (talk) 22:21, 2 June 2008 (UTC)
- Since this information will all be taken from a few central, comprehensive sources, you could just go there. Why should Misplaced Pages become what it is not just for convenience? Ryan Reich (talk) 22:20, 2 June 2008 (UTC)
- I might support the addition of other basic bot articles if I thought their contents were notable. (I believe most or all settlements are notable, so this is fine by me.) Similarly, because I believe these are all notable, it is not indiscriminate. And all of wikipedia comes from other resources and is organized for convenience--that's the beauty of it. Your description of "a few central, comprehensive sources" misrepresents the reality of getting data from govt sites for third world countries. To find census data for Cambodian villages took a good deal of browsing through google hits on various queries--and it is in 18 orphaned numbered PDFs that don't appear to be linked from any other places on the Cambodian statistics site. This would be too much effort to go through to create any individual article or handful of articles, but when it is undertaken all at once (converting the PDF to XLS, merging with coords, etc) it scales and is efficient enough to make sense as a project. It is silly to think that humans should do this on a case-by-case basis, rather than getting some super-duper spreadsheets together and running a bot. Mangostar (talk) 22:35, 2 June 2008 (UTC)
- I don't want to take too hard a line on this; the bot is a great idea for helping with the creation of articles. If the bot could do all the looking-up that you had to do, your life would be much easier. But I absolutely object to indiscriminately unleashing the bot to create articles containing just that information, because it is not notable in and of itself. Yes, every place has a strong likelihood of being notable in a thousand different ways, and certainly in one or two, but just like with every Misplaced Pages article, that notability must be established by recourse to a reliable, secondary source: not just a collection of raw facts, but analysis, opinion, or connections. Furthermore, if the bot could provide you with a list, there is no need to create the articles; just generate the list for yourself, or include it as a subpage in WikiProject whatever (in your case, Cambodia). My point is that even if the stub articles would have a potential for expansion, without a notability claim establishing that potential, they are no more than dictionary definitions. Authors need to take responsibility for their articles. Ryan Reich (talk) 23:30, 2 June 2008 (UTC)
- Given the common sense explanation of the origin of bias above, it might be better to organise a concerted translation project for all the place articles that the other language wikis may have that we don't. MickMacNee (talk) 22:01, 2 June 2008 (UTC)
- No, IMHO systemic bias is not bad at all, it is just a matter of supply and demand. But a stub, provided that it is watched and it is not orphaned, is useful regardless of that. Colchicum (talk) 22:26, 2 June 2008 (UTC)
For all those who are concerned about how these stubs will make Misplaced Pages look bad, I would argue to the contrary, and particularly because of the combating systemic bias effect. Systemic bias is bad in part because it makes people take Misplaced Pages less seriously. Many people believe Misplaced Pages is all about collecting geeky trivia that is of interest to American 20-something guys or is all about American and British shopping malls or elementary schools because, frankly, that is at least a good chunk of what goes on here. I'm not a big elitist and don't have a problem with that content existing, but think how people's perceptions of Misplaced Pages might change if it had a reputation for being a premier reference for all places, and the premier reference on places in the developing world instead. Mangostar (talk) 22:41, 2 June 2008 (UTC)
- How would it look if all those places were nearly empty stubs containing no distinguishing facts? We would be just as US-centric as we are now, and everyone would know that no thought or consideration was given to any of those articles. Writing these stubs is no more than an exercise in political correctness; to really counteract systemic bias, we would have to change the process that leads to it. That is the process which connects notable information with articles on Misplaced Pages, and according to my proposal, this bot can and should serve as a facilitator in establishing that connection, but can never make the connection by itself. Ryan Reich (talk) 23:35, 2 June 2008 (UTC)
- A point alluded to above is the question of who will protect all these 2 million stubs from vandalism. Systemic bias is fine as long as that bias is towards what is useful. Useful means useful to the audience of Misplaced Pages, and if that audience is 95% Western, then too bad; the way to fix that is to get more non-Western people on the Internet - something Misplaced Pages cannot solve - not create millions of unwatched stubs. - Merzbow (talk) 00:23, 3 June 2008 (UTC)
- This is a chicken and egg argument. Where is the incentive to use and edit WP if content you are interested in doesn't exist yet? Creating stubs to reduce systemic bias greatly lowers the barrier to entry for editing and using those articles. Suicup (talk) 00:34, 3 June 2008 (UTC)
- Agreed. Misplaced Pages could become the definitive resource for information on these sorts of places, attracting readers (and hence editors). To give an analogous example, I don't use Misplaced Pages for reference on fashion topics (another systemic bias issue, here because of the lack of female contributors), since the coverage is awful. (I'll contribute every now and again though.) If Misplaced Pages were a place to get information on fashion, maybe more fashion-minded people would come to read, and in the process improve the articles. Mangostar (talk) 01:34, 3 June 2008 (UTC)
- Systemic bias is bad: by failing to counter it, we tacitly accept that there is a geographical and cultural limit on the things we should write about on the 'pedia. In fact, Scolaire's example of the villages of England and Nepal is rather ironic, in that in itself it shows systemic bias. Yes, schoolchildren in England may not be asked to write about a village in Nepal. But right next door to Nepal is India, which has 90,000,000 English speakers, which is 50% more than the entire population of the United Kingdom. The percentage of those people who want to know about a village in Nepal is going to be far greater than those in England. Should we not cater to them as well? — Preceding unsigned comment added by Aervanath (talk • contribs) 08:45, 3 June 2008
- Actually, I would be quite happy to substitute India for England in that example. The point is the remoteness of the village, not its proximity to an English-speaking population. If "a far greater percentage" means 0.000001% rather than 0.00000001% then a bot-generated article/stub will still have a usefulness of virtually zero. The systemic bias that I am talking about arises from people's desire to learn, and creating articles not to be read is only a cosmetic exercise. Scolaire (talk) 09:01, 3 June 2008 (UTC)
- I guess we just disagree at a more basic level then. I think you underestimate people's basic curiosity, and I think that these articles WILL be read and improved upon as Misplaced Pages's user base continues to grow and expand. English has the most speakers (native or otherwise) out of any language, and it is the language of communication, business and diplomacy practically everywhere. As the Misplaced Pages that caters to these readers, in what is the world language right now, we have a responsibility to serve all of them equally, not just the native speakers.--Aervanath lives in the Orphanage 09:39, 3 June 2008 (UTC)
Do topics have a "right" to exist as articles?
I think that's what's at the heart of this discussion. I personally don't think that atomic-level topics need to exist on every little thing when the merging, redirection, or collation of such information creates a more useful and complete article. There is nothing inherently great about expanding our article count, but having more encyclopedic information - and more importantly having a higher average quality level - is going to have a large bearing on the public perception and success of this encyclopedia. Exponentially expanding our stubs is not the way to achieve this, nor is it realistic to expect them to be quickly dealt with in any case, given the number of active editors currently on-hand wiki-wide, much less the small percentage of those who would be willing to edit these articles. Adding millions of new stubs into the system without any regard for how the information is organized and broken down into bite-sized articles merely dilutes our overall quality level, and thus creates a much larger system-wide negative factor for all editors, readers, and outside observers (to varying degrees) which cannot be equally mitigated by the small additional utility generated by several million substubs (of which less than 1% are likely to be improved beyond stub-level within any reasonable timeframe).
On the other hand, organizing this information more responsibly at a higher regional level, and in lists and tables, where the statistics can not only be laid bare, but actually used for easier and more useful local comparisons, will be able to maintain a higher standard of article quality, generate a far more manageable set of articles, and have a higher utility than a vast array of stand-alone stubs. Furthermore, settlements which can be expanded further and have a human editor interested in doing so can always still split off into their own articles (while presumably leaving the tabled data intact on the regional page). Everyone benefits.
We don't get points for a raw article-count, so why is this being treated as such? Average article depth and quality is the metric we need to be keeping our eyes on. Girolamo Savonarola (talk) 01:01, 3 June 2008 (UTC)
- Lists are good summary tools, but content doesn't exactly blossom on a list, or at least not the same way it does in an article. If the subject is notable, it deserves an article, and then we allow time for the articles to grow. Besides, this proposal will only allow the articles to be created when the relevant WikiProjects have become involved, so it's not like these articles will just get abandoned. --NickPenguin(contribs) 01:15, 3 June 2008 (UTC)
- We merge things all the time - in this case, the articles are bot-operated prosified statistics. There's no good reason why they couldn't be rendered as tables within lists, and then when someone wants to expand them, they can be split off, with human input and further information. Therefore any topic which an editor is willing to actually substantially improve would become an article, while the vast number which many of us are worried will not undergo any significant edits, will not be left in abandoned substubs. Girolamo Savonarola (talk) 01:17, 3 June 2008 (UTC)
- Perhaps this is where we do not see eye to eye. You envision one or two editors contributing a substantial amount of high quality content, while I envision dozens of anonymous users contributing a handful of sentences and corrections. We all dream of Misplaced Pages being perfect tomorrow, but we forget that it wasn't built yesterday. --NickPenguin(contribs) 01:28, 3 June 2008 (UTC)
- I am intrigued by the proposal and am less than vehement in my opposition to it, but I do think it's not the best way to move forward. For the reasons Girolamo Savonarola laid out above, I would support it if it were a question of bot-generated tables. With such a scheme, many or most—perhaps eventually all—of the entries for which an article would be appropriate could get their own articles in due time. Full articles likewise could be based on stubs, but there is an advantage to that only if one believes that stubs have significantly more intrinsic value than items on a list, and I suspect that wouldn't be the case for most of the two million stubs that are proposed. The majority of them, I believe, would still be stubs a decade later.
- The primary disadvantage I foresee could well be a serious one, and that is vandalism. Currently, vandalism is controlled to a large extent because the editors who create articles or make the major additions which turn stubs into articles monitor them on their watchlists. Who will watch two million new stubs? Bots cannot do it alone, at least not effectively, and there aren't enough human editors to do it.
- I, for one, do not dream of Misplaced Pages being perfect tomorrow or ever, but I do sincerely hope it will continue to improve in the years and decades to come. Perhaps it wasn't built yesterday, but it was built in a remarkably short length of time, and that shows all too often. The way I see it, genuine improvement will not happen rapidly because of better tools, no matter how brilliant their design and clever their implementation; it will happen incrementally because of human editors treating it as a labor of love—taking meticulous care over details, emphasizing quality over quantity and accuracy over speed, and taking the time to get it right. Rivertorch (talk) 05:41, 3 June 2008 (UTC)
(<-)You're right, Rivertorch, better tools will not automatically cause genuine improvement. But they do make the task of genuinely improving the 'pedia far, far easier. The discovery of fire didn't automatically make our ancestors' lives as good as we have it today, but it created the potential where none was there before.--Aervanath lives in the Orphanage 07:22, 3 June 2008 (UTC)
- Also, as for the "right" of topics to exist, see the discussion RyanReich started, above at #Inherent notability.--Aervanath lives in the Orphanage 07:22, 3 June 2008 (UTC)