Misplaced Pages

:Request a query: Difference between revisions - Misplaced Pages

Article snapshot taken from[REDACTED] with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.
Browse history interactively← Previous editContent deleted Content addedVisualWikitext
Revision as of 04:33, 29 December 2024 editBD2412 (talk | contribs)Autopatrolled, IP block exemptions, Administrators2,459,520 edits Longest ref names: or perhaps 75← Previous edit Latest revision as of 20:08, 20 January 2025 edit undoLowercase sigmabot III (talk | contribs)Bots, Template editors2,311,280 editsm Archiving 1 discussion(s) to Misplaced Pages:Request a query/Archive 5) (bot 
(41 intermediate revisions by 7 users not shown)
Line 25: Line 25:
] ]


== Australia Project ==
== List of all file redirects that are in use in mainspace ==


I am interested to know how the Australian project is progressing.
I wrote a query that lists all file redirects, at ]. Can this query be expanded to only list file redirects that are used in mainspace somewhere? –] <small>(])</small> 22:26, 19 December 2024 (UTC)
Number of
:]. —] 22:56, 19 December 2024 (UTC)
* articles created
* articles deleted
* edits by editors with/without Australia user boxes.


] (]) 12:42, 2 January 2025 (UTC)
== Update to NPP reports ==
:Which specific user boxes? What time frame? Articles those users have deleted, or articles those users created that anyone deleted? Counting edits by editors without specific user boxes is Right Out; it's going to be well over a billion, would take days to count if the query didn't die (it would), and would be useless for any purpose. —] 16:46, 2 January 2025 (UTC)
::@], would a list of the top editors of tagged articles be useful, and then you could compare the membership list by hand? See https://quarry.wmcloud.org/query/78918 for WPMED's list. ] (]) 05:53, 5 January 2025 (UTC)


== Number of editors per year ==
Is it possible to add a link to the {{tq|#}} column at ] with an xtools redirs created link. It can target ]


I bring https://quarry.wmcloud.org/query/89411 back to haunt you. I'm ] to assemble a table of registered editors per year. I have figured out how to modify the query to pick a different year. But what I want now is the number of registered editors in each year who made 10+ edits during that year (so, 10 edits in 2024 counts, but 5 edits in 2023 plus another 5 edits in 2024 does not), 100+ edits, and 1,000+ edits.
Similarly for ] targeting ] Thanks! <span style="font-family:monospace;font-weight:bold">]:&lt;]&gt;</span> 15:49, 20 December 2024 (UTC)
:] and ]. —] 18:56, 20 December 2024 (UTC)
::Thanks a lot <span style="font-family:monospace;font-weight:bold">]:&lt;]&gt;</span> 04:06, 21 December 2024 (UTC)


What do you think? ("The query will die" is hopefully not the answer.) ] (]) 05:52, 5 January 2025 (UTC)
:I can't think of a way to do this that doesn't look at every edit in the time range. I think it's likely the query will die. But then, it managed to complete for one month (]), so maybe ] for all of 2024 will eventually too. No counts of currently-deleted edits this time. —] 12:04, 5 January 2025 (UTC)
::] includes deleted edits. —] 23:38, 5 January 2025 (UTC)
:::Maybe this is something that would have to be done by the WMF's Analytics team. I can get "one edit this year" from the original query that you wrote for me, and I'm currently slowly walking it back. It takes ~40 minutes to run, plus several hours for me to remember to check it.
:::@] may be interested in knowing that the peak for number of registered editors who made 1+ edit appears to be 2014–2015, aka when the visual editor became available again. ] (]) 03:34, 6 January 2025 (UTC)
::::The 2024 query has completed. The one including deleted edits took an hour to run.
::::Is the right choice to fork it and run each year separately, or can it be expanded to do all/several years at once? ] (]) 04:57, 6 January 2025 (UTC)
:::::Better to fork it. The time for deleted edits isn't going to change much - it has to do a full table scan, since there isn't an appropriate index - but it's still the live edits that take the bulk of the time, and that ''is'' improved by narrowing the timespan looked at. —] 05:13, 6 January 2025 (UTC)
::::::Okay. I forked it to ], changed the years from <code>2024</code> to <code>2023</code>, and set it to run again. If this works, then I can repeat that step a dozen times.
::::::BTW, the earlier query got a slightly smaller total number of editors for 2024. ] (]) 05:18, 6 January 2025 (UTC)
:::::::That's expected, regardless of which earlier query you mean. Query 89557 will have fewer than 89569 because there's plenty of users who have deleted edits in 2024 but no currently-live ones. One based on 89411 will have very slightly fewer because the views of the revision and archive tables it's looking at are slightly more heavily redacted than the ones 89557/89569 use. —] 05:35, 6 January 2025 (UTC)
::::::::] tells me there were 812,635 editors in 2023.
::::::::] (forked from your new ]) tells me there were 11 fewer editors in 2023.
::::::::But the same scripts for 2024 vary in the opposite direction. The second script finds 29 more editors in 2024. Mostly the second script seems to be finding a small handful more editors (2 to 50) in each year. ] (]) 02:46, 7 January 2025 (UTC)
:::::::::I don't see a run in 89411's history for 2023. ] shows that figure, but it was run almost a year ago. I've just rerun it, and it's giving me 812621, which is both more consistent with the 812624 from the new query and offset in the right direction. Or less wrong direction, anyway.{{pb}}I'm reasonably confident that the reason we're getting fewer numbers from the same queries now compared to a year ago is because there's been more revdeletions and suppressions in the meantime; neither query can see such edits. It doesn't surprise me a bit to find out there's been revdeletions of 2023 edits made after early February 2024. I'll run a comparison of users that would be seen by the two queries so we can have a better idea why they show up in the second and not the first; it's going to take a while though. —] 04:25, 7 January 2025 (UTC)
::::::::::OK, the three users that show up in the second query for 2023 but not the earlier one are {{user5|Hhhj24}}, {{user5|IFAG dreifive}}, and {{user5|Christian Granbacher}}. The first two have one live edit each in 2023; the third has two deleted edits in 2023. All three have user_editcount = 0 despite those edits, as can be seen in their contributions links. The way the earlier query works is it first fetches all users with user_editcount at least 1, then checks each of those users to see whether they have any live or deleted edits in the requested timeframe. So it doesn't ever check for edits by those three users because of the bad data in user_editcount. —] 05:22, 7 January 2025 (UTC)
:::::::::::Thank you. I'm going to file this under "deletions happen" and not worry about it. ] (]) 03:25, 8 January 2025 (UTC)


== Talk pages ==
== Measuring the number of source links to each domain for a given article/set of articles ==
==== Command denied====


Hey, I'm looking for help with creating two SQL queries
I keep getting the error, "execute command denied to user 's52788'@'%' for routine 'enwiki_p.count'". I was using the page database, but even after I modified my query to only use the externallinks database (meaning I need to input a numerical page ID instead of using the title), I'm still getting the denial. What am I doing wrong here? Am I just not allowed to aggregate? Here's my query, simplified as much as possible and still not working:
# Find all talk pages (excluding archive talk pages) that do not use {{tl|WikiProject banner shell}}.
# Find all articles with no talk page (red link).
] (]) 13:11, 15 January 2025 (UTC)
:There are at least hundreds of thousands, likely millions, of each. ] and ] have the first 10k. —] 14:18, 15 January 2025 (UTC)
::Thanks! The first query is not producing correct results. The second result on the list is ] which does use the template. ] (]) 16:50, 15 January 2025 (UTC)
:::No, the second result is {{!r|Talk:!!Destroy-Oh-Boy!!}}. The page it redirects to transcludes the template. —] 17:40, 15 January 2025 (UTC)
::::(Which, of course, isn't that helpful an answer, so I've updated the query in-place to exclude talk pages that are redirects. —] 17:43, 15 January 2025 (UTC))


== Serial commas in page titles ==
SELECT count (el_to_domain_index)
FROM externallinks
WHERE el_from = 37198628
GROUP BY el_to_domain_index;


I posted the following request at ] and was advised to come here.
] (]) 23:14, 21 December 2024 (UTC)
:Remove the space between <code>count</code> and the open paren. —] 23:21, 21 December 2024 (UTC)
::(Or <code>set sql_mode = 'IGNORE_SPACE';</code> first. —] 23:24, 21 December 2024 (UTC))
::Wow. Thank you. ] (]) 23:29, 21 December 2024 (UTC)


{{collapse top}}
==== Lag, no results returned ====
Hello, I'm not sure that this request can be completed automatically; please accept my apology if it can't. I just want some lists, without edits to anything except the page where you put the lists, so it's not a CONTEXTBOT issue: just a "good use of time" issue. Could you compile some lists of pages in which serial commas are present or are omitted? I just discovered ] and created ] as a redirect to support serial commas. Ideally, whenever a page could have a serial comma in the title, we'd have a redirect for the form not used by the current title, but I assume this isn't always the case.


First off, I'd like a list of all mainspace pages (whether articles, lists, disambiguation pages, anything else except redirects) that use a serial comma. I think the criteria might be:
<s>Now I'm trying to get counts for all the external links from all the pages in a category. I want to do this for each separate page, and get lists of all the actual URLs, but y'know, baby steps. I used this query: https://quarry.wmcloud.org/query/89031
*
*comma
*
*comma
*
*
I'm unsure whether they're rigid enough, or whether they might return a lot of false positives.


Secondly, I'd like a list of all pages whose titles are identical to the first list, except lacking a serial comma. Redirects would be acceptable here, since if I'm creating serial-comma redirects, it helps to know if it already exists.
USE enwiki_p;
SELECT el_to_domain_index,
count(el_to_domain_index)
FROM externallinks
JOIN categorylinks ON cl_from = el_from
WHERE cl_to = 11696843
GROUP BY el_to_domain_index
ORDER BY count(el_to_domain_index) DESC;


Thirdly, I'd like a list of all mainspace pages (whether articles, lists, disambiguation pages, anything else except redirects) that could use a serial comma but don't. I think the criteria would be:
I'm not getting any results and it takes ages to not get them. What am I doing wrong now? Also, how do I include pages in any subcategories, or does this include them automatically? ] (]) 00:57, 22 December 2024 (UTC)
*
</s>
*
*comma
*one or more words]
*
*
Once the list is complete, the bot checks each page with the following process: "if I inserted a comma immediately before 'and' or 'or', would it appear on the first list?" If the answer is "no", the bot removes it from the list.


Fourthly, I'd like a list of all pages whose titles are identical to the third list, except they have a serial comma. Again, redirects are acceptable.
I figured out that I need to use page despite the slowness it'll cause, because cl_to uses a name instead of an ID. So here is my new query, now also running on simplewiki for easier testing. https://quarry.wmcloud.org/query/89032


USE simplewiki_p
SELECT page_title,
el_to_domain_index,
count(el_to_domain_index)
FROM externallinks
JOIN categorylinks ON cl_from = el_from
JOIN page on cl_from = page_id
WHERE cl_to = Canada
GROUP BY page_title, el_to_domain_index;


Is this a reasonable request? Please let me know if it's not, so I don't waste your time.
This query though has a syntax error on line 2.
{{collapse bottom}}


After my request, ] suggested that I come here and offered some comments: '' would work for the first request and would work for the second.<br/>The latter two lists are trickier.'' To this, I replied ''Is there a way to download a list of results from a particular search? As far as I know, the only way to get a list of results is to copy/paste the whole thing somewhere and delete everything that's not a page title. (With 11,544 results for the first search, this isn't something I want to do manually.) Also, the first search includes redirects, e.g. ] is result #1.''
I also think I might be in the wrong place to ask for step-by-step help like this. If there's a better place for me to go, I'd appreciate the direction. ] (]) 02:18, 22 December 2024 (UTC)
:You don't need the USE statement on Quarry since you have to select a database there separately (since most are on different servers now); but if you keep it, you need to terminate it with a semicolon.{{pb}}Next error you'd get is that you need to quote 'Canada'. At least that one has a useful error message ("Unknown column 'Canada' in 'where clause'").{{pb}}The reason your first query took forever is because <syntaxhighlight lang='sql' inline>SELECT * FROM categorylinks WHERE cl_to = 11696843;</syntaxhighlight> does a full table scan - it tries to coerce each row's cl_to (a string value) into a number, and then does a numeric comparison. There's no correct way to use the index on cl_to since many different strings compare equal to that number, in particular ones starting with whitespace. <syntaxhighlight lang='sql' inline>SELECT * FROM categorylinks WHERE cl_to = '11696843';</syntaxhighlight> on the other hand finishes instantly with no results (since ] has no members). Categories are only loosely tied to the page at their title anyway.{{pb}}You won't get members of subcategories like that - you have to go far out of your way to do so, similar to ]. You ''would'' get the direct subcategories like ] themselves, if any happened to have any external links. Distinguish them by selecting page_namespace too, if you're not already filtering by it. —] 02:56, 22 December 2024 (UTC)
::It sounds like I'm better off doing a multipart kludge- getting all the relevant page titles with Massviews or Petscan, running a single query to turn them into IDs, then using those IDs as el_froms so I only need the externallinks database. Thank you for your help! ] (]) 05:59, 22 December 2024 (UTC)


Please ping me if you reply; I've never heard of this page before, and I might not remember to come back if you don't ping me. Thanks for your help. ] (]) 03:22, 20 January 2025 (UTC)
== Orphaned editnotices ==
:{{ping|Nyttend}} Doing the whole thing would be very awkward; we don't have access to create temporary tables on the replicas, so it would have to all be in one query, and I haven't been able to cajole the query pessimizer into doing it efficiently. What I ''can'' do quite easily is give you lists of all mainspace titles that match each of those regexes, along with whether each is a redirect, and what the titles would be if the comma is added/removed (as appropriate). ] and ]. Those should be sufficient to construct your four lists, though perhaps not trivially. I can do that too if you're unable, though not tonight and maybe not tomorrow either; let me know. —] 04:32, 20 January 2025 (UTC)

::Could you give me two tables with four columns of data? First column is the link, second column is existing status (red, blue redirect, blue non-redirect), third column is alternate-comma variant (with-comma for the without-comma table; without-comma for the with-comma table), and fourth column is status of variant. If you do this in a simple table, I should be able to copy/paste it into Excel, sort so I can easily remove everything I don't want, and move the remaining content back to MediaWiki. ] (]) 04:44, 20 January 2025 (UTC)
When a page is moved, its ] is not moved with it. There is a post-move warning for it, but users would need to move it separately. That too can only be done by template editors, page movers and admins. I believe that there are plenty of editnotices that have become orphaned from their target page. I need a query to list such pages. If there is already a regularly updated database, that will work too. Thanks! <span class="nowrap">&#8212;''']'''</span> <sup class="nowrap">(] • {]•]})</sup> 07:53, 25 December 2024 (UTC)
::Ah, never mind, I've explored the links and just realised that there's a way to download a CSV with almost all this information. This should suffice. Thanks! ] (]) 04:47, 20 January 2025 (UTC)

:::{{ping|Nyttend}} I've updated both queries to include the status of the variant titles. —] 05:00, 20 January 2025 (UTC)
:Here's mainspace only to get you started: ]. You or someone else can fork and improve this if you need additional namespaces. Making this a database report somewhere using {{t|Database report}} might be a good idea. Hope this helps. –] <small>(])</small> 08:42, 25 December 2024 (UTC)
::I suspect it's much worse than that. It's certainly more complex.{{pb}}There's plenty of mainspace titles with colons in them, and it's conceivable that some of those have orphaned editnotices; there's really no way around parsing for the namespace name, and that's going to be ugly and complex, and I haven't tried it yet. (It being Christmas morning and all. Maybe tomorrow.) But I wouldn't estimate that to result in more than a handful of other hits.{{pb}}Much more likely is the case that CX Zoom mentions directly: a page is moved but the editnotice isn't, leaving it attached to the remnant redirect. There's going to be false positives looking for those whether we do it the "correct" way and look in the move log (since there might be an editnotice intentionally attached to a page that had another page moved from it in the past), or whether we include editnotices attached to pages that are currently redirects. The latter's easier, and especially easier to combine with the query looking for pages that don't exist; I've done it at ]. That'll also miss editnotices that unintentionally weren't moved with their page, where the resulting redirect was turned into a ''different'' page, though. —] 15:30, 25 December 2024 (UTC)
:::Thank you very much, both of you... <span class="nowrap">&#8212;''']'''</span> <sup class="nowrap">(] • {]•]})</sup> 16:45, 25 December 2024 (UTC)
::I've updated ] in-place with a version that catches mainspace pages with colons. ] is the only new hit. —] 17:00, 27 December 2024 (UTC)
:::Thanks! <span class="nowrap">&#8212;''']'''</span> <sup class="nowrap">(] • {]•]})</sup> 17:39, 27 December 2024 (UTC)
::] has a version for all namespaces. And oh wow is it ever fugly. —] 17:44, 27 December 2024 (UTC)

== Longest ref names ==

In my travels I have come across some very long ref names in ref tags, sometimes automatically generated by incorporating the title of the work that is being referenced. Occasionally I will shorten excessively long ref names just to improve readability of the wikitext in that section. This has me curious as to whether it is possible to generate a list of the longest ref names being used in articles, as there are probably some likely targets for this kind of cleanup. Is it possible to either generate a list of the longest ref names in order of length, or barring that, a list of ref names that are more than, say, 50, or perhaps 75, characters? ] ] 02:49, 29 December 2024 (UTC)

Latest revision as of 20:08, 20 January 2025

Page for requesting database queries

Archiving icon
Archives
Archive 1Archive 2Archive 3
Archive 4Archive 5


This page has archives. Sections older than 14 days may be automatically archived by Lowercase sigmabot III when more than 4 sections are present.
Shortcuts

This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.

You may also be interested in the following:

  • If you are interested in writing SQL queries or helping out here, visit our tips page.
  • If you need to obtain a list of article titles that meet certain criteria, consider using PetScan (user manual) or the default search. Petscan can generate list of articles in subcategories, articles which transclude some template, etc.
  • If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
  • For long-term review and checking, database reports are available.

Quarry does not have access to page content, so queries which require checking wikitext cannot be answered with Quarry. However, someone may be able to assist by using Quarry in another way (e.g. checking the table of category links rather than the "Category:" text) or suggest an alternative tool.

Australia Project

I am interested to know how the Australian project is progressing. Number of

  • articles created
  • articles deleted
  • edits by editors with/without Australia user boxes.

Wakelamp db (talk) 12:42, 2 January 2025 (UTC)

Which specific user boxes? What time frame? Articles those users have deleted, or articles those users created that anyone deleted? Counting edits by editors without specific user boxes is Right Out; it's going to be well over a billion, would take days to count if the query didn't die (it would), and would be useless for any purpose. —Cryptic 16:46, 2 January 2025 (UTC)
@Wakelamp, would a list of the top editors of tagged articles be useful, and then you could compare the membership list by hand? See https://quarry.wmcloud.org/query/78918 for WPMED's list. WhatamIdoing (talk) 05:53, 5 January 2025 (UTC)

Number of editors per year

I bring https://quarry.wmcloud.org/query/89411 back to haunt you. I'm trying to assemble a table of registered editors per year. I have figured out how to modify the query to pick a different year. But what I want now is the number of registered editors in each year who made 10+ edits during that year (so, 10 edits in 2024 counts, but 5 edits in 2023 plus another 5 edits in 2024 does not), 100+ edits, and 1,000+ edits.

What do you think? ("The query will die" is hopefully not the answer.) WhatamIdoing (talk) 05:52, 5 January 2025 (UTC)

I can't think of a way to do this that doesn't look at every edit in the time range. I think it's likely the query will die. But then, it managed to complete for one month (January 2024), so maybe quarry:query/89557 for all of 2024 will eventually too. No counts of currently-deleted edits this time. —Cryptic 12:04, 5 January 2025 (UTC)
quarry:query/89569 includes deleted edits. —Cryptic 23:38, 5 January 2025 (UTC)
Maybe this is something that would have to be done by the WMF's Analytics team. I can get "one edit this year" from the original query that you wrote for me, and I'm currently slowly walking it back. It takes ~40 minutes to run, plus several hours for me to remember to check it.
@Jdforrester (WMF) may be interested in knowing that the peak for number of registered editors who made 1+ edit appears to be 2014–2015, aka when the visual editor became available again. WhatamIdoing (talk) 03:34, 6 January 2025 (UTC)
The 2024 query has completed. The one including deleted edits took an hour to run.
Is the right choice to fork it and run each year separately, or can it be expanded to do all/several years at once? WhatamIdoing (talk) 04:57, 6 January 2025 (UTC)
Better to fork it. The time for deleted edits isn't going to change much - it has to do a full table scan, since there isn't an appropriate index - but it's still the live edits that take the bulk of the time, and that is improved by narrowing the timespan looked at. —Cryptic 05:13, 6 January 2025 (UTC)
Okay. I forked it to quarry:query/89581, changed the years from 2024 to 2023, and set it to run again. If this works, then I can repeat that step a dozen times.
BTW, the earlier query got a slightly smaller total number of editors for 2024. WhatamIdoing (talk) 05:18, 6 January 2025 (UTC)
That's expected, regardless of which earlier query you mean. Query 89557 will have fewer than 89569 because there's plenty of users who have deleted edits in 2024 but no currently-live ones. One based on 89411 will have very slightly fewer because the views of the revision and archive tables it's looking at are slightly more heavily redacted than the ones 89557/89569 use. —Cryptic 05:35, 6 January 2025 (UTC)
89411 tells me there were 812,635 editors in 2023.
89581 (forked from your new 89569) tells me there were 11 fewer editors in 2023.
But the same scripts for 2024 vary in the opposite direction. The second script finds 29 more editors in 2024. Mostly the second script seems to be finding a small handful more editors (2 to 50) in each year. WhatamIdoing (talk) 02:46, 7 January 2025 (UTC)
I don't see a run in 89411's history for 2023. quarry:query/80211 shows that figure, but it was run almost a year ago. I've just rerun it, and it's giving me 812621, which is both more consistent with the 812624 from the new query and offset in the right direction. Or less wrong direction, anyway.I'm reasonably confident that the reason we're getting fewer numbers from the same queries now compared to a year ago is because there's been more revdeletions and suppressions in the meantime; neither query can see such edits. It doesn't surprise me a bit to find out there's been revdeletions of 2023 edits made after early February 2024. I'll run a comparison of users that would be seen by the two queries so we can have a better idea why they show up in the second and not the first; it's going to take a while though. —Cryptic 04:25, 7 January 2025 (UTC)
OK, the three users that show up in the second query for 2023 but not the earlier one are Hhhj24 (talk · contribs · deleted contribs · page moves · block user · block log), IFAG dreifive (talk · contribs · deleted contribs · page moves · block user · block log), and Christian Granbacher (talk · contribs · deleted contribs · page moves · block user · block log). The first two have one live edit each in 2023; the third has two deleted edits in 2023. All three have user_editcount = 0 despite those edits, as can be seen in their contributions links. The way the earlier query works is it first fetches all users with user_editcount at least 1, then checks each of those users to see whether they have any live or deleted edits in the requested timeframe. So it doesn't ever check for edits by those three users because of the bad data in user_editcount. —Cryptic 05:22, 7 January 2025 (UTC)
Thank you. I'm going to file this under "deletions happen" and not worry about it. WhatamIdoing (talk) 03:25, 8 January 2025 (UTC)

Talk pages

Hey, I'm looking for help with creating two SQL queries

  1. Find all talk pages (excluding archive talk pages) that do not use {{WikiProject banner shell}}.
  2. Find all articles with no talk page (red link).

Gonnym (talk) 13:11, 15 January 2025 (UTC)

There are at least hundreds of thousands, likely millions, of each. quarry:query/89907 and 89908 have the first 10k. —Cryptic 14:18, 15 January 2025 (UTC)
Thanks! The first query is not producing correct results. The second result on the list is Talk:Destroy-Oh-Boy! which does use the template. Gonnym (talk) 16:50, 15 January 2025 (UTC)
No, the second result is Talk:!!Destroy-Oh-Boy!!. The page it redirects to transcludes the template. —Cryptic 17:40, 15 January 2025 (UTC)
(Which, of course, isn't that helpful an answer, so I've updated the query in-place to exclude talk pages that are redirects. —Cryptic 17:43, 15 January 2025 (UTC))

Serial commas in page titles

I posted the following request at WP:BOTR and was advised to come here.

Extended content

Hello, I'm not sure that this request can be completed automatically; please accept my apology if it can't. I just want some lists, without edits to anything except the page where you put the lists, so it's not a CONTEXTBOT issue: just a "good use of time" issue. Could you compile some lists of pages in which serial commas are present or are omitted? I just discovered List of cities, towns and villages in Cyprus and created List of cities, towns, and villages in Cyprus as a redirect to support serial commas. Ideally, whenever a page could have a serial comma in the title, we'd have a redirect for the form not used by the current title, but I assume this isn't always the case.

First off, I'd like a list of all mainspace pages (whether articles, lists, disambiguation pages, anything else except redirects) that use a serial comma. I think the criteria might be:

  • comma
  • comma

I'm unsure whether they're rigid enough, or whether they might return a lot of false positives.

Secondly, I'd like a list of all pages whose titles are identical to the first list, except lacking a serial comma. Redirects would be acceptable here, since if I'm creating serial-comma redirects, it helps to know if it already exists.

Thirdly, I'd like a list of all mainspace pages (whether articles, lists, disambiguation pages, anything else except redirects) that could use a serial comma but don't. I think the criteria would be:

  • comma
  • one or more words]

Once the list is complete, the bot checks each page with the following process: "if I inserted a comma immediately before 'and' or 'or', would it appear on the first list?" If the answer is "no", the bot removes it from the list.

Fourthly, I'd like a list of all pages whose titles are identical to the third list, except they have a serial comma. Again, redirects are acceptable.


Is this a reasonable request? Please let me know if it's not, so I don't waste your time.

After my request, User:Qwerfjkl suggested that I come here and offered some comments: intitle:/+, +, (and|or) +/ would work for the first request and intitle:/+, + (and|or) +/ would work for the second.
The latter two lists are trickier.
To this, I replied Is there a way to download a list of results from a particular search? As far as I know, the only way to get a list of results is to copy/paste the whole thing somewhere and delete everything that's not a page title. (With 11,544 results for the first search, this isn't something I want to do manually.) Also, the first search includes redirects, e.g. Orders, decorations, and medals of the United Nations is result #1.

Please ping me if you reply; I've never heard of this page before, and I might not remember to come back if you don't ping me. Thanks for your help. Nyttend (talk) 03:22, 20 January 2025 (UTC)

@Nyttend: Doing the whole thing would be very awkward; we don't have access to create temporary tables on the replicas, so it would have to all be in one query, and I haven't been able to cajole the query pessimizer into doing it efficiently. What I can do quite easily is give you lists of all mainspace titles that match each of those regexes, along with whether each is a redirect, and what the titles would be if the comma is added/removed (as appropriate). quarry:query/90019 and 90020. Those should be sufficient to construct your four lists, though perhaps not trivially. I can do that too if you're unable, though not tonight and maybe not tomorrow either; let me know. —Cryptic 04:32, 20 January 2025 (UTC)
Could you give me two tables with four columns of data? First column is the link, second column is existing status (red, blue redirect, blue non-redirect), third column is alternate-comma variant (with-comma for the without-comma table; without-comma for the with-comma table), and fourth column is status of variant. If you do this in a simple table, I should be able to copy/paste it into Excel, sort so I can easily remove everything I don't want, and move the remaining content back to MediaWiki. Nyttend (talk) 04:44, 20 January 2025 (UTC)
Ah, never mind, I've explored the links and just realised that there's a way to download a CSV with almost all this information. This should suffice. Thanks! Nyttend (talk) 04:47, 20 January 2025 (UTC)
@Nyttend: I've updated both queries to include the status of the variant titles. —Cryptic 05:00, 20 January 2025 (UTC)
Category:
Misplaced Pages:Request a query: Difference between revisions Add topic