Revision as of 14:26, 10 December 2024 editCX Zoom (talk | contribs)Edit filter helpers, Extended confirmed users, Page movers, Pending changes reviewers19,421 editsm →List of Revision IDs by edit summaries: add to request← Previous edit | Latest revision as of 03:25, 8 January 2025 edit undoWhatamIdoing (talk | contribs)Autopatrolled, Extended confirmed users, Pending changes reviewers122,231 edits →Number of editors per year: ReplyTag: Reply | ||
(68 intermediate revisions by 12 users not shown) | |||
Line 25: | Line 25: | ||
] | ] | ||
== Orphaned editnotices == | |||
== Syntax error due to using a reserved word as a table or column name in MySQL == | |||
When a page is moved, its ] is not moved with it. There is a post-move warning for it, but users would need to move it separately. That too can only be done by template editors, page movers and admins. I believe that there are plenty of editnotices that have become orphaned from their target page. I need a query to list such pages. If there is already a regularly updated database, that will work too. Thanks! <span class="nowrap">—''']'''</span> <sup class="nowrap">(] • {]•]})</sup> 07:53, 25 December 2024 (UTC) | |||
https://quarry.wmcloud.org/query/87911 | |||
:Here's mainspace only to get you started: ]. You or someone else can fork and improve this if you need additional namespaces. Making this a database report somewhere using {{t|Database report}} might be a good idea. Hope this helps. –] <small>(])</small> 08:42, 25 December 2024 (UTC) | |||
https://stackoverflow.com/questions/23446377/syntax-error-due-to-using-a-reserved-word-as-a-table-or-column-name-in-mysql | |||
::I suspect it's much worse than that. It's certainly more complex.{{pb}}There's plenty of mainspace titles with colons in them, and it's conceivable that some of those have orphaned editnotices; there's really no way around parsing for the namespace name, and that's going to be ugly and complex, and I haven't tried it yet. (It being Christmas morning and all. Maybe tomorrow.) But I wouldn't estimate that to result in more than a handful of other hits.{{pb}}Much more likely is the case that CX Zoom mentions directly: a page is moved but the editnotice isn't, leaving it attached to the remnant redirect. There's going to be false positives looking for those whether we do it the "correct" way and look in the move log (since there might be an editnotice intentionally attached to a page that had another page moved from it in the past), or whether we include editnotices attached to pages that are currently redirects. The latter's easier, and especially easier to combine with the query looking for pages that don't exist; I've done it at ]. That'll also miss editnotices that unintentionally weren't moved with their page, where the resulting redirect was turned into a ''different'' page, though. —] 15:30, 25 December 2024 (UTC) | |||
:::Thank you very much, both of you... <span class="nowrap">—''']'''</span> <sup class="nowrap">(] • {]•]})</sup> 16:45, 25 December 2024 (UTC) | |||
::I've updated ] in-place with a version that catches mainspace pages with colons. ] is the only new hit. —] 17:00, 27 December 2024 (UTC) | |||
:::Thanks! <span class="nowrap">—''']'''</span> <sup class="nowrap">(] • {]•]})</sup> 17:39, 27 December 2024 (UTC) | |||
::] has a version for all namespaces. And oh wow is it ever fugly. —] 17:44, 27 December 2024 (UTC) | |||
== Longest ref names == | |||
It isn't handling the `user` table right as "user" is an SQL reserved word, I think. | |||
In my travels I have come across some very long ref names in ref tags, sometimes automatically generated by incorporating the title of the work that is being referenced. Occasionally I will shorten excessively long ref names just to improve readability of the wikitext in that section. This has me curious as to whether it is possible to generate a list of the longest ref names being used in articles, as there are probably some likely targets for this kind of cleanup. Is it possible to either generate a list of the longest ref names in order of length, or barring that, a list of ref names that are more than, say, 50, or perhaps 75, characters? ] ] 02:49, 29 December 2024 (UTC) | |||
The syntax highlighter was showing "user" in red, so I surrounded it with backticks `user`, then it was showing in light blue. | |||
:References aren't in the database except in page source, which isn't available in the replicas. You can find some with search, like {{search link|1=insource:ref insource:/\< *ref *name *= *{76}/}}. That's going to miss a bunch of them, most obviously any refs crazy enough to include angle brackets in their names ({{search link|1=insource:ref insource:/\< *ref *name *= *\"*/|2=though those mostly seem to be errors anyway}}) or ref syntax using non-standard spaces, but ElasticSearch's gratuitously awful regex engine can't do a whole lot better. Also won't find ref names populated through templates - I understand some infoboxes do this. —] 05:26, 29 December 2024 (UTC) | |||
:: This is definitely plenty to start with. It is crazy that there are so many lengthy ref tags. The Misplaced Pages article was the most references has under 1000 of them, and if every ref name was made of an arbitrary combination of letters and numbers, they could all be handled with two character ref names. ] ] 18:44, 29 December 2024 (UTC) | |||
== Australia Project == | |||
I think it needs to be highlighted in white to work correctly. But how? ] (]) 18:47, 14 November 2024 (UTC) | |||
: Unrelated to the reserved word. `WHERE IS NULL(u.user_name)` should be `WHERE u.user_name IS NULL`. But see prior noise at ] if you want to continue this. ] ] 20:12, 14 November 2024 (UTC) | |||
::https://www.w3schools.com/sql/sql_isnull.asp indicates that my syntax should be valid. Two alternative ways to do the same thing? Regarding the "prior noise", I'm a more competent administrator who's checking page histories, and leaving redirects within user space alone. My current focus is on cross-namespace redirects from user pages of nonexistent users to outside of userspace. My recent deletion log will give you an idea; I'm trying to make a more specific query to reduce the noise level in the query results I've been working from. – ] (]) 20:53, 14 November 2024 (UTC) | |||
::: Wikimedia uses MySQL (actually ] which uses MYSQL-ish syntax), not SQL server where your link says <code>ISNULL</code> (not <code>IS NULL</code> which the query uses) is valid. ] ] 21:06, 14 November 2024 (UTC) | |||
::::MariaDB , and it works the way Wbm1058 was trying to use it (modulo the misplaced space). SQL Server's ISNULL() is a synonym of COALESCE() instead. x IS NULL is generally safer precisely because of that incompatibility. —] 21:29, 14 November 2024 (UTC) | |||
:::::I tried just changing the syntax of the "IS NULL" statement as suggested. It was cooking on that for a while, and then: | |||
:::::: "'''Error''' | |||
::::::This web service cannot be reached. Please contact a maintainer of this project. | |||
:::::: | |||
::::::Maintainers can find troubleshooting instructions from our documentation on Wikitech." | |||
::::: Hopefully my query didn't just crash the server. – ] (]) 21:55, 14 November 2024 (UTC) | |||
::::::It just ran to completion, so simply changing the "IS NULL" statement fixed the syntax error. Now on to figure out the results, and tweak the query to do what I really want it to do. Thanks for your help. ] (]) 22:09, 14 November 2024 (UTC) | |||
FYI, I'm now feeling the joy. ] is my report of 400 pages which I think may all be safely speedy-deleted under ''U2: Userpage or subpage of a nonexistent user''. This report was culled from a ], by INTERSECT with the user table SELECT. This is indicative of the poor page-move interface design, which leads editors who think they're publishing user drafts to keep pages in userspace when they really wanted to move to mainspace, because they neglected the namespace dropdown in the move-page user interface. – ] (]) 14:11, 15 November 2024 (UTC) | |||
I am interested to know how the Australian project is progressing. | |||
== Dusty articles within all subcategories of a parent category == | |||
Number of | |||
* articles created | |||
* articles deleted | |||
* edits by editors with/without Australia user boxes. | |||
] (]) 12:42, 2 January 2025 (UTC) | |||
Is this possible? I'd like to get a list like ] but for anything within any subcategory of ]. It would make quite a nice little To Do list for times I feel like doing some research and writing but don't have a particular bee in my bonnet that very minute. Thanks for any help! ] (]) 15:16, 17 November 2024 (UTC) | |||
:Which specific user boxes? What time frame? Articles those users have deleted, or articles those users created that anyone deleted? Counting edits by editors without specific user boxes is Right Out; it's going to be well over a billion, would take days to count if the query didn't die (it would), and would be useless for any purpose. —] 16:46, 2 January 2025 (UTC) | |||
:What is "dusty"? Neither ] nor ] say what it does. Is it a sort by timestamp of last edit?{{pb}}In direct subcategories only, include the handful of pages directly in the category, or the whole tree? If the last, to what depth? Examples: ]→]→]→] is depth 2, and ]→]→]→]→] is depth 3; neither the page itself nor the root category count. —] 16:51, 17 November 2024 (UTC) | |||
::@], would a list of the top editors of tagged articles be useful, and then you could compare the membership list by hand? See https://quarry.wmcloud.org/query/78918 for WPMED's list. ] (]) 05:53, 5 January 2025 (UTC) | |||
::Yes, it's a list of articles by date of most recent edit. | |||
::Hm, on the second question. Ideally I'd end up with is a list of, say, food items that hadn't been edited in ten years. Or chefs, or restaurants, or food animals or whatever. Maybe I need to choose a more specific subcategory? ] (]) 17:24, 17 November 2024 (UTC) | |||
:::Well, ok, | |||
:::*] is in ], but, strictly speaking, isn't in "any subcategory of ]". Should it be included in the list? | |||
:::*] is in ], which is a subcategory of ], so it should be. (It's also directly in ], but never mind that.) | |||
:::*] isn't in ] or any of its immediate subcategories, but it ''is'' in ], a subcategory of ]; so the article's in a sub-subcategory of ]. Include or not? | |||
:::*] isn't in Food or drink, its immediate subcategories, or (I think) any of ''their'' subcategories, but it's in ], a subcat of ], so it's at least in a sub-sub-subcategory of ]. Same question. | |||
:::The reason I need a maximum depth is because - like almost all reasonably broad categories - ] eventually includes a significant portion of ''all'' categories. Depth 10, for example, has 122639 different categories in the tree, out of 2.4 million total categories (including maintenance categories, category redirects, and so on), and you really quickly start getting unrelated pages like ] → ] → ] → ] → ] → ] → ] → ].{{pb}}Or, if you like, you can give me a list of categories to pull from. Even if it's a large list, or something like "Anything in any direct subcategory of ], ], ], ], ". —] 18:33, 17 November 2024 (UTC) | |||
::::Oh, and do you want non-mainspace pages in the list or not? What about redirects? —] 18:38, 17 November 2024 (UTC) | |||
:::::lol...clearly in over my head here. :D Thank you for your patience. | |||
:::::So, no to feed a cold, starve a fever. Yes to recipe, dulce de leche and ice milk. | |||
:::::I think maybe start with something that's likely to contain fewer extraneous things. ] in a way that will allow me to see, for instance, the articles that are in ] > ] > ] > ] that haven't been edited in the last ten years. ] (]) 18:50, 17 November 2024 (UTC) | |||
::::::Oh, no non-mainspace pages, no redirects. ] (]) 19:13, 17 November 2024 (UTC) | |||
:::::::None quite that old in either tree. ] for ] depth 3 (oldest is ], 2015-11-16 18:36:35 - see what I meant about unrelated pages?), ] for ] depth 4 (oldest is ], 2019-12-16T04:47:03). —] 19:19, 17 November 2024 (UTC) | |||
::::::::Well, thank you for your work, and sorry to waste your time! ] (]) 19:35, 17 November 2024 (UTC) | |||
:::::::::Not wasted at all. Not your fault the category system is terrible for datamining.{{pb}}There might be some value in finding the latest revision that wasn't marked minor, and maybe excluding ones made by bots too, but that's going to be harder and a lot slower. Would definitely need to cut the set of articles to look at to something on the order of a couple thousand before looking at the edits, rather than the tens of thousands in that first tree. —] 20:14, 17 November 2024 (UTC) | |||
::::::::::Thanks. And I've actually already found an article that needs attention from your 87976 query, so win! | |||
::::::::::The point for me here is looking for categories that have many articles that haven't been updated since before sourcing started modernizing. It's a bit tricky because the articles that were created first -- probably in any category -- are also likely the articles that get update often, have multiple watchers, etc. So it's possible there just aren't huge numbers of food articles that need this kind of attention. ] (]) 21:18, 17 November 2024 (UTC) | |||
== Number of |
== Number of editors per year == | ||
I bring https://quarry.wmcloud.org/query/89411 back to haunt you. I'm ] to assemble a table of registered editors per year. I have figured out how to modify the query to pick a different year. But what I want now is the number of registered editors in each year who made 10+ edits during that year (so, 10 edits in 2024 counts, but 5 edits in 2023 plus another 5 edits in 2024 does not), 100+ edits, and 1,000+ edits. | |||
There are {{NUMBEROFARTICLES}}, but AIUI that includes disambig pages, stand-alone lists, and outlines, and maybe even portals (i.e., all content namespaces, not just the mainspace) but excludes redirects. Is there a way to get a count of the number of just plain old ordinary articles, excluding the other types? (A percentage from a sample set is good enough; I'd like to be able to write a sentence like "Of the 6.9 million articles, 6.2 million are regular articles, 0.45 million are lists, and 0.2 million are disambig pages.") ] (]) 22:46, 17 November 2024 (UTC) | |||
: {{re|WhatamIdoing}} according to ], there are 362,957 of those. ] ] 23:29, 17 November 2024 (UTC) | |||
::] suggests that there are about a thousand of those, which will not have a material effect on the numbers. | |||
::] says they've tagged 131K pages. There are about 123,700 pages with "List of" or "Lists of" at the start of the title. ] (]) 00:37, 18 November 2024 (UTC) | |||
:There is no clear definition of what a "regular article" is. Also many pages are not correctly marked and categorized. Don't for ] which look like ordinary articles, or might be, depending. -- ]] 00:43, 18 November 2024 (UTC) | |||
:<nowiki>{{NUMBEROFARTICLES}}</nowiki> seems to be mainspace non-redirect pages. I'd thought it used other heuristics, too; I remember needing at least one link, and less confidently requiring a period? But plainly doesn't anymore; I'm getting 6912240 for ns0 !page_is_redirect on the replicas now.{{pb}}There's only 362201 non-redirects in ] and mainspace. Most of the difference are in other namespaces, probably legitimately, though I'm surprised to see 208 in user:, 44 total in various talk namespaces, 9 mainspace redirects, and a single redirect in draftspace.{{pb}}114253 mainspace non-redirects in ], though 64 of those are in the disambig cat as well.{{pb}}Lists are less certain; there's no ]. I could try to count pages that are in any category starting with "Lists " or ending with " lists", but - not having done that before - don't have any idea how many it would miss, and how many it would miscount. Ditto with pages starting with "List of " or "Lists of " (which is easy - 120653, not counting any redirs or pages in the dabs or set index cats). —] 01:00, 18 November 2024 (UTC) | |||
::Oh, and 11193167 redirects (so 18105407 total mainspace pages), if you care. —] 01:03, 18 November 2024 (UTC) | |||
::So 6,912,230 non-redirect pages, of which 362,201 are dab pages and 120,653 are Lists (per title), and the rest (e.g., Outlines, Index pages) is immaterial. A good SIA looks like an article and an underdeveloped one looks like a dab page, which takes us back to GreenC's point about it not being a clear-cut question. | |||
::All of this suggests that if you count SIAs as 'articles', then there are 6.429 million articles (93%) and if you count them as lists/dabs, then there are 6.315 million articles (91%). | |||
::Thanks, all. ] (]) 01:15, 18 November 2024 (UTC) | |||
What do you think? ("The query will die" is hopefully not the answer.) ] (]) 05:52, 5 January 2025 (UTC) | |||
== Median account age for EXTCON == | |||
:I can't think of a way to do this that doesn't look at every edit in the time range. I think it's likely the query will die. But then, it managed to complete for one month (]), so maybe ] for all of 2024 will eventually too. No counts of currently-deleted edits this time. —] 12:04, 5 January 2025 (UTC) | |||
::] includes deleted edits. —] 23:38, 5 January 2025 (UTC) | |||
Hello again, generous satisfiers of curiosity: | |||
:::Maybe this is something that would have to be done by the WMF's Analytics team. I can get "one edit this year" from the original query that you wrote for me, and I'm currently slowly walking it back. It takes ~40 minutes to run, plus several hours for me to remember to check it. | |||
:::@] may be interested in knowing that the peak for number of registered editors who made 1+ edit appears to be 2014–2015, aka when the visual editor became available again. ] (]) 03:34, 6 January 2025 (UTC) | |||
Today's question is how old the typical currently active ] account is. The requirements are: | |||
::::The 2024 query has completed. The one including deleted edits took an hour to run. | |||
::::Is the right choice to fork it and run each year separately, or can it be expanded to do all/several years at once? ] (]) 04:57, 6 January 2025 (UTC) | |||
* is currently active (perhaps made at least one edit during the last 30 days? Any plausible definition of active works for me, so pick whatever is easiest and cheapest to run) | |||
:::::Better to fork it. The time for deleted edits isn't going to change much - it has to do a full table scan, since there isn't an appropriate index - but it's still the live edits that take the bulk of the time, and that ''is'' improved by narrowing the timespan looked at. —] 05:13, 6 January 2025 (UTC) | |||
* meets EXTCON (all of which will have the EXTCON permission) | |||
::::::Okay. I forked it to ], changed the years from <code>2024</code> to <code>2023</code>, and set it to run again. If this works, then I can repeat that step a dozen times. | |||
::::::BTW, the earlier query got a slightly smaller total number of editors for 2024. ] (]) 05:18, 6 January 2025 (UTC) | |||
I don't care whether it's date of first edit vs registration date. I also don't care whether it's all ~73K of them or if it's a sample of 5K–10K. I am looking for an answer to the nearest year ("Most active EXTCON editors started editing before 2014" or "If you see someone editing an article under EXTCON, they probably started editing more than 10 years ago"). | |||
:::::::That's expected, regardless of which earlier query you mean. Query 89557 will have fewer than 89569 because there's plenty of users who have deleted edits in 2024 but no currently-live ones. One based on 89411 will have very slightly fewer because the views of the revision and archive tables it's looking at are slightly more heavily redacted than the ones 89557/89569 use. —] 05:35, 6 January 2025 (UTC) | |||
::::::::] tells me there were 812,635 editors in 2023. | |||
Thank you, ] (]) 17:14, 19 November 2024 (UTC) | |||
::::::::] (forked from your new ]) tells me there were 11 fewer editors in 2023. | |||
::::::::But the same scripts for 2024 vary in the opposite direction. The second script finds 29 more editors in 2024. Mostly the second script seems to be finding a small handful more editors (2 to 50) in each year. ] (]) 02:46, 7 January 2025 (UTC) | |||
:Hmm. user_touched has been scrubbed because it is private data. So I guess LEFT JOIN recentchanges to see who is active? This should only get us users who have made an edit in the last 30 days. Then run MEDIAN() on it. Let's see if ] finishes. If the count is 72,000ish, then I also need to add a WHERE to filter out the editors who aren't in recentchanges. –] <small>(])</small> 18:33, 19 November 2024 (UTC) | |||
:::::::::I don't see a run in 89411's history for 2023. ] shows that figure, but it was run almost a year ago. I've just rerun it, and it's giving me 812621, which is both more consistent with the 812624 from the new query and offset in the right direction. Or less wrong direction, anyway.{{pb}}I'm reasonably confident that the reason we're getting fewer numbers from the same queries now compared to a year ago is because there's been more revdeletions and suppressions in the meantime; neither query can see such edits. It doesn't surprise me a bit to find out there's been revdeletions of 2023 edits made after early February 2024. I'll run a comparison of users that would be seen by the two queries so we can have a better idea why they show up in the second and not the first; it's going to take a while though. —] 04:25, 7 January 2025 (UTC) | |||
::That's going to get you not just every user with the user right - the whole point of a left join is that you get a result whether there's a row in the joined table or not - but a row in your resultset for ''every'' row they have in recentchanges. And you're leaving out admins, who have extended confirmed implicitly. Plus, even if it worked, it would be a dump of ~25k values.{{pb}}Mariadb has , but I can't get it to work on user_registration no matter how I try to preprocess it first - it gives me "Numeric datatype is required for percentile_cont function" when I call it directly on the column, which is reasonable, but always 100000000 if I cast it to any kind of numeric value, which isn't. (Anyone know what I'm doing wrong? ]. And I've never really grokked window funcs or how to get them to behave like normal, sane, grouped-by aggregate funcs anyway.) But doing it longhand works just fine. ]: 28 May 2013. —] 19:37, 19 November 2024 (UTC) | |||
::::::::::OK, the three users that show up in the second query for 2023 but not the earlier one are {{user5|Hhhj24}}, {{user5|IFAG dreifive}}, and {{user5|Christian Granbacher}}. The first two have one live edit each in 2023; the third has two deleted edits in 2023. All three have user_editcount = 0 despite those edits, as can be seen in their contributions links. The way the earlier query works is it first fetches all users with user_editcount at least 1, then checks each of those users to see whether they have any live or deleted edits in the requested timeframe. So it doesn't ever check for edits by those three users because of the bad data in user_editcount. —] 05:22, 7 January 2025 (UTC) | |||
:::user_registration is ASCII-encoded binary rather than a binary number which is why you're getting nonsense when casting it and trying to do operations on it. ] (]) 21:28, 19 November 2024 (UTC) | |||
:::::::::::Thank you. I'm going to file this under "deletions happen" and not worry about it. ] (]) 03:25, 8 January 2025 (UTC) | |||
::::Casting it seems to get me a numeric, and doing normal arithmetic on it (user_registration + 1, division, and so on) coerces it to a numeric; it doesn't get me nonsense until I try to pass it through MEDIAN(). And UNIX_TIMESTAMP() in particular is documented to return an unsigned int. <s>Current theory is that MEDIAN() can't deal with large numbers (]; dividing by numbers smaller than ten gets me 100 million again), which is boggling.</s> No, a cast or operation on the result of ''MEDIAN()'' is what fixes it. Still boggling. ]. Thanks for the prod. —] 21:55, 19 November 2024 (UTC) | |||
:::::Also, comparing the results reminded me that user_registration is NULL for some users who registered before mid-2005ish, which I hadn't corrected for. 2013-06-15 19:42:14, though I doubt the two weeks' inaccuracy is going to matter much to WAID. —] 22:30, 19 November 2024 (UTC) | |||
:::::TIL CAST seems to convert to the proper numeric representation if the binary string contains only numeric ASCII characters. Glad you were able to get it working though. ] (]) 22:34, 19 November 2024 (UTC) | |||
== List of Revision IDs by edit summaries == | |||
Can someone write a SQL query that isolates all edits made with the edit summary <code><nowiki>Disambiguating links to ] (intentional link to DAB) using ].</nowiki></code> where XYZ is any combination of letters, numbers or symbols. There is a bug in the script that causes edits with this summary to target to a wrong link, see ]. Thanks! <span class="nowrap">—''']'''</span> <sup class="nowrap">(] • {]•]})</sup> 14:24, 10 December 2024 (UTC) |
Latest revision as of 03:25, 8 January 2025
Page for requesting database queries
Archives | |||||
|
|||||
This page has archives. Sections older than 14 days may be automatically archived by Lowercase sigmabot III when more than 4 sections are present. |
This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.
You may also be interested in the following:
- If you are interested in writing SQL queries or helping out here, visit our tips page.
- If you need to obtain a list of article titles that meet certain criteria, consider using PetScan (user manual) or the default search. Petscan can generate list of articles in subcategories, articles which transclude some template, etc.
- If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
- For long-term review and checking, database reports are available.
Quarry does not have access to page content, so queries which require checking wikitext cannot be answered with Quarry. However, someone may be able to assist by using Quarry in another way (e.g. checking the table of category links rather than the "Category:" text) or suggest an alternative tool.
Orphaned editnotices
When a page is moved, its editnotice is not moved with it. There is a post-move warning for it, but users would need to move it separately. That too can only be done by template editors, page movers and admins. I believe that there are plenty of editnotices that have become orphaned from their target page. I need a query to list such pages. If there is already a regularly updated database, that will work too. Thanks! —CX Zoom 07:53, 25 December 2024 (UTC)
- Here's mainspace only to get you started: quarry:query/89138. You or someone else can fork and improve this if you need additional namespaces. Making this a database report somewhere using {{Database report}} might be a good idea. Hope this helps. –Novem Linguae (talk) 08:42, 25 December 2024 (UTC)
- I suspect it's much worse than that. It's certainly more complex.There's plenty of mainspace titles with colons in them, and it's conceivable that some of those have orphaned editnotices; there's really no way around parsing for the namespace name, and that's going to be ugly and complex, and I haven't tried it yet. (It being Christmas morning and all. Maybe tomorrow.) But I wouldn't estimate that to result in more than a handful of other hits.Much more likely is the case that CX Zoom mentions directly: a page is moved but the editnotice isn't, leaving it attached to the remnant redirect. There's going to be false positives looking for those whether we do it the "correct" way and look in the move log (since there might be an editnotice intentionally attached to a page that had another page moved from it in the past), or whether we include editnotices attached to pages that are currently redirects. The latter's easier, and especially easier to combine with the query looking for pages that don't exist; I've done it at quarry:query/89148. That'll also miss editnotices that unintentionally weren't moved with their page, where the resulting redirect was turned into a different page, though. —Cryptic 15:30, 25 December 2024 (UTC)
- Thank you very much, both of you... —CX Zoom 16:45, 25 December 2024 (UTC)
- I've updated quarry:query/89148 in-place with a version that catches mainspace pages with colons. Template:Editnotices/Page/Index of underwater diving: N–Z is the only new hit. —Cryptic 17:00, 27 December 2024 (UTC)
- Thanks! —CX Zoom 17:39, 27 December 2024 (UTC)
- quarry:query/89198 has a version for all namespaces. And oh wow is it ever fugly. —Cryptic 17:44, 27 December 2024 (UTC)
- I suspect it's much worse than that. It's certainly more complex.There's plenty of mainspace titles with colons in them, and it's conceivable that some of those have orphaned editnotices; there's really no way around parsing for the namespace name, and that's going to be ugly and complex, and I haven't tried it yet. (It being Christmas morning and all. Maybe tomorrow.) But I wouldn't estimate that to result in more than a handful of other hits.Much more likely is the case that CX Zoom mentions directly: a page is moved but the editnotice isn't, leaving it attached to the remnant redirect. There's going to be false positives looking for those whether we do it the "correct" way and look in the move log (since there might be an editnotice intentionally attached to a page that had another page moved from it in the past), or whether we include editnotices attached to pages that are currently redirects. The latter's easier, and especially easier to combine with the query looking for pages that don't exist; I've done it at quarry:query/89148. That'll also miss editnotices that unintentionally weren't moved with their page, where the resulting redirect was turned into a different page, though. —Cryptic 15:30, 25 December 2024 (UTC)
Longest ref names
In my travels I have come across some very long ref names in ref tags, sometimes automatically generated by incorporating the title of the work that is being referenced. Occasionally I will shorten excessively long ref names just to improve readability of the wikitext in that section. This has me curious as to whether it is possible to generate a list of the longest ref names being used in articles, as there are probably some likely targets for this kind of cleanup. Is it possible to either generate a list of the longest ref names in order of length, or barring that, a list of ref names that are more than, say, 50, or perhaps 75, characters? BD2412 T 02:49, 29 December 2024 (UTC)
- References aren't in the database except in page source, which isn't available in the replicas. You can find some with search, like insource:ref insource:/\< *ref *name *= *{76}/. That's going to miss a bunch of them, most obviously any refs crazy enough to include angle brackets in their names (though those mostly seem to be errors anyway) or ref syntax using non-standard spaces, but ElasticSearch's gratuitously awful regex engine can't do a whole lot better. Also won't find ref names populated through templates - I understand some infoboxes do this. —Cryptic 05:26, 29 December 2024 (UTC)
- This is definitely plenty to start with. It is crazy that there are so many lengthy ref tags. The Misplaced Pages article was the most references has under 1000 of them, and if every ref name was made of an arbitrary combination of letters and numbers, they could all be handled with two character ref names. BD2412 T 18:44, 29 December 2024 (UTC)
Australia Project
I am interested to know how the Australian project is progressing. Number of
- articles created
- articles deleted
- edits by editors with/without Australia user boxes.
Wakelamp db (talk) 12:42, 2 January 2025 (UTC)
- Which specific user boxes? What time frame? Articles those users have deleted, or articles those users created that anyone deleted? Counting edits by editors without specific user boxes is Right Out; it's going to be well over a billion, would take days to count if the query didn't die (it would), and would be useless for any purpose. —Cryptic 16:46, 2 January 2025 (UTC)
- @Wakelamp, would a list of the top editors of tagged articles be useful, and then you could compare the membership list by hand? See https://quarry.wmcloud.org/query/78918 for WPMED's list. WhatamIdoing (talk) 05:53, 5 January 2025 (UTC)
Number of editors per year
I bring https://quarry.wmcloud.org/query/89411 back to haunt you. I'm trying to assemble a table of registered editors per year. I have figured out how to modify the query to pick a different year. But what I want now is the number of registered editors in each year who made 10+ edits during that year (so, 10 edits in 2024 counts, but 5 edits in 2023 plus another 5 edits in 2024 does not), 100+ edits, and 1,000+ edits.
What do you think? ("The query will die" is hopefully not the answer.) WhatamIdoing (talk) 05:52, 5 January 2025 (UTC)
- I can't think of a way to do this that doesn't look at every edit in the time range. I think it's likely the query will die. But then, it managed to complete for one month (January 2024), so maybe quarry:query/89557 for all of 2024 will eventually too. No counts of currently-deleted edits this time. —Cryptic 12:04, 5 January 2025 (UTC)
- quarry:query/89569 includes deleted edits. —Cryptic 23:38, 5 January 2025 (UTC)
- Maybe this is something that would have to be done by the WMF's Analytics team. I can get "one edit this year" from the original query that you wrote for me, and I'm currently slowly walking it back. It takes ~40 minutes to run, plus several hours for me to remember to check it.
- @Jdforrester (WMF) may be interested in knowing that the peak for number of registered editors who made 1+ edit appears to be 2014–2015, aka when the visual editor became available again. WhatamIdoing (talk) 03:34, 6 January 2025 (UTC)
- The 2024 query has completed. The one including deleted edits took an hour to run.
- Is the right choice to fork it and run each year separately, or can it be expanded to do all/several years at once? WhatamIdoing (talk) 04:57, 6 January 2025 (UTC)
- Better to fork it. The time for deleted edits isn't going to change much - it has to do a full table scan, since there isn't an appropriate index - but it's still the live edits that take the bulk of the time, and that is improved by narrowing the timespan looked at. —Cryptic 05:13, 6 January 2025 (UTC)
- Okay. I forked it to quarry:query/89581, changed the years from
2024
to2023
, and set it to run again. If this works, then I can repeat that step a dozen times. - BTW, the earlier query got a slightly smaller total number of editors for 2024. WhatamIdoing (talk) 05:18, 6 January 2025 (UTC)
- That's expected, regardless of which earlier query you mean. Query 89557 will have fewer than 89569 because there's plenty of users who have deleted edits in 2024 but no currently-live ones. One based on 89411 will have very slightly fewer because the views of the revision and archive tables it's looking at are slightly more heavily redacted than the ones 89557/89569 use. —Cryptic 05:35, 6 January 2025 (UTC)
- 89411 tells me there were 812,635 editors in 2023.
- 89581 (forked from your new 89569) tells me there were 11 fewer editors in 2023.
- But the same scripts for 2024 vary in the opposite direction. The second script finds 29 more editors in 2024. Mostly the second script seems to be finding a small handful more editors (2 to 50) in each year. WhatamIdoing (talk) 02:46, 7 January 2025 (UTC)
- I don't see a run in 89411's history for 2023. quarry:query/80211 shows that figure, but it was run almost a year ago. I've just rerun it, and it's giving me 812621, which is both more consistent with the 812624 from the new query and offset in the right direction. Or less wrong direction, anyway.I'm reasonably confident that the reason we're getting fewer numbers from the same queries now compared to a year ago is because there's been more revdeletions and suppressions in the meantime; neither query can see such edits. It doesn't surprise me a bit to find out there's been revdeletions of 2023 edits made after early February 2024. I'll run a comparison of users that would be seen by the two queries so we can have a better idea why they show up in the second and not the first; it's going to take a while though. —Cryptic 04:25, 7 January 2025 (UTC)
- OK, the three users that show up in the second query for 2023 but not the earlier one are Hhhj24 (talk · contribs · deleted contribs · page moves · block user · block log), IFAG dreifive (talk · contribs · deleted contribs · page moves · block user · block log), and Christian Granbacher (talk · contribs · deleted contribs · page moves · block user · block log). The first two have one live edit each in 2023; the third has two deleted edits in 2023. All three have user_editcount = 0 despite those edits, as can be seen in their contributions links. The way the earlier query works is it first fetches all users with user_editcount at least 1, then checks each of those users to see whether they have any live or deleted edits in the requested timeframe. So it doesn't ever check for edits by those three users because of the bad data in user_editcount. —Cryptic 05:22, 7 January 2025 (UTC)
- Thank you. I'm going to file this under "deletions happen" and not worry about it. WhatamIdoing (talk) 03:25, 8 January 2025 (UTC)
- OK, the three users that show up in the second query for 2023 but not the earlier one are Hhhj24 (talk · contribs · deleted contribs · page moves · block user · block log), IFAG dreifive (talk · contribs · deleted contribs · page moves · block user · block log), and Christian Granbacher (talk · contribs · deleted contribs · page moves · block user · block log). The first two have one live edit each in 2023; the third has two deleted edits in 2023. All three have user_editcount = 0 despite those edits, as can be seen in their contributions links. The way the earlier query works is it first fetches all users with user_editcount at least 1, then checks each of those users to see whether they have any live or deleted edits in the requested timeframe. So it doesn't ever check for edits by those three users because of the bad data in user_editcount. —Cryptic 05:22, 7 January 2025 (UTC)
- I don't see a run in 89411's history for 2023. quarry:query/80211 shows that figure, but it was run almost a year ago. I've just rerun it, and it's giving me 812621, which is both more consistent with the 812624 from the new query and offset in the right direction. Or less wrong direction, anyway.I'm reasonably confident that the reason we're getting fewer numbers from the same queries now compared to a year ago is because there's been more revdeletions and suppressions in the meantime; neither query can see such edits. It doesn't surprise me a bit to find out there's been revdeletions of 2023 edits made after early February 2024. I'll run a comparison of users that would be seen by the two queries so we can have a better idea why they show up in the second and not the first; it's going to take a while though. —Cryptic 04:25, 7 January 2025 (UTC)
- That's expected, regardless of which earlier query you mean. Query 89557 will have fewer than 89569 because there's plenty of users who have deleted edits in 2024 but no currently-live ones. One based on 89411 will have very slightly fewer because the views of the revision and archive tables it's looking at are slightly more heavily redacted than the ones 89557/89569 use. —Cryptic 05:35, 6 January 2025 (UTC)
- Okay. I forked it to quarry:query/89581, changed the years from
- Better to fork it. The time for deleted edits isn't going to change much - it has to do a full table scan, since there isn't an appropriate index - but it's still the live edits that take the bulk of the time, and that is improved by narrowing the timespan looked at. —Cryptic 05:13, 6 January 2025 (UTC)
- quarry:query/89569 includes deleted edits. —Cryptic 23:38, 5 January 2025 (UTC)