This is an old revision of this page, as edited by Safrolic (talk | contribs) at 23:14, 21 December 2024 (→Trying to measure the number of source links to each domain for a given article: new section). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
Revision as of 23:14, 21 December 2024 by Safrolic (talk | contribs) (→Trying to measure the number of source links to each domain for a given article: new section)(diff) ← Previous revision | Latest revision (diff) | Newer revision → (diff) Page for requesting database queries
Archives | |||||
|
|||||
This page has archives. Sections older than 14 days may be automatically archived by Lowercase sigmabot III when more than 4 sections are present. |
This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.
You may also be interested in the following:
- If you are interested in writing SQL queries or helping out here, visit our tips page.
- If you need to obtain a list of article titles that meet certain criteria, consider using PetScan (user manual) or the default search. Petscan can generate list of articles in subcategories, articles which transclude some template, etc.
- If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
- For long-term review and checking, database reports are available.
Quarry does not have access to page content, so queries which require checking wikitext cannot be answered with Quarry. However, someone may be able to assist by using Quarry in another way (e.g. checking the table of category links rather than the "Category:" text) or suggest an alternative tool.
Number of articles that are actually articles
There are 6,940,131, but AIUI that includes disambig pages, stand-alone lists, and outlines, and maybe even portals (i.e., all content namespaces, not just the mainspace) but excludes redirects. Is there a way to get a count of the number of just plain old ordinary articles, excluding the other types? (A percentage from a sample set is good enough; I'd like to be able to write a sentence like "Of the 6.9 million articles, 6.2 million are regular articles, 0.45 million are lists, and 0.2 million are disambig pages.") WhatamIdoing (talk) 22:46, 17 November 2024 (UTC)
- @WhatamIdoing: according to Category:All disambiguation pages, there are 362,957 of those. BD2412 T 23:29, 17 November 2024 (UTC)
- Category:WikiProject Outlines articles suggests that there are about a thousand of those, which will not have a material effect on the numbers.
- Misplaced Pages:WikiProject Lists/Assessment#Statistics says they've tagged 131K pages. There are about 123,700 pages with "List of" or "Lists of" at the start of the title. WhatamIdoing (talk) 00:37, 18 November 2024 (UTC)
- There is no clear definition of what a "regular article" is. Also many pages are not correctly marked and categorized. Don't for WP:INDEXES which look like ordinary articles, or might be, depending. -- GreenC 00:43, 18 November 2024 (UTC)
- {{NUMBEROFARTICLES}} seems to be mainspace non-redirect pages. I'd thought it used other heuristics, too; I remember needing at least one link, and less confidently requiring a period? But plainly doesn't anymore; I'm getting 6912240 for ns0 !page_is_redirect on the replicas now.There's only 362201 non-redirects in Category:All disambiguation pages and mainspace. Most of the difference are in other namespaces, probably legitimately, though I'm surprised to see 208 in user:, 44 total in various talk namespaces, 9 mainspace redirects, and a single redirect in draftspace.114253 mainspace non-redirects in Category:All set index articles, though 64 of those are in the disambig cat as well.Lists are less certain; there's no Category:All list pages. I could try to count pages that are in any category starting with "Lists " or ending with " lists", but - not having done that before - don't have any idea how many it would miss, and how many it would miscount. Ditto with pages starting with "List of " or "Lists of " (which is easy - 120653, not counting any redirs or pages in the dabs or set index cats). —Cryptic 01:00, 18 November 2024 (UTC)
- Oh, and 11193167 redirects (so 18105407 total mainspace pages), if you care. —Cryptic 01:03, 18 November 2024 (UTC)
- So 6,912,230 non-redirect pages, of which 362,201 are dab pages and 120,653 are Lists (per title), and the rest (e.g., Outlines, Index pages) is immaterial. A good SIA looks like an article and an underdeveloped one looks like a dab page, which takes us back to GreenC's point about it not being a clear-cut question.
- All of this suggests that if you count SIAs as 'articles', then there are 6.429 million articles (93%) and if you count them as lists/dabs, then there are 6.315 million articles (91%).
- Thanks, all. WhatamIdoing (talk) 01:15, 18 November 2024 (UTC)
Median account age for EXTCON
Hello again, generous satisfiers of curiosity:
Today's question is how old the typical currently active WP:EXTCONFIRMED account is. The requirements are:
- is currently active (perhaps made at least one edit during the last 30 days? Any plausible definition of active works for me, so pick whatever is easiest and cheapest to run)
- meets EXTCON (all of which will have the EXTCON permission)
I don't care whether it's date of first edit vs registration date. I also don't care whether it's all ~73K of them or if it's a sample of 5K–10K. I am looking for an answer to the nearest year ("Most active EXTCON editors started editing before 2014" or "If you see someone editing an article under EXTCON, they probably started editing more than 10 years ago").
Thank you, WhatamIdoing (talk) 17:14, 19 November 2024 (UTC)
- Hmm. user_touched has been scrubbed because it is private data. So I guess LEFT JOIN recentchanges to see who is active? This should only get us users who have made an edit in the last 30 days. Then run MEDIAN() on it. Let's see if quarry:query/88037 finishes. If the count is 72,000ish, then I also need to add a WHERE to filter out the editors who aren't in recentchanges. –Novem Linguae (talk) 18:33, 19 November 2024 (UTC)
- That's going to get you not just every user with the user right - the whole point of a left join is that you get a result whether there's a row in the joined table or not - but a row in your resultset for every row they have in recentchanges. And you're leaving out admins, who have extended confirmed implicitly. Plus, even if it worked, it would be a dump of ~25k values.Mariadb has a MEDIAN() window function, but I can't get it to work on user_registration no matter how I try to preprocess it first - it gives me "Numeric datatype is required for percentile_cont function" when I call it directly on the column, which is reasonable, but always 100000000 if I cast it to any kind of numeric value, which isn't. (Anyone know what I'm doing wrong? Sample query. And I've never really grokked window funcs or how to get them to behave like normal, sane, grouped-by aggregate funcs anyway.) But doing it longhand works just fine. quarry:query/88039: 28 May 2013. —Cryptic 19:37, 19 November 2024 (UTC)
- user_registration is ASCII-encoded binary rather than a binary number which is why you're getting nonsense when casting it and trying to do operations on it. Uhai (talk) 21:28, 19 November 2024 (UTC)
- Casting it seems to get me a numeric, and doing normal arithmetic on it (user_registration + 1, division, and so on) coerces it to a numeric; it doesn't get me nonsense until I try to pass it through MEDIAN(). And UNIX_TIMESTAMP() in particular is documented to return an unsigned int.
Current theory is that MEDIAN() can't deal with large numbers (see resultset 4; dividing by numbers smaller than ten gets me 100 million again), which is boggling.No, a cast or operation on the result of MEDIAN() is what fixes it. Still boggling. Cleaner query. Thanks for the prod. —Cryptic 21:55, 19 November 2024 (UTC)- Also, comparing the results reminded me that user_registration is NULL for some users who registered before mid-2005ish, which I hadn't corrected for. 2013-06-15 19:42:14, though I doubt the two weeks' inaccuracy is going to matter much to WAID. —Cryptic 22:30, 19 November 2024 (UTC)
- TIL CAST seems to convert to the proper numeric representation if the binary string contains only numeric ASCII characters. Glad you were able to get it working though. Uhai (talk) 22:34, 19 November 2024 (UTC)
- Casting it seems to get me a numeric, and doing normal arithmetic on it (user_registration + 1, division, and so on) coerces it to a numeric; it doesn't get me nonsense until I try to pass it through MEDIAN(). And UNIX_TIMESTAMP() in particular is documented to return an unsigned int.
- user_registration is ASCII-encoded binary rather than a binary number which is why you're getting nonsense when casting it and trying to do operations on it. Uhai (talk) 21:28, 19 November 2024 (UTC)
- That's going to get you not just every user with the user right - the whole point of a left join is that you get a result whether there's a row in the joined table or not - but a row in your resultset for every row they have in recentchanges. And you're leaving out admins, who have extended confirmed implicitly. Plus, even if it worked, it would be a dump of ~25k values.Mariadb has a MEDIAN() window function, but I can't get it to work on user_registration no matter how I try to preprocess it first - it gives me "Numeric datatype is required for percentile_cont function" when I call it directly on the column, which is reasonable, but always 100000000 if I cast it to any kind of numeric value, which isn't. (Anyone know what I'm doing wrong? Sample query. And I've never really grokked window funcs or how to get them to behave like normal, sane, grouped-by aggregate funcs anyway.) But doing it longhand works just fine. quarry:query/88039: 28 May 2013. —Cryptic 19:37, 19 November 2024 (UTC)
List of Revision IDs by edit summaries
Can someone write a SQL query that isolates all edits made with the edit summary Disambiguating links to ] (intentional link to DAB) using ].
where XYZ is any combination of letters, numbers or symbols. There is a bug in the script that causes edits with this summary to target to a wrong link, see User talk:Qwertyytrewqqwerty/DisamAssist#Bug Report: Double "(disambiguation)" links created. Thanks! —CX Zoom 14:24, 10 December 2024 (UTC)
- quarry:query/88643. —Cryptic 17:43, 10 December 2024 (UTC)
- Thank you very much. It helped me in fixing the pages that had wrong links due to the script bug. —CX Zoom 19:50, 10 December 2024 (UTC)
Draftifications by month
Hi everyone. Cryptic kindly created this query which shows how many draftifications took place between 2021-07 and 2022-08. Could someone please fork it to show dates from 2016 to 2024? If it's easier, I'm fine with seeing the number of draftifications by year instead of by month. Many thanks and best wishes, Clayoquot (talk | contribs) 03:38, 14 December 2024 (UTC)
- I've rerun the query in-place. —Cryptic 14:19, 14 December 2024 (UTC)
- Beautiful, thank you so much Cryptic! Clayoquot (talk | contribs) 16:29, 14 December 2024 (UTC)
List of all file redirects that are in use in mainspace
I wrote a query that lists all file redirects, at quarry:query/88966. Can this query be expanded to only list file redirects that are used in mainspace somewhere? –Novem Linguae (talk) 22:26, 19 December 2024 (UTC)
Update to NPP reports
Is it possible to add a link to the #
column at Misplaced Pages:New_pages_patrol/Reports#Unreviewed_new_redirects_by_creator_(top_10) with an xtools redirs created link. It can target xtools:pages/en.wikipedia.org/USERNAME/0/onlyredirects
Similarly for Misplaced Pages:New_pages_patrol/Reports#Unreviewed_new_articles_by_creator_(top_10) targeting xtools:pages/en.wikipedia.org/USERNAME/all Thanks! ~/Bunnypranav:<ping> 15:49, 20 December 2024 (UTC)
- Done and done. —Cryptic 18:56, 20 December 2024 (UTC)
- Thanks a lot ~/Bunnypranav:<ping> 04:06, 21 December 2024 (UTC)
Trying to measure the number of source links to each domain for a given article
I keep getting the error, "execute command denied to user 's52788'@'%' for routine 'enwiki_p.count'". I was using the page database, but even after I modified my query to only use the externallinks database (meaning I need to input a numerical page ID instead of using the title), I'm still getting the denial. What am I doing wrong here? Am I just not allowed to aggregate? Here's my query, simplified as much as possible and still not working:
SELECT count (el_to_domain_index)
FROM externallinks
WHERE el_from = 37198628
GROUP BY el_to_domain_index;
Safrolic (talk) 23:14, 21 December 2024 (UTC)
Category: