Misplaced Pages

talk:Database reports - Misplaced Pages

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.

This is an old revision of this page, as edited by Phil Boswell (talk | contribs) at 06:05, 3 August 2012 (Popular/duplicated external links: thanks, but yes, that's an incorrect URL, and there's loads missing!). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Revision as of 06:05, 3 August 2012 by Phil Boswell (talk | contribs) (Popular/duplicated external links: thanks, but yes, that's an incorrect URL, and there's loads missing!)(diff) ← Previous revision | Latest revision (diff) | Newer revision → (diff)
Archiving icon
Archives
Index
Archive 1Archive 2Archive 3
Archive 4Archive 5Archive 6
Archive 7Archive 8

Requests: Please list any requests for reports below in a new section. Be as specific as possible, including how often you would like the report run.

User subpages for users indef blocked for spamming

There are many users that have been indefinitely blocked for spamming, for using Misplaced Pages for promotional purposes, or for having a promotional username. Quite often these users have subpages in their userspace that contain promotional material that hasn't been deleted. This material can show up in Google searches - which is probably why spammers put it there. Would it be technically possible to have a database report that lists these pages? If so, would others find this useful enough to be worth the effort? Peacock (talk) 17:46, 23 June 2010 (UTC)

Probably worth the effort. Do you have an example subpage (that hasn't been deleted) so that I can verify the results of any queries? --MZMcBride (talk) 17:53, 23 June 2010 (UTC)
Here's one I came across yesterday and just blanked: User:Juntaomotors/Juntaomotors. Peacock (talk) 15:43, 25 June 2010 (UTC)
That user was not blocked for spam, so it's not really a helpful example . I wrote the query below, but for some reason it does not work. Perhaps MZMcBride will be clever enough to write a better query. Tim1357 01:18, 26 June 2010 (UTC)
The Non-Functioning Query
SELECT Concat('User:', page_title) 
FROM   page 
       JOIN logging
         ON log_title = Substring_index(page_title, '/', 1)
            AND log_namespace = 2
WHERE  page_namespace = 2 
       AND page_title LIKE '%/%'
       AND log_action = 'block'
       AND log_comment LIKE '%spam%'
LIMIT  1;

I assumed PCock wanted current blocks, not all block actions, so I checked against the ipblocks table. It's an insanely slow query, though. Might be better to do it in sets in a programming language. I imagine you'd want to check the ipb_reason field for a few strings like "usernamehardblocked", "spam", etc. --MZMcBride (talk) 02:10, 26 June 2010 (UTC)

Working, but very slow, query
SELECT
  page_namespace,
  page_title,
  ipb_reason
FROM page
JOIN ipblocks
ON ipb_address = TRIM(SUBSTRING_INDEX(REPLACE(page_title, '_', ' '), '/', 1))
AND page_namespace IN (2,3)
WHERE ipb_expiry = 'infinity'
AND page_title LIKE '%/%'
LIMIT 1;

Hmm. I think this is one query that is not going to be done gracefully. I ran a new query that is set to save here when it finishes. (Note each page title lacks the 'User:' prefix). I'll check back tomorrow to see if the query had any success, otherwise I am out of ideas. Tim1357 02:36, 26 June 2010 (UTC)

New Query
CREATE TEMPORARY TABLE u_tim1357.blocked_users 
  (
     USER VARBINARY(255)
  );
INSERT INTO u_tim1357.blocked_users 
SELECT DISTINCT log_title 
FROM   logging 
WHERE  log_action = 'block'
       AND log_comment LIKE '%spam%'
ORDER  BY log_timestamp DESC; 
SELECT page_title 
FROM   page 
       JOIN u_tim1357.blocked_users
         ON page_title = Substring_index(USER, '/', 1)
WHERE  page_title LIKE '%/%' 
       AND page_namespace = 2;
I really hope you were using logging_ts_alternative table instead of logging when you were running those queries on the Toolserver. — Dispenser 14:56, 6 July 2010 (UTC)
Uh oh. Did I break anything? Nobody told me about that table! Tim1357 23:08, 7 July 2010 (UTC)
Anyways, I bypassed the logging table all together and just used ipblocks. I limited the query to the first 200 results, which will be automatically saved here when the query completes. --Tim1357 02:35, 7 August 2010 (UTC)
Guess not. --MZMcBride (talk) 05:54, 9 August 2010 (UTC)
Yea, dunno what happened there. Tim1357 02:41, 17 August 2010 (UTC)

Subpages of Misplaced Pages:Articles for deletion not transcluded to a daily log page

To find AFD discussions not properly transcluded. Maybe limit to recently created to eliminate noise. –xeno 15:38, 1 August 2010 (UTC)

I assume you're already aware of Old deletion discussions (configuration) and Orphaned article deletion discussions (configuration). This is a slight variant, I guess. I'm not sure of a proper report title. I thought you were going to get a Toolserver account and start doing these tasks yourself? :P --MZMcBride (talk) 17:45, 1 August 2010 (UTC)
Untranscluded deletion discussions. As to your question... Kindasorta. But I know my limits! –xeno 18:21, 3 August 2010 (UTC)
"Untranscluded article deletion discussions", you mean? --MZMcBride (talk) 22:33, 3 August 2010 (UTC)
Quite. I can't do all the work, you see. xeno 22:34, 3 August 2010 (UTC)
Okay, so we'll do "Untranscluded deletion discussions" and include MFD and AFD then, I guess. You want this updated daily? --MZMcBride (talk) 16:44, 14 August 2010 (UTC)

More to-do

At some point...

--MZMcBride (talk) 22:15, 29 September 2010 (UTC)

What's a leecher report? Svick (talk) 23:13, 29 September 2010 (UTC)
"User pages of non-contributing users" or something like that. That is, users who come here, register an account, create a user page (that's often spammy, vandalism, or a test page), and never edit anywhere else but on their user page. There are some added complications now (a lot of people land here from other Wikimedia wikis, especially with unified login now, e.g.) and there still isn't a very clear title, but I think the overall idea has merit. --MZMcBride (talk) 00:30, 30 September 2010 (UTC)
There is some merit in that sort of report, maybe not real often though. --Rocksanddirt (talk) 20:32, 22 October 2010 (UTC)

--MZMcBride (talk) 19:41, 4 March 2011 (UTC)

Most-linked redirects (2)

A biweekly (or weekly if it seems useful) report I'd like to see is a list of most-linked redirects. For example, the bot would count the number of times John Blain (football player) is linked, showing the top 500 or so redirects. While some would be useful and should be left alone, others could be found that were the result of page moves that should have fixed a long time ago. Wizardman Operation Big Bear 21:23, 9 October 2010 (UTC)

Why should be redirects that are results of page moves fixed? See WP:NOTBROKEN. Svick (talk) 12:38, 10 October 2010 (UTC)
Some don't need to be, but linking to an outdated link could cause created articles to follow that nonstandard format, leading to more work for the rest of us. That and while some redirects don't need to be cleaned up, many do. Wizardman Operation Big Bear 04:15, 11 October 2010 (UTC)
Huh, you've actually asked about this before. And I've already given you the pagelinks table lecture, so now all I need to do is remember to actually write this report. :-) I'll try to get to it tomorrow. For what it's worth, I doubt it will be a weekly report. It'll likely be kind of "expensive" to generate/update. --MZMcBride (talk) 07:21, 11 October 2010 (UTC)
There is already a report called Templates containing links to disambiguation pages. Perhaps a Templates containing links to redirect pages may be useful as well? -- WOSlinker (talk) 08:49, 11 October 2010 (UTC)
To Wos: that could work. To MZM: If I did ask earlier than I forgot about that, I understand that it would definitely be a tough list to create, and limiting it to monthly given the update expense is fine. Wizardman Operation Big Bear 18:38, 11 October 2010 (UTC)
I just noticed my request several sections up. To clarify, this would just be for mainspace articles; no reason to bother with others. If you want to trim it more and remove other redirects which shouldn't be touched, then we can remove any that include the category "redirects with possibilities" which should knock time off running it. Wizardman Operation Big Bear 18:42, 11 October 2010 (UTC)

Take a look at Most-linked redirects (configuration). The initial limit was 30 incoming links. I've now raised it to 200 incoming links and it's running currently. We'll see what it outputs, how useful it is, and what (if any) adjustments are needed once it finishes.

The idea of creating another report for "Templates containing links to redirects" seems to be a good one. --MZMcBride (talk) 23:57, 11 October 2010 (UTC)

Thanks. Admittedly I was not expecting that many redirects to have such a large number of links. So far it's helped find a few issues that I've handled, even though most redirects there would be left alone. Wizardman Operation Big Bear 02:57, 12 October 2010 (UTC)
It might be helpful to see if there are ways to refine the scope of the report. This can usually be done by looking at specific examples to see if there are distinguishable characteristics about the page that can be broadly applied to other similar pages. For the example you provided (John Blain (football player)), we can say a few things:
  • the page has a move log
  • the page's target is a living person
  • the page title contains parentheses (which usually signify disambiguation of some kind)
  • the page has 7 incoming links from other articles
  • the page title signifies that the person is a living person (using a common first name)
Some of these observations are completely useless. Some aren't. If you can try to give me a better idea of what kind of pages you're looking for (perhaps with some other specific titles), I can try to make a better report for you. This might mean that the report will ultimately be moved to a different title, but that's no big deal. --MZMcBride (talk) 03:03, 12 October 2010 (UTC)
Out of those, the main one that jumps out at me would be the move log; that is something I was hoping to work on, alongside the parentheses modifier as well. The many football player disambigs, for example, stem from a pretty big move war from 07-08, so there's a lot of articles that go to the wrong places still. Wizardman Operation Big Bear 03:20, 12 October 2010 (UTC)

High NFCC pages report

On this version of the report, there is an entry for Head of the River (Victoria), with the entry noting 8 files in use. Yesterday, I removed more than 400 uses of non-free files from this page. The report is showing the number of files, not the number of uses. This particular article has gone largely unnoticed for three years because "8" uses is not a high number. Yet this article is certainly one of the highest, if not the highest, abusers of non-free content we've ever had on this project.

I'd like to see this report modified, or a new report created, to show how many non-free file uses there are, rather than the number of non-free files used. --Hammersoft (talk) 14:18, 20 October 2010 (UTC)

The database only tracks binary usage data. This is also true of templates, categories, page links, and external links. It might be possible to add a column to this report, but it will not be possible to get number of uses for all pages. That might result in some false negatives: if a page is using only one non-free file, but uses it a few hundred times, it won't be listed.
In order to count uses, a regex will have to be run on the page text, I think. This will need to account for templates and other such things. I have an idea of how to do this in my head, but it might be a while before I get around to it. --MZMcBride (talk) 18:52, 20 October 2010 (UTC)

Misplaced Pages:Database reports/Stubs included directly in stub categories and the Darius Dhlomo copyvio problem

Please exclude, from Misplaced Pages:Database reports/Stubs included directly in stub categories, any article transcluding Project:Contributor copyright investigations/Darius Dhlomo/Notice. These pages aren't then problem of WikiProject Stub Sorting; and when their problem is solved, these pages will either have been deleted or reverted to a version which probably doesn't have direct use of stub categories. עוד מישהו Od Mishehu 08:10, 21 October 2010 (UTC)

This should be updating now. --MZMcBride (talk) 19:05, 24 October 2010 (UTC)
Done, thank you. עוד מישהו Od Mishehu 08:57, 25 October 2010 (UTC)

Popular WikiProjects

User:WhatamIdoing has expressed a desire for data regarding popular WikiProjects, and those data interest me also. User talk:Mr.Z-man/Archive 12#Popular WikiProjects? has a record of the request. I would like to see a monthly report (of page views or edits or watchers; preferably one report for each of those three variables). I have added this talk page to my watchlist, and I will watch for a reply (or replies) here.
Wavelength (talk) 19:25, 27 October 2010 (UTC)

Reports of WikiProject watchers and WikiProjects by changes are now up. Svick (talk) 17:30, 13 November 2010 (UTC)
Thank you very much.—Wavelength (talk) 20:52, 13 November 2010 (UTC)

New BLPs that might be eligible for a sticky prod

Earlier this year the community introduced a new form of deletion for new unreferenced BLPs, there's a concern that we aren't tagging as many as we could. Please could we have the following regular report:

Criteria: articles in Category:All unreferenced BLPs that were created after March the 18th 2010
Format: List of articles, marked if they are already in Category:BLP articles proposed for deletion or Category:Articles for deletion
Frequency: Weekly

Many thanks ϢereSpielChequers 17:41, 28 October 2010 (UTC)

Done: Biographies of living people possibly eligible for deletion (configuration). --MZMcBride (talk) 05:48, 5 November 2010 (UTC)

Red-linked categories with significant incoming links

In light of this discussion, would it be possible to generate a report listing red-linked categories (i.e., deleted or never created) which have significant incoming links (i.e., from any namespace except User: and Misplaced Pages:)? Thank you, -- Black Falcon 17:31, 14 November 2010 (UTC)

  • I think Special:Wantedcategories generally updates regularly, which is why this report was never created, although that report only lists the thousand most populated categories. If someone were to go through and either bluelink or de-link the first couple hundred categories depending on their specifics, however, you would likely get everything with 2 category members or more. VegaDark (talk) 01:35, 15 November 2010 (UTC)
    • Special:WantedCategories only lists red-linked categories with actual members, doesn't it? Does it also list empty categories which have incoming links? -- Black Falcon 05:38, 15 November 2010 (UTC)
      • Ah, incoming links, for some reason I was thinking category members. Nope, guess there isn't a report for that. Although I bet there will be a ton of redlinked crap categories we've used in examples in deletion discussions over the years. VegaDark (talk) 10:06, 15 November 2010 (UTC)
        • I hadn't thought of those, but you're right: there will be a lot of those, as well as links from CfD nominations and notifications to users. I don't know if it is possible to generate a report that would ignore incoming links from the User: and Misplaced Pages: namespaces but, if it is possible, it would help to focus the list. -- Black Falcon 17:06, 15 November 2010 (UTC)

So you want red-linked categories (with or without members) with incoming links from namespaces other than Misplaced Pages (4) or User (2)? What should the report title be? How often should it update? The more detail you can provide, the less thinking I have to do (and thus it's much easier for me to knock this out in a few minutes). :-) --MZMcBride (talk) 04:58, 23 February 2012 (UTC)

Yes, thank you. We should probably also exclude any talk namespaces as category links in talk page discussions generally need not or ought not to be updated. I think either Red-linked categories with links (configuration) or Red-linked categories with incoming links (configuration), or something similar, would be an adequately descriptive title. A weekly update should be sufficient, though even a monthly one may be adequate once the backlog is cleared. -- Black Falcon 17:20, 23 February 2012 (UTC)
Please take a look at Red-linked categories with incoming links (configuration) and let me know what you think. --MZMcBride (talk) 21:26, 24 February 2012 (UTC)
The more I think about it, the more I think that you actually want the inverse here regarding incoming links. That is, you don't want cases where there are incoming links from all but certain namespaces (page_namespace NOT IN (1,2,3,4,5,7,9,11,13,15,17,19,101,103,105)), you want cases where there are incoming links from certain namespaces (page_namespace IN (0, 6, 10, 14)), right? If I created a list of red-linked categories and put it somewhere in the Misplaced Pages namespace, it would screw up any subsequent reports, as they'd all have links from a page in the Misplaced Pages namespace, using your logic. But that's not what you're looking for, is it? You're looking for cases where there are red-linked categories from articles, files, templates, or other categories... I think? --MZMcBride (talk) 23:25, 24 February 2012 (UTC)
P.S. I kind of feel terrible now after realizing that this request has been sitting here for over a year and it took like ten minutes to complete. I'm the worst.
Your work and assistance are greatly appreciated, no matter the time it takes. :) The report looks perfect; thank you!
You're right about the logic of inclusion/exclusion, since my interest is in links from certain namespaces rather than all links except those from certain namespaces; ultimately, we're excluding more namespaces than we're including. The namespaces of interest would be the main namespace (0) and the file (6), template (10), help (12), category (14) and portal (100) namespaces. I suppose that links from the MediaWiki (8) and book (108) namespaces should be updated, too, but that's a theoretical consideration since pages in those namespaces tend not to link to categories. -- Black Falcon 19:24, 5 March 2012 (UTC)
By the way, is it possible also to include category pages that contain no members – e.g., ones that have been deleted and emptied? -- Black Falcon 19:27, 5 March 2012 (UTC)
Dunno. This report's logic has twisted my mind pretty badly.
I set up Red-linked categories with incoming links (configuration) to update weekly. Take a look at that report and then we can see what needs to be tweaked. --MZMcBride (talk) 00:58, 12 March 2012 (UTC)
It's my fault since, looking at the report now, I realize that I had not taken into account two key factors.
The report that I had in mind originally was of category pages which are red-linked and empty and have incoming links from namespaces 0, 6, 10, 12, 14, or 100. That report would have been useful in removing links to categories which should not have any incoming links.
This report is of category pages which are red-linked and populated and have incoming links from the above namespaces. It is arguably more useful since it identifies categories that need to be created (and properly subcategorized) or emptied – a task that is more important than link removal. I will work on the report over the next few days and provide feedback here.
Again, thank you! -- Black Falcon 02:56, 14 March 2012 (UTC)

(unindent) Perhaps having a "Members" column would be good? That way you could easily sort the report by ones that are empty (0 members) and ones that aren't. Thoughts? --MZMcBride (talk) 22:00, 17 March 2012 (UTC)

P.S. Just for my reference, this report takes 45 minutes to run currently on willow.

List of longest 'DISCUSSION' content.

Hi, Is it possible to get a list of (Misplaced Pages-) articles which have the largest 'Discussion' content? On these 'Talk:' pages you also see the amount of archives; for example the article about Barack Obama has 71 archives, and will end high in the list probably. Thanks in advance. Kyra —Preceding unsigned comment added by 82.171.75.225 (talk) 10:55, 17 November 2010 (UTC)

Here you go. Svick (talk) 20:40, 20 November 2010 (UTC)
Looks pretty good. Might be possible to limit to pages with "rchive" in them and then add the root page for a more precise count of actual discussion, but your method is probably about as accurate, relatively.
One thing. Perhaps it might be better to put the column in megabytes? For example, 40 230 kilobytes = 39.2871094 megabytes (according to Google); I think the megabytes figure is much easier to understand and appreciate.
Thanks for taking care of this. :-) --MZMcBride (talk) 20:59, 20 November 2010 (UTC)
I have changed it to show the size in megabytes.
I don't like including only archives the way you suggested, because it would mean that it would be the sum of sizes of the talk page including archives excluding other subpages, but including their archives (e.g. Misplaced Pages talk:Requests for arbitration/Badlydrawnjeff/Proposed decision/Archive 1). And some people are already confused what exactly does the number mean, so I don't want to complicate it more. Svick (talk) 23:05, 20 November 2010 (UTC)

I've taken this to MfD, as it's quickly been misinterpreted. There's nothing problematic about having long talk histories so long as the archives are orderly, so it's not clear what purpose this particular report fulfils. Chris Cunningham (user:thumperward: not at work) - talk 13:01, 21 November 2010 (UTC)

Two database reports for File redirects, please

Originally I had posted here on MZMcBride's talkpage but he has brought my request to this discussion board instead. Could someone construct two database reports for redirects in the File namespace with one or less incoming links, one for those that match Titleblacklist and another for those that don't? :| TelCoNaSpVe :| 05:23, 24 November 2010 (UTC)

Note: The Titleblacklist is a pain since matching can only be correctly done with PHP (or maybe perl). — Dispenser 05:34, 24 November 2010 (UTC)
Okay, let's ignore Titleblacklist then. Can we at least have a database report of all redirects from the file namespace with one or less incoming links stored somewhere (e.g. here)? :| TelCoNaSpVe :| 22:00, 7 December 2010 (UTC)
5803 files. That's quite long report, I think it would be worth trimming it down somehow, like report only those that have no incoming links at all? Svick (talk) 16:38, 16 January 2011 (UTC)
You're allowed to paginate. Plenty of current reports do. :P
That said, I'm not sure how good an idea this report is. What's the point of finding (internally) unused file redirects? --MZMcBride (talk) 00:54, 17 January 2011 (UTC)

The Talk pages by size MfD

The Talk pages by size MfD raised a lot of good issues and ideas. A main problem is that the database reports have very little information about them on the page, which has lead to confusion (as brought out in the MfD). Most of the database reports pages are not categorized. In response to the MfD, I modified the Misplaced Pages:Database reports/Talk pages by size page text using posts from the MfD. I also created Category:Misplaced Pages database reports as a place to categorize all the pages listed at Misplaced Pages:Database reports and other database reports pages whereever they may be. Initially, I though of using {{infopage}} on the top of the database pages, but that template didn't fit because the database report pages do not describe communal consensus on some aspect of Misplaced Pages norms and practices. I added a generic one to Talk pages by size instead. In view of the talk page tagging with the admonishment "Perhaps this will motivate greater efficiency in the use of kilobytes," you may want to create a top template to distinguish a utility report from a statistical report. I added text to the top of Misplaced Pages:Database reports to address this. You may further want to expand the text of Misplaced Pages:Database reports to provide some Database reports consensus control over the use of utility reports and statistical reports. -- Uzma Gamal (talk) 16:54, 25 November 2010 (UTC)

I populated Category:Misplaced Pages database reports with some of the database reports pages. If you like the category, please add the rest of the database reports pages. If not, you know what to do and there is no need to inform me of it : ). -- Uzma Gamal (talk) 17:08, 25 November 2010 (UTC)
The MFD was a complete waste of time and offered (almost) nothing of value, except to point out once again that some people posting here (on Misplaced Pages) are clueless. The category additions (and other additions) are going to be overwritten the next time the reports update. A better approach to the intro text of the reports is needed, but there are a lot of other projects I'd rather work on. --MZMcBride (talk) 22:49, 25 November 2010 (UTC)

Rename Long pages

Misplaced Pages:Database reports/Long pages should be renamed Misplaced Pages:Database reports/Talk pages by length consistent with Misplaced Pages:Database reports/Talk pages by size. "Long" is a subjective conclusion whereas Talk pages by length offers no opinion to characterize the page. -- Uzma Gamal (talk) 16:54, 25 November 2010 (UTC)

Long pages (configuration) has more than talk pages. --MZMcBride (talk) 22:46, 25 November 2010 (UTC)
"Pages by length" might work. It's up to you. Keep up the good work. -- Uzma Gamal (talk) 14:04, 26 November 2010 (UTC)

Orphaned template documentation

Would it be possible to generate a report of "/doc" subpages in the Template namespace for which the main template page does not exist or is a redirect? Thanks, -- Black Falcon 05:13, 29 November 2010 (UTC)

May even be useful to show all subpages and not just /doc. -- WOSlinker (talk) 07:59, 29 November 2010 (UTC)
I've tried it for other types of subpages, it isn't useful. There are navboxes with titles like 1984/85_X_game that are still transcluded and <noinclude> use to not work with the preload function so those templates were left uncategorized. We could improve Misplaced Pages:Database reports/Unused templates so that it's more useful... — Dispenser 18:31, 5 December 2010 (UTC)
/* Orphaned doc pages */
SELECT CONCAT("]"),
   IF(rd_title IS NOT NULL, IF(trg.page_id, "Occupied", ""), "Delete") AS "Move",
   (SELECT COUNT(*) FROM templatelinks WHERE tl_namespace=doc.page_namespace AND tl_title=doc.page_title) AS "Transclusions",
   rd_title
FROM page AS doc
LEFT JOIN page AS tpl ON tpl.page_namespace=doc.page_namespace
    AND tpl.page_title=TRIM(TRAILING "/doc" FROM doc.page_title)
LEFT JOIN redirect ON rd_from=tpl.page_id
LEFT JOIN page AS trg ON trg.page_namespace=rd_namespace
    AND trg.page_title=CONCAT(rd_title, "/doc")
WHERE doc.page_namespace=10
AND   doc.page_title LIKE "%/doc"
AND   doc.page_is_redirect=0
AND (tpl.page_is_redirect=1 OR tpl.page_id IS NULL)

And we have 209 results with many which need to be merged/redirected to the new /doc page. — Dispenser 18:31, 5 December 2010 (UTC)

Long pages changes

The page has two edits (, ) in the history that probably has to be implemented with the bot. At same also the 140K threshold could be lowered to 130K. --Kslotte (talk) 11:14, 5 December 2010 (UTC)

Proposed change

I was wondering if Misplaced Pages:Database reports/Unused templates could possibly be changed so that templates which transclude {{substituted}} within them or within their /doc are either not included, or are separated into a separate section entitled "Substituted Templates" or something similar? This would make it much easier to see which templates intentionally have no transclusions (because they are supposed to be substituted) and those which are just unused. Thanks! Mhiji (talk) 18:23, 12 December 2010 (UTC)

Tag the templates with {{transclusionless}}; I'll write some code to exclude pages tagged with that template. --MZMcBride (talk) 18:52, 12 December 2010 (UTC)
Thats great thanks. Though {{transclusionless}} redirects to {{substituted}} anyway? And {{substituted}} is used much more than it's redirect. Mhiji (talk) 23:38, 12 December 2010 (UTC)
Actually, we can use categories as the bug regarding <noinclude> has been fixed. — Dispenser 02:00, 13 December 2010 (UTC)
Templates are more flexible than categories. They allow content to be included or not, they allow for easier redirects and renames, etc. Use a template. It doesn't matter if you use {{transclusionless}} or {{substituted}}. Either will work. --MZMcBride (talk) 02:15, 13 December 2010 (UTC)

Request: List of articles found in every language but English

I start an article recently and then found there were already 20+ other versions on other language Wikipedias. This got me wondering what other articles exist in multiple other languages but don't appear in English. Has anyone tried to make a list of topics which have not been translated into English, sorted by the number of other language Wikipedias it appears in? Or is anyone willing to try making such a list? Thanks in advance. —Pengo 01:41, 25 December 2010 (UTC) (Crossposted from Misplaced Pages:Village_pump_(idea_lab)#Every_language_but_English.3F). And if you need more specifics I'd like I'd be happy to flesh out the idea.

I worked on this for 2 hours and I have not much to show for it . There are a lot of wikipedias (269), some of which are really small (Cheyenne Misplaced Pages has 57 articles). Also, on the toolserver, the wikipedia databases are served across 3 separate servers, which makes querying difficult. If you (MZMcBride) want to see my work, I've included the query below. Good luck! --05:04, 28 December 2010 (UTC)
Tim1357's broken query
SELECT ff.page_title, count(ff.page_title)
FROM   (SELECT z.page_title 
        FROM   dewiki_p.page AS z 
        WHERE  z.page_namespace = 0 
        UNION ALL 
        SELECT y.page_title 
        FROM   frwiki_p.page AS y 
        WHERE  y.page_namespace = 0 
        UNION ALL 
        SELECT x.page_title 
        FROM   plwiki_p.page AS x 
        WHERE  x.page_namespace = 0 
        UNION ALL 
        SELECT w.page_title 
        FROM   itwiki_p.page AS w 
        WHERE  w.page_namespace = 0 
        UNION ALL 
        SELECT v.page_title 
        FROM   jawiki_p.page AS v 
        WHERE  v.page_namespace = 0 
        UNION ALL 
        SELECT u.page_title 
        FROM   eswiki_p.page AS u 
        WHERE  u.page_namespace = 0 
        UNION ALL 
        SELECT t.page_title 
        FROM   ptwiki_p.page AS t 
        WHERE  t.page_namespace = 0 
        UNION ALL 
        SELECT s.page_title 
        FROM   nlwiki_p.page AS s 
        WHERE  s.page_namespace = 0 
        UNION ALL 
        SELECT r.page_title 
        FROM   ruwiki_p.page AS r 
        WHERE  r.page_namespace = 0 
        UNION ALL 
        SELECT q.page_title 
        FROM   svwiki_p.page AS q 
        WHERE  q.page_namespace = 0 
        UNION ALL 
        SELECT p.page_title 
        FROM   zhwiki_p.page AS p 
        WHERE  p.page_namespace = 0 
        UNION ALL 
        SELECT o.page_title 
        FROM   cawiki_p.page AS o 
        WHERE  o.page_namespace = 0 
        UNION ALL 
        SELECT n.page_title 
        FROM   nowiki_p.page AS n 
        WHERE  n.page_namespace = 0 
        UNION ALL 
        SELECT m.page_title 
        FROM   fiwiki_p.page AS m 
        WHERE  m.page_namespace = 0 
        UNION ALL 
        SELECT l.page_title 
        FROM   ukwiki_p.page AS l 
        WHERE  l.page_namespace = 0) AS ff 
WHERE  NOT EXISTS (SELECT 1 
                   FROM   enwiki_p.page AS aa 
                   WHERE  aa.page_namespace = 0 
                          AND aa.page_title = ff.page_title) 
GROUP BY ff.page_title
LIMIT  1;
Err, aren't you assuming that each wiki would have the article with the same title? Surely other language Wikipedias will have localized titles like "World War II".... You need to be focus on the langlinks table. --MZMcBride (talk) 05:05, 28 December 2010 (UTC)
I had already posted the query at the village pump and concluded that it would not work since 1) the interwikis aren't kept in good condition and 2) other languages use different page structure. On DE Quantum physics and Quantum mechanics are separate while on EN they are covered in the same article. — Dispenser 05:45, 28 December 2010 (UTC)
I thought approaching this as “biggest interwiki networks that don't include enwiki”. But I think that would be difficult or impossible to do with just SQL. Svick (talk) 11:12, 28 December 2010 (UTC)
It has been done in the past, I think. . Sole Soul (talk) 13:43, 28 December 2010 (UTC)
I created Spanish pond turtle yesterday for just that reason(having 7 other language pages). I think you need to look on the bigger other language Wikipedias (.es, .de) and check if it has no .en link but has many to others. Regards, SunCreator 14:36, 28 December 2010 (UTC)
Re Intrawiki links not being in good condition. My experience with The death anomalies project has been quite the opposite - intrawiki link errors do exist, but are rare. I think this sort of report would generate a useful list of articles that we potentially should have, and where instead it identifies that we are missing an intrawiki link, then that is also useful info. So yes it would be worth doing this by checking for articles on DE, FR etc that have large numbers of intrawiki links but not an EN one. Whether the result is an extra article or an added intrawiki link to an existing article doesn't matter as both are positive outcomes. ϢereSpielChequers 17:25, 28 December 2010 (UTC)

I have created the report based on data from dumps. See Misplaced Pages:Dump reports/Missing articles. User<Svick>.Talk(); 14:51, 11 June 2011 (UTC)

Images with no recognisable license

Would it be possible to have a 'report' based version of http://en.wikipedia.org/User:MGA73/No_license which updated daily?

Sfan00 IMG (talk) 13:52, 5 January 2011 (UTC)

I don't know what you mean. What's deficient with that subpage? How would a separate report be different? --MZMcBride (talk) 01:59, 16 January 2011 (UTC)

Images not yet suitable for Commons

Is it possible to have a report that searches for media tagged {{mtc}} (and related) but which don't have a recognisable license, or one that's incompatible with Commons (such as fair-use)?

Report to run on a dialy or weekly basis depending on volume of media concerned?

Sfan00 IMG (talk) 13:56, 5 January 2011 (UTC)

I don't know what "recognisable license" means. I also don't know which licenses are incompatible with Commons. Is there a category for such a thing? --MZMcBride (talk) 02:00, 16 January 2011 (UTC)

Editors eligible for Autopatrol privilege

Per this conversation at the Village pump, run a query very similar to tools:DBQ-87 to find editors eligible for the Autopatrol privilege.

  • Search to find all editors on en.wikipedia who have created 50+ non-redirect articles
  • If editor already has administrator or autoreview priviliges, remove from list
  • If account is less than 6 months old, remove from list
  • If last edit was made more than 30 days ago, remove from list
  • Sort list by number of articles created

Is it possible to run a query like this? Thanks very much. -      Hydroxonium (talk) 00:37, 14 January 2011 (UTC)

I came up with this, I'm not sure if it works. Tim1357 02:52, 18 January 2011 (UTC)
Query
SELECT rev_user_text 
FROM   revision 
       JOIN page 
         ON page_id = rev_page 
            AND page_namespace = 0 
            AND page_is_redirect = 0 
       LEFT JOIN user_groups 
         ON ug_group IN ( 'sysop', 'autoreviewer' ) 
            AND rev_user = ug_user 
WHERE  rev_user!=0 
       AND Isnull(ug_user) 
       AND rev_timestamp > Date_format(DATE_SUB(NOW(), INTERVAL 6 MONTH), 
                           '%Y%m%d%H%i%s') 
       AND rev_parent_id = 0 
       AND rev_deleted = 0 
GROUP  BY rev_user 
HAVING COUNT(*) > 50 
LIMIT  1;

Thanks very much, Tim. I'm not familiar with SQL, so I will let others chime in. Thanks again. -      Hydroxonium (talk) 06:13, 18 January 2011 (UTC)

Can we add "bot" to the ug_group list so it looks like this?
Query
SELECT rev_user_text 
FROM   revision 
       JOIN page 
         ON page_id = rev_page 
            AND page_namespace = 0 
            AND page_is_redirect = 0 
       LEFT JOIN user_groups 
         ON ug_group IN ( 'sysop', 'autoreviewer', 'bot' ) 
            AND rev_user = ug_user 
WHERE  rev_user!=0 
       AND Isnull(ug_user) 
       AND rev_timestamp > Date_format(DATE_SUB(NOW(), INTERVAL 6 MONTH), 
                           '%Y%m%d%H%i%s') 
       AND rev_parent_id = 0 
       AND rev_deleted = 0 
GROUP  BY rev_user 
HAVING COUNT(*) > 50 
LIMIT  1;

I have been running in to bots on the list from the jira:DBQ-87 query. I don't think bots are allowed to create articles except is special cases, but I've been seeing them. Thanks again. - Hydroxonium (H3O) 23:50, 18 January 2011 (UTC)

Another version...
SELECT
  p2.page_creator,
  usrtmp.rev_timestamp AS first_edit,
  r2.rev_timestamp AS last_edit,
  COUNT(*)
FROM u_mzmcbride_enwiki_page_creators_p.page AS p1
JOIN enwiki_p.page AS p2
ON p1.page_id = p2.page_id
JOIN (SELECT
        user_id,
        user_name,
        user_editcount,
        rev_timestamp,
        GROUP_CONCAT(ug_group) AS groups
      FROM user
      LEFT JOIN user_groups
      ON ug_user = user_id
      JOIN revision
      ON rev_user = user_id
      AND user_id NOT IN (SELECT
                            ug_user
                          FROM user_groups
                          WHERE ug_group IN ('sysop', 'autoreviewer'))
      AND rev_timestamp = (SELECT
                             MIN(rev_timestamp)
                           FROM revision
                           WHERE rev_user = user_id)
      AND rev_timestamp < DATE_FORMAT(DATE_SUB(NOW(),INTERVAL 6 MONTH),'%Y%m%d%H%i%s')
      GROUP BY user_id) AS usrtmp
ON usrtmp.user_name = p1.page_creator
JOIN revision AS r2
ON p2.page_id = r2.rev_page
WHERE p2.page_namespace = 0
AND p2.page_is_redirect = 0
AND r2.rev_timestamp = (SELECT
                           MAX(rev_timestamp)
                         FROM revision
                         WHERE rev_user = usrtmp.user_id)
AND r2.rev_timestamp > DATE_FORMAT(DATE_SUB(NOW(),INTERVAL 1 MONTH),'%Y%m%d%H%i%s')
GROUP BY p1.page_creator
HAVING COUNT(*) > 49
ORDER BY COUNT(*) DESC;

I maintain a database that contains page_id, page_creator pairs. So rather than querying each page, you can just look up the stored info. It's at u_mzmcbride_enwiki_page_creators_p. This query uses that database, but it got interrupted today. This really is too obnoxious to put into one query. It needs to be split out into multiple parts using some sort of sane logic. I may do this tomorrow if I get a chance. --MZMcBride (talk) 07:22, 19 January 2011 (UTC)

I actually saw that when I was writing the query but didn't use it because the table is indexed so that it is optimized for pageid->creator, instead of the other way around. This means that MySQL has to do a table scan in order to find each page that the user created. Maybe you could create an index on top of the table to help with this? CREATE INDEX user_creations ON u_mzmcbride_enwiki_page_creators_p.page (page_creator); or something would do the trick. Tim1357 04:01, 20 January 2011 (UTC)
I don't know what you're talking about. The schema looks like this:
CREATE DATABASE u_mzmcbride_enwiki_page_creators_p;
CREATE TABLE page (
page_id INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
page_creator varchar(255) binary NOT NULL default ''
);
CREATE INDEX page_creator ON page (page_creator);
I don't really have the time to do this right now, but I think the best method is to do a query to get all the page creators with over fifty article creations (should take about half an hour), then do a Pythonic set comparison with the bots/autopatrollers/sysops and then check last edit/first edit for each user using functions/individual queries. That's how I mapped it out in my head, at least. I probably won't get around to this for a while, though. --MZMcBride (talk) 04:09, 20 January 2011 (UTC)


Thanks very much for the help, MZMcBride. Yes, please split up the query if that will ease the task. If it ends up being too obnoxious, I would be fine just running the original jira:DBQ-87 query again. Then I can manually go through the list and check for the other things (6 months old, activity in the last 30 days, etc.). That's what I have been doing with the old list. It's just extremely labor intensive. I've spent over 20 hours manually sifting through that old list and I'm starting to go bonkers. Ughhh. Anyway, thanks again. I appreciate the help.- Hydroxonium (H3O) 08:27, 19 January 2011 (UTC)
I don't know SQL or anything about the Misplaced Pages database, but I have been reading up on stuff. So this is likely to be an annoying question that can be ignored, but I'll ask it anyway. Is user_touched a field in the database, and could that be used to check for recent activity? Thanks again for the help. - Hydroxonium (H3O) 08:58, 19 January 2011 (UTC)
If it's not too much trouble, could we exclude users whose last creation was more than, say, 90 days ago? If it's too much faff, then never mind—it's easy enough to check manually. HJ Mitchell | Penny for your thoughts? 01:33, 20 January 2011 (UTC)
HJ Mitchel, that won't slow it down much. We could just add AND MAX(rev_timestamp)>DATE_FORMAT(DATE_SUB(NOW(),INTERVAL 90 DAY),'%Y%m%d%H%i%s') to the HAVING statement. Tim1357 04:17, 20 January 2011 (UTC)
MZMcBride: I'll try to run this on one of the slow servers to see if I can get any results. Tim1357 04:17, 20 January 2011 (UTC)

┌──────────────────────────────┘
Hi guys. Thanks very much for working on this report. Could this be run monthly? It would help the guys assigning permissions if the workload came in smaller batches more frequently rather than one big batch. Thanks again. - Hydroxonium (H3O) 18:11, 20 January 2011 (UTC)

One more. This is probably difficult, but would it be possible to search for and remove users that were declined in the previous month? This might be more approprate for a bot to do maybe. Thanks much. - Hydroxonium (H3O) 18:20, 20 January 2011 (UTC)

Thanks very much for working on this report, everybody. Svick has created this report at Misplaced Pages:Database reports/Editors eligible for Autopatrol privilege. Thanks Svick. - Hydroxonium (H3O) 03:41, 24 January 2011 (UTC)

I'm not sure why Svick didn't announce the new report here or at Misplaced Pages:Database reports/Announcements. I assume he had his reasons, though.... I peeked at the configuration and it looks pretty good. I'm still hoping Svick'll take over database reports altogether, but that's a different topic for a different day. --MZMcBride (talk) 04:34, 24 January 2011 (UTC)
Sorry, I wanted to tweak it a bit before posting here, but I didn't find the time until now. Specifically, I was trying to remove those that didn't create any article recently, but that caused the report to run too long. But I removed those that were declined this or last month.
MZMcBride, this is exactly the reason I don't want to take over – I don't always have the time.
Svick (talk) 16:25, 26 January 2011 (UTC)
You think I always have the time? :P
The idea behind a multi-maintainer project is that hopefully the load could be distributed between people, allowing for a faster response time for new reports/queries. I think of it as a long-term goal, though. Nothing needs to be done this instant. --MZMcBride (talk) 23:03, 26 January 2011 (UTC)
Yeah, I think you're right, multi-maintainer project is the way to go. Svick (talk) 16:15, 27 January 2011 (UTC)

New "Duplicate Files" report

A few comments about this new report:

  1. report titles generally use sentence case
  2. use {{fullurl:}} instead of for portability to other wikis and better links on the secure site (this also removes the need for Python's urllib module)
  3. include any crontab entry on the configuration subpage
  4. you want ~~~~~, not ~~~~ (it looks like this was already fixed)
  5. if you're limiting the number of output results, say so in the report description text
  6. if you're doing other checks (such as file protection status), say so in the report description text
  7. "annother" is misspelled in the report description text currently
  8. don't hardcode database names and host names in the configuration
  9. don't hardcode namespace names (even if they are canonical, use toolserver.namespacename)

Other than that, looks good. It'd be nice to get a multi-maintainer project for these reports going at some point. I'll look into that at some point. --MZMcBride (talk) 05:33, 24 January 2011 (UTC)

Ok, I think I've fixed all those things. Tim1357 05:56, 24 January 2011 (UTC)
Awesome. :D Thanks for the quick response. Last minor nitpick is that you're still importing urllib (when it isn't being used any longer), but that's no big deal. Thanks for helping out! I really appreciate it. --MZMcBride (talk) 06:00, 24 January 2011 (UTC)
Although the updated report is broken: You need to have a newline after table row delimiter |- and you need to add ns_id = 6 to the SQL, so that it actually uses the File namespace. Svick (talk) 19:26, 26 January 2011 (UTC)
One more thing: I think you want to test that the images are not protected separately (i.e. using two left joins). This way, each duplicate pair will show up only once. Svick (talk) 13:52, 28 January 2011 (UTC)

All pages containing colons

I'd like to see a report that has information about:

  1. The number of pseudo-namespace redirects out there.
  2. Pages that can have possible conflicts with interlanguage/iso-code prefixes.

Can a query be done for this? Thanks, :| TelCoNaSpVe :| 00:16, 26 January 2011 (UTC)

Maybe? You need to be much more specific, though. I assume you're talking about pages such as CAT:CSD? It's easy enough to list all page titles that contain a colon (":"), but that probably isn't what you want (even if your section header says otherwise). Inherently pages beginning with used language codes (such as "en") are forbidden. (Try making the page "En:foo" on this wiki.) So it's rather unclear what you're after and for what purpose. If you can clarify, that would help things along. --MZMcBride (talk) 03:04, 26 January 2011 (UTC)

Misplaced Pages:Database reports/Deleted red-linked categories

I have been tackling some of the entries in this report but it is a bit difficult to see the wood for the trees sometimes as there are a lot of such categories which only contain pages in user space (e.g. Category:Capitalist Wikipedians and the like). Would it be possible to split the report (there are already 3 subpages but these seem to be just for size reasons) into say

  • Red-linked categories containing at least one page in article space.
  • Red-linked categories containing only pages in user space.
  • Any remaining red-linked categories.

Thanks. Boissière (talk) 22:36, 27 January 2011 (UTC)

I don't have an objection to this per se, but it may be easier just to empty those user categories which do not start with User or Misplaced Pages. I've started working on this. -- Black Falcon 22:13, 28 January 2011 (UTC)

Long stubs

Would be nice to be able to keep this at 500-1,000 items. Currently it has fewer then 100 entries that could still be stubs after removing over 500 !stubs. Vegaswikian (talk) 07:30, 3 February 2011 (UTC)

Okay, I fixed the logic to be a bit more reasonable. You can see the changes in this diff. --MZMcBride (talk) 16:44, 6 February 2011 (UTC)
Thanks. Vegaswikian (talk) 08:35, 7 February 2011 (UTC)

Possible change to Deleted red-linked categories

For the Cats listed at Misplaced Pages:Database reports/Deleted red-linked categories, would it be possible to either change the members column (showing how many pages are in the deleted cat) to use PAGESINCAT? Or, if the number is logged for the database report as how many were member pages at the time of the scan, perhaps a new column with a "current" heading? This would probably increase the page count, as I believe you can only call 500 parser functions at time (as opposed to the current 800), but I think it would help to process and clean up the entries there. ennasis @ 11:08, 30 Shevat 5771 / 4 February 2011 (UTC)

Done. --MZMcBride (talk) 21:06, 6 February 2011 (UTC)
Awesome. :) This will help out a lot, I'm sure. Thanks. ennasis @ 00:05, 4 Adar I 5771 / 8 February 2011 (UTC)

Polluted category

Two things, first of all in Misplaced Pages:Database reports/Polluted categories/Configuration, AND tl_title = 'Pollutedcat' should be updated to AND tl_title = 'Polluted_category' since the template was renamed.

Thanks. -- WOSlinker (talk) 11:16, 6 February 2011 (UTC)

Sure, done: https://github.com/mzmcbride/database-reports/commit/1e1188ca0cad9b1cd32e856fa63ff563cd2f1840. --MZMcBride (talk) 16:49, 6 February 2011 (UTC)

New category-related report

And secondly, a new repot that checks all the categories that transclude {{Template category}} to see if any of them contain article pages wouldbe nice.

I've done some SQL below but not certain if it will work or need some tweaking.

SELECT page_title
FROM page AS pg1
JOIN templatelinks AS tl
ON pg1.page_id = tl.tl_from
WHERE pg1.page_namespace = 14
AND tl.tl_namespace = 10
AND tl.tl_title = 'Template_category'
AND EXISTS (SELECT 1
            FROM page AS pg2
            JOIN categorylinks AS cl
            ON pg2.page_id = cl.cl_from
            WHERE pg2.page_namespace = 0
            AND pg1.page_title = cl.cl_to)
LIMIT 250;

Thanks. -- WOSlinker (talk) 11:16, 6 February 2011 (UTC)

Do you have a report title and update frequency in mind? --MZMcBride (talk) 16:50, 6 February 2011 (UTC)
How about Template categories containing articles & Weekly would be fine. -- WOSlinker (talk) 16:55, 6 February 2011 (UTC)
Done: Template categories containing articles (configuration). --MZMcBride (talk) 18:09, 6 February 2011 (UTC)
Thanks -- WOSlinker (talk) 18:18, 6 February 2011 (UTC)

Selectively archived talk page (again)

I've selectively archived this talk page again. If you want a new report or a tweak to an existing report, post (or re-post) here and someone will take care of it, eventually. --MZMcBride (talk) 17:30, 6 February 2011 (UTC)

Article categories containing templates

The new report Template categories containing articles (configuration) is good for some wikignoming but I was wondering if a report on the converse situation, i.e. where an article category contains a template, would also be possible (and useful)? Boissière (talk) 20:59, 17 February 2011 (UTC)

Such a report could be useful, but there are times when articles categories should contain a template (mainly when the topic of the template corresponds exactly to the topic of a topic category). See e.g. Category:Cars (film). -- Black Falcon 21:03, 17 February 2011 (UTC)
I am not sure that I agree and WP:PROJCATS implies that this should not be done. The reason being that the article categories are for readers whereas templates are usually only of interest to editors. Boissière (talk) 22:30, 17 February 2011 (UTC)
I would argue that it is helpful to readers when a navigation template (intended for readers, not editors) for a particular topic is placed in a category for the same topic (this is, in my opinion, the only circumstance when such "template pollution" is useful). In general, it is quite common for a limited amount of project content to be categorized directly into a content category when the topic of the content and the category is the same. For instance, stub categories always appear in content categories, sorted after the main alphabetical list under the character "µ". The situation is similar with Misplaced Pages books, which are sorted under the character "β", and templates, sorted under the character "τ". -- Black Falcon 17:58, 18 February 2011 (UTC)

Move watch

Here's a report I've made: Page moves grouped by user. Updated hourly, it groups all moves by user in the last 24 hours, week, or month. It's been very useful in catching pagemove sockmasters such as User:Tobias Conradi. --JaGa 18:58, 21 February 2011 (UTC)

Neat. :-) There are already a number of other Toolserver tools listed in the index on the subject-space page. Feel free to add a row for your tool/report if you'd like. I'm sure people would find it helpful. Cheers. --MZMcBride (talk) 23:24, 21 February 2011 (UTC)
Done. Thanks. --JaGa 09:44, 24 February 2011 (UTC)

Possible new report

I'm not sure how useful this would be, but a report like "talk pages that are redirects when parent article is not" may be helpful, particularly in finding poor moves or copy-paste articles. ennasis @ 02:49, 20 Adar I 5771 / 24 February 2011 (UTC)

Misplaced Pages:Database reports/Meta-Wiki rights changes

MZ, is there a way to make the "Comments" field do some sort of word-wrapping? If not, I forgive you. Killiondude (talk) 21:51, 25 February 2011 (UTC)

KD, you're a comment. And I forgive you. --MZMcBride (talk) 05:10, 26 February 2011 (UTC)
You are a god among men. Killiondude (talk) 05:46, 27 February 2011 (UTC)

Editors eligible for Autopatrol privilege (2)

How often is the list of Editors eligible for Autopatrol privilege updated ?

Monthly ?! *** in fact *** ( contact ) 14:47, 2 March 2011 (UTC)
Hah, it's configured for the 29th day of the month, too, so it missed February altogether. It should probably be changed to weekly. --MZMcBride (talk) 16:28, 2 March 2011 (UTC)

Would it be possible to filter it to exclude those who haven't created an article in the last month? ϢereSpielChequers 18:11, 2 March 2011 (UTC)

I tried that when I was creating the report, but failed, because the resulting query was too slow. I'm trying an alternative approach now: creating my own modified copy of MZMcBride's page creators table. Svick (talk) 22:10, 2 March 2011 (UTC)
Thanks, that would make it much more useful. ϢereSpielChequers 22:13, 2 March 2011 (UTC)

I should mention that we are still working on the original list from January. I am looking at automating more of the task and think breaking the it in to seperate parts may help. I started a thread at Misplaced Pages:Village pump (proposals)#Automating submissions for Autopatrol right. Everybody is encouraged to comment. Thanks. - Hydroxonium (talk) 13:27, 12 March 2011 (UTC)

A new version of the report, containing only those that created an article in the last 30 days is finally up. User<Svick>.Talk(); 11:09, 24 April 2011 (UTC)

Thank you, great work. - Hydroxonium (TCV) 13:45, 25 April 2011 (UTC)

Orphaned /to do pages

When someone moves an article to a new name, they may not notice that sometimes there was a /to do subpage off the talk page, which then ends up being orphaned since it is not moved with the page and talk page. So just wondering if a query which lists pages that are redirects where there is also a /to do subpage that exists would be a useful report? -- WOSlinker (talk) 20:54, 2 April 2011 (UTC)

It's similar to Misplaced Pages:Database reports/Orphaned talk subpages but with the root page being a redirect rather than not existing. -- WOSlinker (talk) 21:02, 2 April 2011 (UTC)
Or another way could be to list all talk subpages without links to or transclusions. -- WOSlinker (talk) 17:50, 4 April 2011 (UTC)
It seems there is quite a lot of such pages. First thousand is at Misplaced Pages:Database reports/Talk subpages with redirect parent. User<Svick>.Talk(); 03:36, 24 April 2011 (UTC)
Thanks. Just wondering, maybe better to only list those without any links to them. So adding the the following to the where clase might help. -- WOSlinker (talk) 11:17, 24 April 2011 (UTC)
and not exists (
  select *
  from pagelinks
  where pl_from=sub.page_id
  and pl_title<>'Database_reports/Talk_subpages_with_redirect_parent'
)
Right, done. User<Svick>.Talk(); 12:45, 24 April 2011 (UTC)
Thanks for handling this one, Svick.
This bit of the query looks strange:
select page_id
     from page
     where page_title = 'Database_reports/Talk_subpages_with_redirect_parent')
You're not specifying page_namespace here. That must make it much slower, no? (Actually, just tested, it's 6 seconds vs. 0, but still would be much better to specify here. It would also be nice to make it abstract and use the value from report_title if possible.)
There's a minor typo in the description text as well ("exclusing" --> "excluding"). Other than that, looks good. :-) --MZMcBride (talk) 18:02, 24 April 2011 (UTC)
Yeah, you're right. I was thinking that there's no chance that there will be a page with the same name in another namespace, but forgot about indexes. User<Svick>.Talk(); 19:00, 24 April 2011 (UTC)

Commons cat redirects

Could someone run a query for intersecting Commons:Category:Category redirects with the interwiki or iwlinks table. The table should have from (wikipedia title page), to(commons category), fix (commons category redirect target). The table could be read by a bot to resolve Misplaced Pages:BOTR#Check_commons_links_and_fix_cases_where_pages.2Fcategories_have_been_moved...not sure how often it should run.Smallman12q (talk) 11:51, 3 April 2011 (UTC)

Replied there. Seems like a stupid situation. --MZMcBride (talk) 00:15, 4 April 2011 (UTC)
Welcome to Misplaced Pages. ;-) Stupid situations will happen a lot; it's good to be able to identify where things have gone wrong and fix them, automatically if possible. Mike Peel (talk) 07:50, 4 April 2011 (UTC)

Red links

Could someone write a report for articles with red links, and templates with red links sorted by number of red links?Smallman12q (talk) 12:18, 7 April 2011 (UTC)

User:RussBot/Templates with red links may provide part of what you wanted. --R'n'B (call me Russ) 16:22, 29 June 2011 (UTC)

Redirects to sections that don't exist

For example. Is this feasible? --Closedmouth (talk) 05:41, 29 April 2011 (UTC)

Please yes. This is horribly annoying. Killiondude (talk) 05:45, 29 April 2011 (UTC)
I'm not sure I'd call needing an extra click "horribly annoying," but yeah, these should be fixed. At least a non-existent section still generally gets you to the appropriate page; it's not as though it returns an error page or something silly like that.
I'm kind of surprised WikiProject Check Misplaced Pages doesn't catch this already, but the extra requests needed to get the parsed section headers probably explain that. I'll try to get to this over the weekend. I have a thought or two about how to do it. --MZMcBride (talk) 06:30, 29 April 2011 (UTC)
I wrote this at Broken section anchors (configuration). There were about 700 results from the first 2500 redirects that contain an anchor checked. That's about 28% of redirects pointing a section that are affected, assuming the rate of false positives is low (which it seems to be). I'll try to get this report set up to update a bit more regularly in the coming days/weeks. For now, there's plenty of work to be done.
Briefly, I'll note that in a lot of these cases, the answer isn't to simply remove the anchor or find an alternate target header. Sometimes the answer is to point the redirect to a different page altogether or add an {{anchor}} to the target page. --MZMcBride (talk) 01:03, 2 May 2011 (UTC)
Poked at this a bit more. Updated the script to be much, much faster and to exclude a few false positives. Woo! --MZMcBride (talk) 02:09, 4 May 2011 (UTC)
^\s*#REDIRECT:?.*?\s*\]*?)(?:\|]+)?\]\] is the regex I used for parsing redirects, I suggest you use it. — Dispenser 22:39, 5 May 2011 (UTC)
I'm doing this task at dewiki (see: de:Benutzer:Krdbot/RedirectDeeplink) based on the dump . I think it could be quite easily adapted to en: if desired. --Krd (talk) 06:52, 29 April 2011 (UTC)
In the current dump there are 203808 redirects pointing to sections. A limit of 100 hits is reached after checking 813 of these, so maybe there are 25000 bad section redirects in total. (This ratio is corresponding to the one seen on dewiki.)
List created at User:Krdbot/RedirectDeeplink. --Krd (talk) 11:16, 29 April 2011 (UTC)
Is the script you're using public? I'm curious how you're parsing section headers. --MZMcBride (talk) 05:19, 30 April 2011 (UTC)
Currently it's not, simply because it grew from a first try to a working solution without emphasis on style and readability. The basic approach for section parsing is to parse the wikicode for known section markers (section headlines, Template:Anchor, etc.) and additionally check the html output for id tags if the first step doesn't match. (Two step approach because because step one hits 90% and the results can be held in a local cache easier than html output.) The code should be quite stable unless in the redirecting page there is a crazy mix of different encodings of special characters. --Krd (talk) 07:07, 30 April 2011 (UTC)
tools:~dispenser/cgi-bin/rdcheck.py (linked from What links here) It also supports generators so categories can be checked or the thing I added to WP:FAC. It now part of the checklist for FAC, though coaching may be needed. Human review is needed: 1) section was spun off into its own article, 2) a bot "fixed" a double redirect, 3) similarly redirects were never pointed to any section. There's some good code in dabfix used for comparing similar links. — Dispenser 05:53, 30 April 2011 (UTC)

Redesigning this project

As this project approaches 100 reports, there are some structural/design/maintability issues that should be addressed at some point. I put some notes at Misplaced Pages:Database reports/Redesign. If anyone has any comments/suggestions/etc., throw them on that page or this talk page or wherever. --MZMcBride (talk) 21:42, 30 April 2011 (UTC)

A caching system allowing user initiated report regeneration along with automatic regeneration. Might allow for simple queries (e.g. contribs tool) if we can work in user specified parameters. Run time histogram and statistics, so expensive reports aren't regenerated when no one is reading them. History of the past 5 report and size/row count. Support for multiple wikis and mirroring reports on wiki, useful for Page count by namespace. Include a dependance chain, e.g. report A require updated project_banner table done by report B. Custom CSS/JS, table filter, and sorting. Support for dump reports. Points system for games??? — Dispenser 21:41, 5 May 2011 (UTC)
Notification options: Email, RSS, or a bot edit (think {{LSR}}) for when reports complete. A refresher script (similar to books) for < 30 minute reports with a graph of previous completion times. — Dispenser 21:00, 3 June 2011 (UTC)
What do you mean? There is always a bot edit when a report completes. Or do you mean having common page for all reports? User<Svick>.Talk(); 22:11, 3 June 2011 (UTC)
Some from an earlier design document: configurable .ini-like python file (i.e. Framework approach), statistics query run time, JS highlighter (they're faster), Form fields have auto-complete and regex validation is applied client and server side.
Controls programs (themselves reports): report data file viewer, report directory list, activity stream, and administration. — Dispenser 22:16, 2 September 2011 (UTC)

Userspace links Update

Any chance of an update for Misplaced Pages:Database reports/Articles containing links to the user space, couple of weeks since the last one--Jac16888 23:40, 21 May 2011 (UTC)

It keeps getting killed for running so long, I think. It's a very expensive query. The report probably needs to be re-implemented using smarter logic.
One idea I had was to only check pages that have been edited in the past 30 days in the article namespace (using the recentchanges table). But even that is about a million queries. I'll have to give more thought about how to get this information efficiently. Dispenser, Svick, et al.: Do you have any ideas? --MZMcBride (talk) 04:32, 30 May 2011 (UTC)
P.S. We need a "Broken database reports (configuration)" report. (-;
You could try to split the query into two: one for User and the other for User talk namespace. But the most likely outcome would be that it's going to be even slower. Another option would be to do the distinct part in memory. (I'm assuming the pagelinks query is the slow one, not the templatelinks one.) User<Svick>.Talk(); 20:12, 30 May 2011 (UTC)

Gadget statistics

/* Gadget statistics - run time 1.5 hrs SLOW_OK */
SELECT up_property, COUNT(*)
FROM user_properties_anonym
WHERE up_property LIKE "gadget-%"
AND   up_value=1
GROUP BY up_property

With the same record keeping purpose as Page count by namespace, I propose the creation of a installed gadgets report (see previous: WT:Gadget#Usage-Stats). The table interesting properties the table includes are: gender, imagesize, diffonly, blank edit summary warning, recent change/watchlist flags bot, minor. We may also want to ask to derived columns such as user_active and a binned edit count. — Dispenser 00:02, 26 May 2011 (UTC)

User preferences (configuration) --MZMcBride (talk) 22:06, 29 May 2011 (UTC)
Skin notes: 0=default, 1=nostalgia, 2=cologneblue; amethyst introduced 1.4. removed 1.5; major of users don't have a skin set they are not counted, thus they are the difference in skin total and the current number of users. — Dispenser 05:32, 11 July 2011 (UTC)

Repeated {{citation needed}}

Could we get a list of pages where a {{citation needed}} is tagged itself with {{citation needed}} as in Facebook_features#Networks.2C_Groups.2C_and_Like_Pages. Smallman12q (talk) 23:17, 1 June 2011 (UTC)

Hmm, you actually want a dump report. In order to do something like this, you need to scan the wikitext of every article (from a database dump) and look for that particular string using regular expressions. Of course then you also have to account for redirects and other bizarre shit that can appear within the templates themselves (like sub-templates and magic words and parser functions).
I was going to do a similar database report (for repeating newlines, actually). I can probably just do this one when I'm doing the other one. If you're bored, it'd be helpful to have a list of strings to test a regular expression against. I'll put some examples below so you can see what I'm talking about. --MZMcBride (talk) 00:03, 2 June 2011 (UTC)
I'm doing this right now using the dump from April 5. Tim1357 17:55, 25 June 2011 (UTC)
Apparently Smackbot was going to fix these, but for some reason didn't. Tim1357 23:53, 25 June 2011 (UTC)
{{citation needed|reason=foo}}{{cn|date=2010-04}}
{{cn|date=2009-02|reason={{PAGENAME}} sucks}}   {{citation needed}}

Implementation notes

Just so I don't forget, what's really needed here is a generic function that can take wikitext input and output pairs of "grabbed templates" and their respective positions within the wikitext. It needs to be able to catch odd cases like {{foo|bar={{baz}}}}, but once it reasonably can, it should be trivial to take each position, subtract it from the subsequent one, and then output only templates that are really close to each other (matching on template name too, of course). A generic template_grabber function would resolve this report and be very helpful for other projects (including the next step, which is taking the grabbed templates and then parsing their parameters). This isn't very complicated, but it will require a few hours of dedicated development time to do properly. --MZMcBride (talk) 01:33, 2 June 2011 (UTC)

AWB uses the regex {{(?>+|\{(?<DEPTH>)|\}(?<-DEPTH>))*(?(DEPTH)(?!))}} to match top-level templates, that can have other templates nested inside them. Although I never understood how it works, and it's probably .Net-specific. User<Svick>.Talk(); 22:07, 2 June 2011 (UTC)
Hey MZMcBride. I'm not sure if this is exactly what you were looking for, but I wrote this template parser a while back to parse templates. It can handle an infinite number of nested templates (ect:{{foo|bar={{baz|foo={{bar}}}}}}). It can also handle embedded images and wikitables. The code is below. Hope it helps at all. Tim1357 22:49, 20 June 2011 (UTC)
Template Parser
spaces=' \t\n\r'
chars=
class Template():
        def __init__(self,text=None):
                '''The parameter "text" is a string that starts with "{{". The parser will automatically find the end of the template'''
                self.start='{{'
                self.end='\n}}'
                self.wrapper=lambda g,p=None:'\n | %s = %s'%(p,g) if p!=None else '\n | %s'%g
                if text != None:
                        self.text=text
                try:
                        self.text=self.text.strip()
                except:
                        raise TypeError('The Class: "Template" requires text be passed to it.')
                self.r=None
                self.cursor=0
                cursor=0
                m = len(self.text)-1
                while 1:
                        if '|'== self.text :
                                self.standard()
                                if self.r!=None:
                                        self.r=iter(self.r)
                                        te=
                                        for p,g in self.r:
                                                if p !=None:p=p.strip()
                                                te.append((p,g.strip()))
                                        self.r = te
                                        del te
                                break
                        elif self.text=='}}':break
                        cursor+=1
                        if cursor>=m:
                                raise TemplateSytnaxError('Could Not find Template Start')
                self.name=self.text.strip('{}|')
        def standard(self):
                max=len(self.text)
                cursor=0
                r=
                try:
                        while self.text!='|':cursor+=1
                except IndexError:
                        self.r=None
                        return
                cursor+=1
                last=cursor
                while cursor<max:
                        n, cursor =self.checkname(cursor=cursor)
                        last=cursor
                        while True:
                                if self.text=='{{':
                                        cursor = self.untill(cursor=cursor+1)
                                elif self.text=='[[':
	                                        cursor = self.untill_img(cursor=cursor+1)
                                elif self.text=='}}':
                                        r.append((n,self.text))
                                        self.r=r
                                        self.text=self.text
                                        return
                                elif self.text=='|':
		                                        r.append((n,self.text))
		                                        cursor =cursor+1
		                                        last=cursor
		                                        break
                                cursor+=1
                                if cursor>=max: break
				self.cursor=cursor
                raise 'No ending brackets!'
        def checkname(self,text=None,cursor=0):
                if text ==None: text=self.text
                n=cursor
                l=cursor
                while text in spaces:cursor+=1
                while text in chars: cursor+=1
                n=text
                while text in spaces:cursor+=1
                if text=='=':
                        cursor+=1
                        return (n,cursor)
                return (None,l)
        def untill_table(self,text=None,cursor=0):
            if text ==None: text=self.text
            s=cursor
            st=1
            while True:
                    if text == '|}':
                            st-=1
                    elif text == '{|':
                            st+=1
                    elif text == '{{':
	                               cursor=self.untill(cursor=cursor+1)
                    elif text=='[[':
	                        	cursor=self.untill_img(cursor=cursor+1)
                    cursor+=1
                    if st==0:
						return cursor
                    if cursor >=len(text):
						return cursor
        def untill_img(self,text=None,cursor=0):
		                if text ==None: text=self.text
		                s=cursor
		                st=1
		                while True:
		                        if text == ']]':
		                                st-=1
		                        elif text == '[[':
		                                st+=1
		                        elif text == '{{':
				                               cursor=self.untill(cursor=cursor+1)
		                        elif text == '{|':
			                               cursor=self.untill_table(cursor=cursor+1)
		                        cursor+=1
		                        if st==0:
									return cursor
		                        if cursor >=len(text):
									return cursor
        def untill(self,text=None,cursor=0):
                if text ==None: text=self.text
                st=1
                while True:
                        if text == '}}':
                                st-=1
                        elif text == '{{':
                                st+=1
                        elif text=='[[':
                        	cursor=self.untill_img(cursor=cursor+1)
                        elif text == '{|':
	                               cursor=self.untill_table(cursor=cursor+1)
                        cursor+=1
                        if st==0: return cursor
                        if cursor >=len(text):return cursor
        def old(self):
			return self.text
        def __dict__(self):
			return dict(self.r)
        def __getitem__(self,key):
			a=dict()
			assert a.has_key(key.lower()), KeyError('\'%s\''%key)
			return a
        def __delitem__(self,key):
			'''Usage: del TemplateObject'''
			for k,v in self.r:
				if k is None: continue
				if k.lower()==key.lower():
					self.r.remove((k,v))
					return
			raise KeyError('\'%s\''%key)
        def __iter__(self):
        	return iter(self.r)
        def __str__(self):
			if self.r == None:
				return self.text
			i=self.start+self.name      
			for p,g in self.r:
				i+=self.wrapper(g,p)
			i+= self.wrapper(self.image,'image')
			i+=self.end
			return i

Report not updating

Can someone please determine why this is no longer updating each month? Misplaced Pages:Database reports/Stubs included directly in stub categories. I'm guessing maybe an error in cron syntax, but I don't see the problem. Dawynn (talk) 13:29, 16 June 2011 (UTC)

"Someone," heh. cron errors are usually caught at insert/update. I have an e-mail from cron that says that the query was interrupted. This was likely due some kind of hiccup or (more likely) the query ran too long and got killed. The e-mail output is below. Honestly, this report shouldn't be a database report at all. It's using a rather hackish system that results in a lot of false negatives, as I recall. It would make a lot more sense as a dump report. Maybe I'll get around to rewriting it at some point. "Dump reports" are currently halted due to some kind of horrible speed issue in the current rewritten script, though. Until the speed problems can be addressed (in a demonstrable way), there won't be any new dump reports. Fun fun fun fun (lookin' forward to the weekend). --MZMcBride (talk) 15:16, 16 June 2011 (UTC)
Your "cron" job on willow
PYTHONPATH=$HOME/scripts python $HOME/scripts/database-reports/directstubs.py > /dev/null
produced the following output:
Traceback (most recent call last):
  File "/home/mzmcbride/scripts/database-reports/directstubs.py", line 72, in <module>
    ''')
  File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/cursors.py", line 173, in execute
    self.errorhandler(self, exc, value)
  File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.OperationalError: (1317, 'Query execution was interrupted')

Highly used template redirects

Suggestion: redirects to highly visible templates which aren't protected, perhaps sortable by number of transclusions. Is that possible? Par exemple: Template:Infobox Organization --Closedmouth (talk) 16:31, 17 June 2011 (UTC)

They are included in Misplaced Pages:Database reports/Unprotected templates with many transclusions which covers all templates, not just redirects. Infobox Organization is 1022 on the list. -- WOSlinker (talk) 18:11, 17 June 2011 (UTC)
Thank you. --Closedmouth (talk) 06:10, 18 June 2011 (UTC)
Suggestion: you're the worst. --MZMcBride (talk) 20:48, 17 June 2011 (UTC)
Your mother. --Closedmouth (talk) 06:10, 18 June 2011 (UTC)
no u hare j 18:54, 18 June 2011 (UTC)

Database for Infoboxes

Hi Guys, I made a SQLite3 database that indexes all the infoboxes used in the article space. I was finding that I was doing quite a bit of work with the API just to find a parameter in an infobox, so I decided to create a database to save time and bandwith. I was wondering if you think the database (about 60/90 MB with/without indexes, uncompressed) would be helpful here. If you think it might come in handy, I'll upload it to the MySQL database. Cheers. Tim1357 18:44, 25 June 2011 (UTC)

Oh, here's the schema (if it matters):
sqlite> .schem
CREATE TABLE InfoboxParameters (Infobox INT, Arg Varbinary, Value Varbinary, Article Varbinary);
CREATE TABLE Infoboxes (Name Varbinary,ID int unsigned, PRIMARY KEY(ID));
CREATE INDEX Infobox_Arg_Index on InfoboxParameters (Infobox,Arg);
Tim1357 18:46, 25 June 2011 (UTC)
Will you be keeping it more up to date than Templatetiger, which is currently 6 months old? — Dispenser 19:28, 25 June 2011 (UTC)
Holy crap! That's awesome. Do we have query access to the database? Tim1357 00:01, 26 June 2011 (UTC)
(We do) Tim1357 16:28, 27 June 2011 (UTC)

WikiProjects by size

Could we add a report on the largest WikiProjects by the number of articles tagged with a project's banner? The data for each project is available on the toolserver when "All" is clicked but there is no option to sort the projects by number of articles. Once each month or two would probably be frequent enough for this task. This report would be useful for size comparisons and descriptions in the Signpost's WikiProject Report. -Mabeenot (talk) 23:56, 26 June 2011 (UTC)

Erm, here goes:
SELECT p_project, p_count,p_qcount,((p_qcount+.0)/p_count)*100 as percent_qualityAssessed ,p_icount,((p_icount+.0)/p_count)*100 as percent_importanceAssessed 
FROM u_titoxd_wp10_p.projects
ORDER BY p_count DESC;

The output (after I formatted it a bit) is here. The report is very lightweight and should be no problem to run monthly. Tim1357 04:33, 13 July 2011 (UTC)

Thanks! -Mabeenot (talk) 22:25, 13 July 2011 (UTC)
Whoops, it looks like I was using the wrong (out-dated) database. I think the one in use now is u_enwp10. I don't have read-access to that database right now, but I asked CBM (the maintainer of the project) for access. Tim1357 14:46, 14 July 2011 (UTC)
He generated the report for me. Is this better? Tim1357 18:49, 15 July 2011 (UTC)
Much better. Thanks for your hard work. -Mabeenot (talk) 20:58, 18 July 2011 (UTC)

Images without FUR report

Please consider addding {{Non-free image data}} and ((tl|Non-free image rationale}} Images to the exclusions list for this report, if not already done so . Sfan00 IMG (talk) 10:48, 27 June 2011 (UTC)

Redirects containing red links

The Misplaced Pages:Database reports/Redirects containing red links report is flooded with talk pages that contain templates based on the {{WPBannerMeta}} meta-template, which automatically create links to "/Comment" subpages whether the subpage exists or not. In other words, Talk:TOPIC ends up in the report because it "contains" a red link to Talk:TOPIC/Comments, although the redlink actually only exists in the template code and wouldn't be visible to readers even if the talk page were not a redirect. Suggested tweak: exclude any link to a page in namespace 1 whose title ends in "/Comments" from the report. --R'n'B (call me Russ) 14:06, 29 June 2011 (UTC)

User talk pages which redirect to another namespace

Just what it sounds like. Maybe monthly? Some new users redirect move their user pages (and associated talkpages) to the mainspace when they are done drafting an article in the userspace. However, that means that automated tools like Twinkle which leave a note at the user's talk page are actually redirect to the article talk page. These user pages need to be un-redirected and having a list of them should make that easy. –Drilnoth (T/C) 21:48, 20 July 2011 (UTC)

Inactive users report

Is there any reason why some usergroups are listed twice? For example:

No. User Last edit User groups
863 LawBot 20101017062645 bot,bot
1491 Tb 20110115193417 rollbacker,reviewer,rollbacker,reviewer,rollbacker,reviewer,rollbacker,reviewer

Is this a bug, or an undocumented feature? :-) ennasis @ 06:31, 20 Tamuz 5771 / 22 July 2011 (UTC)

Yeah, it is a bug. It only happens when the two (or more) newest contributions of an user have the exact same timestamp. User<Svick>.Talk(); 23:13, 22 July 2011 (UTC)
That would do it. :-) Thanks for the info. ennasis @ 05:08, 25 Tamuz 5771 / 27 July 2011 (UTC)

Most viewed pages

Maybe it is not the best place, but is it possible to generate list of 1000 pages with most views in last year? Bulwersator (talk) 09:23, 22 July 2011 (UTC)

Remove some reports now that PC is gone?

Now that Pending Changes has ended, shouldn't we remove Misplaced Pages:Database reports/Potential reviewer candidates? — Train2104 (talk • contribs • count) 15:41, 23 July 2011 (UTC)

PC has ended, but it has not been turned off and there is no guarantee that it will not be used again. I think this is probably a good report to keep until the future of PC is clear. That said, we could probably move the updating to once or twice a month until that time. --Izno (talk) 05:23, 27 July 2011 (UTC)

Re-used image names

I would like a list of filenames -- both on en.wikipedia and on Commons - that have more than one image attached to them due to accidental overwriting, so as to figure out which ones need splitting into separate images.

This will probably generate a huge amount of false positives; it can be reduced by eliminating images which are uploaded by authorized image-modification bots like RotateBot, and by eliminating later images uploaded by the original uploader. It will still, however, require human attention to tag them for splitting. DS (talk) 01:08, 30 July 2011 (UTC)

File have hashes so we can detect if they're identical. I would think that somebody has already written this... — Dispenser 14:37, 30 July 2011 (UTC)
If so, then where is it? Also, hashes won't allow us to detect if (image 2) is a cropped version of (image 1), will they? DS (talk) 18:27, 30 July 2011 (UTC)

Pages which have been semi-protected the longest

I'm not sure if this is possible, but what would be helpful would be a list of semi-protected pages by how long they have been semi-protected. It might be useful to see if those which have been semi-protected the longest could afford to be unprotected. –MuZemike 16:31, 2 August 2011 (UTC)

We have this for articles (also here) and talk pages, if you sort by timestamp, you can view them chronologically. ennasis @ 22:02, 4 Av 5771 / 4 August 2011 (UTC)

Images identified as lacking NFUR

Would it be possible to add {{Has-NFUR}} to the list of templates on an image which the report skips over?

Also {{Standard rationale}} and {{short rationale}} were removed a while back and should thus not be being used to skip images.

Sfan00 IMG (talk) 10:28, 22 August 2011 (UTC)

What images would this new template be used on? –Drilnoth (T/C) 12:48, 23 August 2011 (UTC)
Images that meet NFCC criteria but which uses a template form not currently recognised by the report concerned.

Such as rationales written directly as text.

Sfan00 IMG (talk) 10:50, 24 August 2011 (UTC)

Couldn't most of those be converted to using templates? I'm not opposed to the new template, I'm just wondering if it is redundant. –Drilnoth (T/C) 16:28, 24 August 2011 (UTC)

Inactive bots

Further to Misplaced Pages talk:Bot Approvals Group/Archive 7#Bots & long term inactivity, could we have a report updated once monthly or quarterly that lists accounts with the 'bot' flag that have no edits or logged actions in over 1 year, sorted by activity with longest inactive at the top? (Am trying to think if there is an easy way to identify read-only bots that should not have any edits or logged actions, will let you know if I come up with anything.)

If possible can you identify the owner and list their last edit or logged action in the table? –xeno 13:29, 2 September 2011 (UTC)

Articles containing overlapping coordinates - missed update

I notice that Misplaced Pages:Database reports/Articles containing overlapping coordinates has not been updated in over nine days. Does anyone know why? —Stepheng3 (talk) 18:44, 17 September 2011 (UTC)

It got killed by the query killer, I imagine. Or there was a hiccup somewhere. Does it matter? I looked through a few of the pages in the list. There's plenty of work to be done (lots of cases where the coords overlap exactly). --MZMcBride (talk) 18:53, 17 September 2011 (UTC)
I was just worried it might indicate a persistent issue with report updating. I agree there's plenty of work to do. —Stepheng3 (talk) 21:13, 17 September 2011 (UTC)
Oh, no. The Toolserver has been more overloaded lately, so the Toolserver administrators have implemented (or re-implemented, I guess) a "query killer." I haven't written much about it anywhere, but it's impacting half a dozen or so reports. If the replication lag on the server increases, the query killer gets more aggressive. Some very slow queries in some reports are now being killed before completing (emptycats.py, userlinksinarticles.py, templatedisambigs.py, etc.). Some of these reports will probably have to be rewritten. A few may have to be disabled. I'm not sure how many people realize how many reports there are nowadays (over 100 at this point, I think). It's a bit annoying to maintain all of them by myself. :-) --MZMcBride (talk) 21:52, 17 September 2011 (UTC)
Thanks. I'm learning SQL right now. At some point I may apply for a toolserver account and try to help you with reports. Until then ... —Stepheng3 (talk) 21:57, 17 September 2011 (UTC)
For what it's worth, getting a Toolserver account was pretty much how I learned SQL. There's no better place, really. You have access to small and large constantly updating replicated databases that are set to read-only, so you can't break them even if you wanted to. It's a great place to play around with syntax, learn conventions, etc. I wrote some docs at tswiki:Queries and tswiki:Database access at some point that are helpful for new users. (In general, I think there's a somewhat glamorized view of Toolserver users. Most of them have no idea what they're doing.)
All of this is largely unrelated to database reports, though. They're run from the Toolserver, but the master copy of the reports code is on GitHub. A few people have access to the repo right now (Svick and maybe bjweeks?), but it'd be nice to have more people on it, assuming they won't break anything. :-) --MZMcBride (talk) 22:05, 17 September 2011 (UTC)

The status of this backlog has changed radically in the past few weeks, thanks to my colleague Redrose64. I'd like this report to become more frequent now. Perhaps twice a week? —Stepheng3 (talk) 00:41, 9 November 2011 (UTC)

I posted a real-time version of the report using catscan2 at WP:GEO/TODO. — Dispenser 18:54, 9 November 2011 (UTC)
That's helpful. Thanks, Dispenser! —Stepheng3 (talk) 20:02, 9 November 2011 (UTC)
Dispenser: <3.
Stepheng3: With the number of rows decreasing per report, I'd actually be a bit more inclined to make the updates fortnightly or something, to increase the value of an update. But for now, I think weekly is fine. The whole point is kind of moot if it's possible to get a dynamic list, which given the fact that this is just the intersection of two queries, it's really not very difficult. catscan seems a bit slow, but there's no rush. :-) --MZMcBride (talk) 03:59, 10 November 2011 (UTC)
Understood. —Stepheng3 (talk) 21:10, 10 November 2011 (UTC)

Local media description pages for F2 deletion.

I've been going through the following list over on Toolserver : http://toolserver.org/~magnus/image_pages_without_image.php?language=en&project=wikipedia and very slowly weeding out local media description pages that meets F2 criteria..

Would it be possible to have a report that compiles a list of media files for which F2 is applicable namely?

  • Page is in File: namespace
  • Page is a local description page for which the actual media file is on Commons.
  • Page does not contain templates:

Having such a report would help speed up what I am attempting to do from the toolserver list by not including media description pages which do need to be local..

Sfan00 IMG (talk) 21:54, 18 September 2011 (UTC)

Could you list all of the templates you want excluded? User<Svick>.Talk(); 17:29, 20 November 2011 (UTC)

Short filenames

To be run on a weekly basis , find media files with Very short filenames.

See: User:Tom_Morris/todo#File_name_length_query

Sfan00 IMG (talk) 14:04, 28 September 2011 (UTC)

Grep tool /^.{0,9}$/Dispenser 15:06, 28 September 2011 (UTC)

Monthly reports not being updated?

I noticed that some monthly reports show the last updates in April and July. Looks like something is slipping through the cracks. Also long stubs shows as having been updated a few days ago, but all on get on the wiki is the October 21 version. Vegaswikian (talk) 19:20, 31 October 2011 (UTC)

SPA Account edits

Can a list of articles which are more than 4 months old for which editors that have edits only to that article make up more than 1/3 of the total editors of the article be made? Smallman12q (talk) 01:53, 12 December 2011 (UTC)

Duplicate licensing sections in file

Could a list of files which have 2 or more "Licensing" sections be made?Smallman12q (talk) 20:56, 15 December 2011 (UTC)

I could make one to have two search for two different Licensing templates on the same page, but otherwise that would require a much more intensive dump scan. Tim1357 16:02, 23 December 2011 (UTC)
You could make the scan more general to find two identically( or almost identically) named sections on image files. It would probably require a dump scan...but it seems like the script for the scan should be very short.Smallman12q (talk) 04:00, 24 December 2011 (UTC)

Images with conflicting Categorization

Misplaced Pages:Database_reports/Files_with_conflicting_categorization

I'm finding that a number of these are

  • User taken photos of art - being tagged as {{photo of art}}
  • Images where the conflict is because of an NFUR block on an image with a nominally 'free' license, beign tagged as {{NFUR not needed}}


Would it be possible to drop images with the relevant templates present from the report, as they have already been identified and put in process?

Sfan00 IMG (talk) 12:27, 22 December 2011 (UTC)


Add file date sort option to Unrationaled files report

Would it be possible to add a file-date sort option to : Misplaced Pages:Database reports/Non-free files missing a rationale so that older uploads can be handled in a more sensitive manner? Sfan00 IMG (talk) 00:24, 23 December 2011 (UTC)

Page count lists

I have recently learned about WP:DBR. I saw Misplaced Pages:List of Wikipedians by article count, which is not currently included at DBR, and wondered if it would be worth producing similar reports for template space, image space, category space and project space. Also, since the report ingores redirects (and there are more redirects than non-redirects in mainspace), maybe a redirect list could be produced too. I think these reports could be produced weekly and listed at DBR. I have tried asking about this at WP:VPT and WP:HELPDESK.--TonyTheTiger (T/C/BIO/WP:CHICAGO/WP:FOUR) 15:09, 23 December 2011 (UTC)

Aduiting file attribution requirement

CC-By-SA files have a requirement for attribution by URL. Lots of people forget this when they use |link=|. The query below doesn't work since usability team decided to merge the file list into WhatLinksHere. Although a dump report is still possible.

SELECT il_from, il_to, include.tl_title
FROM page AS image
JOIN imagelinks     ON il_from!=0      AND il_to=page_title
LEFT JOIN pagelinks ON pl_from=il_from AND pl_namespace=page_namespace AND pl_title=page_title
LEFT JOIN templatelinks AS exclude ON
      exclude.tl_from=page_id
  AND exclude.tl_namespace=10
  AND exclude.tl_title LIKE "PD-%"
JOIN templatelinks AS include ON
      include.tl_from=page_id
  AND include.tl_namespace=10
  AND include.tl_title LIKE "Cc-by-sa%"
WHERE image.page_namespace=6
AND pl_from IS NULL
AND exclude.tl_from IS NULL
LIMIT 550;

Dispenser 19:58, 8 January 2012 (UTC)

'Non-article' exception to the empty-category DBR

The 'non-article' exception to Misplaced Pages:Database reports/Empty categories probably can be removed. It appears that there are no longer any empty categories containing "non-article", and any that still exist or come into existence should be identified and handled. -- Black Falcon 09:28, 13 January 2012 (UTC)

"It appears that there are no longer" - but it may appear again Bulwersator (talk) 10:04, 13 January 2012 (UTC)
That's true, of course, but we should know about any that do appear so that they can be populated or deleted. 'Non-article' has almost wholly been superseded by 'NA-Class', so there is no good reason for empty 'non-article' categories to remain undisturbed. -- Black Falcon 18:49, 13 January 2012 (UTC)

The first order of business is fixing this report all together- It's supposed to be updating daily but hasn't for over 9 days. Removing that exception seems reasonable to me, however. VegaDark (talk) 18:57, 13 January 2012 (UTC)

I noticed that, too, but I couldn't figure out the reason: BernsteinBot is still operating and the configuration for the report hasn't been altered since 2009. -- Black Falcon 21:53, 13 January 2012 (UTC)

There are only two remaining 'non-article' categories. Neither one is empty and one I've already nominated for deletion, so I have proceeded with removing (diff) the 'non-article' exception from the configuration page. -- Black Falcon 03:12, 26 January 2012 (UTC)

Abandoned articles thingy

Over at Bot Requests, Baa made a proposal about something that could find articles with the least amount of edits in X years. They suggested I take it over here since toolserver can apparently handle it better. Ten Pound Hammer01:43, 8 February 2012 (UTC)

Before anyone here starts work on this, I've already replied at WP:BOTREQ. Tim1357 04:48, 8 February 2012 (UTC)

Report not updating

WP:DBR/Non-free files missing a rationale is not updating, It's giving blank results :(

Sfan00 IMG (talk) 17:23, 21 February 2012 (UTC)

Maybe one day...

  • Oversized non-free files (configuration) — figure out what state this is in (no notes, I guess?) and fix it
  • user subpages recently where the account didn't exist or was deleted so would it be possible to see a report of subpages for nonexistent accounts?
  • subpages of indefinitely blocked accounts.
  • a listing of IPs with subpages

--MZMcBride (talk) 02:53, 22 February 2012 (UTC)

If I may, I'd like also to re-request one for red-linked categories with significant incoming links (i.e., links from all namespaces except User:, Misplaced Pages: and the talk namespaces). This is an ongoing issue for the editors who operate WP:CFD/W and such a report would be extremely helpful. -- Black Falcon 21:16, 22 February 2012 (UTC)
Reasonable enough. I replied up there. --MZMcBride (talk) 04:59, 23 February 2012 (UTC)

Looking for Reports on Bot Data

Hello. Some of you might have run into me before...I am doing a research project on bots, bot operators, and technical tools on WP and WM projects. I'm wondering if anyone wants to tackle this problem, which would help me out tremendously. I am looking for stats and data on bots, especially over time. Things like:

  • (#) of bot accounts registered over time (by month would be fine) (on English WP)
  • (#) of bot edits over time (on English WP)
  • (#) of BRFA approved and not approved over time (on English WP only, obviously)
  • same trends for bot use on other language versions (which would be a bonus)

I've found some info on these things spread around WP, but nothing that is both up-to-date and reasonably accurate/reliable. If you're interested in investigating this with me, I'd really appreciate the help (I don't have the technical knowledge base personally to find this data). Please let me know here or on my talk page.

And if you're a bot operator or Wikimedia developer (or someone who deals with the technical infrastructure of WP) and you'd like to be interviewed, please see my call here.

Thanks! UOJComm (talk) 20:14, 25 February 2012 (UTC)

I think you might also want to know how many bot accounts are active (i.e. being used) and how active they are (edit count) over time. The count of bot accounts will just rise and rise, as it's unusual for a bot account to have the flag taken away. Josh Parris 22:42, 25 February 2012 (UTC)
Apparently there has been ~800 bots ever and currently there are only ~650 tagged. I'm working on the first one right now. Tim1357 23:05, 25 February 2012 (UTC)
The first one is done . For the second one, did you want # of bot edits to date, or # of bot edits that week/month/day? Tim1357 23:52, 25 February 2012 (UTC)
Hi Josh Parris and Tim1357...Thank you both for your comments and your work. Josh...yes, stats on active versus inactive bots would help. Tim...this looks great (thank you!) Can you briefly explain to me how you got these numbers? For the second bullet point, I meant # of bot edits per month and year, though knowing the total # of bot edits to date would also be helpful (and in turn, what percentage of all edits that represents). Are you also able to do this for other language versions? Thank you again for the help! UOJComm (talk) 22:29, 27 February 2012 (UTC)

Large non-free files DBR

Category:Non-free Misplaced Pages file size reduction request recently was renamed to Category:Misplaced Pages non-free file size reduction requests. I'm not sure if the change affects the Large non-free files (configuration) database report; I could not find anything in the configuration page, but thought it best to report the change nonetheless. -- Black Falcon 22:45, 16 March 2012 (UTC)

Configuration page is definitely old. The category name needs to be adjusted in the code. Bleh. --MZMcBride (talk) 21:59, 17 March 2012 (UTC)

List of superseded images still used in the article name space

Continued from Misplaced Pages:Village pump (technical)#Searching for images in a category which are in use in main space

The categories Images made obsolete by a PNG version and Misplaced Pages images available as SVG contain files which have been superseded. Many of these images have been replaced in articles with their superseded versions, but there are a lot of articles which still need to be updated to use superseded images. I would like to have a list of files from each category that are still used in main space, preferably sorted by the number of articles the file is used in. I can't seem to find a way to do this with automated tools such as AWB or CatScan, but if there is a way I'd be happy to do it myself if possible. These lists would help me with the running of my bot (DanhashBot). Can anybody write and run database queries for these two lists, or explain how I can use an automated tool to compile such a list myself? Thanks! —danhash (talk) 00:57, 1 April 2012 (UTC)

Dom says that https://toolserver.org/~magnus/glamorous.php can do what you want to do, but it only works for Commons currently. You'll need to poke Magnus to add a database selection option.
What you're asking about is fairly trivial to do. It's just a matter of joining against the imagelinks table, pretty much. --MZMcBride (talk) 05:46, 1 April 2012 (UTC)
It would be much easier to have this information in the form of a database report. Glamorous seems to be able to provide (all or some of) the information I'm looking for, however it lists every language Misplaced Pages and is in a format that is very hard to use for this task. Any possibility of a new database report being created for images in use on the English Misplaced Pages? —danhash (talk) 18:04, 5 April 2012 (UTC)
I asked at jira:DBQ and Hoo man ran a query for me. Any possibility of adding his query as a report here? —danhash (talk) 14:14, 10 April 2012 (UTC)

Do you have a report name and update frequency in mind? --MZMcBride (talk) 03:35, 11 April 2012 (UTC)

"Superseded files used in articles". Weekly, at least at first, would probably be fine. Also, if the report could be configured to include Commons categories as well (commons:Category:PNG version available and commons:Category:Vector version available) that would be extremely helpful. —danhash (talk) 14:31, 11 April 2012 (UTC)

So what I'm imagining is a report with four sections. It would look something like this:

Superseded files used in articles; data as of ...
== Images made obsolete by a PNG version ==
| No. | File | Uses
== Misplaced Pages images available as SVG ==
| No. | File | Uses
== PNG version available ==
| No. | File | Uses
== Vector version available ==
| No. | File | Uses

Would that work? I think this should take about five minutes to create. --MZMcBride (talk) 18:00, 11 April 2012 (UTC)

That seems great. It'd be convenient if the list was descending-sorted by the "Uses" column. —danhash (talk) 18:46, 11 April 2012 (UTC)
Right. That's easy enough to do. The output will have to be truncated or paginated, though. I just looked and two of the four categories are very large (commons:Category:Vector version available contains over 47,000 files). Any thoughts on this? --MZMcBride (talk) 18:53, 11 April 2012 (UTC)
How many of those 47,000 are used in articles? I'd think that the list wouldn't be so long that reasonable pagination would be a problem, but if so the report could be limited to the top 300 or so files (or any other number that seems reasonable). —danhash (talk) 19:57, 11 April 2012 (UTC)
A lot, apparently. I set a hard limit of 1000 rows. This report is now written: Superseded files used in articles (configuration). It still needs to be properly incorporated into Misplaced Pages:Database reports, but I'll do that later. --MZMcBride (talk) 23:46, 11 April 2012 (UTC)
Hmm, I missed the "in articles" bit in writing this. Fuck. --MZMcBride (talk) 23:47, 11 April 2012 (UTC)
Surely an easy fix? Josh Parris 23:42, 16 April 2012 (UTC)
Eh, I don't think it'll be awful. I was waiting for feedback about the current list. Is this holding up the bot request? --MZMcBride (talk) 23:52, 16 April 2012 (UTC)
That's my reading of what the op is saying. But don't feel pressured by that. Josh Parris 00:47, 17 April 2012 (UTC)
I have the original two lists that I got from jira:DBQ in a sandbox, but I can't sort them by number of uses. Sorting is not technically necessary, but on images with a small number of uses (especially those with just one or two uses) it is easier to update articles manually as it's not worth setting up AWB. I want to start with files with the highest number of uses, and with the current list it's not easy to find those files. There are other solutions I'm sure (for example I could ask Hoo man to re-run the query and sort the output (should have thought of that when I first asked him)), it just seems like a database report here would be the easiest to use and the simplest to update, but if it's too difficult or time consuming to do I can certainly find another way. —danhash (talk) 14:16, 17 April 2012 (UTC)

(unindent) Your reply reads as though you missed Superseded files used in articles (configuration). --MZMcBride (talk) 21:59, 17 April 2012 (UTC)

Ummm apparently I made a mistake. When you said you missed the "in articles" bit, I thought you meant that you listed the images in each category, but forgot to limit the list to images that were used in articles, so that the database report was a copy of the images in each category. But that was a total misunderstanding on my part and I'm really not sure how I got confused like that or how I didn't figure it out sooner; sorry about that. When you said you missed the "in articles" bit, did you mean that the uses column counts all uses, not just article-space uses? —danhash (talk) 13:22, 18 April 2012 (UTC)
Yes, sorry. The current "uses" column accounts for links from any namespace. I'll try to fix this tomorrow.
The Commons component made this report significantly more complex. Because now we're querying Commons for the file names, but we have to set the scope to only English Misplaced Pages uses, otherwise you end up looking at usage on Commons (or Wikimedia-wide file usage). --MZMcBride (talk) 03:20, 19 April 2012 (UTC)

Watched pages

Is there some way to run the list of 10,000 or 100,000 most-read articles from grok.se and then use MZM's script from Misplaced Pages:Database reports/Most-watched pages to see which have 5 or fewer watchers? It could be a first-step solution to Wikipedia_talk:Special:UnwatchedPages#Fix_Unwatched_Pages_technical_issues_and_de-restrict_visability. MBisanz 16:47, 13 April 2012 (UTC)

I can provide a list of unwatched by active users of the 10,000 most linked pages. However I doubt with watchlist bankruptcies and fatigue of veteran editors that we'll get 500 useful active watchers. — Dispenser 21:01, 13 April 2012 (UTC)
It's still worth a shot, thanks. MBisanz 16:17, 14 April 2012 (UTC)
The query ran for about 50 minutes. We have 507,131 articles with no watchers and another 382,684 articles with no active watchers. Active users are those who's user_touched is within the last 30 days, same as $wgRCMaxAge (It was more accurate before automatic log out time was increased from 30 to 180 days). Per Toolserver Rules I can't give this list away. — Dispenser 05:25, 15 April 2012 (UTC)
Thanks! So, as I understand it, that means 900,000/3,900,000 are not being monitored? Can you think of any way to prioritize those 900,000 into a smaller sample (top 1,000 or 10,000 most-viewed pages) and then another way to get the TS to give permission to give the list (say via email) to trusted users who could add subsets to their watchlists? I'd personally be willing to drop a 500 page chunk on my watchlist. MBisanz 17:21, 15 April 2012 (UTC)
Assuming a median network latency of 0.3 sec × 900,000 pages = 3 days to run the report. While OK for a onetime job, its not viable for the long term. I had originally sorted by amount of incoming links (there's correlation view count), but I found the pages rather uninteresting. Even built a tool to select from WikiProjects. Still nothing interesting, although seeing WikiProject Fictional characters with a lower unwatched rate (8%) than WikiProject Biography; WikiProject Poland (77% unwatched) beat WikiProject Antarctica (70% unwatched) was amusing. So many stubs, orphans, and obscure incomplete list articles.
Anyway, following from the discussion I hacked up recent changes to only show one-fifth of articles, tools:~dispenser/cgi-bin/unwatched_changes.py. Its clear to me now that we need to rethink RC patrol. AJAXified with "X new changes, update now" atop, articles scope to your interests, inline-diff, reverts automatically hidden, user karma/hours, diff verification with multiple levels, Cluebot vandalism score, improved automatic edit summaries, and more. Only if people cared enough. — Dispenser 05:42, 1 May 2012 (UTC)

Expand page creations list to other namespaces?

Misplaced Pages:List of Wikipedians by article count could be expanded to cover other namespaces. --MZMcBride (talk) 15:49, 7 May 2012 (UTC)

Orphaned articles something something

Orphaned articles something something.

SELECT
  page_title
FROM page
LEFT JOIN pagelinks
ON pl_title = page_title
AND pl_namespace = page_namespace
AND pl_namespace = 0
WHERE pl_namespace IS NULL
AND page_namespace = 0
AND page_is_redirect = 0
LIMIT 100;

Exclude redirects, disambiguation pages, and pages tagged with {{orphaned}}.

  • Report title: ??????
  • Update frequency: ??????

--MZMcBride (talk) 23:38, 10 May 2012 (UTC)

A note about the page history

From Roan:


The segfault was due to excessive recursion in PCRE, triggered by a regex in PageTriage. Offhand it looks like it would be triggered on pages where there are more than ~18k characters between a '{{' and its matching '}}'

We're fixing it by setting the PCRE recursion limit to 1k, apparently the default value (which it was set to previously) is 100k which is way too high


--MZMcBride (talk) 02:55, 11 May 2012 (UTC)

This refers to the edits by 0:0:0:0:0:0:0:1, 216.38.130.164 and 208.80.152.165 on May 11th, 2012 between 01:01 and 02:02 UTC. Those were caused by me chasing down an error that was occurring when people edited this page: you'd see an error page with ERR_ZERO_SIZE_OBJECT but your edit would go through. This pointed to a segmentation fault in the Apache process handling the request, and tracking it down was difficult (and added another bogus edit to the page history every time I tried), but I got there in the end as explained in my quote. --Catrope (talk) 03:01, 11 May 2012 (UTC)

Templates linking to other templates' edit pages

I would like, if it's possible, to have a list of all cases where:

  1. Template:A links to Template:B's edit page (see this version of {{Camp Half-Blood}} for an example)
  2. Template:A and Template:B aren't documentation pages (that is, pages ending with the "/doc" ending)
  3. Template:A transcludes one of {{Navbox}}, {{Navbox with columns}}, {{Navbox years}}, and {{Navbox with collapsible groups}}
  4. Template:B isn't {{Navigation templates}}

Templates like that usually are the result of some navbox being renamed without the "name" parameter being updated. עוד מישהו Od Mishehu 09:30, 13 May 2012 (UTC)

SUL glich

Would it be possible to get a list of SUL accounts without any attached local accounts? This is a random glich that can happen when the last local account is renamed or when a user's registration breaks. MBisanz 18:36, 14 May 2012 (UTC)

What's a glich? --MZMcBride (talk) 19:19, 14 May 2012 (UTC)
A glich is what happens when my spell checking software is disabled and I try to type glitch. MBisanz 19:24, 14 May 2012 (UTC)

Broken report

Misplaced Pages:Database reports/Large non-free files tells that the list contains files which are not in Category:Non-free Misplaced Pages file size reduction request. However, the category is called Category:Misplaced Pages non-free file size reduction requests and it seems that the report does indeed contain files in that category. I suppose that you should also check Category:Misplaced Pages non-free file size reduction requests for manual processing if you're not already doing this. --Stefan2 (talk) 18:22, 25 May 2012 (UTC)

Uncategorized templates

Amy chance that Misplaced Pages:Database reports/Uncategorized templates could be run twice a week? -- WOSlinker (talk) 21:33, 8 June 2012 (UTC)

High-quality non-free sound files

Per Misplaced Pages:Manual of Style/Music samples and non-free content criterion 3, non-free music samples will rarely need to be more than 64kbps. It should be easy to create a database report which lists all non-free audio files of higher than 64kbps, which could then be flagged as needing attention/fixed. (It may also be worth not including any files currently tagged with {{non-free reduce}}, as these have already been flagged as needing attention.) A weekly update would probably be fine. Thanks, J Milburn (talk) 21:04, 25 June 2012 (UTC)

Category sort key and category main articles

During a discussion on a proposed category page MOS at Wikipedia_talk:Manual_of_Style/Category_pages#Cat_main the issue was raised of missing and incorrectly sorted main articles in categories. As an example science should be (and in this case is) in Category:Science with a category sort order of a space. Due to either forgetting in the case of new categories or removal due to vandalism this important bit of code or important categorisation is often missing. Can a report be generated that dives the instance where there is a category lacking its associated article and another report giving the cases of the absence of the space sort key? See WP:SORTKEY for info. -- Alan Liefting (talk - contribs) 02:45, 27 June 2012 (UTC)

Broken section links

I noticed that this page, which lists broken section links, was last updated over a month ago and is now out of date. Can the page be updated more frequently (perhaps once a day or once a week?) I'm trying to fix broken section links on Misplaced Pages. Jarble (talk) 18:13, 2 July 2012 (UTC) It appears that the page was updated daily until April 20, 2012. Jarble (talk) 18:15, 2 July 2012 (UTC)

Images without Fair Use rationale

A large number of the long entries showing up in this report do in fact have an NFUR meeting NFCC, I've been marking this up with {{has-NFUR}}. Could this be added to the list of 'templates' to look for when removing the item from the report? Sfan00 IMG (talk) 09:12, 6 July 2012 (UTC)

Untagged stub articles

Per discussion at Wikipedia_talk:WikiProject_Stub_sorting#Untagged stub articles, weekly report showing short articles that aren't marked as stubs and are not marked as some other type of normally short page.

SELECT concat( '*] - (', page_len, ')' )
FROM enwiki_p.page
LEFT OUTER JOIN enwiki_p.categorylinks cc ON cl_from = page_id AND ( cl_to LIKE '%_stubs'
  OR cl_to IN ( 'All_disambiguation_pages', 'All_set_index_articles', 'Redirects_to_Wiktionary', 'Wikipedia_soft_redirects' ) )
WHERE page_namespace = 0
AND  page_title NOT LIKE 'List_of_%'
AND page_is_redirect = 0
AND cl_from IS NULL
AND  page_len < 1500
ORDER BY page_len ASC
LIMIT 1000;

- TB (talk) 20:06, 16 July 2012 (UTC)

Please add "AND page_title NOT LIKE 'Lists_of_%'" to the query above. This should be categorized under "Stub" reports as a companion to the Long stubs report. The long stubs report lists articles that have stub tags but probably shouldn't (based solely on length of article). This report indicates articles that do not have stub tags, but probably should. Dawynn (talk) 11:33, 31 July 2012 (UTC)

Many of the reports aren't being updated

Template:Toolserver

Many of the reports are currently not being updated - for example: Articles containing links to the user space was last updated at 15:25, 19 May UTC and should be updated weekly; PRODed articles with deletion logs was last updated at 22:45, 8 April and should be updated daily; and File description pages containing no templates or categories was last updated at 21:15, 19 March and should be updated daily. What's happenning here? עוד מישהו Od Mishehu 09:12, 17 July 2012 (UTC)

Presumably the Toolserver issues over the course of the last few weeks (months?) have played into this. There are probably threads on WP:VPT about this issue. Killiondude (talk) 18:34, 20 July 2012 (UTC)
I just posted this someplace else, but it looks like the s1 cluster, which is this wiki, toolserver database updates are backlogged over 8 days and this growing hourly. I'm adding a table that updates hourly. Vegaswikian (talk) 19:50, 25 July 2012 (UTC)

Update request

Misplaced Pages:Database reports/Largely duplicative file names Sfan00 IMG (talk) 17:12, 20 July 2012 (UTC)

I've popped an updated list up at User:Topbanana/DupeFilenames to tide you over until the proper report can be re-run. - TB (talk) 21:05, 20 July 2012 (UTC)

Uncategorized templates II

Wikipedia_talk:Database_reports/Archive_2#Uncategorized_templates indicates that when the "Uncategorized templates" report was created in December 2009, it was limited to 1000 entries because there were "too many" uncategorized templates to do a full report. Is that still the case? If not, would it be possible to have a full report? DH85868993 (talk) 05:49, 24 July 2012 (UTC)

It's always possible to do a full report. The question is whether it makes sense to do so. I don't know how many uncategorized templates there were in 2009 or how many there are now, but I imagine it'd be a lot of subpages if I switched the report to a paginated output. Do you need the full list for some reason? If you categorize the templates, wait for a refresh of the report, categorize the templates, etc., eventually you'll get less than 1,000 entries at Uncategorized templates (configuration). What's the problem? --MZMcBride (talk) 05:55, 24 July 2012 (UTC)
Sometimes when processing Uncategorized templates (configuration), rather than just progressing through the list, I prefer to scan the list looking for "easy targets" (e.g. it's probably going to be pretty easy to find a suitable category for a template named "Australian_<something>"). Having more templates listed in the report, would increase the number of "easy targets" available to address. I also think it would be useful to know how many uncategorized templates there were each time the report was run, so we could see whether the number was static, reducing or increasing over time. Perhaps if a full report is impractical, the number could be increased to 2000 or 3000? Just an idea. DH85868993 (talk) 09:13, 24 July 2012 (UTC)
As of (mumble mumble) 5-ish days ago, there were approximately 118,000 articles in namespace 10 with no categorylinks. - TB (talk) 18:10, 24 July 2012 (UTC)
Did you account for redirects? I imagine about half are template redirects (which are usually uncategorized, naturally). I'm trying to pull the figure now, but the Toolserver is taking its sweet-ass time. --MZMcBride (talk) 22:25, 24 July 2012 (UTC)
MZMcBride, please don't invest too much effort trying to determine an accurate figure. Even if half of the 118,000 identified by TB are template redirects, that still leaves way too many uncategorised templates to make generating a full report every time practical. Thank you both for your efforts. But maybe consider my suggestion of increasing the report to 2000 or 3000. DH85868993 (talk) 02:41, 25 July 2012 (UTC)
The toolserver was too grumpy to list for me entities that are non-redirects in namespace 10 that are transcluded into namespace 0. I'd concur with the 50% estimate above; a full report would include around 50,000 items. Best might be to list the 1000 'most interesting' uncategorised templates - the most transcluded or if this is too expensive, perhaps the least dusty. - TB (talk) 07:11, 25 July 2012 (UTC)

Popular/duplicated external links

I would like to be able to track down groups of external links which are duplicated across many articles. In particular I would like to be able to locate popular DOI and PMID links, those which are found in many articles, which are ripe for sharing using templates like {{cite doi}} and {{cite pmid}}. Currently my only method for finding such links is repeated use of the Linksearch which I doubt is the most efficient method! Is there an existing report which relates to the popularity of External Links, at least by site, and hopefully per site? TIA HAND —Phil | Talk 13:45, 2 August 2012 (UTC)

I've popped up an initial listing of common external links to the two sites listed above for you at User:Phil_Boswell/common_els. Be aware that the toolserver (on which this report was generated) is struggling a bit at the moment; changes made to Misplaced Pages in the last 12 days or so will not be reflected in the report. - TB (talk) 20:30, 2 August 2012 (UTC)
Wrong doi I think. http://www.doi.org/ should be the one. --Izno (talk) 22:39, 2 August 2012 (UTC)
Indeed, although it's actually http://dx.doi.org for DOI links. But there's something not quite right about the results anyway: there's not nearly enough of them. If you take a look at Special:linksearch/www.ncbi.nlm.nih.gov/pubmed/16381836, there's 79 articles linking to that PMID alone: it doesn't appear on your list, and I'm pretty sure those links were created before the Toolserver got backed up. I suspect if you remove the "http://" from the search criteria, you might see a difference: it does occasionally change the results of the manual search, don't know why because allegedly it shouldn't!
On another note, you'll see there's a template on that list, a sub-page of {{cite doi}}: it looks easy enough to extend the search to the Template NAMESPACE, is that correct? That would be helpful to determine which of those links already have a template suitable for sharing.
Thanks again, HAND —Phil | Talk 06:05, 3 August 2012 (UTC)