Revision as of 15:47, 3 August 2023 by 93.140.87.249 (→Note: Added) | Latest revision as of 18:36, 11 January 2025 by Lowercase sigmabot III (Archiving 5 discussion(s) to User talk:John of Reading/Archive 28) (bot) | ||
(132 intermediate revisions by 33 users not shown) | |||
{{Talk header|search=yes |
{{Talk header|search=yes}}<!-- popup ] --> | ||
{{User:MiszaBot/config | {{User:MiszaBot/config | ||
|archiveheader = {{talkarchivenav}} | |archiveheader = {{talkarchivenav}} | ||
|maxarchivesize = 70K | |maxarchivesize = 70K | ||
|counter = |
|counter = 28 | ||
|algo = old(21d) | |algo = old(21d) | ||
|archive = User talk:John of Reading/Archive %(counter)d | |archive = User talk:John of Reading/Archive %(counter)d | ||
}} | }} | ||
== |
== Hi John! == | ||
You like typofixing? I got tens of thousands of typos and I can't fix em all alone. Perhaps we can combine our forces? ]. ] (]) 16:21, 8 September 2024 (UTC) | |||
] | |||
:{{Re|Polygnotus}} Interesting. I'm finding typos by running regular expressions on a database dump; how are you creating your work list? What's your false positive rate? | |||
correct edits, messed table, locals dont care ] (]) 23:44, 28 June 2023 (UTC) | |||
:I confess I'm so used to working with AWB and my 4000+ regular expressions that I'm unlikely to switch to a radically different method. -- ] (]) 16:47, 8 September 2024 (UTC) | |||
:{{la|2022–23 Indian State Leagues}} | |||
:I've made an edit so that the columns line up. -- ] (]) 06:44, 29 June 2023 (UTC) | |||
::I take a list of the most frequently used words, create typos with a Levenshtein distance of 1, and check which occur in the dump. Then I do a bunch of filtering and I check which exist in the live version of Misplaced Pages. | |||
Appreciated lot! If you add logos i can add few links, again users did not care. Just to know <!-- Template:Unsigned IP --><small class="autosigned">— Preceding ] comment added by ] (]) 08:51, 29 June 2023 (UTC)</small> <!--Autosigned by SineBot--> | |||
::Which programming languages, if any, are you familiar with? | |||
:League and team logos are probably ] which can't be used in tables like these. For example, the article on the ] displays its logo at the top right, but Misplaced Pages's rather strict rules do not permit the logo to be shown on the yearly articles such as ], and certainly not as decoration in an overview table. -- ] (]) 08:57, 29 June 2023 (UTC) | |||
::We could use a custom AWB module in C# or perhaps just use some custom Selenium-based tool (which would be pretty damn similar, not radically different). Or perhaps a JWB-like interface on wiki. Haven't really decided how to approach that yet. | |||
::I never really bothered to create stats of the amount of skips vs the amount of fixes but that is a good idea to have. | |||
::I use a '''lot''' of regex to avoid typos that shouldn't be fixed, see ]. | |||
::I have at least 60.000 potential typos left to fix so it is probably worth it to create a decent tool for that. | |||
::] (]) 17:14, 8 September 2024 (UTC) | |||
:::{{Re|Polygnotus}} Languages? Assembler, BCPL, C, C++ - all unused for a decade, I'm afraid. But I've used regular expressions on a copy of ] to extract the 3000+ article names and the alleged typos, and have begun an AWB run to detect those words in those articles. So far I've saved 23 edits and have skipped 25 other articles - not a bad hit rate, by my standards, so I'll press on with this over the next few days. "Gettig" is a surname; "protectin" is a kind of protein; ] is a stage name; and so on. -- ] (]) 18:08, 8 September 2024 (UTC) | |||
::::Yeah that is ] and then we got ] and ] and ]. When my Raspberry Pi is done I will have another ~60.000. The typos already have very similar regex ran on them as you saw in typo.js so much of the WONTFIX stuff has been filtered out already. ] (]) 18:15, 8 September 2024 (UTC) | |||
::::In an ideal world, AWB would accept ] in this format (''christmas|chirstmas|My Christmas'') as a list generator source. And AWB would contain code (very similar to ]) to not fix typos in certain situations. Do you know how we can get closer to that goal? ] lists some developers in the infobox. ] (]) 18:44, 8 September 2024 (UTC) | |||
could you add this example, as i had few articles where local user suddenly stop adding (part of why keeping ip editing to lower disappointment); asking other users at a time is only alternative. | |||
:::::AWB has two checkboxes at the top left of the "Find & Replace" configuration, which aim to cover the "certain situations". I run with those turned off, though, so that I ''do'' fix errors in quotations, references, foreign-language text and so on - with appropriate care and checking. -- ] (]) 18:50, 8 September 2024 (UTC) | |||
::::::I boldy created the shortcut at some point and it hasn't been reverted yet. It doesn't really make sense to faithfully reproduce simple mistakes made by others when they are irrelevant and only distract imo. Your approach does affect the hitrate tho. Are there others who I should contact? I assume the 16789 typos above will keep you busy for a while but you know where to find me when you want more. Perhaps I should stick the lists in a subpage of ]? I'll dive in the AWB code, thanks. ] (]) 19:40, 8 September 2024 (UTC) | |||
:::::::] is marked as an essay; the authoritative guide is at ]. Fortunately they say the same thing! I do fix typos in quotations if I think they are "insignificant" or are likely to have been copying errors. See ]. | |||
:::::::If you post your links at ] you may attract more helpers. Oh, and are you aware of the ] project? That's another attempt at co-ordinated checking using data-crunching techniques. -- ] (]) 20:14, 8 September 2024 (UTC) | |||
::::::::Thank you, redirect target improved. I combined typolist, typolist2 and typolist3 above (but not ], which you imported into AWB) into ]. If you want some, please delete them from the list so that its clear that they've been handled. | |||
::::::::I added Moss and the (code behind the) AWB checkboxes to my todolist, thanks again! ] (]) 04:30, 9 September 2024 (UTC) | |||
{{od}} | |||
] | |||
:{{Re|Polygnotus}} I've restarted the list after telling AWB not to sort the pages alphabetically, so I'm now processing them in the same order as they were listed in ]. This makes it easier for me, as the fixes for the same target word turn up together, and perhaps for you, since you can compare my contribution list with the list I'm working from. | |||
:Two of your "don't fix" tests aren't working correctly: | |||
*In many cases the typo is embedded within a URL - example <code>mmiller</code> within ] | |||
*In some cases the typo is embedded within a file name - example <code>distribuion</code> within ]. I exclude those by peeking ahead for a known image suffix - <code>(?!*\.(?i:(?:gif|jpe?g|ogg|ogv|pdf|png|svg|tiff?|webm))\b)</code> - this regular expression isn't perfect, I know. | |||
:-- ] (]) 07:26, 9 September 2024 (UTC) | |||
::I make the lists with Java and then I use Javascript to actually make the edits. When I improved the url regex in Javascript I forgot to add it to the Java code as well. I had a bunch of ideas to improve my workflow so I am cooking up a fresh batch for you. Might take a while, even on a modern pc. ] (]) 03:33, 10 September 2024 (UTC) | |||
::Originally I used <code>((http|https)://)(www.)?{2,256}\.{2,26}\b(*)</code>for URLs but a lot of them escaped the wrath of the regex. | |||
::I am considering using something like: | |||
::<code>\b((?:https?://|www\.)(?:\S+(?::\S*)?@)?(?:(?:{1,3}\.){3}{1,3}|(?:(?:-*)*+)(?:\.(?:-*)*+)*(?:\.(?:{2,})))(?::\d{2,5})?(?:\S*)?\b)</code> | |||
::instead unless you have a better idea. | |||
::For files I used: | |||
::File:(.*?)(\\.|\\|)" | |||
::Image:(.*?)\\. | |||
::Category:(.*?)\\. | |||
::and I haven't really decided how to improve on that. Not all of them have file extensions. Perhaps and the can be used? | |||
::] is steadily growing. ] (]) 03:41, 10 September 2024 (UTC) | |||
:::Are the URL regexes running with "ignore case" turned on? If not, the first URL regex fails to match the whole URL in the ] example because parts of it are uppercase. | |||
:::The filename in the ] has no <code>File:</code> prefix because it is being used as an infobox parameter. To exclude those, you'll either have to look backwards for <code>range_map =</code> or similar, or look forwards for <code>.png</code> or similar. -- ] (]) 07:01, 10 September 2024 (UTC) | |||
::::I use <code>Pattern.CASE_INSENSITIVE</code> and <code>Pattern.UNICODE_CASE</code>. I have added range_map to the list of disallowed parameters. I am currently trying to figure out whether ] can help identify typos better than a coinflip. ] (]) 07:47, 10 September 2024 (UTC) | |||
{{od}} | |||
https://m.facebook.com/photo.php/?fbid=380491030763398 <!-- Template:Unsigned IP --><small class="autosigned">— Preceding ] comment added by ] (]) 14:21, 29 June 2023 (UTC)</small> <!--Autosigned by SineBot--> | |||
:{{Re|Polygnotus}} Thank you very much for preparing these lists; I've had an enjoyable couple of months working through them. If my record-keeping and arithmetic can be trusted, I've made about 4,500 corrections from your 11,700 candidates. I'd be happy to work on another list like this one, but not immediately: I feel I've neglected some of my other self-assigned tasks and would like to spend some time on those. | |||
:Done, I hope, but I'll wait to see what the reviewers - bots and humans - make of it before I try any more. -- ] (]) 15:42, 29 June 2023 (UTC) | |||
:I ran into one more problem further down the list: by the time the data was saved in ], any special characters in page names had got corrupted. For example ] should have been an ndash, and ] should have been ]. I managed to guess the intended article names in most cases. -- ] (]) 16:49, 12 November 2024 (UTC) | |||
== ] has an ]== | |||
I will add few more whenever you are free, thanks! | |||
<div class="floatleft" style="margin-bottom:0">]</div>''']''' has an RfC for possible consensus. A discussion is taking place. If you would like to participate in the discussion, you are invited to add your comments on the ''']'''.<!-- Template:Rfc notice--> Thank you. ] (]) 18:14, 16 October 2024 (UTC) | |||
] | |||
== ArbCom 2024 Elections voter message == | |||
https://content3.jdmagicbox.com/comp/kolkata/93/033p400393/catalogue/dalhousie-athletic-club-esplanade-kolkata-sports-clubs-ph86uj221o.jpg | |||
<div class="ivmbox " style="margin-bottom: 1em; border: 1px solid #a2a9b1; background-color: #fdf2d5; padding: 0.5em; display: flex; align-items: center; "> | |||
] | |||
<div class="ivmbox-image noresize" style="padding-left:1px; padding-right:0.5em;">]</div> | |||
<div class="ivmbox-text"> | |||
Hello! Voting in the ''']''' is now open until 23:59 (UTC) on {{#time:l, j F Y|{{Arbitration Committee candidate/data|2024|end}}-1 day}}. All ''']''' are allowed to vote. Users with alternate accounts may only vote once. | |||
The ] is the panel of editors responsible for conducting the ]. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose ], ], editing restrictions, and other measures needed to maintain our editing environment. The ] describes the Committee's roles and responsibilities in greater detail. | |||
https://m.facebook.com/photo.php/?fbid=198485495875077 | |||
If you wish to participate in the 2024 election, please review ] and submit your choices on the ''']'''. If you no longer wish to receive these messages, you may add {{tlx|NoACEMM}} to your user talk page. <small>] (]) 00:20, 19 November 2024 (UTC)</small> | |||
] | |||
https://www.transfermarkt.com/eastern-railway-calcutta/transfers/verein/79064 | |||
] | |||
https://m.facebook.com/photo.php/?fbid=104271325773061 | |||
] | |||
https://footystats.org/mongolia/mongolian-premier-league <!-- Template:Unsigned IP --><small class="autosigned">— Preceding ] comment added by ] (]) 18:28, 29 June 2023 (UTC)</small> <!--Autosigned by SineBot--> | |||
:Working through these... | |||
:] {{Not done}} The URL you've given me here is the URL of the ''image'', not of a page that ''uses'' the image, so I cannot check whether this green logo is the correct one. The external links in the Dalhousie AC article aren't much use either. The Facebook link leads to an inactive page with a tiny transparent logo; the second link leads to a page hosted by a computer gaming site, and I don't trust their black logo; the third link leads to a page with no logos. | |||
:] {{Done}} | |||
:] {{Not done}} The transfermarkt page displays a red logo that seems to be merely a red version of the green ] logo, the logo of a railway company not of a football club. | |||
:] {{Done}} | |||
:] {{Not done}} The footystats.org page displays a logo, but it also has a link labelled where I would expect to see the same logo, but don't. . So in this case also, I'm not sure that your link shows me an official logo. | |||
:So I've only uploaded 2 of the 5 you requested; sorry about that. -- ] (]) 08:17, 3 July 2023 (UTC) | |||
https://en.m.wikipedia.org/File:Mohun_Bagan_Super_Giant.svg#mw-jump-to-license | |||
If can be back here: ] | |||
Also, regarding these recent edit sources, and older from second page, ] / ] (17th), if can compare and decide which are most credible; user (dont mind girl who protected) only changed one article, ignoring dozen of other sources/pages. not only slow locals, but random foreigners out of topic, come at a time blindly sticking to one source. so best bet is again look for experienced native speaker who will check and fix. | |||
lastly, ] (stadiums, 85k needs to be reverted, if its final decision). | |||
appreciated previous work so much! <!-- Template:Unsigned IP --><small class="autosigned">— Preceding ] comment added by ] (]) 00:41, 28 July 2023 (UTC)</small> <!--Autosigned by SineBot--> | |||
:I'm not going to add ] to ], because it's not a logo that is specific to the academy and youth teams, and Misplaced Pages rules require "minimal use" of non-free images. Here's an example of how strictly this rule gets applied: ] is arguably ''the'' iconic image of the ]. Back in 2007, the image ] at the top right of that article, as well as in the article dedicated to that protestor, ]. But nowadays the "minimal use" rule means that you won't find that iconic image anywhere in the main article about the protests. | |||
:I'm not a football expert and have no knowledge of any South Asian languages, so I'm won't be delving into the other articles you've mentioned. I mostly stick to spellings and grammar anyway. -- ] (]) 07:39, 28 July 2023 (UTC) | |||
== bcmc == | |||
only for people can understand hindi with english | |||
if not then your a mcbc ] (]) 17:57, 29 June 2023 (UTC) | |||
:? -- ] (]) 07:31, 3 July 2023 (UTC) | |||
== Age verification and, just in case, thanks to everyone == | |||
<!-- ] 06:32, 13 February 2024 (UTC) -->{{User:ClueBot III/DoNotArchiveUntil|1707805942}}In view of the ], I just like to note here that I am a UK editor who would be locked out of Misplaced Pages by any geographic ban. As you can probably guess from my writing style, ] and , I am over 18. | |||
If such a ban takes place and you're visiting this page because I've stopped editing: thank you for checking up on me. I've enjoyed working here and have enjoyed most of my occasional interactions with other editors. -- ] (]) 08:36, 3 July 2023 (UTC) | |||
== Invitation == | |||
<div style="border:2px solid #90C0FF; background:#F0F0FF; width:99%; padding:4px"> | |||
] | |||
Hello {{BASEPAGENAME}}! | |||
* The ] is currently struggling to keep up with the influx of new articles needing review. We could use a few extra hands to help. | |||
* We think that someone with your activity and experience is very likely to meet the ]. | |||
* Reviewing/patrolling a page doesn't take much time, but it requires a strong understanding of Misplaced Pages’s CSD policy and notability guidelines. | |||
* Kindly read ] before making your decision, and feel free to post on the ] with questions. | |||
* If patrolling new pages is something you'd be willing to help out with, please consider ]. | |||
Thank you for your consideration. We hope to see you around! | |||
<!--Drafted by Illusion Flame. Reviewed by Raydann and Novem Linguae.--> | |||
</div> | </div> | ||
</div> | |||
<!-- Message sent by User:Cyberpower678@enwiki using the list at https://en.wikipedia.org/search/?title=Misplaced Pages:Arbitration_Committee_Elections_December_2024/Coordination/MM/03&oldid=1258243506 --> | |||
{{clear}} | |||
Sent by {{noping|Zippybonzo}} using ] (]) at 07:50, 21 July 2023 (UTC) | |||
== Greetings of the season == | |||
<!-- Message sent by User:Zippybonzo@enwiki using the list at https://en.wikipedia.org/search/?title=Misplaced Pages:New_pages_patrol/Coordination/Invite_list&oldid=1165620524 --> | |||
<div class="center" style="background:darkgreen;border:no;padding:0.2em 0em;{{round corners}}"><br />] <span style="color:white;">~ ~ ~ Greetings of the season ~ ~ ~</span>{{pb}} | |||
== Split the Reading article? == | |||
Have you considered splitting the Reading article into one for the town / urban area (using {{tl|infobox UK place}}) and another for the Borough (using {{tl|infobox settlement}}). The article is quite long and I suggest it would be improved by hiving off the political and administrative details to another article. Just a thought, not well considered enough to put as a formal RtS at the article talk page. Your call. ] (]) 09:46, 27 July 2023 (UTC) | |||
:{{Re|JMF}} No, I hadn't given that any thought. I visited the article only to pick up the current population figure, and was surprised to see all those maps in the infobox. I guess a split would be hard to do cleanly. To be honest, I don't know which parts of the town are outside the borough. -- ] (]) 10:26, 27 July 2023 (UTC) | |||
:: Legally, ''none'' of the town can be outside the Borough. In the real world (see ], which shows the status as of the 2011 census), quite a lot of it is and I suspect that the 2021 census will show even more blurring. So the question is, when describing Reading, do you give higher priority to lines on a map of administrative boundaries over real places where people live. It is rather nonsensical that Earley and Woodley, both clearly part of Reading, are legally part of Wokingham purely because of an accident of history. But I realise that "it's complicated", so maybe that is a step too far. The ONS has no qualms, though: ]. | |||
::See also the maps on these ONS pages: | |||
::* | |||
::* Same as our ]. | |||
::Where angels fear to tread? --] (]) 11:57, 27 July 2023 (UTC) | |||
== Note == | |||
One more logo to add please: | |||
] (from infobox website) ] (]) 22:03, 28 July 2023 (UTC) | |||
:{{Done}} -- ] (]) 08:27, 29 July 2023 (UTC) | |||
] | |||
https://m.facebook.com/photo.php?fbid=585832553535601&set=a.432218052230386&type=3 <!-- Template:Unsigned IP --><small class="autosigned">— Preceding ] comment added by ] (]) 01:58, 1 August 2023 (UTC)</small> <!--Autosigned by SineBot--> | |||
:{{Done}} -- ] (]) 06:23, 1 August 2023 (UTC) | |||
https://www.flashscore.com/team/dalhousie/QogomML0/ | |||
And if can add this logo for previously mentioned indian club <!-- Template:Unsigned IP --><small class="autosigned">— Preceding ] comment added by ] (]) 08:49, 2 August 2023 (UTC)</small> <!--Autosigned by SineBot--> | |||
:] {{Done}} -- ] (]) 13:10, 2 August 2023 (UTC) | |||
] | |||
https://www.mumbaicityfc.com/newcrest | |||
(and numerous media sources, got deleted in article?! check at least) <!-- Template:Unsigned IP --><small class="autosigned">— Preceding ] comment added by ] (]) 19:50, 2 August 2023 (UTC)</small> <!--Autosigned by SineBot--> | |||
:{{On hold}} It appears I can't upload the file here as fair use until ] has been deleted from commons. Commons does not accept fair-use material. -- ] (]) 07:58, 3 August 2023 (UTC) | |||
https://hr.m.wikipedia.org/Klub_hokeja_na_ledu_Medve%C5%A1%C4%8Dak | |||
next question, can hockey logo be copied to native page | |||
''<span style="color:lightyellow;">'''Hello John of Reading:''' Enjoy the ''']''' and ''']''' if it's occurring in your area of the world, and thanks for your work to maintain, improve and expand Misplaced Pages. Cheers, <span style="color:OliveDrab;><small>Spread the love; use {{]:]}} to send this message.</small></span> {{pb}}{{highlight|--] (]) 11:49, 24 December 2024 (UTC)|OliveDrab}}</span>''</div> | |||
== Test of AWB == | |||
:{{Re|Dustfreeworld}} Thank you! -- ] (]) 17:11, 24 December 2024 (UTC) | |||
== Redirect listed at ] == | |||
Hi John, this is just a message on your talk page to see if you have the same behaviour with AWB as I reported ]. The test would be to open AWB and login before checking this message. I hope this help, regards, ] (]) 19:48, 29 July 2023 (UTC) | |||
:Replied there. -- ] (]) 06:51, 30 July 2023 (UTC) | |||
:Garms | |||
:1500 ] (]) 10:11, 31 July 2023 (UTC) | |||
] | |||
== Precious anniversary == | |||
A redirect or redirects you have created has been listed at ] to determine whether its use and function meets the ]. Anyone, including you, is welcome to comment on this redirect at '''{{slink| Misplaced Pages:Redirects for discussion/Log/2024 December 27#"Musican" Redirects }}''' until a consensus is reached. <!-- Template:Rfd mass notice --> ] (I ], ]) 12:36, 27 December 2024 (UTC) | |||
{{User QAIbox/auto|years=Nine}} | |||
--] (]) 07:27, 2 August 2023 (UTC) | |||
:{{Re|Gerda Arendt}} Where does the time go? -- ] (]) 07:33, 2 August 2023 (UTC) |
Latest revision as of 18:36, 11 January 2025
This is John of Reading's talk page, where you can send him messages and comments.
Archives: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
Auto-archiving period: 21 days
Hi John!
You like typofixing? I got tens of thousands of typos and I can't fix em all alone. Perhaps we can combine our forces? User:Polygnotus/typos. Polygnotus (talk) 16:21, 8 September 2024 (UTC)
- @Polygnotus: Interesting. I'm finding typos by running regular expressions on a database dump; how are you creating your work list? What's your false positive rate?
- I confess I'm so used to working with AWB and my 4000+ regular expressions that I'm unlikely to switch to a radically different method. -- John of Reading (talk) 16:47, 8 September 2024 (UTC)
- I take a list of the most frequently used words, create typos with a Levenshtein distance of 1, and check which occur in the dump. Then I do a bunch of filtering and I check which exist in the live version of Misplaced Pages.
- Which programming languages, if any, are you familiar with?
- We could use a custom AWB module in C# or perhaps just use some custom Selenium-based tool (which would be pretty damn similar, not radically different). Or perhaps a JWB-like interface on wiki. Haven't really decided how to approach that yet.
- I never really bothered to create stats of the amount of skips vs the amount of fixes but that is a good idea to have.
- I use a lot of regex to avoid typos that shouldn't be fixed, see User:Polygnotus/typo.js.
- I have at least 60.000 potential typos left to fix so it is probably worth it to create a decent tool for that.
- Polygnotus (talk) 17:14, 8 September 2024 (UTC)
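The candidate-generation step described above (take frequent words, create variants at Levenshtein distance 1, keep those that occur in the dump) could be sketched roughly as follows. This is an illustration only, not the code actually used: the function names `edits1` and `candidate_typos`, the lowercase ASCII alphabet, and the idea of passing the dump as a plain set of words are all assumptions.

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """Return every string at Levenshtein distance exactly 1 from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # One character removed.
    deletes = {a + b[1:] for a, b in splits if b}
    # One character replaced by another letter.
    substitutions = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    # One letter inserted at any position.
    inserts = {a + c + b for a, b in splits for c in alphabet}
    # Substituting a letter with itself reproduces the word, so drop it.
    return (deletes | substitutions | inserts) - {word}


def candidate_typos(frequent_words, dump_vocabulary):
    """Map each frequent word to its distance-1 variants found in the dump.

    `dump_vocabulary` is assumed to be a set of words seen in the dump.
    Variants that are themselves frequent words are skipped.
    """
    known = set(frequent_words)
    return {
        word: sorted(v for v in edits1(word) if v in dump_vocabulary and v not in known)
        for word in frequent_words
    }
```

Note that plain Levenshtein distance counts a swap of two adjacent letters as two edits, so a transposition such as "chirstmas" for "christmas" is not produced by `edits1`; covering those would need an extra transposition step.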
- @Polygnotus: Languages? Assembler, BCPL, C, C++ - all unused for a decade, I'm afraid. But I've used regular expressions on a copy of User:Polygnotus/typos to extract the 3000+ article names and the alleged typos, and have begun an AWB run to detect those words in those articles. So far I've saved 23 edits and have skipped 25 other articles - not a bad hit rate, by my standards, so I'll press on with this over the next few days. "Gettig" is a surname; "protectin" is a kind of protein; Supremme de Luxe is a stage name; and so on. -- John of Reading (talk) 18:08, 8 September 2024 (UTC)
- Yeah that is 3489 typos and then we got 2800 here and 9300 there and 1200 here. When my Raspberry Pi is done I will have another ~60.000. The typos already have very similar regex ran on them as you saw in typo.js so much of the WONTFIX stuff has been filtered out already. Polygnotus (talk) 18:15, 8 September 2024 (UTC)
- In an ideal world, AWB would accept lists in this format (christmas|chirstmas|My Christmas) as a list generator source. And AWB would contain code (very similar to typo.js) to not fix typos in certain situations. Do you know how we can get closer to that goal? WP:AWB lists some developers in the infobox. Polygnotus (talk) 18:44, 8 September 2024 (UTC)
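As a rough illustration of the pipe-separated list format mentioned above, here is a small parser sketch. The field order (intended word, typo, page title) is inferred from the single example "christmas|chirstmas|My Christmas" and is an assumption, as are the names `TypoEntry` and `parse_typolist`; AWB itself does not accept this format, which is the point being made in the comment.

```python
from typing import NamedTuple


class TypoEntry(NamedTuple):
    correct: str  # assumed: the intended spelling
    typo: str     # assumed: the misspelling to look for
    page: str     # assumed: the page title where it occurs


def parse_typolist(text):
    """Parse lines of the form correct|typo|page into TypoEntry tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two pipes only; anything after stays in `page`.
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # skip malformed lines rather than guess
        entries.append(TypoEntry(*parts))
    return entries
```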
- AWB has two checkboxes at the top left of the "Find & Replace" configuration, which aim to cover the "certain situations". I run with those turned off, though, so that I do fix errors in quotations, references, foreign-language text and so on - with appropriate care and checking. -- John of Reading (talk) 18:50, 8 September 2024 (UTC)
- I boldly created the WP:QUOTETYPO shortcut at some point and it hasn't been reverted yet. It doesn't really make sense to faithfully reproduce simple mistakes made by others when they are irrelevant and only distract imo. Your approach does affect the hitrate tho. Are there others who I should contact? I assume the 16789 typos above will keep you busy for a while but you know where to find me when you want more. Perhaps I should stick the lists in a subpage of WP:TYPO? I'll dive in the AWB code, thanks. Polygnotus (talk) 19:40, 8 September 2024 (UTC)
- Misplaced Pages:Quotations is marked as an essay; the authoritative guide is at MOS:QUOTE. Fortunately they say the same thing! I do fix typos in quotations if I think they are "insignificant" or are likely to have been copying errors. See User:John of Reading/Typo fixing with AutoWikiBrowser#Editing quotes, book titles and such like.
- If you post your links at Misplaced Pages talk:Typo Team you may attract more helpers. Oh, and are you aware of the Misplaced Pages:Typo Team/moss project? That's another attempt at co-ordinated checking using data-crunching techniques. -- John of Reading (talk) 20:14, 8 September 2024 (UTC)
- Thank you, redirect target improved. I combined typolist, typolist2 and typolist3 above (but not User:Polygnotus/typos, which you imported into AWB) into User:Polygnotus/Data/Typolist. If you want some, please delete them from the list so that it's clear that they've been handled.
- I added Moss and the (code behind the) AWB checkboxes to my todolist, thanks again! Polygnotus (talk) 04:30, 9 September 2024 (UTC)
- @Polygnotus: I've restarted the list after telling AWB not to sort the pages alphabetically, so I'm now processing them in the same order as they were listed in User:Polygnotus/typos. This makes it easier for me, as the fixes for the same target word turn up together, and perhaps for you, since you can compare my contribution list with the list I'm working from.
- Two of your "don't fix" tests aren't working correctly:
- In many cases the typo is embedded within a URL - example mmiller within Merle Miller
- In some cases the typo is embedded within a file name - example distribuion within Lesser blue-eared starling. I exclude those by peeking ahead for a known image suffix - (?!*\.(?i:(?:gif|jpe?g|ogg|ogv|pdf|png|svg|tiff?|webm))\b) - this regular expression isn't perfect, I know.
- -- John of Reading (talk) 07:26, 9 September 2024 (UTC)
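The "peek ahead for a known image suffix" idea can be sketched with a negative lookahead. The character class in the expression quoted above did not survive on this page, so the class used here (anything except a pipe, square bracket or newline) is a guess rather than the original, and the helper name `typo_pattern` is hypothetical.

```python
import re

# Suffix list taken from the expression quoted in the discussion.
MEDIA_SUFFIXES = r"(?:gif|jpe?g|ogg|ogv|pdf|png|svg|tiff?|webm)"


def typo_pattern(typo):
    """Compile a pattern matching `typo` as a whole word, except in file names.

    The negative lookahead rejects a match when the rest of the line, up to
    the next pipe or square bracket, ends in a known media suffix.
    The character class is an assumption, not the original one.
    """
    return re.compile(
        r"\b" + re.escape(typo) + r"\b"
        r"(?![^|\[\]\n]*\." + MEDIA_SUFFIXES + r"\b)",
        re.IGNORECASE,
    )
```

A lookahead this broad also skips a genuine typo in prose if a file name happens to follow later on the same line before any pipe or bracket, which is one way in which such an expression "isn't perfect".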
- I make the lists with Java and then I use Javascript to actually make the edits. When I improved the url regex in Javascript I forgot to add it to the Java code as well. I had a bunch of ideas to improve my workflow so I am cooking up a fresh batch for you. Might take a while, even on a modern pc. Polygnotus (talk) 03:33, 10 September 2024 (UTC)
- Originally I used ((http|https)://)(www.)?{2,256}\.{2,26}\b(*) for URLs but a lot of them escaped the wrath of the regex.
- I am considering using something like:
- \b((?:https?://|www\.)(?:\S+(?::\S*)?@)?(?:(?:{1,3}\.){3}{1,3}|(?:(?:-*)*+)(?:\.(?:-*)*+)*(?:\.(?:{2,})))(?::\d{2,5})?(?:\S*)?\b)
- instead unless you have a better idea.
- For files I used:
- File:(.*?)(\\.|\\|)"
- Image:(.*?)\\.
- Category:(.*?)\\.
- and I haven't really decided how to improve on that. Not all of them have file extensions. Perhaps Commons Special:MediaStatistics and the local one can be used?
- My todolist is steadily growing. Polygnotus (talk) 03:41, 10 September 2024 (UTC)
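One alternative to a single all-purpose URL expression is to locate URL spans first and then discard typo hits that fall inside them. The sketch below assumes a deliberately loose URL pattern (a scheme or "www." followed by anything up to whitespace, angle bracket, square bracket or pipe) and matches it case-insensitively; it is not either of the expressions quoted above, and the function name `typo_hits_outside_urls` is hypothetical.

```python
import re

# Loose URL matcher (an assumption for illustration, not the quoted regex).
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>\[\]|]+", re.IGNORECASE)


def typo_hits_outside_urls(text, typo):
    """Return start offsets of whole-word `typo` matches not inside a URL."""
    url_spans = [m.span() for m in URL_RE.finditer(text)]
    word_re = re.compile(r"\b" + re.escape(typo) + r"\b", re.IGNORECASE)
    hits = []
    for m in word_re.finditer(text):
        inside_url = any(start <= m.start() and m.end() <= end for start, end in url_spans)
        if not inside_url:
            hits.append(m.start())
    return hits
```

Keeping the URL test separate from the typo test means the same span list could be reused for every typo checked on a page.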
- Are the URL regexes running with "ignore case" turned on? If not, the first URL regex fails to match the whole URL in the Merle Miller example because parts of it are uppercase.
- The filename in the Lesser blue-eared starling has no File: prefix because it is being used as an infobox parameter. To exclude those, you'll either have to look backwards for range_map = or similar, or look forwards for .png or similar. -- John of Reading (talk) 07:01, 10 September 2024 (UTC)
- I use Pattern.CASE_INSENSITIVE and Pattern.UNICODE_CASE. I have added range_map to the list of disallowed parameters. I am currently trying to figure out whether Ollama can help identify typos better than a coinflip. Polygnotus (talk) 07:47, 10 September 2024 (UTC)
- @Polygnotus: Thank you very much for preparing these lists; I've had an enjoyable couple of months working through them. If my record-keeping and arithmetic can be trusted, I've made about 4,500 corrections from your 11,700 candidates. I'd be happy to work on another list like this one, but not immediately: I feel I've neglected some of my other self-assigned tasks and would like to spend some time on those.
- I ran into one more problem further down the list: by the time the data was saved in User:Polygnotus/Data/Typolist, any special characters in page names had got corrupted. For example 2014â15 Presbyterian Blue Hose men's basketball team should have been an ndash, and History of à land should have been History of Åland. I managed to guess the intended article names in most cases. -- John of Reading (talk) 16:49, 12 November 2024 (UTC)
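Corruption of this kind typically arises when UTF-8 bytes are read back as a single-byte encoding. The sketch below assumes Latin-1 was the wrong encoding involved, which is a guess; the function names are hypothetical. It shows how the damage appears and how it can be undone when every byte has survived.

```python
def mojibake(text):
    """Simulate the corruption: UTF-8 bytes misread as Latin-1 (assumed)."""
    return text.encode("utf-8").decode("latin-1")


def repair(text):
    """Reverse the misreading if possible; otherwise return text unchanged."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Bytes were lost or the text was never corrupted this way.
        return text
```

The repair only works while the invisible continuation bytes are still present. If they were dropped somewhere along the way, as may have happened with the names quoted above, `repair` leaves the text alone and the intended title has to be guessed.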
Misplaced Pages:Talk page guidelines has an RfC
Misplaced Pages:Talk page guidelines has an RfC for possible consensus. A discussion is taking place. If you would like to participate in the discussion, you are invited to add your comments on the discussion page. Thank you. Gnomingstuff (talk) 18:14, 16 October 2024 (UTC)
ArbCom 2024 Elections voter message
Hello! Voting in the 2024 Arbitration Committee elections is now open until 23:59 (UTC) on Monday, 2 December 2024. All eligible users are allowed to vote. Users with alternate accounts may only vote once.
The Arbitration Committee is the panel of editors responsible for conducting the Misplaced Pages arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.
If you wish to participate in the 2024 election, please review the candidates and submit your choices on the voting page. If you no longer wish to receive these messages, you may add {{NoACEMM}} to your user talk page. MediaWiki message delivery (talk) 00:20, 19 November 2024 (UTC)
Greetings of the season
~ ~ ~ Greetings of the season ~ ~ ~ Hello John of Reading: Enjoy the holiday season and winter solstice if it's occurring in your area of the world, and thanks for your work to maintain, improve and expand Misplaced Pages. Cheers, Spread the love; use {{subst:User:Dustfreeworld/Xmas3}} to send this message. --Dustfreeworld (talk) 11:49, 24 December 2024 (UTC)
- @Dustfreeworld: Thank you! -- John of Reading (talk) 17:11, 24 December 2024 (UTC)
Redirect listed at Redirects for discussion
A redirect or redirects you have created has been listed at redirects for discussion to determine whether its use and function meets the redirect guidelines. Anyone, including you, is welcome to comment on this redirect at Misplaced Pages:Redirects for discussion/Log/2024 December 27 § "Musican" Redirects until a consensus is reached. User:Someone-123-321 (I contribute, Talk page so SineBot will shut up) 12:36, 27 December 2024 (UTC)