Revision as of 15:09, 8 August 2008
m:User:CBM | Please leave new comments at the bottom of the page, using the "new section" button at the top of the page. I will respond on this page unless you request otherwise. |
Displayname template
(Continued from Geometry_guy's talk page)
Sorry to intrude here. The main feature you need (to change the bold text above the article) is already in MediaWiki, as the DISPLAYTITLE magic word. It would be possible to make a template somewhat like {{wrongtitle}} that would take the common name and latin name as parameters and would display them appropriately. At least it wouldn't take any software changes. — Carl (CBM · talk) 23:47, 2 August 2008 (UTC)
- Interesting. However, judging from what it says in the manual, it looks to me like DISPLAYTITLE can only be used to change a title's case (e.g. IPod => iPod). Are you sure it would work? --Jwinius (talk) 01:06, 3 August 2008 (UTC)
- You're right - it would require a software change to remove the "normalizes to the same title" check. That's not as hard as adding the displaytitle functionality in the first place, though. I need to figure out (or ask around) why that check was added in the first place. — Carl (CBM · talk) 01:30, 3 August 2008 (UTC)
- Although I'm not optimistic that this check will be easy to remove, I'll be very interested to hear what you find out. --Jwinius (talk) 01:57, 3 August 2008 (UTC)
- Yes, I looked into this yesterday. DISPLAYTITLE doesn't even handle subscripts and superscripts, so for that reason there is code at MediaWiki:Common.js which does. Actually, the code is fairly straightforward and would be easy to change if there were consensus to change it. It is maintained by User:Remember the dot. Geometry guy 10:52, 3 August 2008 (UTC)
- Well, let's see what he has to say about this. Cheers, --Jwinius (talk) 11:49, 3 August 2008 (UTC)
- Please try to avoid making article titles different from their displayed titles. The JavaScript code we currently use should be used sparingly. Ideally, we would not use it at all because it causes problems for screen readers, text-only browsers, and search engines.
- I'm afraid that I don't know what the best solution to your problem is. I personally prefer common naming ("lion" instead of "panthera leo"). Both names are given in the lead section anyway to avoid confusion. Plus, templates like {{Felidae nav}} seem to do a good job of organizing articles already, without the need to move articles to uncommon names.
- I know this answer was probably not the one you were hoping for, but all the same I hope you find a good solution to the problem. Please let me know if there's anything else I can do for you. —Remember the dot 03:17, 5 August 2008 (UTC)
- That's exactly what I was hoping to do with it: change the displayed titles of articles with scientific names to common names. That way, I figured readers could remain comfortable with the titles while the valid scientific names would remain inextricably linked with the articles, making it much easier for people like myself to maintain large collections of them. The problem is/was that DISPLAYTITLE does not allow this.
- Your answer was predictable, unfortunately, but what I find most depressing is that the people on the long end of this consensus seem not even to want to consider the possibility that the opposition may have a point that is important enough to warrant a search for a possible solution in this matter. --Jwinius (talk) 23:30, 6 August 2008 (UTC)
- Basically, it boils down to the fact that it's not a good idea to have the displayed title of an article be significantly different than the actual title of the article. It would lead to confusion - people would copy-and-paste the title to make a wikilink to the article only to discover that "Lion" is really just a redirect to "Panthera leo". Isn't there a way that you can do your categorization of these articles without changing the article titles? —Remember the dot 01:19, 8 August 2008 (UTC)
To be honest, sure, it would be possible for me to work with common name titles and organize everything properly. For example, if I were to put in a gargantuan effort, basically devoting the rest of my life to completing 3,000+ properly organized articles on snakes -- using common names -- then I would be the only one who would know that the articles were actually following the proper taxonomy (totally consistent with no errors or duplication). That's simply because I organized them and any changes show up on my watchlist. However, when I eventually bow out and somebody else comes along and wants to take my place, then how easy do you think it would be for that person to tell whether everything really is properly organized or not? (Mind you, they're not going to assume this).
The answer is that it would be very difficult. You see, all that matters is where the scientific names point to -- they are the key in any zoological database. Any database administrator can tell you how important that is. However, with common name titles, the scientific names are only loosely linked to the articles as redirects. You can't know that they point to the correct articles unless you follow each and every one to verify that the article it leads to does indeed contain a description of the corresponding species. You may not even know if the redirects exist. Even worse, if you plan to maintain the collection, you'd have to add all of the redirects for the scientific names to your watchlist in addition to the articles. That's a lot of work!! In fact, it's so much work that I'm afraid nobody would ever again bother to maintain the entire collection. Thus, much of my work will have been for nothing. Sure, a few corrections would be made every once in a while, but in this environment the number of errors creeping into the collection would far outnumber them and within a few years it would all degrade into a typical incoherent mess, full of duplication and error.
On the other hand, if all of the articles were to have scientific name titles, anyone willing to take over for me would have a much easier task. It would be safe to assume that, in the articles for the higher taxa (genera, families) and in the main category overviews, the scientific names were all direct links to the articles. Therefore, to verify that the names were all valid you'd only have to check them against the 3rd party taxonomic database that they're supposed to be following. To maintain the lot, you'd only have to add the articles to your watchlist -- not all of the redirects as well.
IMHO, this is why scientific name article titles are so important at Wikipedia. Somehow, we must see to it that they are inextricably linked to the articles. If our technical options are so limited that it means renaming "Lion" to "Panthera leo", then so be it. It's a question of long-term and large-scale maintainability. The folks at WP:BOTONY have already figured this out; now it's up to the zoology people. The longer we take to come up with a solution for this problem, the longer it will take us to recover from the inevitable mess. --Jwinius (talk) 13:38, 8 August 2008 (UTC)
WP 1.0 bot v2
Hi, Carl. When do you think we could start coding the new bot? My availability will decrease two weeks from now (the school year starts again), and I'm really interested in working on the bot at least a little bit while I still can. Should we advertise User:WP 1.0 bot/Second generation more widely so we can get more opinions as to what to do? Titoxd 04:37, 5 August 2008 (UTC)
- Yes, I think we should advertise it more widely now. I do have some pre-alpha but functional code that can:
- Download ratings into a mysql database
- Make summary tables from the database
- Run simple web-based queries against the database (including intersecting two projects)
- Extract the old log information from wiki pages and add it to the database
- My goal with that code was to see what issues will actually arise in practice. I'll clean up that code today and put it into my svn repository on toolserver. I think most of the coding work will be in the query interface. — Carl (CBM · talk) 13:37, 5 August 2008 (UTC)
- You can download the proof-of-concept code on svn at https://svn.toolserver.org/svnroot/cbm/wp10.2g/alpha — Carl (CBM · talk) 17:48, 5 August 2008 (UTC)
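As a rough illustration of the "intersecting two projects" query mentioned above, here is a minimal Python sketch using the standard-library sqlite3 module. It is not the bot's actual code, which is written in Perl against MySQL. The `ratings` table, its columns, and the sample rows are all invented for the example.

```python
import sqlite3

# Hypothetical schema: one row per (project, article) rating.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings (project TEXT, article TEXT, quality TEXT, "
    "PRIMARY KEY (project, article))"
)
# Made-up sample data, for illustration only.
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [
        ("Mathematics", "Leonhard Euler", "FA"),
        ("Mathematics", "Group theory", "B"),
        ("Biography", "Leonhard Euler", "GA"),
        ("Biography", "Ada Lovelace", "B"),
    ],
)


def intersect_projects(conn, project_a, project_b):
    """Return (article, quality_a, quality_b) for articles rated by both projects."""
    query = """
        SELECT a.article, a.quality, b.quality
        FROM ratings AS a
        JOIN ratings AS b ON a.article = b.article
        WHERE a.project = ? AND b.project = ?
        ORDER BY a.article
    """
    return conn.execute(query, (project_a, project_b)).fetchall()


shared = intersect_projects(conn, "Mathematics", "Biography")
print(shared)
```

A self-join on the article title keeps the intersection inside the database, so the web interface only has to format the rows it gets back.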
- Whee, I'll have to learn Perl. :) That said, from a cursory design-level review, it looks good. What I didn't understand was what $Extra did. And also, wouldn't it be more efficient (from a SQL perspective) to generate a bunch of SQL queries and execute them all at once, instead of executing them one at a time? It would also help to prevent updates to the database failing halfway and leaving the database in an inconsistent state. Now, a small-scale test with limited live data (I as usual volunteer using WP:WPTC's assessments) would be nice. Titoxd 21:09, 6 August 2008 (UTC)
The $Extra var, and get_extra_assessments(), was a successful test of one way to let a wikiproject specify extra rating values. Since then, I changed my mind about how to do that. I am thinking that the project will put a template on their category (Category:Mathematics articles by quality for example) that the script will parse. The template might look like this:
{{WP10params|
 |homepage=Wikipedia:WikiProject Mathematics
 |extra1-name=Bplus
 |extra1-type=quality
 |extra1-category=Bplus mathematics articles
 |extra1-ranking=400
}}
That would tell the bot about a new quality rating "Bplus" that is used by the project. The template could also be used to track which WikiProjects are task forces of larger projects, or to track other per-project data in a way that can be configured on the wiki.
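A minimal sketch of how a script might read such a template, written in Python rather than the bot's Perl. The function names `parse_wp10params` and `extra_ratings` are hypothetical. The parser is deliberately simplified: it assumes parameter values contain no nested templates or piped links.

```python
import re


def parse_wp10params(wikitext):
    """Extract name=value parameters from the first {{WP10params ...}} call.

    Simplified: assumes values contain no nested templates or piped links.
    """
    match = re.search(
        r"\{\{\s*WP10params\s*\|(.*?)\}\}", wikitext, re.DOTALL | re.IGNORECASE
    )
    if match is None:
        return {}
    params = {}
    for field in match.group(1).split("|"):
        if "=" not in field:
            continue  # skip empty or positional fields
        name, value = field.split("=", 1)
        params[name.strip()] = value.strip()
    return params


def extra_ratings(params):
    """Group extraN-* parameters into one dict per extra rating, ordered by N."""
    extras = {}
    for key, value in params.items():
        m = re.fullmatch(r"extra(\d+)-(\w+)", key)
        if m:
            extras.setdefault(int(m.group(1)), {})[m.group(2)] = value
    return [extras[n] for n in sorted(extras)]


text = """{{WP10params|
 |homepage=Wikipedia:WikiProject Mathematics
 |extra1-name=Bplus
 |extra1-type=quality
 |extra1-category=Bplus mathematics articles
 |extra1-ranking=400
}}"""
params = parse_wp10params(text)
print(extra_ratings(params))
```

Grouping by the number in `extraN-` would let a project declare several extra ratings in one template call, though whether the final design numbers parameters this way is still open.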
The way to prevent updates from failing halfway through is to use database transactions. I will look into how to accomplish this with Perl's DBI class. My initial research says that I just have to add the right "start transaction" call at the beginning and a "finish transaction" call at the end of the script. I didn't want to use them yet because I like to kill it halfway through and look at the database manually.
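To illustrate the idea of wrapping an update in a transaction, here is a minimal Python sketch using the standard-library sqlite3 module. It is not the Perl DBI code and does not use MySQL; the `ratings` table and the `replace_project_ratings` helper are assumptions made for the example. The point is only that a failure partway through leaves the earlier data untouched.

```python
import sqlite3


def replace_project_ratings(conn, project, ratings):
    """Replace all ratings for one project atomically."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        conn.execute("DELETE FROM ratings WHERE project = ?", (project,))
        conn.executemany(
            "INSERT INTO ratings (project, article, quality) VALUES (?, ?, ?)",
            [(project, article, quality) for article, quality in ratings],
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings (project TEXT, article TEXT, quality TEXT NOT NULL, "
    "PRIMARY KEY (project, article))"
)
conn.commit()

replace_project_ratings(conn, "Mathematics", [("Group theory", "B")])

try:
    # The second row violates NOT NULL, so the whole update is rolled back,
    # including the DELETE that preceded the inserts.
    replace_project_ratings(
        conn, "Mathematics", [("Topology", "A"), ("Algebra", None)]
    )
except sqlite3.IntegrityError:
    pass

rows = conn.execute("SELECT article, quality FROM ratings").fetchall()
print(rows)
```

The same pattern also addresses the batching question: all statements for one project are sent inside a single transaction, so they either all take effect or none do.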
I've been doing some initial testing on my local computer. I should be able to set up a demo on toolserver for you to look at. The web interfaces I wrote are all very very basic. — Carl (CBM · talk) 21:34, 6 August 2008 (UTC)
Here is a live demo:
- http://toolserver.org/~cbm//cgi-bin/wp10.2g/alpha/cgi-bin/list.pl
- http://toolserver.org/~cbm//cgi-bin/wp10.2g/alpha/cgi-bin/list2.pl
- http://toolserver.org/~cbm//cgi-bin/wp10.2g/alpha/cgi-bin/table.pl
I think you'll find this query interesting: . — Carl (CBM · talk) 22:28, 6 August 2008 (UTC)
- Indeed, that query is interesting. :) One question, though: if we allow B+ to be 400, and another project wants to use B+ and GA, (which is also 400), how will the script know which one to put ahead? Also, if a project thinks that GA should be above A, would they be able to modify their wp10params declaration to set GA to 500 and A to 400? How would we deal with those? Titoxd 06:13, 7 August 2008 (UTC)
- I just made up the 400. All the details about project-specific ratings are up for discussion at this point. — Carl (CBM · talk) 12:20, 7 August 2008 (UTC)
- I'll be learning Perl on the go, but sure, I'll try. Titoxd 20:06, 7 August 2008 (UTC)
- It doesn't actually seem that complicated if you have a book next to you (which I rushed to get from the local library), but the roadblock for now is getting Apache/Perl to recognize MySQL (apparently support for it is not installed on a vanilla Leopard installation, nor is mod_perl), so I might end up just getting a toolserver account to test this on. Titoxd 06:36, 8 August 2008 (UTC)
Invented words
Please verify your comment. And no, a blog fails miserably as a verifiable source. The New York Times does not, in fact, call its lead section "lede". The term is archaic, and is used by some, and I mean some, Wikipedians to sound, well, snobbish. The word had a meaning at one time for newspapers (let's not even discuss an encyclopedia), but not now, according to several reliable sources, mainly the Oxford English Dictionary. To be honest, of all the problems with Wikipedia, snobby editors using "lede" doesn't rank in the top 100, but let's not come across as elitist, especially when we want to be a democratic project. OrangeMarlin 23:21, 6 August 2008 (UTC)
- As I commented on the talk page (where you didn't respond...) the Associated Press actively uses the term. — Carl (CBM · talk) 01:12, 7 August 2008 (UTC)
Wikipedia:Featured topics/Candidate list
This doesn't strictly work, as the main article name doesn't always match the nomination name. For example, there's one at the moment. What's this page used for, anyway? Seemingly nothing - rst20xx (talk) 16:06, 7 August 2008 (UTC)
- Yes, that doesn't work. The solution is to create the redlink as a redirect to the appropriate page. I think that the FTC list is used by some people to watch the lists of FTC nominations without all the text. I made it because of this request — Carl (CBM · talk) 16:19, 7 August 2008 (UTC)
- Hmm, that's not ideal at all, as I guess that would have to be done manually. Oh well - rst20xx (talk) 20:30, 7 August 2008 (UTC)
fetch_articles_cats2
Hi Carl. Preparing to migrate away from query.php, I took a look at fetch_articles_cats2.pl. I wonder, is there a simpler way to fetch articles and categories than the one used there? I tried to adapt it to mathbot, and I got the error:
XML PARSING ERROR 1 $VAR1 = ; Error parsing XML - truncated response? $VAR1 = 'Died at /home/mathbot/public_html/cgi-bin/wp/modules/Mediawiki/API.pm line 1361. ';
Did you see something of this kind before? Any help with moving from query.php will be very much appreciated. Thanks! Oleg Alexandrov (talk) 03:29, 8 August 2008 (UTC)
- Ah, I figured it out. My XML::Simple.pm was out of date. I have not done any Perl programming for a while, and I am starting to feel as if I've been living in the woods. :) Oleg Alexandrov (talk) 03:35, 8 August 2008 (UTC)
- Once you get the right libraries on your machine, fetch_articles_cats2.pl should be almost a drop-in replacement. I had thought about moving away from XML, but the other encodings that the API can use would also require some library to parse them, and XML was the first one I tried. I think that the version of Mediawiki::API in my svn is more recent than the one on kiwix.
- The main issue I have always run into is utf8 encoding. The general principle is that you pass utf8-encoded arguments to Mediawiki::API and it returns native (not encoded) results. But it's a continuing nuisance, especially since api.php was not stable when I was writing my code.
- Please do let me know if you run into problems with Mediawiki::API, I will be glad to fix them. — Carl (CBM · talk) 11:35, 8 August 2008 (UTC)
- Yeah, utf encoding has been a big annoyance for me too when I developed WP 1.0 bot. I'd appreciate it if you let me know when you notice bugs in Mediawiki::API that can affect fetch_articles_cats2 so that I can upgrade. The last thing we want is a large number of pages in which things like "Â" suddenly become \2032 or worse. :) Oleg Alexandrov (talk) 15:09, 8 August 2008 (UTC)
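The encoding principle described in this thread (encode to UTF-8 on the way out, work with decoded strings on the way back) can be sketched in Python as follows. This does not use Mediawiki::API; the helper names `to_wire` and `from_wire` are hypothetical and stand in for whatever the HTTP layer does. The last lines show how decoding with the wrong codec silently garbles text instead of failing.

```python
def to_wire(title):
    """Encode a native (decoded) string as UTF-8 bytes before sending."""
    return title.encode("utf-8")


def from_wire(raw):
    """Decode UTF-8 bytes from a response back into a native string."""
    return raw.decode("utf-8")


title = "Erdős–Rényi model"
raw = to_wire(title)
round_trip = from_wire(raw)

# Decoding UTF-8 bytes with the wrong codec does not raise an error;
# it silently produces mojibake.
garbled = "é".encode("utf-8").decode("latin-1")
print(round_trip == title, repr(garbled))
```

Keeping the encode and decode steps at the boundary, and nowhere else, is one way to avoid double-encoding page text before it is saved back to the wiki.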