Machine translation: Difference between revisions

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license.
Revision as of 22:47, 28 May 2012 by Nazar (talk | contribs; Extended confirmed users; 2,339 edits): Undid revision 494821381 by Glrx (talk) - refs to multiple independent reviews in notable IT-news publications are provided. pls discuss.
Latest revision as of 02:03, 10 January 2025 by TornadoLGS (talk | contribs; Extended confirmed users, Page movers, Pending changes reviewers, Rollbackers; 24,862 edits): Reverted 2 edits by Curtisvas (talk): Rv promotional edit. Tags: Twinkle, Undo
(763 intermediate revisions by more than 100 users not shown)
Line 1: Line 1:
{{short description|Computerized translation between natural languages}}
{{Refimprove|date=June 2008}}
{{distinguish|Computer-assisted translation|Interactive machine translation|Translator (computing)}}
{{Navbox translation}}
{{Use dmy dates|date=July 2014}}
<!-- PLEASE DO NOT CONVERT REFERENCES WITHOUT DISCUSSING ON TALK PAGE. SEE http://bugzilla.wikimedia.org/show_bug.cgi?id=5885 -->
]
'''Machine translation''', sometimes referred to by the abbreviation '''MT''' (not to be confused with ], '''machine-aided human translation''' '''MAHT''' and ]) is a sub-field of ] that investigates the use of ] to ] text or speech from one ] to another.
{{Translation sidebar}}
'''Machine translation''' is the use of computational techniques to ] text or speech from one ] to another, including the contextual, idiomatic and pragmatic nuances of both languages.


Early approaches were mostly ] or ]. These methods have since been superseded by ]<ref>{{Cite web |date=3 October 2016 |title=Google Translate Gets a Deep-Learning Upgrade |url=https://spectrum.ieee.org/google-translate-gets-a-deep-learning-upgrade |access-date=2024-07-07 |website=IEEE Spectrum |language=en}}</ref> and ].<ref>{{Cite web |date=2024-02-23 |title=Google Translate vs. ChatGPT: Which One Is the Best Language Translator? |url=https://uk.pcmag.com/ai/151950/google-translate-vs-chatgpt-which-one-is-the-best-language-translator |access-date=2024-07-07 |website=PCMag UK |language=en-gb}}</ref>
On a basic level, MT performs simple substitution of words in one natural language for words in another, but that alone usually cannot produce a good translation of a text, because recognition of whole phrases and their closest counterparts in the target language is needed. Solving this problem with ] and ] techniques is a rapidly growing field that is leading to better translations, handling differences in ], translation of ]s, and the isolation of anomalies.{{Citation needed|date=February 2010}}

Current machine translation software often allows for customization by domain or ] (such as ]), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text.

Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has ] which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators and, in a very limited number of cases, can even produce output that can be used as is (e.g., weather reports).

The progress and potential of machine translation have been much debated throughout its history. Since the 1950s, a number of scholars have questioned the possibility of achieving fully automatic machine translation of high quality.<ref>First and most notably Bar-Hillel, Yehoshua: "A demonstration of the nonfeasibility of fully automatic high quality machine translation," in ''Language and Information: Selected essays on their theory and application'' (Jerusalem Academic Press, 1964), pp. 174–179.</ref> Some critics claim that there are in-principle obstacles to automating the translation process.<ref></ref>


==History== ==History==
{{Main|History of machine translation}} {{Main|History of machine translation}}
The idea of machine translation may be traced back to the 17th century. In 1629, ] proposed a universal language, with equivalent ideas in different tongues sharing one symbol. In the 1950s, the ] (1954) involved fully automatic translation of over sixty ] sentences into ]. The experiment was a great success and ushered in an era of substantial funding for machine-translation research. The authors claimed that within three to five years, machine translation would be a solved problem.


===Origins===
Real progress was much slower, however, and after the ] (1966), which found that the ten-year-long research had failed to fulfill expectations, funding was greatly reduced. Beginning in the late 1980s, as ]al power increased and became less expensive, more interest was shown in ].
The origins of machine translation can be traced back to the work of ], a ninth-century Arabic ] who developed techniques for systemic language translation, including ], ], and ] and ], which are used in modern machine translation.<ref>{{Cite web |last=DuPont |first=Quinn |date=January 2018 |title=The Cryptological Origins of Machine Translation: From al-Kindi to Weaver |url=http://amodern.net/article/cryptological-origins-machine-translation/ |url-status=dead |archive-url=https://web.archive.org/web/20190814061915/http://amodern.net/article/cryptological-origins-machine-translation/ |archive-date=14 August 2019 |access-date=2 September 2019 |website=Amodern}}</ref> The idea of machine translation later appeared in the 17th century. In 1629, ] proposed a universal language, with equivalent ideas in different tongues sharing one symbol.<ref>{{Cite book |last=Knowlson |first=James |title=Universal Language Schemes in England and France, 1600-1800 |date=1975 |publisher=University of Toronto Press |isbn=0-8020-5296-7 |location=Toronto}}</ref>


The idea of using digital computers for translation of natural languages was proposed as early as 1946 by ] and possibly others. ] wrote an important memorandum "Translation" in 1949. The Georgetown experiment was by no means the first such application, and a demonstration was made in 1954 on the ] machine at ] (]) of a rudimentary translation of English into French. Several papers on the topic were published at the time, and even articles in popular journals (see for example '']'', Sept. 1955, Cleave and Zacharov). A similar application, also pioneered at Birkbeck College at the time, was reading and composing ] texts by computer. The idea of using digital computers for translation of natural languages was proposed as early as 1947 by England's ]<ref>{{Cite book |last=Booth |first=Andrew D. |url=https://archive.org/details/sim_computers-and-people_1953-05_2_4/page/n8/ |title=Computers and Automation 1953-05: Vol 2 Iss 4 |date=1953-05-01 |publisher=Berkeley Enterprises |pages=6 |language=en |chapter=MECHANICAL TRANSLATION}}</ref> and ] at ] in the same year. "The memorandum written by ] in 1949 is perhaps the single most influential publication in the earliest days of machine translation."<ref>{{cite book
|url=https://pdfs.semanticscholar.org/eaa9/ccf94b4d129c26faf45a1353ffcbbe9d4fda.pdf
|archive-url=https://web.archive.org/web/20200228015454/https://pdfs.semanticscholar.org/eaa9/ccf94b4d129c26faf45a1353ffcbbe9d4fda.pdf
|url-status=dead
|archive-date=2020-02-28
|chapter=Warren Weaver and the launching of MT |via=] |author=J. Hutchins|title=Early Years in Machine Translation |series=Studies in the History of the Language Sciences |year=2000 |volume=97 |page=17 |doi=10.1075/sihols.97.05hut |isbn=978-90-272-4586-1 |s2cid=163460375 }}</ref><ref>{{cite web
|url=https://www.britannica.com/biography/Warren-Weaver
|title=Warren Weaver, American mathematician
|date=July 13, 2020
|access-date=7 August 2020
|archive-date=6 March 2021
|archive-url=https://web.archive.org/web/20210306061225/https://www.britannica.com/biography/Warren-Weaver
|url-status=live
}}</ref> Others followed. A demonstration was made in 1954 on the ] machine at ] (]) of a rudimentary translation of English into French. Several papers on the topic were published at the time, and even articles in popular journals (for example an article by Cleave and Zacharov in the September 1955 issue of '']''). A similar application, also pioneered at Birkbeck College at the time, was reading and composing ] texts by computer.


===1950s===
==Translation process==
The first researcher in the field, ], began his research at MIT (1951). A ] MT research team, led by Professor Michael Zarechnak, followed (1951) with a public demonstration of its ] system in 1954. MT research programs appeared in Japan<ref>{{cite book|last1=上野|first1=俊夫|title=パーソナルコンピュータによる機械翻訳プログラムの制作|date=1986-08-13|publisher=(株)ラッセル社|location=Tokyo|isbn=494762700X|page=16|language=ja|quote=わが国では1956年、当時の電気試験所が英和翻訳専用機「ヤマト」を実験している。この機械は1962年頃には中学1年の教科書で90点以上の能力に達したと報告されている。(translation (assisted by ]): In 1956 in Japan, the then ](AIST) tested the dedicated English-Japanese translation machine ''Yamato'', which was reported to have reached, by around 1962, a score of over 90 points on a first-year junior high school textbook.)}}</ref><ref>{{Cite web | url=http://museum.ipsj.or.jp/computer/dawn/0027.html | title=機械翻訳専用機「やまと」-コンピュータ博物館 | access-date=4 April 2017 | archive-date=19 October 2016 | archive-url=https://web.archive.org/web/20161019171540/http://museum.ipsj.or.jp/computer/dawn/0027.html | url-status=live }}</ref> and Russia (1955), and the first MT conference was held in London (1956).<ref name="Nye">{{cite journal|last1=Nye|first1=Mary Jo|title=Speaking in Tongues: Science's centuries-long hunt for a common language|journal=Distillations|date=2016|volume=2|issue=1|pages=40–43|url=https://www.sciencehistory.org/distillations/magazine/speaking-in-tongues|access-date=20 March 2018|archive-date=3 August 2020|archive-url=https://web.archive.org/web/20200803130801/https://www.sciencehistory.org/distillations/magazine/speaking-in-tongues|url-status=live}}</ref><ref name="Babel">{{cite book|last1=Gordin|first1=Michael D.|title=Scientific Babel: How Science Was Done Before and After Global English|date=2015|publisher=University of Chicago Press|location=Chicago, Illinois|isbn=9780226000299}}</ref>
{{Main|Translation process}}
The human ] may be described as:
# ] the ] of the ]; and
# Re-] this ] in the target language.


Behind this ostensibly simple procedure lies a complex ] operation. To decode the meaning of the ] in its entirety, the translator must interpret and analyse all the features of the text, a process that requires in-depth knowledge of the ], ], ], ]s, etc., of the source language, as well as the ] of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the target language.

] "wrote about computer-assisted language processing as early as 1957" and "was project leader on computational linguistics at ] from 1955 to 1968."<ref>{{cite news
|newspaper=]
|quote=wrote about computer-assisted language processing as early as 1957.. was project leader on computational linguistics at Rand from 1955 to 1968.
|url=https://www.nytimes.com/1995/07/28/obituaries/david-g-hays-66-a-developer-of-language-study-by-computer.html
|title=David G. Hays, 66, a Developer Of Language Study by Computer
|author=Wolfgang Saxon
|date=July 28, 1995
|access-date=7 August 2020
|archive-date=7 February 2020
|archive-url=https://web.archive.org/web/20200207035914/https://www.nytimes.com/1995/07/28/obituaries/david-g-hays-66-a-developer-of-language-study-by-computer.html
|url-status=live
}}</ref>


===1960–1975===
Therein lies the challenge in machine translation: how to program a computer that will "understand" a text as a person does, and that will "create" a new text in the target language that "sounds" as if it has been written by a person.
Researchers continued to join the field as the Association for Machine Translation and Computational Linguistics was formed in the U.S. (1962) and the National Academy of Sciences formed the Automatic Language Processing Advisory Committee (ALPAC) to study MT (1964). Real progress was much slower, however, and after the ] (1966), which found that the ten-year-long research had failed to fulfill expectations, funding was greatly reduced.<ref name="ueno">{{cite book
|last1=上野 |first1=俊夫 |title=パーソナルコンピュータによる機械翻訳プログラムの制作 |date=1986-08-13 |publisher=(株)ラッセル社|isbn=494762700X|page=16|location=Tokyo|language=ja}}</ref> According to a 1972 report by the Director of Defense Research and Engineering (DDR&E), the feasibility of large-scale MT was reestablished by the success of the Logos MT system in translating military manuals into Vietnamese during that conflict.


The French Textile Institute also used MT to translate abstracts from and into French, English, German and Spanish (1970); Brigham Young University started a project to translate Mormon texts by automated translation (1971).
This problem may be approached in a number of ways.


===1975 and beyond===
==Approaches==
], which "pioneered the field under contracts from the U.S. government"<ref name="MT1998.EmptyAtlantic">{{Cite magazine |last=Budiansky |first=Stephen |date=December 1998 |title=Lost in Translation |magazine=] |pages=81–84}}</ref> in the 1960s, was used by Xerox to translate technical manuals (1978). Beginning in the late 1980s, as ]al power increased and became less expensive, more interest was shown in ]. MT became more popular after the advent of computers.<ref>{{Cite book|title=Conceptual Information Processing|last=Schank|first=Roger C.|date=2014|publisher=Elsevier|isbn=9781483258799|location=New York|pages=5}}</ref> SYSTRAN's first implementation was deployed in 1988 by the online service of the ] called Minitel.<ref>{{Cite book|title=Machine Translation and the Information Soup: Third Conference of the Association for Machine Translation in the Americas, AMTA'98, Langhorne, PA, USA, October 28–31, 1998 Proceedings|last1=Farwell|first1=David|last2=Gerber|first2=Laurie|last3=Hovy|first3=Eduard|date=2003-06-29|publisher=Springer|isbn=3540652590|location=Berlin|pages=276}}</ref> Various computer-based translation companies were also launched, including Trados (1984), which was the first to develop and market Translation Memory technology (1989), though this is not the same as MT. The first commercial MT system for Russian / English / German-Ukrainian was developed at Kharkov State University (1991).
] at the peak, followed by transfer-based, then direct translation.]]
Machine translation can use a method based on ], which means that words will be translated in a linguistic way &mdash; the most suitable (orally speaking) words of the target language will replace the ones in the source language.


By 1998, "for as little as $29.95" one could "buy a program for translating in one direction between English and a major European language of your choice" to run on a PC.<ref name=MT1998.EmptyAtlantic/>

It is often argued that the success of machine translation requires the problem of ] to be solved first.


MT on the web started with SYSTRAN offering free translation of small texts (1996) and then providing this via AltaVista Babelfish,<ref name=MT1998.EmptyAtlantic/> which racked up 500,000 requests a day (1997).<ref>{{Cite web |url=https://digital.com/about/babel-fish/ |title=Babel Fish: What Happened To The Original Translation Application?: We Investigate |last1=Barron |first1=Brenda |date=November 18, 2019 |website=Digital.com |language=en-US |access-date=2019-11-22 |archive-date=20 November 2019 |archive-url=https://web.archive.org/web/20191120032732/https://digital.com/about/babel-fish/ |url-status=live }}</ref> The second free translation service on the web was ]'s GlobaLink.<ref name=MT1998.EmptyAtlantic/> ''Atlantic Magazine'' wrote in 1998 that "Systran's Babelfish and GlobaLink's Comprende" handled "Don't bank on it" with a "competent performance."<ref>and gave other examples too</ref>

Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. According to the nature of the intermediary representation, an approach is described as ] or ]. These methods require extensive ]s with ], ], and ] information, and large sets of rules.


] (the future head of Translation Development at Google) won DARPA's speed MT competition (2003).<ref>{{Cite book |title=Routledge Encyclopedia of Translation Technology |last=Chan |first=Sin-Wai |date=2015 |publisher=Routledge |isbn=9780415524841 |location=Oxon |pages=385}}</ref> More innovations during this time included MOSES, the open-source statistical MT engine (2007), a text/SMS translation service for mobiles in Japan (2008), and a mobile phone with built-in speech-to-speech translation functionality for English, Japanese and Chinese (2009). In 2012, Google announced that ] translates roughly enough text to fill 1 million books in one day.

Given enough data, machine translation programs often work well enough for a ] of one language to get the approximate meaning of what is written by the other native speaker. The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual ] of data needed for statistical methods to work is not necessary for the grammar-based methods. But then, the grammar methods need a skilled linguist to carefully design the grammar that they use.

==Approaches==
{{See also|Hybrid machine translation|Example-based machine translation|}}


Before the advent of ] methods, statistical methods required a lot of rules accompanied by ], ], and ] annotations.
To translate between closely related languages, a technique referred to as ] may be used.


===Rule-based=== ===Rule-based===
The rule-based machine translation paradigm includes transfer-based machine translation, interlingual machine translation and dictionary-based machine translation paradigms.
{{Main|Rule-based machine translation}} {{Main|Rule-based machine translation}}


The rule-based machine translation approach was used mostly in the creation of ] and grammar programs. Its biggest downfall was that everything had to be made explicit: orthographical variation and erroneous input had to be made part of the source language analyser in order to cope with them, and lexical selection rules had to be written for all instances of ambiguity.
'''''Transfer-based machine translation'''''

====Transfer-based machine translation====
{{Main|Transfer-based machine translation}} {{Main|Transfer-based machine translation}}


Transfer-based machine translation was similar to ] in that it created a translation from an intermediate representation that simulated the meaning of the original sentence. Unlike interlingual MT, it depended partially on the language pair involved in the translation.
'''''Interlingual'''''

====Interlingual====
{{Main|Interlingual machine translation}} {{Main|Interlingual machine translation}}


Interlingual machine translation is one instance of rule-based machine-translation approaches. In this approach, the source language, i.e. the text to be translated, is transformed into an interlingual, i.e. source-/target-language-independent representation. The target language is then generated out of the ]. Interlingual machine translation was one instance of rule-based machine-translation approaches. In this approach, the source language, i.e. the text to be translated, was transformed into an interlingual language, i.e. a "language neutral" representation that is independent of any language. The target language was then generated out of the ]. The only interlingual machine translation system that was made operational at the commercial level was the KANT system (Nyberg and Mitamura, 1992), which was designed to translate Caterpillar Technical English (CTE) into other languages.


'''''Dictionary-based''''' ====Dictionary-based====
{{Main|Dictionary-based machine translation}} {{Main|Dictionary-based machine translation}}


Machine translation can use a method based on ] entries, which means that the words will be translated as they are by a dictionary. Machine translation used a method based on ] entries, which means that the words were translated as they are by a dictionary.
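
The word-for-word lookup described above can be illustrated with a minimal sketch. The dictionary entries and the function name `translate_word_for_word` are hypothetical and chosen only for illustration; they are not taken from any actual system.

```python
# Hypothetical toy English-to-French dictionary; the entries are illustrative only.
TOY_DICTIONARY = {
    "the": "le",
    "cat": "chat",
    "eats": "mange",
    "fish": "poisson",
}


def translate_word_for_word(sentence, dictionary):
    """Replace each source word with its dictionary entry.

    Words missing from the dictionary are passed through unchanged.
    """
    translated = []
    for word in sentence.lower().split():
        translated.append(dictionary.get(word, word))
    return " ".join(translated)


print(translate_word_for_word("The cat eats fish", TOY_DICTIONARY))
# prints: le chat mange poisson
```

The output is understandable but not grammatical French (a fluent rendering would be "le chat mange du poisson"), which illustrates why word substitution alone rarely yields a good translation.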


===Statistical=== ===Statistical===
{{Main|Statistical machine translation}} {{main|Statistical machine translation}}
Statistical machine translation tried to generate translations using ] based on bilingual text corpora, such as the ] corpus, the English-French record of the Canadian parliament and ], the record of the ]. Where such corpora were available, good results were achieved translating similar texts, but such corpora were rare for many language pairs. The first statistical machine translation software was ] from ]. In 2005, Google improved its internal translation capabilities by using approximately 200 billion words from United Nations materials to train their system; translation accuracy improved.<ref>{{cite web |url=http://blog.outer-court.com/archive/2005-05-22-n83.html |title=Google Translator: The Universal Language |publisher=Blog.outer-court.com |date=25 January 2007 |access-date=2012-06-12 |archive-date=20 November 2008 |archive-url=https://web.archive.org/web/20081120030225/http://blog.outer-court.com/archive/2005-05-22-n83.html |url-status=live }}</ref>
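
As a rough illustration of how word translation probabilities might be estimated from a bilingual corpus, the following sketch uses IBM Model 1, a classic lexical translation model that the text above does not name and that is swapped in here as an assumption. The three-sentence corpus and all function names are hypothetical. A real system would also need word reordering and a language model, which this sketch omits.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: (English words, German words).
TOY_CORPUS = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]


def train_ibm_model1(parallel_corpus, iterations=10):
    """Estimate P(target word | source word) with expectation-maximization."""
    source_vocab = {s for src, _ in parallel_corpus for s in src}
    target_vocab = {t for _, tgt in parallel_corpus for t in tgt}
    # Start from a uniform distribution over target words.
    prob = {(t, s): 1.0 / len(target_vocab)
            for s in source_vocab for t in target_vocab}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect fractional alignment counts for each sentence pair.
        for src, tgt in parallel_corpus:
            for t in tgt:
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    fraction = prob[(t, s)] / norm
                    count[(t, s)] += fraction
                    total[s] += fraction
        # M-step: renormalize the counts into probabilities.
        for (t, s) in prob:
            prob[(t, s)] = count[(t, s)] / total[s]
    return prob


def most_probable_translation(prob, source_word):
    """Return the target word with the highest estimated probability."""
    candidates = {t: p for (t, s), p in prob.items() if s == source_word}
    return max(candidates, key=candidates.get)


model = train_ibm_model1(TOY_CORPUS)
for word in ["the", "house", "book", "a"]:
    print(word, "->", most_probable_translation(model, word))
```

Even on this tiny corpus, the co-occurrence pattern lets the model separate "das" from "haus" and "buch", because "the" appears with "das" in two different sentence pairs.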


SMT's biggest downfalls included its dependence on huge amounts of parallel text, its problems with morphology-rich languages (especially with translating ''into'' such languages), and its inability to correct singleton errors.
Statistical machine translation tries to generate translations using ] based on bilingual text corpora, such as the ] corpus, the English-French record of the Canadian parliament and ], the record of the ]. Where such corpora are available, impressive results can be achieved translating texts of a similar kind, but such corpora are still very rare. The first statistical machine translation software was ] from ]. Google used ] for several years, but switched to a statistical translation method in October 2007. Recently, they improved their translation capabilities by inputting approximately 200 billion words from ] materials to train their system. Accuracy of the translation has improved.<ref></ref>


Some work has been done in the utilization of multiparallel ], that is a body of text that has been translated into 3 or more languages. Using these methods, a text that has been translated into 2 or more languages may be utilized in combination to provide a more accurate translation into a third language compared with if just one of those source languages were used alone.<ref>{{Cite conference |last=Schwartz |first=Lane |date=2008 |title=Multi-Source Translation Methods |url=https://dowobeha.github.io/papers/amta08.pdf |conference=Paper presented at the 8th Biennial Conference of the Association for Machine Translation in the Americas |archive-url=https://web.archive.org/web/20160629171944/http://dowobeha.github.io/papers/amta08.pdf |archive-date=29 June 2016 |access-date=3 November 2017 |url-status=live}}</ref><ref>{{Cite conference |last1=Cohn |first1=Trevor |last2=Lapata |first2=Mirella |date=2007 |title=Machine Translation by Triangulation: Making Effective Use of Multi-Parallel Corpora |url=http://homepages.inf.ed.ac.uk/mlap/Papers/acl07.pdf |conference=Paper presented at the 45th Annual Meeting of the Association for Computational Linguistics, June 23–30, 2007, Prague, Czech Republic |archive-url=https://web.archive.org/web/20151010171334/http://homepages.inf.ed.ac.uk/mlap/Papers/acl07.pdf |archive-date=10 October 2015 |access-date=3 February 2015 |url-status=live}}</ref><ref>{{Cite journal |last1=Nakov |first1=Preslav |last2=Ng |first2=Hwee Tou |date=2012 |title=Improving Statistical Machine Translation for a Resource-Poor Language Using Related Resource-Rich Languages |url=https://jair.org/index.php/jair/article/view/10764 |journal=Journal of Artificial Intelligence Research |volume=44 |pages=179–222 |arxiv=1401.6876 |doi=10.1613/jair.3540 |doi-access=free}}</ref>
===Example-based===
{{Main|Example-based machine translation}}


=== Neural MT ===
Example-based machine translation (EBMT) approach was proposed by ] in 1984.<ref>Nagao, M. 1981. A Framework of a Mechanical Translation between Japanese and English by Analogy Principle, in Artificial and Human Intelligence, A. Elithorn and R. Banerji (eds.) North- Holland, pp. 173-180, 1984.</ref><ref>{{Cite web | url = http://www.aclweb.org/index.php?option=com_content&task=view&id=36&Itemid=30 | title = the Association for Computational Linguistics - 2003 ACL Lifetime Achievement Award | publisher = Association for Computational Linguistics | accessdate = 2010-03-10}}</ref> It is often characterised by its use of a bilingual ] as its main knowledge base, at run-time. It is essentially a translation by ] and can be viewed as an implementation of ] approach of ].
{{Main|Neural machine translation}}


A ]-based approach to MT, ] has made rapid progress in recent years. However, the current consensus is that the so-called human parity achieved is not real, being based wholly on limited domains, language pairs, and certain test benchmarks<ref>Antonio Toral, Sheila Castilho, Ke Hu, and Andy Way. 2018. Attaining the unattainable? reassessing claims of human parity in neural machine translation. CoRR, abs/1808.10432.</ref> i.e., it lacks statistical significance power.<ref>{{Cite arXiv |eprint=1906.09833 |first1=Graham |last1=Yvette |first2=Haddow |last2=Barry |title=Translationese in Machine Translation Evaluation |date=2019 |last3=Koehn |first3=Philipp|class=cs.CL }}</ref>

===Hybrid MT===


Translations by neural MT tools like ], which is thought to usually deliver the best machine translation results as of 2022, typically still need post-editing by a human.<ref>{{cite journal |last1=Katsnelson |first1=Alla |title=Poor English skills? New AIs help researchers to write better |journal=Nature |pages=208–209 |language=en |doi=10.1038/d41586-022-02767-9 |date=29 August 2022|volume=609 |issue=7925 |pmid=36038730 |bibcode=2022Natur.609..208K |s2cid=251931306 |doi-access=free }}</ref><ref>{{cite web |last1=Korab |first1=Petr |title=DeepL: An Exceptionally Magnificent Language Translator |url=https://towardsdatascience.com/deepl-an-exceptionally-magnificent-language-translator-78e86d8062d3 |website=Medium |access-date=9 January 2023 |language=en |date=18 February 2022}}</ref><ref>{{cite news |title=DeepL outperforms Google Translate – DW – 12/05/2018 |url=https://www.dw.com/en/deepl-cologne-based-startup-outperforms-google-translate/a-46581948 |access-date=9 January 2023 |work=Deutsche Welle |language=en}}</ref>
Hybrid machine translation (HMT) leverages the strengths of statistical and rule-based translation methodologies.<ref name="speechtechmag.com"></ref> Several MT companies (], LinguaSys, ], ], ]) are claiming to have a hybrid approach using both rules and statistics. The approaches differ in a number of ways:

* '''Rules post-processed by statistics''': Translations are performed using a rules based engine. Statistics are then used in an attempt to adjust/correct the output from the rules engine.
Instead of training specialized translation models on parallel datasets, one can also ] generative ]s like ] to translate a text.<ref name="Hendy2023">{{cite arXiv |last1=Hendy |first1=Amr |last2=Abdelrehim |first2=Mohamed |last3=Sharaf |first3=Amr |last4=Raunak |first4=Vikas |last5=Gabr |first5=Mohamed |last6=Matsushita |first6=Hitokazu |last7=Kim |first7=Young Jin |last8=Afify |first8=Mohamed |last9=Awadalla |first9=Hany |title=How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation |date=2023-02-18 |eprint=2302.09210 |class=cs.CL}}</ref><ref>{{cite news |last1=Fadelli |first1=Ingrid |title=Study assesses the quality of AI literary translations by comparing them with human translations |url=https://techxplore.com/news/2022-11-quality-ai-literary-human.html |access-date=18 December 2022 |work=techxplore.com |language=en}}</ref><ref name="arxiv221014250">{{Cite arXiv|last1=Thai |first1=Katherine |last2=Karpinska |first2=Marzena |last3=Krishna |first3=Kalpesh |last4=Ray |first4=Bill |last5=Inghilleri |first5=Moira |last6=Wieting |first6=John |last7=Iyyer |first7=Mohit |title=Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature |date=25 October 2022|class=cs.CL |eprint=2210.14250 }}</ref> This approach is considered promising,<ref name="WMT2023">{{cite conference |last1=Kocmi |first1=Tom |last2=Avramidis |first2=Eleftherios |last3=Bawden |first3=Rachel |last4=Bojar |first4=Ondřej |last5=Dvorkovich |first5=Anton |last6=Federmann |first6=Christian |last7=Fishel |first7=Mark |last8=Freitag |first8=Markus |last9=Gowda |first9=Thamme |last10=Grundkiewicz |first10=Roman |last11=Haddow |first11=Barry |last12=Koehn |first12=Philipp |last13=Marie |first13=Benjamin |last14=Monz |first14=Christof |last15=Morishita |first15=Makoto |date=2023 |editor-last=Koehn |editor-first=Philipp |editor2-last=Haddow |editor2-first=Barry |editor3-last=Kocmi |editor3-first=Tom |editor4-last=Monz |editor4-first=Christof 
|title=Findings of the 2023 Conference on Machine Translation (WMT23): LLMs Are Here but Not Quite There Yet |url=https://aclanthology.org/2023.wmt-1.1 |journal=Proceedings of the Eighth Conference on Machine Translation |location=Singapore |publisher=Association for Computational Linguistics |pages=1–42 |doi=10.18653/v1/2023.wmt-1.1|doi-access=free }}</ref> but is still more resource-intensive than specialized translation models.
* '''Statistics guided by rules''': Rules are used to pre-process data in an attempt to better guide the statistical engine. Rules are also used to post-process the statistical output to perform functions such as normalization. This approach has a lot more power, flexibility and control when translating.

==Issues==
]'') being rendered as "wikipedia".]]
]. The broken Chinese sentence sounds like "there does not exist an entry" or "have not entered yet".]]
Studies using human evaluation (e.g. by professional literary translators or human readers) have ] with the latest advanced MT outputs.<ref name="arxiv221014250"/> Common issues include the translation of ambiguous parts whose correct translation requires common sense-like semantic language processing or context.<ref name="arxiv221014250"/> There can also be errors in the source texts or a lack of high-quality training data, and the severity and frequency of several types of problems may not be reduced by the techniques used to date, requiring some level of active human participation.


==Major issues==
===Disambiguation=== ===Disambiguation===
{{Main|Word sense disambiguation}} {{Main|Word-sense disambiguation|Syntactic disambiguation}}
Word-sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by ].<ref> by John Hutchins</ref> He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word.<ref>Bar-Hillel (1960), "Automatic Translation of Languages". Available online at http://www.mt-archive.info/Bar-Hillel-1960.pdf</ref> Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches. Word-sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by ].<ref> {{webarchive|url=https://web.archive.org/web/20070312062051/http://ourworld.compuserve.com/homepages/WJHutchins/Miles-6.htm |date=12 March 2007 }} by John Hutchins</ref> He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word.<ref>Bar-Hillel (1960), "Automatic Translation of Languages". Available online at http://www.mt-archive.info/Bar-Hillel-1960.pdf {{Webarchive|url=https://web.archive.org/web/20110928112348/http://www.mt-archive.info/Bar-Hillel-1960.pdf |date=28 September 2011 }}</ref> Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches.


Shallow approaches assume no knowledge of the text. They simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.<ref>{{Cite book|title=Hybrid approaches to machine translation|others=Costa-jussà, Marta R., Rapp, Reinhard, Lambert, Patrik, Eberle, Kurt, Banchs, Rafael E., Babych, Bogdan|date=21 July 2016|isbn=9783319213101|location=Switzerland|oclc=953581497}}</ref>


], a long-time translator for the United Nations and the ], wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ] in the ], which the ] and ] exigencies of the ] require to be resolved:


{{Blockquote|Why does a translator need a whole workday to translate five pages, and not an hour or two? ..... About 90% of an average text corresponds to these simple conditions. But unfortunately, there's the other 10%. It's that part that requires six hours of work. There are ambiguities one has to resolve. For instance, the author of the source text, an Australian physician, cited the example of an epidemic which was declared during World War II in a "Japanese prisoners of war camp". Was he talking about an American camp with Japanese prisoners or a Japanese camp with American prisoners? The English has two senses. It's necessary therefore to do research, maybe to the extent of a phone call to Australia.<ref name="piron">], ''Le défi des langues'' (The Language Challenge), Paris, L'Harmattan, 1994. <!-- GFDL translation by Jim Henry --></ref>
}}


The ideal deep approach would require the translation software to do all the research necessary for this kind of disambiguation on its own; but this would require a higher degree of ] than has yet been attained. A shallow approach which simply guessed at the sense of the ambiguous English phrase that Piron mentions (based, perhaps, on which kind of prisoner-of-war camp is more often mentioned in a given corpus) would have a reasonable chance of guessing wrong fairly often. A shallow approach that involves "ask the user about each ambiguity" would, by Piron's estimate, only automate about 25% of a professional translator's job, leaving the harder 75% still to be done by a human.
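The shallow strategy described above can be illustrated with a minimal Python sketch. It is not any particular system's method: the training pairs, the sense labels and the function names are all hypothetical, and the scoring is a plain co-occurrence count over the words surrounding the ambiguous word.

```python
from collections import Counter

# Hypothetical toy data: (context words, sense label) pairs for the word "bank".
TRAINING = [
    (["river", "water", "fishing"], "riverside"),
    (["money", "loan", "account"], "finance"),
    (["deposit", "money", "interest"], "finance"),
    (["muddy", "river", "shore"], "riverside"),
]

def train(examples):
    """Count how often each context word co-occurs with each sense."""
    counts = {}
    for context, sense in examples:
        counts.setdefault(sense, Counter()).update(context)
    return counts

def disambiguate(context, counts):
    """Pick the sense whose training contexts overlap most with `context`."""
    scores = {sense: sum(c[word] for word in context) for sense, c in counts.items()}
    return max(scores, key=scores.get)

counts = train(TRAINING)
print(disambiguate(["she", "opened", "an", "account", "with", "money"], counts))
```

When the context contains no informative words, every sense scores zero and the function simply returns the first sense it saw, which is the kind of uninformed guess that the Piron example warns about.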

===Non-standard speech===
One of the major pitfalls of MT is its inability to translate non-standard language with the same accuracy as standard language. Heuristic or statistics-based MT takes input from various sources in the standard form of a language, and rule-based translation, by nature, does not include common non-standard usages. This causes errors when translating from a vernacular source or into colloquial language. These limitations on translating casual speech present problems for the use of machine translation on mobile devices.


===Named entities===
{{main|Named entity}}
{{Expand section|date=January 2010}}
In ], named entities, in a narrow sense, refer to concrete or abstract entities in the real world such as people, organizations, companies, and places that have a proper name: George Washington, Chicago, Microsoft. The term also covers expressions of time, space and quantity such as 1 July 2011 or $500.
Related to ] in ].


In the sentence "Smith is the president of Fabrionix" both ''Smith'' and ''Fabrionix'' are named entities, and can be further qualified via first name or other information; "president" is not, since Smith could have earlier held another position at Fabrionix, e.g. Vice President.
==On-going and prospective developments==
The term ] is what defines these usages for analysis in statistical machine translation.
===ABBYY Compreno===
In 2011 and 2012, ] announced on various occasions that it was close to presenting a usable version of its new machine translation and semantic text analysis system, ABBYY Compreno, which reviewers claimed would be a breakthrough in the field. The company had reportedly been developing it for over 15 years, with a total investment of over US$50 million and hundreds of employees involved on a permanent basis. At the beginning of 2011, ] additionally received a grant of 475 million Russian roubles (about US$15 million) from ] for the development of the Compreno technology. The technology is reportedly based on a Universal Semantic Hierarchy (USH) and is intended to allow both in-depth ] analysis of the source text and the differentiation of subtle details of meaning based on world and subject knowledge. It is expected to be used for intelligent information search based on abstractly defined content and on the ideas expressed or subjects involved (regardless of the specific terminology and vocabulary used), as opposed to the widely used keyword search. The company stated that the technology was in the beta phase of development and that a number of pilot applications were under way. The first functional systems for English and Russian were expected no earlier than the end of 2012; the semantic hierarchy trees for German, French and other widely used Western European languages were expected to be ready within a few years.<ref>http://www.computerra.ru/sgolub/663954/ Голубятня: Чудо Compreno</ref><ref>http://biz.cnews.ru/news/top/index.shtml?2011/02/28/429739 Abbyy получила 450 млн рублей от «Сколково»</ref><ref>http://www.abbyy.ru/science/technologies/business/compreno#Section_2; Синтаксический и семантический анализ текстов</ref><ref>http://www.youtube.com/watch?v=HPlV9mzqeFQ Introducing ABBYY Compreno -- new approach to machine translation</ref><ref>http://www.3dnews.ru/software/624398 Лингвистические технологии ABBYY. От сложного — к совершенному</ref><ref>http://www.kommersant.ru/doc/1822898?stamp=634588995841938586 Программисты считают, что научили машину понимать смысл текста</ref>


Named entities must first be identified in the text; if not, they may be erroneously translated as common nouns, which would most likely not affect the ] rating of the translation but would change the text's human readability.<ref>{{Cite conference |last1=Babych |first1=Bogdan |last2=Hartley |first2=Anthony |date=2003 |title=Improving Machine Translation Quality with Automatic Named Entity Recognition |url=http://www.cl.cam.ac.uk/~ar283/eacl03/workshops03/W03-w1_eacl03babych.local.pdf |conference=Paper presented at the 7th International EAMT Workshop on MT and Other Language Technology Tools... |archive-url=https://web.archive.org/web/20060514031411/http://www.cl.cam.ac.uk/~ar283/eacl03/workshops03/W03-w1_eacl03babych.local.pdf |archive-date=14 May 2006 |access-date=4 November 2013 |url-status=dead}}</ref> They may be omitted from the output translation, which would also have implications for the text's readability and message.
As announced by ABBYY, systems based on the Compreno technology will make it possible to:<ref>http://www.abbyy.ru/science/technologies/business/compreno#Section_2; Синтаксический и семантический анализ текстов</ref><ref>http://www.youtube.com/watch?v=HPlV9mzqeFQ Introducing ABBYY Compreno -- new approach to machine translation</ref>
* perform intellectual search of information;
* answer questions asked in natural language;
* receive search results in multiple languages for specific queries (asked in one language);
* automatically annotate and produce summaries of documents;
* recognize natural language;
* extract facts and relationships for specific search queries.


] includes finding the letters in the target language that most closely correspond to the name in the source language. This, however, has been cited as sometimes worsening the quality of translation.<ref>Hermajakob, U., Knight, K., & Hal, D. (2008). {{Webarchive|url=https://web.archive.org/web/20180104073326/http://www.aclweb.org/old_anthology/P/P08/P08-1.pdf#page=433 |date=4 January 2018 }}. Association for Computational Linguistics. 389–397.</ref> For "Southern California" the first word should be translated directly, while the second word should be transliterated. Machines often transliterate both because they treat them as one entity. Words like these are hard for machine translators, even those with a transliteration component, to process.
The large number of specialists expected to be needed to expand the worldwide use of Compreno technology in subsequent years is to be trained at two newly opened chairs of Computational Linguistics, established in May 2012 with the support of ] and ] at the Institute of Linguistics of ] and at the Faculty of Innovations and High-Tech of ].<ref>http://www.it-weekly.ru/news/itnews/187189.html Компьютерная лингвистика получит пополнение</ref><ref>http://biz.cnews.ru/news/line/index.shtml?2012/05/15/489462 В РГГУ и МФТИ открыты кафедры «Компьютерной лингвистики» при поддержке Abbyy и IBM</ref><ref>http://www.pcweek.ru/business/article/detail.php?ID=139282 ABBYY: через мобильность и облака к интеллектуальной лингвистике</ref>

The use of a "do-not-translate" list, which has the same end goal (transliteration as opposed to translation),<ref name="singla">{{Citation |last1=Neeraj Agrawal |title=Using Named Entity Recognition to improve Machine Translation |url=http://nlp.stanford.edu/courses/cs224n/2010/reports/singla-nirajuec.pdf |archive-url=https://web.archive.org/web/20130521075940/http://nlp.stanford.edu/courses/cs224n/2010/reports/singla-nirajuec.pdf |access-date=4 November 2013 |archive-date=21 May 2013 |last2=Ankush Singla |mode=cs1 |url-status=live}}</ref> still relies on correct identification of named entities.
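A "do-not-translate" list can be sketched as a masking step around the translation engine. The following Python example is only an illustration under stated assumptions: `TOY_DICT` and `toy_translate` are a hypothetical stand-in for a real MT engine, the placeholder format is invented, and the list itself is supplied by hand rather than by a named-entity recognizer.

```python
import re

# Hypothetical word-for-word English-to-Spanish dictionary standing in for an MT engine.
TOY_DICT = {"smith": "herrero", "is": "es", "the": "el", "president": "presidente", "of": "de"}

def toy_translate(text):
    """Translate word by word; unknown words are passed through unchanged."""
    return " ".join(TOY_DICT.get(word.lower(), word) for word in text.split())

def protect(text, do_not_translate):
    """Replace each listed term with a numbered placeholder the engine will not touch."""
    mapping = {}
    for i, term in enumerate(do_not_translate):
        placeholder = f"__DNT{i}__"
        if term in text:
            text = re.sub(re.escape(term), placeholder, text)
            mapping[placeholder] = term
    return text, mapping

def restore(text, mapping):
    """Put the original terms back after translation."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text

sentence = "Smith is the president of Fabrionix"
masked, mapping = protect(sentence, ["Smith", "Fabrionix"])
print(toy_translate(sentence))                    # the surname is treated as a common noun
print(restore(toy_translate(masked), mapping))    # the surname survives unchanged
```

Without masking, the toy engine renders the surname as the common noun for a blacksmith; with masking, the name passes through intact. The sketch only works because the list already names the right terms, which is the dependence on correct identification noted above.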

A third approach is a class-based model. Named entities are replaced with a token to represent their "class"; "Ted" and "Erica" would both be replaced with a "person" class token. Then the statistical distribution and use of person names, in general, can be analyzed instead of looking at the distributions of "Ted" and "Erica" individually, so that the probability of a given name in a specific language will not affect the assigned probability of a translation. A study by Stanford on improving this area of translation gives the example that different probabilities will be assigned to "David is going for a walk" and "Ankit is going for a walk" for English as a target language due to the different number of occurrences for each name in the training data. A frustrating outcome of the same study by Stanford (and of other attempts to improve named entity translation) is that the inclusion of methods for named entity translation often results in a decrease in the ] scores for translation.<ref name="singla" />
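The effect of the class-based model can be shown with a small Python sketch. It is a toy under explicit assumptions: the corpus, the set of person names (standing in for the output of a named-entity recognizer) and the unigram scoring are all hypothetical and much simpler than a real statistical translation model.

```python
from collections import Counter

# Hypothetical output of a named-entity recognizer.
PERSON_NAMES = {"David", "Ankit", "Ted", "Erica"}

# Hypothetical training corpus in which "David" is more frequent than "Ankit".
CORPUS = [
    "David is going for a walk",
    "David is reading",
    "David is going home",
    "Ankit is going for a walk",
]

def to_classes(sentence):
    """Replace every person name with a shared class token."""
    return tuple("<PERSON>" if tok in PERSON_NAMES else tok for tok in sentence.split())

def unigram_model(token_lists):
    """Return a function giving the relative frequency of a token."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    total = sum(counts.values())
    return lambda tok: counts[tok] / total

def score(tokens, prob):
    """Multiply the unigram probabilities of all tokens."""
    result = 1.0
    for tok in tokens:
        result *= prob(tok)
    return result

word_prob = unigram_model([tuple(s.split()) for s in CORPUS])
class_prob = unigram_model([to_classes(s) for s in CORPUS])

david = "David is going for a walk"
ankit = "Ankit is going for a walk"
print(score(david.split(), word_prob) > score(ankit.split(), word_prob))
print(score(to_classes(david), class_prob) == score(to_classes(ankit), class_prob))
```

In the word-level model the sentence with the more frequent name scores higher; once both names are mapped to the same class token, the two sentences become identical token sequences and receive the same score.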


==Applications==
While no system provides the ideal of fully automatic high-quality machine translation of unrestricted text, many fully automated systems produce reasonable output.<ref>{{Cite book |url=http://www.benjamins.com/cgi-bin/t_bookview.cgi?bookid=BTL%2014 |title=Melby, Alan. The Possibility of Language (Amsterdam:Benjamins, 1995, 27–41) |publisher=Benjamins.com |year=1995 |isbn=9789027216144 |access-date=2012-06-12 |archive-url=https://web.archive.org/web/20110525234319/http://www.benjamins.com/cgi-bin/t_bookview.cgi?bookid=BTL%2014 |archive-date=25 May 2011 |url-status=live}}</ref><ref>{{Cite web |last=Wooten |first=Adam |date=14 February 2006 |title=A Simple Model Outlining Translation Technology |url=http://tandibusiness.blogspot.com/2006/02/simple-model-outlining-translation.html |archive-url=https://archive.today/20120716095630/http://tandibusiness.blogspot.de/2006/02/simple-model-outlining-translation.html |archive-date=16 July 2012 |access-date=2012-06-12 |website=T&I Business}}</ref><ref>{{cite web |url=http://www.mt-archive.info/Bar-Hillel-1960-App3.pdf |title=Appendix III of 'The present status of automatic translation of languages', Advances in Computers, vol.1 (1960), p.158-163. Reprinted in Y.Bar-Hillel: Language and information (Reading, Mass.: Addison-Wesley, 1964), p.174-179. 
|access-date=2012-06-12 |archive-date=28 September 2018 |archive-url=https://web.archive.org/web/20180928203641/http://www.mt-archive.info/Bar-Hillel-1960-App3.pdf |url-status=dead }}</ref> The quality of machine translation is substantially improved if the domain is restricted and controlled.<ref>{{cite web |url=http://tauyou.com/blog/?p=47 |title=Human quality machine translation solution by Ta with you |language=es |publisher=Tauyou.com |date=15 April 2009 |access-date=2012-06-12 |archive-date=22 September 2009 |archive-url=https://web.archive.org/web/20090922094140/http://tauyou.com/blog/?p=47 |url-status=live }}</ref> This enables using machine translation as a tool to speed up and simplify translations, as well as producing flawed but useful low-cost or ad-hoc translations.
There are now many ] programs for translating natural language, several of them ], such as:


===Travel===
*] A free open source machine translation from English to Hindi based on Panini grammar and uses state of the art NLP tools. Can be used online and downloaded from
Machine translation applications have also been released for most mobile devices, including mobile telephones, pocket PCs, PDAs, etc. Due to their portability, such instruments have come to be designated as ] tools, enabling mobile business networking between partners speaking different languages, and facilitating both foreign language learning and unaccompanied travel to foreign countries without the need for a human translator as intermediary.
*], a free and open source machine translation platform ( gives this a ] GUI)
*], which released a hybrid MT system in 2009.<ref name="speechtechmag.com"/>
* in multilingual framework.
*] provides a custom machine translation engine building capability that they claim gives near-human quality compared to the "gist" based quality of free online engines. ] also provides tools to edit and create custom machine translation engines with their suite of products.
*] a free online translator from ].
*] An open-source platform for data-driven machine translation released under the ]. Platform-independent Java code with both a command-line and graphical interface.
*] A web service which uses the Google Translate API to automatically translate and return Office document files (Word, Excel, PowerPoint, PDF) while preserving the original document layouts.
*
*English to Punjabi Translation Web based English to Punjabi Machine Translation System.
*] A free online translator from ].
*] A web service designed to allow translators to edit the translations that Google Translate automatically generates. With the Google Translator Toolkit, translators can organize their work and use shared translations, glossaries and ].
*] Award winning translation software from Systran, the most popular translation software worldwide used by most professional translators.
*], provides machine translation using a direct approach. It translates Hindi into Punjabi. It also features writing e-mail in the Hindi language and sending the same in Punjabi to the recipient.
*], provides machine translation using a direct approach. It translates Punjabi to Hindi. It also features converting any website in Punjabi to Hindi on the fly. The Punjabi Website must be in Unicode.
*], which powers online translation services at idiomax.com
*], such and ] and .
* is a MT-aggregation platform including an API, supported by the major European MT providers such as ], ], , , , , Amebis, Sunda, and others. The platform allows translation from 46 languages and is open to new MT-providers.
*] sells a bidirectional, offline, speech-to-speech translation app for Apple's ] and the ].
*] A translation environment provided for human translators.
*] Cloud-based platform for generation of custom MT engines from user provided data. Powered by ].
*] provides highly customized hybrid machine translation that can go from any language to any language.
*] Translates in several European languages.
*] A ] statistical machine translation engine released under the ] license for ] and ].
* is a software translator that translates books, web pages, documents, e-mails, faxes, memos, manuals, reports, spreadsheets, correspondence, letters and more to and from many languages. For Windows and Macintosh.
*]
*], which powers online translation services at Voila.fr and Orange.fr
*] and ] which power ]
*]
* &mdash; A hybrid machine translation engine for Spanish-Catalan translation.
*], which powers ]
*] is specialized in customized machine translation solutions in any language. Their web-based user interface makes it easy for any Language Service Provider to generate any combination of domain and language pair to achieve the best quality. Their solution works with almost human quality for a wide variety of language pairs.
*] Free online translator for Latvian language. Provides also free apps for Android and iOS.
*] uses a transfer-based system (known as Kataku) to translate between ] and ].
*] A free online round-trip machine translation tool, which enables checking correctness by back translation. Contains virtual keyboards and human voice. Suitable for right to left languages, as well.
*] translates between ] and ] and ].
* is a rule-based interlingua MT system with statistical component. It allows translation between English, Bulgarian, German, French, Spanish, Italian and Turkish.
*] provides machine translation using both statistical based TE's and rule based TE's. Most recognizable as the MT partner in Microsoft Windows and ].
*], powered by ]


For example, the Google Translate app allows foreigners to quickly translate text in their surroundings via ], using the smartphone camera to overlay the translation onto the original text.<ref>{{cite news |title=Google Translate Adds 20 Languages To Augmented Reality App |url=https://www.popsci.com/google-translate-adds-augmented-reality-translation-app/ |access-date=9 January 2023 |work=Popular Science |date=30 July 2015}}</ref> It can also ] and then translate it.<ref>{{cite news |last1=Whitney |first1=Lance |title=Google Translate app update said to make speech-to-text even easier |url=https://www.cnet.com/tech/services-and-software/google-translation-app-may-better-recognize-certain-languages/ |access-date=9 January 2023 |work=CNET |language=en}}</ref>
A number of translation software programs are available free of charge, e.g. , the multiplatform ]<ref></ref>, and .


===Public administration===
While no system provides the holy grail of fully automatic high-quality machine translation of unrestricted text, many fully automated systems produce reasonable output.<ref></ref><ref></ref><ref></ref> The quality of machine translation is substantially improved if the domain is restricted and controlled.<ref></ref>
Despite their inherent limitations, MT programs are used around the world. Probably the largest institutional user is the ]. In 2012, with the aim of replacing a rule-based MT system with the newer, statistics-based MT@EC, the European Commission contributed 3.072 million euros (via its ISA programme).<ref>{{cite web|url=http://ec.europa.eu/isa/actions/02-interoperability-architecture/2-8action_en.htm|title=Machine Translation Service|date=5 August 2011|access-date=13 September 2013|archive-date=8 September 2013|archive-url=https://web.archive.org/web/20130908232212/http://ec.europa.eu/isa/actions/02-interoperability-architecture/2-8action_en.htm|url-status=live}}</ref>


===Wikipedia===
Despite their inherent limitations, MT programs are used around the world. Probably the largest institutional user is the ]. The ] project, for example, coordinated by the ], received more than 2.375 million euros project support from the EU to create a reliable translation tool that covers a majority of the EU languages.
Machine translation has also been used for translating ] articles and could play a larger role in creating, updating, expanding, and generally improving articles in the future, especially as the MT capabilities may improve. There is a "content translation tool" which allows editors to more easily translate articles across several select languages.<ref>{{cite news |last1=Wilson |first1=Kyle |title=Wikipedia has a Google Translate problem |url=https://www.theverge.com/2019/5/8/18526739/wikipedia-translation-tool-machine-learning-ai-english |access-date=9 January 2023 |work=The Verge |date=8 May 2019}}</ref><ref>{{cite news |title=Wikipedia taps Google to help editors translate articles |url=https://venturebeat.com/ai/wikipedia-taps-google-to-help-editors-translate-articles/ |access-date=9 January 2023 |work=VentureBeat |date=9 January 2019}}</ref><ref>{{cite web |title=Content translation tool helps create over half a million Wikipedia articles |url=https://wikimediafoundation.org/news/2019/09/23/content-translation-tool-helps-create-over-half-a-million-wikipedia-articles/ |website=Wikimedia Foundation |access-date=10 January 2023 |date=23 September 2019}}</ref> English-language articles are thought to usually be more comprehensive and less biased than their non-translated equivalents in other languages.<ref>{{cite web |last1=Magazine |first1=Undark |title=Wikipedia Has a Language Problem. Here's How To Fix It. |url=https://undark.org/2021/08/12/wikipedia-has-a-language-problem-heres-how-to-fix-it/ |website=Undark Magazine |access-date=9 January 2023 |date=12 August 2021}}</ref> As of 2022, ] has over 6.5 million articles while the ] and ]s each only have over 2.5 million articles,<ref>{{cite web |title=List of Wikipedias - Meta |url=https://meta.wikimedia.org/List_of_Wikipedias |website=meta.wikimedia.org |access-date=9 January 2023 |language=en}}</ref> each often far less comprehensive.


===Surveillance and military===
] has claimed that promising results were obtained using a proprietary statistical machine translation engine.<ref> (by ])</ref> The statistical translation engine used in the ] for Arabic <-> English and Chinese <-> English had an overall score of 0.4281, ahead of runner-up IBM's BLEU-4 score of 0.3954 (summer 2006), in tests conducted by the National Institute of Standards and Technology.<ref></ref><ref></ref><ref></ref>
Following terrorist attacks in Western countries, including ], the U.S. and its allies have been most interested in developing ] programs, but also in translating ] and ] languages.{{Citation needed|date=February 2007}} Within these languages, the focus is on key phrases and quick communication between military members and civilians through the use of mobile phone apps.<ref>{{cite journal |last=Gallafent |first=Alex |title=Machine Translation for the Military |journal=PRI's the World |date=26 Apr 2011 |access-date=17 Sep 2013 |url=http://www.theworld.org/2011/04/machine-translation-military/ |archive-date=9 May 2013 |archive-url=https://web.archive.org/web/20130509171415/http://www.theworld.org/2011/04/machine-translation-military/ |url-status=live }}</ref> The Information Processing Technology Office in ] hosted programs like ] and ]. The US Air Force has awarded a $1 million contract to develop a language translation technology.<ref>{{cite web |last=Jackson |first=William |url=http://gcn.com/articles/2003/09/09/air-force-wants-to-build-a-universal-translator.aspx |title=GCN – Air force wants to build a universal translator |publisher=Gcn.com |date=9 September 2003 |access-date=2012-06-12 |archive-date=16 June 2011 |archive-url=https://web.archive.org/web/20110616052943/http://gcn.com/articles/2003/09/09/air-force-wants-to-build-a-universal-translator.aspx |url-status=live }}</ref>


===Social media===
With the recent focus on terrorism, the military sources in the United States have been investing significant amounts of money in natural language engineering. ''In-Q-Tel''<ref></ref> (a ] fund, largely funded by the US Intelligence Community, to stimulate new technologies through private sector entrepreneurs) brought up companies like ]. Currently the military community is interested in translation and processing of languages like ], ], and ]. {{Citation needed|date=February 2007}} The Information Processing Technology Office in ] hosts programs like ] and ]. US Air Force has awarded a $1 million contract to develop a language translation technology.<ref></ref>
The notable rise of ] on the web in recent years has created yet another niche for the application of machine translation software – in utilities such as ], or ] clients such as ], ], ], etc. – allowing users speaking different languages to communicate with each other.


==== Online games ====
The notable rise of ] on the web in recent years has created yet another niche for the application of machine translation software – in utilities such as Facebook, or ] clients such as Skype, GoogleTalk, MSN Messenger, etc. – allowing users speaking different languages to communicate with each other. Machine translation applications have also been released for most mobile devices, including mobile telephones, pocket PCs, PDAs, etc. Due to their portability, such instruments have come to be designated as ] tools enabling mobile business networking between partners speaking different languages, or facilitating both foreign language learning and unaccompanied traveling to foreign countries without the need of the intermediation of a human translator.
] gained popularity in Japan because of its machine translation features allowing players from different countries to communicate.<ref>{{Cite web |last=Young-sil |first=Yoon |date=2023-06-26 |title=Korean Games Growing in Popularity in Tough Japanese Game Market |url=http://www.businesskorea.co.kr/news/articleView.html?idxno=117105 |access-date=2023-08-08 |website=BusinessKorea |language=}}</ref>


===Medicine===
Despite being labelled as an unworthy competitor to human translation in 1966 by the Automatic Language Processing Advisory Committee put together by the United States government,<ref>{{Cite report |url=http://www.nap.edu/html/alpac_lm/ARC000005.pdf |title=Language and Machines: Computers in Translation and Linguistics |last=Automatic Language Processing Advisory Committee, Division of Behavioral Sciences, National Academy of Sciences, National Research Council |date=1966 |publisher=National Research Council, National Academy of Sciences |location=Washington, D. C. |access-date=21 October 2013 |archive-url=https://web.archive.org/web/20131021044934/http://www.nap.edu/html/alpac_lm/ARC000005.pdf |archive-date=21 October 2013 |url-status=live}}</ref> the quality of machine translation has now improved to such a level that its application in online collaboration and in the medical field is being investigated. The application of this technology in medical settings where human translators are absent is another topic of research, but difficulties arise due to the importance of accurate translations in medical diagnoses.<ref>{{cite journal|url=http://www.cfp.ca/content/59/4/382.full|title=Using machine translation in clinical practice|journal=Canadian Family Physician|date=April 2013|volume=59|issue=4|pages=382–383|access-date=21 October 2013|archive-date=4 May 2013|archive-url=https://web.archive.org/web/20130504040732/http://www.cfp.ca/content/59/4/382.full|url-status=live|last1=Randhawa|first1=Gurdeeshpal|last2=Ferreyra|first2=Mariella|last3=Ahmed|first3=Rukhsana|last4=Ezzat|first4=Omar|last5=Pottie|first5=Kevin|pmid=23585608|pmc=3625087}}</ref>

Researchers caution that the use of machine translation in medicine could risk mistranslations that can be dangerous in critical situations.<ref name=":02">{{Cite journal |last1=Vieira |first1=Lucas Nunes |last2=O’Hagan |first2=Minako |last3=O’Sullivan |first3=Carol |date=2021-08-18 |title=Understanding the societal impacts of machine translation: a critical review of the literature on medical and legal use cases |journal=Information, Communication & Society |language=en |volume=24 |issue=11 |pages=1515–1532 |doi=10.1080/1369118X.2020.1776370 |s2cid=225694304 |issn=1369-118X|doi-access=free |hdl=1983/29727bd1-a1ae-4600-9e8e-018f11ec75fb |hdl-access=free }}</ref><ref>{{Cite journal |last1=Khoong |first1=Elaine C. |last2=Steinbrook |first2=Eric |last3=Brown |first3=Cortlyn |last4=Fernandez |first4=Alicia |date=2019-04-01 |title=Assessing the Use of Google Translate for Spanish and Chinese Translations of Emergency Department Discharge Instructions |journal=JAMA Internal Medicine |language=en |volume=179 |issue=4 |pages=580–582 |doi=10.1001/jamainternmed.2018.7653 |issn=2168-6106 |pmc=6450297 |pmid=30801626}}</ref> Machine translation can make it easier for doctors to communicate with their patients in day-to-day activities, but it is recommended that machine translation be used only when there is no other alternative, and that translated medical texts be reviewed by human translators for accuracy.<ref>{{Cite journal |last=Piccoli |first=Vanessa |date=2022-07-05 |title=Plurilingualism, multimodality and machine translation in medical consultations: A case study |url=http://www.jbe-platform.com/content/journals/10.1075/tis.21012.pic |journal=Translation and Interpreting Studies |language=en |volume=17 |issue=1 |pages=42–65 |doi=10.1075/tis.21012.pic |s2cid=246780731 |issn=1932-2798}}</ref><ref>{{Cite journal |last1=Herrera-Espejel |first1=Paula Sofia |last2=Rach |first2=Stefan |date=2023-11-20 |title=The Use of Machine Translation for Outreach and Health Communication in Epidemiology and Public Health: Scoping Review |journal=JMIR Public Health and Surveillance |language=en |volume=9 |pages=e50814 |doi=10.2196/50814 |pmid=37983078 |issn=2369-2960 |doi-access=free |pmc=10696499 }}</ref>

=== Law ===
] poses a significant challenge to machine translation tools due to its precise nature and atypical use of normal words. For this reason, specialized algorithms have been developed for use in legal contexts.<ref name=":1">{{Cite web |last=legalj |date=2023-01-02 |title=Man v. Machine: Social and Legal Implications of Machine Translation |url=https://legaljournal.princeton.edu/man-v-machine-social-and-legal-implications-of-machine-translation/ |access-date=2023-12-04 |website=Princeton Legal Journal |language=en-US}}</ref> Due to the risk of mistranslations arising from machine translators, researchers recommend that machine translations should be reviewed by human translators for accuracy, and some courts prohibit its use in ].<ref>{{Cite journal |last=Chavez |first=Edward L. |date=2008 |title=New Mexico's Success with Non-English Speaking Jurors |url=https://heinonline.org/HOL/Page?handle=hein.journals/jrlci1&id=307&div=&collection= |journal=Journal of Court Innovation |volume=1 |pages=303}}</ref>

The use of machine translation in law has raised concerns about translation errors and ]. Lawyers who use free translation tools such as Google Translate may accidentally violate client confidentiality by exposing private information to the providers of the translation tools.<ref name=":1" /> In addition, there have been arguments that consent for a police search that is obtained with machine translation is invalid, with different courts issuing different verdicts over whether or not these arguments are valid.<ref name=":02"/>

=== Ancient languages ===
The advancements in ]s in recent years and in low resource machine translation (when only a very limited amount of data and examples are available for training) enabled machine translation for ancient languages, such as ] and its dialects Babylonian and Assyrian.<ref>{{Cite journal |last1=Gutherz |first1=Gai |last2=Gordin |first2=Shai |last3=Sáenz |first3=Luis |last4=Levy |first4=Omer |last5=Berant |first5=Jonathan |date=2023-05-02 |editor-last=Kearns |editor-first=Michael |title=Translating Akkadian to English with neural machine translation |url=https://academic.oup.com/pnasnexus/article/doi/10.1093/pnasnexus/pgad096/7147349 |journal=PNAS Nexus |language=en |volume=2 |issue=5 |pages=pgad096 |doi=10.1093/pnasnexus/pgad096 |issn=2752-6542 |pmc=10153418 |pmid=37143863}}</ref>

==Evaluation==
{{Main|Evaluation of machine translation}}
There are many factors that affect how machine translation systems are evaluated. These factors include the intended use of the translation, the nature of the machine translation software, and the nature of the translation process.
The FEMTI taxonomy of evaluation dimensions, with associated evaluation metrics, appears at http://www.issco.unige.ch:8080/cocoon/femti/st-home.html .

Different programs may work well for different purposes. For example, ] (SMT) typically outperforms ] (EBMT), but researchers found that when evaluating English to French translation, EBMT performs better.<ref name="Way 295–309">{{cite journal|last=Way|first=Andy|author2=Nano Gough|title=Comparing Example-Based and Statistical Machine Translation|journal=Natural Language Engineering|date=20 September 2005|volume=11|issue=3|pages=295–309|doi=10.1017/S1351324905003888|doi-broken-date=1 November 2024 |s2cid=3242163}}</ref> The same concept applies for technical documents, which can be more easily translated by SMT because of their formal language.

In certain applications, however, e.g., product descriptions written in a ], a ] system has produced satisfactory translations that require no human intervention save for quality inspection.<ref>Muegge (2006), " {{Webarchive|url=https://web.archive.org/web/20111017043848/http://www.mt-archive.info/Aslib-2006-Muegge.pdf |date=17 October 2011 }}," in ''Translating and the computer 28. Proceedings of the twenty-eighth international conference on translating and the computer, 16–17 November 2006, London'', London: Aslib. {{ISBN|978-0-85142-483-5}}.</ref>

There are various means for evaluating the output quality of machine translation systems. The oldest is the use of human judges<ref>{{cite web |url=http://www.morphologic.hu/public/mt/2008/compare12.htm |title=Comparison of MT systems by human evaluation, May 2008 |publisher=Morphologic.hu |access-date=2012-06-12 |archive-url=https://web.archive.org/web/20120419072313/http://www.morphologic.hu/public/mt/2008/compare12.htm |archive-date=19 April 2012 |url-status=dead |df=dmy-all }}</ref> to assess a translation's quality. Even though human evaluation is time-consuming, it is still the most reliable method to compare different systems such as rule-based and statistical systems.<ref>Anderson, D.D. (1995). {{Webarchive|url=https://web.archive.org/web/20180104073518/http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.961.5377&rep=rep1&type=pdf |date=4 January 2018 }}. CALICO Journal. 13(1). 68–96.</ref> ]d means of evaluation include ], ], ], and ].<ref>Han et al. (2012), " {{Webarchive|url=https://web.archive.org/web/20180104073506/http://repository.umac.mo/jspui/bitstream/10692/1747/1/10205_0_%5B2012-12-08~15%5D%20C.%20%28COLING2012%29%20LEPOR.pdf |date=4 January 2018 }}," in ''Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Posters, pages 441–450'', Mumbai, India.</ref>

Relying exclusively on unedited machine translation ignores the fact that communication in ] is context-embedded and that it takes a person to comprehend the ] of the original text with a reasonable degree of probability. It is certainly true that even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be useful to a human being and that publishable-quality translation is achieved, such translations must be reviewed and edited by a human.<ref>J.M. Cohen observes (p.14): "Scientific translation is the aim of an age that would reduce all activities to ]. It is impossible however to imagine a literary-translation machine less complex than the human brain itself, with all its knowledge, reading, and discrimination."</ref> The late ] wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ] in the ], which the ] and ] exigencies of the target language require to be resolved. Such research is a necessary prelude to the pre-editing necessary in order to provide input for machine-translation software such that the output will not be ].<ref name="NIST">See the {{Webarchive|url=https://web.archive.org/web/20090322202656/http://nist.gov/speech/tests/mt/ |date=22 March 2009 }} and ]</ref>

In addition to disambiguation problems, decreased accuracy can occur due to varying levels of training data for machine translation programs. Both example-based and statistical machine translation rely on a vast array of real example sentences as a base for translation, and when too many or too few sentences are analyzed, accuracy is jeopardized. Researchers found that when a program is trained on 203,529 sentence pairings, accuracy actually decreases.<ref name="Way 295–309"/> The optimal level of training data seems to be just over 100,000 sentences, possibly because as training data increases, the number of possible sentences increases, making it harder to find an exact translation match.

Flaws in machine translation have been noted for ]. Two videos uploaded to ] in April 2017 involve two Japanese ] characters えぐ ('']'' and '']'') being repeatedly pasted into Google Translate, with the resulting translations quickly degrading into nonsensical phrases such as "DECEARING EGG" and "Deep-sea squeeze trees", which are then read in increasingly absurd voices;<ref>{{Cite web|url=https://www.businessinsider.com/google-translate-fails-2017-11|title=4 times Google Translate totally dropped the ball|first=Mark|last=Abadi|website=Business Insider}}</ref><ref>{{Cite web|url=https://nlab.itmedia.co.jp/nl/articles/1704/16/news013.html|title=回数を重ねるほど狂っていく Google翻訳で「えぐ」を英訳すると奇妙な世界に迷い込むと話題に|website=ねとらぼ}}</ref> the full-length version of the video currently has 6.9 million views {{as of|lc=y|March 2022|post=.}}<ref>{{Cite web|url=https://www.youtube.com/watch?v=3-rfBsWmo0M|title=えぐ|date=12 April 2017 |via=www.youtube.com}}</ref>


==Machine translation and signed languages==
{{main|Machine translation of sign languages}}
In the early 2000s, options for machine translation between spoken and signed languages were severely limited. It was a common belief that deaf individuals could use traditional translators. However, stress, intonation, pitch, and timing are conveyed much differently in spoken languages compared to signed languages. Therefore, a deaf individual may misinterpret or become confused about the meaning of written text that is based on a spoken language.<ref name="Zhao, L. 2000">Zhao, L., Kipper, K., Schuler, W., Vogler, C., & Palmer, M. (2000). {{Webarchive|url=https://web.archive.org/web/20180720012839/https://repository.upenn.edu/cgi/viewcontent.cgi?article=1043&context=hms |date=20 July 2018 }}. Lecture Notes in Computer Science, 1934: 54–67.</ref>


Zhao et al. (2000) developed a prototype called TEAM (translation from English to ASL by machine) that completed English to ] (ASL) translations. The program would first analyze the syntactic, grammatical, and morphological aspects of the English text. Following this step, the program accessed a sign synthesizer, which acted as a dictionary for ASL. This synthesizer housed the process one must follow to complete ASL signs, as well as the meanings of these signs. Once the entire text was analyzed and the signs necessary to complete the translation were located in the synthesizer, a computer-generated human appeared and used ASL to sign the English text to the user.<ref name="Zhao, L. 2000"/>


==Copyright==
Only ]s that are ] are subject to ] protection, so some scholars claim that machine translation results are not entitled to copyright protection because MT does not involve ].<ref>{{cite web|url=http://www.seo-translator.com/machine-translation-no-copyright-on-the-result/|title=Machine Translation: No Copyright On The Result?|access-date=24 November 2012|publisher=SEO Translator, citing ]|archive-date=29 November 2012|archive-url=https://web.archive.org/web/20121129042959/http://www.seo-translator.com/machine-translation-no-copyright-on-the-result/|url-status=live}}</ref> The copyright at issue is for a ]; the author of the ] in the original language does not lose his ] when a work is translated: a translator must have permission to ] a translation.{{Citation needed|date=July 2024}}


==See also==
{{div col|colwidth=18em}}
*{{section link|Translation#Machine translation}}
{{div col end}}


==Notes==
{{reflist}}


==Further reading==
* {{Citation |last=Cohen |first=J. M. |contribution=Translation |title=Encyclopedia Americana |year=1986 |volume=27 |pages=12–15 |ref=none|title-link=Encyclopedia Americana }}
* {{Cite book |last1=Hutchins |first1=W. John |url=https://archive.org/details/introductiontoma0000hutc |title=An Introduction to Machine Translation |last2=Somers |first2=Harold L. |publisher=Academic Press |year=1992 |isbn=0-12-362830-X |location=London |author-link=W. John Hutchins |url-access=registration}}
* {{Cite magazine |last=Lewis-Kraus |first=Gideon |date=7 June 2015 |title=Tower of Babble |magazine=New York Times Magazine |pages=48–52}}
* Claude Piron, ''Le défi des langues — Du gâchis au bon sens'' (The Language Challenge: From Chaos to Common Sense), Paris, L'Harmattan, 1994.
* {{Cite journal |last1=Weber |first1=Steven |last2=Mehandru |first2=Nikita |date=2022 |title=The 2020s Political Economy of Machine Translation |journal=Business and Politics |language=en |volume=24 |issue=1 |pages=96–112 |doi=10.1017/bap.2021.17|arxiv=2011.01007 |s2cid=226236853 }}


==External links==
{{Wikiversity|Topic:Computational linguistics}}
* {{Webarchive|url=https://web.archive.org/web/20100624162302/http://www.eamt.org/iamt.php |date=24 June 2010 }}
* {{Webarchive|url=https://web.archive.org/web/20190401232615/http://www.mt-archive.info/ |date=1 April 2019 }} by ]. An electronic repository (and bibliography) of articles, books and papers in the field of machine translation and computer-based translation technology
* Machine Translation in Post-Contemporary Era by Grace Hui Chin Lin
* – Publications by John Hutchins (includes ]s of several books on machine translation)
* Machine Translation for Academic Purposes by Grace Hui Chin Lin & Paul Shih Chieh Chien
* {{Webarchive|url=https://web.archive.org/web/20070907182416/http://www.foreignword.com/Technology/art/Hutchins/hutchins99.htm |date=7 September 2007 }}
{{Natural Language Processing}}
{{Approaches to machine translation}}
{{emerging technologies|topics=yes|infocom=yes}}


{{Authority control}}

{{Link FA|eu}}


Latest revision as of 02:03, 10 January 2025

Computerized translation between natural languages Not to be confused with Computer-assisted translation, Interactive machine translation, or Translator (computing).

A mobile phone app translating Spanish text into English

Machine translation is the use of computational techniques to translate text or speech from one language to another, including the contextual, idiomatic and pragmatic nuances of both languages.

Early approaches were mostly rule-based or statistical. These methods have since been superseded by neural machine translation and large language models.

History

Main article: History of machine translation

Origins

The origins of machine translation can be traced back to the work of Al-Kindi, a ninth-century Arabic cryptographer who developed techniques for systemic language translation, including cryptanalysis, frequency analysis, and probability and statistics, which are used in modern machine translation. The idea of machine translation later appeared in the 17th century. In 1629, René Descartes proposed a universal language, with equivalent ideas in different tongues sharing one symbol.

The idea of using digital computers for translation of natural languages was proposed as early as 1947 by England's A. D. Booth and, in the same year, by Warren Weaver at the Rockefeller Foundation. "The memorandum written by Warren Weaver in 1949 is perhaps the single most influential publication in the earliest days of machine translation." Others followed. A demonstration was made in 1954 on the APEXC machine at Birkbeck College (University of London) of a rudimentary translation of English into French. Several papers on the topic were published at the time, and even articles in popular journals (for example an article by Cleave and Zacharov in the September 1955 issue of Wireless World). A similar application, also pioneered at Birkbeck College at the time, was reading and composing Braille texts by computer.

1950s

The first researcher in the field, Yehoshua Bar-Hillel, began his research at MIT (1951). A Georgetown University MT research team, led by Professor Michael Zarechnak, followed (1951) with a public demonstration of its Georgetown-IBM experiment system in 1954. MT research programs popped up in Japan and Russia (1955), and the first MT conference was held in London (1956).

David G. Hays "wrote about computer-assisted language processing as early as 1957" and "was project leader on computational linguistics at Rand from 1955 to 1968."

1960–1975

Researchers continued to join the field as the Association for Machine Translation and Computational Linguistics was formed in the U.S. (1962) and the National Academy of Sciences formed the Automatic Language Processing Advisory Committee (ALPAC) to study MT (1964). Real progress was much slower, however, and after the ALPAC report (1966), which found that the ten-year-long research had failed to fulfill expectations, funding was greatly reduced. According to a 1972 report by the Director of Defense Research and Engineering (DDR&E), the feasibility of large-scale MT was reestablished by the success of the Logos MT system in translating military manuals into Vietnamese during that conflict.

The French Textile Institute also used MT to translate abstracts from and into French, English, German and Spanish (1970); Brigham Young University started a project to translate Mormon texts by automated translation (1971).

1975 and beyond

SYSTRAN, which "pioneered the field under contracts from the U.S. government" in the 1960s, was used by Xerox to translate technical manuals (1978). Beginning in the late 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for machine translation. MT became more popular after the advent of computers. SYSTRAN's first system was implemented in 1988 by the online service of the French Postal Service called Minitel. Various computer-based translation companies were also launched, including Trados (1984), which was the first to develop and market Translation Memory technology (1989), though this is not the same as MT. The first commercial MT system for Russian / English / German-Ukrainian was developed at Kharkov State University (1991).

By 1998, "for as little as $29.95" one could "buy a program for translating in one direction between English and a major European language of your choice" to run on a PC.

MT on the web started with SYSTRAN offering free translation of small texts (1996) and then providing this via AltaVista Babelfish, which racked up 500,000 requests a day (1997). The second free translation service on the web was Lernout & Hauspie's GlobaLink. Atlantic Magazine wrote in 1998 that "Systran's Babelfish and GlobaLink's Comprende" handled "Don't bank on it" with a "competent performance."

Franz Josef Och (the future head of Translation Development at Google) won DARPA's speed MT competition (2003). More innovations during this time included MOSES, the open-source statistical MT engine (2007), a text/SMS translation service for mobiles in Japan (2008), and a mobile phone with built-in speech-to-speech translation functionality for English, Japanese and Chinese (2009). In 2012, Google announced that Google Translate translates roughly enough text to fill 1 million books in one day.

Approaches

See also: Hybrid machine translation and Example-based machine translation

Before the advent of deep learning methods, statistical methods required a lot of rules accompanied by morphological, syntactic, and semantic annotations.

Rule-based

Main article: Rule-based machine translation

The rule-based machine translation approach was used mostly in the creation of dictionaries and grammar programs. Its biggest downfall was that everything had to be made explicit: orthographical variation and erroneous input must be made part of the source language analyser in order to cope with it, and lexical selection rules must be written for all instances of ambiguity.

Transfer-based machine translation

Main article: Transfer-based machine translation

Transfer-based machine translation was similar to interlingual machine translation in that it created a translation from an intermediate representation that simulated the meaning of the original sentence. Unlike interlingual MT, it depended partially on the language pair involved in the translation.

Interlingual

Main article: Interlingual machine translation

Interlingual machine translation was one instance of rule-based machine-translation approaches. In this approach, the source language, i.e. the text to be translated, was transformed into an interlingual language, i.e. a "language neutral" representation that is independent of any language. The target language was then generated out of the interlingua. The only interlingual machine translation system that was made operational at the commercial level was the KANT system (Nyberg and Mitamura, 1992), which was designed to translate Caterpillar Technical English (CTE) into other languages.

Dictionary-based

Main article: Dictionary-based machine translation

Dictionary-based machine translation relied on dictionary entries: words were translated one by one, as a dictionary would render them.
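
The word-by-word idea can be illustrated with a minimal Python sketch. The toy English-to-Spanish dictionary and the function name are hypothetical and chosen only for illustration; the sketch is not taken from any actual dictionary-based system.

```python
# Hypothetical toy English-to-Spanish dictionary; entries are illustrative only.
TOY_DICTIONARY = {
    "the": "el",
    "cat": "gato",
    "eats": "come",
    "fish": "pescado",
}


def translate_word_by_word(sentence, dictionary):
    """Translate each word independently; unknown words pass through unchanged."""
    words = sentence.lower().split()
    return " ".join(dictionary.get(word, word) for word in words)


if __name__ == "__main__":
    print(translate_word_by_word("The cat eats fish", TOY_DICTIONARY))
```

Because each word is looked up in isolation, a sketch like this ignores word order, agreement and ambiguity, which is one reason purely dictionary-based output tends to be rough.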

Statistical

Main article: Statistical machine translation

Statistical machine translation tried to generate translations using statistical methods based on bilingual text corpora, such as the Canadian Hansard corpus, the English-French record of the Canadian parliament and EUROPARL, the record of the European Parliament. Where such corpora were available, good results were achieved translating similar texts, but such corpora were rare for many language pairs. The first statistical machine translation software was CANDIDE from IBM. In 2005, Google improved its internal translation capabilities by using approximately 200 billion words from United Nations materials to train their system; translation accuracy improved.

SMT's biggest downfalls included its dependence on huge amounts of parallel texts, its problems with morphology-rich languages (especially with translating into such languages), and its inability to correct singleton errors.

Some work has been done on the use of multiparallel corpora, that is, bodies of text that have been translated into 3 or more languages. Using these methods, a text that has been translated into 2 or more languages may be used in combination to provide a more accurate translation into a third language than if just one of those source languages were used alone.

Neural MT

Main article: Neural machine translation

A deep learning-based approach to MT, neural machine translation has made rapid progress in recent years. However, the current consensus is that the so-called human parity achieved is not real, being based wholly on limited domains, language pairs, and certain test benchmarks; i.e., it lacks statistical significance power.

Translations by neural MT tools like DeepL Translator, which is thought to usually deliver the best machine translation results as of 2022, typically still need post-editing by a human.

Instead of training specialized translation models on parallel datasets, one can also directly prompt generative large language models like GPT to translate a text. This approach is considered promising, but is still more resource-intensive than specialized translation models.

Issues

Machine translation could produce some non-understandable phrases, such as "鸡枞" (Macrolepiota albuminosa) being rendered as "wikipedia".
Broken Chinese "沒有進入" from machine translation in Bali, Indonesia. The broken Chinese sentence sounds like "there does not exist an entry" or "have not entered yet".

Studies using human evaluation (e.g. by professional literary translators or human readers) have systematically identified various issues with the latest advanced MT outputs. Common issues include the translation of ambiguous parts whose correct translation requires common sense-like semantic language processing or context. There can also be errors in the source texts and a lack of high-quality training data, and the severity or frequency of several types of problems may not be reduced by the techniques used to date, requiring some level of active human participation.

Disambiguation

Main articles: Word-sense disambiguation and Syntactic disambiguation

Word-sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by Yehoshua Bar-Hillel. He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word. Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches.

Shallow approaches assume no knowledge of the text. They simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.
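
A shallow approach of this kind can be sketched in a few lines of Python. The sense-labelled example sentences, the two sense labels and the function names below are hypothetical, and the scoring (summing how often each surrounding word was seen with each sense) is only one simple statistical choice among many.

```python
from collections import Counter

# Hypothetical sense-labelled examples for the ambiguous word "bank".
LABELLED_EXAMPLES = [
    ("finance", "she deposited money at the bank"),
    ("finance", "the bank approved the loan"),
    ("river", "they fished from the bank of the river"),
    ("river", "the river bank was muddy"),
]


def build_context_counts(examples, target):
    """Count, per sense, how often each surrounding word occurs with the target."""
    counts = {}
    for sense, sentence in examples:
        bag = counts.setdefault(sense, Counter())
        bag.update(word for word in sentence.split() if word != target)
    return counts


def guess_sense(sentence, target, counts):
    """Pick the sense whose known context words overlap most with the sentence."""
    context = [word for word in sentence.lower().split() if word != target]
    scores = {sense: sum(bag[word] for word in context) for sense, bag in counts.items()}
    # Ties are resolved in favour of the sense that was seen first.
    return max(scores, key=scores.get)


CONTEXT_COUNTS = build_context_counts(LABELLED_EXAMPLES, "bank")

if __name__ == "__main__":
    print(guess_sense("the bank approved my loan", "bank", CONTEXT_COUNTS))
```

The sketch uses no knowledge of what the word means; it only compares surrounding words, so it guesses wrongly whenever the context words are uninformative or unseen.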

Claude Piron, a long-time translator for the United Nations and the World Health Organization, wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved:

Why does a translator need a whole workday to translate five pages, and not an hour or two? ..... About 90% of an average text corresponds to these simple conditions. But unfortunately, there's the other 10%. It's that part that requires six hours of work. There are ambiguities one has to resolve. For instance, the author of the source text, an Australian physician, cited the example of an epidemic which was declared during World War II in a "Japanese prisoners of war camp". Was he talking about an American camp with Japanese prisoners or a Japanese camp with American prisoners? The English has two senses. It's necessary therefore to do research, maybe to the extent of a phone call to Australia.

The ideal deep approach would require the translation software to do all the research necessary for this kind of disambiguation on its own; but this would require a higher degree of AI than has yet been attained. A shallow approach which simply guessed at the sense of the ambiguous English phrase that Piron mentions (based, perhaps, on which kind of prisoner-of-war camp is more often mentioned in a given corpus) would have a reasonable chance of guessing wrong fairly often. A shallow approach that involves "ask the user about each ambiguity" would, by Piron's estimate, only automate about 25% of a professional translator's job, leaving the harder 75% still to be done by a human.

Non-standard speech

One of the major pitfalls of MT is its inability to translate non-standard language with the same accuracy as standard language. Heuristic or statistical MT takes input from various sources in the standard form of a language. Rule-based translation, by nature, does not include common non-standard usages. This causes errors in translation from a vernacular source or into colloquial language. Limitations on translation from casual speech present issues in the use of machine translation in mobile devices.

Named entities

Main article: Named entity

In information extraction, named entities, in a narrow sense, refer to concrete or abstract entities in the real world such as people, organizations, companies, and places that have a proper name: George Washington, Chicago, Microsoft. The term also refers to expressions of time, space and quantity such as 1 July 2011, $500.

In the sentence "Smith is the president of Fabrionix" both Smith and Fabrionix are named entities, and can be further qualified via first name or other information; "president" is not, since Smith could have earlier held another position at Fabrionix, e.g. Vice President. The term rigid designator is what defines these usages for analysis in statistical machine translation.

Named entities must first be identified in the text; if not, they may be erroneously translated as common nouns, which would most likely not affect the BLEU rating of the translation but would change the text's human readability. They may be omitted from the output translation, which would also have implications for the text's readability and message.

Transliteration includes finding the letters in the target language that most closely correspond to the name in the source language. This, however, has been cited as sometimes worsening the quality of translation. For "Southern California" the first word should be translated directly, while the second word should be transliterated. Machines often transliterate both because they treat them as one entity. Words like these are hard for machine translators, even those with a transliteration component, to process.

Use of a "do-not-translate" list, which has the same end goal (transliteration as opposed to translation), still relies on correct identification of named entities.

A third approach is a class-based model. Named entities are replaced with a token to represent their "class"; "Ted" and "Erica" would both be replaced with a "person" class token. Then the statistical distribution and use of person names, in general, can be analyzed instead of looking at the distributions of "Ted" and "Erica" individually, so that the probability of a given name in a specific language will not affect the assigned probability of a translation. A study by Stanford on improving this area of translation gives the example that different probabilities will be assigned to "David is going for a walk" and "Ankit is going for a walk" for English as a target language due to the different number of occurrences of each name in the training data. A frustrating outcome of the same study by Stanford (and of other attempts to improve named entity translation) is that including methods for named entity translation often results in a decrease in the BLEU scores for translation.
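
The class-token substitution described above can be sketched as follows. The name list, the token spelling `<PERSON>` and the function names are hypothetical, and a real system would rely on a named-entity recognizer rather than a fixed set of names.

```python
# Hypothetical list of known person names; a real system would use a
# named-entity recognizer instead of a fixed set.
KNOWN_PERSON_NAMES = {"Ted", "Erica", "David", "Ankit"}
PERSON_TOKEN = "<PERSON>"


def replace_names_with_class(sentence):
    """Replace each known person name with a class token and remember the names."""
    replaced = []
    names = []
    for token in sentence.split():
        if token in KNOWN_PERSON_NAMES:
            replaced.append(PERSON_TOKEN)
            names.append(token)
        else:
            replaced.append(token)
    return " ".join(replaced), names


def restore_names(sentence, names):
    """Put the remembered names back in place of the class tokens, in order."""
    remaining = iter(names)
    return " ".join(
        next(remaining, token) if token == PERSON_TOKEN else token
        for token in sentence.split()
    )


if __name__ == "__main__":
    template, names = replace_names_with_class("Ankit is going for a walk")
    print(template, names)
    print(restore_names(template, names))
```

After substitution, "David is going for a walk" and "Ankit is going for a walk" map to the same token sequence, so a model scoring that sequence no longer depends on how often each individual name appeared in its training data.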

Applications

While no system provides the ideal of fully automatic high-quality machine translation of unrestricted text, many fully automated systems produce reasonable output. The quality of machine translation is substantially improved if the domain is restricted and controlled. This enables using machine translation as a tool to speed up and simplify translations, as well as producing flawed but useful low-cost or ad-hoc translations.

Travel

Machine translation applications have also been released for most mobile devices, including mobile telephones, pocket PCs and PDAs. Because of their portability, such devices have come to be known as mobile translation tools. They enable mobile business networking between partners speaking different languages, and they facilitate both foreign language learning and unaccompanied travel to foreign countries without the intermediation of a human translator.

For example, the Google Translate app allows foreigners to quickly translate text in their surroundings via augmented reality, using the smartphone camera to overlay the translated text onto the original text. It can also recognize speech and then translate it.

Public administration

Despite their inherent limitations, MT programs are used around the world. Probably the largest institutional user is the European Commission. In 2012, with the aim of replacing a rule-based MT system with the newer, statistical MT@EC, the European Commission contributed 3.072 million euros (via its ISA programme).

Wikipedia

Machine translation has also been used for translating Wikipedia articles and could play a larger role in creating, updating, expanding, and generally improving articles in the future, especially as MT capabilities improve. There is a "content translation tool" which allows editors to more easily translate articles across several select languages. English-language articles are thought to usually be more comprehensive and less biased than their non-translated equivalents in other languages. As of 2022, English Wikipedia has over 6.5 million articles, while the German and Swedish Wikipedias each have only over 2.5 million articles, which are often far less comprehensive.

Surveillance and military

Following terrorist attacks in Western countries, including 9/11, the U.S. and its allies have been most interested in developing Arabic machine translation programs, but also in translating the Pashto and Dari languages. Within these languages, the focus is on key phrases and quick communication between military members and civilians through the use of mobile phone apps. The Information Processing Technology Office in DARPA hosted programs such as TIDES and the Babylon translator. The US Air Force has awarded a $1 million contract to develop a language translation technology.

Social media

The notable rise of social networking on the web in recent years has created yet another niche for the application of machine translation software – in utilities such as Facebook, or instant messaging clients such as Skype, Google Talk, MSN Messenger, etc. – allowing users speaking different languages to communicate with each other.

Online games

Lineage W gained popularity in Japan because of its machine translation features allowing players from different countries to communicate.

Medicine

Despite being labelled as an unworthy competitor to human translation in 1966 by the Automatic Language Processing Advisory Committee put together by the United States government, the quality of machine translation has now improved to such a level that its application in online collaboration and in the medical field is being investigated. The application of this technology in medical settings where human translators are absent is another topic of research, but difficulties arise due to the importance of accurate translations in medical diagnoses.

Researchers caution that the use of machine translation in medicine risks mistranslations that can be dangerous in critical situations. Machine translation can make it easier for doctors to communicate with their patients in day-to-day activities, but it is recommended that machine translation be used only when there is no other alternative, and that translated medical texts be reviewed by human translators for accuracy.

Law

Legal language poses a significant challenge to machine translation tools due to its precise nature and atypical use of normal words. For this reason, specialized algorithms have been developed for use in legal contexts. Due to the risk of mistranslations arising from machine translators, researchers recommend that machine translations be reviewed by human translators for accuracy, and some courts prohibit their use in formal proceedings.

The use of machine translation in law has raised concerns about translation errors and client confidentiality. Lawyers who use free translation tools such as Google Translate may accidentally violate client confidentiality by exposing private information to the providers of the translation tools. In addition, there have been arguments that consent for a police search that is obtained with machine translation is invalid, with different courts issuing different verdicts over whether or not these arguments are valid.

Ancient languages

The advancements in convolutional neural networks in recent years and in low resource machine translation (when only a very limited amount of data and examples are available for training) enabled machine translation for ancient languages, such as Akkadian and its dialects Babylonian and Assyrian.

Evaluation

Main article: Evaluation of machine translation

There are many factors that affect how machine translation systems are evaluated. These factors include the intended use of the translation, the nature of the machine translation software, and the nature of the translation process.

Different programs may work well for different purposes. For example, statistical machine translation (SMT) typically outperforms example-based machine translation (EBMT), but researchers found that when evaluating English to French translation, EBMT performs better. The same concept applies for technical documents, which can be more easily translated by SMT because of their formal language.

In certain applications, however, e.g., product descriptions written in a controlled language, a dictionary-based machine-translation system has produced satisfactory translations that require no human intervention save for quality inspection.

There are various means for evaluating the output quality of machine translation systems. The oldest is the use of human judges to assess a translation's quality. Even though human evaluation is time-consuming, it is still the most reliable method to compare different systems such as rule-based and statistical systems. Automated means of evaluation include BLEU, NIST, METEOR, and LEPOR.
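To give a sense of how an automated metric such as BLEU scores output, the following Python sketch computes a simplified sentence-level score from clipped n-gram precisions and a brevity penalty. It is an assumption-laden illustration: it uses one reference, whitespace tokenization and no smoothing, whereas BLEU is normally computed over a whole corpus and may use several references. The function names are made up for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified single-reference, unsmoothed BLEU-style score in [0, 1]."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
exact = simple_bleu("the cat sat on the mat", reference)
close = simple_bleu("the cat sat on a mat", reference)
unrelated = simple_bleu("dogs bark loudly at night", reference)
```

Because the score depends only on n-gram overlap with the reference, a fluent translation that uses different wording can score low, which is one reason human evaluation remains the point of comparison.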

Relying exclusively on unedited machine translation ignores the fact that communication in human language is context-embedded and that it takes a person to comprehend the context of the original text with a reasonable degree of probability. It is certainly true that even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be useful to a human being and that publishable-quality translation is achieved, such translations must be reviewed and edited by a human. The late Claude Piron wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved. Such research is a necessary prelude to the pre-editing necessary in order to provide input for machine-translation software such that the output will not be meaningless.

In addition to disambiguation problems, decreased accuracy can occur due to varying levels of training data for machine translation programs. Both example-based and statistical machine translation rely on a vast array of real example sentences as a base for translation, and when too many or too few sentences are analyzed, accuracy is jeopardized. Researchers found that when a program is trained on 203,529 sentence pairings, accuracy actually decreases. The optimal level of training data seems to be just over 100,000 sentences, possibly because as training data increases, the number of possible sentences increases, making it harder to find an exact translation match.

Flaws in machine translation have been noted for their entertainment value. Two videos uploaded to YouTube in April 2017 involve two Japanese hiragana characters えぐ (e and gu) being repeatedly pasted into Google Translate, with the resulting translations quickly degrading into nonsensical phrases such as "DECEARING EGG" and "Deep-sea squeeze trees", which are then read in increasingly absurd voices; the full-length version of the video had 6.9 million views as of March 2022.

Machine translation and signed languages

Main article: Machine translation of sign languages

In the early 2000s, options for machine translation between spoken and signed languages were severely limited. It was a common belief that deaf individuals could use traditional translators. However, stress, intonation, pitch, and timing are conveyed much differently in spoken languages compared to signed languages. Therefore, a deaf individual may misinterpret or become confused about the meaning of written text that is based on a spoken language.

Researchers Zhao et al. (2000) developed a prototype called TEAM (translation from English to ASL by machine) that completed English to American Sign Language (ASL) translations. The program would first analyze the syntactic, grammatical, and morphological aspects of the English text. Following this step, the program accessed a sign synthesizer, which acted as a dictionary for ASL. This synthesizer housed the process one must follow to complete ASL signs, as well as the meanings of these signs. Once the entire text was analyzed and the signs necessary to complete the translation were located in the synthesizer, a computer-generated human appeared and used ASL to sign the English text to the user.

Copyright

Only works that are original are subject to copyright protection, so some scholars claim that machine translation results are not entitled to copyright protection because MT does not involve creativity. The copyright at issue is for a derivative work; the author of the original work in the original language does not lose their rights when a work is translated: a translator must have permission to publish a translation.
