This is an old revision of this page, as edited by 199.72.115.66 (talk) at 18:03, 25 October 2007. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
This article contains special characters. Without proper rendering support, you may see question marks, boxes, or other symbols.
Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of images of handwritten or typewritten text (usually captured by a scanner) into machine-editable text.
OCR is a [...] Recognition.
Recognition of cursive text is an active area of research, with recognition rates even lower than those for hand-printed text. Higher recognition rates for general cursive script will likely not be possible without the use of contextual or grammatical information. For example, recognizing entire [...]
[...] the mid-1970s at MIT and other institutions. Successive efforts were made to localize and remove musical staff lines, leaving symbols to be recognized and parsed. The first proprietary music-scanning program, MIDISCAN, was released in 1991. Three proprietary products are currently available. At this time, OCR software does not recognize handwritten scores.
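The idea of localizing and removing staff lines so that only the symbols remain can be illustrated with a very small sketch. It is not the method used by the MIT work or by any product named here. It assumes a binarized page stored as a list of rows (1 = black, 0 = white) and uses a simple horizontal projection: rows that are almost entirely black are treated as staff lines. The function names and the `min_fill` threshold are hypothetical.

```python
def find_staff_rows(image, min_fill=0.8):
    """Return indices of rows whose share of black pixels is at least min_fill."""
    return [
        y for y, row in enumerate(image)
        if row and sum(row) / len(row) >= min_fill
    ]


def remove_staff_lines(image, min_fill=0.8):
    """Return a copy of the image with staff-line pixels cleared.

    A staff-line pixel is kept when the pixel directly above or below it,
    in a row that is not itself a staff line, is black. This preserves
    symbols (for example note stems) that cross the line.
    """
    staff = set(find_staff_rows(image, min_fill))
    height = len(image)
    result = [row[:] for row in image]
    for y in staff:
        for x in range(len(image[y])):
            above = y > 0 and (y - 1) not in staff and image[y - 1][x] == 1
            below = y + 1 < height and (y + 1) not in staff and image[y + 1][x] == 1
            if not (above or below):
                result[y][x] = 0
    return result


# Toy page: two staff lines (rows 1 and 4) crossed by a vertical stem in column 2.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    for row in remove_staff_lines(page):
        print("".join("#" if pixel else "." for pixel in row))
```

Real scores are skewed, noisy, and have staff lines several pixels thick, so a practical system would need considerably more than this projection test.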
Magnetic ink character recognition
Magnetic ink character recognition is one area where the accuracy and speed of computer input of character information exceed those of humans: error rates are around one read error for every 20,000 to 30,000 checks.
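To put the quoted figure in more familiar terms, the short sketch below converts "one read error per N checks" into a per-check rate and an expected error count. The batch size of one million checks is a hypothetical number chosen only for illustration, and the function names are not from any library.

```python
def read_error_rate(checks_per_error: int) -> float:
    """Fraction of checks expected to produce a read error."""
    return 1.0 / checks_per_error


def expected_read_errors(total_checks: int, checks_per_error: int) -> float:
    """Expected number of read errors in a batch of total_checks checks."""
    return total_checks / checks_per_error


if __name__ == "__main__":
    batch = 1_000_000  # hypothetical batch size, not a figure from the article
    for checks_per_error in (20_000, 30_000):
        rate = read_error_rate(checks_per_error)
        errors = expected_read_errors(batch, checks_per_error)
        print(
            f"1 error per {checks_per_error} checks: "
            f"{rate:.4%} per check, about {errors:.0f} errors in {batch} checks"
        )
```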
Optical Character Recognition in Unicode
In Unicode, Optical Character Recognition symbol characters are placed in the hexadecimal range 0x2440–0x245F, as shown below (see also Unicode Symbols):
Symbol | Name | Hex
---|---|---
⑀ | OCR Hook | 0x2440
⑁ | OCR Chair | 0x2441
⑂ | OCR Fork | 0x2442
⑃ | OCR Inverted Fork | 0x2443
⑄ | OCR Belt Buckle | 0x2444
⑅ | OCR Bow Tie | 0x2445
⑆ | OCR Branch Bank Identification | 0x2446
⑇ | OCR Amount Of Check | 0x2447
⑈ | OCR Customer Account Number | 0x2448
⑉ | OCR Dash | 0x2449
⑊ | OCR Double Backslash | 0x244A
- | Not Defined | 0x244B
- | Not Defined | 0x244C
- | Not Defined | 0x244D
- | Not Defined | 0x244E
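The range stated above can also be walked programmatically. The sketch below uses Python's standard `unicodedata` module to list every code point from 0x2440 to 0x245F with its formal character name, or `None` where the code point is unassigned. The helper names `ocr_block_rows` and `is_in_ocr_block` are hypothetical. The names reported depend on the Unicode version bundled with the interpreter and follow the formal Unicode character names, which may not match the wording used in this article.

```python
import unicodedata

OCR_BLOCK_START = 0x2440
OCR_BLOCK_END = 0x245F


def ocr_block_rows():
    """Yield (code point, character, name or None) for the whole range."""
    for code_point in range(OCR_BLOCK_START, OCR_BLOCK_END + 1):
        character = chr(code_point)
        # unicodedata.name returns the default (None) for unassigned code points.
        yield code_point, character, unicodedata.name(character, None)


def is_in_ocr_block(character: str) -> bool:
    """True if a single character falls inside the 0x2440-0x245F range."""
    return OCR_BLOCK_START <= ord(character) <= OCR_BLOCK_END


if __name__ == "__main__":
    for code_point, character, name in ocr_block_rows():
        label = name if name is not None else "(not defined)"
        print(f"0x{code_point:04X}  {character}  {label}")
```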
OCR software
- ABBYY FineReader OCR
- Adobe Acrobat
- GOCR
- Microsoft Office Document Imaging
- NovoDynamics VERUS
- Ocrad
- Ocropus
- OmniPage
- Readiris
- ReadSoft
- SimpleOCR
- SmartScore
- Tesseract
See also
- Automatic number plate recognition
- CAPTCHA
- Computational linguistics
- Computer vision
- Machine learning
- Optical mark recognition
- Raster to vector
- Raymond Kurzweil
- SmartPen - optical character recognition technology system used in clinical trials
- Speech recognition
External links
- ICDAR, a comprehensive conference on all aspects of document recognition
- Linux OCR: A review of free optical character recognition software