Japanese OCR (Optical Character Recognition) Software

In college, my Japanese wasn't quite up to par, and I had to read several legal articles for my thesis. Since there were so many kanji I didn't know, I used OCR (Optical Character Recognition) software to digitize the articles, and then read them using a combination of rikaichan and other computer-based Japanese dictionaries.

OCR software converts printed text you scan into digital text that you can read in Microsoft Word, Firefox, etc. For Japanese, it works decently. It is certainly not perfect, and you will have to look up more complicated, rare kanji on your own, but if you have some short articles and little Japanese ability it can save you a lot of time.

That said, the act of scanning the text is quite time-consuming. You have to make sure the pages are lined up properly, and even a small mistake like forgetting to scan one page can cost you more time later, when you have to go back and set everything up again.

For this reason, I wouldn't recommend scanning a whole book under most circumstances. In the long run, if you want to read Japanese books, simply biting the bullet and studying the kanji will help you more.

But if you're in a situation like I was, OCR software can help. I tried many different programs, but the only two that gave me any usable results were OmniPage and Readiris. Both are standard OCR programs, and they compete on features in a variety of ways. However, in my own experience, as far as Japanese is concerned, OmniPage did a significantly better job of recognizing the kanji correctly. I often had to rescan pages with Readiris, and even then the output from OmniPage was more accurate.

In theory, ruby text (the small readings printed beside kanji) is added in parentheses after the word; in my experience, this was very hit and miss. But most books written for adults don't have a lot of ruby, so it's not likely to be much of a problem; I was much more aggravated by incorrect kanji recognition. Another annoyance is that the software sometimes misrecognizes the size of kana; for example, あ is transcribed as ぁ. I found it curious that although the software did decent kanji recognition, it couldn't consistently get the size of the kana correct.
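
If you save the OCR output as plain text, some of this can be cleaned up with a short script afterwards. What follows is only a rough sketch in Python, not something built into either program. It assumes readings show up as kana in parentheses directly after a run of kanji, and it treats a small vowel kana that does not follow another kana as suspicious. The function names are my own, and the heuristic will miss some cases and flag some legitimate text, so treat the results as hints to check by eye.

```python
import re
from typing import List, Tuple

# Assumption: a ruby reading appears as kana inside ASCII or full-width
# parentheses, directly after a run of kanji (including the repeat mark 々).
RUBY_IN_PARENS = re.compile(
    r"([\u4e00-\u9fff\u3005]+)[(（][\u3041-\u3096\u30a1-\u30fa\u30fc]+[)）]"
)

# Heuristic: a small vowel kana normally follows another kana (as in ふぁ),
# so one at the start of the text or after kanji/punctuation is suspect.
SUSPECT_SMALL_KANA = re.compile(
    r"(?<![\u3041-\u3096\u30a1-\u30fa\u30fc])[ぁぃぅぇぉァィゥェォ]"
)


def strip_ruby(text: str) -> str:
    """Remove parenthesized kana readings, keeping the kanji they follow."""
    return RUBY_IN_PARENS.sub(r"\1", text)


def find_suspect_small_kana(text: str) -> List[Tuple[int, str]]:
    """Return (position, character) pairs for small kana worth checking."""
    return [(m.start(), m.group()) for m in SUSPECT_SMALL_KANA.finditer(text)]


if __name__ == "__main__":
    sample = "契約(けいやく)の解除"
    print(strip_ruby(sample))
    print(find_suspect_small_kana("ぁる日、ふぁいと"))
```

The script only reports suspect kana rather than replacing them, since small kana are sometimes intentional and an automatic fix could introduce new errors.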

Of course, OCR software is quite expensive. For your own personal use, it's probably not worth the money. In my case, I was able to get my college to pay for it, so it wasn't such a problem; if you can, I recommend you do the same.