Sometimes when working with Japanese documents you may need to remove furigana (sometimes called ‘ruby’) from a block of text. For example, you may be taking furigana-marked source text from Aozora Bunko, but don’t want to include the furigana in some other media. I had this exact problem when working on an E-book of translated Japanese fairy tales.
A Google search for this shows a variety of suggestions, but I couldn’t easily find anything that gave me an end-to-end process. The closest I found was this post that gave a big piece of the puzzle. Here I will be describing a process that leverages that, plus some additional details and explanation.
If you want a sample text to try this process on, you can use Rashomon on Aozora Bunko here.
One thing you should know when doing this type of work is that there are different types of encoding, and each text editor only supports a subset of those. So when you copy and paste between one program and another, things can get converted and pieces of information lost. For example, if you simply copy and paste text from Aozora Bunko directly into Sublime Text 2 (a simple text editor that lacks furigana support), you will see something like this:
Notice the parts I have put in bold (ex: げにん). Those are the furigana that have been separated out from the words and put after them. While you can manually edit this and remove the furigana, that would be pretty tedious for a long document that frequently uses furigana.
So here is the process to remove furigana:
- Copy and paste the text from the original source (ex: Aozora Bunko website) into Microsoft Word. This will retain the furigana in a format similar to how it appeared in your web browser. (Note: I have seen problems when using Safari on Mac to perform this step, and recommend using Chrome.)
- Now copy and paste this text from Word into Sublime Text 2 (this may work with other text editors, so if you don’t want to download Sublime Text you can try with another simple text editor). This will give you something like: ある日の暮方の事である。一人の下人 (げにん)が、羅生門(らしょうもん)の下で雨やみを待っていた。 Notice how the furigana is now included after the words, but in parentheses. You may see the furigana actually shifted so it doesn’t match up with the right word, but that won’t prevent this technique from working (you’ll see why in a moment).
- Now copy and paste back into Word. You should now see the parenthesis-enclosed furigana (similar to what you saw in Sublime Text, but different from how you originally saw it in Word).
- The next steps may vary depending on your platform, but I will give the steps for Mac Microsoft Word 2011. In Word, go to Edit->Find->Advanced Find.
- Click on the “Replace” button near the top of the window that has opened.
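- In the first field (“Find what”) put “\([!)]@\)” (without the double quotes). With wildcards enabled, this matches an opening parenthesis, one or more characters that are not a closing parenthesis, and then a closing parenthesis. Since it only looks at the parentheses, it doesn’t matter whether the furigana lines up with the right word.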
- Leave the second field (“Replace with”) blank.
- Click on the little “v” icon on the bottom left part of the window to reveal additional options.
- Select “Use wildcards”.
- Click on “Replace All”.
- Now the text should have all the furigana removed, for example:
You can now copy this text to any other location you want and the furigana will not reappear.
Please be aware that this process can remove certain types of formatting (italics, bold, etc.), so if the source text has any of those you may have to put them back in manually. Documents containing parentheses that are not related to furigana might also give you problems, since the parenthesized text would be erased as part of this process. For texts on Aozora Bunko I generally haven’t had that problem, however.
The version of Word I am using does have a specific menu for adjusting furigana (Format->Asian Layout->Phonetic Guide). Here you can change the furigana content, font, and even remove the furigana. However, this menu doesn’t allow you to work across the entire document, and trying to remove furigana after selecting a block of text only removes furigana for a single word.
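If unrelated parentheses are a concern, one possible workaround (not part of the Word process above) is to delete only parenthesized runs made up entirely of kana. Here is a minimal Python sketch of that idea. The function name `strip_kana_parens` is just something I made up for illustration, and it assumes the furigana is written purely in hiragana or katakana.

```python
import re

# Matches an opening parenthesis (ASCII or full-width), one or more
# hiragana/katakana characters (including the long-vowel mark), and a
# closing parenthesis. Assumption: furigana contains only kana.
KANA_IN_PARENS = re.compile(r"[((][\u3041-\u3096\u30A1-\u30FA\u30FC]+[))]")


def strip_kana_parens(text: str) -> str:
    """Remove parenthesized kana-only runs; leave other parentheses alone."""
    return KANA_IN_PARENS.sub("", text)


if __name__ == "__main__":
    sample = "羅生門(らしょうもん)の下で(注:原文のまま)雨やみを待っていた。"
    print(strip_kana_parens(sample))
```

In this sample the reading after 羅生門 is removed, while the note containing kanji is left in place. A parenthetical written only in kana would still be deleted, so this reduces the problem rather than eliminating it.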
By the way, if you don’t have Word you can use some other text editor that supports search and replace with regular expressions.
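As a rough illustration of the regular-expression route, here is a short Python sketch that does approximately what the Word wildcard search does. This is my own approximation, not an exact translation: it accepts both ASCII and full-width parentheses, and the name `remove_parenthesized` is hypothetical. It assumes you have already pasted the text through a plain text editor so that the furigana appears in parentheses.

```python
import re

# An opening parenthesis, one or more characters that are not
# parentheses, then a closing parenthesis. Both ASCII and full-width
# forms are accepted, since pasted Japanese text may contain either.
PAREN_RUN = re.compile(r"[((][^()()]+[))]")


def remove_parenthesized(text: str) -> str:
    """Delete every parenthesized run, including the parentheses."""
    return PAREN_RUN.sub("", text)


if __name__ == "__main__":
    sample = "ある日の暮方の事である。一人の下人(げにん)が、羅生門(らしょうもん)の下で雨やみを待っていた。"
    print(remove_parenthesized(sample))
```

Like the Word approach, this erases any parenthesized text, not just furigana, so check the result if your document uses parentheses for other purposes.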