imedic-tools
This gem provides command-line tools for converting Japanese input dictionary word-list files between common formats.
Features
- ATOK support: Reads and writes ATOK word-list text files
- MS-IME support: Reads and writes modern Microsoft IME word-list files with a Unicode header, UTF-16LE BOM, and CRLF line endings
- Kotoeri output: Generates UTF-8 CSV files for Kotoeri-compatible import workflows
- Comment preservation: Converts comments between ATOK and MS-IME comment syntax
- Multiple input files: Accepts one or more input files and writes a combined dictionary to standard output
Installation
Install the gem from RubyGems:
gem install imedic-tools
The scripts under exe/ can also be used standalone without installing the gem.
Usage
This gem includes three tools:
atok2msime: Converts ATOK word-list files to Microsoft IME formatmsime2atok: Converts Microsoft IME word-list files to ATOK formatatok2kotoeri: Converts ATOK word-list files to Kotoeri CSV format
Usage is common to all tools:
atok2msime < input.atok.txt > output.msime.txt
atok2msime input1.atok.txt input2.atok.txt > output.msime.txt
Each tool accepts one or more input files, or standard input when no file is given, and writes the converted word list to standard output.
ATOK Input Format
Input files are expected to be ATOK word-list text files:
!!ATOK_TANGO_TEXT_HEADER_1
!! Optional comment
よみ 単語 名詞*
The tools read UTF-8 files and UTF-8/UTF-16 files with a BOM.
License
This project is distributed under the 2-clause BSD license.