Module: CppjiebaRb
- Defined in: lib/cppjieba_rb.rb, lib/cppjieba_rb/segment.rb, lib/cppjieba_rb/version.rb, ext/cppjieba_rb/cppjieba_rb.c
Overview
CppjiebaRb segments a Chinese sentence into words.
Available segmentation methods include HMM (hidden Markov model), MP (maximum probability), and mix mode, which combines the two. Dictionaries play a major role in CppjiebaRb’s accuracy. Read more at github.com/yanyiwu/cppjieba
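The MP (maximum probability) idea can be illustrated with a toy dynamic program: among all ways to split the text into dictionary words, pick the one whose word probabilities have the highest product (the largest sum of log-probabilities). This is a minimal pure-Ruby sketch that does not use the gem; mp_segment and the word counts in freq are hypothetical, cppjieba's own implementation differs in its details, and the HMM step that mix mode uses for words missing from the dictionary is not shown.

```ruby
# Toy maximum-probability (MP) segmentation via dynamic programming.
# freq: hypothetical word => count table standing in for a real dictionary.
def mp_segment(text, freq)
  total = freq.values.sum.to_f
  chars = text.chars
  n = chars.size

  # best[i] holds [log_prob, start_index] for the best split of chars[0...i].
  best = Array.new(n + 1)
  best[0] = [0.0, nil]

  (1..n).each do |i|
    (0...i).each do |j|
      word = chars[j...i].join
      # Unknown single characters get a count of 1 so any text can be split.
      count = freq[word] || (word.length == 1 ? 1 : nil)
      next if count.nil?

      score = best[j][0] + Math.log(count / total)
      best[i] = [score, j] if best[i].nil? || score > best[i][0]
    end
  end

  # Walk the stored start indices back from the end to recover the words.
  words = []
  i = n
  while i > 0
    j = best[i][1]
    words.unshift(chars[j...i].join)
    i = j
  end
  words
end

freq = {
  '北京' => 50, '大学' => 40, '北京大学' => 30, '大学生' => 20,
  '北' => 5, '京' => 5, '大' => 10, '学' => 10, '生' => 10
}
result = mp_segment('北京大学生', freq)
# result => ["北京", "大学生"]
```

With these counts, 北京 / 大学生 (50 × 20) beats 北京大学 / 生 (30 × 10), which is why changing dictionary frequencies can change the segmentation.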
Defined Under Namespace
Classes: Segment
Constant Summary
- EXT_BASE = File.join(File.dirname(__FILE__), '..', 'ext', 'cppjieba', 'dict')
- DICT_PATH = File.join(EXT_BASE, 'jieba.dict.utf8')
- HMM_DICT_PATH = File.join(EXT_BASE, 'hmm_model.utf8')
- USER_DICT = File.join(EXT_BASE, 'user.dict.utf8')
- IDF_PATH = File.join(EXT_BASE, 'idf.utf8')
- STOP_WORD_PATH = File.join(EXT_BASE, 'stop_words.utf8')
- VERSION = '0.4.4'
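EXT_BASE is built relative to the file that defines it (lib/cppjieba_rb.rb): it steps up from lib/ to the gem root and then into ext/cppjieba/dict, and File.join keeps the literal ".." segment. This minimal sketch replays that construction with a hypothetical install path in place of __FILE__ and uses File.expand_path to show where DICT_PATH would actually point on a POSIX system.

```ruby
# Hypothetical install location of lib/cppjieba_rb.rb; in the gem, __FILE__ supplies this.
lib_file = '/gems/cppjieba_rb-0.4.4/lib/cppjieba_rb.rb'

# Same construction as EXT_BASE: up from lib/ to the gem root, then into ext/.
ext_base = File.join(File.dirname(lib_file), '..', 'ext', 'cppjieba', 'dict')
# ext_base is "/gems/cppjieba_rb-0.4.4/lib/../ext/cppjieba/dict" (the ".." is kept)

dict_path = File.join(ext_base, 'jieba.dict.utf8')

# File.expand_path collapses the ".." to show where the file actually lives.
resolved = File.expand_path(dict_path)
puts resolved # prints /gems/cppjieba_rb-0.4.4/ext/cppjieba/dict/jieba.dict.utf8
```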
Class Method Summary
- .extract_keyword(str, top_n) ⇒ Object
- .filter_stop_word(arr) ⇒ Object
- .internal ⇒ Object
- .segment(str, opts = nil) ⇒ Object
- .segment_tag(str) ⇒ Object
Class Method Details
.extract_keyword(str, top_n) ⇒ Object
# File 'lib/cppjieba_rb.rb', line 20
def self.extract_keyword(str, top_n)
  internal.extract_keyword(str, top_n)
end
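extract_keyword simply delegates to internal.extract_keyword. The IDF_PATH dictionary suggests a TF-IDF ranking (term frequency multiplied by inverse document frequency), which is the approach cppjieba's keyword extractor takes. Assuming that, here is a minimal pure-Ruby sketch of top-N TF-IDF selection over an already-segmented token list; top_keywords, its default_idf fallback, and the IDF values are hypothetical and are not the gem's API or data.

```ruby
# Minimal TF-IDF top-N keyword sketch (illustration only, not the gem's code).
# tokens: an already-segmented word list (hypothetical input).
# idf: word => inverse document frequency (hypothetical values; the gem loads
#      its real table from IDF_PATH).
# default_idf: hypothetical fallback weight for words missing from idf.
def top_keywords(tokens, idf, top_n, default_idf: 1.0)
  tf = Hash.new(0)
  tokens.each { |w| tf[w] += 1 }

  scored = tf.map { |word, count| [word, count * idf.fetch(word, default_idf)] }
  # Sort by descending weight; break ties by word so the result is deterministic.
  scored.sort_by { |word, weight| [-weight, word] }.first(top_n)
end

idf = { '北京' => 5.0, '大学' => 3.0, '学生' => 4.0 }
tokens = %w[北京 大学 学生 北京]
keywords = top_keywords(tokens, idf, 2)
# keywords => [["北京", 10.0], ["学生", 4.0]]
```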
.filter_stop_word(arr) ⇒ Object
# File 'lib/cppjieba_rb.rb', line 32
def self.filter_stop_word(arr)
  arr.reject { |w| internal.stop_word?(w) }
end
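filter_stop_word keeps every element for which internal.stop_word? is false. Because it uses Array#reject, it returns a new array in the original order and leaves the input untouched. This sketch reproduces the pattern with a hypothetical FakeInternal class that holds a small hard-coded stop-word set in place of the real STOP_WORD_PATH dictionary; it does not load the gem.

```ruby
require 'set'

# Hypothetical stand-in for CppjiebaRb::Internal: only stop_word? is modeled,
# backed by a small hard-coded set instead of the STOP_WORD_PATH dictionary.
class FakeInternal
  def initialize(stop_words)
    @stop_words = Set.new(stop_words)
  end

  def stop_word?(word)
    @stop_words.include?(word)
  end
end

internal = FakeInternal.new(%w[的 是 了])
words = %w[我 是 拖拉机 学院 的 学生]

# Same pattern as filter_stop_word: reject returns a new array, order preserved.
filtered = words.reject { |w| internal.stop_word?(w) }
# filtered => ["我", "拖拉机", "学院", "学生"]; words is unchanged
```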
.internal ⇒ Object
# File 'lib/cppjieba_rb.rb', line 37
def internal
  @internal ||= CppjiebaRb::Internal.new(DICT_PATH, HMM_DICT_PATH, USER_DICT,
                                         IDF_PATH, STOP_WORD_PATH)
end
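internal builds the CppjiebaRb::Internal object, which receives all five dictionary paths, on first use and caches it in @internal with ||=, so later calls reuse the same instance. This minimal sketch shows the same memoization pattern; ExpensiveLoader and Demo are hypothetical names (not part of the gem), and the loader counts how many times it is constructed.

```ruby
# Hypothetical stand-in for CppjiebaRb::Internal that counts its constructions.
class ExpensiveLoader
  @builds = 0

  class << self
    attr_accessor :builds
  end

  attr_reader :paths

  def initialize(*paths)
    self.class.builds += 1
    @paths = paths
  end
end

# Hypothetical module mirroring the shape of CppjiebaRb.internal.
module Demo
  class << self
    def internal
      # ||= only assigns when @internal is nil or false, so the loader is built once.
      @internal ||= ExpensiveLoader.new('words.dict', 'model.utf8')
    end
  end
end

first = Demo.internal
second = Demo.internal
first.equal?(second)   # => true (same object)
ExpensiveLoader.builds # => 1
```

Because ||= is a check followed by an assignment, two threads making the first call at the same moment could each build an instance; wrapping the assignment in a Mutex is one way to rule that out if it matters.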
.segment(str, opts = nil) ⇒ Object
# File 'lib/cppjieba_rb.rb', line 24
def self.segment(str, opts = nil)
  CppjiebaRb::Segment.new(opts).segment(str)
end
.segment_tag(str) ⇒ Object
# File 'lib/cppjieba_rb.rb', line 28
def self.segment_tag(str)
  internal.segment_tag(str)
end