Module: CppjiebaRb

Defined in:
lib/cppjieba_rb.rb,
lib/cppjieba_rb/segment.rb,
lib/cppjieba_rb/version.rb,
ext/cppjieba_rb/cppjieba_rb.c

Overview

CppjiebaRb segments a Chinese sentence into words.

Available segmentation modes are HMM (hidden Markov model), MP (maximum probability) and mix (MP combined with HMM). The dictionaries play a major role in CppjiebaRb’s accuracy. Read more at github.com/yanyiwu/cppjieba
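To show why dictionary contents matter so much for MP mode, here is a toy maximum-probability segmenter. It is a simplified illustration of the general technique, not cppjieba's implementation: it considers every dictionary word starting at each position and uses dynamic programming to pick the split with the highest total log-frequency. `TOY_DICT`, its counts, and the pseudo-count for unknown single characters are hypothetical.

```ruby
# Toy maximum-probability (MP) segmentation: consider every dictionary word
# that starts at each position, then choose the split with the highest
# total log-frequency via dynamic programming from right to left.
# TOY_DICT is hypothetical sample data; cppjieba uses jieba.dict.utf8.
TOY_DICT = {
  '研究' => 50, '研究生' => 20, '生命' => 40,
  '研' => 2, '究' => 2, '生' => 10, '命' => 5
}.freeze

def mp_segment(sentence, dict)
  chars = sentence.chars
  n = chars.length
  total = dict.values.sum.to_f
  # best[i] = [score, end_index] for the best segmentation of chars[i..]
  best = Array.new(n + 1)
  best[n] = [0.0, n]
  (n - 1).downto(0) do |i|
    candidates = (i + 1..n).filter_map do |j|
      word = chars[i...j].join
      freq = dict[word]
      # Unknown single characters get a pseudo-count of 1 so that
      # every sentence has at least one path.
      freq ||= 1 if j == i + 1
      next unless freq
      [Math.log(freq / total) + best[j][0], j]
    end
    best[i] = candidates.max_by(&:first)
  end
  # Walk the chosen end indices to rebuild the word list.
  words = []
  i = 0
  while i < n
    j = best[i][1]
    words << chars[i...j].join
    i = j
  end
  words
end

p mp_segment('研究生命', TOY_DICT) # => ["研究", "生命"]
```

A greedy longest-match pass would take 研究生 first and strand 命; scoring whole paths picks 研究 / 生命 instead, and changing the frequencies in the dictionary changes which path wins.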

Defined Under Namespace

Classes: Segment

Constant Summary

EXT_BASE = File.join(File.dirname(__FILE__), '..', 'ext', 'cppjieba', 'dict')
DICT_PATH = File.join(EXT_BASE, 'jieba.dict.utf8')
HMM_DICT_PATH = File.join(EXT_BASE, 'hmm_model.utf8')
USER_DICT = File.join(EXT_BASE, 'user.dict.utf8')
IDF_PATH = File.join(EXT_BASE, 'idf.utf8')
STOP_WORD_PATH = File.join(EXT_BASE, 'stop_words.utf8')
VERSION = '0.4.4'
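The constants are built relative to lib/cppjieba_rb.rb, so they point into the gem's bundled ext/cppjieba/dict directory. The sketch below reproduces that arithmetic with a hypothetical install path standing in for `File.dirname(__FILE__)`.

```ruby
# Hypothetical gem location standing in for File.dirname(__FILE__).
lib_dir = '/opt/gems/cppjieba_rb-0.4.4/lib'

# Same construction as EXT_BASE and DICT_PATH.
ext_base = File.join(lib_dir, '..', 'ext', 'cppjieba', 'dict')
dict_path = File.join(ext_base, 'jieba.dict.utf8')

puts ext_base                    # File.join keeps the literal '..' segment
puts File.expand_path(dict_path) # File.expand_path collapses it
```

On a Unix-like system this prints /opt/gems/cppjieba_rb-0.4.4/lib/../ext/cppjieba/dict and then /opt/gems/cppjieba_rb-0.4.4/ext/cppjieba/dict/jieba.dict.utf8.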

Class Method Summary

Class Method Details

.extract_keyword(str, top_n) ⇒ Object
# File 'lib/cppjieba_rb.rb', line 20

def self.extract_keyword(str, top_n)
  internal.extract_keyword(str, top_n)
end
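`extract_keyword` delegates to the native extension, which is initialized with `IDF_PATH` and `STOP_WORD_PATH`; that setup suggests TF-IDF-style ranking. The following is a simplified TF-IDF sketch over already-segmented words, assuming term frequency times IDF, an average-IDF fallback for unknown words, and skipping of single-character words. It is not the extension's code, and the sample IDF table and stop words are hypothetical.

```ruby
# Simplified TF-IDF keyword ranking over words that are already segmented.
# The IDF table and stop words below are hypothetical sample data.
def top_keywords(words, idf, stop_words, top_n)
  # Term frequency, skipping stop words and single-character tokens.
  tf = Hash.new(0)
  words.each do |w|
    next if w.length < 2 || stop_words.include?(w)
    tf[w] += 1
  end
  # Words missing from the IDF table fall back to the average IDF.
  avg_idf = idf.values.sum / idf.size
  weights = tf.map { |w, f| [w, f * idf.fetch(w, avg_idf)] }
  # The top_n heaviest words, highest weight first.
  weights.max_by(top_n) { |_w, weight| weight }
end

words = %w[我们 拖拉机 学院 拖拉机 专业 的 升职]
idf = { '拖拉机' => 8.0, '专业' => 4.0, '升职' => 9.0, '我们' => 1.0 }
stop_words = %w[我们 的]

p top_keywords(words, idf, stop_words, 3)
# => [["拖拉机", 16.0], ["升职", 9.0], ["学院", 5.5]]
```

Here 学院 has no IDF entry, so it receives the average IDF (5.5), and 拖拉机 ranks first because it occurs twice.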

.filter_stop_word(arr) ⇒ Object
# File 'lib/cppjieba_rb.rb', line 32

def self.filter_stop_word(arr)
  arr.reject { |w| internal.stop_word?(w) }
end
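`filter_stop_word` keeps every element of `arr` for which the native `stop_word?` check is false. The sketch below runs the same `reject` logic against a hypothetical `FakeInternal` that models only `stop_word?`, so it works without the compiled extension. The stop-word list is sample data.

```ruby
require 'set'

# Hypothetical stand-in for CppjiebaRb.internal; only the stop_word?
# predicate used by filter_stop_word is modeled here.
class FakeInternal
  def initialize(stop_words)
    @stop_words = Set.new(stop_words)
  end

  def stop_word?(word)
    @stop_words.include?(word)
  end
end

internal = FakeInternal.new(%w[的 了 是])
words = %w[我 是 一个 学生 的]

# Same logic as filter_stop_word: drop every word the predicate flags.
kept = words.reject { |w| internal.stop_word?(w) }
p kept # => ["我", "一个", "学生"]
```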

.internal ⇒ Object
# File 'lib/cppjieba_rb.rb', line 37

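# Lazily creates a single CppjiebaRb::Internal from the bundled
# dictionary files on first use, then reuses it on later calls.
def internal
  @internal ||= CppjiebaRb::Internal.new(DICT_PATH,
                                         HMM_DICT_PATH,
                                         USER_DICT,
                                         IDF_PATH,
                                         STOP_WORD_PATH)
end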
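The `@internal ||= ...` idiom means the five dictionary files are loaded only once per process. The sketch below isolates that pattern. `ExpensiveLoader` and `Demo` are hypothetical names standing in for `CppjiebaRb::Internal` and the module.

```ruby
# Hypothetical stand-in for CppjiebaRb::Internal that counts how many
# times it has been constructed.
class ExpensiveLoader
  @instances = 0

  class << self
    attr_accessor :instances
  end

  def initialize
    self.class.instances += 1
  end
end

module Demo
  # Same memoization as CppjiebaRb.internal: build once, then reuse.
  def self.internal
    @internal ||= ExpensiveLoader.new
  end
end

a = Demo.internal
b = Demo.internal
p a.equal?(b)               # => true
p ExpensiveLoader.instances # => 1
```

Because `||=` is a check followed by an assignment rather than an atomic operation, two threads making the very first call at the same moment could, in principle, each build an instance.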

.segment(str, opts = nil) ⇒ Object
# File 'lib/cppjieba_rb.rb', line 24

# Builds a new Segment configured by opts on every call, then segments str.
def self.segment(str, opts = nil)
  CppjiebaRb::Segment.new(opts).segment(str)
end
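`segment` passes `opts` (default `nil`) straight to `Segment.new`. A common way to accept such an optional hash is to treat `nil` as an empty hash and merge in defaults, as sketched below. `ToySegment`, the `:mode` key and its `:mix` default are hypothetical and are not taken from `CppjiebaRb::Segment`.

```ruby
# Hypothetical class illustrating normalization of an optional opts hash;
# not the actual CppjiebaRb::Segment implementation.
class ToySegment
  DEFAULTS = { mode: :mix }.freeze

  attr_reader :mode

  def initialize(opts = nil)
    # Treat nil the same as an empty hash, then apply defaults.
    options = DEFAULTS.merge(opts || {})
    @mode = options[:mode]
  end
end

p ToySegment.new(nil).mode        # => :mix
p ToySegment.new(mode: :hmm).mode # => :hmm
```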

.segment_tag(str) ⇒ Object
# File 'lib/cppjieba_rb.rb', line 28

def self.segment_tag(str)
  internal.segment_tag(str)
end
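A typical use of part-of-speech tags is keeping only certain word classes. The sketch below filters nouns from tagged output. The `[word, tag]` pair shape and the sample tags are assumptions for illustration, so check what `CppjiebaRb.segment_tag` actually returns before adapting it.

```ruby
# Hypothetical tagged output as [word, tag] pairs; the real return shape
# of CppjiebaRb.segment_tag may differ.
tagged = [%w[我 r], %w[是 v], %w[拖拉机 n], %w[学院 n], %w[的 uj]]

# Keep words whose tag starts with "n" (noun tags in jieba-style tag sets).
nouns = tagged.select { |_word, tag| tag.start_with?('n') }.map(&:first)
p nouns # => ["拖拉机", "学院"]
```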