gse: efficient text segmentation in Go, supporting English, Chinese, Japanese, and other languages

The dictionary is implemented with a double-array trie, and the segmenter uses a word-frequency-based shortest-path algorithm with dynamic programming.

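It supports two segmentation modes, normal and search-engine, as well as user dictionaries and part-of-speech tagging, and it can run as a JSON RPC service.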

Project: https://github.com/go-ego/gse

package main

import (
    "fmt"

    "github.com/go-ego/gse"
)

func main() {
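    var seg gse.Segmenter
    // Load the default Chinese dictionary ("zh") plus two dictionary files from testdata.
    seg.LoadDict("zh,testdata/test_dict.txt,testdata/test_dict1.txt")

    text1 := []byte("你好世界, Hello world")

    // Segment the text and print the result as a string.
    segments := seg.Segment(text1)
    fmt.Println(gse.ToString(segments))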
}

Lethe River

Add

  • [NEW] Add slice() and string() func and test
  • [NEW] Add more test
  • [NEW] Optimize textSliceToString splicing speed
  • [NEW] Update LoadDict() log.Printf and optimize read dict log
  • [NEW] Add ToString() and ToSlice() default value and update test
  • [NEW] ToString and ToSlice return early instead of using else, and update code
  • [NEW] Update server code
  • [NEW] Add token equals() func and test
  • [NEW] Add search mode example
  • [NEW] Optimize file defer close
  • [NEW] Segment returns nil instead of an empty array
  • [NEW] Update pkg to newest (optimize cedar code)
  • [NEW] Update and refactor segment test code
  • [NEW] Update dictionary and static demo
  • [NEW] Refactor gse benchmark code
  • [NEW] Update and simplify test code

Update

  • [NEW] Make the issue template clearer
  • [NEW] Update godoc, pull_request_template.md and issue_template.md
  • [NEW] Update README.md to use uniform naming
  • [NEW] Update godoc
  • [NEW] Update README.md, add searchMode docs
  • [NEW] Fix Japanese sub-word segmentation errors
  • [NEW] Update code style and name style
  • [NEW] Update examples and benchmark code
  • [NEW] Add Travis ci go1.11 support

Fix

  • [FIX] Update examples lang fix #4
  • [FIX] Fix typo for example
  • [FIX] Fix LoadDict() godoc error
  • [FIX] Fix sub-word error
  • [FIX] Fix nil pointer panic in segmentWords when dict is nil
  • [FIX] Update README.md to fix the Release badge

See the commits after Apr 27 for more details.
