Hanlp

Latest version: v2.1.0

1.7.3

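- The perceptron lexical analyzer now defaults to the [large model trained on six months of the 1998 People's Daily corpus](https://github.com/hankcs/HanLP/issues/1143#issuecomment-480515589)
- Optimized DoubleArrayTrie; fixes https://github.com/hankcs/HanLP/issues/1136
- CRFNERecognizer accepts custom named-entity labels at construction time and gains an addNERLabels method (zhangruinan)
- Prevented unnecessary initialization of ViterbiSegment.dat
- Fixed the lexical analyzer's handling of dynamically inserted entries; fixes https://github.com/hankcs/HanLP/issues/271#issuecomment-479719965
- The lexical analyzer's seg interface now lets custom part-of-speech tags override statistical ones; fixes https://github.com/hankcs/HanLP/issues/1156
- Revised pinyin
- New data package: [data-for-1.7.3.zip](https://s3-us-west-2.amazonaws.com/elitcloud-public-data/models/elit/data/data-for-1.7.3.zip) or the [Baidu Pan mirror](https://pan.baidu.com/s/1Knb9gpjHTTah3Rp7zyQOTw), `md5=4e4f3695565a75b56427ba4a40731949`
- The Portable edition is upgraded to v1.7.3 at the same time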
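
Each release publishes an MD5 checksum next to its data package. A minimal sketch of how a downloaded archive could be checked against that value using only the JDK follows. The class name `Md5CheckDemo` and its methods are hypothetical helpers for illustration and are not part of HanLP.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of HanLP: verifies a file against a published MD5.
class Md5CheckDemo {

    /** Streams the input through MD5 and returns the digest as lowercase hex. */
    static String md5Hex(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** Convenience overload for in-memory data. */
    static String md5Hex(byte[] data) {
        try {
            return md5Hex(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true when the file's MD5 equals the expected hex string. */
    static boolean matches(Path file, String expectedMd5) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return md5Hex(in).equalsIgnoreCase(expectedMd5.trim());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Md5CheckDemo.java <path-to-zip> <expected-md5>
        System.out.println(matches(Path.of(args[0]), args[1]) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

For this release the expected value would be the `md5=` string listed above. MD5 guards against a corrupted download only; it is not a defense against deliberate tampering.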

<dependency>
  <groupId>com.hankcs</groupId>
  <artifactId>hanlp</artifactId>
  <version>portable-1.7.3</version>
</dependency>

:tada: Thanks to all users who offered valuable suggestions in the issues!

1.7.2

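- Added a [beam-search dependency parser based on the ArcEager transition system](https://github.com/hankcs/HanLP/blob/master/src/test/java/com/hankcs/demo/DemoDependencyParser.java#L34); MaxEntDependencyParser is deprecated
- Adjusted the Traditional Chinese segmentation strategy; fixes https://github.com/hankcs/HanLP/issues/1059
- Fixed an integer overflow in the chi-square test, raising accuracy from 95.47 to 96.08; fixes https://github.com/hankcs/HanLP/issues/1075
- LexicalAnalyzer now supports TranslatedPersonRecognition and JapanesePersonRecognition; fixes https://github.com/hankcs/HanLP/issues/1080
- Added a notice that online learning cannot learn new labels
- The tokenizers' seg2sentence method is now static
- The lexical analyzer now disables the rule system by default
- Fixed CustomDictionary.reload(); fixes https://github.com/hankcs/HanLP/issues/1100
- Fine-tuned the unigram and bigram models
- New data package: [data-for-1.7.2.zip](http://file.hankcs.com/hanlp/data-for-1.7.2.zip) or the [Baidu Pan mirror](https://pan.baidu.com/s/1Z2wtBrHmWwfBnNH6ShbT1g), `md5=2228732bae47b8dc8e410678af72847f`
- The Portable edition is upgraded to v1.7.2 at the same time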
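
The chi-square overflow fix in this release is easier to picture with a small example. The sketch below is not HanLP's code. It assumes the standard 2x2 contingency-table form of the statistic and uses hypothetical names (`ChiSquareOverflowDemo`, `chiSquareInt`, `chiSquare`) to show how `int` arithmetic can silently corrupt the score on large counts while floating-point arithmetic does not.

```java
// Illustrative only: class and method names are hypothetical, not HanLP's.
class ChiSquareOverflowDemo {

    /** Chi-square with int arithmetic: intermediate products wrap around silently. */
    static double chiSquareInt(int n11, int n10, int n01, int n00) {
        int n = n11 + n10 + n01 + n00;
        int diff = n11 * n00 - n10 * n01;
        int numerator = n * diff * diff;                      // overflows for large counts
        int denominator = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00);
        return (double) numerator / denominator;
    }

    /** Same formula, but every product is carried out in double precision. */
    static double chiSquare(long n11, long n10, long n01, long n00) {
        double n = n11 + n10 + n01 + n00;
        double diff = (double) n11 * n00 - (double) n10 * n01;
        double denominator = (double) (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00);
        return n * diff * diff / denominator;
    }

    public static void main(String[] args) {
        // Small counts: both versions agree.
        System.out.println(chiSquareInt(10, 5, 3, 20) + " vs " + chiSquare(10, 5, 3, 20));
        // Large counts: the int version wraps around and returns a meaningless value.
        System.out.println(chiSquareInt(40000, 10000, 20000, 50000)
                + " vs " + chiSquare(40000, 10000, 20000, 50000));
    }
}
```

A feature selector that ranks terms by a wrapped-around score will keep or discard the wrong features, which is one plausible way an overflow like this could lower classification accuracy.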

<dependency>
  <groupId>com.hankcs</groupId>
  <artifactId>hanlp</artifactId>
  <version>portable-1.7.2</version>
</dependency>

:tada: Thanks to all users who offered valuable suggestions in the issues!

1.7.1

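- Added a Viterbi segmenter with a customizable user dictionary (AnyListen)
- Cache generation now uses BufferedOutputStream and is 37 times faster
- Custom dictionaries now work with paths that contain spaces; fixes https://github.com/hankcs/HanLP/issues/1025
- Added the isCustomNature method
- Cache files produced by hot updates now include user-defined parts of speech; fixes https://github.com/hankcs/HanLP/issues/1028
- Fixed the entrySet method of the mutable DAT; fixes https://github.com/hankcs/HanLP/issues/1038
- Fine-tuned the n-gram model, the Simplified/Traditional Chinese data, and more
- New data package: [data-for-1.7.1.zip](http://hanlp.linrunsoft.com/release/data-for-1.7.1.zip), `MD5 = 9b8faa7fc7fddb24e27da27bd404126d`
- The Portable edition is upgraded to v1.7.1 at the same time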
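
The cache speedup above comes from buffering many small writes. As a rough sketch of the pattern (not HanLP's cache format), the example below writes an `int` array through a `DataOutputStream` wrapped in a `BufferedOutputStream`, so each `writeInt` lands in memory rather than triggering its own file write. The class `BufferedCacheDemo` and its file layout are assumptions made for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical cache writer for illustration; the layout is not HanLP's.
class BufferedCacheDemo {

    /** Writes a length prefix followed by the values, batching writes in a buffer. */
    static void writeCache(Path path, int[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(path)))) {
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v); // goes to the in-memory buffer, flushed in large chunks
            }
        }
    }

    /** Reads back the array written by writeCache. */
    static int[] readCache(Path path) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(path)))) {
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
            return values;
        }
    }

    /** Writes the values to a temporary file, reads them back, and deletes the file. */
    static int[] roundTrip(int[] values) {
        try {
            Path tmp = Files.createTempFile("cache-demo", ".bin");
            try {
                writeCache(tmp, values);
                return readCache(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundTrip(new int[] {1, 2, 3})));
    }
}
```

Without the `BufferedOutputStream` layer, every `writeInt` call would be passed straight to the underlying file stream. The actual gain depends on the platform and data size, so the 37x figure should be read as the project's own measurement.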

<dependency>
  <groupId>com.hankcs</groupId>
  <artifactId>hanlp</artifactId>
  <version>portable-1.7.1</version>
</dependency>

Thanks to all users who offered valuable suggestions in the issues!

1.7.0

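- :triangular_flag_on_post:[Added a text clustering module (k-means and repeated bisection)](https://github.com/hankcs/HanLP/wiki/%E6%96%87%E6%9C%AC%E8%81%9A%E7%B1%BB)
- :triangular_flag_on_post:[The lexical analyzer gains a pipeline mode](https://github.com/hankcs/HanLP/blob/master/src/test/java/com/hankcs/demo/DemoPipeline.java#L24)
- Added rules to the lexical analyzer via `enableRuleBasedSegment`: https://github.com/hankcs/HanLP/issues/991
- The data path can be set through a JVM startup parameter: `java -DHANLP_ROOT=/opt/hanlp` loads `/opt/hanlp/data`; see https://github.com/hankcs/HanLP/issues/983
- Sentence splitting during segmentation supports a configurable granularity: https://github.com/hankcs/HanLP/issues/1018
- `CustomDictionary.insert("新词语", "词性标签")` (a new word and its part-of-speech tag) now allows the frequency to be omitted
- The `NeuralNetworkDependencyParser` constructor accepts a `Segment`
- `TextRankKeyword` can be constructed from any segmenter
- Optimized the double-array trie, which now shrinks automatically to its minimum memory footprint after construction: https://github.com/hankcs/HanLP/issues/984
- Revised the Simplified/Traditional Chinese dictionaries
- Fine-tuned the n-gram and nr models
- New data package: [data-for-1.7.0.zip](http://hanlp.linrunsoft.com/release/data-for-1.7.0.zip), `MD5 = 4c396f3039230ddfcef20865264512b1`
- The Portable edition is upgraded to v1.7.0 at the same time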
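
The `-DHANLP_ROOT` option above is an ordinary JVM system property. The sketch below shows one way a program could turn such a property into a data directory. It is not HanLP's loading code. The class `DataRootDemo`, the method `resolveDataDir`, and the fallback behaviour when the property is unset are all assumptions for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical resolver for illustration; HanLP's own lookup logic may differ.
class DataRootDemo {

    /**
     * Returns <root>/data, where root is the HANLP_ROOT system property if set,
     * otherwise the supplied fallback (the fallback rule is an assumption).
     */
    static Path resolveDataDir(String fallbackRoot) {
        String root = System.getProperty("HANLP_ROOT");
        if (root == null || root.isEmpty()) {
            root = fallbackRoot;
        }
        return Paths.get(root, "data");
    }

    public static void main(String[] args) {
        // Run as: java -DHANLP_ROOT=/opt/hanlp DataRootDemo.java
        System.out.println("data directory: " + resolveDataDir("."));
    }
}
```

Because `-D` options are read at JVM startup, the flag must come before the class or jar name on the command line.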

<dependency>
  <groupId>com.hankcs</groupId>
  <artifactId>hanlp</artifactId>
  <version>portable-1.7.0</version>
</dependency>

:tada: Happy holidays! Thanks to all users who offered valuable suggestions in the issues!

1.6.8

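- The new model is trained on a large general-purpose corpus of 100 million characters, currently the **largest in the world** for Chinese word segmentation. Corpus size determines real-world performance, and we hope a corpus of this scale draws attention to the work of corpus construction. Try the improvement through the `NLPTokenizer.analyze` interface or `PerceptronLexicalAnalyzer`.
- Fixed a problem introduced by the "improved person-name UV splitting" change; fixes https://github.com/hankcs/HanLP/issues/932
- Text classification no longer filters features when the chi-square test fails; fixes https://github.com/hankcs/HanLP/issues/920
- Deprecated `HMMSegment`
- Revised the Simplified/Traditional Chinese dictionaries
- New data package: [data-for-1.6.8.zip](http://hanlp.linrunsoft.com/release/data-for-1.6.8.zip), `md5=0eae09571f080bd99b81f79bee6c6b62`
- The Portable edition is upgraded to v1.6.8 at the same time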

<dependency>
  <groupId>com.hankcs</groupId>
  <artifactId>hanlp</artifactId>
  <version>portable-1.6.8</version>
</dependency>

:tada: Thanks to all users who offered valuable suggestions in the issues!

1.6.7

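- The default perceptron segmentation model is now trained on the [MSRA Named Entity Corpus](https://github.com/hankcs/OpenCorpus/tree/master/msra-ne)
- In low-priority user dictionary mode the lexical analyzer merges the statistical segmentation results; in high-priority mode it uses longest matching
- The lexical analyzer's user dictionary overrides the part-of-speech tagger's results: https://github.com/hankcs/HanLP/issues/525
- Improved person-name UV splitting; fixes https://github.com/hankcs/HanLP/issues/880
- Fixed MaxEntDependencyParser; fixes https://github.com/hankcs/HanLP/issues/914
- Added TF and TF-IDF statistics and keyword extraction tools
- word2vec now works with IOAdapter and clusters; fixes https://github.com/hankcs/HanLP/issues/903
- HanLP.extractWords takes more parameters
- Added the NERTrainer.tagSet member for the convenience of Python users
- Sentence gains more corpus manipulation interfaces
- LinearModel shows compression progress
- Fine-tuned the person-name, bigram, and other models
- Revised the Simplified/Traditional Chinese dictionaries and corrected the place-name dictionary against the National Bureau of Statistics 2016 administrative division data
- New data package: [data-for-1.6.7.zip](http://hanlp.linrunsoft.com/release/data-for-1.6.7.zip), `md5=4da338b7bcf3939a70b8cc16ed338c45`
- The Portable edition is upgraded to v1.6.7 at the same time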
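
For readers unfamiliar with the TF-IDF statistics mentioned above, here is a minimal, self-contained sketch of the idea over pre-tokenized documents. It does not use HanLP's API. The class `TfIdfDemo`, its method names, and the particular weighting (term count divided by document length, times the natural log of document count over document frequency) are assumptions; other TF-IDF variants exist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Illustrative TF-IDF over token lists; not HanLP's implementation.
class TfIdfDemo {

    /** Relative frequency of each term within one document. */
    static Map<String, Double> termFrequency(List<String> doc) {
        Map<String, Double> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1.0, Double::sum);
        }
        for (Map.Entry<String, Double> e : tf.entrySet()) {
            e.setValue(e.getValue() / doc.size());
        }
        return tf;
    }

    /** idf(t) = ln(N / df(t)), where df counts the documents containing t. */
    static Map<String, Double> inverseDocumentFrequency(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            idf.put(e.getKey(), Math.log((double) docs.size() / e.getValue()));
        }
        return idf;
    }

    /** Top-k terms of a document by tf * idf, ties broken alphabetically. */
    static List<String> topKeywords(List<String> doc, Map<String, Double> idf, int k) {
        Map<String, Double> tf = termFrequency(doc);
        List<String> terms = new ArrayList<>(tf.keySet());
        terms.sort(Comparator
                .comparingDouble((String t) -> -tf.get(t) * idf.getOrDefault(t, 0.0))
                .thenComparing(Comparator.naturalOrder()));
        return terms.subList(0, Math.min(k, terms.size()));
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("apple", "banana", "apple"),
                Arrays.asList("banana", "cherry"),
                Arrays.asList("banana", "date"));
        Map<String, Double> idf = inverseDocumentFrequency(docs);
        System.out.println(topKeywords(docs.get(0), idf, 1));
    }
}
```

A term that appears in every document, such as "banana" here, gets an IDF of zero and therefore never ranks as a keyword, which is the behaviour that makes TF-IDF useful for keyword extraction.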

<dependency>
  <groupId>com.hankcs</groupId>
  <artifactId>hanlp</artifactId>
  <version>portable-1.6.7</version>
</dependency>

:tada: Thanks to all users who offered valuable suggestions in the issues!
