Model Files
We provide multiple model files for Vaporetto that you can download and use in your work.
These models have been trained using [BCCWJ](https://ccd.ninjal.ac.jp/bccwj/) and [UniDic](https://ccd.ninjal.ac.jp/unidic/).
All of these models are trained with L1 regularization.
See below for license terms of each model.
(NOTE) Some parts of BCCWJ are not included in the training data for rights reasons.
Models With Dictionary
We provide two models containing UniDic. These models have the highest accuracy among the models we distribute.
* `bccwj-suw+unidic+tag.model.zst`: contains a tag prediction model
* `bccwj-suw+unidic.model.zst`: does not contain a tag prediction model
Models Without Dictionary
We also provide models that do not contain UniDic.
These models are available in three model sizes and two word units.
| | Short unit words (SUW) | Long unit words (LUW) |
| ---- | ---- | ---- |
| Small (C=0.1) | `bccwj-suw-small.model.zst` | `bccwj-luw-small.model.zst` |
| Middle (C=0.5) | `bccwj-suw-middle.model.zst` | `bccwj-luw-middle.model.zst` |
| Large (C=1.0) | `bccwj-suw-large.model.zst` | `bccwj-luw-large.model.zst` |
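
The file names in the table follow a regular `bccwj-<unit>-<size>.model.zst` pattern, so a script can build the name it needs. The following is a minimal sketch in Python; the function `model_filename` and the constants `UNITS` and `SIZES` are hypothetical names introduced for illustration and are not part of Vaporetto.

```python
# Hypothetical helper: the naming pattern is inferred from the table above,
# not taken from any Vaporetto API.
UNITS = ("suw", "luw")                 # short unit words, long unit words
SIZES = ("small", "middle", "large")   # C=0.1, C=0.5, C=1.0


def model_filename(unit: str, size: str) -> str:
    """Return the file name of a model without a dictionary."""
    if unit not in UNITS:
        raise ValueError(f"unknown word unit: {unit!r}")
    if size not in SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    return f"bccwj-{unit}-{size}.model.zst"


if __name__ == "__main__":
    print(model_filename("suw", "small"))  # bccwj-suw-small.model.zst
```

Rejecting unknown units and sizes up front keeps a typo from turning into a request for a file that does not exist.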
License
The following models are licensed under the [3-Clause BSD License](https://opensource.org/licenses/BSD-3-Clause).
* `bccwj-suw+unidic+tag.model.zst`
* `bccwj-suw+unidic.model.zst`
The following models are licensed under either of [Apache License (Version 2.0)](http://www.apache.org/licenses/LICENSE-2.0) or [MIT License](http://opensource.org/licenses/MIT) at your option.
* `bccwj-suw-small.model.zst`
* `bccwj-suw-middle.model.zst`
* `bccwj-suw-large.model.zst`
* `bccwj-luw-small.model.zst`
* `bccwj-luw-middle.model.zst`
* `bccwj-luw-large.model.zst`
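
If you bundle a model with your own software, it can help to record its license terms next to the file name. The sketch below encodes the two lists above as a lookup table in Python. The names `MODEL_LICENSES` and `license_for` are hypothetical, and the license strings are written as SPDX-style expressions for convenience; the lists above remain the reference.

```python
# Hypothetical lookup table built from the license lists above.
BSD_3_CLAUSE = "BSD-3-Clause"
APACHE_OR_MIT = "Apache-2.0 OR MIT"

MODEL_LICENSES = {
    # Models with dictionary (contain UniDic)
    "bccwj-suw+unidic+tag.model.zst": BSD_3_CLAUSE,
    "bccwj-suw+unidic.model.zst": BSD_3_CLAUSE,
    # Models without dictionary
    "bccwj-suw-small.model.zst": APACHE_OR_MIT,
    "bccwj-suw-middle.model.zst": APACHE_OR_MIT,
    "bccwj-suw-large.model.zst": APACHE_OR_MIT,
    "bccwj-luw-small.model.zst": APACHE_OR_MIT,
    "bccwj-luw-middle.model.zst": APACHE_OR_MIT,
    "bccwj-luw-large.model.zst": APACHE_OR_MIT,
}


def license_for(filename: str) -> str:
    """Return the license expression for a distributed model file."""
    try:
        return MODEL_LICENSES[filename]
    except KeyError:
        raise ValueError(f"not a distributed model file: {filename!r}") from None


if __name__ == "__main__":
    print(license_for("bccwj-suw+unidic.model.zst"))  # BSD-3-Clause
```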