Pecab

Latest version: v1.0.7

1.0.7

Improve `drop_space`.

1.0.4

- Fix emoji-related bug https://github.com/hyunwoongko/pecab/issues/4

1.0.3

Apply an LRU cache to the `_tokenize` method to reduce elapsed time for repeated inputs.
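
As a rough illustration of what this kind of caching buys, here is a minimal sketch using the standard library's `functools.lru_cache`. The `tokenize` function, the `call_count` counter, and the `maxsize` value are hypothetical stand-ins chosen for this example; they are not Pecab's actual `_tokenize` implementation or settings.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive function body actually runs


@lru_cache(maxsize=1024)  # maxsize is an arbitrary choice for this sketch
def tokenize(text: str) -> tuple:
    """Hypothetical stand-in for an expensive tokenizer (not Pecab's `_tokenize`)."""
    global call_count
    call_count += 1
    # Return a tuple so callers cannot mutate the cached value.
    return tuple(text.split())


first = tokenize("same input text")
second = tokenize("same input text")  # served from the cache, body is not re-run
info = tokenize.cache_info()
```

The second call returns the cached result, so repeated inputs cost only a dictionary lookup. Arguments to a cached function must be hashable, which plain strings are.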

1.0.2

- Fix requirements-related bug https://github.com/hyunwoongko/pecab/issues/1
- Modify comments about the emoji module https://github.com/hyunwoongko/pecab/issues/2

1.0.0

Pecab

Pecab is a pure Python Korean morpheme analyzer based on [Mecab](https://github.com/taku910/mecab). Mecab is a CRF-based morpheme analyzer made by Taku Kudo in 2011. It is both very fast and accurate, which is why it remains popular despite its age. However, it is known to be one of the trickiest libraries to install, and many people have struggled to install it.

For several years, I have wanted to make a pure Python version of Mecab that is easy to install while inheriting Mecab's advantages.
Pecab is the result. It produces results very similar to Mecab's and is easy to install. For more details, refer to the following sections.

Installation
```console
pip install pecab
```

Usage
The user API of Pecab is inspired by [KoNLPy](https://github.com/konlpy/konlpy),
one of the most famous natural language processing packages in South Korea.

1) `PeCab()`: creates a Pecab object.
```python
from pecab import PeCab

pecab = PeCab()
```

2) `morphs(text)`: splits text into morphemes.
```python
pecab.morphs("아버지가방에들어가시다")
# ['아버지', '가', '방', '에', '들어가', '시', '다']
```

3) `pos(text)`: returns morphemes and POS tags together.
```python
pecab.pos("이것은 문장입니다.")
# [('이것', 'NP'), ('은', 'JX'), ('문장', 'NNG'), ('입니다', 'VCP+EF'), ('.', 'SF')]
```
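
Some entries in the output above carry a combined tag such as `VCP+EF`, where `+` joins the tags of morphemes that were analyzed as one unit. The following sketch shows one way to post-process such output, for example to count tags. It works on the sample output copied from above rather than calling Pecab, and `split_tags` is a hypothetical helper, not part of the Pecab API.

```python
from collections import Counter

# Sample output of pecab.pos("이것은 문장입니다."), copied from the example above.
tagged = [("이것", "NP"), ("은", "JX"), ("문장", "NNG"), ("입니다", "VCP+EF"), (".", "SF")]


def split_tags(tag: str) -> list:
    """Hypothetical helper: split a combined tag like 'VCP+EF' into its parts."""
    return tag.split("+")


# Count every individual tag, treating 'VCP+EF' as one 'VCP' and one 'EF'.
tag_counts = Counter(part for _, tag in tagged for part in split_tags(tag))
```

Here `tag_counts` holds one occurrence each of `NP`, `JX`, `NNG`, `VCP`, `EF`, and `SF`.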

4) `nouns(text)`: returns all nouns in the input text.
```python
pecab.nouns("자장면을 먹을까? 짬뽕을 먹을까? 그것이 고민이로다.")
# ['자장면', '짬뽕', '그것', '고민']
```
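
For readers who need both the nouns and their tags, a similar list can be derived from `pos` output. The sketch below assumes that noun-like tags start with `N` (for example `NNG`, `NNP`, `NP`), which matches the tags shown in this document; whether Pecab's own `nouns` is implemented this way is not stated here, so treat `nouns_from_pos` as a hypothetical helper. It runs on sample output copied from the `pos` example rather than calling Pecab.

```python
# Sample output of pecab.pos("이것은 문장입니다."), copied from the pos example.
tagged = [("이것", "NP"), ("은", "JX"), ("문장", "NNG"), ("입니다", "VCP+EF"), (".", "SF")]


def nouns_from_pos(pairs):
    """Hypothetical helper: keep morphemes whose tag starts with 'N'.

    Assumption: noun-like tags (NNG, NNP, NP, ...) all begin with 'N'.
    """
    return [morph for morph, tag in pairs if tag.startswith("N")]


noun_list = nouns_from_pos(tagged)
```

With this sample, `noun_list` contains `이것` and `문장`.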

5) `PeCab(user_dict=List[str])`: sets up a user dictionary.
Note that words included in the user dictionary **cannot contain spaces**.
- Without `user_dict`
```python
from pecab import PeCab

pecab = PeCab()
pecab.pos("저는 삼성디지털프라자에서 지펠냉장고를 샀어요.")
# [('저', 'NP'), ('는', 'JX'), ('삼성', 'NNP'), ('디지털', 'NNP'), ('프라자', 'NNP'), ('에서', 'JKB'), ('지', 'NNP'), ('펠', 'NNP'), ('냉장고', 'NNG'), ('를', 'JKO'), ('샀', 'VV+EP'), ('어요', 'EF'), ('.', 'SF')]
```

- With `user_dict`
```python
from pecab import PeCab

user_dict = ["삼성디지털프라자", "지펠냉장고"]
pecab = PeCab(user_dict=user_dict)
pecab.pos("저는 삼성디지털프라자에서 지펠냉장고를 샀어요.")
# [('저', 'NP'), ('는', 'JX'), ('삼성디지털프라자', 'NNG'), ('에서', 'JKB'), ('지펠냉장고', 'NNG'), ('를', 'JKO'), ('샀', 'VV+EP'), ('어요', 'EF'), ('.', 'SF')]
```
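
Because user dictionary words cannot contain spaces, it can help to check a word list before passing it to `PeCab`. The following is a minimal sketch of such a check. `validate_user_dict` is a hypothetical helper written for this example, not part of Pecab, and how Pecab itself reacts to entries with spaces is not described here.

```python
from typing import List


def validate_user_dict(words: List[str]) -> List[str]:
    """Hypothetical helper (not part of Pecab): reject entries containing whitespace."""
    bad = [word for word in words if any(ch.isspace() for ch in word)]
    if bad:
        raise ValueError(f"user_dict entries must not contain spaces: {bad}")
    return words


# Passes unchanged, since neither word contains whitespace.
user_dict = validate_user_dict(["삼성디지털프라자", "지펠냉장고"])
```

The validated list can then be passed as `PeCab(user_dict=user_dict)`, as in the example above.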

6) `PeCab(split_compound=bool)`: divides compound words into smaller pieces.
```python
from pecab import PeCab

pecab = PeCab(split_compound=True)
pecab.morphs("가벼운 냉장고를 샀어요.")
# ['가볍', 'ᆫ', '냉장', '고', '를', '사', 'ㅏㅆ', '어요', '.']
```

7) `ANY_PECAB_FUNCTION(text, drop_space=bool)`: determines whether spaces are dropped from the output.
This option is available for `morphs`, `pos`, and `nouns`. Its default value is `True`.
```python
from pecab import PeCab

pecab = PeCab()
pecab.pos("토끼정에서 크림 우동을 시켰어요.")
# [('토끼', 'NNG'), ('정', 'NNG'), ('에서', 'JKB'), ('크림', 'NNG'), ('우동', 'NNG'), ('을', 'JKO'), ('시켰', 'VV+EP'), ('어요', 'EF'), ('.', 'SF')]

pecab.pos("토끼정에서 크림 우동을 시켰어요.", drop_space=False)
# [('토끼', 'NNG'), ('정', 'NNG'), ('에서', 'JKB'), (' ', 'SP'), ('크림', 'NNG'), (' ', 'SP'), ('우동', 'NNG'), ('을', 'JKO'), (' ', 'SP'), ('시켰', 'VV+EP'), ('어요', 'EF'), ('.', 'SF')]
```
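
One reason to keep the `SP` entries is that they preserve the original spacing. The sketch below uses the `drop_space=False` output copied from above, without calling Pecab, to show that joining the morphemes reproduces the input sentence in this example. This is not guaranteed in general: with `split_compound=True`, for instance, the morphemes shown earlier are decomposed forms rather than surface text.

```python
# Output of pecab.pos(..., drop_space=False), copied from the example above.
tagged = [
    ("토끼", "NNG"), ("정", "NNG"), ("에서", "JKB"), (" ", "SP"),
    ("크림", "NNG"), (" ", "SP"), ("우동", "NNG"), ("을", "JKO"),
    (" ", "SP"), ("시켰", "VV+EP"), ("어요", "EF"), (".", "SF"),
]
original = "토끼정에서 크림 우동을 시켰어요."

# Joining the morphemes in order rebuilds the sentence, spaces included.
rebuilt = "".join(morph for morph, _ in tagged)

# Filtering out the SP entries gives the same pairs as the default output above.
without_spaces = [(morph, tag) for morph, tag in tagged if tag != "SP"]
```

In this example `rebuilt` equals `original`, and `without_spaces` has the nine entries shown for the default call.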
