Isanlp-rst

Latest version: v3.0.1a5

Usage
For a usage example, see the [README](https://github.com/tchewik/isanlp_rst/blob/master/README.md).

Speed up
You can roughly double the parsing speed by turning off the prediction of relations between paragraphs. To do this, replace the last value [in this line](https://github.com/tchewik/isanlp_rst/blob/v2.1/src/isanlp_rst/processor_rst.py#L27) with 1.0. Each output tree will then correspond to a single paragraph.
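The text above does not say what that last value controls. One plausible reading, which is purely an assumption here, is that it is a confidence threshold that a greedy bottom-up parser must exceed before joining two adjacent paragraph trees. Under that assumption, the toy sketch below shows why 1.0 leaves one tree per paragraph: no predicted probability can be strictly greater than 1.0. `greedy_merge` and `toy_score` are hypothetical names and are not part of isanlp_rst.

```python
from typing import Callable, List, Union

Tree = Union[str, tuple]  # a leaf paragraph, or (left, right) for a merged node


def greedy_merge(units: List[Tree],
                 score: Callable[[Tree, Tree], float],
                 threshold: float) -> List[Tree]:
    """Hypothetical greedy bottom-up merge of adjacent trees.

    Repeatedly joins the adjacent pair with the highest score, but only
    while that score is strictly greater than `threshold`. Anything left
    unmerged is returned as a list of separate trees.
    """
    units = list(units)
    while len(units) > 1:
        scores = [score(units[i], units[i + 1]) for i in range(len(units) - 1)]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] <= threshold:
            break  # no pair is confident enough, so stop merging
        units[best:best + 2] = [(units[best], units[best + 1])]
    return units


def toy_score(left: Tree, right: Tree) -> float:
    # Stand-in for a relation classifier that is always fairly confident
    return 0.9


paragraphs = ['P1', 'P2', 'P3']
print(greedy_merge(paragraphs, toy_score, threshold=0.5))  # [(('P1', 'P2'), 'P3')]
print(greedy_merge(paragraphs, toy_score, threshold=1.0))  # ['P1', 'P2', 'P3']
```

With the threshold at 1.0 the loop exits before scoring any merge decision beyond the first pass, which is consistent with (though not proof of) the speed-up described above.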

Requires the following Docker containers to be running: ``tchewik/isanlp_udpipe`` (syntax) and ``tchewik/isanlp_rst:2.0`` (RST).
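Before building the pipeline, it can help to confirm that both containers accept TCP connections on the ports used in the Python example below (3134 for syntax, 3335 for RST). This is a minimal standard-library sketch; `is_reachable` is a hypothetical helper, and `'localhost'` assumes the containers publish their ports on the same machine.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False


if __name__ == '__main__':
    # Assumed host; replace with wherever the containers are running
    for name, port in [('syntax (isanlp_udpipe)', 3134), ('RST (isanlp_rst)', 3335)]:
        status = 'up' if is_reachable('localhost', port) else 'NOT reachable'
        print(f'{name} on port {port}: {status}')
```

A successful connection only shows that something is listening on the port, not that it is the expected isanlp service.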

Usage in Python:

```python
from isanlp import PipelineCommon
from isanlp.processor_razdel import ProcessorRazdel
from isanlp.processor_remote import ProcessorRemote
from isanlp.ru.processor_mystem import ProcessorMystem
from isanlp.ru.converter_mystem_to_ud import ConverterMystemToUd
import razdel

# Put the addresses here ->
address_syntax = ('', 3134)  # syntax container (tchewik/isanlp_udpipe)
address_rst = ('', 3335)     # RST container (tchewik/isanlp_rst:2.0)


# Highly recommended to pre-tokenize texts
def tokenize(text):
    """ Tokenize text, but keep paragraph boundaries. """

    # Collapse blank lines so that paragraphs are separated by exactly one '\n'
    while '\n\n' in text:
        text = text.replace('\n\n', '\n')
    result = []
    for paragraph in text.split('\n'):
        # Re-join razdel tokens with single spaces, one paragraph per line
        result.append(' '.join([tok.text for tok in razdel.tokenize(paragraph)]))
    return '\n'.join(result).strip()


ppl = PipelineCommon([
    # Tokenization and sentence splitting
    (ProcessorRazdel(), ['text'],
     {'tokens': 'tokens',
      'sentences': 'sentences'}),
    # Lemmas, dependency trees and UD POS tags from the remote syntax container
    (ProcessorRemote(address_syntax[0], address_syntax[1], '0'),
     ['tokens', 'sentences'],
     {'lemma': 'lemma',
      'syntax_dep_tree': 'syntax_dep_tree',
      'postag': 'ud_postag'}),
    # Mystem morphological tags
    (ProcessorMystem(delay_init=False),
     ['tokens', 'sentences'],
     {'postag': 'postag'}),
    # Convert Mystem tags to UD-style morphology and POS tags
    (ConverterMystemToUd(),
     ['postag'],
     {'morph': 'morph',
      'postag': 'postag'}),
    # Discourse (RST) parsing in the remote RST container
    (ProcessorRemote(address_rst[0], address_rst[1], 'default'),
     ['text', 'tokens', 'sentences', 'postag', 'morph', 'lemma', 'syntax_dep_tree'],
     {'rst': 'rst'})
])

# Usage (both containers must be running and the addresses filled in):
# result = ppl(tokenize(text))
# result['rst']  # the RST trees
```
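`tokenize()` collapses runs of blank lines and writes one paragraph per line, with tokens separated by single spaces. To make that input format concrete without installing razdel, here is a standard-library sketch that keeps the same paragraph handling but swaps in a simple regex tokenizer. The regex is an assumption for illustration only: razdel's rules are more elaborate, so real output can differ. `tokenize_demo` is a hypothetical name.

```python
import re

# Hypothetical stand-in for razdel.tokenize: runs of word characters or single punctuation marks
_TOKEN_RE = re.compile(r'\w+|[^\w\s]')


def tokenize_demo(text: str) -> str:
    """Same paragraph handling as tokenize(), with a regex tokenizer instead of razdel."""
    # Collapse blank lines so paragraphs are separated by exactly one '\n'
    while '\n\n' in text:
        text = text.replace('\n\n', '\n')
    result = []
    for paragraph in text.split('\n'):
        result.append(' '.join(_TOKEN_RE.findall(paragraph)))
    return '\n'.join(result).strip()


sample = 'Hello, world!\n\n\nThis is the second paragraph.'
print(tokenize_demo(sample))
# Hello , world !
# This is the second paragraph .
```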

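The last processor writes its output under the key `'rst'` (see the `{'rst': 'rst'}` mapping). The node class of those trees is not described here, so this sketch assumes, hypothetically, that each tree is binary and that its nodes carry `text`, `relation`, `nuclearity`, `left` and `right` attributes, with leaves having no children. A small stand-in dataclass `Node` makes the traversal runnable without the containers; check the real attribute names before applying it to `result['rst']`.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for an RST tree node (not the isanlp class)."""
    text: str
    relation: str = 'elementary'
    nuclearity: str = ''
    left: Optional['Node'] = None
    right: Optional['Node'] = None


def edus(node: Node) -> List[str]:
    """Collect elementary discourse units (leaf texts) from left to right."""
    if node.left is None and node.right is None:
        return [node.text]
    result = []
    for child in (node.left, node.right):
        if child is not None:
            result.extend(edus(child))
    return result


def relations(node: Node) -> Iterator[Tuple[str, str]]:
    """Yield (relation, nuclearity) for every internal node, top-down."""
    if node.left is None and node.right is None:
        return
    yield node.relation, node.nuclearity
    for child in (node.left, node.right):
        if child is not None:
            yield from relations(child)


# Toy tree; with real output you would loop over the trees in result['rst']
tree = Node(text='He was late because he overslept. It happens.',
            relation='elaboration', nuclearity='NS',
            left=Node(text='He was late because he overslept.',
                      relation='cause', nuclearity='NS',
                      left=Node(text='He was late'),
                      right=Node(text='because he overslept.')),
            right=Node(text='It happens.'))

print(edus(tree))             # ['He was late', 'because he overslept.', 'It happens.']
print(list(relations(tree)))  # [('elaboration', 'NS'), ('cause', 'NS')]
```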