dropped cwb-python dependency
- should close 35
- `cqp.py` was already included in cwb-ccc
- `cl.pyx` now included too
- this means that C-code has to be compiled during installation
improved tests
- closes 31
- included new UCS reference counts on dedicated test corpus
- included EmpiriST counts for testing keyword functionality
- re-wrote most of the tests on dedicated test corpus
- re-wrote most of the tests so they really assert instead of print
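
The switch from printing to asserting can be illustrated with a minimal sketch. `count_items` is a hypothetical stand-in for a counting function and is not part of cwb-ccc; the point is only that the test now fails when the result deviates from the reference values.

```python
def count_items(tokens):
    """Hypothetical helper standing in for a real counting function."""
    counts = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    return counts


def test_count_items():
    counts = count_items(["a", "b", "a"])
    # Before: print(counts) would "pass" even if the counts were wrong.
    # After: the test fails unless the result matches the reference values.
    assert counts == {"a": 2, "b": 1}
```

A test runner such as pytest would collect `test_count_items` automatically and report a failure on any mismatch.
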
included github actions
- build & test
- dist & publish on PyPI (WIP)
- helps close 35
improved data_path / cache
- addresses 30
- data_path for each `__version__`
- data_path for each library: this invalidates the cache when wordlists and macros are updated
- library files must now end in ".txt"
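
One way such a layout could work is sketched below. This is an assumption for illustration, not the actual cwb-ccc implementation: the function names, the directory naming scheme, and the use of a content hash are all hypothetical. The idea is that the cache directory depends on both the package version and the contents of the library's `.txt` files, so editing a wordlist or macro yields a fresh directory.

```python
import hashlib
import tempfile
from pathlib import Path


def library_fingerprint(lib_dir):
    """Hash names and contents of all .txt files below lib_dir (hypothetical scheme)."""
    digest = hashlib.sha256()
    for path in sorted(Path(lib_dir).rglob("*.txt")):
        digest.update(path.name.encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()[:10]


def versioned_data_path(base_dir, version, lib_dir):
    """Build a cache directory name from the version and the library fingerprint."""
    return Path(base_dir) / f"v{version}" / f"lib-{library_fingerprint(lib_dir)}"


# Demonstration on a throwaway directory; "0.0.0" is a placeholder version.
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp) / "library"
    (lib / "wordlists").mkdir(parents=True)
    wordlist = lib / "wordlists" / "example.txt"
    wordlist.write_text("foo\n", encoding="utf-8")
    before = versioned_data_path(tmp, "0.0.0", lib)
    wordlist.write_text("foo\nbar\n", encoding="utf-8")  # update the wordlist
    after = versioned_data_path(tmp, "0.0.0", lib)
```

Here `before` and `after` share the same version directory but differ in the library part, so results cached for the old wordlist are no longer found.
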
introduced FreqFrames
- FreqFrames are DataFrames with frequency information returned by
  - `Counts.cpos()`
  - `Counts.dump()`
  - `Counts.matches()`
  - `Counts.mwus()`
  - `Corpus.marginals()`
  - `Corpus.marginals_complex()`
- new consistent behaviour:
  - format = `[(" ".join(p_att))] freq, p_att[0], p_att[1], ...`
  - indexed by a single string named `item`
  - frequency column named `freq`
  - additional columns with all separate p-attributes
- cf. old behaviour: `(p_att[0], p_att[1], ...) some_column`
  - `MultiIndex`
  - inconsistently named frequency column
- heuristic for MWUs remains unchanged, i.e. they are " "-joined in the index
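
The difference between the old and new layout can be sketched with plain pandas. The helper `to_freqframe`, the column name `n`, and the sample data are made up for illustration and are not part of the cwb-ccc API.

```python
import pandas as pd


def to_freqframe(df, freq_col):
    """Turn a MultiIndex count table into the single-index layout described above."""
    p_atts = list(df.index.names)
    out = df.reset_index().rename(columns={freq_col: "freq"})
    # The index is the " "-joined combination of all p-attribute values.
    items = [" ".join(row) for row in out[p_atts].astype(str).values.tolist()]
    out.index = pd.Index(items, name="item")
    return out[["freq"] + p_atts]


# Old-style table: MultiIndex over p-attributes, arbitrarily named count column.
old = pd.DataFrame(
    {"n": [3, 1]},
    index=pd.MultiIndex.from_tuples(
        [("Haus", "NN"), ("gehen", "VVFIN")], names=["lemma", "pos"]
    ),
)
new = to_freqframe(old, "n")
```

The resulting `new` is indexed by `item` (`"Haus NN"`, `"gehen VVFIN"`), with `freq` first and one column per p-attribute after it.
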
re-factored Discourseme Constellations
- now with tests!
- two types of constellations (inner vs textual constellations)
- `create_constellation()` wrapper
improved Collocates
- consistent AM scoring (ScoreFrame) for keywords and collocates
- collocation retrieval considerably faster (count once for max window size)
- upgrade of association-measures module gives more (and more stable) AMs
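
The "count once for max window size" idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the cwb-ccc code: the function names are hypothetical, the corpus is a plain token list, and overlapping contexts of neighbouring nodes are simply counted twice.

```python
from collections import Counter


def count_by_distance(tokens, node_positions, max_window):
    """Scan the corpus once, counting context items keyed by (distance, item)."""
    counts = Counter()
    for pos in node_positions:
        lo = max(0, pos - max_window)
        hi = min(len(tokens) - 1, pos + max_window)
        for i in range(lo, hi + 1):
            if i != pos:
                counts[(abs(i - pos), tokens[i])] += 1
    return counts


def collocates(counts, window):
    """Derive co-occurrence counts for any window <= max_window without re-scanning."""
    out = Counter()
    for (dist, item), freq in counts.items():
        if dist <= window:
            out[item] += freq
    return out


tokens = "the old dog saw the young dog near the house".split()
nodes = [i for i, t in enumerate(tokens) if t == "dog"]
by_dist = count_by_distance(tokens, nodes, max_window=3)

narrow = collocates(by_dist, 1)  # window of 1 token on each side
wide = collocates(by_dist, 3)    # window of 3 tokens on each side
```

Both `narrow` and `wide` come from the same `by_dist` table, so changing the window size only requires a cheap aggregation rather than another pass over the corpus.
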
further enhancements
- p-att selection in `dump.breakdown()`
- `corpus.marginals()` can now be called without items (yielding marginal frequencies of _all_ items)
- `cqpy_dump()`, `cqpy_load()`, `cqpy_dumps()`, `cqpy_loads()`
miscellaneous
- changed anchor correction behaviour: use context/contextend instead of NA when out of bounds
- replaced some for loops with list comprehensions
- included some more `__str__` and `__repr__`
- Sphinx documentation (WIP, addresses 7)
- linting (WIP)
- Docker
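
The changed anchor correction behaviour listed above amounts to clamping a shifted anchor to the context boundaries. A minimal sketch, assuming integer corpus positions and a hypothetical helper name `correct_anchor` (not the cwb-ccc API):

```python
def correct_anchor(position, offset, context, contextend):
    """Shift an anchor by offset and clamp it to [context, contextend].

    Out-of-bounds results fall back to the nearest context boundary
    instead of becoming NA.
    """
    return min(max(position + offset, context), contextend)


inside = correct_anchor(10, 2, context=8, contextend=20)      # stays at 12
too_far_left = correct_anchor(10, -5, context=8, contextend=20)   # clamped to 8
too_far_right = correct_anchor(18, 5, context=8, contextend=20)   # clamped to 20
```
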