- The Exquisite Corpus data has been updated to include Google Books Ngrams
2019, Reddit data through 2019, Wikipedia data from 2020, and Twitter-sampled
data from 2020. Language detection is also somewhat more reliable.
- Updated dependencies to require recent versions of `regex` and `jieba`,
so that tokenization is consistent with the word lists. The `regex`
dependency must now be a version released after 2020.04.04.
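
To check whether an environment satisfies the `regex` requirement above, something like the following sketch may help. The helper names `parse_version` and `is_newer_than` are hypothetical and not part of this package. The sketch assumes `regex` versions are purely numeric, dotted, date-style strings, and it reads "after 2020.04.04" as a strict comparison.

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn a dotted numeric version such as '2020.04.04' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_newer_than(installed: str, minimum: str) -> bool:
    """Return True if `installed` is strictly later than `minimum`."""
    return parse_version(installed) > parse_version(minimum)


if __name__ == "__main__":
    # Assumption: the changelog's "after 2020.04.04" means strictly later.
    minimum = "2020.04.04"
    try:
        installed = metadata.version("regex")
    except metadata.PackageNotFoundError:
        print("regex is not installed")
    else:
        status = "ok" if is_newer_than(installed, minimum) else "too old"
        print(f"regex {installed}: {status}")
```

Comparing tuples of integers rather than raw strings means that `2020.4.28` and `2020.04.28` are treated as the same release, since leading zeros are dropped.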