DenseBoW can encode a text in a matrix where the number of columns corresponds to the tokens, and the rows are the decision values of Support Vector Machines trained to identify the token represented in each row.
0.0.5b
DenseBoW can encode a text in a matrix where the number of columns corresponds to the tokens, and the rows are the decision values of Support Vector Machines trained to identify the token represented in each row.
0.0.4
The version includes the subwords model. This is the default in `DialectId`.
0.0.3
It addresses a typo issue in French countries.
0.0.2
It includes the class `DialectId` to identify the dialect (country) from a text, given the language.
0.0.1
The first version of dialectid. It has a Bag of Word (BoW) model where the weights were estimated in 4 million ($2^{22}$) tweets uniformly selected from the Spanish countries.
data The release is to have a place to store the data of the dialectid.