'token': 11893,
'token_str': 'อ่านหนังสือ',
'sequence': 'ผมชอบอ่านหนังสือมากๆ'},
...]
Preprocess
If you want to preprocess data before training a model, you can use the preprocess module.
> from thaixtransformers.preprocess import process_transformers
> process_transformers(str) -> str
**Example**
```python
from thaixtransformers.preprocess import process_transformers

print(process_transformers("สวัสดี :D"))
```
output: 'สวัสดี<_>:d'
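When preparing a training set, the same function can be mapped over a whole list of texts. The sketch below is only an illustration: `preprocess_corpus` is a hypothetical helper, not part of thaixtransformers, and it assumes nothing about `process_transformers` beyond the `str -> str` signature shown above.

```python
from typing import Callable, Iterable, List


def preprocess_corpus(
    texts: Iterable[str], preprocess: Callable[[str], str]
) -> List[str]:
    """Apply a str -> str preprocessing function to every non-empty text.

    Hypothetical helper for illustration; not part of thaixtransformers.
    """
    cleaned = []
    for text in texts:
        text = text.strip()
        if not text:
            # Skip blank entries so they never reach the tokenizer.
            continue
        cleaned.append(preprocess(text))
    return cleaned


if __name__ == "__main__":
    try:
        from thaixtransformers.preprocess import process_transformers
    except ImportError:
        print("thaixtransformers is not installed; skipping the demo")
    else:
        corpus = ["สวัสดี :D", "", "ผมชอบอ่านหนังสือมากๆ"]
        for line in preprocess_corpus(corpus, process_transformers):
            print(line)
```

Passing the preprocessing function as an argument keeps the helper independent of the library, so it can be tried out with any `str -> str` callable before the real one is plugged in.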
BibTeX entry and citation info
@misc{lowphansirikul2021wangchanberta,
title={WangchanBERTa: Pretraining transformer-based Thai Language Models},
author={Lalita Lowphansirikul and Charin Polpanumas and Nawat Jantrakulchai and Sarana Nutanong},
year={2021},
eprint={2101.09635},
archivePrefix={arXiv},
primaryClass={cs.CL}
}