library_name: transformers
license: mit
datasets:
- JeanKaddour/minipile
language:
- en
# BEE-spoke-data/MiniTokenizer-20480
This is a `ByteLevelBPETokenizer` trained on the `JeanKaddour/minipile` dataset, with the aim of creating a compact English-only tokenizer.
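For readers unfamiliar with byte-level BPE, the idea can be sketched in plain Python: start from the 256 possible byte values and repeatedly merge the most frequent adjacent pair into a new token. This is a toy illustration only, not the code used to train this tokenizer; the function names (`train_byte_bpe`, `merge_pair`, `encode`, `decode`) are hypothetical, and the sketch omits pre-tokenization and other details of the real `tokenizers` implementation.

```py
from collections import Counter


def merge_pair(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_byte_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text` (toy version)."""
    ids = list(text.encode("utf-8"))  # base vocabulary: byte values 0..255
    merges = {}  # (left_id, right_id) -> new token id, in learned order
    for step in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break  # fewer than two tokens left, nothing to merge
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        new_id = 256 + step
        merges[best] = new_id
        ids = merge_pair(ids, best, new_id)
    return merges


def encode(text, merges):
    """Turn text into token ids by applying the merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge_pair(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Map token ids back to bytes, then to text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


corpus = "low lower lowest low low"
merges = train_byte_bpe(corpus, num_merges=5)
print(len(corpus.encode("utf-8")), "bytes ->", len(encode(corpus, merges)), "tokens")
```

Because the base vocabulary covers every byte, any input string can be encoded without an unknown token; the number of merges controls how large the final vocabulary grows.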
## Usage
Load it with `AutoTokenizer`, e.g.:
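```py
from transformers import AutoTokenizer

# Download the tokenizer files from the Hugging Face Hub (cached after the first call)
tk = AutoTokenizer.from_pretrained('BEE-spoke-data/MiniTokenizer-20480')
print(tk)  # show the tokenizer's configuration
```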
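Once loaded, a quick way to inspect the tokenizer is to encode a string, look at the resulting tokens, and decode them back. The helper below is a minimal sketch: `token_report` is a hypothetical name, not part of this repo or of `transformers`, and it relies only on the standard `encode`, `decode`, and `convert_ids_to_tokens` tokenizer methods.

```py
def token_report(tokenizer, text):
    """Summarize how `tokenizer` splits `text` (hypothetical helper for illustration)."""
    # Skip special tokens so the report reflects only the text itself
    ids = tokenizer.encode(text, add_special_tokens=False)
    decoded = tokenizer.decode(ids)
    return {
        "n_tokens": len(ids),
        "tokens": tokenizer.convert_ids_to_tokens(ids),
        "decoded": decoded,
        "round_trip_ok": decoded == text,  # may be False if the tokenizer normalizes text
    }
```

With the `tk` object from the snippet above, `token_report(tk, "Hello, world!")` returns the token count, the token strings, and whether decoding reproduces the input.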