Transformers v4.x: Convert slow tokenizer to fast tokenizer
I am following the Transformers example for the pretrained model xlm-roberta-large-xnli:
from transformers import pipeline
classifier = pipeline("zero-shot-classification",
                      model="joeddav/xlm-roberta-large-xnli")
I get the following error:
ValueError: Couldn't instantiate the backend tokenizer from one of: (1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
I am using Transformers version '4.1.1'.
Answer
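According to the Transformers v4.0.0 release notes, sentencepiece was removed as a required dependency. This means that
"The tokenizers that depend on the SentencePiece library will not be available with a standard transformers installation"
including XLMRobertaTokenizer. That is why the slow-to-fast tokenizer conversion fails with the error above. However, sentencepiece can be installed as an extra dependency: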
pip install transformers[sentencepiece]
or
pip install sentencepiece
if you already have transformers installed.
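To confirm the fix took effect before loading the model again, you can check whether sentencepiece is importable in the current environment. The following is a minimal sketch, not part of the Transformers API: the helper names `sentencepiece_available` and `build_classifier` are hypothetical and introduced here only for illustration.

```python
import importlib.util


def sentencepiece_available() -> bool:
    """Return True if the sentencepiece package can be found in this environment."""
    return importlib.util.find_spec("sentencepiece") is not None


def build_classifier(model_name: str = "joeddav/xlm-roberta-large-xnli"):
    """Hypothetical helper: build the zero-shot pipeline, failing early if sentencepiece is missing."""
    if not sentencepiece_available():
        raise RuntimeError(
            "sentencepiece is not installed; run `pip install sentencepiece` "
            "and restart the kernel/runtime before retrying."
        )
    # Imported lazily so the availability check can run without loading transformers.
    from transformers import pipeline

    return pipeline("zero-shot-classification", model=model_name)


if __name__ == "__main__":
    print("sentencepiece installed:", sentencepiece_available())
```

If the check still reports that sentencepiece is missing right after installing it in a notebook, the running kernel was probably started before the install, so restart it and run the check again.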
- pip install sentencepiece followed by a kernel/runtime restart solves the issue.