Fast and easy text generation in any language with the Huggingface framework
As part of the "Machine Learning. Advanced" course, we have prepared a translation of an interesting article.
Introduction
Text generation is one of the most exciting applications of Natural Language Processing (NLP). In recent years, large language models such as GPT-3 have shown that machines can produce remarkably fluent, human-like text. In this article we will see how easy it is to generate text yourself, in almost any language.
Here we will use GPT-2, the predecessor of GPT-3, through the Transformers library from Huggingface. If you want a closer look at how GPT-2 itself works, see, for example: GPT2 Pytorch.
With GPT-2 and the Transformers pipeline, generating text takes just a few lines of code! The plan is as follows.
Step 1: Install the library
Step 2: Import the pipeline
Step 3: Create a text generation pipeline
Step 4: Define the prefix text
Step 5: Generate the text
Let's get started!
Step 1: Install the library
Huggingface Transformers runs on top of a deep learning framework, either PyTorch or TensorFlow; here we will use PyTorch, so install it first if you have not already.
Once PyTorch is in place, install Huggingface Transformers with:
pip install transformers
Step 2: Import the pipeline
From Transformers we only need to import the pipeline function:
from transformers import pipeline
The pipeline abstraction takes care of loading the model and tokenizer and of all the pre- and post-processing, which is what makes the rest of this tutorial so short.
Step 3: Create a text generation pipeline
Now create a pipeline for the text generation task:
text_generation = pipeline("text-generation")
By default this pipeline downloads and uses GPT-2, which is exactly the model we want.
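If you prefer to pin the checkpoint explicitly rather than rely on the default, the pipeline also accepts a model argument; a minimal sketch (the name "gpt2" is the standard hub identifier for the default model):

```python
from transformers import pipeline

# Equivalent to the default, but with the checkpoint named explicitly.
text_generation = pipeline("text-generation", model="gpt2")
print(text_generation.model.config.model_type)  # -> gpt2
```

Any causal language model from the Huggingface model hub could be substituted here, which is exactly what we will do in the bonus section below.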
Step 4: Define the prefix text
The model generates a continuation of the text we feed it, so we need a prefix (prompt) to start from. For example:
The world is
prefix_text = "The world is"
Step 5: Generate the text
Everything is ready, we can finally generate! It takes a single call:
generated_text = text_generation(prefix_text, max_length=50, do_sample=False)[0]
print(generated_text['generated_text'])
The max_length parameter limits the length of the output, here to 50 tokens. The result:
The world is a better place if you’re a good person.
I’m not saying that you should be a bad person. I’m saying that you should be a good person.
I’m not saying that you should be a bad
As you can see, given the prefix "The world is" the model produced a fairly coherent continuation. Notice, however, that with greedy decoding (do_sample=False) the output quickly starts repeating itself. To get more varied and interesting text, you can enable sampling (for example, top-k/top-p sampling) and tune the decoding parameters. For the full list of generation options, see the Huggingface documentation for TextGenerationPipeline.
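As an illustration of the sampling options just mentioned, here is one possible call with top-k/top-p sampling enabled; the particular values 50 and 0.95 are common defaults chosen for this sketch, not prescribed by the article:

```python
from transformers import pipeline, set_seed

set_seed(42)  # sampling is random; fix the seed for reproducibility
text_generation = pipeline("text-generation", model="gpt2")

# do_sample=True switches from greedy decoding to sampling;
# top_k keeps only the 50 most likely next tokens, and top_p further
# restricts them to the smallest set covering 95% of probability mass.
result = text_generation(
    "The world is",
    max_length=50,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)[0]
print(result["generated_text"])
```

Rerunning with a different seed (or none at all) gives a different continuation each time, which is exactly the point of sampling.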
Bonus: text generation in other languages
English is by far the best-resourced language in NLP, and for other languages good pretrained models can be much harder to come by. Fortunately, the Huggingface model hub hosts community-trained models for a surprising number of languages, so often all it takes is swapping in a different model and tokenizer.
Let's generate some Chinese text as an example. We will use the Chinese GPT2 model published by CKIPLab, which requires two extra imports:
from transformers import BertTokenizerFast, AutoModelWithLMHead
Load the tokenizer and the model:
tokenizer = BertTokenizerFast.from_pretrained('bert-base-chinese')
model = AutoModelWithLMHead.from_pretrained('ckiplab/gpt2-base-chinese')
Then pass them to the pipeline:
text_generation = pipeline("text-generation", model=model, tokenizer=tokenizer)
From here everything works exactly as before. Take a Chinese prefix, for example this one, meaning roughly "I want to go":
我 想 要 去
prefix_text = "我 想 要 去"
And generate the text just as in step 5:
generated_text = text_generation(prefix_text, max_length=50, do_sample=False)[0]
print(generated_text['generated_text'])
The output:
我 想 要 去 看 看 。 」 他 說 : 「 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們
("I want to go and have a look." He said: "We cannot say, we cannot say, we cannot say, …")
Again, greedy decoding makes the model repeat itself after a while, but the text it produces is perfectly valid Chinese.
As before, enabling sampling and tuning the generation parameters will give more varied results.
That's it! Thanks to Huggingface's high-level pipeline API, text generation really does take only a few lines of code. For reference, here is the complete English example as run in Jupyter:
In [1]:
from transformers import pipeline
In [ ]:
text_generation = pipeline("text-generation")
In [7]:
prefix_text = "The world is"
In [8]:
generated_text= text_generation(prefix_text, max_length=50, do_sample=False)[0]
print(generated_text['generated_text'])
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The world is a better place if you're a good person.
I'm not saying that you should be a bad person. I'm saying that you should be a good person.
I'm not saying that you should be a bad
That's all! You can now generate text in almost any language in just a few lines of code. Text generation is a very active area of NLP research, so if you want better results, experiment with different models and decoding strategies; the references below are a good place to start.
References:
Brown, Tom B., et al. “Language models are few-shot learners.” arXiv preprint arXiv:2005.14165 (2020).
Radford, Alec, et al. “Language models are unsupervised multitask learners.” OpenAI blog 1.8 (2019): 9.
Transformers Github, Huggingface
Transformers Official Documentation, Huggingface
Pytorch Official Website, Facebook AI Research
Fan, Angela, Mike Lewis, and Yann Dauphin. “Hierarchical neural story generation.” arXiv preprint arXiv:1805.04833 (2018).
Welleck, Sean, et al. “Neural text generation with unlikelihood training.” arXiv preprint arXiv:1908.04319 (2019).
CKIPLab Transformers Github, Chinese Knowledge and Information Processing at the Institute of Information Science and the Institute of Linguistics of Academia Sinica