Generating Text with GPT-2 and PyTorch

Fast and easy text generation in any language with the Huggingface framework

As part of the «Machine Learning. Advanced» course, we have prepared a translation of an interesting article.



We also invite you to the open webinar «Multi-armed bandits for A/B test optimization». Together with an expert, participants will analyze one of the most effective use cases of reinforcement learning and see how the A/B testing problem can be reformulated as a Bayesian inference problem.






Introduction

Natural language processing (NLP) has been one of the most actively developing areas of machine learning in recent years. With the release of GPT-3, large pretrained language models have shown just how fluent machine-generated text can be, and text generation has attracted enormous attention.





GPT-2 is the predecessor of GPT-3. It is freely available through the Transformers library from Huggingface, so in this article we will use a pretrained GPT-2 model to generate text: GPT-2 with PyTorch.





Generating text with GPT-2 takes only a few lines of code, and it works for more than just English! Here are the steps.





  • Step 1: Install the library
  • Step 2: Import the pipeline
  • Step 3: Create the text generation pipeline
  • Step 4: Define the prefix text
  • Step 5: Generate the text
  • Bonus: Generate text in another language





Step 1: Install the library

We will use the Huggingface Transformers library, which relies on a deep learning backend such as PyTorch. If you do not have PyTorch installed yet, follow the instructions on the official PyTorch website (for most setups, pip install torch is enough).





Once PyTorch is installed, install Huggingface Transformers with:





pip install transformers
      
      



Step 2: Import the pipeline

From the Transformers library, import the pipeline function:





from transformers import pipeline
      
      



The pipeline function wraps a pretrained model together with its tokenizer and the pre- and post-processing around it, so we do not have to write any of that ourselves.





Step 3: Create the text generation pipeline

Now create the text generation pipeline itself:





text_generation = pipeline("text-generation")
      
      



By default this downloads a pretrained GPT-2 model, so no further configuration is needed.
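
If you need a different checkpoint, you can name it explicitly. A minimal sketch, assuming you want the larger gpt2-medium model from the Huggingface model hub (the identifier is illustrative; any causal language model from the hub works the same way):

from transformers import pipeline

# Load a specific GPT-2 checkpoint instead of the default "gpt2"
text_generation = pipeline("text-generation", model="gpt2-medium")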





Step 4: Define the prefix text

The prefix text is the prompt that the model will continue. For example:





The world is


prefix_text = "The world is"
      
      



Step 5: Generate the text

Now everything is ready to generate text! Call the pipeline with the prefix text:





generated_text = text_generation(prefix_text, max_length=50, do_sample=False)[0]

print(generated_text['generated_text'])
      
      



The max_length argument caps the output at 50 tokens, and do_sample=False makes decoding deterministic (greedy): at each step the model takes the most probable next token. The result looks like this:





The world is a better place if you're a good person.

I'm not saying that you should be a bad person. I'm saying that you should be a good person.

I'm not saying that you should be a bad
      
      



As you can see, after a while the generated text starts repeating itself. This is a known side effect of deterministic (greedy) decoding. There are several ways to get more varied output, for example sampling strategies such as top-k and top-p sampling, which can be switched on by passing extra arguments to the pipeline call (see the sketch below). For the full list of options, see the Huggingface documentation for TextGenerationPipeline.
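
Below is a minimal sketch of sampled decoding, reusing the text_generation pipeline from step 3; the values of top_k, top_p, temperature and num_return_sequences are illustrative, not tuned:

# Sampled decoding: draw the next token from the top-k / top-p candidates
# instead of always taking the single most probable one (greedy decoding).
sampled = text_generation(
    prefix_text,
    max_length=50,
    do_sample=True,          # enable sampling instead of greedy decoding
    top_k=50,                # keep only the 50 most probable next tokens
    top_p=0.95,              # nucleus sampling: keep the smallest set covering 95% of the probability mass
    temperature=0.9,         # values below 1.0 sharpen the distribution slightly
    num_return_sequences=3,  # return several candidates to compare
)

for candidate in sampled:
    print(candidate['generated_text'])
    print('---')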





Bonus: Generating text in another language

At first glance it may seem that generating text in another language requires training a model from scratch, which takes a lot of data and compute. Fortunately, the Huggingface model hub hosts many pretrained models contributed by the community (under various licenses), so in most cases you can simply pick one that already handles the language you need.





As an example, let's generate Chinese text. We will use the Chinese GPT-2 model published by CKIPLab (the Chinese Knowledge and Information Processing group at Academia Sinica).





First, import the required classes:





from transformers import BertTokenizerFast, AutoModelWithLMHead
      
      



Load the tokenizer and the model:





tokenizer = BertTokenizerFast.from_pretrained('bert-base-chinese')

model = AutoModelWithLMHead.from_pretrained('ckiplab/gpt2-base-chinese')
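
Note that recent versions of Transformers mark AutoModelWithLMHead as deprecated. A drop-in alternative, assuming a reasonably recent transformers release, is the AutoModelForCausalLM class:

from transformers import AutoModelForCausalLM

# Same checkpoint, loaded through the non-deprecated auto class
model = AutoModelForCausalLM.from_pretrained('ckiplab/gpt2-base-chinese')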
      
      



Create the text generation pipeline, this time passing the model and the tokenizer explicitly:





text_generation = pipeline("text-generation", model=model, tokenizer=tokenizer)
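
Alternatively, the pipeline can fetch both pieces by name. A sketch assuming the same model hub identifiers as above; the weights and tokenizer files are downloaded on first use:

from transformers import pipeline

# Model hub identifiers can be passed as strings instead of loaded objects
text_generation = pipeline(
    "text-generation",
    model="ckiplab/gpt2-base-chinese",
    tokenizer="bert-base-chinese",
)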
      
      



As before, define the prefix text, this time in Chinese:





我 想 要 去

prefix_text = "我 想 要 去"
## "I want to go"
      
      



Generate the text the same way as before:





generated_text = text_generation(prefix_text, max_length=50, do_sample=False)[0]

print(generated_text['generated_text'])
      
      



The result:





我 想 要 去 看 看 。 」 他 說 : 「 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們

## ("I want to go and have a look." He said: "We cannot say, we cannot say, we cannot say, ...")
      
      



As you can see, the Chinese output repeats itself as well, and for the same reason: we used greedy decoding. The same sampling tricks apply here (see below).
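
One way to break the loop is to combine sampling with an n-gram repetition constraint, reusing the Chinese pipeline from above; the no_repeat_ngram_size value here is an illustrative choice, not a recommendation:

# Sampling plus a ban on repeated 2-token sequences usually removes the loop
generated = text_generation(
    prefix_text,
    max_length=50,
    do_sample=True,
    top_p=0.95,
    no_repeat_ngram_size=2,  # forbid repeating any 2-token sequence
)[0]

print(generated['generated_text'])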





And that's it! Thanks to the Huggingface API, generating text with a powerful pretrained model comes down to just a few lines of code. Here is what the whole English example looks like in a Jupyter notebook:









In [1]:
from transformers import pipeline
 
In [ ]:
text_generation = pipeline("text-generation")
 
In [7]:
prefix_text = "The world is"
 
In [8]:
generated_text= text_generation(prefix_text, max_length=50, do_sample=False)[0]
print(generated_text['generated_text'])
 
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 
The world is a better place if you're a good person.
 
I'm not saying that you should be a bad person. I'm saying that you should be a good person.
 
I'm not saying that you should be a bad

      
      



Congratulations, you now know how to generate text with GPT-2 in just a few lines of code, and in more than one language. Keep in mind that the model is used here as-is, without any fine-tuning, so the output will not always be exactly what you need; for a specific domain or style you would want to fine-tune the model on your own data. Still, this is one of the quickest ways to start experimenting with text generation.





Brown, Tom B., et al. “Language models are few-shot learners.” arXiv preprint arXiv:2005.14165 (2020).





Radford, Alec, et al. “Language models are unsupervised multitask learners.” OpenAI blog 1.8 (2019): 9.





Transformers Github, Huggingface





Transformers Official Documentation, Huggingface





Pytorch Official Website, Facebook AI Research





Fan, Angela, Mike Lewis, and Yann Dauphin. “Hierarchical neural story generation.” arXiv preprint arXiv:1805.04833 (2018).





Welleck, Sean, et al. “Neural text generation with unlikelihood training.” arXiv preprint arXiv:1908.04319 (2019).





CKIPLab Transformers Github, Chinese Knowledge and Information Processing at the Institute of Information Science and the Institute of Linguistics of Academia Sinica






Learn more about the «Machine Learning. Advanced» course.





Sign up for the open webinar «Multi-armed bandits for A/B test optimization».







