mDeBERTa-v3-base-kor-further

💡 The project below was carried out by KPMG Lighthouse Korea.
At KPMG Lighthouse Korea, we build NLP/Vision AI models with edge technology to solve a variety of problems in the financial area. https://kpmgkr.notion.site/

What is DeBERTa?

  • DeBERTa applies Disentangled Attention together with an Enhanced Mask Decoder to learn the positional information of words effectively. Unlike the absolute position embeddings used in BERT and RoBERTa, DeBERTa represents the relative positions between words as learnable vectors and attends over content and position separately (a minimal sketch follows this list). As a result, it showed stronger performance than BERT and RoBERTa.
  • DeBERTa-v3 replaces the MLM (Masked Language Modeling) objective used in earlier versions with an ELECTRA-style pre-training method based on RTD (Replaced Token Detection), and applies Gradient-Disentangled Embedding Sharing, which improves the efficiency of model training.
  • To train the DeBERTa architecture on a rich Korean corpus, mDeBERTa-v3-base-kor-further takes the mDeBERTa-v3-base released by Microsoft and further pre-trains it on roughly 40GB of Korean data.
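
Concretely, the disentangled attention score for a pair of tokens is the sum of a content-to-content term and two cross terms that look up learnable relative-position embeddings. The single-head sketch below illustrates that computation; it is a simplified illustration of the idea from the DeBERTa paper, not the Microsoft implementation, and every tensor name in it is an assumption.

```python
# Simplified single-head sketch of DeBERTa-style disentangled attention.
# Qc/Kc: content projections of the tokens, shape (seq_len, d).
# Qr/Kr: projections of the learnable relative-position embeddings, shape (2k, d).
# rel_idx[i, j]: bucketed relative distance delta(i, j), a long index in [0, 2k).
import torch

def disentangled_attention_scores(Qc, Kc, Qr, Kr, rel_idx):
    d = Qc.size(-1)

    c2c = Qc @ Kc.T                               # content-to-content
    c2p = torch.gather(Qc @ Kr.T, 1, rel_idx)     # content-to-position
    p2c = torch.gather(Kc @ Qr.T, 1, rel_idx).T   # position-to-content

    # The paper scales by sqrt(3d) because three score terms are summed.
    return (c2c + c2p + p2c) / (3 * d) ** 0.5
```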

How to Use

  • Requirements
    pip install transformers
    pip install sentencepiece
    
  • Huggingface Hub
    from transformers import AutoModel, AutoTokenizer
    
    model = AutoModel.from_pretrained("lighthouse/mdeberta-v3-base-kor-further")  # DebertaV2Model
    tokenizer = AutoTokenizer.from_pretrained("lighthouse/mdeberta-v3-base-kor-further")  # DebertaV2Tokenizer (SentencePiece)
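
  • Example
    A minimal usage sketch (illustrative): encode a Korean sentence and read the encoder's last hidden states.

    import torch

    inputs = tokenizer("한국어 문장을 인코딩합니다.", return_tensors="pt")  # any Korean sentence
    with torch.no_grad():
        outputs = model(**inputs)

    print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size) == (1, seq_len, 768)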
    

Pre-trained Models

  • ๋ชจ๋ธ์˜ ์•„ํ‚คํ…์ฒ˜๋Š” ๊ธฐ์กด microsoft์—์„œ ๋ฐœํ‘œํ•œ mdeberta-v3-base์™€ ๋™์ผํ•œ ๊ตฌ์กฐ์ž…๋‹ˆ๋‹ค.

    | Model | Vocabulary (K) | Backbone Parameters (M) | Hidden Size | Layers | Note |
    |---|---|---|---|---|---|
    | mdeberta-v3-base-kor-further (same as mdeberta-v3-base) | 250 | 86 | 768 | 12 | 250K new SPM vocab |

Further Pretraining Details (MLM Task)

  • mDeBERTa-v3-base-kor-further was further pre-trained from microsoft/mDeBERTa-v3-base on roughly 40GB of Korean data with the MLM task, using the hyperparameters below (a rough sketch of an equivalent training setup follows the table).

    | Model | Max Length | Learning Rate | Batch Size | Train Steps | Warm-up Steps |
    |---|---|---|---|---|---|
    | mdeberta-v3-base-kor-further | 512 | 2e-5 | 8 | 5M | 50k |
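
The original training script is not published; the following is a minimal sketch of an equivalent MLM further pre-training setup with the Hugging Face Trainer, using the hyperparameters from the table above. The corpus location and the dataset preparation steps are assumptions.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")
model = AutoModelForMaskedLM.from_pretrained("microsoft/mdeberta-v3-base")

# Hypothetical corpus location: plain-text files of the Korean pre-training data.
raw = load_dataset("text", data_files={"train": "korean_corpus/*.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking of 15% of the tokens, as in standard MLM pre-training.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="mdeberta-v3-base-kor-further",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    max_steps=5_000_000,
    warmup_steps=50_000,
)

Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=tokenized["train"],
).train()
```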

Datasets

  • ๋ชจ๋‘์˜ ๋ง๋ญ‰์น˜(์‹ ๋ฌธ, ๊ตฌ์–ด, ๋ฌธ์–ด), ํ•œ๊ตญ์–ด Wiki, ๊ตญ๋ฏผ์ฒญ์› ๋“ฑ ์•ฝ 40 GB ์˜ ํ•œ๊ตญ์–ด ๋ฐ์ดํ„ฐ์…‹์ด ์ถ”๊ฐ€์ ์ธ ์‚ฌ์ „ํ•™์Šต์— ์‚ฌ์šฉ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
    • Train: 10M lines, 5B tokens
    • Valid: 2M lines, 1B tokens
    • cf) The original mDeBERTa-v3 was trained, like XLM-R, on the CC-100 dataset, whose Korean portion is 54GB.

Fine-tuning on NLU Tasks - Base Model

| Model | Size | NSMC (acc) | Naver NER (F1) | PAWS (acc) | KorNLI (acc) | KorSTS (spearman) | Question Pair (acc) | KorQuAD (Dev) (EM/F1) | Korean-Hate-Speech (Dev) (F1) |
|---|---|---|---|---|---|---|---|---|---|
| XLM-Roberta-Base | 1.03G | 89.03 | 86.65 | 82.80 | 80.23 | 78.45 | 93.80 | 64.70 / 88.94 | 64.06 |
| mdeberta-base | 534M | 90.01 | 87.43 | 85.55 | 80.41 | 82.65 | 94.06 | 65.48 / 89.74 | 62.91 |
| mdeberta-base-kor-further (Ours) | 534M | 90.52 | 87.87 | 85.85 | 80.65 | 81.90 | 94.98 | 66.07 / 90.35 | 68.16 |
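
Fine-tuning for these tasks follows the standard Hugging Face sequence-classification recipe. Below is a minimal sketch for NSMC sentiment classification; the hyperparameters and the `nsmc` dataset id are illustrative assumptions, not the exact settings behind the table above.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

model_id = "lighthouse/mdeberta-v3-base-kor-further"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# NSMC: Naver movie review sentiment corpus (columns: id, document, label).
nsmc = load_dataset("nsmc")

def tokenize(batch):
    return tokenizer(batch["document"], truncation=True, max_length=128)

nsmc = nsmc.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="nsmc-mdeberta-kor-further",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    num_train_epochs=3,
)

Trainer(
    model=model,
    args=args,
    data_collator=DataCollatorWithPadding(tokenizer),
    train_dataset=nsmc["train"],
    eval_dataset=nsmc["test"],
).train()
```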

KPMG Lighthouse KR

https://kpmgkr.notion.site/

Citation

@misc{he2021debertav3,
      title={DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing}, 
      author={Pengcheng He and Jianfeng Gao and Weizhu Chen},
      year={2021},
      eprint={2111.09543},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@inproceedings{he2021deberta,
      title={DeBERTa: Decoding-enhanced BERT with Disentangled Attention},
      author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
      booktitle={International Conference on Learning Representations},
      year={2021},
      url={https://openreview.net/forum?id=XPZIaotutsD}
}
