dittops committed on
Commit 64ca826 · 1 Parent(s): 403652a

Update README.md

Files changed (1)
  1. README.md +54 -37
README.md CHANGED
@@ -1,58 +1,75 @@
  ---
- license: other
- base_model: codellama/CodeLlama-13b-Python-hf
- tags:
- - generated_from_trainer
- model-index:
- - name: codellama13b
-   results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # codellama13b

- This model is a fine-tuned version of [codellama/CodeLlama-13b-Python-hf](https://huggingface.co/codellama/CodeLlama-13b-Python-hf) on the oss-evol dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 2
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - total_train_batch_size: 16
- - total_eval_batch_size: 64
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 3.0
- - mixed_precision_training: Native AMP

- ### Training results

- ### Framework versions

- - Transformers 4.36.2
- - Pytorch 2.1.2+cu121
- - Datasets 2.15.0
- - Tokenizers 0.15.0

  ---
+ license: llama2
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

+ # Introducing Code Millenials 13B

+ Welcome to our Code Model repository! Our model is fine-tuned specifically for code generation tasks, aiming to revolutionize how systems understand and translate natural language instructions into code. Built on CodeLLaMa 13B, it has been meticulously fine-tuned on a curated set of code generation instructions, ensuring quality and precision. Thanks to its lambda attention implementation, the model supports sequence lengths of 120K+ tokens without affecting perplexity.

+ ## Generate responses

+ Inference code using the pre-trained model from the Hugging Face model hub:

+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM

+ tokenizer = AutoTokenizer.from_pretrained("budecosystem/code-millenials-13b")
+ model = AutoModelForCausalLM.from_pretrained("budecosystem/code-millenials-13b")

+ prompt = (
+     "A chat between a curious user and an artificial intelligence assistant. "
+     "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
+     "USER: Create SQL query for the given table schema and question ASSISTANT:"
+ )

+ inputs = tokenizer(prompt, return_tensors="pt")
+ sample = model.generate(**inputs, max_length=128)
+ print(tokenizer.decode(sample[0]))
+ ```

+ To get extended context length, use the `generate.py` file from the [github repo](https://github.com/BudEcosystem/code-millenials).

+ ```
+ python generate.py --base_model budecosystem/code-millenials-13b
+ ```

+ You can integrate the model into your code by loading the `convert_llama_model` function.

+ ```python
+ import torch
+ from transformers import GenerationConfig, AutoModelForCausalLM, AutoTokenizer
+ from model.llama import convert_llama_model

+ # Lambda attention settings for extended context
+ local_branch = 2048
+ global_branch = 10
+ limit_distance = 2048

+ model = AutoModelForCausalLM.from_pretrained(
+     "budecosystem/code-millenials-13b",
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+ model = convert_llama_model(model, local_branch, global_branch)
+ ```
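
+ After conversion, the model can be used with the standard `generate` API. A minimal sketch, assuming the converted model from the snippet above and a tokenizer for the same checkpoint (the prompt below is a placeholder, not taken from the repo):

+ ```python
+ # Hedged usage sketch: generation with the converted, extended-context model.
+ tokenizer = AutoTokenizer.from_pretrained("budecosystem/code-millenials-13b")

+ prompt = "USER: Write a Python function that parses a CSV file. ASSISTANT:"  # placeholder prompt
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ sample = model.generate(**inputs, max_new_tokens=256)
+ print(tokenizer.decode(sample[0], skip_special_tokens=True))
+ ```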

+ ## Training details

+ The model was trained on 8 A100 80GB GPUs for approximately 55 hours.

+ | Hyperparameters              | Value       |
+ | :--------------------------- | :---------: |
+ | per_device_train_batch_size  | 2           |
+ | gradient_accumulation_steps  | 1           |
+ | epoch                        | 3           |
+ | steps                        | 19206       |
+ | learning_rate                | 2e-5        |
+ | lr scheduler type            | cosine      |
+ | warmup ratio                 | 0.1         |
+ | optimizer                    | adamw       |
+ | fp16                         | True        |
+ | GPU                          | 8 A100 80GB |
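
+ For reference, these settings map roughly onto a Hugging Face `TrainingArguments` configuration along the following lines. This is a minimal sketch, not the actual training script: the output directory and the `adamw_torch` optimizer name are assumptions.

+ ```python
+ from transformers import TrainingArguments

+ # Sketch of the reported hyperparameters; output_dir is a placeholder.
+ training_args = TrainingArguments(
+     output_dir="./code-millenials-13b",
+     per_device_train_batch_size=2,
+     gradient_accumulation_steps=1,
+     num_train_epochs=3,
+     learning_rate=2e-5,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+     optim="adamw_torch",
+     fp16=True,
+ )
+ ```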