Update README.md
README.md
---
license: mit
language:
- en
tags:
- gpu
---

# Text Summarization Model with Seq2Seq and LSTM

This model is a sequence-to-sequence (seq2seq) model for text summarization. It uses a bidirectional LSTM encoder and an LSTM decoder to generate summaries from input articles. The model was trained on a dataset with sequences of length up to 800 tokens.

## Model Architecture

### Encoder

- **Input Layer:** Takes input sequences of length `max_len_article`.
- **Embedding Layer:** Converts input sequences into dense vectors of size 100.
- **Bidirectional LSTM Layer:** Processes the embedded input, capturing dependencies in both forward and backward directions. Outputs hidden and cell states from both directions.
- **State Concatenation:** Combines forward and backward hidden and cell states to form the final encoder states.

### Decoder

- **Input Layer:** Takes target sequences of variable length.
- **Embedding Layer:** Converts target sequences into dense vectors of size 100.
- **LSTM Layer:** Processes the embedded target sequences using an LSTM with the initial states set to the encoder states.
- **Dense Layer:** Applies a Dense layer with softmax activation to generate the probabilities for each word in the vocabulary.
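
For readers who prefer to see the architecture in code, below is a minimal Keras sketch of the encoder-decoder described above. The vocabulary sizes are inferred from the parameter counts in the Model Summary table and the variable names are illustrative; they may not match the original training code.

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Bidirectional, Concatenate, Dense
from tensorflow.keras.models import Model

max_len_article = 800
article_vocab_size = 476_199   # assumed: 47,619,900 embedding params / embedding dim 100
summary_vocab_size = 155_158   # assumed: matches the Dense layer's output dimension
embedding_dim = 100
latent_dim = 100

# Encoder: embedding followed by a bidirectional LSTM that returns its states
encoder_inputs = Input(shape=(max_len_article,))
enc_emb = Embedding(article_vocab_size, embedding_dim)(encoder_inputs)
encoder_out, fwd_h, fwd_c, bwd_h, bwd_c = Bidirectional(
    LSTM(latent_dim, return_state=True))(enc_emb)

# Concatenate forward and backward states to initialise the decoder
state_h = Concatenate()([fwd_h, bwd_h])
state_c = Concatenate()([fwd_c, bwd_c])

# Decoder: embedding + LSTM seeded with the encoder states, then a softmax over the vocabulary
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(summary_vocab_size, embedding_dim)(decoder_inputs)
decoder_out, _, _ = LSTM(2 * latent_dim, return_sequences=True,
                         return_state=True)(dec_emb, initial_state=[state_h, state_c])
outputs = Dense(summary_vocab_size, activation='softmax')(decoder_out)

model = Model([encoder_inputs, decoder_inputs], outputs)
model.summary()
```

With these assumed vocabulary sizes, `model.summary()` reproduces the per-layer parameter counts shown in the table below.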

### Model Summary

| Layer (type) | Output Shape | Param # | Connected to |
|------------------------------|--------------------------------------------------------------------|------------|------------------------------------------------------------|
| input_1 (InputLayer) | [(None, 800)] | 0 | - |
| embedding (Embedding) | (None, 800, 100) | 47,619,900 | input_1[0][0] |
| bidirectional (Bidirectional) | [(None, 200), (None, 100), (None, 100), (None, 100), (None, 100)] | 160,800 | embedding[0][0] |
| input_2 (InputLayer) | [(None, None)] | 0 | - |
| embedding_1 (Embedding) | (None, None, 100) | 15,515,800 | input_2[0][0] |
| concatenate (Concatenate) | (None, 200) | 0 | bidirectional[0][1], bidirectional[0][3] |
| concatenate_1 (Concatenate) | (None, 200) | 0 | bidirectional[0][2], bidirectional[0][4] |
| lstm (LSTM) | [(None, None, 200), (None, 200), (None, 200)] | 240,800 | embedding_1[0][0], concatenate[0][0], concatenate_1[0][0] |
| dense (Dense) | (None, None, 155158) | 31,186,758 | lstm[0][0] |

Total params: 94,724,058

Trainable params: 94,724,058

Non-trainable params: 0

## Training

The model was trained on a dataset with sequences of length up to 800 tokens using the following configuration:

- **Optimizer:** Adam
- **Loss Function:** Categorical Crossentropy
- **Metrics:** Accuracy
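
As a rough illustration, this configuration corresponds to a Keras compile/fit call along the following lines. The dataset variable names and the batch size are placeholders; only the optimizer, loss, metric, and the five epochs reported below come from this card.

```python
# Illustrative only: dataset variables and batch_size are assumptions,
# not taken from the original training run.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(
    [train_articles, train_summaries_in],   # encoder input and teacher-forced decoder input
    train_summaries_out,                    # one-hot targets for the softmax layer
    validation_data=([val_articles, val_summaries_in], val_summaries_out),
    epochs=5,
    batch_size=64,
)
```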

### Training Loss and Validation Loss

| Epoch | Training Loss | Validation Loss | Time per Epoch (s) |
|-------|---------------|-----------------|--------------------|
| 1     | 3.9044        | 0.4543          | 3087               |
| 2     | 0.3429        | 0.0976          | 3091               |
| 3     | 0.1054        | 0.0427          | 3096               |
| 4     | 0.0490        | 0.0231          | 3099               |
| 5     | 0.0203        | 0.0148          | 3098               |

### Test Loss

| Test Loss            |
|----------------------|
| 0.014802712015807629 |

## Usage (I will update this soon)

To use this model, you can load it using the Hugging Face Transformers library:

```python
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

# Load the tokenizer and model from the Hub
tokenizer = AutoTokenizer.from_pretrained('your-model-name')
model = TFAutoModelForSeq2SeqLM.from_pretrained('your-model-name')

# Summarize an input article
article = "Your input text here."
inputs = tokenizer.encode("summarize: " + article, return_tensors="tf", max_length=800, truncation=True)
summary_ids = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print(summary)
```