Model Card for Fake News Detection Model

Model Summary

This model is a fine-tuned DistilBERT classifier for fake news detection. It labels an English news article as real or fake based on its text alone. It was trained on a labeled dataset of true and false news articles collected from various sources.

Model Details

Model Description

  • Developed by: Dhruv Pal
  • Finetuned from: distilbert-base-uncased
  • Language: English
  • Model type: Transformer-based text classification model
  • License: MIT
  • Intended Use: Fake news detection on social media and news websites

Uses

Direct Use

  • This model can be used to detect whether a given news article is real or fake.
  • It can be integrated into fact-checking platforms, misinformation detection systems, and social media moderation tools.

Downstream Use

  • The model can be further fine-tuned on domain-specific fake news datasets.
  • It may be useful to media companies, journalists, and researchers studying misinformation.

Out-of-Scope Use

  • This model is not designed for generating news content.
  • It may not work well for languages other than English.
  • It is not suitable for fact-checking complex claims that require external knowledge, because it sees only the article text and has no access to outside sources.

Bias, Risks, and Limitations

Risks

  • The model may be biased towards certain topics, sources, or writing styles based on the dataset used for training.
  • The model can produce false positives (real news misclassified as fake) and false negatives (fake news misclassified as real).
  • Model performance can degrade on out-of-distribution samples.

Recommendations

  • Users should not rely solely on this model to determine truthfulness.
  • Combine its predictions with human verification, and cross-check information against multiple sources.
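
One way to act on these recommendations is to send low-confidence predictions to a human reviewer instead of assigning a label automatically. The sketch below assumes the caller already has the model's probability for the fake class. The function name `triage` and the 0.2/0.8 thresholds are hypothetical and would need tuning on validation data.

```python
def triage(prob_fake: float, low: float = 0.2, high: float = 0.8) -> str:
    """Map a fake-class probability to an action.

    The thresholds are illustrative assumptions, not values from this model card.
    """
    if not 0.0 <= prob_fake <= 1.0:
        raise ValueError("prob_fake must be in [0, 1]")
    if prob_fake >= high:
        return "likely fake"
    if prob_fake <= low:
        return "likely real"
    # Anything in between is too uncertain and goes to a human fact-checker.
    return "needs human review"


for p in (0.05, 0.55, 0.93):
    print(p, "->", triage(p))
```

Widening the gap between `low` and `high` sends more articles to review and lowers the chance of an automated mistake.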

How to Use the Model

Load the model with the transformers library and run inference as follows:

from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

# "your-model-id" is a placeholder for the model's Hugging Face repository ID.
tokenizer = DistilBertTokenizerFast.from_pretrained("your-model-id")
model = DistilBertForSequenceClassification.from_pretrained("your-model-id")
model.eval()  # disable dropout for inference

def predict(text):
    # Tokenize one article; inputs longer than 512 tokens are truncated.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():  # gradients are not needed at inference time
        logits = model(**inputs).logits
    # Class index 1 is treated as "fake", as in the original example.
    return "Fake News" if logits.argmax(dim=-1).item() == 1 else "Real News"

text = "Breaking: Scientists discover a new element!"
print(predict(text))
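
The `predict` function above truncates input at 512 tokens, so the model never sees the tail of a long article. A possible workaround, sketched here under assumptions, is to split the token IDs into overlapping windows, score each window separately, and average the fake-class probabilities. The helper names `sliding_windows` and `average_prob`, the window size, and the stride are hypothetical, and the per-window scores in the usage lines are made-up numbers rather than model output.

```python
from typing import List, Sequence


def sliding_windows(token_ids: Sequence[int], size: int = 510, stride: int = 384) -> List[List[int]]:
    """Split token IDs into overlapping windows of at most `size` tokens.

    A size of 510 leaves room for [CLS] and [SEP] within a 512-token limit.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if len(token_ids) == 0:
        return []
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + size]))
        if start + size >= len(token_ids):
            break  # this window already reaches the end of the article
        start += stride
    return windows


def average_prob(window_probs: Sequence[float]) -> float:
    """Average the per-window fake-class probabilities."""
    return sum(window_probs) / len(window_probs)


ids = list(range(1000))  # stand-in for real token IDs
print([len(w) for w in sliding_windows(ids)])  # [510, 510, 232]
print(average_prob([0.9, 0.7, 0.2]))  # hypothetical window scores
```

Averaging is only one aggregation choice. Taking the maximum would flag an article if any single window looks fake, at the cost of more false positives.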

Training Details

Training Data

The model was trained on a dataset consisting of news articles labeled as real or fake. The dataset includes information from reputable sources and misinformation websites.

Training Procedure

  • Preprocessing:

    • Tokenization using DistilBertTokenizerFast
    • Removal of stop words and punctuation
    • Converting text to lowercase
  • Training Configuration:

    • Model: distilbert-base-uncased
    • Optimizer: AdamW
    • Batch size: 16
    • Epochs: 3
    • Learning rate: 2e-5
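
The text-cleaning steps listed under Preprocessing could look roughly like the following. This is a sketch, not the training script: the function name `clean_text` and the tiny stop-word set are hypothetical, and the card does not say which stop-word list was used. The `distilbert-base-uncased` tokenizer lowercases text on its own, so the explicit lowercasing here mainly matters for stop-word matching.

```python
import string

# Hypothetical, deliberately small stop-word set for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase the text, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    tokens = [tok for tok in no_punct.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(clean_text("Breaking: Scientists discover a new element!"))
# breaking scientists discover new element
```

The cleaned string would then be passed to the tokenizer in place of the raw article text.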

Compute Resources

  • Hardware: NVIDIA Tesla T4 (Google Colab)
  • Training Time: ~2 hours

Evaluation

Testing Data

  • The model was evaluated on a held-out test set of 10,000 news articles.

Metrics

  • Accuracy: 92%
  • F1 Score: 90%
  • Precision: 91%
  • Recall: 89%
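
For reference, the four metrics above are derived from confusion-matrix counts as sketched below, treating fake as the positive class (an assumption; the card does not state which class is positive). The counts in the usage lines are hypothetical. The final line checks that the reported F1 score is consistent with the reported precision and recall.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute metrics from counts; assumes no denominator is zero."""
    precision = tp / (tp + fp)  # share of predicted-fake articles that are fake
    recall = tp / (tp + fn)     # share of fake articles that were caught
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical counts, not the model's actual confusion matrix.
print(classification_metrics(tp=90, fp=10, fn=10, tn=90))

# Consistency check on the reported figures: P = 0.91, R = 0.89.
print(round(f1_from(0.91, 0.89), 3))  # 0.9
```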

Results

| Metric    | Score |
|-----------|-------|
| Accuracy  | 92%   |
| F1 Score  | 90%   |
| Precision | 91%   |
| Recall    | 89%   |

Environmental Impact

  • Hardware Used: NVIDIA Tesla T4
  • Total Compute Time: ~2 hours
  • Carbon Emissions: To be estimated with the ML Impact Calculator; no figure is reported here
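
A rough estimate of the kind such calculators produce multiplies GPU power draw by training time and by the carbon intensity of the local grid. The sketch below assumes the T4 ran at its rated 70 W board power for the full two hours and uses a hypothetical grid intensity of 0.4 kg CO2eq per kWh. The real intensity depends on the data-center region, which this card does not give, so the result is illustrative only.

```python
def estimate_emissions_kg(power_watts: float, hours: float, kg_co2_per_kwh: float) -> float:
    """Estimate emissions as energy used (kWh) times grid carbon intensity."""
    energy_kwh = power_watts / 1000 * hours
    return energy_kwh * kg_co2_per_kwh


# 70 W is the T4's rated board power; 0.4 kg CO2eq/kWh is an assumed grid value.
print(round(estimate_emissions_kg(power_watts=70, hours=2, kg_co2_per_kwh=0.4), 3))
```

This ignores CPU, memory, and cooling overhead, so it should be read as a lower bound under the stated assumptions.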

Technical Specifications

Model Architecture

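  • The model is based on DistilBERT, a distilled version of BERT with 6 transformer layers instead of 12. It has roughly 67M parameters including the classification head, which lowers compute and memory cost relative to BERT-base while retaining most of its accuracy.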
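
To put a number on the model's size, the parameter count can be tallied by hand. The sketch below assumes the standard `distilbert-base-uncased` configuration (vocabulary 30,522, hidden size 768, 6 layers, feed-forward size 3,072, 512 positions) plus the two-layer head of `DistilBertForSequenceClassification` with two labels. These values come from the public base configuration, not from this card.

```python
VOCAB, HIDDEN, LAYERS, FFN, POSITIONS, LABELS = 30522, 768, 6, 3072, 512, 2


def linear(n_in: int, n_out: int) -> int:
    """Parameters in a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


layer_norm = 2 * HIDDEN  # scale and shift vectors

# Word and position embeddings followed by one LayerNorm.
embeddings = VOCAB * HIDDEN + POSITIONS * HIDDEN + layer_norm

# Each transformer layer: query, key, value, output projections,
# a two-layer feed-forward block, and two LayerNorms.
attention = 4 * linear(HIDDEN, HIDDEN)
feed_forward = linear(HIDDEN, FFN) + linear(FFN, HIDDEN)
per_layer = attention + feed_forward + 2 * layer_norm

encoder = embeddings + LAYERS * per_layer

# Classification head: a hidden-size dense layer, then the label projection.
head = linear(HIDDEN, HIDDEN) + linear(HIDDEN, LABELS)

total = encoder + head
print(f"encoder: {encoder:,}")  # encoder: 66,362,880
print(f"total:   {total:,}")    # total:   66,955,010
```

At 4 bytes per parameter in 32-bit floats, this comes to about 268 MB of weights.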

Dependencies

  • transformers
  • torch
  • datasets
  • scikit-learn

Citation

If you use this model, please cite it as:

@misc{DhruvPal2025FakeNewsDetection,
  title={Fake News Detection with DistilBERT},
  author={Dhruv Pal},
  year={2025},
  howpublished={\url{https://huggingface.co/your-model-id}}
}

Contact

For any queries, feel free to reach out:
