OCR Quality Assessment using Unigram Language Model

This HuggingFace model repository contains a unigram language model built for OCR quality assessment.
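
As a rough illustration of the idea, a unigram-based quality signal can be approximated by the share of tokens in a text that appear in a known vocabulary. The sketch below is an assumption for illustration only, not necessarily the scoring this model uses; the function name `known_token_ratio` and the toy vocabulary are hypothetical.

```python
import re


def known_token_ratio(text, vocabulary):
    """Return the share of alphabetic tokens in `text` found in `vocabulary`."""
    # Tokens are runs of letters; digits and punctuation act as separators.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in vocabulary)
    return known / len(tokens)


# Toy vocabulary (hypothetical); in practice this would be a large word list
# or a bloom filter lookup.
vocab = {"the", "quick", "brown", "fox"}
clean = known_token_ratio("The quick brown fox", vocab)
noisy = known_token_ratio("Tbe qu1ck brown fox", vocab)
```

Under this assumption, text with more OCR errors yields a lower ratio, because misrecognized tokens are less likely to be in the vocabulary.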

Model & Bloom Filter Integration

The build process creates bloom filter dictionaries with the following metadata:

  • Version: A specific version identifier (e.g. v1.0.0)
  • Language: The target language (e.g. en)
  • Model Name: A short identifier (e.g. wp for Wikipedia)
  • False Positive Probability: The target FP probability (e.g. 0.001)
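
To show how the false positive probability influences a bloom filter, the sketch below sizes a filter with the standard formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2. The `BloomFilter` class, the SHA-256 double hashing scheme, and all names are assumptions made for illustration; the actual build may use a different library and file format.

```python
import hashlib
import math


def bloom_parameters(n_items, fp_probability):
    """Return (bit count m, hash count k) for n items at the target FP rate."""
    m = math.ceil(-n_items * math.log(fp_probability) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k


class BloomFilter:
    """Minimal in-memory bloom filter (illustrative, not the repository's format)."""

    def __init__(self, n_items, fp_probability):
        self.m, self.k = bloom_parameters(n_items, fp_probability)
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


bloom = BloomFilter(n_items=1000, fp_probability=0.001)
for word in ("quality", "assessment", "unigram"):
    bloom.add(word)
```

A lower target probability increases both the number of bits and the number of hash functions, so the resulting file grows as the filter becomes more precise. A bloom filter never reports an added item as missing; it can only report a missing item as present.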

The bloom filter dictionaries are first generated in a designated build directory (BUILD_DIR) and then copied into this repository using a flat layout: all built bloom filter files sit in a single directory (e.g. bloom/) with no nested subfolders.

Deployment Workflow

The Makefile provides the following targets:

  • copy-bloom: Copies the built bloom filter file to bloom/.
  • commit-bloom: Automatically stages and commits the update with a descriptive commit message.
  • push-bloom: Pushes the commit to the remote repository.
  • deploy-bloom: Aggregates the above steps into one deployment command.
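
The steps behind these targets can be sketched in Python as follows. This is a minimal sketch of the copy, commit, and push sequence, not the Makefile itself; the function names, the commit message wording, and the `dry_run` switch are assumptions.

```python
import shutil
import subprocess
from pathlib import Path


def copy_bloom(bloom_file, repo_dir):
    """Copy a built bloom filter file into the flat bloom/ directory."""
    target_dir = Path(repo_dir) / "bloom"
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(bloom_file, target_dir))


def git_commands(bloom_path, repo_dir):
    """Build the git commands that stage, commit, and push the copied file."""
    rel = Path(bloom_path).relative_to(repo_dir).as_posix()
    name = Path(bloom_path).name
    return [
        ["git", "add", rel],
        ["git", "commit", "-m", f"Update bloom filter {name}"],
        ["git", "push"],
    ]


def deploy_bloom(bloom_file, repo_dir, dry_run=True):
    """Copy the file, then run the git steps unless dry_run is set."""
    copied = copy_bloom(bloom_file, repo_dir)
    commands = git_commands(copied, repo_dir)
    if not dry_run:
        for command in commands:
            # check=True stops the sequence at the first failing step.
            subprocess.run(command, cwd=repo_dir, check=True)
    return commands
```

With `dry_run=True` the file is copied but no git command runs, which makes it possible to inspect the planned commands before committing anything.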

This keeps the workflow modular: artifacts are built in BUILD_DIR and then added to the HuggingFace model repository through the Makefile targets above.

...existing model usage and evaluation instructions...
