OCR Quality Assessment Using a Unigram Language Model
This HuggingFace model repository contains a unigram language model built for OCR quality assessment.
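One plausible way to turn a word dictionary into an OCR quality signal is to measure what share of a text's tokens the dictionary recognizes, since OCR errors tend to produce non-words. The sketch below assumes that approach. The tokenizer, the function name known_token_ratio, and the choice to return 0.0 for empty input are illustrative assumptions, not this repository's documented scoring method. Any object that supports the "in" operator (a Python set, or a bloom filter) can act as the vocabulary.

```python
import re
from collections.abc import Container

# Assumed tokenizer: runs of word characters, lowercased. This is an
# illustration only, not necessarily the tokenization used by this model.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def known_token_ratio(text: str, vocabulary: Container[str]) -> float:
    """Hypothetical quality score: fraction of tokens found in the vocabulary.

    Returns 0.0 when the text has no tokens (an assumed convention).
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in vocabulary)
    return known / len(tokens)
```

For example, known_token_ratio("The quick brown f0x", {"the", "quick", "brown", "fox"}) gives 0.75, because the OCR-garbled token "f0x" is not in the vocabulary.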
Model & Bloom Filter Integration
The build process creates bloom filter dictionaries with the following metadata:
- Version: A specific version identifier (e.g. v1.0.0)
- Language: The target language (e.g. en)
- Model Name: A short identifier (e.g. wp for Wikipedia)
- False Positive Probability: The target FP probability (e.g. 0.001)
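The False Positive Probability field determines how large a bloom filter has to be. The sketch below uses the standard sizing formulas, m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, for n expected items and target probability p. The class itself, the SHA-256 double-hashing scheme, and the in-memory bit layout are assumptions for illustration and do not describe the on-disk format of the files in this repository.

```python
import hashlib
import math
from collections.abc import Iterator


class BloomFilter:
    """Minimal bloom filter sized from an expected item count and a target
    false positive probability (illustrative sketch, not this repo's format)."""

    def __init__(self, expected_items: int, fp_probability: float) -> None:
        if expected_items <= 0 or not 0 < fp_probability < 1:
            raise ValueError("need expected_items > 0 and 0 < fp_probability < 1")
        # Optimal number of bits: m = -n * ln(p) / (ln 2)^2
        self.m = math.ceil(-expected_items * math.log(fp_probability) / math.log(2) ** 2)
        # Optimal number of hash functions: k = (m / n) * ln 2
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # ensure a nonzero step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

With 1,000 expected words and p = 0.001, this yields m = 14378 bits and k = 10 hash functions. A bloom filter never reports a false negative, so every added word is always found; only unknown words can be misreported as present, at roughly the target rate.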
The bloom filter dictionaries are first generated in a designated build directory (BUILD_DIR). They are then copied into this repository using a flat layout: all built bloom filter files reside in a single directory (e.g. /bloom) with no nested subfolders, which keeps the repository layout streamlined.
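The flat layout can be illustrated with a small copy routine that collects matching files from anywhere under the build tree and places them side by side in one destination directory. This is a sketch under assumptions: the copy_flat name and the "*.bloom" file pattern are hypothetical, since the actual file names and extension are not specified here. Because flattening discards subdirectory names, the sketch refuses to continue if two files share a name rather than silently overwriting one.

```python
import shutil
from pathlib import Path


def copy_flat(build_dir: Path, dest_dir: Path, pattern: str = "*.bloom") -> list[Path]:
    """Copy every file matching `pattern` anywhere under build_dir into
    dest_dir, dropping subdirectories so dest_dir stays flat.

    The default pattern is a hypothetical extension, not a documented one.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied: list[Path] = []
    seen: set[str] = set()
    for src in sorted(build_dir.rglob(pattern)):
        if not src.is_file():
            continue
        if src.name in seen:
            # Flattening would overwrite a same-named file from another subfolder.
            raise FileExistsError(f"duplicate file name in build tree: {src.name}")
        seen.add(src.name)
        target = dest_dir / src.name
        shutil.copy2(src, target)  # copy contents and metadata
        copied.append(target)
    return copied
```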
Deployment Workflow
The Makefile provides the following targets:
- copy-bloom: Copies the built bloom filter file to bloom/.
- commit-bloom: Automatically stages and commits the update with a descriptive commit message.
- push-bloom: Pushes the commit to the remote repository.
- deploy-bloom: Aggregates the above steps into one deployment command.
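The Makefile recipe bodies are not shown in this README, so the Python sketch below only approximates what the deploy-bloom chain might run. The function name deploy_bloom, the POSIX cp call, the use of git -C, and the commit message format are all assumptions. It defaults to a dry run that returns the command sequence instead of executing it, which makes the order of steps easy to inspect.

```python
import subprocess
from pathlib import Path


def deploy_bloom(bloom_file: Path, repo_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Hypothetical equivalent of copy-bloom -> commit-bloom -> push-bloom.

    With dry_run=True the commands are only returned, not executed.
    """
    dest = repo_dir / "bloom" / bloom_file.name
    git = ["git", "-C", str(repo_dir)]
    commands = [
        ["cp", str(bloom_file), str(dest)],                       # copy-bloom
        git + ["add", str(dest.relative_to(repo_dir))],            # commit-bloom: stage
        git + ["commit", "-m", f"Update bloom filter {bloom_file.name}"],  # commit-bloom: commit
        git + ["push"],                                            # push-bloom
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands
```

Running the steps with check=True mirrors make's default behavior of aborting the chain when a step fails.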
This integration keeps the workflow modular: build artifacts created in BUILD_DIR can be incorporated into the HuggingFace model repository with a single deploy-bloom command.
...existing model usage and evaluation instructions...