Document datasets with .pdf files that are usable with pixparse libraries and tools.
Pixel Parsing
community
AI & ML interests
Document and User Interface Parsing, Understanding, Q&A.
Organization Card
Multi-modal document, image, and text datasets and models for document understanding, OCR, and VQA tasks.
GitHub repos:
- Data Loading: chug - https://github.com/huggingface/chug
- Modelling: pixparse - coming soon
Collections: 2
Models: None public yet
Datasets: 6
- pixparse/pdfa-eng-wds: 7.1k rows, 2.51k downloads, 143 likes
- pixparse/idl-wds: 3.41M rows, 4.61k downloads, 178 likes
- pixparse/docvqa-wds: 135 downloads, 4 likes (no viewer)
- pixparse/docvqa-single-page-questions: 50k rows, 440 downloads, 8 likes
- pixparse/cc12m-wds: 11M rows, 5.34k downloads, 5.34k downloads, 21 likes
- pixparse/cc3m-wds: 2.93M rows, 6.34k downloads, 26 likes
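Most of the dataset names above end in `-wds`. This card does not say what the suffix means; a reasonable assumption is that it refers to the WebDataset layout, where each shard is a tar archive and files sharing a base name form one sample. The sketch below illustrates that grouping idea using only the Python standard library. The file names, extensions, and payloads are hypothetical, and it does not use the chug or pixparse APIs.

```python
import io
import tarfile
from collections import defaultdict


def build_demo_shard() -> bytes:
    """Build a tiny in-memory tar shard holding two hypothetical samples."""
    files = {
        "doc_0001.pdf": b"%PDF-1.4 demo",
        "doc_0001.json": b'{"pages": 1}',
        "doc_0002.pdf": b"%PDF-1.4 demo 2",
        "doc_0002.json": b'{"pages": 3}',
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def group_samples(shard_bytes: bytes) -> dict:
    """Group tar members into samples keyed by the name before the first dot.

    Assumption: members sit at the archive root, so the member name can be
    split directly without stripping a directory prefix.
    """
    samples = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(shard_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, _, ext = member.name.partition(".")
            samples[key][ext] = tar.extractfile(member).read()
    return dict(samples)


samples = group_samples(build_demo_shard())
for key, fields in samples.items():
    # Each sample is a dict mapping extension -> raw bytes.
    print(key, sorted(fields))
```

For real shards, a dedicated loader such as the chug repository listed above would normally handle decoding and streaming; this sketch only shows how files in a shard pair up into samples.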