Pixel Parsing

community


AI & ML interests

Document and user interface parsing, understanding, and Q&A.

Recent Activity

rwightman: new activity about 2 months ago

pixparse/cc3m-wds: Converting Arrow to WebDataset TAR Format for Offline Use

lhoestq authored a paper about 2 months ago

Croissant: A Metadata Format for ML-Ready Datasets

rwightman: new activity 3 months ago

pixparse/cc12m-wds: Is this where all the data is?


Organization Card


Multimodal document, image, and text datasets and models for document understanding, OCR, and VQA tasks.

GitHub repositories:

Data loading: chug (https://github.com/huggingface/chug)
Modelling: pixparse (coming soon)

Collections: 2

Models

None public yet

Datasets: 6

pixparse/pdfa-eng-wds

Updated Mar 29, 2024 • 7.1k rows • 2.51k downloads • 143 likes

pixparse/idl-wds

Updated Mar 29, 2024 • 3.41M rows • 4.61k downloads • 178 likes

pixparse/docvqa-wds

Updated Mar 29, 2024 • 135 downloads • 4 likes

pixparse/docvqa-single-page-questions

Updated Mar 29, 2024 • 50k rows • 440 downloads • 8 likes

pixparse/cc12m-wds

Updated Dec 15, 2023 • 11M rows • 5.34k downloads • 21 likes

pixparse/cc3m-wds

Updated Dec 15, 2023 • 2.93M rows • 6.34k downloads • 26 likes
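The "-wds" datasets in this list are published as WebDataset-style TAR shards. The following is a minimal sketch of reading one such shard with only Python's standard-library `tarfile`, assuming the usual WebDataset convention that files sharing a basename prefix (for example `doc0.json` and `doc0.pdf`) form one sample and are stored next to each other. It is not chug's API, and the names `split_key`, `iter_samples`, and `make_demo_shard`, as well as the `json`/`pdf` field names in the demo shard, are hypothetical and chosen for illustration only.

```python
import io
import posixpath
import tarfile
from typing import BinaryIO, Dict, Iterator, Tuple


def split_key(member_name: str) -> Tuple[str, str]:
    """Split a tar member name into (sample key, field extension).

    Assumes the common WebDataset convention: the key is the path up to the
    first '.' of the basename, so 'shard/doc0.seg.png' -> ('shard/doc0', 'seg.png').
    """
    dirname, basename = posixpath.split(member_name)
    stem, _, ext = basename.partition(".")
    return posixpath.join(dirname, stem), ext


def iter_samples(fileobj: BinaryIO) -> Iterator[Tuple[str, Dict[str, bytes]]]:
    """Yield (key, {extension: raw bytes}) for each sample in a TAR shard.

    Assumes all files of one sample are stored contiguously; a key that
    reappears later in the archive would start a new sample.
    """
    current_key = None
    fields: Dict[str, bytes] = {}
    # "r|*" reads the archive as a forward-only stream with transparent
    # decompression, so non-seekable sources can be read as well.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, etc.
            key, ext = split_key(member.name)
            if current_key is not None and key != current_key:
                yield current_key, fields
                fields = {}
            current_key = key
            # In stream mode a member must be read before advancing to the next one.
            fields[ext] = tar.extractfile(member).read()
    if current_key is not None:
        yield current_key, fields


def make_demo_shard() -> io.BytesIO:
    """Build a tiny in-memory shard with made-up 'json' and 'pdf' fields."""
    buf = io.BytesIO()
    files = [
        ("doc0.json", b'{"id": 0}'),
        ("doc0.pdf", b"%PDF-0"),
        ("doc1.json", b'{"id": 1}'),
        ("doc1.pdf", b"%PDF-1"),
    ]
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)  # rewind so the shard can be read back from the start
    return buf


if __name__ == "__main__":
    for key, fields in iter_samples(make_demo_shard()):
        print(key, sorted(fields))  # doc0 ['json', 'pdf'], then doc1 ['json', 'pdf']
```

Decoding each field (for example parsing the JSON bytes) is left to the caller here. Dedicated loaders such as the `webdataset` package add decoding, shuffling, and multi-shard handling on top of this basic grouping step.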