Started work on an OCR model that uses Surya, PaliGemma, and ColPali to extract the text from a PDF.
The general idea is as follows:
- Use Surya to identify and isolate each block of text/paragraph individually
- Use PaliGemma to get the text from the image (TODO: compare quality against Surya OCR)
- Use ColPali to check how closely the transcribed text matches the output
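
As a rough sketch of how the three steps above could fit together, here is a model-agnostic skeleton in Python. Nothing in it calls the real Surya, PaliGemma, or ColPali APIs: `detect_regions`, `crop`, `transcribe`, and `score_match` are hypothetical callables that would wrap those models, and `RegionResult`, `run_pipeline`, `flag_for_review`, and the threshold are names I made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple

# (left, top, right, bottom) in pixels. This box format is an assumption.
BBox = Tuple[int, int, int, int]


@dataclass
class RegionResult:
    bbox: BBox    # where the text block sits on the page
    text: str     # transcription of that block
    score: float  # how well the transcription matches the block image


def run_pipeline(
    page: Any,
    detect_regions: Callable[[Any], Iterable[BBox]],  # hypothetical Surya wrapper
    crop: Callable[[Any, BBox], Any],                 # hypothetical image cropper
    transcribe: Callable[[Any], str],                 # hypothetical PaliGemma wrapper
    score_match: Callable[[Any, str], float],         # hypothetical ColPali wrapper
) -> List[RegionResult]:
    """Detect text blocks, transcribe each one, and score each transcription."""
    results = []
    for bbox in detect_regions(page):
        region = crop(page, bbox)
        text = transcribe(region)
        results.append(RegionResult(bbox, text, score_match(region, text)))
    return results


def flag_for_review(results: List[RegionResult], threshold: float) -> List[RegionResult]:
    """Return the regions whose match score falls below the (assumed) threshold."""
    return [r for r in results if r.score < threshold]
```

Passing the models in as callables keeps the skeleton independent of any one library, so swapping PaliGemma for Surya OCR (the TODO above) would only mean passing a different `transcribe` function.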
Model Info
End
Here’s an image for your time