Started work on an OCR model that uses Surya, PaliGemma, and ColPali to extract the text from a PDF.

The general idea is as follows:

  • Use Surya to identify and isolate each block of text (paragraph) individually
  • Use PaliGemma to get the text from each isolated image (TODO: compare quality against Surya OCR)
  • Use ColPali to check how closely the transcribed text matches the source image
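
Roughly, the three steps above could be wired together like this. It is only a sketch of the control flow, not the actual implementation: `ocr_page`, `RegionResult`, and the four callables are hypothetical names I made up, and the real Surya, PaliGemma, and ColPali calls are deliberately left out and passed in as plain functions so nothing here guesses at a library's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (left, top, right, bottom) in pixels; an assumed box format for this sketch.
Box = Tuple[int, int, int, int]


@dataclass
class RegionResult:
    box: Box      # where the text block sits on the page
    text: str     # transcription of that block
    score: float  # how well the transcription matches the block image


def ocr_page(
    page,
    detect_regions: Callable[[object], List[Box]],   # step 1 (e.g. Surya)
    crop: Callable[[object, Box], object],           # cut one block out of the page
    transcribe: Callable[[object], str],             # step 2 (e.g. PaliGemma)
    score_match: Callable[[str, object], float],     # step 3 (e.g. ColPali)
) -> List[RegionResult]:
    """Run detect -> transcribe -> verify over a single page image."""
    results = []
    for box in detect_regions(page):
        region = crop(page, box)
        text = transcribe(region).strip()
        score = score_match(text, region)
        results.append(RegionResult(box, text, score))
    return results


# Stand-ins so the sketch runs without any model installed:
# the "page" is just a dict mapping each box to the text it contains.
fake_page = {(0, 0, 100, 20): " Title ", (0, 30, 100, 90): "Body paragraph."}

results = ocr_page(
    fake_page,
    detect_regions=lambda page: sorted(page),
    crop=lambda page, box: page[box],
    transcribe=lambda region: region,
    score_match=lambda text, region: 1.0 if text == region.strip() else 0.0,
)
for r in results:
    print(r.box, repr(r.text), r.score)
```

Keeping the score next to each block, rather than filtering inside the loop, leaves the choice of a cutoff for "bad" transcriptions open until there are real scores to look at.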

Model Info

PaliGemma

ColPali

Surya

End

Here’s an image for your time