I skimmed through the Tesseract GitHub docs, and they list a variety of third-party front-end GUIs with Windows/Mac support.
https://tesseract-ocr.github.io/tessdoc/User-Projects-%E2%80%93-3rdParty.html
The first one on the list, gImageReader (https://github.com/manisandro/gImageReader), supports Windows. Just go to Releases and download the x86_64.exe from the latest stable release.
You can use Tesseract with the gImageReader frontend. Tesseract needs the German Fraktur data files (deu_frak.traineddata) to work properly with German Fraktur. Without the traineddata, Tesseract only produces gibberish.
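If you'd rather check this outside the GUI, here is a minimal Python sketch of the same idea: confirm the Fraktur data file is present, then build the Tesseract command line. The function names and the tessdata directory you pass in are my own placeholders, not anything from gImageReader; `-l` and `--tessdata-dir` are standard Tesseract CLI options.

```python
from __future__ import annotations

from pathlib import Path


def has_traineddata(tessdata_dir: str, lang: str) -> bool:
    """Return True if <lang>.traineddata exists in tessdata_dir."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


def build_command(image: str, out_base: str, tessdata_dir: str,
                  lang: str = "deu_frak") -> list[str]:
    """Build a tesseract CLI call; refuse if the language data is missing."""
    if not has_traineddata(tessdata_dir, lang):
        raise FileNotFoundError(f"{lang}.traineddata not found in {tessdata_dir}")
    # tesseract <image> <output base> --tessdata-dir <dir> -l <language>
    return ["tesseract", image, out_base, "--tessdata-dir", tessdata_dir, "-l", lang]
```

Pass the returned list to `subprocess.run(cmd, check=True)` to actually run it; Tesseract then writes the recognized text to `<out_base>.txt`.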
Seems like it may not be that. I created this patch but I still get the same error. I also tried #include <QUrl>, but that didn't work either.
Edit: Please forgive me, you are 100% correct, as proven here: https://github.com/manisandro/gImageReader/commit/6209e25dab20b233e399ff36fabe4252db0f9e44
Wow, this & your linked FAQ was a really comprehensive answer! I only sparingly need to transcribe stuff, and happen to not be as fast when reading & typing from scratch, so the workflow of OCR > manual correction pass has suited my needs well.
Maybe in time, we can get smarter OCR software that can easily add "templates" for common use cases, support for redactions, and better accuracy by supplying additional context (e.g. "this is a screenshot of a text group with 3+ people").
BTW, I'm using gImageReader, a GUI for Tesseract OCR (both open source). It can autodetect blocks of text based on one of several "page segmentation modes". Each block can be manually added, deleted, or resized, the order in which the blocks are transcribed can be rearranged, and you can set a language and a character blacklist/whitelist, among other things. If you haven't yet, you can check it out to see if it could help your volunteer work: https://github.com/manisandro/gImageReader
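For reference, those same knobs exist on the plain Tesseract command line, so here is a rough Python sketch of how they map. The function name and defaults are just my illustration; `--psm`, `-l`, and the `tessedit_char_whitelist` / `tessedit_char_blacklist` config variables are Tesseract's own. How gImageReader passes them internally is not something I've checked.

```python
from __future__ import annotations


def ocr_command(image: str, out_base: str, lang: str = "eng", psm: int = 3,
                whitelist: str | None = None,
                blacklist: str | None = None) -> list[str]:
    """Build a tesseract CLI call with a page segmentation mode and char filters."""
    # Tesseract 4+ defines page segmentation modes 0 through 13;
    # 3 is the default (fully automatic, no orientation/script detection).
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    cmd = ["tesseract", image, out_base, "-l", lang, "--psm", str(psm)]
    if whitelist:
        # Only these characters may appear in the output.
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    if blacklist:
        # These characters are never emitted.
        cmd += ["-c", f"tessedit_char_blacklist={blacklist}"]
    return cmd
```

For example, `ocr_command("receipt.png", "receipt", psm=6, whitelist="0123456789")` treats the image as a single uniform block of text and restricts the output to digits.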
gImageReader is the Linux standard that I'm aware of. It's a GUI for Tesseract, but IIRC you can use other models if you have them.
There's also Transkribus, which is built for medieval texts. I think you need to train it yourself, which takes a powerful machine and a large corpus.
Download a program.
There's gImageReader, which is a graphical front end for Tesseract. If you know how to use the command line, you can install Tesseract directly.
I would love to have something that could adequately do what ABBYY FineReader does on Windows. gImageReader got about halfway there, but it's not a complete tool that can take a scanned PDF and produce a reasonable docx version of it. Especially important is the ability to tell ABBYY about the different "zones" on a page (text/table/picture etc.), OCR and correct the misreads, and then export. As a translator I need it constantly, and I'm sure other professionals could use it as well.
Doing tables is a heck of a job. If you code, you could have a look at Camelot, but it works with PDFs that contain text, not scanned ones, and therefore not images either. gImageReader is looking into doing tables from images, but by manually selecting columns and rows, and it still has some way to go.