I've read Burkov's *The Hundred-Page Machine Learning Book* and it was great, so on that basis I'd also recommend his more recent book on ML engineering, although I haven't read that one myself.
Hands-On ML is quite famous, and it has some chapters on scalability and TFRecords.
I've personally deployed a simple pre-baked sklearn matrix factorization model with FastAPI and Docker, with the documentation open in another browser tab the whole time. It was quite manageable, but probably not robust enough for anything beyond a hobbyist project.
If I were working on something that needed serious uptime and scalability, I'd start by looking at TF Serving. Leveraging everything inside TF Serving is likely overkill, though; you may be satisfied with a pipeline where your research team exports the models (API Doc Link) and you deploy them with Cortex.
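The export step is small either way. A rough sketch, assuming a Keras model (the toy model and output path here are placeholders for whatever your research team actually produces):

```python
# Hedged sketch: exporting a trained model in the SavedModel format
# that TF Serving (or Cortex) consumes. The toy model and path are
# placeholders, not a real pipeline.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])

# TF Serving expects versioned subdirectories: <model_name>/<version>/
export_path = "/tmp/my_model/1"
tf.saved_model.save(model, export_path)
```

TF Serving then watches the model directory and picks up new version subdirectories as they appear, which is what makes the "research team exports, ops deploys" split workable.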