My favourite textbook is Uwe Meyer-Baese's 'Digital Signal Processing with Field Programmable Gate Arrays':
https://www.amazon.com/Digital-Processing-Programmable-Communication-Technology/dp/3642453082
I've implemented FFTs, FIR filters and DDCs (digital down-converters) on a Spartan-3A DSP board, running in real time at a sampling rate of 70 MS/s. To a large extent, that meant using ready-made IP blocks from Xilinx that ship with the design software. Study the manuals that come with these blocks really carefully, but for my application they just worked. The blocks aren't very specific to your particular device, but they are quite vendor-specific, although each vendor offers its own version of them.
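When you use vendor blocks like these, it helps to have a simple floating-point model to compare their output against. The sketch below shows what a DDC does conceptually: mix the input with a complex local oscillator (the NCO), low-pass filter, then decimate. It is not Xilinx's implementation. The function names, the Hamming-windowed-sinc filter design, and the example numbers (a 10.1 MHz tone, a 10 MHz LO, 63 taps, decimation by 10 at 70 MS/s) are illustrative assumptions.

```python
import cmath
import math


def fir_lowpass_taps(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed-sinc low-pass FIR taps, normalised to unity DC gain.

    Assumes num_taps >= 2. This design method is an illustrative choice,
    not what any particular vendor FIR block uses.
    """
    m = num_taps - 1
    fc = cutoff_hz / fs_hz  # normalised cutoff in cycles/sample
    taps = []
    for n in range(num_taps):
        k = n - m / 2  # offset from the filter centre
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]


def ddc(samples, fs_hz, lo_hz, taps, decim):
    """Floating-point DDC model: complex NCO mix, FIR low-pass, decimate by decim."""
    # 1. Mix down: multiply by a complex exponential at the LO frequency (the NCO).
    mixed = [x * cmath.exp(-2j * math.pi * lo_hz * n / fs_hz)
             for n, x in enumerate(samples)]
    # 2 + 3. FIR filter, computing only the outputs that survive decimation.
    out = []
    for n in range(len(taps) - 1, len(mixed), decim):
        acc = 0j
        for k, h in enumerate(taps):
            acc += h * mixed[n - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    fs = 70e6  # 70 MS/s, as in the design described above
    x = [math.cos(2 * math.pi * 10.1e6 * n / fs) for n in range(2000)]
    taps = fir_lowpass_taps(63, 1e6, fs)
    y = ddc(x, fs, 10e6, taps, 10)  # 100 kHz baseband tone at 7 MS/s
    print(len(y), abs(y[0]))
```

For a real cosine input the baseband magnitude should come out close to 0.5, because mixing splits the tone into the wanted difference-frequency component and an image at the sum frequency, which the low-pass filter removes. Computing only every `decim`-th filter output, rather than filtering everything and discarding samples, is the usual way decimating filters save multiplies.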