Sorry, I meant Wasserman's All of Statistics.
And linear algebra is a funny thing. There are a lot of different concepts and objects buried in there. Understanding the fundamental theorem of linear algebra is kind of a beastly journey, considering it's likely one of the first truly bizarre, abstract mathematical theorems an undergrad will encounter. It was definitely my first 'holy shit, math is getting trippy' topic, at least. Going that far in is really important for understanding PCA and a whole slew of other important topics, but I don't think a person needs that level of linear algebra knowledge to start getting the basics of how matrix transformations work. A simple feedforward neural network really isn't a hugely complex object. I feel pretty confident that a motivated student could handle Michael Nielsen's book or Andrew Ng's class well enough to implement their own network by following along. The only real prerequisite, I'd say, is some familiarity with for loops and with whatever array data structure your language uses (NumPy arrays for Python).
There are many ways to gain an understanding of neural networks. Implementing one before you deeply understand the math is a fine way to proceed, I think; it just means you won't fully understand it until later.
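To make the "for loops plus arrays" point concrete, here's a rough sketch of a forward pass, written once with plain loops and once as a matrix transformation. This is only my own illustration, not code from Nielsen's book or Ng's class. The function names, the layer sizes, and the random weights are all made up for the example, and there's no training step here at all.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_with_loops(W, b, x):
    """One layer computed neuron by neuron with plain for loops."""
    out = np.zeros(W.shape[0])
    for i in range(W.shape[0]):        # one output neuron per row of W
        total = b[i]
        for j in range(W.shape[1]):    # weighted sum over the inputs
            total += W[i, j] * x[j]
        out[i] = sigmoid(total)
    return out

def layer_with_matrix(W, b, x):
    """The same layer written as a single matrix transformation."""
    return sigmoid(W @ x + b)

def feedforward(layers, x):
    """Push an input through a list of (W, b) pairs, layer by layer."""
    for W, b in layers:
        x = layer_with_matrix(W, b, x)
    return x

rng = np.random.default_rng(0)
# Hypothetical sizes, chosen only for illustration: 3 inputs -> 4 hidden -> 2 outputs.
layers = [
    (rng.standard_normal((4, 3)), rng.standard_normal(4)),
    (rng.standard_normal((2, 4)), rng.standard_normal(2)),
]
x = rng.standard_normal(3)

W, b = layers[0]
# The loop version and the matrix version should agree up to floating-point error.
same = np.allclose(layer_with_loops(W, b, x), layer_with_matrix(W, b, x))
output = feedforward(layers, x)
```

If you can follow why the two layer functions give the same answer, you already have most of the linear algebra you need to get started. The deeper theory can come later.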
And yeah, if someone doesn't know linear algebra, a rigorous treatment (like in Bishop's PRML or in Hastie, Tibshirani, and Friedman's ESL) will be completely undoable. But while Wasserman's book requires a high degree of comfort with multivariable calculus, there's comparatively little linear algebra in it. It even kind of seems like the big jump from classical statistics to ML is partly about generalizing to high-dimensional vector spaces; that's when the linear algebra really starts to matter. The intro math text I went through, at least, only started to incorporate linear algebra in the very last chapter (multivariate Gaussians), and I think Wasserman's is the same.
Hrmh, given your background I guess I'd suggest Wasserman for statistical inference, or Casella and Berger, which isn't really applied. If those are too much for you (which I doubt, given your background), there is also Wackerly's Mathematical Statistics with Applications :)