>May-13-20
Regex is always fun to play with, and there are online tools to help you test the pattern:
^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-\d{2}-\d{2}
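A quick way to sanity-check the pattern is a few lines of Python with the standard `re` module. Python is only a stand-in here: PowerShell uses the .NET engine, but this particular pattern behaves the same in both. One difference to keep in mind is that Python's `re` is case-sensitive by default, while PowerShell's `-match` operator is case-insensitive, so pass `re.IGNORECASE` if you want PowerShell-like behavior. The helper name `starts_with_date` is made up for illustration.

```python
import re

# The pattern from the comment above: a three-letter month abbreviation,
# then two hyphen-separated pairs of digits (e.g. "May-13-20").
DATE_PATTERN = re.compile(
    r"^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-\d{2}-\d{2}"
)


def starts_with_date(text: str) -> bool:
    """Return True if text begins with a Mon-DD-YY style date."""
    # re.match already anchors at the start; the ^ just makes that explicit.
    return DATE_PATTERN.match(text) is not None


if __name__ == "__main__":
    for sample in ["May-13-20", "May-13-20 extra text", "may-13-20", "X May-13-20"]:
        print(sample, "->", starts_with_date(sample))
```

Note that the pattern only checks the shape: `Feb-99-20` matches too, and since there's no `$` at the end, trailing text after the date is allowed.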
Regular Expressions 101 is a useful one. There's also a tool called "Expresso" that I've found handy.
regex101 is effing awesome and is how I got started learning it. However, it does not support the .NET regex flavor, which is what PowerShell uses.
There are some minor differences between the flavors, but things should mostly work.
My recommendation is an application called Expresso. It's built on .NET, so it will behave the same way as PowerShell.
It's free to use, but you have to register.
I'll be honest: I haven't actually used it on a project yet. And GitHub says I starred one of the VerbalExpressions repos about 2 years ago :)
Looks like there have been a lot of new ports since I first saw the project.
And yeah, regex is hard to get right. You might also find the various regular expression visualisers useful. When working with regex, I usually compose it in a tool like Expresso (it's free, but you have to register with them to get a free key) because I can test the regex without leaving the tool.
I'm not sure if you can use an XPath expression for that; all my XPaths have run against a straight .xml file. The other option is to use regular expressions. I'm pretty sure that will work. There is a free tool I use for building/testing regular expressions called Expresso.
I've used the Expresso regular expression tool by Ultrapico for years; it's free and takes the hard work out of writing expressions.
Personally, I prefer Expresso. It's built on .NET, and I have found syntax differences between PowerShell regex (.NET) and online tools, which might be targeting Java or some other flavor.
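One concrete example of that kind of flavor difference is lookbehind. .NET (and therefore PowerShell) allows variable-length lookbehind, while many other engines only accept fixed-width lookbehind. The sketch below uses Python's built-in `re` as the stand-in "other engine"; it isn't necessarily what any particular online tester runs, and the pattern and helper names are made up for illustration.

```python
import re

# Variable-length lookbehind: "digits preceded by one or more letters".
# .NET's engine (used by PowerShell) accepts this; Python's built-in `re`
# requires a fixed-width lookbehind and refuses to compile it.
VARIABLE_LOOKBEHIND = r"(?<=[A-Za-z]+)\d+"

# A fixed-width rewrite that Python's `re` accepts: look behind for exactly
# one letter. For "digits right after a letter" this finds the same runs.
FIXED_LOOKBEHIND = r"(?<=[A-Za-z])\d+"


def compiles_in_python(pattern: str) -> bool:
    """Return True if Python's `re` module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


if __name__ == "__main__":
    print(compiles_in_python(VARIABLE_LOOKBEHIND))
    print(compiles_in_python(FIXED_LOOKBEHIND))
    print(re.findall(FIXED_LOOKBEHIND, "abc123 456 x7"))
```

This is exactly the sort of thing that works in one tester and fails in another, which is why testing in a tool built on the same engine as your script saves time.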
Expresso (it's freeware) is probably what you're looking for if you're just starting to learn regex. It has an expression builder and expression library that make it easy to understand what each regex character or string does, and its test mode gives you instant feedback so you can refine your expression on the fly.
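You can get some of that "explain each piece" benefit in code as well by writing the pattern in verbose mode, where unescaped whitespace is ignored and `#` starts a comment. Here's a Python sketch of the month-date pattern from earlier in the thread; .NET has the same idea via `RegexOptions.IgnorePatternWhitespace` or the inline `(?x)` flag. The `check` helper is a hypothetical stand-in for a tester's "run against several inputs" pane.

```python
import re

# Same pattern as ^(Jan|...|Dec)-\d{2}-\d{2}, annotated piece by piece.
# re.VERBOSE drops the whitespace/newlines and treats "#..." as comments.
ANNOTATED_DATE = re.compile(
    r"""
    ^                                   # start of the string
    (Jan|Feb|Mar|Apr|May|Jun|
     Jul|Aug|Sep|Oct|Nov|Dec)           # group 1: three-letter month abbreviation
    -                                   # literal hyphen
    \d{2}                               # exactly two digits (day)
    -                                   # literal hyphen
    \d{2}                               # exactly two digits (year)
    """,
    re.VERBOSE,
)


def check(samples: list[str]) -> dict[str, bool]:
    """Try the pattern against several inputs at once and report which match."""
    return {s: ANNOTATED_DATE.match(s) is not None for s in samples}


if __name__ == "__main__":
    print(check(["May-13-20", "Sept-13-20", "Oct-01-21"]))
```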
I'm no pro, but:
There's a decent tutorial here: http://www.regular-expressions.info/tutorialcnt.html (quickstart here: http://www.regular-expressions.info/quickstart.html). That site advertises its own regex helper, but there are others, and some are free: http://www.ultrapico.com/Expresso.htm
Regular expressions can help you if your data (or the non-data) follows some sort of pattern. If, as you suggest, the useful data is contained "near" an SSN or some other recognizable datum, it may be possible to extract the nearby data, assuming you can come up with criteria for what "near" means. Alternatively, if the non-data that's interspersed follows a pattern of some sort, it may be possible to key on that and remove it. Ultimately it will vary from file to file, but regex can be a powerful tool for data extraction.
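Both approaches can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: SSNs are taken to look like `###-##-####`, "near" is defined as a fixed number of characters on either side, and the "noise" is assumed to be separator lines made only of dashes or equals signs. The function names are hypothetical, and real files will likely need different criteria.

```python
import re

# Assumed SSN shape: 3 digits, 2 digits, 4 digits, hyphen-separated.
# \b keeps it from matching inside a longer run of digits.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Assumed noise: whole lines of dashes/equals signs (plus trailing spaces),
# removed together with their newline.
NOISE_LINE = re.compile(r"^[-=]+[ \t]*(?:\n|$)", re.MULTILINE)


def text_near_ssn(text: str, window: int = 30) -> list[str]:
    """Return the text within `window` characters either side of each SSN.

    "Near" is an assumption here: a fixed character window.
    """
    snippets = []
    for m in SSN.finditer(text):
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        snippets.append(text[start:end])
    return snippets


def strip_noise(text: str) -> str:
    """Delete every line that matches the noise pattern."""
    return NOISE_LINE.sub("", text)


if __name__ == "__main__":
    sample = "Name: Jane Doe SSN 123-45-6789 DOB 01/02/1990"
    print(text_near_ssn(sample, window=10))
    print(strip_noise("a\n----\nb\n====\nc"))
```

A fixed window is the bluntest definition of "near"; keying on line boundaries or on field labels is often more reliable once you know how a given file is laid out.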