http://spinn3r.com does something similar for blogs.
It's a very hard problem actually... I wrote the code.
The biggest issue is that these applications work VERY well for one particular use case, but if your app has special requirements, they might fall down.
For example, if you're OK with more noise, then having the tool drop some signal is a bit rough.
Name / URL: Spinn3r
Elevator Pitch: Need to index a massive amount of weblog, news, and social media data? We're about 1/2 to 1/10th the price of anyone else in the space, with better or equivalent data.
Longer description: Not a startup in the strict sense, since we've been around for about ten years, but we're rebooting/pivoting the company to be more competitive in social media analysis and to power MOST of the back-end of social media companies.
What stage are you in?: About a year since our pivot. Sales are going well, with lots of market demand. We're just about to launch 6.0 and are focusing on powering the analytics of other social media monitoring companies.