This is a case of chaining together a bunch of different services.
Something like Diffbot can extract article text from a URL; you then pipe that into a TTS service like CereVoice, and that's it in a nutshell.
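A rough sketch of that chain, in case it helps. The endpoint and the `objects[0].text` response shape are my understanding of Diffbot's v3 Article API, so check them against the current docs. `fetch_json` and `synthesize` are hypothetical callables you would supply yourself (an HTTP GET that returns parsed JSON, and whatever your TTS vendor's client exposes); nothing here is CereVoice's actual API.

```python
from urllib.parse import urlencode

# Assumed endpoint for Diffbot's v3 Article API; verify against the docs.
DIFFBOT_ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_extract_url(token, page_url):
    """Build the extraction request URL, with the page URL percent-encoded."""
    return DIFFBOT_ARTICLE_ENDPOINT + "?" + urlencode({"token": token, "url": page_url})


def article_text(response):
    """Pull the article body out of a parsed JSON response.

    Assumes the response looks like {"objects": [{"text": "..."}]}.
    """
    objects = response.get("objects") or []
    if not objects:
        raise ValueError("no article found in extraction response")
    return objects[0].get("text", "")


def article_to_speech(page_url, token, fetch_json, synthesize):
    """Chain the two services: extract the text, then hand it to TTS.

    fetch_json(url) -> dict and synthesize(text) -> audio are hypothetical
    callables injected by the caller.
    """
    response = fetch_json(build_extract_url(token, page_url))
    return synthesize(article_text(response))
```

Keeping the two services behind plain callables means you can swap either vendor without touching the glue.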
Trying to get TTS to sound better with $25k is hilarious. That's CereVoice's entire business, and I would imagine that Apple, Google, and Microsoft are throwing money at the problem to improve Siri, Google Now, and Cortana respectively.
Masonry would be perfect for this: http://masonry.desandro.com/
Enable filtering by religion too?
Hell, Diffbot could do most of the work in extracting headlines/images too: https://www.diffbot.com/
Godspeed!
Those are some external resources I've found useful (I'm sure there are more like each of them; these are just the particular ones I've used). Also become familiar with XML/HTML parsers and XPath, and with localization / character encodings, and don't be afraid to modify data going into or coming out of a parser.
Source: Skill and birth year data from Diffbot's Knowledge Graph
Queries like the following surface the most prominent skills by a collection of birth years:
type:person birthDate<="2021-12-31" birthDate>="1989-01-01" facet:skills.name
Tools used: Knowledge Graph, G Sheets, Figma
Source: Article index of Diffbot's Knowledge Graph.
Tools: Google sheets, Diffbot Knowledge Graph, Figma
Method: Pull article entities with publisher URLs matching major news outlets. Average sentiment across all articles. Compare that average to the sentiment of articles mentioning the Democratic and Republican parties. Normalize political bias on a -0.1 to 0.1 scale.
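One possible reading of that method, as a minimal sketch rather than the actual spreadsheet logic. The function names, the "Democratic delta minus Republican delta" definition of bias, and the choice to rescale so the most extreme outlet lands at ±0.1 are all my assumptions. The inputs are plain lists of per-article sentiment scores.

```python
from statistics import mean


def party_deltas(all_scores, dem_scores, rep_scores):
    """Compare party-mention sentiment to an outlet's overall average.

    Returns (dem_delta, rep_delta): how far articles mentioning each party
    sit above or below the outlet's baseline sentiment.
    """
    baseline = mean(all_scores)
    return mean(dem_scores) - baseline, mean(rep_scores) - baseline


def raw_bias(all_scores, dem_scores, rep_scores):
    """Assumed bias measure: positive means warmer toward Democrats."""
    dem_delta, rep_delta = party_deltas(all_scores, dem_scores, rep_scores)
    return dem_delta - rep_delta


def normalize(biases, limit=0.1):
    """Rescale raw biases so the largest magnitude maps to +/- limit.

    biases: dict of outlet name -> raw bias. Rescaling (rather than
    clipping) is an assumption about what "normalize" meant.
    """
    largest = max((abs(b) for b in biases.values()), default=0.0)
    if largest == 0:
        return {name: 0.0 for name in biases}
    return {name: b * limit / largest for name, b in biases.items()}
```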
More detailed analysis / methodology: here.
Processed dataset available here.
Source: Diffbot Knowledge Graph
Data can be surfaced with queries like the following (with login):
type:Person description:or("statistician","statistics") location.country.name:"United States" facet:skills.name
Link to results here.
I have been exploring skill data in the Knowledge Graph and wanted to see how much overlap there is between statistician and data scientist skill sets. Any
Source: Diffbot Knowledge Graph person entities.
Check out underlying dataset (with login) with queries like the following:
type:Person employments.{employer.name:or("Meta","Google","Amazon","Apple","Netflix","Tesla","Microsoft") isCurrent:true} employments.{title:"data analyst" isCurrent:true} facet['C',"C++",'C#','Java','Julia','PHP','Python','R','SQL','MATLAB','sas']:skills.name
Choice of programming languages and graphic format based on internal survey results at Facebook, as seen here
Diffbot's new Natural Language API can be used to create the beginning of your own knowledge graph from your corpora of choice.
If you're looking for more theoretical or instructional material, I would suggest this video on the construction of knowledge graphs or this ebook by PoolParty called The Knowledge Graph Cookbook.
>They still don't cite the amount of illegal immigrants that want to pay federal taxes vs the amount that don't.
The data doesn't exist, for reasons ranging from fear of deportation to the difficulty of measuring this population. That's why they have to estimate.
Here's the Vox article again, but run through something else to make it readable.
Ya, as others have said, #1 would be the trickiest to implement. However, if you are extracting the data from the same website, its pages should share the same HTML layout, so you can grab the element you need by parsing the HTML. You would have to repeat this process each time you want to work with a new site.
Alternatively, you can try something like Diffbot Extraction, which extracts the info on the page using an AI-based approach.
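For the same-site case, a minimal sketch using only Python's standard library might look like this. The tag and class names are hypothetical; you would look them up in the target site's markup, and you would need a new pair for every new site.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every <tag> element carrying a given CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0        # > 0 while inside a matching element
        self.results = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested elements of the same tag so we close correctly.
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1
                self._buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # Join the text fragments and collapse runs of whitespace.
                self.results.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


def extract(html, tag, css_class):
    """Return the text of each matching element in document order."""
    parser = ClassTextExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

Calling `extract(page_html, "h1", "title")` returns the text of every `<h1 class="title">` on the page. For anything beyond a quick script, a dedicated parser with XPath or CSS selector support is usually less brittle.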
Hi
> But does anyone know of a company that can do this (maybe even self service?). It'd be great if we could get commentary from thousands of websites.
I think you can try:
Diffbot Discussion API - structures the full content of forum threads, article comments, and product reviews
Or use the Scrapy library to create your own custom comment scrapers
I am not sure if this counts, but Scrapinghub and Diffbot have developed automatic extraction APIs to parse and extract data from news article and online product URLs.