Not sure if this applies, and I don't know much about data scraping, but my friend has a data-scraping web app you can use. To my knowledge it works on the larger sites: https://www.parsehub.com/
Hi Reddit,
The original source for our full research can be found here (plus additional context about our methodology and findings!): https://www.parsehub.com/blog/when-elon-musk-tweets/
All our data was scraped from: https://www.twitter.com/elonmusk
Hope you find this interesting!
You asked for an approach, so that's what I'll share.
Start by finding the data. Where could you find a list of cars for sale? There are lots of websites that offer this information - search around and find a site that lets you search by make/model and year.
Collect the data. You can pull the prices by hand (inefficient and time-consuming) or take the time to learn a new skill like web scraping. Alternatively, consider using a site like Fiverr or Amazon's Mechanical Turk to have someone else do it for you. Watch the accuracy of the data if you use the latter approach.
You can calculate mean and SD in Excel or any of a dozen programming languages. Which software will you use? If the data is in an Excel sheet, that might be easiest. If you put the data in a Google Sheet, the "Explore" button in the bottom right will probably even plot it for you automatically!
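If you go the programming route, the mean/SD step is a couple of lines in Python's standard library. The prices below are made-up placeholders; swap in whatever you collected:

```python
# A minimal sketch of the mean/SD step, assuming the prices have already been
# collected into a plain Python list. These values are invented for illustration.
from statistics import mean, stdev

prices = [18500, 21200, 19900, 17450, 22300]  # hypothetical used-car prices

avg = mean(prices)
sd = stdev(prices)  # sample standard deviation, same as Excel's STDEV.S
print(f"mean = {avg:.2f}, SD = {sd:.2f}")
```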
Break the process of building a regression into steps like we did for the first three steps above. You can search "regression" + your program of choice for plenty of tutorials. This is the core of what you're supposed to be learning, so the details are up to you.
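To show how small those steps can be: a simple linear regression of price on model year needs only ordinary least squares, which fits in a few lines of plain Python. The data points here are invented for illustration; substitute the scraped values:

```python
# A hedged sketch of the regression step: ordinary least squares for
# price vs. model year, no libraries required. Data is made up.
from statistics import mean

years  = [2012, 2014, 2016, 2018, 2020]
prices = [9500, 12800, 15900, 19200, 23100]

xbar, ybar = mean(years), mean(prices)
# slope = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
slope = sum((x - xbar) * (y - ybar) for x, y in zip(years, prices)) \
        / sum((x - xbar) ** 2 for x in years)
intercept = ybar - slope * xbar
print(f"price ~= {slope:.1f} * year + {intercept:.1f}")
```

Any stats package (Excel's LINEST, Sheets' SLOPE/INTERCEPT, R's lm) computes the same two numbers; the tutorials you find will just wrap this formula.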
If I can challenge you - see this as an opportunity to learn scrappy ingenuity. The data is out there. So are the stats programs and tutorials to learn this process. If you can break a big problem into small problems you'll find yourself making progress in no time. You will be delighted when you realize that you've overcome the challenge. Much of life is made up of ill-defined challenges - best to start practicing now. :)
Come back with specific questions and you will get answers! Best of luck to you!
I've read that, and there's a lot of it I just don't understand. I know that an API key is needed for the link to work, but I don't know the syntax for it. I've looked online for an example of what one would look like, but haven't found one specifically for Reddit.
Here's one I found as an example, but I don't know how I'd need to change it to work with Reddit's API.
=IMPORTDATA("https://www.parsehub.com/api/v2/projects/PROJECT_TOKEN/last_ready_run/data?api_key=API_KEY&format=csv")
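That ParseHub-style URL won't carry over directly, since Reddit's official API uses OAuth rather than an `api_key` query parameter. However, Reddit also serves a public JSON view of most pages when you append `.json` to the URL, which is often enough for this kind of import. A sketch (the subreddit name is just an example):

```python
# Reddit's official API needs OAuth, but appending ".json" to a normal page
# URL returns public JSON with no key at all. Subreddit chosen arbitrarily.
import json
import urllib.request

def to_json_endpoint(url: str) -> str:
    """Turn a normal Reddit page URL into its public JSON endpoint."""
    base, _, query = url.partition("?")
    base = base.rstrip("/") + ".json"
    return base + ("?" + query if query else "")

def top_post_titles(subreddit: str, limit: int = 5) -> list[str]:
    # Network call: set a descriptive User-Agent or Reddit may rate-limit you.
    url = to_json_endpoint(f"https://www.reddit.com/r/{subreddit}/top?limit={limit}")
    req = urllib.request.Request(url, headers={"User-Agent": "sheet-import-demo/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [child["data"]["title"] for child in data["data"]["children"]]

print(to_json_endpoint("https://www.reddit.com/r/learnpython/top?limit=5"))
```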
Hahaha yes, the mini challenges can get boring after a while. One of your best bets is to start scraping websites. This skill will come in handy in a lot of scenarios.
One guy scraped all the bad reviews, separated them by company name, bundled each company's bad reviews together, and sold the bundle to that company. The company reviews the bad comments and makes the changes needed to better support their customers.
You could make a rotating Bitcoin faucet collector
I plan on scraping a sports channel to find injuries and non-starters in the NFL to predict game winners
The short answer is no, not without a lot of coding expertise.
Scraping websites is tedious, requires a lot of trial and error, and involves being very conversant in HTML.
Also, most consumer-facing websites go to great lengths to prevent scraping, because it undermines the purpose of the website, which is to get you to see ads or buy things from them.
You might take a look at ParseHub. I've used them for something in the past. I found it pretty convoluted, but was eventually able to cobble something together.
If you have some patience, https://www.parsehub.com/ is a good tool for scraping. Once you have that you can pull the data from its API into a set of CSVs, or maybe download them from Sheets. Then you can use GPT-3 to gather sentiment. Finally any web framework will help you display things. I'm partial to Django.
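The first step of that pipeline (ParseHub API to CSV) can be a short Python function. `PROJECT_TOKEN` and `API_KEY` are placeholders you'd copy from your ParseHub dashboard; the endpoint shape matches the run-data URL shown elsewhere in this thread:

```python
# Rough sketch: pull a finished ParseHub run as CSV rows. The token and key
# are placeholders; get real ones from your ParseHub project page.
import csv
import io
import urllib.request

def fetch_run_csv(project_token: str, api_key: str) -> list[dict]:
    url = (f"https://www.parsehub.com/api/v2/projects/{project_token}"
           f"/last_ready_run/data?api_key={api_key}&format=csv")
    with urllib.request.urlopen(url) as resp:  # network call
        text = resp.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))

# The parsing step works the same on any CSV text, e.g.:
sample = "review,rating\nGreat product,5\nBroke in a week,1\n"
rows = list(csv.DictReader(io.StringIO(sample)))
print(rows[0]["review"])
```

From there, each `row["review"]` string is what you'd feed to the sentiment step.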
Yes, Quora can drive traffic to your blog. It's recommended to invest and build up your Quora profile first. Then, search for related questions and naturally insert your link in your answer. You can use a Quora scraper to gather questions and private proxies to avoid getting blocked.
You can have a look at the following solutions:
Tools: webscraping was made with ParseHub (https://www.parsehub.com/) plus manual cleanup and visualization in Microsoft Excel
Data source: subreddit r/coronavirus , the data is a snapshot taken on April 16th 2020
Method discussion:
https://fcpython.com/blog/introduction-scraping-data-transfermarkt
I can't find the original bookmark of the guide I first saw, but those should kick things off. I didn't have time to actually start anything, but I think you'd find a lot of interesting player data on TM.
I prefer ParseHub: https://www.parsehub.com
It lets you use an input table. For instance, if you want to find competitor store locations by zip code and have to enter a zip code into a field and search for stores, you give ParseHub a list of zip codes and it will enter a zip, scrape the results, and repeat until all zip codes are covered.
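That input-table pattern is easy to sketch in plain Python if you ever outgrow the GUI: loop over the inputs, build each results URL, fetch, and collect. The store-locator endpoint here is entirely hypothetical:

```python
# Sketch of ParseHub's input-table loop in plain Python. The endpoint
# "https://example.com/stores" is a hypothetical store locator; a real site
# needs its own URL and its own HTML parsing for each results page.
import urllib.parse

zip_codes = ["10001", "60601", "94105"]

def results_url(zip_code: str) -> str:
    return "https://example.com/stores?" + urllib.parse.urlencode({"zip": zip_code})

# One URL per input row; fetch and parse each in a real scraper.
all_urls = [results_url(z) for z in zip_codes]
print(all_urls[0])
```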
I put "set -x" at the first line of the script and execute the following line:
curl -X GET "https://www.example.com/$RUN_TOKEN/data?api_key=$API_TOKEN&format=json" | gunzip
Result:
+ curl -X GET 'https://www.example.com/t_LL-pgJzf6K/data?api_key=tww&format=json'
+ gunzip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   162  100   162    0     0    197      0 --:--:-- --:--:-- --:--:--   197
gzip: stdin: not in gzip format
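That last error means the server sent a plain (uncompressed) body, so gunzip has nothing to decode. You can reproduce the behaviour locally without any network call:

```shell
# "not in gzip format" = the input was never gzip-compressed.
printf '{"ok":true}' | gunzip 2>/dev/null || echo "plain data: gunzip fails"
# Real gzip data round-trips fine:
printf '{"ok":true}' | gzip | gunzip
```

The fix is to drop `| gunzip` entirely, or, if you do want compression, use `curl --compressed`, which sends the Accept-Encoding header and decompresses the response transparently.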
I've had good results with ParseHub on sites that Kimono and Import.io would crash on.
After you've created your API, you can call it via a GET request and get a JSON object with all your data returned. Check the API references for each service!
Try out Nightmare.js. It's a PhantomJS wrapper that makes such tasks really easy. If you don't want to code a scraper yourself, maybe ParseHub can help you.