We ingested all of the FCC comments (acquired via a scraper hitting their display API https://ecfsapi.fcc.gov/filings?proceedings.name=17-108&sort=date_disseminated,ASC&limit=3&offset=0)
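For anyone wanting to try something similar, here is a rough sketch of how the `limit`/`offset` paging in that URL could be walked. It only builds the page URLs; the `total` argument is a hypothetical parameter you would have to supply yourself, and the response format and any API key handling are not shown because the post doesn't describe them.

```python
from urllib.parse import urlencode

BASE_URL = "https://ecfsapi.fcc.gov/filings"


def page_urls(proceeding, total, limit=3, start=0):
    """Yield one query URL per page, stepping `offset` forward by `limit`.

    `total` (how many filings to walk through) is an assumed input;
    the post does not say how the scraper decided when to stop.
    """
    for offset in range(start, total, limit):
        params = {
            "proceedings.name": proceeding,
            "sort": "date_disseminated,ASC",
            "limit": limit,
            "offset": offset,
        }
        # safe="," keeps the comma in the sort value unescaped, as in the URL above
        yield f"{BASE_URL}?{urlencode(params, safe=',')}"


# Example: the first few page URLs for proceeding 17-108
for url in page_urls("17-108", total=9):
    print(url)
```

Each URL would then be fetched in order with whatever HTTP client you prefer; sorting by `date_disseminated` ascending keeps the paging stable while new filings arrive.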
Then:
The above image is a dashboard for August which had a surge in activity as the comment period was closing.
Tools used:
For detailed info and more charts, we have a complete write-up:
https://www.gravwell.io/blog/discovering-truth-through-lies-on-the-internet-fcc-comments-analyzed
Edit: Cleaned up some of the charts on the blog post. Verified the comments that were provided by /u/indrora were included in this analysis.
Just reading this headline, one might conclude that only 1M of 20M+ comments were fake. That's false.
This post is just a report on this great post, "More than a Million Pro-Repeal Net Neutrality Comments were Likely Faked", about a particularly obvious subset of fake FCC comments. They read like Mad Libs.
There is a more complete and scientific analysis that concludes:
A very small minority of comments are unique -- only 17.4% of the 22,152,276 total. The highest occurrence of a single comment was over 1 million.
Most comments were submitted in bulk and many come in batches with obviously incorrect information -- over 1,000,000 comments in July claimed to have a pornhub.com email address
Bot herders can be observed launching the bots -- there are submissions from people living in the state of "{STATE}" that happen minutes before a large number of comment submissions.
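The three checks quoted above can be sketched in a few lines of Python. This is not the query language or method used in the write-up; the record keys (`text`, `email`, `state`) are hypothetical, and "unique" is assumed here to mean a comment text that occurs exactly once, which may differ from the definition behind the 17.4% figure.

```python
import re
from collections import Counter

# Matches unfilled template variables such as {STATE} or {CITY}
PLACEHOLDER = re.compile(r"\{[A-Z_]+\}")


def summarize(filings):
    """Summarize filings given as dicts with assumed keys 'text', 'email', 'state'."""
    filings = list(filings)
    text_counts = Counter(f["text"].strip() for f in filings)
    domain_counts = Counter(
        f["email"].rsplit("@", 1)[-1].lower()
        for f in filings
        if "@" in f.get("email", "")
    )
    # Texts that appear exactly once (assumed meaning of "unique")
    singles = sum(1 for count in text_counts.values() if count == 1)
    unfilled = sum(1 for f in filings if PLACEHOLDER.search(f.get("state", "")))
    return {
        "total": len(filings),
        "unique_share": singles / len(filings) if filings else 0.0,
        "most_repeated": text_counts.most_common(1),
        "top_domains": domain_counts.most_common(3),
        "unfilled_templates": unfilled,
    }
```

Sorting the records flagged by the placeholder check by submission time is what would let you line them up against the bulk batches that follow.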
Both of these linked articles are worthy of your attention.
oh hai, this research was done by us at Gravwell.
You can check out the full writeup on our blog at: https://www.gravwell.io/blog/discovering-truth-through-lies-on-the-internet-fcc-comments-analyzed
Original reddit thread on r/dataisbeautiful: https://www.reddit.com/r/dataisbeautiful/comments/73wjxf/fcc_net_neutrality_comments_are_closed_we_looked/
Happy to answer questions.
This is a good read for some analysis of the submitted and fake responses: https://www.gravwell.io/blog/discovering-truth-through-lies-on-the-internet-fcc-comments-analyzed
Before they got their script working there are clearly entries with {STATE} and {CITY} hardcoded in because they messed up their variables.
Edit for quote: > Like the rest of us, bot herders aren't immune to bugs in their code and sometimes push bad data to production. Except...unlike the rest of us they can't always roll it back because production is usually someone ELSE'S production. Here we see a bot start up with {STATE} and {CITY}. On the second try we get {STATE} and Orlando (Florida Man strikes again?).
And the entire research methodology including results and queries are all open source here
https://www.gravwell.io/blog/discovering-truth-through-lies-on-the-internet-fcc-comments-analyzed
We did some analytics on all 22m FCC comments and the research was picked up by Vice and the BBC. The data tells a story about the Net Neutrality proceeding.
https://www.gravwell.io/blog/discovering-truth-through-lies-on-the-internet-fcc-comments-analyzed
They contacted the people who supposedly made the comments and quite a lot of them said they did not post to the FCC website.
Also there's evidence that many of the comments were posted simultaneously. And, from the start, the system was pretty open to anyone mass submitting comments. All they would need to do is scrape a database for American-sounding names and addresses.
FCC claims they were DDoS'd, but are very silent when asked to produce evidence that they were DDoS'd.
https://www.gravwell.io/blog/discovering-truth-through-lies-on-the-internet-fcc-comments-analyzed >Most comments were submitted in bulk and many come in batches with obviously incorrect information -- over 1,000,000 comments in July claimed to have a pornhub.com email address
This has been known for months which is why people have filed FOIA requests so the FCC can show it's not full of shit. They haven't met the requests and they're now being sued.
Appreciate the feedback. Apologies for sending you to the wrong quickstart page, that's been fixed. We're tightening up the docs to streamline the CE and fix outdated info. Docs are always something that can be improved, sorry we let you down there.
If you're up for it, you can confirm whether entries are making it in via the system stats page. Looking at your wells will show data/entry counts, or you could try running a search spanning yesterday to tomorrow. One issue we've had with pfsense is around timestamps, and you may have to add/uncomment `Assume-Local-Timezone=true`. If the entries aren't there at all, that's a separate issue that comes down to network packets and port bindings. Simple relay is pretty simple, especially without RFC specification: if you spit stuff at a listening port, it will ingest it into Gravwell.
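If it helps, a quick way to "spit stuff at a listening port" is a few lines of Python like the sketch below. It assumes a plain TCP, line-delimited listener; the host and port are yours to fill in, and `send_test_line` is just an illustrative helper, not part of any Gravwell tooling.

```python
import socket


def send_test_line(host, port, message, timeout=5.0):
    """Send one newline-terminated line over TCP and return the bytes sent.

    Assumes the listener accepts plain line-delimited text over TCP.
    """
    payload = (message.rstrip("\n") + "\n").encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
    return len(payload)
```

For example, `send_test_line("127.0.0.1", 7777, "simple relay test entry")`, where 7777 is only a placeholder for whichever port your listener is actually bound to. A connection error here points at the network or port binding side; a clean send with nothing showing up in search points back at timestamps.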
Support can be requested via https://www.gravwell.io/request-support or emailing [email protected].
ok you have a head on your shoulders...now take the next step. read the article!
https://www.gravwell.io/blog/discovering-truth-through-lies-on-the-internet-fcc-comments-analyzed
17% were unique comments typed into the web form. that doesn't mean the rest were entirely bots, but the comments were identical to others. some were obviously bots.
Check out Gravwell.
The ingest and management libraries are all pure Go and BSD 2-Clause licensed (https://pkg.go.dev/github.com/gravwell/gravwell/v3/ingest)
You can host it yourself in a cloud provider or onprem - Free for up to 2G/day. https://www.gravwell.io/download
I work for a vendor and our product is a Threat Intelligence Gateway. We are in the process of integrating with various SIEM solutions. Our CTO recently came across a solution called Gravwell and he loves it. May be worth checking it out. https://www.gravwell.io
Actually these guys went through 22 million comments about net neutrality.. https://www.gravwell.io/blog/discovering-truth-through-lies-on-the-internet-fcc-comments-analyzed
I'm pretty sure you can at least high level tell who is full of shit or not. It's just whether the social media platform wants to tell or not.
Dunno if you are still looking but we just released the Community Edition (free tier) of Gravwell. We're a startup who built a Splunk alternative from scratch (using Go). Free tier is 2GB/day and paid licenses all come with unlimited data. I'm one of the founders if you have questions.
Hey Reddit, we made a Splunk alternative written in Go called Gravwell that has CoreDNS integration. This post is part of the Community Edition (free up to 2GB/day) Complete Guide to Building a Home Operations Center but obviously has value for enterprises. It walks through setting up CoreDNS but any DNS logs would work as the bulk of the guide covers automated comparisons against DNS blacklists and setting up scripts to auto-correlate with netflow and windows events if there's a DNS tipoff. <3
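To give a feel for the blacklist comparison outside of Gravwell, here is a minimal Python sketch. It is not how the guide does it (the guide works inside Gravwell); the blocklist format (one domain per line, `#` comments) and the `(client_ip, queried_name)` tuple shape are assumptions made for illustration.

```python
def load_blocklist(lines):
    """Normalize blocklist entries: lowercase, no trailing dot, skip comments."""
    entries = set()
    for line in lines:
        line = line.split("#", 1)[0].strip().lower().rstrip(".")
        if line:
            entries.add(line)
    return entries


def matches_blocklist(name, blocklist):
    """Return True if the queried name or any parent domain is blocklisted."""
    labels = name.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def flag_queries(queries, blocklist):
    """Return the (client_ip, queried_name) pairs that hit the blocklist."""
    return [(ip, name) for ip, name in queries if matches_blocklist(name, blocklist)]
```

The client IPs that come back from `flag_queries` are the "DNS tipoff" you would then go look up in netflow and Windows event data.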
Get Gravwell CE for free here: https://www.gravwell.io/blog/gravwell-community-edition
We're releasing a series of posts that have dashboard import codes around some standard data. Check out the first one on collectd here: https://www.gravwell.io/blog/gravwell-and-collectd
We're working on making more tools to enable the community to build and share dashboards. Part of the challenge comes from disparate systems and no two networks being quite alike. Sometimes log formats change between versions of the same product =/ At least with the "ingest first ask questions later" mantra, that might break some existing dashboards but ingest hums along nicely so the ground truth is always available.
I think it's up to us as the public to do some of this work as well. I'm founder of a data analytics software company but we spent some time dogfooding the tool on the submitted comments. Our research on the FCC comments didn't go unnoticed. https://www.gravwell.io/blog/discovering-truth-through-lies-on-the-internet-fcc-comments-analyzed
In fact, there is some movement within the Government itself. We have spoken about our research with the NY AG and the Government Accountability Office.
Here's the public documentation about using the API:
https://www.fcc.gov/ecfs/public-api-docs.html
And to use the API even to query comments or documents you have to sign up for a key here: https://api.data.gov
Someone could have submitted through the API using fake info, but the FCC has their recorded key info and IP. The fact is that 98% of all pro repeal comments were submitted in bulk with identical comments and in alphabetical order. Here's a more in-depth analysis by a data company:
Admittedly, I just saw this on the front page, so I'll take a hit to originality for that. I'm also not a huge fan of Vice as a reliable or high-quality news source (given articles like this...), but I feel that the specifics aren't very important for this discussion, only that it is happening. Entities, whether they be people, companies, or other organizations, are using bots en masse to create false discussion on the web to sway points of view about very real topics.
How far do the implications go for this? The fact that AI are using social media to sway real humans doesn't quite sit well with me, personally, especially as they employ media much closer to home (reddit being a primary example!) than I'd like. Should we be second-guessing every comment we see on a forum site from here on out, as it may not even be posted by a human? And perhaps the more important question- should these corporations, lobbyists, or other entities be held accountable for this "false persuasion"? Is this wrong, or merely the next natural step for the internet?
...and after a bit of digging, the manner in which these comments are submitted seems to suggest they were organized by the FCC itself.