I work for Google - and I love this post - thanks for sharing it!
Let me suggest also taking a look at this open source dashboard platform:
(talk to /u/arikfr to learn more about it)
I mostly use Redshift as the data warehouse, and currently Luigi for the ETL process (but previously I've used Azkaban).
With the orchestration stuff you can definitely just test it out yourself. The same with Redshift here: https://aws.amazon.com/redshift/free-trial/?nc1=h_ls - the SQL is very similar to PostgreSQL - but the important things to learn are the use of distribution and sort keys, column encodings and the need to vacuum and analyze the cluster.
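To make the Redshift-specific parts concrete, here is a minimal sketch of the statements involved. The table and column names (`events`, `user_id`, `created_at`) and the helper `statements()` are made up for illustration, and the key and encoding choices are assumptions, not a recommendation for any particular workload.

```python
# Hypothetical table: only the Redshift keywords (ENCODE, DISTSTYLE,
# DISTKEY, SORTKEY, VACUUM, ANALYZE) come from Redshift itself.
CREATE_EVENTS = """
CREATE TABLE events (
    event_id   BIGINT      ENCODE az64,
    user_id    BIGINT      ENCODE az64,
    event_type VARCHAR(32) ENCODE zstd,
    created_at TIMESTAMP   ENCODE raw
)
DISTSTYLE KEY
DISTKEY (user_id)
SORTKEY (created_at);
"""

# Routine maintenance: VACUUM re-sorts rows and reclaims space,
# ANALYZE refreshes the statistics used by the query planner.
MAINTENANCE = [
    "VACUUM events;",
    "ANALYZE events;",
]


def statements():
    """Return the DDL followed by the maintenance commands, in run order."""
    return [CREATE_EVENTS.strip(), *MAINTENANCE]


if __name__ == "__main__":
    for sql in statements():
        print(sql)
        print()
```

The script only prints the SQL, so you can paste it into whatever client you use against a trial cluster and compare query plans with different keys.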
Honestly, if you already know SQL I would just recommend testing the technologies directly and using their documentation. Set up your own small ETL project for reading some Twitter data or other APIs for example. Then you can add real-time analysis using something like Kafka, etc.
You can use Redash for visualization: https://redash.io/
It's free and open source too. I'd really like to contribute to it, as I've contributed to Luigi.
If you don't mind a bit of a homebrew option, we use an AMI for Redash hosted on AWS EC2. We set up a basic schema in RDS for storing test results, and Redash lets us build charts around SQL queries pretty easily. Auth is configured using Google OAuth, so anyone with an email address on our work domain can view or create dashboards. We followed this guide for setup:
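For a rough idea of what such a setup can look like, here is a minimal sketch. The `test_results` table, its columns and the pass-rate query are assumptions for illustration, not the poster's actual schema, and an in-memory SQLite database stands in for RDS so the example runs anywhere.

```python
import sqlite3

# Hypothetical schema for storing one row per test execution.
SCHEMA = """
CREATE TABLE test_results (
    id        INTEGER PRIMARY KEY,
    suite     TEXT    NOT NULL,
    test_name TEXT    NOT NULL,
    passed    INTEGER NOT NULL,  -- 1 = pass, 0 = fail
    run_date  TEXT    NOT NULL   -- ISO date, e.g. 2024-01-01
);
"""

# The kind of query a dashboard chart could sit on top of:
# one row per day with the percentage of passing tests.
PASS_RATE_BY_DAY = """
SELECT run_date,
       COUNT(*)                                 AS total,
       SUM(passed)                              AS passed,
       ROUND(100.0 * SUM(passed) / COUNT(*), 1) AS pass_rate_pct
FROM test_results
GROUP BY run_date
ORDER BY run_date;
"""


def pass_rate_by_day(rows):
    """Load (suite, test_name, passed, run_date) rows and return daily pass rates."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany(
            "INSERT INTO test_results (suite, test_name, passed, run_date) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
        return conn.execute(PASS_RATE_BY_DAY).fetchall()
    finally:
        conn.close()
```

In a dashboard tool you would save only the SELECT as a query and chart `pass_rate_pct` against `run_date`.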
Thanks for sharing!
2 suggestions that would make this much better for me:
Instead of asking for OAuth2 permissions (which many users won't give you), give users the option to provide a service account. It's harder to set up initially, but it gives teams a lot more flexibility over what they share with 3rd parties. See how https://redash.io/ does it.
Please no zebra-striped tables.
Btw, who's behind this project?
Cool. :)
Not sure if being able to generate graphs live is a useful thing for this data, but if so then the DBHub.io website can export tables as JSON too.
At the moment the only JSON format supported is Redash (https://redash.io), but it shouldn't be too hard to add others over time if there's demand. :)
If you'd entertain a server-side option, Redash is pretty rad. It'll do autocompletion from schema and syntax, and it's great for workgroup-oriented stuff (reports, scheduling, sharing snippets, etc.).
I am prepping all my data in KNIME and then writing it into a Postgres table. I need dashboard-type functionality and some user management on a limited budget, so my current free graphing tool of choice is https://redash.io (it requires some basic SQL knowledge and is neither super flexible nor pretty, but what it does, it does well). I'm keeping my eye on the https://superset.apache.org project, as it’s nice, pretty and fast, has some big-name support and is actively working towards its 1.0 release (also SQL-based; at the moment it probably has more support for time-series data).
I've used redash.io before and it's a nice tool. You have to write your queries yourself though. You can host it yourself or pay for a hosted version.
PowerBI is ridiculously cheap for everything it offers.
I would stick with whatever database you are comfortable with, and then add Redash on top:
https://redash.io/help/open-source/setup
Redash will let you execute SQL as well as build all kinds of dashboards, charts, and visualizations. There are a number of components to it, but you'll find tons of examples to get started via docker-compose.
It is hard to understand your full requirements just by the words "data lake". Which parts of the data lake are you looking to implement? As others have pointed out, Minio is fantastic for the S3 compatible storage part. While there aren't necessarily great alternatives to things like Snowflake, things like the self-hosted version of Redash could work. Would love to know more about what components you are looking for.
Awesome!
Not fully Django-specific, but if you want more complex embeddable visualizations, and you know SQL, https://redash.io/ is an amazing FOSS self-hosted BI product that you can hook up directly to your database. Gives parameterizable embeds you can easily put into iframes in your templates. But certainly OP's approach allows you to more securely control what data is accessed.
Now I'm curious. I've read the intro doc https://developers.home-assistant.io/docs/en/hassio_addon_tutorial.html and it looks pretty straightforward. Do you have any tips, or are there any pitfalls to avoid? (I really like using https://redash.io/ but it's read-only, not a db mgmt tool)
If you’re okay with not building something from scratch, you can use a Business Intelligence tool. One example of an open source tool is Metabase: https://metabase.com/
Another is Redash: https://redash.io/help/open-source/setup
Metabase makes it a bit easier to build the reports you need. But Redash is also very solid.
Check these two out:
https://www.metabase.com/ and https://redash.io/
Both are free. Redash looks paid, but look for the “host it yourself” version on the page. Metabase is probably what you’ll want: it’s a bit more friendly, while Redash is a bit more customizable but requires more work up front.
grafana.com: can be attached to different data sources.
metabase.com: if you have the data in a SQL DB.
redash.io: similar to Metabase.
Definitely need more info here:
How much data do you have? I'm assuming since Google Sheets, it's not a lot.
What's your budget? If you're a 1-10 person shop looking to share some charts, that's going to be a drastically different answer than for a 1000+ person company with tens of thousands of dollars to spend. Many BI tools charge based on seats (i.e. license count) and user type, such as content creator, admin, view-only, etc.
What's your technical competency? If you're comfortable with Linux, you can run open source Redash for free on your own EC2 server: https://redash.io/ It seems a bit more limited than the bigger players, but it has a pretty low barrier to entry.
I've supported both Tableau & Looker as a data warehouse engineer, and they can both be very powerful in the hands of BI folks who know what they're doing with the tool. PowerBI had been the tool that people had access to and could create quick visualizations with, since we already had licenses for it as a Microsoft shop.
If this is a simple load and report type of solution, might I suggest maybe spinning up something like Re:Dash and migrating the spreadsheet's data over to that? It's built in Python, runs on PostgreSQL, and it has a lot of nice features for simple reporting over SQL queries and Pandas. There are a myriad of ways you could import new incoming data into Postgres as well. /2c
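One of those ways, sketched very roughly: export the sheet as CSV and load it with a few lines of Python. The helper name `load_csv` and the sample table are made up for illustration, and SQLite stands in for Postgres here so the snippet runs without a server. With a Postgres driver such as psycopg2 the placeholder would be `%s` instead of `?`, and `COPY` is the usual route for bulk loads.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_text):
    """Create `table` from the CSV header and insert every data row.

    Assumes the table name and header come from a trusted export;
    identifiers are only wrapped in double quotes, not sanitized.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    cols = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    rows = [row for row in reader if row]  # skip blank lines
    conn.executemany(
        f'INSERT INTO "{table}" ({cols}) VALUES ({placeholders})', rows
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    exported = "region,amount\nnorth,10\nsouth,5\n"
    print(load_csv(conn, "sales", exported), "rows loaded")
    conn.close()
```

Every value arrives as text, so a real import would cast columns to proper types before reporting on them.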