It’s absolutely something all BI developers should be concerned about, but probably not to the level where you have nightmares :)
In order to reduce the stress and get more certainty, your data pipeline should have automated tests that check the most critical failure points.
For example, each row in a fact table should have a unique key, and that key should be tested for uniqueness. A test like that can detect row duplication caused by joins.
You should also check that all dimension keys have a row in the dimension table.
That’s the bare minimum in my opinion.
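Here is a minimal sketch of those two checks, using Python's built-in sqlite3 so it runs anywhere. The table and column names (`fact_ads`, `fact_key`, `dim_campaign`, `campaign_id`) are made up for illustration, and in a real pipeline you would point the same queries at your warehouse instead of an in-memory database.

```python
import sqlite3


def duplicate_keys(conn, table, key):
    """Return key values that appear more than once in `table`."""
    sql = (
        f"SELECT {key} FROM {table} "
        f"GROUP BY {key} HAVING COUNT(*) > 1 ORDER BY {key}"
    )
    return [row[0] for row in conn.execute(sql)]


def orphan_keys(conn, fact, fact_key, dim, dim_key):
    """Return fact-table dimension keys that have no row in the dimension table."""
    sql = (
        f"SELECT DISTINCT f.{fact_key} FROM {fact} AS f "
        f"LEFT JOIN {dim} AS d ON f.{fact_key} = d.{dim_key} "
        f"WHERE f.{fact_key} IS NOT NULL AND d.{dim_key} IS NULL "
        f"ORDER BY f.{fact_key}"
    )
    return [row[0] for row in conn.execute(sql)]


def build_demo():
    """Create a tiny in-memory example (hypothetical tables and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_campaign (campaign_id INTEGER, name TEXT);
        CREATE TABLE fact_ads (fact_key TEXT, campaign_id INTEGER, spend REAL);
        INSERT INTO dim_campaign VALUES (1, 'brand'), (2, 'retargeting');
        INSERT INTO fact_ads VALUES
            ('2021-01-01|a1', 1, 10.0),
            ('2021-01-01|a2', 2, 5.0),
            ('2021-01-01|a2', 2, 5.0),   -- duplicated row, e.g. from a fan-out join
            ('2021-01-01|a3', 3, 7.5);   -- campaign 3 is missing from the dimension
    """)
    return conn


if __name__ == "__main__":
    conn = build_demo()
    print("duplicate keys:", duplicate_keys(conn, "fact_ads", "fact_key"))
    print("orphan keys:", orphan_keys(
        conn, "fact_ads", "campaign_id", "dim_campaign", "campaign_id"))
```

Both functions return an empty list when the data is clean, so a scheduled job can simply fail whenever either list is non-empty.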
The next stage is to have automated tests that compare your transformed data to the source data.
For example, I load Facebook performance data at the ad level granularity.
But in addition to that I also load daily totals of the numbers.
Then I have a daily test that compares the summed ad-level totals to the daily totals.
This way I can be certain that nothing went wrong during extraction or transformation.
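A rough sketch of that kind of reconciliation test, again with sqlite3 and invented names: `ad_level` stands in for the ad-level table, `daily_totals` for the separately loaded daily numbers, and `spend` for whichever metric you compare. The tolerance is an assumption to absorb floating-point rounding.

```python
import sqlite3


def reconcile_daily_totals(conn, tolerance=0.01):
    """Return (day, summed_ad_level, reported_daily) for days that do not match.

    Only days present in daily_totals are checked; a day that exists
    only in ad_level would need a second query.
    """
    sql = """
        SELECT d.day, COALESCE(a.total, 0) AS ad_total, d.spend AS daily_total
        FROM daily_totals AS d
        LEFT JOIN (
            SELECT day, SUM(spend) AS total FROM ad_level GROUP BY day
        ) AS a ON a.day = d.day
        WHERE ABS(COALESCE(a.total, 0) - d.spend) > ?
        ORDER BY d.day
    """
    return conn.execute(sql, (tolerance,)).fetchall()


def build_reconcile_demo():
    """Create a tiny in-memory example (hypothetical tables and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ad_level (day TEXT, ad_id TEXT, spend REAL);
        CREATE TABLE daily_totals (day TEXT, spend REAL);
        INSERT INTO ad_level VALUES
            ('2021-01-01', 'a1', 10.0),
            ('2021-01-01', 'a2', 5.0),
            ('2021-01-02', 'a1', 4.0);   -- ad a2 was lost during extraction
        INSERT INTO daily_totals VALUES
            ('2021-01-01', 15.0),
            ('2021-01-02', 12.0);
    """)
    return conn


if __name__ == "__main__":
    for day, ad_total, daily_total in reconcile_daily_totals(build_reconcile_demo()):
        print(f"{day}: ad-level sum {ad_total} != daily total {daily_total}")
```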
Another practice during development that has been extremely valuable is to compare the data before and after you make changes to the model.
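One simple way to do that comparison is a set difference between a snapshot taken before the change and the table built after it. This is only a sketch: `model_before` and `model_after` are hypothetical table names, both tables are assumed to have identical columns, and `EXCEPT` ignores how many times a row is duplicated, so pair it with a row count if duplicates matter.

```python
import sqlite3


def table_diff(conn, before, after):
    """Return (rows only in `before`, rows only in `after`), each sorted."""
    removed = conn.execute(
        f"SELECT * FROM {before} EXCEPT SELECT * FROM {after}").fetchall()
    added = conn.execute(
        f"SELECT * FROM {after} EXCEPT SELECT * FROM {before}").fetchall()
    return sorted(removed), sorted(added)


def build_diff_demo():
    """Create a tiny in-memory example (hypothetical tables and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE model_before (id INTEGER, name TEXT, revenue REAL);
        CREATE TABLE model_after  (id INTEGER, name TEXT, revenue REAL);
        INSERT INTO model_before VALUES (1, 'a', 10.0), (2, 'b', 5.0);
        INSERT INTO model_after  VALUES (1, 'a', 10.0), (2, 'b', 6.0), (3, 'c', 1.0);
    """)
    return conn


if __name__ == "__main__":
    removed, added = table_diff(build_diff_demo(), "model_before", "model_after")
    print("only in before:", removed)
    print("only in after:", added)
```

A changed row shows up in both lists (old values in the first, new values in the second), which makes unintended side effects of a model change easy to spot.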
You wrote an article about it that you can find here:
Check out these products / resources as a starting point for modern data testing:
https://blog.getdbt.com/how-great-data-teams-test-their-data-models/
I can really recommend https://www.metabase.com/. It’s open source and self-hostable. It has a wide variety of visualizations, which it auto-filters based on your visually built queries. So if one of the axes is time-based and the other is a number, it automatically selects a bar chart, etc.
You can even try it out without any setup with the desktop versions. I assume it’s just an Electron wrapper around the server + browser.
I'm on the Metabase team and I wanted to share this with this subreddit. We've been working on a hosted version of Metabase for a while, and we're happy to formally pull back the covers.
Over the years we've tried to make self-hosting as easy as possible, but lots of companies have let us know they'd prefer a fully managed version of Metabase. So... here it is!
https://www.metabase.com/blog/Announcing-Metabase-Cloud/ has the official announcement, but if you have any questions, fire away =)
So, a free tool that connects to postgres? I assume the Kafka and 1 second refresh will be happening behind the scenes and you're just talking about a standard free BI tool.
Superset and Metabase are open source and free. Power BI has a free tier and Tableau Public is free. FlexIt Analytics is free and easy to set up on-prem or cloud (Heroku or AWS Marketplace).
Maybe this will work: https://www.metabase.com/
It's basically a human-friendly analytics interface to your database. You just download it (it's open source), connect it to your data, create visualizations with the provided builder or with SQL and then compose these visualizations into dashboards and share them. It's simple to get up and running and there's _tons_ of functionality.
Try Metabase, which is a browser-based query/reporting tool where you can pre-define dependencies between tables, which makes putting together queries easier for non-technical users. As it is browser-based, it runs on a server, not the user's Windows computer.
There are other browser based tools, e.g. Hue (which needs a Linux server)
Or maybe Falcon as a desktop tool.
Instead of loading up a database administration tool that could potentially expose yourself to additional risk, try something like Metabase.
I've used it in the past with success.
Metabase is the easy, open source way for everyone in your company to ask questions and learn from data.
Connect Google Sheets to Google BigQuery; Metabase has a native connection to BigQuery.
Grafana is mostly for time-series data. For general dashboard stuff we use Metabase, which is epic. Just hook it up to a database and you can build really cool looking dashboards very easily.
Regarding getting your data into SQL, it really is not that difficult. Use the PowerCLI commands to get the data you want and then use the Invoke-Sqlcmd cmdlet from the SqlServer PowerShell module to push that to your database using SQL statements. This is the setup I use at work with great success. Good luck!
Also, this sounds a bit like https://www.metabase.com/, with the difference that you need to set up your own local db instance (usually a view/snapshot of the production db), and so you can only ever have access to your own data. Good for visualisations, reporting et al.
You can look into this open source project called Metabase. It's a great data visualization tool, and it's self-hosted and super customizable. I've been using it to visualize client stores, my own businesses and more!
There's some learning curve to it, but then again every tool has one.
Just a heads up, not affiliated with the project, just really love it
it can be used on prem.
it does run on linux.
it appears that whitelabeling is possible but only on the paid enterprise edition.
While it might not have all the features of Tableau yet, I think Metabase would be a solid candidate, provided they continue to develop their existing product. https://www.metabase.com/ I've replaced several Tableau dashboards with Metabase with very little effort.
I agree that he or she shouldn't attempt to build a whole framework from scratch in two weeks, but as a CS student he or she should be able to deploy something like https://www.metabase.com/ in about a day, or two or three if he or she has never worked with Docker before.
We use Metabase over our datalake that integrates everything https://www.metabase.com/
We don't set any particular KPI, we just try to maximize our KPIs. The main one is net cash flow.
There are people who could be fired if they fail on their PI. But overall, the whole business is a funnel and the Theory of Constraints applies to us. So I spend lots of time optimizing our bottleneck, because the rest of the company is not an obstacle to our growth.
The paradox is that our main bottleneck is in hiring, not in sales. If we could immediately hire the people we need, we could instantly improve our net cash flow from $1.2M to $2M.
Same! We self-hosted Metabase on AWS for the last 3 years to get analytics from Redshift. Actually, it's pretty easy and cheap to self-host if you have someone on your team responsible for infra/DevOps. Btw, Metabase (like many others) has just released "a migration guide".
For those of you still in the middle of searching for a new home for your analytics and dashboards, I saw this Product Evaluation matrix on the Chartio community that may help you select the next product/vendor.
I would definitely encourage you to check out Metabase Embedding (which I think is already on your shortlist) then. However, if row-level permissions and white-labeling are hard requirements, you would have to check their Enterprise Embedded Analytics.
You can do this with field filters, this should help: https://www.metabase.com/learn/building-analytics/sql-templates/sql-variables.html
At Metabase, we're looking for Software Engineers & DevOps to join our team in doing the hard work that makes our users’ lives easy.
Metabase is bringing data tools with the elegance and simplicity of consumer products to the crufty world of enterprise business intelligence. We provide an opinionated open source starting point for how companies should measure, analyze and share their data as well as a suite of tools to deal with the complexity that arises as they grow.
We run on a mix of Clojure and JavaScript, and the ideal candidate has shipped production code in one or more of these languages.
That's cool you were looking into doing this, OP. I'm actually on the Metabase team, and I thought I'd throw it out there that we just announced today that we're offering a paid, fully hosted version now for folks who don't want to set Metabase up and maintain it on a server themselves. You can check it out on our blog: https://www.metabase.com/blog/Announcing-Metabase-Cloud/index.html
I know this is 3 months later, but I'm on the Metabase team and just thought I'd mention that we actually announced a hosted version of Metabase today: https://www.metabase.com/blog/Announcing-Metabase-Cloud/index.html
Fully agree! Something that really helps me is this: https://www.postgresqltutorial.com (we use PostgreSQL)
Engineers have ORMs now and use less raw SQL. Since I became a PM, being able to write my own SQL queries to pull interesting data from Metabase (our data analytics tool) has given me lots of valuable insight.
We used Metabase for a while but recently switched to Databread. It was less about charts and more about the actual data records. Right now they only let you browse the data, but soon they will let you create, edit and delete the records too, and it works for more than just SQL databases. We ended up hooking up Stripe too, which came in handy.
Any modern relational database can be used for analytics.
If you don't want to pay Microsoft's price tag, I would recommend Postgres together with something like QueryTree, Metabase or Apache Superset.
These are the best tools I have used in the past for this type of thing:
I strongly recommend either locking down who can create SQL reports/Questions or making them run against a clone of your database rather than your live production DB.
So, if you only need something that connects to your SQL database and creates charts and so on, take a look at https://www.metabase.com
It is open source and we use it for our products, where it works really well.
Hello
Not sure how sophisticated your visualizations need to be, but one option would be to point an analytics tool at your Django database and embed the dashboards in your web app via iframe.
I’ve done this with Metabase and it works very well.
Metabase is great and open source / free.
Is this something you're planning to expose to customers, or is it just an internal reporting tool?
You might also consider Metabase which is a free and open-source internal reporting tool.