The Data Warehouse Toolkit by Kimball was recommended to me as "The Source" for DW. I just started reading it, so no experience yet.
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition https://www.amazon.com/dp/1118530802/ref=cm_sw_r_cp_apa_i_LZ-7CbHQTXGRM
Not strictly DE, but https://www.amazon.com/Schema-Complete-Reference-Christopher-Adamson/dp/0071744320/ref=nodl_ . I found the idea of dimensional modeling/star schema to be something people would talk about in the abstract, but never anything real. This book helped me out a lot. Perhaps not as useful as it was 10yr ago, but there’s still some gems in there.
I’m one of the domain experts where I work so usually don’t need to interrogate too hard, but I am reading this excellent book on gathering requirements for BI projects.
Hi u/ryanblumenow, No. The project simulates building a data pipeline given an already existing data model.
Enterprise data arch involves a lot of data modeling, consolidating with multiple teams, planning, etc. The book <strong>https://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247</strong> goes over this in detail. Hope this helps.
I'm wondering what's new in SQL though? I haven't used it day to day since August 2016, but still do the basics occasionally in my current role.
Here's the Amazon link to checkout the review scores: SQL Cookbook: Query Solutions and Techniques for Database Developers (Cookbooks (O'Reilly)) https://www.amazon.com/dp/0596009763/ref=cm_sw_r_cp_apa_i_iJnuCbBS4T84X
Mostly positive and FWIW, the Kindle version is $17.27.
I was already considering buying the bundle, now I'm leaning even more towards purchasing.
As a BA I'd expect them to be responsible for requirements gathering. To that end, I'd recommend this:
https://www.amazon.co.uk/dp/0956817203/ref=cm_sw_r_cp_apa_fabc_31q3Fb1TT3YQZ
I'd second the idea of reading Kimball too. Maybe not the whole thing from a technique perspective, but certainly the first 2 chapters which are more high level.
Honestly though, I sometimes feel that data nerds are wired differently and you either get it, or you don't. I've worked with IT professionals and programmers who I'd consider far more gifted and qualified than I am, but data concepts really confuse them, whereas I've always picked it up without much effort.
As someone who has also made the DS → DE transition, your focus will (broadly speaking) shift from "how can I use this data?" to "how can I help others use this data?"
To help answer that question, I got the most bang for my buck studying dimensional modeling. Classics include The Data Warehouse Toolkit and Agile Data Warehouse Design.
Which specific tools/libraries you should learn depends on your new role. Given that you're entering an AWS shop, I assume DMS and Glue are on the table.
Have a nice “coachable” read.
https://www.amazon.com.br/Data-Warehouse-Toolkit-Definitive-Dimensional/dp/1118530802
A 3 edição nao tem em ptbr. Só a segunda ed.
https://www.holistics.io/blog/how-to-read-data-warehouse-toolkit/
Devido a isso recomendo a 3ª
You’d benefit by understanding relational models a little more. You can have an authors dataset, as well as a books dataset, and have an authors/books relationship dataset where it is one record per author per book. This would allow 1 book to have more than 1 author.
https://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247
Generally, if you want to analyze data, you want to put it into Dimensional form, which is highly denormalized
Ideally, this would end up on its own server machine - your data warehouse. which you woudl then query with Power BI or what have you
But you can also do the same in your regular database server, using separate tables, or materialized views. Only do this if your regular database server is not used heavily
You don't see much regression analysis in-db, but it's certainly an option. For analyzing data, you want to focus on analytic SQL: group by, aggregate functions, having, window functions. You also want to learn how to load Slowly Changing Dimensions Type 2, which can be tricky
Agile Data Warehouse Design is a good book to start with
I feel like you may be getting at the heart of at least some of what I'm attempting to accomplish. One of the things I need to model is product data, and, at surface, that seems incredibly complex if not outright complicated. I've been thinking about purchasing a copy of a text containing contrived solutions to common data-modeling problems (https://www.amazon.com/gp/product/0471380237/ref=ppx_yo_dt_b_asin_title_o00?ie=UTF8&psc=1), but I'm not sure it's worth it. I strongly suspect, just given what I've seen around the web, that the model for something like product data would be, for lack of a better term, deeply nested/dimensional. (A quick Google search on ER diagrams and dimensionality is teaching me to tread lightly so as to not conflate these things.)
Yes and yes, but this book covers all the same concepts in a very readable but shorter book
Star Schema The Complete Reference https://www.amazon.com/dp/0071744320/ref=cm_sw_r_awdo_CE4YNSVYEHQCF9ERSZXZ
The Kimball Data Warehouse Toolkit is probably the definitive answer, but Star Schema: The Complete Reference does a great job of distilling the concepts into a very readable (and shorter) book. https://www.amazon.com/Schema-Complete-Reference-Christopher-Adamson/dp/0071744320?ref_=d6k_applink_bb_dls&dplnkId=4194aa11-94d8-4bfe-b4a3-6bbb471f4e9c
There is also Agile Data Warehouse Design which is written by ex-Kimball staff applying an Agile approach.
I recommend Kimball first, Corr’s book is very good around asking questions of subject matter experts, and so is well suited to consultants or people new to an industry or employer.
Not sure about a cheat sheet, understanding which tool to use when requires some understanding of distributed systems and their limitations, such as the CAP theorem. This book goes deep on how databases work, getting into the nitty gritty on things like b-trees and index implementation and eventually zooming out to distributed databases. It's a grind but it's an amazingly thorough walk through (at least for someone like me who only had working knowledge of databases prior):
Database Internals: A Deep Dive into How Distributed Data Systems Work https://www.amazon.com/dp/1492040347/ref=cm_sw_r_awdo_BV61XFFBK9HS97W061HG
The canonical Designing Data Intensive Applications by Martin Kleppman is a bit easier to get through and gives a really great base understanding to work from with regards to distributed systems, and examines many different distributed technologies with discussions on their tradeoffs.
Ah man... My Google Fu deserted me. I couldn't find a version called a "Definitive guide...". I wonder, are they possibly published with different titles in different territories?
Hang on... This is the version I have. It seems I may be splitting heirs over titles and subtitles (what a douche 🤦♂️) ...I guess this puts me firmly in the reject pile 😭
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition https://www.amazon.co.uk/dp/1118530802/ref=cm_sw_r_apan_i_VAWAXBY7TRS6VYKPM5FD
I don't think so...
You should give "The Data Warehouse Toolkit" a read.
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition https://www.amazon.com/dp/1118530802/ref=cm_sw_r_apan_i_XM6AVWWD293EBY2GAGZZ
I should add specific to your original question — a great resource and guidance for data analysis using sql and excel is the book by the same title. https://www.amazon.com/Data-Analysis-Using-SQL-Excel/dp/111902143X
https://www.amazon.com/Data-Warehouse-Toolkit-Definitive-Dimensional/dp/1118530802
I saw a similar question in a post and this was recommended and like the poster said it changes how the poster sees databases design and I totally agree. Great book.
FYI, if you’re still in school or recently graduated see if they have access to O’Reilly Media and you can read this and other books for free.
Read this instead if you can: Star Schema The Complete Reference (https://www.amazon.com/dp/0071744320/) but also, make sure you understand dimensional modeling, difference between fact and dimension, and slowly changing dimensions. Its also useful to know what a cube or data mart are. Both books are not really written for a cloud data warehouse audience but an on prem data warehouse audience. So is the job cloud or no? If cloud, add The Informed Company: How to Build Modern Agile Data Stacks that Drive Winning Insights (https://www.amazon.com/dp/1119748003/). You don’t have to read all 3 books just skimming one of the three looking for the concepts mentioned should help.
Does your company have a self service BI tools that would alleviate some of the issues with scheduling? Tableau, Power BI?
These tools can help centralize some of the “reporting” stuff and provide some automation.
What’s the real issue? Data quality? Automation? Data accessibility? Reusability? Sharing?
Here’s a great book in case that’s all your looking for… get at least through the first section.
Star Schema The Complete Reference https://www.amazon.com/dp/0071744320/ref=cm_sw_r_cp_api_glt_i_RBKV51H6C2SPVXB4XHRA
database internals is my favorite on how dbs work, and can be a good base for branching your learning into different areas of DBing
Database Internals: A Deep Dive into How Distributed Data Systems Work https://www.amazon.com/dp/1492040347/ref=cm_sw_r_cp_api_glt_i_G6TT6ZTK7WNKYEKA3FD0
Knowledge of internal workings of a Database is also helpful in design. Check these 2 books,
Star Schema Star Schema The Complete Reference https://www.amazon.com/dp/0071744320/ref=cm_sw_r_cp_api_glt_fabc_EXP3MZQDDBA7A509W7AC. The book also provides subject specific referrals to other books and resources like to Kimball's Data Warehouse Toolkit
This is the golden standard for Kimballs methodologies: https://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247
There are also many blogs explaining the differences between Kimball, Inmon, Data vaults and Snowflakes.A great one is: https://towardsdatascience.com/guide-to-data-warehousing-6fdcf30b6fbe
This is a good book. Shorter than the Kimball book anyways:
https://www.amazon.ca/Agile-Data-Warehouse-Design-Collaborative/dp/0956817203
This is a good answer on the DBA stack exchange:
https://dba.stackexchange.com/questions/45655/what-are-measures-and-dimensions-in-cubes/45669#45669