I’m one of the domain experts where I work, so I usually don’t need to interrogate too hard, but I am reading this excellent book on gathering requirements for BI projects.
As a BA I'd expect them to be responsible for requirements gathering. To that end, I'd recommend this:
https://www.amazon.co.uk/dp/0956817203/ref=cm_sw_r_cp_apa_fabc_31q3Fb1TT3YQZ
I'd second the idea of reading Kimball too. Maybe not the whole thing from a technique perspective, but certainly the first 2 chapters which are more high level.
Honestly though, I sometimes feel that data nerds are wired differently and you either get it, or you don't. I've worked with IT professionals and programmers who I'd consider far more gifted and qualified than I am, but data concepts really confuse them, whereas I've always picked them up without much effort.
There is also Agile Data Warehouse Design which is written by ex-Kimball staff applying an Agile approach.
I recommend Kimball first. Corr’s book is very good at teaching you to ask questions of subject-matter experts, so it's well suited to consultants or people new to an industry or employer.
I would recommend this book if you want to understand the creation process better https://smile.amazon.co.uk/Agile-Data-Warehouse-Design-Collaborative/dp/0956817203/ref=sr_1_1?crid=3BXBGVUZN8KZF&dchild=1&keywords=agile+data+warehouse+design&qid=1596777778&sprefix=agile+data+wa%2Caps%2C142&sr=8-1
At a very high level, you would have a set of data sources; the degree to which they can be considered structured determines the ETL processes you need to carry out.
You would need to identify the "fact", i.e. the entity you want to measure, which would be a table containing a set of attributes:
sales_fact(sales_fact_id, sum_sales, gross_profit, total_cost, net_profit)
The sales fact would be affected by a set of dimensions; these represent a point of view on the fact you want to measure:
sales_area(country_id, country, region, city); sales_representative(representative_id, name, country); promotions(promotion_id, promotion_name, ... date)
As you can see, these are denormalized: country is present in both sales_area and sales_representative.
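To make the shorthand concrete, here's what those tables might look like as actual DDL. This is only a sketch under my own assumptions: the column types, surrogate keys, and foreign keys are mine, since the shorthand leaves them implicit (promotions omitted for brevity).

```sql
-- Hypothetical DDL for the shorthand above; types and keys are assumptions.
CREATE TABLE sales_area (
    sales_area_id INT PRIMARY KEY,
    country_id    INT,
    country       VARCHAR(100),
    region        VARCHAR(100),
    city          VARCHAR(100)
);

CREATE TABLE sales_representative (
    representative_id INT PRIMARY KEY,
    name              VARCHAR(100),
    country           VARCHAR(100)  -- deliberately repeated from sales_area
);

CREATE TABLE sales_fact (
    sales_fact_id     INT PRIMARY KEY,
    sales_area_id     INT REFERENCES sales_area (sales_area_id),
    representative_id INT REFERENCES sales_representative (representative_id),
    sum_sales         DECIMAL(12,2),
    gross_profit      DECIMAL(12,2),
    total_cost        DECIMAL(12,2),
    net_profit        DECIMAL(12,2)
);
```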
The stages would be:
Connect to the data source(s), perform the needed transformations, and load the results into the destination tables. If your data needs a lot of cleaning I would use Talend, Pentaho or similar.
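As a sketch of the transform-and-load step in plain SQL, assuming the extract phase has already landed rows in a staging table (every table and column name here is hypothetical):

```sql
-- Hypothetical load step: transform staged rows and insert into the fact table.
INSERT INTO sales_fact (sales_fact_id, sales_area_id, representative_id,
                        sum_sales, gross_profit, total_cost, net_profit)
SELECT s.sale_id,
       a.sales_area_id,
       r.representative_id,
       s.amount,
       s.amount - s.cost,              -- gross profit derived during transform
       s.cost,
       s.amount - s.cost - s.overhead  -- net profit after overhead
FROM stg_sales s
JOIN sales_area a ON a.city = s.city
JOIN sales_representative r ON r.representative_id = s.rep_id;
```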
For one course I did, we used the AirBnB Berlin dataset: we had to explore the data, identify the relevant dimensions, create the logical and physical designs for the tables, then design the ETL processes using Pentaho.
Finally we had to create cubes in Visual Studio to perform OLAP analysis and use Power BI to create visualizations (in my case I also connected the database to R to perform further data exploration).
I hope this helps
Thanks for the reply!
Thanks for all the sweet links! It doesn't look like the aggregation Designer link works but I'll poke around to find it.
Does the workbench build a schema for you? I'm looking through the docs atm
Thanks again!
Generally, if you want to analyze data, you want to put it into dimensional form, which is highly denormalized.
Ideally, this would end up on its own server machine - your data warehouse - which you would then query with Power BI or what have you.
But you can also do the same in your regular database server, using separate tables or materialized views. Only do this if your regular database server is not heavily used.
You don't see much regression analysis in-db, but it's certainly an option. For analyzing data, you want to focus on analytic SQL: GROUP BY, aggregate functions, HAVING, window functions. You also want to learn how to load Slowly Changing Dimensions Type 2, which can be tricky.
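For a taste of that analytic SQL, here's a minimal sketch against a hypothetical sales table (all names are mine): monthly revenue per region, with a window function ranking regions within each month.

```sql
-- Monthly revenue per region, plus each region's rank within the month.
SELECT region,
       DATE_TRUNC('month', sold_at) AS month,
       SUM(amount)                  AS revenue,
       RANK() OVER (PARTITION BY DATE_TRUNC('month', sold_at)
                    ORDER BY SUM(amount) DESC) AS region_rank
FROM sales
GROUP BY region, DATE_TRUNC('month', sold_at)
HAVING SUM(amount) > 0;  -- drop empty region/month combinations
```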
Agile Data Warehouse Design is a good book to start with
https://www.amazon.ca/Agile-Data-Warehouse-Design-Collaborative/dp/0956817203 is a good book regarding collaborative dimensional schema design
This is a good book. Shorter than the Kimball book, anyway:
https://www.amazon.ca/Agile-Data-Warehouse-Design-Collaborative/dp/0956817203
This is a good answer on the DBA stack exchange:
https://dba.stackexchange.com/questions/45655/what-are-measures-and-dimensions-in-cubes/45669#45669
I'm also a beginner to data engineering (currently a senior software engineer), but here's what I've learned so far -- there are 2 layers in your data stack where you can add SCD; each solves different problems and requires a different implementation.
For all of the below, I will discuss Type 2 SCD, which (like others in this thread have said) seems like the most standard way to add "history" to your data.
1) SCD for your data model (i.e. dimension tables). This helps answer questions like "our revenue per user is X, how does that compare with last month? last year?" Let's say you run your data models in a daily batch; then you'll append 1 new row per day and update the `end_at` timestamp on the previous row (a sketch of this pattern follows below).
I usually argue "don't overengineer something; only build it when you need it", but in this area most people can predict that these historical questions will come, so it's worth adding now (because it's very difficult to add later). Lawrence Corr's Agile Data Warehouse Design book pretty much says the same thing.
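A minimal sketch of that daily close-and-append pattern, assuming a dim_users dimension with start_at/end_at columns and a staging snapshot stg_users (all names, and the zipcode attribute, are hypothetical):

```sql
-- Close out the current row for users whose tracked attributes changed today.
UPDATE dim_users d
SET    end_at = CURRENT_DATE
FROM   stg_users s
WHERE  d.user_id = s.user_id
  AND  d.end_at IS NULL
  AND  d.zipcode <> s.zipcode;   -- compare whichever attributes you track

-- Append a fresh "current" row for users with no open row left
-- (both newly-changed users and brand-new users).
INSERT INTO dim_users (user_id, zipcode, start_at, end_at)
SELECT s.user_id, s.zipcode, CURRENT_DATE, NULL
FROM   stg_users s
LEFT JOIN dim_users d
       ON d.user_id = s.user_id AND d.end_at IS NULL
WHERE  d.user_id IS NULL;
```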
2) SCD for your source/raw models (i.e. the data coming from your transactional database). This helps you answer questions like "when X event happened, what was the data in Y tables?"
A better example: "I wonder how our purchases break down by zipcode." If you have Type 1 SCD data from your MySQL/Postgres database, all you can answer is "what is each user's zipcode right now?" If the purchase was made 2 months ago, you have no idea what zipcode that user lived in 2 months ago.
This is often harder to implement and is typically done by an "ingestion vendor". Fivetran has "History mode" and Airbyte has "Incremental Sync - Deduped History". There are slight differences in how they implement them, but they are very similar. This is often implemented with "Change Data Capture" (CDC), where each change to a database row means a new row in your SCD table. For example: if a user changes their "users.first_name" 5 times in one minute, you would get 5 more rows in the "users" SCD table. A point-in-time query against such a table is sketched below.
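Here's what that point-in-time query might look like, assuming a users_history table with start_at/end_at validity columns maintained by the ingestion vendor (all names hypothetical):

```sql
-- For each purchase, pick the user row that was valid when the purchase happened.
SELECT u.zipcode,
       COUNT(*) AS purchases
FROM purchases p
JOIN users_history u
  ON u.user_id = p.user_id
 AND p.purchased_at >= u.start_at
 AND (p.purchased_at < u.end_at OR u.end_at IS NULL)  -- NULL = current row
GROUP BY u.zipcode;
```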
Note: DataCoral adds columns which are very similar to Type 2 SCD, but is missing the "end_at" column which tells you when one row was superseded by another. I haven't tried to use these columns as a Type 2 SCD (to do "point in time" queries, PIT), but I *think* you should be able to replicate the "end_at" PIT logic with a window function.
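That window-function approach would look roughly like this, assuming the vendor gives you one row per change with a change timestamp (all table and column names hypothetical):

```sql
-- Derive a synthetic end_at: each row ends where the next version of the
-- same user begins (NULL for the current row).
SELECT user_id,
       zipcode,
       changed_at AS start_at,
       LEAD(changed_at) OVER (PARTITION BY user_id
                              ORDER BY changed_at) AS end_at
FROM users_changes;
```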
Orthogonal: when talking about "history", it is also possible your data warehouse has "time travel" support. Snowflake allows you to pass a specific timestamp to a query, and the query will return results as the data looked at that time. While this is interesting, I label it as orthogonal because it solves a different set of problems. On the standard plan, time travel only goes back 1 day, and on the enterprise plan it goes back 90 days. It's very likely any "historical query" will require you to look back further than 90 days, so time travel isn't the right solution for that problem.
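For reference, a Snowflake time-travel query looks roughly like this (the table name is made up; AT(TIMESTAMP => ...) is the Snowflake clause as I understand it):

```sql
-- Query the table as it looked at a specific point in time.
SELECT *
FROM orders
AT(TIMESTAMP => '2022-06-01 00:00:00'::timestamp_tz);
```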
Hope this helps!
I had good experience with this book:
https://www.amazon.com/Agile-Data-Warehouse-Design-Collaborative/dp/0956817203
It also introduces some Kimball ideas and basic star schema and snowflake schema concepts.
Is this the one you mean? https://www.amazon.com/Agile-Data-Warehouse-Design-Collaborative/dp/0956817203
OLAP is all about analyzing data.
OLAP is a set of operations you can do against a set of data, such as slicing, dicing, pivoting.
You don't need a database to do OLAP; for example, you can do OLAP operations against a flat file using MS Excel PivotTables.
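If it helps to see those operations outside of Excel, here's a sketch in SQL against a hypothetical sales table (all names are mine): a slice is just a filter on one dimension, and dicing/rolling up is a GROUP BY.

```sql
-- "Slice": fix one dimension (year = 2020), then "dice" across the rest.
SELECT region,
       product,
       SUM(amount) AS total_sales
FROM sales
WHERE year = 2020
GROUP BY ROLLUP (region, product);  -- subtotals per region, plus a grand total
```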
Some stackexchange answers here:
http://dba.stackexchange.com/questions/45655/what-are-measures-and-dimensions-in-cubes
http://stackoverflow.com/questions/18916682/data-warehouse-vs-olap-cube
This is a good book: http://www.amazon.com/Agile-Data-Warehouse-Design-Collaborative/dp/0956817203