I recently wrote a data model to answer some questions about retail media sales. In this post I describe the approach to data models for analytics. OLTP relational databases are highly normalized, simplifying data updates and reducing data redundancy. Providing business reports from an OLTP databae can be done, but need complex SQL queries to join and filter data.
With the goal to answer business questions in mind, the data can be modeled in a way that lends itself to that type of analytics. Put the data together around the measures that you want to query. Analytical models make it fast and efficient to aggregate data and provides a stable dataset for reporting. The measures we are interested in become the fact tables and everyting else is built around it. The relations associated with the measures become the dimension tables. This is the Star schema -- one fact table at the center and add a single join to each of the dimensions.
I chose a fact table that commonly lends itself to analytics questions, the sale of a single item. With this dataset, the invoice line item is the measure around which I want to analyze. The idea of a fact table defines the finest grain of detail and forms a base for what questions can be answered. The fact table holds foreign keys to data related to the sale of a single item, such as the purchase customer, the name of the product sold, the product category, and the date it was sold.
Data modeling is the most important skill to have. It requires practice planning and designing models to answer business questions. Poor data models can often hinder and entire project. It's not at all intuitive how to build them. However you need resources to teach how to overcome common issues you might face, yet there still can be major hurdles. Data modeling skills are a great asset that is valuable to the projects you are a part of.
A key idea in analytical modeling is a surrogate key to decouple the dataset from the source database by not using the primary key from the source and defining your own. Important when major changes come about in a system, as they do from time to time. Another key ideas is Slowly Changing Dimensions and the solutions to build in place to solve them. I can just say that SCD is the way you handle data when records are updated in a way that changes the reporting you designed for. An example, imagine what happened if a customer moved from Connecticut to Chicago and business reporting was interested regional data. Now their transaction history is incorrectly associated with Midwest instead of Northeast.
Preview Analytics tables