Steps to Analyse a Data Model

Sue Bayes

February 7, 2022

Understanding your Power BI model.

If you are interested in understanding further how a Power BI data model works, and consequently how to improve your model, I highly recommend SQLBI's Optimising your DAX course.  Sometimes you have reports that are slow and need to work more efficiently.  In order to improve their performance and understand why they are slow, these are some of the steps that I follow:

The very first thing I do is look at the data model to understand some of the following.  What type of model is it?  How are the relationships configured?  How many fields are there in each table?  Is it a Star Schema?  If it's not, it should be.  A Star Schema makes DAX easier to write.  Getting the data model right is critical to how your report runs.  This Introduction to Data Modeling course is free and incredibly helpful.

I also want to understand how the tables and columns relate to the size of the model.  You can do this by opening the Power BI Desktop file and connecting to DAX Studio using External Tools.  External Tools are a useful way to gain additional development capabilities as a Power BI developer.  PowerBI.Tips is the best site if you want to see the different external tools on offer and install them.

When in DAX Studio, click on the Advanced tab and then View Metrics.  Vertipaq Metrics usually opens on its default tab, and you can work through the different tabs at the top to see the size of tables and columns as a starting point.

Look at the columns and check whether they are being used in the model.  If they are not being used, take them out of the model.  It is best practice to ensure that only columns that have a purpose are in the model, but if you start with the biggest columns, that will give the quickest results.  This tool by the Biccountant is very helpful in identifying what is being used within your model.  There is a YouTube session by Imke Feldmann with Reid Havens where the use of the tool is demonstrated.

Also look at the structure of your biggest columns.  Could you store them as two separate columns, using the same principle as Date and Time?   For example, if you split Date and Time into two columns, you have a smaller number of unique values within the date column and a smaller number of unique values within the time column.  Vertipaq is a column-based database and uses compression to reduce the size of the columns and Run Length Encoding to determine how to store the data.  By having a smaller cardinality of data, you can reduce the memory size and make scanning the column quicker.
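As a rough sketch, assuming a fact table called Sales with a single datetime column Sales[OrderDateTime] (the names are invented for illustration), the split could be done with two calculated columns like these – although in practice it is even better to do the split in Power Query or at the source:

Order Date = INT ( Sales[OrderDateTime] )   -- whole-day part; set the data type to Date
Order Time = Sales[OrderDateTime] - INT ( Sales[OrderDateTime] )   -- fraction of the day; set the data type to Time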

Understanding how the engines work.

There are two engines used within Power BI – the Formula Engine and the Storage Engine (Vertipaq).  The Formula Engine runs the more complicated expressions within DAX but is single-threaded, which means it can only do one thing at a time.

The Storage Engine carries out the simpler expressions but is multi-threaded.  This means it can use multiple cores to do many things at the same time.  Queries that are run within the Storage Engine can also be stored as part of the cache, which means the engine doesn't need to repeat that part of the query if you rerun the measure.  This makes it quicker.
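As a simple, made-up illustration of the difference (the table and column names are assumptions, not from the post), the first measure below can be resolved almost entirely by the Storage Engine as a single scan of one column, while the row-by-row IF in the second pulls the Formula Engine into the iteration:

-- Simple aggregation: one Storage Engine scan of Sales[Amount]
Total Sales := SUM ( Sales[Amount] )

-- Conditional logic inside an iterator: extra work for the Formula Engine,
-- often visible as CallbackDataID in DAX Studio's Server Timings
Large Sales := SUMX ( Sales, IF ( Sales[Quantity] > 10, Sales[Amount] ) )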

DAX is not cost-based like SQL Server, which means that the number of rows "generally" doesn't have an impact on performance.  However, if a DAX measure is not great in a smaller model, the problem will be amplified in a larger model.  It is best practice to use a smaller model to troubleshoot.

You can use DAX Studio to analyse how long expressions take to run: open DAX Studio from External Tools, turn on Query Plan and Server Timings, create a query, and then run it with Clear Cache to give a set of results.
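For example, a query along these lines (the measure, table, and column names are placeholders) can be pasted into DAX Studio and run with Clear Cache so that the Server Timings reflect a cold cache:

DEFINE
    MEASURE Sales[Total Sales] = SUM ( Sales[Amount] )
EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Calendar Year],
    "Total Sales", [Total Sales]
)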

NB: If you do not have a dedicated server, then your workstation is the best option.  

If the query runs in under 16ms, you aren't really measuring the query itself but more the surrounding process.  Any measure that is worth analysing and optimising will take longer than 16ms.

 

Why the data size of your column matters.

Vertipaq sorts the data and then compresses it according to an algorithm.  Vertipaq considers two factors when compressing the data – the distribution of the data and the number of unique values.  So for date and time, it is likely that the engine will sort the date first and then compress it, which will give a smaller data size.  Time will then be sorted and compressed after date.  You might think it would be more efficient to compress time first and then date, but in most cases you are more likely to iterate over the date column than the time column, so it makes sense to sort the date first as it will be used more often.  The reality is that you don't need to worry about this; the engine does a good job of finding the most efficient sort order.

Therefore, if you have a column with a high data size that you iterate over many times, the query will take longer.  Your queries will also be a lot slower if you iterate over a table rather than over specific columns within the table.  It is best practice to write your queries specifying only the data you need for that query, ie, fields within a table rather than a whole table, particularly if it's a fact table.
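As a hedged sketch with invented names, the first measure below filters the whole Sales table, so the engine has to work with far more data than the condition needs, while the second filters only the single column the condition actually uses:

-- Filters the entire fact table
Red Sales (table filter) :=
CALCULATE ( [Total Sales], FILTER ( Sales, Sales[Colour] = "Red" ) )

-- Filters just the one column the condition needs
Red Sales (column filter) :=
CALCULATE ( [Total Sales], FILTER ( ALL ( Sales[Colour] ), Sales[Colour] = "Red" ) )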

DAX Studio allows you to measure the Query Plan and the Server Timings.  The Query Plan enables you to understand the logical executions that the query follows, which are then converted into a physical query plan showing the actual executions.  It's complex, but in my mind the logical query plan is the map that you plan to follow and the physical plan is the actual route that you took.  For a more detailed and technical explanation, DAX Query Plans by Alberto Ferrari was useful reading.

DAX Studio will show you the steps that the Vertipaq engine takes to evaluate the query.

When you have the first set of results from Server Timings, save them in Excel.

Look at your steps to see where the time is being taken up, and analyse those query steps.

All the above queries are run in the storage engine, but when analysing your queries you need to consider: are they in the storage engine or the formula engine?  Is there any materialisation happening?

Materialisation is when the engine has to create more rows than the result needs – this can happen with an iteration within an iteration, which runs context transition for each row, or when the engine has to call and join more columns than are required for the result.  You can look at the Server Timings or the Query Plan for each step to understand the number of rows generated and why.  A key optimisation technique is to split your filter conditions into separate ones using CALCULATE, rather than filtering the whole table using FILTER and AND/OR, as sketched below.
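For instance (again with invented names), rather than one FILTER over the whole table combining conditions with AND, each condition can be passed to CALCULATE as its own column filter:

-- One FILTER over the whole table, combining the conditions with AND (&&)
Discounted Red Sales (slow) :=
CALCULATE (
    [Total Sales],
    FILTER ( Sales, Sales[Colour] = "Red" && Sales[Discount] > 0 )
)

-- The same logic split into two separate column filters
Discounted Red Sales (fast) :=
CALCULATE (
    [Total Sales],
    Sales[Colour] = "Red",
    Sales[Discount] > 0
)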

You also need to check whether the formula engine has to keep calling data back from the storage engine (shown as CallbackDataID) – usually as a result of an IF statement.
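One way to reduce those callbacks, sketched here with the same invented Sales names as earlier, is to move the condition out of the row-by-row IF and into a filter that the storage engine can apply directly:

-- IF evaluated for every row of the iterator tends to appear as CallbackDataID
Large Sales := SUMX ( Sales, IF ( Sales[Quantity] > 10, Sales[Amount] ) )

-- The same result expressed as a filter the storage engine can handle on its own
Large Sales (filtered) := CALCULATE ( SUM ( Sales[Amount] ), Sales[Quantity] > 10 )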

Consider the granularity of the data: can you reduce the number of iterations that the evaluation has to take?
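As an illustration (the table, column, and threshold are all invented), if a question only needs answering at day level, you can iterate the dates – at most a few thousand values – instead of every row of the fact table:

-- Iterates one row per date in the current filter context, not one row per sale
Days Over Target :=
COUNTROWS ( FILTER ( VALUES ( 'Date'[Date] ), [Total Sales] > 1000 ) )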

Look for standard optimisation patterns

My takeaways from the course are:

1. Have a star schema (stay away from bi-directional filtering unless you really understand what it does).

2. Only bring in what you need from the source and push your transformations as far upstream as possible.

3. Be efficient with your datatypes and your columns.

4. Optimise your DAX by understanding how the storage engine and formula engine work.

5. Apply filters on columns not tables.

6. Stop when it is good enough.

Any questions, happy to help, but if you've got this far and haven't completed the course, I would highly recommend it.  Link below.

Sue 

