DAX Optimizations: Write it like the DAX calls it

March 5, 2019 at 10:01 am

I don’t understand how having data in the model that you are not using affects performance. If it is not used in a calculation, how can it slow things down? I get that it takes up space (memory).

The hardest thing I contend with is high cardinality. I have to have TRANSACTION ID. Distinct counts are killing me.
The only solution I have (and am testing) is splitting the FACT tables and linking them on the primary key, which enables me to use COUNTROWS.
But you take a hit on the split too.
Initial testing shows that splitting is still faster.
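
The split described above might look roughly like the following pair of measures. This is only a sketch: the table names SalesDetail and SalesHeader and the measure names are hypothetical, and it assumes SalesHeader holds exactly one row per transaction and sits on the one side of a relationship to SalesDetail.

```dax
-- Before the split: distinct count over the high-cardinality key
-- in the detail-level fact table (hypothetical table SalesDetail).
Transaction Count (distinct) =
DISTINCTCOUNT ( SalesDetail[Transaction ID] )

-- After the split: the hypothetical SalesHeader table has one row
-- per transaction, so counting rows gives the transaction count.
Transaction Count (rows) =
COUNTROWS ( SalesHeader )
```

One caveat with this design: by default a relationship filters from the one side to the many side, so a filter placed on a SalesDetail column will not reduce the rows of SalesHeader unless the relationship filters in both directions or the measure passes the detail table as a filter.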

March 12, 2019 at 7:49 am

Hello Fred. I certainly feel your pain. I’ve had transaction-level requirements too. You can try converting your transaction ID to a number data type (if you haven’t already done so); that may help. Number data types generally put less strain on the compression engine than text. Something else you’ve probably thought of already is time scoping to reduce the number of rows, so that your solution matches the requirement exactly. A final suggestion is grouping on some other attribute. For example, if counts of transactions are what you need, maybe you could live with counts by account? Just a couple of suggestions. Good luck!

May 14, 2019 at 3:11 am

Hi Fred (and Matthew)
I totally agree with Matthew that bringing unneeded data into your model might have an impact on performance. I just thought of another example that neither of you covered. It is very intuitive to think: if the data is not used, how could it ever slow down performance? Let me give an example:

If you bring in a high-cardinality column that you don’t need, you can run into one particular issue. When you write a DAX measure, FILTER(tablename, expression) is a very common function to use. If you pass FILTER a table that contains the high-cardinality column and evaluate some expression for each row of that table, which columns do you think are being filtered by FILTER: the columns referenced in your expression parameter, or the entire table with all its columns? The answer is the latter, unless you use another table function, e.g. VALUES, as the table parameter. Filtering the whole table puts a filter on all the columns of that table, including your high-cardinality column, and that might have an impact on performance. It is a small detail, but that is often where the devil hides.

Wouldn’t you agree, Matthew?
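
To make the FILTER point concrete, here is a minimal sketch of the two patterns. The table Sales, its columns Amount and Quantity, and the measure names are all hypothetical and not taken from the article.

```dax
-- Table filter: FILTER iterates the whole Sales table, and the result
-- places a filter on every column of Sales, including any
-- high-cardinality column such as a transaction ID.
Large Sales (table filter) =
CALCULATE (
    SUM ( Sales[Amount] ),
    FILTER ( Sales, Sales[Quantity] > 10 )
)

-- Column filter: FILTER iterates only the visible distinct values of
-- Sales[Quantity], so the resulting filter touches that one column.
Large Sales (column filter) =
CALCULATE (
    SUM ( Sales[Amount] ),
    FILTER ( VALUES ( Sales[Quantity] ), Sales[Quantity] > 10 )
)
```

In typical cases both measures should return the same number; the difference is in how much the engine has to iterate and filter, so it is worth comparing the two in a query profiler on your own model rather than assuming a gain.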

March 5, 2019 at 10:16 am

This was extremely helpful. As I learn to build more and more complicated formulas, and my Power BI reports begin to slow down, I have always wondered about the “calculational overhead” effects of using them.

March 12, 2019 at 7:50 am

Thanks, Bill. Appreciate the comment!

March 5, 2019 at 12:56 pm

Hello Matt, great article! I am pondering the first section: “Cardinality- Put simply, the more unique choices in a column, the greater the cardinality. For example, if there is a “yes/no” attribute, the cardinality of that column is low. If there is a unique transaction number for each of 100 million rows, cardinality is high.” and wondering if that applies to a data model that I have. Table A is related to Table B. In Table A, I have an added column, =RELATED(...), which delivers the value from the related row of Table B. Would you classify this as HIGH cardinality? What about =COUNTROWS(RELATEDTABLE(...)), is that also HIGH? I use these all the time and never really thought much about it…

March 12, 2019 at 7:55 am

Hi John. Thanks for your comment and nice words! In general, I stay far away from RELATED–it *could* increase cardinality depending on what you’re relating, but the strain on the calculation engine is most concerning for me. When I feel the need to use RELATED, I usually go for a snowflake rather than star schema data model (so I have dimensions of dimensions). My article on income statement modeling touches on this. When you get into the section about header and subheader tables, these are examples of tables that would be typical RELATED candidates.
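
For readers following along, the two calculated columns in John’s question might look like the sketch below. TableA and TableB come from his description; the column names Category, Category From B, and Rows In A are hypothetical, and the sketch assumes TableB is on the one side of the relationship.

```dax
-- Calculated column on TableA (many side): pulls a value from the
-- related row of TableB. Its cardinality equals the number of distinct
-- values it ends up holding, which is at most that of TableB[Category].
Category From B = RELATED ( TableB[Category] )

-- Calculated column on TableB (one side): counts the related TableA rows.
-- Its cardinality is the number of distinct counts that occur.
Rows In A = COUNTROWS ( RELATEDTABLE ( TableA ) )

-- Measure to check how many distinct values a calculated column holds.
Category From B Cardinality =
DISTINCTCOUNT ( TableA[Category From B] )
```

So whether such a column is “high” cardinality depends on the values it produces, not on the function used to produce them.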

March 5, 2019 at 10:05 pm

Nice article Matt! Here’s a slightly cleaner DAX trick for you.

IN can be used instead of many || statements:

CALCULATE (
    [GL Amt (Correct Signs) Act],
    Headers[Header] IN
        { "Cost of Sales",
          "Other Income & Expense",
          "Depreciation & Amortization",
          "Interest Income/Expense",
          "Taxes" }
)

Just wrap the list of arguments in curly braces!

SQLBI article here: https://www.sqlbi.com/articles/the-in-operator-in-dax/
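
For comparison, the same filter written with || might look like the following. The measure name is hypothetical, and the article’s original formula may have been structured differently.

```dax
-- Equivalent filter using || instead of IN.
-- Each comparison repeats the column reference, which is what
-- makes long lists hard to read.
GL Amt Below GP (OR version) =
CALCULATE (
    [GL Amt (Correct Signs) Act],
    Headers[Header] = "Cost of Sales"
        || Headers[Header] = "Other Income & Expense"
        || Headers[Header] = "Depreciation & Amortization"
        || Headers[Header] = "Interest Income/Expense"
        || Headers[Header] = "Taxes"
)
```

Both forms filter the same single column, so the choice is mainly about readability.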

March 11, 2019 at 7:54 pm

You what!!! Where has this been hiding? Thanks @Chris H

March 12, 2019 at 7:58 am

Hi Chris. Your comment was a game changer for me. I have a model that was completely OR’d out. I haven’t tested if there was a performance improvement, but it certainly made my formulas more readable. I’m training people up using this particular model too and the ORs can make your eyes cross. Thanks again! Much appreciated.

March 11, 2019 at 11:26 am

I love the straightforward approach in this article! Thank you for a clean and concise case study in DAX optimization.

March 12, 2019 at 7:58 am

Thanks for the comment, Jeff. Much appreciated.

March 11, 2019 at 4:51 pm

Great post, thanks!

March 12, 2019 at 7:59 am

Thanks for the comment, Maxim. Much appreciated!

March 17, 2019 at 8:08 am

Hi Matthew,
Quick question: for the performance part, I assume you cleared the cache before each execution, right? (I would do so…)
I’m asking because of the following concern: caching. If I have a full income statement showing every step, the individual elements [GP Act] - [OI&E Act] - [D&A Act] - [II&E Act] - [Taxes Act] would be calculated and, if I’m not mistaken, put in the cache. Afterwards, NI Act would use the cache directly, making the calculation pretty much instantaneous instead of having to re-evaluate the formula.
Am I right in thinking that, in the scenario I have described, your approach would be less efficient (in real-world use, with caching, even if it is the first time the model is loaded)?
Thanks a lot
Martin

March 22, 2019 at 10:30 am

Can you elaborate on caching? If I open a report for ALL YEARS with a high-cardinality distinct column, it takes 60 seconds.
Is the cache shared with the next user who runs it, or is it only for the session of the person who ran it?
When does the cache go away? When a new Power BI report is saved over the original? When deploying a change from Visual Studio to the cube? When the cube is refreshed?

If the cache can be shared, is there a way to run a report without user intervention, so that when users get to work they can take advantage of it?

April 1, 2019 at 11:01 pm

No performance increase (or decrease) using IN. Not sure if it’s “syntax sugar”, but it sure looks prettier…

April 2, 2019 at 5:12 am

Hi, I have a similar case (high-cardinality column). I can answer only for SSAS: as far as I know, the cache is shared as long as there is enough memory allocated to SSAS, so the second time I or another user runs the query, it runs fast. After some time, or if there are many other heavy queries that need memory, that first cache can be flushed out (to make space for new data).
Going away: on SSAS, it certainly goes away on table/partition processing and on modifications of the model (including deploys).
Running a report in advance / warming the cache for a specific query… hmmm… maybe you can capture the DAX queries requested by your report and run them in advance, right after you refresh your cube?
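
As a rough illustration of that idea, a captured query for a distinct-count visual might resemble the one below. The table and column names ('Date'[Year], Sales[Transaction ID]) are hypothetical; the real query text should be captured from the report itself, for example with a profiler trace or the report’s performance tooling.

```dax
-- Hypothetical warm-up query: computes the expensive distinct count
-- by year, as a report visual might request it.
EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Year],
    "Distinct Transactions", DISTINCTCOUNT ( Sales[Transaction ID] )
)
```

A scheduled job could send such queries to the server after each refresh. Whether later users actually benefit depends on the cache behavior described above, so treat this as something to test rather than a guaranteed fix.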
