Complete SHAP tutorial for model explanation Part 4. TreeSHAP

Summer Hu
4 min read · Jan 2, 2021


Building up TreeSHAP intuition and introducing Shapley value calculation in a tree-based model

Nazca lines, downloaded from https://vistapointe.net/
  1. Part 1. Shapley Value
  2. Part 2. Shapley Value as Feature Contribution
  3. Part 3. KernelSHAP
  4. Part 4. TreeSHAP
  5. Part 5. Python Example

TreeSHAP specializes in calculating feature contributions, i.e. feature Shapley values, for an instance of a tree-based model. In this part, we will build up the intuition behind TreeSHAP.

In the big picture, TreeSHAP still calculates Shapley values, so I think the best way to understand TreeSHAP is to start from the basic Shapley value calculation and evolve it step by step into TreeSHAP. Let’s start evolving.

Adapting Shapley value calculation to the tree model context

Below is the simple Shapley value calculation example from Part 1, which I believe we all understand by now.

Let’s adapt some of the example’s description to the tree model context:

  1. Treat the human resources as model features, so we have 3 features: Allan, Bob and Cindy
  2. Assume the trained tree model can calculate the output profit for every coalition (in the RED BOX)

Then, based on the two conditions above, we can calculate each feature’s Shapley value just like in the human resource example, right?

But there is one roadblock: in the tree model, how does the tree estimate the output profit values in the RED BOX? The tree expects all 3 features as input, so how do we handle the missing features?

TreeSHAP Prediction on Feature-Missing Instances

To illustrate how TreeSHAP predicts the tree output for instances with missing features, I constructed the following tree structure for explanation purposes. The tree has 3 features (x, y, z) and 100 training samples in total.

My purpose here is to show how to predict a feature-missing instance according to the TreeSHAP rules, so I will use some specific coalition values for the demonstration:

1. For coalition (x=15), the tree goes through nodes 1 and 3 to node 6. Unfortunately node 6 is not a leaf node, but it has two leaf children (12, 13), so the prediction is the weighted average of the two leaf children, where each weight is that leaf’s proportion of the node’s training samples.

coalition(x=15) prediction = 15*(10/30) + 10*(20/30) = 350/30 ≈ 11.67

2. For coalition (x=5), the tree goes through node 1 to node 2. Again, node 2 is not a leaf node, and node 2’s children (4, 5) are not leaf nodes either, so we keep going deeper until we reach the leaf nodes (8, 9, 10, 11) and calculate the weighted average of these leaves.

coalition(x=5) prediction = 10*(15/60) + 20*(5/60) + 5*(30/60) + 8*(10/60) = 480/60 = 8

3. For coalition (x=5, z=10), compared with coalition (x=5), which reaches leaves (8, 9, 10, 11), leaf 8 is now unreachable because z = 10 < 20, so it is removed and the weights are renormalized over the remaining leaves.

coalition(x=5,z=10) prediction = 20*(5/45) + 5*(30/45) + 8*(10/45) = 330/45 ≈ 7.33
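The three coalition predictions above can be reproduced with a small sketch. The tree below is a hypothetical reconstruction of the example: the node numbers, leaf values and sample counts come from the article, but the y thresholds and leaf 7’s value are made-up placeholders (the article never specifies them, and they do not affect the three coalitions checked here).

```python
# Hypothetical reconstruction of the example tree. Leaf values and sample
# counts match the article; the y thresholds and leaf 7 are assumptions.
tree = {
    1:  {"feature": "x", "threshold": 10, "left": 2,  "right": 3,  "samples": 100},
    2:  {"feature": "y", "threshold": 50, "left": 4,  "right": 5,  "samples": 60},
    3:  {"feature": "x", "threshold": 20, "left": 6,  "right": 7,  "samples": 40},
    4:  {"feature": "z", "threshold": 20, "left": 9,  "right": 8,  "samples": 20},
    5:  {"feature": "y", "threshold": 80, "left": 10, "right": 11, "samples": 40},
    6:  {"feature": "y", "threshold": 30, "left": 12, "right": 13, "samples": 30},
    7:  {"value": 12, "samples": 10},   # assumed leaf, never reached below
    8:  {"value": 10, "samples": 15},
    9:  {"value": 20, "samples": 5},
    10: {"value": 5,  "samples": 30},
    11: {"value": 8,  "samples": 10},
    12: {"value": 15, "samples": 10},
    13: {"value": 10, "samples": 20},
}

def coalition_prediction(tree, coalition):
    """Predict with missing features: follow the split when its feature is
    in the coalition; otherwise descend into both children. The reachable
    leaves are then weighted by their renormalized sample counts."""
    leaves = []
    def walk(node_id):
        node = tree[node_id]
        if "value" in node:                     # leaf node
            leaves.append(node)
        elif node["feature"] in coalition:      # feature present: follow split
            if coalition[node["feature"]] < node["threshold"]:
                walk(node["left"])
            else:
                walk(node["right"])
        else:                                   # feature missing: keep both branches
            walk(node["left"])
            walk(node["right"])
    walk(1)
    total = sum(leaf["samples"] for leaf in leaves)
    return sum(leaf["value"] * leaf["samples"] for leaf in leaves) / total
```

With this tree, `coalition_prediction(tree, {"x": 15})` gives 350/30, `{"x": 5}` gives 8, and `{"x": 5, "z": 10}` gives 330/45, matching the three hand calculations.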

Once we know how to calculate the prediction for a feature-missing instance in a tree, we can calculate marginal contributions. For example:

Marginal contribution of z = prediction(x=5, z=10) − prediction(x=5)
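Plugging in the two coalition predictions from the worked example above, the marginal contribution of z works out directly:

```python
# Coalition predictions from the worked example
pred_x  = 480 / 60    # coalition (x=5)        -> 8.0
pred_xz = 330 / 45    # coalition (x=5, z=10)  -> about 7.33

# Marginal contribution of z when joining coalition {x}
marginal_z = pred_xz - pred_x   # about -0.67: adding z lowers the prediction
```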

Please refer back to the human resource example at the beginning, as well as Parts 1 and 2; I believe you will then know how to put everything together and calculate the Shapley value for each feature.
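Putting everything together: once a value function v(S), the coalition prediction, is available, the standard Shapley formula from Part 1 applies unchanged. Below is a sketch with a hypothetical value function; the coalition predictions are illustrative numbers only, not taken from the example tree.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values via the subset formula:
    phi_i = sum over S not containing i of
            |S|! * (n-|S|-1)! / n! * (v(S + {i}) - v(S))."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        contrib = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                s = frozenset(subset)
                contrib += weight * (v(s | {p}) - v(s))
        phi[p] = contrib
    return phi

# Hypothetical coalition predictions for features x, y, z
# (illustrative numbers only):
preds = {
    frozenset(): 9.0,
    frozenset({"x"}): 8.0, frozenset({"y"}): 10.0, frozenset({"z"}): 9.5,
    frozenset({"x", "y"}): 8.5, frozenset({"x", "z"}): 7.3,
    frozenset({"y", "z"}): 10.2,
    frozenset({"x", "y", "z"}): 7.5,
}
phi = shapley_values(["x", "y", "z"], preds.__getitem__)
# Efficiency property: the contributions sum to v(full) - v(empty)
```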

Special Note for ML Feature Contribution

Below is the common ML feature contribution formula (refer to Part 2): the prediction decomposes as f(x) = φ0 + φ1 + … + φM, so the sum of all feature contributions is offset by the model average prediction E(f(X)).

From https://christophm.github.io/interpretable-ml-book/shapley.html

We always use φ0 to denote E(f(X)), and E(f(X)) is a constant given a trained model and its training dataset.

Because of this offset, for any feature ordering, the first feature’s marginal contribution is the prediction with only that feature present minus E(f(X)).

So, in summary, the feature contributions for an ML instance actually explain the difference between the instance’s prediction and the training dataset’s average prediction.

Again, I want to emphasize that Shapley values, i.e. feature contributions, are computed per instance.

Conclusion

In this part, I aimed to build up the intuition of TreeSHAP and show how to calculate feature contributions, i.e. Shapley values, for a trained decision tree. For an ensemble tree model, we combine the per-tree feature contributions in the same way the ensemble combines its trees’ outputs: averaged for a random forest, summed for boosted trees. The TreeSHAP theory is published in the paper Consistent Individualized Feature Attribution for Tree Ensembles; please refer to it for further details. In Part 5, I will explore the SHAP Python library and build some examples.
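The ensemble step can be sketched as follows, with hypothetical per-tree SHAP values and assuming an ensemble that averages its trees’ outputs (e.g. a random forest):

```python
# Hypothetical per-tree SHAP values for one instance of a two-tree forest
tree_phis = [
    {"x": 1.0, "y": -0.5, "z": 0.2},
    {"x": 0.6, "y": -0.1, "z": 0.0},
]

# A random forest averages tree outputs, so per-feature contributions are
# averaged the same way (a boosted ensemble would sum them instead).
ensemble_phi = {
    feat: sum(phis[feat] for phis in tree_phis) / len(tree_phis)
    for feat in ("x", "y", "z")
}
```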

REFERENCES

  1. Interpretable Machine Learning: https://christophm.github.io/interpretable-ml-book/shap.html
  2. A Unified Approach to Interpreting Model Prediction: https://arxiv.org/abs/1705.07874
  3. Consistent Individualized Feature Attribution for Tree Ensembles: https://arxiv.org/abs/1802.03888
  4. SHAP Part 3: Tree SHAP: https://medium.com/analytics-vidhya/shap-part-3-tree-shap-3af9bcd7cd9b
  5. PyData Tel Aviv Meetup: SHAP Values for ML Explainability — Adi Watzman: https://www.youtube.com/watch?v=0yXtdkIL3Xk
  6. The Science Behind InterpretML- SHAP: https://www.youtube.com/watch?v=-taOhqkiuIo
  7. Game Theory (Stanford) — 7.3 — The Shapley Value : https://www.youtube.com/watch?v=P46RKjbO1nQ
  8. Understanding SHAP for Interpretable Machine Learning: https://medium.com/ai-in-plain-english/understanding-shap-for-interpretable-machine-learning-35e8639d03db
  9. Kernel SHAP: https://www.telesens.co/2020/09/17/kernel-shap/
  10. Understanding the SHAP interpretation method: Kernel SHAP: https://data4thought.com/kernel_shap.html
