Getting started on decision trees

Machine Learning Brainweights
Dec 19, 2019


How decision trees work, training a decision tree model, and visualizing and making predictions

Decision tree animation: "Hey there, I am a decision tree, nice to meet you :)"

“What exactly are decision trees?”

A decision tree is a versatile machine learning algorithm with a flowchart-like structure: its branches continuously split the data into smaller sets, showing all possible outcomes of an event, and the resulting tree can be used to make predictions. When you think of a decision tree, this is what should come to mind.

What a decision tree looks like; the nodes are A, B, C, D, E and F

It is worth noting that decision trees are very powerful machine learning algorithms capable of fitting complex data sets. They can be used for both regression and classification problems, and they also form the fundamental components of random forests. Let's dive in and see how this super algorithm works.

First we have to understand some basic terms used when referring to the different types of nodes within a decision tree: the root node, internal/interior nodes and leaf nodes. The root node is the topmost node. An internal node represents an attribute or 'test' (e.g. whether a dice roll comes out as 1, 2, 3, 4, 5 or 6), and each branch represents an outcome of the test. A leaf node represents a label, an outcome after all attributes have been computed. A leaf node has arrows pointing towards it but no arrows pointing away from it. The diagram below will help you understand.

How decision trees work

Given tabulated data, it is interesting to understand how the algorithm formulates the tree that is used to make decisions. Assume we tabulated data on whether a person is to be classified as good or not good, and a decision tree was to be fitted. Consider the table below.

Given a data set like this, the decision tree algorithm begins by figuring out which attribute should be the root node (the node at the top of the tree), and it does this by computing the "Gini impurity". Gini impurity is simply a measure of how likely a random example would be classified incorrectly if it were labelled according to the distribution of class labels in the data set. This means that the smaller the Gini impurity value for a particular attribute, the more suitable it is as the root node. The Gini impurity formula is given by:

Gini impurity = 1 - (probability of "YES")² - (probability of "NO")²
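As a minimal Python sketch of that formula (the function name and example labels are my own, not from the article), the impurity of a group of labels can be computed like this:

def gini_impurity(labels):
    """Gini impurity of a list of class labels, e.g. ["YES", "NO", ...]."""
    total = len(labels)
    if total == 0:
        return 0.0
    impurity = 1.0
    for cls in set(labels):
        p = labels.count(cls) / total
        impurity -= p ** 2
    return impurity

# A perfectly pure group has impurity 0, a 50/50 split has impurity 0.5
print(gini_impurity(["YES", "YES", "YES"]))       # 0.0
print(gini_impurity(["YES", "NO", "YES", "NO"]))  # 0.5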

Suppose we want to determine the root node. We will look at how well each attribute predicts the target variable (whether a person is good or not), and Gini impurity is used for this. We'll begin by calculating the Gini impurity for the attribute "Generous". Consider the diagram below.

In the diagram above, the main concern is how generosity separates people into good or not good, using a case study in which 303 people were observed.

The Gini impurity for the attribute "Generous" is calculated by finding the Gini impurity for both leaf nodes separately and then taking the weighted average of the two, as follows:

For each leaf node: Gini impurity = 1 - (probability of "YES")² - (probability of "NO")²

Part 1

Gini impurity(True) = 1 - (number of "YES" answers in the True leaf / total in the True leaf)² - (number of "NO" answers in the True leaf / total in the True leaf)²

Gini impurity(True) = 1 - (105 / (105 + 39))² - (39 / (105 + 39))² = 0.395

Part 2

Gini impurity(False) = 1 - (number of "YES" answers in the False leaf / total in the False leaf)² - (number of "NO" answers in the False leaf / total in the False leaf)²

Gini impurity(False) = 1 - (34 / (34 + 125))² - (125 / (34 + 125))² = 0.336

The weighted average will be:

weighted average = Gini impurity of the True leaf node × (number of people who answered True / total number of people in the study, both True and False) + Gini impurity of the False leaf node × (number of people who answered False / total number of people in the study, both True and False)

To simplify it:

(144 / (144 + 159)) × 0.395 + (159 / (159 + 144)) × 0.336 = 0.364

This means that the Gini impurity for the attribute "Generous" is 0.364.
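If you would like to verify the arithmetic above, here is a small Python sketch that reproduces the two leaf impurities and their weighted average for the "Generous" attribute, using the counts from the study (105/39 in the True leaf and 34/125 in the False leaf):

def leaf_gini(yes, no):
    """Gini impurity of a single leaf given its YES/NO counts."""
    total = yes + no
    return 1 - (yes / total) ** 2 - (no / total) ** 2

gini_true = leaf_gini(105, 39)   # True branch of "Generous"
gini_false = leaf_gini(34, 125)  # False branch of "Generous"

n_true, n_false = 105 + 39, 34 + 125
weighted = (n_true / (n_true + n_false)) * gini_true + \
           (n_false / (n_true + n_false)) * gini_false

print(round(gini_true, 3))   # 0.395
print(round(gini_false, 3))  # 0.336
print(round(weighted, 3))    # 0.364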

Below are the Gini impurities for the other attributes from the same study:

the attribute "Kind" yielded a Gini impurity of 0.360
the attribute "Slow to anger" yielded a Gini impurity of 0.381

"The lower the Gini impurity value, the more suitable that attribute is as the root node." Therefore the attribute "Kind" is the most suitable root node, having a value of 0.360.
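As a quick Python sketch of that selection step (the numbers are simply the weighted Gini impurities computed above), choosing the root node amounts to picking the attribute with the smallest value:

# Weighted Gini impurity for each candidate root attribute (from the study above)
impurities = {
    "Generous": 0.364,
    "Kind": 0.360,
    "Slow to anger": 0.381,
}

# The attribute with the lowest impurity becomes the root node
root = min(impurities, key=impurities.get)
print(root)  # Kind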

It is worth noting that the total number of people considered good in the study is different for each attribute (Generous, Kind and Slow to anger). This is because some people were considered generous but not kind, others were considered kind but not slow to anger, and so on.

Looking at the trees we arrived at from the study, you will realize that no attribute yielded a leaf with 100% "YES" (good person) or 100% "NO" (not a good person); therefore all of these attributes are considered "impure".

It is also important to understand how the algorithm determines the subsequent nodes.

With the attribute "Kind" as the root node, the algorithm goes ahead and calculates the Gini impurity values of the other two attributes, and the one with the lower value is chosen as the attribute to split on next. In this case the attribute "Generous" has the lower Gini impurity value, and therefore it takes that position in the tree.

This process goes on until all of the data is mapped onto the tree, and that is how a decision tree works: it grows the tree to show all possible outcomes. It is not complex, just repetitive.
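To give a feel for that repetition in code, here is a rough toy sketch of the greedy recursion (my own simplified illustration, not the article's implementation): at every step it picks the attribute with the lowest weighted Gini impurity, splits the rows on it, and recurses until a group is pure or there are no attributes left.

def gini(rows):
    """Gini impurity of a group of rows based on their 'label' value."""
    total = len(rows)
    if total == 0:
        return 0.0
    impurity = 1.0
    for value in set(r["label"] for r in rows):
        p = sum(1 for r in rows if r["label"] == value) / total
        impurity -= p ** 2
    return impurity

def weighted_gini(rows, attribute):
    """Weighted Gini impurity after splitting rows on a True/False attribute."""
    true_rows = [r for r in rows if r[attribute]]
    false_rows = [r for r in rows if not r[attribute]]
    total = len(rows)
    return (len(true_rows) / total) * gini(true_rows) + \
           (len(false_rows) / total) * gini(false_rows)

def build_tree(rows, attributes):
    """Recursively build a tree as nested dicts; leaves are label values."""
    if not rows:
        return None  # no examples reached this branch
    labels = set(r["label"] for r in rows)
    if len(labels) == 1 or not attributes:
        # Pure group, or nothing left to split on: return the majority label
        return max(labels, key=lambda v: sum(r["label"] == v for r in rows))
    # Greedy step: the attribute with the lowest weighted Gini impurity wins
    best = min(attributes, key=lambda a: weighted_gini(rows, a))
    remaining = [a for a in attributes if a != best]
    return {best: {
        True: build_tree([r for r in rows if r[best]], remaining),
        False: build_tree([r for r in rows if not r[best]], remaining),
    }}

# Tiny made-up example, not the 303-person study from the article
data = [
    {"Kind": True, "Generous": True, "label": "good"},
    {"Kind": True, "Generous": False, "label": "good"},
    {"Kind": False, "Generous": True, "label": "not good"},
    {"Kind": False, "Generous": False, "label": "not good"},
]
print(build_tree(data, ["Kind", "Generous"]))
# {'Kind': {True: 'good', False: 'not good'}}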

Code part

For the code part I will use the common Iris data set to show how to train a basic decision tree model, make predictions and visualize the decision tree itself. All of this is in Python.
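The original embedded code does not render here, so below is a minimal sketch of what such an Iris example typically looks like with scikit-learn (this assumes scikit-learn and matplotlib are installed; the exact code in the linked repository may differ).

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

# Load the Iris data set (150 flowers, 4 features, 3 species)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Train a basic decision tree classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Make predictions and check accuracy on the held-out data
print("Test accuracy:", clf.score(X_test, y_test))
print("Prediction for one flower:", iris.target_names[clf.predict([X_test[0]])[0]])

# Visualize the fitted tree
plt.figure(figsize=(12, 8))
plot_tree(clf, feature_names=iris.feature_names,
          class_names=iris.target_names, filled=True)
plt.show()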

The full code can be found on my GitHub; here's the link:

https://github.com/keithmartinkinyua/Getting-started-on-Decision-trees.

I hope this article was helpful. Next, I'm going to explore more of the hyper-parameters and fine-tuning of decision trees.
