White Paper: Using Decision Trees
This short tutorial illustrates why decision trees can offer a more
practical way of capturing knowledge than coding rules in more conventional
languages. Note that screen shots are taken from the earlier XpertRule
KBS (Knowledge Based System) although the same principles apply
to XpertRule Knowledge Builder.
The Company's accounts department wants to build a simple decision
making system that captures the logic by which they pass or reject
claims for hotel expenses from their employees. These decisions
are based on things like the Grade of the employee (Director,
Senior Manager or Junior Manager) and the type of Hotel
they stayed in (a quality rating of A, B, or C). They want to automate
the processing of claims so that, for example, when a claim is
sent in by an employee with a Grade of Senior Manager
and the Hotel stayed in was type A, the system
would Reject the claim.
Step one might be to assign an expert from the accounts department
to handle the project. The department gives some thought to writing
the rules in COBOL or C, as the suggested syntax of IF condition
AND condition THEN outcome would be quite easy to code in computer
languages such as these. Another option considered is to keep
these rules in a database. They could use an industry standard database,
have an index of rules, sort the rules by topic and easily make "find
& replace" changes. Their confidence in the expert being
able to author the rules is high. Let's imagine that is just what
the expert started to do for their expenses claims rule base.
Here is the expert's first attempt at hand crafted rules. Are
they ready to be coded by a programmer or put into a database? Are
they good quality rules? Do you see anything wrong?
IF Grade = Director AND Sex = Male THEN Pass
IF Grade = Director AND Sex = Female THEN Pass
IF Grade = Senior_Manager AND Hotel = A THEN Reject
IF Grade = Senior_Manager AND Hotel = B AND Sex = Male THEN Pass
IF Grade = Junior_Manager AND Hotel = A AND Sex = Female THEN Reject
IF Grade = Senior_Manager AND Hotel = C THEN Pass
IF Grade = Senior_Manager AND Hotel = B AND Sex = Male THEN Reject
IF Grade = Junior_Manager AND Hotel = B THEN Reject
IF Grade = Junior_Manager AND Hotel = B THEN Reject
IF Grade = Junior_Manager AND Hotel = C THEN Pass
IF Grade = Junior_Manager AND Hotel = A THEN Reject
There are only eleven rules. How long did it take you to find the
expert's errors? It's not too difficult given some time.
Let's put these rules into a table style format, as shown below.
We can enter "decision tables" just like this into XpertRule
KBS. We have put the factors (attributes) as the column heading
and can "read off" our eleven rules like this, starting
from line 1:
IF Grade = Director (* means just ignore whatever Hotel is) AND
Sex = Male THEN Pass.
We have our same eleven rules entered in the expert's same
order.
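
As a rough illustration of how such a decision table behaves, here is a minimal Python sketch. It is not XpertRule's own format: the names ANY, DECISION_TABLE and matching_rules are assumptions made for this example. Each line of the table is checked against a claim, and "*" is treated as "ignore this attribute".

```python
ANY = "*"  # wildcard: ignore whatever value this attribute has

# The expert's eleven rules in their original order.
# Columns: Grade, Hotel, Sex, Outcome.
DECISION_TABLE = [
    ("Director", ANY, "Male", "Pass"),
    ("Director", ANY, "Female", "Pass"),
    ("Senior_Manager", "A", ANY, "Reject"),
    ("Senior_Manager", "B", "Male", "Pass"),
    ("Junior_Manager", "A", "Female", "Reject"),
    ("Senior_Manager", "C", ANY, "Pass"),
    ("Senior_Manager", "B", "Male", "Reject"),
    ("Junior_Manager", "B", ANY, "Reject"),
    ("Junior_Manager", "B", ANY, "Reject"),
    ("Junior_Manager", "C", ANY, "Pass"),
    ("Junior_Manager", "A", ANY, "Reject"),
]


def matching_rules(grade, hotel, sex):
    """Return (line number, outcome) for every table line that fits the claim."""
    claim = (grade, hotel, sex)
    hits = []
    for number, (*conditions, outcome) in enumerate(DECISION_TABLE, start=1):
        # A condition fits if it is the wildcard or equals the claim's value.
        if all(cond in (ANY, value) for cond, value in zip(conditions, claim)):
            hits.append((number, outcome))
    return hits


# The example from the introduction: a Senior Manager who stayed in a type A Hotel.
print(matching_rules("Senior_Manager", "A", "Male"))  # [(3, 'Reject')]
```

The function returns every matching line rather than only the first, so a claim that is covered by more than one line shows up as such.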
This could be a table in a database. In a database, even with
more rules, sorting and sub sorting could have identified the fact
that we appear to have two rules like this (the highlighted 4 and
7) where the values are the same, but the outcome is contradictory.
One of these two rules is wrong, or perhaps there is some other factor
(attribute) that the expert needs to take into account by modifying
the rule base. But would such sorting or visual searching be very
practical in real life? A bit of expertise in software
might help our expert. Using XpertRule, a few mouse-clicks would
automatically generate a decision tree from our table of rules,
as shown below:
We can "read off" the actual rule set by following
the tree branching from the top to each Pass/Reject outcome.
If you do this you will see that our expert's knowledge so
far only really consists of the six rules shown below.
IF Grade = Director THEN Pass
IF Grade = Senior_Manager AND Hotel = A THEN Reject
IF Grade = Senior_Manager AND Hotel = B THEN (Clash) Pass/Reject
IF Grade = Senior_Manager AND Hotel = C THEN Pass
IF Grade = Junior_Manager AND Hotel = A or B THEN Reject
IF Grade = Junior_Manager AND Hotel = C THEN Pass
The rule IF Grade = Junior_Manager AND Hotel = A or B
THEN Reject is just shorthand that the expert could have used
anyway, reducing the number of rules in the first place. The
main point is not the reduction of rules (although this invariably
happens in real life) but that we have been able to validate the
QUALITY of the rules. If we scale this up to the real world and
have more factors (attributes) and more rules then the problem of
validation becomes very real. Bringing in another expert who is
better at hand crafting rules is also a problem of validation. How
do you test the expert's ability?
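
To make the tree-building step concrete, here is a small Python sketch. It is only an illustration under stated assumptions, not XpertRule's actual algorithm: it splits on the attributes in a fixed order (Grade, Hotel, Sex), stops as soon as all remaining rules agree, and marks a (Clash) when rules with identical remaining conditions disagree. The names build_tree and read_off are made up for this example.

```python
ANY = "*"  # wildcard: ignore whatever value this attribute has
ATTRIBUTES = ("Grade", "Hotel", "Sex")
VALUES = {
    "Grade": ("Director", "Senior_Manager", "Junior_Manager"),
    "Hotel": ("A", "B", "C"),
    "Sex": ("Male", "Female"),
}
# The expert's eleven rules: Grade, Hotel, Sex, Outcome.
RULES = [
    ("Director", ANY, "Male", "Pass"),
    ("Director", ANY, "Female", "Pass"),
    ("Senior_Manager", "A", ANY, "Reject"),
    ("Senior_Manager", "B", "Male", "Pass"),
    ("Junior_Manager", "A", "Female", "Reject"),
    ("Senior_Manager", "C", ANY, "Pass"),
    ("Senior_Manager", "B", "Male", "Reject"),
    ("Junior_Manager", "B", ANY, "Reject"),
    ("Junior_Manager", "B", ANY, "Reject"),
    ("Junior_Manager", "C", ANY, "Pass"),
    ("Junior_Manager", "A", ANY, "Reject"),
]


def build_tree(rows, depth=0):
    """Return an outcome string (leaf) or an (attribute, branches) pair."""
    if not rows:
        return "(No rule)"
    outcomes = sorted({row[-1] for row in rows})
    if len(outcomes) == 1:
        return outcomes[0]  # all remaining rules agree: no further split needed
    if len({row[depth:-1] for row in rows}) == 1:
        # Identical remaining conditions but different outcomes.
        return "(Clash) " + "/".join(outcomes)
    attribute = ATTRIBUTES[depth]
    branches = {}
    for value in VALUES[attribute]:
        subset = [row for row in rows if row[depth] in (value, ANY)]
        branches[value] = build_tree(subset, depth + 1)
    return (attribute, branches)


def read_off(tree, path=()):
    """Follow every branch from the top to each outcome, yielding one rule per leaf."""
    if isinstance(tree, str):
        yield "IF " + " AND ".join(path) + " THEN " + tree
    else:
        attribute, branches = tree
        for value, subtree in branches.items():
            yield from read_off(subtree, path + (f"{attribute} = {value}",))


for rule in read_off(build_tree(RULES)):
    print(rule)
```

This sketch prints seven rules because it does not merge branches with the same outcome; combining the two Junior_Manager branches for Hotel A and Hotel B gives the six rules listed above. Sex never appears in the output, and the Senior_Manager, Hotel B branch is reported as a clash.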
To build a successful and quality knowledge base our domain experts
must have some help to:
- Identify gaps in the logic:
Lists of rules don't illustrate these at all well, but decision
trees are very good at avoiding gaps in your logic.
- Identify conflicts: The
discovery of the (Clash) by automatic tree building illustrates
this. If the tree had been created manually by the expert this
error would also not have been made.
- Identify missing factors:
This overlaps the previous point. The (Clash) suggests some missing
factor to our expert, who might need to make the tree branch further
at this point.
- Identify redundant factors and rules:
Our expert has no sex in his rule base! The expert should not
be discriminating against people like this of course, but thankfully,
sex has been shown to be redundant. We will sack this expert anyway
for even thinking such a thing!
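
Two of these checks, gaps and redundant factors, can be sketched mechanically in Python by trying every combination of attribute values against the original eleven rules. This is an illustrative sketch only: the function names are assumptions, the rules are read literally, and an attribute is called redundant here if changing it never changes the outcome for any case that some rule covers.

```python
from itertools import product

ANY = "*"  # wildcard: ignore whatever value this attribute has
VALUES = {
    "Grade": ("Director", "Senior_Manager", "Junior_Manager"),
    "Hotel": ("A", "B", "C"),
    "Sex": ("Male", "Female"),
}
# The expert's eleven rules: Grade, Hotel, Sex, Outcome.
RULES = [
    ("Director", ANY, "Male", "Pass"),
    ("Director", ANY, "Female", "Pass"),
    ("Senior_Manager", "A", ANY, "Reject"),
    ("Senior_Manager", "B", "Male", "Pass"),
    ("Junior_Manager", "A", "Female", "Reject"),
    ("Senior_Manager", "C", ANY, "Pass"),
    ("Senior_Manager", "B", "Male", "Reject"),
    ("Junior_Manager", "B", ANY, "Reject"),
    ("Junior_Manager", "B", ANY, "Reject"),
    ("Junior_Manager", "C", ANY, "Pass"),
    ("Junior_Manager", "A", ANY, "Reject"),
]


def outcomes_for(case):
    """Set of outcomes given by every rule whose conditions fit the case."""
    return {
        rule[-1]
        for rule in RULES
        # zip stops after the three conditions, so the outcome is not compared.
        if all(cond in (ANY, value) for cond, value in zip(rule, case))
    }


def find_gaps():
    """Combinations of attribute values that no rule covers."""
    return [case for case in product(*VALUES.values()) if not outcomes_for(case)]


def is_redundant(attribute):
    """True if varying this attribute never changes the outcome of a covered case."""
    index = list(VALUES).index(attribute)
    others = [values for name, values in VALUES.items() if name != attribute]
    for rest in product(*others):
        seen = set()
        for value in VALUES[attribute]:
            case = rest[:index] + (value,) + rest[index:]
            outcomes = frozenset(outcomes_for(case))
            if outcomes:  # uncovered cases are gaps, reported separately
                seen.add(outcomes)
        if len(seen) > 1:
            return False
    return True


print(find_gaps())           # [('Senior_Manager', 'B', 'Female')]
print(is_redundant("Sex"))   # True
print(is_redundant("Hotel")) # False
```

Read literally, the eleven rules say nothing about a female Senior Manager in a type B Hotel, which is the kind of gap a flat list of rules hides.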
We mentioned using OR conditions before. When we start using more
complex syntax like this in more conventional rule statements, such
as within programming languages, these statements will get harder
to follow. Individually they may be quite simple to read, but they
get much harder to manually validate.
Here is a hand crafted decision tree that illustrates the rich
variety of expressions that you can use. It's not the simple If
A = B then C logic that we used above, but it's still just as
simple to understand, simply because of the graphical structure.
In trees like this we can use:
- Numeric/date compare
- Compare attributes
- Group values
- 'Otherwise' splits
- Multiple (list) outcomes & attributes
Structuring Knowledge
In real world systems with hundreds or thousands of rules the expert
needs to be able to break down the knowledge into manageable units.
Just as you don't want a big bag of rules, you also don't want the
mother of all decision trees. The simple task structure below
illustrates a typical "backward chain", where the attribute
Hotel is now also a decision making "task" with its own
decision tree (i.e. its own rule set). In order to evaluate the Hotel
attribute, the main tree would call the Hotel tree to decide on the
value A, B, or C and then return it back to the higher level decision
tree so that it can continue.
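
The calling pattern can be sketched in Python as follows. This is a hypothetical illustration, not XpertRule code: the paper does not say what the Hotel tree looks at, so the "Stars" input and its thresholds are invented for the example, and the main tree simply encodes the six rules read off earlier.

```python
def hotel_task(facts):
    """Sub-task with its own small tree: decide the Hotel rating A, B or C.

    The 'Stars' attribute and the thresholds are hypothetical.
    """
    stars = facts["Stars"]
    if stars >= 5:
        return "A"
    if stars >= 3:
        return "B"
    return "C"


# Attributes that are themselves decision making tasks.
TASKS = {"Hotel": hotel_task}


def get_value(attribute, facts):
    """Backward chain: use a known value, otherwise run the task that derives it."""
    if attribute not in facts:
        if attribute not in TASKS:
            raise KeyError(f"no value and no task for {attribute}")
        facts[attribute] = TASKS[attribute](facts)
    return facts[attribute]


def claim_task(facts):
    """Main tree: Grade first, then Hotel only when the outcome depends on it."""
    grade = get_value("Grade", facts)
    if grade == "Director":
        return "Pass"
    hotel = get_value("Hotel", facts)  # calls the Hotel tree if needed
    if grade == "Senior_Manager":
        return {"A": "Reject", "B": "(Clash) Pass/Reject", "C": "Pass"}[hotel]
    return "Pass" if hotel == "C" else "Reject"


print(claim_task({"Grade": "Junior_Manager", "Stars": 2}))  # Pass
print(claim_task({"Grade": "Director"}))                    # Pass
```

Note that the Hotel task is only run when the main tree actually needs the Hotel value, so a Director's claim is decided without it.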

This also allows a variety of techniques for knowledge capture
to be used for each individual task. The decision making tasks make
the components understandable, both as individual units and to show
the hierarchy of the knowledge.
A task map enables you to review and navigate
around your application, as shown by this illustration of an
application for corporate loans. You can simply zoom in and
out of the tasks.
See also this extract from an article by Rockwell on Expert
Systems Vs. Procedural Language Development from PCAI magazine
which is also relevant.