MML Inference of Decision Trees, Graphs and Forests

Tan, Peter J.

doi:10.4225/03/58ca16c01f581

20170309-Tan-Thesis.pdf (1.64 MB)

MML Inference of Decision Trees, Graphs and Forests

thesis

posted on 2017-03-16, 04:38 authored by Peter J. Tan

The Minimum Message Length (MML) principle is a general inductive inference framework which recasts inductive inference as a coding process. Using assumptions identical to those from the well-known Bayesian inference, prediction and modelling, MML represents general inductive and statistical inference problems as a data encoding process which conveys the data to a receiver in a two-part message. The first part is the message to encode the inferred model. The second part is the message to encode the data in light of the inferred model. MML states that the model with the shortest two-part message is the best model approximating the true model.

   Decision trees, decision graphs and decision forests are popular supervised learning methods in machine learning. In this dissertation, the MML principle is applied to build machine learning schemes for decision trees, decision graphs and decision forests. Two novel MML inference schemes are developed. One is a MML coding scheme for Oblique decision trees, which are decision trees with linear discriminate functions at their internal nodes. Another is a MML coding scheme for decision graphs with multi-way joins and dynamic attributes. A decision forests learning scheme based on MML oblique decision trees is also presented.

   Experiments were conducted across a range of problems using data from University of California Machine Learning Repository and the Singapore Data Mining Centre. These experiments showed that compared to other popular decision tree models such C4.5 and C5, models generated by MML inference schemes achieved favourable results in both classification and probabilistic predictive accuracy. The study showed that MML inference schemes are able to find the optimal trade-off between the complexity of these structure models and goodness-of-fit for a given set of data.

History

Campus location

Australia

Principal supervisor

David Dowe

Additional supervisor 1

Trevor Dix

Year of Award

2017

Department, School or Centre

Information Technology (Monash University Clayton)

Course

Doctor of Philosophy

Degree Type

DOCTORATE

Faculty

Faculty of Information Technology

Usage metrics

Keywords

Minimum message length Bayesian inference Supervised learning Decision trees Decision graphs Decision forests

Licence

CC BY 4.0

Exports

RefWorks

BibTeX

Ref. manager

Endnote

DataCite

NLM

DC

MML Inference of Decision Trees, Graphs and Forests

History

Campus location

Principal supervisor

Additional supervisor 1

Year of Award

Department, School or Centre

Course

Degree Type

Faculty

Usage metrics

Categories

Keywords

Licence

Exports