Data Characteristics Tool
The data characteristics tool is used by the Meta Miner to extract several types of data characteristics:
- statistical measures: number of instances, number of classes, proportion of missing values, proportion of continuous / categorical features, noise-to-signal ratio.
- information-theoretic measures: class entropy, mutual information [1].
- geometrical and topological measures: non-linearity, volume of the overlap region, maximum Fisher’s discriminant ratio, fraction of instances on the class boundary, ratio of average intra-class to inter-class nearest-neighbour distance [2].
- model-based measures: error rates and pairwise 1 − p values obtained by landmarkers such as 1NN or DecisionStump [3], and histogram weights learned by Relief or SVM.
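
As an illustration of the first two groups above, the following is a minimal sketch of how a few statistical and information-theoretic measures could be computed. It is not the tool's actual implementation: the function names, the list-of-tuples data layout, the use of None as the missing-value marker, and the rule that a column counts as continuous when all of its non-missing values are numeric are assumptions made for this example. Entropy and mutual information are reported in bits, and mutual information is computed for a single categorical feature only.

```python
import math
from collections import Counter


def basic_statistics(rows, labels):
    """Simple statistical measures (hypothetical helper, not the tool's API)."""
    cells = [value for row in rows for value in row]
    columns = list(zip(*rows))
    # Assumption: a column is continuous if every non-missing value is numeric.
    n_continuous = sum(
        all(isinstance(v, (int, float)) for v in col if v is not None)
        for col in columns
    )
    return {
        "n_instances": len(rows),
        "n_classes": len(set(labels)),
        # Assumption: missing values are encoded as None.
        "missing_proportion": sum(v is None for v in cells) / len(cells),
        "continuous_proportion": n_continuous / len(columns),
    }


def class_entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(feature, labels):
    """Mutual information between one categorical feature and the class, in bits."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_feature = Counter(feature)
    p_class = Counter(labels)
    return sum(
        (c / n) * math.log2((c / n) / ((p_feature[x] / n) * (p_class[y] / n)))
        for (x, y), c in joint.items()
    )


# Toy dataset: one numeric feature (with a missing value) and one categorical feature.
rows = [(1.0, "a"), (2.0, "a"), (8.0, "b"), (None, "b")]
labels = ["neg", "neg", "pos", "pos"]

stats = basic_statistics(rows, labels)
entropy = class_entropy(labels)
mi = mutual_information([row[1] for row in rows], labels)
```

In this toy dataset the categorical feature determines the class exactly, so its mutual information with the class equals the class entropy. The geometrical and model-based measures need distance computations or trained learners and are not covered by this sketch.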
[1] http://www.metal-kdd.org
[2] Ho, T. K., & Basu, M. (2006). Data complexity in pattern recognition. Springer.
[3] Pfahringer, B., Bensusan, H., & Giraud-Carrier, C. (2000). Meta-learning by landmarking various learning algorithms. Proc. 17th International Conference on Machine Learning, 743–750.