HUGIML Benchmark Analysis

A 40-dataset tabular classification benchmark comparing HUGIML with tuned XGBoost, LightGBM, and RandomForest baselines. Budgeted tree variants are constrained to a maximum learned model complexity, measured as the total number of terminal leaves across all trees in the fitted ensemble. HUGIML complexity is measured as the number of selected patterns.
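The leaf-count complexity measure can be sketched as follows. This assumes each fitted tree is exposed as a `children_left` array in which a terminal node is marked with -1, which is the convention scikit-learn uses in `tree_.children_left`. The helper names and toy trees are hypothetical. XGBoost and LightGBM expose tree structure differently, so the report's actual counting code may differ.

```python
from typing import Sequence

LEAF = -1  # "no left child" marker, i.e. a terminal node (scikit-learn convention)


def count_leaves(children_left: Sequence[int]) -> int:
    """Number of terminal nodes in one tree."""
    return sum(1 for child in children_left if child == LEAF)


def ensemble_leaves(trees: Sequence[Sequence[int]]) -> int:
    """Total terminal leaves summed across every tree in a fitted ensemble."""
    return sum(count_leaves(tree) for tree in trees)


# Hypothetical two-tree ensemble.
# Tree A: the root splits into two leaves (nodes 1 and 2).
# Tree B: root -> node 1 (leaf) and node 2, which splits into leaves 3 and 4.
tree_a = [1, LEAF, LEAF]
tree_b = [1, LEAF, 3, LEAF, LEAF]

total = ensemble_leaves([tree_a, tree_b])
print(total)  # 5
print(total <= 300)  # True: within the 300-leaf budget
```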

Primary metric: AUC
Datasets: 40
Models: 7
Friedman p-value: 0.0021
Tree budget: ≤300 leaves
HUGIML topK: 30 / 50 / 100

Best mean AUC: 0.9388 (RandomForest, complexity-budgeted)
HUGIML mean AUC: 0.9362 (mean complexity: 32.8 patterns)
Best mean rank: 2.67 (RandomForest, standard)
Global rank test: p = 0.0021, significant global rank differences among models
Pairwise vs HUGIML: no significant difference after Holm correction
Budget violations: 0 budgeted tree fits exceeded 300 leaves
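Mean ranks and the global rank test can be sketched as follows, assuming the chi-square form of the Friedman statistic, χ²_F = 12N/(k(k+1)) · (Σ R_j² − k(k+1)²/4), where R_j is model j's mean rank. The report may instead use a tie correction (as `scipy.stats.friedmanchisquare` does) or the Iman-Davenport F variant, so the p-value here is only illustrative. The closed-form chi-square tail is valid only for an even number of degrees of freedom. This covers the report's 7 models (6 degrees of freedom). The example scores are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def rank_row(scores: List[float]) -> List[float]:
    """Rank models on one dataset: highest AUC gets rank 1, ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend over a run of tied scores.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        for j in range(pos, end + 1):
            ranks[order[j]] = (pos + end) / 2 + 1  # average of 1-based positions
        pos = end + 1
    return ranks


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    assert df % 2 == 0, "closed form requires even degrees of freedom"
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def friedman(auc: Dict[str, List[float]]) -> Tuple[Dict[str, float], float, float]:
    """auc maps model name -> AUC per dataset (same dataset order for every model)."""
    models = list(auc)
    k, n = len(models), len(auc[models[0]])
    rank_sums = [0.0] * k
    for d in range(n):
        for m, r in enumerate(rank_row([auc[name][d] for name in models])):
            rank_sums[m] += r
    mean_ranks = {name: s / n for name, s in zip(models, rank_sums)}
    # Chi-square form of the Friedman statistic (no tie correction).
    stat = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks.values()) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, stat, chi2_sf_even_df(stat, k - 1)


# Hypothetical AUCs for three models on three datasets.
example = {
    "HUGIML": [0.91, 0.88, 0.95],
    "XGBoost": [0.90, 0.89, 0.94],
    "RandomForest": [0.92, 0.87, 0.96],
}
mean_ranks, stat, p = friedman(example)
print(mean_ranks, round(stat, 3), round(p, 3))
```

A lower mean rank is better. This is why "Best mean rank" and "Best mean AUC" can point to different models.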

Figure: Mean AUC by model (standard-deviation intervals shown)

Figure: AUC versus model complexity (leaves for tree ensembles; selected patterns for HUGIML)

Benchmark interpretation

Statistical findings
Global comparison: The Friedman test (p = 0.0021) finds significant differences in mean rank among the 7 models. As a global rank test, it does not identify which model pairs differ.
HUGIML-specific comparison: In paired Wilcoxon tests against HUGIML, no baseline differs significantly after Holm correction. This statement applies only to comparisons against HUGIML.
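Holm correction can be sketched as a step-down procedure over one family of tests. The i-th smallest raw p-value is multiplied by (m − i), where m is the family size and i counts from 0. A running maximum keeps the adjusted values monotone. The comparison names and p-values below are hypothetical. A test would be declared significant when its adjusted p-value is at most α; the report does not state α, so 0.05 would be an assumption. `statsmodels.stats.multitest.multipletests(..., method="holm")` implements the same adjustment.

```python
from typing import Dict


def holm_adjust(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Holm step-down adjusted p-values for one family of tests."""
    m = len(pvalues)
    items = sorted(pvalues.items(), key=lambda kv: kv[1])  # smallest p first
    adjusted: Dict[str, float] = {}
    running_max = 0.0
    for i, (name, p) in enumerate(items):
        # Scale the i-th smallest p-value by (m - i), then enforce monotonicity.
        running_max = max(running_max, (m - i) * p)
        adjusted[name] = min(1.0, running_max)
    return adjusted


# Hypothetical raw p-values for a family of three comparisons.
raw = {"vs_A": 0.01, "vs_B": 0.04, "vs_C": 0.03}
print(holm_adjust(raw))
```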

Complexity-budget check (budgeted tree variants checked after fitting)
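The post-fit budget check can be sketched as a filter over per-fit leaf totals. A fit counts as a violation only when it exceeds 300 leaves, because the budget is stated as ≤300. The fit identifiers and leaf totals are hypothetical.

```python
from typing import Dict, List

BUDGET = 300  # maximum total leaves allowed for a budgeted tree fit


def budget_violations(fit_leaves: Dict[str, int], budget: int = BUDGET) -> List[str]:
    """Return the fits whose total leaf count exceeds the budget."""
    return sorted(name for name, leaves in fit_leaves.items() if leaves > budget)


# Hypothetical per-fit leaf totals for budgeted variants on one dataset.
fits = {"xgb_budget/ds01": 288, "lgbm_budget/ds01": 300, "rf_budget/ds01": 297}
print(len(budget_violations(fits)))  # 0: exactly 300 leaves is within budget
```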

Dataset explorer: per-dataset metrics, model complexity, and tuned settings

Table: Per-dataset metric values and winners

Table: Tuned settings per dataset, including fitted model complexity

HUGIML feature mode (best selected mode)

HUGIML depth and topK (selected configuration frequencies)

HUGIML AUC gap: HUGIML AUC minus the AUC of the best non-HUGIML baseline
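The AUC gap can be sketched as follows, assuming the "best non-HUGIML baseline" is chosen separately on each dataset. A negative gap means at least one baseline beat HUGIML on that dataset. The model names and scores are hypothetical.

```python
from typing import Dict, List


def auc_gap(auc: Dict[str, List[float]], target: str = "HUGIML") -> List[float]:
    """Per-dataset AUC of `target` minus the best AUC among all other models."""
    others = [name for name in auc if name != target]
    n = len(auc[target])
    return [auc[target][d] - max(auc[name][d] for name in others) for d in range(n)]


# Hypothetical AUCs on two datasets.
auc = {
    "HUGIML": [0.93, 0.88],
    "XGBoost": [0.90, 0.89],
    "RandomForest": [0.92, 0.87],
}
print([round(g, 4) for g in auc_gap(auc)])
```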

Table: Pairwise tests versus HUGIML (paired Wilcoxon signed-rank tests; Holm correction within this family of 6 comparisons, one per baseline)
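One paired comparison in this family can be sketched with a two-sided Wilcoxon signed-rank test. Each dataset contributes one paired AUC difference. Zero differences are dropped, which is Wilcoxon's original zero handling. The p-value uses the normal approximation with a tie correction. This is an assumption: with 40 datasets, `scipy.stats.wilcoxon` may use the exact null distribution instead, so its p-values can differ slightly. The six resulting p-values (HUGIML against each baseline) would then be Holm-adjusted together.

```python
import math
from typing import Dict, List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """1-based ascending ranks; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for j in range(pos, end + 1):
            ranks[order[j]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def wilcoxon_signed_rank(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Two-sided paired signed-rank test (normal approximation).

    Returns (W_plus, p_value), where W_plus is the rank sum of positive differences a - b.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    # Tie correction: subtract sum(t^3 - t) / 48 over groups of tied |d|.
    counts: Dict[float, int] = {}
    for d in diffs:
        counts[abs(d)] = counts.get(abs(d), 0) + 1
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in counts.values()) / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return w_plus, p


# Hypothetical paired scores on five datasets.
w, p = wilcoxon_signed_rank([2, 4, 6, 8, 10], [1, 2, 3, 4, 5])
print(w, round(p, 4))
```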

Table: Overall summary (AUC, ranks, wins, and model complexity)
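The mean-AUC and win columns of the summary can be sketched as follows. The report does not say how tied best scores are counted. This sketch assumes a tie gives a win to every tied model. The model names and scores are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def summarize(auc: Dict[str, List[float]]) -> Dict[str, dict]:
    """Mean AUC and win count per model; a tied best score counts as a win for each tied model."""
    models = list(auc)
    n = len(auc[models[0]])
    wins = {name: 0 for name in models}
    for d in range(n):
        best = max(auc[name][d] for name in models)
        for name in models:
            if auc[name][d] == best:
                wins[name] += 1
    return {name: {"mean_auc": mean(auc[name]), "wins": wins[name]} for name in models}


# Hypothetical AUCs on three datasets; the third is a tie.
print(summarize({"HUGIML": [0.93, 0.88, 0.90], "XGBoost": [0.90, 0.89, 0.90]}))
```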

Table: Significant all-pair comparisons (model pairs that differ significantly after Holm correction across all 21 pairwise tests)

These rows list the model pairs that differ significantly after Holm correction across all 21 pairwise comparisons among the 7 models. They form a separate test family from the HUGIML-specific table above, in which no individual baseline differs significantly from HUGIML after correction.
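The two test families differ in size, and that changes how strict Holm correction is. Seven models give C(7, 2) = 21 pairs, of which 6 involve HUGIML. Holm compares the smallest p-value in a family of m tests against α/m. With α = 0.05, that is about 0.0024 for the all-pair family versus about 0.0083 for the HUGIML family. The model labels below are assumed: HUGIML plus a standard and a budgeted variant of each tree baseline. The 0.05 level is also an assumption, since the report does not state α.

```python
from itertools import combinations

ALPHA = 0.05  # assumed significance level; not stated in the report

# Assumed labels: HUGIML plus standard and budgeted variants of each tree baseline.
models = ["HUGIML"] + [
    f"{base}_{variant}"
    for base in ("XGBoost", "LightGBM", "RandomForest")
    for variant in ("standard", "budgeted")
]

all_pairs = list(combinations(models, 2))
hugiml_pairs = [pair for pair in all_pairs if "HUGIML" in pair]

print(len(models), len(all_pairs), len(hugiml_pairs))  # 7 21 6
# First (strictest) Holm threshold in each family: ALPHA / m.
print(ALPHA / len(all_pairs), ALPHA / len(hugiml_pairs))
```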