HUGIML Benchmark Analysis Dashboard

Cross-validation protocol

Dataset panel: 100 binary-classification tasks (50 real-world and 50 synthetic).
Rows per dataset: all available rows.
Outer evaluation: 5-fold StratifiedKFold with shuffle=True and random_state=42.
Inner selection: 3-fold StratifiedKFold within every outer-training partition, with shuffle=True and random_state=42.
Selection metric: mean inner-fold ROC-AUC. The selected candidate is refitted on the complete outer-training partition and evaluated once on the untouched outer-test partition.
Reported dataset AUC: arithmetic mean of the 5 outer-fold ROC-AUC values.
Statistical comparisons operate on dataset-level outer-CV aggregates: Friedman ranking across models, paired Wilcoxon signed-rank tests, and Holm adjustment for multiple comparisons where shown.
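
As a reading aid for the Holm adjustment mentioned above, the following is a minimal sketch of the standard step-down procedure in plain Python. The function name holm_adjust, the model labels, and the p-values are illustrative assumptions and are not taken from the benchmark.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Visit hypotheses from the smallest to the largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Made-up raw p-values for three hypothetical pairwise comparisons.
raw = {"model_a": 0.01, "model_b": 0.04, "model_c": 0.03}
adjusted = dict(zip(raw, holm_adjust(list(raw.values()))))
for name, value in adjusted.items():
    print(f"{name}: raw={raw[name]:.3f} holm={value:.3f}")
```

In this sketch the smallest p-value is tripled, the next is doubled, and the largest inherits the running maximum, so the adjusted values are never smaller than the raw ones.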

Preprocessing and timing

HUGIML receives the native pandas table, including categorical columns, and performs its own binning and pattern construction inside each training fold.
Non-HUGIML models use a fold-local pipeline: median imputation for numeric columns, and most-frequent imputation followed by dense one-hot encoding (unknown categories ignored) for categorical columns.
Source dataframe indexes are discarded before modelling; statsmodels-backed tasks use only columns declared in load_pandas().data.
Preprocessing is fitted only on the relevant training partition. No test-fold information is used for preprocessing or parameter selection.
fit_seconds measures the selected estimator's final outer-fold refit; tune_seconds measures inner-CV search; predict_seconds measures outer-test inference.
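
The nested protocol, the fold-local preprocessing for non-HUGIML models, and the three timers can be sketched together with scikit-learn. This is an illustration under stated assumptions, not the dashboard's own code: LogisticRegression and its two-value C grid are stand-ins for a benchmarked model, the data are synthetic, the helper names build_pipeline and evaluate are hypothetical, and splitting the search time with GridSearchCV's refit_time_ is one plausible way to separate tune_seconds from fit_seconds.

```python
import time

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def build_pipeline(numeric_cols, categorical_cols):
    """Preprocessing plus a stand-in classifier; refitted on every training partition."""
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer(
        [("num", SimpleImputer(strategy="median"), numeric_cols),
         ("cat", categorical, categorical_cols)],
        sparse_threshold=0.0,  # always return a dense matrix
    )
    return Pipeline([("preprocess", preprocess),
                     ("model", LogisticRegression(max_iter=500))])


def evaluate(frame, target, numeric_cols, categorical_cols, seed=42):
    X = frame.reset_index(drop=True)  # source indexes are discarded
    y = np.asarray(target)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    rows = []
    for train_idx, test_idx in outer.split(X, y):
        inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
        search = GridSearchCV(
            build_pipeline(numeric_cols, categorical_cols),
            {"model__C": [0.1, 1.0]},   # illustrative grid only
            scoring="roc_auc", cv=inner, refit=True, n_jobs=1,
        )
        start = time.perf_counter()
        search.fit(X.iloc[train_idx], y[train_idx])  # inner search + final refit
        total = time.perf_counter() - start

        start = time.perf_counter()
        scores = search.predict_proba(X.iloc[test_idx])[:, 1]
        predict_seconds = time.perf_counter() - start

        rows.append({
            "auc": roc_auc_score(y[test_idx], scores),
            "tune_seconds": total - search.refit_time_,  # assumed split
            "fit_seconds": search.refit_time_,
            "predict_seconds": predict_seconds,
        })
    return pd.DataFrame(rows)


# Synthetic table with one numeric and one categorical column, plus missing values.
rng = np.random.default_rng(0)
n = 200
frame = pd.DataFrame({
    "x": rng.normal(size=n),
    "group": pd.Series(rng.choice(["a", "b", "c"], size=n), dtype=object),
})
target = (frame["x"] + (frame["group"] == "a")
          + rng.normal(scale=0.5, size=n) > 0.3).astype(int)
frame.loc[frame.index % 17 == 0, "x"] = np.nan
frame.loc[frame.index % 23 == 0, "group"] = np.nan

results = evaluate(frame, target, ["x"], ["group"])
print(results.round(4))
print("dataset AUC:", results["auc"].mean())
```

Because the imputers and encoder live inside the pipeline passed to the search, they are fitted only on the rows of each inner or outer training partition, which is what keeps test-fold information out of preprocessing and selection.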

Model search spaces

HUGIML — Augmented pair path

Search space: performance_ho · 16 candidate configurations

Parameter	Values considered
B	-1
adaptive_binning	True
L	1, 2
topK	50, 100
feature_mode	original_plus_patterns
G	0.01, 0.001
convert_binary_to_categorical	False
augmented_pair_transforms	True
topk_budget_strict	False
base_estimator	Logistic regression (L1 penalty, liblinear solver, C=1.0), OneVsRestClassifier(RPTE; leaf_config=3xD, depth=4, adaptive lookahead)

Constant settings: execution_mode=production; n_jobs=1; Linear base estimator: logistic regression with L1 penalty, liblinear solver, C=1.0, random_state=0, and max_iter=500; Inner cross-validation selects between the stated logistic-regression base estimator and one-vs-rest RPTE; Numeric 0/1 columns remain numeric and eligible for augmented-pair transforms; RPTE uses leaf indicators together with downstream inputs not selected in accepted tree splits; RPTE lookahead is adaptive for this path.
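
The count of 16 candidate configurations follows from the four parameters that take two values each. A small sketch of that enumeration is given below; the string labels used for the two base estimators are hypothetical placeholders, and the dictionary layout is an assumption rather than HUGIML's actual configuration format.

```python
from itertools import product

# Parameters with more than one value in the table above.
search_space = {
    "L": [1, 2],
    "topK": [50, 100],
    "G": [0.01, 0.001],
    "base_estimator": ["logistic_l1", "ovr_rpte"],  # hypothetical labels
}

# Parameters listed with a single value.
fixed = {
    "B": -1,
    "adaptive_binning": True,
    "feature_mode": "original_plus_patterns",
    "convert_binary_to_categorical": False,
    "augmented_pair_transforms": True,
    "topk_budget_strict": False,
}

keys = list(search_space)
candidates = [
    {**fixed, **dict(zip(keys, combo))}
    for combo in product(*search_space.values())
]

print(len(candidates))  # 16 = 2 * 2 * 2 * 2
```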

Complexity definition: Model inspection units represent the complete fitted HUGIML model that a reviewer must inspect. Linear branches count active source contributions; RPTE branches count conditions across active terminal paths plus direct terms. Intercepts are excluded, and fitted numeric components are active when their absolute value exceeds 1e-12.
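
The activity rule for fitted numeric components can be illustrated for a linear branch. The sketch below assumes that each coefficient corresponds to one source contribution, which the definition above does not spell out; the coefficient values are made up.

```python
ACTIVE_TOLERANCE = 1e-12  # threshold stated in the complexity definition


def count_active(coefficients, tolerance=ACTIVE_TOLERANCE):
    """Count components whose absolute value strictly exceeds the tolerance."""
    return sum(1 for value in coefficients if abs(value) > tolerance)


# The intercept is kept separate because it is excluded from the count.
intercept = -0.4
coefficients = [0.8, 0.0, -1e-15, -0.25, 1e-12]

print(count_active(coefficients))  # 2
```

Only 0.8 and -0.25 count here: the value 1e-12 equals the threshold and therefore does not exceed it.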

HUGIML — Interaction-relaxed path

Search space: interpretability_ho · 16 candidate configurations

Parameter	Values considered
B	-1
adaptive_binning	True
L	1, 2
topK	50, 100
feature_mode	patterns_only
G	0.01, 0.001
interaction_relaxed_mining	True
augmented_pair_transforms	False
convert_binary_to_categorical	True
topk_budget_strict	False
base_estimator	Logistic regression (L1 penalty, liblinear solver, C=1.0), OneVsRestClassifier(RPTE; leaf_config=3xD, depth=4, adaptive lookahead)

Constant settings: execution_mode=production; n_jobs=1; Linear base estimator: logistic regression with L1 penalty, liblinear solver, C=1.0, random_state=0, and max_iter=500; Inner cross-validation selects between the stated logistic-regression base estimator and one-vs-rest RPTE; Numeric 0/1 indicators are treated categorically for the pattern-mining surface; Interaction relaxation is performed by the pattern miner, and augmented pairs are disabled; RPTE uses the sequential backend with lookahead inactive for this path.

XGB standard

Search space: XGBoost · 16 candidate configurations

Parameter	Values considered
n_estimators	100, 200
max_depth	3, 4
learning_rate	0.03, 0.1
min_child_weight	1, 5

Constant settings: eval_metric=logloss; verbosity=0; n_jobs=1; random_state=42; Unspecified estimator settings use the library defaults.

Complexity definition: Model inspection units sum the conditions across complete root-to-leaf paths with active terminal outputs.
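
To make the path-based count concrete, the sketch below walks a toy tree stored as nested dictionaries and adds up the number of split conditions on each root-to-leaf path. The dictionary layout is a hypothetical stand-in for a fitted booster, and treating a leaf as active when its output is nonzero is an assumption about what "active terminal outputs" means.

```python
def path_conditions(node, depth=0, tolerance=0.0):
    """Sum the conditions on every root-to-leaf path that ends in an active leaf."""
    if "leaf" in node:
        # A leaf at depth d is reached through d split conditions.
        return depth if abs(node["leaf"]) > tolerance else 0
    return (path_conditions(node["left"], depth + 1, tolerance)
            + path_conditions(node["right"], depth + 1, tolerance))


def ensemble_units(trees, tolerance=0.0):
    return sum(path_conditions(tree, tolerance=tolerance) for tree in trees)


# Toy ensemble: one stump and one depth-2 tree with an inactive (zero) leaf.
stump = {"split": "x0 < 0.5", "left": {"leaf": 0.1}, "right": {"leaf": -0.1}}
deeper = {
    "split": "x0 < 0.5",
    "left": {"leaf": -0.2},
    "right": {
        "split": "x1 < 2.0",
        "left": {"leaf": 0.0},
        "right": {"leaf": 0.3},
    },
}

print(ensemble_units([stump, deeper]))  # 2 + (1 + 0 + 2) = 5
```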

LightGBM standard

Search space: LightGBM · 16 candidate configurations

Parameter	Values considered
n_estimators	100, 200
learning_rate	0.03, 0.1
num_leaves	15, 31
min_child_samples	10, 20

Constant settings: verbose=-1; n_jobs=1; random_state=42; Unspecified estimator settings use the library defaults.

Complexity definition: Model inspection units sum the conditions across complete root-to-leaf paths with active terminal outputs.

RandomForest standard

Search space: RandomForest · 16 candidate configurations

Parameter	Values considered
n_estimators	200, 400
max_depth	4, 8
min_samples_leaf	1, 5
max_features	sqrt, 0.5

Constant settings: n_jobs=1; random_state=42; Unspecified estimator settings use the library defaults.

Complexity definition: Model inspection units sum the conditions across every complete root-to-leaf path in the fitted forest.

XGB complexity-budgeted

Search space: XGBoost · 16 candidate configurations

Parameter	Values considered
n_estimators	25, 50
max_depth	1, 2
learning_rate	0.03, 0.1
min_child_weight	1, 5

Constant settings: eval_metric=logloss; verbosity=0; n_jobs=1; random_state=42; Complexity budget=200 (candidates within the budget are preferred during inner-CV selection); Unspecified estimator settings use the library defaults.

Complexity definition: Model inspection units sum the conditions across complete root-to-leaf paths with active terminal outputs.
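
One possible reading of "candidates within the budget are preferred" is sketched below: choose the best mean inner-fold AUC among candidates whose complexity does not exceed the budget, and fall back to all candidates only if none qualifies. The fallback, the function name select_candidate, the record layout, and the numbers are assumptions made for illustration; the dashboard does not state the exact rule.

```python
def select_candidate(candidates, budget=200):
    """Prefer candidates within the complexity budget, then maximise inner-CV AUC."""
    within = [c for c in candidates if c["complexity"] <= budget]
    pool = within or candidates  # assumed fallback when nothing fits the budget
    return max(pool, key=lambda c: c["mean_inner_auc"])


# Made-up inner-CV summaries for three configurations.
candidates = [
    {"name": "depth1_50_trees", "complexity": 100, "mean_inner_auc": 0.861},
    {"name": "depth2_25_trees", "complexity": 200, "mean_inner_auc": 0.874},
    {"name": "depth2_50_trees", "complexity": 400, "mean_inner_auc": 0.882},
]

chosen = select_candidate(candidates)
print(chosen["name"])  # depth2_25_trees
```

Under this reading the over-budget configuration is passed over even though its inner-CV AUC is higher.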

LightGBM complexity-budgeted

Search space: LightGBM · 16 candidate configurations

Parameter	Values considered
n_estimators	25, 50
num_leaves	2, 4
learning_rate	0.03, 0.1
min_child_samples	10, 20

Constant settings: verbose=-1; n_jobs=1; random_state=42; Complexity budget=200 (candidates within the budget are preferred during inner-CV selection); Unspecified estimator settings use the library defaults.

Complexity definition: Model inspection units sum the conditions across complete root-to-leaf paths with active terminal outputs.
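
Because leaf-wise growth can produce unbalanced trees, the path-condition count for a given num_leaves is a range rather than a single number. The sketch below computes the smallest and largest possible sum of leaf depths for a binary tree with a given number of leaves and scales it by n_estimators. It assumes that every tree reaches num_leaves and that every leaf is active, so the figures are bounds for illustration and not measured complexities.

```python
def min_path_conditions(num_leaves):
    """Smallest sum of leaf depths (most balanced binary tree)."""
    if num_leaves < 2:
        return 0
    full_depth = num_leaves.bit_length() - 1          # floor(log2(num_leaves))
    extra = num_leaves - (1 << full_depth)            # leaves beyond a full level
    return num_leaves * full_depth + 2 * extra


def max_path_conditions(num_leaves):
    """Largest sum of leaf depths (chain-shaped binary tree)."""
    if num_leaves < 2:
        return 0
    return (num_leaves - 1) * (num_leaves + 2) // 2


BUDGET = 200
for n_estimators in (25, 50):
    for num_leaves in (2, 4):
        low = n_estimators * min_path_conditions(num_leaves)
        high = n_estimators * max_path_conditions(num_leaves)
        print(f"n_estimators={n_estimators} num_leaves={num_leaves}: "
              f"{low}..{high} units, always within budget: {high <= BUDGET}")
```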

RandomForest complexity-budgeted

Search space: RandomForest · 16 candidate configurations

Parameter	Values considered
n_estimators	25, 50
max_leaf_nodes	2, 4
min_samples_leaf	1, 5
max_features	sqrt, 0.5
max_depth	None

Constant settings: n_jobs=1; random_state=42; Complexity budget=200 (candidates within the budget are preferred during inner-CV selection); Unspecified estimator settings use the library defaults.

Complexity definition: Model inspection units sum the conditions across every complete root-to-leaf path in the fitted forest.

EBM

Search space: EBM · 8 candidate configurations

Parameter	Values considered
learning_rate	0.01, 0.05
max_bins	32, 64
interactions	0, 5
max_rounds	500

Constant settings: outer_bags=4; max_rounds=500; n_jobs=1; random_state=42; Unspecified estimator settings use the library defaults.

Complexity definition: Model inspection units count finite nonzero term-score cells.
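
A minimal sketch of the cell count is shown below. It represents each term's scores as a nested Python list, which is a hypothetical stand-in for whatever per-term score arrays the fitted model exposes; the values are made up.

```python
import math


def _flatten(values):
    """Yield scalar cells from an arbitrarily nested list or tuple."""
    for item in values:
        if isinstance(item, (list, tuple)):
            yield from _flatten(item)
        else:
            yield item


def count_score_cells(term_scores):
    """Count cells that are finite and nonzero across all terms."""
    return sum(
        1
        for term in term_scores
        for cell in _flatten(term)
        if math.isfinite(cell) and cell != 0.0
    )


# One main-effect term (1-D) and one pairwise interaction term (2-D).
term_scores = [
    [0.0, 0.2, -0.1, float("nan")],
    [[0.0, 0.05], [float("inf"), -0.3]],
]

print(count_score_cells(term_scores))  # 2 + 2 = 4
```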

RuleFit

Search space: RuleFit · 8 candidate configurations

Parameter	Values considered
n_estimators	50, 100
max_rules	50, 100
tree_size	5, 10

Constant settings: alpha=None; random_state=42; Unspecified estimator settings use the library defaults.

Complexity definition: Model inspection units count active linear terms and every condition in each active rule.
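
The RuleFit count can be sketched in the same spirit. The record layout, the function name rulefit_units, and the example rules are hypothetical, and reusing the 1e-12 activity tolerance from the HUGIML definition is an assumption, since no threshold is stated for RuleFit.

```python
def rulefit_units(linear_coefs, rules, tolerance=1e-12):
    """Active linear terms plus every condition of each active rule."""
    linear_terms = sum(1 for coef in linear_coefs if abs(coef) > tolerance)
    rule_conditions = sum(
        len(rule["conditions"])
        for rule in rules
        if abs(rule["coef"]) > tolerance
    )
    return linear_terms + rule_conditions


linear_coefs = [0.4, 0.0, -0.1]
rules = [
    {"coef": 0.7, "conditions": ["x0 > 1.0", "x2 <= 3.0"]},
    {"coef": 0.0, "conditions": ["x1 > 0.0", "x3 <= 5.0", "x0 <= 2.0"]},
    {"coef": -0.2, "conditions": ["x1 > 0.0"]},
]

print(rulefit_units(linear_coefs, rules))  # 2 linear + (2 + 1) rule conditions = 5
```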

Dashboard panels: Mean AUC by model; AUC versus model complexity; Benchmark interpretation; Complexity-budget check; Interactive dataset explorer; Filtered dataset table; Selected dataset: tuned settings; HUGIML depth and topK; HUGIML AUC gap; Pairwise tests versus HUGIML; Overall summary; Significant all-pair comparisons; Summary results by dataset scope.
