Summary panels: Orig+Pat fit time · Pat Only fit time · AUC comparison · Mem Δ comparison

[Figure: Fit time vs n (log–log, 4 threads). Series: Orig+Pat, Pat Only, XGBoost, LightGBM.]

[Figure: Test AUC vs n. Series: Orig+Pat, Pat Only, XGBoost, LightGBM.]

n-Scaling

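[Figure: Fit time vs n (log–log). Series: Orig+Pat, Pat Only, XGBoost, LightGBM.]

[Figure: Test AUC vs n. Series: Orig+Pat, Pat Only, XGBoost, LightGBM.]

[Figure: Patterns mined vs n.]

[Table columns: n | Orig+Pat fit | Pat Only fit | XGBoost fit | LightGBM fit | O+P AUC | P-Only AUC | XGB AUC | LGB AUC]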

p-Scaling

[Figure: Fit time vs p. Series: Orig+Pat, Pat Only, XGBoost, LightGBM.]

[Figure: Fit time ratio vs XGBoost (<1.0 = faster than XGB). Series: Orig+Pat/XGB, Pat Only/XGB, LightGBM/XGB.]

[Figure: Test AUC vs p. Series: Orig+Pat, Pat Only, XGBoost, LightGBM.]

[Table columns: p | n | Orig+Pat fit | Pat Only fit | XGBoost fit | LightGBM fit | O+P AUC | P-Only AUC | XGB AUC | LGB AUC]

Memory

Fit Δ memory: peak RSS during fit minus RSS after data load. Reflects additional working memory allocated during training only. All runs at 4 threads.
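The definition above can be sketched as a small sampling helper. This is a minimal illustration and not the benchmark's actual harness: `measure_fit_delta_mb`, `fit_fn`, and `read_rss_bytes` are hypothetical names, and the RSS reader is injected by the caller so the sketch stays platform-independent.

```python
import threading

MB = 2 ** 20

def measure_fit_delta_mb(fit_fn, read_rss_bytes, interval_s=0.01):
    """Run fit_fn() and return (result, fit delta memory in MB).

    Delta = peak RSS seen while fit_fn runs minus the RSS read just before it
    starts (i.e. after data load). read_rss_bytes is a caller-supplied function
    returning the current RSS in bytes (hypothetical hook, not a HUGIML API).
    """
    baseline = read_rss_bytes()          # RSS after data load, before training
    peak = baseline
    stop = threading.Event()

    def sampler():
        nonlocal peak
        while not stop.is_set():
            peak = max(peak, read_rss_bytes())
            stop.wait(interval_s)        # sleep, but wake early once the fit is done

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        result = fit_fn()
    finally:
        stop.set()
        thread.join()
    peak = max(peak, read_rss_bytes())   # one last reading after the fit returns
    return result, (peak - baseline) / MB
```

In a real process the reader could be something like `lambda: psutil.Process().memory_info().rss`, assuming `psutil` is installed. A polling sampler can miss short-lived spikes between readings, so a delta obtained this way is a lower bound on the true peak.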

Fit Δ memory (MB) vs n

Orig+PatPat OnlyXGBoostLightGBM

Fit Δ memory (MB) vs p

Orig+PatPat OnlyXGBoostLightGBM
nOrig+Pat (MB)Pat Only (MB)XGBoost (MB)LightGBM (MB)
pnOrig+Pat (MB)Pat Only (MB)XGBoost (MB)LightGBM (MB)O+P / XGBP-Only / XGB

Parameter Sweep

Sensitivity of fit time, AUC, and pattern count to individual HUGIML hyperparameters. Orig+Pat mode unless noted. Baseline signal dataset, 4 threads.

Methodology

Benchmark setup, dataset descriptions, and model configurations.

Environment

HUGIML: v1.1.9 (C++ pybind11)
Python: 3.13.5
XGBoost: 50 trees · max_depth=4
LightGBM: 50 trees · max_depth=4
Threads: 4 (n_jobs=4)
Train/test split: 75% / 25%, stratified, seed=42
Metric: test-set ROC AUC
Memory: peak RSS − RSS after data load
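To make the split and metric settings concrete, here is a dependency-free sketch of a stratified 75% / 25% split with seed 42 and of ROC AUC computed as the probability that a positive example outranks a negative one. The function names are illustrative, and the benchmark's own splitting code is not shown in this report, so the exact indices it produced will differ from this sketch.

```python
import random

def stratified_split(labels, test_frac=0.25, seed=42):
    """Return (train_idx, test_idx), sampling test_frac of each class for test."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)                       # seeded shuffle within the class
        n_test = round(len(idx) * test_frac)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def roc_auc(y_true, scores):
    """ROC AUC via pairwise comparison of positives vs negatives (ties count 0.5).

    Quadratic in the number of examples, so suitable only as an illustration.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

At benchmark scale one would more likely use scikit-learn's `train_test_split(..., test_size=0.25, stratify=y, random_state=42)` and `roc_auc_score`; whether the benchmark does so is not stated here.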

Datasets

baseline_signal: float32 features, nonlinear signal in 8 of p features (rest noise). p=20 fixed for n-scaling; p varied 20–3000 for p-scaling at moderate n. n up to 3,000,000.

threshold_grid: Uniform(−1,1) features, up to 96 threshold terms and interaction terms, labels median-binarised. p=200 fixed for n-scaling; p varied 20–3000 for p-scaling. n up to 500,000.
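As an illustration of the threshold_grid recipe, the sketch below draws Uniform(−1,1) features, sums weighted threshold indicators and pairwise interaction indicators into a score, and binarises the score at its median. Only the feature distribution, the limit of 96 terms, and the median binarisation come from the description above; the weights, threshold range, number of interaction pairs (`n_pairs`), and the function name are assumptions made for this example.

```python
import random
import statistics

def make_threshold_grid(n, p, n_terms=96, n_pairs=16, seed=0):
    """Generate (X, y) in the spirit of threshold_grid (hypothetical layout)."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(p)] for _ in range(n)]

    # Assumed term layout: (feature index, threshold, weight).
    terms = [(rng.randrange(p), rng.uniform(-0.8, 0.8), rng.uniform(-1.0, 1.0))
             for _ in range(n_terms)]
    # Assumed interaction layout: (feature a, feature b, weight), both cut at 0.
    pairs = [(rng.randrange(p), rng.randrange(p), rng.uniform(-1.0, 1.0))
             for _ in range(n_pairs)]

    scores = []
    for row in X:
        s = sum(w for j, t, w in terms if row[j] > t)
        s += sum(w for a, b, w in pairs if row[a] > 0.0 and row[b] > 0.0)
        scores.append(s)

    cut = statistics.median(scores)            # median-binarised labels
    y = [1 if s > cut else 0 for s in scores]
    return X, y
```

Median binarisation keeps the classes close to balanced regardless of how the terms are drawn, which is convenient when AUC is compared across very different n and p. Pure Python is used here for portability; at n up to 500,000 a vectorised NumPy version would be the practical choice.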

HUGIML feature modes

■ original_plus_patterns (strict): feature_mode='original_plus_patterns', topk_budget_strict=True. Downstream matrix = n × (original features + mined patterns), total capped at topK=50 columns. Higher accuracy, higher memory and fit cost due to wider downstream matrix.
■ patterns_only: feature_mode='patterns_only', topk_budget_strict=False. Downstream matrix = n × mined patterns only. Lower accuracy, but 3–5× faster fit and substantially lower memory at large p, since the matrix width equals the (small) number of patterns rather than p + patterns.
Shared config: adaptive_binning=True, b_candidates=[3,5,7,10], L=1, G=0.01, topK=50, n_jobs=4, use_hotpath=True, augmented_pair_transforms=False
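The two modes differ only in `feature_mode` and `topk_budget_strict`, so one way to keep the settings together is a shared dictionary plus per-mode overrides. The keys and values below are copied from the lists above; `build_config` is a hypothetical helper, and how these settings are actually passed to HUGIML is not shown in this report.

```python
import copy

# Settings shared by both feature modes (values as listed above).
SHARED_CONFIG = {
    "adaptive_binning": True,
    "b_candidates": [3, 5, 7, 10],
    "L": 1,
    "G": 0.01,
    "topK": 50,
    "n_jobs": 4,
    "use_hotpath": True,
    "augmented_pair_transforms": False,
}

# Per-mode settings (values as listed above).
MODE_OVERRIDES = {
    "original_plus_patterns": {"topk_budget_strict": True},
    "patterns_only": {"topk_budget_strict": False},
}

def build_config(feature_mode):
    """Return a fresh settings dict for one feature mode (hypothetical helper)."""
    if feature_mode not in MODE_OVERRIDES:
        raise ValueError(f"unknown feature_mode: {feature_mode!r}")
    config = copy.deepcopy(SHARED_CONFIG)      # avoid sharing the b_candidates list
    config["feature_mode"] = feature_mode
    config.update(MODE_OVERRIDES[feature_mode])
    return config
```

Building each run's settings from one shared base makes it harder for the two modes to drift apart on anything other than the two parameters under comparison.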