Change Log

Version 1.9.0 (July 27, 2020)

  • New Features:

    • Multinode training (alpha)

    • Queuing of experiments to avoid system overload

    • Automatic Leaderboard: Single-button creation of a project with a series of diverse experiments

    • Multi-layer hierarchical feature engineering:

      • Allow optional pre-processing layer for specific custom data cleanup/conversions

      • Subsequent layers take each previous layer’s output as input (can be numeric or categorical/string)

    • PyTorch deep learning backend in addition to TensorFlow

    • Image classification and regression with pre-trained and fine-tuned state-of-the-art Deep Learning models:

      • Image data ingest from binary archives

        • Archives can contain one optional .csv file mapping image paths to the target (regression/classification)

        • Automatic training dataset creation and label creation (from directory structure) if no .csv provided

      • Image Transformers (for converting image path columns into numeric features; see the sketch after this list):

        • “densenet121”, “efficientnetb0”, “efficientnetb2”, “inception_v3”, “mobilenetv2”, “resnet34”, “resnet50”, “seresnet50”, “seresnext50”, “xception”

        • Optional fine-tuning

        • Optional GPU acceleration (strongly recommended when enabling fine-tuning)

        • Pretrained and fine-tuneable ImageVectorizer transformer with automatic dimensionality reduction

        • Images can be provided either as zipped archives, or as paths to local or remote locations (URIs)

        • Automatic image labeling when importing zipped archives of images (based on folder names and structure)

        • Can handle multiple image columns with URIs in a tabular dataset

        • Single experiment can combine image, NLP and tabular data

        • MOJO support (also for CPU-only systems)
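As an illustration of what an Image Transformer does, the sketch below converts an image path column into fixed-length numeric features with a pretrained torchvision ResNet-50 backbone. The backbone choice, preprocessing, and helper function are assumptions made for this example only, not Driverless AI's internal implementation.

```python
# Illustrative sketch: turn an image-path column into numeric features with a
# pretrained CNN backbone, which a downstream model (e.g. a GBM) can then use.
import torch
import pandas as pd
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

backbone = models.resnet50(pretrained=True)   # stand-in for the backbones listed above
backbone.fc = torch.nn.Identity()             # drop the classifier head, keep the 2048-d embedding
backbone.eval()

def image_column_to_features(paths):
    """Map a column of image paths/URIs to one 2048-dim feature vector per row."""
    rows = []
    with torch.no_grad():
        for p in paths:
            x = preprocess(Image.open(p).convert("RGB")).unsqueeze(0)
            rows.append(backbone(x).squeeze(0).numpy())
    return pd.DataFrame(rows)

# features = image_column_to_features(df["image_path"])  # hypothetical column name
```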

      • Automatic Image model

        • End-to-end model training, no tuning needed

        • State-of-the-art results with grandmaster techniques

        • Neural architecture search based on pretrained and fine-tuned TensorFlow models

        • Multi-GPU training

        • Visual insights in GUI (losses, sample images, augmentation, Grad-CAM visual explanations)

      • MLI is not available for image experiments and is a work in progress

    • PyTorch BERT NLP pre-trained and fine-tuned state-of-the-art Deep Learning models:

      • “bert-base-uncased”, “distilbert-base-uncased”, “xlnet-base-cased”, “xlm-mlm-enfr-1024”, “roberta-base”, “albert-base-v2”, “camembert-base”, “xlm-roberta-base”

      • Optional GPU acceleration (strongly recommended)

      • MOJO support (also for CPU-only systems)

      • BERT transformers (for converting text columns into numeric features for other models like GBMs); see the sketch below

      • BERT models (when the dataset contains only one text column)
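To illustrate the BERT transformer idea (text column in, numeric features out), here is a minimal sketch using the Hugging Face transformers library with one of the checkpoints listed above. The CLS-token pooling and the helper function are assumptions made for this example, not DAI's internal code.

```python
# Illustrative sketch: convert a text column into fixed-size numeric features
# with a pretrained BERT checkpoint, so a GBM can use them as inputs.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def text_column_to_features(texts, max_length=128):
    """Return one 768-dim embedding per row of a text column (CLS-token pooling)."""
    rows = []
    with torch.no_grad():
        for t in texts:
            enc = tokenizer(t, truncation=True, max_length=max_length,
                            return_tensors="pt")
            out = model(**enc)
            rows.append(out.last_hidden_state[:, 0, :].squeeze(0).numpy())
    return np.vstack(rows)

# X_text = text_column_to_features(df["review"].tolist())  # hypothetical column
# A GBM can then be trained on np.hstack([X_tabular, X_text]).
```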

    • AutoReport now includes the following:

      • Information about the time series validation strategy

      • Experiment lineage (model lineage plot)

      • NLP/Image architecture details

    • Zero-inflated regression models for insurance use cases (combination of classification + regression models)
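The sketch below shows the general classification-plus-regression composition behind a zero-inflated model (for example, insurance claims that are mostly zero): a classifier estimates the probability of a non-zero outcome, a regressor estimates its size on the non-zero rows, and the prediction is their product. It is a minimal illustration, not DAI's implementation.

```python
# Illustrative sketch of a zero-inflated regression model (expects array-like X, y).
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

class ZeroInflatedRegressor:
    def __init__(self):
        self.clf = GradientBoostingClassifier()   # zero vs. non-zero
        self.reg = GradientBoostingRegressor()    # severity on non-zero rows

    def fit(self, X, y):
        nonzero = y != 0
        self.clf.fit(X, nonzero.astype(int))
        self.reg.fit(X[nonzero], y[nonzero])
        return self

    def predict(self, X):
        p_nonzero = self.clf.predict_proba(X)[:, 1]
        severity = self.reg.predict(X)
        return p_nonzero * severity   # expected value = P(non-zero) * E[y | non-zero]
```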

    • Time series centering and de-trending transformations:

      • Inner ML model is trained on residuals after fitting and removing the trend from the target signal (per time-series group); see the sketch after this list

      • Support for constant (centering), linear and logistic trends

      • SEIRD model for epidemic modeling of (S)usceptible, (E)xposed, (I)nfected, (R)ecovered and (D)eceased, fully configurable lower/upper bounds for model parameters
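A minimal sketch of the centering/de-trending idea for a single time-series group is shown below: fit a trend to the target, train the inner model on the residuals, and add the trend back at prediction time. The linear trend and the LightGBM inner model are illustrative choices only; centering is the same idea with a constant trend.

```python
# Illustrative sketch of linear de-trending for one time-series group.
import numpy as np
from sklearn.linear_model import LinearRegression
from lightgbm import LGBMRegressor

def fit_detrended(time_index, X, y):
    """Fit a linear trend on y over time, then an inner model on the residuals."""
    t = np.asarray(time_index, dtype=float).reshape(-1, 1)
    trend = LinearRegression().fit(t, y)
    inner = LGBMRegressor().fit(X, y - trend.predict(t))   # inner ML model on residuals
    return trend, inner

def predict_detrended(trend, inner, time_index, X):
    """Inner-model prediction on residuals plus the trend added back."""
    t = np.asarray(time_index, dtype=float).reshape(-1, 1)
    return trend.predict(t) + inner.predict(X)
```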

    • Graphical config.toml editor for expert settings

    • Empiric prediction intervals for regression problems with user-defined confidence levels (based on holdout predictions)
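The sketch below shows the usual construction behind such empirical intervals: take quantiles of the holdout residuals at the requested confidence level and shift new predictions by them. The symmetric quantile split is an assumption made for illustration.

```python
# Illustrative sketch of empirical prediction intervals from holdout residuals.
import numpy as np

def empirical_interval(holdout_actual, holdout_pred, new_pred, confidence=0.9):
    """Return (lower, upper) bounds for new predictions at the given confidence."""
    residuals = np.asarray(holdout_actual) - np.asarray(holdout_pred)
    alpha = (1.0 - confidence) / 2.0
    lo, hi = np.quantile(residuals, [alpha, 1.0 - alpha])
    return new_pred + lo, new_pred + hi

# lower, upper = empirical_interval(y_holdout, pred_holdout, pred_test, confidence=0.95)
```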

    • Insights tab with helpful visualizations (currently only for time-series and image problems)

    • For binary classification problems with the F0.5, F1, F2, or MCC scorer, use the same metric for optimal threshold determination

    • Custom data recipes can now be part of the experiment’s modeling pipeline, and will be part of the Python scoring package

    • Custom visualizations in AutoViz following the Grammar of Graphics

    • Pass data to (custom) scorers so they can access other columns, not only the actual and predicted values

    • Added many new scorers for common regression and classification metrics out of the box

    • Added holiday calendars for 24 more countries; users can select the list of countries for which is-holiday features are created (see the sketch below)
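For illustration, is-holiday features of this kind can be built per selected country roughly as follows, here using the open-source holidays package (a recent version is assumed); DAI's internal calendar may differ.

```python
# Illustrative sketch: one 0/1 is-holiday column per selected country.
import holidays
import pandas as pd

def add_is_holiday_features(df, date_col, countries=("US", "GB", "DE")):
    dates = pd.to_datetime(df[date_col])
    for country in countries:
        cal = holidays.country_holidays(country)          # country-specific calendar
        df[f"is_holiday_{country}"] = dates.dt.date.apply(lambda d: int(d in cal))
    return df
```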

    • Added identity_no_clip target transformer for regression problems that never clips the predictions to observed ranges and allows extrapolation

    • MLI:

      • New GUI/UX for MLI

      • Added Kernel Explainer for original feature Shapley importance
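Kernel SHAP is model-agnostic: it estimates Shapley values from a prediction function and a background sample, which is what makes attributions on the original (untransformed) features possible. A minimal sketch with the open-source shap package is below; DAI's internal Kernel Explainer may differ in sampling and defaults.

```python
# Illustrative sketch of Kernel SHAP on original features with the shap package.
import pandas as pd
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=5, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(5)])
model = RandomForestRegressor(random_state=0).fit(X, y)

background = X.sample(50, random_state=0)                    # small background sample
explainer = shap.KernelExplainer(model.predict, background)  # model-agnostic explainer
shap_values = explainer.shap_values(X.iloc[:20])             # per-row, per-original-feature
```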

      • Added ability to download Shapley values for original features from UI as CSV file

      • Added intercept column to k-LIME output CSV file

      • Added ability to run surrogate models on DAI model residuals to help debug model errors

      • Added ability to export Decision Tree Surrogate model rules as text and Python code

      • Added Decision Tree Surrogate model for multinomial experiments

      • Added Leave One Covariate Out (LOCO) for multinomial experiments

      • Added two traditional fair lending metrics for Disparate Impact Analysis (DIA): Standardized Mean Difference (SMD) and Marginal Effect (ME)

      • Added two interpretable model recipes to https://github.com/h2oai/driverlessai-recipes: GA2M and XNN (https://github.com/h2oai/driverlessai-recipes/tree/master/models/mli)

      • Display prediction label for binary classification experiments in MLI summary page

  • Improvements:

    • Improved parsability (machine readability) of log files

    • Custom recipes are now only visible to the user who created them; previously created custom recipes remain globally visible

    • Faster time-series experiments

    • Improved preview to show more details about the modeling part of the final pipeline

    • Improved notifications system

    • Reduced size of MOJO

    • Only allow imbalanced sampling techniques when the data is larger than a user-controllable threshold

    • Upgraded to latest H2O-3 backend for custom recipes

    • Faster feature selection for large imbalanced datasets

  • Documentation updates:

    • Added animated GIFs

    • Added tabbed content

    • Added more details for imbalanced sampling methods for binary classification

    • New content (refer to above linked topics)

  • Bug fixes:

    • Various bug fixes

Version 1.8.7.2 LTS (July 13, 2020)

  • Bug Fixes:

    • Add and pass authentication_method parameter to use proper get_true_username and start_session

    • SQL-like connector: strip unnecessary semi-colon from the end of query

  • Documentation updates:

    • Document use of hive_app_jvm_args

Version 1.8.7.1 LTS (June 23, 2020)

  • New Features:

    • Add ability to push artifacts to a Bitbucket server

    • Add per-feature user control for monotonicity constraints for XGBoostGBM, LightGBM and DecisionTree models
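The underlying tree libraries express such constraints as one value per feature (+1 increasing, -1 decreasing, 0 unconstrained). A minimal sketch is below; how the DAI expert setting maps user choices onto these library parameters is an assumption made here.

```python
# Illustrative sketch of per-feature monotonicity constraints in the underlying libraries.
import lightgbm as lgb
import xgboost as xgb

# Example with three features: force increasing, decreasing, unconstrained.
lgb_model = lgb.LGBMRegressor(monotone_constraints=[1, -1, 0])
xgb_model = xgb.XGBRegressor(monotone_constraints="(1,-1,0)")

# lgb_model.fit(X_train, y_train); xgb_model.fit(X_train, y_train)  # hypothetical data
```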

  • Bug Fixes:

    • Fix Hive kerberos impersonation

    • Fix a DTap connector issue by using the proper login username for impersonation

    • Fix monotonicity constraints for XGBoostGBM, LightGBM and DecisionTree models

Version 1.8.7 LTS (June 15, 2020)

  • New Features:

    • Add intercept term to k-LIME csv

    • Add control of default categorical & numeric feature rendering in DAI PD/ICE

    • Add ability to restrict custom recipe upload to a specific git repository and branch

    • Add translations for Korean and Chinese

    • Add ability to use multiple authentication methods simultaneously

  • Improvements:

    • Improve behavior of systemctl when Driverless AI fails to start

    • Improve logging behavior for JDBC and Hive connectors

    • Improve behavior of C++ scorer, fewer unnecessary files saved in tmp directory

    • Improve Docker image behavior in Kubernetes

    • Improve LDAP authentication to allow for anonymous binding

    • Improve speed of feature selection for experiments on large, wide, imbalanced datasets

    • Improve speed of data import on busy system

  • Bug fixes:

    • Fix automatic Kaggle submission and score retrieval

    • Fix intermittent Java exception seen by surrogate DRF model in MLI when several MLI jobs are run concurrently

    • Fix issue with deleting Deployments if linked Experiment was deleted

    • Fix issue causing Jupyter Notebooks to not work properly in Docker Image

    • Fix custom recipe scorers not being displayed on Diagnostics page

    • Fix issue with AWS Lambda Deployment not handling dropped columns properly

    • Fix issue with not being able to limit number of GPUs for specific experiment

    • Fix in-server scoring inaccuracies for certain models built in 1.7.1 and 1.8.0 (standalone scoring not affected)

    • Fix rare datatable type casting exception

  • Documentation updates:

    • The “Maximum Number of Rows to Perform Permutation-Based Feature Selection” expert setting now has a default value of 500,000

    • Improved Hive and Snowflake connector documentation

    • Updated the Main.java example in the Java Scoring Pipeline chapter

    • Added documentation describing how to change the language in the UI before starting the application

    • Added information about how custom recipes are described and documented in the Autoreport

    • Updated the LDAP authentication documentation

    • Improved the Linux DEB and RPM installation instructions

    • Improved the AWS Community AMI installation instructions

    • Improved documentation for the Reproducible button

Version 1.8.6 LTS (Apr 30, 2020)

  • New Features:

    • Add expert setting to reduce size of MOJO scoring pipelines (and hence reduce latency and memory usage for inference)

    • Enable Lambda deployment for IBM Power

    • Add restart button for Deployments

    • Add automatic Kaggle submission for supported datasets, show private/public scores (requires Kaggle API Username/Key)

    • Show warning if single final model is worse on back-testing splits (for time series) or cross-validation folds (for IID) than the fold models (indicates issue with signal or fit)

    • Update R client API to include autodoc, experiment preview, dataset download, autovis functions

    • Add button in expert settings that toggles several settings to produce a small MOJO production pipeline

    • Add an option to upload artifacts to S3 or a Git repository

  • Improvements:

    • Improve experiment restart/refit robustness if model type is changed

    • Extra protection against dropping features

    • Improve implementation of Hive connector

  • Bug fixes:

    • Upgrade datatable to fix endless loop during stats calculation at file import

    • Web server and UI now respect dynamic base URL suffix

    • Fix incorrect min_rows in MLI when providing weight column with small values

    • Fix segfault in MOJO for TensorFlow/PyTorch models

    • Fix elapsed time for MLI

    • Enable GPU by default for R client

    • Fix python scoring h2oai ModuleNotFound error

    • Update no_drop_features toml and expert button to more generally avoid dropping features

    • Fix datatable mmap strategy

  • Documentation updates:

    • Add documentation for enabling the Hive data connector

    • Add documentation for updating expired DAI licenses on AWS Lambda deployments using a script

    • Documentation for uploading artifacts now includes support for S3 and Git in the artifacts store

    • Improve documentation for one-hot encoding

    • Improve documentation for systemd logs/journalctl

    • Improve documentation for time series ‘unavailable columns at prediction time’

    • Improve documentation for Azure blob storage

    • Improve documentation for MOJO scoring pipeline

    • Add information about reducing the size of a MOJO using a new expert setting

Version 1.8.5 LTS (Mar 09, 2020)

  • New Features:

    • Handle large multiclass problems (up to 10k classes), including GUI improvements for such cases

    • Detect class imbalance for binary problems where target class is non-rare

    • Add feature count to iteration panel

    • Add experiment lineage pdf in experiment summary zip file

    • Issue warnings if final pipeline scores are unstable across (cross-)validation folds

    • Issue warning if Constant Model is improving quality of final pipeline (sign of bad signal)

    • Report the origin of leakage detection: model fit (AUC/R2), GINI, or correlation

  • Improvements:

    • Improve handling of ID columns

    • Improve exception handling for more robust raising of Python exceptions

    • Improve exception handling when any individual transformer or model throws an exception or segfaults

    • Improve robustness of restart and refit experiment to changes in experiment choices

    • Improve handling of missing values when transforming dataset

    • Improve robustness of custom recipe importing of modules

    • Improve documentation for installation instructions

    • Improve selection of initial lag sizes for time series

    • Improve LightGBM stability for regression problems for certain mutation parameters

  • Documentation updates:

    • Improved documentation for time-series experiments

    • Added topics describing how to re-enable the Data Recipe URL and Data Recipe File connectors

    • For users running older versions of the Standalone Python Scoring Pipeline, added information describing how to install upgraded versions of outdated dependencies

    • Improved the description for the “Sampling Method for Imbalanced Binary Classification Problems” expert setting

    • Added constraints related to the REST server deployments

    • Noted required vs optional parameters in the HDFS connector topics

    • Added an FAQ indicating that MOJOs are thread safe

    • On Windows 10, only Docker installs are supported

    • Added information about the Recommendations AutoViz graph

    • Added information to the Before you Begin Installing topic that master.db files are not backward compatible with earlier Driverless AI versions

  • Bug fixes:

    • Update LightGBM for bug fixes, including hangs and avoiding hard-coded library paths

    • Stabilize use of psutil package

    • Fix time-series experiments when test set has missing target values

    • Fix python scoring to not depend upon original data_directory

    • Fix preview for custom time series validation splits and low accuracy

    • Fix ignored minimum lag size setting for single time series

    • Fix parsing of Excel files with datetime columns

    • Fix column type detection for columns with mostly missing values

    • Fix invalid display of 0.0000 score in iteration scores

    • Various MLI fixes (don’t show invalid graphs, fix PDP sort order, overlapping labels)

    • Various bug fixes

Version 1.8.4.1 LTS (Feb 4, 2020)

  • Add option for dynamic port allocation

  • Documentation for AWS community AMI

  • Various bug fixes (MLI UI)

Version 1.8.4 LTS (Jan 31, 2020)

  • New Features:

    • Added ‘Scores’ tab in experiment page to show detailed tuning tables and scores for models and folds

    • Added Constant Model (constant predictions) and use it as reference model by default

    • Show score of global constant predictions in experiment summary as reference

    • Added support for setting up mutual TLS for Driverless AI

    • Added option to use client/personal certificate as an authentication method

  • Documentation Updates:

    • Added sections for enabling mTLS and Client Certificate authentication

    • The Constant Model is now included in the list of Supported Algorithms

    • Added a section describing the Model Scores page

    • Improved the C++ Scoring Pipeline documentation describing the process for importing datatable

    • Improved documentation for the Java Scoring Pipeline

  • Bug fixes:

    • Fix refitting of final pipeline when new features are added

    • Various bug fixes

Version 1.8.3 LTS (Jan 22, 2020)

  • Added option to upload experiment artifacts to a configured disk location

  • Various bug fixes (correct feature engineering from time column, migration for brain restart)

Version 1.8.2 LTS (Jan 17, 2020)

  • New Features:

    • Decision Tree model:

      • Automatically enabled for accuracy <= 7 and interpretability >= 7

      • Supports all problem types: regression/binary/multiclass

      • Uses the LightGBM GPU/CPU backend, with MOJO support

      • Visualization of tree splits and leaf node decisions as part of pipeline visualization

    • Per-Column Imputation Scheme (experimental):

      • Select one of [const, mean, median, min, max, quantile] imputation schemes at the start of an experiment

      • Select the method of calculating the imputation value: either on the entire dataset or inside each pipeline’s training data split

      • Disabled by default and must be enabled at startup time to be effective

    • Show MOJO size and scoring latency (for C++/R/Python runtime) in experiment summary

    • Automatically prune low weight base models in final ensemble (based on interpretability setting) to reduce final model complexity

    • Automatically convert non-raw github URLs for custom recipes to raw source code URLs

  • Improvements:

    • Speed up feature evolution for time-series and low-accuracy experiments

    • Improved accuracy of feature evolution algorithm

    • Feature transformer interpretability, total count, and importance accounted for in genetic algorithm’s model and feature selection

    • Binary confusion matrix in ROC curve of experiment page is made consistent with Diagnostics (flipped positions of TP/TN)

    • Only include custom recipes in Python scoring pipeline if the experiment uses any custom recipes

    • Additional documentation (New OpenID config options, JDBC data connector syntax)

    • Improved AutoReport’s transformer descriptions

    • Improved progress reporting during Autoreport creation

    • Improved speed of automatic interaction search for imbalanced multiclass problems

    • Improved accuracy of single final model for GLM and FTRL

    • Allow config_overrides to be a list/vector of parameters for R client API

    • Disable early stopping for Random Forest models by default, and expose new ‘rf_early_stopping’ mode (optional)

    • Create identical example data (again, as in 1.8.0 and before) for all scoring pipelines

    • Upgraded versions of datatable and Java

    • Installed graphviz in the Docker image; the MOJO package and Autoreport now include a .png file of the pipeline visualization. Note: For RPM/DEB/TAR SH installs, users can install graphviz to get this optional functionality

  • Documentation Updates:

    • Added a simple example for modifying a dataset by recipe using live code

    • Added a section describing how to impute datasets (experimental)

    • Added Decision Trees to list of supported algorithms

    • Fixed examples for enabling JDBC connectors

    • Added information describing how to use a JDBC driver that is not tested in house

    • Updated the Missing Values Handling topic to include sections for “Clustering in Transformers” and “Isolation Forest Anomaly Score Transformer”

    • Improved the “Fold Column” description

  • Bug Fixes:

    • Fix various reasons why final model score was too far off from best feature evolution score

    • Delete temporary files created during test set scoring

    • Fixed target transformer tuning (was potentially mixing up target transformers between feature evolution and final model)

    • Fixed tensorflow_nlp_have_gpus_in_production=true mode

    • Fixed partial dependence plots for missing datetime values and no longer show them for text columns

    • Fixed time-series GUI for quarterly data

    • Feature transformer exploration limited to no more than 1000 new features (Small data on 10/10/1 would try too many features)

    • Fixed Kaggle pipeline building recipe to try more than 8 input features

    • Fixed cursor placement in live code editor for custom data recipe

    • Show correct number of cross-validation splits in pipeline visualization when there are more than 10 splits

    • Fixed parsing of datetime in MOJO for some datetime formats without ‘%d’ (day)

    • Various bug fixes

  • Backward/Forward compatibility:

    • Models built in 1.8.2 LTS will remain supported in upcoming versions 1.8.x LTS

    • Models built in 1.7.1/1.8.0/1.8.1 are not deprecated and should continue to work (best effort is made to preserve MOJO and Autoreport creation, MLI, scoring, etc.)

    • Models built in 1.7.0 or earlier will be deprecated

Version 1.8.1.1 (Dec 21, 2019)

  • Bugfix for time series experiments with quarterly data when launched from GUI

Version 1.8.1 (Dec 10, 2019)

  • New Features:

    • Full set of scoring metrics and corresponding downloadable holdout predictions for experiments with single final models (time-series or i.i.d)

    • MLI Updates:

      • What-If (sensitivity) analysis

      • Interpretation of experiments on text data (NLP)

    • Custom Data Recipe BYOR:

      • BYOR (bring your own recipe) in Python: pandas, numpy, datatable, third-party libraries for fast prototyping of connectors and data preprocessing inside DAI (see the sketch after this list)

      • data connectors, cleaning, filtering, aggregation, augmentation, feature engineering, splits, etc.

      • can create one or multiple datasets from scratch or from existing datasets

      • interactive code editor with live preview

      • example code at https://github.com/h2oai/driverlessai-recipes/tree/rel-1.8.1/data
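For orientation, a minimal data recipe might look like the sketch below. It is modeled on the examples in the repository linked above; the CustomData/create_data interface is taken from those examples (and assumed to apply to this release), and the "amount" column is a hypothetical example.

```python
# Minimal sketch of a custom data recipe, modeled on the driverlessai-recipes examples.
import datatable as dt
import numpy as np
from h2oaicore.data import CustomData   # interface assumed from the linked repository

class FilterAndDeriveData(CustomData):
    @staticmethod
    def create_data(X: dt.Frame = None):
        if X is None:
            return []
        df = X.to_pandas()
        df = df[df["amount"] >= 0].copy()          # simple cleanup: drop negative amounts
        df["log_amount"] = np.log1p(df["amount"])  # derived feature
        return dt.Frame(df)                        # returned dataset appears in the Datasets page
```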

    • Visualization of final scoring pipeline (Experimental)

      • In-GUI display of graph of feature engineering, modeling and ensembling steps of entire machine learning pipeline

      • Addition to Autodoc

    • Time-Series:

      • Ability to specify which features will be unavailable at test time for time-series experiments

      • Custom user-provided train/validation splits (by start/end datetime for each split) for time-series experiments

      • Back-testing metrics for time-series experiments (regression and classification, with and without lags) based on rolling windows (configurable number of windows)

    • MOJO:

      • Java MOJO for FTRL

      • PyTorch MOJO (C++/Py/R) for custom recipes based on BERT/DistilBERT NLP models (available upon request)

  • Improvements:

    • Accuracy:

      • Automatic pairwise interaction search (+,-,*,/) for numeric features (“magic feature” finder)
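The sketch below illustrates the search idea: generate +, -, *, / combinations of numeric column pairs and rank them by a simple relevance score. Scoring by absolute correlation with the target is a stand-in for the genetic algorithm's actual evaluation.

```python
# Illustrative sketch of pairwise interaction search over numeric features.
import itertools
import numpy as np

def search_interactions(df, numeric_cols, target, top_k=10):
    """Return the top_k pairwise interaction names ranked by |corr| with the target."""
    candidates = {}
    for a, b in itertools.combinations(numeric_cols, 2):
        candidates[f"{a}+{b}"] = df[a] + df[b]
        candidates[f"{a}-{b}"] = df[a] - df[b]
        candidates[f"{a}*{b}"] = df[a] * df[b]
        candidates[f"{a}/{b}"] = df[a] / df[b].replace(0, np.nan)
    scores = {}
    for name, col in candidates.items():
        corr = col.corr(df[target])
        scores[name] = 0.0 if np.isnan(corr) else abs(corr)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```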

      • Improved accuracy for time series experiments with low interpretability

      • Improved leakage detection logic

      • Improved genetic algorithm heuristics for feature evolution (more exploration)

    • Time-Series Recipes:

      • Re-enable Test-time augmentation in Python scoring pipeline for time-series experiments

      • Reduce default number of time-series rolling holdout predictions to same number as validation splits (but configurable)

    • Computation:

      • Faster feature evolution part for non-time-series experiments with single final model

      • Faster binary imbalanced models for very high class imbalance by limiting internal number of re-sampling bags

      • Faster feature selection

      • Enable GPU support for ImbalancedXGBoostGBMModel

      • Improved speed for importing multiple files at once

      • Faster automatic determination of time series properties

      • Enable use of XGBoost models on large datasets if accuracy settings are low enough; dataset size limits exposed in expert settings

      • Reduced memory usage for all experiments

      • Faster creation of holdout predictions for time-series experiments (Shapley values are now computed by MLI on demand by default)

    • UX Improvements:

      • Added ability to rename datasets

      • Added search bar for expert settings

      • Show traces for long-running experiments

      • All experiments create a MOJO (if possible, set to ‘auto’)

      • All experiments create a pipeline visualization

      • By default, all experiments (iid and time series) have holdout predictions on training data and a full set of metrics for final model

  • Documentation Updates:

    • Updated steps for enabling GPU persistence mode

    • Added information about deprecated NVIDIA functions

    • Improved documentation for enabling LDAP authentication

    • Added information about changing the column type in datasets

    • Updated list of experiment artifacts available in an experiment summary

    • Added steps describing how to expose ports on Docker for the REST service deployment within the Driverless AI Docker container

    • Added an example showing how to run an experiment with a custom transform recipe

    • Improved the FAQ for setting up TLS/SSL

    • Added FAQ describing issues that can occur when attempting Import Folder as File with a data connector on Windows

  • Bug Fixes:

    • Allow brain restart/refit to accept unscored previous pipelines

    • Fix actual vs predicted labeling for diagnostics of regression model

    • Fix MOJO for TensorFlow for target transformers other than identity

    • Fix column type detection for Excel files

    • Allow experiments with default expert settings to have a MOJO

    • Various bug fixes

Version 1.8.0 (Oct 3, 2019)

  • Improve speed and memory usage for feature engineering

  • Improve speed of leakage and shift detection, and improve accuracy

  • Improve speed of AutoVis under high system load

  • Improve speed for experiments with large user-given validation data

  • Improve accuracy of ensembles for regression problems

  • Improve creation of Autoreport (only one background job per experiment)

  • Improve sampling techniques for ImbalancedXGBoost and ImbalancedLightGBM models, and disable them by default since they can be slower

  • Add Python/R/C++ MOJO support for FTRL and RandomForest

  • Add native categorical handling for LightGBM in CPU mode

  • Add monotonicity constraints support for LightGBM

  • Add Isolation Forest Anomaly Score transformer (outlier detection)

  • Re-enable One-Hot-Encoding for GLM models

  • Add lexicographical label encoding (disabled by default)

  • Add ability to further train user-provided pretrained embeddings for TensorFlow NLP transformers, in addition to fine-tuning the rest of the neural network graph

  • Add timeout for BYOR acceptance tests

  • Add log and notifications for large shifts in final model variable importances compared to tuning model

  • Add more expert control over time series feature engineering

  • Add ability for recipes to be uploaded in bulk as an entire GitHub repository (or part of one), or as links to Python files on the page

  • Allow missing values in fold column

  • Add support for feature brain when starting “New Model With Same Parameters” of a model that was previously restarted

  • Add support for toggling whether additional features are to be included in pipeline during “Retrain Final Pipeline”

  • Limit experiment runtime to one day by default (approximately enforced, can be configured in Expert Settings -> Experiment or config.toml ‘max_runtime_minutes’)

  • Add support for importing pickled Pandas frames (.pkl)

  • MLI updates:

    • Show holdout predictions and test set predictions (if applicable) in MLI TS for both metric and actual vs. predicted charts

    • Add ability to download group metrics in MLI TS

    • Add ability to zoom into charts in MLI TS

    • Add ability to use column not used in DAI model as a k-LIME cluster column in MLI

    • Add ability to view original and transformed DAI model-based feature importance in MLI

    • Add ability to view Shapley importance for original features

    • Add ability to view permutation importance for a DAI model when the config option autodoc_include_permutation_feature_importance is set to on

    • Fixed bug in binary Disparate Impact Analysis, which caused incorrect calculations amongst several metrics (ones using false positives and true negatives in the numerator)

  • Disable NLP TensorFlow transformers by default (enable in NLP expert settings by switching to “on”)

  • Reorganize expert settings, add tab for feature engineering

  • Experiment now reports whether it was aborted by the user, the system, or a server restart

  • Reduce load of all tasks launched by the server, giving experiments priority for CPU cores

  • Add experiment summary files to aborted experiment logs

  • Add warning when ensemble has models that reach the maximum number of iterations despite early stopping, with learning rate controls added to the expert panel

  • Improve progress reporting

  • Allow disabling of H2O recipe server for scoring if not using custom recipes (to avoid Java dependency)

  • Fix RMSPE scorer

  • Fix recipes error handling when uploading via URL

  • Fix Autoreport being spawned anytime GUI was on experiment page, overloading the system with forks from the server

  • Fix time-out for Autoreport PDP calculations so that it completes more quickly

  • Fix certain config settings to be honored from GUI expert settings (woe_bin_list, ohe_bin_list, text_gene_max_ngram, text_gene_dim_reduction_choice, tensorflow_max_epochs_nlp, tensorflow_nlp_pretrained_embeddings_file_path, holiday_country), previously were only honored when provided at startup time

  • Fix column type for additional columns during scored test set download

  • Fix GUI incorrectly converting time for forecast horizon in TS experiments

  • Fix calculation of correlation for string columns in AutoVis

  • Fix download for R MOJO runtime

  • Fix parameters for LightGBM RF mode

  • Fix dart parameters for LightGBM and XGBoost

  • Documentation updates:

    • Included more information in the Before You Begin Installing or Upgrading topic to help make installations and upgrades go more smoothly

    • Added topic describing how to choose between the AWS Community and AWS Marketplace AMIs

    • Added information describing how to retrieve the MOJO2 Javadoc

    • Updated Python client examples to work with Driverless AI 1.7.x releases

    • Updated documentation for new features, expert settings, MLI plots, etc.

  • Backward/Forward compatibility:

    • Models built in 1.8.0 will remain supported in versions 1.8.x

    • Models built in 1.7.1 are not deprecated and should continue to work (best effort is made to preserve MOJO and Autoreport creation, MLI, scoring, etc.)

    • 1.8.0 upgraded to scipy version 1.3.1 to support newer custom recipes. This might deprecate custom recipes that depend on scipy version 1.2.2 (and experiments using them) and might require re-import of those custom recipes. Previously built Python scoring pipelines will continue to work.

    • Models built in 1.7.0 or earlier will be deprecated

  • Various bug fixes

Version 1.7.1 (Aug 19, 2019)

  • Added two new models with internal sampling techniques for imbalanced binary classification problems: ImbalancedXGBoost and ImbalancedLightGBM

  • Added support for rolling-window based predictions for time-series experiments (2 options: test-time augmentation or re-fit)

  • Added support for setting logical column types for a dataset (to override type detection during experiments)

  • Added ability to set experiment name at start of experiment

  • Added leakage detection for time-series problems

  • Added JDBC connector

  • MOJO updates:

    • Added Python/R/C++ MOJO support for TensorFlow model

    • Added Python/R/C++ MOJO support for TensorFlow NLP transformers: TextCNN, CharCNN, BiGRU, including any pretrained embeddings if provided

    • Reduced memory usage for MOJO creation

    • Increased speed of MOJO creation

    • Configuration options for MOJO and Python scoring pipelines now have 3-way toggle: “on”/”off”/”auto”

  • MLI updates:

    • Added disparate impact analysis (DIA) for MLI

    • Allow MLI scoring pipeline to be built for datasets with column names that need to be sanitized

    • Date-aware binning for partial dependence and ICE in MLI

  • Improved generalization performance for time-series modeling with regularization techniques for lag-based features

  • Improved “predicted vs actual” plots for regression problems (using adaptive point sizes)

  • Fix bug in datatable for manipulations of string columns larger than 2GB

  • Fixed download of predictions on user-provided validation data

  • Fix bug in time-series test-time augmentation (work-around was to include entire training data in test set)

  • Honor the expert settings flag to enable detailed traces (disabled again by default)

  • Various bug fixes

Version 1.6.4 LTS (Aug 19, 2019)

  • ML Core updates:

    • Speed up schema detection

    • DAI now drops rows with missing values when diagnosing regression problems

    • Speed up column type detection

    • Fixed growth of individuals

    • Fixed n_jobs for predict

    • Target column is no longer included in predictors for skewed datasets

    • Added an option to prevent users from downloading data files locally

    • Improved UI split functionality

    • A new “max_listing_items” config option to limit the number of items fetched in listing pages

  • Model Ops updates:

    • MOJO runtime upgraded to version 2.1.3 which supports perpetual MOJO pipeline

    • Upgraded deployment templates to version matching MOJO runtime version

  • MLI updates:

    • Fix to MLI schema builder

    • Fix parsing of categorical reason codes

    • Added ability to handle integer time column

  • Various bug fixes

Version 1.7.0 (Jul 7, 2019)

  • Support for Bring Your Own Recipe (BYOR) for transformers, models (algorithms) and scorers

  • Added protobuf-based MOJO scoring runtime libraries for Python, R and Java (standalone, low-latency)

  • Added local REST server as one-click deployment option for MOJO scoring pipeline, in addition to AWS Lambda endpoint

  • Added R client package, in addition to Python client

  • Added Project workspace to group datasets and experiments and to visually compare experiments and create leaderboards

  • Added download of imported datasets as .csv

  • Recommendations for columnar transformations in AutoViz

  • Improved scalability and performance

  • Ability to provide max. runtime for experiments

  • Create MOJO scoring pipeline by default if the experiment configuration allows (for convenience, enables local/cloud deployment options without user input)

  • Support for user provided pre-trained embeddings for TensorFlow NLP models

  • Support for holdout splits lacking some target classes (can happen when a fold column is provided)

  • MLI updates:

    • Added residual plot for regression problems (keeping all outliers intact)

    • Added confusion matrix as default metric display for multinomial problems

    • Added Partial Dependence (PD) and Individual Conditional Expectation (ICE) plots for Driverless.ai models in MLI GUI

    • Added ability to search by ID column in MLI GUI

    • Added ability to run MLI PD/ICE on all features

    • Added ability to handle multiple observations for a single time column in MLI TS by taking the mean of the target and prediction where applicable

    • Added ability to handle integer time column in MLI TS

    • MLI TS will use train holdout predictions if there is no test set provided

  • Faster import of files with “%Y%m%d” and “%Y%m%d%H%M” time format strings, and files with lots of text strings

  • Fix units for RMSPE scorer to be a percentage (multiply by 100)

  • Allow non-positive outcomes for MAPE and SMAPE scorers

  • Improved listing in GUI

  • Allow zooming in GUI

  • Upgrade to TensorFlow 1.13.1 and CUDA 10 (and CUDA is part of the distribution now, to simplify installation)

  • Add CPU-support for TensorFlow on PPC

  • Documentation updates:

    • Added documentation for new features including

      • Projects

      • Custom Recipes

      • C++ MOJO Scoring Pipelines

      • R Client API

      • REST Server Deployment

    • Added information about variable importance values on the experiments page

    • Updated documentation for Expert Settings

    • Updated “Tips n Tricks” with new Scoring Pipeline tips

  • Various bug fixes

Version 1.6.3 LTS (June 14, 2019)

  • Included an Audit log feature

  • Fixed support for decimal types for parquet files in MOJO

  • Autodoc can order PDP/ICE by feature importance

  • Session Management updates

  • Upgraded datatable

  • Improved reproducibility

  • Model diagnostics now uses a weight column

  • MLI can build surrogate models on all the original features or on all the transformed features that DAI uses

  • Internal server cache now respects usernames

  • Fixed an issue with time series settings

  • Fixed an out of memory error when loading a MOJO

  • Fixed Python scoring package for TensorFlow

  • Added OpenID configurations

  • Documentation updates:

    • Updated the list of artifacts available in the Experiment Summary

    • Clarified language in the documentation for unsupported (but available) features

    • For the Terraform requirement in deployments, clarified that only Terraform versions in the 0.11.x release are supported, and specifically 0.11.10 or greater

    • Fixed link to the Miniconda installation instructions

  • Various bug fixes

Version 1.6.2 LTS (May 10, 2019)

  • This version provides PPC64le artifacts

  • Improved stability of datatable

  • Improved path filtering in the file browser

  • Fixed units for RMSPE scorer to be a percentage (multiply by 100)

  • Fixed segmentation fault on Ubuntu 18 with installed font package

  • Fixed IBM Spectrum Conductor authentication

  • Fixed handling of EC2 machine credentials

  • Fixed Lag transformer configuration

  • Fixed KDB and Snowflake Error Reporting

  • Gradually reduce number of used workers for column statistics computation in case of failure.

  • Hide default Tornado header that exposes the Tornado version in use

  • Documentation updates:

    • Added instructions for installing via AWS Marketplace

    • Improved documentation for installing via Google Cloud

    • Improved FAQ documentation

    • Added Data Sampling documentation topic

  • Various bug fixes

Version 1.6.1.1 LTS (Apr 24, 2019)

  • Fix in AWS role handling.

Version 1.6.1 LTS (Apr 18, 2019)

  • Several fixes for MLI (partial dependence plots, Shapley values)

  • Improved documentation for model deployment, time-series scoring, AutoVis and FAQs

Version 1.6.0 LTS (Apr 5, 2019)

Private build only.

  • Fixed import of string columns larger than 2GB

  • Fixed AutoViz crashes on Windows

  • Fixed quantile binning in MLI

  • Plot global absolute mean Shapley values instead of global mean Shapley values in MLI

  • Improvements to PDP/ICE plots in MLI

  • Validated Terraform version in AWS Lambda deployment

  • Added support for NULL variable importance in AutoDoc

  • Made Variable Importance table size configurable in AutoDoc

  • Improved support for various combinations of data import options being enabled/disabled

  • CUDA is now part of distribution for easier installation

  • Security updates:

    • Enforced SSL settings to be honored for all h2oai_client calls

    • Added config option to prevent using LocalStorage in the browser to cache information

    • Upgraded Tornado server version to 5.1.1

    • Improved session expiration and autologout functionality

    • Disabled access to Driverless AI data folder in file browser

    • Provided an option to filter content that is shown in the file browser

    • Use login name for HDFS impersonation instead of predefined name

    • Disabled autocomplete in login form

  • Various bug fixes

Version 1.5.4 (Feb 24, 2019)

  • Speed up calculation of column statistics for date/datetime columns using certain formats (now uses ‘max_rows_col_stats’ parameter)

  • Added computation of standard deviation for variable importances in experiment summary files

  • Added computation of shift of variable importances between feature evolution and final pipeline

  • Fix link to MLI Time-Series experiment

  • Fix display bug for iteration scores for long experiments

  • Fix display bug for early finish of experiment for GLM models

  • Fix display bug for k-LIME when target is skewed

  • Fix display bug for forecast horizon in MLI for Time-Series

  • Fix MLI for Time-Series for single time group column

  • Fix in-server scoring of time-series experiments created in 1.5.0 and 1.5.1

  • Fix OpenBLAS dependency

  • Detect disabled GPU persistence mode in Docker

  • Reduce disk usage during TensorFlow NLP experiments

  • Reduce disk usage of aborted experiments

  • Refresh reported size of experiments during start of application

  • Disable TensorFlow NLP transformers by default to speed up experiments (can enable in expert settings)

  • Improved progress percentage shown during experiment

  • Improved documentation (upgrade on Windows, how to create the simplest model, DTap connectors, etc.)

  • Various bug fixes

Version 1.5.3 (Feb 8, 2019)

  • Added support for splitting datasets by time via time column containing date, datetime or integer values

  • Added option to disable file upload

  • Require authentication to download experiment artifacts

  • Automatically drop predictor columns from training frame if not found in validation or test frame and warn

  • Improved performance by using physical CPU cores only (configurable in config.toml)

  • Added option to not show inactive data connectors

  • Various bug fixes

Version 1.5.2 (Feb 2, 2019)

  • Added word-level bidirectional GRU TensorFlow models for NLP features

  • Added character-level CNN TensorFlow models for NLP features

  • Added support to import multiple individual datasets at once

  • Added support for holdout predictions for time-series experiments

  • Added support for regression and multinomial classification for FTRL (in addition to binomial classification)

  • Improved scoring for time-series when test data contains actual target values (missing target values will be predicted)

  • Reduced memory usage for LightGBM models

  • Improved performance for feature engineering

  • Improved speed for TensorFlow models

  • Improved MLI GUI for time-series problems

  • Fix final model fold splits when fold_column is provided

  • Various bug fixes

Version 1.5.1 (Jan 22, 2019)

  • Fix MOJO for GLM

  • Add back .csv file of experiment summary

  • Improve collection of pipeline timing artifacts

  • Clean up Docker tag

Version 1.5.0 (Jan 18, 2019)

  • Added model diagnostics (interactive model metrics on new test data incl. residual analysis for regression)

  • Added FTRL model (Follow The Regularized Leader)

  • Added Kolmogorov-Smirnov metric (degree of separation between positives and negatives)

  • Added ability to retrain (only) the final model on new data

  • Added one-hot encoding for low-cardinality categorical features, for GLM

  • Added choice between 32-bit (now default) and 64-bit precision

  • Added system information (CPU, GPU, disk, memory, experiments)

  • Added support for time-series data with many more time gaps, and with weekday-only data

  • Added one-click deployment to AWS Lambda

  • Added ability to split datasets randomly, with option to stratify by target column or group by fold column

  • Added support for OpenID authentication

  • Added connector for BlueData

  • Improved responsiveness of the GUI under heavy load situations

  • Improved speed and reduced memory footprint of feature engineering

  • Improved performance for RuleFit models and enable GPU and multinomial support

  • Improved auto-detection of temporal frequency for time-series problems

  • Improved accuracy of final single model if external validation provided

  • Improved final pipeline if external validation data is provided (add ensembling)

  • Improved k-LIME in MLI by using original features deemed important by DAI instead of all original features

  • Improved MLI by using 3-fold CV by default for all surrogate models

  • Improved GUI for MLI time series (integrated help, better integration)

  • Added ability to view MLI time series logs while MLI time series experiment is running

  • PDF version of the Automatic Report (AutoDoc) is now replaced by a Word version

  • Various bug fixes (GLM accuracy, UI slowness, MLI UI, AutoVis)

Version 1.4.2 (Dec 3, 2018)

  • Support for IBM Power architecture

  • Speed up training and reduce size of final pipeline

  • Reduced resource utilization during training of final pipeline

  • Display test set metrics (ROC, ROCPR, Gains, Lift) in GUI in addition to validation metrics (if test set provided)

  • Show location of best threshold for Accuracy, MCC and F1 in ROC curves

  • Add relative point sizing for scatter plots in AutoVis

  • Fix file upload and add model checkpointing in python client API

  • Various bug fixes

Version 1.4.1 (Nov 11, 2018)

  • Improved integration of MLI for time-series

  • Reduced disk and memory usage during final ensemble

  • Allow scoring and transformations on previously imported datasets

  • Enable checkpoint restart for unfinished models

  • Add startup checks for OpenCL platforms for LightGBM on GPUs

  • Improved feature importances for ensembles

  • Faster dataset statistics for date/datetime columns

  • Faster MOJO batch scoring

  • Fix potential hangs

  • Fix ‘not in list’ error in MOJO

  • Fix NullPointerException in MLI

  • Fix outlier detection in AutoVis

  • Various bug fixes

Version 1.4.0 (Oct 27, 2018)

  • Enable LightGBM by default (now with MOJO)

  • LightGBM tuned for GBM decision trees, Random Forest (rf), and Dropouts meet Multiple Additive Regression Trees (dart)

  • Add ‘isHoliday’ feature for time columns

  • Add ‘time’ column type for date/datetime columns in data preview

  • Add support for binary datatable file ingest in .jay format

  • Improved final ensemble (each model has its own feature pipeline)

  • Automatic smart checkpointing (feature brain) from prior experiments

  • Add kdb+ connector

  • Feature selection of original columns for data with many columns to handle >>100 columns

  • Improved time-series recipe (multiple validation splits, better logic)

  • Improved performance of AutoVis

  • Improved date detection logic (now detects %Y%m%d and %Y-%m date formats)

  • Automatic fallback to CPU mode if GPU runs out of memory (for XGBoost, GLM and LightGBM)

  • No longer require header for validation and testing datasets if data types match

  • No longer include text columns for data shift detection

  • Add support for time-series models in MLI (including ability to select time-series groups)

  • Add ability to download MLI logs from MLI experiment page (includes both Python and Java logs)

  • Add ability to view MLI logs while MLI experiment is running (Python and Java logs)

  • Add ability to download LIME and Shapley reason codes from MLI page

  • Add ability to run MLI on transformed features

  • Display all variables for MLI variable importance for both DAI and surrogate models in MLI summary

  • Include variable definitions for DAI variable importance list in MLI summary

  • Fix Gains/Lift charts when observations weights are given

  • Various bug fixes

Version 1.3.1 (Sep 12, 2018)

  • Fix ‘Broken pipe’ failures for TensorFlow models

  • Fix time-series problems with categorical features and interpretability >= 8

  • Various bug fixes

Version 1.3.0 (Sep 4, 2018)

  • Added LightGBM models - now have [XGBoost, LightGBM, GLM, TensorFlow, RuleFit]

  • Added TensorFlow NLP recipe based on CNN Deeplearning models (sentiment analysis, document classification, etc.)

  • Added MOJO for GLM

  • Added detailed confusion matrix statistics

  • Added more expert settings

  • Improved data exploration (columnar statistics and row-based data preview)

  • Improved speed of feature evolution stage

  • Improved speed of GLM

  • Report single-pass score on external validation and test data (instead of bootstrap mean)

  • Reduced memory overhead for data processing

  • Reduced number of open files - fixes ‘Bad file descriptor’ error on Mac/Docker

  • Simplified Python client API

  • Query any data point in the MLI UI from the original dataset due to “on-demand” reason code generation

  • Enhanced k-means clustering in k-LIME by only using a subset of features. See The K-LIME Technique for more information.

  • Report k-means centers for k-LIME in MLI summary for better cluster interpretation

  • Improved MLI experiment listing details

  • Various bug fixes

Version 1.2.2 (July 5, 2018)

  • MOJO Java scoring pipeline for time-series problems

  • Multi-class confusion matrices

  • AUCMACRO Scorer: Multi-class AUC via macro-averaging (in addition to the default micro-averaging)

  • Expert settings (configuration override) for each experiment from GUI and client APIs.

  • Support for HTTPS

  • Improved downsampling logic for time-series problems (if enabled through accuracy knob settings)

  • LDAP readonly access to Active Directory

  • Snowflake data connector

  • Various bug fixes

Version 1.2.1 (June 26, 2018)

  • Added LIME-SUP (alpha) to MLI as alternative to k-LIME (local regions are defined by decision tree instead of k-means)

  • Added RuleFit model (alpha), now have [GBM, GLM, TensorFlow, RuleFit] - TensorFlow and RuleFit are disabled by default

  • Added Minio (private cloud storage) connector

  • Added support for importing folders from S3

  • Added ‘Upload File’ option to ‘Add Dataset’ (in addition to drag & drop)

  • Predictions for binary classification problems now have 2 columns (probabilities per class), for consistency with multi-class

  • Improved model parameter tuning

  • Improved feature engineering for time-series problems

  • Improved speed of MOJO generation and loading

  • Improved speed of time-series related automatic calculations in the GUI

  • Fixed potential rare hangs at end of experiment

  • No longer require internet to run MLI

  • Various bug fixes

Version 1.2.0 (June 11, 2018)

  • Time-Series recipe

  • Low-latency standalone MOJO Java scoring pipelines (now beta)

  • Enable Elastic Net Generalized Linear Modeling (GLM) with lambda search (and GPU support), for interpretability>=6 and accuracy<=5 by default (alpha)

  • Enable TensorFlow (TF) Deep Learning models (with GPU support) for interpretability=1 and/or multi-class models (alpha, enable via config.toml)

  • Support for pre-tuning of [GBM, GLM, TF] models for picking best feature evolution model parameters

  • Support for final ensemble consisting of mix of [GBM, GLM, TF] models

  • Automatic Report (AutoDoc) in PDF and Markdown format as part of summary zip file

  • Interactive tour (assistant) for first-time users

  • MLI now runs on experiments from previous releases

  • Surrogate models in MLI now use 3 folds by default

  • Improved small data recipe with up to 10 cross-validation folds

  • Improved accuracy for binary classification with imbalanced data

  • Additional time-series transformers for interactions and aggregations between lags, and lagging of non-target columns

  • Faster creation of MOJOs

  • Progress report during data ingest

  • Normalize binarized multi-class confusion matrices by class count (global scaling factor)

  • Improved parsing of boolean environment variables for configuration

  • Various bug fixes

Version 1.1.6 (May 29, 2018)

  • Improved performance for large datasets

  • Improved speed and user interface for MLI

  • Improved accuracy for binary classification with imbalanced data

  • Improved generalization estimate for experiments with given validation data

  • Reduced size of experiment directories

  • Support for Parquet files

  • Support for bzip2 compressed files

  • Added Data preview in UI: ‘Describe’

  • No longer add ID column to holdout and test set predictions for simplicity

  • Various bug fixes

Version 1.1.4 (May 17, 2018)

  • Native builds (RPM/DEB) for 1.1.3

Version 1.1.3 (May 16, 2018)

  • Faster speed for systems with large CPU core counts

  • Faster and more robust handling of user-specified missing values for training and scoring

  • Same validation scheme for feature engineering and final ensemble for high enough accuracy

  • MOJO scoring pipeline for text transformers

  • Fixed single-row scoring in Python scoring pipeline (broken in 1.1.2)

  • Fixed default scorer when experiment is started too quickly

  • Improved responsiveness for time-series GUI

  • Improved responsiveness after experiment abort

  • Improved load balancing of memory usage for multi-GPU XGBoost

  • Improved UI for selection of columns to drop

  • Various bug fixes

Version 1.1.2 (May 8, 2018)

  • Support for automatic time-series recipe (alpha)

  • Now using Generalized Linear Model (GLM) instead of XGBoost (GBM) for interpretability 10

  • Added experiment preview with runtime and memory usage estimation

  • Added MER scorer (Median Error Rate, Median Abs. Percentage Error)

  • Added ability to use integer column as time column

  • Speed up type enforcement during scoring

  • Support for reading ARFF file format (alpha)

  • Quantile Binning for MLI

  • Various bug fixes

Version 1.1.1 (April 23, 2018)

  • Support string columns larger than 2GB

Version 1.1.0 (April 19, 2018)

  • AWS/Azure integration (hourly cloud usage)

  • Bug fixes for MOJO pipeline scoring (now beta)

  • Google Cloud storage and BigQuery (alpha)

  • Speed up categorical column stats computation during data import

  • Further improved memory management on GPUs

  • Improved accuracy for MAE scorer

  • Ability to build scoring pipelines on demand (if not enabled by default)

  • Additional target transformer for regression problems: sqrt(sqrt(x))

  • Add GLM models as candidates for interpretability=10 (alpha, disabled by default)

  • Improved performance of native builds (RPM/DEB)

  • Improved estimation of error bars

  • Various bug fixes

Version 1.0.30 (April 5, 2018)

  • Speed up MOJO pipeline creation and disable MOJO by default (still alpha)

  • Improved memory management on GPUs

  • Support for optional 32-bit floating-point precision for reduced memory footprint

  • Added logging of test set scoring and data transformations

  • Various bug fixes

Version 1.0.29 (April 4, 2018)

  • If MOJO fails to build, no MOJO will be available, but experiment can still succeed

Version 1.0.28 (April 3, 2018)

  • (Non-docker) RPM installers for RHEL7/CentOS7/SLES 12 with systemd support

Version 1.0.27 (March 31, 2018)

  • MOJO scoring pipeline for Java standalone cross-platform low-latency scoring (alpha)

  • Various bug fixes

Version 1.0.26 (March 28, 2018)

  • Improved performance and reduced memory usage for large datasets

  • Improved performance for F0.5, F2 and accuracy

  • Improved performance of MLI

  • Distribution shift detection now also between validation and test data

  • Batch scoring example using datatable

  • Various enhancements for AutoVis (outliers, parallel coordinates, log file)

  • Various bug fixes

Version 1.0.25 (March 22, 2018)

  • New scorers for binary/multinomial classification: F0.5, F2 and accuracy

  • Precision-recall curve for binary/multinomial classification models

  • Plot of actual vs predicted values for regression problems

  • Support for excluding feature transformations by operation type

  • Support for reading binary file formats: datatable and Feather

  • Improved multi-GPU memory load balancing

  • Improved display of initial tuning results

  • Reduced memory usage during creation of final model

  • Fixed several bugs in creation of final scoring pipeline

  • Various UI improvements (e.g., zooming on iteration scoreboard)

  • Various bug fixes

Version 1.0.24 (March 8, 2018)

  • Fix test set scoring bug for data with an ID column (introduced in 1.0.23)

  • Allow renaming of MLI experiments

  • Ability to limit maximum number of cores used for datatable

  • Print validation scores and error bars across final ensemble model CV folds in logs

  • Various UI improvements

  • Various bug fixes

Version 1.0.23 (March 7, 2018)

  • Support for Gains and Lift curves for binomial and multinomial classification

  • Support for multi-GPU single-model training for large datasets

  • Improved recipes for large datasets (faster and less memory/disk usage)

  • Improved recipes for text features

  • Increased sensitivity of interpretability setting for feature engineering complexity

  • Disable automatic time column detection by default to avoid confusion

  • Automatic column type conversion for test and validation data, and during scoring

  • Improved speed of MLI

  • Improved feature importances for MLI on transformed features

  • Added ability to download each MLI plot as a PNG file

  • Added support for dropped columns and weight column to MLI stand-alone page

  • Fix serialization of bytes objects larger than 4 GiB

  • Fix failure to build scoring pipeline with ‘command not found’ error

  • Various UI improvements

  • Various bug fixes

Version 1.0.22 (Feb 23, 2018)

  • Fix CPU-only mode

  • Improved robustness of datatable CSV parser

Version 1.0.21 (Feb 21, 2018)

  • Fix MLI GUI scaling issue on Mac

  • Work-around segfault in truncated SVD scipy backend

  • Various bug fixes

Version 1.0.20 (Feb 17, 2018)

  • HDFS/S3/Excel data connectors

  • LDAP/PAM/Kerberos authentication

  • Automatic setting of default values for accuracy / time / interpretability

  • Interpretability: per-observation and per-feature (signed) contributions to predicted values in the scoring pipeline (see the sketch after this list)

  • Interpretability setting now affects feature engineering complexity and final model complexity

  • Standalone MLI scoring pipeline for Python

  • Time setting of 1 now runs for only 1 iteration

  • Early stopping of experiments if convergence is detected

  • ROC curve display for binomial and multinomial classification, with confusion matrices and threshold/F1/MCC display

  • Training/Validation/Test data shift detectors

  • Added AUCPR scorer for multinomial classification

  • Improved handling of imbalanced binary classification problems

  • Configuration file for runtime limits such as cores/memory/hard drive (for admins)

  • Various GUI improvements (ability to rename experiments, re-run experiments, logs)

  • Various bug fixes
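
Per-feature contribution outputs of this kind typically follow additive semantics: for each scored row, the signed per-feature contributions plus a bias term sum to the row's prediction. A hypothetical illustration of that bookkeeping (feature names and values are made up, not product output):

```python
# Hypothetical contribution output for one scored row; assumes contributions
# are reported so that they sum, together with a bias term, to the prediction.
contributions = {"age": 0.12, "income": -0.05, "region": 0.02}
bias = 0.40

prediction = bias + sum(contributions.values())
print(prediction)  # 0.49
```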

Version 1.0.19 (Jan 28, 2018)

  • Fix hang during final ensemble (accuracy >= 5) for larger datasets

  • Allow scoring of all models built in older versions (>= 1.0.13) in GUI

  • More detailed progress messages in the GUI during experiments

  • Fix scoring pipeline to only use relative paths

  • Error bars in model summary are now +/- 1*stddev (instead of 2*stddev)

  • Added RMSPE scorer (Root Mean Square Percentage Error); illustrated together with SMAPE in the sketch after this list

  • Added SMAPE scorer (Symmetric Mean Abs. Percentage Error)

  • Added AUCPR scorer (Area under Precision-Recall Curve)

  • Gracefully handle inf/-inf in data

  • Various UI improvements

  • Various bug fixes
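
Illustrative definitions of the two percentage-error scorers added in this release (one common formulation, not necessarily the product's exact implementation, and assuming nonzero actual values):

```python
import numpy as np

def rmspe(actual, predicted):
    # Root mean square of per-row percentage errors.
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.sqrt(np.mean(((actual - predicted) / actual) ** 2))

def smape(actual, predicted):
    # Symmetric MAPE: error scaled by the mean magnitude of actual and predicted.
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean(2.0 * np.abs(predicted - actual) / (np.abs(actual) + np.abs(predicted)))

print(rmspe([100, 200], [110, 180]))  # 0.1
print(smape([100, 200], [110, 180]))  # ~0.10
```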

Version 1.0.18 (Jan 24, 2018)

  • Fix migration from version 1.0.15 and earlier

  • Confirmation dialog for experiment abort and data/experiment deletion

  • Various UI improvements

  • Various AutoVis improvements

  • Various bug fixes

Version 1.0.17 (Jan 23, 2018)

  • Fix migration from version 1.0.15 and earlier (partial, for experiments only)

  • Added model summary download from GUI

  • Restructured and renamed the logs archive, and added the model summary to it

  • Fix AutoVis regression introduced in 1.0.16 that led to a slowdown

  • Various bug fixes

Version 1.0.16 (Jan 22, 2018)

  • Added support for validation dataset (optional, instead of internal validation on training data)

  • Standard deviation estimates for model scores (+/- 1 std.dev.)

  • Computation of all applicable scores for final models (in logs only for now)

  • Standard deviation estimates for MLI reason codes (+/- 1 std.dev.) when running in stand-alone mode

  • Added ability to abort MLI job

  • Improved final ensemble performance

  • Improved outlier visualization

  • Updated H2O-3 to version 3.16.0.4

  • More readable experiment names

  • Various speedups

  • Various bug fixes

Version 1.0.15 (Jan 11, 2018)

  • Fix truncated per-experiment log file

  • Various bug fixes

Version 1.0.14 (Jan 11, 2018)

  • Improved performance

Version 1.0.13 (Jan 10, 2018)

  • Improved estimate of generalization performance for final ensemble by removing leakage from target encoding

  • Added API for re-fitting and applying feature engineering on new (potentially larger) data

  • Remove access to pre-transformed datasets to avoid unintended leakage issues downstream

  • Added mean absolute percentage error (MAPE) scorer

  • Enforce monotonicity constraints for binary classification and regression models if interpretability >= 6

  • Use squared Pearson correlation for R^2 metric (instead of coefficient of determination) to avoid negative values (see the comparison sketch after this list)

  • Separated HTTP and TCP scoring pipeline examples

  • Reduced size of h2oai_client wheel

  • No longer require weight column for test data if it was provided for training data

  • Improved accuracy of final modeling pipeline

  • Include H2O-3 logs in downloadable logs.zip

  • Updated H2O-3 to version 3.16.0.2

  • Various bug fixes
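
The R^2 change can be seen with a small comparison: the coefficient of determination can go negative for poor fits, while the squared Pearson correlation always lies in [0, 1]. A sketch of the two definitions (illustrative only, not the product's exact code):

```python
import numpy as np

def r2_coefficient_of_determination(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot              # can be negative for very poor fits

def r2_squared_pearson(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.corrcoef(actual, predicted)[0, 1] ** 2   # always within [0, 1]

actual = [1.0, 2.0, 3.0, 4.0]
bad_fit = [4.0, 3.0, 2.0, 1.0]                # anti-correlated predictions
print(r2_coefficient_of_determination(actual, bad_fit))  # -3.0
print(r2_squared_pearson(actual, bad_fit))               #  1.0
```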

Version 1.0.11 (Dec 12, 2017)

  • Faster multi-GPU training, especially for small data

  • Increase default amount of exploration of genetic algorithm for systems with fewer than 4 GPUs

  • Improved accuracy of generalization performance estimate for models on small data (< 100k rows)

  • Faster abort of experiment

  • Improved final ensemble meta-learner

  • More robust date parsing

  • Various bug fixes

Version 1.0.10 (Dec 4, 2017)

  • Tooltips and links to documentation in the parameter settings screen

  • Faster training for multi-class problems with > 5 classes

  • Experiment summary displayed in GUI after experiment finishes

  • Python Client Library downloadable from the GUI

  • Speedup for Maxwell-based GPUs

  • Support for multinomial AUC and Gini scorers

  • Add MCC and F1 scorers for binomial and multinomial problems

  • Faster abort of experiment

  • Various bug fixes

Version 1.0.9 (Nov 29, 2017)

  • Support for time column for causal train/validation splits in time-series datasets

  • Automatic detection of the time column from temporal correlations in data

  • MLI improvements: dedicated page, selection of datasets and models

  • Improved final ensemble meta-learner

  • Test set score now displayed in experiment listing

  • Original response is preserved in exported datasets

  • Various bug fixes

Version 1.0.8 (Nov 21, 2017)

  • Various bug fixes

Version 1.0.7 (Nov 17, 2017)

  • Sharing of GPUs between experiments - can run multiple experiments at the same time while sharing GPU resources

  • Persistence of experiments and data - can stop and restart the application without loss of data

  • Support for weight column for optional user-specified per-row observation weights

  • Support for fold column for user-specified grouping of rows in train/validation splits

  • Higher accuracy through model tuning

  • Faster training - overall improvements and optimization in model training speed

  • Separate log file for each experiment

  • Ability to delete experiments and datasets from the GUI

  • Improved accuracy for regression tasks with very large response values

  • Faster test set scoring - significantly improved test set scoring speed in the GUI

  • Various bug fixes

Version 1.0.5 (Oct 24, 2017)

  • Only display scorers that are allowed

  • Various bug fixes

Version 1.0.4 (Oct 19, 2017)

  • Improved automatic type detection logic

  • Improved final ensemble accuracy

  • Various bug fixes

Version 1.0.3 (Oct 9, 2017)

  • Various speedups

  • Results are now reproducible

  • Various bug fixes

Version 1.0.2 (Oct 5, 2017)

  • Improved final ensemble accuracy

  • Weight of Evidence features added

  • Various bug fixes

Version 1.0.1 (Oct 4, 2017)

  • Improved speed of final ensemble

  • Various bug fixes

Version 1.0.0 (Sep 24, 2017)

  • Initial stable release