Author: Dmitry Buslov

HANA AutoML Library

Let’s assume you have to prepare a machine learning model for a classification or regression task.
All your data is already in HANA or in a flat (CSV) file.
Everything you need is here: https://github.com/dan0nchik/SAP-HANA-AutoML (This library is an open-source research project and is not part of any official SAP product.)

That is a joke, of course, but hana_automl goes through all (well, not yet all) AutoML steps and makes data science work easier.

This library is based on Python and built on top of other awesome libraries:

  • hana_ml
  • Optuna
  • BayesianOptimization
  • Streamlit

For installation, you just need:

    pip3 install Cython
    pip3 install hana_automl

After installation, it is quite easy to start:

    from hana_automl.utils.scripts import setup_user
    from hana_ml.dataframe import ConnectionContext

    cc = ConnectionContext(address='address', user='user', password='password', port=39015)

    # replace with credentials of user that will be created or granted a role to run PAL.
    setup_user(connection_context=cc, username='user_new', password="password_new")

setup_user is an optional helper for when you need to create a new user for experiments.
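The connection example above writes the credentials directly into the script. As a minimal sketch of one alternative, the helper below collects the same four ConnectionContext arguments (address, port, user, password) from environment variables. The helper name and the variable names HANA_ADDRESS, HANA_PORT, HANA_USER and HANA_PASSWORD are assumptions made for this illustration; they are not part of hana_ml or hana_automl.

```python
import os


def connection_kwargs_from_env(env=None):
    """Build keyword arguments for hana_ml's ConnectionContext.

    The environment variable names are hypothetical and chosen only
    for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "address": env["HANA_ADDRESS"],
        # Fall back to the port used in the example above.
        "port": int(env.get("HANA_PORT", "39015")),
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
    }


# Usage (requires hana_ml and a reachable HANA instance):
# from hana_ml.dataframe import ConnectionContext
# cc = ConnectionContext(**connection_kwargs_from_env())
```

Passing a plain dictionary as env makes the helper easy to try without touching the real environment.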

After that, you only need to call fit and predict, and then wait…

    from hana_automl.automl import AutoML

    model = AutoML(cc)
    model.fit(
        file_path='path to training dataset',  # it may be HANA table/view, or pandas DataFrame
        steps=10,  # number of iterations
        target='target',  # column to predict
        time_limit=120  # time limit in seconds
    )

predict:

    model.predict(
        file_path='path to test dataset',
        id_column='ID',
        verbose=1
    )
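The fit call above takes two budgets: steps (number of iterations) and time_limit (seconds). To illustrate how such budgets can interact, here is a toy random search in plain Python. It is not the library's implementation. It assumes the search stops at whichever budget runs out first, and budgeted_search, toy_objective and toy_sampler are hypothetical names invented for this sketch.

```python
import random
import time


def budgeted_search(objective, sample_params, steps=10, time_limit=120.0, seed=0):
    """Try up to `steps` random configurations, stopping early once
    `time_limit` seconds have passed. Returns the best parameters,
    the best score, and the number of trials actually run."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_limit
    best_params, best_score = None, float("-inf")
    trials_run = 0
    for _ in range(steps):
        if time.monotonic() >= deadline:
            break  # time budget exhausted before the step budget
        params = sample_params(rng)
        score = objective(params)
        trials_run += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, trials_run


def toy_objective(params):
    # Toy score that peaks at x = 3 (higher is better).
    return -(params["x"] - 3.0) ** 2


def toy_sampler(rng):
    # Draw one candidate configuration.
    return {"x": rng.uniform(0.0, 6.0)}


best_params, best_score, trials_run = budgeted_search(
    toy_objective, toy_sampler, steps=10, time_limit=120
)
```

With a cheap objective like this one, the step budget is reached long before the time limit. With slow model training, the time limit may end the search first.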

You can find all the documentation here: https://sap-hana-automl.readthedocs.io/en/latest/index.html

It is also possible to run all these steps from a UI built with Streamlit instead of from Python.

The UI looks like this: [Figure: Streamlit client]

To start the UI you need three steps:

  1. Clone repository: git clone https://github.com/dan0nchik/SAP-HANA-AutoML.git
  2. Install dependencies: pip3 install -r requirements.txt
  3. Run GUI: streamlit run ./web.py

OK, why should you try it?

Have a look at this example: https://github.com/dan0nchik/SAP-HANA-AutoML/blob/main/comparison_openml.ipynb

APL is awesome, but it has a strong focus on speed. For more accurate models you need some time and PAL, and this is where hana_automl can help.

It is also possible to build not just a single model but a blend of models. To enable ensembling, pass ensemble=True to the hana_automl.automl.AutoML.fit() function when fitting the AutoML model.
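A minimal sketch of the ensemble switch: the only change from a normal fit call is the extra ensemble=True keyword described above. The helper fit_with_ensemble, the stand-in RecordingModel class and the file name train.csv are hypothetical and exist only so the snippet runs without a HANA connection. With the real library you would pass an AutoML(cc) instance instead.

```python
def fit_with_ensemble(model, **fit_kwargs):
    """Call model.fit with the given arguments plus ensemble=True."""
    fit_kwargs = dict(fit_kwargs, ensemble=True)
    model.fit(**fit_kwargs)
    return model


class RecordingModel:
    """Stand-in for hana_automl.automl.AutoML that only records fit arguments."""

    def __init__(self):
        self.fit_calls = []

    def fit(self, **kwargs):
        self.fit_calls.append(kwargs)


# With the real library: fit_with_ensemble(AutoML(cc), file_path=..., target=...)
demo = fit_with_ensemble(
    RecordingModel(),
    file_path="train.csv",  # hypothetical path
    target="target",
    steps=10,
    time_limit=120,
)
```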

There is big potential for improvement, and contributions are very welcome!

If you have any ideas, share them here: https://github.com/dan0nchik/SAP-HANA-AutoML/issues

P.S. This is a project by @While-true-codeanything and @dan0nchik, very talented students…

Don’t wait: try it on your dataset and share your results…


      2 Comments
      Andreas Forster

      I trained my first hana_automl regression on PAL.
      Big kudos for putting such a cool project together, Dmitry Buslov.

      Jeremy Yu

      This is really great Dmitry, makes using PAL much easier in Python!