OpenML significantly simplifies benchmarking by:

  • offering a large set of datasets in standardized data formats
  • allowing easy download through a wide range of APIs and existing client libraries
  • providing machine-readable meta-information on all the datasets
  • enabling benchmarking results to be shared reproducibly through the APIs, which makes large-scale comparisons possible

In addition, OpenML offers benchmarking suites: curated, comprehensive sets of machine learning datasets that cover a wide spectrum of domains and statistical properties and come with standardized evaluation procedures (defined in tasks). This makes benchmarking results more comparable and more complete, and allows a more standardized analysis of algorithms under different conditions.


    As a first such suite, we propose the OpenML100, a machine learning benchmark suite of 100 classification datasets carefully curated from the thousands of datasets available. We selected classification datasets for this benchmarking suite to satisfy the following requirements:
    • the number of observations is between 500 and 100,000, to focus on medium-sized datasets that are neither too small for proper training nor too big for practical experimentation,
    • the number of features does not exceed 5,000, to keep the runtime of algorithms low,
    • the target attribute has at least two classes,
    • the ratio of the minority class to the majority class is above 0.05, to eliminate highly imbalanced datasets that would obfuscate a clear analysis.
    We excluded datasets which:
    • cannot be randomized via a 10-fold cross-validation due to grouped samples,
    • have an unknown origin or no clearly defined task,
    • are variants of other datasets (e.g., binarized regression tasks),
    • include sparse data (e.g., text mining datasets).
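The numeric requirements above can be expressed as a simple filter over dataset meta-information. The following is a minimal sketch, not the procedure actually used to build the OpenML100: the dictionary keys are assumed to mirror OpenML's data quality names, the toy records are invented for illustration, and the bounds on the number of observations are assumed to be inclusive. The exclusion criteria (grouped samples, unknown origin, variants of other datasets, sparse data) require manual inspection and are not encoded here.

```python
def is_openml100_candidate(meta):
    """Return True if the meta-information satisfies the numeric requirements."""
    majority = meta["MajorityClassSize"]
    if majority <= 0:
        return False  # no usable class distribution
    ratio = meta["MinorityClassSize"] / majority
    return (
        500 <= meta["NumberOfInstances"] <= 100000  # medium-sized datasets only
        and meta["NumberOfFeatures"] <= 5000        # keep runtimes low
        and meta["NumberOfClasses"] >= 2            # a real classification target
        and ratio > 0.05                            # not highly imbalanced
    )


# Invented toy records; real values would come from the OpenML meta-data.
datasets = [
    {"name": "toy_ok", "NumberOfInstances": 1000, "NumberOfFeatures": 20,
     "NumberOfClasses": 2, "MinorityClassSize": 400, "MajorityClassSize": 600},
    {"name": "toy_small", "NumberOfInstances": 150, "NumberOfFeatures": 4,
     "NumberOfClasses": 3, "MinorityClassSize": 50, "MajorityClassSize": 50},
    {"name": "toy_imbalanced", "NumberOfInstances": 10000, "NumberOfFeatures": 30,
     "NumberOfClasses": 2, "MinorityClassSize": 100, "MajorityClassSize": 9900},
    {"name": "toy_wide", "NumberOfInstances": 2000, "NumberOfFeatures": 20000,
     "NumberOfClasses": 2, "MinorityClassSize": 1000, "MajorityClassSize": 1000},
]

selected = [d["name"] for d in datasets if is_openml100_candidate(d)]
print(selected)
```

Only the first toy record passes: the others fail on the number of observations, the class ratio, and the number of features, respectively.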

    Code examples

    The code below demonstrates how OpenML benchmarking suites, in this case the OpenML100, can be conveniently imported for benchmarking using the Python, Java and R APIs. The OpenML100 tasks are downloaded through the study with the same name, which contains the 100 tasks and also holds all benchmarking results obtained on them. The code also shows how to access the raw data set (although this is not needed to train a model), fit a simple classifier on the defined data splits, and finally publish runs on the OpenML server.
    Python (with scikit-learn)
    import openml
    import sklearn.impute
    import sklearn.metrics
    import sklearn.pipeline
    import sklearn.tree
    benchmark_suite = openml.study.get_study('OpenML100', 'tasks') # obtain the benchmark suite
    clf = sklearn.pipeline.Pipeline(steps=[('imputer', sklearn.impute.SimpleImputer()),
        ('estimator', sklearn.tree.DecisionTreeClassifier())]) # build a sklearn classifier
    for task_id in benchmark_suite.tasks: # iterate over all tasks
      task = openml.tasks.get_task(task_id) # download the OpenML task
      X, y = task.get_X_and_y() # get the data (not used in this example)
      openml.config.apikey = 'FILL_IN_OPENML_API_KEY' # set the OpenML Api Key
      run = openml.runs.run_model_on_task(task,clf) # run classifier on splits (requires API key)
      score = run.get_metric_fn(sklearn.metrics.accuracy_score) # compute the accuracy on each split
      print('Data set: %s; Accuracy: %0.2f' % (task.get_dataset().name,score.mean()))
      run.publish() # publish the experiment on OpenML (optional)
      print('URL for run: %s/run/%d' %(openml.config.server,run.run_id))
    Java (with WEKA)
    public static void runTasksAndUpload() throws Exception {
      OpenmlConnector openml = new OpenmlConnector();
      Study benchmarksuite = openml.studyGet("OpenML100", "tasks"); // obtain the benchmark suite
      Classifier tree = new REPTree(); // build a Weka classifier
      for (Integer taskId : benchmarksuite.getTasks()) { // iterate over all tasks
        Task t = openml.taskGet(taskId); // download the OpenML task
        Instances d = InstancesHelper.getDatasetFromTask(openml, t); // obtain the dataset
        int runId = RunOpenmlJob.executeTask(openml, new WekaConfig(), taskId, tree); // run the classifier and upload the run
        Run run = openml.runGet(runId);   // retrieve the uploaded run
      }
    }
    R (with mlr)
    library(OpenML)
    library(mlr)
    lrn = makeLearner('classif.rpart') # construct a simple CART classifier
    task.ids = getOMLStudy('OpenML100')$tasks$task.id # obtain the list of suggested tasks
    for (task.id in task.ids) { # iterate over all tasks
      task = getOMLTask(task.id) # download single OML task
      data = as.data.frame(task) # obtain raw data set
      run = runTaskMlr(task, learner = lrn) # run constructed learner
      setOMLConfig(apikey = 'FILL_IN_OPENML_API_KEY')
      upload = uploadOMLRun(run) # upload and tag the run
    }

    Creating new benchmark suites

    The set of datasets on OpenML can easily be extended, and additional OpenML benchmark suites (e.g., for regression or time-series data) can be created by defining sets of datasets according to specific needs. This is done by:
    • Creating a new study with the name of the benchmark as the alias.
    • Adding new tasks to the study by tagging them with the name of the benchmark.
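As a programmatic alternative to the two manual steps above, a suite can also be defined from Python. The sketch below assumes a recent version of the openml-python client that provides `openml.study.create_benchmark_suite` with `name`, `description`, `task_ids`, and `alias` keyword arguments. The helper name `create_suite` and all argument values in the usage note are hypothetical placeholders, and publishing requires a valid API key and network access.

```python
def create_suite(alias, name, description, task_ids, api_key):
    """Create and publish a benchmark suite; returns the published suite object.

    Assumes openml-python exposes openml.study.create_benchmark_suite
    (an assumption about the installed client version).
    """
    import openml  # imported lazily so the sketch loads without the client installed

    openml.config.apikey = api_key  # publishing requires an OpenML API key
    suite = openml.study.create_benchmark_suite(
        name=name,
        description=description,
        task_ids=list(task_ids),  # the tasks that make up the suite
        alias=alias,              # name under which the suite can be retrieved
    )
    suite.publish()  # upload the suite definition to the OpenML server
    return suite
```

A call might look like `create_suite('MySuite', 'My benchmark suite', 'Regression tasks for ...', [1, 2, 3], 'FILL_IN_OPENML_API_KEY')`, where the alias, description, and task IDs are placeholders to be replaced with real values.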