OpenML is an open-source project, hosted on GitHub. We welcome everybody to help improve OpenML and make it more useful for everyone. Fork us on GitHub.

GitHub repositories

OpenML Core - Everything done by the OpenML server. This includes dataset feature calculations and server-side model evaluations.

Website - The website and REST API

Meta-features - New repository for the meta-feature calculation tool.

Java API - The Java API and Java-based plugins

R API - The OpenML R package

Python API - The Python API

Issues and feature requests

You can post issues (e.g. bugs) and feature requests on the relevant issue tracker:

OpenML tracker - All general issues and feature requests. This is all organized on Waffle.

Website tracker - Smaller issues related to the website.

R tracker - Issues related to the OpenML R package.

GitHub wiki

The GitHub Wiki contains more information on how to set up your environment to work on OpenML locally, on the structure of the backend and frontend, and working documents.

Database snapshots

Everything uploaded to OpenML is available to the community. The nightly snapshot of the public database contains all experiment runs, evaluations and links to datasets, implementations and result files. It is provided as a gzipped SQL dump.

Nightly database SNAPSHOT

If you want to work on the website locally, you'll also need the schema for the 'private' database with non-public information.

Private database schema

Legacy Resources

OpenML is always evolving, but we keep hosting the resources that were used in prior publications so that others may still build on them.

The experiment database used in Vanschoren et al. (2012) Experiment databases. Machine Learning 87(2), pp 127-158. You'll need to import this database (we used MySQL) to run queries. The database structure is described in the paper. Note that most of the experiments in this database have since been rerun on OpenML, with newer algorithm implementations, and are stored in much more detail.

The Exposé ontology used in the same paper, and described in more detail here and here. Exposé is used in designing our databases, and we aim to use it to export all OpenML data as Linked Open Data.

Honor Code

By joining OpenML, you join a special worldwide community of data scientists building on each other's results and connecting their minds as efficiently as possible. This community depends on your motivation to share data, tools and ideas, and to do so with honesty. In return, you will gain trust, visibility and reputation, igniting online collaborations and studies that otherwise may not have happened.

By using any part of OpenML, you agree to:

  • Give credit where credit is due. Cite the authors whose work you are building on, or build collaborations where appropriate.
  • Give back to the community by sharing your own data as openly and as soon as possible, or by helping the community in other ways. In doing so, you gain visibility and impact (citations).
  • Share data according to your best efforts. Everybody makes mistakes, but we trust you to correct them as soon as possible. Remove or flag data that cannot be trusted.
  • Be polite and constructive in all discussions. Criticism of methods is welcomed, but personal criticisms should be avoided.
  • Respect circles of trust. OpenML allows you to collaborate in 'circles' of trusted people to share unpublished results. Be considerate in sharing data with people outside this circle.
  • Do not steal the work of people who openly share it. OpenML makes it easy to find all shared data (and when it was shared), thus everybody will know if you do this.

Terms of Use

You agree that you are responsible for your own use of OpenML.org and all content submitted by you, in accordance with the Honor Code and all applicable local, state, national and international laws.

By submitting or distributing content from OpenML.org, you affirm that you have the necessary rights, licenses, consents and/or permissions to reproduce and publish this content. You, and not the developers of OpenML.org, are solely responsible for your submissions.

By submitting content to OpenML.org, you grant OpenML.org the right to host, transfer, display and use this content, in accordance with your sharing settings and any licences granted by you. You also grant to each user a non-exclusive license to access and use this content for their own research purposes, in accordance with any licences granted by you.

You may maintain only one user account, and you may not let anyone else use your username and/or password. You may not impersonate other persons.

You will not attempt to damage, disable, or impair any OpenML server or interfere with any other party's use and enjoyment of the service. You may not attempt to gain unauthorized access to the Site, other accounts, computer systems or networks connected to any OpenML server. You may not obtain or attempt to obtain any materials or information not intentionally made available through OpenML.

Strictly prohibited is content that defames, harasses or threatens others, content that infringes another's intellectual property, and indecent or unlawful content, advertising, or intentionally inaccurate information posted with the intent of misleading others. It is also prohibited to post code containing viruses, malware, spyware or any other similar software that may damage the operation of another's computer or property.

Citing OpenML

The OpenML team and the active community of contributing researchers have invested countless hours of time and resources in creating OpenML as it is today. You are free to use OpenML under the CC-BY licence. If you have used OpenML in your work, please cite the following paper:

Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. OpenML: networked science in machine learning. SIGKDD Explorations 15(2), pp 49-60, 2013.

@article{OpenML2013,
author = {Vanschoren, Joaquin and van Rijn, Jan N. and Bischl, Bernd and Torgo, Luis},
title = {OpenML: Networked Science in Machine Learning},
journal = {SIGKDD Explorations},
volume = {15},
number = {2},
year = {2013},
pages = {49--60},
url = {http://doi.acm.org/10.1145/2641190.2641198},
doi = {10.1145/2641190.2641198},
publisher = {ACM},
address = {New York, NY, USA},
}

Citing Data and Code

Sharing data and code is crucial for reproducibility and scientific progress, and should be rewarded. If you are reusing any of the shared datasets, flows or runs/studies, please honor their respective licences and citation requests. OpenML clearly shows these requests when they apply.

Other acknowledgements

The anonymous robot icon was designed by Freepik.

Our Team

OpenML is a community effort, and as such many people have contributed to it over the years.
Want to join? Leave a message on the community mailing list.



Joaquin Vanschoren
Machine learning professor @TUeindhoven. Founder of OpenML. Working to make machine learning more open, collaborative, and automated.


Jan van Rijn
PhD student at Leiden University and main developer of various OpenML components and plugins


Bernd Bischl
PhD in statistics, data scientist, developer of the OpenML R plugin, developer of mlr.


Dominik Kirchhoff
PhD student at TU Dortmund University. Contributing to the R package.


Noureddin Sadawi
http://www.brunel.ac.uk/~csstnns


Rafael G. Mantovani
PhD student in computer science @ University of São Paulo, Brazil.


Luis Torgo
Associate Professor at the University of Porto and Senior Researcher at INESC Tec


Manuel Martin Salvador
PhD student @ Bournemouth University


Nenad Tomašev
Data poet. Working on machine learning in many dimensions.


Paula Branco
PhD Student at University of Porto and a researcher at LIAAD - INESC Tec. She is integrating existing R packages into OpenML.


Andrey Ustyuzhanin
Head of the Yandex School of Data Analysis research group. The group's mission is to solve tough scientific problems by applying data science tools and practices. Member of the LHCb and SHiP experiments at CERN. Head of the Laboratory of Methods for Big Data Analysis at the CS faculty of HSE.


Jakob Bossek
PhD student in computer science at the University of Münster, Germany. R enthusiast, one of the main contributors to the OpenML R interface, and sports freak.


Mandar Chandorkar
PhD student at CWI Amsterdam. Interests include machine learning, applied mathematics, dynamical systems, probabilistic numerics, building open source ML tools.


Heidi Seibold
PhD student in Computational Biostatistics at the University of Zurich. I am into R, open science and reproducible research.


Andreas Mueller
Research engineer at NYU, scikit-learn core-developer.


Janek Thomas
PhD student in computational statistics at LMU Munich.

Altmetrics and Gamification

To encourage open science, OpenML now includes a score system to track and reward scientific activity, reach and impact, and in the future will include further gamification features such as badges. Because the system is still experimental and very much in development, the details are subject to change. Below, the score system is described in more detail, followed by our rationale for those interested. If anything is unclear or you have any feedback on the system, do not hesitate to let us know.

The scores

All scores are awarded to users and involve datasets, flows, tasks and runs, or 'knowledge pieces' for short.

Activity

Activity score is awarded to users for contributing to the knowledge base of OpenML. This includes uploading knowledge pieces, leaving likes, and downloading new knowledge pieces. Uploads are rewarded most strongly, with 3 activity; likes earn 2 activity; downloads are rewarded the least, with 1 activity.

Reach

Reach score is awarded to knowledge pieces, and by extension their uploaders, for the expressed interest of other users. It increases by 2 for every user that leaves a like on a knowledge piece, and by 1 for every user that downloads it for the first time.

Impact

Impact score is awarded to knowledge pieces, and by extension their uploaders, for the reuse of these knowledge pieces. A dataset is reused when it is used as input in a task, while flows and tasks are reused in runs. 1 impact is awarded for every reuse by a user that is not the uploader. The impact of a reused knowledge piece is further increased by half of the reach and half of the impact acquired by that reuse, rounded down. For example, the impact of a dataset that is used in a single task with reach 10 and impact 5 is 8 (⌊1 + 0.5·10 + 0.5·5⌋).
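
As an illustration only (not OpenML's actual implementation), the impact of a single knowledge piece under the rules above could be computed as follows, where each reuse contributes 1 plus half of its reach and half of its impact, rounded down:

    from math import floor

    def impact_score(reuses):
        """Illustrative impact for one knowledge piece.

        `reuses` is a list of (reach, impact) pairs, one per knowledge piece
        (e.g. a task) that reuses this one and was created by another user.
        """
        return sum(floor(1 + 0.5 * reach + 0.5 * impact) for reach, impact in reuses)

    # The example from the text: a dataset reused by one task with reach 10 and impact 5
    print(impact_score([(10, 5)]))  # 8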

The rationale

One of OpenML's core ideas is to create an open science environment for sharing and exploring knowledge while getting credit for your work. The activity score encourages sharing and exploration. Reach makes exploration easier (by surfacing well-liked and/or often-downloaded knowledge pieces), while also providing a form of credit to the user. Impact is another form of credit that is closer in concept to citation scores.

Where to find it

The number of likes and downloads, as well as the reach and impact of knowledge pieces, can be found at the top of their respective pages, for example the Iris data set. In the top right you will also find the new Like button next to the already familiar download button.

When searching for knowledge pieces on the search page, you will now be able to see the statistics mentioned above as well. In addition, you can sort the search results by their downloads, likes, reach or impact.

On user profiles you will find all statistics relevant to that user, as well as graphs of their progress on the three scores.

Badges

Badges are intended to provide discrete goals for users to aim for. They are still in a conceptual phase; depending on the community's reaction they will be developed further.
The badges a user has acquired can be found on their user profile, below the score graphs. The currently implemented badges are:

Clockwork Scientist
For being active every day for a period of time.
Team Player
For collaborating with other users; reusing a knowledge piece of someone who has reused a knowledge piece of yours.
Good News Everyone
For achieving a high reach on a single knowledge piece you uploaded.

Downvotes

Although not part of the scores, downvotes have also been introduced. They are intended to indicate a fixable flaw in a dataset, flow, task or run, for example a missing description.

If you want to indicate that something is wrong with a knowledge piece, click the number-of-issues statistic at the top of the page. A panel will open where you can either agree with an already raised issue anonymously, or submit your own issue (not anonymously).

You can also sort search results by the number of downvotes, or issues on the search page.

Opting out

If you really do not like the gamification, you can opt out by changing the setting on your profile. This hides your scores and badges from other users and hides their scores and badges from you. You will still be able to see the number of likes, downloads and downvotes on knowledge pieces, and your likes, downloads and downvotes will still be counted.


OpenML is integrated in the Weka (Waikato Environment for Knowledge Analysis) Experimenter and the Command Line Interface.

Installation

OpenML is available as a WEKA extension in the package manager:
  1. Download the latest development version (3.7.13 or higher).
  2. Launch Weka, or start from commandline:
    java -jar weka.jar
    If you need more memory (e.g. 1GB), start as follows:
    java -Xmx1G -jar weka.jar
  3. Open the package manager (Under 'Tools')
  4. Select package OpenmlWeka and click install. Afterwards, restart WEKA.
  5. From the Tools menu, open the 'OpenML Experimenter'.

Quick Start (Graphical Interface)

OpenML Weka Screenshot

You can solve OpenML Tasks in the Weka Experimenter, and automatically upload your experiments to OpenML (or store them locally).

  1. From the Tools menu, open the 'OpenML Experimenter'.
  2. Enter your API key in the top field (log in first). You can also store this in a config file (see below).
  3. In the 'Tasks' panel, click the 'Add New' button to add new tasks. Insert the task IDs as comma-separated values (e.g., '1,2,3,4,5'). Use search to find interesting tasks and click the icon to list the IDs. In the future, this search will also be integrated in WEKA.
  4. Add algorithms in the "Algorithm" panel.
  5. Go to the "Run" tab, and click on the "Start" button.
  6. The experiment will be executed and sent to OpenML.org.
  7. The runs will now appear on OpenML.org. You can follow their progress and check for errors on your profile page under 'Runs'.

Quick Start (Command Line Interface)

The Command Line interface is useful for running experiments automatically on a server, without using a GUI.
  1. Create a config file called openml.conf in a new directory called .openml in your home dir. It should contain the following line:
    api_key = YOUR_KEY
  2. Execute the following command:
    java -cp weka.jar openml.experiment.TaskBasedExperiment -T <task_id> -C <classifier_classpath> -- <parameter_settings>
  3. For example, the following command will run Weka's J48 algorithm on Task 1:
    java -cp OpenWeka.beta.jar openml.experiment.TaskBasedExperiment -T 1 -C weka.classifiers.trees.J48
  4. The following suffix will set some parameters of this classifier:
    -- -C 0.25 -M 2
Please report any bugs that you may encounter to j.n.van.rijn@liacs.leidenuniv.nl.

Download Plugin

OpenML features extensive support for MOA. However, this is currently implemented as a standalone MOA compilation, using the latest version (as of May 2014).





Quick Start

OpenML Weka Screenshot
  1. Download the standalone MOA environment above.
  2. Find your API key in your profile (log in first). Create a config file called openml.conf in a .openml directory in your home dir. It should contain the following line:
    api_key = YOUR_KEY
  3. Launch the JAR file by double clicking on it, or launch from command-line using the following command:
    java -cp openmlmoa.beta.jar moa.gui.GUI
  4. Select the task moa.tasks.openml.OpenmlDataStreamClassification to evaluate a classifier on an OpenML task, and send the results to OpenML.
  5. Optionally, you can generate new streams using the Bayesian Network Generator: select the moa.tasks.WriteStreamToArff task, with moa.streams.generators.BayesianNetworkGenerator.
Please note that this is a beta version, which is under active development. Please report any bugs that you may encounter to j.n.van.rijn@liacs.leidenuniv.nl.

The R package mlr interfaces a large number of classification and regression techniques. It also uses the OpenML R package (by the same authors) to interface seamlessly with OpenML. This means you can download data and tasks from OpenML, run the many mlr algorithms, and organize all ensuing results online with a few lines of R.

Download

You'll need the mlr and openml packages. Soon, both will be available from CRAN.

Quick Start

In this tutorial, you can find examples of standard use cases.

Issues

Having questions? Did you run into an issue? Let us know via the OpenML R issue tracker.
You can design OpenML workflows in RapidMiner to directly interact with OpenML. The RapidMiner plugin is currently under active development.

The Java API allows you to connect to OpenML from Java applications.

Download

Stable releases of the Java API are available from Maven central. Or, you can check out the developer version from GitHub. Include the jar file in your projects as usual, or install via Maven. You can also separately download all dependencies and a fat jar with all dependencies included.

Quick Start

Create an OpenmlConnector instance with your authentication details. This will create a client with all OpenML functionalities.

OpenmlConnector client = new OpenmlConnector("api_key");

All functions are described in the Java Docs, and they mirror the functions from the Web API functions described below. For instance, the API function openml.data.description has an equivalent Java function openmlDataDescription(String data_id).

Downloading

To download data, flows, tasks, runs, etc. you need the unique id of that resource. The id is shown on each item's webpage and in the corresponding url. For instance, let's download Data set 1. The following returns a DataSetDescription object that contains all information about that data set.

DataSetDescription data = client.dataGet(1);

You can also search for the items you need online, and click the icon to get all IDs that match a search.

Uploading

To upload data, flows, runs, etc. you need to provide a description of the object. We offer wrapper classes for this information, e.g. DataSetDescription, as well as classes that capture the server response, e.g. UploadDataSet, which always includes the generated id for reference:

DataSetDescription description = new DataSetDescription( "iris", "The famous iris dataset", "arff", "class");
UploadDataSet result = client.dataUpload( description, datasetFile );
int data_id = result.getId();

More details are given in the corresponding functions below. Also see the Java Docs for all possible inputs and return values.

Data download

dataGet(int data_id)

Retrieves the description of a specified data set.

DataSetDescription data = client.dataGet(1);
String name = data.getName();
String version = data.getVersion();
String description = data.getDescription();
String url = data.getUrl();

dataFeatures(int data_id)

Retrieves the description of the features of a specified data set.

DataFeature response = client.dataFeatures(1);
DataFeature.Feature[] features = response.getFeatures();
String name = features[0].getName();
String type = features[0].getDataType();
boolean	isTarget = features[0].getIs_target();

dataQuality(int data_id)

Retrieves the description of the qualities (meta-features) of a specified data set.

DataQuality response = client.dataQuality(1);
DataQuality.Quality[] qualities = response.getQualities();
String name = qualities[0].getName();
String value = qualities[0].getValue();

dataQuality(int data_id, int start, int end, int interval_size)

For data streams. Retrieves the description of the qualities (meta-features) of a specified portion of a data stream.

DataQuality qualities = client.dataQuality(1,0,10000,null);

dataQualityList()

Retrieves a list of all data qualities known to OpenML.

DataQualityList response = client.dataQualityList();
String[] qualities = response.getQualities();

Data upload

dataUpload(DataSetDescription description, File dataset)

Uploads a data set file to OpenML given a description. Throws an exception if the upload failed, see openml.data.upload for error codes.

DataSetDescription dataset = new DataSetDescription( "iris", "The iris dataset", "arff", "class");
UploadDataSet data = client.dataUpload( dataset, new File("data/path"));
int data_id = data.getId();

dataUpload(DataSetDescription description)

Registers an existing dataset (hosted elsewhere). The description needs to include the url of the data set. Throws an exception if the upload failed, see openml.data.upload for error codes.

DataSetDescription description = new DataSetDescription( "iris", "The iris dataset", "arff", "class");
description.setUrl("http://datarepository.org/mydataset");
UploadDataSet data = client.dataUpload( description );
int data_id = data.getId();

Flow download

flowGet(int flow_id)

Retrieves the description of the flow/implementation with the given id.

Implementation flow = client.flowGet(100);
String name = flow.getName();
String version = flow.getVersion();
String description = flow.getDescription();
String binary_url = flow.getBinary_url();
String source_url = flow.getSource_url();
Parameter[] parameters = flow.getParameter();

Flow management

flowOwned()

Retrieves an array of IDs of all flows/implementations owned by you.

ImplementationOwned response = client.flowOwned();
Integer[] ids = response.getIds();

flowExists(String name, String version)

Checks whether an implementation with the given name and version is already registered on OpenML.

ImplementationExists check = client.flowExists("weka.j48", "3.7.12");
boolean exists = check.exists();
int flow_id = check.getId();

flowDelete(int id)

Removes the flow with the given id (if you are its owner).

ImplementationDelete response = client.flowDelete(100);

Flow upload

flowUpload(Implementation description, File binary, File source)

Uploads implementation files (binary and/or source) to OpenML given a description.

Implementation flow = new Implementation("weka.J48", "3.7.12", "description", "Java", "WEKA 3.7.12");
UploadImplementation response = client.flowUpload( flow, new File("code.jar"), new File("source.zip"));
int flow_id = response.getId();

Task download

taskGet(int task_id)

Retrieves the description of the task with the given id.

Task task = client.taskGet(1);
String task_type = task.getTask_type();
Input[] inputs = task.getInputs();
Output[] outputs = task.getOutputs();

taskEvaluations(int task_id)

Retrieves all evaluations for the task with the given id.

TaskEvaluations response = client.taskEvaluations(1);
Evaluation[] evaluations = response.getEvaluation();

taskEvaluations(int task_id, int start, int end, int interval_size)

For data streams. Retrieves all evaluations for the task over the specified window of the stream.

TaskEvaluations response = client.taskEvaluations(1, 0, 10000, null);
Evaluation[] evaluations = response.getEvaluation();

Run download

runGet(int run_id)

Retrieves the description of the run with the given id.

Run run = client.runGet(1);
int task_id = run.getTask_id();
int flow_id = run.getImplementation_id();
Parameter_setting[] settings = run.getParameter_settings();
EvaluationScore[] scores = run.getOutputEvaluation();

Run management

runDelete(int run_id)

Deletes the run with the given id (if you are its owner).

RunDelete response = client.runDelete(1);

Run upload

runUpload(Run description, Map<String,File> output_files)

Uploads a run to OpenML, including a description and a set of output files depending on the task type.

Run.Parameter_setting[] parameter_settings = new Run.Parameter_setting[1];
parameter_settings[0] = new Run.Parameter_setting(null, "M", "2");
Run run = new Run("1", null, "100", "setup_string", parameter_settings);
Map<String,File> outputs = new HashMap<String,File>();
outputs.put("predictions", new File("predictions.arff"));
UploadRun response = client.runUpload( run, outputs);
int run_id = response.getRun_id();

Free SQL Query

freeQuery(String sql)

Executes the given SQL query and returns the result in JSON format.

org.json.JSONObject json = client.freeQuery("SELECT name FROM dataset");

Issues

Having questions? Did you run into an issue? Let us know via the OpenML Java issue tracker.

The OpenML R package allows you to connect to the OpenML server from R scripts. This means that you can download and upload datasets and tasks, run R implementations, upload your results, and download all experiment results directly via R commands.

It is also neatly integrated into mlr (Machine Learning in R), which provides a unified interface to a large number of machine learning algorithms in R. As such, you can easily run and compare many R algorithms on all OpenML datasets, and analyse all combined results.

All in a few lines of R.

Demo

You can try it out yourself in a Jupyter Notebook running in the everware cloud. You'll need an OpenML account as well as a GitHub account for this service to work properly. It may take a few minutes to spin up.

Launch demo

Example

This example runs an mlr algorithm on an OpenML task. The first time, you need to set your API key on your machine.


  library(mlr)
  library(OpenML)
  setOMLConfig(apikey = "qwertyuiop1234567890") # Only the first time

  task = getOMLTask(10)
  lrn = makeLearner("classif.rpart")
  res = runTaskMlr(task, lrn)
  run.id = uploadOMLRun(res)
  

You can of course do many experiments at once:


  # A list of OpenML task ID's
  task.ids = c(10,39)

  # A list of MLR learners
  learners = list(
      makeLearner("classif.rpart"),
      makeLearner("classif.randomForest")
      )

  # Loop
  for (lrn in learners) {
    for (id in task.ids) {
      task = getOMLTask(id)
      res = runTaskMlr(task, lrn)
      run.id = uploadOMLRun(res)
    }
  }
  

Download

The OpenML package can be downloaded from GitHub. It will also be available from CRAN in the near future.

Tutorial

See the tutorial for the most important functions and examples of standard use cases.

Reference

Full documentation on the packages is available from R Documentation.

Issues

Having questions? Did you run into an issue? Let us know via the OpenML R issue tracker.

The Python module allows you to connect to the OpenML server from Python programs. This means that you can download and upload OpenML datasets and tasks, run Python algorithms on them, and share the results.

It is also being integrated into scikit-learn, which provides a unified interface to a large number of machine learning algorithms in Python. As such, you can easily run and compare many algorithms on all OpenML datasets, and analyse all combined results.

All in a few lines of Python.

Demo

You can try it out yourself in a Jupyter Notebook running in the everware cloud. You'll need an OpenML account as well as a GitHub account for this service to work properly. It may take a few minutes to spin up.

Launch demo

Course

We are currently building a machine learning course with many more examples. All materials are available as Jupyter Notebooks running in the everware cloud. You'll need an OpenML account as well as a GitHub account for this service to work properly. It may take a few minutes to spin up.

Launch course

Example

This example runs a scikit-learn algorithm on an OpenML task.


    from sklearn import ensemble
    from openml import tasks,runs
    import xmltodict

    # Download task, run learner, publish results
    task = tasks.get_task(10)
    clf = ensemble.RandomForestClassifier()
    run = runs.run_task(task, clf)
    return_code, response = run.publish()

    # get the run id for reference
    if(return_code == 200):
      response_dict = xmltodict.parse(response)
      run_id = response_dict['oml:upload_run']['oml:run_id']
      print("Uploaded run with id %s. Check it at www.openml.org/r/%s" % (run_id,run_id))
  

The first time, you need to set up your config file (~/.openml/config) with your API key.


    apikey=FILL_IN_API_KEY
    cachedir=FILL_IN_CACHE_DIR
  

Also, for now, you'll need to install the developer version of the API:


    git clone https://github.com/openml/openml-python.git
    cd openml-python
    git checkout develop
    python setup.py install
  

Download

The Python module can be downloaded from GitHub.

Quickstart

Check out the documentation to get started. There is also a Jupyter Notebook with examples.

Issues

Having questions? Did you run into an issue? Let us know via the OpenML Python issue tracker.
The .Net API allows you to connect to OpenML from .Net applications.

Download

Stable releases of the .Net API are available via NuGet. Use the NuGet package explorer in Visual Studio, type “Install-Package openMl” into the NuGet package manager console, or download the whole package from the NuGet website and add it to your project. Or, you can check out the developer version from GitHub.

Quick Start

Create an OpenmlConnector instance with your API key. You can find this key in your account settings. This will create a client with OpenML functionalities. The functionalities mirror the OpenML API, and not all of them are implemented yet. If you need a feature, don't hesitate to contact us via our GitHub page.

    var connector = new OpenMlConnector("YOURAPIKEY");

All OpenMlConnector methods are documented via the usual .Net comments.

Get dataset description

    var datasetDescription = connector.GetDatasetDescription(1);

List datasets

    var data = connector.ListDatasets();

Get run

    var run = connector.GetRun(1);

List task types

    var taskTypes = connector.ListTaskTypes();

Get task type

    var taskType = connector.GetTaskType(1);

List evaluation measures

    var measures = connector.ListEvaluationMeasures();

List estimation procedures

    var estimationProcs = connector.ListEstimationProcedures();

Get estimation procedure

    var estimationProc = connector.GetEstimationProcedure(1);

List data qualities

    var dataQualities = connector.ListDataQualities();

Free SQL Query

openmlFreeQuery(String sql)

Executes the given SQL query and returns the result in .Net format.

    var result = connector.ExecuteFreeQuery("SELECT name,did FROM dataset");

Issues

Having questions? Did you run into an issue? Let us know via the OpenML .Net issue tracker.
API Documentation

OpenML offers a RESTful Web API, with predictable URLs, for uploading and downloading machine learning resources. Try the API Documentation to see examples of all calls, and test them right in your browser.

Getting started

REST services can be called using simple HTTP GET or POST actions.

The REST Endpoint URL is http://www.openml.org/api/v1/

The default endpoint returns data in XML. If you prefer JSON, use the endpoint http://www.openml.org/api/v1/json/
Note that, to upload content, you still need to use XML (at least for now).

Authentication

To use the API, you need an API key. You can find it in your profile (after logging in).

You can send your API key using Basic Auth, or by adding ?api_key=YOUR_KEY to your calls. If you are logged into OpenML.org, this will be done automatically (within the session).

For instance, you can call /data/{id}:
in XML: http://www.openml.org/api/v1/data/1
in JSON: http://www.openml.org/api/v1/json/data/1
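
As a minimal sketch (assuming Python with the requests library), the same call can be made programmatically, passing the API key as a query parameter:

    import requests

    API_KEY = "YOUR_KEY"  # found in your OpenML profile

    # XML (default) endpoint
    xml = requests.get("http://www.openml.org/api/v1/data/1",
                       params={"api_key": API_KEY})
    print(xml.text[:200])

    # JSON endpoint; the top-level key mirrors the XML root element (assumption)
    data = requests.get("http://www.openml.org/api/v1/json/data/1",
                        params={"api_key": API_KEY}).json()
    print(data["data_set_description"]["name"])  # 'anneal'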

Testing

For continuous integration and testing purposes, we have a test server that offers the same API but does not affect the production server.

The REST Endpoint URL is http://test.openml.org/api/v1/

Error messages

Error messages will look like this:

<oml:error xmlns:oml="http://openml.org/error">
  <oml:code>100</oml:code>
  <oml:message>Please invoke legal function</oml:message>
  <oml:additional_information>Additional information, not always available. </oml:additional_information>
</oml:error>

All error messages are listed in the API documentation. E.g. try to get a non-existing dataset:
in XML: http://www.openml.org/api_new/v1/data/99999
in JSON: http://www.openml.org/api_new/v1/json/data/99999

Examples

You need to be logged in for these examples to work.
Download a dataset
  1. User asks for a dataset using the /data/{id} service. The dataset id is typically part of a task, or can be found on OpenML.org.
  2. OpenML returns a description of the dataset as an XML file (or JSON). Try it now
  3. The dataset description contains the URL where the dataset can be downloaded. The user calls that URL to download the dataset.
  4. The dataset is returned by the server hosting the dataset. This can be OpenML, but also any other data repository. Try it now
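
A minimal sketch of this two-step workflow in Python (assuming the requests library; the field names come from the example responses further down this page):

    import requests

    # Steps 1-2: ask OpenML for the dataset description (JSON endpoint used for convenience)
    desc = requests.get("http://www.openml.org/api/v1/json/data/1").json()["data_set_description"]

    # Steps 3-4: the description contains the URL of the actual data file,
    # which may be hosted by OpenML or by any other repository
    dataset = requests.get(desc["url"]).text
    print(dataset[:200])  # start of the ARFF file
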
Download an implementation
  1. User asks for a flow using the /flow/{id} service and a flow id. The flow id can be found on OpenML.org.
  2. OpenML returns a description of the flow as an XML file (or JSON). Try it now
  3. The flow description contains the URL where the flow can be downloaded (e.g. GitHub), either as source, binary or both, as well as additional information on history, dependencies and licence. The user calls the right URL to download it.
  4. The flow is returned by the server hosting it. This can be OpenML, but also any other code repository. Try it now
Download a task
  1. User asks for a task using the /task/{id} service and a task id. The task id is typically returned when searching for tasks.
  2. OpenML returns a description of the task as an XML file (or JSON). Try it now
  3. The task description contains the dataset id(s) of the datasets involved in this task. The user asks for the dataset using the /data/{id} service and the dataset id.
  4. OpenML returns a description of the dataset as an XML file (or JSON). Try it now
  5. The dataset description contains the URL where the dataset can be downloaded. The user calls that URL to download the dataset.
  6. The dataset is returned by the server hosting it. This can be OpenML, but also any other data repository. Try it now
  7. The task description may also contain links to other resources, such as the train-test splits to be used in cross-validation. The user calls that URL to download the train-test splits.
  8. The train-test splits are returned by OpenML. Try it now

openml.authenticate

Returns a session_hash, which can be used for writing to the API

Arguments
POST username (Required)
The username to be authenticated with
POST password (Required)
An md5 hash of the password, corresponding to the username
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:authenticate xmlns:oml="http://openml.org/openml">
  <oml:session_hash>G9MPPN114ZCZNWW2VN3JE9VF1FMV8Y5FXHUDUL4P</oml:session_hash>
  <oml:valid_until>2014-08-13 20:01:29</oml:valid_until>
  <oml:timezone>Europe/Berlin</oml:timezone>
</oml:authenticate>


Error codes
250: Please provide username
Please provide the username as a POST variable
251: Please provide password
Please provide the password (hashed as a MD5) as a POST variable
252: Authentication failed
The username and password did not match any record in the database. Please note that the password should be hashed using md5
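
A minimal sketch of this call in Python (assuming the requests library; the endpoint path and the f parameter are assumptions about the legacy function-based calling convention, which is not documented here):

    import hashlib
    import requests

    API_URL = "http://www.openml.org/api/"  # assumed base URL of the legacy function-based API

    response = requests.post(API_URL, params={"f": "openml.authenticate"}, data={
        "username": "you@example.org",
        # the password must be sent as an md5 hash, not in plain text
        "password": hashlib.md5("your_password".encode()).hexdigest(),
    })
    print(response.text)  # on success, contains <oml:session_hash> and <oml:valid_until>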

openml.authenticate.check

Checks the validity of the session hash

Arguments
POST username (Required)
The username to be authenticated with
POST session_hash (Required)
The session hash to be checked
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:error xmlns:oml="http://openml.org/openml">
  <oml:code>292</oml:code>
  <oml:message>Hash does not exist</oml:message>
</oml:error>


Error codes
290: Username not provided
Please provide username
291: Hash not provided
Please provide hash to be checked
292: Hash does not exist
Hash does not exist, or is not owned by this user

openml.data

Returns a list with all dataset ids in OpenML that are ready to use

Arguments
None
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:data xmlns:oml="http://openml.org/openml">
  <oml:did>1</oml:did>
  <oml:did>2</oml:did>
  <oml:did>3</oml:did>
  <oml:did>4</oml:did>
  <oml:did>5</oml:did>
  <oml:did>6</oml:did>
  <oml:did>7</oml:did>
  <oml:did>8</oml:did>
  <oml:did>9</oml:did>
  <oml:did>10</oml:did>
</oml:data>


Error codes
370: No datasets available
There are no valid datasets in the system. Please upload!

openml.data.description

Returns dataset descriptions in XML

Arguments
GET data_id (Required)
The dataset id
Schemas
openml.data.description
This XSD schema is applicable for both uploading and downloading data.
XSD Schema
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:data_set_description xmlns:oml="http://openml.org/openml">
  <oml:id>1</oml:id>
  <oml:name>anneal</oml:name>
  <oml:version>1</oml:version>
  <oml:description>This is a preprocessed version of the <a href="d/2">anneal.ORIG</a> dataset. All missing values are threated as a nominal value with label '?'. (Quotes for clarity). The original version of this dataset can be found with the name anneal.ORIG.

1. Title of Database: Annealing Data

 2. Source Information: donated by David Sterling and Wray Buntine.

 3. Past Usage: unknown

 4. Relevant Information:
    -- Explanation: I suspect this was left by Ross Quinlan in 1987 at the
       4th Machine Learning Workshop.  I'd have to check with Jeff Schlimmer
       to double check this.
  </oml:description>
  <oml:format>ARFF</oml:format>
  <oml:upload_date>2014-04-06 23:19:20</oml:upload_date>
  <oml:licence>public domain</oml:licence>
  <oml:url>http://openml.liacs.nl/files/download/1/dataset_1_anneal.arff</oml:url>
  <oml:md5_checksum>08dc9d6bf8e5196de0d56bfc89631931</oml:md5_checksum>
</oml:data_set_description>


Error codes
110: Please provide data_id
Please provide data_id
111: Unknown dataset
Data set description with data_id was not found in the database

openml.data.upload

Uploads a dataset

Arguments
POST description (Required)
An XML file containing the data set description
POST dataset (Required)
The dataset file to be stored on the server
POST session_hash (Required)
The session hash, provided by the server on authentication (1 hour valid)
Schemas
openml.data.upload
This XSD schema is applicable for both uploading and downloading data, hence some fields are not used.
XSD Schema
Error codes
130: Problem with file uploading
There was a problem with the file upload
131: Problem validating uploaded description file
The XML description format does not meet the standards
132: Failed to move the files
Internal server error, please contact api administrators
133: Failed to make checksum of datafile
Internal server error, please contact api administrators
134: Failed to insert record in database
Internal server error, please contact api administrators
135: Please provide description xml
Please provide description xml
136: Error slot open
Error slot open, will be filled by not yet defined error
137: Please provide session_hash
In order to share content, please authenticate (openml.authenticate) and provide session_hash
138: Authentication failed
The session_hash was not valid. Please try to login again, or contact api administrators
139: Combination name / version already exists
The combination of name and version of this dataset already exists. Leave version out for auto increment
140: Both dataset file and dataset url provided. Please provide only one
The system is confused since both a dataset file (post) and a dataset url (xml) are provided. Please remove one
141: Neither dataset file or dataset url are provided
Please provide either a dataset file as POST variable, xor a dataset url in the description XML
142: Error in processing arff file. Can be a syntax error, or the specified target feature does not exist
For now, we only check arff files. If a dataset is claimed to be in such a format and it cannot be parsed, this error is returned.
143: Suggested target feature not legal
It is possible to suggest a default target feature (for predictive tasks). However, it should be provided in the data.
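
A minimal sketch of an upload in Python (assuming the requests library; as above, the endpoint path and the f parameter are assumptions about the legacy function-based calling convention):

    import requests

    API_URL = "http://www.openml.org/api/"  # assumed base URL of the legacy function-based API

    files = {
        "description": open("dataset_description.xml", "rb"),  # must validate against the XSD above
        "dataset": open("dataset.arff", "rb"),  # omit this and set a url in the XML to register external data
    }
    data = {"session_hash": "YOUR_SESSION_HASH"}  # obtained via openml.authenticate

    response = requests.post(API_URL, params={"f": "openml.data.upload"}, files=files, data=data)
    print(response.text)  # on success contains the new dataset id, otherwise one of the error codes above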

openml.data.delete

Deletes a dataset. Can only be done if the dataset is not used in tasks

Arguments
POST session_hash (Required)
The session hash to authenticate with
POST data_id (Required)
The dataset to be deleted
Error codes
350: Please provide session_hash
In order to remove your content, please authenticate (openml.authenticate) and provide session_hash
351: Authentication failed
The session_hash was not valid. Please try to login again, or contact api administrators
352: Dataset does not exist
The data id could not be linked to an existing dataset.
353: Dataset is not owned by you
The dataset was owned by another user. Hence you cannot delete it.
354: Dataset is in use by other content. Cannot be deleted
The data is used in runs. Delete this other content before deleting this dataset.
355: Deleting dataset failed.
Deleting the dataset failed. Please contact support team.

openml.data.licences

Gives a list of all data licences used in OpenML

Arguments
None
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:data_licences xmlns:oml="http://openml.org/openml">
  <oml:licences>
    <oml:licence>public domain</oml:licence>
    <oml:licence>UCI</oml:licence>
  </oml:licences>
</oml:data_licences>


Error codes
None

openml.data.features

Returns the features (attributes) of a given dataset

Arguments
GET data_id (Required)
The dataset id
Schemas
openml.data.features
-
XSD Schema
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:data_features xmlns:oml="http://openml.org/openml">
  <oml:feature>
    <oml:name>family</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>0</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>product-type</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>1</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>steel</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>2</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>carbon</oml:name>
    <oml:data_type>numeric</oml:data_type>
    <oml:index>3</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>hardness</oml:name>
    <oml:data_type>numeric</oml:data_type>
    <oml:index>4</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>temper_rolling</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>5</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>condition</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>6</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>formability</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>7</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>strength</oml:name>
    <oml:data_type>numeric</oml:data_type>
    <oml:index>8</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>non-ageing</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>9</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>surface-finish</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>10</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>surface-quality</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>11</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>enamelability</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>12</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>bc</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>13</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>bf</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>14</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>bt</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>15</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>bw%2Fme</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>16</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>bl</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>17</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>m</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>18</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>chrom</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>19</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>phos</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>20</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>cbond</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>21</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>marvi</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>22</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>exptl</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>23</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>ferro</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>24</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>corr</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>25</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>blue%2Fbright%2Fvarn%2Fclean</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>26</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>lustre</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>27</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>jurofm</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>28</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>s</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>29</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>p</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>30</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>shape</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>31</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>thick</oml:name>
    <oml:data_type>numeric</oml:data_type>
    <oml:index>32</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>width</oml:name>
    <oml:data_type>numeric</oml:data_type>
    <oml:index>33</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>len</oml:name>
    <oml:data_type>numeric</oml:data_type>
    <oml:index>34</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>oil</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>35</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>bore</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>36</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>packing</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>37</oml:index>
  </oml:feature>
  <oml:feature>
    <oml:name>class</oml:name>
    <oml:data_type>nominal</oml:data_type>
    <oml:index>38</oml:index>
  </oml:feature>
</oml:data_features>


Error codes
270: Please provide data_id
Please provide data_id
271: Unknown dataset
Data set description with data_id was not found in the database
272: No features found
The registered dataset did not contain any features
273: Dataset not processed yet
The dataset was not processed yet, no features are available. Please wait for a few minutes.
274: Dataset processed with error
The feature extractor has run into an error while processing the dataset. Please check whether it is a valid supported file.

openml.data.qualities

Returns the qualities (meta-features) of a given dataset

Arguments
GET data_id (Required)
The dataset id
Schemas
openml.data.qualities
-
XSD Schema
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:data_qualities xmlns:oml="http://openml.org/openml">
  <oml:quality>
    <oml:name>ClassCount</oml:name>
    <oml:value>6.0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>ClassEntropy</oml:name>
    <oml:value>-1.0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>DecisionStumpAUC</oml:name>
    <oml:value>0.822828217876869</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>DecisionStumpErrRate</oml:name>
    <oml:value>22.828507795100222</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>DecisionStumpKappa</oml:name>
    <oml:value>0.4503332218612649</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>DefaultAccuracy</oml:name>
    <oml:value>0.76169265033408</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>DefaultTargetNominal</oml:name>
    <oml:value>1</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>DefaultTargetNumerical</oml:name>
    <oml:value>0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>Dimensionality</oml:name>
    <oml:value>0.043429844097995544</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>EquivalentNumberOfAtts</oml:name>
    <oml:value>-12.218452122298707</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>IncompleteInstanceCount</oml:name>
    <oml:value>0.0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>InstanceCount</oml:name>
    <oml:value>898.0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>J48.00001.AUC</oml:name>
    <oml:value>0.7880182273644211</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>J48.00001.ErrRate</oml:name>
    <oml:value>12.249443207126948</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>J48.00001.kappa</oml:name>
    <oml:value>0.6371863763080279</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>J48.0001.AUC</oml:name>
    <oml:value>0.9270456597451915</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>J48.0001.ErrRate</oml:name>
    <oml:value>7.795100222717149</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>J48.0001.kappa</oml:name>
    <oml:value>0.7894969492796818</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>J48.001.AUC</oml:name>
    <oml:value>0.9270456597451915</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>J48.001.ErrRate</oml:name>
    <oml:value>7.795100222717149</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>J48.001.kappa</oml:name>
    <oml:value>0.7894969492796818</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>MajorityClassSize</oml:name>
    <oml:value>684</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>MaxNominalAttDistinctValues</oml:name>
    <oml:value>10.0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>MeanAttributeEntropy</oml:name>
    <oml:value>-1.0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>MeanKurtosisOfNumericAtts</oml:name>
    <oml:value>4.6070302750191185</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>MeanMeansOfNumericAtts</oml:name>
    <oml:value>348.50426818856744</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>MeanMutualInformation</oml:name>
    <oml:value>0.0818434274645147</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>MeanNominalAttDistinctValues</oml:name>
    <oml:value>3.21875</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>MeanSkewnessOfNumericAtts</oml:name>
    <oml:value>2.022468153229902</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>MeanStdDevOfNumericAtts</oml:name>
    <oml:value>405.17326983790934</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>MinNominalAttDistinctValues</oml:name>
    <oml:value>2.0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>MinorityClassSize</oml:name>
    <oml:value>0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NBAUC</oml:name>
    <oml:value>0.9594224101963532</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NBErrRate</oml:name>
    <oml:value>13.808463251670378</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NBKappa</oml:name>
    <oml:value>0.7185564873649677</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NegativePercentage</oml:name>
    <oml:value>0.7616926503340757</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NoiseToSignalRatio</oml:name>
    <oml:value>-13.218452122298709</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NumAttributes</oml:name>
    <oml:value>39.0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NumBinaryAtts</oml:name>
    <oml:value>19.0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NumMissingValues</oml:name>
    <oml:value>0.0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NumNominalAtts</oml:name>
    <oml:value>32.0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NumNumericAtts</oml:name>
    <oml:value>6.0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NumberOfClasses</oml:name>
    <oml:value>6</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NumberOfFeatures</oml:name>
    <oml:value>39</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NumberOfInstances</oml:name>
    <oml:value>898</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NumberOfInstancesWithMissingValues</oml:name>
    <oml:value>0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NumberOfMissingValues</oml:name>
    <oml:value>0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>NumberOfNumericFeatures</oml:name>
    <oml:value>6</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>PercentageOfBinaryAtts</oml:name>
    <oml:value>0.48717948717948717</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>PercentageOfMissingValues</oml:name>
    <oml:value>0.0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>PercentageOfNominalAtts</oml:name>
    <oml:value>0.8205128205128205</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>PercentageOfNumericAtts</oml:name>
    <oml:value>0.15384615384615385</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>PositivePercentage</oml:name>
    <oml:value>0.0</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>REPTreeDepth1AUC</oml:name>
    <oml:value>0.7597968469351692</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>REPTreeDepth1ErrRate</oml:name>
    <oml:value>23.2739420935412</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>REPTreeDepth1Kappa</oml:name>
    <oml:value>0.2894251628951225</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>REPTreeDepth2AUC</oml:name>
    <oml:value>0.9666861764236521</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>REPTreeDepth2ErrRate</oml:name>
    <oml:value>6.7928730512249444</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>REPTreeDepth2Kappa</oml:name>
    <oml:value>0.832482668142716</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>REPTreeDepth3AUC</oml:name>
    <oml:value>0.9924792906738309</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>REPTreeDepth3ErrRate</oml:name>
    <oml:value>2.5612472160356345</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>REPTreeDepth3Kappa</oml:name>
    <oml:value>0.9353873971951361</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>RandomTreeDepth1AUC_K=0</oml:name>
    <oml:value>0.813070621364688</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>RandomTreeDepth2AUC_K=0</oml:name>
    <oml:value>0.8907193338317052</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>RandomTreeDepth3AUC_K=0</oml:name>
    <oml:value>0.9701947883881082</oml:value>
  </oml:quality>
  <oml:quality>
    <oml:name>StdvNominalAttDistinctValues</oml:name>
    <oml:value>2.0593512132112965</oml:value>
  </oml:quality>
</oml:data_qualities>


Error codes
360: Please provide data_id
Please provide data_id
361: Unknown dataset
Data set description with data_id was not found in the database
362: No qualities found
The registered dataset did not contain any calculated qualities
363: Dataset not processed yet
The dataset was not processed yet, no qualities are available. Please wait for a few minutes.
364: Dataset processed with error
The quality calculator has run into an error while processing the dataset. Please check whether it is a valid supported file.
365: Interval start or end illegal
There was a problem with the interval start or end.

openml.data.qualities.list

Lists all data qualities that are used (i.e., are calculated for at least one dataset)

Arguments
None
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:data_qualities_list xmlns:oml="http://openml.org/openml">
  <oml:quality>ClassCount</oml:quality>
  <oml:quality>ClassEntropy</oml:quality>
  <oml:quality>DecisionStumpAUC</oml:quality>
  <oml:quality>DecisionStumpErrRate</oml:quality>
  <oml:quality>DecisionStumpKappa</oml:quality>
  <oml:quality>DefaultAccuracy</oml:quality>
  <oml:quality>Dimensionality</oml:quality>
  <oml:quality>EquivalentNumberOfAtts</oml:quality>
  <oml:quality>HoeffdingAdwin.changes</oml:quality>
  <oml:quality>HoeffdingAdwin.warnings</oml:quality>
  <oml:quality>HoeffdingDDM.changes</oml:quality>
  <oml:quality>HoeffdingDDM.warnings</oml:quality>
  <oml:quality>IncompleteInstanceCount</oml:quality>
  <oml:quality>InstanceCount</oml:quality>
  <oml:quality>J48.00001.AUC</oml:quality>
  <oml:quality>J48.00001.ErrRate</oml:quality>
  <oml:quality>J48.00001.kappa</oml:quality>
  <oml:quality>J48.0001.AUC</oml:quality>
  <oml:quality>J48.0001.ErrRate</oml:quality>
  <oml:quality>J48.0001.kappa</oml:quality>
  <oml:quality>J48.001.AUC</oml:quality>
  <oml:quality>J48.001.ErrRate</oml:quality>
  <oml:quality>J48.001.kappa</oml:quality>
  <oml:quality>MajorityClassSize</oml:quality>
  <oml:quality>MaxNominalAttDistinctValues</oml:quality>
  <oml:quality>MeanAttributeEntropy</oml:quality>
  <oml:quality>MeanKurtosisOfNumericAtts</oml:quality>
  <oml:quality>MeanMeansOfNumericAtts</oml:quality>
  <oml:quality>MeanMutualInformation</oml:quality>
  <oml:quality>MeanNominalAttDistinctValues</oml:quality>
  <oml:quality>MeanSkewnessOfNumericAtts</oml:quality>
  <oml:quality>MeanStdDevOfNumericAtts</oml:quality>
  <oml:quality>MinNominalAttDistinctValues</oml:quality>
  <oml:quality>MinorityClassSize</oml:quality>
  <oml:quality>NBAUC</oml:quality>
  <oml:quality>NBErrRate</oml:quality>
  <oml:quality>NBKappa</oml:quality>
  <oml:quality>NaiveBayesAdwin.changes</oml:quality>
  <oml:quality>NaiveBayesAdwin.warnings</oml:quality>
  <oml:quality>NaiveBayesDdm.changes</oml:quality>
  <oml:quality>NaiveBayesDdm.warnings</oml:quality>
  <oml:quality>NegativePercentage</oml:quality>
  <oml:quality>NoiseToSignalRatio</oml:quality>
  <oml:quality>NumAttributes</oml:quality>
  <oml:quality>NumBinaryAtts</oml:quality>
  <oml:quality>NumMissingValues</oml:quality>
  <oml:quality>NumNominalAtts</oml:quality>
  <oml:quality>NumNumericAtts</oml:quality>
  <oml:quality>NumberOfClasses</oml:quality>
  <oml:quality>NumberOfFeatures</oml:quality>
  <oml:quality>NumberOfInstances</oml:quality>
  <oml:quality>NumberOfInstancesWithMissingValues</oml:quality>
  <oml:quality>NumberOfMissingValues</oml:quality>
  <oml:quality>NumberOfNumericFeatures</oml:quality>
  <oml:quality>PercentageOfBinaryAtts</oml:quality>
  <oml:quality>PercentageOfMissingValues</oml:quality>
  <oml:quality>PercentageOfNominalAtts</oml:quality>
  <oml:quality>PercentageOfNumericAtts</oml:quality>
  <oml:quality>PositivePercentage</oml:quality>
  <oml:quality>REPTreeDepth1AUC</oml:quality>
  <oml:quality>REPTreeDepth1ErrRate</oml:quality>
  <oml:quality>REPTreeDepth1Kappa</oml:quality>
  <oml:quality>REPTreeDepth2AUC</oml:quality>
  <oml:quality>REPTreeDepth2ErrRate</oml:quality>
  <oml:quality>REPTreeDepth2Kappa</oml:quality>
  <oml:quality>REPTreeDepth3AUC</oml:quality>
  <oml:quality>REPTreeDepth3ErrRate</oml:quality>
  <oml:quality>REPTreeDepth3Kappa</oml:quality>
  <oml:quality>RandomTreeDepth1AUC_K=0</oml:quality>
  <oml:quality>RandomTreeDepth2AUC_K=0</oml:quality>
  <oml:quality>RandomTreeDepth3AUC_K=0</oml:quality>
  <oml:quality>StdvNominalAttDistinctValues</oml:quality>
</oml:data_qualities_list>


Error codes
None
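
A minimal sketch of calling this function from Python, under the same assumptions about the base URL and the ?f= parameter convention as in the earlier example:

import requests
import xml.etree.ElementTree as ET

API_URL = "http://www.openml.org/api/"    # assumed base URL
NS = {"oml": "http://openml.org/openml"}

resp = requests.get(API_URL, params={"f": "openml.data.qualities.list"})
resp.raise_for_status()

# All quality names that are calculated for at least one dataset.
names = [q.text for q in ET.fromstring(resp.content).findall("oml:quality", NS)]
print(len(names), "qualities, e.g.", names[:5])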

openml.task.evaluations

Returns the performance of flows on a given task

Arguments
GET task_id (Required)
the task id
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:task_evaluations xmlns:oml="http://openml.org/openml">
  <oml:task_id/>
  <oml:task_name/>
  <oml:task_type_id/>
  <oml:input_data>1</oml:input_data>
  <oml:estimation_procedure>10-fold Crossvalidation</oml:estimation_procedure>
  <oml:evaluation>


Error codes
300: Please provide task_id
Please provide task_id
301: Unknown task
The task with this id was not found in the database
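
A hedged sketch of requesting the evaluations for a task; the base URL and parameter convention are assumed as before, and task_id=1 is an arbitrary example value.

import requests
import xml.etree.ElementTree as ET

API_URL = "http://www.openml.org/api/"    # assumed base URL
NS = {"oml": "http://openml.org/openml"}

resp = requests.get(API_URL, params={"f": "openml.task.evaluations", "task_id": 1})
resp.raise_for_status()

root = ET.fromstring(resp.content)
print(root.findtext("oml:task_name", namespaces=NS))
print(root.findtext("oml:estimation_procedure", namespaces=NS))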

openml.task.types

Returns a list of all task types

Arguments
None
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:task_types xmlns:oml="http://openml.org/openml">
  <oml:task_type>
    <oml:id>1</oml:id>
    <oml:name>Supervised Classification</oml:name>
    <oml:description>In supervised classification, you are given an input dataset in which instances are labeled with a certain class. The goal is to build a model that predicts the class for future unlabeled instances. The model is evaluated using a train-test procedure, e.g. cross-validation.<br><br>

To make results by different users comparable, you are given the exact train-test folds to be used, and you need to return at least the predictions generated by your model for each of the test instances. OpenML will use these predictions to calculate a range of evaluation measures on the server.<br><br>

You can also upload your own evaluation measures, provided that the code for doing so is available from the implementation used. For extremely large datasets, it may be infeasible to upload all predictions. In those cases, you need to compute and provide the evaluations yourself.<br><br>

Optionally, you can upload the model trained on all the input data. There is no restriction on the file format, but please use a well-known format or PMML.</oml:description>
    <oml:creator>Joaquin Vanschoren, Jan van Rijn, Luis Torgo, Bernd Bischl</oml:creator>
  </oml:task_type>
  <oml:task_type>
    <oml:id>2</oml:id>
    <oml:name>Supervised Regression</oml:name>
    <oml:description>Given a dataset with a numeric target and a set of train/test splits, e.g. generated by a cross-validation procedure, train a model and return the predictions of that model.</oml:description>
    <oml:creator>Joaquin Vanschoren, Jan van Rijn, Luis Torgo, Bernd Bischl</oml:creator>
  </oml:task_type>
  <oml:task_type>
    <oml:id>3</oml:id>
    <oml:name>Learning Curve</oml:name>
    <oml:description>Given a dataset with a nominal target, various data samples of increasing size are defined. A model is build for each individual data sample; from this a learning curve can be drawn. </oml:description>
    <oml:creator>Pavel Brazdil, Jan van Rijn, Joaquin Vanschoren</oml:creator>
  </oml:task_type>
  <oml:task_type>
    <oml:id>4</oml:id>
    <oml:name>Supervised Data Stream Classification</oml:name>
    <oml:description>Given a dataset with a nominal target, various data samples of increasing size are defined. A model is build for each individual data sample; from this a learning curve can be drawn.</oml:description>
    <oml:creator>Geoffrey Holmes, Bernhard Pfahringer, Jan van Rijn, Joaquin Vanschoren</oml:creator>
  </oml:task_type>
</oml:task_types>


Error codes
None
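
The task type list can be read the same way; this sketch prints the id and name of each type (same assumed base URL and parameter convention):

import requests
import xml.etree.ElementTree as ET

API_URL = "http://www.openml.org/api/"    # assumed base URL
NS = {"oml": "http://openml.org/openml"}

resp = requests.get(API_URL, params={"f": "openml.task.types"})
resp.raise_for_status()

for task_type in ET.fromstring(resp.content).findall("oml:task_type", NS):
    print(task_type.findtext("oml:id", namespaces=NS),
          task_type.findtext("oml:name", namespaces=NS))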

openml.estimationprocedure.get

returns the details of an estimation procedure

Arguments
GET estimationprocedure_id (Required)
The id of the estimation procedure
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:estimationprocedure xmlns:oml="http://openml.org/openml">
  <oml:ttid>1</oml:ttid>
  <oml:name>10-fold Crossvalidation</oml:name>
  <oml:type>crossvalidation</oml:type>
  <oml:repeats>1</oml:repeats>
  <oml:folds>10</oml:folds>
  <oml:stratified_sampling>true</oml:stratified_sampling>
</oml:estimationprocedure>


Error codes
440: Please provide estimationprocedure_id
Please provide estimationprocedure_id
441: estimationprocedure_id not valid
Please provide a valid estimationprocedure_id

openml.implementation.get

Returns the description of an implementation (flow)

Arguments
GET implementation_id (Required)
The id of the implementation
Schemas
openml.implementation.get
This XSD schema is applicable for both uploading and downloading an implementation.
XSD Schema
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:implementation xmlns:oml="http://openml.org/openml">
  <oml:id>100</oml:id>
  <oml:uploader>1</oml:uploader>
  <oml:name>weka.J48</oml:name>
  <oml:version>2</oml:version>
  <oml:external_version>Weka_3.7.5_9117</oml:external_version>
  <oml:description>Ross Quinlan (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA.</oml:description>
  <oml:upload_date>2014-04-23 18:00:36</oml:upload_date>
  <oml:language>English</oml:language>
  <oml:dependencies>Weka_3.7.5</oml:dependencies>
  <oml:parameter>
    <oml:name>A</oml:name>
    <oml:data_type>flag</oml:data_type>
    <oml:default_value/>
    <oml:description>Laplace smoothing for predicted probabilities.</oml:description>
  </oml:parameter>
  <oml:parameter>
    <oml:name>B</oml:name>
    <oml:data_type>flag</oml:data_type>
    <oml:default_value/>
    <oml:description>Use binary splits only.</oml:description>
  </oml:parameter>
  <oml:parameter>
    <oml:name>C</oml:name>
    <oml:data_type>option</oml:data_type>
    <oml:default_value>0.25</oml:default_value>
    <oml:description>Set confidence threshold for pruning.
	(default 0.25)</oml:description>
  </oml:parameter>
  <oml:parameter>
    <oml:name>J</oml:name>
    <oml:data_type>flag</oml:data_type>
    <oml:default_value/>
    <oml:description>Do not use MDL correction for info gain on numeric attributes.</oml:description>
  </oml:parameter>
  <oml:parameter>
    <oml:name>L</oml:name>
    <oml:data_type>flag</oml:data_type>
    <oml:default_value/>
    <oml:description>Do not clean up after the tree has been built.</oml:description>
  </oml:parameter>
  <oml:parameter>
    <oml:name>M</oml:name>
    <oml:data_type>option</oml:data_type>
    <oml:default_value>2</oml:default_value>
    <oml:description>Set minimum number of instances per leaf.
	(default 2)</oml:description>
  </oml:parameter>
  <oml:parameter>
    <oml:name>N</oml:name>
    <oml:data_type>option</oml:data_type>
    <oml:default_value/>
    <oml:description>Set number of folds for reduced error
	pruning. One fold is used as pruning set.
	(default 3)</oml:description>
  </oml:parameter>
  <oml:parameter>
    <oml:name>O</oml:name>
    <oml:data_type>flag</oml:data_type>
    <oml:default_value/>
    <oml:description>Do not collapse tree.</oml:description>
  </oml:parameter>
  <oml:parameter>
    <oml:name>Q</oml:name>
    <oml:data_type>option</oml:data_type>
    <oml:default_value/>
    <oml:description>Seed for random data shuffling (default 1).</oml:description>
  </oml:parameter>
  <oml:parameter>
    <oml:name>R</oml:name>
    <oml:data_type>flag</oml:data_type>
    <oml:default_value/>
    <oml:description>Use reduced error pruning.</oml:description>
  </oml:parameter>
  <oml:parameter>
    <oml:name>S</oml:name>
    <oml:data_type>flag</oml:data_type>
    <oml:default_value/>
    <oml:description>Don't perform subtree raising.</oml:description>
  </oml:parameter>
  <oml:parameter>
    <oml:name>U</oml:name>
    <oml:data_type>flag</oml:data_type>
    <oml:default_value/>
    <oml:description>Use unpruned tree.</oml:description>
  </oml:parameter>
</oml:implementation>


Error codes
180: Please provide implementation_id
Please provide implementation_id
181: Unknown implementation
The implementation with this ID was not found in the database
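
A sketch of downloading a flow description and listing its parameters; implementation_id=100 matches the example above, while the base URL and parameter convention remain assumptions.

import requests
import xml.etree.ElementTree as ET

API_URL = "http://www.openml.org/api/"    # assumed base URL
NS = {"oml": "http://openml.org/openml"}

resp = requests.get(API_URL, params={"f": "openml.implementation.get",
                                     "implementation_id": 100})
resp.raise_for_status()

root = ET.fromstring(resp.content)
print(root.findtext("oml:name", namespaces=NS),
      root.findtext("oml:external_version", namespaces=NS))
for param in root.findall("oml:parameter", NS):
    print(" -", param.findtext("oml:name", namespaces=NS),
          "default:", param.findtext("oml:default_value", namespaces=NS))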

openml.implementation.exists

A utility function that checks whether an implementation already exists. Mainly used by workbenches.

Arguments
GET name (Required)
The name of the implementation
GET external_version (Required)
The (workbench) version of the implementation. This is generally based on per-workbench conventions.
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:error xmlns:oml="http://openml.org/openml">
  <oml:code>180</oml:code>
  <oml:message>Please provide implementation_id</oml:message>
</oml:error>


Error codes
330: Mandatory fields not present.
Please provide the following mandatory field combination: name and external_version.
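
A sketch of the existence check, using the name and external version from the weka.J48 example above; the base URL and parameter convention are assumptions.

import requests

API_URL = "http://www.openml.org/api/"    # assumed base URL

resp = requests.get(API_URL, params={
    "f": "openml.implementation.exists",
    "name": "weka.J48",                       # values taken from the example above
    "external_version": "Weka_3.7.5_9117",
})
print(resp.text)   # raw XML answer, or an <oml:error> document on failure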

openml.implementation.upload

Uploads an implementation to OpenML

Arguments
POST description (Required)
An XML file containing the implementation meta data
POST source
The source code of the implementation. If multiple files, please zip them. Either source or binary is required.
POST binary
The binary of the implementation. If multiple files, please zip them. Either source or binary is required.
POST session_hash (Required)
The session hash, provided by the server on authentication (1 hour valid)
Schemas
openml.implementation.upload
This XSD schema is applicable for both uploading and downloading an implementation. (Some fields are ignored)
XSD Schema
Error codes
160: Error in file uploading
There was a problem with the file upload
161: Please provide description xml
Please provide description xml
162: Please provide source or binary file
Please provide a source or binary file. Uploading both is also allowed.
163: Problem validating uploaded description file
The XML description format does not meet the standards
164: Implementation already stored in database
Please change name or version number
165: Failed to move the files
Internal server error, please contact api administrators
166: Failed to add implementation to database
Internal server error, please contact api administrators
167: Illegal files uploaded
A non-required file was uploaded.
168: The provided md5 hash does not equal the server-generated md5 hash of the file
The provided md5 hash does not equal the server-generated md5 hash of the file
169: Please provide session_hash
In order to share content, please authenticate (openml.authenticate) and provide session_hash
170: Authentication failed
The session_hash was not valid. Please try to log in again, or contact the api administrators
171: Implementation already exists
This implementation is already in the database
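
A hedged sketch of an upload from Python. The field names description, source and session_hash come from the argument list above; the local file names are hypothetical, and passing the function name as an ?f= query parameter is an assumption.

import requests

API_URL = "http://www.openml.org/api/"    # assumed base URL
session_hash = "..."                      # obtained earlier via openml.authenticate (valid 1 hour)

with open("flow_description.xml", "rb") as description, \
     open("flow_source.zip", "rb") as source:             # zip the source if it spans multiple files
    resp = requests.post(API_URL,
                         params={"f": "openml.implementation.upload"},
                         data={"session_hash": session_hash},
                         files={"description": description, "source": source})
print(resp.text)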

openml.implementation.owned

Returns a list of all implementations owned by the user

Arguments
POST session_hash (Required)
The session hash, provided by the server on authentication (1 hour valid)
Error codes
310: Please provide session_hash
In order to view private content, please authenticate (openml.authenticate) and provide session_hash
311: Authentication failed
The session_hash was not valid. Please try to log in again, or contact the api administrators
312: No implementations owned by this user
The user has no implementations linked to his account

openml.implementation.delete

Deletes an implementation (can only be done to owned implementations)

Arguments
POST session_hash (Required)
The session hash, provided by the server on authentication (1 hour valid)
POST implementation_id (Required)
The id of the implementation to delete
Error codes
320: Please provide session_hash
In order to remove your content, please authenticate (openml.authenticate) and provide session_hash
321: Authentication failed
The session_hash was not valid. Please try to log in again, or contact the api administrators
322: Implementation does not exist
The implementation id could not be linked to an existing implementation.
323: Implementation is not owned by you
The implementation was owned by another user. Hence you cannot delete it.
324: Implementation is in use by other content. Cannot be deleted
The implementation is used in runs, evaluations or as a component of another implementation. Delete this other content before deleting this implementation.
325: Deleting implementation failed.
Deleting the implementation failed. Please contact support team.

openml.implementation.licences

Returns a list of all licences used in implementations

Arguments
None
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:implementation_licences xmlns:oml="http://openml.org/openml">
  <oml:licences>
    <oml:licence>public domain</oml:licence>
    <oml:licence>NA</oml:licence>
  </oml:licences>
</oml:implementation_licences>


Error codes
None

openml.evaluation.measures

Returns a list of all evaluation measures

Arguments
None
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:evaluation_measures xmlns:oml="http://openml.org/openml">
  <oml:measures>
    <oml:measure>area_under_roc_curve</oml:measure>
    <oml:measure>average_cost</oml:measure>
    <oml:measure>build_cpu_time</oml:measure>
    <oml:measure>build_memory</oml:measure>
    <oml:measure>class_complexity</oml:measure>
    <oml:measure>class_complexity_gain</oml:measure>
    <oml:measure>confusion_matrix</oml:measure>
    <oml:measure>correlation_coefficient</oml:measure>
    <oml:measure>f_measure</oml:measure>
    <oml:measure>kappa</oml:measure>
    <oml:measure>kb_relative_information_score</oml:measure>
    <oml:measure>kohavi_wolpert_bias_squared</oml:measure>
    <oml:measure>kohavi_wolpert_error</oml:measure>
    <oml:measure>kohavi_wolpert_sigma_squared</oml:measure>
    <oml:measure>kohavi_wolpert_variance</oml:measure>
    <oml:measure>kononenko_bratko_information_score</oml:measure>
    <oml:measure>matthews_correlation_coefficient</oml:measure>
    <oml:measure>mean_absolute_error</oml:measure>
    <oml:measure>mean_class_complexity</oml:measure>
    <oml:measure>mean_class_complexity_gain</oml:measure>
    <oml:measure>mean_f_measure</oml:measure>
    <oml:measure>mean_kononenko_bratko_information_score</oml:measure>
    <oml:measure>mean_precision</oml:measure>
    <oml:measure>mean_prior_absolute_error</oml:measure>
    <oml:measure>mean_prior_class_complexity</oml:measure>
    <oml:measure>mean_recall</oml:measure>
    <oml:measure>mean_weighted_area_under_roc_curve</oml:measure>
    <oml:measure>mean_weighted_f_measure</oml:measure>
    <oml:measure>mean_weighted_precision</oml:measure>
    <oml:measure>mean_weighted_recall</oml:measure>
    <oml:measure>number_of_instances</oml:measure>
    <oml:measure>os_information</oml:measure>
    <oml:measure>precision</oml:measure>
    <oml:measure>predictive_accuracy</oml:measure>
    <oml:measure>prior_class_complexity</oml:measure>
    <oml:measure>prior_entropy</oml:measure>
    <oml:measure>ram_hours</oml:measure>
    <oml:measure>recall</oml:measure>
    <oml:measure>relative_absolute_error</oml:measure>
    <oml:measure>root_mean_prior_squared_error</oml:measure>
    <oml:measure>root_mean_squared_error</oml:measure>
    <oml:measure>root_relative_squared_error</oml:measure>
    <oml:measure>run_cpu_time</oml:measure>
    <oml:measure>run_memory</oml:measure>
    <oml:measure>run_virtual_memory</oml:measure>
    <oml:measure>scimark_benchmark</oml:measure>
    <oml:measure>single_point_area_under_roc_curve</oml:measure>
    <oml:measure>total_cost</oml:measure>
    <oml:measure>unclassified_instance_count</oml:measure>
    <oml:measure>webb_bias</oml:measure>
    <oml:measure>webb_error</oml:measure>
    <oml:measure>webb_variance</oml:measure>
  </oml:measures>
</oml:evaluation_measures>


Error codes
None

openml.run.get

Returns the details of a specific run

Arguments
GET run_id (Required)
The id of the run
Schemas
openml.run.get
This XSD schema is applicable for both uploading and downloading run details.
XSD Schema
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:run xmlns:oml="http://openml.org/openml">
  <oml:run_id>1</oml:run_id>
  <oml:uploader>1</oml:uploader>
  <oml:task_id>68</oml:task_id>
  <oml:implementation_id>61</oml:implementation_id>
  <oml:setup_id>6</oml:setup_id>
  <oml:setup_string>weka.classifiers.trees.REPTree -- -M 2 -V 0.001 -N 3 -S 1 -L -1 -I 0.0</oml:setup_string>
  <oml:parameter_setting>
    <oml:name>61_I</oml:name>
    <oml:value>0.0</oml:value>
  </oml:parameter_setting>
  <oml:parameter_setting>
    <oml:name>61_L</oml:name>
    <oml:value>-1</oml:value>
  </oml:parameter_setting>
  <oml:parameter_setting>
    <oml:name>61_M</oml:name>
    <oml:value>2</oml:value>
  </oml:parameter_setting>
  <oml:parameter_setting>
    <oml:name>61_N</oml:name>
    <oml:value>3</oml:value>
  </oml:parameter_setting>
  <oml:parameter_setting>
    <oml:name>61_S</oml:name>
    <oml:value>1</oml:value>
  </oml:parameter_setting>
  <oml:parameter_setting>
    <oml:name>61_V</oml:name>
    <oml:value>0.001</oml:value>
  </oml:parameter_setting>
  <oml:input_data>
    <oml:dataset>
      <oml:did>9</oml:did>
      <oml:name>autos</oml:name>
      <oml:url>http://openml.liacs.nl/files/download/9/dataset_9_autos.arff</oml:url>
    </oml:dataset>
  </oml:input_data>
  <oml:output_data>
    <oml:file>
      <oml:did>63</oml:did>
      <oml:name>description</oml:name>
      <oml:url>http://openml.liacs.nl/data/download/63/weka_generated_run5258986433356798974.xml</oml:url>
    </oml:file>
    <oml:file>
      <oml:did>64</oml:did>
      <oml:name>predictions</oml:name>
      <oml:url>http://openml.liacs.nl/data/download/64/weka_generated_predictions5823074444642592781.arff</oml:url>
    </oml:file>
    <oml:evaluation>
      <oml:name>area_under_roc_curve</oml:name>
      <oml:implementation>4</oml:implementation>
      <oml:value>0.786876</oml:value>
      <oml:array_data>[�,0.976312,0.861162,0.815581,0.745833,0.756304,0.75239]</oml:array_data>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>confusion_matrix</oml:name>
      <oml:implementation>10</oml:implementation>
      <oml:array_data>[[0,0,0,0,0,0,0],[0,3,135,12,0,0,0],[0,31,698,178,161,18,14],[0,0,160,2464,510,198,18],[0,0,105,886,1398,127,184],[0,0,56,578,317,532,117],[0,0,68,237,440,267,338]]</oml:array_data>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>f_measure</oml:name>
      <oml:implementation>12</oml:implementation>
      <oml:value>0.511938</oml:value>
      <oml:array_data>[0,0.032609,0.601206,0.639585,0.505972,0.388038,0.334488]</oml:array_data>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>kappa</oml:name>
      <oml:implementation>13</oml:implementation>
      <oml:value>0.373111</oml:value>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>kb_relative_information_score</oml:name>
      <oml:implementation>14</oml:implementation>
      <oml:value>4242.098053</oml:value>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>mean_absolute_error</oml:name>
      <oml:implementation>21</oml:implementation>
      <oml:value>0.149488</oml:value>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>mean_prior_absolute_error</oml:name>
      <oml:implementation>27</oml:implementation>
      <oml:value>0.220919</oml:value>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>number_of_instances</oml:name>
      <oml:implementation>34</oml:implementation>
      <oml:value>10250</oml:value>
      <oml:array_data>[0,150,1100,3350,2700,1600,1350]</oml:array_data>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>os_information</oml:name>
      <oml:implementation>53</oml:implementation>
      <oml:array_data>[ Oracle Corporation, 1.7.0_51, amd64, Linux, 3.7.10-1.28-desktop ]</oml:array_data>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>precision</oml:name>
      <oml:implementation>35</oml:implementation>
      <oml:value>0.516877</oml:value>
      <oml:array_data>[0,0.088235,0.571195,0.565786,0.494692,0.465849,0.503726]</oml:array_data>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>predictive_accuracy</oml:name>
      <oml:implementation>36</oml:implementation>
      <oml:value>0.530049</oml:value>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>prior_entropy</oml:name>
      <oml:implementation>38</oml:implementation>
      <oml:value>2.326811</oml:value>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>recall</oml:name>
      <oml:implementation>39</oml:implementation>
      <oml:value>0.530049</oml:value>
      <oml:array_data>[0,0.02,0.634545,0.735522,0.517778,0.3325,0.25037]</oml:array_data>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>relative_absolute_error</oml:name>
      <oml:implementation>40</oml:implementation>
      <oml:value>0.676663</oml:value>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>root_mean_prior_squared_error</oml:name>
      <oml:implementation>41</oml:implementation>
      <oml:value>0.331758</oml:value>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>root_mean_squared_error</oml:name>
      <oml:implementation>42</oml:implementation>
      <oml:value>0.303746</oml:value>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>root_relative_squared_error</oml:name>
      <oml:implementation>43</oml:implementation>
      <oml:value>0.915564</oml:value>
    </oml:evaluation>
    <oml:evaluation>
      <oml:name>scimark_benchmark</oml:name>
      <oml:implementation>55</oml:implementation>
      <oml:value>1973.4091512218106</oml:value>
      <oml:array_data>[ 1262.1133708514062, 1630.9393838458018, 932.0675956790141, 1719.5408190761134, 4322.384586656718 ]</oml:array_data>
    </oml:evaluation>
  </oml:output_data>
</oml:run>


Error codes
220: Please provide run_id
In order to view run details, please provide run_id
221: Run not found
The run id was invalid, run not found
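
A sketch that downloads a run and prints its server-side evaluations; run_id=1 matches the example above, while the base URL and parameter convention are assumptions.

import requests
import xml.etree.ElementTree as ET

API_URL = "http://www.openml.org/api/"    # assumed base URL
NS = {"oml": "http://openml.org/openml"}

resp = requests.get(API_URL, params={"f": "openml.run.get", "run_id": 1})
resp.raise_for_status()

root = ET.fromstring(resp.content)
for evaluation in root.findall("oml:output_data/oml:evaluation", NS):
    name = evaluation.findtext("oml:name", namespaces=NS)
    value = evaluation.findtext("oml:value", namespaces=NS)
    if value is not None:                 # some measures only carry array_data
        print(name, value)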

openml.run.upload

Uploads the results of a run to OpenML

Arguments
POST description (Required)
An XML file describing the run
POST <output_files> (Required)
All output files that should be generated by the run, as described in the task xml. For supervised classification tasks, this is typically a file containing predictions
POST session_hash (Required)
The session hash, provided by the server on authentication (1 hour valid)
Schemas
openml.run.upload
This XSD schema is applicable for both uploading and downloading run details.
XSD Schema
Error codes
200: Please provide session_hash
In order to share content, please authenticate (openml.authenticate) and provide session_hash
201: Authentication failed
The session_hash was not valid. Please try to log in again, or contact the api administrators
202: Please provide run xml
Please provide run xml
203: Could not validate run xml by xsd
Please double check that the xml is valid.
204: Unknown task
The task with this id was not found in the database
205: Unknown implementation
The implementation with this id was not found in the database
206: Invalid number of files
The number of uploaded files did not match the number of files expected for this task type
207: File upload failed
One of the files uploaded has a problem
208: Error inserting setup record
Internal server error, please contact api administrators
210: Unable to store run
Internal server error, please contact api administrators
211: Dataset not in database
One of the datasets of this task was not included in the database, please contact api administrators
212: Unable to store file
Internal server error, please contact api administrators
213: Parameter in run xml unknown
One of the parameters provided in the run xml is not registered as a parameter for the implementation or its components
214: Unable to store input setting
Internal server error, please contact API support team
215: Unable to evaluate predictions
Internal server error, please contact API support team
216: Error thrown by Java Application
The Java application has thrown an error. An additional information field is provided
217: Error processing output data: unknown or inconsistent evaluation measure
One of the provided evaluation measures could not be matched with a record in the math_function / implementation table.
218: Wrong implementation associated with run: this implements a math_function
The implementation implements a math_function, which is unable to generate predictions. Please select another implementation.
219: Error reading the XML document
The xml description file could not be verified.
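
A hedged sketch of a run upload. The description and session_hash fields come from the argument list above; the output file field name (here predictions) and the local file names are hypothetical, since the required output files depend on the task xml.

import requests

API_URL = "http://www.openml.org/api/"    # assumed base URL
session_hash = "..."                      # obtained via openml.authenticate (valid 1 hour)

with open("run_description.xml", "rb") as description, \
     open("predictions.arff", "rb") as predictions:
    resp = requests.post(API_URL,
                         params={"f": "openml.run.upload"},
                         data={"session_hash": session_hash},
                         files={"description": description,
                                "predictions": predictions})  # field name as required by the task xml
print(resp.text)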

openml.run.delete

Deletes a run from the database.

Arguments
POST run_id (Required)
The id of the run to be deleted
POST session_hash (Required)
The session hash to be checked
Error codes
390: Please provide session_hash
In order to remove your content, please authenticate (openml.authenticate) and provide session_hash
391: Authentication failed
The session_hash was not valid. Please try to log in again, or contact the api administrators
392: Run does not exist
The run id could not be linked to an existing run.
393: Run is not owned by you
The run was owned by another user. Hence you cannot delete it.
394: Deleting run failed.
Deleting the run failed. Please contact support team.

openml.job.get

Retrieves a job that is scheduled but not yet performed

Arguments
GET workbench (Required)
The name of the workbench that is performing the job
GET task_type_id (Required)
The task type to which the job should belong.
Example Response

<?xml version="1.0" encoding="UTF-8"?>
<oml:job xmlns:oml="http://openml.org/openml">
  <oml:learner>weka.classifiers.rules.Ridor -- -F 3 -S 1 -N 2.0</oml:learner>
  <oml:task_id>1</oml:task_id>
</oml:job>


Error codes
340: Please provide workbench and task type.
Please provide workbench and task type.
341: No jobs available.
There are no jobs that need to be executed.
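
A sketch of polling for a job; the workbench name is an arbitrary example, and the base URL and parameter convention are assumptions.

import requests
import xml.etree.ElementTree as ET

API_URL = "http://www.openml.org/api/"    # assumed base URL
NS = {"oml": "http://openml.org/openml"}

resp = requests.get(API_URL, params={"f": "openml.job.get",
                                     "workbench": "weka",     # example workbench name
                                     "task_type_id": 1})      # Supervised Classification
root = ET.fromstring(resp.content)
print(root.findtext("oml:learner", namespaces=NS))
print(root.findtext("oml:task_id", namespaces=NS))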

openml.setup.delete

Removes a setup from the database. This can only be done if no runs have been performed with this setup.

Arguments
POST setup_id (Required)
The id of the setup that should be removed
POST session_hash (Required)
The session hash to be checked
Error codes
400: Please provide session_hash
In order to remove your content, please authenticate (openml.authenticate) and provide session_hash
401: Authentication failed
The session_hash was not valid. Please try to log in again, or contact the api administrators
402: Setup does not exist
The setup id could not be linked to an existing setup.
404: Setup is in use by other content (runs, schedules, etc.). Cannot be deleted
The setup is used in runs. Delete this other content before deleting this setup.
405: Deleting setup failed.
Deleting the setup failed. Please contact support team.

Index queries

OpenML keeps an index of all data, tasks, flows and runs for quick access, all in JSON format, using a predictable URL scheme.

Data sets

Get a JSON description of a dataset with www.openml.org/d/id/json (or add /json to the dataset page's url).

Example: www.openml.org/d/1/json

Flows

Get a JSON description of a flow with www.openml.org/f/id/json (or add /json to the flow page's url).

Example: www.openml.org/f/100/json

Tasks

Get a JSON description of a task with www.openml.org/t/id/json (or add /json to the task page's url).

Example: www.openml.org/t/1/json

Runs

Get a JSON description of a run with www.openml.org/r/id/json (or add /json to the run page's url).

Example: www.openml.org/r/1/json
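
Because the URL scheme is predictable, these JSON descriptions can be fetched with any HTTP client; the sketch below uses Python with the dataset example above (the https scheme is assumed).

import requests

resp = requests.get("https://www.openml.org/d/1/json")   # dataset 1, as in the example above
resp.raise_for_status()
print(resp.json())                                        # the dataset description as a Python dict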

OpenML aims to create a frictionless, collaborative environment for exploring machine learning

Data sets and workflows from various sources analysed and organized online for easy access

Integrated into machine learning environments for automated experimentation, logging, and sharing

Fully reproducible and organized results (e.g. models, predictions) you can build on and compare against

Share your work with the world or within circles of trusted researchers

Make your work more visible and easily citable

Tools to help you design and optimize workflows

In short, OpenML makes it easy to access data, connect to the right people, and automate experimentation, so that you can focus on the data science.

Data

You can upload data sets through the website or the API. Data hosted elsewhere can be referenced by URL.

OpenML automatically analyses the data, checks for problems, visualizes it, and computes data characteristics that are useful for finding and comparing datasets.

Every data set gets a dedicated page with all known information (check out zoo), including a wiki, visualizations, statistics, user discussions, and the tasks in which it is used.

Currently, OpenML only accepts a limited number of data formats (e.g. ARFF for tabular data). We aim to extend this in the near future, and allow conversions between the main data types.

Tasks

Tasks describe what to do with the data. OpenML covers several task types, such as classification and clustering. You can create tasks online.

Tasks are little containers including the data and other information such as train/test splits, and define what needs to be returned.

Tasks are machine-readable so that machine learning environments know what to do, and you can focus on finding the best algorithm. You can run algorithms on your own machine(s) and upload the results. OpenML evaluates and organizes all solutions online.

Tasks are real-time, collaborative data mining challenges (e.g. see this one): you can study, discuss and learn from all submissions (code has to be shared), while OpenML keeps track of who was first.

You can also supply hidden test sets for the evaluation of solutions. Novel ways of ranking solutions will be added in the near future.

Flows

Flows are algorithms, workflows, or scripts solving tasks. You can upload them through the website, or API. Code hosted elsewhere (e.g., GitHub) can be referenced by URL.

Ideally, flows are wrappers around existing algorithms/tools so that they can automatically read and solve OpenML tasks.

Every flow gets a dedicated page with all known information (check out WEKA's RandomForest), including a wiki, hyperparameters, evaluations on all tasks, and user discussions.

Currently, you will need to install things locally to run flows. We aim to add support for VMs so that flows can be easily (re)run in any environment.

Runs

Runs are applications of flows on a specific task. They are typically submitted automatically by machine learning environments (through the OpenML API), which make sure that all details are included to ensure reproducibility.

OpenML organizes all runs online, linked to the underlying data, flows, parameter settings, people, and other details. OpenML also independently evaluates the results contained in the run.

You can search and compare everyone's runs online, download all results into your favorite machine learning environment, and relate evaluations to known properties of the data and algorithms.

OpenML stores and analyses results in fine detail, up to the level of individual instances.

Plugins

OpenML is deeply integrated in several popular machine learning environments. Given a task, these plugins will automatically download the data into the environments, allow you to run any algorithm/flow, and automatically upload all runs.

Currently, OpenML is integrated, or being integrated, into the following environments. Follow the links to detailed instructions.

Programming APIs

If you want to integrate OpenML into your own tools, we offer several language-specific APIs, so you can easily interact with OpenML to list, download and upload data sets, tasks, flows and runs.

With these APIs you can download a task, run an algorithm, and upload the results in just a few lines of code.

Follow the links for detailed documentation:

REST API

OpenML also offers a REST API which allows you to talk to OpenML directly. Most communication is done using XML, but we also offer JSON endpoints for convenience.

Projects (under construction)

You can combine data sets, flows and runs into projects, to collaborate with others online, or simply keep a log of your work.

Each project gets its own page, which can be linked to publications so that others can find all the details online.

Circles (under construction)

You can create circles of trusted researchers in which data can be shared that is not yet ready for publication.

Altmetrics (under construction)

OpenML keeps track of the impact of your work: how often is it downloaded, liked, or reused in other studies.

Jobs (under construction)

OpenML can help you run large experiments. A job is a small container defining a specific flow, with specific parameter settings, to run on a specific task. You can generate batches of these jobs online, and you can run a helper tool on your machines/clouds/clusters that downloads these jobs (including all data), executes them, and uploads the results.

Developers

OpenML is an open source project, hosted on GitHub, and maintained by a very active community of developers. We welcome everybody to contribute to OpenML, and are glad to help you make optimal use of OpenML in your research.