OpenML is an open source project, hosted on GitHub. We welcome everybody to help improve OpenML, and make it more useful for everyone. Fork us on GitHub.

GitHub repositories

OpenML Core - Everything done by the OpenML server. This includes dataset feature calculations and server-side model evaluations.

Website - The website and REST API

Meta-features - New repository for the meta-feature calculation tool.

Java API - The Java API and Java-based plugins

R API - The OpenML R package

Python API - The Python API

Issues and feature requests

You can post issues (e.g. bugs) and feature requests on the relevant issue tracker:

OpenML tracker - All general issues and feature requests. This is all organized on Waffle.

Website tracker - Smaller issues related to the website.

R tracker - Issues related to the openml R package.

GitHub wiki

The GitHub Wiki contains more information on how to set up your environment to work on OpenML locally, on the structure of the backend and frontend, and working documents.

Database snapshots

Everything uploaded to OpenML is available to the community. The nightly snapshot of the public database contains all experiment runs, evaluations and links to datasets, implementations and result files, provided in SQL format (gzipped).

Nightly database SNAPSHOT

If you want to work on the website locally, you'll also need the schema for the 'private' database with non-public information.

Private database schema

Legacy Resources

OpenML is always evolving, but we keep hosting the resources that were used in prior publications so that others may still build on them.

The experiment database used in Vanschoren et al. (2012) Experiment databases. Machine Learning 87(2), pp 127-158. You'll need to import this database (we used MySQL) to run queries. The database structure is described in the paper. Note that most of the experiments in this database have been rerun on OpenML with newer algorithm implementations, and are stored in much more detail.

The Exposé ontology used in the same paper, and described in more detail here and here. Exposé is used in designing our databases, and we aim to use it to export all OpenML data as Linked Open Data.

Honor Code

By joining OpenML, you join a special worldwide community of data scientists building on each other's results and connecting their minds as efficiently as possible. This community depends on your motivation to share data, tools and ideas, and to do so with honesty. In return, you will gain trust, visibility and reputation, igniting online collaborations and studies that otherwise may not have happened.

By using any part of OpenML, you agree to:

  • Give credit where credit is due. Cite the authors whose work you are building on, or build collaborations where appropriate.
  • Give back to the community by sharing your own data as openly and as soon as possible, or by helping the community in other ways. In doing so, you gain visibility and impact (citations).
  • Share data according to your best efforts. Everybody makes mistakes, but we trust you to correct them as soon as possible. Remove or flag data that cannot be trusted.
  • Be polite and constructive in all discussions. Criticism of methods is welcomed, but personal criticisms should be avoided.
  • Respect circles of trust. OpenML allows you to collaborate in 'circles' of trusted people to share unpublished results. Be considerate in sharing data with people outside this circle.
  • Do not steal the work of people who openly share it. OpenML makes it easy to find all shared data (and when it was shared), thus everybody will know if you do this.

Terms of Use

You agree that you are responsible for your own use of OpenML.org and all content submitted by you, in accordance with the Honor Code and all applicable local, state, national and international laws.

By submitting or distributing content from OpenML.org, you affirm that you have the necessary rights, licenses, consents and/or permissions to reproduce and publish this content. You, and not the developers of OpenML.org, are solely responsible for your submissions.

By submitting content to OpenML.org, you grant OpenML.org the right to host, transfer, display and use this content, in accordance with your sharing settings and any licences granted by you. You also grant to each user a non-exclusive license to access and use this content for their own research purposes, in accordance with any licences granted by you.

You may maintain only one user account and must not let anyone else use your username and/or password. You may not impersonate other persons.

You will not attempt to damage, disable, or impair any OpenML server or interfere with any other party's use and enjoyment of the service. You may not attempt to gain unauthorized access to the Site, other accounts, computer systems or networks connected to any OpenML server. You may not obtain or attempt to obtain any materials or information not intentionally made available through OpenML.

The following are strictly prohibited: content that defames, harasses or threatens others; content that infringes another's intellectual property; indecent or unlawful content; advertising; and intentionally inaccurate information posted with the intent of misleading others. It is also prohibited to post code containing viruses, malware, spyware or any other similar software that may damage the operation of another's computer or property.

Our Team

OpenML is a community effort, and everybody is welcome to contribute. Below are some of the core contributors, but also check out our GitHub page.



Joaquin Vanschoren
Machine learning professor @TUeindhoven. Founder of OpenML. Working to make machine learning more open, collaborative, and automated.


Jan van Rijn
Post-doc at Freiburg University and main developer of various OpenML components and plugins.


Bernd Bischl
PhD in statistics, data scientist, developer of the OpenML R plugin, developer of mlr.


Dominik Kirchhoff
PhD student at TU Dortmund University. Contributing to the R package.


Rafael G. Mantovani
PhD student in computer science @ University of São Paulo, Brazil.


Matthias Feurer
Ph.D. candidate at the University of Freiburg, Germany. Working on automated machine learning. Creator of the Python API for OpenML.


Michel Lang
PhD in statistics. Co-developer of the OpenML R package and mlr.


Giuseppe Casalicchio
PhD student in computational statistics at LMU Munich. Maintainer and developer of the R interface for OpenML.


Andrey Ustyuzhanin
Head of Yandex School of Data Analysis research group. The group's mission is to solve tough scientific problems by applying data science tools and practices. Member of the LHCb and SHiP experiments at CERN. Head of the Laboratory of Methods for Big Data Analysis at the CS faculty of HSE.


Jakob Bossek
PhD student in computer science at the University of Münster, Germany. R enthusiast, one of the main contributors to the OpenML R interface, and sports freak.


Heidi Seibold
PhD student in Computational Biostatistics at the University of Zurich. I am into R, open science and reproducible research.


Andreas Mueller
Research engineer at NYU, scikit-learn core-developer.


Janek Thomas
PhD student in computational statistics at LMU Munich.

Altmetrics and Gamification

To encourage open science, OpenML now includes a score system to track and reward scientific activity, reach and impact, and will in the future include further gamification features such as badges. Because the system is still experimental and very much in development, the details are subject to change. Below, the score system is described in more detail, followed by our rationale for it. If anything is unclear or you have any feedback on the system, do not hesitate to let us know.

The scores

All scores are awarded to users and involve datasets, flows, tasks and runs, or 'knowledge pieces' for short.

Activity

Activity score is awarded to users for contributing to the knowledge base of OpenML. This includes uploading knowledge pieces, leaving likes and downloading new knowledge pieces. Uploads are rewarded most strongly (3 activity), followed by likes (2 activity); downloads are rewarded least (1 activity).

Reach

Reach score is awarded to knowledge pieces and by extension their uploaders for the expressed interest of other users. It is increased by 2 for every user that leaves a like on a knowledge piece and increased by 1 for every user that downloads it for the first time.

Impact

Impact score is awarded to knowledge pieces and by extension their uploaders for the reuse of these knowledge pieces. A dataset is reused when it is used as input in a task, while flows and tasks are reused in runs. One impact point is awarded for every reuse by a user other than the uploader. The impact of a reused knowledge piece is further increased by half of the acquired reach and half of the acquired impact of each reuse, rounded down. So the impact of a dataset that is used in a single task with reach 10 and impact 5 is 8 (⌊1 + 0.5·10 + 0.5·5⌋).
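
As a sketch, the scoring rules above could be computed as follows. All function and variable names here are ours for illustration; they are not part of the OpenML API, and the exact server-side rules may differ in detail.

```python
import math

# Illustrative only: names and structure are ours, not OpenML's.
def activity(uploads, likes, downloads):
    """Activity: 3 points per upload, 2 per like, 1 per download."""
    return 3 * uploads + 2 * likes + 1 * downloads

def reach(likes, first_time_downloaders):
    """Reach: 2 points per like, 1 per first-time downloader."""
    return 2 * likes + first_time_downloaders

def impact(reuses):
    """Impact: for each reuse (by someone other than the uploader),
    1 point plus half the reach and half the impact of the reusing
    knowledge piece, rounded down."""
    return sum(math.floor(1 + 0.5 * r + 0.5 * i) for r, i in reuses)

# The example from the text: a dataset used in one task with
# reach 10 and impact 5 gets impact floor(1 + 5 + 2.5) = 8.
print(impact([(10, 5)]))  # 8
```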

The rationale

One of OpenML's core ideas is to create an open science environment for sharing and exploring knowledge while getting credit for your work. The activity score encourages sharing and exploration. Reach makes exploration easier (by surfacing well-liked and/or often-downloaded knowledge pieces), while also providing a form of credit to the user. Impact is another form of credit, closer in concept to citation scores.

Where to find it

The number of likes and downloads, as well as the reach and impact of knowledge pieces, can be found at the top of their respective pages, for example the Iris data set. In the top right you will also find the new Like button next to the already familiar download button.

When searching for knowledge pieces on the search page, you will now see the statistics mentioned above as well. In addition, you can sort the search results by downloads, likes, reach or impact.

On user profiles you will find all statistics relevant to that user, as well as graphs of their progress on the three scores.

Badges

Badges are intended to provide discrete goals for users to aim for. They are only in a conceptual phase; depending on the community's reaction, they will be further developed.
The badges a user has acquired can be found on their user profile below the score graphs. The currently implemented badges are:

Clockwork Scientist
For being active every day for a period of time.
Team Player
For collaborating with other users; reusing a knowledge piece of someone who has reused a knowledge piece of yours.
Good News Everyone
For achieving a high reach on a single knowledge piece you uploaded.

Downvotes

Although not part of the scores, downvotes have also been introduced. They are intended to indicate a flaw of a data set, flow, task or run that can be fixed, for example a missing description.

If you want to indicate that something is wrong with a knowledge piece, click the number-of-issues statistic at the top of the page. A panel will open where you can either agree with an already raised issue (anonymously) or submit your own issue (not anonymously).

You can also sort search results on the search page by the number of downvotes or issues.

Opting out

If you really do not like the gamification, you can opt out by changing the setting on your profile. This hides your scores and badges from other users and hides their scores and badges from you. You will still be able to see the number of likes, downloads and downvotes on knowledge pieces, and your likes, downloads and downvotes will still be counted.


OpenML is integrated in the Weka (Waikato Environment for Knowledge Analysis) Experimenter and the Command Line Interface.

Installation

OpenML is available as a WEKA extension in the package manager:
  1. Download the latest development version (3.7.13 or higher).
  2. Launch Weka, or start it from the command line:
    java -jar weka.jar
    If you need more memory (e.g. 1GB), start as follows:
    java -Xmx1G -jar weka.jar
  3. Open the package manager (Under 'Tools')
  4. Select package OpenmlWeka and click install. Afterwards, restart WEKA.
  5. From the Tools menu, open the 'OpenML Experimenter'.

Quick Start (Graphical Interface)

OpenML Weka Screenshot

You can solve OpenML Tasks in the Weka Experimenter, and automatically upload your experiments to OpenML (or store them locally).

  1. From the Tools menu, open the 'OpenML Experimenter'.
  2. Enter your API key in the top field (log in first). You can also store this in a config file (see below).
  3. In the 'Tasks' panel, click the 'Add New' button to add new tasks. Insert the task IDs as comma-separated values (e.g., '1,2,3,4,5'). Use search to find interesting tasks and click the icon to list the IDs. In the future, this search will also be integrated into WEKA.
  4. Add algorithms in the "Algorithm" panel.
  5. Go to the "Run" tab, and click on the "Start" button.
  6. The experiment will be executed and sent to OpenML.org.
  7. The runs will now appear on OpenML.org. You can follow their progress and check for errors on your profile page under 'Runs'.

Quick Start (Command Line Interface)

The Command Line interface is useful for running experiments automatically on a server, without using a GUI.
  1. Create a config file called openml.conf in a new directory called .openml in your home directory. It should contain the following line:
    api_key = YOUR_KEY
  2. Execute the following command:
    java -cp weka.jar openml.experiment.TaskBasedExperiment -T <task_id> -C <classifier_classpath> -- <parameter_settings>
  3. For example, the following command will run Weka's J48 algorithm on Task 1:
    java -cp OpenWeka.beta.jar openml.experiment.TaskBasedExperiment -T 1 -C weka.classifiers.trees.J48
  4. The following suffix will set some parameters of this classifier:
    -- -C 0.25 -M 2
Please report any bugs that you may encounter to j.n.van.rijn@liacs.leidenuniv.nl.

Download Plugin

OpenML features extensive support for MOA. However, this is currently implemented as a standalone MOA compilation, based on the latest version (as of May 2014).





Quick Start

  1. Download the standalone MOA environment above.
  2. Find your API key in your profile (log in first). Create a config file called openml.conf in a .openml directory in your home directory. It should contain the following line:
    api_key = YOUR_KEY
  3. Launch the JAR file by double clicking on it, or launch from command-line using the following command:
    java -cp openmlmoa.beta.jar moa.gui.GUI
  4. Select the task moa.tasks.openml.OpenmlDataStreamClassification to evaluate a classifier on an OpenML task, and send the results to OpenML.
  5. Optionally, you can generate new streams using the Bayesian Network Generator: select the moa.tasks.WriteStreamToArff task, with moa.streams.generators.BayesianNetworkGenerator.
Please note that this is a beta version, which is under active development. Please report any bugs that you may encounter to j.n.van.rijn@liacs.leidenuniv.nl.
The R package mlr interfaces a large number of classification and regression techniques. It also uses the OpenML R package (by the same authors) to interface seamlessly with OpenML. This means you can download data and tasks from OpenML, run the many mlr algorithms, and organize all ensuing results online in a few lines of R.

Download

You'll need the mlr and openml packages. Soon, both will be available from CRAN.

Quick Start

In this tutorial, you can find examples of standard use cases.

Issues

Having questions? Did you run into an issue? Let us know via the OpenML R issue tracker.
You can design OpenML workflows in RapidMiner to directly interact with OpenML. The RapidMiner plugin is currently under active development.
The Java API allows you to connect to OpenML from Java applications.

Download

Stable releases of the Java API are available from Maven central. Or, you can check out the developer version from GitHub. Include the jar file in your projects as usual, or install via Maven. You can also separately download all dependencies and a fat jar with all dependencies included.

Quick Start

Create an OpenmlConnector instance with your authentication details. This will create a client with all OpenML functionalities.

OpenmlConnector client = new OpenmlConnector("api_key");

All functions are described in the Java Docs, and they mirror the Web API functions described below. For instance, the API function openml.data.description has an equivalent Java function openmlDataDescription(String data_id).

Downloading

To download data, flows, tasks, runs, etc. you need the unique id of that resource. The id is shown on each item's webpage and in the corresponding url. For instance, let's download Data set 1. The following returns a DataSetDescription object that contains all information about that data set.

DataSetDescription data = client.dataGet(1);

You can also search for the items you need online, and click the icon to get all IDs that match a search.

Uploading

To upload data, flows, runs, etc. you need to provide a description of the object. We provide wrapper classes to supply this information, e.g. DataSetDescription, as well as to capture the server response, e.g. UploadDataSet, which always includes the generated id for reference:

DataSetDescription description = new DataSetDescription( "iris", "The famous iris dataset", "arff", "class");
UploadDataSet result = client.dataUpload( description, datasetFile );
int data_id = result.getId();

More details are given in the corresponding functions below. Also see the Java Docs for all possible inputs and return values.

Data download

dataGet(int data_id)

Retrieves the description of a specified data set.

DataSetDescription data = client.dataGet(1);
String name = data.getName();
String version = data.getVersion();
String description = data.getDescription();
String url = data.getUrl();

dataFeatures(int data_id)

Retrieves the description of the features of a specified data set.

DataFeature response = client.dataFeatures(1);
DataFeature.Feature[] features = response.getFeatures();
String name = features[0].getName();
String type = features[0].getDataType();
boolean isTarget = features[0].getIs_target();

dataQuality(int data_id)

Retrieves the description of the qualities (meta-features) of a specified data set.

DataQuality response = client.dataQuality(1);
DataQuality.Quality[] qualities = response.getQualities();
String name = qualities[0].getName();
String value = qualities[0].getValue();

dataQuality(int data_id, int start, int end, Integer interval_size)

For data streams. Retrieves the description of the qualities (meta-features) of a specified portion of a data stream.

DataQuality qualities = client.dataQuality(1,0,10000,null);

dataQualityList()

Retrieves a list of all data qualities known to OpenML.

DataQualityList response = client.dataQualityList();
String[] qualities = response.getQualities();

Data upload

dataUpload(DataSetDescription description, File dataset)

Uploads a data set file to OpenML given a description. Throws an exception if the upload failed, see openml.data.upload for error codes.

DataSetDescription dataset = new DataSetDescription( "iris", "The iris dataset", "arff", "class");
UploadDataSet data = client.dataUpload( dataset, new File("data/path"));
int data_id = data.getId();

dataUpload(DataSetDescription description)

Registers an existing dataset (hosted elsewhere). The description needs to include the url of the data set. Throws an exception if the upload failed, see openml.data.upload for error codes.

DataSetDescription description = new DataSetDescription( "iris", "The iris dataset", "arff", "class");
description.setUrl("http://datarepository.org/mydataset");
UploadDataSet data = client.dataUpload( description );
int data_id = data.getId();

Flow download

flowGet(int flow_id)

Retrieves the description of the flow/implementation with the given id.

Implementation flow = client.flowGet(100);
String name = flow.getName();
String version = flow.getVersion();
String description = flow.getDescription();
String binary_url = flow.getBinary_url();
String source_url = flow.getSource_url();
Parameter[] parameters = flow.getParameter();

Flow management

flowOwned()

Retrieves an array of id's of all flows/implementations owned by you.

ImplementationOwned response = client.flowOwned();
Integer[] ids = response.getIds();

flowExists(String name, String version)

Checks whether an implementation with the given name and version is already registered on OpenML.

ImplementationExists check = client.flowExists("weka.j48", "3.7.12");
boolean exists = check.exists();
int flow_id = check.getId();

flowDelete(int id)

Removes the flow with the given id (if you are its owner).

ImplementationDelete response = client.flowDelete(100);

Flow upload

flowUpload(Implementation description, File binary, File source)

Uploads implementation files (binary and/or source) to OpenML given a description.

Implementation flow = new Implementation("weka.J48", "3.7.12", "description", "Java", "WEKA 3.7.12");
UploadImplementation response = client.flowUpload( flow, new File("code.jar"), new File("source.zip"));
int flow_id = response.getId();

Task download

taskGet(int task_id)

Retrieves the description of the task with the given id.

Task task = client.taskGet(1);
String task_type = task.getTask_type();
Input[] inputs = task.getInputs();
Output[] outputs = task.getOutputs();

taskEvaluations(int task_id)

Retrieves all evaluations for the task with the given id.

TaskEvaluations response = client.taskEvaluations(1);
Evaluation[] evaluations = response.getEvaluation();

taskEvaluations(int task_id, int start, int end, int interval_size)

For data streams. Retrieves all evaluations for the task over the specified window of the stream.

TaskEvaluations response = client.taskEvaluations(1, 0, 10000, 1000);
Evaluation[] evaluations = response.getEvaluation();

Run download

runGet(int run_id)

Retrieves the description of the run with the given id.

Run run = client.runGet(1);
int task_id = run.getTask_id();
int flow_id = run.getImplementation_id();
Parameter_setting[] settings = run.getParameter_settings();
EvaluationScore[] scores = run.getOutputEvaluation();

Run management

runDelete(int run_id)

Deletes the run with the given id (if you are its owner).

RunDelete response = client.runDelete(1);

Run upload

runUpload(Run description, Map<String,File> output_files)

Uploads a run to OpenML, including a description and a set of output files depending on the task type.

Run.Parameter_setting[] parameter_settings = new Run.Parameter_setting[1];
parameter_settings[0] = new Run.Parameter_setting(null, "M", "2");
Run run = new Run("1", null, "100", "setup_string", parameter_settings);
Map<String,File> outputs = new HashMap<String,File>();
outputs.put("predictions", new File("predictions.arff"));
UploadRun response = client.runUpload( run, outputs);
int run_id = response.getRun_id();

Free SQL Query

freeQuery(String sql)

Executes the given SQL query and returns the result in JSON format.

org.json.JSONObject json = client.freeQuery("SELECT name FROM dataset");

Issues

Having questions? Did you run into an issue? Let us know via the OpenML Java issue tracker.

The OpenML R package allows you to connect to the OpenML server from R scripts. This means that you can download and upload data sets and tasks, run R implementations, upload your results, and download all experiment results directly via R commands.

It is also neatly integrated into mlr (Machine Learning in R), which provides a unified interface to a large number of machine learning algorithms in R. As such, you can easily run and compare many R algorithms on all OpenML datasets, and analyse all combined results.

All in a few lines of R.

Demo

You can try it out yourself in a Jupyter Notebook running in the everware cloud. You'll need an OpenML account as well as a GitHub account for this service to work properly. It may take a few minutes to spin up.

Launch demo

Example

This example runs an mlr algorithm on an OpenML task. The first time, you need to set your API key on your machine.


  library(mlr)
  library(OpenML)
  setOMLConfig(apikey = "qwertyuiop1234567890") # Only the first time

  task = getOMLTask(10)
  lrn = makeLearner("classif.rpart")
  res = runTaskMlr(task, lrn)
  run.id = uploadOMLRun(res)
  

You can of course do many experiments at once:


  # A list of OpenML task ID's
  task.ids = c(10,39)

  # A list of MLR learners
  learners = list(
      makeLearner("classif.rpart"),
      makeLearner("classif.randomForest")
      )

  # Loop
  for (lrn in learners) {
    for (id in task.ids) {
      task = getOMLTask(id)
      res = runTaskMlr(task, lrn)
      run.id = uploadOMLRun(res)
    }
  }
  

Download

The OpenML package can be downloaded from GitHub. It will also be available from CRAN in the near future.

Tutorial

See the tutorial for the most important functions and examples of standard use cases.

Reference

Full documentation on the packages is available from R Documentation.

Issues

Having questions? Did you run into an issue? Let us know via the OpenML R issue tracker.

The Python module allows you to connect to the OpenML server from Python programs. This means that you can download and upload OpenML datasets and tasks, run Python algorithms on them, and share the results.

It is also being integrated into scikit-learn, which provides a unified interface to a large number of machine learning algorithms in Python. As such, you can easily run and compare many algorithms on all OpenML datasets, and analyse all combined results.

All in a few lines of Python.

Demo

You can try it out yourself in a Jupyter Notebook running in the everware cloud. You'll need an OpenML account as well as a GitHub account for this service to work properly. It may take a few minutes to spin up.

Launch demo

Course

We are currently building a machine learning course with many more examples. All materials are available as Jupyter Notebooks running in the everware cloud. You'll need an OpenML account as well as a GitHub account for this service to work properly. It may take a few minutes to spin up.

Launch course

Example

This example runs a scikit-learn algorithm on an OpenML task.


    from sklearn import ensemble
    from openml import tasks,flows,runs
    import xmltodict

    # Download task, run learner, publish results
    task = tasks.get_task(10)
    clf = ensemble.RandomForestClassifier()
    flow = flows.sklearn_to_flow(clf)
    run = runs.run_flow_on_task(task, flow)
    run.publish()

    print("Uploaded run with id %s. Check it at www.openml.org/r/%s" %(run.run_id,run.run_id))
  

The first time, you need to set up your config file (~/.openml/config) with your API key.


    apikey=FILL_IN_API_KEY
    cachedir=FILL_IN_CACHE_DIR
  

Also, for now, you'll need to install the developer version of the API:


    git clone https://github.com/openml/openml-python.git
    cd openml-python
    git checkout develop
    python setup.py install
  

Download

The Python module can be downloaded from GitHub.

Quickstart

Check out the documentation to get started. Or try the Jupyter Notebook.

Issues

Having questions? Did you run into an issue? Let us know via the OpenML Python issue tracker.

Index queries

OpenML keeps an index of all data, tasks, flows and runs for quick access, all in JSON format, using a predictable URL scheme.

Data sets

Get a JSON description of a dataset with www.openml.org/d/id/json (or add /json to the dataset page's url).

Example: www.openml.org/d/1/json

Flows

Get a JSON description of a flow with www.openml.org/f/id/json (or add /json to the flow page's url).

Example: www.openml.org/f/100/json

Tasks

Get a JSON description of a task with www.openml.org/t/id/json (or add /json to the task page's url).

Example: www.openml.org/t/1/json

Runs

Get a JSON description of a run with www.openml.org/r/id/json (or add /json to the run page's url).

Example: www.openml.org/r/1/json
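
The URL scheme above can be captured in a small helper. This is a sketch: the prefix map and function name are ours, and we assume the standard HTTPS endpoints on www.openml.org.

```python
# One-letter URL prefixes: d(ata), f(low), t(ask), r(un).
PREFIX = {"dataset": "d", "flow": "f", "task": "t", "run": "r"}

def json_url(kind, item_id):
    """Build the JSON-description URL for a dataset, flow, task or run."""
    return "https://www.openml.org/%s/%d/json" % (PREFIX[kind], item_id)

print(json_url("dataset", 1))  # https://www.openml.org/d/1/json
print(json_url("flow", 100))   # https://www.openml.org/f/100/json
```

Fetching such a URL with any HTTP client returns the JSON description of the item.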

OpenML aims to create a frictionless, collaborative environment for exploring machine learning

Data sets and workflows from various sources analysed and organized online for easy access

Integrated into machine learning environments for automated experimentation, logging, and sharing

Fully reproducible and organized results (e.g. models, predictions) you can build on and compare against

Share your work with the world or within circles of trusted researchers

Make your work more visible and easily citable

Tools to help you design and optimize workflows

In short, OpenML makes it easy to access data, connect to the right people, and automate experimentation, so that you can focus on the data science.

Data

You can upload data sets through the website, or API. Data hosted elsewhere can be referenced by URL.

OpenML automatically analyses the data, checks for problems, visualizes it, and computes data characteristics useful to find and compare datasets.

dataset properties

Every data set gets a dedicated page with all known information (check out zoo), including a wiki, visualizations, statistics, user discussions, and the tasks in which it is used.

Currently, OpenML only accepts a limited number of data formats (e.g. ARFF for tabular data). We aim to extend this in the near future, and allow conversions between the main data types.

Tasks

Tasks describe what to do with the data. OpenML covers several task types, such as classification and clustering. You can create tasks online.

Tasks are small containers that include the data and other information, such as train/test splits, and that define what needs to be returned.

Tasks are machine-readable so that machine learning environments know what to do, and you can focus on finding the best algorithm. You can run algorithms on your own machine(s) and upload the results. OpenML evaluates and organizes all solutions online.

dataset properties

Tasks are real-time, collaborative data mining challenges (e.g. see this one): you can study, discuss and learn from all submissions (code has to be shared), while OpenML keeps track of who was first.

dataset properties

You can also supply hidden test sets for the evaluation of solutions. Novel ways of ranking solutions will be added in the near future.

Flows

Flows are algorithms, workflows, or scripts solving tasks. You can upload them through the website, or API. Code hosted elsewhere (e.g., GitHub) can be referenced by URL.

Ideally, flows are wrappers around existing algorithms/tools so that they can automatically read and solve OpenML tasks.

Every flow gets a dedicated page with all known information (check out WEKA's RandomForest), including a wiki, hyperparameters, evaluations on all tasks, and user discussions.

dataset properties

Currently, you will need to install things locally to run flows. We aim to add support for VMs so that flows can be easily (re)run in any environment.

Runs

Runs are applications of flows on a specific task. They are typically submitted automatically by machine learning environments (through the OpenML API), which make sure that all details are included to ensure reproducibility.

OpenML organizes all runs online, linked to the underlying data, flows, parameter settings, people, and other details. OpenML also independently evaluates the results contained in the run.

You can search and compare everyone's runs online, download all results into your favorite machine learning environment, and relate evaluations to known properties of the data and algorithms.

dataset properties

OpenML stores and analyses results in fine detail, up to the level of individual instances.

Plugins

OpenML is deeply integrated in several popular machine learning environments. Given a task, these plugins will automatically download the data into the environments, allow you to run any algorithm/flow, and automatically upload all runs.

dataset properties

Currently, OpenML is integrated, or being integrated, into the following environments. Follow the links to detailed instructions.

Programming APIs

If you want to integrate OpenML into your own tools, we offer several language-specific APIs, so you can easily interact with OpenML to list, download and upload data sets, tasks, flows and runs.

With these APIs you can download a task, run an algorithm, and upload the results in just a few lines of code.

dataset properties

Follow the links for detailed documentation:

REST API

OpenML also offers a REST API which allows you to talk to OpenML directly. Most communication is done using XML, but we also offer JSON endpoints for convenience.

Projects (under construction)

You can combine data sets, flows and runs into projects, to collaborate with others online, or simply keep a log of your work.

Each project gets its own page, which can be linked to publications so that others can find all the details online.

Circles (under construction)

You can create circles of trusted researchers in which data can be shared that is not yet ready for publication.

Altmetrics (under construction)

OpenML keeps track of the impact of your work: how often is it downloaded, liked, or reused in other studies.

Jobs (under construction)

OpenML can help you run large experiments. A job is a small container defining a specific flow, with specific parameter settings, to run on a specific task. You can generate batches of these jobs online, and you can run a helper tool on your machines/clouds/clusters that downloads these jobs (including all data), executes them, and uploads the results.

Developers

OpenML is an open source project, hosted on GitHub, and maintained by a very active community of developers. We welcome everybody to contribute to OpenML, and are glad to help you make optimal use of OpenML in your research.