Guide
OpenML is an open-source project, hosted on GitHub. Everyone is welcome to help improve OpenML and make it more useful. If you want to integrate your own machine learning tools with OpenML, check out the available APIs.

We always love to welcome new contributors, and will gladly help you in any way possible.

GitHub repos

You can find relevant code in the corresponding GitHub repositories. Please also post issues in the relevant issue tracker.

OpenML Core - Everything done by the OpenML server. This includes dataset feature calculations and server-side model evaluations.

Website - The website and REST API

Java API - The Java API and Java-based plugins

R API - The OpenML R package

Python API - The Python API

GitHub wiki

The GitHub Wiki contains more information on how to set up your environment to work on OpenML locally, on the structure of the backend and frontend, and working documents.

Database snapshots

Everything uploaded to OpenML is available to the community. The nightly snapshot of the public database contains all experiment runs, evaluations, and links to datasets, implementations, and result files, in gzipped SQL format. You can also download the database schema.

Nightly database snapshot
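
Since the snapshot is a gzipped SQL file, you can inspect it without unpacking or importing it first. The following is a minimal sketch, not part of OpenML itself. It assumes the dump contains one `CREATE TABLE` statement per table, each starting on its own line (as tools like mysqldump typically write them). The function name `iter_table_names` and the file name in the usage note are hypothetical.

```python
import gzip
import re
from typing import Iterator

# Matches the start of a CREATE TABLE statement and captures the table name.
# Assumption: each statement starts on its own line; quoting with backticks
# or double quotes is optional.
CREATE_TABLE = re.compile(
    r"^\s*CREATE TABLE\s+(?:IF NOT EXISTS\s+)?[`\"]?(\w+)[`\"]?",
    re.IGNORECASE,
)


def iter_table_names(dump_path: str) -> Iterator[str]:
    """Yield table names found in a gzipped SQL dump, in file order."""
    # Stream the file line by line so a large snapshot is never fully in memory.
    with gzip.open(dump_path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = CREATE_TABLE.match(line)
            if match:
                yield match.group(1)
```

For example, `list(iter_table_names("snapshot.sql.gz"))` would list the tables in a downloaded snapshot saved under that (hypothetical) name, which you can then compare against the database schema.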

If you want to work on the website locally, you'll also need the schema for the 'private' database with non-public information.

Private database schema

Legacy Resources

OpenML is always evolving, but we keep hosting the resources that were used in prior publications so that others may still build on them.

The experiment database used in Vanschoren et al. (2012), Experiment databases, Machine Learning 87(2), pp. 127-158. You'll need to import this database (we used MySQL) to run queries. The database structure is described in the paper. Note that most of the experiments in this database have since been rerun with OpenML, using newer algorithm implementations, and are stored in much more detail.

The Exposé ontology used in the same paper, and described in more detail here and here. Exposé is used in designing our databases, and we aim to use it to export all OpenML data as Linked Open Data.