Recent posts
JoaquinVanschoren posted in KDDCup99 (data) 8 months ago

There is another dataset, KDDCup99_full, with all 5M data points:

posted in KDDCup99 (data) 8 months ago

Is the number of instances correct? This description states there are about five million records.

JoaquinVanschoren posted in Supervised Classification on lung-cancer (1) (task) 1 year ago

This should probably be removed. There is no reason why attribute 57 should be used as the target feature.

JoaquinVanschoren posted in SPECTF (data) 1 year ago

Oops! Thanks for reporting it. Data upload is now working again.

asemkasem posted in SPECTF (data) 1 year ago

Sure... I tried to do so, but nothing happens after filling in the fields and clicking the Submit button.
I tried both Chrome and Microsoft Edge, and neither worked.
You may want to look into that. I will try again after some time (or perhaps email the file to you).

JoaquinVanschoren posted in SPECTF (data) 1 year ago

Hi Asem, are you creating an ARFF file for that? It would be great if you could upload this corrected version to OpenML (log in and click the '+' icon). Give it the same name 'SPECTF'.

asemkasem posted in SPECTF (data) 1 year ago

I see, thanks for the reply, Joaquin...
I will ignore this merged dataset and instead merge UCI's training set with the correct test set myself.

JoaquinVanschoren posted in SPECTF (data) 1 year ago

It seems that UCI has two versions of the SPECTF test set, and that this dataset was created by joining the larger test set with the training set. The larger test set seems to be faulty: it repeats some of the training data. Hence, this dataset (337) should be deactivated and replaced with the correct version.
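A minimal sketch of how such an overlap could be checked, assuming both sets have already been loaded as lists of rows. The function name and the toy rows below are hypothetical and are not taken from the actual SPECTF files:

```python
def rows_shared_with_training(train_rows, test_rows):
    """Return the test rows that also appear verbatim in the training rows."""
    # Convert each training row to a tuple so it can be stored in a set.
    seen = {tuple(row) for row in train_rows}
    return [row for row in test_rows if tuple(row) in seen]


# Hypothetical toy rows (class label followed by two feature values).
train = [["1", "59", "52"], ["0", "72", "62"]]
test = [["1", "59", "52"], ["1", "67", "68"]]

overlap = rows_shared_with_training(train, test)
print(f"{len(overlap)} of {len(test)} test rows also occur in the training set")
```

Note that an exact match does not prove a row was copied, since two distinct patients could in principle have identical values, so a large overlap is a warning sign rather than proof.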

asemkasem posted in SPECTF (data) 1 year ago

This data contains 349 instances, while the description here and in the original source (link to UCI above) mentions 267 instances (combining the training and test sets).
The dataset called SPECT, which should be a processed form of this data with 22 binary features, contains only 267 instances.
What is the reason for the difference in the number of instances?

JoaquinVanschoren posted in fried (data) 2 years ago

Sorry, I don't see a feature 'binaryClass'. Did you mean the binarized version of this dataset?

jakob1r posted in fried (data) 2 years ago

The target is binaryClass, not Y.

jacobfdegner posted in OpenML (general) 2 years ago

Hi Joaquin,

Thanks for your reply. The idea is to create a framework to compare strategy and card-counting methods in a blackjack game. It is mostly for fun and for exploring tools for making easily reproducible algorithm comparisons. I originally imagined it fitting into a dscr as described by Matthew Stephens, but wanted to see if it could also fit into this framework.

The parts in both frameworks would be these:

Data or a function to generate data - creates, for each hand, a set of cards seen and a set of remaining cards to deal from. In your framework, I think these could be computed and uploaded as datasets.

Decision rule maker - I think this would be the task in the OpenML framework. The task would be to build a general function that, given only the cards seen (not the cards left to deal), best predicts the choices a player should make as the game progresses.

Methods - different ways to address the task.

A scoring function - this would take the full data (cards previously seen and cards to deal) and the decision rule maker, simulate game play, and score the methods based on expected house edge.
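As a rough illustration of how these four parts could fit together, here is a minimal sketch. It is not the code from the project mentioned in this post. All names are hypothetical, and the rules are deliberately simplified assumptions: no splitting, doubling, or insurance, every win pays 1:1, and the dealer stands on any 17.

```python
import random


def make_shoe(num_decks=6, seed=None):
    """Data generator: a shuffled list of card values (ace = 11, face cards = 10)."""
    rng = random.Random(seed)
    one_deck = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11] * 4
    shoe = one_deck * num_decks
    rng.shuffle(shoe)
    return shoe


def hand_total(cards):
    """Best total for a hand, counting an ace as 1 whenever 11 would bust."""
    total = sum(cards)
    aces = cards.count(11)
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total


# Methods: each decision rule sees only the player's hand, the dealer's
# upcard, and the cards seen so far (never the cards left in the shoe).
def stand_on_17(player, dealer_upcard, seen):
    return "stand" if hand_total(player) >= 17 else "hit"


def never_hit(player, dealer_upcard, seen):
    return "stand"


def play_hand(shoe, rule, seen):
    """Play one hand; return the payoff to the player for a unit bet."""
    player = [shoe.pop(), shoe.pop()]
    dealer = [shoe.pop(), shoe.pop()]
    seen.extend(player + dealer[:1])  # the dealer's hole card is still hidden
    while hand_total(player) < 21 and rule(player, dealer[0], seen) == "hit":
        card = shoe.pop()
        player.append(card)
        seen.append(card)
    if hand_total(player) <= 21:
        while hand_total(dealer) < 17:  # dealer draws only if player did not bust
            dealer.append(shoe.pop())
    seen.extend(dealer[1:])  # hole card and dealer draws are now visible
    p, d = hand_total(player), hand_total(dealer)
    if p > 21:
        return -1
    if d > 21 or p > d:
        return 1
    return 0 if p == d else -1


def house_edge(rule, num_shoes=200, hands_per_shoe=5, seed=0):
    """Scoring function: average player loss per unit bet over simulated hands."""
    total = 0
    for i in range(num_shoes):
        shoe = make_shoe(seed=seed + i)
        seen = []
        for _ in range(hands_per_shoe):
            total += play_hand(shoe, rule, seen)
    return -total / (num_shoes * hands_per_shoe)


for rule in (stand_on_17, never_hit):
    print(rule.__name__, house_edge(rule))
```

Because the shoes are generated from fixed seeds, every method is scored on the same sequence of cards, which keeps the comparison reproducible. The simulated average is only an estimate of the expected house edge and will vary with the number of hands played.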

The beginnings of it are here, although it is undocumented and only the simplest types of methods are implemented:

As I said, it is just a toy, but as long as I am putting it together, I could also use it to learn the OpenML framework if there is an easy way it could fit.



JoaquinVanschoren posted in OpenML (general) 2 years ago

Dear Jacob,

Yes, this is planned. Can you tell me more about the example? It sounds like we either need to extend an existing task type, or introduce a free-form task type.


jacobfdegner posted in OpenML (general) 2 years ago

A related question: Is it currently possible, or is it planned, for individual users to define their own task types? I have a toy example that requires its own task definition but is probably not general enough to justify a new public task type.

JoaquinVanschoren posted in moa.AMRules (flow) 2 years ago

Welcome to OpenML! You can add new data after you sign in (a '+' icon will appear). Note that we work with the ARFF format for now. Next, you can use any of the supported tools (see the Guide) to run and upload experiments.
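For readers unfamiliar with ARFF, here is a minimal sketch of writing a small table in that format by hand. The helper name and the toy data are hypothetical, and the sketch assumes attribute names and values need no quoting (no spaces, commas, or missing values):

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text from (name, type) attribute pairs and rows of values."""
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        # kind is either a type such as "numeric" or a nominal set like "{0,1}"
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


text = to_arff(
    "toy",
    [("x", "numeric"), ("label", "{0,1}")],
    [[1.5, 0], [2.0, 1]],
)
print(text)
```

The header declares the relation name and one attribute per column, and the rows after `@data` are plain comma-separated values in the same column order.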

disqus_4lDbLvw6iy posted in moa.AMRules (flow) 2 years ago

Hi, this is interesting... How can I import data to get started?
