Why does CacheBasedDataSet destroy the cache it is given

Courtney Robinson

Why does CacheBasedDataSet destroy the cache it is given

Hi all,

The current CacheBasedDataSet destroys the cache and all data along with it, and there is no option to turn this off.

/** {@inheritDoc} */
@Override public void close() {
    datasetCache.destroy();
    ComputeUtils.removeData(ignite, datasetId);
    ComputeUtils.removeLearningEnv(ignite, datasetId);
}

Why does it do this?
It means that using SqlDatasetBuilder will result in the data being deleted after training a model.
We had to work around this with:
var datasetBuilder = new SqlDatasetBuilder(repo.getCtx().getIgnite(), cacheName, (k, v) -> {
    //...
});
var wrapper = new DatasetBuilder<Object, BinaryObject>() {
    @Override
    public <C extends Serializable, D extends AutoCloseable> Dataset<C, D> build(
        LearningEnvironmentBuilder envBuilder,
        PartitionContextBuilder<Object, BinaryObject, C> partCtxBuilder,
        PartitionDataBuilder<Object, BinaryObject, C, D> partDataBuilder,
        LearningEnvironment localLearningEnv) {
        var cbd = datasetBuilder.build(envBuilder, partCtxBuilder, partDataBuilder, localLearningEnv);
        return new DatasetWrapper(cbd) {
            @Override public void close() {
                System.out.println("Dataset closed");
                //DO NOT call close. Cache based data set deletes the data in the cache like some mad man!
            }
        };
    }

    @Override
    public DatasetBuilder<Object, BinaryObject> withUpstreamTransformer(UpstreamTransformerBuilder builder) {
        return datasetBuilder.withUpstreamTransformer(builder);
    }

    @Override
    public DatasetBuilder<Object, BinaryObject> withFilter(IgniteBiPredicate<Object, BinaryObject> filterToAdd) {
        return datasetBuilder.withFilter(filterToAdd);
    }
};
This works but seems very hacky.
Are we misusing the API somehow? The examples and docs do not mention or indicate anything about this, as far as I've found.
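
For readers skimming the workaround in this post: it comes down to a delegating wrapper whose close() is a no-op. The sketch below shows only that pattern, without Ignite. FakeDataset, NonClosingDataset and NonClosingWrapperDemo are hypothetical stand-ins invented for illustration; none of them are Ignite classes, and the map stands in for whatever storage a real dataset would own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a dataset that owns a backing store (not an Ignite class).
class FakeDataset implements AutoCloseable {
    private final Map<Integer, String> store;

    FakeDataset(Map<Integer, String> store) {
        this.store = store;
    }

    int size() {
        return store.size();
    }

    // Mimics a close() that wipes the backing store.
    @Override
    public void close() {
        store.clear();
    }
}

// Delegating wrapper (also hypothetical) whose close() deliberately does nothing.
class NonClosingDataset implements AutoCloseable {
    private final FakeDataset delegate;

    NonClosingDataset(FakeDataset delegate) {
        this.delegate = delegate;
    }

    int size() {
        return delegate.size();
    }

    @Override
    public void close() {
        // Intentionally empty: the delegate's close() is never invoked,
        // so the backing store survives the try-with-resources block.
    }
}

class NonClosingWrapperDemo {
    public static void main(String[] args) {
        Map<Integer, String> store = new HashMap<>();
        store.put(1, "a");
        store.put(2, "b");

        try (NonClosingDataset ds = new NonClosingDataset(new FakeDataset(store))) {
            System.out.println("rows while open: " + ds.size());
        }
        System.out.println("rows after close: " + store.size());
    }
}
```

The trade-off of this pattern is that anything else the delegate would have released in close() is never released, so it is probably only reasonable when that cleanup is handled some other way.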

Regards,
Courtney Robinson
Founder and CEO, Hypi
https://hypi.io
akorensh

Re: Why does CacheBasedDataSet destroy the cache it is given

Hi,
  This is the way CacheBasedDataset has been designed.
  It was built with an eye toward training the implemented ML models:
https://apacheignite.readme.io/docs/model-updating

  You are free to create an implementation to fit your needs.
  Use these examples to test your design:
 
https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml/dataset


Thanks, Alex



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
zaleslaw

Re: Why does CacheBasedDataSet destroy the cache it is given

In reply to this post by Courtney Robinson
Dear Courtney Robinson, let's discuss here the possible behaviour of
CacheBasedDataset on closing.

When we designed this feature, we thought that all the training parts and
related data should be deleted from the caches, and that the model should be
serialized or exported somewhere.

What is your use case? Could you share some code or pseudo-code?
How are you going to handle the data after training?



Courtney Robinson

Re: Why does CacheBasedDataSet destroy the cache it is given

Hey,
Just seen this reply.
We have Ignite persistence enabled. The caches/tables are the primary source of the data. That's the use case. 
If we build an ML model from the data in a cache, Ignite's behaviour of deleting the cache means we'll have lost that data.
We were just lucky this showed up in tests before it got anywhere near production data.

In our case, we're pushing data into a cache continually and rebuilding the model periodically.

Regards,
Courtney Robinson
Founder and CEO, Hypi

https://hypi.io


On Mon, Aug 3, 2020 at 5:28 PM zaleslaw <[hidden email]> wrote:
> Dear Courtney Robinson, let's discuss here the possible behaviour of
> CacheBasedDataset on closing.
>
> When we designed this feature, we thought that all the training parts and
> related data should be deleted from the caches, and that the model should be
> serialized or exported somewhere.
>
> What is your use case? Could you share some code or pseudo-code?
> How are you going to handle the data after training?



akorensh

Re: Why does CacheBasedDataSet destroy the cache it is given

Courtney,

The CacheBasedDataset.close() method below only destroys the helper cache
derived from the original data and used to train the model. It does not
touch the original data set.


@Override public void close() {
    // Destroy the helper cache derived from the original cache.
    datasetCache.destroy();
    // Remove helper data stored locally on a node.
    ComputeUtils.removeData(ignite, datasetId);
    // Remove the helper object used to make the model.
    ComputeUtils.removeLearningEnv(ignite, datasetId);
}

see:
https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/ml/dataset/CacheBasedDatasetExample.java

If you follow the above example, remove the persons.destroy() statement,
remove Ignite from the auto close block,  run it, and connect via web
console, you would see that the original persons data set remains intact.

If for some reason you do need the helper cache that was created to train
the model, then do as follows:
1. Create your own class: MyCacheBasedDataset extends CacheBasedDataset.
2. Override the close() method.
This is not recommended for prod, but could be useful for debugging the models.
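
A rough, Ignite-free sketch of the two points above: a dataset that closes only its derived helper copy, and a subclass that overrides close() to keep that copy. SketchDataset, KeepHelperDataset and HelperCacheDemo are hypothetical stand-ins invented for illustration, with plain maps in place of caches. The sketch shows the pattern only and says nothing about how the real CacheBasedDataset is constructed or whether its fields are accessible to a subclass.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in (not the Ignite class): copies upstream rows into a helper map.
class SketchDataset implements AutoCloseable {
    protected final Map<Integer, Double> helperCache;

    SketchDataset(Map<Integer, Double> upstream) {
        // The helper copy is derived from the upstream data and is independent of it.
        this.helperCache = new HashMap<>(upstream);
    }

    int helperSize() {
        return helperCache.size();
    }

    // Default behaviour: drop only the helper copy; the upstream map is never touched.
    @Override
    public void close() {
        helperCache.clear();
    }
}

// Subclass (also hypothetical) that keeps the helper copy, e.g. for debugging.
class KeepHelperDataset extends SketchDataset {
    KeepHelperDataset(Map<Integer, Double> upstream) {
        super(upstream);
    }

    @Override
    public void close() {
        // Intentionally skips super.close() so the helper copy stays available.
    }
}

class HelperCacheDemo {
    public static void main(String[] args) {
        Map<Integer, Double> upstream = new HashMap<>();
        upstream.put(1, 10.0);
        upstream.put(2, 20.0);

        SketchDataset standard = new SketchDataset(upstream);
        standard.close();
        System.out.println("upstream rows after default close: " + upstream.size());
        System.out.println("helper rows after default close: " + standard.helperSize());

        SketchDataset kept = new KeepHelperDataset(upstream);
        kept.close();
        System.out.println("helper rows after overridden close: " + kept.helperSize());
    }
}
```

In this sketch the upstream map keeps both rows in every case; only the helper copy differs between the default and the overridden close().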


Thanks, Alex


