Cache misses with complex cache keys that have fields not relevant for hashCode/equals

Axel Faust

Hello,

I have been working on integrating Apache Ignite as a distributed
caching layer in the open source edition of the Alfresco Content
Services product. As this would be an extension, I don't have full
control over the kinds of keys used in cache operations. One default
cache in particular uses, among others, a complex cache key object in
which at least one instance field is not relevant to equality. A cache
get only hits the existing entry and returns the expected cached value
when the lookup key has exactly the same internal state as the key used
for the cache put, including the field that is not relevant to equality.

I have read in
https://apacheignite.readme.io/docs/binary-marshaller#binaryobject-and-cachestore
that the Ignite BinaryObject class provides automatic hash code / equals
implementations. But I have found no details on how these
implementations treat different types of fields, e.g. depending on
modifiers, or on how to change the default behaviour without modifying the
key class in question. My (maybe naive) assumption was that, if a value
class actually provides a hashCode operation that overrides the Object
default, then that would be respected.

By looking through the source, I have found the interface
BinaryIdentityResolver which sounds like it could be helpful in my case.
Unfortunately, since I can never know in advance what types of objects
users of my extension will use as keys, I can hardly configure a custom
binary type configuration for all possible / potential classes. Is there
any other way to deal with this kind of situation?


For reference, the following GitHub Gist contains the simple unit test I
set up to verify this was an issue with Ignite handling of cache keys
and not something in my implementation on top of Ignite:
https://gist.github.com/AFaust/e52ca1008a71b3e386a34f0fa63274be


Regards

Axel Faust

ilya.kasnacheev

Hello!


Usually it is not wise to have large complex keys, especially ones that you do not control. They can cause all sorts of issues.

You can declare your key Externalizable, though, and have control over its marshalling. That, or Binarylizable. Then you get to decide how everything is processed.
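
As a rough illustration of the Externalizable suggestion, the sketch below uses a hypothetical key class (`TenantKey`, not taken from Alfresco or Ignite) that writes only its equality-relevant fields. Plain `java.io` serialization is used as a stand-in, so it only shows that two keys differing in the ignored field end up with identical bytes; how Ignite itself marshals such a class is not verified here.

```java
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical key class, for illustration only.
final class TenantKey implements Externalizable {
    private String tenant;      // relevant for equals/hashCode
    private String id;          // relevant for equals/hashCode
    private String displayHint; // NOT relevant for equals/hashCode

    // Externalizable requires a public no-arg constructor for deserialization.
    public TenantKey() {
    }

    public TenantKey(String tenant, String id, String displayHint) {
        this.tenant = tenant;
        this.id = id;
        this.displayHint = displayHint;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Only the equality-relevant fields are written; displayHint is skipped.
        out.writeUTF(tenant);
        out.writeUTF(id);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        tenant = in.readUTF();
        id = in.readUTF();
        // displayHint stays null after deserialization.
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TenantKey)) return false;
        TenantKey other = (TenantKey) o;
        return Objects.equals(tenant, other.tenant) && Objects.equals(id, other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(tenant, id);
    }
}

final class ExternalizableKeyDemo {
    // Serializes a key with standard Java serialization (a stand-in, not Ignite's format).
    public static byte[] toBytes(Object key) {
        try (ByteArrayOutputStream buf = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(key);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static boolean sameBytes(Object a, Object b) {
        return Arrays.equals(toBytes(a), toBytes(b));
    }

    public static void main(String[] args) {
        TenantKey a = new TenantKey("tenant-1", "42", "hint A");
        TenantKey b = new TenantKey("tenant-1", "42", "hint B");
        System.out.println("equals():       " + a.equals(b));
        System.out.println("same byte form: " + sameBytes(a, b));
    }
}
```

The trade-off is that the skipped field is lost on deserialization, and the approach requires changing the key class itself.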

Regards,

In reply to Axel Faust's message of Thu, 20 Jun 2019, 17:45.

Axel Faust

Maybe the term "complex" is a bit misleading. I used it to distinguish this kind of composite key from trivial keys, e.g. the String/Long keys that might be used for simple entity lookups. In my immediate case, the key is a small immutable object with two hash-relevant fields, one field that is not hash-relevant, and one field that caches the logical hash code ([1]). It may be wrapped by other classes of objects that logically separate cache keys (e.g. by tenant) or otherwise introduce necessary distinctions between keys ([2] is a common example). All types of cache keys I have seen in ~9 years of working with the base software have been stable / immutable and safe for use with every other cache framework involved, e.g. in the commercial edition (previously EhCache, nowadays Hazelcast).

With "I don't have full control" I meant that I have no way at all to alter the classes of the cache key objects. I am not an employee of the vendor in control of the base product, nor is it likely that I can get a pull request changing those classes approved just for the purpose of my project. I am just a community contributor developing a third-party extension to the base product that provides a horizontal scaling option, as the open source edition of the base product is single-server only, without a distributed caching layer. So using Externalizable / Binarylizable is not an option.

So, providing a custom BinaryIdentityResolver seems like the perfect solution, as it would allow me to handle hashCode and equals explicitly without modifying the original classes. My primary question is that there does not seem to be a way to register a global default resolver, unlike e.g. for the object serializer. My secondary question is whether I have overlooked a simpler solution, apart from the ones not available to me due to the circumstances (e.g. use of the serialisation interfaces).


[1] https://github.com/Alfresco/alfresco-data-model/blob/master/src/main/java/org/alfresco/service/namespace/QName.java
[2] https://github.com/Alfresco/alfresco-repository/blob/master/src/main/java/org/alfresco/repo/cache/lookup/CacheRegionValueKey.java

In reply to Ilya Kasnacheev's message of 21 Jun 2019, 13:18.

ilya.kasnacheev

Hello!

I don't think there is a better solution. Ignite uses the binary representation of the key to build its B+ tree. If some non-transient field is different, then it is a different key, even if that field is not used in hashCode.
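
To make the point about binary representation concrete, here is a minimal sketch using a hypothetical `PrefixedName` key, loosely shaped like the key described earlier in the thread. Plain Java serialization stands in for Ignite's binary format, which is an assumption made purely for illustration; the sketch only shows that two keys can be equal according to equals/hashCode while their serialized bytes differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical key class, for illustration only (not the actual Alfresco QName).
final class PrefixedName implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String namespace; // relevant for equals/hashCode
    private final String localName; // relevant for equals/hashCode
    private final String prefix;    // NOT relevant for equals/hashCode, but serialized

    public PrefixedName(String namespace, String localName, String prefix) {
        this.namespace = namespace;
        this.localName = localName;
        this.prefix = prefix;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PrefixedName)) return false;
        PrefixedName other = (PrefixedName) o;
        return namespace.equals(other.namespace) && localName.equals(other.localName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(namespace, localName);
    }
}

final class SerializedFormDemo {
    // Serializes a key with standard Java serialization (a stand-in, not Ignite's format).
    public static byte[] toBytes(Serializable key) {
        try (ByteArrayOutputStream buf = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(key);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compares two keys by their serialized bytes rather than by equals().
    public static boolean sameBytes(Serializable a, Serializable b) {
        return Arrays.equals(toBytes(a), toBytes(b));
    }

    public static void main(String[] args) {
        PrefixedName stored = new PrefixedName("http://example.org/model", "title", "ex");
        PrefixedName lookup = new PrefixedName("http://example.org/model", "title", null);

        System.out.println("equals():       " + stored.equals(lookup));    // true
        System.out.println("same byte form: " + sameBytes(stored, lookup)); // false
    }
}
```

A store that identifies keys by their serialized bytes would treat `stored` and `lookup` as two different keys, which matches the cache misses reported at the start of the thread.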

Regards,
--
Ilya Kasnacheev


In reply to Axel Faust's message of Fri, 21 Jun 2019, 14:48.

Axel Faust

Well, it looks like BinaryIdentityResolver is a dead end as well. Though the interface still exists, there is no way to configure a custom implementation for a particular type. Despite issues IGNITE-4889, IGNITE-4919 and IGNITE-4977 and the discussion in [1] stating that it should be removed or is deprecated, the interface itself has not been marked as @Deprecated, which led me to believe it was still a viable option. So writing a custom BinarySerializer seems to be the only option remaining.

[1] http://apache-ignite-developers.2346864.n4.nabble.com/Stable-binary-key-representation-td15904.html

In reply to Ilya Kasnacheev's message of 21 Jun 2019, 18:06.