How about adding kryo or protostuff as an optional marshaller?

Jackey

Hi all,

I would like to find a more compact marshaller to save network bandwidth in Ignite clusters.
From the benchmark results at https://github.com/eishay/jvm-serializers/wiki, it looks like protostuff and Kryo perform better than other serializers.

Is it a good idea to add them as optional marshallers? How would I do that? I would appreciate your suggestions.


Best regards,

Lin.

dsetrakyan

I highly doubt these marshallers will be more compact than the Ignite binary marshaller. Have you tested them?

Jackey

Hi Denis,

Sorry for the late response. I have run some cases to test the serialization performance of the Ignite binary marshaller and protostuff.

I tested most of the primitive types and some custom classes from https://github.com/eishay/jvm-serializers/wiki (cases MEDIACONTENT_1 and MEDIACONTENT_2).

I posted the key benchmark code in the gist https://gist.github.com/jackeylu/32f9fa35abb84faf42fb25993ec70874

Here are my results. For now I am mainly considering the serialized size. It looks like protostuff produces smaller output than the Ignite binary marshaller in most cases.

                           protostuff                  ignite-binary-marshaller
Case                       AvgCompressedSize (bytes)   AvgCompressedSize (bytes)
MEDIACONTENT_1             267                         452
MEDIACONTENT_2             308                         531
NULL_TYPE                  0                           1
TANGOL_SIMPLE_LONG_ARRAY   90                          103
CONTACT                    212                         291
FIXED_CONTACT              175                         280
GENERAL                    366                         422
INT_TYPE                   4                           5
INT_ARRAY                  2952                        4101
INT_RANDOM_ARRAY           8602                        4101
DOUBLE_TYPE                11                          9
DOUBLE_ARRAY               9224                        8197
BYTE_TYPE                  4                           2
BYTE_ARRAY                 1029                        1029
STRING_TYPE                16                          17
STRING_ARRAY               37896                       40965
MAP_TYPE                   26550                       26544
LIST_TYPE                  15289                       16304
SET_TYPE                   12214                       13232


I can also provide hex dumps for the custom classes, as follows.

Here are MEDIACONTENT_1 and MEDIACONTENT_2 from Ignite:

[image: hex dump, not preserved]

And here are the same two from protostuff:

[image: hex dump, not preserved]


Attachment: ignite-media-content-1.png (38K)

vkulichenko

Hi Lin,

Do you have a GitHub project that I can run to compare these two marshallers? From these snippets it's not very clear what is actually serialized.

Generally, Ignite does add some minimal overhead in the binary format, mainly to allow field lookups without deserialization, which is crucial for SQL queries, for example. However, even with this overhead, there is not much difference in the numbers. I believe that in most real use cases this difference will be negligible.

However, you can always try to introduce a custom serialization protocol. Simply implement the Marshaller interface and provide the implementation in IgniteConfiguration.
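
To make the idea of a pluggable marshaller concrete, here is a minimal, library-free sketch. The `SimpleMarshaller` interface below is a hypothetical stand-in invented for illustration; it is not Ignite's actual Marshaller interface, whose exact methods are not shown in this thread. The implementation simply delegates to JDK serialization, which is where a Kryo- or protostuff-based encoder would be swapped in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

/** Hypothetical marshaller contract (an assumption for this sketch, NOT Ignite's API). */
interface SimpleMarshaller {
    byte[] marshal(Object obj);

    <T> T unmarshal(byte[] bytes);
}

/** Toy implementation backed by plain JDK serialization. */
class JdkBackedMarshaller implements SimpleMarshaller {
    @Override
    public byte[] marshal(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes everything into bos.
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T unmarshal(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

class MarshallerDemo {
    public static void main(String[] args) {
        // In Ignite, the real implementation would be handed to IgniteConfiguration,
        // as described in the post; here we only exercise the round trip.
        SimpleMarshaller m = new JdkBackedMarshaller();
        byte[] bytes = m.marshal("hello");
        String back = m.unmarshal(bytes);
        System.out.println(back + " (" + bytes.length + " bytes)");
    }
}
```

Keeping the contract this small makes it easy to measure encoded sizes for different back ends side by side before wiring anything into a cluster.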

-Val

Jackey

Hi Val,

I posted the code on GitHub at https://github.com/jackeylu/marshaller-cmp; you can run it and compare.

I am glad that you can help me choose the right serializer. I am not sure whether my test cases are fair.

From my tests, I found that:
1. In most cases involving primitive types or jdk.* types, protostuff does not do better than the Ignite binary marshaller, but I think that doesn't matter in the real world.
2. For user-defined objects, protostuff saves about 40% of the space on average compared with the Ignite binary marshaller. The user-defined objects here are MEDIA_CONTENT_1 and MEDIA_CONTENT_2, which come from https://github.com/eishay/jvm-serializers/blob/master/tpc/data/media.1.cks and https://github.com/eishay/jvm-serializers/blob/master/tpc/data/media.2.cks
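
As a quick check of the roughly 40% figure, the relative savings can be recomputed from the MEDIACONTENT_1 and MEDIACONTENT_2 sizes reported earlier in this thread (267 vs. 452 bytes and 308 vs. 531 bytes). This is only a small arithmetic sketch; the class and method names are made up for illustration.

```java
import java.util.Locale;

class SavingsCheck {
    /** Space saved by the smaller encoding, as a percentage of the larger one. */
    static double savingPercent(int smallerBytes, int largerBytes) {
        return 100.0 * (largerBytes - smallerBytes) / largerBytes;
    }

    public static void main(String[] args) {
        // Sizes in bytes as reported in the thread: protostuff vs. Ignite binary marshaller.
        System.out.printf(Locale.ROOT, "MEDIACONTENT_1: %.1f%%%n", savingPercent(267, 452));
        System.out.printf(Locale.ROOT, "MEDIACONTENT_2: %.1f%%%n", savingPercent(308, 531));
    }
}
```

Both cases come out at a little over 40%, which matches the average quoted above.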



avk

Lin,


TL;DR

Do not use Kryo with Ignite.


We tried using Kryo a while back (pre-Binary Objects times) and it didn't work. To add insult to injury, it didn't work in the worst possible way: it would appear to work just fine, no exceptions or anything like that. But then you'd discover that, for example, a join query returns fewer rows than expected. It turns out that a replicated cache used on the right side of the join is actually missing data on some of the nodes of the Ignite cluster where the query runs. After some long and painful investigation, we concluded that Kryo was the culprit.


The reason is that as soon as you configure your own custom marshaller, Ignite starts using it for marshalling everything, including most of its own internal classes. Realize that your data is not stored in the cache or transferred between nodes directly. In all cases it is wrapped in Ignite internal classes that then get serialized and deserialized. Some of these internal classes in fact define specialized readObject/writeObject and readResolve/writeReplace routines. By default, Kryo ignores such methods and simply tries to serialize the fields directly using its FieldSerializer, which of course doesn't always work. To make Kryo work with Ignite you'd have to register a specific Kryo Serializer and, in some cases, an Instantiator strategy for each internal serializable/externalizable Ignite class!
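
The failure mode described here can be reproduced without Kryo or Ignite. The sketch below uses a hypothetical `Wrapper` class that keeps its data in a transient field and persists it only through writeObject/readObject. JDK serialization honours those hooks, while a naive field-by-field copier loses the data silently. The copier is only a stand-in for a field-based serializer, written for illustration; it is not Kryo's actual FieldSerializer code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical class whose real state is written only by its serialization hooks. */
class Wrapper implements Serializable {
    private static final long serialVersionUID = 1L;

    String name;
    transient List<String> items = new ArrayList<>();

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();          // writes 'name'
        out.writeInt(items.size());        // writes the transient list by hand
        for (String s : items) out.writeUTF(s);
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int n = in.readInt();
        items = new ArrayList<>(n);
        for (int i = 0; i < n; i++) items.add(in.readUTF());
    }
}

class HookDemo {
    /** Round trip through JDK serialization, which calls writeObject/readObject. */
    static Wrapper jdkRoundTrip(Wrapper w) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(w);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (Wrapper) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Naive field-level copy: skips static/transient fields and never calls the hooks. */
    static Wrapper fieldCopy(Wrapper src) {
        Wrapper copy = new Wrapper();
        try {
            for (Field f : Wrapper.class.getDeclaredFields()) {
                int mod = f.getModifiers();
                if (Modifier.isStatic(mod) || Modifier.isTransient(mod)) continue;
                f.setAccessible(true);
                f.set(copy, f.get(src));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return copy;
    }

    public static void main(String[] args) {
        Wrapper w = new Wrapper();
        w.name = "entry";
        w.items.add("a");
        w.items.add("b");
        System.out.println("JDK round trip items:  " + jdkRoundTrip(w).items.size());
        System.out.println("Field-level copy items: " + fieldCopy(w).items.size());
    }
}
```

No exception is thrown in either path; the field-level copy simply comes back with an empty list, which is the same kind of quiet data loss described in this post.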


We didn't think such an approach was feasible, so we switched to Binary Objects and are pretty happy with it. It is quite compact and sufficiently fast. The best thing about Binary Objects (at least for us) is the ability to access specific fields of the application data objects without going through full deserialization. Overall, I believe Ignite provides sufficient means for making marshalling overhead as small as possible.


Regards

Andrey




Jackey

Hi Andrey,

Thank you for your advice; we will consider it seriously. BTW, it looks like all general-purpose serializers will behave like Kryo, since they have no information about a user's special readObject/writeObject routines, so protostuff may run into the same problems. Is that right?

Thanks again.

Lin

avk

Lin,


I have no experience with protostuff, but if it relies on the user-defined proto files for generation of the serialization/deserialization stubs, then you'd have to provide the proto files for all serializable Ignite classes.


You decide whether or not it's worth the effort.


Of course, the Ignite community may be kind enough to contribute the Kryo- and protostuff-based marshaller implementations that know how to correctly marshal the Ignite classes. The community would then also ensure that any code changes in the Ignite core would be properly reflected in the Kryo and protostuff marshallers (by keeping the proto files up to date, for example).


Having said that, I don't see much benefit of using such marshallers over the ones already available out of the box: OptimizedMarshaller, JdkMarshaller and the default one - BinaryMarshaller. Maybe some kind of benchmark could be developed to compare Ignite-provided serializers to Kryo, protostuff and others.


Regards

Andrey


Jackey

Hi Andrey,

Thanks for your response and advice.

FYI, protostuff-runtime [1] does not need any *.proto files; it can generate the schema for you via reflection and cache it for use at runtime [2]. I have posted my benchmark code on GitHub [3]. For a custom class like [4], in two different cases, the protostuff format generated by protostuff-runtime is about 40% smaller than the output of the Ignite binary marshaller.
I am not sure whether the comparison is fair enough; any suggestions are welcome.

Meanwhile, protostuff has some pitfalls with null elements in collections and arrays [5]. If we cannot handle these, we will not be able to replace the Ignite binary marshaller with protostuff.



[1] http://www.protostuff.io/documentation/runtime-schema/
[2] http://www.protostuff.io/documentation/schema/
[3] https://github.com/jackeylu/marshaller-cmp
[4] https://github.com/jackeylu/marshaller-cmp/blob/master/modules/PofObjects/src/main/java/data/media/GenMediaContent.java
[5] https://github.com/protostuff/protostuff/issues/192


Regards,
Lin

Pavel Tupitsyn

Hi Lin,

For a fair comparison, you have to use raw mode in Ignite, as discussed there: 

Pavel.

vkulichenko

I actually think that even comparing with raw mode is not completely fair (though it is definitely much closer to being fair). Any serialization protocol based on a precompiled schema will be very compact, because it adds almost zero overhead (protostuff doesn't require .proto files, but it still requires generating serialization code for POJOs). Such protocols are extremely compact but functionally limited, and are mainly used in messaging systems. For example, we use something very similar internally in Ignite for communication between nodes (see the TcpCommunicationSpi code and the Message interface if you are interested in the implementation details).

The binary format provides many more features. It is designed to avoid deserialization on server nodes while still allowing you to look up field values and even run SQL queries. With the binary format you can also add any objects to the cache (even without changing class definitions at all) and change the schema dynamically. Obviously, all this adds meta information to the protocol, but Ignite's binary format is still very compact compared with others that provide similar functionality.
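
The trade-off between extra metadata and field lookup without deserialization can be illustrated with a toy format. The layout below (a field count, a table of field id and offset pairs, then length-prefixed UTF-8 values) is an assumption made up for this sketch and is not Ignite's actual binary layout. It shows how a small offset table lets a reader jump straight to one field, and how many bytes that table costs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class ToyBinaryFormat {
    /** Bytes spent on lookup metadata: a 4-byte count plus (id, offset) per field. */
    static int metadataBytes(int fieldCount) {
        return 4 + 8 * fieldCount;
    }

    /** Encodes fields as [count][(fieldId, offset) * count][(length, utf8 bytes) * count]. */
    static byte[] encode(Map<Integer, String> fields) {
        int headerSize = metadataBytes(fields.size());
        int payloadSize = 0;
        for (String v : fields.values())
            payloadSize += 4 + v.getBytes(StandardCharsets.UTF_8).length;

        ByteBuffer buf = ByteBuffer.allocate(headerSize + payloadSize);
        buf.putInt(fields.size());

        // Offset table: where each field's length prefix starts.
        int offset = headerSize;
        for (Map.Entry<Integer, String> e : fields.entrySet()) {
            buf.putInt(e.getKey()).putInt(offset);
            offset += 4 + e.getValue().getBytes(StandardCharsets.UTF_8).length;
        }

        // Payload: length-prefixed values in the same order.
        for (String v : fields.values()) {
            byte[] bytes = v.getBytes(StandardCharsets.UTF_8);
            buf.putInt(bytes.length).put(bytes);
        }
        return buf.array();
    }

    /** Reads one field by id via the offset table; other values are never decoded. */
    static String readField(byte[] data, int fieldId) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.getInt();
        for (int i = 0; i < count; i++) {
            int id = buf.getInt();
            int offset = buf.getInt();
            if (id == fieldId) {
                int len = buf.getInt(offset); // absolute read, does not move the position
                return new String(data, offset + 4, len, StandardCharsets.UTF_8);
            }
        }
        return null; // field not present
    }

    public static void main(String[] args) {
        Map<Integer, String> fields = new LinkedHashMap<>();
        fields.put(1, "Lin");
        fields.put(2, "Ignite");
        fields.put(3, "protostuff");

        byte[] data = encode(fields);
        System.out.println("field 2 = " + readField(data, 2));
        System.out.println("total bytes = " + data.length
            + ", of which metadata = " + metadataBytes(fields.size()));
    }
}
```

A schema-based encoder could drop the whole offset table because both sides already know the field order, which is why such formats come out smaller in size-only benchmarks but cannot serve a single field without decoding the record.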

-Val