Difference between Ignite Kafka Streamer and Kafka consumer (2.0)
We consume data from Kafka, process the consumed data (lookups on various
caches plus some business logic) and add it to the IgniteDataStreamer. Which
one should be used to consume data from Kafka in an Ignite application? And
what is the difference between Ignite Kafka Streamer and Kafka consumer (2.0)?
Re: Difference between Ignite Kafka Streamer and Kafka consumer (2.0)
About your question "What is the difference between Ignite Kafka Streamer and Kafka consumer?": Kafka Consumer is the Kafka API for polling data from Kafka, while Ignite Kafka Streamer is a complete solution for streaming data from Kafka into Ignite. Internally, Kafka Streamer starts one or more Kafka Consumers and uses the IgniteDataStreamer API to load data into Ignite efficiently.
You can use Kafka Streamer to process data: Kafka Streamer's "setMultipleTupleExtractor" API lets you convert each Kafka record into one or more cache entries, and IgniteDataStreamer's Receiver API lets you flexibly add entries to multiple caches if you need to.
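As a rough illustration of the extractor idea, here is a minimal sketch of the conversion step on its own: one record value in, several cache entries out. To keep it runnable without Ignite on the classpath it implements java.util.function.Function; in a real application the same logic would go into an implementation of Ignite's StreamMultipleTupleExtractor, whose extract method returns a Map of entries, and the message type it receives depends on your Ignite version. The record format ("orderId|sku=qty,sku=qty"), the key layout and the class name OrderLineExtractor are assumptions made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Stand-in for a multiple-tuple extractor: turns one Kafka record value
// into several cache entries. Record format and key layout are hypothetical.
final class OrderLineExtractor implements Function<String, Map<String, Integer>> {
    @Override
    public Map<String, Integer> apply(String recordValue) {
        Map<String, Integer> entries = new LinkedHashMap<>();

        int sep = recordValue.indexOf('|');
        if (sep < 0)
            return entries; // malformed record: emit no entries

        String orderId = recordValue.substring(0, sep);
        String body = recordValue.substring(sep + 1);
        if (body.isEmpty())
            return entries;

        for (String line : body.split(",")) {
            String[] kv = line.split("=", 2);
            if (kv.length != 2)
                continue; // skip fragments without a quantity

            try {
                // Cache key is "orderId:sku", cache value is the quantity.
                entries.put(orderId + ":" + kv[0].trim(), Integer.parseInt(kv[1].trim()));
            }
            catch (NumberFormatException e) {
                // skip fragments whose quantity is not a number
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(new OrderLineExtractor().apply("o-17|apple=3,pear=5"));
    }
}
```

Your cache lookups and business logic would sit inside the extractor (or in a stream receiver), so that the entries it returns are already in the form you want stored.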
Kafka Streamer will probably meet your performance requirements: the Kafka Consumer API is inherently scalable, so you can increase processing throughput by starting multiple Kafka Streamer servers.
The only big concern is fault tolerance: if your Kafka Streamer server fails during processing, after a Kafka record has been consumed but before it has been written to Ignite, that record is lost.
The ignite-kafka module (which contains Kafka Streamer) also includes the Ignite-Kafka Source and Sink Connectors. The Kafka Sink Connector is used for the same purpose, streaming data from Kafka into Ignite, but it addresses both the performance and the fault tolerance requirements. If a Sink Connector fails while processing Kafka records, the records are not removed from Kafka and will be processed by another Sink Connector instance or after the Connector restarts.
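The difference between the two failure modes can be shown with a toy model. This is only a sketch and uses neither the Kafka nor the Ignite API: the topic is a list, the committed offset is an int, and the "crash" is simulated. Whether a real deployment behaves like the first or the second case depends on how offsets are committed, which is an assumption here. The names DeliveryDemo, run, crashAt and commitAfterStore are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not the Kafka or Ignite API.
final class DeliveryDemo {
    /**
     * Consumes the whole topic with a consumer that crashes once, while
     * handling the record at offset crashAt, and then restarts from the
     * last committed offset. Returns what ended up in the "cache".
     */
    static List<String> run(List<String> topic, int crashAt, boolean commitAfterStore) {
        List<String> stored = new ArrayList<>(); // stands in for the Ignite cache
        int committed = 0;                       // offset remembered on the Kafka side
        boolean crashed = false;

        while (committed < topic.size()) {
            int offset = committed;
            String rec = topic.get(offset);

            if (!commitAfterStore)
                committed = offset + 1; // offset is committed as soon as the record is consumed

            if (offset == crashAt && !crashed) {
                crashed = true; // process dies before the store; restart resumes at 'committed'
                continue;
            }

            stored.add(rec);

            if (commitAfterStore)
                committed = offset + 1; // offset is committed only after the record is stored
        }
        return stored;
    }

    public static void main(String[] args) {
        List<String> topic = List.of("a", "b", "c");
        System.out.println(run(topic, 1, false)); // record "b" is lost
        System.out.println(run(topic, 1, true));  // record "b" is redelivered after restart
    }
}
```

Committing only after the store gives redelivery rather than loss. The trade-off is that a crash between the store and the commit would deliver the same record twice, so writes should be safe to repeat.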
But while Kafka Streamer and Sink Connector serve the same purpose, their architectures and APIs are very different: Kafka Streamer "logically" belongs to the Ignite cluster, and you use the IgniteDataStreamer API for configuration and processing. Ignite Sink Connector is part of a Kafka Connect cluster, and you use Kafka Connect configuration and APIs for processing. The Kafka Connect infrastructure exposes a REST API for managing the Kafka Connect cluster (something that IgniteDataStreamer does not have). However, introducing Kafka Connect into your system makes the architecture more complex.