Downsides of Spark-Ignite for extended cache management and access?

phalverson

My company provides big data analytics for large banks (managing and analyzing their loan portfolios). We have a number of applications that are fundamentally grid-based, but which tend to use different frameworks to handle grid computation. We are considering shifting these to a common Hadoop stack to consolidate infrastructure and provide a more uniform way of managing our services, as well as providing more options for different classes of analytics (MR, Streaming, etc.).

One of these applications seems to be a good fit for Ignite (lots of concurrent low-latency queries against a massive but highly-partitionable dataset) and, possibly, Spark (distributed batch computing). It's the latter I'm uncertain about. I understand the general concept of the IgniteRDD as a bridge to a distributed Ignite cache (or set of them), but do I give anything up by deploying our app as a Spark job, vs. a custom YARN app that hosts Ignite nodes? I'm specifically looking at implications for:

  - affinity (both Data with Data and Compute with Data)
  - advanced SQL queries (cross-cache joins, aggregations, etc)
  - persistence (warm-up, write-through)
  - transactions

If I can still have all the benefits of IgniteCache while going the virtual RDD route, then Spark seems a good fit, but I want to be clear on any limitations that such an abstraction might impose. Appreciate any guidance here.
vkulichenko

Re: Downsides of Spark-Ignite for extended cache management and access?

Hi,

IgniteRDD is useful when you already have a Spark application and want to use Ignite as the underlying storage to share state between different Spark jobs (which is not possible with plain Spark) and to take advantage of the fast indexed SQL queries provided by Ignite (there are no indexes in Spark).
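
As a rough illustration of that shared-state use case, here is a minimal Scala sketch. It assumes the `ignite-spark` module from an Ignite 2.x release (older 1.x releases put the type parameters on `IgniteContext` instead of `fromCache`), an Ignite cluster that is already running and reachable with the default configuration, and a hypothetical cache name `loanScores` that is not taken from this thread.

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.{SparkConf, SparkContext}

object SharedRddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("ignite-rdd-sketch").setMaster("local[2]"))

    // The closure supplies the Ignite configuration used on the driver
    // and on each executor. A default configuration is an assumption here.
    val ic = new IgniteContext(sc, () => new IgniteConfiguration())

    // "loanScores" is a hypothetical cache name used only for illustration.
    val shared = ic.fromCache[Int, Int]("loanScores")

    // One job writes key/value pairs into the Ignite cache...
    shared.savePairs(sc.parallelize(1 to 1000).map(i => (i, i * 2)))

    // ...and a later job (possibly in another Spark application) can read
    // the same data back through the ordinary RDD API.
    val highScores = shared.filter(_._2 > 1500).count()
    println(s"Entries with score above 1500: $highScores")

    ic.close()
    sc.stop()
  }
}
```

Only the RDD-style operations (`savePairs`, `filter`, `count`) are used above; the cache is reached through the RDD abstraction rather than through `IgniteCache` itself.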

IgniteRDD provides the full RDD API, but its cache API is limited. For example, it doesn't expose transactions. If you're creating an application from scratch, I would recommend using the Ignite API directly, with all its features (cache, compute, streaming, etc.).
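
To make the contrast concrete, the following is a minimal sketch of using the Ignite API directly for a transactional update, which is the kind of operation IgniteRDD does not expose. It assumes `ignite-core` is on the classpath and that starting a node with the default configuration is acceptable; the cache name `loanBalances`, the keys, and the amounts are hypothetical.

```scala
import org.apache.ignite.Ignition
import org.apache.ignite.cache.CacheAtomicityMode
import org.apache.ignite.configuration.CacheConfiguration

object DirectApiSketch {
  def main(args: Array[String]): Unit = {
    // Starts an Ignite node in this JVM with the default configuration.
    val ignite = Ignition.start()
    try {
      // Transactions require a cache in TRANSACTIONAL atomicity mode.
      // "loanBalances" is a hypothetical cache name.
      val cfg = new CacheConfiguration[Int, Int]("loanBalances")
      cfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
      val cache = ignite.getOrCreateCache(cfg)

      cache.put(1, 100)
      cache.put(2, 50)

      // Move 25 units from key 1 to key 2 atomically.
      val tx = ignite.transactions().txStart()
      try {
        val from = cache.get(1)
        val to = cache.get(2)
        cache.put(1, from - 25)
        cache.put(2, to + 25)
        tx.commit()
      } finally {
        // Closing a transaction that was not committed rolls it back.
        tx.close()
      }

      println(s"Balances after transfer: ${cache.get(1)}, ${cache.get(2)}")
    } finally {
      ignite.close()
    }
  }
}
```

The same `Ignite` instance also gives access to compute, streaming, and affinity-aware operations, which is why the direct API is the broader option for a new application.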

Makes sense?

-Val
phalverson

Re: Downsides of Spark-Ignite for extended cache management and access?

Thanks. That was precisely my concern, and I appreciate the guidance.