Workaround for getting ContinuousQuery to support transactions

ssansoy

Workaround for getting ContinuousQuery to support transactions

Following on from:
http://apache-ignite-users.70518.x6.nabble.com/ContinuousQuery-Batch-updates-td34198.html

The takeaway from there is that continuous queries do not honour
transactions: if a writer writes 100 records (e.g. CalculationParameters)
in a transaction, the continuous query will see the updates before the
entire batch of 100 has been committed.
Unfortunately this is a show-stopper for our adoption of Ignite, so we are
trying to think of some workarounds.

We are considering updating the writer app so that, when writing the 100
CalculationParameter records, it:

1. Writes the 100 CalculationParameter records to the cluster (e.g. with
some incremented version, say versionId=2).
2. Writes a separate entry into a special "Modifications" cache, containing
the number of rows written (numRows) and the version id of those records
(versionId).

Client apps don't subscribe to the CalculationParameter cache. Instead,
they subscribe to the Modifications cache.

The remote filter will:

1. Do a cache.get or a scan query on all the records in
CalculationParameter, filtering on versionId=2. The query has to keep
repeating until all rows are visible (i.e. the numRows records are seen).
These records are then subjected to another filter (the usual filter
criteria that the client app would have applied to CalculationParameter).
If any records remain, the filter returns true.

2. The transformer follows a similar process, groups all the numRows
records into a collection, and returns this collection to the localListen
in the client. The client then has access to all the records it is
interested in, in one batch.

We would have liked to avoid waiting/retrying for all the
CalculationParameter records to appear after the Modifications update is
seen, but since this happens on the cluster and not on the client, we can
probably live with it.
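A minimal sketch of the remote filter half of this, to make it concrete.
The Modification and CalculationParameter classes, the cache names and
matchesClientCriteria() are hypothetical stand-ins:

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import javax.cache.Cache;
import javax.cache.event.CacheEntryEvent;
import javax.cache.event.CacheEntryEventFilter;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;
import org.apache.ignite.resources.IgniteInstanceResource;

// Hypothetical value types.
class Modification implements Serializable {
    int versionId; // version of the batch that was written
    int numRows;   // how many rows were written with that version
}

class CalculationParameter implements Serializable {
    int versionId;
    boolean matchesClientCriteria() { return true; } // stand-in for the app's usual filter
}

// Remote filter on the "Modifications" cache: re-scans CalculationParameter
// until the whole batch for the advertised versionId is visible, then applies
// the usual filter criteria.
public class ModificationsFilter implements CacheEntryEventFilter<Integer, Modification> {
    @IgniteInstanceResource
    private Ignite ignite; // injected on the server node

    @Override
    public boolean evaluate(CacheEntryEvent<? extends Integer, ? extends Modification> evt) {
        Modification mod = evt.getValue();
        IgniteCache<Long, CalculationParameter> params = ignite.cache("CalculationParameter");

        List<CalculationParameter> batch = new ArrayList<>();
        while (true) {
            batch.clear();
            ScanQuery<Long, CalculationParameter> qry =
                new ScanQuery<>((k, v) -> v.versionId == mod.versionId);
            try (QueryCursor<Cache.Entry<Long, CalculationParameter>> cur = params.query(qry)) {
                for (Cache.Entry<Long, CalculationParameter> e : cur)
                    batch.add(e.getValue());
            }
            if (batch.size() >= mod.numRows)
                break; // all rows of the batch are visible
            try {
                Thread.sleep(10); // brief pause before re-scanning
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return batch.stream().anyMatch(CalculationParameter::matchesClientCriteria);
    }
}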

Do any Ignite developers here see any fundamental flaws with this
approach? We ultimately just want the localListen to be called with a
collection of records that were all updated at the same time - so if there
are any other things worth trying, please shout.
Thanks!
Sham



ilya.kasnacheev

Re: Workaround for getting ContinuousQuery to support transactions

Hello!

Nothing obviously wrong, but it still seems to me that you're stretching it too far.

Maybe you need to rethink your cache entry granularity (put more stuff in the entry to make it self-contained) or use some kind of message queue.

Regards,
--
Ilya Kasnacheev


ssansoy

Re: Workaround for getting ContinuousQuery to support transactions

Hi, thanks for the reply. Appreciate the suggestion - and if we were
creating a new solution around this, we would likely take that tack.
Unfortunately the entire platform we are looking to migrate over to Ignite
depends in places on updates coming in as a complete batch (e.g. whatever
was in an update transaction).

We've experimented with putting a queue in the client as you say, with a
timeout that gathers all sequentially arriving updates from the continuous
query and groups them together after a timeout of e.g. 50ms. However, this
is quite fragile/timing-sensitive and not something we can comfortably put
into production.

Are there any locking or signaling mechanisms (or anything else really) that
might help us here? E.g. we buffer the updates in the client, and await some
signal that the updates are complete. This signal would need to be fired
after the continuous query has seen all the updates. E.g. the writer will:

Write 10,000 records to the cache
Notify something

The client app will:
Receive 10,000 updates, 1 at a time, queueing them up locally
Upon that notification, drain this queue and process the records.



ilya.kasnacheev

Re: Workaround for getting ContinuousQuery to support transactions

Hello!

You may actually use our data streamer (with allowOverwrite false).

Once you call flush() on it and it returns, you can be confident that all 10,000 entries are readable from the cache. Of course, it has to be a single cache.
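For illustration, a minimal sketch of that (the cache name, the value type
and its key field are hypothetical):

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

class BatchWriter {
    // Streams a batch, then blocks until every entry is readable from the cache.
    static void writeBatch(Ignite ignite, List<CalculationParameter> batch) {
        try (IgniteDataStreamer<Long, CalculationParameter> streamer =
                 ignite.dataStreamer("CalculationParameter")) {
            streamer.allowOverwrite(false); // as suggested above

            for (CalculationParameter p : batch)
                streamer.addData(p.key, p); // hypothetical key field

            streamer.flush(); // when this returns, the whole batch is readable
        }
    }
}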

Regards,
--
Ilya Kasnacheev


ssansoy

Re: Workaround for getting ContinuousQuery to support transactions

Thanks,
How is this different from multiple puts inside a transaction?

By using the data streamer to write the records, does that mean the
continuous query will receive all 10,000 records in one go in the local
listen?



ilya.kasnacheev

Re: Workaround for getting ContinuousQuery to support transactions

Hello!

No, it does not mean anything about the continuous query listener.

But it means that once a data streamer is flushed, all data is available in caches.

Regards,
--
Ilya Kasnacheev


ssansoy

Re: Workaround for getting ContinuousQuery to support transactions

Ah OK, so this wouldn't help solve our problem?



ilya.kasnacheev

Re: Workaround for getting ContinuousQuery to support transactions

Hello!

It may handle your use case (doing something when all records are in cache).
But it will not fix the tool that you're using for it (continuous query with expectation of batched handling).

Regards,
--
Ilya Kasnacheev


ssansoy

Re: Workaround for getting ContinuousQuery to support transactions

Yeah, the key thing is to be able to be notified when all records have been
updated in the cache.
We've tried using IgniteLock for this too, by the way (e.g. the writer
locks, writes the records, unlocks).

Then the client app internally queues all updates as they arrive from the
continuous query. If it can acquire the lock, then it knows all updates have
arrived (because the writer has finished). However, this doesn't work
either: even though the writer has unlocked, the written records are still
in transit to the continuous query (i.e. they don't exist in the internal
client-side queue yet).
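For reference, roughly what the writer side looks like (a sketch with
hypothetical names); the race is that unlock() returns as soon as the puts
complete, before the continuous query events have reached the client:

import java.util.Map;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteLock;

class LockedWriter {
    static void writeBatch(Ignite ignite, Map<Long, CalculationParameter> batch) {
        // args: failoverSafe=true, fair=false, create=true
        IgniteLock lock = ignite.reentrantLock("batchLock", true, false, true);
        lock.lock();
        try {
            IgniteCache<Long, CalculationParameter> cache = ignite.cache("CalculationParameter");
            cache.putAll(batch);
        } finally {
            // Race: the puts are done, but the continuous query notifications
            // may still be in transit to the client's internal queue.
            lock.unlock();
        }
    }
}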



ilya.kasnacheev

Re: Workaround for getting ContinuousQuery to support transactions

Hello!

After you flush the data streamer, you may put a token entry into a small cache to be picked up by the continuous query. When the query handler is triggered, all the data will already be available from the caches.

The difference from transactional behavior is that transactions promise (and fail to deliver) an "at the same time" guarantee, whilst the data streamer delivers an "after" guarantee.
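A rough sketch of that ordering, reusing the hypothetical Modification and
CalculationParameter types from the earlier sketches:

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

class TokenWriter {
    static void writeBatchAndSignal(Ignite ignite, List<CalculationParameter> batch, int versionId) {
        // 1. Stream the batch and wait until every entry is readable.
        try (IgniteDataStreamer<Long, CalculationParameter> streamer =
                 ignite.dataStreamer("CalculationParameter")) {
            for (CalculationParameter p : batch)
                streamer.addData(p.key, p); // hypothetical key field
            streamer.flush(); // the "after" guarantee: batch readable from here on
        }

        // 2. Only now publish the token. By the time the continuous query on
        // "Modifications" fires for it, the whole batch can be read from cache.
        Modification token = new Modification();
        token.versionId = versionId;
        token.numRows = batch.size();
        ignite.cache("Modifications").put(versionId, token);
    }
}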

Regards,
--
Ilya Kasnacheev


ssansoy

Re: Workaround for getting ContinuousQuery to support transactions

Thanks for this,

We are considering this approach - writing all the entries to some table V,
and then updating a separate token cache T with a signal, picked up by the
continuous query, which then filters the underlying V records, transforms
them and sends them to the client.

However, one problem we ran into is that we lose the "old" values from the
underlying table V. Normally the continuous query has access to the new
value and the old value, so the client app can detect which entries no
longer match the remote filter. With this technique, however, the continuous
query remote transformer has the old and new value of T, but is ultimately
doing a ScanQuery on V to get all the "current" values.

Do you have any advice on how we can still achieve that?



ilya.kasnacheev

Re: Workaround for getting ContinuousQuery to support transactions

Hello!

You can have a transformer on your data streamer to do something with the old value (e.g. keep some of its fields in the new value V).
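A rough sketch of this, using Ignite's StreamTransformer as the stream
receiver (note that a custom receiver requires allowOverwrite(true), as far
as I recall); the "changes" field and the diff helper are hypothetical:

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.stream.StreamTransformer;

// Hypothetical value type carrying a "changes" field with old values.
class CalculationParameter implements Serializable {
    String b;                    // example business field
    Map<String, Object> changes; // {field name -> old value} for changed fields
}

class ChangeTrackingSetup {
    static void configure(IgniteDataStreamer<Long, CalculationParameter> streamer) {
        streamer.allowOverwrite(true); // a custom receiver needs overwrites enabled

        // Runs on the data node: compares the current (old) value with the
        // streamed (new) value and records what changed before storing it.
        streamer.receiver(StreamTransformer.from((entry, args) -> {
            CalculationParameter oldVal = entry.getValue(); // null if no previous entry
            CalculationParameter newVal = (CalculationParameter) args[0];
            if (oldVal != null)
                newVal.changes = diff(oldVal, newVal);
            entry.setValue(newVal);
            return null;
        }));
    }

    // Hypothetical helper: {field name -> old value} for each changed field.
    static Map<String, Object> diff(CalculationParameter oldVal, CalculationParameter newVal) {
        Map<String, Object> changes = new HashMap<>();
        if (!Objects.equals(oldVal.b, newVal.b))
            changes.put("b", oldVal.b);
        return changes;
    }
}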

Regards,
--
Ilya Kasnacheev


ssansoy

Re: Workaround for getting ContinuousQuery to support transactions

Interesting! That might just work. We will try it out.
E.g. a cache of 500 V's. V has fields a, b, c (b=foo on all records), and
some client app wants to run a continuous query on all V where b=foo, or
where b was foo but no longer is following the update.

The writer updates 100 V's by setting b=bar on all records, plus some
incrementing version int N.
The data streamer transformer mutates V by adding a new field called
"changes" which contains b=foo, to denote that only the field b was changed
and its old value was foo (e.g. a set of {fieldname, oldvalue}, {.... )
The writer then updates the V_signal cache to denote that a change was
made, with version N.

The client continuous query listens to the V_signal cache. When it receives
an update (denoting that V updates have occurred), it does a ScanQuery on V
in the transformer (the scan query filters the records that were updated as
part of version N, where either the fields we care about match our
predicate, or the "changes" field is one we are interested in and matches
the predicate).

These are batched up as a collection and returned to the client. Does this
seem like a reasonable approach?
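For concreteness, a rough sketch of the wiring for this using
ContinuousQueryWithTransformer (the Signal and V types, the cache names and
the predicate/handler are hypothetical stand-ins):

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import javax.cache.Cache;
import javax.cache.configuration.FactoryBuilder;
import javax.cache.event.CacheEntryEvent;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQueryWithTransformer;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;
import org.apache.ignite.lang.IgniteClosure;

class Signal implements Serializable { int version; }

class V implements Serializable { int version; String b; }

class BatchSubscriber {
    static QueryCursor<Cache.Entry<Integer, Signal>> subscribe(Ignite ignite) {
        ContinuousQueryWithTransformer<Integer, Signal, List<V>> qry =
            new ContinuousQueryWithTransformer<>();

        // Runs on the server when a signal arrives: gather the whole version-N
        // batch from V with a scan query and ship it as one collection.
        qry.setRemoteTransformerFactory(FactoryBuilder.factoryOf(
            (IgniteClosure<CacheEntryEvent<? extends Integer, ? extends Signal>, List<V>>) evt -> {
                int n = evt.getValue().version;
                IgniteCache<Long, V> vCache = Ignition.localIgnite().cache("V");
                List<V> batch = new ArrayList<>();
                ScanQuery<Long, V> scan =
                    new ScanQuery<>((k, v) -> v.version == n && "foo".equals(v.b)); // stand-in predicate
                try (QueryCursor<Cache.Entry<Long, V>> cur = vCache.query(scan)) {
                    for (Cache.Entry<Long, V> e : cur)
                        batch.add(e.getValue());
                }
                return batch;
            }));

        // Runs on the client: each event delivers the full batch in one call.
        qry.setLocalListener(batches -> {
            for (List<V> batch : batches)
                System.out.println("got batch of " + batch.size()); // stand-in handler
        });

        IgniteCache<Integer, Signal> signals = ignite.cache("V_signal");
        return signals.query(qry); // keep the cursor; closing it cancels the query
    }
}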




ilya.kasnacheev

Re: Workaround for getting ContinuousQuery to support transactions

Hello!

I think it's OK to try.

Regards,
--
Ilya Kasnacheev

