incorrect partition map exchange behaviour

tschauenberg
incorrect partition map exchange behaviour

Hi,

We have a cluster of Ignite 2.8.1 server nodes and have recently started
looking at the individual cache metrics for primary keys
org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl.OffHeapPrimaryEntriesCount

In our configuration we have a cache with 2 backups (visor reports it as
PARTITIONED), so on our 3-node cluster every node holds a full copy of the
data.  The primaries should be spread equally across the 3 nodes, and each
node should hold backups from the other two.  All server nodes are in the
baseline.  Additionally, some thick clients are connected, but I don't think
they are relevant to the discussion.
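For reference, the spread we expect can be sanity-checked with simple arithmetic. A rough Python sketch, assuming the default RendezvousAffinityFunction partition count of 1024 (an assumption, but consistent with the expectedPrimary/expectedBackups figures in the logs later in this thread):

```python
# Ideal primary/backup partition split across server nodes.
# Assumes 1024 partitions (Ignite's default affinity partition count);
# adjust if the cache's affinity function is configured differently.

def ideal_distribution(partitions: int, backups: int, nodes: int):
    """Return (primary partitions per node, backup partitions per node)."""
    primaries = partitions / nodes
    backup_copies = partitions * backups / nodes
    return primaries, backup_copies

# 3-node cluster, backups=2: each node should own ~1/3 of the primaries
# and hold backup copies of everything it doesn't own as primary.
print(ideal_distribution(1024, 2, 3))  # -> (~341.3, ~682.7)

# With one node down (2 alive), matching the expectedPrimary=512.00 /
# expectedBackups=1024.00 warnings later in the thread:
print(ideal_distribution(1024, 2, 2))  # -> (512.0, 1024.0)
```

With backups=2 on 3 nodes every node holds a copy of every partition; only the primary role should be split three ways.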

Whenever we do a rolling restart, one node at a time, the last node to be
restarted always ends up owning zero primaries and only backups.  The two
nodes restarted earlier in the rolling restart own all the primaries.

When the cluster is in this state, starting and then stopping visor fixes
it: when visor leaves the cluster it triggers a PME that balances all keys
across the server nodes.  Looking at the visor cache stats between the start
and stop, we see a min of 0 keys on the nodes for our cache, so visor and
the JMX metrics line up on that front.  After stopping visor, the JMX
metrics show the primaries evenly distributed, and starting visor a second
time confirms that the min, average, and max node key counts are all even.

Every join and leave during the rolling restart and during the visor
start/stop is reflected in the logs as a topology version increment along
with the corresponding node join/leave events.
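As a quick way to check this mechanically, the topology version can be scraped from the "Topology snapshot" lines Ignite prints; a small sketch (the sample lines are abbreviated, and the regex is tailored to this log format):

```python
import re

# Pull the topology version out of GridDiscoveryManager "Topology snapshot"
# log lines so joins/leaves can be matched against version increments.
SNAPSHOT_RE = re.compile(r"Topology snapshot \[ver=(\d+),")

def topology_versions(log_lines):
    """Topology version of every snapshot line, in log order."""
    return [int(m.group(1))
            for line in log_lines
            if (m := SNAPSHOT_RE.search(line))]

sample = [  # abbreviated versions of lines quoted later in this thread
    "[...][GridDiscoveryManager] Topology snapshot [ver=8, servers=3, clients=1]",
    "[...][GridDiscoveryManager] Topology snapshot [ver=10, servers=2, clients=1]",
]
print(topology_versions(sample))  # -> [8, 10]
```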

According to
https://cwiki.apache.org/confluence/display/IGNITE/%2528Partition+Map%2529+Exchange+-+under+the+hood
each leave and join should trigger a PME, but we only see the keys change on
leaves.

Additionally, we tried waiting longer between the stop and start steps of
the rolling restart to see if that had any effect.  We made sure we waited
long enough for a PME to move data, but waiting longer made no difference:
the stop always triggers a PME that moves the keys off that node, and the
start never triggers one that moves any primaries back.

Why are we only seeing the PME change keys when nodes (server or visor) stop
and never when they join?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
tschauenberg
Re: incorrect partition map exchange behaviour

Here's my attempt to demonstrate the behaviour and provide logs.

Standup 3 node cluster and load with data


Using a thick client, 250k devices are loaded into the device cache, after
which the thick client leaves.  One other thick client stays connected the
whole time to serve requests; I believe it's irrelevant to the test, but I'm
pointing it out in case someone notices there's still a client connected.

Show topology from logs of the client leaving:


[2021-01-08T23:08:05.012Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
Node left topology: TcpDiscoveryNode
[id=611e30ee-b7c6-4ead-a746-f609b206cfb4,
consistentId=611e30ee-b7c6-4ead-a746-f609b206cfb4, addrs=ArrayList
[127.0.0.1, 172.17.0.3], sockAddrs=HashSet [/127.0.0.1:0, /172.17.0.3:0],
discPort=0, order=7, intOrder=6, lastExchangeTime=1610146373751, loc=false,
ver=2.8.1#20200521-sha1:86422096, isClient=true]
[2021-01-08T23:08:05.013Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
Topology snapshot [ver=8, locNode=75e4ddea, servers=3, clients=1,
state=ACTIVE, CPUs=7, offheap=3.0GB, heap=3.1GB]

Start visor on one of the nodes


Show topology from logs


[2021-01-08T23:30:33.461Z][INFO][tcp-disco-msg-worker-[4ea8efe1
10.12.3.76:47500]-#2][TcpDiscoverySpi] New next node
[newNext=TcpDiscoveryNode [id=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7,
consistentId=10.12.2.110:47501, addrs=ArrayList [10.12.2.110],
sockAddrs=HashSet [/10.12.2.110:47501], discPort=47501, order=0, intOrder=7,
lastExchangeTime=1610148633458, loc=false, ver=2.8.1#20200521-sha1:86422096,
isClient=false]]
[2021-01-08T23:30:34.045Z][INFO][sys-#1011][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=9, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7, consistentId=10.12.2.110:47501,
addrs=ArrayList [10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47501],
discPort=47501, order=9, intOrder=7, lastExchangeTime=1610148633458,
loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], done=true,
newCrdFut=null], topVer=AffinityTopologyVersion [topVer=9, minorTopVer=0]]

Show data balanced in visor


+-----------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+----------------+
| Devices(@c2)          | PARTITIONED | 3     | 250000 (0 / 250000)            
| min: 80315 (0 / 80315)            | min: 0    | min: 0    | min: 0    |
min: 250000    |
|                       |             |       |                                
| avg: 83333.33 (0.00 / 83333.33)   | avg: 0.00 | avg: 0.00 | avg: 0.00 |
avg: 250000.00 |
|                       |             |       |                                
| max: 86968 (0 / 86968)            | max: 0    | max: 0    | max: 0    |
max: 250000    |
+-----------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+----------------+

At this point the data is all relatively balanced, and the topology version
increased when visor connected.

Stop ignite on one node


Show topology and PME from logs (from a different ignite node as the ignite
process was stopped)


[2021-01-08T23:35:39.333Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
Node left topology: TcpDiscoveryNode
[id=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c,
consistentId=3a4a497f-5a89-4f2c-8531-b2b05f2ede22, addrs=ArrayList
[10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47500], discPort=47500,
order=3, intOrder=3, lastExchangeTime=1610139164908, loc=false,
ver=2.8.1#20200521-sha1:86422096, isClient=false]
[2021-01-08T23:35:39.333Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
Topology snapshot [ver=10, locNode=4ea8efe1, servers=2, clients=1,
state=ACTIVE, CPUs=5, offheap=2.0GB, heap=2.1GB]
[2021-01-08T23:35:39.333Z][INFO][disco-event-worker-#40][GridDiscoveryManager]  
^-- Baseline [id=0, size=3, online=2, offline=1]
[2021-01-08T23:35:39.335Z][INFO][exchange-worker-#41][time] Started exchange
init [topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0], crd=true,
evt=NODE_LEFT, evtNode=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c, customEvt=null,
allowMerge=false, exchangeFreeSwitch=true]
[2021-01-08T23:35:39.338Z][INFO][sys-#1031][GridAffinityAssignmentCache]
Local node affinity assignment distribution is not ideal [cache=Households,
expectedPrimary=512.00, actualPrimary=548, expectedBackups=1024.00,
actualBackups=476, warningThreshold=50.00%]
[2021-01-08T23:35:39.340Z][INFO][sys-#1032][GridAffinityAssignmentCache]
Local node affinity assignment distribution is not ideal [cache=Devices,
expectedPrimary=512.00, actualPrimary=548, expectedBackups=1024.00,
actualBackups=476, warningThreshold=50.00%]
[2021-01-08T23:35:39.354Z][INFO][exchange-worker-#41][GridDhtPartitionsExchangeFuture]
Finished waiting for partition release future
[topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0], waitTime=0ms,
futInfo=NA, mode=DISTRIBUTED]
[2021-01-08T23:35:39.357Z][INFO][exchange-worker-#41][GridDhtPartitionsExchangeFuture]
Finished waiting for partitions release latch: ServerLatch [permits=0,
pendingAcks=HashSet [], super=CompletableLatch [id=CompletableLatchUid
[id=exchange, topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0]]]]
[2021-01-08T23:35:39.365Z][INFO][exchange-worker-#41][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=10,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=10, minorTopVer=0],
err=null, rebalanced=true, wasRebalanced=true]
[2021-01-08T23:35:39.370Z][INFO][exchange-worker-#41][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=4ea8efe1-0f92-4232-bbab-2cb232ed04f9,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=10, minorTopVer=0], evt=NODE_LEFT, evtNode=TcpDiscoveryNode
[id=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c,
consistentId=3a4a497f-5a89-4f2c-8531-b2b05f2ede22, addrs=ArrayList
[10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47500], discPort=47500,
order=3, intOrder=3, lastExchangeTime=1610139164908, loc=false,
ver=2.8.1#20200521-sha1:86422096, isClient=false], done=true,
newCrdFut=null], topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0]]
[2021-01-08T23:35:39.370Z][INFO][exchange-worker-#41][GridDhtPartitionsExchangeFuture]
Exchange timings [startVer=AffinityTopologyVersion [topVer=10,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=10, minorTopVer=0],
stage="Waiting in exchange queue" (0 ms), stage="Exchange parameters
initialization" (0 ms), stage="Determine exchange type" (15 ms),
stage="Preloading notification" (0 ms), stage="WAL history reservation" (2
ms), stage="Wait partitions release" (0 ms), stage="Wait partitions release
latch" (2 ms), stage="After states restored callback" (5 ms),
stage="Finalize update counters" (2 ms), stage="Detect lost partitions" (1
ms), stage="Exchange done" (3 ms), stage="Total time" (30 ms)]
[2021-01-08T23:35:39.370Z][INFO][exchange-worker-#41][GridDhtPartitionsExchangeFuture]
Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=10,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=10, minorTopVer=0],
stage="Affinity initialization (exchange-free switch on fully-rebalanced
topology) [grp=Households]" (15 ms) (parent=Determine exchange type),
stage="Affinity initialization (exchange-free switch on fully-rebalanced
topology) [grp=CorpHeadend]" (14 ms) (parent=Determine exchange type),
stage="Affinity initialization (exchange-free switch on fully-rebalanced
topology) [grp=AttributeOptions]" (13 ms) (parent=Determine exchange type)]
[2021-01-08T23:35:39.370Z][INFO][exchange-worker-#41][time] Finished
exchange init [topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0],
crd=true]
[2021-01-08T23:35:39.380Z][INFO][exchange-worker-#41][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=10, minorTopVer=0], force=false, evt=NODE_LEFT,
node=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c]

Show data in visor with data on the two nodes


+-----------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+----------------+
| Devices(@c2)          | PARTITIONED | 2     | 250000 (0 / 250000)            
| min: 116261 (0 / 116261)          | min: 0    | min: 0    | min: 0    |
min: 250000    |
|                       |             |       |                                
| avg: 125000.00 (0.00 / 125000.00) | avg: 0.00 | avg: 0.00 | avg: 0.00 |
avg: 250000.00 |
|                       |             |       |                                
| max: 133739 (0 / 133739)          | max: 0    | max: 0    | max: 0    |
max: 250000    |
+-----------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+----------------+

At this point only 2 nodes hold data for this cache, and it was nicely
rebalanced when the ignite process was stopped.
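The "affinity assignment distribution is not ideal" warnings above can be reproduced with simple arithmetic. A sketch of my reading of the warning fields (not Ignite's actual implementation; the 1024 partition count is inferred from expectedPrimary=512 on 2 alive nodes):

```python
# Reproduce the GridAffinityAssignmentCache warning figures above.
# This mirrors my reading of the log fields; it is not Ignite's code.

def distribution_report(partitions, backups, alive_nodes,
                        actual_primary, actual_backups,
                        warning_threshold_pct=50.0):
    """Return (expectedPrimary, expectedBackups, within-threshold?)."""
    expected_primary = partitions / alive_nodes
    expected_backups = partitions * backups / alive_nodes
    primary_dev = abs(actual_primary - expected_primary) / expected_primary * 100
    backup_dev = abs(actual_backups - expected_backups) / expected_backups * 100
    ideal = max(primary_dev, backup_dev) <= warning_threshold_pct
    return expected_primary, expected_backups, ideal

# Figures from the NODE_LEFT exchange above (cache=Devices):
# actualPrimary=548, actualBackups=476 on the 2 surviving nodes.
print(distribution_report(1024, 2, 2, 548, 476))  # -> (512.0, 1024.0, False)
```

The backup deviation (|476 - 1024| / 1024, about 53.5%) is what pushes it past the 50% threshold, consistent with the warning being logged.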

Start the stopped ignite


Show topology and PME from logs (on node where ignite started)


[2021-01-08T23:39:42.817Z][INFO][exchange-worker-#42][GridDhtPartitionsExchangeFuture]
Skipped waiting for partitions release future (local node is joining)
[topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0]]
[2021-01-08T23:39:43.006Z][INFO][db-checkpoint-thread-#54][GridCacheDatabaseSharedManager]
Checkpoint started [checkpointId=07c7fa75-d1dd-4955-964e-7d61bddbf05b,
startPtr=FileWALPointer [idx=7, fileOff=39245141, len=118735],
checkpointBeforeLockTime=91ms, checkpointLockWait=0ms,
checkpointListenersExecuteTime=29ms, checkpointLockHoldTime=51ms,
walCpRecordFsyncDuration=8ms, writeCheckpointEntryDuration=5ms,
splitAndSortCpPagesDuration=14ms,  pages=2079, reason='node started']
[2021-01-08T23:39:43.163Z][INFO][grid-nio-worker-tcp-comm-0-#24][TcpCommunicationSpi]
Established outgoing communication connection [locAddr=/10.12.2.110:34662,
rmtAddr=/10.12.3.76:47100]
[2021-01-08T23:39:43.167Z][INFO][exchange-worker-#42][time] Finished
exchange init [topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0],
crd=false]
[2021-01-08T23:39:43.368Z][INFO][sys-#48][GridDhtPartitionsExchangeFuture]
Received full message, will finish exchange
[node=4ea8efe1-0f92-4232-bbab-2cb232ed04f9, resVer=AffinityTopologyVersion
[topVer=11, minorTopVer=0]]
[2021-01-08T23:39:43.494Z][INFO][sys-#48][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=11,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=11, minorTopVer=0],
err=null, rebalanced=false, wasRebalanced=false]
[2021-01-08T23:39:43.554Z][INFO][sys-#48][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=b909d450-11cd-471f-b49e-85c6f300f704,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=11, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=b909d450-11cd-471f-b49e-85c6f300f704,
consistentId=3a4a497f-5a89-4f2c-8531-b2b05f2ede22, addrs=ArrayList
[10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47500], discPort=47500,
order=11, intOrder=8, lastExchangeTime=1610149183525, loc=true,
ver=2.8.1#20200521-sha1:86422096, isClient=false], done=true,
newCrdFut=null], topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0]]
[2021-01-08T23:39:43.556Z][INFO][sys-#48][GridDhtPartitionsExchangeFuture]
Exchange timings [startVer=AffinityTopologyVersion [topVer=11,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=11, minorTopVer=0],
stage="Waiting in exchange queue" (53 ms), stage="Exchange parameters
initialization" (3 ms), stage="Components activation" (22 ms),
stage="Determine exchange type" (50 ms), stage="Preloading notification" (0
ms), stage="WAL history reservation" (5 ms), stage="Restore partition
states" (11 ms), stage="After states restored callback" (147 ms),
stage="Waiting for Full message" (392 ms), stage="Affinity recalculation"
(89 ms), stage="Full map updating" (36 ms), stage="Detect lost partitions"
(4 ms), stage="Exchange done" (57 ms), stage="Total time" (869 ms)]
[2021-01-08T23:39:43.558Z][INFO][sys-#48][GridDhtPartitionsExchangeFuture]
Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=11,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=11, minorTopVer=0],
stage="Affinity fetch" (48 ms) (parent=Determine exchange type),
stage="Restore partition states [grp=ignite-sys-cache]" (11 ms)
(parent=Restore partition states), stage="Restore partition states
[grp=MasterVersion]" (11 ms) (parent=Restore partition states),
stage="Restore partition states [grp=Devices]" (11 ms) (parent=Restore
partition states), stage="Affinity initialization (local join)
[grp=Devices]" (89 ms) (parent=Affinity recalculation), stage="Affinity
initialization (local join) [grp=Households]" (88 ms) (parent=Affinity
recalculation), stage="Affinity initialization (local join)
[grp=MasterVersion]" (67 ms) (parent=Affinity recalculation)]
[2021-01-08T23:39:43.588Z][INFO][exchange-worker-#42][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=11, minorTopVer=0], force=false, evt=NODE_JOINED,
node=b909d450-11cd-471f-b49e-85c6f300f704]
[2021-01-08T23:39:43.642Z][INFO][main][GridDiscoveryManager] Topology
snapshot [ver=11, locNode=b909d450, servers=3, clients=1, state=ACTIVE,
CPUs=7, offheap=3.0GB, heap=3.1GB]
[2021-01-08T23:39:43.643Z][INFO][main][GridDiscoveryManager]   ^-- Baseline
[id=0, size=3, online=3, offline=0]
[2021-01-08T23:39:43.645Z][INFO][main][G] Node started : [stage="Configure
system pool" (65 ms),stage="Start managers" (1113 ms),stage="Configure
binary metadata" (102 ms),stage="Start processors" (744 ms),stage="Init
metastore" (899 ms),stage="Init and start regions" (10 ms),stage="Restore
binary memory" (989 ms),stage="Restore logical state" (2450
ms),stage="Finish recovery" (112 ms),stage="Join topology" (1498
ms),stage="Await transition" (19 ms),stage="Await exchange" (951
ms),stage="Total time" (8952 ms)]
[2021-01-08T23:39:44.550Z][INFO][db-checkpoint-thread-#54][GridCacheDatabaseSharedManager]
Checkpoint finished [cpId=07c7fa75-d1dd-4955-964e-7d61bddbf05b, pages=2079,
markPos=FileWALPointer [idx=7, fileOff=39245141, len=118735],
walSegmentsCleared=0, walSegmentsCovered=[], markDuration=83ms,
pagesWrite=184ms, fsync=1359ms, total=1717ms]


Show data with a min of zero primaries in visor


+-----------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+----------------+
| Devices(@c2)          | PARTITIONED | 3     | 250000 (0 / 250000)            
| min: 0 (0 / 0)                    | min: 0    | min: 0    | min: 0    |
min: 0         |
|                       |             |       |                                
| avg: 83333.33 (0.00 / 83333.33)   | avg: 0.00 | avg: 0.00 | avg: 0.00 |
avg: 166666.67 |
|                       |             |       |                                
| max: 133739 (0 / 133739)          | max: 0    | max: 0    | max: 0    |
max: 250000    |
+-----------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+----------------+

At this point the stopped ignite process has been restarted, but notice that
no primaries were rebalanced to it upon joining.
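A check for this symptom, fed with hypothetical per-node primary counts matching the visor min/max figures above (visor doesn't show the exact per-node split, so the middle values are illustrative):

```python
# Detect the symptom above: one node owning zero primary entries while the
# cache as a whole holds data. The per-node counts are illustrative, taken
# from the visor min/max figures quoted in this thread.

def has_primaryless_node(per_node_primaries):
    """True if some node owns no primaries while others own some."""
    return sum(per_node_primaries) > 0 and min(per_node_primaries) == 0

print(has_primaryless_node([0, 116261, 133739]))    # -> True  (after restart)
print(has_primaryless_node([80315, 82717, 86968]))  # -> False (balanced)
```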

Stop visor


Show topology and PME from logs


[2021-01-08T23:44:07.824Z][INFO][grid-nio-worker-tcp-comm-1-#25][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.12.2.110:47100,
rmtAddr=/10.12.2.110:37510]
[2021-01-08T23:44:35.597Z][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi]
TCP discovery accepted incoming connection [rmtAddr=/10.12.4.242,
rmtPort=42615]
[2021-01-08T23:44:35.598Z][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi]
TCP discovery spawning a new thread for connection [rmtAddr=/10.12.4.242,
rmtPort=42615]
[2021-01-08T23:44:35.598Z][INFO][tcp-disco-sock-reader-[]-#7][TcpDiscoverySpi]
Started serving remote node connection [rmtAddr=/10.12.4.242:42615,
rmtPort=42615]
[2021-01-08T23:44:35.599Z][INFO][tcp-disco-sock-reader-[63d7cf0d
10.12.4.242:42615]-#7][TcpDiscoverySpi] Initialized connection with remote
server node [nodeId=63d7cf0d-f8b6-4fe2-ba99-4bbd26731451,
rmtAddr=/10.12.4.242:42615]
[2021-01-08T23:44:35.621Z][INFO][exchange-worker-#42][time] Started exchange
init [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0], crd=false,
evt=NODE_LEFT, evtNode=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7, customEvt=null,
allowMerge=true, exchangeFreeSwitch=false]
[2021-01-08T23:44:35.632Z][INFO][exchange-worker-#42][GridDhtPartitionsExchangeFuture]
Finished waiting for partition release future
[topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0], waitTime=0ms,
futInfo=NA, mode=DISTRIBUTED]
[2021-01-08T23:44:35.644Z][INFO][exchange-worker-#42][GridDhtPartitionsExchangeFuture]
Finished waiting for partitions release latch: ClientLatch
[coordinator=TcpDiscoveryNode [id=4ea8efe1-0f92-4232-bbab-2cb232ed04f9,
consistentId=3b21ae44-c9cc-4e04-9e79-9807d22a6d50, addrs=ArrayList
[10.12.3.76], sockAddrs=HashSet [/10.12.3.76:47500], discPort=47500,
order=1, intOrder=1, lastExchangeTime=1610149182437, loc=false,
ver=2.8.1#20200521-sha1:86422096, isClient=false], ackSent=true,
super=CompletableLatch [id=CompletableLatchUid [id=exchange,
topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0]]]]
[2021-01-08T23:44:35.644Z][INFO][exchange-worker-#42][GridDhtPartitionsExchangeFuture]
Finished waiting for partition release future
[topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0], waitTime=0ms,
futInfo=NA, mode=LOCAL]
[2021-01-08T23:44:35.688Z][INFO][exchange-worker-#42][time] Finished
exchange init [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0],
crd=false]
[2021-01-08T23:44:35.704Z][INFO][tcp-disco-sock-reader-[1cca94e3
10.12.2.110:58161]-#5][TcpDiscoverySpi] Finished serving remote node
connection [rmtAddr=/10.12.2.110:58161, rmtPort=58161
[2021-01-08T23:44:35.761Z][INFO][sys-#80][GridDhtPartitionsExchangeFuture]
Received full message, will finish exchange
[node=4ea8efe1-0f92-4232-bbab-2cb232ed04f9, resVer=AffinityTopologyVersion
[topVer=12, minorTopVer=0]]
[2021-01-08T23:44:35.802Z][INFO][sys-#80][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=12,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=12, minorTopVer=0],
err=null, rebalanced=true, wasRebalanced=false]
[2021-01-08T23:44:35.820Z][INFO][sys-#80][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=b909d450-11cd-471f-b49e-85c6f300f704,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=12, minorTopVer=0], evt=NODE_LEFT, evtNode=TcpDiscoveryNode
[id=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7, consistentId=10.12.2.110:47501,
addrs=ArrayList [10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47501],
discPort=47501, order=9, intOrder=7, lastExchangeTime=1610149182447,
loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], done=true,
newCrdFut=null], topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0]]
[2021-01-08T23:44:35.820Z][INFO][sys-#80][GridDhtPartitionsExchangeFuture]
Exchange timings [startVer=AffinityTopologyVersion [topVer=12,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=12, minorTopVer=0],
stage="Waiting in exchange queue" (0 ms), stage="Exchange parameters
initialization" (0 ms), stage="Determine exchange type" (0 ms),
stage="Preloading notification" (0 ms), stage="WAL history reservation" (2
ms), stage="Wait partitions release" (13 ms), stage="Wait partitions release
latch" (7 ms), stage="Wait partitions release" (0 ms), stage="After states
restored callback" (0 ms), stage="Finalize update counters" (19 ms),
stage="Waiting for Full message" (96 ms), stage="Affinity recalculation" (31
ms), stage="Full map updating" (9 ms), stage="Detect lost partitions" (2
ms), stage="Exchange done" (15 ms), stage="Total time" (194 ms)]
[2021-01-08T23:44:35.820Z][INFO][sys-#80][GridDhtPartitionsExchangeFuture]
Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=12,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=12, minorTopVer=0],
stage="Affinity applying from full message [grp=AttributeOptions]" (9 ms)
(parent=Affinity recalculation), stage="Affinity applying from full message
[grp=CorpHeadend]" (8 ms) (parent=Affinity recalculation), stage="Affinity
applying from full message [grp=Devices]" (6 ms) (parent=Affinity
recalculation)]
[2021-01-08T23:44:35.824Z][INFO][exchange-worker-#42][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=12, minorTopVer=0], force=false, evt=NODE_LEFT,
node=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7]

With visor now leaving the cluster, this triggers the PME that rebalances
the primary keys.  I'll start visor again to show that.

Start visor again


Show topology and PME from logs


[2021-01-08T23:46:52.977Z][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi]
TCP discovery accepted incoming connection [rmtAddr=/10.12.2.110,
rmtPort=42455]
[2021-01-08T23:46:52.977Z][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi]
TCP discovery spawning a new thread for connection [rmtAddr=/10.12.2.110,
rmtPort=42455]
[2021-01-08T23:46:52.982Z][INFO][tcp-disco-sock-reader-[]-#8][TcpDiscoverySpi]
Started serving remote node connection [rmtAddr=/10.12.2.110:42455,
rmtPort=42455]
[2021-01-08T23:46:52.984Z][INFO][tcp-disco-sock-reader-[6215e66a
10.12.2.110:42455]-#8][TcpDiscoverySpi] Initialized connection with remote
server node [nodeId=6215e66a-b021-410f-8e3a-aaa317f19609,
rmtAddr=/10.12.2.110:42455]
[2021-01-08T23:46:53.041Z][INFO][tcp-disco-sock-reader-[6215e66a
10.12.2.110:42455]-#8][TcpDiscoverySpi] Finished serving remote node
connection [rmtAddr=/10.12.2.110:42455, rmtPort=42455]
[2021-01-08T23:46:53.049Z][INFO][tcp-disco-msg-worker-[4ea8efe1
10.12.3.76:47500]-#2][TcpDiscoverySpi] New next node
[newNext=TcpDiscoveryNode [id=6215e66a-b021-410f-8e3a-aaa317f19609,
consistentId=10.12.2.110:47501, addrs=ArrayList [10.12.2.110],
sockAddrs=HashSet [/10.12.2.110:47501], discPort=47501, order=0, intOrder=9,
lastExchangeTime=1610149613043, loc=false, ver=2.8.1#20200521-sha1:86422096,
isClient=false]]
[2021-01-08T23:46:53.405Z][INFO][exchange-worker-#42][time] Started exchange
init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], crd=false,
evt=NODE_JOINED, evtNode=6215e66a-b021-410f-8e3a-aaa317f19609,
customEvt=null, allowMerge=true, exchangeFreeSwitch=false]
[2021-01-08T23:46:53.410Z][INFO][exchange-worker-#42][GridDhtPartitionsExchangeFuture]
Finished waiting for partition release future
[topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], waitTime=0ms,
futInfo=NA, mode=DISTRIBUTED]
[...][exchange-worker-#42][GridDhtPartitionsExchangeFuture]
Finished waiting for partitions release latch: ClientLatch
[coordinator=TcpDiscoveryNode [id=4ea8efe1-0f92-4232-bbab-2cb232ed04f9,
consistentId=3b21ae44-c9cc-4e04-9e79-9807d22a6d50, addrs=ArrayList
[10.12.3.76], sockAddrs=HashSet [/10.12.3.76:47500], discPort=47500,
order=1, intOrder=1, lastExchangeTime=1610149182437, loc=false,
ver=2.8.1#20200521-sha1:86422096, isClient=false], ackSent=true,
super=CompletableLatch [id=CompletableLatchUid [id=exchange,
topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0]]]]
[2021-01-08T23:46:53.412Z][INFO][exchange-worker-#42][GridDhtPartitionsExchangeFuture]
Finished waiting for partition release future
[topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], waitTime=0ms,
futInfo=NA, mode=LOCAL]
[2021-01-08T23:46:53.437Z][INFO][exchange-worker-#42][time] Finished
exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0],
crd=false]
[...][GridDhtPartitionsExchangeFuture]
Received full message, will finish exchange
[node=4ea8efe1-0f92-4232-bbab-2cb232ed04f9, resVer=AffinityTopologyVersion
[topVer=13, minorTopVer=0]]
[2021-01-08T23:46:53.587Z][INFO][sys-#98][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=13,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=13, minorTopVer=0],
err=null, rebalanced=true, wasRebalanced=true]
[2021-01-08T23:46:53.598Z][INFO][sys-#98][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=b909d450-11cd-471f-b49e-85c6f300f704,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=13, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=6215e66a-b021-410f-8e3a-aaa317f19609, consistentId=10.12.2.110:47501,
addrs=ArrayList [10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47501],
discPort=47501, order=13, intOrder=9, lastExchangeTime=1610149613043,
loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], done=true,
newCrdFut=null], topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0]]
[2021-01-08T23:46:53.602Z][INFO][sys-#98][GridDhtPartitionsExchangeFuture]
Exchange timings [startVer=AffinityTopologyVersion [topVer=13,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=13, minorTopVer=0],
stage="Waiting in exchange queue" (0 ms), stage="Exchange parameters
initialization" (0 ms), stage="Determine exchange type" (1 ms),
stage="Preloading notification" (0 ms), stage="WAL history reservation" (2
ms), stage="Wait partitions release" (0 ms), stage="Wait partitions release
latch" (1 ms), stage="Wait partitions release" (0 ms), stage="After states
restored callback" (0 ms), stage="Waiting for Full message" (102 ms),
stage="Affinity recalculation" (60 ms), stage="Full map updating" (11 ms),
stage="Detect lost partitions" (2 ms), stage="Exchange done" (9 ms),
stage="Total time" (188 ms)]
[...][GridDhtPartitionsExchangeFuture]
Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=13,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=13, minorTopVer=0],
stage="Affinity initialization (node join) [grp=CorpHeadend, crd=false]"
(33 ms) (parent=Affinity recalculation), stage="Affinity initialization
(node join) [grp=Devices, crd=false]" (27 ms) (parent=Affinity
recalculation), stage="Affinity initialization (node join)
[grp=MasterVersion, crd=false]" (22 ms) (parent=Affinity recalculation)]
[2021-01-08T23:46:53.614Z][INFO][exchange-worker-#42][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=13, minorTopVer=0], force=false, evt=NODE_JOINED,
node=6215e66a-b021-410f-8e3a-aaa317f19609]


Show data with primaries balanced in visor


+-----------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+----------------+
| Devices(@c2)          | PARTITIONED | 3     | 250000 (0 / 250000)            
| min: 80315 (0 / 80315)            | min: 0    | min: 0    | min: 0    |
min: 0         |
|                       |             |       |                                
| avg: 83333.33 (0.00 / 83333.33)   | avg: 0.00 | avg: 0.00 | avg: 0.00 |
avg: 166666.67 |
|                       |             |       |                                
| max: 86968 (0 / 86968)            | max: 0    | max: 0    | max: 0    |
max: 250000    |
+-----------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+----------------+

Starting visor shows the keys have all been rebalanced again.




ilya.kasnacheev

Re: incorrect partition map exchange behaviour

Hello!

Does it happen to work on 2.9.1, or does it fail there too? I recommend checking, since I vaguely remember some discussions about a late affinity assignment fix.

Regards,
--
Ilya Kasnacheev


сб, 9 янв. 2021 г. в 03:11, tschauenberg <[hidden email]>:
Here's my attempt to demonstrate and also provide logs

Standup 3 node cluster and load with data


Using a thick client, 250k devices are loaded into the device cache.  The
thick client then leaves.  There's one other thick client connected the
whole time for serving requests but I think that's irrelevant for the test
but want to point it out in case someone notices there's still a client
connected.

Show topology from logs of the client leaving:


[2021-01-08T23:08:05.012Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
Node left topology: TcpDiscoveryNode
[id=611e30ee-b7c6-4ead-a746-f609b206cfb4,
consistentId=611e30ee-b7c6-4ead-a746-f609b206cfb4, addrs=ArrayList
[127.0.0.1, 172.17.0.3], sockAddrs=HashSet [/127.0.0.1:0, /172.17.0.3:0],
discPort=0, order=7, intOrder=6, lastExchangeTime=1610146373751, loc=false,
ver=2.8.1#20200521-sha1:86422096, isClient=true]
[2021-01-08T23:08:05.013Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
Topology snapshot [ver=8, locNode=75e4ddea, servers=3, clients=1,
state=ACTIVE, CPUs=7, offheap=3.0GB, heap=3.1GB]

Start visor on one of the nodes


Show topology from logs


[2021-01-08T23:30:33.461Z][INFO][tcp-disco-msg-worker-[4ea8efe1
10.12.3.76:47500]-#2][TcpDiscoverySpi] New next node
[newNext=TcpDiscoveryNode [id=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7,
consistentId=10.12.2.110:47501, addrs=ArrayList [10.12.2.110],
sockAddrs=HashSet [/10.12.2.110:47501], discPort=47501, order=0, intOrder=7,
lastExchangeTime=1610148633458, loc=false, ver=2.8.1#20200521-sha1:86422096,
isClient=false]]
[2021-01-08T23:30:34.045Z][INFO][sys-#1011][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=9, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7, consistentId=10.12.2.110:47501,
addrs=ArrayList [10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47501],
discPort=47501, order=9, intOrder=7, lastExchangeTime=1610148633458,
loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], done=true,
newCrdFut=null], topVer=AffinityTopologyVersion [topVer=9, minorTopVer=0]]

Show data balanced in visor


+--------------+-------------+-------+---------------------+---------------------------------+-----------+-----------+-----------+----------------+
| Devices(@c2) | PARTITIONED | 3     | 250000 (0 / 250000) | min: 80315 (0 / 80315)          | min: 0    | min: 0    | min: 0    | min: 250000    |
|              |             |       |                     | avg: 83333.33 (0.00 / 83333.33) | avg: 0.00 | avg: 0.00 | avg: 0.00 | avg: 250000.00 |
|              |             |       |                     | max: 86968 (0 / 86968)          | max: 0    | max: 0    | max: 0    | max: 250000    |
+--------------+-------------+-------+---------------------+---------------------------------+-----------+-----------+-----------+----------------+

At this point the data is all relatively balanced, and the topology version
increased when visor connected.
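As a quick sanity check on the visor numbers (a sketch; the figures come straight from the table above, and the third node's count is only implied, not shown by visor):

```java
public class VisorSanityCheck {
    public static void main(String[] args) {
        int totalKeys = 250_000; // total Devices entries reported by visor
        int minNode = 80_315;    // smallest per-node primary count (min row)
        int maxNode = 86_968;    // largest per-node primary count (max row)
        int nodes = 3;

        // The third node's count is whatever remains of the total.
        int impliedThird = totalKeys - minNode - maxNode;
        System.out.println("implied third node count = " + impliedThird);

        // Ideal average primaries per node, matching visor's "avg" column.
        double avg = (double) totalKeys / nodes;
        System.out.printf("ideal avg = %.2f%n", avg);
    }
}
```

So all three nodes sit within a few thousand keys of the 83333.33 ideal, which is what "relatively balanced" means here.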

Stop ignite on one node


Show topology and PME from logs (from a different ignite node as the ignite
process was stopped)


[2021-01-08T23:35:39.333Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
Node left topology: TcpDiscoveryNode
[id=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c,
consistentId=3a4a497f-5a89-4f2c-8531-b2b05f2ede22, addrs=ArrayList
[10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47500], discPort=47500,
order=3, intOrder=3, lastExchangeTime=1610139164908, loc=false,
ver=2.8.1#20200521-sha1:86422096, isClient=false]
[2021-01-08T23:35:39.333Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
Topology snapshot [ver=10, locNode=4ea8efe1, servers=2, clients=1,
state=ACTIVE, CPUs=5, offheap=2.0GB, heap=2.1GB]
[2021-01-08T23:35:39.333Z][INFO][disco-event-worker-#40][GridDiscoveryManager] 
^-- Baseline [id=0, size=3, online=2, offline=1]
[2021-01-08T23:35:39.335Z][INFO][exchange-worker-#41][time] Started exchange
init [topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0], crd=true,
evt=NODE_LEFT, evtNode=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c, customEvt=null,
allowMerge=false, exchangeFreeSwitch=true]
[2021-01-08T23:35:39.338Z][INFO][sys-#1031][GridAffinityAssignmentCache]
Local node affinity assignment distribution is not ideal [cache=Households,
expectedPrimary=512.00, actualPrimary=548, expectedBackups=1024.00,
actualBackups=476, warningThreshold=50.00%]
[2021-01-08T23:35:39.340Z][INFO][sys-#1032][GridAffinityAssignmentCache]
Local node affinity assignment distribution is not ideal [cache=Devices,
expectedPrimary=512.00, actualPrimary=548, expectedBackups=1024.00,
actualBackups=476, warningThreshold=50.00%]
[2021-01-08T23:35:39.354Z][INFO][exchange-worker-#41][GridDhtPartitionsExchangeFuture]
Finished waiting for partition release future
[topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0], waitTime=0ms,
futInfo=NA, mode=DISTRIBUTED]
[2021-01-08T23:35:39.357Z][INFO][exchange-worker-#41][GridDhtPartitionsExchangeFuture]
Finished waiting for partitions release latch: ServerLatch [permits=0,
pendingAcks=HashSet [], super=CompletableLatch [id=CompletableLatchUid
[id=exchange, topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0]]]]
[2021-01-08T23:35:39.365Z][INFO][exchange-worker-#41][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=10,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=10, minorTopVer=0],
err=null, rebalanced=true, wasRebalanced=true]
[2021-01-08T23:35:39.370Z][INFO][exchange-worker-#41][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=4ea8efe1-0f92-4232-bbab-2cb232ed04f9,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=10, minorTopVer=0], evt=NODE_LEFT, evtNode=TcpDiscoveryNode
[id=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c,
consistentId=3a4a497f-5a89-4f2c-8531-b2b05f2ede22, addrs=ArrayList
[10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47500], discPort=47500,
order=3, intOrder=3, lastExchangeTime=1610139164908, loc=false,
ver=2.8.1#20200521-sha1:86422096, isClient=false], done=true,
newCrdFut=null], topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0]]
[2021-01-08T23:35:39.370Z][INFO][exchange-worker-#41][GridDhtPartitionsExchangeFuture]
Exchange timings [startVer=AffinityTopologyVersion [topVer=10,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=10, minorTopVer=0],
stage="Waiting in exchange queue" (0 ms), stage="Exchange parameters
initialization" (0 ms), stage="Determine exchange type" (15 ms),
stage="Preloading notification" (0 ms), stage="WAL history reservation" (2
ms), stage="Wait partitions release" (0 ms), stage="Wait partitions release
latch" (2 ms), stage="After states restored callback" (5 ms),
stage="Finalize update counters" (2 ms), stage="Detect lost partitions" (1
ms), stage="Exchange done" (3 ms), stage="Total time" (30 ms)]
[2021-01-08T23:35:39.370Z][INFO][exchange-worker-#41][GridDhtPartitionsExchangeFuture]
Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=10,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=10, minorTopVer=0],
stage="Affinity initialization (exchange-free switch on fully-rebalanced
topology) [grp=Households]" (15 ms) (parent=Determine exchange type),
stage="Affinity initialization (exchange-free switch on fully-rebalanced
topology) [grp=CorpHeadend]" (14 ms) (parent=Determine exchange type),
stage="Affinity initialization (exchange-free switch on fully-rebalanced
topology) [grp=AttributeOptions]" (13 ms) (parent=Determine exchange type)]
[2021-01-08T23:35:39.370Z][INFO][exchange-worker-#41][time] Finished
exchange init [topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0],
crd=true]
[2021-01-08T23:35:39.380Z][INFO][exchange-worker-#41][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=10, minorTopVer=0], force=false, evt=NODE_LEFT,
node=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c]
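For reference, the "distribution is not ideal" warnings in the log above line up with simple deviation arithmetic (a sketch; reading the 50% figure as a relative-deviation threshold is my assumption based on the warningThreshold field):

```java
public class AffinitySkew {
    // Percent deviation of the actual count from the ideal (expected) count.
    static double deviationPct(double expected, double actual) {
        return Math.abs(actual - expected) / expected * 100.0;
    }

    public static void main(String[] args) {
        // Figures from the NODE_LEFT exchange log: 2 remaining baseline nodes,
        // so expectedPrimary = 512 per node for a 1024-partition cache.
        double primarySkew = deviationPct(512.0, 548);  // ~7.0%
        double backupSkew  = deviationPct(1024.0, 476); // ~53.5%

        System.out.printf("primary skew = %.1f%%%n", primarySkew);
        System.out.printf("backup skew  = %.1f%%%n", backupSkew);
        // Only the backup skew exceeds warningThreshold=50.00%, which is
        // presumably what caused the warning to be printed.
    }
}
```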

Show data in visor with data on the two nodes


+--------------+-------------+-------+---------------------+-----------------------------------+-----------+-----------+-----------+----------------+
| Devices(@c2) | PARTITIONED | 2     | 250000 (0 / 250000) | min: 116261 (0 / 116261)          | min: 0    | min: 0    | min: 0    | min: 250000    |
|              |             |       |                     | avg: 125000.00 (0.00 / 125000.00) | avg: 0.00 | avg: 0.00 | avg: 0.00 | avg: 250000.00 |
|              |             |       |                     | max: 133739 (0 / 133739)          | max: 0    | max: 0    | max: 0    | max: 250000    |
+--------------+-------------+-------+---------------------+-----------------------------------+-----------+-----------+-----------+----------------+

At this point only 2 nodes hold data for this cache, and it was rebalanced
nicely when the ignite process was stopped.

Start the stopped ignite


Show topology and PME from logs (on node where ignite started)


[2021-01-08T23:39:42.817Z][INFO][exchange-worker-#42][GridDhtPartitionsExchangeFuture]
Skipped waiting for partitions release future (local node is joining)
[topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0]]
[2021-01-08T23:39:43.006Z][INFO][db-checkpoint-thread-#54][GridCacheDatabaseSharedManager]
Checkpoint started [checkpointId=07c7fa75-d1dd-4955-964e-7d61bddbf05b,
startPtr=FileWALPointer [idx=7, fileOff=39245141, len=118735],
checkpointBeforeLockTime=91ms, checkpointLockWait=0ms,
checkpointListenersExecuteTime=29ms, checkpointLockHoldTime=51ms,
walCpRecordFsyncDuration=8ms, writeCheckpointEntryDuration=5ms,
splitAndSortCpPagesDuration=14ms,  pages=2079, reason='node started']
[2021-01-08T23:39:43.163Z][INFO][grid-nio-worker-tcp-comm-0-#24][TcpCommunicationSpi]
Established outgoing communication connection [locAddr=/10.12.2.110:34662,
rmtAddr=/10.12.3.76:47100]
[2021-01-08T23:39:43.167Z][INFO][exchange-worker-#42][time] Finished
exchange init [topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0],
crd=false]
[2021-01-08T23:39:43.368Z][INFO][sys-#48][GridDhtPartitionsExchangeFuture]
Received full message, will finish exchange
[node=4ea8efe1-0f92-4232-bbab-2cb232ed04f9, resVer=AffinityTopologyVersion
[topVer=11, minorTopVer=0]]
[2021-01-08T23:39:43.494Z][INFO][sys-#48][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=11,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=11, minorTopVer=0],
err=null, rebalanced=false, wasRebalanced=false]
[2021-01-08T23:39:43.554Z][INFO][sys-#48][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=b909d450-11cd-471f-b49e-85c6f300f704,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=11, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=b909d450-11cd-471f-b49e-85c6f300f704,
consistentId=3a4a497f-5a89-4f2c-8531-b2b05f2ede22, addrs=ArrayList
[10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47500], discPort=47500,
order=11, intOrder=8, lastExchangeTime=1610149183525, loc=true,
ver=2.8.1#20200521-sha1:86422096, isClient=false], done=true,
newCrdFut=null], topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0]]
[2021-01-08T23:39:43.556Z][INFO][sys-#48][GridDhtPartitionsExchangeFuture]
Exchange timings [startVer=AffinityTopologyVersion [topVer=11,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=11, minorTopVer=0],
stage="Waiting in exchange queue" (53 ms), stage="Exchange parameters
initialization" (3 ms), stage="Components activation" (22 ms),
stage="Determine exchange type" (50 ms), stage="Preloading notification" (0
ms), stage="WAL history reservation" (5 ms), stage="Restore partition
states" (11 ms), stage="After states restored callback" (147 ms),
stage="Waiting for Full message" (392 ms), stage="Affinity recalculation"
(89 ms), stage="Full map updating" (36 ms), stage="Detect lost partitions"
(4 ms), stage="Exchange done" (57 ms), stage="Total time" (869 ms)]
[2021-01-08T23:39:43.558Z][INFO][sys-#48][GridDhtPartitionsExchangeFuture]
Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=11,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=11, minorTopVer=0],
stage="Affinity fetch" (48 ms) (parent=Determine exchange type),
stage="Restore partition states [grp=ignite-sys-cache]" (11 ms)
(parent=Restore partition states), stage="Restore partition states
[grp=MasterVersion]" (11 ms) (parent=Restore partition states),
stage="Restore partition states [grp=Devices]" (11 ms) (parent=Restore
partition states), stage="Affinity initialization (local join)
[grp=Devices]" (89 ms) (parent=Affinity recalculation), stage="Affinity
initialization (local join) [grp=Households]" (88 ms) (parent=Affinity
recalculation), stage="Affinity initialization (local join)
[grp=MasterVersion]" (67 ms) (parent=Affinity recalculation)]
[2021-01-08T23:39:43.588Z][INFO][exchange-worker-#42][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=11, minorTopVer=0], force=false, evt=NODE_JOINED,
node=b909d450-11cd-471f-b49e-85c6f300f704]
[2021-01-08T23:39:43.642Z][INFO][main][GridDiscoveryManager] Topology
snapshot [ver=11, locNode=b909d450, servers=3, clients=1, state=ACTIVE,
CPUs=7, offheap=3.0GB, heap=3.1GB]
[2021-01-08T23:39:43.643Z][INFO][main][GridDiscoveryManager]   ^-- Baseline
[id=0, size=3, online=3, offline=0]
[2021-01-08T23:39:43.645Z][INFO][main][G] Node started : [stage="Configure
system pool" (65 ms),stage="Start managers" (1113 ms),stage="Configure
binary metadata" (102 ms),stage="Start processors" (744 ms),stage="Init
metastore" (899 ms),stage="Init and start regions" (10 ms),stage="Restore
binary memory" (989 ms),stage="Restore logical state" (2450
ms),stage="Finish recovery" (112 ms),stage="Join topology" (1498
ms),stage="Await transition" (19 ms),stage="Await exchange" (951
ms),stage="Total time" (8952 ms)]
[2021-01-08T23:39:44.550Z][INFO][db-checkpoint-thread-#54][GridCacheDatabaseSharedManager]
Checkpoint finished [cpId=07c7fa75-d1dd-4955-964e-7d61bddbf05b, pages=2079,
markPos=FileWALPointer [idx=7, fileOff=39245141, len=118735],
walSegmentsCleared=0, walSegmentsCovered=[], markDuration=83ms,
pagesWrite=184ms, fsync=1359ms, total=1717ms]


Show data with a min of zero primaries in visor


+--------------+-------------+-------+---------------------+---------------------------------+-----------+-----------+-----------+----------------+
| Devices(@c2) | PARTITIONED | 3     | 250000 (0 / 250000) | min: 0 (0 / 0)                  | min: 0    | min: 0    | min: 0    | min: 0         |
|              |             |       |                     | avg: 83333.33 (0.00 / 83333.33) | avg: 0.00 | avg: 0.00 | avg: 0.00 | avg: 166666.67 |
|              |             |       |                     | max: 133739 (0 / 133739)        | max: 0    | max: 0    | max: 0    | max: 250000    |
+--------------+-------------+-------+---------------------+---------------------------------+-----------+-----------+-----------+----------------+

At this point the stopped ignite process has been restarted, but notice that
no primaries were rebalanced to it upon joining.

Stop visor


Show topology and PME from logs


[2021-01-08T23:44:07.824Z][INFO][grid-nio-worker-tcp-comm-1-#25][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.12.2.110:47100,
rmtAddr=/10.12.2.110:37510]
[2021-01-08T23:44:35.597Z][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi]
TCP discovery accepted incoming connection [rmtAddr=/10.12.4.242,
rmtPort=42615]
[2021-01-08T23:44:35.598Z][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi]
TCP discovery spawning a new thread for connection [rmtAddr=/10.12.4.242,
rmtPort=42615]
[2021-01-08T23:44:35.598Z][INFO][tcp-disco-sock-reader-[]-#7][TcpDiscoverySpi]
Started serving remote node connection [rmtAddr=/10.12.4.242:42615,
rmtPort=42615]
[2021-01-08T23:44:35.599Z][INFO][tcp-disco-sock-reader-[63d7cf0d
10.12.4.242:42615]-#7][TcpDiscoverySpi] Initialized connection with remote
server node [nodeId=63d7cf0d-f8b6-4fe2-ba99-4bbd26731451,
rmtAddr=/10.12.4.242:42615]
[2021-01-08T23:44:35.621Z][INFO][exchange-worker-#42][time] Started exchange
init [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0], crd=false,
evt=NODE_LEFT, evtNode=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7, customEvt=null,
allowMerge=true, exchangeFreeSwitch=false]
[2021-01-08T23:44:35.632Z][INFO][exchange-worker-#42][GridDhtPartitionsExchangeFuture]
Finished waiting for partition release future
[topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0], waitTime=0ms,
futInfo=NA, mode=DISTRIBUTED]
[2021-01-08T23:44:35.644Z][INFO][exchange-worker-#42][GridDhtPartitionsExchangeFuture]
Finished waiting for partitions release latch: ClientLatch
[coordinator=TcpDiscoveryNode [id=4ea8efe1-0f92-4232-bbab-2cb232ed04f9,
consistentId=3b21ae44-c9cc-4e04-9e79-9807d22a6d50, addrs=ArrayList
[10.12.3.76], sockAddrs=HashSet [/10.12.3.76:47500], discPort=47500,
order=1, intOrder=1, lastExchangeTime=1610149182437, loc=false,
ver=2.8.1#20200521-sha1:86422096, isClient=false], ackSent=true,
super=CompletableLatch [id=CompletableLatchUid [id=exchange,
topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0]]]]
[2021-01-08T23:44:35.644Z][INFO][exchange-worker-#42][GridDhtPartitionsExchangeFuture]
Finished waiting for partition release future
[topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0], waitTime=0ms,
futInfo=NA, mode=LOCAL]
[2021-01-08T23:44:35.688Z][INFO][exchange-worker-#42][time] Finished
exchange init [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0],
crd=false]
[2021-01-08T23:44:35.704Z][INFO][tcp-disco-sock-reader-[1cca94e3
10.12.2.110:58161]-#5][TcpDiscoverySpi] Finished serving remote node
connection [rmtAddr=/10.12.2.110:58161, rmtPort=58161]
[2021-01-08T23:44:35.761Z][INFO][sys-#80][GridDhtPartitionsExchangeFuture]
Received full message, will finish exchange
[node=4ea8efe1-0f92-4232-bbab-2cb232ed04f9, resVer=AffinityTopologyVersion
[topVer=12, minorTopVer=0]]
[2021-01-08T23:44:35.802Z][INFO][sys-#80][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=12,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=12, minorTopVer=0],
err=null, rebalanced=true, wasRebalanced=false]
[2021-01-08T23:44:35.820Z][INFO][sys-#80][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=b909d450-11cd-471f-b49e-85c6f300f704,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=12, minorTopVer=0], evt=NODE_LEFT, evtNode=TcpDiscoveryNode
[id=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7, consistentId=10.12.2.110:47501,
addrs=ArrayList [10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47501],
discPort=47501, order=9, intOrder=7, lastExchangeTime=1610149182447,
loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], done=true,
newCrdFut=null], topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0]]
[2021-01-08T23:44:35.820Z][INFO][sys-#80][GridDhtPartitionsExchangeFuture]
Exchange timings [startVer=AffinityTopologyVersion [topVer=12,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=12, minorTopVer=0],
stage="Waiting in exchange queue" (0 ms), stage="Exchange parameters
initialization" (0 ms), stage="Determine exchange type" (0 ms),
stage="Preloading notification" (0 ms), stage="WAL history reservation" (2
ms), stage="Wait partitions release" (13 ms), stage="Wait partitions release
latch" (7 ms), stage="Wait partitions release" (0 ms), stage="After states
restored callback" (0 ms), stage="Finalize update counters" (19 ms),
stage="Waiting for Full message" (96 ms), stage="Affinity recalculation" (31
ms), stage="Full map updating" (9 ms), stage="Detect lost partitions" (2
ms), stage="Exchange done" (15 ms), stage="Total time" (194 ms)]
[2021-01-08T23:44:35.820Z][INFO][sys-#80][GridDhtPartitionsExchangeFuture]
Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=12,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=12, minorTopVer=0],
stage="Affinity applying from full message [grp=AttributeOptions]" (9 ms)
(parent=Affinity recalculation), stage="Affinity applying from full message
[grp=CorpHeadend]" (8 ms) (parent=Affinity recalculation), stage="Affinity
applying from full message [grp=Devices]" (6 ms) (parent=Affinity
recalculation)]
[2021-01-08T23:44:35.824Z][INFO][exchange-worker-#42][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=12, minorTopVer=0], force=false, evt=NODE_LEFT,
node=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7]

Visor leaving the cluster now is what triggers the PME that rebalances the
primary keys.  I'll start visor again to show that.

Start visor again


Show topology and PME from logs


[2021-01-08T23:46:52.977Z][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi]
TCP discovery accepted incoming connection [rmtAddr=/10.12.2.110,
rmtPort=42455]
[2021-01-08T23:46:52.977Z][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi]
TCP discovery spawning a new thread for connection [rmtAddr=/10.12.2.110,
rmtPort=42455]
[2021-01-08T23:46:52.982Z][INFO][tcp-disco-sock-reader-[]-#8][TcpDiscoverySpi]
Started serving remote node connection [rmtAddr=/10.12.2.110:42455,
rmtPort=42455]
[2021-01-08T23:46:52.984Z][INFO][tcp-disco-sock-reader-[6215e66a
10.12.2.110:42455]-#8][TcpDiscoverySpi] Initialized connection with remote
server node [nodeId=6215e66a-b021-410f-8e3a-aaa317f19609,
rmtAddr=/10.12.2.110:42455]
[2021-01-08T23:46:53.041Z][INFO][tcp-disco-sock-reader-[6215e66a
10.12.2.110:42455]-#8][TcpDiscoverySpi] Finished serving remote node
connection [rmtAddr=/10.12.2.110:42455, rmtPort=42455]
[2021-01-08T23:46:53.049Z][INFO][tcp-disco-msg-worker-[4ea8efe1
10.12.3.76:47500]-#2][TcpDiscoverySpi] New next node
[newNext=TcpDiscoveryNode [id=6215e66a-b021-410f-8e3a-aaa317f19609,
consistentId=10.12.2.110:47501, addrs=ArrayList [10.12.2.110],
sockAddrs=HashSet [/10.12.2.110:47501], discPort=47501, order=0, intOrder=9,
lastExchangeTime=1610149613043, loc=false, ver=2.8.1#20200521-sha1:86422096,
isClient=false]]
[2021-01-08T23:46:53.405Z][INFO][exchange-worker-#42][time] Started exchange
init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], crd=false,
evt=NODE_JOINED, evtNode=6215e66a-b021-410f-8e3a-aaa317f19609,
customEvt=null, allowMerge=true, exchangeFreeSwitch=false]
[2021-01-08T23:46:53.410Z][INFO][exchange-worker-#42][GridDhtPartitionsExchangeFuture]
Finished waiting for partition release future
[topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], waitTime=0ms,
futInfo=NA, mode=DISTRIBUTED]
[…][INFO][exchange-worker-#42][GridDhtPartitionsExchangeFuture]
Finished waiting for partitions release latch: ClientLatch
[coordinator=TcpDiscoveryNode [id=4ea8efe1-0f92-4232-bbab-2cb232ed04f9,
consistentId=3b21ae44-c9cc-4e04-9e79-9807d22a6d50, addrs=ArrayList
[10.12.3.76], sockAddrs=HashSet [/10.12.3.76:47500], discPort=47500,
order=1, intOrder=1, lastExchangeTime=1610149182437, loc=false,
ver=2.8.1#20200521-sha1:86422096, isClient=false], ackSent=true,
super=CompletableLatch [id=CompletableLatchUid [id=exchange,
topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0]]]]
[2021-01-08T23:46:53.412Z][INFO][exchange-worker-#42][GridDhtPartitionsExchangeFuture]
Finished waiting for partition release future
[topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], waitTime=0ms,
futInfo=NA, mode=LOCAL]
[2021-01-08T23:46:53.437Z][INFO][exchange-worker-#42][time] Finished
exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0],
crd=false]
[…][INFO][sys-#98][GridDhtPartitionsExchangeFuture]
Received full message, will finish exchange
[node=4ea8efe1-0f92-4232-bbab-2cb232ed04f9, resVer=AffinityTopologyVersion
[topVer=13, minorTopVer=0]]
[2021-01-08T23:46:53.587Z][INFO][sys-#98][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=13,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=13, minorTopVer=0],
err=null, rebalanced=true, wasRebalanced=true]
[2021-01-08T23:46:53.598Z][INFO][sys-#98][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=b909d450-11cd-471f-b49e-85c6f300f704,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=13, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=6215e66a-b021-410f-8e3a-aaa317f19609, consistentId=10.12.2.110:47501,
addrs=ArrayList [10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47501],
discPort=47501, order=13, intOrder=9, lastExchangeTime=1610149613043,
loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], done=true,
newCrdFut=null], topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0]]
[2021-01-08T23:46:53.602Z][INFO][sys-#98][GridDhtPartitionsExchangeFuture]
Exchange timings [startVer=AffinityTopologyVersion [topVer=13,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=13, minorTopVer=0],
stage="Waiting in exchange queue" (0 ms), stage="Exchange parameters
initialization" (0 ms), stage="Determine exchange type" (1 ms),
stage="Preloading notification" (0 ms), stage="WAL history reservation" (2
ms), stage="Wait partitions release" (0 ms), stage="Wait partitions release
latch" (1 ms), stage="Wait partitions release" (0 ms), stage="After states
restored callback" (0 ms), stage="Waiting for Full message" (102 ms),
stage="Affinity recalculation" (60 ms), stage="Full map updating" (11 ms),
stage="Detect lost partitions" (2 ms), stage="Exchange done" (9 ms),
stage="Total time" (188 ms)]
[…][INFO][sys-#98][GridDhtPartitionsExchangeFuture]
Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=13,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=13, minorTopVer=0],
stage="Affinity
initialization (node join) [grp=CorpHeadend, crd=false]" (33 ms)
(parent=Affinity recalculation), stage="Affinity initialization (node join)
[grp=Devices, crd=false]" (27 ms) (parent=Affinity recalculation),
stage="Affinity initialization (node join) [grp=MasterVersion, crd=false]"
(22 ms) (parent=Affinity recalculation)]
[2021-01-08T23:46:53.614Z][INFO][exchange-worker-#42][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=13, minorTopVer=0], force=false, evt=NODE_JOINED,
node=6215e66a-b021-410f-8e3a-aaa317f19609]


Show data with primaries balanced in visor


+--------------+-------------+-------+---------------------+---------------------------------+-----------+-----------+-----------+----------------+
| Devices(@c2) | PARTITIONED | 3     | 250000 (0 / 250000) | min: 80315 (0 / 80315)          | min: 0    | min: 0    | min: 0    | min: 0         |
|              |             |       |                     | avg: 83333.33 (0.00 / 83333.33) | avg: 0.00 | avg: 0.00 | avg: 0.00 | avg: 166666.67 |
|              |             |       |                     | max: 86968 (0 / 86968)          | max: 0    | max: 0    | max: 0    | max: 250000    |
+--------------+-------------+-------+---------------------+---------------------------------+-----------+-----------+-----------+----------------+

Starting visor shows the keys have all been rebalanced again.




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Alexandr Shapkin

RE: incorrect partition map exchange behaviour

In reply to this post by tschauenberg

Hi,

As the PME implementation-details page you pointed to explains, PME is the
process of exchanging information about partition holders.  It happens on
every topology change, cluster deactivation, etc.  The process itself is not
about data rebalancing; it determines which node should store a particular
partition.

If you want to check whether a data rebalance actually happened, look for a
log message like:

[2020-01-15 15:46:57,042][INFO ][sys-#50][GridDhtPartitionDemander] Starting rebalance routine [ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], supplier=9e88a103-4465-4e5b-865f-4edaa909fee1, fullPartitions=[0-99], histPartitions=[]]

It also depends on whether your cluster is under load during the rolling
restart: if no updates are happening, then no data rebalance should be
needed either.

I'm not quite sure about the metric and visor.  In any case, you can perform
the checks explicitly from code:

ignite.cache("myCache").localSize(CachePeekMode.BACKUP);
ignite.cache("myCache").localSize(CachePeekMode.PRIMARY);
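To see the per-node split across the whole cluster at once, the two localSize calls can be broadcast to every node (a minimal sketch, not a definitive implementation: the cache name "myCache" is a placeholder, and this of course needs a running cluster):

```java
import java.util.Collection;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CachePeekMode;

public class PrimaryBackupReport {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start(); // or obtain your existing instance

        // Runs on every node; each node reports its own local counts.
        Collection<String> report = ignite.compute().broadcast(() -> {
            Ignite local = Ignition.localIgnite();
            IgniteCache<?, ?> cache = local.cache("myCache");
            return local.cluster().localNode().consistentId()
                + ": primary=" + cache.localSize(CachePeekMode.PRIMARY)
                + ", backup=" + cache.localSize(CachePeekMode.BACKUP);
        });

        report.forEach(System.out::println);
    }
}
```

A node that owns no primaries, as described in this thread after a rolling restart, would show primary=0 with a non-zero backup count.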


From: [hidden email]
Sent: Friday, January 8, 2021 3:59 AM
To: [hidden email]
Subject: incorrect partition map exchange behaviour

Hi,

We have a cluster of Ignite 2.8.1 server nodes and have recently started
looking at the individual cache metrics for primary keys
org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl.OffHeapPrimaryEntriesCount

In our configuration we have a replicated cache with 2 backups.  Our cluster
has 3 nodes in it so the primaries should be spread equally on the 3 nodes
and each node has backups from the other two nodes.  All these server nodes
are in the baseline.  Additionally we have some thick clients connected but
I don't think they are relevant to the discussion.

Whenever we do a rolling restart one node at a time, at the end after the
last node is restarted it always owns zero primaries and owns solely
backups.  The two nodes restarted earlier during the rolling restart own all
the primaries.

When our cluster is in this scenario, if we start and stop visor, when visor
leaves the cluster it triggers a PME where all keys get balanced on all
server nodes.  Looking at the visor cache stats between the start and stop
we can see a min of 0 keys on the nodes for our cache so visor and the jmx
metrics line up on that front.  After stopping visor, the jmx metrics show
the evenly distributed primaries and then starting visor a second time we
can confirm that again the min, average, max node keys are all evenly
distributed.

Every join and leave during the rolling restart and during visor start/stop
reflects a topology increment and node leave and join events in the logs.

According to
https://cwiki.apache.org/confluence/display/IGNITE/%2528Partition+Map%2529+Exchange+-+under+the+hood
each leave and join should trigger the PME but we only see the keys changing
on the leaves.

Additionally, we tried waiting longer between the stop and start part of the
rolling restart to see if that had any effect.  We ensured we waited long
enough for a PME to do any moving but waiting longer didn't have any effect.
The stop always has the PME move the keys off that node and the start never
sees the PME move any primaries back.

Why are we only seeing the PME change keys when nodes (server or visor) stop
and never when they join?

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Alex Shapkin
tschauenberg

Re: incorrect partition map exchange behaviour

In reply to this post by ilya.kasnacheev
Haven't tested on 2.9.1 as we don't have that database provisioned, and
sadly won't for a while.  When we do, though, I will update.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
tschauenberg

RE: incorrect partition map exchange behaviour

In reply to this post by Alexandr Shapkin
Sorry about mixing the terminology.  My post was meant to be about the PME
and the primary keys.

So the summary of my post, and what it was trying to show, is that the PME
was only happening on cluster node leaves (server or visor) but not on
cluster node joins (at least with previously joined nodes; I haven't tested
joining a brand new node for the first time, such as expanding the cluster
from 3 nodes to 4 nodes).

The PME doc suggests the PME should happen on joins too, but the logs and
visor stats show that it isn't; it only happens on the leaves.

So what I am trying to identify is:
* is this a known bug and if so, which versions is this fixed in?
* what is the impact of the database state where one node has no designated
primaries?  
** This probably effectively reduces the get/put performance to n-1 nodes?
** Also, for things like compute tasks that operate on local data such as
those using SqlFieldsQuery.setLocal(true) the node with no primaries will do
nothing?
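Whether a given node currently owns any primaries can also be checked directly through the affinity API (a sketch under the same assumptions as above: the cache name "Devices" comes from the visor output in this thread, and this needs a running cluster):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;

public class PrimaryOwnershipCheck {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start(); // or your existing instance

        Affinity<Object> aff = ignite.affinity("Devices");
        for (ClusterNode node : ignite.cluster().forServers().nodes()) {
            // primaryPartitions() is empty for a node that owns no primaries,
            // which is the state described above after the rolling restart.
            int primaries = aff.primaryPartitions(node).length;
            System.out.println(node.consistentId() + ": "
                + primaries + " primary partitions");
        }
    }
}
```

If a node reports 0 primary partitions, a setLocal(true) query routed to it over primary data would indeed scan nothing.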



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev

Re: incorrect partition map exchange behaviour

Hello!

I have just looked at this issue today, and the relevant fix seems to be https://issues.apache.org/jira/browse/IGNITE-11147

Regards,
--
Ilya Kasnacheev

