AMQ Broker / ENTMQBR-3356

AMQP consumption stalls during high message throughput


Details

    • Previously, if a broker in a live-backup group configured to use the replication high-availability (HA) policy experienced a very high throughput of AMQP messages (that is, a rate of 10,000 or more messages per second), message consumption might stall. In addition, the broker and clients might display errors such as "Could not complete operations" (broker) and "Remote did not respond to a drain request in time" (client). This issue is now resolved.
    • Documented as Resolved Issue
    • ARTEMIS-2659, ARTEMIS-2673
    • Verified in a release

    Description

      This problem occurs when AMQP clients publish messages to, and consume messages from, the broker at rates in excess of 10,000 messages per second. It has been seen with the Qpid JMS and .NET AMQP clients, but never with the Artemis Core JMS or OpenWire clients. We therefore think that the problem is related to the broker's AMQP support.

      A variety of failure modes can be reproduced using a test case that publishes from twenty concurrent threads and consumes from twenty concurrent threads, with the consumers arranged into shared consumer groups of five (see the sketch below). It is not clear at this stage whether the subscription mode is significant. In addition, although the problem has so far only been seen in a master-slave, shared-nothing replication set-up, it is not clear whether the topology is relevant either.
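
      The following is a minimal sketch of such a reproducer, not the actual test case: the class name, broker URL (amqp://localhost:5672), topic name (example.topic), subscription names, and payload are illustrative assumptions. It uses the Qpid JMS client with JMS 2.0 shared consumers.

      import javax.jms.*;
      import org.apache.qpid.jms.JmsConnectionFactory;

      public class AmqpStallReproducer {

          public static void main(String[] args) {
              ConnectionFactory factory = new JmsConnectionFactory("amqp://localhost:5672");

              // Twenty consumer threads, arranged as four shared subscriptions
              // of five consumers each.
              for (int i = 0; i < 20; i++) {
                  final int group = i / 5;
                  new Thread(() -> {
                      try (JMSContext ctx = factory.createContext()) {
                          Topic topic = ctx.createTopic("example.topic");
                          JMSConsumer consumer = ctx.createSharedConsumer(topic, "group-" + group);
                          while (true) {
                              consumer.receive(); // consumption stalls here when the bug triggers
                          }
                      }
                  }).start();
              }

              // Twenty publisher threads, each sending as fast as possible so that
              // the aggregate rate exceeds 10,000 messages per second.
              for (int i = 0; i < 20; i++) {
                  new Thread(() -> {
                      try (JMSContext ctx = factory.createContext()) {
                          Topic topic = ctx.createTopic("example.topic");
                          JMSProducer producer = ctx.createProducer();
                          while (true) {
                              producer.send(topic, "payload");
                          }
                      }
                  }).start();
              }
          }
      }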

      All the failure modes have in common that message consumption stops, even though new messages are still being published. There may or may not be exceptions in the broker or the client.

      In some cases we see this message in the broker:

      2020-03-10 00:15:33,115 WARN  [org.apache.activemq.artemis.core.server] AMQ222024: Could not complete operations on IO context OperationContextImpl [742496659] [minimalStore=0, storeLineUp=0, stored=0, minimalReplicated=1, replicationLineUp=1, replicated=0, paged=0, minimalPage=0, pageLineUp=0, errorCode=-1, errorMessage=null, executorsPending=0, executor=OrderedExecutor(tasks=[])]
      Task = TaskHolder [storeLined=0, replicationLined=1, pageLined=0, task=IOCallback(PageSubscriptionImpl) ]
      Task = TaskHolder [storeLined=0, replicationLined=1, pageLined=0, task=IOCallback(PageSubscriptionImpl) ]
      Task = TaskHolder [storeLined=0, replicationLined=1, pageLined=0, task=IOCallback(PageSubscriptionImpl) ]
      Task = TaskHolder [storeLined=0, replicationLined=1, pageLined=0, task=IOCallback(PageSubscriptionImpl) ]
      Task = TaskHolder [storeLined=0, replicationLined=1, pageLined=0, task=IOCallback(PageSubscriptionImpl) ]
      Task = TaskHolder [storeLined=0, replicationLined=1, pageLined=0, task=org.apache.activemq.artemis.core.journal.impl.SimpleWaitIOCallback]
      

      In some cases we see this exception in the client:

      Caused by: org.apache.qpid.jms.provider.exceptions.ProviderOperationTimedOutException: Remote did not respond to a drain request in time
              at org.apache.qpid.jms.provider.amqp.AmqpConsumer.lambda$stop$1(AmqpConsumer.java:179) ~[qpid-jms-client-0.48.0.jar!/:na]
              at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[netty-common-4.1.39.Final.jar!/:4.1.39.Final]
              at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127) ~[netty-common-4.1.39.Final.jar!/:4.1.39.Final]
              at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[netty-common-4.1.39.Final.jar!/:4.1.39.Final]
              at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416) ~[netty-common-4.1.39.Final.jar!/:4.1.39.Final]
              at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:331) ~[netty-transport-native-epoll-4.1.39.Final-linux-x86_64.jar!/:4.1.39.Final]
              at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) ~[netty-common-4.1.39.Final.jar!/:4.1.39.Final]
              at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.39.Final.jar!/:4.1.39.Final]
      
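      The drain request named in this exception is the AMQP flow-control step the Qpid JMS client performs when stopping a consumer: it asks the broker to either deliver or relinquish the consumer's outstanding link credit, and waits a bounded time for the response. As a minimal sketch (the URI and timeout value below are illustrative; amqp.drainTimeout is the standard Qpid JMS URI option, defaulting to 60000 ms), the wait can be lengthened on the connection URI, although here that only delays the error, since the stalled broker never satisfies the drain:

      import javax.jms.ConnectionFactory;
      import org.apache.qpid.jms.JmsConnectionFactory;

      public class DrainTimeoutExample {
          public static void main(String[] args) {
              // amqp.drainTimeout (milliseconds) controls how long the client waits
              // for the broker to answer a drain request before failing the consumer.
              ConnectionFactory factory = new JmsConnectionFactory(
                      "amqp://localhost:5672?amqp.drainTimeout=120000");
          }
      }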

    People

      Assignee: Clebert Suconic (csuconic@redhat.com)
      Reporter: Kevin Boone (rhn-support-kboone)
      Tester: Tiago Bueno
      Votes: 0
      Watchers: 6
