
Type: Bug
Resolution: Obsolete
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None

Steps to Reproduce:

0) Modify the karaf script and add:

-Dzookeeper.timeout=3000

1) Start Fuse.
2) Run the command:

fabric:create --wait-for-provisioning

3) Unpack the attached test reconnect-test.zip and run it with mvn clean install.


The client takes more than 1 minute to reconnect. In the log below, the transport fails at 12:35:41 and the client reconnects at 12:39:33:

    !!!Stopping container0
    !!!!Killing container broker-nossl-c0
    16-12-2014 12:35:41,671 | INFO | [SSHClient] | executeCommand - Command: shell:exec kill -9 6528
    Delay = 27ms
    !!!Received 1 messages, number = 0
    16-12-2014 12:35:41,755 | INFO | [SSHClient] | executeCommand - Response:
    16-12-2014 12:35:41,755 | INFO | [Container] | killContainerAtLocalHost - Result of killing master process :
    16-12-2014 12:35:41,758 | INFO | [SSHClient] | executeCommand - Command: shell:exec kill -9 6528
    16-12-2014 12:35:41,773 | WARN | [FailoverTransport] | handleTransportFailure - Transport (tcp://dhcp-10-40-3-26.brq.redhat.com/10.40.3.26:54789@41760) failed, attempting to automatically reconnect
    java.io.EOFException
            at java.io.DataInputStream.readInt(DataInputStream.java:392)
            at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:258)
            at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:221)
            at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:213)
            at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
            at java.lang.Thread.run(Thread.java:744)
    16-12-2014 12:35:41,777 | WARN | [FailoverTransport] | handleTransportFailure - Transport (tcp://dhcp-10-40-3-26.brq.redhat.com/10.40.3.26:54789@41762) failed, attempting to automatically reconnect
    java.io.EOFException
            at java.io.DataInputStream.readInt(DataInputStream.java:392)
            at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:258)
            at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:221)
            at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:213)
            at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
            at java.lang.Thread.run(Thread.java:744)
    16-12-2014 12:35:41,847 | INFO | [SSHClient] | executeCommand - Response:
    16-12-2014 12:35:41,848 | INFO | [Container] | killContainerAtLocalHost - Result of killing master process :
    !!!!Waiting for empty process broker-nossl-c0
     
    ...logs.... Containers restarted.....
     
    16-12-2014 12:35:57,141 | WARN | [FailoverTransport] | doReconnect - Failed to connect to [tcp://dhcp-10-40-3-26.brq.redhat.com:54789] after: 10 attempt(s) continuing to retry.
    16-12-2014 12:35:57,142 | WARN | [FailoverTransport] | doReconnect - Failed to connect to [tcp://dhcp-10-40-3-26.brq.redhat.com:54789] after: 10 attempt(s) continuing to retry.

    ...logs.... Containers restarted.....

    16-12-2014 12:39:33,918 | INFO | [FailoverTransport] | doReconnect - Successfully reconnected to tcp://dhcp-10-40-3-26.brq.redhat.com:58553
    16-12-2014 12:39:33,920 | INFO | [FailoverTransport] | doReconnect - Successfully reconnected to tcp://dhcp-10-40-3-26.brq.redhat.com:58553
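
To put a number on the reconnection time, the gap between the first transport failure (12:35:41,773) and the first successful reconnect (12:39:33,918) can be computed from the log timestamps. The following is a minimal sketch, not part of the attached test; the class name ReconnectGap and its helper method are hypothetical, and the timestamp pattern is inferred from the log lines above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ReconnectGap {
    // Timestamp layout inferred from the test log, e.g. "16-12-2014 12:35:41,773".
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss,SSS");

    // Returns the elapsed time between two log timestamps.
    static Duration between(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, LOG_TIME),
                                LocalDateTime.parse(to, LOG_TIME));
    }

    public static void main(String[] args) {
        // First "Transport ... failed" line and first "Successfully reconnected" line.
        Duration gap = between("16-12-2014 12:35:41,773", "16-12-2014 12:39:33,918");
        System.out.println(gap.toMillis() + " ms");
    }
}
```

For the two timestamps above this gives 232145 ms, that is about 3 minutes 52 seconds.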

zookeeper.timeout=3000 is set for both Fuse and the client.
Three containers with replicated LevelDB are created in the fabric:

mq-create --no-ssl --parent-profile=mq-replicated --group a broker-nossl
container-create-child --profile mq-broker-a.broker-nossl root broker-nossl-c0
container-create-child --profile mq-broker-a.broker-nossl root broker-nossl-c1
container-create-child --profile mq-broker-a.broker-nossl root broker-nossl-c2

The test has 3 threads: one restarts the AMQ containers one by one, another sends persistent messages, and the third consumes those messages.
Both the sending and the consuming thread use the following connection URL:

discovery:(fabric://a?useExponentialBackoff=false&trace=true&nested.maxReconnectAttempts=5&nested.initialReconnectDelay=1000)
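
To make the parts of this URL easier to see and to vary between test runs, it can be assembled from the group name and an ordered option map. This is a minimal sketch using plain string handling only; the class DiscoveryUrl and its build method are hypothetical helpers, not part of ActiveMQ, Fabric, or the attached test.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class DiscoveryUrl {
    // Builds "discovery:(fabric://<group>?<k1>=<v1>&<k2>=<v2>...)" in map iteration order.
    static String build(String group, Map<String, String> options) {
        StringJoiner query = new StringJoiner("&");
        options.forEach((key, value) -> query.add(key + "=" + value));
        String q = query.toString();
        return "discovery:(fabric://" + group + (q.isEmpty() ? "" : "?" + q) + ")";
    }

    public static void main(String[] args) {
        // Options copied from the URL used by the test; LinkedHashMap keeps their order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("useExponentialBackoff", "false");
        options.put("trace", "true");
        options.put("nested.maxReconnectAttempts", "5");
        options.put("nested.initialReconnectDelay", "1000");
        System.out.println(build("a", options));
    }
}
```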

Sometimes messages are left in the DLQ and in the test queue after the test run. The messages left in the queue cannot be browsed, while those in the DLQ can.
For example, after the last test run there was 1 message in ActiveMQ.DLQ and 3 messages in the test queue.
1 message was not received.

The message from ActiveMQ.DLQ was as follows:

ActiveMQTextMessage {commandId = 628, responseRequired = true, messageId = ID:dhcp-4-224.brq.redhat.com-55932-1418827020048-1:1:1:1:624, originalDestination = null, originalTransactionId = null, producerId = ID:dhcp-4-224.brq.redhat.com-55932-1418827020048-1:1:1:1, destination = queue://continuousCommonSessionRestartWithBrokersKillTest, transactionId = null, expiration = 0, timestamp = 1418827248606, arrival = 0, brokerInTime = 1418827248606, brokerOutTime = 1418903486296, correlationId = null, replyTo = null, persistent = true, type = null, priority = 4, groupID = null, groupSequence = 0, targetConsumerId = null, compressed = false, userID = null, content = org.apache.activemq.util.ByteSequence@11f55379, marshalledProperties = org.apache.activemq.util.ByteSequence@13b3625, dataStructure = null, redeliveryCounter = 0, size = 0, properties = {timeProperty=1418827248606}, readOnlyProperties = true, readOnlyBody = true, droppable = false, jmsXGroupFirstForConsumer = false, text = 623_Some text}
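
The brokerInTime and brokerOutTime values in this dump can be decoded to see how long the message sat in the broker before it left for the DLQ. The sketch below assumes both fields are milliseconds since the Unix epoch; the class name DlqTimestamps and its helper are hypothetical and only illustrate the arithmetic.

```java
import java.time.Duration;
import java.time.Instant;

public class DlqTimestamps {
    // Time between the broker accepting the message and dispatching it, in epoch millis.
    static Duration timeInBroker(long brokerInTime, long brokerOutTime) {
        return Duration.ofMillis(brokerOutTime - brokerInTime);
    }

    public static void main(String[] args) {
        long brokerInTime = 1418827248606L;   // value from the DLQ message dump
        long brokerOutTime = 1418903486296L;  // value from the DLQ message dump

        // Instant prints the timestamps in UTC.
        System.out.println("in:  " + Instant.ofEpochMilli(brokerInTime));
        System.out.println("out: " + Instant.ofEpochMilli(brokerOutTime));
        System.out.println("held for " + timeInBroker(brokerInTime, brokerOutTime).toMillis() + " ms");
    }
}
```

Under that assumption the difference is 76237690 ms, roughly 21 hours between brokerInTime and brokerOutTime.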


Attachments:
logs.zip, 4.99 MB, 2014/12/17 10:28 AM, Elena Medvedeva
reconnect-test-send-receive.zip, 109 kB, 2014/12/17 10:15 AM, Elena Medvedeva

Issue Links:
is related to

ENTMQBR-85 Connection.start sometimes fails to connect when master container is destroyed when using replicated levelDB and fabric discovery. (Closed)
ENTMQBR-201 Some messages lost when replicated LevelDB used (Closed)
