Status: Open
Affects Version/s: 1.0.31.Final, 2.1.1.Final
Fix Version/s: None
The EJBClient library allows setting up a transaction context (user or managed) so that multiple invocations from a client to a server may enjoy transactional guarantees.
Part of this scheme involves setting up a transaction instance (with transaction id) on the client, and using a corresponding transaction instance on the server which is resumed/suspended each time an invocation for the given transaction id arrives at the server. When a transaction needs to commit or roll back, a transaction invocation (prepare/commit/roll back) is sent from the client to the server to conclude the transaction on the server.
The client-side transaction instance is associated with a target node when the transaction is first created, and the transaction invocation messages (commit or roll back) are sent to that node. This mechanism works as long as the node to which the invocations are sent does not change.
However, there is a retry mechanism for clustered SFSBs: if an invocation returns with NoSuchEJBException, the EJBClientInvocationContext automatically retries that invocation on another node in the cluster which supports it. The retry works by re-running the interceptor chain on the client; the first interceptor, ReceiverInterceptor, chooses a new target node for the invocation. This change of node is not communicated to the EJBClientTransactionContext.
Although the retried invocation will succeed on the new node, there are two problems with the enclosing transaction:
- When the transaction comes to commit or roll back, the commit and roll back invocation messages are sent to the node on which the original invocations were made, before the retry. In a failover scenario, this node will often be unavailable/down.
- There is an entry for the corresponding client-side transaction on the server side, in the EJBRemoteTransactionRepository. This entry is used to suspend and resume the transaction on the server side. When the retry mechanism directs the invocation to a new node, the transaction-related state held on the original server is not available there, as the EJBRemoteTransactionRepository service is not replicated across nodes.
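The two problems above can be illustrated with a small, self-contained toy model. This is only a sketch: Node, Outcome and runFailoverScenario are hypothetical names invented for illustration and are not part of the EJB client library. An unavailable node stands in for whatever failure triggers the retry, and a per-node map stands in for the non-replicated EJBRemoteTransactionRepository.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types belong to the EJB client library.
class StaleTransactionNodeSketch {

    // Hypothetical server node with its own, non-replicated transaction table.
    static final class Node {
        final String name;
        boolean up = true;
        // transaction id -> number of invocations this node has seen for it
        final Map<String, Integer> transactions = new HashMap<>();

        Node(String name) { this.name = name; }

        void invoke(String txId) {
            if (!up) throw new IllegalStateException(name + " is unavailable");
            transactions.merge(txId, 1, Integer::sum);
        }

        void commit(String txId) {
            if (!up) throw new IllegalStateException(name + " is unavailable");
            if (transactions.remove(txId) == null) {
                throw new IllegalStateException(name + " has no entry for " + txId);
            }
        }
    }

    // Result of the simulated scenario.
    static final class Outcome {
        final String commitResult;
        final int invocationsKnownToRetryNode;

        Outcome(String commitResult, int invocationsKnownToRetryNode) {
            this.commitResult = commitResult;
            this.invocationsKnownToRetryNode = invocationsKnownToRetryNode;
        }
    }

    static Outcome runFailoverScenario() {
        Node nodeA = new Node("nodeA");
        Node nodeB = new Node("nodeB");
        String txId = "tx-1";
        Node pinned = nodeA;        // fixed when the client transaction is created

        nodeA.invoke(txId);         // first invocation succeeds on nodeA
        nodeA.up = false;           // nodeA fails

        try {
            nodeA.invoke(txId);     // second invocation fails on nodeA ...
        } catch (IllegalStateException e) {
            nodeB.invoke(txId);     // ... and is retried on nodeB; 'pinned' is not updated
        }

        String commitResult;
        try {
            pinned.commit(txId);    // commit still goes to the original node
            commitResult = "committed";
        } catch (IllegalStateException e) {
            commitResult = "commit failed: " + e.getMessage();
        }
        return new Outcome(commitResult, nodeB.transactions.get(txId));
    }

    public static void main(String[] args) {
        Outcome o = runFailoverScenario();
        System.out.println(o.commitResult);
        System.out.println("invocations known to nodeB: "
                + o.invocationsKnownToRetryNode + " of 2");
    }
}
```

In this model the commit fails because it is addressed to the pinned node, and nodeB only knows about the retried invocation, not the work already done on nodeA. Simply re-pointing the commit at nodeB would therefore address the first problem but not the second.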
In EJBClient failover tests with managed transaction contexts, I am seeing transactions whose invocations had to use the retry mechanism fail to commit or roll back, because the transaction invocation messages are sent to a node which is no longer available.