Type: Bug
Status: Verified
Priority: Critical
Resolution: Done
Affects Version/s: JDG 7.1.0 GA
Fix Version/s: JDG 7.2 ER5
Component/s: HotRod Java client
Labels: None
Environment:
  JDG Hot Rod client 6.4
  JDG Server 7.1.0
Fix Build: ER5
Sprint: JDG Sprint #10

The customer set the datagrid.hosts property with a space after each separator in the server list, like this:
<property name="datagrid.hosts" value="10.111.111.1:11222; 10.111.111.2:11222; 10.111.111.2:11222; 10.111.111.3:11222; 10.111.111.4:11222;"/>
This led to an exception that is only visible with TRACE logging enabled:
10:47:29,378 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport] (Timer-13) Could not connect to server: 10.111.111.4:11222: java.net.UnknownHostException
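A minimal sketch can show why the spaces matter. It assumes, hypothetically, that the value is split on ";" and that everything before the last ":" in each entry is taken as the host. This is not the client's actual addServers code. Without trimming, every entry after the first keeps a leading space, so the host becomes a string such as " 10.111.111.2" instead of a plain IP literal.

```java
import java.util.ArrayList;
import java.util.List;

public class ServerListParsing {
    // Hypothetical parser for illustration only (not ConfigurationBuilder.addServers):
    // entries are separated by ';' and each entry has the form host:port.
    static List<String> hosts(String servers, boolean trim) {
        List<String> result = new ArrayList<>();
        for (String entry : servers.split(";")) {
            if (entry.trim().isEmpty()) {
                continue; // skip blank entries, e.g. after a trailing ';'
            }
            String e = trim ? entry.trim() : entry;
            int colon = e.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Missing port in entry: '" + e + "'");
            }
            result.add(e.substring(0, colon)); // keep only the host part
        }
        return result;
    }

    public static void main(String[] args) {
        String value = "10.111.111.1:11222; 10.111.111.2:11222; 10.111.111.3:11222;";
        // Untrimmed: the second and third hosts keep their leading space
        System.out.println(hosts(value, false));
        // Trimmed, as proposed in suggestion 1: every host is a plain IP literal
        System.out.println(hosts(value, true));
    }
}
```

Trimming each entry before validation is the smaller of the two changes proposed in suggestion 1. The alternative, widening the address regexp, would also have to strip the whitespace before the host is resolved.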
Because this host validation fails, the Hot Rod client keeps retrying the connection to an unknown host. The TcpTransport class creates a socket on each attempt, but the socket was never released, so the open file count kept growing (sar socket statistics):
00:00:01  totsck  tcpsck  udpsck  rawsck  ip-frag  tcp-tw
....
16:30:02  649704     251       9       0        0     143
16:40:01  649955     248       9       0        0     172
16:50:01  650127     246       9       0        0      27  --> when it reached the nofile limit
17:00:01  650120     244       9       0        0      24
17:10:01  650124     242       9       0        0      26
17:20:01  650124     234       9       0        0      27
17:30:01  650121     218       9       0        0      27
17:40:01  650135     217       9       0        0      25

There were 649683 sockets held by the Java process:
$ cat lsof | grep 31226 | grep sock | wc -l
649683

Of these, 649681 are reported by lsof as "can't identify protocol":
$ cat lsof | grep identify | grep sock | grep 31226 | wc -l
649681
Our suggestions are:
1. Fix the addServers method of the ConfigurationBuilder class (line 96) so that it strips the spaces from each entry, or so that the ADDRESS_PATTERN regexp accounts for spaces.
2. In the TcpTransport constructor (lines 58-66), set socket and socketChannel to null in the finally block. Without this, the OS keeps the socket files open until the nofile limit is reached or the process shuts down.
3. Log the "Could not connect to server" error at WARN level instead of TRACE: log.tracef(e, "Could not connect to server: %s", serverAddress); (line 75). The message should be clearly visible in the console logs so that operations staff are alerted to the problem, since an unreachable JDG server can cause trouble for the whole environment.
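Suggestions 2 and 3 can be combined in a minimal sketch of a connection helper. The helper is hypothetical (SafeConnect is not a class in the client), and java.util.logging stands in for the client's own logger. One caveat on suggestion 2: setting socket and socketChannel to null only drops the references and does not by itself guarantee that the descriptor is released. For that reason the sketch calls close() on the channel in the finally block whenever the connection was not established.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SafeConnect {
    private static final Logger LOG = Logger.getLogger(SafeConnect.class.getName());

    // Hypothetical helper (not TcpTransport). Returns a connected channel, or null
    // after logging a WARNING and closing the channel that was opened.
    static SocketChannel connect(InetSocketAddress address, int timeoutMillis) {
        SocketChannel channel = null;
        boolean connected = false;
        try {
            channel = SocketChannel.open();
            channel.socket().connect(address, timeoutMillis);
            connected = true;
            return channel;
        } catch (IOException | RuntimeException e) {
            // Suggestion 3: surface the failure at WARNING level instead of TRACE.
            LOG.log(Level.WARNING, "Could not connect to server: " + address, e);
            return null;
        } finally {
            // Suggestion 2: release the descriptor whenever the connection was not established.
            if (!connected && channel != null) {
                try {
                    channel.close();
                } catch (IOException closeFailure) {
                    LOG.log(Level.FINE, "Failed to close channel", closeFailure);
                }
            }
        }
    }
}
```

With an unresolved address such as InetSocketAddress.createUnresolved(" 10.111.111.4", 11222), the connect attempt fails, the helper logs a WARNING and returns null, and the channel it opened is closed instead of lingering until the nofile limit is hit.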
Please see the linked GSS ticket for more information about this issue.