Tuesday, August 9, 2011

Textual description of firstImageUrl

Diagnosing RMI Exception: java.rmi.ConnectException: Connection refused to host: 127.0.0.1

Introduction


I have a service that runs on Linux under JBOSS. This service uses JMX (RMI) to talk to a windows box running a java service. RMI is the equivalent of the windows Distributed COM that allows a client to invoke a method on a remote service.

Setup


My setup is as follows. I have a Linux machine (Client) that is running a Java service in Tomcat, hosted in JBOSS. It is trying to invoke a method on a remote machine which is running a Java program as a windows service.

Java RMI ConnectionRefused Error


Everything was working ok, until both the Linux and Windows boxes were moved to a different subnet. Then, they started failing with the following exception:

Caused by: java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
java.net.ConnectException: Connection refused
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:110)
at javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown Source)
at javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2312)
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:277)
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:248)
at com.zillow.core.jmx.ZillowJMXConnectorFactory.connect(ZillowJMXConnectorFactory.java:127)
at com.zillow.core.jmx.ZillowJMXConnectorFactory.connect(ZillowJMXConnectorFactory.java:53)
at com.zillow.bcpserver.BCPServerProxy.afterPropertiesSet(BCPServerProxy.java:123)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1369)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1335)
... 142 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:519)
at java.net.Socket.connect(Socket.java:469)
at java.net.Socket.(Socket.java:366)
at java.net.Socket.(Socket.java:180)
at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22)
at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595)
... 154 more


I searched for this exception in the search engines, and the only thing I found was people saying that if the local box (i.e linux) did not have a correct IP address for localhost, it would send the 127.0.0.1 address as the "ContactMe" endpoint to the remote destination, and that would fail.

For example, the following links explained that issue:

https://forum.springsource.org/showthread.php?33711-RMI-invocation-attempts-connecting-to-127.0.0.1


However, in my case, that turned out not to be the problem. My Linux boxes (even after the move) had the correct hostname, and the `hostname` command was giving back the correct hostname (and not 127.0.0.1 or localhost).

At this point, there was no more information available through the search engines to diagnose the problem.

So, I decided to debug it myself. I first restarted the service, and used TCPDUMP to do a network sniff on the linux box.


Here, .175 is the Linux box, and .45 is the windows box.

The following is the packet disassembly of the JRMI/ReturnData response being sent by the Windows box to the Linux box.


As you can see, the Windows server is sending back "127.0.0.1" as the CallMe endpoint to the Linux box. The linux box tries to connect to port 1099 on 127.0.0.1 and fails.

Since the machines had been moved to different networks, it is possible that the java service might have lost their IP address and network registration settings.

So, I restarted the java service on the windows box. And viola!, that fixed the problem.

Looking for tools to help you troubleshoot networking issues? My blog post Network Programmers Toolchest will come in handy.