Take Care of Thread Safety When Creating an EventHubClient Object with the Java SDK


Recently, I worked on a strange Azure App Service issue. The app service randomly returned a 502.3 status code with the Win32 error code 0x80072EFD. For a Java web application, this generally indicates a bad gateway: the Win32 error code means that HttpPlatformHandler cannot connect to Tomcat, typically because all Tomcat threads are in use and Tomcat cannot accept any more connections.
I then used the jstack command (found in the JDK directory) in Kudu to capture a thread dump, to check whether any threads were deadlocked or hanging on an I/O operation.

jstack -F PID > D:/home/site/threaddump1.txt

As expected, almost all of the Tomcat threads (200 by default) were blocked in the following stack, which explains the 502.3 errors.

- sun.nio.ch.WindowsSelectorImpl$SubSelector.poll0(long, int, int[], int[], int[], long)
- sun.nio.ch.WindowsSelectorImpl$SubSelector.poll()
- sun.nio.ch.WindowsSelectorImpl$SubSelector.access$400(sun.nio.ch.WindowsSelectorImpl$SubSelector)
- sun.nio.ch.WindowsSelectorImpl.doSelect(long)
- sun.nio.ch.SelectorImpl.lockAndDoSelect(long)
- sun.nio.ch.SelectorImpl.select(long)
- org.apache.qpid.proton.reactor.impl.SelectorImpl.select(long)
- org.apache.qpid.proton.reactor.impl.IOHandler.handleQuiesced(org.apache.qpid.proton.reactor.Reactor, org.apache.qpid.proton.reactor.Selector)
- org.apache.qpid.proton.reactor.impl.IOHandler.onUnhandled(org.apache.qpid.proton.engine.Event)
- org.apache.qpid.proton.engine.impl.EventImpl.getType()
- org.apache.qpid.proton.engine.BaseHandler.handle(org.apache.qpid.proton.engine.Event)
- org.apache.qpid.proton.engine.impl.EventImpl.dispatch(org.apache.qpid.proton.engine.Handler)
- org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(org.apache.qpid.proton.engine.Event, org.apache.qpid.proton.engine.Handler)
- org.apache.qpid.proton.reactor.impl.ReactorImpl.process()
- com.microsoft.azure.servicebus.MessagingFactory$RunReactor.run()
- java.lang.Thread.run()

The com.microsoft.azure.servicebus.MessagingFactory$RunReactor.run frame suggests that the threads are blocked on I/O with Service Bus, but the stack alone gives no further information. Luckily, the exception log showed an "Unable to establish loopback connection" exception at the time of the issue, which gave me another hint.

com.microsoft.azure.servicebus.MessagingFactory$RunReactor.run UnHandled exception while processing events in reactor:
org.apache.qpid.proton.reactor.impl.ReactorInternalException: java.io.IOException: Unable to establish loopback connection
org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:112)
org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:309)
org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:277)
com.microsoft.azure.servicebus.MessagingFactory$RunReactor.run(MessagingFactory.java:381)
java.lang.Thread.run(Thread.java:745)
Cause: java.io.IOException: Unable to establish loopback connection
org.apache.qpid.proton.reactor.impl.IOHandler.onUnhandled(IOHandler.java:388)
org.apache.qpid.proton.engine.BaseHandler.onSelectableInit(BaseHandler.java:92)
org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:221)
org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108)
org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:309)
org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:277)
com.microsoft.azure.servicebus.MessagingFactory$RunReactor.run(MessagingFactory.java:381) 

Now the picture is clearer: the application had exhausted all the available loopback connections, the threads were all blocked waiting for one, and Tomcat eventually stopped accepting requests. After reviewing the code, I found the defect: thread safety was not considered when creating the EventHubClient object. As the code snippet below shows, the logic looks fine at first glance: a static EventHubClient object is created the first time it is requested and is then reused by subsequent requests.

import java.io.IOException;

import com.microsoft.azure.eventhubs.EventData;
import com.microsoft.azure.eventhubs.EventHubClient;
import com.microsoft.azure.servicebus.ConnectionStringBuilder;
import com.microsoft.azure.servicebus.ServiceBusException;

public class AzureUtils
{
    private static EventHubClient ehClient = null;
    private static void initAzure() throws ServiceBusException, IOException
    {
        // connStr is connection string
        ehClient = EventHubClient.createFromConnectionStringSync(connStr);
    }

    public static void send(String message)
    {
        try
        {
            byte[] payloadBytes = (message).getBytes("UTF-8");
            EventData sendEvent = new EventData(payloadBytes);
            if (ehClient == null)
            {
                initAzure();
            }
            ehClient.sendSync(sendEvent);

        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

However, a web application, especially a high-throughput site like this one, may handle thousands of requests simultaneously as soon as it launches. The static keyword in Java does not make a field thread safe: the null check and the initialization in send() are not atomic, so many threads can see ehClient == null at the same time and each of them calls initAzure(). As a result, a large number of EventHubClient objects were created when the application launched. Each EventHubClient creates a dedicated physical socket to the Event Hubs service, and together they exhausted the connections. The fix is to make sure the object is initialized exactly once before it is used, for example by synchronizing the initialization or by creating the client eagerly at startup. I hope this helps if you have run into a similar issue.
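
One way to guarantee a single initialization is double-checked locking on a volatile field. The following is only a minimal sketch, not the code used in the original application: LazyClient is a hypothetical helper class introduced here for illustration, and it is written generically so that it does not depend on the Event Hubs SDK. In the real application, the factory would presumably wrap the EventHubClient.createFromConnectionStringSync(connStr) call from the snippet above.

```java
import java.util.function.Supplier;

// Hypothetical helper (not part of any SDK): creates an expensive object
// at most once, even when many threads call get() at the same time.
final class LazyClient<T> {
    private final Supplier<T> factory;

    // volatile ensures that a fully constructed instance is visible to all threads
    private volatile T instance;

    LazyClient(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;              // first check, without locking
        if (local == null) {
            synchronized (this) {
                local = instance;        // second check, while holding the lock
                if (local == null) {
                    local = factory.get();   // runs once if it returns a non-null value
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

With a helper like this, AzureUtils could hold a static final LazyClient whose factory creates the EventHubClient, and send() would call get() instead of checking ehClient for null. If lazy creation is not required, creating the client once at application startup or declaring initAzure() as synchronized and repeating the null check inside it would be simpler alternatives.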
