Troubleshooting Azure IoT Hub connections on embedded Linux

Hi!

I'm in Japan for a few days, working with local partners to get their devices connected to Azure IoT Hub, and I want to share a few lessons learned.

We always started from the Azure IoT Hub SDK on GitHub. And here's the first catch: if you just download the zip file from GitHub, you miss the links to the dependent projects (the Git submodules), so your source tree lacks some files. To avoid these problems, clone the project using git and don't forget to add the --recursive option, as described here.

git clone --recursive https://github.com/Azure/azure-iot-sdks.git

If you get strange compiler errors along the way, such as mismatched function signatures, your source tree may be out of sync. One way to fix this is to run "git submodule init" and "git submodule update" in the right directories, but I often just throw away the whole tree and clone it again.
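If you would rather repair the tree than clone it again, the two submodule steps can be applied to the whole tree in one go. This is a minimal sketch, assuming Git 1.8.5 or later (for the -C option); resync_submodules is a hypothetical helper name, not something the SDK provides:

```shell
# Hypothetical helper: bring all submodules of the tree at $1 back in sync
# with what the checked-out commit expects.
resync_submodules() {
    tree=$1
    # Re-read the submodule URLs from .gitmodules, then check out the
    # recorded commits, initializing any submodule that is still missing.
    git -C "$tree" submodule sync --recursive &&
    git -C "$tree" submodule update --init --recursive
}

# Example (run in the directory where the SDK was cloned):
#   resync_submodules azure-iot-sdks
```

If the compiler errors persist after this, a fresh clone is still the quickest way out.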

The first thing you should do is to familiarize yourself with the SDK on a normal Linux machine. For this purpose, I just run a Linux VM on Azure. Go through the steps of setting up the development environment and setting up an IoT hub, just for testing. The free tier of the Azure IoT Hub is sufficient at this point. Now create a device ID in your IoT Hub, e.g., by using the Azure IoT Hub Device Explorer on Windows. Under the management tab, select your created device and then right-click and select "Copy connection string".

Go to the source code of one of the simple examples, e.g., the C AMQP sample client. Insert your connection string in the source code and compile the sample. Now head back to Device Explorer, click on the data tab and start monitoring data from your device. Then run the sample client executable. You should now see a few messages arriving. Now in Device Explorer, switch to the "Message to Device" tab, select your device and enable "Monitor Feedback endpoint". Now type something in the message field and hit send. Your sample client should receive the data, and the feedback endpoint monitoring should indicate that the messages have been received.

Great, now let's move over to the actual device!

Here, there are a couple of things you need to be aware of. The two most important ones are trust and time. Wait, what? Is this some relationship self-help blog? :-)

The trust issue:

Unfortunately, some embedded devices do not come with the right set of certificate authorities installed. When the Azure IoT SDK client code tries to establish a secure connection, it validates the certificate presented by the IoT hub against the known certificate authorities. If no matching CA is found, the client code stays quiet for a very long time and then fails with various errors. To test for this condition, I often just use the openssl client program and try to establish the connection manually from the device. Most embedded Linux distributions have the openssl executable installed together with the OpenSSL library. An alternative is to run both the sample and "tcpdump -w capture.pcap" at the same time on the device, then download the pcap file and analyze it using Wireshark.

For example, if I want to see if I can reach the MQTT endpoint of my IoT Hub, I run the following command:

openssl s_client -connect <My iothub name>.azure-devices.net:8883

(and of course replace <My iothub name> with the name of your IoT hub)

If this command fails to establish a valid TLS connection and reports "Verify return code: 20", you have "trust issues". If you see "Verify return code: 0 (ok)", everything is OK. In Wireshark, you would see the TLS negotiation fail with an "Unknown CA" alert.
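When you have to check many devices, it helps to pull just the number out of the s_client output. Here is a minimal sketch; verify_code is a hypothetical helper name and myhub is a placeholder hub name:

```shell
# Hypothetical helper: read "openssl s_client" output on stdin and print the
# numeric code from the first "Verify return code" line.
verify_code() {
    sed -n 's/^ *Verify return code: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example on the device (the hub name is a placeholder):
#   openssl s_client -connect myhub.azure-devices.net:8883 </dev/null 2>/dev/null | verify_code
```

A result of 0 means the certificate chain was verified. If nothing is printed at all, the handshake probably never got that far, which usually points to a network problem rather than a trust problem.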

To resolve your trust issue, make sure the right CA certificate is present on the device. Microsoft uses the Baltimore CyberTrust CA to sign the server certificates, so you should have the file "Baltimore_CyberTrust_Root.pem" somewhere in your file system. But even if it is there, the OpenSSL library may not load it. To find out where it expects the files to be, just run "openssl version -d". You should see something like this:

OPENSSLDIR: "/usr/lib/ssl"

This means that the OpenSSL library will look for the CA cert in the file /usr/lib/ssl/cert.pem and then in files in the directory /usr/lib/ssl/certs/.
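To look into those locations without typing the path by hand, the directory can be taken straight from the command output. This is a minimal sketch; openssl_dir is a hypothetical helper name, and the hub name and certificate path in the comments are placeholders:

```shell
# Hypothetical helper: read "openssl version -d" output on stdin and print
# the directory between the quotes.
openssl_dir() {
    sed -n 's/^OPENSSLDIR: "\(.*\)"$/\1/p'
}

# Example on the device:
#   dir=$(openssl version -d | openssl_dir)
#   ls -l "$dir/cert.pem" "$dir/certs/"
#
# To test one specific CA file regardless of the default locations:
#   openssl s_client -connect myhub.azure-devices.net:8883 -CAfile /path/to/Baltimore_CyberTrust_Root.pem
```

If the connection verifies with -CAfile but not without it, the certificate itself is fine and only its location is the problem.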

But it may be that the file is actually there but OpenSSL still fails to establish a secure connection. Then you might have a time issue.

The time issue:

CA certificates have a time span in which they are valid. For instance, the Baltimore CyberTrust CA certificate is valid in the following time span:

Not Before: May 12 18:46:00 2000 GMT
Not After : May 12 23:59:00 2025 GMT

You can easily check for yourself by running this command:

openssl x509 -in /usr/lib/ssl/certs/Baltimore_CyberTrust_Root.pem -text

How could this be invalid? Easy: Some embedded devices have no battery-buffered realtime clock and initialize their clocks with preset dates on boot. And these may be ancient, e.g. Unix Epoch (January 1st, 1970), GPS epoch (January 6th, 1980) or whatever the manufacturer set. So a good practice is to set the clock to the right date before attempting to connect.

But that might not be enough.

The Azure IoT hub also uses a time-based token scheme to authenticate its clients. The process is described here. The token includes an expiry time as seconds since the Unix epoch in UTC. The Azure IoT SDK uses the device connection string to create such a shared access signature token. If your clock is off, the token may already have expired by the time it is created. The tokens are generated with a validity of 3600 seconds, i.e., one hour. If your clock is behind by more than that, the IoT hub will reject the connection.
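The arithmetic is simple enough to check by hand. The sketch below is a simplified model of it, not the SDK's or the hub's actual code; token_already_expired is a hypothetical helper name, and it ignores any tolerance the hub may apply:

```shell
# Hypothetical helper: would a token created on the device already be expired
# when the hub sees it?
#   $1  device clock, seconds since the Unix epoch
#   $2  real time, seconds since the Unix epoch
#   $3  token validity in seconds (3600 in the text above)
# Returns 0 (true) if the token is already expired, 1 otherwise.
token_already_expired() {
    expiry=$(( $1 + $3 ))
    [ "$expiry" -le "$2" ]
}

# Example: the real time is 1500000000.
#   Device two hours behind:    token_already_expired 1499992800 1500000000 3600   (true)
#   Device 30 minutes behind:   token_already_expired 1499998200 1500000000 3600   (false)
```

In the first case the token expires at 1499996400, which is already in the past. In the second case it expires at 1500001800, so it is still good for half an hour.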

So the best practice is to run ntpclient or ntpd on your embedded device. Even busybox has a simple ntpd implementation, so this should be available on your embedded OS. Alternatives are of course to use GPS, a mobile network, a battery-powered RTC or a radio time receiver (FM RDS, long-wave time signals etc.) as a time source. But be aware of the startup and initialization times these time sources need (GPS can take several minutes to deliver a proper time) and of the drift RTCs might accumulate over time. And RTC batteries might die after a couple of years. Also make sure that your time zone is set properly. The SDK always calculates in UTC, so if your time zone claims to be UTC but the clock is set to local time, you might be off by a couple of hours.
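Because a time source may need a while to deliver, it can be worth holding back the IoT client until the clock looks plausible. This is a minimal sketch, assuming your "date" supports the +%s format (GNU and busybox builds normally do, but it is not in POSIX); wait_for_clock is a hypothetical helper name and the floor value merely stands in for something like your firmware build date:

```shell
# Hypothetical helper: wait until the system clock is at or past a floor.
#   $1  earliest plausible time, seconds since the Unix epoch
#   $2  maximum number of seconds to wait
# Returns 0 once the clock is plausible, 1 on timeout.
wait_for_clock() {
    floor=$1
    remaining=$2
    while [ "$(date -u +%s)" -lt "$floor" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$(( remaining - 1 ))
    done
    return 0
}

# Example: 1451606400 is 2016-01-01 00:00:00 UTC.
# Start your time source first, e.g. with busybox ntpd:
#   ntpd -n -q -p <your NTP server>
# then:
#   wait_for_clock 1451606400 300 || echo "clock still not set, not connecting"
```

This only catches clocks that are wildly off, such as one still sitting at the Unix or GPS epoch. It does not detect a skew of a few hours.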

Which brings me back to the CA cert validity. Today, 2025 seems to be far out in the future, but remember that many embedded devices designed today have a lifetime of over 10 years. So that CA cert will expire in the lifetime of these devices. So make sure you have a way to update the CA certificate.

Hope this helps,

H.