
Problem communicating with multiple Ethernet devices from a single process with uldaq


Mark Rivers

Question

I have found a problem that occurs when a single application (i.e. process) controls multiple Ethernet modules. The error messages reported by my application look like this:

[Sat Apr 29 06:35:55 2023] 2023/04/29 06:35:55.680 MultiFunction::readInt32 Error: Calling AIn, err=102 Invalid network frame
[Sat Apr 29 07:22:55 2023] 2023/04/29 07:22:55.840 MultiFunction::pollerThread Error: Calling CIn, err=102 Invalid network frame
[Sat Apr 29 07:55:33 2023] 2023/04/29 07:55:33.678 MultiFunction::readInt32 Error: Calling AIn, err=102 Invalid network frame
[Sat Apr 29 08:00:21 2023] 2023/04/29 08:00:21.678 MultiFunction::readInt32 Error: Calling AIn, err=102 Invalid network frame

So the ulAIn and ulCIn functions are returning error 102. The application calls these functions many times per second. The errors are relatively rare, about once every 15 minutes on average.

The problem has been observed when controlling multiple E-1608 modules, multiple E-TC modules, or multiple TC-32 modules.

This error never occurs if the same application is only talking to a single Ethernet module.

Neither this error nor any similar error occurs if the application is controlling multiple USB modules. I have run it with 3 USB modules for several weeks with no errors.

My application runs on both Linux and Windows by using #ifdef __WIN32 or #ifdef linux to call either the Windows or Linux UL library.  I will test to see if the problem also occurs when the application is running on Windows, but I don't think it does.
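A minimal sketch of that kind of compile-time dispatch, assuming the driver funnels each hardware read through one thin wrapper. The names `readAnalogInput` and `backendAIn` are hypothetical, and the back ends are stubs: a real build would call `cbAIn()` from the Windows UL library or `ulAIn()` from uldaq at that point, and those two functions have different signatures. The sketch tests the compiler-predefined `_WIN32` macro and treats every other platform as the uldaq (Linux) build.

```cpp
#include <cassert>

// Hypothetical back-end stubs. In a real build each would wrap the vendor call:
// cbAIn() from the Windows UL library, or ulAIn() from uldaq on Linux.
#if defined(_WIN32)
static int backendAIn(int channel, double *value) {
    (void)channel;
    *value = 0.0;  // stub: the Windows UL call would go here
    return 0;      // 0 means success in this sketch
}
#else  // assumed to be the Linux/uldaq build
static int backendAIn(int channel, double *value) {
    (void)channel;
    *value = 0.0;  // stub: the uldaq call would go here
    return 0;
}
#endif

// Hypothetical wrapper: the rest of the driver calls only this function,
// so the same source compiles against either library.
int readAnalogInput(int channel, double *value) {
    assert(value != nullptr);
    if (channel < 0) return -1;  // reject an invalid channel before touching the back end
    return backendAIn(channel, value);
}
```

With this layout, a call such as `readAnalogInput(3, &v)` is identical on both platforms; only the body of `backendAIn` differs between the two builds.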


6 answers to this question



I have now run the same application on Windows, controlling the same 2 E-1608 modules. The Windows application is built from the same source code as the Linux one, just using the equivalent calls to the Windows UL library rather than the uldaq library. In 20 hours there has not been a single error, whereas on Linux there would be about 100 errors in the same period.


Hi Fausto,

I have not yet written a simple example program to demonstrate the problem, but I will do so. 

My real application is here: https://github.com/epics-modules/measComp/blob/master/measCompApp/src/drvMultiFunction.cpp

It is part of the EPICS real-time control system toolkit, which is used at many large-scale facilities worldwide, such as synchrotron and neutron sources, major telescopes, etc. The problem was first reported to me by a scientist at the Advanced Light Source at Lawrence Berkeley National Laboratory, who was using E-TC and TC-32 modules. I reproduced the problem at my facility, the Advanced Photon Source at Argonne National Laboratory, with 2 E-1608 units.

I am using CentOS 7 and Windows 10. Each OS is running on a dedicated machine; they are not running on the same system or in virtual machines. I can also test on Ubuntu 22 and CentOS 8 if you think that would be useful.

The computers are using wired Ethernet on the same subnet as the Measurement Computing devices. 

As I said above, the problem does not occur on Windows, does not occur with multiple USB devices on Linux, and does not occur if the application is controlling only a single Ethernet device. The Linux version of my application previously used Warren Jasper's drivers, and I did not see the problem with that version. The problem only began when I switched from Warren's drivers to uldaq.

By the way, I think uldaq is great! It is a clean design and remarkably problem-free for a new product. I really appreciate that you made it open-source on GitHub; I only wish that more vendors did the same!

Thanks,

Mark

 


I just rebuilt uldaq with TRACE defined in ul_internal.h.

When I now run my application controlling a single E-1608, I do not see any error messages. However, when I run it controlling 2 E-1608s, I see the following stream of errors:

[2023-May-01 10:57:50:873904] Invalid frame ID!!!!
[2023-May-01 10:57:50:973725] Invalid frame ID!!!!
[2023-May-01 10:57:51:074001] Invalid frame ID!!!!
[2023-May-01 10:57:51:274613] Invalid frame ID!!!!
[2023-May-01 10:57:51:374377] Invalid frame ID!!!!
[2023-May-01 10:57:51:573951] Invalid frame ID!!!!
[2023-May-01 10:57:51:674601] Invalid frame ID!!!!
[2023-May-01 10:57:51:905766] Invalid frame ID!!!!
[2023-May-01 10:57:52:073874] Invalid frame ID!!!!
[2023-May-01 10:57:52:274010] Invalid frame ID!!!!
[2023-May-01 10:57:52:474640] Invalid frame ID!!!!
[2023-May-01 10:57:52:774197] Invalid frame ID!!!!
[2023-May-01 10:57:52:874391] Invalid frame ID!!!!
[2023-May-01 10:57:52:985038] Invalid frame ID!!!!
[2023-May-01 10:57:53:173903] Invalid frame ID!!!!
[2023-May-01 10:57:53:274781] Invalid frame ID!!!!
[2023-May-01 10:57:53:374418] Invalid frame ID!!!!
2023/05/01 10:57:53.474 MultiFunction::readInt32 Error: Calling AIn, err=102 Invalid network frame
2023/05/01 10:57:53.474 dualTest:E1608_1:Ai3 devAsynInt32::processCallbackInput process read error
[2023-May-01 10:57:53:474640] Invalid frame ID!!!!
[2023-May-01 10:57:53:574945] Invalid frame ID!!!!
[2023-May-01 10:57:53:674400] Invalid frame ID!!!!
2023/05/01 10:57:53.774 MultiFunction::readInt32 Error: Calling AIn, err=102 Invalid network frame
2023/05/01 10:57:53.774 dualTest:E1608_1:Ai4 devAsynInt32::processCallbackInput process read error
[2023-May-01 10:57:53:774968] Invalid frame ID!!!!
[2023-May-01 10:57:53:874470] Invalid frame ID!!!!
[2023-May-01 10:57:53:975290] Invalid frame ID!!!!
[2023-May-01 10:57:54:146314] Invalid frame ID!!!!
[2023-May-01 10:57:54:475164] Invalid frame ID!!!!
[2023-May-01 10:57:54:974780] Invalid frame ID!!!!
[2023-May-01 10:57:55:074628] Invalid frame ID!!!!
[2023-May-01 10:57:55:174388] Invalid frame ID!!!!
[2023-May-01 10:57:55:474389] Invalid frame ID!!!!
[2023-May-01 10:57:55:875013] Invalid frame ID!!!!
[2023-May-01 10:57:55:974375] Invalid frame ID!!!!
[2023-May-01 10:57:56:074441] Invalid frame ID!!!!
2023/05/01 10:57:56.174 MultiFunction::readInt32 Error: Calling AIn, err=102 Invalid network frame
2023/05/01 10:57:56.174 dualTest:E1608_2:Ai2 devAsynInt32::processCallbackInput process read error

I now realize the following:

- The underlying error is "Invalid frame ID". It is printed at this line in uldaq:

https://github.com/mccdaq/uldaq/blob/1d8404159c0fb6d2665461b80acca5bbef5c610a/src/net/NetDaqDevice.cpp#L628

- These errors are happening 3-10 times per second. However, the code at this line retries twice each time there is an error:

https://github.com/mccdaq/uldaq/blob/1d8404159c0fb6d2665461b80acca5bbef5c610a/src/net/NetDaqDevice.cpp#L510

The retry generally works, so most of the time my application does not receive the 102 error. But clearly something is quite wrong because the uldaq driver is seeing all of these Invalid frame ID errors and is retrying 3-10 times per second.
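To make the retry behaviour concrete, here is a toy model of a request/reply exchange guarded by a frame ID. It is not uldaq's implementation: the `Transport` callback, `queryWithRetry`, and the constant names are hypothetical, and the retry count of 2 simply follows the description above. A mismatched reply is discarded and the request is retried with a new frame ID; only when every attempt mismatches does the caller see error 102.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Toy model only, not uldaq's code. 102 mirrors the "Invalid network frame"
// code seen in the application log; the constant names are hypothetical.
constexpr int kOk = 0;
constexpr int kErrInvalidFrame = 102;
constexpr int kMaxRetries = 2;  // assumption: 2 retries after the first attempt

struct Reply {
    std::uint8_t frameId;  // ID echoed back by the device
    double value;
};

// Hypothetical transport: sends a command tagged with frameId and returns
// whatever reply arrives next.
using Transport = std::function<Reply(std::uint8_t frameId)>;

int queryWithRetry(const Transport &sendAndReceive, std::uint8_t &nextFrameId, double *value) {
    assert(value != nullptr);
    for (int attempt = 0; attempt <= kMaxRetries; ++attempt) {
        std::uint8_t id = nextFrameId++;  // each attempt uses a fresh frame ID (wraps at 255)
        Reply r = sendAndReceive(id);
        if (r.frameId == id) {            // reply belongs to this request
            *value = r.value;
            return kOk;
        }
        // Mismatch: the reply answers some other request; discard it and retry.
    }
    return kErrInvalidFrame;              // every attempt saw a mismatched frame ID
}
```

In this model an occasional stray reply costs one extra round trip and stays invisible to the caller, which is consistent with seeing many "Invalid frame ID" trace lines but only an occasional err=102 in the application log.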

Thanks,

Mark

 


I have written a test program to try to reproduce the error. The test program talks to 2 E-DIO24 modules using 4 threads: 2 of the threads talk to one E-DIO24, and the other 2 threads talk to the other module. There is a single mutex that the threads lock before they make calls to the UL library. The test program reproduces the problem seen in my EPICS driver ("Invalid network frame") if I don't lock the mutex before calling the UL library.
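The structure of such a test can be sketched as follows. This is not the attached testDualEthDevice.cpp: `fakeLibraryCall` is a hypothetical stand-in for a UL call, instrumented to count how often two threads are inside it at the same time. Four threads (2 per device) share one process-wide mutex, so every library call is serialized regardless of which device it targets.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// ONE mutex shared by every thread and every device in the process.
std::mutex gLibraryMutex;
std::atomic<int> gActiveCalls{0};   // threads currently inside fakeLibraryCall
std::atomic<int> gOverlaps{0};      // times a thread entered while another was inside
std::atomic<long> gTotalCalls{0};   // cumulative number of completed calls

// Hypothetical stand-in for a UL library call on the given device.
void fakeLibraryCall(int device) {
    (void)device;
    if (gActiveCalls.fetch_add(1) != 0) gOverlaps++;  // another thread is already inside
    std::this_thread::yield();                        // widen the window for a race
    gTotalCalls++;
    gActiveCalls.fetch_sub(1);
}

void worker(int device, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> lock(gLibraryMutex);  // serialize ALL library calls
        fakeLibraryCall(device);
    }
}

// Runs 4 threads, 2 per device, and returns the cumulative call count.
long runTest(int iterations) {
    assert(iterations >= 0);
    std::vector<std::thread> threads;
    for (int device = 0; device < 2; ++device)
        for (int t = 0; t < 2; ++t)
            threads.emplace_back(worker, device, iterations);
    for (auto &th : threads) th.join();
    return gTotalCalls.load();
}
```

Deleting the `lock_guard` line in `worker` lets threads for different devices enter `fakeLibraryCall` concurrently, and `gOverlaps` will then usually become nonzero; that unsynchronized pattern is the one described in the post.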

My EPICS driver does have a mutex, and it is locked where I thought it needed to be. But I just realized that this mutex only protects access to a specific device, not to the UL library as a whole. So when there are 2 devices in the same EPICS process there are 2 mutexes, and the UL library can be called from 2 threads at the same time.
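One way to apply the fix inside a per-device driver class is to keep the per-device mutex but also take a mutex shared by every instance in the process. The sketch below assumes a C++ class using `std::mutex`; `MultiFunctionSketch` and `callLibrary` are hypothetical names, and the real EPICS driver's locking primitives may differ. The callable passed to `callLibrary` stands in for a real UL call.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical driver class: one instance per device. With only deviceLock_,
// two instances could call into the UL library at the same time.
class MultiFunctionSketch {
public:
    explicit MultiFunctionSketch(std::string name) : name_(std::move(name)) {}

    const std::string &name() const { return name_; }

    // Every UL call goes through here. Locks are always taken in the same
    // order (device, then library) so the two mutexes cannot deadlock.
    template <typename F>
    int callLibrary(F &&ulCall) {
        std::lock_guard<std::mutex> dev{deviceLock_};    // protects this device's state
        std::lock_guard<std::mutex> lib{libraryLock()};  // serializes the whole library
        return ulCall();
    }

private:
    // Function-local static: one mutex shared by all instances in the process,
    // initialized thread-safely on first use (C++11 and later).
    static std::mutex &libraryLock() {
        static std::mutex m;
        return m;
    }

    std::string name_;
    std::mutex deviceLock_;
};
```

A simpler alternative is to drop the per-device lock around library calls and rely on the shared one alone; keeping both, in a fixed order, preserves any per-device invariants the driver already depends on.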

Interestingly, this does not appear to be a problem on Windows, and it is not a problem on Linux with USB devices. Is it the case that the UL library on Linux (uldaq) cannot be called for Ethernet devices from 2 unsynchronized threads, even when those threads are talking to different devices?

testDualEthDevice.cpp
