I don't have a hAP ac3, so I cannot run the experiment myself.
Multithreading a process is not easy, and how the load is distributed over the 4 CPUs is not known. Sometimes part of the overall process is serialized, which makes the other CPUs wait.
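To illustrate how a serialized part limits what 4 CPUs can deliver, here is a minimal sketch of Amdahl's law. This is a general model, not anything specific to RouterOS, and the parallel fractions below are hypothetical, since the real split inside the router is unknown.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup according to Amdahl's law.

    parallel_fraction: share of the work that can run in parallel (0..1).
    cores: number of CPUs the parallel part is spread over.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    # The serial part always takes the same time; only the rest is divided by cores.
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical fractions; the real split in RouterOS is not known.
    for p in (1.0, 0.9, 0.75, 0.5):
        print(f"parallel={p:.2f}  max speedup on 4 CPUs = {amdahl_speedup(p, 4):.2f}x")
```

With only half of the work parallelizable, 4 CPUs give at most a 1.6x speedup, not 4x.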
Your CHR normally runs as a virtual machine on a larger host, which may have faster memory and larger CPU caches, and which may define "100% CPU" for the CHR differently.
There also seems to be a bottleneck in the hAP ac3 other than raw CPU power. This article may be of interest to you:
https://www.batna24.com/uk/mikrotik-hap ... mance-test
Access to the IPQ-4019 datasheets at Qualcomm requires registration and company validation, and I don't have access.
Also be aware of TCP congestion control, which reduces throughput to avoid congestion. There are several congestion-avoidance algorithms, and they lead to different results.
See https://en.wikipedia.org/wiki/TCP_congestion_control for an overview. I don't know which algorithm ROS uses, or how it affects unidirectional versus bidirectional traffic.
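A toy simulation of additive-increase/multiplicative-decrease (AIMD), the basic idea behind Reno-style congestion control, shows why throughput ends up below link capacity and why the choice of algorithm changes the outcome. This is a heavily simplified sketch under stated assumptions: no buffering, no slow start, no timeouts, one loss whenever the window exceeds capacity; the capacity value and the 0.7 backoff factor are hypothetical examples, and this is not a model of what ROS actually does.

```python
def aimd_utilization(capacity: int, rounds: int, decrease: float = 0.5) -> float:
    """Toy AIMD model: average fraction of link capacity used.

    capacity: packets per round trip the link can carry (hypothetical value).
    decrease: multiplicative-decrease factor applied to the window on loss
              (0.5 is the classic Reno-style halving).
    Assumptions: no buffering, no slow start, no timeouts.
    """
    cwnd = 1
    delivered = 0
    for _ in range(rounds):
        if cwnd > capacity:
            # Overshoot: the link is full, a loss is assumed, the window shrinks.
            delivered += capacity
            cwnd = max(1, int(cwnd * decrease))
        else:
            # Under capacity: everything sent is delivered, window grows by one packet.
            delivered += cwnd
            cwnd += 1
    return delivered / (rounds * capacity)


if __name__ == "__main__":
    for factor in (0.5, 0.7):
        print(f"decrease={factor}: utilization ~ {aimd_utilization(100, 10_000, factor):.2f}")
```

In this toy model, halving the window uses roughly 75% of the link on average, while a gentler backoff of 0.7 uses roughly 85%: same link, different algorithm, different result.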
Finding a study of loopback speed tests, let alone bidirectional loopback speed tests, is unlikely.
But the concepts in this study may at least shed some light on what is happening:
https://hal.inria.fr/hal-01073421/file/RR-LIG-002.pdf
With asymmetric connection speeds the impact is clear:
https://community.cisco.com/t5/routing/ ... -p/1301840
But even with symmetric speeds, and even over loopback, throughput can still be reduced to some extent.
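One well-known mechanism behind this is described in RFC 3449 (TCP Performance Implications of Network Path Asymmetry); I am assuming it is also what the linked thread refers to. The ACKs of the download flow have to queue behind upload data in the other direction, which inflates the RTT, and a window-limited TCP flow can deliver at most one window per RTT. A back-of-the-envelope sketch, where the function name and all numbers are hypothetical examples:

```python
def window_limited_throughput_bps(window_bytes: int, base_rtt_s: float,
                                  uplink_bps: float, uplink_queue_bytes: int) -> float:
    """Rough upper bound for a window-limited download whose ACKs
    must wait behind upload data queued on the reverse direction.

    All inputs are hypothetical example values, not measurements.
    """
    # Extra time an ACK waits behind the queued upload data.
    ack_queue_delay_s = uplink_queue_bytes * 8 / uplink_bps
    effective_rtt_s = base_rtt_s + ack_queue_delay_s
    # A window-limited sender delivers at most one window per round trip.
    return window_bytes * 8 / effective_rtt_s


if __name__ == "__main__":
    idle = window_limited_throughput_bps(65536, 0.020, 1e6, 0)
    busy = window_limited_throughput_bps(65536, 0.020, 1e6, 32768)
    print(f"upload idle: {idle / 1e6:.1f} Mbit/s, upload busy: {busy / 1e6:.1f} Mbit/s")
```

With a 64 KiB window, a 20 ms base RTT, a 1 Mbit/s uplink and 32 KiB of upload data queued, the bound drops from about 26 Mbit/s to under 2 Mbit/s. On a symmetric link the same kind of queueing can still happen, just with shorter delays, which fits the "reduced to some extent" observation.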