Blowing on the FC increases the rate of CRC errors significantly
I'm at my wits' end, @Martin.dold @jonas.schlagenhauf @evileli @julian.reimer
It's all Greek to me... Please help me understand the bug so I can provide meaningful help here. Edit: Questions answered. The problem is the same with betCOM 2.2.0 and 2.3.0. The whole project has now moved to betCOM 2.3.0-alpha1.
Setup
- What version is used on FC-ARM? a2afa0ab
- What modules are enabled in FC-ARM firmware? See here
- Are the two FCs connected with each other in this setup? No. Only ARM-FC to GS.
- What version is used on GS? 0e600939
- How did we check for the CRC errors? Presumably via the command-line output of the GS software? Yes.
- Is the error reproducible? How can it be reproduced? Yes, just blow on it.
- How much "blowing" is required? Mild blowing, like blowing out a couple of candles, but not as much as on your 90th birthday.
- Where to blow exactly? Directly on the controller.
- Did you test in the lab or in the hangar? Lab.
- How can SW behaviour be affected by such a physical influence? Doesn't that point to a HW error somewhere? Broken/unstable cables, wires, soldering points ... ?! My intuition so far points to two possible effects:
  - Blowing changes the temperature of the oscillator, leading to a baud rate mismatch between FC-ARM and GS.
  - Increasing the packet length by activating the tube angle sensors increases the probability of a baud rate mismatch. Edit: short messages also show CRC errors.
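To put rough numbers on the oscillator hypothesis, here is a minimal sketch under an idealised asynchronous-serial model (an assumption, not a model of the FC's actual UART peripheral): the receiver re-synchronises on every start bit and samples each bit at its centre, so a relative bit-period mismatch only accumulates within one frame. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Idealised model (assumption): the receiver re-synchronises on the
 * falling edge of every start bit and samples each bit at its centre. */

/* Offset, in bit times, between the receiver's sample point for the last
 * bit of a frame and that bit's true centre, for a relative bit-period
 * mismatch rel_mismatch. A frame of bits_per_frame bits (10 for 8N1:
 * start + 8 data + stop) has its last bit sampled (bits_per_frame - 0.5)
 * bit times after the start edge. */
static double last_bit_offset(int bits_per_frame, double rel_mismatch)
{
    assert(bits_per_frame >= 2);
    return (bits_per_frame - 0.5) * rel_mismatch;
}

/* The frame is read correctly while every sample stays within half a bit
 * of the intended bit centre; the last bit has the largest offset. */
static bool frame_decodes(int bits_per_frame, double rel_mismatch)
{
    double off = last_bit_offset(bits_per_frame, rel_mismatch);
    return off > -0.5 && off < 0.5;
}

/* Largest combined TX/RX clock mismatch this idealised model tolerates. */
static double max_mismatch(int bits_per_frame)
{
    assert(bits_per_frame >= 2);
    return 0.5 / (bits_per_frame - 0.5);
}
```

For 8N1, `max_mismatch(10)` gives 0.5 / 9.5, about 5.3 % combined; real receivers lose part of that margin to oversampling and edge detection. Because the error resets at every start bit, this model does not predict that longer packets fail more often, which fits the later observation that short messages were affected too.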
Source of the problem
- Physical manipulation of the connection cable (also induced by blowing) increases the CRC error rate @benedikt.schleusener (edit: not the source)
- Packets containing longer runs of zeros (TAS sensor not connected, so only zeros are read in) are more susceptible to bit misalignment (edit: not the source)
- Optimization level -O4 (122k main loops / sec) is too much optimization ... -O3 (114k main loops / sec) works fine and is still much better than debug (49k main loops / sec). Edit: not the source. Since the hardware rework (resoldering of wires), -O3 works better than debug.
- The source of the problem is definitely the Flight Controller: inspecting the µC's Tx line with an oscilloscope shows the missing bytes.
More info
- Testing with feature/crcdebug and sending the same first packet each time reveals that messages are missing single bytes or contain up to nine additional bytes. As a result, the wrong bytes are used as the CRC value and the length information no longer matches. This might be an issue with the ringbuffer.
```
Received message yields right CRC (be8f/be8f). Received message: 2a|45|12|07|08|00|10|f1|c3|f5|25|52|0b|0a|07|08|00|10|91|b7|f4|25|18|00|5a|0b|0a|07|08|00|10|c8|9f|f4|25|18|00|8a|01|1f|0a|07|08|00|10|b8|99|ed|25|2a|0a|08|80|11|10|80|07|18|a0|92|02|32|08|08|ca|03|10|c4|02|18|72|
Received message yields wrong CRC (1ff1/8fde). Received message: 2a|45|12|07|08|00|10|f1|c3|f5|25|52|0b|0a|07|08|00| 91|b7|f4|25|18|00|5a|0b|0a|07|08|00|10|c8|9f|f4|25|18|00|8a|01|1f|0a|07|08|00|10|b8|99|ed|25|2a|0a|08|80|11|10|80|07|18|a0|92|02|32|08|08|ca|03|10|c4|02|18|72|be|
Received message yields right CRC (be8f/be8f). Received message: 2a|45|12|07|08|00|10|f1|c3|f5|25|52|0b|0a|07|08|00|10| 91|b7|f4|25|18|00|5a|0b|0a|07|08|00|10|c8|9f|f4|25|18|00|8a|01|1f|0a|07|08|00|10|b8|99|ed|25|2a|0a|08|80|11|10|80|07|18|a0|92|02|32|08|08|ca|03|10|c4|02|18|72|
Received message yields wrong CRC (1ff1/8fde). Received message: 2a|45|12|07|08|00|10|f1|c3|f5|25|52|0b|0a|07|08|00| 91|b7|f4|25|18|00|5a|0b|0a|07|08|00|10|c8|9f|f4|25|18|00|8a|01|1f|0a|07|08|00|10|b8|99|ed|25|2a|0a|08|80|11|10|80|07|18|a0|92|02|32|08|08|ca|03|10|c4|02|18|72|be|
Received message yields wrong CRC (3bca/ca03). Received message: 2a|45|12|07|08|00|10|f1|c3|f5|25|52|0b|0a|07|08|00| 25|52|0b|0a|07|08|00|10|91|b7|f4|25|18|00|5a|0b|0a|07|08|00|10|c8|9f|f4|25|18|00|8a|01|1f|0a|07|08|00|10|b8|99|ed|25|2a|0a|08|80|11|10|80|07|18|a0|92|02|32|08|08|
Received message yields wrong CRC (5610/08ca). Received message: 2a|45|12|07|08|00|10|f1|c3|f5|25|52|0b|0a|07|08|00|10|25|52|0b|0a|07|08|00|10|91|b7|f4|25|18|00|5a|0b|0a|07|08|00|10|c8|9f|f4|25|18|00|8a|01|1f|0a|07|08|00|10|b8|99|ed|25|2a|0a|08|80|11|10|80|07|18|a0|92|02|32|08|
Received message yields wrong CRC (1ff1/8fde). Received message: 2a|45|12|07|08|00|10|f1|c3|f5|25|52|0b|0a|07|08|00| 91|b7|f4|25|18|00|5a|0b|0a|07|08|00|10|c8|9f|f4|25|18|00|8a|01|1f|0a|07|08|00|10|b8|99|ed|25|2a|0a|08|80|11|10|80|07|18|a0|92|02|32|08|08|ca|03|10|c4|02|18|72|be|
Received message yields wrong CRC (3bca/ca03). Received message: 2a|45|12|07|08|00|10|f1|c3|f5|25|52|0b|0a|07|08|00| 25|52|0b|0a|07|08|00|10|91|b7|f4|25|18|00|5a|0b|0a|07|08|00|10|c8|9f|f4|25|18|00|8a|01|1f|0a|07|08|00|10|b8|99|ed|25|2a|0a|08|80|11|10|80|07|18|a0|92|02|32|08|08|
Received message yields wrong CRC (1ff1/8fde). Received message: 2a|45|12|07|08|00|10|f1|c3|f5|25|52|0b|0a|07|08|00| 91|b7|f4|25|18|00|5a|0b|0a|07|08|00|10|c8|9f|f4|25|18|00|8a|01|1f|0a|07|08|00|10|b8|99|ed|25|2a|0a|08|80|11|10|80|07|18|a0|92|02|32|08|08|ca|03|10|c4|02|18|72|be|
Received message yields wrong CRC (1ff1/8fde). Received message: 2a|45|12|07|08|00|10|f1|c3|f5|25|52|0b|0a|07|08|00| 91|b7|f4|25|18|00|5a|0b|0a|07|08|00|10|c8|9f|f4|25|18|00|8a|01|1f|0a|07|08|00|10|b8|99|ed|25|2a|0a|08|80|11|10|80|07|18|a0|92|02|32|08|08|ca|03|10|c4|02|18|72|be|
```
- The last byte in the ringbuf is always 0x00 (edit: this is a feature, not a bug).
- Messages containing the sequence 0x00 0x01 seem to be more susceptible to CRC errors (perhaps a frequency mismatch again). Edit: frequency alignment is fine.
- Being connected over CCS seems to reduce the CRC error rate drastically.
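Since the ringbuffer is a suspect, and behaviour that changes with the optimization level is often a symptom of data shared between the main loop and an interrupt without proper synchronization, here is a minimal single-producer/single-consumer TX ring buffer in C11 for comparison. It is a sketch, not the code from the repository: `ring_buffer`, `rb_put`, `rb_get` and `RB_CAPACITY` are hypothetical names, and it assumes exactly one writer (main loop), one reader (UART TX interrupt) and a target where `atomic_size_t` is lock-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TX ring buffer shared between the main loop (producer)
 * and the UART TX interrupt (consumer). */
#define RB_CAPACITY 256u
static_assert((RB_CAPACITY & (RB_CAPACITY - 1u)) == 0u,
              "RB_CAPACITY must be a power of two");

typedef struct {
    uint8_t data[RB_CAPACITY];
    atomic_size_t head; /* free-running; written only by the producer */
    atomic_size_t tail; /* free-running; written only by the consumer */
} ring_buffer;

static void rb_init(ring_buffer *rb)
{
    atomic_init(&rb->head, 0);
    atomic_init(&rb->tail, 0);
}

/* Producer: store the byte first, then publish it by advancing head.
 * Advancing head before the store would let the ISR send a stale byte. */
static bool rb_put(ring_buffer *rb, uint8_t byte)
{
    size_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
    if (head - tail == RB_CAPACITY) {
        return false; /* full: never overwrite bytes not yet sent */
    }
    rb->data[head & (RB_CAPACITY - 1u)] = byte;
    atomic_store_explicit(&rb->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer (e.g. TX ISR): read the byte first, then free the slot. */
static bool rb_get(ring_buffer *rb, uint8_t *out)
{
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
    if (head == tail) {
        return false; /* empty */
    }
    *out = rb->data[tail & (RB_CAPACITY - 1u)];
    atomic_store_explicit(&rb->tail, tail + 1u, memory_order_release);
    return true;
}
```

The design choice that matters is the ordering: each index has exactly one writer, a slot is written before `head` advances and read before `tail` advances. Advancing an index before touching the data, or keeping a shared byte count that both contexts modify non-atomically, can drop bytes or resend old ones, which would be consistent with the replayed runs in the log above.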
Brainstorming on possible sources
- Error in the nanopb implementation... Edit: using nanopb-0.3.4 for the code but 0.3.6-dev for generation might be a problem oO?! Moved both to nanopb-0.3.6(-highwind). Didn't solve the problem.
- Generated the serialized pb message only once and sent it repeatedly. The problem persists, so nanopb_encode is not the cause.
- Error in the hal_uart implementation
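Because the same serialized message is sent every time, each received frame can be diffed against a known-good capture to see exactly how it was corrupted. The sketch below uses hypothetical helpers (not part of feature/crcdebug): it finds the first differing byte and checks whether a single run of up to `max_shift` dropped or inserted bytes explains the rest. As the reference it assumes the good frame from the log followed by its CRC bytes be 8f; the frame appears to start with 0x2a followed by a length byte 0x45 (69), which matches the 69 payload bytes in the good captures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Index of the first byte where rx differs from ref (or the shorter length). */
static size_t first_difference(const uint8_t *ref, size_t ref_len,
                               const uint8_t *rx, size_t rx_len)
{
    size_t n = ref_len < rx_len ? ref_len : rx_len;
    size_t i = 0;
    while (i < n && ref[i] == rx[i]) {
        ++i;
    }
    return i;
}

/* True if the overlapping part of a and b is non-empty and identical. */
static bool overlap_equal(const uint8_t *a, size_t a_len,
                          const uint8_t *b, size_t b_len)
{
    size_t n = a_len < b_len ? a_len : b_len;
    return n > 0 && memcmp(a, b, n) == 0;
}

/* Shift that re-aligns rx with ref after first_diff: -k means k bytes were
 * dropped, +k means k bytes were inserted, 0 means no single shift of up
 * to max_shift explains the difference. */
static int realign_shift(const uint8_t *ref, size_t ref_len,
                         const uint8_t *rx, size_t rx_len,
                         size_t first_diff, int max_shift)
{
    assert(first_diff <= ref_len && first_diff <= rx_len);
    for (int k = 1; k <= max_shift; ++k) {
        size_t sk = (size_t)k;
        /* k bytes dropped: rx continues where ref is k bytes further on. */
        if (first_diff + sk < ref_len &&
            overlap_equal(rx + first_diff, rx_len - first_diff,
                          ref + first_diff + sk, ref_len - first_diff - sk)) {
            return -k;
        }
        /* k bytes inserted: rx runs k bytes ahead of ref. */
        if (first_diff + sk < rx_len &&
            overlap_equal(rx + first_diff + sk, rx_len - first_diff - sk,
                          ref + first_diff, ref_len - first_diff)) {
            return k;
        }
    }
    return 0;
}
```

Applied to the log, the second line (CRC 1ff1/8fde) gives a first difference at index 17 and a shift of -1 (the 0x10 after `07|08|00|` is missing), and the line with CRC 5610/08ca gives index 18 and +8 (the run `25|52|0b|0a|07|08|00|10` appears twice). A short remaining tail can match a shift by coincidence, so requiring a minimum overlap would be sensible before trusting the result.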
Solution
See commit f564a739