I was in Sarajevo during the week of 19 Jan 2005 for the acceptance test of a delivery, and I want to share what I experienced during this work.
Basically, in this delivery we introduced Diameter Charging support to the Telenity SMSC product. The SMSC establishes a TCP connection to the Diameter server and performs charging requests over this connection.
During the tests, we observed that the connection was lost every minute and that the SMSC re-established it each time. Also, in the tcpdump capture we took to inspect the packets, Wireshark showed a retransmission of every packet: immediately after each packet, within nanoseconds, a second copy appeared and was marked as retransmitted, as shown below.
As you can see, packet 24 is shown as a retransmission of 23, 26 of 25, and 30 of 29.
My first impression was that the connection loss was related to those retransmissions. I suspected that the Diameter server, or a firewall/load balancer between the SMSC and the Diameter server, was dropping the connection because it treated the retransmissions as a security issue. Following that hypothesis, we tried to find the cause of the retransmissions in order to avoid the connection loss, and after hours of effort we had achieved nothing.
Then, after some brainstorming and discussion with other engineers and more analysis of the packets, we figured out that the connection loss happened exactly 1 minute after the connection was established (or after the last packet was sent), and that it was triggered by a FIN,ACK packet sent from the Diameter server to the SMSC. This FIN,ACK is a request to terminate the connection.
Exactly 1 minute after the last packet (45) was sent to the Diameter server, we received a FIN,ACK from it at packet 46 and lost the connection; 2 seconds later, the SMSC re-established the connection.
Here are the details of the connection termination with FIN,ACK.
The only meaningful explanation of this 1-minute mystery is this: if no packet is sent for 1 minute, the Diameter server decides that the established connection is not healthy, because it has not received any packet during that minute, and it requests connection termination. Based on that analysis, we configured the heartbeat mechanism on the SMSC to send watchdog packets to the Diameter server every 10 seconds to keep the connection alive, and it worked :).
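To make the timing argument concrete, here is a minimal Python sketch of the idle-timeout behaviour we inferred. It is only a model of what we observed, not the Diameter server's actual logic: the 60-second timeout and the 10-second watchdog interval come from our test, while the function names `first_idle_close` and `watchdog_schedule` are made up for this illustration.

```python
from typing import Iterable, List, Optional

IDLE_TIMEOUT_S = 60.0       # observed: FIN,ACK arrived 1 minute after the last packet
WATCHDOG_INTERVAL_S = 10.0  # the heartbeat interval we configured on the SMSC


def first_idle_close(send_times: Iterable[float], observe_until: float,
                     idle_timeout: float = IDLE_TIMEOUT_S) -> Optional[float]:
    """Return when an idle-timeout peer would close the connection, or None.

    send_times are seconds since the connection was established (t = 0).
    Assumption: the peer closes as soon as it has seen no packet for
    idle_timeout seconds.
    """
    last_activity = 0.0
    for t in sorted(send_times):
        if t > observe_until:
            break
        if t - last_activity >= idle_timeout:
            # The gap before this packet was already too long.
            return last_activity + idle_timeout
        last_activity = t
    if observe_until - last_activity >= idle_timeout:
        return last_activity + idle_timeout
    return None


def watchdog_schedule(interval: float, observe_until: float) -> List[float]:
    """Times at which a periodic watchdog packet would be sent."""
    times = []
    t = interval
    while t <= observe_until:
        times.append(t)
        t += interval
    return times


# One packet at t = 5 s and then silence: the peer closes at t = 65 s.
print(first_idle_close([5.0], observe_until=300.0))
# A watchdog every 10 s keeps every gap below 60 s: the connection stays open.
print(first_idle_close(watchdog_schedule(WATCHDOG_INTERVAL_S, 300.0), 300.0))
```

Any watchdog interval comfortably below the peer's idle timeout would do in this model; 10 seconds is simply what we chose.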
Below you can see the watchdog request/answer packets 1-3 and 6-8.
So my first hypothesis was wrong: there was no relation between the connection loss and the retransmissions.
Everything was working fine. We had eliminated the connection loss but not the retransmissions, and although we had seen no side effects from them, the customer was asking about their cause.
In my last hours on my last day in Sarajevo, while discussing with the customer whether to open a support ticket with my company to resolve the retransmission issue, somehow, with the help of a muse, I noticed that we had been running tcpdump with the "-i any" interface parameter without really thinking about what that parameter meant, or even being aware of it while executing the command. In short, we were violating the rule: "Know what you’re doing while you’re doing it".
This is the tcpdump command with the "any" interface:
tcpdump -vvv -i any -s 0 -w /tmp/test_charging_live_http.pcap port 8080
Then we captured again without the "any" parameter:
tcpdump -s 0 -vvv -w /tmp/charging_http.pcap host 172.31.247.71
and no retransmissions!!!
Actually, there had been no retransmissions all along. The original packet and the "retransmitted" one were the same packet: because we used the "-i any" parameter when running tcpdump, each packet was captured a second time as it passed over a second network interface. So Wireshark wrongly flagged it as a retransmission, when it was actually the exact same packet seen at two layers. We could tell they were the same packet and not retransmissions because the Identification field in the IP header was identical in both copies.
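The check we did by eye can be sketched in a few lines of Python. This is only an illustration of the heuristic described above (same TCP segment plus same IP Identification means the same packet captured twice); the `Pkt` record, the `classify_repeats` function, and all addresses and field values in the example are hypothetical, and a real tool would read them from the capture file.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Pkt:
    """Hypothetical, simplified view of one captured TCP segment."""
    frame: int    # frame number in the capture
    src: str
    dst: str
    sport: int
    dport: int
    seq: int      # TCP sequence number
    length: int   # TCP payload length in bytes
    ip_id: int    # IPv4 Identification field


def classify_repeats(packets: List[Pkt]) -> Dict[int, str]:
    """Label repeated data segments by comparing their IP Identification.

    Assumption: a segment seen again with the same IP ID is the same packet
    captured twice, while a different IP ID suggests a real retransmission.
    """
    seen: Dict[Tuple, List[Pkt]] = {}
    labels: Dict[int, str] = {}
    for p in packets:
        if p.length == 0:
            continue  # pure ACKs legitimately repeat the same sequence number
        key = (p.src, p.dst, p.sport, p.dport, p.seq, p.length)
        earlier = seen.setdefault(key, [])
        if earlier:
            same_id = any(e.ip_id == p.ip_id for e in earlier)
            labels[p.frame] = ("duplicate capture" if same_id
                               else "possible retransmission")
        earlier.append(p)
    return labels


# Made-up values: frame 24 repeats frame 23 with the same IP ID,
# frame 40 repeats frame 25 with a different IP ID.
capture = [
    Pkt(23, "10.0.0.1", "10.0.0.2", 40000, 3868, 1000, 120, 0x1A2B),
    Pkt(24, "10.0.0.1", "10.0.0.2", 40000, 3868, 1000, 120, 0x1A2B),
    Pkt(25, "10.0.0.1", "10.0.0.2", 40000, 3868, 1120, 80, 0x1A2C),
    Pkt(40, "10.0.0.1", "10.0.0.2", 40000, 3868, 1120, 80, 0x1A30),
]
print(classify_repeats(capture))
```

In our capture every flagged pair would have fallen into the "duplicate capture" group, which is what pointed us back to the "-i any" parameter.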
Here is one more link that might be useful: https://ask.wireshark.org/questions/1284/wireshark-showing-lot-of-tcp-retransmissions-is-it-real-retransmission-or-wireshark-showing-incorrectly
Finally, we ended up with no open issues. It was a great experience with a lot of lessons learned, and an enjoyable trip.