Increased network latency to Europe servers

Starting a few days ago, latency to the World of Warcraft EU servers has been severe. Many users, myself included, are reporting in-game latencies from 500ms up into the tens of thousands of milliseconds. Users in North America, Norway, Sweden, Russia, Bulgaria, Romania, and other parts of Europe are affected.

There is a WoW EU Forum thread about this problem. The thread itself was started in 2008, so pay attention to the timestamps on the replies:

There are also other threads that could be related, though it’s hard to tell:

First and foremost, which realm you play on doesn’t matter. This is a problem with the Internet, not with Blizzard’s own network. However, as far as I know Blizzard has a peering arrangement with Telia (see below), so they should be able to do something about it.

I’ve been studying the issue for the past few days, and there is a common point of failure that the forum thread above misses, because users there are running the standard Windows tracert.exe rather than a tool like Ping Plotter or mtr (for *IX). Think of those tools as “repetitive traceroutes”: they keep probing every hop and accumulate loss and latency statistics for each one.
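A minimal sketch of what such a tool computes for each hop, assuming you already have one RTT sample per probe (None for a probe that got no reply). The names `HopStats` and `summarize` are hypothetical, and the columns only approximate mtr’s: Loss% is (Snt - Rcv) / Snt, and Last/Avg/Best/Wrst are taken over the replies. The point is sample size: tracert.exe sends three probes per hop, so a single pass can only show 0%, 33%, 67%, or 100% loss, while 156 probes can resolve the roughly 15% loss seen below.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HopStats:
    snt: int                  # probes sent
    rcv: int                  # replies received
    loss_pct: float           # (snt - rcv) / snt, as a percentage
    last: Optional[float]     # most recent reply RTT in ms (None if no replies)
    avg: Optional[float]
    best: Optional[float]
    wrst: Optional[float]


def summarize(samples: List[Optional[float]]) -> HopStats:
    """Summarise repeated probes to a single hop in mtr-like columns.

    Each sample is one probe's RTT in ms, or None if that probe got no reply.
    """
    replies = [s for s in samples if s is not None]
    snt, rcv = len(samples), len(replies)
    loss = 100.0 * (snt - rcv) / snt if snt else 0.0
    if not replies:
        return HopStats(snt, rcv, loss, None, None, None, None)
    return HopStats(snt, rcv, loss, replies[-1],
                    sum(replies) / rcv, min(replies), max(replies))


if __name__ == "__main__":
    # 156 probes with 24 unanswered, matching the counts at hop #14 below.
    print(round(summarize([190.0] * 132 + [None] * 24).loss_pct, 1))  # prints 15.4
    # One tracert-style pass: three probes, one lost.
    print(round(summarize([190.0, None, 191.0]).loss_pct, 1))  # prints 33.3
```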

The common failure point is Telia’s network, specifically their routers in Paris, France. The following mtr to 80.239.233.39 shows what I mean:

                                        Packets               Pings
 Host                                 Loss%   Snt   Rcv  Last   Avg  Best  Wrst
 1. gw.home.lan                        0.0%   156   156   0.4   0.4   0.3   0.6
 2. ???
 3. ge-2-19-ur01.santaclara.ca.sfba.c  0.0%   156   156  11.9  12.4   8.0  61.3
 4. te-0-3-0-0-ar01.sfsutro.ca.sfba.c  0.0%   156   156  21.9  12.8   9.0  26.6
 5. pos-0-4-0-0-cr01.sanjose.ca.ibone  0.0%   156   156  12.6  14.7  10.3  33.0
 6. xe-10-0-0.edge1.SanJose1.Level3.n  0.0%   156   156  12.5  19.2  10.1 240.0
 7. vlan99.csw4.SanJose1.Level3.net    0.0%   156   156  21.9  19.1  11.4  41.9
 8. ae-93-93.ebr3.SanJose1.Level3.net  0.0%   156   156  21.3  19.1  10.6  36.1
 9. ae-2.ebr3.LosAngeles1.Level3.net   0.0%   156   156  37.7  29.2  20.5  49.5
10. ae-73-73.csw2.LosAngeles1.Level3.  0.0%   156   156  23.7  28.5  20.7  57.8
11. ae-2-79.edge1.LosAngeles9.Level3.  0.0%   156   156  23.8  29.9  20.0 188.8
12. telia-level3-ge.LosAngeles9.level  0.0%   156   156  24.9  27.0  20.3 111.5
13. ash-bb1-link.telia.net             0.0%   156   156  87.1  87.9  84.3 105.4
14. prs-bb2-link.telia.net            15.4%   156   132 191.7 190.9 185.7 223.6
    prs-bb2-link.telia.net
15. ffm-bb2-link.telia.net            10.3%   156   140 200.3 203.8 195.4 326.0
    ffm-bb2-link.telia.net
16. ffm-b5-link.telia.net             14.1%   156   134 198.3 203.1 195.9 372.0
17. ???

Note hops #14 through #16. Loss at a single hop could be ICMP de-prioritisation: a heavily utilised router may answer probes addressed to itself at low priority, or not at all, while still forwarding transit traffic normally. What rules that out here is the “trickle down” nature of the loss. It starts at hop #14, a Telia router in Paris, and carries through every hop after it. If hops #15 and #16 showed 0% loss, I could believe ICMP de-prio was responsible, but they don’t.
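The “trickle down” test can be written as a small check over per-hop loss figures. This is a hypothetical helper, not an mtr feature: it takes (hop, loss%) pairs for the hops that responded, ignores loss below an arbitrary 1% threshold, and reports the first hop whose loss persists at every later hop. Loss that appears at one hop and disappears afterwards is treated as ICMP de-prio.

```python
def first_real_loss_hop(hops, threshold=1.0):
    """Return the first hop whose loss carries through to every later hop.

    `hops` is a list of (hop_number, loss_pct) in path order, responding hops
    only. A lossy final hop has nothing after it to compare against, so it is
    reported as well. Returns None if no hop qualifies.
    """
    for i, (hop, loss) in enumerate(hops):
        if loss < threshold:
            continue
        if all(later >= threshold for _, later in hops[i + 1:]):
            return hop
    return None


# Loss figures from the first mtr above (hops 2 and 17 did not respond).
BAD = [(h, 0.0) for h in [1] + list(range(3, 14))] + [
    (14, 15.4), (15, 10.3), (16, 14.1)]

# Loss figures from the "behaving properly" mtr in the update.
GOOD_LOSS = {8: 0.2, 12: 20.1}
GOOD = [(h, GOOD_LOSS.get(h, 0.0)) for h in [1] + list(range(3, 17))]

print(first_real_loss_hop(BAD))   # prints 14
print(first_real_loss_hop(GOOD))  # prints None
```

The threshold keeps measurement noise (such as a single lost probe out of hundreds) from being flagged; the right value is a judgement call.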

This morning, around 0130 PST, the issue cleared up and stayed that way for almost an hour. The Paris routers showed 0% packet loss during that time, and my latency to the WoW EU servers dropped from 2800ms to a normal 350-380ms. Around 0230 PST the problem returned, with packet loss at the hops above averaging around 15%.

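WoW itself is TCP-based, so it doesn’t surprise me that users report sudden lag followed by the game “bursting” to catch up. TCP delivers data in order, so when a segment is lost, everything sent after it is held back until the retransmission arrives, and then the whole backlog is released at once. Those retransmissions are what give high-latency play its “bursty” feel.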
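A simplified model of that behaviour, assuming a hypothetical stream of small game updates. It is not real TCP: the function name is illustrative, it ignores fast retransmit and congestion control, and the 1 s initial retransmission timeout (RTO) is an assumption chosen to match the lower bound RFC 6298 recommends (real stacks differ). Each timeout doubles the timer, so a segment lost several times in a row can stall the stream for many seconds.

```python
def app_delivery_times(n_segments, send_interval_ms, one_way_ms, rto_ms, drops):
    """Time (ms) at which each segment becomes readable by the application.

    Simplified model (assumptions, not real TCP internals):
      * segment i is first sent at i * send_interval_ms;
      * drops[i] = k means the first k copies of segment i are lost;
      * each timeout doubles the retransmission timer (exponential backoff);
      * fast retransmit, congestion control and ACK timing are ignored.
    """
    arrivals = []
    for i in range(n_segments):
        k = drops.get(i, 0)
        # Waits of rto, 2*rto, 4*rto, ... add up to rto * (2**k - 1).
        send_time = i * send_interval_ms + rto_ms * (2 ** k - 1)
        arrivals.append(send_time + one_way_ms)

    # In-order delivery: nothing after a missing segment is released until
    # the gap is filled, so the later segments come out in one burst.
    delivered, released_at = [], 0
    for arrival in arrivals:
        released_at = max(released_at, arrival)
        delivered.append(released_at)
    return delivered


if __name__ == "__main__":
    # One loss: the next four updates stall, then all arrive at 1200 ms.
    print(app_delivery_times(6, 100, 100, 1000, {1: 1}))
    # prints [100, 1200, 1200, 1200, 1200, 1200]
    # The same segment lost four times: the stall grows to 15 seconds.
    print(app_delivery_times(6, 100, 100, 1000, {1: 4}))
    # prints [100, 15200, 15200, 15200, 15200, 15200]
```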

A few days ago, I posted about the problem on the official outages@outages.org mailing list, with no response. I’m tempted to post the same thing to NANOG, where I know TeliaSonera operators do hang out.

What blows my mind is that this issue began 3-4 days ago and still hasn’t been dealt with. The least Telia could do is route around the problem in Paris; their network is large enough to do so. Shifting traffic away from a misbehaving router, for example by raising its link metrics so the network re-converges onto other paths, is a common way to mitigate network problems (I do this at my job quite often).
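A toy illustration of routing around a node by raising its link metrics. The topology, node names and costs below are entirely hypothetical and are not Telia’s network; the path choice is a plain shortest-path (Dijkstra) computation, which is how link-state protocols such as OSPF and IS-IS pick routes. Raising every metric on the “prs” node makes the alternative path cheaper, so traffic moves off it.

```python
import heapq


def build_graph(links):
    """Turn {(a, b): cost} into an undirected adjacency map."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_path(graph, src, dst):
    """Plain Dijkstra: return (path, total_cost) from src to dst."""
    dist, prev = {src: 0}, {}
    heap, done = [(0, src)], set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dst:
            break
        for nbr, cost in graph[node].items():
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                prev[nbr] = node
                heapq.heappush(heap, (d + cost, nbr))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


def cost_out(links, node, metric=65535):
    """Set every link touching `node` to a huge metric (65535 is the largest
    16-bit OSPF interface cost) so shortest paths avoid it."""
    return {ab: (metric if node in ab else c) for ab, c in links.items()}


# Hypothetical topology and metrics, for illustration only.
LINKS = {("ash", "prs"): 70, ("prs", "ffm"): 10,
         ("ash", "ldn"): 75, ("ldn", "ffm"): 15}

print(shortest_path(build_graph(LINKS), "ash", "ffm"))
# prints (['ash', 'prs', 'ffm'], 80)
print(shortest_path(build_graph(cost_out(LINKS, "prs")), "ash", "ffm"))
# prints (['ash', 'ldn', 'ffm'], 90)
```

The detour costs a little more (90 vs. 80 in this made-up example), which in practice means slightly higher latency in exchange for avoiding the lossy router.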

As for what’s causing the problem: high router CPU load, routing engine issues, bad hardware, or a DoS/DDoS attack of some kind. Without visibility into Telia’s network, it could be any one of these.

Update: here is an example of what things look like when Telia is behaving properly. The 20.1% packet loss at hop #12 is clearly ICMP de-prio, since it does not “trickle down” into the hops after it. During this mtr, latency to the WoW EU servers was around 400ms:

                                        Packets               Pings
 Host                                 Loss%   Snt   Rcv  Last   Avg  Best  Wrst
 1. gw.home.lan                        0.0%   435   435   0.4   0.3   0.3   0.5
 2. ???
 3. ge-2-19-ur01.santaclara.ca.sfba.c  0.0%   434   434  10.9  10.4   7.6  37.9
 4. te-0-3-0-0-ar01.sfsutro.ca.sfba.c  0.0%   434   434  10.7  11.6   9.2  34.0
 5. pos-0-5-0-0-cr01.sanjose.ca.ibone  0.0%   434   434  13.7  13.4  10.7  36.2
 6. xe-10-0-0.edge1.SanJose1.Level3.n  0.0%   434   434  11.8  17.6  10.3 181.0
 7. vlan89.csw3.SanJose1.Level3.net    0.0%   434   434  21.8  17.4  10.7  84.5
 8. ae-83-83.ebr3.SanJose1.Level3.net  0.2%   434   433  19.1  18.1  10.9  42.8
 9. ae-2.ebr3.LosAngeles1.Level3.net   0.0%   434   434  20.7  27.4  20.1  52.9
10. ae-73-73.csw2.LosAngeles1.Level3.  0.0%   434   434  39.5  27.0  19.7  47.0
11. ae-2-79.edge1.LosAngeles9.Level3.  0.0%   434   434  21.4  27.6  20.1 159.1
12. telia-level3-ge.LosAngeles9.level 20.1%   434   346  28.1  26.8  20.1 189.2
13. ash-bb1-link.telia.net             0.0%   434   434  85.0  86.3  83.5 110.4
14. prs-bb1-link.telia.net             0.0%   434   434 181.8 183.7 179.4 222.0
    prs-bb1-link.telia.net
15. ffm-bb1-link.telia.net             0.0%   434   434 190.8 196.4 189.4 321.7
    ffm-bb1-pos6-0-0.telia.net
16. ffm-b4-link.telia.net              0.0%   434   434 190.3 199.7 189.4 387.8
17. ???
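If you save reports like these (for example with mtr’s `--report` mode), the per-hop numbers are easy to extract for the kind of comparison made above. A minimal sketch, assuming the column layout shown in this post; `parse_mtr_report` is a hypothetical name. Rows for non-responding hops (`???`) and the indented lines mtr uses to list alternate hosts for the same hop are skipped.

```python
import re

# Hop number, host, Loss%, Snt and Rcv at the start of an mtr report row.
ROW = re.compile(r"^\s*(\d+)\.\s+(\S+)\s+([\d.]+)%\s+(\d+)\s+(\d+)")


def parse_mtr_report(text):
    """Return (hop, host, loss_pct, snt, rcv) for each responding hop."""
    hops = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # "???" rows, headers and alternate-host lines don't match
            hop, host, loss, snt, rcv = m.groups()
            hops.append((int(hop), host, float(loss), int(snt), int(rcv)))
    return hops


# A few rows copied from the report above.
SAMPLE = """\
 11. ae-2-79.edge1.LosAngeles9.Level3.  0.0%   434   434  21.4  27.6  20.1 159.1
 12. telia-level3-ge.LosAngeles9.level 20.1%   434   346  28.1  26.8  20.1 189.2
 13. ash-bb1-link.telia.net             0.0%   434   434  85.0  86.3  83.5 110.4
 14. prs-bb1-link.telia.net             0.0%   434   434 181.8 183.7 179.4 222.0
     prs-bb1-link.telia.net
 17. ???
"""

for hop, host, loss, snt, rcv in parse_mtr_report(SAMPLE):
    print(f"{hop:>2} {host:<36} {loss:5.1f}% ({rcv}/{snt})")
```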