All Blizzard sites unreachable

EDIT: As of 10:48 PDT, things seem back online (and reliably at that). I strongly doubt there will ever be mention of this outage publicly.

EDIT #2: Imagine that! Assuming you’re actually able to log in to Diablo 3 (not sure about other battle.net services), there is an incredibly (and obviously intentionally) vague message about “a problem”. I’ve included it here as a screenshot. I love the use of the phrase “the parties involved”; wouldn’t want to name any names and risk holding a provider responsible for the problem, would we? US politics, what a disgrace.

EDIT #3: mtr now shows what appears to be a great improvement; it looks to me like there was a network-level problem within AT&T’s own network in or around Salt Lake City that caused this. Of course, I would still love to know why my packets bound for the NS-WEST.CERF.NET nameserver go from the Bay Area to Los Angeles to Salt Lake City to Los Angeles to San Diego. Asymmetric routing doesn’t explain this either. Obviously AT&T has some kind of networking issue that they tried to route around (which caused this entire ordeal) but failed miserably. It wouldn’t be the first, second, or even tenth time I’ve seen AT&T fail at this. Welcome to the Internet: it’s broken 24x7x365, no exaggeration.

Not sure when this issue began today, but I was able to confirm its existence around 09:45 PDT (that’s 9:45am Pacific).

None of Blizzard’s sites — including blizzard.com, battle.net, diablo3.com, as well as any subdomain under those — are resolving. I imagine this would affect any game or service Blizzard offers which uses DNS for resolution; hard-coded IP addresses (which I believe WoW uses for its server list) would not be affected, of course, but I believe WoW does use DNS for some portion of its authentication mechanism. In the case of Diablo 3, the behaviour is amusing — you might see a strange GDI-based “Checking for updates” box, then get the usual launcher dialog with no details in it, with “Play” greyed out. (Note to Blizzard: you should really improve the error handling in the D3 Launcher. The way this manifested itself, from a UI / end-user perspective, is absolutely atrocious!)

Blizzard’s Facebook page is filled with people complaining about the problem, while Blizzard’s Twitter page (as of 10:09 PDT) has no mention of any issue.

The root cause appears to be some sort of major issue within AT&T. Blizzard relies on two authoritative nameservers for all of their sites, specifically NS-WEST.CERF.NET and NS-EAST.CERF.NET (both operated by AT&T). Neither of these authoritative nameservers is responding. I have not the slightest idea why Blizzard has only *two* authoritative nameservers, and both with the same company (AT&T); you would expect them to have a tertiary that wasn’t run by AT&T. I dunno who maintains Blizzard’s WHOIS and SOA data, but that’s a very, very bad choice on their part; DNS and Networking 101 says do not do this.
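
To make the "all eggs in one basket" point concrete, here is a minimal Python sketch that groups a set of nameserver hostnames by their last two DNS labels and reports whether more than one operator is involved. The function names are my own invention, and treating the last two labels as "the operator" is a crude assumption (it breaks on suffixes like co.uk and says nothing about who actually runs the network underneath).

```python
from collections import defaultdict


def group_by_operator_domain(nameservers):
    """Group nameserver hostnames by their last two DNS labels.

    Assumption: the last two labels (e.g. "cerf.net") approximate the
    operator. This is a heuristic only and is wrong for suffixes such
    as "co.uk".
    """
    groups = defaultdict(list)
    for ns in nameservers:
        labels = ns.rstrip(".").lower().split(".")
        groups[".".join(labels[-2:])].append(ns)
    return dict(groups)


def has_operator_diversity(nameservers):
    """Return True if the nameservers fall under more than one domain."""
    return len(group_by_operator_domain(nameservers)) > 1


# The two nameservers Blizzard was delegated to at the time of the outage.
blizzard_ns = ["ns-east.cerf.net.", "ns-west.cerf.net."]
print(group_by_operator_domain(blizzard_ns))
print(has_operator_diversity(blizzard_ns))
```

Both names land in the same cerf.net group, so the check comes back false; adding a third nameserver under a different domain would flip it.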

Below are some technical details which won’t mean much to your average player but, to a system or network administrator, will be more than sufficient evidence of the problem.

I cannot explain why packets destined for NS-WEST.CERF.NET’s IP address are going through Salt Lake City, Utah. This smells of an AT&T IP routing problem but I have no easy way to confirm that given the asymmetric nature of packet flow on the Internet.

DNS testing:

$ dig @a.gtld-servers.net ns blizzard.com.

; <<>> DiG 9.8.2 <<>> @a.gtld-servers.net ns blizzard.com.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11635
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;blizzard.com.                  IN      NS

;; AUTHORITY SECTION:
blizzard.com.           172800  IN      NS      ns-east.cerf.net.
blizzard.com.           172800  IN      NS      ns-west.cerf.net.

;; ADDITIONAL SECTION:
ns-east.cerf.net.       172800  IN      A       207.252.96.3
ns-west.cerf.net.       172800  IN      A       192.153.156.3

;; Query time: 108 msec
;; SERVER: 192.5.6.30#53(192.5.6.30)
;; WHEN: Sun Jun  3 10:01:43 2012
;; MSG SIZE  rcvd: 114

$ dig @207.252.96.3 ns blizzard.com.

; <<>> DiG 9.8.2 <<>> @207.252.96.3 ns blizzard.com.
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

$ dig @192.153.156.3 ns blizzard.com.

; <<>> DiG 9.8.2 <<>> @192.153.156.3 ns blizzard.com.
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
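
If you would rather script the dig checks above than run them by hand, something along these lines should work. It is a rough sketch using only the Python standard library: it builds a bare-bones NS query by hand and reports whether a server sends back any UDP reply before a timeout. The function names and the 5-second timeout are my own choices, and it makes no attempt to parse the answer or retry over TCP.

```python
import socket
import struct


def build_ns_query(name, query_id=0x1234):
    """Build a minimal DNS query packet asking for the NS records of name."""
    # Header: id, flags (0 = standard query, recursion not requested),
    # one question, zero answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 2 = NS, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 2, 1)


def nameserver_answers(server_ip, name, timeout=5.0):
    """Return True if server_ip sends a UDP reply with a matching id in time."""
    query = build_ns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable-network errors
            return False
    return len(reply) >= 2 and reply[:2] == query[:2]


# Example (performs real network traffic, so it is left commented out):
# for ip in ("207.252.96.3", "192.153.156.3"):
#     print(ip, nameserver_answers(ip, "blizzard.com."))
```

During the outage, both calls in the commented-out loop would presumably have returned False, matching the "connection timed out" results from dig.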

Simple mtr (traceroute+ping combination, essentially):

$ mtr 192.153.156.3
                                                  Packets               Pings
 Host                                           Loss%   Snt   Rcv  Last   Avg  Best  Wrst
 1. gw.home.lan                                  0.0%    21    21   0.3   0.2   0.2   0.3
 2. c-67-180-84-1.hsd1.ca.comcast.net           20.0%    20    16 4932. 4829. 4264. 5128.
 3. te-0-0-0-12-ur05.santaclara.ca.sfba.comcast  0.0%    20    20  97.8  19.0   8.7  97.8
 4. te-1-11-0-1-ar01.sfsutro.ca.sfba.comcast.ne  0.0%    20    20  44.1  24.0  13.0  45.9
 5. 68.86.91.229                                 0.0%    20    20  62.5  25.9  12.1  66.7
 6. pos-0-4-0-0-pe01.11greatoaks.ca.ibone.comca  0.0%    20    20  30.6  20.7  16.8  32.0
 7. 192.205.37.1                                 0.0%    20    20  18.4  29.2  16.1 118.3
 8. cr2.sffca.ip.att.net                         0.0%    20    20  44.4  49.7  39.7  84.7
 9. cr2.la2ca.ip.att.net                         0.0%    20    20  58.0  48.0  39.7  66.3
10. cr1.slkut.ip.att.net                         0.0%    20    20  43.6  50.4  39.3  83.1
11. cr2.slkut.ip.att.net                         0.0%    20    20  56.6  48.6  38.5  87.9
12. 12.123.238.133                               0.0%    20    20  40.7  46.9  37.7  97.1
13. ???

$ mtr 207.252.96.3
                                                  Packets               Pings
 Host                                           Loss%   Snt   Rcv  Last   Avg  Best  Wrst
 1. gw.home.lan                                  0.0%    62    62   0.3   0.3   0.2   0.4
 2. c-67-180-84-1.hsd1.ca.comcast.net            8.2%    61    56 4528. 5193. 4525. 6339.
 3. te-0-0-0-12-ur05.santaclara.ca.sfba.comcast  0.0%    61    61  10.4  15.7   8.0  70.7
 4. te-1-11-0-1-ar01.sfsutro.ca.sfba.comcast.ne  0.0%    61    61  19.9  23.6  10.3  67.2
 5. 68.86.90.93                                  0.0%    61    61  21.8  20.8  11.4  53.8
 6. pos-0-9-0-0-pe01.11greatoaks.ca.ibone.comca  0.0%    61    61  17.7  24.7  16.0  56.1
 7. 192.205.37.1                                 0.0%    61    61  74.2  28.0  15.8 140.6
 8. cr2.sffca.ip.att.net                         0.0%    61    61  45.2  49.6  39.3 104.1
 9. cr2.la2ca.ip.att.net                         0.0%    61    61  41.3  50.3  37.9 114.7
10. cr1.slkut.ip.att.net                         0.0%    61    61  54.1  48.3  38.6  80.8
11. cr2.slkut.ip.att.net                         0.0%    61    61  62.7  50.7  39.7  91.2
12. 12.123.238.137                               0.0%    61    61  39.2  50.6  37.5 183.9
13. ???
14. ???
15. ???
16. ???
17. ???
18. ???
19. n54ny401me3.ip.att.net                       0.0%    58    57  96.6 104.9  94.4 145.6
20. ???
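
For picking out where a trace goes quiet, here is a small Python sketch that parses mtr-style hop lines and returns the last hop that answered before the first "???". The sample text is copied from the first trace above; the function names and the regular expression are mine, and they assume the hop-number-then-host layout shown here.

```python
import re

# Matches lines such as " 8. cr2.sffca.ip.att.net   0.0% ..." or "13. ???".
HOP_RE = re.compile(r"^\s*(\d+)\.\s+(\S+)")

SAMPLE = """\
 8. cr2.sffca.ip.att.net                         0.0%    20    20  44.4  49.7  39.7  84.7
 9. cr2.la2ca.ip.att.net                         0.0%    20    20  58.0  48.0  39.7  66.3
10. cr1.slkut.ip.att.net                         0.0%    20    20  43.6  50.4  39.3  83.1
11. cr2.slkut.ip.att.net                         0.0%    20    20  56.6  48.6  38.5  87.9
12. 12.123.238.133                               0.0%    20    20  40.7  46.9  37.7  97.1
13. ???
"""


def parse_hops(mtr_text):
    """Return (hop_number, host) tuples; host is None for '???' hops."""
    hops = []
    for line in mtr_text.splitlines():
        m = HOP_RE.match(line)
        if m:
            host = m.group(2)
            hops.append((int(m.group(1)), None if host == "???" else host))
    return hops


def last_hop_before_silence(hops):
    """Return the last responding hop before the first silent hop, if any."""
    last = None
    for number, host in hops:
        if host is None:
            return last
        last = (number, host)
    return None  # every hop answered, so there is no silent hop to report


print(last_hop_before_silence(parse_hops(SAMPLE)))
```

Keep in mind that a silent hop is not proof of a failure on its own, since plenty of routers simply decline to answer probes; it is the combination with the DNS timeouts that makes the case here.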

BGP routing tables as seen by the Internet (more specifically route-views.routeviews.org):

route-views>show ip route 192.153.156.3
Routing entry for 192.153.156.0/24
  Known via "bgp 6447", distance 20, metric 0
  Tag 7018, type external
  Last update from 12.0.1.63 7w0d ago
  Routing Descriptor Blocks:
  * 12.0.1.63, from 12.0.1.63, 7w0d ago
      Route metric is 0, traffic share count is 1
      AS Hops 2
      Route tag 7018

route-views>show ip route 207.252.96.3
Routing entry for 207.252.96.0/24
  Known via "bgp 6447", distance 20, metric 0
  Tag 7018, type external
  Last update from 12.0.1.63 7w0d ago
  Routing Descriptor Blocks:
  * 12.0.1.63, from 12.0.1.63, 7w0d ago
      Route metric is 0, traffic share count is 1
      AS Hops 2
      Route tag 7018
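
One thing worth pulling out of the route-views output is how old each route is. The Python sketch below extracts the prefix, tag, and last-update age from a "show ip route" entry and converts an age like 7w0d into days. The field names and helper functions are my own, and the age parser only handles the weeks-and-days form that appears above.

```python
import re

SAMPLE = """\
Routing entry for 192.153.156.0/24
  Known via "bgp 6447", distance 20, metric 0
  Tag 7018, type external
  Last update from 12.0.1.63 7w0d ago
"""


def parse_route_entry(text):
    """Extract the prefix, tag, and last-update age from a route entry."""
    prefix = re.search(r"Routing entry for (\S+)", text)
    tag = re.search(r"Tag (\d+)", text)
    age = re.search(r"Last update from \S+ (\S+) ago", text)
    return {
        "prefix": prefix.group(1) if prefix else None,
        "tag": int(tag.group(1)) if tag else None,
        "age": age.group(1) if age else None,
    }


def age_in_days(age):
    """Convert an 'NwNd' age such as '7w0d' to days; None for other formats."""
    m = re.fullmatch(r"(\d+)w(\d+)d", age)
    if not m:
        return None
    return int(m.group(1)) * 7 + int(m.group(2))


entry = parse_route_entry(SAMPLE)
print(entry)
print(age_in_days(entry["age"]))
```

My reading, and it is only a reading, is that a route last updated seven weeks ago suggests the prefixes were still being announced as usual during the outage, which would point at a problem inside AT&T's network and not at a withdrawn BGP announcement.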