Cause of the “10.14” Singapore Data Center Outage: Equinix Responds

Preliminary Cause of the “10.14” Outage at Equinix's Singapore Data Center Identified

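Translator's Note

The outage at the Equinix data center was caused by a technical fault in the chilled water system during a planned upgrade. The incident is another reminder of how important redundant cooling design and staff proficiency with MOPs (methods of procedure) are, especially for a facility in the tropics.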

Equinix has confirmed that a scheduled system upgrade at one of its data centers in Singapore affected the operations of several customers, including two banks. It led to hours-long disruptions that left ATM and online banking services unavailable over the weekend. 

DBS Bank customers, who had already experienced multiple service disruptions in recent months, were unable to access various services on Saturday, including mobile banking and the peer-to-peer funds transfer service PayLah. The bank's ATMs were also down. In subsequent updates on its Facebook page, DBS said the disruption was caused by an issue at its data center service provider, Equinix.

"We are doing our utmost to swing over to our backup data center and expect to progressively restore services," the bank said. Late on Saturday night, it posted an update notifying customers that its ATMs had resumed operations, though some services, such as its online services, remained unavailable. In an update on Sunday morning, DBS said all its services were back up and running. In 2017, the bank partnered with Equinix to move its main data center to smaller "cloud-optimized" premises, enabling DBS to run the facility at 75% lower cost.

ZDNET contacted Equinix with questions about the service disruption, including why its failover measures and secondary site were unable to prevent the incident, and how many organizations were impacted. In an email response, an Equinix spokesperson said: "On Oct. 14, a technical issue with the chilled water system occurred during a planned system upgrade at one of our data centers in Singapore. This raised the temperatures in some of the halls in the data center and impacted some customers' operations. The technical issue has been resolved and we are in contact with impacted customers." The spokesperson said the vendor is currently "thoroughly investigating" the incident and will offer more details when these are available.

Pictured: the chilled water system at an Equinix data center

Center of Concern

The incident has also spotlighted an uncomfortable truth: Singapore’s critical financial systems and internet service providers (ISPs) are hosted in Equinix data centers. Such information, usually kept confidential for security reasons, highlights the vulnerability of Singapore’s financial infrastructure. As one commenter aptly put it, ‘air con spoil ah’ (the air-conditioning broke down), a remark that points to the risks Singapore’s tropical climate poses to data center operations.

Beyond climate, however, the outage highlights the risk of overreliance on a single data center or provider. Had the affected banks implemented redundancy and business continuity planning across multiple data centers, the disruption could have been significantly mitigated.
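As a rough illustration of why spreading services across sites helps, the sketch below computes the availability of a service that stays up as long as at least one site is up: 1 minus the product of each site's unavailability. It rests on a loud assumption: site failures are statistically independent. Sites that share a provider, campus, or cooling plant can fail together, which this simple model does not capture. The 99.9% per-site figure is hypothetical and not taken from the incident.

```python
from math import prod


def combined_availability(site_availabilities: list[float]) -> float:
    """Availability of a service that survives as long as any one site is up.

    ASSUMPTION: site failures are independent. Shared providers, campuses,
    or cooling plants introduce correlated failures this model ignores.
    """
    for a in site_availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability must be in [0, 1], got {a}")
    # The service is down only when every site is down at the same time.
    all_down = prod(1.0 - a for a in site_availabilities)
    return 1.0 - all_down


def annual_downtime_hours(availability: float) -> float:
    """Expected downtime per (non-leap) year, in hours."""
    return (1.0 - availability) * 365 * 24


if __name__ == "__main__":
    # Hypothetical per-site availability of 99.9%.
    for label, sites in [("one site", [0.999]), ("two sites", [0.999, 0.999])]:
        a = combined_availability(sites)
        print(f"{label}: availability={a:.6f}, "
              f"downtime={annual_downtime_hours(a) * 3600:.0f} s/yr")
```

With the hypothetical 99.9% figure, one site allows roughly 8.76 hours of downtime a year, while two truly independent sites bring it down to about half a minute. In practice, the gap between that ideal and reality comes largely from correlated failures.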

Regulatory Ripple Effects

Interestingly, the Monetary Authority of Singapore (MAS) has specific guidelines for banks on the redundancy of cooling systems in their data centers. The MAS Technology Risk Management Guidelines require financial institutions to ensure adequate redundancy in power, network connectivity, and cooling systems to eliminate any single point of failure. It seems that DBS and Citibank may have fallen short of these guidelines, a concerning lapse given the critical nature of their services.
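One common way to express "no single point of failure" for cooling capacity is N+1: the heat load must still be covered after any one unit is lost. The guideline as summarized above does not prescribe this exact test, so the sketch below is an illustrative assumption, and the unit names, capacities, and loads are hypothetical. It models capacity only; a shared chilled water loop, pump, or control system (the kind of component a system upgrade can touch) remains a common-mode failure that this check cannot see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoolingUnit:
    """A hypothetical cooling unit (e.g. a chiller) and its rated capacity."""
    name: str
    capacity_kw: float  # heat the unit can remove, in kW


def single_points_of_failure(units: list[CoolingUnit], load_kw: float) -> list[str]:
    """Names of units whose individual loss would leave the heat load uncovered."""
    total_kw = sum(u.capacity_kw for u in units)
    return [u.name for u in units if total_kw - u.capacity_kw < load_kw]


def meets_n_plus_1(units: list[CoolingUnit], load_kw: float) -> bool:
    """True if capacity covers the load with all units running and after any one fails."""
    total_kw = sum(u.capacity_kw for u in units)
    return total_kw >= load_kw and not single_points_of_failure(units, load_kw)


if __name__ == "__main__":
    # Hypothetical data hall cooled by four 500 kW chillers.
    chillers = [CoolingUnit(f"CH-{i}", 500.0) for i in range(1, 5)]
    for load in (1400.0, 1600.0):
        spofs = single_points_of_failure(chillers, load)
        print(f"load={load:.0f} kW  N+1={meets_n_plus_1(chillers, load)}  "
              f"single points of failure={spofs or 'none'}")
```

At a 1,400 kW load, losing any one chiller still leaves 1,500 kW, so the check passes; at 1,600 kW, every chiller becomes a single point of failure.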

It’s important to recall that this is not the first time Singapore has been hit by a significant outage due to a data center issue. Earlier this year, Microsoft Azure experienced a power surge in the Southeast Asia region, causing a subset of cooling units to go offline and disrupting various organizations in Singapore.

Pictured: Equinix's SG1 data center in Singapore

Business in the Balance

The outage had tangible impacts on businesses in Singapore, with some reporting losses of up to 10% in sales revenue due to disrupted online banking and payment services. This underscores businesses' reliance on digital banking platforms and the potential financial consequences of such outages.

In conclusion, the weekend’s outage served as a stark reminder of the critical role that data centers play in supporting financial services and the potential consequences of their failure. It has raised key questions about the resilience and redundancy of data centers, especially in tropical climates like Singapore's. Above all, it has highlighted the importance of not putting all our digital eggs in one basket, a lesson in risk management for the digital age.

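DeepKnowledge (深知社)

Translation: Seaman, founding member of DKV (DeepKnowledge Volunteer)

Official account statement:

This article is not an officially endorsed Chinese version; it is provided for readers' study and reference only and may not be used for any commercial purpose. The English original is authoritative, and this article does not represent the views of DeepKnowledge. The content comes from the internet; any infringing material will be removed within 24 hours. The Chinese version may not be reprinted without written authorization from the DeepKnowledge official account.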

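Last Saturday, outages at DBS Bank and Citibank shook Singapore, with fingers pointed at the operation of the cooling system at an Equinix data center. Pressed by several international media outlets, an Equinix spokesperson responded.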