
This Is Not a Drill: What the Coronavirus Pandemic Reveals About A.I.’s Ability to Handle Extreme Events

JEREMY KAHN 2020-03-27

一旦遇到“黑天鹅”案例，多数人工智能系统都会出现难以招架的情况。

此类罕见的极端事件在金融界被称为“黑天鹅”，通常每十年甚至一个世纪才会出现一次，但都会让市场颤抖。新冠疫情显然算得上是一次“黑天鹅”事件。在数据科学和人工智能领域，“黑天鹅”还有一些别称，如边界情况、极端情况、“分布外”数据点等。一旦遇到“黑天鹅”案例，多数人工智能系统都会出现难以招架的情况。

对许多企业的新型人工智能系统而言，新冠疫情是一场实战测试，也让我们得以看清它们究竟有多强大。现在，大多数机器学习系统需要使用大量历史数据进行训练，但时局骤变时会出现什么情况呢？

比如说，多数AI驱动的交易算法都是在最近五年才投入使用。其训练数据甚至可能都没有囊括2008年的金融危机，而且几乎可以肯定的是，当前的很多因素也没有被纳入其中，例如这种由需求引发的全行业大规模冲击。

因此在过去几周，一些本应能够在各种市场环境中应对自如的人工智能驱动投资策略却交出了远不如预期的成绩单。以英国热门线上零售平台Ocado为例，近期其网站经历了前所未有的流量暴增，比创建20年以来的流量峰值还要高出4倍之多。在上周四与记者举行的电话会议中，Ocado发言人大卫·什利夫表示，由于近期访客太多，该公司使用机器学习技术来监测网络异常的网络安全软件误以为网站遭受了“拒绝服务”类型的网络攻击，继而采取举措阻止用户访问网站。幸运的是，运营经理通过人工干预避免了这次“误伤”。

企业该怎么做才能让机器学习模型应对这些极端情况？DataRobot是一家专为大型企业开发、运行机器学习模型的波士顿初创企业，该公司的数据科学家杰伊·舒伦可为您提供解决之道。

公司实时监控数据模型至关重要。如果某家杂货店平常一分钟卖22箱牛奶，突然遇到销量增至10倍，肯定想知道原因。舒伦说，能做到的企业并不多。

企业要主动了解哪些机器学习模型，以及模型中的哪些输入变量对极端事件最敏感。他说，从电力需求到购物，任何与人类行为相关的事务都可能因为新型冠状病毒改变。

企业要考虑与不同算法相关的风险。如果投放广告的系统出现问题，情况可不妙，但后果远没有系统将价值100万美元的产品运送到因避免社交而关闭的商店严重。

企业里的数据科学家应该跟业务领域专家坐下来，对系统进行模拟压力测试：出现危机时，客户可能想要什么样的产品？如果成千上万顾客要在一周内采购六个月用的卫生纸，供应管理算法要如何应对？

数据科学家可以调整人工智能系统调用的程序，避免软件因遇到极端情况崩溃。举例来说，如果模型使用价格百分比而不是实际价格，恢复正常功能会更迅速。

公司应该寻找数据中可能存在的代理指标：现在发生的事件更接近哪个历史事件？是飓风桑迪还是1973年石油危机？

最后，数据科学家要仔细考虑未来的训练数据里要不要加入当前新型冠状病毒导致的极端数据。对于某些系统，加入极端情况数据可能帮软件避免受到类似危机的影响。但在很多情况下可能适得其反，导致系统错误地认为危机是一种“新常态”。囤积卫生纸的人买了太多，未来几个月内都不用再买，所以不久的将来需求突然崩溃，这一点人类分析师肯定能预料到，但人工智能系统无法预见。

舒伦表示，公司可在不同条件下建立不同类型的机器学习模型：一种是在正常情况下使用，更经济但更脆弱；另一种可能效率较低，但遇到异常数据时不容易崩溃，在极端事件中更可靠。（财富中文网）

译者：梁宇

审校：夏林

In finance, they are called black swans: those rare, extreme events that come along only once every decade or even once a century and can send markets reeling. The global coronavirus pandemic is certainly one. In data science and artificial intelligence circles, those same kinds of events are known by different names: edge cases, corner cases, or “out-of-distribution” datapoints. And most A.I. systems do not cope well when confronted with them.

The coronavirus pandemic is providing a real-world test of how robust many companies’ newfangled A.I. systems really are. Most of today’s machine learning systems need to be trained on lots of historical data. But what happens when the present suddenly stops looking like the recent past?

Most A.I.-driven trading algorithms, for instance, have only been implemented in the last five years. Their training data might not even have included the 2008 financial crisis. They almost certainly don’t include anything like the massive demand-driven shock we’re seeing across all industries right now.

So, some A.I.-driven investment strategies that were supposed to do well in all kinds of market conditions have performed much worse than expected in the past few weeks.

Another example: Ocado, a popular online grocery business in the U.K., has seen traffic to its website spike four times higher than any previous peak the company has experienced in its 20-year history. In a conference call with reporters Thursday, Ocado spokesman David Shriver said so many visitors went to its website that the company’s cybersecurity software, which uses machine learning to detect aberrant behavior, assumed the site was experiencing a denial-of-service cyberattack and moved to block those connections. Luckily, human operations managers intervened to prevent that from happening.

What can a company do to make sure its machine learning models are able to cope with these extremes? Jay Schuren, a data scientist at DataRobot, a Boston startup that helps large corporations create and run machine learning models, has tips.

It’s vital that companies monitor their data models in real time. For a grocery that normally sells 22 cartons of milk a minute, you want to know if you suddenly start selling 10 times that amount. Not enough businesses do this today, Schuren says.
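As a rough illustration of that kind of monitoring, here is a minimal Python sketch that compares each per-minute count against a rolling average of recent normal readings. The class name `RateMonitor`, the window size, and the five-times threshold are assumptions made for this example, not anything Schuren or DataRobot describes.

```python
from collections import deque
from statistics import mean


class RateMonitor:
    """Flags a per-minute count that far exceeds its recent rolling average."""

    def __init__(self, window=60, ratio_threshold=5.0):
        # Hypothetical defaults: 60 readings of history, alert at 5x baseline.
        self.history = deque(maxlen=window)
        self.ratio_threshold = ratio_threshold

    def observe(self, count):
        """Return True if `count` is anomalous relative to the rolling baseline."""
        if len(self.history) < self.history.maxlen:
            self.history.append(count)  # still warming up, nothing to compare to
            return False
        baseline = mean(self.history)
        is_anomaly = baseline > 0 and count / baseline >= self.ratio_threshold
        if not is_anomaly:
            # Only normal readings update the baseline, so a surge
            # does not quietly become the new reference level.
            self.history.append(count)
        return is_anomaly


monitor = RateMonitor(window=5, ratio_threshold=5.0)
for cartons in [22, 21, 23, 22, 22]:
    monitor.observe(cartons)          # warm-up on ordinary minutes

surge_flagged = monitor.observe(220)  # ten times the usual rate
normal_flagged = monitor.observe(24)  # an ordinary minute
```

With these made-up numbers, the 220-carton minute is flagged and the 24-carton minute is not. A production system would also need to account for time-of-day and seasonal patterns, which this sketch ignores.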

Businesses need to be proactive about identifying which machine learning models, and which input variables within those models, are most sensitive to extreme events. Anything that depends on human behavior, from electricity demand to shopping, will probably change because of Covid-19, he says.

Businesses need to think about the risks associated with different algorithms. If a system for placing ads goes haywire, that’s not good, but the consequences are a lot less severe than those of a system dispatching $1 million worth of products to a store that’s now shuttered due to social distancing measures.

A company’s data scientists should sit down with the business's subject-matter experts and stress-test a system in simulation: What items might customers want in a crisis? And what will happen to your supply management algorithm if you do get thousands of people wanting to purchase six months' worth of toilet paper in a week?
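A stress test of this sort can start very small. The Python sketch below runs a toy reorder-point inventory policy against a normal demand series and a hypothetical panic-buying series. The policy, the function name `simulate_stock`, and every number are invented for illustration and stand in for whatever supply management algorithm a company actually runs.

```python
def simulate_stock(daily_demand, start_stock, reorder_point, order_qty, lead_time_days):
    """Run a simple reorder-point policy and return the days with unmet demand."""
    stock = start_stock
    pending = {}  # arrival day -> quantity on order
    stockout_days = []
    for day, demand in enumerate(daily_demand):
        stock += pending.pop(day, 0)      # receive any delivery due today
        if demand > stock:
            stockout_days.append(day)     # some customers go without
        stock = max(stock - demand, 0)
        # Place one order at a time once stock falls to the reorder point.
        if stock <= reorder_point and not pending:
            pending[day + lead_time_days] = order_qty
    return stockout_days


policy = dict(start_stock=500, reorder_point=200, order_qty=500, lead_time_days=2)

normal_demand = [100] * 14
surge_demand = [100] * 3 + [1000] * 4 + [100] * 7  # four days of panic buying

normal_stockouts = simulate_stock(normal_demand, **policy)
surge_stockouts = simulate_stock(surge_demand, **policy)
```

Under these assumptions the policy never runs out in the normal scenario, but in the surge scenario it is out of stock on days 3 through 7, including one day after demand has already returned to normal. That lag is the sort of finding a simulation can surface before a real crisis does.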

Data scientists can rejigger which inputs an A.I. system uses so the software is less thrown off by extreme variations. For instance, a model that uses the percentage change in prices as an input variable, rather than the prices themselves, will return to normal functioning faster.
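To see why percentage changes recover faster, consider a price that jumps to a new level and stays there. In the Python sketch below, all prices are made up, and "outside the range seen in training" is used as a crude stand-in for an input a model has never encountered.

```python
def pct_changes(prices):
    """Percentage change between consecutive prices."""
    return [(b - a) / a * 100.0 for a, b in zip(prices, prices[1:])]


def out_of_range(values, lo, hi):
    """Mark which values fall outside the range seen in training."""
    return [not (lo <= v <= hi) for v in values]


# Hypothetical training history: prices hover around 100.
train_prices = [100.0, 101.0, 99.0, 100.0, 102.0]
train_changes = pct_changes(train_prices)

# Hypothetical shock: the price jumps to about 150 and stays there.
live_prices = [102.0, 150.0, 151.0, 149.0, 150.0]
live_changes = pct_changes(live_prices)

# Raw prices remain unfamiliar for as long as the new level persists.
raw_flags = out_of_range(live_prices[1:], min(train_prices), max(train_prices))

# Percentage changes are unfamiliar only on the day of the jump itself.
change_flags = out_of_range(live_changes, min(train_changes), max(train_changes))
```

Here `raw_flags` is `[True, True, True, True]`, while `change_flags` is `[True, False, False, False]`: the raw-price input stays out of bounds indefinitely, whereas the percentage-change input looks ordinary again one step after the shock.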

Companies should look for proxies that might exist in their data: Does this look like what happened during Hurricane Sandy or what happened during the 1973 oil crisis?

Finally, data scientists need to think carefully about whether they want the current coronavirus extremes included in future training data. For some systems, doing so might inoculate the software against being caught off guard by a similar crisis. But in a lot of other cases, it might have the opposite effect, leading the system to falsely expect that the crisis reflects a “new normal.” All those people stockpiling toilet paper today may have so much on hand they won’t need to buy any more for months, resulting in a sudden crash in demand in the near future that the A.I. system won’t be able to foresee, even though a human analyst would certainly expect it.

Schuren says that companies could benefit from building families of different types of machine learning models for different conditions: one type that is more economically efficient, but more fragile, that they use in normal circumstances, and another that is maybe less efficient, but also less prone to break when confronted with abnormal data, that they can fall back on during extreme events.
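One way to picture such a family of models is as a router that sends each input either to the efficient model or to the sturdier one, depending on whether the input resembles the training data. The Python sketch below is a hypothetical illustration of that idea. The class `ModelFamily`, the two toy forecasting functions, the margin, and the cap are all assumptions, not a description of any DataRobot product.

```python
class ModelFamily:
    """Routes each input to an efficient primary model or a robust fallback."""

    def __init__(self, primary, fallback, train_min, train_max, margin=0.25):
        # Inputs within the training range, plus a tolerance margin,
        # are treated as normal; everything else goes to the fallback.
        span = train_max - train_min
        self.lo = train_min - margin * span
        self.hi = train_max + margin * span
        self.primary = primary
        self.fallback = fallback

    def predict(self, x):
        """Return (model_name, prediction) for input x."""
        if self.lo <= x <= self.hi:
            return "primary", self.primary(x)
        return "fallback", self.fallback(x)


def primary_model(units_sold_yesterday):
    """Toy stand-in for a finely tuned demand forecast."""
    return 1.1 * units_sold_yesterday


def fallback_model(units_sold_yesterday, cap=500.0):
    """Toy stand-in for a conservative forecast that refuses to extrapolate wildly."""
    return min(1.1 * units_sold_yesterday, cap)


family = ModelFamily(primary_model, fallback_model, train_min=100, train_max=300)

normal_choice = family.predict(200)    # within the familiar range
extreme_choice = family.predict(3000)  # far outside anything seen in training
```

With these numbers, an input of 200 is handled by the primary model, while an input of 3,000 is routed to the fallback, which caps its forecast at 500. A simple range check is only one possible trigger, and real systems might use a dedicated out-of-distribution detector instead.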
