Teaching Machines to Understand Us, Part 3: Natural-Language Learning and Faith in Deep Learning

Language learning


Facebook’s New York office is a three-minute stroll up Broadway from LeCun’s office at NYU, on two floors of a building constructed as a department store in the early 20th century. Workers are packed more densely into the open plan than they are at Facebook’s headquarters in Menlo Park, California, but they can still be seen gliding on articulated skateboards past notices for weekly beer pong. Almost half of LeCun’s team of leading AI researchers works here, with the rest at Facebook’s California campus or an office in Paris. Many of them are trying to make neural networks better at understanding language. “I’ve hired all the people working on this that I could,” says LeCun.


A neural network can “learn” words by spooling through text and calculating how each word it encounters could have been predicted from the words before or after it. By doing this, the software learns to represent every word as a vector that indicates its relationship to other words—a process that uncannily captures concepts in language. The difference between the vectors for “king” and “queen” is the same as for “husband” and “wife,” for example. The vectors for “paper” and “cardboard” are close together, and those for “large” and “big” are even closer.
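
The vector arithmetic described above can be illustrated with a minimal sketch. The four-dimensional vectors below are hypothetical, written by hand so that the relationships hold. They are not learned from text, and real systems typically use hundreds of dimensions. The helper names (`difference`, `cosine`, `analogy`) are likewise assumptions made for this illustration.

```python
import math

# Hypothetical 4-dimensional vectors, written by hand for illustration.
# Real systems learn their vectors from large amounts of text.
VECTORS = {
    "king":      [0.9,  0.8, 0.0, 0.1],
    "queen":     [0.9, -0.8, 0.0, 0.1],
    "husband":   [0.1,  0.8, 0.0, 0.2],
    "wife":      [0.1, -0.8, 0.0, 0.2],
    "paper":     [0.0,  0.0, 0.9, 0.1],
    "cardboard": [0.0,  0.0, 0.8, 0.4],
    "large":     [0.0,  0.0, 0.1, 0.9],
    "big":       [0.0,  0.0, 0.1, 0.85],
}


def difference(word_a, word_b):
    """Element-wise difference between two word vectors."""
    return [x - y for x, y in zip(VECTORS[word_a], VECTORS[word_b])]


def cosine(u, v):
    """Cosine similarity: values near 1.0 mean the vectors point the same way."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Return the word whose vector is closest to vector(a) - vector(b) + vector(c)."""
    target = [x + y for x, y in zip(difference(a, b), VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))


# The king/queen offset matches the husband/wife offset in this toy data.
print(difference("king", "queen") == difference("husband", "wife"))
# Completing the analogy: king - queen + wife lands nearest to another word.
print(analogy("king", "queen", "wife"))
# Compare how close "large"/"big" are versus "paper"/"cardboard".
print(cosine(VECTORS["large"], VECTORS["big"]),
      cosine(VECTORS["paper"], VECTORS["cardboard"]))
```

Cosine similarity is used here because it compares direction rather than length, which is a common choice for word vectors, though not the only one.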


The same approach works for whole sentences (Hinton says it generates “thought vectors”), and Google is looking at using it to bolster its automatic translation service. A recent paper from researchers at a Chinese university and Microsoft’s Beijing lab used a version of the vector technique to make software that beats some humans on IQ-test questions requiring an understanding of synonyms, antonyms, and analogies.


LeCun’s group is working on going further. “Language in itself is not that complicated,” he says. “What’s complicated is having a deep understanding of language and the world that gives you common sense. That’s what we’re really interested in building into machines.” LeCun means common sense as Aristotle used the term: the ability to understand basic physical reality. He wants a computer to grasp that the sentence “Yann picked up the bottle and walked out of the room” means the bottle left with him. Facebook’s researchers have invented a deep-learning system called a memory network that displays what may be the early stirrings of common sense.


A memory network is a neural network with a memory bank bolted on to store facts it has learned so they don’t get washed away every time it takes in fresh data. The Facebook AI lab has created versions that can answer simple common-sense questions about text they have never seen before. For example, when researchers gave a memory network a very simplified summary of the plot of Lord of the Rings, it could answer questions such as “Where is the ring?” and “Where was Frodo before Mount Doom?” It could interpret the simple world described in the text despite having never previously encountered many of the names or objects, such as “Frodo” or “ring.”
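
The idea of a memory bank that is written to once and then consulted can be sketched very roughly as follows. This is not Facebook's system: a real memory network learns how to represent sentences and how to produce an answer, whereas this sketch assumes a simple word-overlap score as a stand-in and only returns the stored sentence it attends to most. The class `MemoryBank`, its methods, and the two story sentences are hypothetical.

```python
import math


def tokenize(text):
    """Lowercase a sentence and split it into words, dropping punctuation."""
    return [w.strip(".,?!").lower() for w in text.split()]


class MemoryBank:
    """Stores sentences and reads them back with soft attention."""

    def __init__(self):
        self.sentences = []

    def write(self, sentence):
        # New facts are appended; older ones are never overwritten.
        self.sentences.append(sentence)

    def attention(self, question):
        """Softmax over word-overlap scores between the question and each memory.

        Word overlap is an assumption standing in for learned embeddings.
        """
        q_words = set(tokenize(question))
        scores = [len(q_words & set(tokenize(s))) for s in self.sentences]
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def read(self, question):
        """Return the stored sentence with the highest attention weight."""
        weights = self.attention(question)
        best = max(range(len(weights)), key=weights.__getitem__)
        return self.sentences[best]


memory = MemoryBank()
memory.write("Frodo travelled to Mount Doom.")   # made-up story line
memory.write("Sam stayed in the Shire.")         # made-up story line
print(memory.read("Where is Sam?"))
print(memory.attention("Where is Frodo?"))
```

Questions such as “Where is the ring?” need more than one lookup (who holds the ring, then where that person is), which this single-step sketch does not attempt.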

The software learned its rudimentary common sense by being shown how to answer questions about a simple text in which characters do things in a series of rooms, such as “Fred moved to the bedroom and Joe went to the kitchen.” But LeCun wants to expose the software to texts that are far better at capturing the complexity of life and the things a virtual assistant might need to do. A virtual concierge called Moneypenny that Facebook is expected to release could be one source of that data. The assistant is said to be powered by a team of human operators who will help people do things like make restaurant reservations. LeCun’s team could have a memory network watch over Moneypenny’s shoulder before eventually letting it learn by interacting with humans for itself.
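
The room-based training text mentioned at the start of this passage can be imagined along the following lines. This is only a guess at the general shape of such data, not the actual dataset or its generator. The function `make_example`, the list of rooms, and the sentence template are assumptions made for illustration.

```python
import random

PEOPLE = ["Fred", "Joe"]
ROOMS = ["bedroom", "kitchen", "hallway"]   # "hallway" is an invented extra room


def make_example(num_moves, seed=0):
    """Generate a short story plus a question and its correct answer.

    num_moves must be at least 1 so that someone has a known location.
    """
    rng = random.Random(seed)   # seeded so the same example can be regenerated
    location = {}               # latest known room for each person
    story = []
    for _ in range(num_moves):
        person = rng.choice(PEOPLE)
        room = rng.choice(ROOMS)
        location[person] = room
        story.append(f"{person} moved to the {room}.")
    asked = rng.choice(sorted(location))
    return story, f"Where is {asked}?", location[asked]


story, question, answer = make_example(num_moves=4, seed=1)
for sentence in story:
    print(sentence)
print(question, "->", answer)
```

Because the answer is always the most recent room a character moved to, a system trained on many such examples has to learn to track state over time rather than match a single sentence.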


Building something that can hold even a basic, narrowly focused conversation still requires significant work. For example, neural networks have shown only very simple reasoning, and researchers haven’t figured out how they might be taught to make plans, says LeCun. But results from the work that has been done with the technology so far leave him confident about where things are going. “The revolution is on the way,” he says.


Some people are less sure. Deep-learning software so far has displayed only the simplest capabilities required for what we would recognize as conversation, says Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence in Seattle. The logic and planning capabilities still needed, he says, are very different from the things neural networks have been doing best: digesting sequences of pixels or acoustic waveforms to decide which image category or word they represent. “The problems of understanding natural language are not reducible in the same way,” he says.

Gary Marcus, a professor of psychology and neural science at NYU who has studied how humans learn language and recently started an artificial-intelligence company called Geometric Intelligence, thinks LeCun underestimates how hard it would be for existing software to pick up language and common sense. Training the software with large volumes of carefully annotated data is fine for getting it to sort images. But Marcus doubts it can acquire the trickier skills needed for language, where the meanings of words and complex sentences can flip depending on context. “People will look back on deep learning and say this is a really powerful technique—it’s the first time that AI became practical,” he says. “They’ll also say those things required a lot of data, and there were domains where people just never had enough.” Marcus thinks language may be one of those domains. For software to master conversation, it would need to learn more like a toddler who picks it up without explicit instruction, he suggests.

Deep belief


At Facebook’s headquarters in California, the West Coast members of LeCun’s team sit close to Mark Zuckerberg and Mike Schroepfer, the company’s CTO. Facebook’s leaders know that LeCun’s group is still some way from building something you can talk to, but Schroepfer is already thinking about how to use it. The future Facebook he describes retrieves and coordinates information, like a butler you communicate with by typing or talking as you might with a human one.

“You can engage with a system that can really understand concepts and language at a much higher level,” says Schroepfer. He imagines being able to ask that you see a friend’s baby snapshots but not his jokes, for example. “I think in the near term a version of that is very realizable,” he says. As LeCun’s systems achieve better reasoning and planning abilities, he expects the conversation to get less one-sided. Facebook might offer up information that it thinks you’d like and ask what you thought of it. “Eventually it is like this super-intelligent helper that’s plugged in to all the information streams in the world,” says Schroepfer.

The algorithms needed to power such interactions would also improve the systems Facebook uses to filter the posts and ads we see. And they could be vital to Facebook’s ambitions to become much more than just a place to socialize. As Facebook begins to host articles and video on behalf of media and entertainment companies, for example, it will need better ways for people to manage information. Virtual assistants and other spinouts from LeCun’s work could also help Facebook’s more ambitious departures from its original business, such as the Oculus group working to make virtual reality into a mass-market technology.


None of this will happen if the recent impressive results meet the fate of previous big ideas in artificial intelligence. Blooms of excitement around neural networks have withered twice already. But while complaining that other companies or researchers are over-hyping their work is one of LeCun’s favorite pastimes, he says there’s enough circumstantial evidence to stand firm behind his own predictions that deep learning will deliver impressive payoffs. The technology is still providing more accuracy and power in every area of AI where it has been applied, he says. New ideas are needed about how to apply it to language processing, but the still-small field is expanding fast as companies and universities dedicate more people to it. “That will accelerate progress,” says LeCun.


It’s still not clear that deep learning can deliver anything like the information butler Facebook envisions. And even if it can, it’s hard to say how much the world really would benefit from it. But we may not have to wait long to find out. LeCun guesses that virtual helpers with a mastery of language unprecedented for software will be available in just two to five years. He expects that anyone who doubts deep learning’s ability to master language will be proved wrong even sooner. “There is the same phenomenon that we were observing just before 2012,” he says. “Things are starting to work, but the people doing more classical techniques are not convinced. Within a year or two it will be the end.”



Time: 2024-07-29 22:21:07

