David Pratten is passionate about leading IT-related change projects for social good.

A closer look at the Flying V, a blended-wing passenger airliner concept


Dutch airline KLM turned 100 earlier this month and decided to give itself a birthday present: a shiny, sleek, futuristic-looking, sustainable aircraft. Or at least the possibility of one in 2040. "This could be the next thing," says Dr. Roelof Vos, professor of Flight Performance and Propulsion at Delft University of Technology and the head researcher on the Flying V project. "It at least deserves some investigation."

The Flying V, touted in press releases as "revolutionary," is what is known as a blended wing body, or BWB, aircraft, which has no distinct wing and body structures like more conventional aircraft do. The shape reduces drag, which means the plane needs less fuel to operate. TU Delft claims the Flying V will consume 20 percent less fuel than a similarly sized traditional aircraft. "These are estimates," cautions Vos. "We still have 5-10 years of research before we could test a full-scale aircraft."

The design of the Flying V wasn't invented by Vos, or even by TU Delft or KLM; it was the idea of Justus Benad, a Technical University of Berlin student working on his thesis project at airplane maker Airbus. He tested a scale model in 2014 and Airbus patented the design but didn't move further on the project. Vos saw the concept in a news article in 2015 and wondered if Benad's calculations were accurate. "I was skeptical," he said. He had two students review the concepts, one of whom went to Berlin to meet with Benad, and together they concluded the concept had potential.


Read the whole story
21 hours ago
Sydney, Australia
Share this story

Neanderthal glue was a bigger deal than we thought


This replica shows how Neanderthals might have used birch tar to haft a projectile point. (credit: Paul R. B. Kozowyk)

Fifty thousand years ago, a Neanderthal living in northwestern Europe put sticky birch tar on the back side of a sharp flint flake to make the tool easier to grip. Eventually, that tool washed down the Rhine or Meuse Rivers and out into the North Sea. In the 21st century, dredging ships scooped it up along with tons of sand, other stone tools, and fossilized bones, then dumped the whole pile on Zandmotor Beach in the Netherlands.

Despite all of that, the birch tar still clung to the flake, and it provides evidence that Neanderthals used a complex set of technology to make elaborate tools.

Living on the edge

Making birch tar at all is a fairly complex process. It takes multiple steps, lots of planning, and detailed knowledge of the materials and the process. So the fact that archaeologists have found a handful of tools hafted using birch tar tells us that Neanderthals were (pardon the pun) pretty sharp.



How Flagstaff, Arizona, switched to LEDs without giving astronomers a headache


A couple of different types of dark-sky-friendly LED streetlights. (credit: Scott K. Johnson)

“I feel like we’re protecting the last tree, in a way.” That’s what Flagstaff, Arizona, city council member Austin Aslan said at a recent meeting. The subject of that earnest statement might surprise you: it was streetlights. To be more specific, he was talking about a careful effort to prevent streetlights from washing out the stars in the night sky.

Flagstaff became the first city to earn a designation from the International Dark Sky Association in 2001. That came as a result of its long history of hosting astronomy research at local Lowell Observatory, as well as facilities operated by the US Navy. The city has an official ordinance governing the use of outdoor lighting—public and private.

Lighting issues

A few years ago, though, a problem arose. The type of dark-sky-friendly streetlight that the city had been using was going extinct, largely as a casualty of low demand. In fact, as of this summer, there are none left to buy. Meanwhile, the age of the LED streetlight has arrived with a catch: limited night-sky-friendly LED options.



Vega-Lite: a grammar of interactive graphics


Vega-Lite: a grammar of interactive graphics – Satyanarayan et al., IEEE Transactions on Visualization and Computer Graphics, 2016

From time to time I receive a request for more HCI (human-computer interaction) related papers in The Morning Paper. If you’ve been a follower of The Morning Paper for any time at all you can probably tell that I naturally gravitate more towards the feeds-and-speeds end of the spectrum than user experience and interaction design. But what good is a super-fast system that nobody uses? With the help of Aditya Parameswaran, who recently received a VLDB Early Career Research Contribution Award for his work on the development of tools for large-scale data exploration targeting non-programmers, I’ve chosen a selection of data exploration, visualisation and interaction papers for this week. Thank you Aditya! Fingers crossed I’ll be able to bring you more from Aditya Parameswaran in future editions.

Vega and Vega-Lite follow in a long line of work that can trace its roots back to Wilkinson’s ‘The Grammar of Graphics.’ We last looked at the Vega family on The Morning Paper all the way back in 2015 (see ‘Declarative interaction design for data visualization’ and ‘Reactive Vega: a streaming dataflow architecture for declarative interactive visualisation’). Since then Vega-Lite has come into existence, bringing high-level specification of interactive visualisations to the Vega world. From the look of the GitHub repository it remains a very actively developed project to this day.

Grammars for graphics

You can think of a ‘grammar of graphics’ as a bit like the ultimate DSL for creating charts and visualisations. Vega-Lite uses JSON structures to describe visualisations and interactions, which are compiled down to full Vega specifications.

Compared to base Vega, Vega-Lite introduces a view algebra for composing multiple views (including merging scales, aligning views etc. – massive time-saver!), and a novel grammar of interaction. The implementation compiles interaction grammars down to the Reactive Vega runtime.

Unit views

The unit in the Vega-Lite world is a view.

A unit specification describes a single Cartesian plot, with a backing data set, a given mark-type, and a set of one or more encoding definitions for visual channels such as position (x, y), color, size, etc.

For example:
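The figure from the original post is missing here; as a sketch, a minimal unit specification in Vega-Lite's JSON syntax (the data values and field names are invented for illustration) looks like this:

```json
{
  "data": {
    "values": [
      {"a": "A", "b": 28},
      {"a": "B", "b": 55},
      {"a": "C", "b": 43}
    ]
  },
  "mark": "point",
  "encoding": {
    "x": {"field": "a", "type": "nominal"},
    "y": {"field": "b", "type": "quantitative"}
  }
}
```

One backing data set, one mark type, and a set of encodings: the three ingredients of a unit specification.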

Encodings determine how data attributes map to the properties of visual marks.

View composition

Given multiple unit specifications, composite views can be created using a set of composition operators.

There are four basic composition operators: layer, concatenate, facet, and repeat. It’s also possible to nest these (i.e., a composite view can serve as a unit in a higher-level composition) to create more complex views and dashboards.


The layer operator takes multiple unit specifications as input and produces a view with charts plotted on top of each other. By default Vega-Lite will produce shared scales and merge guides; when merging scales doesn’t make sense, you can override this behaviour to produce a dual-axis chart instead.
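A layered spec simply wraps two unit specifications in a `layer` array (the data and field names below are invented for illustration):

```json
{
  "data": {
    "values": [
      {"month": "Jan", "actual": 10, "forecast": 12},
      {"month": "Feb", "actual": 14, "forecast": 13}
    ]
  },
  "layer": [
    {
      "mark": "line",
      "encoding": {
        "x": {"field": "month", "type": "ordinal"},
        "y": {"field": "actual", "type": "quantitative"}
      }
    },
    {
      "mark": "line",
      "encoding": {
        "x": {"field": "month", "type": "ordinal"},
        "y": {"field": "forecast", "type": "quantitative"}
      }
    }
  ]
}
```

By default the two quantitative scales merge into one shared y-axis; in current Vega-Lite, adding `"resolve": {"scale": {"y": "independent"}}` at the top level produces the dual-axis variant instead.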


Vega-Lite provides both horizontal (charts side-by-side) and vertical (stacked charts) concatenation operators. A shared scale and axis will be used where possible.


The facet operator produces trellis plots with one chart for each distinct value of a given field. Scales and guides are shared across all plots.


Repeat also generates multiple plots, but allows full replication of a data set in each cell.

For example, repeat can be used to create a scatterplot matrix (SPLOM), where each cell shows a different 2D projection of the same data table.
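In current Vega-Lite syntax a SPLOM looks roughly like this (the data URL and field names are placeholders):

```json
{
  "repeat": {
    "row": ["horsepower", "mpg"],
    "column": ["horsepower", "mpg"]
  },
  "spec": {
    "data": {"url": "data/cars.json"},
    "mark": "point",
    "encoding": {
      "x": {"field": {"repeat": "column"}, "type": "quantitative"},
      "y": {"field": {"repeat": "row"}, "type": "quantitative"}
    }
  }
}
```

Each cell re-renders the full data set, projected onto the row and column fields assigned to that cell.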

Interaction grammar

So far we’ve just been making pretty pictures, but the interaction grammar is where we can make those visualisations come alive. The interaction grammar is concerned with selections and transformations.

The end result is an enumerable, combinatorial design space of interactive statistical graphics, with concise specification of not only linking interactions, but panning, zooming, and custom techniques as well.


Selections map user actions to a set of points a user is interested in manipulating. The simplest selection is the point selection, which selects an individual datum. E.g., you click on a point, and it becomes highlighted. In a list selection the user can select multiple points, with points inserted, removed, or modified as events fire. Interval selections are similar to lists, but membership is determined by range predicates rather than by selecting points individually.

By default selections are made over data values, facilitating reuse across views, but it is also possible to define selections over a visual range (e.g., choosing colours for a heatmap) if needed.

The particular events that update a selection are determined by the platform a Vega-Lite specification is compiled on, and the input modalities it supports. By default we use mouse events on desktops, and touch events on mobile and tablet devices.


Once a selection has been made, it is the job of transformations to manipulate the selected components. There are five atomic transformation types, “the minimal set to support both common and custom interaction techniques.”

  • Projection alters the predicate governing selection inclusion. In the previous figure, for example, plot (d) shows projecting a selection to all points with matching Origin field values.
  • Toggling adds or removes elements in a list selection when the corresponding event occurs.
  • Translation offsets spatial properties by an amount determined by the associated events (e.g. drag events). It works by default with interval selections, enabling movement of brushed regions or panning.
  • Zooming applies a scale factor.
  • Nearest causes the data value or visual element nearest the selection’s triggering event to be selected.

Linking selections to visual encodings

Selections, as defined by the combination of selection types and transforms, can then be used to parameterise visual encodings and make them interactive. For example, setting the fill colour of points that are selected to a given highlight colour, and to grey otherwise.
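In today's Vega-Lite JSON (the syntax has evolved since the paper; the names and data here are illustrative), a point selection driving a conditional fill colour looks like this:

```json
{
  "data": {"url": "data/cars.json"},
  "params": [{"name": "pick", "select": "point"}],
  "mark": "point",
  "encoding": {
    "x": {"field": "horsepower", "type": "quantitative"},
    "y": {"field": "mpg", "type": "quantitative"},
    "color": {
      "condition": {"param": "pick", "value": "steelblue"},
      "value": "grey"
    }
  }
}
```

Points inside the selection take the `condition` branch; everything else falls back to grey.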

Selected points can also be used as input data to further encodings. This enables linked interactions, including displaying tooltips or labels, and cross-filtering.

Putting it all together

The example visualisations in section 6 of the paper cover all seven categories of interaction techniques in Yi et al.’s taxonomy:

  • select to mark items of interest
  • explore to examine subsets of the data
  • connect to highlight related items
  • abstract/elaborate to vary the level of detail
  • reconfigure to show different arrangements of the data
  • filter to show elements conditionally
  • encode to change the visual representations used

Here’s a representative example showing layered cross-filtering:

Since of course one of the key points is that these visualisations are interactive, you might like to explore the interactive demos available online.

The last word

Vega-Lite is an open source system available at http://vega.github.io/vega-lite. By offering a multi-view grammar of graphics tightly integrated with a grammar of interaction, Vega-Lite facilitates rapid exploration of design variations. Ultimately, we hope that it enables analysts to produce and modify interactive graphics with the same ease with which they currently construct static plots.


Machines’ Language Skills Just Surpassed Our Own. But Is This Understanding?


In the fall of 2017, Sam Bowman, a computational linguist at New York University, figured that computers still weren’t very good at understanding the written word. Sure, they had become decent at simulating that understanding in certain narrow domains, like automatic translation or sentiment analysis (for example, determining if a sentence sounds “mean or nice,” he said). But Bowman wanted measurable evidence of the genuine article: bona fide, human-style reading comprehension in English. So he came up with a test.

In an April 2018 paper coauthored with collaborators from the University of Washington and DeepMind, the Google-owned artificial intelligence company, Bowman introduced a battery of nine reading-comprehension tasks for computers called GLUE (General Language Understanding Evaluation). The test was designed as “a fairly representative sample of what the research community thought were interesting challenges,” said Bowman, but also “pretty straightforward for humans.” For example, one task asks whether a sentence is true based on information offered in a preceding sentence. If you can tell that “President Trump landed in Iraq for the start of a seven-day visit” implies that “President Trump is on an overseas visit,” you’ve just passed.

The machines bombed. Even state-of-the-art neural networks scored no higher than 69 out of 100 across all nine tasks: a D-plus, in letter grade terms. Bowman and his coauthors weren’t surprised. Neural networks — layers of computational connections built in a crude approximation of how neurons communicate within mammalian brains — had shown promise in the field of “natural language processing” (NLP), but the researchers weren’t convinced that these systems were learning anything substantial about language itself. And GLUE seemed to prove it. “These early results indicate that solving GLUE is beyond the capabilities of current models and methods,” Bowman and his coauthors wrote.

Their appraisal would be short-lived. In October of 2018, Google introduced a new method nicknamed BERT (Bidirectional Encoder Representations from Transformers). It produced a GLUE score of 80.5. On this brand-new benchmark designed to measure machines’ real understanding of natural language — or to expose their lack thereof — the machines had jumped from a D-plus to a B-minus in just six months.

“That was definitely the ‘oh, crap’ moment,” Bowman recalled, using a more colorful interjection. “The general reaction in the field was incredulity. BERT was getting numbers on many of the tasks that were close to what we thought would be the limit of how well you could do.” Indeed, GLUE didn’t even bother to include human baseline scores before BERT; by the time Bowman and one of his Ph.D. students added them to GLUE in February 2019, they lasted just a few months before a BERT-based system from Microsoft beat them.

As of this writing, nearly every position on the GLUE leaderboard is occupied by a system that incorporates, extends or optimizes BERT. Five of these systems outrank human performance.

But is AI actually starting to understand our language — or is it just getting better at gaming our systems? As BERT-based neural networks have taken benchmarks like GLUE by storm, new evaluation methods have emerged that seem to paint these powerful NLP systems as computational versions of Clever Hans, the early 20th-century horse who seemed smart enough to do arithmetic, but who was actually just following unconscious cues from his trainer.

“We know we’re somewhere in the gray area between solving language in a very boring, narrow sense, and solving AI,” Bowman said. “The general reaction of the field was: Why did this happen? What does this mean? What do we do now?”

Writing Their Own Rules

In the famous Chinese Room thought experiment, a non-Chinese-speaking person sits in a room furnished with many rulebooks. Taken together, these rulebooks perfectly specify how to take any incoming sequence of Chinese symbols and craft an appropriate response. A person outside slips questions written in Chinese under the door. The person inside consults the rulebooks, then sends back perfectly coherent answers in Chinese.

The thought experiment has been used to argue that, no matter how it might appear from the outside, the person inside the room can’t be said to have any true understanding of Chinese. Still, even a simulacrum of understanding has been a good enough goal for natural language processing.

The only problem is that perfect rulebooks don’t exist, because natural language is far too complex and haphazard to be reduced to a rigid set of specifications. Take syntax, for example: the rules (and rules of thumb) that define how words group into meaningful sentences. The phrase “colorless green ideas sleep furiously” has perfect syntax, but any natural speaker knows it’s nonsense. What prewritten rulebook could capture this “unwritten” fact about natural language — or innumerable others?

NLP researchers have tried to square this circle by having neural networks write their own makeshift rulebooks, in a process called pretraining.

Before 2018, one of NLP’s main pretraining tools was something like a dictionary. Known as word embeddings, this dictionary encoded associations between words as numbers in a way that deep neural networks could accept as input — akin to giving the person inside a Chinese room a crude vocabulary book to work with. But a neural network pretrained with word embeddings is still blind to the meaning of words at the sentence level. “It would think that ‘a man bit the dog’ and ‘a dog bit the man’ are exactly the same thing,” said Tal Linzen, a computational linguist at Johns Hopkins University.
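Linzen's point can be made concrete with a toy example: if a sentence is represented only as the sum of its word embeddings, word order vanishes entirely. The tiny hand-made vectors below are illustrative stand-ins, not trained embeddings.

```python
# Toy 3-dimensional "embeddings" -- illustrative values, not trained vectors.
embeddings = {
    "a":   [0.1, 0.0, 0.2],
    "man": [0.9, 0.3, 0.1],
    "bit": [0.2, 0.8, 0.5],
    "the": [0.1, 0.1, 0.0],
    "dog": [0.4, 0.6, 0.9],
}

def bag_of_embeddings(sentence):
    """Represent a sentence as the element-wise sum of its word vectors."""
    vectors = [embeddings[w] for w in sentence.split()]
    return [round(sum(dims), 6) for dims in zip(*vectors)]

s1 = bag_of_embeddings("a man bit the dog")
s2 = bag_of_embeddings("a dog bit the man")
print(s1 == s2)  # True: both sentences collapse to the same vector
```

Because both sentences contain exactly the same words, their summed representations are identical, which is precisely why a network seeing only word embeddings cannot tell who bit whom.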

A better method would use pretraining to equip the network with richer rulebooks — not just for vocabulary, but for syntax and context as well — before training it to perform a specific NLP task. In early 2018, researchers at OpenAI, the University of San Francisco, the Allen Institute for Artificial Intelligence and the University of Washington simultaneously discovered a clever way to approximate this feat. Instead of pretraining just the first layer of a network with word embeddings, the researchers began training entire neural networks on a broader basic task called language modeling.

“The simplest kind of language model is: I’m going to read a bunch of words and then try to predict the next word,” explained Myle Ott, a research scientist at Facebook. “If I say, ‘George Bush was born in,’ the model now has to predict the next word in that sentence.”
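The "simplest kind of language model" Ott describes can be caricatured as a bigram counter: read a corpus, count which word follows which, and predict the most frequent continuation. The three-sentence "corpus" below is invented for illustration; real pretraining uses billions of words and a neural network rather than raw counts.

```python
from collections import Counter, defaultdict

# A made-up miniature corpus; real language models train on billions of words.
corpus = (
    "george bush was born in connecticut . "
    "george bush was governor of texas . "
    "the bill was born of compromise ."
)

# Count, for each word, how often each possible next word follows it.
follows = defaultdict(Counter)
tokens = corpus.split()
for current, nxt in zip(tokens, tokens[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("was"))  # 'born' -- seen twice, vs 'governor' once
```

A neural language model does the same job with a learned, generalising function in place of the lookup table.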

These deep pretrained language models could be produced relatively efficiently. Researchers simply fed their neural networks massive amounts of written text copied from freely available sources like Wikipedia — billions of words, preformatted into grammatically correct sentences — and let the networks derive next-word predictions on their own. In essence, it was like asking the person inside a Chinese room to write all his own rules, using only the incoming Chinese messages for reference.

“The great thing about this approach is it turns out that the model learns a ton of stuff about syntax,” Ott said.

What’s more, these pretrained neural networks could then apply their richer representations of language to the job of learning an unrelated, more specific NLP task, a process called fine-tuning.

“You can take the model from the pretraining stage and kind of adapt it for whatever actual task you care about,” Ott explained. “And when you do that, you get much better results than if you had just started with your end task in the first place.”

Indeed, in June of 2018, when OpenAI unveiled a neural network called GPT, which included a language model pretrained on nearly a billion words (sourced from 11,038 digital books) for an entire month, its GLUE score of 72.8 immediately took the top spot on the leaderboard. Still, Sam Bowman assumed that the field had a long way to go before any system could even begin to approach human-level performance.

Then BERT appeared.

A Powerful Recipe

So what exactly is BERT?

First, it’s not a fully trained neural network capable of besting human performance right out of the box. Instead, said Bowman, BERT is “a very precise recipe for pretraining a neural network.” Just as a baker can follow a recipe to reliably produce a delicious prebaked pie crust — which can then be used to make many different kinds of pie, from blueberry to spinach quiche — Google researchers developed BERT’s recipe to serve as an ideal foundation for “baking” neural networks (that is, fine-tuning them) to do well on many different natural language processing tasks. Google also open-sourced BERT’s code, which means that other researchers don’t have to repeat the recipe from scratch — they can just download BERT as-is, like buying a prebaked pie crust from the supermarket.

If BERT is essentially a recipe, what’s the ingredient list? “It’s the result of three things coming together to really make things click,” said Omer Levy, a research scientist at Facebook who has analyzed BERT’s inner workings.

The first is a pretrained language model, those reference books in our Chinese room. The second is the ability to figure out which features of a sentence are most important.

In 2017, an engineer at Google Brain named Jakob Uszkoreit was working on ways to accelerate Google’s language-understanding efforts. He noticed that state-of-the-art neural networks also suffered from a built-in constraint: They all looked through the sequence of words one by one. This “sequentiality” seemed to match intuitions of how humans actually read written sentences. But Uszkoreit wondered if “it might be the case that understanding language in a linear, sequential fashion is suboptimal,” he said.

Uszkoreit and his collaborators devised a new architecture for neural networks focused on “attention,” a mechanism that lets each layer of the network assign more weight to some specific features of the input than to others. This new attention-focused architecture, called a transformer, could take a sentence like “a dog bites the man” as input and encode each word in many different ways in parallel. For example, a transformer might connect “bites” and “man” together as verb and object, while ignoring “a”; at the same time, it could connect “bites” and “dog” together as verb and subject, while mostly ignoring “the.”
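The attention mechanism itself is compact. The sketch below computes scaled dot-product attention for a single query over a handful of key/value vectors, using only the standard library; the numbers are arbitrary, and a real transformer learns these vectors and runs many such attention "heads" in parallel.

```python
import math

def attention(query, keys, values):
    """Weight each value by softmax(query . key / sqrt(d))."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    exps = [math.exp(s - max(scores)) for s in scores]  # numerically stable softmax
    weights = [e / sum(exps) for e in exps]
    # Output is the weighted average of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return weights, output

# One query attending over three key/value pairs (arbitrary numbers).
weights, output = attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    values=[[1.0], [2.0], [3.0]],
)
print([round(w, 2) for w in weights])  # the first key matches the query best
```

The weights sum to one, so each layer effectively decides how much of each input position to blend into its output, independent of where those positions sit in the sentence.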

The nonsequential nature of the transformer represented sentences in a more expressive form, which Uszkoreit calls treelike. Each layer of the neural network makes multiple, parallel connections between certain words while ignoring others — akin to a student diagramming a sentence in elementary school. These connections are often drawn between words that may not actually sit next to each other in the sentence. “Those structures effectively look like a number of trees that are overlaid,” Uszkoreit explained.

This treelike representation of sentences gave transformers a powerful way to model contextual meaning, and also to efficiently learn associations between words that might be far away from each other in complex sentences. “It’s a bit counterintuitive,” Uszkoreit said, “but it is rooted in results from linguistics, which has for a long time looked at treelike models of language.”

Finally, the third ingredient in BERT’s recipe takes nonlinear reading one step further.

Unlike other pretrained language models, many of which are created by having neural networks read terabytes of text from left to right, BERT’s model reads left to right and right to left at the same time, and learns to predict words in the middle that have been randomly masked from view. For example, BERT might accept as input a sentence like “George Bush was ____ in Connecticut in 1946” and predict the masked word (in this case, “born”) by parsing the text from both directions. “This bidirectionality is conditioning a neural network to try to get as much information as it can out of any subset of words,” Uszkoreit said.
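A toy version of fill-in-the-blank prediction shows why both directions help: score each candidate by how often it appears between the observed left and right neighbours. The miniature corpus is invented for illustration; BERT does this with a deep network over billions of words.

```python
from collections import Counter

# Invented miniature corpus; BERT pretrains on billions of words instead.
corpus = (
    "george bush was born in texas . "
    "lincoln was born in kentucky . "
    "the committee was formed in march ."
).split()

def fill_mask(left, right):
    """Pick the word most often seen between `left` and `right` in the corpus."""
    candidates = Counter()
    for prev, word, nxt in zip(corpus, corpus[1:], corpus[2:]):
        if prev == left and nxt == right:
            candidates[word] += 1
    return candidates.most_common(1)[0][0]

print(fill_mask("was", "in"))  # 'born' -- both neighbours constrain the guess
```

A purely left-to-right model would see only "was" before the blank; conditioning on the word after it as well narrows the choices considerably.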

The Mad-Libs-esque pretraining task that BERT uses — called masked-language modeling — isn’t new. In fact, it’s been used as a tool for assessing language comprehension in humans for decades. For Google, it also offered a practical way of enabling bidirectionality in neural networks, as opposed to the unidirectional pretraining methods that had previously dominated the field. “Before BERT, unidirectional language modeling was the standard, even though it is an unnecessarily restrictive constraint,” said Kenton Lee, a research scientist at Google.

Each of these three ingredients — a deep pretrained language model, attention and bidirectionality — existed independently before BERT. But until Google released its recipe in late 2018, no one had combined them in such a powerful way.

Refining the Recipe

Like any good recipe, BERT was soon adapted by cooks to their own tastes. In the spring of 2019, there was a period “when Microsoft and Alibaba were leapfrogging each other week by week, continuing to tune their models and trade places at the number one spot on the leaderboard,” Bowman recalled. When an improved version of BERT called RoBERTa first came on the scene in August, the DeepMind researcher Sebastian Ruder dryly noted the occasion in his widely read NLP newsletter: “Another month, another state-of-the-art pretrained language model.”

BERT’s “pie crust” incorporates a number of structural design decisions that affect how well it works. These include the size of the neural network being baked, the amount of pretraining data, how that pretraining data is masked and how long the neural network gets to train on it. Subsequent recipes like RoBERTa result from researchers tweaking these design decisions, much like chefs refining a dish.

In RoBERTa’s case, researchers at Facebook and the University of Washington increased some ingredients (more pretraining data, longer input sequences, more training time), took one away (a “next sentence prediction” task, originally included in BERT, that actually degraded performance) and modified another (they made the masked-language pretraining task harder). The result? First place on GLUE — briefly. Six weeks later, researchers from Microsoft and the University of Maryland added their own tweaks to RoBERTa and eked out a new win. As of this writing, yet another model called ALBERT, short for “A Lite BERT,” has taken GLUE’s top spot by further adjusting BERT’s basic design.

“We’re still figuring out what recipes work and which ones don’t,” said Facebook’s Ott, who worked on RoBERTa.

Still, just as perfecting your pie-baking technique isn’t likely to teach you the principles of chemistry, incrementally optimizing BERT doesn’t necessarily impart much theoretical knowledge about advancing NLP. “I’ll be perfectly honest with you: I don’t follow these papers, because they are extremely boring to me,” said Linzen, the computational linguist from Johns Hopkins. “There is a scientific puzzle there,” he grants, but it doesn’t lie in figuring out how to make BERT and all its spawn smarter, or even in figuring out how they got smart in the first place. Instead, “we are trying to understand to what extent these models are really understanding language,” he said, and not “picking up weird tricks that happen to work on the data sets that we commonly evaluate our models on.”

In other words: BERT is doing something right. But what if it’s for the wrong reasons?

Clever but Not Smart

In July 2019, two researchers from Taiwan’s National Cheng Kung University used BERT to achieve an impressive result on a relatively obscure natural language understanding benchmark called the argument reasoning comprehension task. Performing the task requires selecting the appropriate implicit premise (called a warrant) that will back up a reason for arguing some claim. For example, to argue that “smoking causes cancer” (the claim) because “scientific studies have shown a link between smoking and cancer” (the reason), you need to presume that “scientific studies are credible” (the warrant), as opposed to “scientific studies are expensive” (which may be true, but makes no sense in the context of the argument). Got all that?

If not, don’t worry. Even human beings don’t do particularly well on this task without practice: The average baseline score for an untrained person is 80 out of 100. BERT got 77 — “surprising,” in the authors’ understated opinion.

But instead of concluding that BERT could apparently imbue neural networks with near-Aristotelian reasoning skills, they suspected a simpler explanation: that BERT was picking up on superficial patterns in the way the warrants were phrased. Indeed, after re-analyzing their training data, the authors found ample evidence of these so-called spurious cues. For example, simply choosing a warrant with the word “not” in it led to correct answers 61% of the time. After these patterns were scrubbed from the data, BERT’s score dropped from 77 to 53 — equivalent to random guessing. An article in The Gradient, a machine-learning magazine published out of the Stanford Artificial Intelligence Laboratory, compared BERT to Clever Hans, the horse with the phony powers of arithmetic.

In another paper called “Right for the Wrong Reasons,” Linzen and his coauthors published evidence that BERT’s high performance on certain GLUE tasks might also be attributed to spurious cues in the training data for those tasks. (The paper included an alternative data set designed to specifically expose the kind of shortcut that Linzen suspected BERT was using on GLUE. The data set’s name: Heuristic Analysis for Natural-Language-Inference Systems, or HANS.)

So is BERT, and all of its benchmark-busting siblings, essentially a sham? Bowman agrees with Linzen that some of GLUE’s training data is messy — shot through with subtle biases introduced by the humans who created it, all of which are potentially exploitable by a powerful BERT-based neural network. “There’s no single ‘cheap trick’ that will let it solve everything, but there are lots of shortcuts it can take that will really help,” Bowman said, “and the model can pick up on those shortcuts.” But he doesn’t think BERT’s foundation is built on sand, either. “It seems like we have a model that has really learned something substantial about language,” he said. “But it’s definitely not understanding English in a comprehensive and robust way.”

According to Yejin Choi, a computer scientist at the University of Washington and the Allen Institute, one way to encourage progress toward robust understanding is to focus not just on building a better BERT, but also on designing better benchmarks and training data that lower the possibility of Clever Hans–style cheating. Her work explores an approach called adversarial filtering, which uses algorithms to scan NLP training data sets and remove examples that are overly repetitive or that otherwise introduce spurious cues for a neural network to pick up on. After this adversarial filtering, “BERT’s performance can reduce significantly,” she said, while “human performance does not drop so much.”

Still, some NLP researchers believe that even with better training, neural language models may still face a fundamental obstacle to real understanding. BERT is not designed to model language in general, said Anna Rogers, a computational linguist at the Text Machine Lab at the University of Massachusetts, Lowell. Instead, after fine-tuning, it models “a specific NLP task, or even a specific data set for that task.” And it’s likely that no training data set, no matter how comprehensively designed or carefully filtered, can capture all the edge cases and unforeseen inputs that humans effortlessly cope with when we use natural language.

Bowman points out that it’s hard to know how we would ever be fully convinced that a neural network achieves anything like real understanding. Standardized tests, after all, are supposed to reveal something intrinsic and generalizable about the test-taker’s knowledge. But as anyone who has taken an SAT prep course knows, tests can be gamed. “We have a hard time making tests that are hard enough and trick-proof enough that solving really convinces us that we’ve fully solved some aspect of AI or language technology,” he said.

Indeed, Bowman and his collaborators recently introduced a test called SuperGLUE that’s specifically designed to be hard for BERT-based systems. So far, no neural network can beat human performance on it. But even if (or when) it happens, does it mean that machines can really understand language any better than before? Or does it just mean that science has gotten better at teaching machines to the test?

“That’s a good analogy,” Bowman said. “We figured out how to solve the LSAT and the MCAT, and we might not actually be qualified to be doctors and lawyers.” Still, he added, this seems to be the way that artificial intelligence research moves forward. “Chess felt like a serious test of intelligence until we figured out how to write a chess program,” he said. “We’re definitely in an era where the goal is to keep coming up with harder problems that represent language understanding, and keep figuring out how to solve those problems.”


How To Keep Momentum After A Hack Day?


I had a coffee chat last week with the Chief Information Officer (CIO) of an ASX100 manufacturing company. In a very traditional, process-driven organisation, she wants her team to lead cross-functional innovation initiatives, and has introduced hack days to create a spark. She was concerned about how to keep momentum after a hack day, so she approached me to draw on my experience of helping Global 500 companies create innovation ecosystems.
