Think China’s data is an unbeatable AI advantage? A new report says otherwise

19-Jul-2019 Intellasia | South China Morning Post | 6:02 AM

In this dawning age of artificial intelligence, data is the new oil and China is the new OPEC.

But a new report released on Tuesday suggests that the staggering amount of data generated by China’s population of 1.4 billion may not be as big an advantage in the global AI competition as previously thought.

The report by MacroPolo, the in-house think tank at the Paulson Institute in Chicago, argues that data is not a one-dimensional resource for AI and that, despite China’s formidable data reserves, the US still holds advantages in data quality and diversity.

“Many assume the size of China’s population gives it an advantage in the volume of data, but this is actually misleading,” said Matt Sheehan, a San Francisco-based fellow at MacroPolo who wrote the paper.

“The relationship between data and AI prowess is analogous to the relationship between labour and the economy. China may have an abundance of workers, but the quality, structure, and mobility of that labour force is just as important to economic development,” Sheehan added.

The research comes as the US and China, already competing on many economic and cultural fronts, are locked in a rivalry over AI technologies.

In 2017, China’s State Council issued a three-step plan to make the country a global leader in AI by 2030. Just this February, US President Donald Trump issued an executive order intended to maintain America’s global AI leadership, directing government agencies to prioritise artificial intelligence in their research and development spending.

To a significant degree, the AI race is a competition for data. From facial recognition to autonomous cars to machine translation, most AI applications become possible only after machines study huge amounts of data and find hidden patterns linking inputs to outcomes. Only then can a machine begin to master human skills.

Thus, data today is regarded by many technologists as an important, if not vital, strategic resource for the AI economy.

But in his MacroPolo paper, Sheehan breaks data down into five dimensions: quantity, depth, quality, diversity and access. The paper, which offers a framework rather than quantitative research, finds that China and the US are tied on data quantity. China holds advantages in data depth and access, while the US has superior data quality and diversity.

More than 800 million Chinese have connected to the internet, generating abundant data concerning a wide array of online activities, from grocery shopping to buying wealth-management products to booking a table at a restaurant.

But most internet service providers in China still largely focus on their domestic market while Silicon Valley companies have more global reach. Users of Google and Facebook represent a far greater range of languages, ethnicities, cultures and nationalities than those of WeChat, China’s No 1 social messaging tool, whose 1 billion users are almost all Chinese.

As a result, a facial recognition programme, for example, may struggle to identify non-Chinese people if it has been trained exclusively on Chinese faces.

MacroPolo is not alone in rethinking China’s so-called data advantage. “Access to the most data in and of itself is not the most important factor in AI development,” said Samm Sacks, a cybersecurity policy and China digital economy fellow at the Washington-based think tank New America.

“This myth has been fuelling misconception about a so-called China data advantage, while helping to bolster arguments against privacy regulation in the US. Technologists are increasingly considering how other factors like computing power, talent, the math involved, and the kind of data available may be just as valuable,” she said.

In a June paper titled “The Myth of China’s Big AI Advantage”, which Sacks co-wrote, she said that “the analogy of data as the new oil is seriously flawed… The risk is that policymakers could shy away from potential privacy legislation out of fear that putting checks on access to data will disadvantage the US”.

