AI Used to Purchase and Sell Fake, Low-Quality Data Causes Mess in Big Data Market
Yin Yi
Source: Yicai

(Yicai Global) April 18 -- The Big Data industry appears unaffected by the 'capital winter,' a period in which investors have grown scarce and companies cash-strapped as the market deteriorates, insiders told Yicai Global.

However, only a few companies have successfully raised funds in the capital market. Even firms that claim to use artificial intelligence and hold hundreds of millions of data records often find it very difficult to attract investors.

Seven Big Data companies, four overseas and three domestic, have secured financing since the start of this month, raising a combined USD160 million (CNY1.1 billion), according to reports. Some 22 Big Data-related firms obtained financing in the first half of last year alone, in rounds ranging from angel to third-round funding.

As investors rush into Big Data, some startups have become overvalued. It seems that a company's valuation can multiply several times over simply because it is Big Data-related.

Artificial intelligence is a good 'story'

All the Big Data companies that have recently completed fundraising mentioned artificial intelligence in their marketing materials, Yicai Global has learned. Beyond Big Data companies, some credit firms and internet finance players also say they use artificial intelligence to identify credit risks or combat fraud.

"I don't think it's necessary to describe artificial intelligence as something unattainable. Over the past decade, led by the Big Data industry, technologies such as deep learning and natural language processing have developed rapidly, laying a solid foundation for the current exponential growth of artificial intelligence. We should work hard to turn such technologies into products and tap data resources to help our corporate clients solve risk-control issues more efficiently and at lower cost," said Zhao Jie, chief executive of Shanghai Siruide Information Technology Co.

"Algorithm models such as multivariate neural networks are mature multivariate statistical methods that have long been applied in physics, mechanics and industry. They remained little known to the public until the past two years, when the Big Data market boomed and brought such technologies into public view," Zhao told reporters.

Artificial intelligence stories often include claims that 'team members are top international talent.' Mastering advanced foreign algorithms is just one factor, Zhao believes. Some technologies that work in overseas settings may not suit the domestic market unless they are adapted to conditions in China.

Data v. algorithms

"Better data beats better algorithms. A set of reliable data is preferable to a powerful algorithm model," Dr. Liao Chenhan, chief scientist of Prism Data Research Institute, told Yicai Global.

"In fact, artificial intelligence modeling accounts for only 30 percent of the total effort spent on solving practical problems. Some 70 percent is spent on acquiring and processing data. To use artificial intelligence to improve risk control and modeling, we first need to automate the acquisition and sorting of data, drawing on human reasoning and methods, and then apply machine learning algorithms to correlate all this information," Liao said.

High-quality data is fundamental for Big Data companies. "Another problem with the Big Data market is that although companies claim to hold hundreds of millions of data records, the quality of that data is often neglected," Zhao said.

The Big Data market has left many negative impressions over the past two years because very few companies have high-quality data and a real ability to analyze it and develop products, an insider from the credit industry told Yicai Global. Many so-called Big Data companies make money by buying and selling data, most of which comes from the black market.

Data trafficking

"Some data traffickers, who lack any data processing ability but have connections and thus access to data sources, sell unprocessed raw data directly for profit. Because such data is updated frequently, with new data coming in and old data expiring, they can produce what looks like a whole new database with just a few modifications, and keep repeating the trick to profit from the price difference," the insider said.

"Of the data sold, some is legal and some is illegal. People buy online consumer data, online banking data, POS machine data, credit card data, telecom operator data, and even industrial and commercial registration data. Besides enterprises that package and sell data, some people inside companies collude with outsiders to sell it; even employees at Baidu, Alibaba and Tencent (BAT) sell data."

Fake data

A Yicai Global reporter also learned that data sold without authorization is often dirty. Because data is usually priced by volume, sellers pad it out, so typically only 30 percent of it is genuine while the remaining 70 percent is inflated or fake. "If the underlying data is fake or inaccurate, even the most advanced analytical model cannot arrive at correct results," said Zhao Jie.

The enterprises with a real advantage in data must be those that control traffic, Zhang Ke, chief executive of Maxent Anti-Fraud, said in an interview with Yicai Global. This is because so-called online data originates from online traffic: no traffic means no data sources. A business that relies on buying and selling data therefore cannot work; it is merely an empty shell.
