Amazon recently experienced how an algorithm developed to solve a specific problem ended up making it worse. In its pursuit of a recruitment tool based on machine learning techniques, the company built a system that churned out unsuitable candidates and, worse, showed a major bias against women. How come? Sandra Wachter, AI researcher at Oxford University, says it was most likely the result of male resumes dominating the historical data used to develop the algorithm. ‘From a technical perspective it’s not very surprising, it’s what we call garbage in and garbage out,’ Wachter noted.
In other words, the problem boils down to the data Amazon fed into its algorithm. Discussing the matter, Reuters points to the company’s current gender balance, observing that 74 percent of managerial roles are held by men. Wachter concludes that the training data was therefore probably biased and produced a correspondingly biased recruitment tool.
We would therefore argue that the quality and nature of the data used for AI applications is at least as important as, if not more important than, the algorithms generating the output. So, what does this mean? It means that an underestimated challenge of AI is guarding the quality of the input data. This is particularly relevant when using previously labeled data, as in supervised learning. While companies have been focusing on perfecting their algorithms, a lack of well-curated data results in faulty or biased outcomes, regardless of how well the algorithm was developed.
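To make this concrete: even before any model is trained, a quick sanity check on the labeled data can expose the kind of skew that sank Amazon's tool. Below is a minimal sketch (with a made-up resume dataset and a hypothetical `group_balance` helper, using only the Python standard library) of such a check:

```python
from collections import Counter

def group_balance(records, group_key):
    """Return each group's share of the dataset, so skew in the
    historical training data can be spotted before fitting a model."""
    counts = Counter(record[group_key] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

# Hypothetical dataset mirroring the 74/26 managerial split reported above
resumes = [{"gender": "male"}] * 74 + [{"gender": "female"}] * 26
print(group_balance(resumes, "gender"))
# {'male': 0.74, 'female': 0.26}
```

A split like this does not prove the resulting model will be biased, but it is exactly the warning sign that should prompt re-sampling, re-weighting, or collecting more balanced data before training.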
When it comes to data, quantity helps with quality. China’s population of 1.3 billion people generates an astonishing quantity of data that - due to less restrictive policies - can to a great extent be used to further the development of AI implementations. Kai-Fu Lee, venture capitalist and author of ‘AI Superpowers: China, Silicon Valley, and the New World Order’, argues that China’s large amount of data will be the decisive factor in the ongoing AI race with the United States. According to Lee, China’s data edge over the US is threefold in mobile users, 10 times in food delivery, 50 times in mobile payments, and 300 times in shared bicycle rides.
We know that China’s internet population is growing rapidly. For an ecommerce company, finding out what makes those online consumers tick can be of great value. The sheer amount of data available in China is all an ecommerce company would need to create the perfect algorithm, right? A varied dataset on 200 million consumers certainly sounds like a great place to start analysing the market. However, if those 200 million are all Chinese consumers between 20 and 30 years old, to what extent can you actually develop meaningful insights? In other words, the diversity and representativeness of data is just as important to the development of any algorithm.
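One simple way to quantify this intuition is to compare the demographic mix of a dataset against the market it is supposed to describe. The sketch below (with invented age brackets and shares, and a hypothetical `representation_gap` helper) uses total variation distance: 0.0 means the sample mirrors the population perfectly, 1.0 means the two do not overlap at all:

```python
def representation_gap(sample_shares, population_shares):
    """Total variation distance between a sample's demographic mix and
    the population's. 0.0 = perfectly representative, 1.0 = disjoint."""
    groups = set(sample_shares) | set(population_shares)
    return 0.5 * sum(
        abs(sample_shares.get(g, 0.0) - population_shares.get(g, 0.0))
        for g in groups
    )

# Hypothetical case: 200 million consumers, but all aged 20-30,
# measured against an invented broader-market age distribution
sample = {"20-30": 1.0}
population = {"<20": 0.25, "20-30": 0.20, "31-50": 0.35, ">50": 0.20}
print(representation_gap(sample, population))  # ≈ 0.8
```

However many rows such a dataset contains, a gap this large means conclusions drawn from it generalise poorly to the rest of the market.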
These issues pose a great challenge for AI companies throughout the world, and especially in China. While its abundance of data can serve as a competitive advantage, it also makes it increasingly difficult to clean and structure data in a way that prevents biased algorithms and other erroneous outcomes. It is up to us to overcome the challenges posed by this massive amount of data. At Dashmote, we specialize in bridging data and business in the field of Marketing Intelligence, and cleaning and structuring massive amounts of data is at the core of our internal processes.
Dashmote is an AI technology scale-up headquartered in Amsterdam, The Netherlands. With the goal of bridging the gap between images and data, we are working to bring AI-based solutions to marketers at clients like Heineken, Unilever, Philips, L’Oreal, and Coca-Cola. We add value in areas such as Location Analysis, Trends Analysis, and Marketing Intelligence. Doing so on a global level, our company today has offices in Amsterdam, Shanghai, Vienna, and New York. Interested in joining our global journey? Check out our opportunities!