Data lies at the heart of decision-making. Its impact on healthcare, finance, technology, artificial intelligence (AI) and every other major industry has only increased with time. Data and its significance may have been debated more in the recent past than ever before, but this is not a new revelation that gained global relevance overnight: data has been the lifeline of mankind’s survival and evolution for as long as our species has existed. The evidence runs from texts recovered from Uruk, dated to 3350-3100 BC, which point to sophisticated records and bookkeeping arising out of economic necessity, to modern-day health and fitness trackers. As we have evolved, we have increasingly documented our lives and environment to improve livelihoods and society.
Now, technological evolution is bringing its own set of challenges. First, the amount of data we collect has increased exponentially: 90% of all data that exists today was created in the last two years. Globally, 2.5 exabytes of data are created every day, which is roughly 2.5bn GB. Second, 80% of this data is unstructured. Finally, due to the dynamic nature of algorithms and AI, the speed at which we collect and analyze this data has risen just as sharply. As a simple example, search for something on Amazon and the next instant you are bound to see a Facebook or Instagram ad for the same product following you.
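The daily volume quoted above can be checked with a quick unit conversion. This is a minimal sketch that assumes decimal (SI) units, where 1 GB is 10^9 bytes and 1 exabyte is 10^18 bytes; the function name is illustrative only.

```python
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes, 1 EB = 10**18 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_EB = 10**18


def exabytes_to_gigabytes(exabytes: float) -> float:
    """Convert a quantity in exabytes to gigabytes."""
    return exabytes * (BYTES_PER_EB // BYTES_PER_GB)


daily_gb = exabytes_to_gigabytes(2.5)
print(f"{daily_gb:,.0f} GB per day")  # 2,500,000,000 GB per day
```

Under binary units (1 GiB = 2^30 bytes) the figure would differ somewhat, but the order of magnitude is the same.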
A handful of organizations control almost all our data: what we buy, where we dine, who we speak to, our search history, the content we consume, our financials, our real-time location. The list only gets longer and more frightening. The recent Facebook and Google data breaches provide crucial perspective on the need for proactive cyber-security, and on how even the largest, most advanced technology companies can (intentionally or not) fall considerably short. Such data scandals underscore the need for the three stakeholders of society (government, businesses, and the people) to work together to build trust if we are to realize the full benefits of the data revolution and minimize the risks. There are four key areas where such collaboration is especially vital at this inflection point:
1. A multi-model approach to data
Creativity and innovation are usually born out of non-conventional approaches. Why should our approach to data be any different? We need to collect, analyze and apply data from multiple non-conventional sources. As an intriguing example, ants move together in large organized groups, at constant speeds without crashing into each other at any point and rarely stopping. Data scientists are studying this behavior, termed swarm intelligence, and converting it to algorithms to potentially apply to self-driving technology.
As technology and humans progress, tapping non-conventional sources will prove to be the fastest way to creative innovation. It’s truly fascinating; it’s also easier said than done, especially given the innumerable variables, data points and possible combinations one could find in nature. This is where functional AI and big data analytics could aid our efforts. Deep learning algorithms can be trained to find indicative links between sources as varied as self-driving technology and ants.
2. The rise of collaborative data
Research has repeatedly shown that collaboration is the backbone of innovation and progress. This holds true especially when it comes to data. The synergistic cross-flow of data between companies, countries, and users has led, and will continue to lead, to the most efficient leaps in product and service innovation. History and the tech industry offer many examples, but perhaps few as significant as Google. Google is synonymous with search. It is today the most efficient search engine because of the data it has collated, filtered, analyzed and used across its product lineup, globally, over the last two decades, and it continues to do so.
Similarly, much of the progress made in financial technology, healthcare, manufacturing, e-commerce, and other major fields has been due to the free flow of data. While governments debate copious data laws, they have themselves been beneficiaries. As one of many examples, algorithms designed and employed by fintech firms, which analyze data and spending patterns internally, are key contributors to helping governments track financial fraud.
In an interview at Davos last year, Marc Benioff, CEO of Salesforce, made a critical point, more relevant now than ever before – ‘the need to build purpose-driven companies where the focus is on a culture of building trust with the customer, not just products and profits.’
This naturally brings us to the elephant in the room (or on the page) – privacy. While these are bold, ambitious ideas requiring state-of-the-art technologies, they bring with them challenges to ethics, user safety, data privacy, and data utilization.
3. Data privacy
Open the tech section of a publication and one is likely to come across news of hacks and cyber-attacks. The frequency is alarming. Safeguarding our society against such bad actors is imperative. However, these attacks and leaks often occur not because data was shared, but even when it was held closely. A classic example is Aadhaar, the Indian citizen database established by and held captive with the government, access to which was recently reported to be on sale for less than 10 US dollars. Another is the recent Google Plus data leak, in which a bug in the developer code left user profiles vulnerable to attack. The nature of cyber-threats is continually evolving. They can’t be avoided, but they can be controlled. To do so, organizations need to take a dynamic, proactive approach to cyber-security, with a focus not just on prevention but on early detection and action.
Clive Humby is widely credited with the phrase "data is the new oil". Taking a cue from him, refining data would be equivalent to analyzing it and putting it to proper, safe use. This requires complex, advanced tools, skill sets and algorithms that entail a significant learning curve, that businesses have built and perfected over long periods, and that are not necessarily employed widely. If data is the new oil, protecting critical data is crucial. However, we have to do so without impeding trade and globalization efforts. Building strict, enforceable laws is a much-needed, meaningful start. Europe’s General Data Protection Regulation (GDPR) shows that it is possible to keep one’s balance on this ethical tightrope. It’s energizing to see tech CEOs like Marc Benioff call for greater government regulation of the tech industry, and Google CEO Sundar Pichai being open to a GDPR-like policy for the US.
Tech is the future, but the policy governing tech will determine what this future looks like. As a society, it’s important that we draw a line between protectionist laws and strict safety regulations, and ensure that the former don’t replace the latter. It is, therefore, crucial to understand that utilizing data for innovation and ensuring user privacy are not mutually exclusive goals, and to appreciate that one doesn’t have to be achieved at the cost of the other.
4. Data utilization
Among the most surprising facts is that less than 0.5% of all data mined is ever analyzed and used. Government and industry are only just scratching the surface with the data they’re utilizing. This presents us with significant challenges but also game-changing opportunities. First, data redundancy: while data is routinely used to draw out patterns, most data becomes redundant within a short period due to constantly changing demographics, technologies, and economics. It is vital, therefore, for data scientists to separate the signal from the noise, and not just to build tools that efficiently analyze big data but to focus on identifying strong data. The Financial Times did a fascinating piece last month on Chinese firm Ant Financial, an affiliate of Alibaba, which runs the Sesame credit system, citing that strong data was more pertinent than big data when evaluating hundreds of data points across millions of consumers.
Second, statistics has a rule of thumb: the larger the sample size, the more accurate the finding. This is particularly true for data analytics. From straightforward Netflix recommendations to complex healthcare devices that track patient health the world over, a majority of the data we mine needs to be put to efficient use if we are to experience significant leaps in functional AI capabilities. It is essential that this is achieved at the intersection of active citizenship and technology, by building participatory regimes where information is shared and utilized only after appropriate consent from users.
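One common way to make this rule of thumb concrete is the standard error of a sample mean, which shrinks with the square root of the sample size. The sketch below assumes a simple random sample and a known population standard deviation; the function name and the numbers are illustrative, not taken from any dataset discussed here.

```python
import math


def standard_error(population_std: float, sample_size: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n).

    Assumes a simple random sample and a known population
    standard deviation (both are illustrative assumptions).
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return population_std / math.sqrt(sample_size)


# Hypothetical population with a standard deviation of 10 units.
for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9,}  standard error={standard_error(10.0, n):.3f}")
```

Note the diminishing returns: a hundredfold increase in sample size only cuts the error tenfold, and the formula says nothing about bias, so a large but unrepresentative sample can still mislead. That is consistent with the point above that strong data can matter more than big data.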
In its recent Future of Jobs report, the World Economic Forum used an intriguing term, ‘augmentation strategy’, in reference to governments adopting automation to complement and aid human skill sets. An augmentation strategy is what we require for revolutionary data reform: a transparent data governance structure, enabled by users and primary data providers, for use by businesses to develop and perfect products and services. A more proactive, empathetic approach to trust and collaboration across the four areas above would be a good starting point.
Michael Bailey, the Digital Media Director at Google, put it succinctly, “If there’s one thing that’s important to remember when we talk about an automated future, it’s this: machines are only as good as the data we feed into them.”