The Risks of Empowering “Citizen Data Scientists”

Until recently, the prevailing understanding of artificial intelligence (AI) and its subset machine learning (ML) was that expert data scientists and AI engineers were the only people who could push AI strategy and implementation forward. That was a reasonable view. After all, data science generally, and AI in particular, is a technical field requiring, among other things, expertise that takes many years of education and training to obtain.

Fast forward to today, however, and the conventional wisdom is rapidly changing. The advent of “auto-ML” — software that provides methods and processes for creating machine learning code — has led to calls to “democratize” data science and AI. The idea is that these tools enable organizations to invite and leverage non-data scientists — say, domain data experts, team members very familiar with the business processes, or heads of various business units — to propel their AI efforts.

In theory, making data science and AI more accessible to non-data scientists (including technologists who are not data scientists) can make a lot of business sense. Centralized and siloed data science units can fail to appreciate the vast array of data the organization holds and the business problems that data can solve, particularly in multinational organizations with hundreds or thousands of business units distributed across several continents. Moreover, those in the weeds of business units know the data they have and the problems they’re trying to solve, and can, with training, see how that data can be leveraged to solve those problems. The opportunities are significant.

In short, with great business insight, augmented with auto-ML, can come great analytic responsibility. At the same time, we cannot forget that data science and AI are, in fact, very difficult, and there’s a very long journey from having data to solving a problem. In this article, we’ll lay out the pros and cons of integrating citizen data scientists into your AI strategy and suggest methods for optimizing success and minimizing risks.

The Risks of Democratizing AI in Your Organization

Putting your AI strategy in the hands of novices comes with at least three risks.

First, auto-ML does not solve for gaps in expertise, training, and experience, and thus increases the probability of failure. When used by trained data scientists, auto-ML tools can improve efficiency a great deal, e.g., by quickly writing code that a data scientist can then validate. But there are all sorts of ways an AI project can go technically or functionally sideways, and non-data scientists armed with auto-ML may run straight into those pitfalls.

For instance, one of the issues in ensuring a successful AI project is appropriately handling unbalanced training data sets. A data set of transactions that contains few instances of suspicious transactions — let’s say 1% — must be sampled very carefully for it to be usable as training data. Auto-ML, however, is an efficiency tool. It cannot tell you how to solve that problem by, for instance, subsampling, oversampling, or tailoring the sampling using domain knowledge. Nor is this something your director of marketing knows how to handle; it sits squarely in the expertise of the experienced data scientist.
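To make the imbalance concrete, here is a minimal sketch in plain Python (the transaction records and the 1% fraud rate are fabricated for illustration) of one such technique, random oversampling of the minority class, which a data scientist would weigh against subsampling the majority or domain-informed sampling:

```python
import random

random.seed(0)

# Hypothetical transactions: 990 normal, 10 suspicious (~1%).
transactions = [{"amount": random.uniform(1, 500), "suspicious": False}
                for _ in range(990)]
transactions += [{"amount": random.uniform(500, 5000), "suspicious": True}
                 for _ in range(10)]

majority = [t for t in transactions if not t["suspicious"]]
minority = [t for t in transactions if t["suspicious"]]

# Random oversampling: resample the minority class (with replacement)
# until it matches the majority class size. Subsampling the majority
# class down to the minority's size is the mirror-image option.
balanced = majority + random.choices(minority, k=len(majority))

print(len(balanced))  # 1980 examples, half of them suspicious
```

The right choice among these techniques depends on the data and the cost of each error type, which is exactly the judgment an efficiency tool cannot supply.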

Other risks of failure in this area also loom large, particularly those that result in a model that is ultimately useless: for instance, a model built with inputs that aren’t available at run time, a model that overfits or underfits the data, or a model tested against the wrong benchmark. And so on.
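One basic guardrail against several of these failure modes is to evaluate the model on data it never saw during fitting; a large gap between training and held-out performance signals overfitting. A minimal sketch in plain Python, using a hypothetical 1-nearest-neighbor “model” (which memorizes its training data, so the gap is easy to see):

```python
import random

random.seed(1)

# Hypothetical data: the label depends on one feature, with 20% label noise.
data = []
for _ in range(200):
    x = random.uniform(0, 1)
    label = int(x > 0.5)
    if random.random() < 0.2:
        label = 1 - label
    data.append((x, label))

random.shuffle(data)
train, test = data[:150], data[150:]  # hold out 50 points the model never sees

def predict(x, fitted):
    # 1-nearest-neighbor: return the label of the closest training point.
    return min(fitted, key=lambda t: abs(t[0] - x))[1]

def accuracy(split, fitted):
    return sum(predict(x, fitted) == y for x, y in split) / len(split)

train_acc = accuracy(train, train)  # 1.0: each point is its own nearest neighbor
test_acc = accuracy(test, train)    # noticeably lower on held-out data
```

Perfect training accuracy with much weaker held-out accuracy is the classic overfitting signature, and it is invisible if you only ever score the model on the data it was built from.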

Second, AI infamously courts various ethical, reputational, regulatory, and legal risks with which even AI experts, let alone AI novices, are not fully familiar. What’s more, even when aware of those risks, the AI novice will almost certainly not know how to identify them and devise appropriate risk-mitigation strategies and tactics. In other words, citizen data scientists increase these risks, and brands are putting their reputations in the hands of amateurs, with potentially serious implications for the organization’s clients, customers, and partners.

Moreover, the guardrails companies have built to mitigate this risk were designed with traditional data scientists in mind. While many organizations are creating AI ethical risk or “Responsible AI” governance structures, processes, and policies — and others will soon follow suit as new regulations in the European Union (the EU AI Act) and Canada (the AI and Data Act) roll out in the coming years — they’ll need to extend that governance to cover AI created by non-data scientists. Given that spotting these risks takes not only technical expertise but also ethical, reputational, and regulatory expertise, this is no easy feat.

Third, related to both of the above, having AI novices spend time developing AI can waste effort and internal resources on projects better left on the cutting-room floor. Potentially worse, faulty models that do get used may have significant, unforeseen negative impacts.

How to Prepare Your Organization for Democratized AI

All AI should be vetted for technical, ethical, reputational, regulatory, and legal risks before going to production, without exception. While citizen data scientist-created models carry more risks, that doesn’t mean that the auto-ML approach cannot work. Rather, for those organizations that determine it is an effective part of their AI strategy, the key is to create, maintain, and scale appropriate oversight and guidance. Here are five things those organizations can do to increase the likelihood of success.

Provide ongoing education.

Published best practices and guidelines enable citizen data scientists to find answers to their questions and to keep learning. For instance, there are best practices that pertain to the issues referenced above: unbalanced data sets, over- and underfitting models, etc. Those best practices should be readily available internally and searchable by anyone and everyone building a model. This can be delivered in various forms, including an internal wiki or similar application.

Provide visibility into similar use cases within the organization.

One of the most powerful educational tools you can provide to your non-data scientists is examples or case studies they can use as templates for their own projects. In fact, those other projects may have resources that the team can use, e.g., NLP models that are plug and play, a model methodology used to solve a problem, and so on. This has the added benefit of speeding up time-to-value and avoiding the duplication of work and thus a waste of resources. In fact, more and more companies are investing in inventory tools to search and reuse various AI assets, including models, features, and novel machine learning methods (e.g., a specific type of clustering method).

Create an expert mentor program for AI novices.

This should be tailored to the project so that it provides problem-specific guidance. This also includes the ability to get an AI idea vetted by an expert early on in the project discovery phase, so as to avoid common pitfalls or unrealistic expectations for what AI can provide. Perhaps most important here is determining whether the data the organization or business unit has is sufficient for training an effective and relevant model. If not, a mentor can help determine how difficult it would be to acquire the needed data from either another business unit (that may store data in a way that makes it difficult to extract and use) or from a third party.

Ideally, mentors are involved throughout the AI product lifecycle, from the concept phase all the way through to model maintenance. At earlier stages, mentors can help teams avoid significant pitfalls and ensure a robust roadmap is developed. In later stages, they can play a more tactical role, such as when a team needs guidance with a deployed model that isn’t performing as well as anticipated. Indeed, this function can also be very useful for experienced data scientists; novice and expert data scientists alike can benefit from having an expert sounding board. It’s important to stress here that two kinds of mentors are potentially needed: one to solve for technical and business risks, the other to ensure compliance with the organization’s AI ethics or Responsible AI program.

Verify all projects by experts before AI is put in production.

Mentorship can play a crucial role, but at the end of the day, all models, and the solutions in which they’re embedded, need to be assessed and approved for deployment by experts. Ideally this should be performed by two distinct review boards: one composed of technologists, and another that also includes technologists but consists primarily of people from risk, compliance, legal, and ethics.

Provide resources for education and inspiration outside your organization.

Any group in any organization can suffer from groupthink or simply a lack of imagination. One powerful way out is to encourage, and provide the resources for, everyone who builds AI models to attend AI conferences and summits, where creative uses of AI across all industries and business units are on full display. They may see a solution they want to procure, but more importantly, they may see a solution that inspires them to build something similar internally.

. . .

AI is in its infancy. Organizations are still determining how, and whether, to use AI, particularly against a backdrop of doubts about its trustworthiness. Whether or not you trust AI novices with your AI strategy, following these steps will instill a disciplined approach to AI, maximize the benefits AI can bring, and minimize potential risks. Put simply, these five steps should be part of basic AI hygiene. To democratize or not to democratize AI is up to you.