Frequently Heard Customer Challenges about Deploying AI in Production
1. Business Observability: “We have deployed some ML, but we have no idea what’s going on across the business, suppliers, and downstream, let alone where exactly to improve.” Viewing business metrics, such as risks, across the value chain is a non-starter.
2. Existing Data Platform Mess: “Our data infrastructure is a mess of tools and things that don’t work together. Let us fix that first.” Every customer has the same problem: too many tools, too little governance, and incompatibilities, all of which are expected to go away magically with the next-generation architecture.
3. Industry-Specific AI Needs: “Our industry (read retail, manufacturing, etc.) is different – standard AI/ML just does not address our issues. In any case, we seem to need multiple different kinds of AI apps.” Apps for each industry need different guardrails, different heuristics, and, most importantly, different consolidated (aggregated) learning models to make things work.
4. Data Needs: “Do we have the right data, and enough of it? My team keeps telling me we don’t have the data we need.” 90% of the customers we have dealt with do not have all the data they need: often they simply need more data, sometimes different sources of data, and frequently more relevant or meaningful data.
5. Time to Production: “We have a few models and engineers to put AI apps into production, but it still takes months to get them in place.” Even with state-of-the-art (SOTA) tools, MLOps and deployment remain a labyrinth of analytics engineering tasks; the result is deployment delays and production hassles.
6. Brittleness: “Every time some small update is made, things break and fixes take weeks.” Brittleness comes in various shapes: complex overlapping data pipelines, new re-training requirements, changing business metrics for guardrails, and, of course, the fragmentation of MLOps platform component add-ons are only some of the issues.
7. Safe, Reliable: “Can our business depend upon these AI apps? What if our customer demographic changes? Will they be robust enough? Will they meet the new regulatory guidelines?” Reliability has many dimensions, of which correctness is only one. Fairness, equitable results, robustness against adversarial data, and customer data safety all become important as AI apps start to form the foundation on which modern businesses run. Is our AI business dependable?
8. Deployment Spaghetti: “Our AI deployment needs are so varied and complex, we always need specialized help from external consultants.” Real-time vs. streaming vs. batch vs. edge needs, remediation of data and model drift, scale combined with multi-model complexity, and re-training vagaries when data distributions change all add up in terms of cost, complexity, and customer helplessness.
9. Complexity: “We have 40 AI models in production, with another 15 that will go live in the next 6 months. ML and data team needs are exploding. How do we manage the complexity?” Unfortunately, most multi-app AI deployments have many configurable aspects: monitoring estimators, re-training schedules, data pipeline structures, reverse-ETL definitions, artifact store configurations, privacy setup, and re-trainable model sources are just a few. Way too many moving parts!
10. Multiple & Different Business Units: “We have several business units. How will they work with each other? They have fairly different datasets and AI needs.” For the same app, variations across business units in datasets, metrics, and AI app structure almost always lead to scope creep.
11. Cloud Costs: “I am always surprised by my monthly cloud bills, and ML will make things worse.” Cloud service inflation is a reality: IT support does not know which service uses which resources, so they hesitate to turn anything off, and cloud costs spiral out of control. It does not help that federated deployment patterns, including data mesh, often increase cloud resource use.
…and these don’t even begin to address the foundation-model-ops issues that are beginning to emerge.
Reach out to Node.Digital to learn more about our MLOps solution to reduce your "AI Production Gap".