Weka raises $140M as the AI boom bolsters data platforms
While an increasing number of companies are investing in AI, many are struggling to get AI-powered projects into production, much less deliver meaningful ROI.
The challenges are many. But a commonly recurring one is data management. The data that companies need to train, run and fine-tune AI models is disorganized, siloed, and otherwise unoptimized. In a 2022 survey by Great Expectations, an open source data benchmarking platform, 77% of organizations said they were concerned about their data quality.
Startups that promise to fix these data problems are raking in funding.
On Wednesday, Weka, a platform for building data pipelines that handle a range of data sources, types, and sizes, announced that it raised $140 million in a two-part ($100 million and $40 million) Series E round led by Valor Equity Partners, with participation from Nvidia, Norwest Venture Partners, Micron Ventures, Qualcomm Ventures, Hitachi Ventures and others. The oversubscribed round values Weka at $1.6 billion post-money, double the company's previous valuation.
Weka CEO Liran Zvibel and Weka's other co-founders, Maor Ben-Dayan and Omri Palmon, met while building the data storage startup XIV, which IBM acquired for $350 million in 2007. The trio stayed on with IBM for a number of years but eventually left to pursue other, independent ventures.
The problem of managing data continued to gnaw at Zvibel, though, he said.
"I was frustrated and disillusioned with seeing customers forced to use disparate, siloed data infrastructure solutions that were wasteful, costly and complex to deploy, manage and maintain," he said. "The problem became especially apparent with the rise of cloud computing and the advent of high-performance computing, machine learning and the earliest AI workloads."
So in 2013, Zvibel recruited Ben-Dayan and Palmon to build a new set of data tools -- a set that might bring about a better approach to storing, managing and moving data.
"We envisioned a platform powerful enough to support the performance demands of next-generation compute hardware and large-scale, data-intensive workloads in the demanding and distributed environments," Zvibel. "To meet the needs of modern workloads, we knew it would need to be able to process tens of terabytes of data and be deployed anywhere."
Weka's core offering is a parallel file system, a type of distributed file system that can spread and orchestrate data tasks (e.g. copying files) across multiple places (e.g. servers and workstations) at once. On top of this, Weka sells services and capabilities to support AI and machine learning, visual effects and high-performance compute workloads in environments spanning on-premises data centers, public clouds and hybrid clouds.
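To make the idea of a parallel file system concrete, here is a minimal sketch in Python of the general striping technique: data is split into chunks, the chunks are written round-robin across several storage locations at once, and then read back in parallel. This is an illustration only, not Weka's implementation. The function names, the tiny chunk size, and the use of local temporary directories as stand-ins for storage servers are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import tempfile

CHUNK_SIZE = 4  # bytes per stripe; deliberately tiny for illustration


def write_striped(data: bytes, nodes: list[Path], chunk_size: int = CHUNK_SIZE) -> int:
    """Split data into chunks and write them round-robin across nodes in parallel."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    def write_chunk(index: int) -> None:
        # Chunk i lands on node (i mod number_of_nodes).
        node = nodes[index % len(nodes)]
        (node / f"chunk_{index:06d}").write_bytes(chunks[index])

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        list(pool.map(write_chunk, range(len(chunks))))
    return len(chunks)


def read_striped(num_chunks: int, nodes: list[Path]) -> bytes:
    """Fetch all chunks in parallel and reassemble them in their original order."""

    def read_chunk(index: int) -> bytes:
        node = nodes[index % len(nodes)]
        return (node / f"chunk_{index:06d}").read_bytes()

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        # pool.map yields results in input order, so the join restores the data.
        return b"".join(pool.map(read_chunk, range(num_chunks)))


def demo() -> bytes:
    """Stripe a small payload across three hypothetical nodes and read it back."""
    with tempfile.TemporaryDirectory() as root:
        nodes = [Path(root) / f"node{i}" for i in range(3)]
        for node in nodes:
            node.mkdir()
        count = write_striped(b"training data for a model", nodes)
        return read_striped(count, nodes)
```

Calling `demo()` returns the original payload after a round trip through three directories. A production system would also have to handle metadata, replication, and node failures, none of which this sketch attempts.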
Zvibel claims that one key advantage of Weka's architecture is that it can speed up AI model training by reducing the amount of time it takes to copy data across storage locations. "A typical generative AI data pipeline includes multiple steps of copying data sets, which wastes vital training time," he said. "Weka keeps model training hardware constantly fed with data so models can be trained quicker."
Weka competes with data platforms like DataDirect Networks (DDN), Pure Storage, NetApp and Vast Data. Vast is among the more formidable of the bunch, having closed a $118 million Series E funding round in December 2023 that tripled the startup's valuation to $9.1 billion.
But Weka appears to be holding its own, with a customer base of over 300 brands including AI startup Stability AI, 11 of the Fortune 50, and several undisclosed domestic and foreign government agencies.
Despite its relatively large headcount (about 400 employees worldwide, with plans to grow that number 25% in the next year), the Silicon Valley-based company now has a "line of sight" to cash flow positivity by December 2024, Zvibel said.
"The latest raise was calculated based on favorable market conditions and proactive investor interest, which enabled us to raise at extremely favorable and advantageous terms for Weka," he added. "Our average burn rate is expected to be less than half a million per month before reaching that milestone. We've exceeded $100 million in annual recurring revenue and are maintaining a hyper-growth trajectory."