Iterative, the company dedicated to streamlining the workflow of artificial intelligence (AI) engineers and creator of widely used open-source MLOps projects, announced the upcoming release of DataChain, a new open-source tool for processing and evaluating unstructured data.
According to McKinsey’s Global Survey on the state of AI, published in early 2024, only 15 percent of surveyed companies have realized a meaningful effect of generative AI (GenAI) on their business to date. A large part of the problem lies in the difficulty of processing unstructured data at scale and evaluating the results, which has traditionally been cumbersome. That difficulty stems from the missing link between structured data technologies and the newer AI workflows based in Python. While older analytical databases provided full control over data quality, unstructured multimodal data such as text and images has proved much harder to assess and improve at scale.
“The biggest challenge in adopting artificial intelligence in the enterprise today is the lack of practices and tools for data curation and generative AI evaluation that can ensure the quality of results,” said Dmitry Petrov, CEO of Iterative. “As the next step, we need AI models that can evaluate and improve AI models. So far this has only happened at the industry forefront – take a look at DeepMind’s AlphaGo training against itself, or OpenAI’s DALL-E3 curating its own dataset. Our goal is to change this.”
The proliferation of sophisticated AI foundational models opens the door to intelligent curation and data processing. However, the technology barrier remains high because there are no simple solutions for wrangling unstructured data with AI models and keeping the results in easy-to-manage formats. In practice, most AI engineers still write custom code to convert JSON model responses, adapt them to databases, and run models in parallel over data that does not fit in memory.
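To make that custom glue code concrete, here is a minimal sketch of the three chores described above, written with the Python standard library only. It does not use or represent DataChain's API. The `Judgment` schema, the `fake_model` stand-in, and the table layout are all hypothetical and chosen purely for illustration.

```python
import json
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Judgment:
    """Hypothetical schema for one model response (an assumption)."""
    item_id: str
    score: float
    rationale: str


def fake_model(item_id: str) -> str:
    """Stand-in for a real model call; returns a JSON string.

    The score is a deterministic placeholder, not a real evaluation.
    """
    return json.dumps(
        {"item_id": item_id, "score": len(item_id) / 10, "rationale": "placeholder"}
    )


def parse_response(raw: str) -> Judgment:
    """Chore 1: convert a JSON model response into a typed object."""
    data = json.loads(raw)
    return Judgment(
        item_id=str(data["item_id"]),
        score=float(data["score"]),
        rationale=str(data["rationale"]),
    )


def evaluate_all(item_ids, max_workers=4):
    """Chore 2: call the model in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        raw_responses = list(pool.map(fake_model, item_ids))
    return [parse_response(raw) for raw in raw_responses]


def store(conn, judgments):
    """Chore 3: adapt the typed objects to a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS judgments "
        "(item_id TEXT PRIMARY KEY, score REAL, rationale TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO judgments VALUES (?, ?, ?)",
        [(j.item_id, j.score, j.rationale) for j in judgments],
    )
    conn.commit()


def low_scoring(conn, threshold):
    """Query the stored results to find items that need curation."""
    rows = conn.execute(
        "SELECT item_id FROM judgments WHERE score < ? ORDER BY item_id",
        (threshold,),
    )
    return [row[0] for row in rows]
```

A typical use would open a connection with `sqlite3.connect(":memory:")`, pass a list of file names to `evaluate_all`, call `store`, and then filter with `low_scoring`. Each step is simple on its own, but every team ends up rewriting and maintaining this plumbing, which is the overhead the paragraph above describes.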
DataChain democratizes popular AI-based analytical capabilities such as ‘large language models (LLMs) judging LLMs’ and multimodal GenAI evaluations, leveling the playing field for data curation and pre-processing. DataChain can also store and structure Python object responses using the latest data model schemas, such as those used by leading LLM and AI foundational model providers.
Source: GlobeNewsWire

