Tuesday, June 30, 2026

Zilliz Introduces Loon: A Lake-Native Storage Engine Eliminating Vector Data Redundancy Across AI Workloads

Zilliz, a prominent player in AI data infrastructure and the engineering force behind Milvus, has unveiled Loon, a new storage engine integrated into Milvus 3.0 and designed to power the Zilliz Vector Lakebase. As a lake-native foundation, Loon lets engineering teams use a single, unified copy of vector data for real-time search, large-scale data discovery, and deep batch analytics at the same time. This architecture forms the backbone of Zilliz Cloud’s transition from a standalone vector database into a unified data platform for enterprise AI.

Solving the Vector Database Dilemma

The Vector Lakebase rests on one core premise: a single logical copy of vector data should serve every stage of the enterprise AI lifecycle, including live production queries, exploratory discovery, and analytical workflows, without complex data transfers or system replication. Solving this at the storage layer is technically hard because the design must reconcile two conflicting requirements: low-latency access to individual rows and efficient full scans for analytics, both on cost-efficient object storage. It must also handle fluid datasets that are continually reindexed and relabeled as machine learning models evolve.

“Vector retrieval is no longer the whole problem; Vector Lakebase is our answer to what happens after vector databases succeed,” said James Luan, Cofounder and CTO of Zilliz. “The systems that win will make continuous serving and continuous discovery feel like part of the same machine and that only works when the storage layer can serve a single copy of data to every workload. Loon is that storage layer.”

Architectural Innovations Driving AI Data Evolution

To overcome traditional architectural bottlenecks, Loon treats vector datasets as inherently heterogeneous. The storage engine is built on three foundational pillars:

  • Hybrid File Formats: Each column is stored in the file format that suits its characteristics. Metadata and scalar fields are stored in Parquet for efficient scans, while sparse and dense vectors use the open Vortex format for precise, byte-level row reads directly on object storage. Unstructured source data such as PDFs, images, and raw video stays in object storage and is referenced rather than copied.
  • Row ID Alignment: Even when split across distinct formats, the columns behave as a single logical table. Teams can therefore write the output of an updated embedding model into new, dedicated columns without triggering costly rewrites of existing metadata, captions, or previously stored vectors.
  • A Versioned Manifest: A centralized manifest serves as the single source of truth, cataloging the operational state of the dataset, including indexes, statistics, active files, and delete logs. This allows external engines such as Ray and Spark, along with on-demand compute clusters, to safely read and update the same data layer without duplicating it.
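
The interplay of row ID alignment and a versioned manifest can be illustrated with a small toy model. The sketch below is not Loon's API or on-disk layout; `ToyTable`, `Manifest`, and the file paths are hypothetical names invented for illustration, and in-memory lists stand in for Parquet and Vortex files. It only shows the idea that adding a new embedding column publishes a new manifest version while leaving existing column files untouched.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Manifest:
    """Hypothetical snapshot of a dataset: which file backs each column."""
    version: int
    num_rows: int
    columns: Dict[str, str]  # column name -> file path (illustrative only)


class ToyTable:
    """Toy model: columns live in separate 'files', aligned by row position."""

    def __init__(self, num_rows: int) -> None:
        self.files: Dict[str, List[Any]] = {}  # stand-in for object storage
        self.manifests: List[Manifest] = [Manifest(0, num_rows, {})]

    @property
    def current(self) -> Manifest:
        return self.manifests[-1]

    def add_column(self, name: str, path: str, values: List[Any]) -> Manifest:
        # Row ID alignment: every column file must cover the same row IDs.
        if len(values) != self.current.num_rows:
            raise ValueError("column length must match the table's row count")
        self.files[path] = list(values)  # write one new file only
        columns = dict(self.current.columns)
        columns[name] = path
        manifest = Manifest(self.current.version + 1, self.current.num_rows, columns)
        self.manifests.append(manifest)  # publish a new version
        return manifest

    def read_row(self, row_id: int, version: Optional[int] = None) -> Dict[str, Any]:
        # Readers pick a manifest version and stitch columns together by row ID.
        manifest = self.current if version is None else self.manifests[version]
        return {name: self.files[path][row_id] for name, path in manifest.columns.items()}


table = ToyTable(num_rows=2)
table.add_column("caption", "scalars/caption.parquet", ["a cat", "a dog"])
table.add_column("embedding_v1", "vectors/embedding_v1.vortex", [[0.1, 0.2], [0.3, 0.4]])
# A new embedding model adds a column; existing files are not rewritten.
table.add_column("embedding_v2", "vectors/embedding_v2.vortex", [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

latest_row = table.read_row(1)
older_row = table.read_row(1, version=2)
```

In this toy model, a reader pinned to version 2 still sees only `caption` and `embedding_v1`, while a reader on the latest version also sees `embedding_v2`. A real engine would additionally have to handle deletes, indexes, statistics, and concurrent writers, which this sketch leaves out.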

Performance Benchmarks and Operational Value

Internal evaluations conducted by Zilliz highlight Loon’s operational efficiency. When deployed on object storage, the Vortex-based layout reduced the volume of data transferred per record query by an estimated 135 times compared to conventional Parquet structures. This performance gap makes low-latency production serving feasible on highly economical object storage tiers. Additionally, because evolving data is modified directly in place, updating an embedding model functions as a simple metadata version change rather than a massive data overhaul.

This structural foundation eliminates the need for redundant ETL pipelines. Live production clusters maintain stable query times, while decoupled compute resources manage analytical workloads independently. Furthermore, External Collections can index data that remains inside a client’s native Google Cloud Storage or Amazon S3 buckets.

Availability

Loon is integrated into Milvus 3.0 and serves as the default storage tier for the Zilliz Vector Lakebase hosted on Zilliz Cloud. It is available in more than 30 locations worldwide across Microsoft Azure, Google Cloud, and AWS, with Dedicated, Serverless, and BYOC (Bring Your Own Cloud) deployment options. Enterprise customers looking to unify online querying, offline modeling, and data lake workloads can sign up for a starter account, which includes $100 in free credits for new business email registrations, or contact the Zilliz engineering team for a consultation.
