Microsoft has unveiled a preview release of new data ingestion building blocks for the .NET platform, designed to help developers build intelligent, context-aware AI applications. Part of Microsoft's ongoing investment in simplifying AI pipelines, the offering addresses a growing challenge for .NET developers: efficiently ingesting, transforming, and retrieving data from multiple sources to support high-quality AI experiences.
“When building AI applications, context is key. AI models have a knowledge cutoff and do not have access to your personal or company data by default,” stated Luis Quintanilla, Program Manager, and Adam Sitnik, Senior Software Engineer, in the official announcement. “To generate high-quality answers, AI apps need two things: access to high-quality data and the ability to surface the right information at the right time.” These dual requirements form the basis of Microsoft’s new strategy to provide modular, extensible tools for modern AI development.
The newly introduced Microsoft.Extensions.DataIngestion library offers foundational building blocks for creating composable, scalable data ingestion pipelines. Designed for AI and machine learning scenarios, including Retrieval-Augmented Generation (RAG), the library simplifies the Extract, Transform, Load (ETL) process across a wide variety of content formats and data sources.
Key features of the preview release include:
- Unified Document Representation: Standardizes content from PDFs, Word documents, images, and more into a single representation ready for use with large language models.
- Flexible Data Ingestion Tools: Supports data ingestion from cloud-based and local sources via built-in readers, minimizing manual integration work.
- AI-Powered Enhancements: Includes built-in enrichment capabilities such as summarization, sentiment analysis, keyword extraction, and classification.
- Customizable Chunking Strategies: Offers token-based, section-based, and semantic-aware chunking to optimize retrieval readiness and AI inputs.
- Production-Ready Storage: Simplifies storage of processed data in widely used vector databases and document stores, with embedding support for RAG workflows.
- End-to-End Pipeline Composition: Allows developers to chain together readers, processors, chunkers, and writers through the IngestionPipeline API, streamlining application development.
- Enterprise-Grade Performance and Scalability: Built to efficiently process large volumes of data suitable for enterprise deployments within the .NET ecosystem.
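The token-based chunking strategy listed above can be illustrated with a short, language-agnostic sketch. The function below is a hypothetical helper written for this article, not part of the Microsoft.Extensions.DataIngestion API; it splits a token sequence into fixed-size chunks with a small overlap, the common pattern for keeping context intact at chunk boundaries in RAG workflows.

```python
def chunk_by_tokens(tokens, max_tokens=200, overlap=20):
    """Split a token sequence into overlapping chunks.

    Hypothetical illustration of token-based chunking; not the
    library's actual API. Each chunk holds at most `max_tokens`
    tokens and repeats the last `overlap` tokens of its predecessor.
    """
    if max_tokens <= overlap:
        raise ValueError("max_tokens must exceed overlap")
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # final chunk reached the end of the sequence
    return chunks

# Stand-in for tokenizer output (e.g. from Microsoft.ML.Tokenizers)
tokens = list(range(500))
chunks = chunk_by_tokens(tokens, max_tokens=200, overlap=20)
```

Section-based and semantic-aware chunking follow the same shape but split on document structure or meaning boundaries instead of a fixed token count.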
The framework builds on existing .NET components such as Microsoft.ML.Tokenizers, Microsoft.Extensions.AI, and Microsoft.Extensions.VectorData, enabling smooth integration with current .NET AI workflows and making it straightforward to plug in custom logic and connectors. This modular approach is designed to grow alongside the evolving .NET AI ecosystem.
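The reader → processors → chunker → writer composition described above can be sketched as a minimal pipeline. Everything here is a hypothetical stand-in invented for illustration; the real IngestionPipeline API is a .NET library whose signatures are not shown in the announcement, so this only conveys the compositional pattern.

```python
class Pipeline:
    """Hypothetical sketch of chaining pipeline stages; illustrates
    the pattern, not the actual IngestionPipeline API."""

    def __init__(self, reader, processors, chunker, writer):
        self.reader = reader          # extracts a document from a source
        self.processors = processors  # transform / enrich steps
        self.chunker = chunker        # splits the document for retrieval
        self.writer = writer          # loads each chunk into a store

    def run(self, source):
        doc = self.reader(source)
        for process in self.processors:
            doc = process(doc)
        for chunk in self.chunker(doc):
            self.writer(chunk)

# Toy usage: each stage is an ordinary callable, so swapping in a
# custom connector means passing a different function.
store = []
pipeline = Pipeline(
    reader=lambda path: f"contents of {path}",
    processors=[str.lower],
    chunker=lambda text: text.split(),
    writer=store.append,
)
pipeline.run("report.pdf")
```

Because every stage shares a simple contract, custom readers or writers drop in without changing the pipeline itself, which is the extensibility the modular design aims for.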
The preview is available now on NuGet as the Microsoft.Extensions.DataIngestion package, letting developers and organizations start using these tools in their AI projects right away.


