    Azure AI Search, Microsoft’s leading serverless option for the retrieval part of RAG, has its own sizing, scaling, and pricing logic. While it hides much of the complexity of server-based solutions, it demands specific knowledge of its configuration options.

    Basic concepts: replicas, partitions, indices, search units and service tiers

    Let’s start with a small glossary of the most important concepts for sizing and scaling Azure AI Search:

    • Indices: an index is where your data is stored. Depending on your service tier (see below), you can have up to 200 indices, or up to 3,000 in a special tier (Standard S3HD) specifically designed to accommodate a high number of indices.
    • Partitions: Partitions physically store your data, unlike indices, which are logical. You can only configure the number of partitions in your search service. Each index is distributed across all partitions, except in service tier Standard S3HD, where each index is mapped to a single partition.
    • Replicas: Replicas are identical copies of your service used for load balancing.
    • Service tier: Azure AI Search is available in eight different tiers, with differences in maximum numbers of indices, partitions and replicas. They also differ in partition size (which is fixed for each tier), and in the total storage available. Higher tiers are assigned more powerful hardware (specs are not documented). 
    • Search units (sometimes also called scale units): multiply the number of partitions by the number of replicas to get the number of search units. Multiply that by the fee associated with your service tier, and you get your monthly bill.
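    The search-unit arithmetic can be sketched in a few lines of Python. Note that the per-unit rates below are placeholders for illustration, not real Azure prices; actual rates vary by tier, region, and over time.

    ```python
    # Hypothetical monthly rates per search unit -- placeholders only.
    # Check the Azure pricing page for real, current numbers.
    TIER_RATE_PER_SU = {"basic": 75.0, "standard_s1": 250.0}

    def search_units(partitions: int, replicas: int) -> int:
        """Search units = partitions x replicas."""
        return partitions * replicas

    def monthly_cost(tier: str, partitions: int, replicas: int) -> float:
        """Estimated monthly bill: search units times the tier's unit rate."""
        return search_units(partitions, replicas) * TIER_RATE_PER_SU[tier]

    # Example: an S1 service with 3 partitions and 2 replicas uses 6 search units.
    print(search_units(3, 2))                  # 6
    print(monthly_cost("standard_s1", 3, 2))   # 1500.0
    ```

    The same logic explains why adding a replica to a many-partition service is expensive: every extra replica multiplies the full partition count into your bill.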

    Two special tiers: free tier and Standard S3HD

    There are two special tiers that behave a bit differently. The Free tier should only be used for demos and educational purposes. Its resources are shared with other services, making performance somewhat unpredictable, and the limits are very tight. Even in a proof of concept, you want to gain insights into query throughput and other performance metrics on your specific data, which you can’t reliably do in the Free tier. Hence, even for a proof of concept, you should use at least the Basic tier.

    The Standard S3HD tier supports up to 3,000 indices, compared to a maximum of 200 in the other tiers. It is designed for multi-tenant applications, a scenario distinct from typical use cases. For the rest of this text, we will assume that you need at most 200 indices; multi-tenant applications and the Standard S3HD tier will not be covered.

    Sizing and scaling for a proof of concept: it’s a storage game

    Sizing Azure AI Search for a proof of concept is usually easier than in a production environment. Two things help us in a typical proof of concept situation:

    1. The number of users and queries is usually low and not growing quickly.
    2. You don’t need to adhere to SLAs.

    Hence, we usually don’t need multiple replicas. Instead, we focus on the total storage needed. This number is impossible to estimate with any precision: Microsoft’s documentation states that the only way to know the size of an index is to build it. But as there are vast differences in total storage from one service tier to the next, you will probably have an idea which one you need. If in doubt, choose the higher tier, because migrating from one tier to another means rebuilding the search service completely, and you don’t want to do that in a typical proof of concept. Another viable option is to choose a tier that may or may not suffice to accommodate all of your data, and live with the risk of not being able to ingest all documents. Finally, the number of indices you need may also influence your choice of service tier.

    Once you’ve chosen your tier and figured out your true storage needs, you know the right number of partitions and therefore also the monthly cost to expect.
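    Once you have measured your actual index size on a given tier, deriving the partition count is simple ceiling division. A minimal sketch, with made-up per-partition sizes: look up the real, tier-specific partition sizes in the Azure documentation, as they depend on the tier and on when your service was created.

    ```python
    import math

    # Illustrative per-partition sizes in GB -- NOT authoritative values.
    # Consult the Azure AI Search service limits page for your tier.
    PARTITION_SIZE_GB = {"basic": 15, "standard_s1": 160, "standard_s2": 512}

    def partitions_needed(tier: str, index_size_gb: float) -> int:
        """Smallest number of partitions that can hold the measured index size."""
        return max(1, math.ceil(index_size_gb / PARTITION_SIZE_GB[tier]))

    # Example: 400 GB of index data on a tier with 160 GB partitions.
    print(partitions_needed("standard_s1", 400))   # 3
    ```

    Keep in mind that each tier also caps the maximum number of partitions, so if the result exceeds that cap, you need a higher tier rather than more partitions.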

    Sizing and scaling for production: it’s a design game

    There are two important things that you can only find out via testing:

    1. The size of an index.
    2. The query performance on a different service tier, with a different number of replicas, or with changes to your indices. Increasing the number of replicas helps query performance, but the speedup is not proportional to the number of replicas and may vary between queries.

    For a production setting, this means that all sizing and scaling decisions need a lot of testing. You can’t prepare by laying out a scaling strategy that tells you how to choose your parameters in every situation. Instead, you must design your architecture to facilitate frequent, realistic, and efficient testing. Testing takes time: not only does a migration to a different tier require a complete rebuild, but even changing the number of replicas or partitions in the configuration may take several hours to complete, due to the potentially large data volumes that have to be moved.

    To be ready for testing, you need to prepare your architecture and your DevOps processes:

    • Architecture: plan for a traffic split / traffic copy that routes some of your traffic to a separate Azure AI Search service when you perform a test.
    • DevOps: fully automate the build process of your testing service, including, but not limited to, the use of infrastructure as code.
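    As one possible building block of such automation, the creation of a throwaway testing service can be scripted with the Azure CLI. This is a sketch only: the resource group, service name, region, and tier below are placeholder assumptions you would replace with your own values.

    ```shell
    #!/usr/bin/env bash
    # Sketch: recreate a disposable Azure AI Search service for a load test.
    # All names and the tier are placeholders -- adjust to your environment.
    set -euo pipefail

    RG="rag-testing-rg"            # placeholder resource group
    SVC="my-search-loadtest"       # placeholder name; must be globally unique

    az search service create \
      --name "$SVC" \
      --resource-group "$RG" \
      --sku standard \
      --partition-count 2 \
      --replica-count 1 \
      --location westeurope
    ```

    Because a rebuild also means re-ingesting all documents, the same pipeline should trigger your indexing jobs right after the service is provisioned.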

    Have questions about sizing and scaling?

    Sizing and scaling Azure AI Search is a nontrivial task, even at this somewhat high-level overview. If you have questions or want to discuss other RAG-related topics, feel free to reach out to Dr. Michael Allgöwer on LinkedIn or via e-mail: Michael.Allgoewer@remove-this.btelligent.com.

    If you’d like to dive into other aspects of RAG, LLMs, and DevOps/MLOps, have a look at our other blog entries.

    Your Contact
    Dr. Michael Allgöwer
    Management Consultant
    Machine learning has been Michael's field of expertise for some time. He is convinced that high-quality machine learning requires an in-depth knowledge of the subject area and enjoys updating this knowledge on a regular basis. His most recent topic of interest is reinforcement learning.
    #MachineLearning #ReinforcementLearning #HumanIntelligence