Azure AI Search, Microsoft’s top serverless option for the retrieval part of RAG, comes with its own sizing, scaling, and pricing logic. While it hides many of the complexities of server-based solutions, it demands specific knowledge of its configuration options.
Basic concepts: replicas, partitions, indices, search units and service tiers
Let’s start with a small glossary of the most important concepts for sizing and scaling Azure AI Search:
- Indices: An index is where your data is stored. Depending on your service tier (see below), you can have up to 200 indices, or even up to 3,000 in a special tier (Standard S3HD) specifically designed to accommodate a high number of indices.
- Partitions: Partitions physically store your data, unlike indices, which are logical. The number of partitions is configured at the service level, not per index. Each index is distributed across all partitions, except in the Standard S3HD tier, where each index is mapped to a single partition.
- Replicas: Replicas are identical copies of your service used for load balancing and availability. Microsoft’s SLA only applies if you run at least two replicas for read-only workloads and at least three for read-write workloads.
- Service tier: Azure AI Search is available in eight different tiers, which differ in the maximum numbers of indices, partitions, and replicas. They also differ in partition size (which is fixed within each tier) and in the total storage available. Higher tiers are assigned more powerful hardware (the specs are not documented).
- Search units (sometimes also called scale units): Multiply the number of partitions by the number of replicas to get the number of search units. Multiply this by the fee associated with your service tier, and you get what you’ll be billed (see the sketch after this list).
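To make the billing logic concrete, here is a minimal Python sketch of the calculation. The hourly rate used below is a placeholder, not an actual Azure price; look up the current per-unit rate for your tier and region on the Azure pricing page.

```python
def monthly_search_cost(partitions: int, replicas: int,
                        hourly_rate_per_su: float,
                        hours_per_month: float = 730) -> float:
    """Estimate the monthly bill for an Azure AI Search service.

    Billing is per search unit (SU), where SU = partitions * replicas.
    """
    search_units = partitions * replicas
    return search_units * hourly_rate_per_su * hours_per_month


# Example: 2 partitions x 3 replicas = 6 search units.
# 0.34 is a placeholder hourly rate per SU, not an actual Azure price.
print(f"{monthly_search_cost(2, 3, hourly_rate_per_su=0.34):.2f}")  # 1489.20
```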
Two special tiers: the Free tier and Standard S3HD
There are two special tiers that behave a bit differently. The Free tier should only be used for demos and educational purposes. Its resources are shared with other subscribers, which makes performance somewhat unpredictable, and its limits are very tight. Even in a proof of concept, you want to gain insights into query throughput and other performance metrics on your specific data, which you can’t reliably do in the Free tier. Hence, even for a proof of concept, you should use at least the Basic tier.
The Standard S3HD tier supports up to 3,000 indices, compared to a maximum of 200 in the other tiers. It is designed for multi-tenant applications, a scenario quite different from typical use cases. For the rest of this text, we will assume that you need at most 200 indices; multi-tenant applications and the Standard S3HD tier will not be covered here.
Sizing and scaling for a proof of concept: it’s a storage game
Sizing Azure AI Search for a proof of concept is usually easier than in a production environment. Two things help us in a typical proof of concept situation:
- The number of users and queries is usually low and not growing quickly.
- You don’t need to adhere to SLAs.
Hence, we usually don’t need multiple replicas. Instead, we focus on the total storage needed, a number that is impossible to estimate with any precision up front: Microsoft’s documentation states that the only way to know the size of an index is to build it. But since there are vast differences in total storage from one service tier to the next, you will probably have an idea which one you need. If in doubt, choose the higher tier, because migrating from one tier to another means rebuilding the search service completely, and you don’t want to do that in a typical proof of concept situation. Another viable option is to choose a tier that may or may not suffice to accommodate all of your data, and live with the risk of not being able to ingest all documents. Finally, the number of indices you need may also influence your choice of service tier. Once you’ve chosen your tier and figured out your true storage needs, you know the right number of partitions and therefore also the monthly cost to expect.
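To illustrate the kind of back-of-the-envelope calculation involved, here is a rough Python sketch that checks how many partitions a given amount of indexed data would need on a tier. The partition sizes and limits below are placeholders for illustration only: they differ per tier and have changed over time, so check the current service limits, and remember that the only reliable size figure comes from actually building the index.

```python
import math

# Placeholder values for illustration only -- partition sizes and maximum
# partition counts differ per tier and change over time; check the current
# Azure AI Search service limits before relying on any of these numbers.
PARTITION_SIZE_GB = {"basic": 15, "standard_s1": 160, "standard_s2": 512}
MAX_PARTITIONS = {"basic": 3, "standard_s1": 12, "standard_s2": 12}


def partitions_needed(index_size_gb: float, tier: str) -> int | None:
    """Return the number of partitions needed on a tier, or None if the data won't fit."""
    needed = math.ceil(index_size_gb / PARTITION_SIZE_GB[tier])
    return needed if needed <= MAX_PARTITIONS[tier] else None


# Example: an estimated 300 GB of indexed data, vectors included.
for tier in PARTITION_SIZE_GB:
    print(tier, partitions_needed(300, tier))
```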
Sizing and scaling for production: it’s a design game
There are two important things that you can only find out via testing:
- The size of an index.
- The query performance on a different service tier, with a different number of replicas, or after changes to your indices. Increasing the number of replicas helps query performance, but the speedup is not proportional to the number of replicas and may differ from query to query.
For a production setting, this means that all sizing and scaling decisions need a lot of testing. You can’t prepare by laying out a scaling strategy that tells you how to choose your parameters in any given situation. Instead, you must design your architecture to facilitate frequent, realistic, and efficient testing. Testing takes time: not only does a migration to a different tier take time and effort because of the complete rebuild, but even changing the number of replicas or partitions in the configuration may take several hours to complete, due to the potentially large data volumes that have to be moved.
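Realistic testing boils down to replaying representative queries against a candidate configuration and measuring latency and throughput. Below is a minimal sketch using the azure-search-documents Python SDK; the endpoint, index name, API key, and sample queries are placeholders you would replace with your own, and a real test would add concurrency and far larger query sets.

```python
import statistics
import time

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Placeholder connection details -- replace with your own service and index.
client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="<your-index>",
    credential=AzureKeyCredential("<query-api-key>"),
)


def measure_latencies(queries: list[str], top: int = 5) -> dict[str, float]:
    """Run each query once and return simple latency statistics in milliseconds."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        # Consuming the results forces the request to complete.
        _ = list(client.search(search_text=q, top=top))
        latencies.append((time.perf_counter() - start) * 1000)
    return {
        "p50_ms": statistics.median(latencies),
        "mean_ms": statistics.mean(latencies),
        "max_ms": max(latencies),
    }


# Hypothetical sample queries -- use queries representative of your real traffic.
print(measure_latencies(["vector search basics", "retrieval augmented generation"]))
```

Run the same query set against your current service and against the test service that receives the split or copied traffic, and compare the numbers before committing to a tier, replica, or partition change.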
To be ready for testing, you need to prepare your architecture and your DevOps processes:
- Architecture: Plan for a traffic split / traffic copy that routes some of your traffic to a separate Azure AI Search service when you perform a test.
- DevOps: Fully automate the build process of your testing service, including, but not limited to, the use of infrastructure as code (see the sketch after this list).
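Infrastructure as code for Azure AI Search typically means Bicep or Terraform; to stay in one language for the examples here, the following rough Python sketch uses the azure-mgmt-search management SDK to stand up (or rescale) a test service. The subscription, resource group, service name, region, and SKU are placeholders, and parameter names can differ between SDK versions, so treat this as an illustration of the automation idea rather than a drop-in script.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.search import SearchManagementClient
from azure.mgmt.search.models import SearchService, Sku

# Placeholder identifiers -- replace with your own subscription and names.
client = SearchManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Create or update the test service; rerunning with different replica_count /
# partition_count values is how you rescale it. The operation can take hours,
# which is exactly why it should be automated rather than clicked together.
poller = client.services.begin_create_or_update(
    "rag-testing-rg",        # hypothetical resource group
    "ai-search-loadtest",    # hypothetical test service name
    SearchService(
        location="westeurope",
        sku=Sku(name="standard"),  # management API tier names: "basic", "standard", "standard2", ...
        replica_count=2,
        partition_count=1,
    ),
)
print(poller.result().provisioning_state)
```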
Have questions about sizing and scaling?
Sizing and scaling Azure AI Search is a nontrivial task, even at this somewhat superficial overview level. If you have questions or want to discuss other RAG-related topics, feel free to reach out to Dr. Michael Allgöwer on LinkedIn or via e-mail at Michael.Allgoewer@btelligent.com.
If you’d like to dive into other aspects of RAG, LLMs, and DevOps/MLOps, have a look at our other blog entries.