# Add Azure AI OpenAI Model
## Prerequisites for using Azure OpenAI
- An Azure subscription.
- Python: version 3.8 or later.
- Python libraries: the `os` library is required.
- Azure OpenAI Service resource: an Azure OpenAI Service resource with either the gpt-35-turbo or the gpt-4 model deployed. For more information about model deployment, see the resource deployment guide.
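As a quick sanity check of these prerequisites, the sketch below reads the key and endpoint from environment variables with `os` and sends one test request. It uses the `openai` Python package (not listed above, so treat it as an additional assumption); the environment variable names, the deployment name `gpt-35-turbo`, and the API version are placeholders to adjust for your resource.

```python
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],        # assumed env var name
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # assumed env var name
    api_version="2024-02-01",  # assumed; use a version your resource supports
)

response = client.chat.completions.create(
    model="gpt-35-turbo",  # the deployment name, not the underlying model ID
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```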
Once the prerequisites are met, you can proceed with configuration. The following details need to be provided:
**LLMs / Embeddings**
Select the type of model you want to configure:
- LLMs: Large Language Models used for tasks within the Expert Agent Studio.
- Embeddings: models that convert text into vector representations for embedding documents in the Knowledge hub.
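For contrast with the chat call in the prerequisites sketch above, a hypothetical embeddings request might look like the following; the deployment name `text-embedding-ada-002` is an assumption, so substitute your own embeddings deployment.

```python
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version="2024-02-01",  # assumed API version
)

# An embeddings deployment returns one numeric vector per input string,
# suitable for indexing documents in the Knowledge hub.
emb = client.embeddings.create(
    model="text-embedding-ada-002",  # assumed embeddings deployment name
    input="A document chunk to be indexed.",
)
print(len(emb.data[0].embedding))  # vector dimensionality
```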
**Model ID**
A unique identifier assigned to the model within Azure OpenAI. It tells the platform exactly which model to route requests to.
**Deployment Name**
A user-defined identifier assigned when deploying an OpenAI model (such as GPT-3.5 Turbo or GPT-4) within your Azure OpenAI resource.
**Display Name**
A user-friendly name that will appear in the platform UI.
**API Key**
A secure token provided by Azure OpenAI to authenticate API requests. It ensures that only authorized applications can access the model.
**Endpoint**
The full URL where the model can be accessed for inference. This is where the platform sends requests when invoking the model.
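To illustrate how the endpoint, deployment name, and API key fit together, here is a minimal sketch of a raw REST call. The route shape and the `api-version` value are assumptions based on public Azure OpenAI REST conventions, so verify them against your resource.

```python
import os

import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]  # e.g. https://<resource>.openai.azure.com
deployment = "gpt-35-turbo"                     # assumed deployment name
url = f"{endpoint}/openai/deployments/{deployment}/chat/completions"

resp = requests.post(
    url,
    params={"api-version": "2024-02-01"},  # assumed API version
    headers={"api-key": os.environ["AZURE_OPENAI_API_KEY"]},
    json={"messages": [{"role": "user", "content": "Hello"}]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```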
**Deployment Type**
A dropdown to specify how the model is deployed and consumed:
- **On-Demand**: A pay-per-call model where you only pay for the requests you make. It's ideal for low- to medium-volume, bursty workloads. While it offers flexibility and requires no infrastructure reservation, latency may vary under heavy load.
- **Provisioned**: Provisioned deployments use reserved compute capacity called Provisioned Throughput Units (PTUs). You select a consistent capacity level upfront, which delivers predictable performance and lower latency for high-throughput use cases. This is suitable for workloads that require stable, large-scale inference.
- **Batch**: Batch deployments are optimized for asynchronous, high-volume processing. You submit large input files and receive results once processing completes, typically within 24 hours. This comes at a lower cost than real-time deployments but does not support immediate responses. Useful for document summarization, dataset processing, or bulk classification; a submission sketch follows this list.
- **Custom Model Import**: This option supports deploying custom or open-source models using dedicated infrastructure.
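As a rough illustration of the batch flow, the sketch below uploads a JSONL file of requests and submits a batch job via the `openai` Python package. The API version, the file name `requests.jsonl`, and the deployment name referenced inside the JSONL are all assumptions; check the Azure OpenAI batch documentation for the versions your resource supports.

```python
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version="2024-07-01-preview",  # assumed; batch needs a recent version
)

# Each line of requests.jsonl is one request, e.g.:
# {"custom_id": "1", "method": "POST", "url": "/chat/completions",
#  "body": {"model": "<batch-deployment-name>", "messages": [...]}}
batch_file = client.files.create(
    file=open("requests.jsonl", "rb"),
    purpose="batch",
)

job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/chat/completions",
    completion_window="24h",  # results arrive asynchronously, within 24 hours
)
print(job.id, job.status)  # poll until "completed", then fetch the output
                           # file with client.files.content(job.output_file_id)
```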