Effective May 21, 2024, new AI models are available in Microsoft Azure AI. GPT-4o and GPT-4 Turbo are now offered in Azure AI Studio and through an application programming interface (API). Phi-3-vision, a new cost-effective model, is also available in Azure.
GPT-4o
GPT-4o is OpenAI’s newest and most powerful large language model (LLM).
Here are some specifics:
Multimodal Integration:
- GPT-4o processes text and images simultaneously.
- This multimodal approach sets a new standard for AI by enhancing accuracy and responsiveness in human-computer interactions.
- Capabilities like audio and video recognition may join in the future.
Accessing GPT-4o:
- To utilize GPT-4o, create or use an existing resource in a supported standard or global standard region where the model is available.
- Once your resource is set up, deploy the GPT-4o model using the name “gpt-4o” and the version “2024-05-13” (or the newest version available).
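Once deployed, the model can be called with the official `openai` Python SDK (v1.x). Below is a minimal sketch, assuming a deployment named “gpt-4o” and credentials in the environment variables `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY` (both names are placeholders for your own setup):

```python
import os

def build_request(prompt: str) -> dict:
    """Assemble a chat-completions payload for a GPT-4o deployment."""
    return {
        # "model" is the *deployment name* you chose in Azure,
        # not the underlying model ID.
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

def main() -> None:
    # Requires `pip install openai` and an Azure OpenAI resource
    # with a "gpt-4o" deployment.
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-05-01-preview",
    )
    response = client.chat.completions.create(**build_request("Hello, GPT-4o!"))
    print(response.choices[0].message.content)

# Only attempt the live call when credentials are actually configured.
if __name__ == "__main__" and "AZURE_OPENAI_API_KEY" in os.environ:
    main()
```

The same payload shape works against newer model versions; only the deployment name and API version need to change.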
GPT-4 Turbo
GPT-4 Turbo is an LLM that accepts both text and image inputs and generates text-based responses.
GPT-4 Turbo’s strengths:
- GPT-4 Turbo outperforms its predecessors, including GPT-3.5 Turbo and older GPT-4 models.
- It excels in chat interactions and traditional completion tasks.
- Note, however, that GPT-4 Turbo is the older model and generally trails GPT-4o.
Azure-Specific Differences to OpenAI:
- While OpenAI’s version of the latest turbo-2024-04-09 model supports JSON mode and function calling for all inference requests, Azure OpenAI’s version currently doesn’t support these features with image input.
- However, text-based input requests do support JSON mode and function calling.
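On Azure, JSON mode therefore has to be requested with text-only input. Here is a hedged sketch of such a request body; the deployment name “gpt-4-turbo” is a placeholder:

```python
def build_json_mode_request(user_prompt: str) -> dict:
    """Chat-completions payload that forces a JSON-formatted reply (text input only)."""
    return {
        "model": "gpt-4-turbo",  # placeholder deployment name
        # JSON mode: the model is constrained to emit valid JSON. The API
        # requires the word "JSON" to appear somewhere in the messages.
        "response_format": {"type": "json_object"},
        "messages": [
            {"role": "system", "content": "You are an assistant that replies in JSON."},
            {"role": "user", "content": user_prompt},
        ],
    }
```

The dictionary can be passed straight to `client.chat.completions.create(**build_json_mode_request(...))`; note that every `content` field is a plain string, i.e. no image parts.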
Comparing GPT-4 Turbo to GPT-4o
- In English text and coding tasks, GPT-4o matches the capabilities of GPT-4 Turbo.
- However, where GPT-4o truly shines is in non-English languages and vision tasks. Vision tasks are computer vision activities that involve processing and understanding visual data, such as images or videos, enabling machines to “see” and interpret visual information much like human vision.
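For vision tasks, an image is passed as a content part alongside the text prompt. A minimal sketch of that message shape (the question and image URL are placeholders):

```python
def build_vision_request(question: str, image_url: str) -> dict:
    """Chat-completions payload combining a text question with an image."""
    return {
        "model": "gpt-4o",  # placeholder deployment name
        "messages": [
            {
                "role": "user",
                # Multimodal input: a list of content parts
                # instead of a plain string.
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
```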
The table below shows more differences:
| Feature | GPT-4 Turbo | GPT-4o |
| --- | --- | --- |
| Input types | Text and image | Text and image (audio and video recognition may follow in the future) |
| Optimized for | Chat interactions and traditional completion tasks | Multimodal tasks, including non-English languages and vision |
| Performance | High accuracy in problem-solving | Superior performance in non-English languages and vision tasks |
| Latency (time from input to response) | Higher | Lower |
| Throughput | Standard text generation speed: 20 tokens per second | Faster text generation speed: 109 tokens per second |
| Cost | Roughly 2x the price of GPT-4o | More cost-effective |
| Model version | gpt-4 (turbo-2024-04-09) | gpt-4o (2024-05-13) |
GPT-4o comes out ahead in virtually every respect. GPT-4 Turbo mainly makes sense if your systems are already optimized for that model.
Phi-3-vision
Phi-3-vision is the first multimodal model (text and image recognition) in the Phi-3 family.
The Phi-3 family is a collection of AI small language models (SLMs) developed by Microsoft. They are powerful and very cost-effective.
Capabilities of Phi-3-vision:
- It is designed to reason over real-world data, seamlessly handling both text and images.
- Unlike the other Phi-3 models, it lets users ask about visual data, such as charts, or pose open-ended questions about specific images.
- It has been developed in line with Microsoft’s responsible AI principles.
Here is a comparison of the Phi-3-models:
| Model | Parameters | Context length | Unique selling proposition (USP) |
| --- | --- | --- | --- |
| Phi-3-vision | 4.2 billion | 128K | Only Phi-3 model that understands images |
| Phi-3-mini | 3.8 billion | 128K / 4K | Compact and efficient |
| Phi-3-small | 7.0 billion | 128K / 8K | Versatile for various AI tasks |
| Phi-3-medium | 14.0 billion | 128K / 4K | Advanced processing for complex tasks |
You can access Phi-3-vision through the Azure AI model catalog.
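Asking Phi-3-vision about a chart follows the same chat-completions pattern. A hedged sketch, assuming your deployment exposes an OpenAI-compatible chat-completions endpoint (the URL and image are placeholders):

```python
import json

def build_phi3_vision_request(question: str, image_url: str) -> dict:
    """Payload asking Phi-3-vision about an image, e.g. a chart."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 256,
        "temperature": 0.0,  # deterministic answers suit data-reading tasks
    }

# Serialize for an HTTP POST to your deployment's chat-completions URL,
# with your API key in the request headers.
body = json.dumps(build_phi3_vision_request(
    "What is the highest value in this chart?",
    "https://example.com/chart.png",  # placeholder image URL
))
```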
Ready to Unleash the Power of these New Models in Azure?
Explore the cutting-edge models—GPT-4o, GPT-4 Turbo, and Phi-3-vision—now available in Azure AI. Whether you’re diving into multimodal tasks, enhancing collaboration, or optimizing cost-effectiveness, these models are your gateway to the future of AI.
Get Started Today by contacting SCHNEIDER IT MANAGEMENT for your Microsoft Azure needs!
More information
For the announcement, please visit: https://blogs.microsoft.com/blog/2024/05/21/whats-next-microsoft-build-continues-the-evolution-and-expansion-of-ai-tools-for-developers/#:~:text=New%20frontier%20models%20and,in%20Azure%20AI%20Studio.
For more info on Phi-3-vision in Azure, please visit: https://azure.microsoft.com/en-us/blog/new-models-added-to-the-phi-3-family-available-on-microsoft-azure/
For more info on GPT-4o in Azure AI, please visit: https://azure.microsoft.com/en-us/blog/introducing-gpt-4o-openais-new-flagship-multimodal-model-now-in-preview-on-azure/.
For an OpenAI model comparison in Azure, please visit: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models.
To learn how we can help you with your Microsoft licensing requirements, please visit: https://www.schneider.im/software/microsoft/.
Contact us for expert services on your specific Microsoft Online Services and software requirements and to request a quote today.