Apimart
Log inSign Up
MAI-Thinking-1 Key Specs and Data Guide

MAI-Thinking-1 Key Specs and Data Guide

Explore MAI-Thinking-1 specs, training data, sparse MoE architecture, 256K context, benchmark results, private preview access, API setup, and enterprise uses.

Model Insights

MAI-Thinking-1 is Microsoft's advanced AI model designed for tasks like math, coding, and enterprise applications. It features a sparse Mixture-of-Experts (MoE) architecture with 35 billion active parameters and a massive 256,000-token context window (about 600 pages). Trained on 33.55 trillion tokens of clean, licensed data, it ensures data safety and reliability for industries like healthcare, finance, and legal.

Key Features:

  • Architecture: Sparse MoE with 35B active parameters (~1T total).
  • Training Data: 33.55T tokens, all commercially licensed and human-generated.
  • Context Window: 256,000 tokens for handling extensive workflows.
  • Performance: Top scores in math (97% AIME 2025) and coding benchmarks.
  • Multi-Modal Integration: Works with tools like MAI-Image-2.5 (images) and MAI-Voice-2 (voice cloning).
  • API Access: OpenAI-compatible with advanced function calling.

Currently in private preview, MAI-Thinking-1 is optimized for enterprise-scale tasks, offering powerful reasoning capabilities while maintaining regulatory compliance. Its integration with other MAI tools supports complex workflows across industries.

MAI-Thinking-1: Building a Hill-Climbing Machine

MAI-Thinking-1 Core Specifications

The technical foundation of MAI-Thinking-1 provides the capabilities needed for producing advanced multi-modal outputs, making it a powerful tool for video generation and unified LLM API workflows.

Model Architecture and Parameters

MAI-Thinking-1 is designed using a sparse Mixture-of-Experts (MoE) architecture combined with a decoder-only Transformer setup. Its "LatentMoE" configuration ensures that only the necessary experts are activated - typically 8 out of 512 per token - allowing for efficient and scalable processing[2][6]. The model operates with 35 billion active parameters and approximately 1 trillion total parameters. Its context window spans 256,000 tokens, which equates to roughly 600 pages of content, enabling extensive data handling within a single session[6].

The architecture incorporates Gemma-3-style attention, featuring five local layers per global layer, a 512-token sliding window, and Grouped-Query Attention (GQA) with 8 KV heads. Its tokenizer, o200k_base, supports a vocabulary of about 200,000 tokens, further enhancing its processing capabilities.

MAI-Thinking-1 was co-developed with Microsoft's Maia 200 silicon, delivering a 1.4x boost in performance-per-watt compared to standard hardware setups. Training utilized 8,000 NVIDIA GB200 GPUs, ensuring the model's robust performance through access to high-quality computational resources[6].

Training Source and Data Compatibility

The model was trained on a total of 33.55 trillion tokens of clean, commercially licensed, human-generated content. This includes 30 trillion tokens during the main pre-training phase and an additional 3.55 trillion during mid-training[6]. The dataset encompasses a wide range of sources, such as web text, publicly available GitHub code, books, academic papers, news, multilingual content, and materials tailored to specific domains.

"MAI-Thinking-1 was trained from scratch using 33.55 trillion tokens of meticulously sourced, commercially licensed data, ensuring a fully auditable training pipeline."

This transparent data sourcing makes the model a reliable choice for industries like healthcare, finance, and legal, where intellectual property safety and regulatory compliance are critical[4].

API and Integration Features

MAI-Thinking-1 integrates seamlessly with the Chat Completions API, making it compatible with OpenAI-style workflows. It also supports advanced features like function calling and layered developer instructions, making it well-suited for complex multi-modal tasks including video generation and enterprise-level applications.

FeatureSpecification
ArchitectureSparse MoE / Decoder-only Transformer
Active Parameters35 Billion
Total Parameters~1 Trillion
Context Window256,000 tokens (~600 pages)
Training Data33.55T tokens (clean, human-generated, licensed)
API CompatibilityChat Completions API, Function Calling, Developer Instructions
Tokenizero200k_base (~200k vocabulary)
Primary SiliconMicrosoft Maia 200

Key Performance and Capability Data

MAI-Thinking-1: Key Specs & Benchmark Performance at a Glance
MAI-Thinking-1: Key Specs & Benchmark Performance at a Glance

MAI-Thinking-1 delivers measurable outcomes in critical areas like math, coding, and scientific reasoning. These metrics reflect its performance on standardized problem sets that align with enterprise needs.

Reasoning and Workflow Optimization

With a 256,000-token context window, MAI-Thinking-1 can handle complex, multi-step workflows. This includes tasks like analyzing lengthy contracts, processing detailed agent traces, or reviewing intricate research documents - all in a single pass. Its sparse Mixture-of-Experts (MoE) architecture ensures scalability without a proportional increase in compute demand, which means lower costs for high-volume tasks.

For example, in June 2026, Microsoft showcased a MAI model fine-tuned for automated Excel tasks. The model achieved performance parity with GPT-5.4 on both public and private benchmarks [6].

"MAI-Thinking-1 is purpose-built for the workloads enterprises run at scale... at a price-performance point that makes high-volume, always-on AI workloads economically viable." - Naomi Moneypenny, Microsoft [3]

These reasoning capabilities make it a strong contender for tasks requiring advanced code and mathematical performance.

Code and Math Performance

MAI-Thinking-1 has achieved some of the highest benchmark scores in its category. Notable results include:

  • 97.0% on AIME 2025
  • 94.5% on AIME 2026
  • 52.8% on SWE-Bench Pro
  • 73.5% on SWE-Bench Verified
  • 84.2% on GPQA Diamond
  • 87.7% on LiveCodeBench v6 [6]

These scores reflect Microsoft’s focus on training the model in deterministic environments, ensuring its performance is both reliable and steerable. Such consistency makes it ideal for demanding enterprise applications.

Enterprise Deployment Characteristics

MAI-Thinking-1 is designed with scalability and data safety at its core. Built exclusively on proprietary, high-quality data, it ensures auditability and intellectual property (IP) protection - especially crucial for regulated industries like healthcare, finance, and legal. In June 2026, Microsoft partnered with Mayo Clinic to develop a specialized clinical model using the MAI framework, tailored for high-stakes clinical reasoning [6].

CapabilitySpecificationEnterprise Benefit
Active Parameters35B (MoE)High performance at reduced inference costs
Context Window256,000 tokensHandles 600+ page documents without chunking
AIME 2025 Score97.0%Exceptional mathematical reasoning
SWE-Bench Pro Score52.8%Reliable, production-grade code generation
GPQA Diamond Score84.2%Advanced scientific and technical reasoning
Data StandardsClean, commercially licensedEnsures IP safety for sensitive industries
CustomizationFrontier Tuning / RLEsEnables user-owned checkpoints and workflows

This robust performance, paired with multi-modal integration, including advanced multimodal models, supports applications like video generation and API optimization.

Enterprises can further tailor MAI-Thinking-1 using Reinforcement Learning Environments (RLEs). Microsoft reported in June 2026 that tuning the model for McKinsey’s specific needs resulted in better quality than GPT-5.5, all while operating at 10x lower cost [6].

Training Pipeline and Data Requirements

Overview of the Training Pipeline

MAI-Thinking-1 is built to continuously evolve, thanks to a carefully designed pipeline where every element - data, compute power, and reward signals - can be refined and optimized over time [2][6].

"The aim is a repeatable system that can absorb better data, stronger rewards, more capable environments, and more compute." - Microsoft AI [2]

The process starts with MAI-Base-1, which is created through a multi-phase token training approach [6][7]. From there, Microsoft develops three specialized models in parallel. These focus on:

  • STEM and competition-level coding.
  • Agentic coding and tool utilization.
  • Helpfulness and safety.

These models are then merged using supervised learning. The final step involves reinforcement learning (RL) to fine-tune the combined model, resulting in MAI-Thinking-1. The entire training process was powered by 8,000 NVIDIA GB200 GPUs, running on a Microsoft-operated Azure cluster [6].

Microsoft takes a unique stance on model development: intelligence is built through learning, not inherited. Rather than distilling knowledge from third-party models, MAI-Thinking-1 is trained to develop reasoning capabilities independently [2][7].

"Capabilities should be learned, not inherited. Although faster to acquire, inherited intelligence lacks the steerability essential for real world usage." - Microsoft AI [2]

This foundational approach ensures the system is not only capable but also adaptable for real-world applications, setting the stage for rigorous data governance and integration.

Data Quality and Governance Standards

The training data is carefully curated, consisting exclusively of human-written, commercially licensed content [2][6]. This ensures a clean data lineage, which is critical for industries like healthcare and finance, where regulatory compliance relies on traceable and auditable data [2][5].

To maintain integrity, Microsoft avoids using standard machine learning benchmarks during training. This prevents the model from memorizing test data, ensuring that benchmark scores reflect genuine reasoning rather than rote learning [6].

"If we cannot account for what shaped a model, we cannot fully understand its behavior or credibly improve it." - Microsoft AI Superintelligence Team [2]

Multi-Modal Data Integration

The training process goes beyond traditional text data, incorporating diverse multi-modal formats to enhance its relevance in practical scenarios. The corpus includes web text, public GitHub code, books, academic papers, news articles, and multilingual content [6]. For coding and agentic tasks, Microsoft employs deterministic, executable environments to validate the data [2][6].

In addition to text and code, the MAI models integrate audio and visual data. For instance:

  • MAI-Transcribe-1.5 uses advanced entity recognition to handle domain-specific vocabulary with precision.
  • MAI-Voice-2 features identity preservation for consistent voice output [3].
  • MAI-Image-2.5 enables precise image-to-image editing while maintaining brand and character consistency [3].

This multi-modal integration equips MAI-Thinking-1 for a wide range of applications, from text generation to complex visual and audio tasks, ensuring it is versatile enough for real-world needs.

API Integration Requirements in APIMart

GccAi

This section outlines the steps for integrating with APIMart's API to access MAI-Thinking-1, designed for advanced multi-modal workflows.

Setting Up API Access

To access MAI-Thinking-1, you’ll use APIMart’s OpenAI-compatible gateway. Start by configuring your OpenAI SDK (Python or Node.js) to point to the endpoint:

https://api.apimart.ai/v1

Replace your OpenAI API key with your APIMart API key. Ensure you specify "mai-thinking-1" as the model identifier.

For authentication, use a Bearer token in the request header:

Authorization: Bearer YOUR_API_KEY

It’s essential to store your API key securely, either in an environment variable or a secrets manager. Note that access is currently limited to private preview. Production access can be requested through Microsoft Foundry [2][3].

Function Calling and Developer Tools

MAI-Thinking-1 goes beyond standard chat completions by offering runtime tools that enhance functionality. The integration process is straightforward:

  1. The model processes your input and identifies a tool call request.
  2. Your client executes the tool call and sends the result back to the model.
  3. The model uses the result to generate a final response [8].

Here’s a breakdown of the supported tools and their use cases:

Tool TypeDescriptionExample Use Case
Function CallingExecutes custom functions via JSON SchemaTrigger database queries or business logic
Web SearchRetrieves real-time data from the internetFetch stock prices or technical documentation
File SearchSearches within uploaded documentsAnalyze internal manuals or compliance files
Remote MCPConnects to Model Context Protocol servicesAccess protected data or domain-specific models

These tools allow developers to extend the model's capabilities for a wide range of applications.

Operational Prerequisites

To optimize your integration with MAI-Thinking-1, ensure the following operational requirements are met:

  • Support for a 256,000-token context window, which requires efficient memory and latency management to handle large in-context datasets [2].
  • Proper handling of key error codes:
    • 401: Authentication failure
    • 402: Insufficient balance
    • 429: Rate limit exceeded

Here’s a quick summary of the technical requirements:

RequirementDetail
Model Identifiermai-thinking-1
Auth MethodBearer Token
Context Window256,000 tokens
ArchitectureSparse Mixture-of-Experts (35B active / 1T total parameters)
API CompatibilityOpenAI Chat Completions

Practical Deployment Constraints

Current Deployment Status

As of June 2026, MAI-Thinking-1 is available exclusively through a private preview on Microsoft Foundry. Access requires submitting a request via the Foundry portal. A broader public preview is expected on the MAI Playground "soon", with general availability gradually expanding across multiple Foundry regions worldwide [1][2]. This phased rollout highlights the model's importance in handling complex, multi-modal AI workflows.

One major limitation at this stage is the lack of disclosed per-token pricing. This makes it challenging for teams to accurately budget for integration. However, for context, other models in the MAI series on Foundry are priced at $5 per 1M input tokens (MAI-Image-2.5) and $0.36 per hour (MAI-Transcribe-1.5), offering a general idea of the pricing structure [3].

Inference and Performance Trade-offs

MAI-Thinking-1 employs a sparse Mixture-of-Experts architecture, which activates only 35 billion out of its ~1 trillion total parameters during an inference pass. This design strikes a balance between reducing compute costs and maintaining high-level reasoning capabilities:

"MAI-Thinking-1 is purpose-built for the workloads enterprises run at scale... at a price-performance point that makes high-volume, always-on AI workloads economically viable." [3]

That said, the model is limited to text and reasoning tasks. It does not support vision-based tasks or multimodal document understanding. For workflows requiring image analysis or visual document parsing, additional models will be necessary. These limitations can significantly shape deployment strategies, depending on the industry and specific use cases.

Industry-Specific Use Cases

These constraints directly influence how the model is applied across different industries. For example, the Mayo Clinic has collaborated with Microsoft to integrate MAI-Thinking-1 into clinical and research workflows. This partnership demonstrates how access controls and reliability requirements play a crucial role in regulated sectors like healthcare [6][9].

Overall, MAI-Thinking-1 is best suited for high-volume reasoning tasks in areas such as legal analysis, financial modeling, medical research, and enterprise software development. These sectors demand scalable, cost-efficient intelligence solutions, and the model's text-focused architecture, combined with its controlled availability, aligns well with the compliance and operational needs of these industries.

Conclusion and Key Takeaways

MAI-Thinking-1 stands out with its powerful architecture and smooth integration, making it a top choice for multi-modal applications. Its performance in reasoning tasks sets a high bar, combining efficiency with impressive benchmark results. The use of clean, commercially licensed training data and a no-distillation approach gives it a clear edge, especially for industries like healthcare, legal, and finance, where data transparency and lineage are crucial.

Developers can easily access MAI-Thinking-1 through APIMart, which offers a unified endpoint for over 500 AI models, including MAI-Image-2.5, Sora 2, and Kling V3. This setup simplifies the creation of multi-modal pipelines - for example, using MAI-Thinking-1 for reasoning, MAI-Image-2.5 for visual tasks, and Sora 2 or Kling V3 for video output. All of this is backed by a reliable 99.9% uptime SLA.

To optimize costs and performance, enterprises can implement tiered routing by assigning complex reasoning tasks to MAI-Thinking-1 while delegating simpler tasks to lighter models. Early integration is recommended, giving businesses time to test, refine, and scale their systems before demand increases. By adopting MAI-Thinking-1, organizations can achieve exceptional technical performance while unlocking operational efficiencies across various sectors.

FAQs

What should I use the 256,000-token context window for?

The 256,000-token context window is perfect for handling tasks that involve processing large amounts of data. For example, it can review massive documents - up to 600 pages - or handle intricate coding workflows with ease. This capability is particularly useful for enterprise-level operations, such as analyzing entire code repositories to spot errors, resolve bugs, or execute complex, layered instructions.

It also shines in asynchronous, high-volume scenarios. Tasks like assessing architectural consistency or identifying patterns in extensive datasets become much more manageable, especially when immediate responses aren't necessary.

How do I get access if MAI-Thinking-1 is private preview?

To try out MAI-Thinking-1 during its private preview, you’ll need to submit an early access request through the official interest form on Azure AI Foundry. You can also test the model on platforms like Baseten, Fireworks AI, and Open Router. Look for registration links in the model card or catalog on Azure AI Foundry. Details about public availability and pricing will be shared after the preview phase.

What do I need to change to call MAI-Thinking-1 from my OpenAI SDK?

To use MAI-Thinking-1 with the OpenAI SDK, you'll need to adjust both your client configuration and request structure. Here's what to do:

  • Use the API base URL specifically for MAI-Thinking-1.
  • Set the model parameter to mai-thinking-1.
  • Include system prompts designed to guide step-by-step reasoning.
  • Adjust the extra_body parameter to control the reasoning effort. Keep in mind, this will influence both compute time and associated costs.

Make sure your integration is equipped to process reasoning-style outputs properly for the best results.

Ready to build?

Choose the model you want in the model marketplace

Try chat, image and video models in the APIMart model marketplace, and experience model capabilities quickly with one unified API.

Chat modelsImage modelsVideo models
Explore model marketplace