
Wan 2.7: Alibaba AI Video Generator Guide
Wan 2.7 is Alibaba's AI video model with text-to-video, image-to-video, reference-to-video, and editing modes. See its features, pricing, and APIMart access.
Wan 2.7 is Alibaba's latest AI video generation model, launched in early 2026 by Tongyi Lab to compete with tools like Kling V3. It uses a 27-billion-parameter architecture to create professional-grade videos in four modes: Text-to-Video (T2V), Image-to-Video (I2V), Reference-to-Video (R2V), and Video Editing. With features like "Thinking Mode", HEX color matching, and native audio synchronization, it simplifies video production for marketing, e-commerce, and media teams.
Key details:
- Resolutions: 720p ($0.0664/sec) and 1080p ($0.1096/sec)
- Durations: 2–15 seconds
- Aspect Ratios: 16:9, 9:16, 1:1, 4:3, 3:4
- Modes:
  - T2V: Generate videos from text prompts.
  - I2V: Animate static images.
  - R2V: Maintain style across references.
  - Video Editing: Modify clips with natural language.
Wan 2.7 is accessible via APIMart with a pay-as-you-go model and commercial usage rights under the Apache 2.0 license. While it has limitations, like a 15-second cap and 1080p max resolution, it offers flexibility and precision for short-form video production.
Core Features and Capabilities of Wan 2.7
Multimodal Generation Modes
Wan 2.7 offers four distinct generation modes:
- Text-to-Video (T2V): This mode creates 720p or 1080p video clips directly from written prompts.
- Image-to-Video (I2V): This mode animates static images, using First and Last Frame Control (FLF2V) to fix the opening and closing frames and keep the transition between them smooth.
- Reference-to-Video (R2V): This mode maintains a character's identity, voice, and visual style across up to five references, such as images, audio clips, or video snippets, without requiring fine-tuning [2].
- Video Editing: Accepts natural language instructions to modify existing footage, enabling changes like altering the color of a jacket or applying global style adjustments to an entire clip.
Additionally, the Video Continuation feature extends 2–10 second clips into longer sequences while preserving consistent visuals.
These modes are enhanced by advanced controls that elevate visual quality, making them ideal for professional use.
Visual Quality and Advanced Controls
Wan 2.7 employs a Diffusion Transformer with Flow Matching and full spatio-temporal attention, allowing it to process both space and time simultaneously. This approach minimizes artifacts and ensures realistic three-dimensional movement, avoiding issues like object distortion or morphing between frames.
Key controls include:
- Thinking Mode: Pre-plans scene composition, lighting, and camera movements to handle complex prompts with multiple characters or intricate spatial layouts while reducing artifacts.
- Prompt Expansion: Automatically enriches short prompts with cinematographic details, such as lighting conditions, depth cues, and cinematic control, before the generation process begins.
- Seed Value: Saving a seed value from a successful generation lets users replicate the same visual style across multiple outputs, ensuring consistency.
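The seed workflow above can be sketched as a small settings object. This is a minimal illustration, not the real API schema: the class name `GenerationSettings` and the field names `seed`, `thinking_mode`, and `prompt_expansion` are assumptions chosen to mirror the controls described in this section.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the controls described above.

    Field names are illustrative only; check the provider's API
    reference for the actual parameter names.
    """
    prompt: str
    seed: Optional[int] = None       # saved from a successful generation
    thinking_mode: bool = False      # assumed on/off toggle
    prompt_expansion: bool = True    # assumed on/off toggle


def vary_prompt(base: GenerationSettings, new_prompt: str) -> GenerationSettings:
    """Return a copy with a new prompt but the same seed and toggles."""
    return replace(base, prompt=new_prompt)


base = GenerationSettings(
    prompt="A red jacket on a mannequin, slow orbit shot",
    seed=1234,
    thinking_mode=True,
)
variant = vary_prompt(base, "A navy jacket on a mannequin, slow orbit shot")
print(asdict(variant))
```

Keeping the settings immutable makes it harder to change the seed by accident while iterating on the prompt text.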
These tools are tailored for U.S. marketing, e-commerce, and media teams looking to scale professional-grade video production efficiently.
Supported Resolutions, Durations, and Aspect Ratios
Wan 2.7 supports video outputs in 720p and native 1080p across all modes. Clip durations range from 2 to 15 seconds, offering flexibility for various use cases, from short social media ads to pre-visualization sequences. While image generation supports up to 4K resolution, video outputs remain capped at 1080p [2][5].
The platform natively supports five aspect ratios, each optimized for specific use cases:
| Aspect Ratio | Best For | Primary Platforms |
|---|---|---|
| 16:9 | Cinematic storytelling, film pre-visualization | YouTube, presentations, TV |
| 9:16 | Social ads, influencer content | TikTok, Instagram Reels, YouTube Shorts |
| 1:1 | Product showcases, brand awareness | Instagram Feed, square social ads |
| 4:3 / 3:4 | Traditional media, tablet content | Legacy formats, e-commerce listings |
For synchronous API calls, 5–10 second clips are the most reliable choice, since generating a 15-second 1080p video can take over 10 minutes [2][4]. To manage costs, a practical strategy is to create early drafts in 720p, which costs about 39% less than 1080p at the per-second rates listed above ($0.0664 vs. $0.1096), and reserve 1080p for final outputs. Pricing models and access options are explored in the next section.
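The limits in this section are easy to check before a request is sent. The sketch below uses only the values stated above (two resolutions, 2–15 seconds, five aspect ratios). The function name `validate_clip` and the message wording are hypothetical, and a real API may enforce further mode-specific rules.

```python
from typing import List

ALLOWED_RESOLUTIONS = ("720p", "1080p")
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4")
MIN_SECONDS, MAX_SECONDS = 2, 15


def validate_clip(resolution: str, duration_s: float, aspect_ratio: str) -> List[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    problems: List[str] = []
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        problems.append(
            f"duration {duration_s}s is outside {MIN_SECONDS}-{MAX_SECONDS}s"
        )
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    return problems


print(validate_clip("1080p", 10, "9:16"))  # no problems expected
print(validate_clip("4k", 20, "21:9"))     # three problems expected
```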
Wan 2.7 Pricing and Access Options
Direct API Pricing
Wan 2.7 operates on a per-second, pay-as-you-go model. There are no subscriptions, seat fees, or minimum usage requirements, making it easier to manage costs and scale production as needed.
Your final cost depends on three key factors: the resolution (720p vs. 1080p), the clip's duration (ranging from 2 to 15 seconds), and the generation mode. Both Standard Text-to-Video and Image-to-Video are billed at the same rate, while Reference-to-Video is higher due to its ability to process up to five mixed reference files. At the APIMart rates below, 1080p costs about 1.65 times as much as 720p.
| Platform | Mode / Resolution | Price |
|---|---|---|
| APIMart | 720p (all modes) | $0.0664 / sec [6] |
| APIMart | 1080p (all modes) | $0.1096 / sec [6] |
For video editing tasks, the cost is calculated based on the combined duration of both the input and output footage [6]. This transparent pricing approach makes it easier to plan and budget for your projects.
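As a worked example of the arithmetic, the sketch below multiplies the per-second rates from the table by the clip duration, and for editing it bills the combined input and output duration as described above. The function names are hypothetical, and rounding to four decimals is an assumption made only for readability.

```python
# Per-second rates taken from the pricing table above (USD).
RATES_USD_PER_SEC = {"720p": 0.0664, "1080p": 0.1096}


def generation_cost(resolution: str, duration_s: float) -> float:
    """Cost of one generated clip: rate times duration."""
    return round(RATES_USD_PER_SEC[resolution] * duration_s, 4)


def editing_cost(resolution: str, input_s: float, output_s: float) -> float:
    """Editing is billed on input plus output duration."""
    return generation_cost(resolution, input_s + output_s)


draft = generation_cost("720p", 10)    # 10 s draft at 720p
final = generation_cost("1080p", 10)   # 10 s final at 1080p
edit = editing_cost("1080p", 5, 5)     # 5 s in, 5 s out at 1080p

print(f"draft ${draft:.3f}, final ${final:.3f}, edit ${edit:.3f}")
print(f"1080p / 720p ratio: {final / draft:.2f}")
```

At these rates a 10-second clip costs $0.664 at 720p and $1.096 at 1080p, so drafting at the lower resolution saves about 39% per iteration.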
Free Tiers and Hosted Platforms
Although Alibaba doesn't offer a centralized free tier for Wan 2.7, developers can still conduct brief tests to fine-tune their prompts and parameters before scaling up. APIMart's flexible pay-as-you-go model allows you to start with lower-cost outputs or explore alternatives like MiniMax Hailuo 2.3 as your workflow develops.
Accessing Wan 2.7 Through APIMart

APIMart simplifies the process of using Wan 2.7 by offering unified access through a single API key and consolidated billing. The system automatically selects the appropriate mode based on your input parameters, and since text-to-video and image-to-video are billed at the same rate, tracking costs becomes straightforward.
In addition, APIMart provides a 99.9% service level agreement, ensuring reliability for teams managing production pipelines [6].
"As a developer, I value stability and speed. WAN 2.7 on APIMart delivers great performance with an easy-to-use API." - David Chen, Full-Stack Engineer [6]
With pricing set at $0.0664 per second for 720p and $0.1096 per second for 1080p, APIMart offers a scalable solution with predictable costs for developers and production teams alike.
Business Use Cases and Workflow Integration
Marketing and Advertising
Wan 2.7's four generation modes make it a game-changer for marketing teams looking to create video content quickly and effectively. Take the Video Editing mode, for example. It makes tasks like A/B testing much easier: instruct it to "change the jacket from red to navy" and you get a revised clip to test without reshooting or regenerating the whole scene. This fast-paced iteration is well suited to fine-tuning creative elements in paid social campaigns.
For global campaigns, Wan 2.7 shines with its 12-language text rendering and localized voice cloning. These features let you adapt a single visual asset for multiple regions, saving time and resources while maintaining a consistent message. On top of that, the tool ensures exact HEX color code control, so every visual aligns perfectly with your brand's style guide.
"WAN 2.7 dramatically cut our short-form video turnaround. Cinematic camera moves and stable character consistency make our brand stand out on social." - Sarah Kim, Content Creator [6]
The platform also simplifies visual content creation for e-commerce applications, making it a versatile tool for marketing professionals.
E-Commerce and Product Visualization
For online retailers, Wan 2.7 offers tools that simplify product presentation. A standout feature is the 9-grid Image-to-Video tool, which transforms a 3×3 grid of product photos into a seamless video sequence. This is a huge time-saver for managing catalogs with large numbers of SKUs.
Another powerful feature is First and Last Frame Control (FLF2V), which lets you define exactly where a shot begins and ends. This precision is perfect for product reveals or smooth 360° rotations. Combine this with the Reference-to-Video (R2V) mode, and you can lock in a product's visual identity across up to five mixed references. This ensures a consistent look across an entire product line without the need for tedious manual adjustments.
Entertainment and Media Production
Wan 2.7 also offers exciting possibilities for entertainment and media production, particularly in ensuring consistent character portrayal and simplifying previsualization workflows.
Independent animators and studio teams can use the R2V mode to lock in a character's appearance, voice, and camera style across multiple clips. This eliminates the need for costly, per-subject fine-tuning, making it perfect for short-form narratives where consistent character portrayal is key.
"The consistency of WAN 2.7 is amazing! Character images remain stable across multiple clips, which was previously hard to achieve." - Wei Zhang, Independent Animator [6]
For previsualization, the Text-to-Video mode with Prompt Expansion brings rough scene descriptions to life. It creates fully realized storyboards with professional transitions and dynamic camera movements, such as FPV drone dives or orbital shots. Outputs are available in MP4, WEBM, and MOV formats, ensuring compatibility with popular editing software and web platforms [7].
Limitations, Risks, and Best Practices
Technical and Content Limitations
Wan 2.7 comes with a few constraints that can influence how you design your workflows. One of the most notable is the clip duration limit: videos max out at 15 seconds, and in Reference-to-Video mode, the limit drops further to 10 seconds [1][7]. Additionally, video resolution is capped at 1080p, unlike the Wan2.7-Image-Pro model, which supports higher-resolution still images [8].
Generating a 15-second 1080p video can take more than 10 minutes, which risks timeouts during synchronous API calls.
"15-second 1080P videos can exceed 10 minutes of generation time. I hit timeouts in my test run on that specific combination." - Segmind Review [4]
To avoid these issues, stick to 5–10 second clips for improved stability. For early drafts or experimental prompts, consider using 720p resolution: it reduces generation costs by around 33% compared to 1080p according to [2], or by about 39% at the APIMart rates quoted earlier. Reserve 1080p for your final outputs. For footage longer than 15 seconds, use Video Continuation mode to chain shorter clips, rather than attempting to stretch a single generation. Be aware that the model struggles with simulating complex physics, such as water, cloth dynamics, and multi-object collisions, often producing inconsistent results [9].
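Chaining shorter clips starts with deciding how long each segment should be. The sketch below only plans segment lengths. It assumes whole-second durations and a default cap of 10 seconds per segment (the stability range recommended above), and it does not show how Video Continuation is actually invoked, since that depends on the provider's API. The function name `plan_segments` is hypothetical.

```python
from typing import List


def plan_segments(total_s: int, max_len_s: int = 10, min_len_s: int = 2) -> List[int]:
    """Split a target duration into near-equal segments within the limits."""
    if total_s < min_len_s:
        raise ValueError("total duration is shorter than the minimum clip length")
    if max_len_s < 2 * min_len_s:
        raise ValueError("max_len_s must be at least twice min_len_s")
    count = -(-total_s // max_len_s)          # integer ceiling division
    base, extra = divmod(total_s, count)
    # The first `extra` segments get one extra second each.
    return [base + 1] * extra + [base] * (count - extra)


print(plan_segments(45))  # [9, 9, 9, 9, 9]
print(plan_segments(11))  # [6, 5]
```

Near-equal segments avoid ending a sequence on a very short final clip, which would otherwise fall below the 2-second minimum in some cases.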
Legal and Ethical Considerations
Beyond the technical challenges, legal and ethical factors play a significant role when using Wan 2.7.
The model is distributed under the Apache 2.0 license, allowing U.S. businesses to use it commercially, self-host, and fine-tune without paying royalties [3][9]. Outputs generated through professional API platforms come with commercial usage rights, simplifying their use in publishing or advertising [3][6].
However, the Reference-to-Video (R2V) feature introduces potential risks. Since it can replicate a person’s face and voice from just one image and audio sample, you must ensure you have explicit legal rights to any likeness or voice used. Using someone’s image or voice without proper consent - even for internal testing - could violate right-of-publicity laws in many U.S. states. For teams working with the open-source version, there’s no built-in content filter, so it’s your responsibility to review outputs before they are shared publicly [9]. These precautions are especially important for businesses looking to integrate AI-generated content into commercial campaigns.
Tips for Getting the Most Out of Wan 2.7
To navigate these challenges and maximize the model’s potential, consider the following tips:
- Organized prompts lead to better results. Structure them by specifying key elements like Subject, Action, Camera Cue, Environment, and Mood. Use specific instructions (e.g., "change the background to a white studio") to refine outputs without regenerating everything, which saves both time and cost [2][3][4].
- Save the seed value from any successful generation. This allows you to tweak prompts later without losing the quality of your original result [2][4].
- For multi-reference projects, keep the number of reference images to three or fewer. While the API supports up to five, quality tends to drop noticeably beyond three [9].
- Limit batch API calls to 3–4 at a time to avoid hitting rate limits [4].
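Several of these tips can be applied mechanically. The sketch below assembles a prompt from the five elements named above, trims a reference list to three items, and groups jobs into batches of three. The labeled "Subject: ...; Action: ..." layout is an assumption made for illustration, not a format the model is known to require, and all function names are hypothetical.

```python
from typing import Iterator, List


def build_prompt(subject: str, action: str, camera_cue: str,
                 environment: str, mood: str) -> str:
    """Join the five prompt elements into one labeled string."""
    parts = [
        ("Subject", subject), ("Action", action), ("Camera", camera_cue),
        ("Environment", environment), ("Mood", mood),
    ]
    return "; ".join(f"{label}: {text.strip()}" for label, text in parts if text.strip())


def cap_references(refs: List[str], limit: int = 3) -> List[str]:
    """Keep at most `limit` references, since quality reportedly drops beyond three."""
    return refs[:limit]


def batched(items: List[str], size: int = 3) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


prompt = build_prompt("a ceramic mug", "rotates slowly", "macro orbit shot",
                      "white studio", "calm")
print(prompt)
for group in batched([f"job-{n}" for n in range(7)]):
    print(group)  # submit each group, then wait before sending the next
```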
Conclusion
Wan 2.7 brings together text-to-video, image-to-video, reference-to-video, and natural-language editing into one streamlined production system. By consolidating these capabilities, it simplifies workflows and speeds up content creation for U.S. businesses. The result? Fewer tools, reduced overhead, and quicker delivery of everything from social media ads to product demonstrations.
This system strikes a balance between creative control and affordability, delivering professional-grade precision at a fraction of the usual expense. Features like First and Last Frame Control, HEX-based color matching, and Thinking Mode empower teams with director-level control over their projects. On top of that, APIMart's transparent pay-as-you-go pricing - $0.0664/sec for 720p and $0.1096/sec for 1080p - is already 20% below standard rates, making it cost-effective whether you're producing a few clips or managing large-scale campaigns [6].
With an Apache 2.0 license, guaranteed commercial usage rights, and a 99.9% SLA, Wan 2.7 ensures dependable, flexible performance. The platform does have a learning curve, and it rewards users who craft precise, structured prompts, but it opens the door to significant creative possibilities.
For U.S. businesses looking to integrate AI-driven video production into their workflows, Wan 2.7, available through APIMart, is a practical and economical choice.
FAQs
How much does a typical Wan 2.7 video cost?
Wan 2.7 uses straightforward per-second pricing, with no subscriptions or credit bundles. The cost depends on the resolution, the mode, and the API provider. For instance:
- 720p videos typically range from $0.10 to $0.13 per second across providers; APIMart lists $0.0664 per second.
- 1080p videos typically range from $0.15 to $0.195 per second across providers; APIMart lists $0.1096 per second.
The final price is the video's duration multiplied by the per-second rate. A 5-second 720p video would therefore cost approximately $0.50 to $0.65 at typical rates, or about $0.33 at the APIMart rate.
How do I keep the same character and style across multiple clips?
To maintain a consistent character and style in Wan 2.7, take advantage of its advanced multi-modal reference tools. You can upload up to five mixed references - such as images, video, or audio - to define key elements like facial structure, voice, and overall style. For more intricate requirements, consider uploading a 3x3 grid of reference images, which ensures consistency across multiple angles. Additionally, use the first and last frame control feature to keep the subject's placement and motion paths steady throughout clips.
What should I do if my 1080p generation times out?
If your 1080p generation process times out, you can rely on the asynchronous polling or callback delivery methods offered by the APIMart API. These methods are designed to efficiently manage the extended processing time needed for high-resolution outputs. With these workflows, you can submit your request and retrieve the result once it's complete - no need to maintain an open connection while waiting.
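A polling loop for this workflow might look like the sketch below. It deliberately takes the status check as a callable, because the actual endpoint, response fields, and state names are not given here. The `"state"` key and the values `"succeeded"` and `"failed"` are assumptions that need to be replaced with whatever the provider's API returns.

```python
import time
from typing import Callable, Dict


def wait_for_result(get_status: Callable[[], Dict],
                    timeout_s: float = 900.0,
                    interval_s: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll `get_status` until the job finishes, fails, or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        state = status.get("state")          # hypothetical field name
        if state == "succeeded":
            return status
        if state == "failed":
            raise RuntimeError(f"generation failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError("generation did not finish before the timeout")
        sleep(interval_s)


# Demo with a fake status source standing in for a real API call.
fake_responses = iter([
    {"state": "running"},
    {"state": "running"},
    {"state": "succeeded", "video_url": "https://example.com/clip.mp4"},
])
result = wait_for_result(lambda: next(fake_responses), sleep=lambda _: None)
print(result["video_url"])
```

The default 15-minute timeout leaves headroom over the 10-plus minutes reported for 15-second 1080p clips. A callback-based setup would remove the loop entirely, at the cost of running a publicly reachable endpoint.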