On August 26th, Google DeepMind announced the all-new Gemini 2.5 Flash Image model (internal nickname “nano-banana”). This model not only inherits the “speed” and “efficiency” advantages of its predecessor, Gemini 2.0 Flash, but also achieves a qualitative leap in image editing and generation capabilities. It simplifies complex image processing work, allowing users to achieve precise and efficient image creation and editing with just natural language instructions.
Currently, this model is available for developer preview on Google AI Studio, the Gemini API, and Vertex AI, and is expected to officially transition to a stable version in the coming weeks. Like all of Google’s AI image models, all images generated or edited by Gemini 2.5 Flash Image will have an invisible SynthID digital watermark, ensuring that their AI-created identity can be recognized.
Gemini 2.5 Flash Image Core Functions
With its powerful technical foundation, Gemini 2.5 Flash Image brings four revolutionary core functions that completely change our traditional perception of AI image models.
1. Role Consistency: Bidding Farewell to the “Multifaceted” Problem
In the past, a major pain point of AI image models was the difficulty in maintaining a consistent appearance for the same character across multiple different scenes. This meant that creating the same protagonist for a comic series or shooting multi-angle images for e-commerce products required repeated adjustments to the prompt, and the results were difficult to guarantee.
Gemini 2.5 Flash Image perfectly solves this problem. It can accurately understand and remember the appearance characteristics of a character or object and seamlessly integrate them into new images with different poses, backgrounds, or lighting. Whether it’s shooting a series of product photos for a brand or creating a complete character story, this feature can greatly improve creative efficiency and content consistency.
2. Precise Text Editing: One Sentence for Complex Retouching
Have you ever dreamed of making precise, localized modifications to an image with just a simple description? Now, this dream has become a reality. Gemini 2.5 Flash Image allows users to perform precise, localized editing through natural language. For example, you can ask the model to “remove the stain on the T-shirt,” “blur the photo background,” “colorize a black-and-white photo,” or even “change a person’s pose.” This capability greatly simplifies the tedious manual selection and layer operations in traditional photo editing software, allowing both designers and ordinary users to easily become “retouching masters.”
3. Multi-Image Fusion: Merging Creativity to Generate New Scenes with One Click
Gemini 2.5 Flash Image can understand and combine multiple input images to create new, logical scenes. Users can upload up to three images at a time and then use a prompt to merge them together. For example, you can combine a product image with an interior photo to quickly generate a realistic product display image; or you can merge a person’s portrait with another scene to create a new creative composite photo. This capability brings unlimited possibilities for graphic design, product displays, and creative content generation.
4. Real-World Reasoning: AI’s “World Knowledge” Empowers Image Creation
Traditional image generation models are often only good at aesthetics but lack a deep understanding of the real world. However, Gemini 2.5 Flash Image inherits the powerful “world knowledge” of the Gemini series models, allowing it to perform simple causal reasoning. For example, it can understand a hand-drawn sketch and transform it into a precise instructional diagram, and it can even simulate a simple physical process like “a balloon flying toward a cactus” and generate the resulting image of “the balloon being popped.” This capability makes the AI image model not just an aesthetic tool, but a smart assistant with practical functions and logical reasoning abilities.
Gemini 2.5 Flash Image Pricing and Usage
Gemini 2.5 Flash Image continues the path of “low latency and low cost.” A preview version is now available to developers through the three major platforms: the Gemini API, Google AI Studio, and Vertex AI, with a stable version expected to be released in the coming weeks. It’s worth noting that OpenRouter.ai and fal.ai are among the first partners to introduce this model into their ecosystem of over 3 million developers.
The pricing continues the high-value characteristics of the Gemini series:
- Output Fee: $30 per million tokens
- Single Image: Approximately 1,290 tokens (about $0.039)
- Input Billing: Follows Gemini 2.5 Flash standards
How to use Google Gemini 2.5 Flash Image?
For developers and content creators, getting started with Gemini 2.5 Flash Image is very simple.
- For General Users:If you just want to get a quick taste, you can go directly to Google AI Studio, switch to the “Flash” model, and select a preset template for testing, or simply enter your own prompt.
- For Developers:You can use the official Python sample code to call the gemini-2.5-flash-image-preview model for development. In addition, Google has partnered with well-known developer platforms OpenRouter.ai and fal.ai, allowing over 3 million developers worldwide to conveniently access and use the model through these platforms.
Conclusion
In summary, the release of Gemini 2.5 Flash Image marks the transition of AI image generation technology from a stage of “wild imagination” to a stage of “precise and controllable” editing. It not only lowers the barrier to image creation but also brings disruptive changes to various fields such as commercial applications, creative design, and education and research. In the future, as the model continues to iterate, we have reason to believe that AI will become the most powerful “co-pilot” for human creativity.