‘Indians among most avid users’: Team behind ChatGPT Images 2.0 on multilingual AI image generation

India is playing a growing role in shaping how AI image generation models are developed, with OpenAI’s ChatGPT Images 2.0 now capable of generating everything from Manga-style panels in Hindi to more realistic depictions of crowded and chaotic Indian streets.

Earlier this week, OpenAI CEO Sam Altman said that Indian users have generated more than one billion visuals using Images 2.0 since its release in April 2026. The milestone comes a year after OpenAI first introduced the ‘Images for ChatGPT’ feature that kicked off the viral Studio Ghibli-style AI images trend.

However, OpenAI is also reportedly undergoing a broader strategic reset, pulling the plug on experimental side projects while redirecting talent and computing resources toward enterprise products. In a surprise move, the company shut down Sora, its popular AI video-generation tool, just six months after releasing it to the public.

In this context, The Indian Express sat down with members of the San Francisco-based team that built Images 2.0 to understand how exactly the latest model is a step change from previous versions and, more importantly, how it was iterated for multilingual, culturally diverse markets like India – an approach that seems to be paying off in terms of adoption and user engagement.

“Previously, most of our work, including model evaluations, were done in English. Our models also struggled with a lot of details, especially in Asian languages. In Chinese, Japanese, Korean, Hindi and others, there are thousands of characters compared to just 26 letters in English,” Boyuan Chen, a research scientist at OpenAI, said.

“However, this time, we spent a lot of time making sure cultures from around the world were covered in our internal iteration process. Whenever we saw that a language was not performing well, we added a lot more data to ensure broader cultural and linguistic coverage,” Chen explained.

With ChatGPT Images 2.0, OpenAI said it has achieved significant gains in non-Latin text rendering, particularly in Japanese, Korean, Chinese, Hindi, and Bengali. The model’s multilingual understanding is said to go beyond simple translation, with language embedded directly in visual outputs such as posters, comics and diagrams.

Abhi Muchhal, a product manager at OpenAI, offered another example of the model’s India-specific realism. “In the previous model, if you prompted it to make a city scene in India, it wouldn’t be crowded at all. While this model is not perfect, now you can see a realistic representation where there’s rickshaws moving left and right, and there’s a lot of people, there’s hustle and bustle,” he said.

Beyond multilingual capabilities, Images 2.0 has the ability to generate across a wide range of aspect ratios in much higher quality, with support for up to 2K resolution, and is said to demonstrate improved fidelity across a wide range of visual styles.

As recently as 2024, text-to-image generators like DALL-E 3 struggled to spell words accurately inside images. Because diffusion models generate images by reconstructing pixels from noise, small text elements received less attention during training. The issue became more complex with regard to outputs in different languages.

But that limitation has now largely gone the way of the infamous glitches that plagued earlier image generators.

A key breakthrough came earlier with Images 1.0, which reportedly took an autoregressive approach to generate images sequentially from left to right and top to bottom, similar to how text is written. This differed from the diffusion model technique used by most image generators like DALL-E that create the entire image at once.
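The sequential approach described above can be illustrated with a toy sketch. It only shows the generation order (left to right, top to bottom) and how each step can depend on everything produced so far. The function names and the `toy_next_token` rule are hypothetical stand-ins for a learned model, not OpenAI’s actual method.

```python
from typing import Callable, Iterator, List, Tuple


def raster_order(height: int, width: int) -> Iterator[Tuple[int, int]]:
    """Yield grid positions left to right, top to bottom, like written text."""
    for row in range(height):
        for col in range(width):
            yield row, col


def generate_autoregressive(
    height: int,
    width: int,
    next_token: Callable[[List[int]], int],
) -> List[List[int]]:
    """Fill a grid one cell at a time; each cell is chosen from the cells before it."""
    grid = [[0] * width for _ in range(height)]
    history: List[int] = []  # every token produced so far, in raster order
    for row, col in raster_order(height, width):
        token = next_token(history)  # conditioned on all previous tokens
        grid[row][col] = token
        history.append(token)
    return grid


def toy_next_token(history: List[int]) -> int:
    """Hypothetical stand-in for a trained model: a fixed rule over the history."""
    return (sum(history) + len(history)) % 4


if __name__ == "__main__":
    for line in generate_autoregressive(2, 3, toy_next_token):
        print(line)
```

A diffusion-style generator, by contrast, would start from noise across the whole grid and refine every cell together over several passes, rather than committing to one cell at a time.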

With Images 2.0, OpenAI was able to improve the model’s ability to accurately render text in different languages by applying the same advances used to improve its text-based chatbots. Declining to share details for proprietary reasons, Chen said, “It’s similar to text intelligence in ChatGPT. Depending on the prompt, it can respond robotically or more naturally and conversationally. The same idea applies here.”

He further mentioned that the key was training the model to follow instructions from users better. “With this image-generation model, we wanted it to follow the user’s intent. So we trained it on both types of data, publicly available casual data and studio-style images,” he said. “We made sure the model follows what people actually want, instead of simply outputting good-looking images,” Chen added.

Images 2.0 is also OpenAI’s first image generation model that ‘thinks through’ user prompts, as it is built on top of the company’s reasoning models. It can use the web to find relevant information and has a knowledge cut-off date of December 2025. It is also more likely to understand context than Images 1.5 was, according to Muchhal.

Stating that Indians have consistently been one of the most avid users of image generation, Muchhal said, “We were very happy to see the level of adoption in India, but more than the numbers, what surprised me most was the diversity of use cases.”

Not all of the usage trends pertained to generating photorealistic outputs, he said, pointing to the latest trend of asking ChatGPT to turn nice photos into scribbly drawings like the ones done on Microsoft Paint decades ago.

When asked whether viral AI image trends are intentionally shaped by OpenAI or driven organically by user behaviour, Muchhal affirmed that it was a combination of both: “We try to pick a representative set of use cases where we know that either the model has struggled with it in the past or areas that we want to improve, and we try to improve on those. But to be honest, a lot of the things that go viral are also unexpected to us.”

Tried my profile photo with ChatGPT Image 2.0 and it absolutely destroyed me 😂

Prompt 👇

“Redraw the attached image in the most clumsy, scribbly, and utterly pathetic way possible. Use a white background, and make it look like it was drawn in MS Paint with a mouse. It should…

— Sam Domains (@SamDZign1)

The OpenAI executives also said some of the most unexpected trends in India included AI-generated hair-colour previews, the ‘younger me’ portraits, and Y2K-style romantic portraits.

GPT IMAGE 2 on ChatGPT

Prompt:
A highly realistic split-time portrait of the same person meeting her younger self. The image is divided vertically into two perfectly aligned halves with identical camera angle and composition.

On the left side: a black-and-white photo of a…

— K (@ChillaiKalan__)

On enterprise adoption of AI image generators, Muchhal said, “In the past, the model struggled with accurately following instructions which made it very hard for users to be able to use this for a professional use case.” “But what we’ve seen now with Images 2.0 is not only the personal use cases, but there’s been overwhelming enterprise demand because now you’re able to make the creative workflow go so much faster,” he added.

Images 2.0 is also able to generate fine-grained elements, including the tiny flaws that add realism to its visuals.

Asked about the risk of photorealistic outputs used to spread misinformation, Muchhal said that OpenAI looks to strike a constant balance between creative freedom and user safety and transparency. “We have very high standards around copyright infringement, and we make sure there is no misuse in those areas. One thing we care deeply about is ensuring there is nothing deceptive or impersonating in the outputs,” he said.

ChatGPT-generated images support the open C2PA (Coalition for Content Provenance and Authenticity) standard, which adds a clear signal in the metadata that an image was generated by an AI model. A few days ago, OpenAI also announced a partnership with Google to include an invisible watermark called SynthID. But the AI-generated images do not carry a visible watermark so as not to tarnish the output, as per Muchhal.

When asked for comment on the Indian government’s rules that require social media platforms to attach a prominent label to AI-generated content, Muchhal said, “We believe the system needs to be built in collaboration with stakeholders […] We have shared a lot of what we are doing with government stakeholders, continue to incorporate their input, and are working to find the right balance between giving users control and meeting the trust and safety expectations set by governments.”
