NVIDIA ChronoEdit: Image Editing with Video Generation

NVIDIA has released ChronoEdit, which treats image editing as video generation. Instead of going straight from the image plus the prompt to the final image, it generates a short, low frame rate video that follows the instructions and ends on the edited result. For example, if you ask it to insert a big metal ball in the center of the room, it’ll have the ball roll into the scene and come to rest there, so the edit ends up more physically consistent.
The main drawback is that the actual editing quality can be iffy. You’ll get some warping and mushiness in the fine details, especially on people’s faces, so it falls behind other image editing models there. Prompt adherence is also hit or miss, even for a task that should be pretty simple like moving the camera. Sometimes it just returns an image that’s almost identical to the input but with mushier details, instead of actually following the prompt. It can be pretty finicky, so check that each prompt really produces the result you’re looking for.
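Since the model sometimes hands back a near copy of the input, a quick automated check can flag those no-op results before you look at them by hand. Here’s a minimal sketch in plain Python, not anything from NVIDIA or Fal.ai: the helper names `mean_abs_diff` and `looks_unchanged` and the threshold of 4.0 are assumptions for illustration, and images are assumed to be already decoded into rows of (R, G, B) tuples of the same size.

```python
from typing import Sequence

Pixel = Sequence[int]               # one (R, G, B) pixel, channel values 0-255
Image = Sequence[Sequence[Pixel]]   # rows of pixels


def mean_abs_diff(before: Image, after: Image) -> float:
    """Average absolute per-channel difference between two same-sized images."""
    if len(before) != len(after):
        raise ValueError("images must have the same height")
    total = 0
    count = 0
    for row_a, row_b in zip(before, after):
        if len(row_a) != len(row_b):
            raise ValueError("images must have the same width")
        for px_a, px_b in zip(row_a, row_b):
            for ch_a, ch_b in zip(px_a, px_b):
                total += abs(ch_a - ch_b)
                count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def looks_unchanged(before: Image, after: Image, threshold: float = 4.0) -> bool:
    """True when the edit barely differs from the input.

    The threshold is a hypothetical starting point; tune it on your own images.
    """
    return mean_abs_diff(before, after) < threshold


# Tiny 4x4 example: a flat gray "room", a slightly mushier copy,
# and a version with a bright 2x2 "ball" in the middle.
room = [[(120, 120, 120)] * 4 for _ in range(4)]
mushy = [[(122, 119, 121)] * 4 for _ in range(4)]
edited = [list(row) for row in room]
for y in (1, 2):
    for x in (1, 2):
        edited[y][x] = (230, 230, 240)

print(looks_unchanged(room, mushy))   # True: only small detail changes
print(looks_unchanged(room, edited))  # False: the ball changed the image
```

A global average like this will miss small, local edits in a large image, so treat it as a rough filter for obvious no-ops rather than a real quality measure.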
On the plus side, it’s super cheap, around one cent per image, which is pretty crazy given that it’s doing video generation in the background. It’s available on Fal.ai right now. This isn’t the first time we’ve seen the idea; a paper on video-based image editing came up on AI Search’s YouTube channel a few months ago. But this is the first one that’s available on providers like Fal.ai and has decent quality, depending on what you’re doing. I’d recommend running the results through an upscaler and not using this where every detail needs to match the original exactly. For AI-generated media, though, this is good.
https://research.nvidia.com/labs/toronto-ai/chronoedit/
https://fal.ai/models/fal-ai/chrono-edit/playground