    Interesting Engineering

    Flawed to flawless: New diffusion model ends AI image generation issues

    By Kapil Kajal

    4 hours ago


    Generative artificial intelligence (AI) has historically struggled to produce consistent images, often mangling details such as fingers and facial symmetry.

    These models can also fail when asked to generate images at sizes and resolutions other than those they were trained on.

    Rice University computer scientists have developed a new method for generating images using pre-trained diffusion models to curb such issues.

    These models are a type of generative AI that learns by adding layer after layer of random noise to the images it is trained on, then generates new images by reversing that process and removing the noise.
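
    To make that concrete, here is a minimal, self-contained NumPy sketch of the standard denoising-diffusion recipe. It is not the Rice team's code: the forward step buries a toy 8x8 "image" in Gaussian noise in a single jump, and the reverse loop strips predicted noise step by step. A trained network would supply predict_noise; the zero-returning stand-in here is purely hypothetical so the loop runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 8x8 "image" standing in for real training data.
x0 = rng.standard_normal((8, 8))

# Noise schedule: alpha_bar[t] shrinks toward 0 as t grows, so x_t
# approaches pure noise at the final step.
T = 50
alpha_bar = np.cumprod(np.linspace(0.999, 0.95, T))

def add_noise(x0, t):
    """Forward process q(x_t | x_0) in a single jump."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps, eps

def predict_noise(x_t, t):
    """Hypothetical stand-in for the trained denoising network."""
    return np.zeros_like(x_t)

# Reverse process: start from pure noise and iteratively remove the
# noise the network predicts (deterministic DDIM-style update).
x = rng.standard_normal((8, 8))
for t in reversed(range(T)):
    eps_hat = predict_noise(x, t)
    x0_hat = (x - np.sqrt(1 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])
    a_prev = alpha_bar[t - 1] if t > 0 else 1.0
    x = np.sqrt(a_prev) * x0_hat + np.sqrt(1 - a_prev) * eps_hat
```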

    ElasticDiffusion

    Moayed Haji Ali, a doctoral student in computer science at Rice University, presented the new approach called ElasticDiffusion in a peer-reviewed paper at the 2024 Institute of Electrical and Electronics Engineers (IEEE) Conference on Computer Vision and Pattern Recognition (CVPR) in Seattle.

    “Diffusion models like Stable Diffusion, Midjourney, and DALL-E create impressive results, generating fairly lifelike and photorealistic images,” Haji Ali said.

    “But they have a weakness: They can only generate square images. So, in cases where you have different aspect ratios, like on a monitor or a smartwatch … that’s where these models become problematic.”

    If you instruct a model like Stable Diffusion to generate a non-square image, such as one with a 16:9 aspect ratio, the elements used to construct the resulting image may become repetitive.

    That repetition manifests as deformities in the image or its subjects, such as people with six fingers or a strangely elongated car. The way these models are trained also contributes to the problem.

    “If you train the model on only images that are a certain resolution, they can only generate images with that resolution,” said Vicente Ordóñez-Román, an associate professor of computer science who advised Haji Ali on his work alongside Guha Balakrishnan, assistant professor of electrical and computer engineering.

    Overfitting

    Ordóñez-Román explained that overfitting, where a model becomes too specialized to its training data, is a common problem in AI.

    “You could solve that by training the model on a wider variety of images, but it’s expensive and requires massive amounts of computing power: hundreds, maybe even thousands of graphics processing units,” Ordóñez-Román said.

    According to Haji Ali, the noise signal that diffusion models work with can be decomposed into two types of information: local and global.

    The local signal contains detailed pixel-level information, such as the shape of an eye or the texture of a dog’s fur, while the global signal captures the image’s overall outline.

    “One reason diffusion models struggle with non-square aspect ratios is that they usually package local and global information together,” said Haji Ali, who worked on synthesizing motion in AI-generated videos before joining Ordóñez-Román’s research group at Rice for his Ph.D. studies.

    “When the model tries to duplicate that data to account for the extra space in a non-square image, it results in visual imperfections.”
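
    For context, standard samplers fuse those two kinds of information at every denoising step through classifier-free guidance. The sketch below shows that packaging in generic form; eps_uncond and eps_cond are hypothetical stand-ins for one pretrained network evaluated without and with the text prompt, not an API from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for one denoising network evaluated without
# and with the text prompt; a real model would return a noise
# prediction the same shape as its input latent.
def eps_uncond(x_t):
    return rng.standard_normal(x_t.shape)

def eps_cond(x_t, prompt):
    return rng.standard_normal(x_t.shape)

def cfg_step(x_t, prompt, w=7.5):
    """Classifier-free guidance: the unconditional score (local detail)
    and the conditional-minus-unconditional difference (global,
    prompt-driven content) travel together in one prediction."""
    e_u = eps_uncond(x_t)
    e_c = eps_cond(x_t, prompt)
    return e_u + w * (e_c - e_u)

x_t = rng.standard_normal((64, 64))
eps_hat = cfg_step(x_t, "a dog running on a beach", w=7.5)
```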

    Different approach

    The ElasticDiffusion method described in Haji Ali’s paper takes a different approach to generating images.

    Instead of combining both signals, ElasticDiffusion separates the local and global signals into conditional and unconditional generation paths.

    Taking the difference between the conditional and unconditional predictions yields a score that captures the image’s overall, global information.

    The unconditional path, which carries the local pixel-level detail, is then applied to the image in quadrants, filling in the details one square at a time.

    Global information, such as the image aspect ratio and the content of the image (e.g., a dog, a person running, etc.), remains separate. This ensures that the AI does not confuse the signals and repeat data.
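
    Put together, the decoupling might look roughly like the sketch below: the global signal is computed once on a square, training-resolution view and stretched to the target aspect ratio, while the unconditional, local signal is computed patch by patch. This is only a schematic illustration of the idea as described above, with hypothetical stand-in functions; the actual ElasticDiffusion algorithm involves considerably more machinery.

```python
import numpy as np

rng = np.random.default_rng(2)

TRAIN = 64                      # square resolution the model was trained on
H, W = 64, 112                  # non-square target (roughly 16:9)

# Hypothetical score functions standing in for a pretrained model.
def eps_uncond(x):              # local detail; sees one patch at a time
    return rng.standard_normal(x.shape)

def eps_cond(x, prompt):        # prompt-conditioned prediction
    return rng.standard_normal(x.shape)

def nearest_resize(a, h, w):
    """Nearest-neighbor resize, enough for this illustration."""
    rows = np.arange(h) * a.shape[0] // h
    cols = np.arange(w) * a.shape[1] // w
    return a[np.ix_(rows, cols)]

def elastic_style_step(x_t, prompt, w=7.5):
    # Global signal: computed once on a square, training-size view,
    # then stretched to the target aspect ratio.
    square = nearest_resize(x_t, TRAIN, TRAIN)
    global_sig = eps_cond(square, prompt) - eps_uncond(square)
    global_sig = nearest_resize(global_sig, H, W)

    # Local signal: unconditional prediction applied patch by patch
    # ("in quadrants"), so no patch ever exceeds the training size.
    local_sig = np.empty_like(x_t)
    for i in range(0, H, TRAIN):
        for j in range(0, W, TRAIN):
            patch = x_t[i:i+TRAIN, j:j+TRAIN]
            local_sig[i:i+TRAIN, j:j+TRAIN] = eps_uncond(patch)

    return local_sig + w * global_sig   # recombine without duplication

x_t = rng.standard_normal((H, W))
eps_hat = elastic_style_step(x_t, "a dog running on a beach", w=7.5)
```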

    The result is a clearer image, at any aspect ratio, without any additional training.

    The only drawback to ElasticDiffusion relative to other diffusion models is time: currently, Haji Ali’s method takes six to nine times as long to make an image.

    The goal is to reduce that to the same inference time as other models like Stable Diffusion or DALL-E.
