    Interesting Engineering

    Flawed to flawless: New diffusion model ends AI image generation issues

    By Kapil Kajal

    4 hours ago


    Generative artificial intelligence (AI) has historically struggled to produce consistent images, often mangling details such as fingers and facial symmetry.

    These models can also fail when asked to generate images at sizes and resolutions other than those they were trained on.

    Rice University computer scientists have developed a new method for generating images using pre-trained diffusion models to curb such issues.

    These models are a type of generative AI that learns by adding layer after layer of random noise to the images it is trained on, then generates new images by reversing that process and removing the noise.
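
    To make that concrete, here is a minimal, self-contained NumPy sketch of the standard denoising-diffusion recipe. It is not the Rice team's code: the forward step buries a toy 8x8 "image" in Gaussian noise in a single jump, and the reverse loop strips predicted noise step by step. A trained network would supply predict_noise; the zero-returning stand-in here is purely hypothetical so the loop runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 8x8 "image" standing in for real training data.
x0 = rng.standard_normal((8, 8))

# Noise schedule: alpha_bar[t] shrinks toward 0 as t grows, so x_t
# approaches pure noise at the final step.
T = 50
alpha_bar = np.cumprod(np.linspace(0.999, 0.95, T))

def add_noise(x0, t):
    """Forward process q(x_t | x_0) in a single jump."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps, eps

def predict_noise(x_t, t):
    """Hypothetical stand-in for the trained denoising network."""
    return np.zeros_like(x_t)

# Reverse process: start from pure noise and iteratively remove the
# noise the network predicts (deterministic DDIM-style update).
x = rng.standard_normal((8, 8))
for t in reversed(range(T)):
    eps_hat = predict_noise(x, t)
    x0_hat = (x - np.sqrt(1 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])
    a_prev = alpha_bar[t - 1] if t > 0 else 1.0
    x = np.sqrt(a_prev) * x0_hat + np.sqrt(1 - a_prev) * eps_hat
```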

    ElasticDiffusion

    Moayed Haji Ali, a doctoral student in computer science at Rice University, presented the new approach called ElasticDiffusion in a peer-reviewed paper at the 2024 Institute of Electrical and Electronics Engineers (IEEE) Conference on Computer Vision and Pattern Recognition (CVPR) in Seattle.

    “Diffusion models like Stable Diffusion, Midjourney, and DALL-E create impressive results, generating fairly lifelike and photorealistic images,” Haji Ali said.

    “But they have a weakness: They can only generate square images. So, in cases where you have different aspect ratios, like on a monitor or a smartwatch … that’s where these models become problematic.”

    If you instruct a model like Stable Diffusion to generate a non-square image, such as one with a 16:9 aspect ratio, the elements used to construct the resulting image may become repetitive.

    That repetition manifests as deformities in the image or its subjects, such as people with six fingers or a strangely elongated car. The way these models are trained also contributes to the problem.

    “If you train the model on only images that are a certain resolution, they can only generate images with that resolution,” said Vicente Ordóñez-Román, an associate professor of computer science who advised Haji Ali on his work alongside Guha Balakrishnan, assistant professor of electrical and computer engineering.

    Overfitting

    Ordóñez-Román explained that overfitting, where a model becomes too specialized to its training data, is a common problem in AI.

    “You could solve that by training the model on a wider variety of images, but it’s expensive and requires massive amounts of computing power: hundreds, maybe even thousands of graphics processing units,” Ordóñez-Román said.

    According to Haji Ali, the noise signal that diffusion models work with can be decomposed into two types of information: local and global.

    The local signal contains detailed pixel-level information, such as the shape of an eye or the texture of a dog’s fur, while the global signal captures the image’s overall outline.

    “One reason diffusion models struggle with non-square aspect ratios is that they usually package local and global information together,” said Haji Ali, who worked on synthesizing motion in AI-generated videos before joining Ordóñez-Román’s research group at Rice for his Ph.D. studies.

    “When the model tries to duplicate that data to account for the extra space in a non-square image, it results in visual imperfections.”
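
    For context, standard samplers fuse those two kinds of information at every denoising step through classifier-free guidance. The sketch below shows that packaging in generic form; eps_uncond and eps_cond are hypothetical stand-ins for one pretrained network evaluated without and with the text prompt, not an API from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for one denoising network evaluated without
# and with the text prompt; a real model would return a noise
# prediction the same shape as its input latent.
def eps_uncond(x_t):
    return rng.standard_normal(x_t.shape)

def eps_cond(x_t, prompt):
    return rng.standard_normal(x_t.shape)

def cfg_step(x_t, prompt, w=7.5):
    """Classifier-free guidance: the unconditional score (local detail)
    and the conditional-minus-unconditional difference (global,
    prompt-driven content) travel together in one prediction."""
    e_u = eps_uncond(x_t)
    e_c = eps_cond(x_t, prompt)
    return e_u + w * (e_c - e_u)

x_t = rng.standard_normal((64, 64))
eps_hat = cfg_step(x_t, "a dog running on a beach", w=7.5)
```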

    Different approach

    The ElasticDiffusion method described in Haji Ali’s paper takes a different approach to generating images.

    Instead of combining both signals, ElasticDiffusion separates the local and global signals into conditional and unconditional generation paths.

    Taking the difference between the conditional and unconditional predictions yields a score that captures the image’s overall, global information.

    The unconditional path, which carries the local pixel-level detail, is then applied to the image in quadrants, filling in the details one square at a time.

    Global information, such as the image aspect ratio and the content of the image (e.g., a dog, a person running, etc.), remains separate. This ensures that the AI does not confuse the signals and repeat data.
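
    Put together, the decoupling might look roughly like the sketch below: the global signal is computed once on a square, training-resolution view and stretched to the target aspect ratio, while the unconditional, local signal is computed patch by patch. This is only a schematic illustration of the idea as described above, with hypothetical stand-in functions; the actual ElasticDiffusion algorithm involves considerably more machinery.

```python
import numpy as np

rng = np.random.default_rng(2)

TRAIN = 64                      # square resolution the model was trained on
H, W = 64, 112                  # non-square target (roughly 16:9)

# Hypothetical score functions standing in for a pretrained model.
def eps_uncond(x):              # local detail; sees one patch at a time
    return rng.standard_normal(x.shape)

def eps_cond(x, prompt):        # prompt-conditioned prediction
    return rng.standard_normal(x.shape)

def nearest_resize(a, h, w):
    """Nearest-neighbor resize, enough for this illustration."""
    rows = np.arange(h) * a.shape[0] // h
    cols = np.arange(w) * a.shape[1] // w
    return a[np.ix_(rows, cols)]

def elastic_style_step(x_t, prompt, w=7.5):
    # Global signal: computed once on a square, training-size view,
    # then stretched to the target aspect ratio.
    square = nearest_resize(x_t, TRAIN, TRAIN)
    global_sig = eps_cond(square, prompt) - eps_uncond(square)
    global_sig = nearest_resize(global_sig, H, W)

    # Local signal: unconditional prediction applied patch by patch
    # ("in quadrants"), so no patch ever exceeds the training size.
    local_sig = np.empty_like(x_t)
    for i in range(0, H, TRAIN):
        for j in range(0, W, TRAIN):
            patch = x_t[i:i+TRAIN, j:j+TRAIN]
            local_sig[i:i+TRAIN, j:j+TRAIN] = eps_uncond(patch)

    return local_sig + w * global_sig   # recombine without duplication

x_t = rng.standard_normal((H, W))
eps_hat = elastic_style_step(x_t, "a dog running on a beach", w=7.5)
```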

    The result is a clearer image, at any aspect ratio, without any additional training.

    The only drawback to ElasticDiffusion relative to other diffusion models is time: currently, Haji Ali’s method takes six to nine times as long to make an image.

    The goal is to reduce that to the same inference time as other models like Stable Diffusion or DALL-E.
