
    ORNL’s Discovery to be world’s most powerful and energy-efficient supercomputer

    By Abhishek Bhardwaj

    10 hours ago


    The Oak Ridge Leadership Computing Facility (OLCF), a Department of Energy Office of Science user facility located at Oak Ridge National Laboratory (ORNL), is working to build a new supercomputer, called Discovery, by 2028. The system is also meant to demonstrate next-generation energy efficiency.

    Frontier, OLCF’s current flagship supercomputer, ranks first on the TOP500 list of the world’s most powerful computers. When it launched in 2022, it also debuted as one of the world’s most energy-efficient computers.

    Since its founding, OLCF has fielded five generations of world-class supercomputing systems, which together have delivered a nearly 2,000-fold increase in energy efficiency per floating-point operation per second (flops), according to an ORNL press release.
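The efficiency figure is easiest to read as flops per watt. Because a watt is one joule per second, flops per watt equals floating-point operations per joule, so a 2,000-fold gain means the same calculation needs about 1/2,000 of the energy. The short Python sketch below walks through that arithmetic; the two systems and the workload are hypothetical round numbers, not ORNL measurements.

```python
# Illustrative arithmetic for the flops-per-watt efficiency metric.
# All system figures are hypothetical round numbers, not ORNL measurements.


def flops_per_watt(sustained_flops: float, power_watts: float) -> float:
    # (operations per second) / (joules per second) = operations per joule
    return sustained_flops / power_watts


def energy_joules(total_operations: float, ops_per_joule: float) -> float:
    # Energy needed to finish a fixed amount of work at a given efficiency.
    return total_operations / ops_per_joule


# Two hypothetical systems drawing the same 10 MW.
older = flops_per_watt(sustained_flops=1e15, power_watts=1e7)  # 1 petaflop/s
newer = flops_per_watt(sustained_flops=2e18, power_watts=1e7)  # 2 exaflop/s

workload = 1e21  # hypothetical job of 10**21 floating-point operations
print(f"efficiency gain: {newer / older:.0f}x")
print(f"older system: {energy_joules(workload, older):.3g} J")
print(f"newer system: {energy_joules(workload, newer):.3g} J")
```

Holding power fixed keeps the comparison simple: the entire gain comes from doing more operations per joule.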

    Building on that record, the lab aims to make its future supercomputers even more powerful and energy-efficient.

    The need for energy-efficient supercomputers and data centers

    According to a white paper by the Electric Power Research Institute, data centers could consume up to 6.8% of total annual US electricity generation by 2030, compared with an estimated 4% today.

    To meet this growing demand for electricity, the US will need to invest around $50 billion in new electrical generation capacity, according to an estimate by Goldman Sachs Research.

    High-performance computing, too, needs innovations to manage rising power demands.

    “Private companies are now deploying machines that are several times larger than Frontier. Today, they essentially have unlimited deep pockets, so it’s easy for them to stand up a data center without concern for efficiency,” said Scott Atchley, chief technology officer of the National Center for Computational Sciences, or NCCS, at ORNL. “That will change once they become more power constrained, and they will want to get the most bang for their buck.”

    One major change over the past decade or so has been the shift from central processing units (CPUs) toward graphics processing units (GPUs).

    “When you run electricity into a machine with GPUs, it takes roughly about a tenth the amount of energy as a machine that just has CPUs,” said ORNL’s Al Geist, director of the Frontier project.

    OLCF’s latest big offering: Frontier

    With help from AMD, a chip vendor in the Department of Energy’s (DOE) FastForward program, a faster, more powerful compute node was developed for Frontier: a 64-core 3rd Gen EPYC CPU paired with four Instinct MI250X GPUs. A method was also devised to improve GPU efficiency by switching off sections of the chips that are not in use and reactivating them within a few milliseconds when they are needed.

    “In the old days, the entire system would light up and sit there idle, still burning electricity. Now we can turn off everything that’s not being used — and not just a whole GPU. On Frontier, about 50 different areas on each GPU can be turned off individually if they’re not being used. Now, not only is the silicon area mostly devoted to floating point operations, but in fact I’m not going to waste any energy on anything I’m not using,” Geist said.
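To see why switching off individual regions matters, consider a toy model of one GPU divided into 50 independently switchable regions, matching Geist’s “about 50” figure. The per-region wattages below are invented for illustration and are not AMD MI250X specifications, and the model idealizes a switched-off region as drawing zero power. The point is only that unused regions stop burning idle power once they can be turned off.

```python
# Toy model of per-region power gating on one GPU.
# Hypothetical figures for illustration only; not AMD MI250X specifications.
REGIONS_PER_GPU = 50   # the article cites "about 50" individually gated areas
ACTIVE_WATTS = 8.0     # assumed draw of a region doing useful work
IDLE_WATTS = 2.0       # assumed draw of a region left on but unused
GATED_WATTS = 0.0      # idealized draw of a region that has been switched off


def gpu_power(active_regions: set[int], gating: bool) -> float:
    """Total GPU power at one instant, given which regions are busy."""
    total = 0.0
    for region in range(REGIONS_PER_GPU):
        if region in active_regions:
            total += ACTIVE_WATTS
        elif gating:
            total += GATED_WATTS  # unused region switched off
        else:
            total += IDLE_WATTS   # unused region sitting idle, still drawing power
    return total


busy = set(range(20))  # hypothetical workload using 20 of the 50 regions
print(gpu_power(busy, gating=False))  # 20*8 + 30*2 = 220.0
print(gpu_power(busy, gating=True))   # 20*8 = 160.0
```

When every region is busy, gating saves nothing in this model; the savings grow with the share of the chip a workload leaves unused.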

    Still, building even more energy-efficient supercomputers will require additional techniques.

    Long before Frontier was built, Feiyi Wang, leader of the OLCF’s Analytics and AI Methods at Scale (AAIMS) group, collected more than a year’s worth of power-profiling data from Summit, the OLCF’s 200-petaflop supercomputer launched in 2018.

    Drawing on the Summit energy-profile datasets, Wang and his team launched the Smart Facility for Science project, which provides ongoing insight into HPC systems in production.

    “I want to take this continuous monitoring one step further to ‘continuous integration,’ meaning that we want to take the computer’s ongoing metrics and integrate them into a system so that the user can observe how their energy usage is going to be for their particular job application,” Wang said.
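Showing users the energy use of their own jobs, as Wang describes, comes down to integrating power over time. The minimal sketch below assumes telemetry arrives as hypothetical (timestamp in seconds, watts) samples for a job’s nodes and uses the trapezoidal rule to turn them into joules and kilowatt-hours. The sample format and values are assumptions, not the Smart Facility for Science data schema.

```python
# Sketch: estimate one job's energy from sampled power telemetry.
# The (timestamp_seconds, watts) format and the values are hypothetical.


def job_energy_joules(samples: list[tuple[float, float]]) -> float:
    """Integrate power over time with the trapezoidal rule (watts x seconds = joules)."""
    ordered = sorted(samples)  # sort by timestamp
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(ordered, ordered[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)  # area of one trapezoid
    return energy


def joules_to_kwh(joules: float) -> float:
    return joules / 3.6e6  # 1 kWh = 3.6 million joules


# Hypothetical one-second samples for a short job.
telemetry = [(0.0, 500.0), (1.0, 1500.0), (2.0, 1500.0), (3.0, 700.0)]
energy = job_energy_joules(telemetry)  # 1000 + 1500 + 1100 = 3600 J
print(f"{energy:.0f} J = {joules_to_kwh(energy):.6f} kWh")
```

The trapezoidal rule is a simple choice that handles unevenly spaced samples; a production pipeline would also have to deal with gaps, clock skew, and attributing shared node power to individual jobs.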

    Digital twin of Frontier supercomputer

    At ORNL, the AAIMS group launched the Digital Twin for Frontier project to construct a simulation of the Frontier supercomputer.

    This virtual Frontier lets operators test “What if we tried this?” energy-saving scenarios before attempting them on the real machine.

    “With this digital twin idea, we can take all that telemetry data into a system where, if we have enough fidelity modeled for the power and cooling aspects of the system, we can experiment. What if I change this setting — does it have a positive effect on the system or not?” Wang said.
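The “what if I change this setting” loop Wang describes can be illustrated with a toy model. In the hypothetical sketch below, the setting is a per-node power cap, and the model assumes performance scales with the square root of the cap; the model and its constants are placeholders, not the Frontier digital twin’s actual power and cooling models. Running the same job under several caps shows how a twin lets operators compare runtime and energy before touching real hardware.

```python
# Toy "what-if" experiment in the spirit of a digital twin.
# The power/performance model and all constants are hypothetical placeholders.
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeModel:
    max_dynamic_watts: float = 2000.0  # assumed dynamic draw at a 100% power cap
    static_watts: float = 400.0        # assumed draw that does not scale with the cap
    perf_exponent: float = 0.5         # assumed: speed ~ cap**0.5 (diminishing returns)


def simulate_job(model: NodeModel, work_seconds: float, power_cap: float) -> tuple[float, float]:
    """Return (runtime_s, energy_J) for a job needing `work_seconds` at full speed."""
    if not 0.0 < power_cap <= 1.0:
        raise ValueError("power_cap must be in (0, 1]")
    speed = power_cap ** model.perf_exponent
    runtime = work_seconds / speed
    power = model.static_watts + model.max_dynamic_watts * power_cap
    return runtime, power * runtime


model = NodeModel()
for cap in (1.0, 0.8, 0.6):
    runtime, energy = simulate_job(model, work_seconds=3600.0, power_cap=cap)
    print(f"cap={cap:.0%}: runtime {runtime / 3600:.2f} h, energy {energy / 3.6e6:.2f} kWh")
```

In this made-up model, lower caps use less energy but run longer. With a much larger assumed static power (say 4,000 W), lowering the cap instead increases total energy, which is why the fidelity of the modeled system matters so much.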

    Frontier’s digital twin can run on a desktop computer, and virtual reality (VR) and augmented reality (AR) interfaces let operators examine system telemetry more interactively and intuitively as they adjust parameters.

    The AAIMS group also built a virtual scheduling system that tracks how the digital twin’s power consumption evolves over time as it executes jobs.
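A virtual scheduler of this kind can be sketched as a replay loop: jobs start when enough nodes are free, and the system’s power draw is recorded at each time step. The Python below is a simplified, hypothetical illustration using strict first-in, first-out scheduling with no backfilling. The job list, node count, and wattages are made up, and the policy is not meant to represent Frontier’s production scheduler or the AAIMS implementation.

```python
# Sketch of a virtual scheduler that replays jobs and tracks power over time.
# Jobs, node count, and per-node wattages are hypothetical.
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    submit: int            # submission time step
    nodes: int             # nodes requested
    duration: int          # run time in time steps
    watts_per_node: float  # assumed draw of each node while the job runs


def simulate_fifo(jobs: list[Job], total_nodes: int,
                  idle_watts_per_node: float, horizon: int) -> list[float]:
    """Strict first-in, first-out scheduling; returns system power at each time step."""
    if any(job.nodes > total_nodes for job in jobs):
        raise ValueError("a job requests more nodes than the system has")
    queue = sorted(jobs, key=lambda j: j.submit)
    running: list[tuple[Job, int]] = []  # (job, end time step)
    timeline: list[float] = []
    for t in range(horizon):
        running = [(job, end) for job, end in running if end > t]  # retire finished jobs
        free = total_nodes - sum(job.nodes for job, _ in running)
        # Start jobs in order while the head of the queue fits (no backfilling).
        while queue and queue[0].submit <= t and queue[0].nodes <= free:
            job = queue.pop(0)
            running.append((job, t + job.duration))
            free -= job.nodes
        busy_power = sum(job.nodes * job.watts_per_node for job, _ in running)
        timeline.append(busy_power + free * idle_watts_per_node)
    return timeline


jobs = [
    Job("a", submit=0, nodes=6, duration=3, watts_per_node=500.0),
    Job("b", submit=1, nodes=6, duration=2, watts_per_node=800.0),
    Job("c", submit=1, nodes=4, duration=2, watts_per_node=300.0),
]
power = simulate_fifo(jobs, total_nodes=10, idle_watts_per_node=100.0, horizon=7)
print(power)  # [3400.0, 3400.0, 3400.0, 6000.0, 6000.0, 1000.0, 1000.0]
print("peak:", max(power), "W")
```

Because job "c" waits behind the larger job "b" under strict FIFO, the timeline also shows how a scheduling policy shapes both peak power and the power profile over time.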

    These tools should prove useful in building the next-generation, energy-efficient Discovery supercomputer.

    The researchers also cut the energy needed for cooling tenfold between 2009 and 2022, and the team plans to keep optimizing cooling going forward.
