Technical Challenges to Scale Beyond GPT4 to 100K H100s | NextBigFuture.com
Up until late 2024, no one had been able to massively increase the amount of compute dedicated to a single model beyond the level of OpenAI's GPT-4. This information is from SemiAnalysis and the EIA.
Google’s Gemini Ultra, Nvidia Nemotron 340B, and Meta LLAMA 3 405B had similar or slightly more compute than GPT-4, but inferior architectures were used. Those models did not unlock new capabilities.
A 100,000 GPU cluster:
needs about 150 MW of datacenter capacity
uses 1.59 terawatt-hours of electricity in a single year
incurs energy costs of $123.9 million per year at a standard rate of $0.078/kWh (a quick arithmetic check follows below)
costs about $4 billion for the 100,000 H100 GPU servers
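As a sanity check on those figures, here is a minimal arithmetic sketch. The implied PUE (facility overhead) value is an inference used to reconcile the 150 MW of capacity with the 1.59 TWh annual energy figure; it is not a number from the source.

```python
# Quick arithmetic check of the power and cost figures above.
# Assumption: the gap between 150 MW of IT capacity and the implied average
# draw is facility overhead (PUE); that interpretation is ours, not the source's.
annual_twh = 1.59
price_per_kwh = 0.078
it_capacity_mw = 150

annual_kwh = annual_twh * 1e9                 # 1 TWh = 1e9 kWh
energy_cost = annual_kwh * price_per_kwh      # ~ $124 million per year
avg_draw_mw = annual_twh * 1e6 / 8760         # ~181 MW average draw
implied_pue = avg_draw_mw / it_capacity_mw    # ~1.21 facility overhead

print(f"${energy_cost/1e6:.1f}M per year, {avg_draw_mw:.0f} MW avg, PUE ~{implied_pue:.2f}")
```

Running this gives roughly $124 million per year, in line with the $123.9 million figure quoted above.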
OpenAI began training GPT-5 around May 2024.
OpenAI’s GPT-4 training run used roughly 21.5 million ExaFLOPs of BF16 compute on ~20,000 A100s over 90 to 100 days. A 100k H100 cluster will have 15 to 31 times that compute.
A 100k H100 cluster training run over 100 days can deliver roughly 600 million ExaFLOPs of effective compute, because hardware reliability problems and other inefficiencies hold effective compute to about 35% of the theoretical peak.
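A back-of-the-envelope check ties these numbers together. The sketch below assumes NVIDIA's published dense throughput figures (roughly 989 BF16 TFLOPS and 1,979 FP8 TFLOPS per H100 SXM, and 312 BF16 TFLOPS per A100); under those assumptions, the 15-31x range corresponds to running in BF16 versus FP8, and 600 million ExaFLOPs works out to about 35% of the 100-day FP8 peak.

```python
# Back-of-the-envelope check of the compute claims above.
# Assumptions: dense per-GPU throughput of ~989 BF16 / ~1,979 FP8 TFLOPS (H100 SXM)
# and ~312 BF16 TFLOPS (A100), per NVIDIA's published specifications.
H100_BF16, H100_FP8, A100_BF16 = 989e12, 1979e12, 312e12

# 15-31x claim: 100k H100s vs ~20k A100s, in BF16 and FP8 respectively.
ratio_bf16 = (100_000 * H100_BF16) / (20_000 * A100_BF16)   # ~15.9x
ratio_fp8  = (100_000 * H100_FP8)  / (20_000 * A100_BF16)   # ~31.7x

# 600 million ExaFLOPs claim: 100 days at ~35% effective utilization in FP8.
seconds = 100 * 24 * 3600
effective_exaflops = 100_000 * H100_FP8 * seconds * 0.35 / 1e18

print(f"{ratio_bf16:.1f}x, {ratio_fp8:.1f}x, {effective_exaflops/1e6:.0f} million ExaFLOPs")
# -> roughly 15.9x, 31.7x, and ~600 million ExaFLOPs, matching the text.
```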
To understand network design, topology, reliability concerns, and checkpointing strategies, we need to understand how LLMs handle data and minimize data movement during training.
There are 3 different types of parallelism used in trillion parameter training – Data Parallelism, Tensor Parallelism, and Pipeline Parallelism.
Data Parallelism is the simplest form of parallelism, in which each GPU holds a full copy of the model weights and each GPU (rank) receives a different subset of the data. This type of parallelism has the lowest communication volume, since only the gradients need to be summed (all-reduce) across GPUs. It only works if each GPU has enough memory to store the entire model weights, activations, and optimizer state. For a GPT-4 scale model, the model weights and optimizer state can take as much as 10.8 terabytes of memory during training.
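As a concrete illustration, here is a minimal data-parallel training step in PyTorch. It assumes a multi-GPU job launched with torchrun (so NCCL and the RANK/LOCAL_RANK environment variables are available); the tiny Linear model and batch sizes are placeholders, not GPT-4 settings.

```python
# Minimal data-parallelism sketch: every rank holds the full model,
# sees different data, and gradients are averaged with an all-reduce.
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    device = torch.device(f"cuda:{os.environ.get('LOCAL_RANK', 0)}")

    # Every rank holds a full copy of the model weights and optimizer state.
    model = torch.nn.Linear(4096, 4096).to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # Stand-in for "each rank loads a different shard of the data".
    torch.manual_seed(rank)
    x = torch.randn(8, 4096, device=device)

    loss = model(x).pow(2).mean()
    loss.backward()

    # All-reduce: sum gradients across ranks, then average, so every
    # replica applies the same update and the weights stay in sync.
    for p in model.parameters():
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad /= dist.get_world_size()

    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```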
Tensor Parallelism divides the memory used per GPU by the number of tensor-parallel ranks. For example, it is common today to use 8 tensor-parallel ranks across NVLink, which reduces the memory used per GPU by a factor of 8.
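Here is a minimal sketch of tensor parallelism in the spirit of a column-parallel linear layer (as popularized by Megatron-LM): each rank stores only its column shard of the weight matrix, computes a partial output, and the shards are all-gathered afterwards. The layer sizes and the torchrun/NCCL launch assumptions are illustrative, not taken from the source.

```python
# Minimal tensor-parallelism sketch: one Linear layer split column-wise
# across tensor-parallel ranks, so each GPU stores 1/tp_size of the weights.
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    tp_rank = dist.get_rank()
    tp_size = dist.get_world_size()          # e.g. 8 ranks across NVLink
    device = torch.device(f"cuda:{os.environ.get('LOCAL_RANK', 0)}")

    d_model, d_ff = 4096, 16384
    shard = d_ff // tp_size                  # columns owned by this rank

    torch.manual_seed(1234 + tp_rank)        # each rank has its own weight shard
    w_shard = torch.randn(shard, d_model, device=device) * 0.02

    torch.manual_seed(1234)                  # the input is replicated on every rank
    x = torch.randn(8, d_model, device=device)

    y_shard = x @ w_shard.t()                # local partial output: [8, shard]

    # All-gather the output shards to reassemble the full [8, d_ff] activation.
    gathered = [torch.empty_like(y_shard) for _ in range(tp_size)]
    dist.all_gather(gathered, y_shard)
    y = torch.cat(gathered, dim=-1)

    assert y.shape == (8, d_ff)
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```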
With Pipeline Parallelism, each GPU holds only a subset of the layers, does the computation only for those layers, and passes the output to the next GPU.
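Below is a minimal forward-only sketch of pipeline parallelism: one stage of layers per rank, with activations handed to the next rank via point-to-point send/recv. A real schedule (e.g. GPipe or 1F1B) would also interleave micro-batches and backward passes; the layer sizes and shapes here are placeholders.

```python
# Minimal pipeline-parallelism sketch: each rank owns one pipeline stage
# and forwards its activations to the next rank (forward pass only).
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank, world = dist.get_rank(), dist.get_world_size()
    device = torch.device(f"cuda:{os.environ.get('LOCAL_RANK', 0)}")

    # Each rank holds only its own slice of the layer stack.
    stage = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024), torch.nn.ReLU()
    ).to(device)

    batch = torch.randn(16, 1024, device=device)

    if rank == 0:
        activations = stage(batch)                 # first stage consumes the raw batch
    else:
        activations = torch.empty(16, 1024, device=device)
        dist.recv(activations, src=rank - 1)       # wait for the previous stage's output
        activations = stage(activations)

    if rank < world - 1:
        dist.send(activations, dst=rank + 1)       # hand the result to the next stage

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```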