Here is the collected set of questions from the
users and replies from the engineers and PMs:
RTX 30-Series
Why only 10 GB of memory for RTX 3080? How was
that determined to be a sufficient number, when it is stagnant from the
previous generation?
[Justin Walker] We’re
constantly analyzing memory requirements of the latest games and regularly
review with game developers to understand their memory needs for current and
upcoming games. The goal of 3080 is to give you great performance at up to 4k
resolution with all the settings maxed out at the best possible price.
In order to do this, you need a very powerful
GPU with high speed memory and enough memory to meet the needs of the games. A
few examples - if you look at Shadow of the Tomb Raider, Assassin’s Creed
Odyssey, Metro Exodus, Wolfenstein Youngblood, Gears of War 5, Borderlands 3
and Red Dead Redemption 2 running on a 3080 at 4k with Max settings (including
any applicable high res texture packs) and RTX On, when the game supports it,
you get in the range of 60-100fps and use anywhere from 4GB to 6GB of memory.
Extra memory is always nice to have but it
would increase the price of the graphics card, so we need to find the right
balance.
When the slide says RTX 3070 is equal or faster
than 2080 Ti, are we talking about traditional rasterization or DLSS/RT
workloads? Very important if you could clear it up, since no traditional
rasterization benchmarks were shown, only RT/DLSS supporting games.
[Justin Walker] We
are talking about both. Games that only support traditional rasterization and
games that support RTX (RT+DLSS).
Does Ampere support HDMI 2.1 with the full
48Gbps bandwidth?
[Qi Lin] Yes. The
NVIDIA Ampere Architecture supports the highest HDMI 2.1 link rate of
12 Gbps per lane across all 4 lanes, and supports Display Stream Compression (DSC) to
be able to power up to 8K 60Hz in HDR.
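As a quick sanity check on the bandwidth figure in the question, here is a minimal sketch of the arithmetic implied by that answer (the math is mine, not NVIDIA's):

# HDMI 2.1 link: 4 lanes at 12 Gbps each, per the answer above.
lanes = 4
gbps_per_lane = 12
print(lanes * gbps_per_lane)  # 48 -> the full 48 Gbps the question asks about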
Could you elaborate a little on this doubling
of CUDA cores? How does it affect the general architectures of the GPCs? How
much of a challenge is it to keep all those FP32 units fed? What was done to
ensure high occupancy?
[Tony Tamasi] One
of the key design goals for the Ampere 30-series SM was to achieve twice the throughput
for FP32 operations compared to the Turing SM. To accomplish this goal, the
Ampere SM includes new datapath designs for FP32 and INT32 operations. One
datapath in each partition consists of 16 FP32 CUDA Cores capable of executing
16 FP32 operations per clock. Another datapath consists of both 16 FP32 CUDA
Cores and 16 INT32 Cores. As a result of this new design, each Ampere SM
partition is capable of executing either 32 FP32 operations per clock, or 16
FP32 and 16 INT32 operations per clock. All four SM partitions combined can
execute 128 FP32 operations per clock, which is double the FP32 rate of the
Turing SM, or 64 FP32 and 64 INT32 operations per clock.
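To make the per-clock figures concrete, here is a minimal sketch; the per-partition numbers come from the answer above, while the RTX 3080 SM count and boost clock are my assumptions, not from the Q&A:

# Per-SM FP32 throughput as described above.
partitions_per_sm = 4
fp32_per_partition = 32              # 16 FP32-only cores + 16 on the shared FP32/INT32 datapath
fp32_per_sm = partitions_per_sm * fp32_per_partition
print(fp32_per_sm)                   # 128 FP32 ops/clock, double Turing's 64

# Assumed RTX 3080 figures (not from the Q&A): 68 SMs, ~1.71 GHz boost clock.
sms, boost_ghz = 68, 1.71
tflops = sms * fp32_per_sm * 2 * boost_ghz / 1000   # x2 because an FMA counts as two FLOPs
print(round(tflops, 1))              # ~29.8 peak FP32 TFLOPS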
Doubling the processing speed for FP32 improves
performance for a number of common graphics and compute operations and
algorithms. Modern shader workloads typically have a mixture of FP32 arithmetic
instructions such as FFMA, floating point additions (FADD), or floating point
multiplications (FMUL), combined with simpler instructions such as integer adds
for addressing and fetching data, floating point compare, or min/max for
processing results, etc. Performance gains will vary at the shader and
application level depending on the mix of instructions. Ray tracing denoising
shaders are good examples that might benefit greatly from doubling FP32
throughput.
Doubling math throughput required doubling the
data paths supporting it, which is why the Ampere SM also doubled the shared
memory and L1 cache performance for the SM (128 bytes/clock per Ampere SM
versus 64 bytes/clock in Turing). Total L1 bandwidth for GeForce RTX 3080 is
219 GB/sec versus 116 GB/sec for GeForce RTX 2080 Super.
Like prior NVIDIA GPUs, Ampere is composed of
Graphics Processing Clusters (GPCs), Texture Processing Clusters (TPCs), Streaming
Multiprocessors (SMs), Raster Operators (ROPs), and memory controllers.
The GPC is the dominant high-level hardware
block with all of the key graphics processing units residing inside the GPC.
Each GPC includes a dedicated Raster Engine, and now also includes two ROP
partitions (each partition containing eight ROP units), which is a new feature
for NVIDIA Ampere Architecture GA10x GPUs. More details on the NVIDIA Ampere
architecture can be found in NVIDIA’s Ampere Architecture White Paper, which will
be published in the coming days.
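As a rough illustration of how the new per-GPC ROP layout adds up, here is a small sketch; the partition counts come from the answer above, while the GPC count for RTX 3080 is my assumption, not from the Q&A:

# ROPs now live inside the GPC: two partitions of eight ROP units each.
rops_per_gpc = 2 * 8                 # 16 ROPs per GPC
gpcs = 6                             # assumed GPC count for RTX 3080 (not from the Q&A)
print(gpcs * rops_per_gpc)           # 96 ROPs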
Any idea if the dual airflow design is going to
be messed up for inverted cases? More than previous designs? It seems like it
would blow air down onto the CPU, but the CPU cooler would still blow it out of the
case. Maybe it's not so bad.
Second question: at 10x quieter than the Titan, is
the 3090 more or less quiet than a 2080 Super (EVGA Ultra FX, for example)?
[Qi Lin] The new flow
through cooling design will work great as long as chassis fans are configured
to bring fresh air to the GPU, and then move the air that flows through the GPU
out of the chassis. It does not matter if the chassis is inverted.
The Founders Edition RTX 3090 is quieter than
both the Titan RTX and the Founders Edition RTX 2080 Super. We haven’t tested
it against specific partner designs, but I think you’ll be impressed with what
you hear… or rather, don’t hear. :-)
Will the 30 series cards support 10-bit
4:4:4 at 120 fps? Traditionally NVIDIA consumer cards have only supported 8-bit or
12-bit output, and don't do 10-bit. The vast majority of HDR monitors/TVs on the
market are 10-bit.
[Qi Lin] The 30 series
supports 10-bit HDR. In fact, HDMI 2.1 can support up to 8K@60Hz with 12-bit HDR,
and that covers 10-bit HDR displays.
What breakthrough in tech let you guys
massively jump to the 3xxx line from the 2xxx line? I knew it would be scary,
but it's insane to think about how much more efficient and powerful these cards
are. Can these cards handle 4k 144hz?
[Justin Walker] There
were major breakthroughs in GPU architecture, process technology and memory
technology to name just a few. An RTX 3080 is powerful enough to run certain
games maxed out at 4k 144fps - Doom Eternal, Forza 4, Wolfenstein Youngblood to
name a few. But others - Red Dead Redemption 2, Control, Borderlands 3 for
example are closer to 4k 60fps with maxed out settings.
Will customers find a performance degradation
on PCIE 3.0?
System performance is impacted by many factors
and the impact varies between applications. The impact is typically less than a
few percent going from a x16 PCIe 4.0 to x16 PCIe 3.0. CPU selection often has
a larger impact on performance. We look forward to new platforms that can fully
take advantage of Gen4 capabilities for potential performance increases.
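For context, the raw interface bandwidth being compared looks roughly like this (standard PCIe spec numbers, not figures from the answer):

# Approximate usable bandwidth of a x16 link; PCIe 3.0/4.0 use 128b/130b encoding.
def x16_bandwidth_gb_s(gt_per_lane):
    return gt_per_lane * 16 * (128 / 130) / 8

print(round(x16_bandwidth_gb_s(8.0), 1))    # PCIe 3.0 x16: ~15.8 GB/s
print(round(x16_bandwidth_gb_s(16.0), 1))   # PCIe 4.0 x16: ~31.5 GB/s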
RTX IO
Could we see RTX IO coming to machine learning
libraries such as PyTorch? This would be great for performance in real-time
applications.
[Tony Tamasi] NVIDIA
delivered high-speed I/O solutions for a variety of data analytics platforms roughly
a year ago with NVIDIA GPUDirect Storage. It provides for high-speed I/O
between the GPU and storage, specifically for AI and HPC type applications and
workloads. For more information please check out: https://developer.nvidia.com/blog/gpudirect-storage/
Does RTX IO allow use of SSD space as VRAM? Or
am I completely misunderstanding?
[Tony Tamasi] RTX
IO allows reading data from SSDs at much higher speed than traditional
methods, and allows the data to be stored and read in a compressed format by
the GPU, for decompression and use by the GPU. It does not allow the SSD to
replace frame buffer memory, but it allows the data from the SSD to get to the
GPU and GPU memory much faster, with much less CPU overhead.
Will there be a certain ssd speed requirement
for RTX I/O?
[Tony Tamasi] There
is no SSD speed requirement for RTX IO, but obviously, faster SSDs such as the
latest generation of Gen4 NVMe SSDs will produce better results, meaning
faster load times and the ability for games to stream more data into the world
dynamically. Some games may have minimum requirements for SSD performance in
the future, but those would be determined by the game developers. RTX IO will
accelerate SSD performance regardless of how fast it is, by reducing the CPU
load required for I/O, and by enabling GPU-based decompression, allowing game
assets to be stored in a compressed format and offloading potentially dozens of
CPU cores from doing that work. Compression ratios are typically 2:1, so that
would effectively amplify the read performance of any SSD by 2x.
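A minimal sketch of the effective-throughput point at the end of that answer; the SSD speeds below are hypothetical examples, not figures from the Q&A:

# With assets stored compressed and decompressed on the GPU, each byte read from
# disk expands to roughly two bytes of usable data at a 2:1 compression ratio.
def effective_read_gb_s(raw_ssd_gb_s, compression_ratio=2.0):
    return raw_ssd_gb_s * compression_ratio

print(effective_read_gb_s(3.5))   # hypothetical Gen3 NVMe: ~7 GB/s effective
print(effective_read_gb_s(7.0))   # hypothetical Gen4 NVMe: ~14 GB/s effective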
Will the new GPUs and RTX IO work on Windows
7/8.1?
[Tony Tamasi] RTX
30-series GPUs are supported on Windows 7 and Windows 10, RTX IO is supported
on Windows 10.
I am excited for the RTX I/O feature, but I
partially don't get how exactly it works. Let's say I have an NVMe SSD, a 3070
and the latest NVIDIA drivers; do I just have to wait for the Windows
update with the DirectStorage API to drop at some point next year and then I am
done, or is there more?
[Tony Tamasi] RTX
IO and DirectStorage will require applications to support those features by
incorporating the new APIs. Microsoft is targeting a developer preview of
DirectStorage for Windows for game developers next year, and NVIDIA RTX gamers
will be able to take advantage of RTX IO enhanced games as soon as they become
available.
RTX Broadcast
What is the scope of the "Nvidia
Broadcast" program? Is it intended to replace current GFE/Shadowplay for
local recordings too?
[Gerardo Delgado] NVIDIA
Broadcast is a universal plugin app that enhances your microphone, speakers and
camera with AI features such as noise reduction, virtual background, and auto
frame. You basically select your devices as input, decide what AI effect to
apply to them, and then NVIDIA Broadcast exposes virtual devices in your system
that you can use with popular livestream, video chat, or video conference apps.
NVIDIA Broadcast does not record or stream
video and is not a replacement for GFE/Shadowplay.
Jason, will there be any improvements to the
RTX encoder in the Ampere series cards, similar to what we saw for the Turing
release? I did see info on the Broadcast software, but I'm thinking more along
the lines of improvements in overall image quality at the same bitrate.
[Jason Paul] Hi
Carmen813, for RTX 30 Series, we decided to focus improvements on the video
decode side of things and added AV1 decode support. On the encode side, RTX 30
Series has the same great encoder as our RTX 20 Series GPU. We have also
recently updated our NVIDIA Encoder SDK. In the coming months, livestream
applications will be updating to this new version of the SDK, unlocking new
performance options for streamers.
I would like to know more about the new NVENC
-- were there any upgrades made to this technology in the 30 series? It seems
to be the future of streaming, and for many it's the reason to buy an NVIDIA card
rather than any other.
[Gerardo Delgado] The
GeForce RTX 30 Series leverages the same great hardware encoder as the GeForce
RTX 20 Series. We have also recently updated our Video Codec SDK to version
10.0. In the coming months, applications will be updating to this new version
of the SDK, unlocking new performance options.
Regarding AV1 decode, is that supported on 3xxx
series cards other than the 3090? In fact, can this question and dylan522p's
question on support level be merged into: what are the encode/decode features
of Ampere, and do these change based on which 3000 series card is bought?
[Gerardo Delgado] All
of the GeForce RTX 30 Series GPUs that we announced today have the same
encoding and decoding capabilities:
They all feature the 7th Gen NVIDIA Encoder
(the one that we released with the RTX 20 Series), which will use our newly
released Video Codec SDK 10.0. This new SDK will be integrated in the coming
months by the live streaming apps, unlocking new presets with more performance
options.
They all have the new 5th Gen NVIDIA Decoder,
which enables AV1 hardware accelerated decode on GPU. AV1 consumes 50% less
bandwidth and unlocks up to 8K HDR video playback without a big performance hit
on your CPU.
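To put the 50% figure in perspective, a trivial sketch; the baseline bitrate is a hypothetical example, and the comparison codec isn't named in the answer:

# Illustrative effect of AV1's ~50% bandwidth saving versus an older codec.
older_codec_mbps = 50                 # hypothetical bitrate for an 8K HDR stream
print(older_codec_mbps * 0.5)         # 25.0 Mbps for comparable quality at half the bandwidth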
NVIDIA Machinima
How active is the developer support for
Machinima? As it's cloud-based, I'm assuming that the developers/publishers
have to be involved for it to really take off (at least indirectly through
modding community support or directly with asset access). Alongside this, what
is the benefit of having it cloud-based, as opposed to purely desktop?
[Richard Kerris] We
are actively working with game developers on support for Omniverse Machinima
and will have more details to share along with public beta in October.
Omniverse Machinima can be run locally on a
GeForce RTX desktop PC or in the cloud. The benefit of running Omniverse from
the cloud is easier real-time collaboration across users.
NVIDIA Studio
Content creator here. Will these cards be
compatible with GPU renderers like Octane/Arnold/Redshift/etc from launch? I
know with previous generations, a new CUDA version coincided with the launch
and made the cards inert for rendering until the 3rd-party software patched it
in, but I'm wondering if I will be able to use these on launch day using
existing CUDA software.
[Stanley Tack] A
CUDA update will be needed for some renderers. We have been working closely
with the major creative apps on these updates and expect the majority
(hopefully all!) to be ready on the day these cards hit the shelves.
NVIDIA Reflex
Will Nvidia Reflex be a piece of hardware in
new monitors or will it be a software that other nvidia gpus can use?
[Seth Schneider] NVIDIA
Reflex is both. The NVIDIA Reflex Latency Analyzer is a revolutionary new
addition to the G-SYNC Processor that enables end-to-end system latency
measurement. Additionally, the NVIDIA Reflex SDK is integrated into games and
enables a Low Latency mode that can be used by GeForce GTX 900 GPUs and up to
reduce system latency. Each of these features can be used independently.