In a discussion with Raph Levien, the author of our compute renderer
implementation, it became clear to me that it's not at all certain that
complex strokes will ever be efficiently supported by a GPU renderer.
At the same time, the machinery for converting a complex stroke to a
GPU-friendly outline has a significant maintenance cost. Further, it is
surprising to users that complex strokes are significantly slower and
allocate memory.
This change removes support for complex strokes, leaving only
round-capped, round-joined strokes supported by the compute renderer.
The default renderer still converts all strokes to outline, but it also
caches the result.
This is an API change. The complex stroke conversion code has been moved
to the external gioui.org/x/stroke package, with a similar API.
Updates gio#282 (Inkeliz brought up the allocation issue)
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The op.Save and Load methods exist because transformation, clip, and
pointer area state must behave as stacks. For example, layout needs to
apply an offset to its children but not to subsequent operations.
Before this change, op.Save and Load were used to save and restore the
state:
    ops := new(op.Ops)
    // Save state.
    state := op.Save(ops)
    // Apply offset.
    op.Offset(...).Add(ops)
    // Draw with offset applied.
    draw(ops)
    // Restore state.
    state.Load()
A drawback with the op.Save mechanism is that there is no direct
connection between the state change and the saving and loading of state.
This causes confusion as to when a Save/Load is needed and who is
responsible for performing them, which leads to subtle bugs and over-use
of Save/Loads.
This change gets rid of the general state stack and replaces it with
per-state stacks. There are now separate stacks for transformations,
clips, and pointer areas, and they can only be restored by the code that
pushed state to them.
The example above now becomes:
    ops := new(op.Ops)
    // Push offset to the transformation stack.
    stack := op.Offset(...).Push(ops)
    // Draw with offset applied.
    draw(ops)
    // Restore state.
    stack.Pop()
For convenience, transformations can also be Add'ed if the stack
operation is not required.
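To make the per-state stack idea concrete, here is a minimal, self-contained sketch of a transformation stack with Push/Pop and Add. It is not Gio's implementation: `ops`, `transform`, `offsetOp`, `Offset` and `TransformStack` below are hypothetical stand-ins that only track a 2D offset.

```go
package main

import "fmt"

// transform is a hypothetical stand-in for a full affine transformation;
// it only tracks a 2D offset.
type transform struct{ X, Y float32 }

// ops is a hypothetical operation list holding the current
// transformation and the stack of saved ones.
type ops struct {
	current transform
	saved   []transform
}

// offsetOp is a hypothetical offset operation.
type offsetOp transform

// Offset returns an operation that translates by (x, y).
func Offset(x, y float32) offsetOp { return offsetOp{X: x, Y: y} }

// TransformStack is returned by Push. Only its holder can restore the
// transformation that was current before the push.
type TransformStack struct {
	o     *ops
	depth int
}

// Push saves the current transformation, applies the offset and returns
// a handle for restoring the saved transformation.
func (op offsetOp) Push(o *ops) TransformStack {
	o.saved = append(o.saved, o.current)
	op.Add(o)
	return TransformStack{o: o, depth: len(o.saved)}
}

// Add applies the offset without saving state, for callers that never
// need to restore it.
func (op offsetOp) Add(o *ops) {
	o.current.X += op.X
	o.current.Y += op.Y
}

// Pop restores the transformation saved by the matching Push. Stacks
// must be popped in the reverse order they were pushed.
func (s TransformStack) Pop() {
	if len(s.o.saved) != s.depth {
		panic("unbalanced Push/Pop")
	}
	n := s.depth - 1
	s.o.current = s.o.saved[n]
	s.o.saved = s.o.saved[:n]
}

func main() {
	o := new(ops)
	stack := Offset(10, 20).Push(o)
	fmt.Println(o.current) // {10 20}
	stack.Pop()
	fmt.Println(o.current) // {0 0}
}
```

Because Pop is a method on the value returned by Push, the code that changed the state is the only code able to restore it, which is the connection op.Save/Load lacked.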
Simple state such as the current material no longer has a way to be
restored; it is assumed the client of a PaintOp adds their desired
material operation before it.
API change: replace op.Save/Load with explicit Push/Pop scopes for
op.TransformOps, pointer.AreaOps, clip.Ops.
To ease porting, this change retains a version of op.Save/Load that
saves and restores the transformation and clip stacks. It also retains
an Add method for clip.Op.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
We're about to split state into a clip stack and a transform stack. The
intersect fields belong to clip state.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
MemoryBarrier is meant to stand in for OpenGL ES 3.1's glMemoryBarrier.
However, it fits poorly with the other backends: Metal and D3D11 have
automatic memory barriers, and Vulkan needs barriers for graphics as
well.
This change removes MemoryBarrier and puts the burden on the backends.
The OpenGL backend simply adds a barrier between every compute dispatch.
Compared with the manual barriers before this change, this adds only a
single memory barrier, which is unlikely to affect performance much. We
can revisit the automatic barriers if they ever become a performance
problem.
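A minimal sketch of a backend that owns the barriers: instead of calling OpenGL, the hypothetical `glBackend` below records the calls it would make, inserting a barrier between consecutive dispatches. The type, its fields and the recorded call strings are illustrative assumptions, not Gio's OpenGL backend.

```go
package main

import (
	"fmt"
	"strings"
)

// glBackend is a hypothetical OpenGL backend that records the GL calls
// it would make instead of issuing them.
type glBackend struct {
	calls      []string
	dispatched bool // a compute dispatch has already been issued
}

// DispatchCompute inserts a memory barrier between consecutive
// dispatches, so clients never have to issue barriers themselves.
func (b *glBackend) DispatchCompute(x, y, z int) {
	if b.dispatched {
		// Make writes from the previous dispatch visible to this one.
		b.calls = append(b.calls, "glMemoryBarrier")
	}
	b.calls = append(b.calls, fmt.Sprintf("glDispatchCompute(%d, %d, %d)", x, y, z))
	b.dispatched = true
}

func main() {
	b := new(glBackend)
	for i := 0; i < 3; i++ {
		b.DispatchCompute(8, 8, 1)
	}
	fmt.Println(strings.Join(b.calls, "\n"))
}
```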
Signed-off-by: Elias Naur <mail@eliasnaur.com>
driver.Device.NewFramebuffer doesn't provide any information beyond
driver.Device.NewTexture, so Texture can hold its (optional) framebuffer
on behalf of the renderers. Metal doesn't even need a separate
framebuffer object.
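One way a texture can hold its optional framebuffer is to create it lazily, the first time the texture is used as a render target. The sketch below assumes that approach; `device`, `texture`, `glFramebuffer` and `Framebuffer` are hypothetical names, not the driver package's API.

```go
package main

import "fmt"

// glFramebuffer is a hypothetical handle to a backend framebuffer object.
type glFramebuffer struct{ id int }

// device is a hypothetical driver that counts framebuffers it creates.
type device struct{ framebuffers int }

// texture is a hypothetical driver texture. It owns its framebuffer,
// which is only created if the texture is used as a render target.
type texture struct {
	dev *device
	fbo *glFramebuffer
}

func (d *device) NewTexture() *texture { return &texture{dev: d} }

// Framebuffer returns the texture's framebuffer, creating it on first use.
func (t *texture) Framebuffer() *glFramebuffer {
	if t.fbo == nil {
		t.dev.framebuffers++
		t.fbo = &glFramebuffer{id: t.dev.framebuffers}
	}
	return t.fbo
}

func main() {
	d := new(device)
	target := d.NewTexture()
	_ = d.NewTexture() // sampled only; never gets a framebuffer
	same := target.Framebuffer() == target.Framebuffer()
	fmt.Println(same, d.framebuffers) // true 1
}
```

A backend such as Metal, which needs no separate framebuffer object, could simply never populate the field.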
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Vulkan textures (VkImage) are always in a particular layout, where each
layout is optimized for a particular use (transfer, sampling, compute
storage). Vulkan allows layout transitions everywhere except inside
render passes. This change adds driver.Device.PrepareTexture for
instructing the driver to switch a texture to a layout for sampling
in preparation for using it in a render pass.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Vulkan doesn't support updating vertex buffers inside render passes, so
move vertex buffer updates outside render passes. Splitting render
passes is another approach, but not as efficient.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Vulkan doesn't support changing uniforms during a render pass. However,
push constants *can* be changed. The gio-shader repository was changed
to use push constants instead of uniforms; this change implements the
corresponding driver and renderer changes.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Currently, we run kernel4.comp with whatever texture was bound (or none)
when there are no materials in the set of layers. However, Vulkan
requires every image binding of a compute shader to be non-null and
valid. This change works around that limitation by binding a small dummy
texture when no materials are needed.
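The workaround amounts to a fallback when choosing what to bind. In this sketch, `renderer`, `materialsBinding` and the texture names are hypothetical; the dummy texture is created once and re-used.

```go
package main

import "fmt"

// texture is a hypothetical GPU texture handle.
type texture struct{ name string }

// renderer holds the materials texture, if any, and a lazily created
// placeholder for when no layer uses an image material.
type renderer struct {
	dummy     *texture
	materials *texture // nil when no materials are needed
}

// materialsBinding returns the texture to bind to kernel4's image slot.
// Vulkan requires a valid image, so fall back to the dummy texture.
func (r *renderer) materialsBinding() *texture {
	if r.materials != nil {
		return r.materials
	}
	if r.dummy == nil {
		r.dummy = &texture{name: "dummy 1x1"}
	}
	return r.dummy
}

func main() {
	r := new(renderer)
	fmt.Println(r.materialsBinding().name) // dummy 1x1
}
```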
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Modern GPU APIs such as Metal and Vulkan use explicit render passes
and command buffers for recording rendering commands. They don't have
global state; each render pass starts with a clean set of bound
textures, pipelines, etc.
Change our GPU abstraction to better match newer APIs and modify our two
renderers to explicitly describe their render passes.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Before this change, transformed images would take up as much atlas
texture space as needed to fit them. However, we can easily run out of
space for large images or images with large scaling applied.
This change limits transformed images to their rendered bounds, which
are at most the window size.
Updates gio#219 (fixes the chat kitchen example)
Signed-off-by: Elias Naur <mail@eliasnaur.com>
A follow-up change will cause some transformed images to render outside
their allocated atlas bounds. This change uses the GPU viewport to clip
them so they won't overwrite other atlas content.
Updates gio#219
Signed-off-by: Elias Naur <mail@eliasnaur.com>
There's no meaningful reason to have them separate. The intention was to
enable rendering concurrently with other processing, but that gains
framerate at the expense of input latency and complicates ImageOp
semantics.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
OpenGL ES 2.0 doesn't support glBlitFramebuffer, but does support
glCopyTexSubImage2D. Fortunately, we don't need the extra features of
glBlitFramebuffer anyway.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The Metal (and presumably the D3D11) backend doesn't support transformed
framebuffer blits. The only caller doesn't need it either, so drop that
capability from the driver abstraction.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Modern APIs such as Metal and Vulkan want clients to compile expensive
state changes into pipeline objects. Change our GPU driver abstraction
to match, thereby paving the way for future drivers.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Both the OpenGL and the Direct3D APIs are stateful, and gpu.GPU renders
to the render target that is current when Frame is called.
Modern GPU APIs such as Metal don't have a concept of a current render
target, and the target even changes each frame.
Add RenderTarget and add an explicit target argument to GPU.Frame as
well as the underlying driver.Device.BeginFrame.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
This change avoids the hard dependency on GPU support for sRGB-encoded
textures in the compute renderer.
With this change and the previously added CPU fallback, Gio no longer
relies on any GPU functionality beyond the OpenGL ES 2.0 level.
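Without sRGB texture formats, the conversion between linear and sRGB-encoded color has to be done explicitly. The commit doesn't say where or how that happens here; the sketch below only shows the standard sRGB transfer functions (IEC 61966-2-1) that such a conversion would compute. `linearToSRGB` and `srgbToLinear` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// linearToSRGB encodes a linear color component in [0, 1] with the
// standard sRGB transfer function.
func linearToSRGB(c float64) float64 {
	if c <= 0.0031308 {
		return 12.92 * c
	}
	return 1.055*math.Pow(c, 1/2.4) - 0.055
}

// srgbToLinear decodes an sRGB-encoded component back to linear.
func srgbToLinear(c float64) float64 {
	if c <= 0.04045 {
		return c / 12.92
	}
	return math.Pow((c+0.055)/1.055, 2.4)
}

func main() {
	fmt.Printf("%.4f\n", linearToSRGB(0.5)) // 0.7354
}
```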
Fixes gio#49
Fixes gio#154
Fixes gio#97
Fixes gio#36
Fixes gio#172
Signed-off-by: Elias Naur <mail@eliasnaur.com>
This change adds a CPU fallback for devices that neither support the old
renderer nor have GPU support for compute programs.
Most of the hard work is implemented in the gioui.org/cpu module. It
uses the SwiftShader project, lightly modified, to output statically
compiled CPU .o files for each compute program.
The CPU fallback only covers Linux and Android on the arm, arm64, and amd64
architectures. There is no fundamental reason support can't be extended
to other platforms:
- macOS and iOS are probably easy, but it's likely that virtually every
device has GPU support for compute shaders.
- Windows needs a Cgo-less port, or a build constraint to require a C
compiler (Gio core doesn't).
- FreeBSD and OpenBSD are probably also easy to do because they're so
similar to Linux.
- The 386 binaries didn't work properly in my tests, so fixes to
SwiftShader are probably needed. However, I expect virtually every
Intel device can run amd64 binaries.
Updates gio#49
Fixes gio#228
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The hashes of the clipping paths that affect drawing operations are
computed and used to quickly determine that two operations are not
equal, which is the most likely outcome of a comparison.
However, for paths that are constructed once and cached, computing the
hash every frame is wasteful. This is especially true for text, which
is both cached and among the largest paths in a frame.
This change moves the hashing to op/clip.Path construction time, and
stores the hash in the ops list so it won't be re-computed at every use.
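A minimal sketch of hashing at construction time, assuming FNV-1a from the standard library as the hash (the commit doesn't name one). `Path`, `PathSpec`, `LineTo`, `End` and `Equal` are hypothetical names for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"math"
)

// Path is a hypothetical path builder that encodes segments into bytes.
type Path struct{ data []byte }

// LineTo appends a line segment ending at (x, y).
func (p *Path) LineTo(x, y float32) {
	var buf [8]byte
	binary.LittleEndian.PutUint32(buf[0:], math.Float32bits(x))
	binary.LittleEndian.PutUint32(buf[4:], math.Float32bits(y))
	p.data = append(p.data, buf[:]...)
}

// PathSpec is a finished path. Its hash is computed once, in End, and
// stored alongside the data so users of the path never rehash it.
type PathSpec struct {
	data []byte
	hash uint64
}

// End finishes the path and computes its hash.
func (p *Path) End() PathSpec {
	h := fnv.New64a()
	h.Write(p.data)
	return PathSpec{data: p.data, hash: h.Sum64()}
}

// Equal uses the stored hashes to reject unequal paths without
// touching their data; only matching hashes fall back to a full compare.
func (a PathSpec) Equal(b PathSpec) bool {
	return a.hash == b.hash && bytes.Equal(a.data, b.data)
}

func main() {
	var p1, p2 Path
	p1.LineTo(1, 2)
	p2.LineTo(1, 2)
	fmt.Println(p1.End().Equal(p2.End())) // true
}
```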
Signed-off-by: Elias Naur <mail@eliasnaur.com>
To re-use previously cached layers, the compute renderer must know
whether two drawing operations are equal. When two operations are not
equal, a fast hash comparison will most likely fail. When two
operations are equal and have complicated clipping paths, however,
comparing the path data is expensive.
This change adds support for fast ops.Key comparisons, where two paths
are equal if their ops.Keys are equal. This optimization kicks in for
text rendering, where glyph clipping shapes are re-used across frames.
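The resulting comparison order can be sketched as: equal keys short-circuit to "equal", different hashes short-circuit to "not equal", and only the remaining case compares path data. `key`, `pathRef`, `pathsEqual` and the `comparisons` counter are hypothetical; the real ops.Key type is not modeled here.

```go
package main

import (
	"bytes"
	"fmt"
)

// key is a hypothetical identity that stays the same while a cached
// path is re-used unchanged across frames. The zero key means "no key".
type key struct{ id int }

type pathRef struct {
	key  key
	hash uint64
	data []byte
}

// comparisons counts full data comparisons, to show when the fast
// paths apply.
var comparisons int

func pathsEqual(a, b pathRef) bool {
	if a.key != (key{}) && a.key == b.key {
		return true // same cached path: equal without reading the data
	}
	if a.hash != b.hash {
		return false // the common case for unequal paths
	}
	comparisons++
	return bytes.Equal(a.data, b.data)
}

func main() {
	glyph := pathRef{key: key{id: 1}, hash: 7, data: []byte{1, 2, 3}}
	fmt.Println(pathsEqual(glyph, glyph), comparisons) // true 0
}
```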
Signed-off-by: Elias Naur <mail@eliasnaur.com>
To re-use drawing operations common to two layers, every operation must
exactly match, including their transformations. However, layers that
differ only by an integer offset can be re-used because rendering does
not depend on the absolute integer offset. This is important in the very
common case of scrolling otherwise static UI content.
This change separates the integer offset from drawing operations and
relaxes the layer cache to match layers that differ only in integer
offsets.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The compute renderer is more expensive to run than the old renderer on
low-end GPUs, and even more so on CPUs. To ensure good performance
regardless of the end-user device, this change implements automatic
re-use of content rendered in the previous frame.
The basic idea is that every drawing operation (PaintOp), along with its
transform and clipping, can be hashed and efficiently looked up. A naïve
caching approach is then to rasterize every operation to separate
sections of several large texture atlases, turning a cache hit into a
very cheap texture copy.
However, for scenes with lots of overlapping operations, the resulting
texture memory from separating the operations would be much larger than
the memory for just the window framebuffer.
So instead of caching individual operations, this change caches layers,
which are sequences of drawing operations. It starts by putting all
operations into a single layer. Then, if the subsequent frame re-uses a
sub-sequence of that larger layer, it is split.
For example, consider a UI similar to the kitchen sample:
    Hello, Gio
    <Editor>
    <Line Editor>
    <Button> <Button> <Button>
    <ProgressBar>
    <Checkbox> <Toggle>
In the first frame, all of the drawing operations comprising the UI will
be stored and cached in a single layer. In the second frame, the
progress bar will have moved and the renderer splits the UI into three
layers: layer A for everything up to (but not including) the progress
bar, layer B with just the progress bar, and layer C for the rest. Note
that nothing has been re-used yet. In the third frame, the progress bar
moves again, and this time layers A and C can be copied from the cache;
only the progress bar needs redrawing through the compute programs.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The performance difference is negligible, but is useful when the compute
pipeline can skip rendering to empty tiles.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Until now, the two renderers have shared structures and code for
decoding drawing ops and converting them to GPU-friendly structures.
However, the decoder is tailored to the old renderer and uses
structures that fit the new compute renderer poorly.
This change copies the decoder and specializes the copy for the compute
renderer, avoiding a round-trip through the old renderer decoder.
Signed-off-by: Elias Naur <mail@eliasnaur.com>