This change adds a CPU fallback for devices that neither support the old
renderer nor have GPU support for compute programs.
Most of the hard work is implemented in the gioui.org/cpu module. It
uses the SwiftShader project with light modification to output
statically compiled CPU .o files for each compute program.
The CPU fallback only covers Linux and Android on arm, arm64, amd64
architectures. There is no fundamental reason support can't be extended
to other platforms:
- macOS and iOS are probably easy, but it's likely that virtually every
device has GPU support for compute shaders.
- Windows needs a Cgo-less port, or a build constraint to require a C
compiler (Gio core doesn't).
- FreeBSD and OpenBSD are probably also easy to do because they're so
similar to Linux.
- The 386 binaries didn't work properly in my tests, so fixes to
SwiftShader are probably needed. However, I expect virtually every
Intel device can run amd64 binaries.
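As a rough illustration of the platform coverage listed above, the check
can be written as a plain predicate. This is only a sketch: the function
name cpuFallbackSupported is hypothetical, and the real selection may
well be done with build constraints rather than at run time.

```go
package main

import (
	"fmt"
	"runtime"
)

// cpuFallbackSupported reports whether the GOOS/GOARCH pair is one of the
// combinations this change covers: Linux or Android on arm, arm64 or amd64.
// The name is hypothetical and used for illustration only.
func cpuFallbackSupported(goos, goarch string) bool {
	switch goos {
	case "linux", "android":
	default:
		return false
	}
	switch goarch {
	case "arm", "arm64", "amd64":
		return true
	}
	return false
}

func main() {
	fmt.Println(cpuFallbackSupported(runtime.GOOS, runtime.GOARCH))
}
```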
Updates gio#49
Fixes gio#228
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The CPU fallback for the compute renderer is contained in a separate
module for space reasons, but the CPU binaries must exactly match the
compute programs. However, there is no way to express that constraint
in go.mod.
This change generates hashes of every compute program so that a
following change can verify the CPU binaries match the programs.
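A minimal sketch of the idea, assuming SHA-256 over the program source
(the hash function actually used is not stated here) and hypothetical
helper names programHash and verifyBinary:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// programHash returns the hex-encoded SHA-256 digest of a compute program.
// SHA-256 is an assumption made for this sketch.
func programHash(src []byte) string {
	sum := sha256.Sum256(src)
	return hex.EncodeToString(sum[:])
}

// verifyBinary returns an error if the hash recorded with a CPU binary does
// not match the hash of the compute program it is supposed to implement.
func verifyBinary(src []byte, binaryHash string) error {
	if got := programHash(src); got != binaryHash {
		return fmt.Errorf("compute program hash %s does not match CPU binary hash %s", got, binaryHash)
	}
	return nil
}

func main() {
	src := []byte("void main() {}")
	recorded := programHash(src) // would be generated next to the CPU binary
	fmt.Println(verifyBinary(src, recorded))
}
```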
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The hash of the clipping paths that affect drawing operations is computed
and used to quickly determine that two operations are not equal, the
most likely outcome of a comparison.
However, for paths that are constructed once and cached, computing the
hash at every frame is wasteful. This is especially true for text, which
is both cached and also among the largest paths in a frame.
This change moves the hashing to op/clip.Path construction time, and
stores the hash in the ops list so it won't be re-computed at every use.
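The approach can be sketched as a builder that feeds every segment to a
running hash while the path is constructed, so the finished path carries
its hash with it. The types below (pathBuilder, path) and the choice of
FNV-1a are assumptions for illustration, not the op/clip implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
	"math"
)

// path is a finished path together with the hash computed while building it.
type path struct {
	data []byte
	hash uint64
}

// pathBuilder is a hypothetical builder that hashes segments as they are added.
type pathBuilder struct {
	data []byte
	h    hash.Hash64
}

func newPathBuilder() *pathBuilder {
	return &pathBuilder{h: fnv.New64a()}
}

// lineTo encodes one segment and updates the running hash with the same bytes.
func (b *pathBuilder) lineTo(x, y float32) {
	var seg [8]byte
	binary.LittleEndian.PutUint32(seg[0:], math.Float32bits(x))
	binary.LittleEndian.PutUint32(seg[4:], math.Float32bits(y))
	b.data = append(b.data, seg[:]...)
	b.h.Write(seg[:])
}

// end returns the path; users of the path read the stored hash instead of
// re-hashing the data every frame.
func (b *pathBuilder) end() path {
	return path{data: b.data, hash: b.h.Sum64()}
}

func main() {
	b := newPathBuilder()
	b.lineTo(1, 2)
	b.lineTo(3, 4)
	p := b.end()
	fmt.Printf("%d bytes, hash %#x\n", len(p.data), p.hash)
}
```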
Signed-off-by: Elias Naur <mail@eliasnaur.com>
To re-use previously cached layers, the compute renderer must know
whether two drawing operations are equal. If two operations are not
equal, a fast hash comparison will most likely tell them apart. If two
operations are equal and have complicated clipping paths, the comparison
of the path data is expensive.
This change adds support for fast ops.Key comparisons, where two paths
are equal if their ops.Keys are equal. This is an optimization that kicks in
for text rendering, where glyph clipping shapes are re-used across
frames.
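The resulting comparison order can be sketched as follows. The opKey
type is a hypothetical stand-in for ops.Key and its fields are
assumptions; the point is only that the expensive byte comparison runs
last.

```go
package main

import (
	"bytes"
	"fmt"
)

// opKey is a hypothetical stand-in for ops.Key: it identifies where a path
// was recorded, not what the path contains.
type opKey struct {
	list    int // identity of the ops list that recorded the path
	pc      int // position within that list
	version int // changes whenever the list is reset
}

type clipPath struct {
	key  opKey
	hash uint64
	data []byte
}

// pathsEqual tries the cheap checks first: differing hashes prove inequality,
// equal non-zero keys prove equality, and only then is the data compared.
func pathsEqual(a, b clipPath) bool {
	if a.hash != b.hash {
		return false
	}
	if a.key != (opKey{}) && a.key == b.key {
		return true
	}
	return bytes.Equal(a.data, b.data)
}

func main() {
	glyph := clipPath{key: opKey{list: 1, pc: 16, version: 3}, hash: 42, data: []byte{1, 2, 3}}
	fmt.Println(pathsEqual(glyph, glyph))
}
```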
Signed-off-by: Elias Naur <mail@eliasnaur.com>
To re-use drawing operations common to two layers, every operation must
exactly match, including their transformations. However, layers that
differ only by an integer offset can be re-used because rendering does
not depend on the absolute integer offset. This is important in the very
common case of scrolling otherwise static UI content.
This change separates the integer offset from drawing operations and
relaxes the layer cache to match layers that differ only in integer
offsets.
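One way to picture the separation is to split an offset into its integer
and fractional parts and match layers on the fractional part only. This
is a sketch under that assumption; splitOffset is a hypothetical helper
and the renderer may separate the offset differently.

```go
package main

import (
	"fmt"
	"math"
)

// splitOffset separates an offset into an integer part, which does not
// affect how content is rasterized, and a fractional part in [0, 1), which
// does. The name is hypothetical.
func splitOffset(x, y float32) (ix, iy int, fx, fy float32) {
	flx := math.Floor(float64(x))
	fly := math.Floor(float64(y))
	return int(flx), int(fly), x - float32(flx), y - float32(fly)
}

func main() {
	// Scrolling by 100 pixels changes only the integer part, so a layer
	// rendered at the first offset can be re-used at the second.
	_, iy0, _, fy0 := splitOffset(0, 10.25)
	_, iy1, _, fy1 := splitOffset(0, 110.25)
	fmt.Println(iy0, fy0, iy1, fy1)
}
```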
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The compute renderer is more expensive to run than the old renderer on
low-end GPUs, and even more so on CPUs. To ensure good performance
regardless of the end-user device, this change implements automatic
re-use of content rendered in the frame before the current.
The basic idea is that every drawing operation (PaintOp), along with its
transform and clipping, can be hashed and efficiently looked up. A naïve
caching approach is then to rasterize every operation to separate
sections of several large texture atlases, turning a cache hit into a
very cheap texture copy.
However, for scenes with lots of overlapping operations, the resulting
texture memory from separating the operations would be much larger than
the memory for just the window framebuffer.
So instead of caching individual operations, this change caches layers,
which are sequences of drawing operations. It starts by putting all
operations into a single layer. Then, if the subsequent frame re-uses a
sub-sequence of that larger layer, it is split.
For example, consider a UI similar to the kitchen sample:
    Hello, Gio
    <Editor>
    <Line Editor>
    <Button> <Button> <Button>
    <ProgressBar>
    <Checkbox> <Toggle>
In the first frame, all of the drawing operations comprising the UI will
be stored and cached in a single layer. In the second frame the
progress bar will have moved and the renderer splits the UI into three
layers: layer A for everything up to (but not including) the progress
bar, layer B with just the progress bar, and layer C for the rest. Note
that nothing has been re-used yet. In the third frame, the progress bar
moves again, and this time layers A and C can be copied from the cache;
only the progress bar needs redrawing through the compute programs.
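The splitting step in the example can be sketched with per-operation
hashes. This is a deliberately simplified illustration: it compares the
two frames position by position, so it does not handle inserted or
removed operations, and the names layer and splitLayers are
hypothetical.

```go
package main

import "fmt"

// layer is a half-open range [start, end) of operation indices.
type layer struct{ start, end int }

// changed reports whether operation i of the current frame differs from the
// operation at the same position in the previous frame.
func changed(prev, cur []uint64, i int) bool {
	return i >= len(prev) || prev[i] != cur[i]
}

// splitLayers groups the current frame's operations into maximal runs that
// are either entirely unchanged or entirely changed since the previous frame.
func splitLayers(prev, cur []uint64) []layer {
	var layers []layer
	start := 0
	for i := 1; i <= len(cur); i++ {
		if i == len(cur) || changed(prev, cur, i) != changed(prev, cur, i-1) {
			layers = append(layers, layer{start, i})
			start = i
		}
	}
	return layers
}

func main() {
	frame1 := []uint64{1, 2, 3, 4, 5} // operation 3 stands for the progress bar
	frame2 := []uint64{1, 2, 9, 4, 5} // the progress bar has moved
	fmt.Println(splitLayers(frame1, frame2)) // layers A, B and C
}
```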
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The performance difference is negligible, but is useful when the compute
pipeline can skip rendering to empty tiles.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Until now, the two renderers have shared structures and code for
decoding drawing ops and converting them to GPU-friendly structures.
However, the decoder is tailored to the old renderer and uses
structures that poorly fit the new compute renderer.
This change copies the decoder and specializes the copy for the compute
renderer, avoiding a round-trip through the old renderer decoder.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The compute renderer doesn't run on Windows yet, but the d3d11 backend needs
the method to satisfy the driver interface.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The transformation information in ops.Key is a layer violation.
Introduce a key type specific to package gpu and use that instead.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
For some reason, commit d331f63d20 didn't
update the generated code for material.vert properly. The outdated
version is equivalent to the new, so I only discovered this discrepancy
while changing some other shader.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
To ease the integration with foreign OpenGL contexts, carefully save the
context state before rendering a frame and restore it afterwards. Gio
rendering can then be mixed with OpenGL code that expects exclusive
control over context state.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
This change moves the macOS-specific setup for desktop OpenGL to the
portable opengl package. The opengl package already takes care of the
desktop OpenGL setup for sRGB framebuffers, and by moving the code we
avoid calling the wrong OpenGL functions in case both OpenGL.framework
and ANGLE libGLESv2.dylib are linked into the program.
Remove the interface casting expressions for gl.Functions; it wasn't
worth the trouble to keep updated.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Desktop OpenGL implements a GL_FRAMEBUFFER_SRGB setting; query that instead
of the framebuffer color encoding.
With this change it is no longer necessary to enable FRAMEBUFFER_SRGB
in the macOS setup; remove it.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
macOS is the only platform where desktop OpenGL is used. To support
foreign ANGLE contexts add a setting to override Gio's selection of
OpenGL implementation.
The bulk of this change is making all function pointers per-context
instead of global, and loading the OpenGL library dynamically. As a side
effect we're closer to Gio tolerating a platform without any OpenGL
implementation. For example, Apple has deprecated OpenGL and OpenGL ES
on its platforms and may remove them in the future.
Note that as a side-effect of this change, Gio needs Go 1.16 or newer to
run on iOS.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Programs such as gio-example/glfw rely on Gio drawing blending with
the framebuffer background. This change makes that blending work when
sRGB emulation is active.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The clear background is the most useful, and the old behaviour can
be achieved by filling the entire viewport with a white paint.ColorOp.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Collect is for converting ops to GPU commands, Frame is for actual
rendering. There's little practical difference, but the split makes it
easier to distinguish between conversion and rendering when profiling.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The current renderer transforms and processes paths before sending them
to the GPU. It can compute bounds during processing.
The new renderer passes paths verbatim to the GPU, but needs the bounds
for constructing clip bounds.
This change computes the bounds during construction, so they are available
at use. As a bonus for storing the bounds with the path, path caches
(such as for storing text fragments) automatically reuse the bounds
calculations as well.
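A minimal sketch of tracking bounds while a path is built, assuming a
path made of straight segments only (curves would also need their
control points or extrema considered). The types are hypothetical and
not the op/clip ones.

```go
package main

import "fmt"

type point struct{ X, Y float32 }

type rect struct{ Min, Max point }

// pathBuilder is a hypothetical builder that grows a bounding box as
// points are added, so the bounds are ready when the path is used.
type pathBuilder struct {
	pts    []point
	bounds rect
}

// lineTo appends a point and expands the bounds to include it.
func (b *pathBuilder) lineTo(p point) {
	if len(b.pts) == 0 {
		b.bounds = rect{Min: p, Max: p}
	} else {
		if p.X < b.bounds.Min.X {
			b.bounds.Min.X = p.X
		}
		if p.Y < b.bounds.Min.Y {
			b.bounds.Min.Y = p.Y
		}
		if p.X > b.bounds.Max.X {
			b.bounds.Max.X = p.X
		}
		if p.Y > b.bounds.Max.Y {
			b.bounds.Max.Y = p.Y
		}
	}
	b.pts = append(b.pts, p)
}

func main() {
	var b pathBuilder
	b.lineTo(point{1, 5})
	b.lineTo(point{-2, 3})
	b.lineTo(point{4, 0})
	fmt.Println(b.bounds)
}
```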
Signed-off-by: Elias Naur <mail@eliasnaur.com>
GPU operations logically belong in the Frame method, and it's probably
best to keep them inside BeginFrame/EndFrame as well.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
GPU APIs require that barrier() calls are dynamically uniform: for
every barrier in the code, either all shader invocations in a workgroup
call it or none do.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The reflected uniform names are for the shader versions that don't use uniform
buffer objects. For UBO shaders, the names won't resolve.
This change adds a panic when shader uniforms are not found, and fixes
Fixes gio#216
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The recent changes to the compute shaders have fixed all errors
previously reported by fxc. Switch from dxc to fxc to target shader
model 5.0, supported by Direct3D 11.
Because we know dxc must be available, always build compute shaders even
though the result is not yet used.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Apparently, exec.Command.Output waits for winepath's grandchildren to
exit. However, that may take several seconds if wineserver was started
by winepath.
exec.Command.StdoutPipe works better, in that it is closed when the
winepath process exits.
A similar change may help run the fxc.exe tool under Wine, if that ever
turns out to have the same problem.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Wine tools can be slow to run, so it makes sense to batch their use.
Fortunately, winepath supports resolving multiple paths in one
execution.
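A sketch of the batching, assuming winepath prints one converted path per
line in the order the arguments were given. The helpers winepathArgs and
pairOutput are hypothetical, and actually running the tool is left out
so the example stays independent of a Wine installation.

```go
package main

import (
	"fmt"
	"strings"
)

// winepathArgs builds the arguments for a single winepath run that converts
// all paths at once. The -w flag asks for Windows paths.
func winepathArgs(paths []string) []string {
	return append([]string{"-w"}, paths...)
}

// pairOutput matches each input path with the corresponding output line.
// It assumes one line of output per input path, in the same order.
func pairOutput(paths []string, out string) (map[string]string, error) {
	res := make(map[string]string, len(paths))
	if len(paths) == 0 {
		return res, nil
	}
	lines := strings.Split(strings.TrimRight(out, "\r\n"), "\n")
	if len(lines) != len(paths) {
		return nil, fmt.Errorf("winepath: got %d lines for %d paths", len(lines), len(paths))
	}
	for i, p := range paths {
		res[p] = strings.TrimRight(lines[i], "\r")
	}
	return res, nil
}

func main() {
	paths := []string{"/tmp/a.hlsl", "/tmp/b.hlsl"}
	fmt.Println(winepathArgs(paths))
	// Example output as it might be printed by one winepath execution.
	fmt.Println(pairOutput(paths, "Z:\\tmp\\a.hlsl\nZ:\\tmp\\b.hlsl\n"))
}
```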
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Literal strings are more compact than literal byte slices. A future
change will switch to go:embed to save even more space.
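To illustrate the size difference in generated source, the sketch below
renders the same bytes both ways. The helper names are hypothetical and
the actual generator may format its output differently.

```go
package main

import (
	"fmt"
	"strconv"
)

// asStringLiteral renders data as a quoted Go string literal.
func asStringLiteral(data []byte) string {
	return strconv.Quote(string(data))
}

// asByteSliceLiteral renders data as a Go []byte composite literal.
func asByteSliceLiteral(data []byte) string {
	return fmt.Sprintf("%#v", data)
}

func main() {
	data := []byte{0x00, 0x01, 0x02, 0xff}
	s, b := asStringLiteral(data), asByteSliceLiteral(data)
	fmt.Println(len(s), s)
	fmt.Println(len(b), b)
}
```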
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Before this change, the two renderers both had special case code for
approximating strokes they don't support natively. This change moves
that conversion to clip.Op.Add, for several reasons:
- The compute renderer no longer needs fallback logic and caches for
strokes it doesn't support.
- The approximation logic is slow. Moving it to clip.Op.Add will not
speed it up, but will make the cost easier to spot in profiles. Until all
strokes are supported natively, users can use macros to cache
expensive strokes.
- Reduced garbage: Op.Add takes an op.Ops anyway, and can use that for
storing the approximated stroke outline.
Signed-off-by: Elias Naur <mail@eliasnaur.com>