Uses native hardware MSAA x4.
Full multisampling support, including custom blend and composition operations.
Must still be verified on web and at 4K resolution for performance issues.
Deep shader refactoring with the following goals:
* use a pre-calculated gradient texture instead of per-pixel gradient map computation
* use hardware wrap sampling for the fill spread setting
* unify gradient shader types
* use a single shader module for composition instead of one module per composition type
* use a single shader module for blending per fill type (solid, gradient, image) instead of one module per blend type (see the sketch after this list)
* make it much easier to add new composition and blend equations
* remove std::string usage
* make the shader code more readable
* remove bind group creation at render time - performance boost
* decompose blend and composition shaders - performance boost
* generalize shader modules and pipeline layouts - less memory usage
* share a single stencil buffer - less memory usage
* simplify bind group usage
* simplify and generalize the general context API
* move all rendering logic into a new composition class
* ready for hardware MSAA (in next steps)
* ready for direct mask application (in next steps)
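A minimal sketch of the unified blending idea, assuming the blend method is selected by a uniform index inside one WGSL module embedded as a C++ string (the function, case values, and binding layout are illustrative, not the actual shader):

```cpp
// Hypothetical sketch: a single WGSL module that switches on a uniform blend
// index, instead of compiling one shader module per blend method.
static const char* fsBlendSketch = R"(
@group(0) @binding(0) var<uniform> blendMethod : u32;

fn blend(src : vec4f, dst : vec4f) -> vec4f {
    switch blendMethod {
        case 0u: { return src + dst * (1.0 - src.a); } // SrcOver
        case 1u: { return src * dst; }                 // Multiply
        case 2u: { return min(src, dst); }             // Darken
        default: { return src; }
    }
}
)";
```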
[issues 1479: ClipPath](#1479)
Supports ClipPath composition.
Clip path is the only composition type that does not ignore the blend method.
Clip path combines the composition approach and the blend approach using a compute shader.
Skips shape rendering if the opacity is 0 and if the fill color of the shape and its strokes also equals 0.
This behavior matches the sw renderer and fixes visual artifacts in the referenced animations.
This rule also fixes composition results for the AlphaMask and InvAlphaMask methods.
Changes the public API of the webgpu canvas interface to provide the necessary handles to the native window on different platforms:
API Change:
- Result target(void *instance, void *surface, uint32_t w, uint32_t h) noexcept;
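A hedged usage sketch of the new entry point; how the instance and surface handles are obtained is platform-specific and not shown, and the setup function here is illustrative:

```cpp
// Hypothetical usage sketch: pass native webgpu handles into the canvas.
#include <thorvg.h>

void setupCanvas(tvg::WgCanvas* canvas, void* wgpuInstance, void* wgpuSurface)
{
    // w/h are the surface dimensions in pixels
    auto result = canvas->target(wgpuInstance, wgpuSurface, 800, 600);
    if (result != tvg::Result::Success) {
        // handle initialization failure
    }
}
```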
When only the stencil buffer is used, color target writes must be turned off. Otherwise the color buffer is filled with an unexpected color when the fragment shader does not return a value.
This happens with the OpenGL implementation of webgpu, which is used on Linux.
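For reference, a sketch of disabling color writes with the standard webgpu.h C API (the format value is illustrative):

```cpp
#include <webgpu/webgpu.h>

// When rendering only to the stencil buffer, mask out all color channels so
// the color attachment stays untouched even if the fragment stage writes.
WGPUColorTargetState colorTarget{};
colorTarget.format    = WGPUTextureFormat_RGBA8Unorm;
colorTarget.blend     = nullptr;                 // no blending needed
colorTarget.writeMask = WGPUColorWriteMask_None; // disable color writes
```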
Before:
After:
[issues 1479: Fill Rule](#1479)
This solution avoids silhouette detection, which is not a cheap operation (it roughly halves performance).
For the nonzero winding rule, you can select separate stencil operations for front and back faces (increment for front, decrement for back).
After rendering the fan, the value in the stencil buffer will be the winding number. You can fill the appropriate portion by rendering a screen-sized quad with stencil testing enabled and the stencil function set according to which winding rule you wish to use.
For even-odd, you don't need to distinguish front and back faces; you can just use INVERT as the operation.
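A sketch of the corresponding depth-stencil configuration in the webgpu.h C API (the exact format and compare values are illustrative):

```cpp
#include <webgpu/webgpu.h>

WGPUDepthStencilState ds{};
ds.format = WGPUTextureFormat_Stencil8;

// Nonzero winding: increment on front faces, decrement on back faces.
ds.stencilFront.compare     = WGPUCompareFunction_Always;
ds.stencilFront.failOp      = WGPUStencilOperation_Keep;
ds.stencilFront.depthFailOp = WGPUStencilOperation_Keep;
ds.stencilFront.passOp      = WGPUStencilOperation_IncrementWrap;
ds.stencilBack              = ds.stencilFront;
ds.stencilBack.passOp       = WGPUStencilOperation_DecrementWrap;

// Even-odd: face direction is irrelevant, just invert on every fragment:
// ds.stencilFront.passOp = ds.stencilBack.passOp = WGPUStencilOperation_Invert;

// The cover quad is then drawn with compare = WGPUCompareFunction_NotEqual
// and reference 0, so only pixels with a nonzero winding count are filled.
```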
[issues 1479: Masking, InvMasking, LumaMasking, InvLumaMasking](#1479)
Computes composition and blending in a single pass instead of two full-screen passes.
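A minimal WGSL sketch of the fused pass, embedded as a C++ string (bindings, formats, and the SrcOver equation shown are illustrative; the actual shader covers the full set of mask and blend methods):

```cpp
// Hypothetical sketch of the fused pass: one compute dispatch reads the
// source, the mask, and the destination, and writes the masked, blended
// result directly, instead of a masking pass followed by a blending pass.
static const char* csComposeBlendSketch = R"(
@group(0) @binding(0) var src    : texture_2d<f32>;
@group(0) @binding(1) var mask   : texture_2d<f32>;
@group(0) @binding(2) var dstIn  : texture_2d<f32>;
@group(0) @binding(3) var dstOut : texture_storage_2d<rgba8unorm, write>;

@compute @workgroup_size(8, 8)
fn cs_main(@builtin(global_invocation_id) id : vec3u) {
    let s = textureLoad(src, id.xy, 0) * textureLoad(mask, id.xy, 0).a;
    let d = textureLoad(dstIn, id.xy, 0);
    textureStore(dstOut, id.xy, s + d * (1.0 - s.a)); // SrcOver as an example
}
)";
```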
[issues 1479: lottie](#1479)
- optimize the stroking algorithm by avoiding vector normalizations
- use render mesh heaps to reduce webgpu instance re-creation (sketched after this list)
- use a single polyline instance for standard operations such as trim and split
- generate strokes with dash patterns on the fly
- generate mesh objects on the fly while decoding paths
- merge strokes into a single mesh per shape
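One way to read the mesh-heap idea, as a minimal reuse-pool sketch (types and names are illustrative, not the actual implementation):

```cpp
#include <vector>
#include <cstddef>

// Hypothetical sketch: reuse previously allocated meshes instead of
// re-creating webgpu buffer instances every frame.
struct Mesh { /* vertex/index buffers, counts, ... */ };

class MeshHeap {
    std::vector<Mesh*> pool;
    size_t used = 0;
public:
    Mesh* allocate() {
        if (used == pool.size()) pool.push_back(new Mesh());
        return pool[used++];       // reuse an existing instance if possible
    }
    void reset() { used = 0; }     // frame boundary: everything is reusable
    ~MeshHeap() { for (auto m : pool) delete m; }
};
```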
[issues 1479: lottie](#1479)
To optimize blend operations, the hardware pipeline blend stage is used for some blend methods:
- BlendMethod::SrcOver
- BlendMethod::Normal
- BlendMethod::Add
- BlendMethod::Multiply
- BlendMethod::Darken
- BlendMethod::Lighten
For the other blend types, compute shaders are used; the fixed-function mapping is sketched below.
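A sketch of how these methods can map onto WGPUBlendState components in the webgpu.h C API, assuming premultiplied alpha (the mapping is illustrative):

```cpp
#include <webgpu/webgpu.h>

// Hypothetical mapping of the hardware-blended methods onto fixed-function
// blend components (operation, srcFactor, dstFactor). BlendMethod::Normal
// can reuse the SrcOver component; min/max require factors of One.
WGPUBlendComponent srcOver { WGPUBlendOperation_Add,
                             WGPUBlendFactor_One,
                             WGPUBlendFactor_OneMinusSrcAlpha };
WGPUBlendComponent add     { WGPUBlendOperation_Add,
                             WGPUBlendFactor_One,
                             WGPUBlendFactor_One };
WGPUBlendComponent multiply{ WGPUBlendOperation_Add,
                             WGPUBlendFactor_Dst,
                             WGPUBlendFactor_Zero };
WGPUBlendComponent darken  { WGPUBlendOperation_Min,
                             WGPUBlendFactor_One,
                             WGPUBlendFactor_One };
WGPUBlendComponent lighten { WGPUBlendOperation_Max,
                             WGPUBlendFactor_One,
                             WGPUBlendFactor_One };
```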
[issues 1479: antialiasing](#1479)
Implements antialiasing as a post-process in compute shaders over an original render target rendered at 2x scale.
The scale factor can be made configurable through an external setting.
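Conceptually, the resolve is a 2x2 box downsample of the 2x render target; a CPU reference of that filter, assuming 8-bit RGBA pixels (the real version runs in a compute shader):

```cpp
#include <cstdint>

// Hypothetical CPU reference of the antialiasing resolve: average each
// 2x2 block of the 2x-scaled render target into one output pixel.
void downsample2x(const uint32_t* src, uint32_t srcW,
                  uint32_t* dst, uint32_t dstW, uint32_t dstH)
{
    for (uint32_t y = 0; y < dstH; ++y) {
        for (uint32_t x = 0; x < dstW; ++x) {
            uint32_t sum[4] = {0, 0, 0, 0};
            for (uint32_t sy = 0; sy < 2; ++sy)
                for (uint32_t sx = 0; sx < 2; ++sx) {
                    uint32_t p = src[(y * 2 + sy) * srcW + (x * 2 + sx)];
                    for (int c = 0; c < 4; ++c) sum[c] += (p >> (c * 8)) & 0xff;
                }
            uint32_t out = 0;
            for (int c = 0; c < 4; ++c) out |= ((sum[c] / 4) & 0xff) << (c * 8);
            dst[y * dstW + x] = out;
        }
    }
}
```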
Before these changes, all surfaces were painted using a full-screen overlay, no matter how large the rendered object was. This approach was redundant and required reorganization. Now each object is rendered using an overlay equal to the object's own bounding box, which reduces the cost of filling the surface (see the sketch below).
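A minimal sketch of the per-object overlay, assuming a pixel-space bounding box converted to NDC quad vertices (the types and names are illustrative):

```cpp
// Hypothetical sketch: build an overlay quad covering only the object's
// bounding box (in pixels) instead of the whole screen, in NDC coordinates.
struct BBox { float x, y, w, h; };

void bboxToNDC(const BBox& b, float screenW, float screenH, float out[8])
{
    float x0 = b.x / screenW * 2.0f - 1.0f;
    float y0 = 1.0f - b.y / screenH * 2.0f;
    float x1 = (b.x + b.w) / screenW * 2.0f - 1.0f;
    float y1 = 1.0f - (b.y + b.h) / screenH * 2.0f;
    // triangle-strip order: (x0,y0) (x1,y0) (x0,y1) (x1,y1)
    out[0] = x0; out[1] = y0; out[2] = x1; out[3] = y0;
    out[4] = x0; out[5] = y1; out[6] = x1; out[7] = y1;
}
```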
Surfaces and images were also split into separate entities, which reduces memory pressure.
Geometry data for rendering and geometry data for calculations in system memory were also logically separated.
For further feature development, we need to create off-screen buffers that will allow us to implement functionality related to composition and blending, as well as to load data from the framebuffer back into system memory. Separating the framebuffer into its own entity makes it possible to create several instances, switch between them, and blend them according to given rules.
At the moment there is only a single render target instance, which holds a handle for drawing into a surface, such as a native window.
The new approach allows (offscreen target creation is sketched below):
- offscreen rendering
- render pass handling
- switching between render targets
- rendering images, strokes, and shapes into independent render targets
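A sketch of creating such an offscreen color target with the webgpu.h C API (the descriptor values are illustrative):

```cpp
#include <webgpu/webgpu.h>

// Hypothetical sketch: an offscreen color target that can be rendered to
// and later sampled (or copied back to system memory) for composition.
WGPUTexture createOffscreenTarget(WGPUDevice device, uint32_t w, uint32_t h)
{
    WGPUTextureDescriptor desc{};
    desc.dimension     = WGPUTextureDimension_2D;
    desc.format        = WGPUTextureFormat_RGBA8Unorm;
    desc.size          = { w, h, 1 };
    desc.mipLevelCount = 1;
    desc.sampleCount   = 1;
    desc.usage         = WGPUTextureUsage_RenderAttachment |
                         WGPUTextureUsage_TextureBinding |
                         WGPUTextureUsage_CopySrc;
    return wgpuDeviceCreateTexture(device, &desc);
}
```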