Changes:
- Prepare a NEON version of the ALPHA_BLEND API.
- Use ALPHA_BLEND_NEON in _translucentRle
Notes:
- _translucentRle with NEON support speeds up this function
by about 300 % (measured on a uint32_t 400 x 400 buffer).
- The API was tested on an ARMv7l device with a GCC 9.2 based toolchain.
Results on other devices may differ.
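As a rough illustration of the idea (not the code from this patch), the sketch below scales every 8-bit channel of a run of 32-bit pixels by an alpha value, handling two pixels per NEON operation and falling back to scalar code on other targets. The name scaleByAlpha is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: multiplies every 8-bit channel of `len` 32-bit pixels
// by a / 256 (the cheap ">> 8" approximation), in place.
inline void scaleByAlpha(uint32_t* px, size_t len, uint8_t a)
{
    assert(px || len == 0);
    auto* bytes = reinterpret_cast<uint8_t*>(px);
    size_t i = 0;
#if defined(__ARM_NEON)
    const uint8x8_t va = vdup_n_u8(a);
    // 8 channels (= 2 pixels) per iteration.
    for (; i + 2 <= len; i += 2) {
        uint8x8_t c = vld1_u8(bytes + i * 4);
        uint16x8_t t = vmull_u8(c, va);              // widen: 8 x (c * a)
        vst1_u8(bytes + i * 4, vshrn_n_u16(t, 8));   // narrow: (c * a) >> 8
    }
#endif
    // Scalar path: the whole buffer without NEON, the odd tail pixel with NEON.
    for (size_t b = i * 4; b < len * 4; ++b) {
        bytes[b] = static_cast<uint8_t>((bytes[b] * a) >> 8);
    }
}
```

Both paths compute the same per-channel result, so the scalar loop can serve as a reference when measuring the NEON path on a given device.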
The size() method did not work for pixel-based image pictures.
Obviously, we missed implementing it properly.
This patch corrects it.
@Issue: https://github.com/Samsung/thorvg/issues/656
Changes:
- Added 'neon' vector option in build system
- Introduced a NEON version of the rasterRGBA32 API, which improves
the speed of the function on ARM CPUs by around 35%
Changed the alpha channel data type from 8 bits to 32 bits,
since subsequent data operations require 32-bit values.
Using 8 bits (the channel range only goes up to 255) doesn't help
to save memory because the compiler generates additional data casts.
I compared the binary sizes and this patch saves about 600 bytes.
Calculation accuracy in the ALPHA_BLEND function has been
improved. Until now, blending resulted in a slight hue change
(all color channels were affected). The chosen method of calculation
is a compromise between accuracy and performance.
We have encountered a multi-threading use case in which the user creates
multiple canvases owned by multiple user threads.
The current sw_engine memory pool has only been designed for the threads
spawned by the tvg task scheduler.
That case is safe, but once user threads are introduced, a race condition can occur.
Thus, here is a renewed policy: a non-threading tvg (initialized with zero threads)
can take care of multiple user threads by changing its memory pool policy,
so that each canvas has an individual memory pool to guarantee mutual exclusion.
@API additions
enum MempoolPolicy
{
Default = 0, ///< Default behavior that ThorVG is designed to.
Shareable, ///< Memory Pool is shared among the SwCanvases.
Individual ///< Allocate designated memory pool that is only used by current instance.
};
Result SwCanvas::mempool(MempoolPolicy policy) noexcept;
All in all, if you drive canvases from multiple user threads, set the memory pool policy to Individual.
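The difference between the policies can be pictured with a small stand-alone model. This is only a sketch: CanvasModel and Mempool are hypothetical stand-ins, not tvg::SwCanvas or the engine's real pool, and treating Default like Shareable is an assumption made for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Stand-in for the engine's scratch memory; hypothetical, for illustration only.
struct Mempool { std::vector<uint8_t> scratch; };

enum class MempoolPolicy { Default = 0, Shareable, Individual };

class CanvasModel   // hypothetical model, not tvg::SwCanvas
{
public:
    // Switch between the process-wide pool and a pool owned by this canvas.
    void mempool(MempoolPolicy policy)
    {
        if (policy == MempoolPolicy::Individual) {
            if (!own) own = std::make_unique<Mempool>();
        } else {
            own.reset();   // assumption: Default behaves like Shareable here
        }
    }

    // The pool this canvas would rasterize with.
    Mempool* pool() { return own ? own.get() : &shared(); }

private:
    static Mempool& shared() { static Mempool p; return p; }
    std::unique_ptr<Mempool> own;
};
```

With Individual set on every canvas, no two canvases hand out the same pool, so user threads that each own one canvas never touch the same scratch memory. With the API above, the corresponding call would presumably be canvas->mempool(MempoolPolicy::Individual) on each canvas.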
In the fillFetchLinear function, the offset parameter was removed.
The destination address can be shifted directly in the dst parameter,
so the offset doesn't need to be passed separately.
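The calling pattern after such a change might look like the sketch below. fetchSpan and fillRow are hypothetical names, and a constant color stands in for the gradient fetch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a span fetcher: writes `len` pixels of a constant
// color starting at `dst`. There is no separate offset parameter.
inline void fetchSpan(uint32_t* dst, uint32_t len, uint32_t color)
{
    assert(dst || len == 0);
    for (uint32_t i = 0; i < len; ++i) dst[i] = color;
}

// The caller shifts the destination itself: write `len` pixels from column `x`.
inline void fillRow(uint32_t* row, uint32_t x, uint32_t len, uint32_t color)
{
    fetchSpan(row + x, len, color);
}
```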
Previously, the alpha multiplication did not have perfect precision:
it could lose 1 level per channel since it divides values in the 0-255 range by 256.
The improved operation satisfies both precision and performance.
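The precision loss is easy to reproduce on a single channel. The sketch below compares the plain shift with one common correction and with an exact reference. The biased variant is an assumption chosen for illustration, not necessarily the formula used in this patch.

```cpp
#include <cassert>
#include <cstdint>

// Approximation that treats 255 as if it were 256: full alpha darkens slightly.
inline uint8_t mulShift(uint8_t c, uint8_t a)
{
    return static_cast<uint8_t>((c * a) >> 8);
}

// One common correction (an assumption, not necessarily the one in this patch):
// bias alpha by one so that a == 255 maps to a multiplier of exactly 256.
inline uint8_t mulShiftBiased(uint8_t c, uint8_t a)
{
    return static_cast<uint8_t>((c * (a + 1)) >> 8);
}

// Exact reference with rounding, at the cost of a division.
inline uint8_t mulExact(uint8_t c, uint8_t a)
{
    return static_cast<uint8_t>((c * a + 127) / 255);
}
```

For an opaque white channel, mulShift(255, 255) yields 254, while the biased and exact variants both yield 255.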
If a ClipPath is a single rectangle,
we don't need to apply it to all child nodes to adjust their rle span regions.
Instead of that regular sequence,
we can adjust the render region by merging it into a viewport that is introduced internally.
All in all,
if a Paint has a single ClipPath that is a Rectangle,
it sets the viewport to the Rectangle area, and that viewport is applied by
the raster engine to cut off the rendering boundary.
In the normal case the effect is trivial,
but with SVGs that have a viewbox, it could increase the performance
by up to 10% (profiled with 200 SVGs rendered at the same time).
Note that this won't be applied if the Paint has an affine or rotation transform.
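Merging a rectangular clip into a viewport amounts to a rectangle intersection. A minimal sketch, assuming a hypothetical Box type with integer coordinates:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct Box { int32_t x, y, w, h; };   // hypothetical axis-aligned region

// Intersect the current viewport with a rectangular clip. An empty result
// (w == 0 or h == 0) means nothing needs to be rasterized.
inline Box mergeViewport(const Box& vp, const Box& clip)
{
    assert(vp.w >= 0 && vp.h >= 0 && clip.w >= 0 && clip.h >= 0);
    int32_t x1 = std::max(vp.x, clip.x);
    int32_t y1 = std::max(vp.y, clip.y);
    int32_t x2 = std::min(vp.x + vp.w, clip.x + clip.w);
    int32_t y2 = std::min(vp.y + vp.h, clip.y + clip.h);
    return {x1, y1, std::max<int32_t>(0, x2 - x1), std::max<int32_t>(0, y2 - y1)};
}
```

This only works while the clip stays axis-aligned, which matches the note that the shortcut is skipped for transformed Paints.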
@Issues: 294
Description:
A crash was observed in the examples when a composite object was used.
It was caused by using an __m256i object on memory that was not
32-byte aligned. The algorithm in this function was changed to use
the unaligned __m256i_u type. The code was also simplified.
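The sketch below shows an alignment-safe fill. Note that it uses the unaligned store intrinsic _mm256_storeu_si256 rather than the __m256i_u type mentioned above, and that fillRGBA32 is a hypothetical name. Without AVX enabled at compile time, only the scalar loop is built.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical fill routine: writes `val` to `len` pixels starting at `dst`.
// `dst` only needs the natural 4-byte alignment of uint32_t.
inline void fillRGBA32(uint32_t* dst, uint32_t val, size_t len)
{
    assert(dst || len == 0);
    size_t i = 0;
#if defined(__AVX__)
    const __m256i v = _mm256_set1_epi32(static_cast<int>(val));
    // 8 pixels per store; storeu has no 32-byte alignment requirement,
    // whereas dereferencing an __m256i* (or _mm256_store_si256) does.
    for (; i + 8 <= len; i += 8) {
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst + i), v);
    }
#endif
    for (; i < len; ++i) dst[i] = val;   // tail, or everything without AVX
}
```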
* sw_engine: adding a gradient as a stroke feature
Just as a shape may have a gradient fill, so can a stroke.
* Capi: adding APIs for a gradient stroke
Co-authored-by: Hermet Park <hermetpark@gmail.com>
Added a routine that draws non-transformed translucent images.
Composition images will use this routine to draw faster.
Also added comments on optimization points in raster image.
Renamed internal interfaces.
We need both blender & compositor interfaces.
Renamed SwCompositor -> SwBlender which is for pixel joining methods.
Added (SwCompositor, Compositor) which is designed for compositing images.
Introduce RendererMethod::renderRegion() to return the actual drawing region info.
It is used by scene composition to composite only the actual partial drawing region
for better performance.
@Issues: 173
A 1-pixel misalignment issue was observed.
It turned out that transform() always added 0.5 pt.
The transform() logic was broken by this change - e00f948705
and is now restored to the original.
* sw_engine raster: code refactoring & optimization.
1. moved computations out of loops where possible.
2. renamed internal variables & function prototypes.
Add a RawLoader class that loads and displays raw images,
and add a rasterizer for image data.
Image data can be loaded via a picture.
A loaded image supports composition, transformation and alpha blending.
New API
Result load(uint32_t* data, uint32_t width, uint32_t height, bool isCopy) noexcept;
We introduced a shared memory pool to avoid reallocating memory
while processing stroke outlines. The outline data is increased as needed,
and if we reuse the allocated memory for multiple shape strokes,
we don't need to alloc/free memory during the process.
This shared outline memory is allocated per thread,
so that tasks don't interfere with each other's memory access.
@Issues: 75
Shape data was not updated when visibility changed from false to true via alpha.
Also, the engine shape data needs to be updated on every request.
The following scenario is allowed:
1. update shape
2. change shape property
3. update shape
4. draw
Previously the engine could skip step 3, so the result was not as expected.
@fix #84
common sw_engine: Implement ClipPath feature
A Paint object can be composited by using the composite API.
A ClipPath composite clips in units of a paint's path.
The following cases are supported.
Shape->composite(Shape);
Scene->composite(Shape);
Picture->composite(Shape);
Add enum
enum CompMethod { None = 0, ClipPath };
Add APIs
Result composite(std::unique_ptr<Paint> comp, CompMethod method) const noexcept;
* Example: Added testClipPath
We should avoid pulling code in through file dependencies,
such as #include "xxx.h" where the header contains implementations.
This can increase the binary size, so we avoid it where possible.
The current patch improves the binary size as follows:
From: file(2059008) = text(120360) data(8096) bss(80) dec(128536)
To : file(1921832) = text(118429) data(7872) bss(56) dec(126357)
More patches will follow to optimize the binary size.
Now we have 2 points of asynchronous behavior.
1. update shapes:
Each shape update is performed asynchronously when you push the shape to the canvas.
That means, if there is a time gap between update and rendering in the process main loop,
you can benefit from this.
2. rasterization by canvas:
Canvas.draw() is performed asynchronously until you call canvas.sync().
That means, if you can trigger tvg rendering earlier than composition time,
you can benefit from this.
If points 1 and 2 do not work for your program,
you can simply turn async off by setting the thread count to zero at initialization.
If you can apply either of them to your program,
it may be good for performance.
But the best approach is to make both async properly.
Though this may need fine-grained integration tuning between your program & tvg,
you can achieve the best performance by parallelizing tasks as much as possible without delaying any jobs.
Change-Id: I04f9a61ebb426fd897624f5b24c83841737e6b5b
Actually, the Dali rendering system requires abgr8888.
We could add more colorspaces if it's necessary.
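For reference, converting between the two layouts only swaps the red and blue channels. The sketch assumes both formats are read as packed 32-bit words (0xAARRGGBB and 0xAABBGGRR), and swapRedBlue is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Swap the red and blue channels of a packed 32-bit pixel, leaving alpha and
// green in place: 0xAARRGGBB <-> 0xAABBGGRR.
inline uint32_t swapRedBlue(uint32_t c)
{
    return (c & 0xff00ff00u) | ((c & 0x00ff0000u) >> 16) | ((c & 0x000000ffu) << 16);
}
```

Applying the function twice returns the original pixel, so the same helper converts in either direction.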
Change-Id: Ia42a6575d1313629e55efc3077e302992c47b6c0
We can use the RGBA colorspace rather than ARGB for pixel data.
This is better for many rendering systems,
since it's more widely preferred than ARGB, including by OpenGL.
Change-Id: Ibbfe6a511d77bf0ef30ce261995467c11164d306
We can't control the thread count, which could drop the performance.
Remove async(); it will come back with a fine-tuned thread pool.
Change-Id: I17c39792234acfce6db334abc0ce12da23978a9a
Some users have no idea of the premultiplied alpha concept.
We suggest more user-friendly interfaces so that they don't get confused by it.
Now, the premultiplying is accomplished by the backend engines.
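What a backend does with straight color input can be sketched as follows. The 0xAARRGGBB layout and the rounding are illustrative choices, and premultiply is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Pack straight (non-premultiplied) 8-bit RGBA into a premultiplied
// 0xAARRGGBB word. The layout and the rounding are illustrative choices.
inline uint32_t premultiply(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    auto mul = [a](uint8_t c) -> uint32_t { return (c * a + 127u) / 255u; };
    return (uint32_t(a) << 24) | (mul(r) << 16) | (mul(g) << 8) | mul(b);
}
```

The user passes plain r, g, b, a values and the engine stores color channels already scaled by alpha, so a half-transparent red (255, 0, 0, 128) becomes 0x80800000.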
Change-Id: Ifd84d56361cb56a8b98240bbd16690accf370bad
The previous fast track logic is useless:
it doesn't actually help performance, it just increases the code complexity.
Change-Id: Ib6ad204edfb241d74c41413dfec7ab42fb02af81
If the transform scale factors for x and y are not identical,
it keeps both scale factors and then applies them
to the stroking calculation.
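One way to obtain separate x and y scale factors from a transform is to measure how far it stretches each unit axis. This is a sketch under that assumption; Matrix2x2 and scaleFactors are hypothetical, and how the engine feeds the factors into stroking is not shown.

```cpp
#include <cassert>
#include <cmath>

struct Matrix2x2 { float e11, e12, e21, e22; };   // hypothetical: linear part only

// Per-axis scale of an affine transform: the lengths of the transformed
// unit vectors (1, 0) and (0, 1), i.e. of the first and second columns.
inline void scaleFactors(const Matrix2x2& m, float* sx, float* sy)
{
    assert(sx && sy);
    *sx = std::sqrt(m.e11 * m.e11 + m.e21 * m.e21);
    *sy = std::sqrt(m.e12 * m.e12 + m.e22 * m.e22);
}
```

For a pure scale of (2, 3) this returns sx = 2 and sy = 3, and a rotation leaves both factors unchanged.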
Change-Id: I519dfce3ce7b4a12c13da1801d6a00e139e7400f