Parallel rendering - Realtime rendering

alpt · Post by **alpt** » Thu May 22, 2008 8:04 am

It seems the Cfdg language is, by nature, parallel.
Basically, it suffices to "fork()/thread()" a process for each sub-shape "call".
A command line option can limit the number of concurrent threads.

This would benefit multi-core CPUs. It will be also easy to create a cluster: there are various programs which automatically distribute OS processes/threads to other hosts.

--

A simple new option: -U n_shapes. By default, n_shapes=0.
When n_shapes > 0, cfdg will update the output image after each n_shapes new rendered shapes.
In this way, you can use an external image viewer to watch the evolving image. With a small n_shapes, it will be almost realtime.
It shouldn't be difficult to add this option, it is basically an improved -m.
A -D delay option would be also useful: only after `delay' microseconds the output image will be updated.

However, these two options (-U, -D) can be generalized by -P,-R: -P tells cfdg to save the program state when reaching the -m limit. -R is used to recover a saved state.

Post by **MtnViewJohn** » Fri May 23, 2008 10:50 am

The cfdg language is very parallelizable. Unfortunately, running a cfdg grammar in multiple threads will change the order of resulting shapes, which will change the way things look. This can be remedied by requiring users to use z-order when they care about shape order. But much of the existing cfdg files would be broken because people routinely depend on the shape order imposed by the current single-threaded rendering engine. There would need to be a way for a cfdg file to indicate that it only depends on z-order and not on render order.

kipling · Post by **kipling** » Fri Jun 20, 2008 12:54 am

Two thoughts on this:

#1. For backwards compatibility, how about a slight extension to the language to indicate when parallel is OK? For example, steal the sh/bash/csh syntax "OBJ1 & OBJ2" to allow OBJ1 to be "backgrounded" and a new thread/process starts on OBJ2. I guess the implementation would have to fall back on sequential execution at some point to prevent a "fork bomb". This could lead to some complexity in the pool of "pending" objects, but the general case is that you have a partial order (in the mathematical sense) between rules that are yet to be processed.

#2. Are you absolutely sure that parallelism couldn't be implemented anyway? As far as I understand it, the program does some sort of "breadth first" traversal, pruned by element size. ... actually, glancing through RendererImpl::run, it seems to be done via a list of unfinished elements, ordered by size, so there can be significant departure from whichever rendering order is imposed by an implied "preorder" traversal.

#2 1/2. How about a "pool of threads" vs "pool of tasks" approach. If each thread worked on the same list mUnfinishedShapes, then the render order would essentially not change, in that it would work through from biggest to smallest, with minor "local" reordering. This could get messy though with many threads competing for mUnfinishedShapes. (I have not done any concurrent programming for >20 years, and it wasn't in C++ naturally, and so I won't be much help here. I guess STL would need to be happy working with threads.)

OK in writing this and looking at the C++ code, I think I understand now that I misinterpreted MtnViewJohn's comment about breaking existing cfdg code. I assumed it meant that the objects were drawn in a depth-first order, with local ordering of siblings in the tree from the order in the cfdg rule. Then I thought breadth-first .... but now I understand that the order that things are drawn (assuming same z-value) is determined purely by the "size", which may end up having little to do with the order things appear in each rule. So it is not breaking any order intended by the cfdg author, but an order that may have arisen accidentally. Am I correct here?

My other reflection is that since you are relying on STL's multiset for your storage, you can really only do linear ordering (aka total ordering), so what I suggested in #1 above will probably not fly. I suspect my comment #1 is actually off the mark anyway, due to my misinterpretation.

Of course, putting any sort of concurrency in as in #2 will still **change** the results of some renders, mostly in a very minor way, but I guess the question is whether this is considered breakage.

Post by **MtnViewJohn** » Sat Jun 21, 2008 6:25 am

As you see from the source code, unfinished shapes are drawn/processed strictly in order of size. Large shapes are drawn first and smaller shapes are drawn on top of larger shapes. The only time that order in a rule is important is for the primitive shapes and path shapes. They are drawn in the order that you find them in the rule. People internalize these two ordering effects: small on top of big, and finished shape order within a rule.

Having multiple render threads pulling shapes out of the multiset that is used to store unfinished shapes will cause a slight reshuffling of the order that finished shapes are drawn onto the canvas. For very busy designs this is OK, because the designer is not depending on the determinism of the current scheme. But many people do depend on the determinism (like this design).

So I consider the reshuffling to be breakage. There needs to be a way to explicitly enable multi-threaded rendering so that existing designs continue to work as the designer intended.

minimaleye · Post by **minimaleye** » Tue May 03, 2011 6:48 am

from the first time when I saw CFA I'am dreaming about realtime render mode...

I have no idea how hard it is to program but it will expand dramatically the uses of CFA...
This will make it a unique soft for live VJ-ing which will be just magnificent to see...

I understand that the idea is not good when the code generates too many shapes...
but also think it can be chooseable?

jonaskoelker · Post by **jonaskoelker** » Mon Oct 08, 2012 2:15 am

As you see from the source code, unfinished shapes are drawn/processed strictly in order of size. Large shapes are drawn first and smaller shapes are drawn on top of larger shapes. The only time that order in a rule is important is for the primitive shapes and path shapes. They are drawn in the order that you find them in the rule. People internalize these two ordering effects: small on top of big, and finished shape order within a rule.

What I hear you say is something like this: for any fixed set of random choices, the output is deterministic, in particular, the above/below ordering. The latter arises fairly straightforwardly in the current single-threaded code architecture.

I would think (with absolutely no knowledge of the code architecture) that it should be reasonably doable to compute all the shapes in parallel and attach some layering information to each shape in the process; then, when all shapes are computed, you sort by the layering information and render. This should combine parallelism and backwards compatibility, as far as I understand.

My first intuition would be to order by some combination of size and "tree address", which I can probably best explain by the dereferencing code:

Code: Select all

def deref_tree_addr(addr):
    node = root
    for i in addr: node = node.child[i]
    return node

Perhaps combined with a size-based ordering of all `child' lists, I think this will do the trick.

(I'm not perfectly sure, though; what may be the case if I read your description is that renderings are sorted by size *across layers*; is that the case?)

Post by **MtnViewJohn** » Tue Oct 09, 2012 11:28 am

Actually, I think parallel rendering can be done without too many artifacts. The way that the rendering algorithm works is that the renderer maintains a list of "unfinished" shapes, shapes that have rules that need to be executed. This list is stored in sorted order with the sort key being the determinant of the shape's affine transform. This determinant is easy to compute and is roughly equivalent to the area of the shape. When an unfinished shape is executed it may generate more unfinished shapes or "finished" shapes, or both. Finished shapes are the squares, circles, triangles, and paths that are actually drawn on the canvas. Finished shapes are stored in the order that they are generated. This finished shape list is the output sentence of the executed context free grammar. Then we do a stable sort of the finished shape list on the Z-order value and draw the finished shapes.

When the finished shapes are generated from the unfinished shapes list the renderer throws away the determinant value that was the sorting key for the unfinished shapes. It doesn't need it because the information is implicit in the order of the finished shape list. But if we keep the original sorting key as part of the finished shape then we can have multiple threads, each with its own finished shape list. When it is time to draw, we can merge the separate finished shape lists using Z-order and the sorting key that we retained.

Context Free Art

Forums

Parallel rendering - Realtime rendering

Parallel rendering - Realtime rendering

parallelism

Re: Parallel rendering - Realtime rendering

Re: Parallel rendering - Realtime rendering

Re: Parallel rendering - Realtime rendering