From: m...@m.com (World90)
Newsgroups: soc.culture.china
Subject: More about my inventions of scalable algorithms..
Date: Mon, 3 May 2021 15:01:12 -0400
Message-ID: <s6ph9o$h5d$1@dont-email.me>

Hello,

More about my inventions of scalable algorithms..

I am a white Arab, and I think I am smart, since I have also
invented many scalable algorithms and other algorithms..

More details about my new inventions of scalable algorithms..

And look at my powerful inventions below, LW_Fast_RWLockX and
Fast_RWLockX: two powerful scalable RWLocks that are FIFO-fair and
starvation-free and costless on the reader side (that means no
atomics and no fences on the reader side). They use sys_membarrier
expedited on Linux and FlushProcessWriteBuffers() on Windows. If you
look at the source code of my LW_Fast_RWLockX.pas and
Fast_RWLockX.pas inside the zip file, you will notice that on Linux
they call two functions, membarrier1() and membarrier2():
membarrier1() registers the process's intent to use
MEMBARRIER_CMD_PRIVATE_EXPEDITED, and membarrier2() executes a memory
barrier on each running thread belonging to the same process as the
calling thread.

Read more here to understand:

https://man7.org/linux/man-pages/man2/membarrier.2.html
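To make that concrete, here is a minimal C++ sketch on Linux of what
such a pair of functions can look like, following the man page above.
The membarrier1()/membarrier2() names mirror the ones in the .pas
files; this C++ translation is only an illustration, not the actual
Pascal source:

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

// membarrier1(): register the process's intent to use the private
// expedited command; must be called once before membarrier2().
int membarrier1(void) {
    return (int)syscall(__NR_membarrier,
                        MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0);
}

// membarrier2(): execute a memory barrier on each running thread
// belonging to the same process as the calling thread.
int membarrier2(void) {
    return (int)syscall(__NR_membarrier,
                        MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
}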

Here are my new powerful inventions of scalable algorithms..

I have just updated my powerful inventions LW_Fast_RWLockX and
Fast_RWLockX, the two scalable RWLocks described above, and now they
work on both Linux and Windows. And I think my inventions are really
smart, since the following PhD researcher says:

"Until today, there is no known efficient reader-writer lock with
starvation-freedom guarantees;"

Read more here:

http://concurrencyfreaks.blogspot.com/2019/04/onefile-and-tail-latency.html

So, as you have just noticed, he says that there is no known
efficient reader-writer lock with starvation-freedom guarantees; so I
think that my above powerful inventions of scalable reader-writer
locks are efficient and FIFO-fair and starvation-free.

LW_Fast_RWLockX is a lightweight scalable reader-writer mutex that
uses a technique that looks like Seqlock, but without looping on the
reader side as Seqlock does, and this has permitted the reader side
to be costless; it is fair, it is of course starvation-free, and it
does spin-wait. Fast_RWLockX is the same kind of lightweight scalable
reader-writer mutex, but it does not spin-wait: it waits on my
SemaMonitor, so it is energy efficient.

You can read about them and download them from my website here:

https://sites.google.com/site/scalable68/scalable-rwlock
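To illustrate the general pattern of a costless reader side, here is
a minimal C++ sketch; it shows only the asymmetric technique (plain
loads and stores plus a compiler barrier on the reader side, with the
writer paying for the process-wide barrier), and it is NOT my actual
LW_Fast_RWLockX code: the FIFO-fairness and starvation-freedom
machinery of the real lock is omitted here.

#include <atomic>

int membarrier2(void);  // expedited process-wide barrier (see above);
                        // on Windows, FlushProcessWriteBuffers() instead

constexpr int MAX_THREADS = 64;     // illustrative fixed reader count

struct alignas(64) ReaderSlot {     // one cache line per reader thread
    volatile long in_read_section = 0;
};

ReaderSlot slots[MAX_THREADS];
std::atomic<bool> writer_pending{false};

void read_lock(int tid) {
    for (;;) {
        slots[tid].in_read_section = 1;   // plain store, no fence
        std::atomic_signal_fence(std::memory_order_seq_cst); // compiler
                                          // barrier only, no CPU fence
        if (!writer_pending.load(std::memory_order_relaxed))
            return;                       // fast path: no atomics, no fences
        slots[tid].in_read_section = 0;   // a writer is in: back off
        while (writer_pending.load(std::memory_order_relaxed)) { }
    }
}

void read_unlock(int tid) {
    std::atomic_signal_fence(std::memory_order_seq_cst);
    slots[tid].in_read_section = 0;       // plain store again
}

// Assumes one writer at a time (serialize writers with a mutex).
void write_lock() {
    writer_pending.store(true);
    membarrier2();   // after this, each reader's slot store is visible,
                     // or that reader will see writer_pending == true
    for (int t = 0; t < MAX_THREADS; ++t)
        while (slots[t].in_read_section) { }  // drain in-flight readers
}

void write_unlock() {
    writer_pending.store(false);
}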

About the Linux sys_membarrier() expedited and the Windows
FlushProcessWriteBuffers()..

I have just read the following webpage:

https://lwn.net/Articles/636878/

It is interesting, and it says:

---

Results in liburcu:

Operations in 10s, 6 readers, 2 writers:

memory barriers in reader: 1701557485 reads, 3129842 writes
signal-based scheme: 9825306874 reads, 5386 writes
sys_membarrier expedited: 6637539697 reads, 852129 writes
sys_membarrier non-expedited: 7992076602 reads, 220 writes

---

Look at how "sys_membarrier expedited" is powerful.

Cache-coherency protocols do not use IPIs, and as a user-space
developer you do not care about IPIs at all; one is most interested
in the cost of cache-coherency itself. However, the Win32 API
provides a function that issues IPIs to all processors (in the
affinity mask of the current process), FlushProcessWriteBuffers(),
and you can use it to investigate the cost of IPIs.

When I did a simple synthetic test on a dual-core machine, I
obtained the following numbers:

420 cycles is the minimum cost of the FlushProcessWriteBuffers()
function on the issuing core.

1600 cycles is the mean cost of the FlushProcessWriteBuffers()
function on the issuing core.

1300 cycles is the mean cost of the FlushProcessWriteBuffers()
function on the remote core.

Note that, as far as I understand, the function issues an IPI to the
remote core, the remote core acks it with another IPI, and the
issuing core waits for the ack IPI and then returns.

And the IPIs have the indirect cost of flushing the processor
pipeline.
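For those who want to reproduce such numbers, here is a rough C++
micro-benchmark sketch for Windows; the exact cycle counts will of
course differ from machine to machine, and rdtsc is only an
approximation here:

#include <windows.h>
#include <intrin.h>   // __rdtsc (MSVC; GCC/Clang have x86intrin.h)
#include <cstdio>

int main() {
    const int iters = 100000;
    unsigned long long best = ~0ULL, total = 0;
    for (int i = 0; i < iters; ++i) {
        unsigned long long t0 = __rdtsc();
        FlushProcessWriteBuffers();  // IPIs every core in the process's
                                     // affinity mask
        unsigned long long dt = __rdtsc() - t0;
        total += dt;
        if (dt < best) best = dt;
    }
    std::printf("min %llu cycles, mean %llu cycles on the issuing core\n",
                best, total / iters);
    return 0;
}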

More about WaitAny() and WaitAll() and more..

Look at the following concurrency abstractions of Microsoft:

https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.waitany?view=netframework-4.8

https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.waitall?view=netframework-4.8

They look like the WaitForAny() and WaitForAll() of Delphi; here
they are:

http://docwiki.embarcadero.com/Libraries/Sydney/en/System.Threading.TTask.WaitForAny

http://docwiki.embarcadero.com/Libraries/Sydney/en/System.Threading.TTask.WaitForAll

So WaitForAll() is easy, and I have implemented it in my threadpool
engine that scales very well and that I have invented; you can read
the HTML tutorial inside its zip file to learn how to do it, and you
can download it from my website here:

https://sites.google.com/site/scalable68/an-efficient-threadpool-engine-with-priorities-that-scales-very-well

And as for WaitForAny(), you can also do it using my SemaMonitor (a
minimal sketch follows below), and you can download my SemaMonitor
invention from my website here:

https://sites.google.com/site/scalable68/semacondvar-semamonitor
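Here is a minimal C++ sketch of the general idea behind WaitForAll()
and WaitForAny(): a shared completion counter plus a monitor. A
standard mutex and condition variable stand in for my SemaMonitor
here, so this shows the technique, not the SemaMonitor code itself:

#include <condition_variable>
#include <mutex>

class CompletionGroup {
    std::mutex m;
    std::condition_variable cv;
    int total;
    int done = 0;
public:
    explicit CompletionGroup(int n) : total(n) {}

    void task_finished() {       // each worker calls this once, when done
        std::lock_guard<std::mutex> lk(m);
        ++done;
        cv.notify_all();         // wakes WaitForAny and WaitForAll waiters
    }

    void wait_any() {            // WaitForAny(): at least one task done
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return done >= 1; });
    }

    void wait_all() {            // WaitForAll(): every task done
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return done == total; });
    }
};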

Here are my other new software inventions..

I have just looked at the source code of the following multiplatform
pevents library:

https://github.com/neosmart/pevents

And notice that its WaitForMultipleEvents() is implemented with
pthreads, but it is not scalable on multicores. So I have just
invented a WaitForMultipleObjects() that looks like the Windows
WaitForMultipleObjects() and that is fully "scalable" on multicores;
it works on Windows, Linux, and macOS, and it blocks while waiting
for the objects, as WaitForMultipleObjects() does, so it doesn't
consume CPU cycles when waiting, and it works with events and
futures and tasks.
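As a simple point of comparison, here is a plain C++ sketch of a
WaitForMultipleObjects()-style interface over manual-reset events;
unlike my invention it is not scalable on multicores, but it blocks
on a condition variable, so it does not consume CPU cycles while
waiting:

#include <condition_variable>
#include <mutex>
#include <vector>

class EventGroup {
    std::mutex m;
    std::condition_variable cv;
    std::vector<bool> signaled;
public:
    explicit EventGroup(int n) : signaled(n, false) {}

    void set_event(int i) {          // signal event i (manual-reset)
        std::lock_guard<std::mutex> lk(m);
        signaled[i] = true;
        cv.notify_all();
    }

    // wait_all = true: block until every event is signaled;
    // wait_all = false: block until any one event is signaled.
    // Returns the index of a signaled event.
    int wait(bool wait_all) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] {
            if (wait_all) {
                for (bool s : signaled) if (!s) return false;
                return true;
            }
            for (bool s : signaled) if (s) return true;
            return false;
        });
        for (int i = 0; i < (int)signaled.size(); ++i)
            if (signaled[i]) return i;
        return -1;                   // not reached
    }
};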

Here are more of my new software inventions..

I have just invented a latch that is fully "scalable" on multicores
and a thread barrier that is fully scalable on multicores; they are
really powerful.

Read about the C++ latches and thread barriers, which are not
scalable on multicores, here:

https://www.modernescpp.com/index.php/latches-and-barriers
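For reference, here is how the standard C++20 latch from that
article is used; this is the non-scalable standard version that I am
comparing against, not my scalable latch:

#include <cstdio>
#include <latch>        // C++20
#include <thread>
#include <vector>

int main() {
    const int workers = 4;
    std::latch all_ready(workers);      // one-shot countdown to zero
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i)
        pool.emplace_back([&all_ready, i] {
            std::printf("worker %d ready\n", i);
            all_ready.count_down();     // each worker decrements once
        });
    all_ready.wait();                   // blocks until the count is zero
    std::printf("all %d workers ready\n", workers);
    for (auto& t : pool) t.join();
    return 0;
}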

Here are my other software inventions:

More about my scalable math Linear System Solver Library...

As you have just noticed, I have just spoken about my Linear System
Solver Library (read below). Right now it scales very well, but I
will soon make it "fully" scalable on multicores using one of the
scalable algorithms that I have invented, and I will extend it much
more to also support efficient matrix operations that are scalable
on multicores, and more; and since it will come with one of my
scalable algorithms that I have invented, I think I will sell it
too.

More about mathematics and about scalable Linear System Solver Libraries
and more..

I have just noticed that a software architect from Austria called
Michael Rabatscher has designed and implemented the MrMath library,
which is also a parallelized library:

Here he is:

https://at.linkedin.com/in/michael-rabatscher-6821702b

And here is his MrMath Library for Delphi and Freepascal:

https://github.com/mikerabat/mrmath

But I think that he is not so smart, and I think I am smart like a
genius, and I say that his MrMath library is not scalable on
multicores; notice that the Linear System Solver of his MrMath
library is not scalable on multicores either, and that the threaded
matrix operations of his library are not scalable on multicores
either. This is why I have invented a Conjugate Gradient Linear
System Solver Library, scalable on multicores, for C++ and Delphi
and Freepascal, and here it is; read about it in my following
thoughts (I will also soon extend my library further to support
scalable matrix operations):

About SOR and Conjugate gradient mathematical methods..

I have just looked at SOR (the Successive Over-Relaxation method),
and I think it is much less powerful than the conjugate gradient
method; read the following to see why:

COMPARATIVE PERFORMANCE OF THE CONJUGATE GRADIENT AND SOR METHODS
FOR COMPUTATIONAL THERMAL HYDRAULICS

https://inis.iaea.org/collection/NCLCollectionStore/_Public/19/055/19055644.pdf?r=1&r=1

This is why I have implemented, in both C++ and Delphi, my Parallel
Conjugate Gradient Linear System Solver Library that scales very
well; read my following thoughts about it to understand more:

About the convergence properties of the conjugate gradient method

The conjugate gradient method can theoretically be viewed as a direct
method, as it produces the exact solution after a finite number of
iterations, which is not larger than the size of the matrix, in the
absence of round-off error. However, the conjugate gradient method is
unstable with respect to even small perturbations, e.g., most directions
are not in practice conjugate, and the exact solution is never obtained.
Fortunately, the conjugate gradient method can be used as an iterative
method as it provides monotonically improving approximations to the
exact solution, which may reach the required tolerance after a
relatively small (compared to the problem size) number of iterations.
The improvement is typically linear and its speed is determined by
the condition number κ(A) of the system matrix A: the larger κ(A)
is, the slower the improvement.
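More precisely, the well-known bound for the conjugate gradient
method is ||e_k||_A <= 2*((sqrt(κ)-1)/(sqrt(κ)+1))^k * ||e_0||_A,
which is why a large κ(A) slows the improvement. And here is a
textbook sequential, unpreconditioned C++ sketch of the conjugate
gradient iteration itself, for reference; this is the standard
algorithm, not my parallel library:

#include <cmath>
#include <cstdio>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

Vec matvec(const Mat& A, const Vec& x) {
    Vec y(x.size(), 0.0);
    for (size_t i = 0; i < A.size(); ++i)
        for (size_t j = 0; j < x.size(); ++j)
            y[i] += A[i][j] * x[j];
    return y;
}

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Solve A x = b for a symmetric positive-definite matrix A.
Vec cg(const Mat& A, const Vec& b, double tol = 1e-10) {
    Vec x(b.size(), 0.0), r = b, p = r;     // x0 = 0, so r0 = b
    double rr = dot(r, r);
    for (size_t k = 0; k < b.size() && std::sqrt(rr) > tol; ++k) {
        Vec Ap = matvec(A, p);
        double alpha = rr / dot(p, Ap);     // step length along p
        for (size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];           // update the iterate
            r[i] -= alpha * Ap[i];          // update the residual
        }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;          // keeps directions A-conjugate
        for (size_t i = 0; i < p.size(); ++i)
            p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return x;
}

int main() {
    Mat A = {{4, 1}, {1, 3}};               // small SPD test matrix
    Vec b = {1, 2};
    Vec x = cg(A, b);
    std::printf("x = (%f, %f)\n", x[0], x[1]); // expect (1/11, 7/11)
    return 0;
}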

