Wisdom90
2019-11-17 19:37:08 UTC
Hello...
About Parallel Sorting..
I think time complexity of the in-place Merge Sort is O(n2 Log n)
because in-place merge is O(n2). Time complexity of standard merge sort
is less, O(n Log n).
You can notice it in the following algorithm , read about it here:
"Parallel in-place merge sort is now 2.2X faster than the STL sort. The
hybrid approach shows a reduced performance differential between
in-place and not-in-place from 4X to 3X, with in-place being slower."
Read more here:
Parallel In-Place Merge Sort
https://www.drdobbs.com/parallel/parallel-in-place-merge-sort/240169094
This is why i think that my powerful Parallel Sort Library that is more
efficient is much better, read about it:
My Parallel Sort Library that is more efficient version 4.03 is here..
Notice also in the source code that my Mergesort uses also insertion
sort like in a Timsort manner, so it is very very efficient.
You can download it from:
https://sites.google.com/site/scalable68/parallel-sort-library-that-is-more-efficient
I have come with a "powerful" Parallel Sort library that is very
efficient, and it comes with the source code, please read about it below:
Author: Amine Moulay Ramdane
Description:
Parallel Sort Library that supports Parallel Quicksort, Parallel
HeapSort and Parallel MergeSort on Multicores systems.
Parallel Sort Library uses my Thread Pool Engine and sort many array
parts - of your array - in parallel using Quicksort or HeapSort or
MergeSort and after that it finally merge them - with the merge()
procedure -
In the previous parallelsort version i have parallelized only the sort
part, but in this new parallelsort version i have parallelized also the
merge procedure part and it gives better performance.
My new parallel sort algorithm has become more cache-aware, and i have
done some benchmarks with my new parallel algorithm and it has given up
to 5X scalability on a Quadcore when sorting strings, other than that i
have cleaned more the code and i think my parallel Sort library has
become a more professional and industrial parallel Sort library , you
can be confident cause i have tested it thoroughly and no bugs have
showed , so i hope you will be happy with my new Parallel Sort library.
I have also included a "test.pas" example, just compile first the
"gendata.pas" inside the zip file and run it first, after that compile
the "test.pas" example and run it and do your benchmarks.
I have implemented a Parallel hybrid divide-and-conquer merge algorithm
that performs 0.9-5.8 times better than sequential merge, on a quad-core
processor, with larger arrays outperforming by over 5 times. Parallel
processing combined with a hybrid algorithm approach provides a powerful
high performance result.
My algorithm of finding the median of Parallel merge of my Parallel Sort
Library that you will find here in my website:
https://sites.google.com/site/scalable68/parallel-sort-library
Is O(log(min(|A|,|B|))), where |A| is the size of A, since the binary
search is performed within the smaller array and is O(lgN). But this new
algorithm of finding the median of parallel merge of my Parallel Sort
Library is O(log(|A|+|B|)), which is slightly worse. With further
optimizations the order was reduced to O(log(2*min(|A|,|B|))), which is
better, but is 2X more work, since both arrays may have to be searched.
All algorithms are logarithmic. Two binary searches were necessary to
find an even split that produced two equal or nearly equal halves.
Luckily, this part of the merge algorithm is not performance critical.
So, more effort can be spent looking for a better split. This new
algorithm in the parallel merge balances the recursive binary tree of
the divide-and-conquer and improve the worst-case performance of
parallel merge sort.
Why are we finding the median in the parallel algorithm ?
Here is my previous idea of finding the median that is
O(log(min(|A|,|B|))) to understand better:
Let's assume we want to merge sorted arrays X and Y. Select X[m] median
element in X. Elements in X[ .. m-1] are less than or equal to X[m].
Using binary search find index k of the first element in Y greater than
X[m]. Thus Y[ .. k-1] are less than or equal to X[m] as well. Elements
in X[m+1..] are greater than or equal to X[m] and Y[k .. ] are greater.
So merge(X, Y) can be defined as concat(merge(X[ .. m-1], Y[ .. k-1]),
X[m], merge(X[m+1.. ], Y[k .. ])) now we can recursively in parallel do
merge(X[ .. m-1], Y[ .. k-1]) and merge(X[m+1 .. ], Y[k .. ]) and then
concat results.
Thank you,
Amine Moulay Ramdane.
About Parallel Sorting..
I think time complexity of the in-place Merge Sort is O(n2 Log n)
because in-place merge is O(n2). Time complexity of standard merge sort
is less, O(n Log n).
You can notice it in the following algorithm , read about it here:
"Parallel in-place merge sort is now 2.2X faster than the STL sort. The
hybrid approach shows a reduced performance differential between
in-place and not-in-place from 4X to 3X, with in-place being slower."
Read more here:
Parallel In-Place Merge Sort
https://www.drdobbs.com/parallel/parallel-in-place-merge-sort/240169094
This is why i think that my powerful Parallel Sort Library that is more
efficient is much better, read about it:
My Parallel Sort Library that is more efficient version 4.03 is here..
Notice also in the source code that my Mergesort uses also insertion
sort like in a Timsort manner, so it is very very efficient.
You can download it from:
https://sites.google.com/site/scalable68/parallel-sort-library-that-is-more-efficient
I have come with a "powerful" Parallel Sort library that is very
efficient, and it comes with the source code, please read about it below:
Author: Amine Moulay Ramdane
Description:
Parallel Sort Library that supports Parallel Quicksort, Parallel
HeapSort and Parallel MergeSort on Multicores systems.
Parallel Sort Library uses my Thread Pool Engine and sort many array
parts - of your array - in parallel using Quicksort or HeapSort or
MergeSort and after that it finally merge them - with the merge()
procedure -
In the previous parallelsort version i have parallelized only the sort
part, but in this new parallelsort version i have parallelized also the
merge procedure part and it gives better performance.
My new parallel sort algorithm has become more cache-aware, and i have
done some benchmarks with my new parallel algorithm and it has given up
to 5X scalability on a Quadcore when sorting strings, other than that i
have cleaned more the code and i think my parallel Sort library has
become a more professional and industrial parallel Sort library , you
can be confident cause i have tested it thoroughly and no bugs have
showed , so i hope you will be happy with my new Parallel Sort library.
I have also included a "test.pas" example, just compile first the
"gendata.pas" inside the zip file and run it first, after that compile
the "test.pas" example and run it and do your benchmarks.
I have implemented a Parallel hybrid divide-and-conquer merge algorithm
that performs 0.9-5.8 times better than sequential merge, on a quad-core
processor, with larger arrays outperforming by over 5 times. Parallel
processing combined with a hybrid algorithm approach provides a powerful
high performance result.
My algorithm of finding the median of Parallel merge of my Parallel Sort
Library that you will find here in my website:
https://sites.google.com/site/scalable68/parallel-sort-library
Is O(log(min(|A|,|B|))), where |A| is the size of A, since the binary
search is performed within the smaller array and is O(lgN). But this new
algorithm of finding the median of parallel merge of my Parallel Sort
Library is O(log(|A|+|B|)), which is slightly worse. With further
optimizations the order was reduced to O(log(2*min(|A|,|B|))), which is
better, but is 2X more work, since both arrays may have to be searched.
All algorithms are logarithmic. Two binary searches were necessary to
find an even split that produced two equal or nearly equal halves.
Luckily, this part of the merge algorithm is not performance critical.
So, more effort can be spent looking for a better split. This new
algorithm in the parallel merge balances the recursive binary tree of
the divide-and-conquer and improve the worst-case performance of
parallel merge sort.
Why are we finding the median in the parallel algorithm ?
Here is my previous idea of finding the median that is
O(log(min(|A|,|B|))) to understand better:
Let's assume we want to merge sorted arrays X and Y. Select X[m] median
element in X. Elements in X[ .. m-1] are less than or equal to X[m].
Using binary search find index k of the first element in Y greater than
X[m]. Thus Y[ .. k-1] are less than or equal to X[m] as well. Elements
in X[m+1..] are greater than or equal to X[m] and Y[k .. ] are greater.
So merge(X, Y) can be defined as concat(merge(X[ .. m-1], Y[ .. k-1]),
X[m], merge(X[m+1.. ], Y[k .. ])) now we can recursively in parallel do
merge(X[ .. m-1], Y[ .. k-1]) and merge(X[m+1 .. ], Y[k .. ]) and then
concat results.
Thank you,
Amine Moulay Ramdane.