Why bubble when you can insert?
April 13, 2009 8:35 PM   Subscribe

 
See also: Sorting Out Sorting. Classic computer science video with that unforgettable music.
posted by Godbert at 8:47 PM on April 13, 2009 [7 favorites]


This is very pretty, and a great teaching tool, but there'll always be a place in my geeky heart for Sorting Out Sorting (Part 1, Part 2, or the entire thing super duper fast).
posted by thisjax at 8:47 PM on April 13, 2009 [5 favorites]


(Mine and Godbert's, apparently.)
posted by thisjax at 8:48 PM on April 13, 2009


No monkeysort? The cool thing about a monkeysort is that even someone who claims to be 100% ignorant of algorithms will tell you that a monkeysort is a stupid way to sort things.
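For the record, in Python it's barely four lines (a sketch; expected running time is factorial, which is the joke):

```python
import random

def bogosort(items):
    """Monkeysort / bogosort: shuffle until sorted. Expected O(n * n!)."""
    items = list(items)
    while any(a > b for a, b in zip(items, items[1:])):
        random.shuffle(items)
    return items
```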
posted by idiopath at 8:48 PM on April 13, 2009 [1 favorite]


This is my Big O face.
posted by cortex at 8:56 PM on April 13, 2009 [15 favorites]


Another sorting demo page.
posted by WL at 9:06 PM on April 13, 2009


I like:

SELECT
foo
FROM
bar
ORDER BY foo DESC


And how do you think you got that, eh? Magical sorting fairies, eh? +2 Helm of Sorting, eh? A visit from Baron Sortram von Sortington, eh?

You buy 'em books, you buy 'em books, and they chew on the covers.
posted by middleclasstool at 9:11 PM on April 13, 2009 [19 favorites]


A pseudocode implementation of Magical Faerie Sort was worth 10% extra credit on my CS2232 final. I got half-credit for drawing a cartoon of a pixie horking up some Befunge.
posted by cortex at 9:17 PM on April 13, 2009 [2 favorites]


Sorting the Slow Way: An Analysis of Perversely Awful Randomized Sorting Algorithms [pdf]

From the wikipedia bogosort page. Wonderful.
posted by cortex at 9:21 PM on April 13, 2009


Evidently Quicksort is Optimal. It's a pretty neat result, and it's surprisingly understandable for a set of presentation slides. Only took 40 years from the discovery of the algorithm, too.
posted by jedicus at 9:30 PM on April 13, 2009


I would really like to see a stable bogosort.
posted by grouse at 9:32 PM on April 13, 2009


ahem.
metafilter: i have very little idea of what is going on here. but i'm pretty enthralled. thanks!
posted by es_de_bah at 9:34 PM on April 13, 2009


And how do you think you got that, eh? Magical sorting fairies, eh? +2 Helm of Sorting, eh? A visit from Baron Sortram von Sortington, eh?

If you're sorting on a column with an index, it will never be unsorted.

For selecting something that doesn't have an index, a regular sorting algorithm will probably be used; you can read about what MySQL (for example) does here. It's a bit more complex than what you see there, because they have to be able to sort more records than fit into memory.
posted by delmoi at 9:38 PM on April 13, 2009


Holy cow, this really takes me back. Awesome.
posted by fusinski at 9:38 PM on April 13, 2009


Does merge sort appear in "Sorting Out Sorting" under a different name? I thought it was one of the major sorting algorithms, but SOS doesn't mention it by name.
posted by abc123xyzinfinity at 9:54 PM on April 13, 2009


Quicksort is fine if you're working on a single thread. But there are faster sorts for vectors, multithreaded and parallel applications.

And at the lowest level of Quicksort, Bubblesort can be faster than Insertion, if your vector register is the right size.
posted by Araucaria at 9:58 PM on April 13, 2009


Crack for my eyes, this is.
posted by sleslie at 10:34 PM on April 13, 2009


abc123: Shell sort?
posted by sleslie at 10:35 PM on April 13, 2009


My mind just went for a trip watching SOS... too cool.
posted by JoeXIII007 at 10:40 PM on April 13, 2009




Visualizing sorting algorithms another way.
posted by swift at 11:10 PM on April 13, 2009 [7 favorites]


Sorting the Slow Way: An Analysis of Perversely Awful Randomized Sorting Algorithms [pdf]


Want to play a game of "who can name the slowest sort"?
posted by Chuckles at 11:44 PM on April 13, 2009


If you're sorting on a column with an index, it will never be unsorted

I don't think that's true for typical databases; you're simply doing a tree-based selection sort. If it is true, then you're doing an insertion sort. In either case, the cost of the sort is split between the INSERT and the SELECT operations.

abc, sleslie: I think mergesort is missing. What makes mergesort so useful, esp. for large data sets, is its extreme locality / predictability of reference (consider the classic four-way tape sort). The Wikipedia entry for it has a nice visualization similar to the one at the end of the Sorting Out Sorting video.
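For reference, a minimal top-down mergesort sketch in Python (the two halves are each read strictly left to right during the merge, which is where the locality comes from):

```python
def merge_sort(a):
    """Classic top-down mergesort: O(n log n), stable."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # <= keeps the sort stable
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:] # one side is already empty
```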
posted by hattifattener at 11:56 PM on April 13, 2009


I've always been impressed by timsort, which is optimized to take advantage of real-world data.
posted by reventlov at 12:11 AM on April 14, 2009 [3 favorites]


Oh so awesome! I can remember the feeling I had in class when I realized I had wasted so much time in my life when sorting.

Seriously, does anyone implement quicksort by hand? I generally use merge sort nowadays.
posted by FuManchu at 12:42 AM on April 14, 2009


My favorite has long been a variant of Quicksort that simply stops recursing when the sublist size becomes small (say, 5 or 6 elements), followed by a single pass of insertion sort over the entire list.

Since most of the recursion-related time overhead in any recursive algorithm occurs in the deepest levels of recursion, this method gets rid of quite a lot of it. Insertion sort can be programmed with simple copies instead of swaps, which makes its already very low overhead for nearly-sorted lists even lower; also, since it does no more data movement when called once for the entire list than it would do if called as a base-case sublist sort when the recursion bottoms out, there's no need for the overhead involved in setting up its loops to happen more than once.
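In Python the idea sketches out roughly like this (the cutoff of 6 and the helper names are arbitrary choices, not canonical):

```python
def hybrid_sort(a, cutoff=6):
    """Quicksort that leaves sublists of <= cutoff elements unsorted,
    then fixes everything with a single insertion-sort pass."""
    _quick(a, 0, len(a) - 1, cutoff)
    _insertion(a)                     # one cleanup pass over the whole list
    return a

def _quick(a, lo, hi, cutoff):
    while hi - lo + 1 > cutoff:       # stop recursing on small sublists
        p = _partition(a, lo, hi)
        if p - lo < hi - p:           # recurse into the smaller half,
            _quick(a, lo, p - 1, cutoff)
            lo = p + 1                # loop on the larger one
        else:
            _quick(a, p + 1, hi, cutoff)
            hi = p - 1

def _partition(a, lo, hi):            # Lomuto partition, middle pivot
    m = (lo + hi) // 2
    a[m], a[hi] = a[hi], a[m]
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def _insertion(a):
    # copies instead of swaps; very cheap on a nearly-sorted list
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
```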
posted by flabdablet at 12:48 AM on April 14, 2009 [2 favorites]


"If you're getting bored, let this be a lesson to you about order n-squared sorts!"

Boy, if I had a nickel for every time I heard that growing up.

Better yet would be if I got a nickel for each time I heard that growing up, each time I heard that growing up!

Oh wait that wouldn't be so much after all
posted by Bokononist at 2:02 AM on April 14, 2009


I've always been impressed by timsort, which is optimized to take advantage of real-world data.

Java Bug Database: Bug ID 6804124
posted by effbot at 2:03 AM on April 14, 2009


Evidently Quicksort is Optimal. It's a pretty neat result, and it's surprisingly understandable for a set of presentation slides. Only took 40 years from the discovery of the algorithm, too.

Holy crap. QS is optimal?

My professors told me time and time again that QS was considered optimal since nobody'd figured out anything empirically faster, but that no proof had been made.

I wish I could read the linked pdf... but, it refuses to render.
posted by Netzapper at 2:18 AM on April 14, 2009


This: Was cool.
The links to Sorting Out Sorting: Also cool. (How has this 90s-era CS degree never seen this video?)
The link to the alternative visualization: Also also cool.
The link from there to the Cairo graphics lib: Also also also cool.

To sum up: Cool.
posted by DU at 4:33 AM on April 14, 2009


I've always loved how the Spaghetti Sort demonstrates and makes use of the inherent parallel processing power of Nature.
posted by vacapinta at 4:35 AM on April 14, 2009 [1 favorite]


No wait, I have seen SOS. Who could forget that awesome music?
posted by DU at 4:43 AM on April 14, 2009


Back in college, when we were studying Occam as a hardware design tool (think: silicon compilers - Occam source code goes in *here*, chip layout diagram comes out *there*), we briefly considered a sorting algorithm they called "pumpsort" or the "sort pump". In order for it to work, you need at least as many processing elements as you have items to sort. Each processing element executes the following:

-- initialize
x := -infinity
-- main loop
while (get(item))
  if (item >= x) then
    put(x)
    x := item
  else
    put(item)
  end
-- flush
put(x)

As you feed the unsorted items in one end, they are sorted as the pipeline fills. Then, when the input is all done, you yank the chain and flush the pipeline and the items come out all sorted in O(n) time. Ta-da!
posted by kcds at 6:26 AM on April 14, 2009 [1 favorite]


In order for it to work, you need at least as many processing elements as you have items to sort.

Here's my algorithm for Unicorn Sort, which is O(1):

-- have the unicorn sort the list
newlist = unicorn_sort(list)

But I puzzled over your algorithm for a while anyway and still didn't get it.
posted by DU at 6:46 AM on April 14, 2009 [3 favorites]


Here's my algorithm for Unicorn Sort, which is O(1):

What if you had as many unicorns as you have items to sort?
posted by Combustible Edison Lighthouse at 6:49 AM on April 14, 2009


But I puzzled over your algorithm for a while anyway and still didn't get it.

It's just a Bubble Sort but with N processors. So instead of O(N^2) you get O(N), because you have N processors...
posted by vacapinta at 6:57 AM on April 14, 2009


unicorn sort, also known as quantum computing.
posted by ShadowCrash at 6:59 AM on April 14, 2009


My focus in high-performance computing is more on numerical work than this sort of thing, but I always assumed Quicksort was a bad idea. It's not cache-friendly, it's not pipeline-friendly, it's not thread-friendly; why would you use it?

Also, here's a hint for those of you still in undergrad (or hobbyist programmers). You are taught the basic sorting algorithms so you have an understanding of how to analyze how much more complicated algorithms will perform in the real world, with real data, real software, and real bugs (yay!). You are NOT taught Quicksort, Merge Sort, and Heap Sort so that you will be able to implement your own the very first day you show up at your new job at FinanceCorp.
posted by onalark at 7:18 AM on April 14, 2009


It's just a Bubble Sort but with N processors

That's what it looks like, but I couldn't figure out the put() stuff. Maybe because I know naught of chip design.

What if you had as many unicorns as you have items to sort?

You *always* have as many unicorns as you have items to sort.

You are taught the basic sorting algorithms so you have an understanding of how to analyze how much more complicated algorithms will perform...

Not even "analyze," I would say. More like "estimate." Even in the scientific(ish) computing I do, we never sit down and Analyze An Algorithm. We just kind of informally mention that one method is n² vs. another that's closer to n and leave the details up to the implementor (i.e. me).
posted by DU at 7:24 AM on April 14, 2009


I've always loved how the Spaghetti Sort demonstrates and makes use of the inherent parallel processing power of Nature.

"The space complexity is also minimal: O(n), measured in rods of spaghetti."

Ha.
posted by grouse at 7:31 AM on April 14, 2009


That's what it looks like, but I couldn't figure out the put() stuff.

The nodes are pipelined. get() receives a value, put() pushes it to the next node. In the main loop, each node keeps the max value it has seen this far, and sends the rest down the pipeline. When you've fed N values into N nodes, they're sorted.
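You can simulate the pipeline sequentially in Python (a sketch, not real Occam; the inner loop plays each node in turn, and because each node holds its running max, the held values always form a descending insertion-sorted list):

```python
import math

def pump_sort(items):
    """Sequential simulation of the sort pump: node i keeps the largest
    value it has seen so far and passes everything smaller downstream."""
    n = len(items)
    kept = [-math.inf] * n            # each node's held value x
    for item in items:                # feed phase
        v = item
        for node in range(n):
            if v >= kept[node]:       # node keeps the larger value
                v, kept[node] = kept[node], v
        # the value falling off the end during feeding is always -inf
    return kept[::-1]                 # flush phase: min emerges first
```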
posted by effbot at 7:36 AM on April 14, 2009


...put() pushes it to the next node.

Aha! I've been using Tcl too much; put() looks like a print statement to me....
posted by DU at 7:39 AM on April 14, 2009


Yet another sorting demo page (with a slightly different take on it).
posted by joe vrrr at 8:42 AM on April 14, 2009 [4 favorites]


That is truly excellent, joe vrrr.
posted by grouse at 9:16 AM on April 14, 2009


joe, thanks for pointing that page out.
posted by onalark at 9:26 AM on April 14, 2009


Dude, this just rocks!

I've been bubble-sorting all my life, it's cool to see so many other methods.
posted by Westgate at 9:41 AM on April 14, 2009


occam on transputers was a revelation, but my followers scattered and my cult collapsed when the coming of the T800 did not happen according to the promises of scripture. Apparently the dust devils couldn't be kept off the holy wafers.
posted by StickyCarpet at 11:43 AM on April 14, 2009


Joe's link to the static visualizations was posted up-thread by swift yesterday.

I kind of agree with the take that animations are pretty but unillustrative and that the static versions have a certain stark beauty.

But still it seems to me that the static line diagrams also fail to tell the story: they're very good at showing the swaps, but most of the cost in inefficient sorts is the compares, and they're not visualized at all.

(And as time on these diagrams is measured only by swaps, his "trivial" time-based interpretations of the diagrams are meaningful only if the ratio of compares to swaps is static throughout the sort. Is this true for the algorithms he's visualizing?)
posted by We had a deal, Kyle at 4:27 PM on April 14, 2009 [1 favorite]


...if the ratio of compares to swaps is static throughout the sort

Obviously untrue for bubblesort.
posted by DU at 4:23 AM on April 15, 2009


It's just occurred to me why my immediate reaction to the spaghetti sort was that it's some kind of cheat.

The O(n log n) speed limit applies only to comparison sorts. As soon as you start allowing some kind of mapping from list items to a simple linear quantity, e.g. spaghetti length, you're not doing that kind of sort any more. Spaghetti sort corresponds roughly to a pigeonhole sort.

There are lots of ways to code essentially O(n) sorts without requiring parallel processing or quantum trickery or physical cuteness, if you're allowed to map sort keys to numbers.
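For instance, a pigeonhole sort over integer keys is a few lines of Python (a sketch; it's O(n + k) where k is the size of the key range, so it only pays off when the range is smallish):

```python
def pigeonhole_sort(items):
    """Linear-time sort for integer keys in a small range;
    no comparisons between items at all."""
    lo, hi = min(items), max(items)
    holes = [0] * (hi - lo + 1)       # one pigeonhole per possible key
    for x in items:
        holes[x - lo] += 1
    out = []
    for offset, count in enumerate(holes):
        out.extend([lo + offset] * count)
    return out
```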
posted by flabdablet at 6:04 PM on April 19, 2009


The other "cheat" in spaghetti sort is that, while the actual sort step is O(1), input/output is O(n).
posted by Netzapper at 11:37 PM on April 19, 2009




This thread has been archived and is closed to new comments