December 9, 2002 1:39 PM Subscribe

Do you have problems finding the cheapest flight? Well so do computers.

Carl de Marcken, the man who created the engine behind Orbitz and other travel search engines posits that finding the cheapest fare from one point to another is a NP-Hard problem. Even if you fix the specific route between destinations there can be as many as 10^{36} combinations.

posted by patrickje (18 comments total)

Carl de Marcken, the man who created the engine behind Orbitz and other travel search engines posits that finding the cheapest fare from one point to another is a NP-Hard problem. Even if you fix the specific route between destinations there can be as many as 10

posted by patrickje (18 comments total)

It is TSP, but it's TSP with an extra layer. With TSP you want to make the journey as short as possible without passing through the same place twice. That isn't necessarily true with this problem. You don't want to go somewhere and come back to where you started, but then you don't necessarily want the shortest number of hops. Sometimes you'll want to get there as quickly as possible. Some people will want to get there as cheaply as possible. The two may have completely different solutions.

posted by vbfg at 2:13 PM on December 9, 2002

posted by vbfg at 2:13 PM on December 9, 2002

In fact, the last link does in the third line of the definition. Other than that an interesting comment abez.

posted by Hjorth at 2:13 PM on December 9, 2002

The traveling salesman problem only solves part of it, and in fact is a trivial problem in regard to the overall solution. Most destination-to-destination routes have 'preferred routes' stored in a lookup somewhere rather than run through the TSP every time. The difficult problem is that sometimes the cheapest flight from Boston to New York involves a stop in London or Vegas. TSP doesn't really take into account multi-dimensional routes (i.e. this path is 10 miles long and costs $50). Overall it's a fascinating problem.

posted by patrickje at 3:01 PM on December 9, 2002

posted by patrickje at 3:01 PM on December 9, 2002

TSP doesn't have to take "multi-dimensional" routes into account. What should've been said is that this problem is easily reducable to a TSP. Such that if you want a balance between hops/distance/time you simply weight your edges differently. It's up to you to come up with a good encoding which best describes your situation and your goals.

posted by abez at 3:22 PM on December 9, 2002

posted by abez at 3:22 PM on December 9, 2002

Thanks for the great links, patrickje. They answered many of my questions about why I have such strange experiences when searching for the cheapest flights. Inexplicably, I find it reassuring that the intricacies of airfare-pricing result more from the complexity of the problem than from some vast conspiracy on the part of the airlines to confuse their customers. I'm not sure I can say the same for unfathomable long-distance calling plans.

posted by smrtsch at 3:49 PM on December 9, 2002

posted by smrtsch at 3:49 PM on December 9, 2002

I don't think that this is a TSP issue per se. It's been known for some time that if you say "I want to visit cities A, B, C, D, and I don't care what order I visit them", then the problem is NP-complete. If you say "I want to visit A, then B, then C, then D", then the problem is in P. If they are trying to solve TSP problems, then it is no surprise that it's NP.

The article is a bit weak on details, but I think that a couple of the issues that make this NP are time constraints, and weird airline rules. If you say "I must be back by 6:00pm Thursday", you can't just address this by reweighting the graph. Even if you factor in time with price for the weights, you might get a really cheap flight that gets you in after the deadline, or you might find a really fast flight that gets you in way early, but costs a lot. The question is not a tradeoff between time and price, but a constraint on time while minimizing price.

The other issue is weird airline rules. For example, you get a better deal if you have a Saturday night stay. Now suppose you want to go A->B->C->A, and you are planning on spending saturday in C. The obvious routing is 3 one-way tickets. However, it might be cheaper to get a round-trip ticket from A->B, and a round-trip ticket from B->C. Thus on the way home, you backtrack to B. This allows you to have two round-trips with Saturday night stays, which is probably cheaper than 3 one-way tickets even though it's 4 flights instead of 3. The key issue though is that the best way to get from B->C is conditional on how you got from A->B. This is a SAT type problem, as was mentioned in the article.

posted by cameldrv at 4:05 PM on December 9, 2002

The article is a bit weak on details, but I think that a couple of the issues that make this NP are time constraints, and weird airline rules. If you say "I must be back by 6:00pm Thursday", you can't just address this by reweighting the graph. Even if you factor in time with price for the weights, you might get a really cheap flight that gets you in after the deadline, or you might find a really fast flight that gets you in way early, but costs a lot. The question is not a tradeoff between time and price, but a constraint on time while minimizing price.

The other issue is weird airline rules. For example, you get a better deal if you have a Saturday night stay. Now suppose you want to go A->B->C->A, and you are planning on spending saturday in C. The obvious routing is 3 one-way tickets. However, it might be cheaper to get a round-trip ticket from A->B, and a round-trip ticket from B->C. Thus on the way home, you backtrack to B. This allows you to have two round-trips with Saturday night stays, which is probably cheaper than 3 one-way tickets even though it's 4 flights instead of 3. The key issue though is that the best way to get from B->C is conditional on how you got from A->B. This is a SAT type problem, as was mentioned in the article.

posted by cameldrv at 4:05 PM on December 9, 2002

My question is at what point does this all become diminishing returns. Yes, it's cheaper to travel from boston to london to new york, but at 18 hours it's cheaper to take the bus. I can see the value of adding complexity for more arcane flights, but really, how many people travel from white plains to minnesota (sheesh, just take the train in to kennedy). Is this a problem of mathematical prowess or actual usefulness?

posted by NGnerd at 4:11 PM on December 9, 2002

posted by NGnerd at 4:11 PM on December 9, 2002

While there is no solution to the travelling salesman problem, it should be noted that 1. nobody has yet proved that there is no way to find the shortest route and 2. there are ways of findiing very short routes consistantly. I hear that quantum computers may be the key to this problem, since matter at the quantum level has a tendency to do odd things (like exist in two places at the same time). o<

posted by KettleBlack at 4:45 PM on December 9, 2002

posted by KettleBlack at 4:45 PM on December 9, 2002

One can also see that this has commonalities with chess computation methods, where you need to look multiple "hops" in advance. But those are usually made easier not by an elegant algorithm, but by a brute-force elimination of unlikely possibilities. (This is how Deep Blue got so good, essentially.) For instance, booking a flight from Chicago to Miami (as my Dad just did last night with my help, via Orbitz!) can presumably skip all the flights that involve stopovers in Kinshasa.

Savvy travelers, too, know that cultural imperatives outside of simple economics make Las Vegas a terrific place to fly through for a low fare. It can be cheaper to fly from Chicago to Las Vegas to Miami than just Chicago to Miami (if travel time is no object). The airlines know this, too, so they probably have systems like Orbitz avoid suggesting such fares.

posted by dhartung at 4:48 PM on December 9, 2002

Savvy travelers, too, know that cultural imperatives outside of simple economics make Las Vegas a terrific place to fly through for a low fare. It can be cheaper to fly from Chicago to Las Vegas to Miami than just Chicago to Miami (if travel time is no object). The airlines know this, too, so they probably have systems like Orbitz avoid suggesting such fares.

posted by dhartung at 4:48 PM on December 9, 2002

There is a solution to the TSP problem. In fact, there are many ways of finding a solution to the TSP problem. They are just not polynomial in running time with respect to the input.

I think what you meant to say is that there are many "simple" or polynomial algorithms that approximate the solution, but may not be the

posted by mfli at 4:52 PM on December 9, 2002

Airline pricing formula attempts to say,

"How much does this seat cost?" = "How much is the customer willing to pay for it?"

-or-

How much can we squeeze out of the customer? (I know, it's actually a matter of varying optimal prices, etc. Basically the same deal.)

If you need to fly at the last minute and you*must* have the seat, then surely you're willing to shell out...

Big travel agent joke = "What if the Airlines Sold Paint?" (Scroll down aprox. half way.) It's soooo true... Great joke, really, and it applies to some hotel pricing systems now too. It's a trend...

posted by Shane at 5:45 PM on December 9, 2002

"How much does this seat cost?" = "How much is the customer willing to pay for it?"

-or-

How much can we squeeze out of the customer? (I know, it's actually a matter of varying optimal prices, etc. Basically the same deal.)

If you need to fly at the last minute and you

Big travel agent joke = "What if the Airlines Sold Paint?" (Scroll down aprox. half way.) It's soooo true... Great joke, really, and it applies to some hotel pricing systems now too. It's a trend...

posted by Shane at 5:45 PM on December 9, 2002

There was a really neat nucleic acid computer used to solve a version of the TSP in science, here is something I found about it. They added a bunch of nucleic acids together that were created to represent the possible paths, then as they anneal in all combinations an enzyme will come and cull the longer ones, the result is the shortest path. This is brute force I guess, but it's not slow.

posted by rhyax at 8:33 PM on December 9, 2002

posted by rhyax at 8:33 PM on December 9, 2002

It is easily reducible to the TSP, and is therefore NP-Complete, not NP-Hard. Heuristic solutions exist: it's all a question of balancing your tolerance for error against your need for speed. I've used the Simulated Annealing (Kirkpatrick, Gelatt, Vecchi) heuristic to good effect for some classes of TSP, but this is certainly going to be a big tour, with lots of interesting constraints, as pointed out. Moore's law only goes so far, KettleBlack: you can keep throwing computing power at it, but you had better hope your computing power increases at a faster rate than the size of the problem ...

posted by walrus at 2:56 AM on December 10, 2002

posted by walrus at 2:56 AM on December 10, 2002

Actually, combinatorial problems like the TSP have two different problems associated with them.

There is the decision problem: does there exist a Hamiltonian cycle (a cycle that includes all vertices/cities) of length <= k in the graph?

And the optimization problem: what is the shortest Hamiltonian cycle?

If you follow the definition of NP-hard given by the poster, then the decision problem is NP-complete, and the optimization problem is NP-hard.

This mainly because it's difficult to verify the optimization problem. For example, if I gave you a cycle, and claimed that it's the shortest one, how would you verify that what I gave you is indeed the shortest one (can you do it polynomial time)? On the other hand, if I gave you a cycle, and claimed that it's of size <= k, it's very easy to verify (just go through the cycle and add up the weights, which can be done in polynomial time). And that's the difference between NP-complete and NP-hard. NP-complete problems are also in NP, whereas NP-hard ones may not be.

posted by mfli at 5:21 AM on December 10, 2002

There is the decision problem: does there exist a Hamiltonian cycle (a cycle that includes all vertices/cities) of length <= k in the graph?

And the optimization problem: what is the shortest Hamiltonian cycle?

If you follow the definition of NP-hard given by the poster, then the decision problem is NP-complete, and the optimization problem is NP-hard.

This mainly because it's difficult to verify the optimization problem. For example, if I gave you a cycle, and claimed that it's the shortest one, how would you verify that what I gave you is indeed the shortest one (can you do it polynomial time)? On the other hand, if I gave you a cycle, and claimed that it's of size <= k, it's very easy to verify (just go through the cycle and add up the weights, which can be done in polynomial time). And that's the difference between NP-complete and NP-hard. NP-complete problems are also in NP, whereas NP-hard ones may not be.

posted by mfli at 5:21 AM on December 10, 2002

Thanks for the correction, mfli. I should know that.

posted by walrus at 6:22 AM on December 10, 2002

posted by walrus at 6:22 AM on December 10, 2002

mfli spoke the truth, but I had to read it a couple times to sort it all out. I'm surprised that no one has pointed to formal definitions. Yet.

P = polynomial. Such a problem can be solved in polynomial time. That is, the time amount of work required to solve the problem is roughly proportional to the size of the input raised to some power.

NP = nondeterministic polynomial. An NP problem can be checked in polynomial time. That is, given a solution, checking the accuracy of the solution is a P problem.

Obviously, a P problem is an NP problem.*It is an important unsolved problem to determine if all apparently NP problems are actually P. *

NP-Hard :*A problem is NP-hard if an algorithm for solving it can be translated into one for solving any other NP-problem (nondeterministic polynomial-time) problem.*

NP-Complete: A problem which is both NP and NP-Hard.

Trivially, this implies that any NP-Complete problem is NP-Hard which contradicts a previous comment. Also finding a polynomial time algorithm for a single NP-Complete problem is much like finding an algorithm for all NP problems. Such an algorithm would answer (affirmatively) the question of whether NP and P are equivalent.

posted by stuart_s at 2:10 PM on December 10, 2002

P = polynomial. Such a problem can be solved in polynomial time. That is, the time amount of work required to solve the problem is roughly proportional to the size of the input raised to some power.

NP = nondeterministic polynomial. An NP problem can be checked in polynomial time. That is, given a solution, checking the accuracy of the solution is a P problem.

Obviously, a P problem is an NP problem.

NP-Hard :

NP-Complete: A problem which is both NP and NP-Hard.

Trivially, this implies that any NP-Complete problem is NP-Hard which contradicts a previous comment. Also finding a polynomial time algorithm for a single NP-Complete problem is much like finding an algorithm for all NP problems. Such an algorithm would answer (affirmatively) the question of whether NP and P are equivalent.

posted by stuart_s at 2:10 PM on December 10, 2002

Oh come on surely a beowulf cluster and linux can handle this!!!!!!!!! am i rite??!?!/1 This sort of problem is noethng gnu 2 me!

posted by KettleBlack at 2:20 PM on December 10, 2002

posted by KettleBlack at 2:20 PM on December 10, 2002

« Older Beautiful galleries of alchemical imagery... | US foriegn aid at work... Newer »

This thread has been archived and is closed to new comments

TSP applies to routes like this because instead of using distance you use price so you want to find the minimum priced route not the minimum distance route. Now I'm assuming this is a TSP problem because you want to come back home. It's very easy to come up with a shortest path from A to B.

posted by abez at 2:06 PM on December 9, 2002