the nvidia defect
November 23, 2010 2:23 PM

"During the second quarter of fiscal 2009, NVIDIA recorded a $196 million charge against cost of revenue to cover anticipated customer warranty, repair, return, replacement and associated costs arising from a weak die/packaging material set in certain versions of our previous generation MCP and GPU products used in notebook systems." "The plaintiffs are seeking class-action status and want the graphics firm to pay “unspecified damages” as well as replace the faulty chips. Interestingly, those behind the lawsuit all had an HP, Dell, or Apple laptop". This problem was made known on Mar 1, 2007.
posted by sgt.serenity (54 comments total) 9 users marked this as a favorite
Basically, this post consists of the various resources you can use to identify whether you have this defect and the various advice forums that have been set up in some different countries.
posted by sgt.serenity at 2:25 PM on November 23, 2010

The two main sources i have found at the mo: (mostly for the us) (uk site)
posted by sgt.serenity at 2:28 PM on November 23, 2010

You can also identify if you have this defect if your screen suddenly goes black and never shows a pixel again, which is what happened to my Macbook before Apple had to fix it.
posted by thorny at 2:32 PM on November 23, 2010

Wait, it's not supposed to do that? The dude at the Apple store told me it was iBlack and I was lucky to have it!
posted by Mister_A at 2:34 PM on November 23, 2010 [4 favorites]

Got hit by this too.
posted by Blazecock Pileon at 2:35 PM on November 23, 2010

The dude at the Apple store told me it was iBlack and I was lucky to have it!

Good for you ; )

I called techsupport and they told me it was a 'problem with the battery' and to take it out, hold the power button for 30 secs and it would all be right as rain.
posted by sgt.serenity at 2:37 PM on November 23, 2010

My MacBook Pro had this problem. Apple replaced the video card for me gratis, even though the machine was well out of warranty by the time it crapped out.

I mean, uh, Apple blows! Reality distortion field! One-button mouse!

posted by Zozo at 2:43 PM on November 23, 2010 [4 favorites]

My wife's MacBook Pro was bitten by this twice, but Apple has been pretty great about it. Both times we took it to the Apple store and had the same machine back with a new logic board in just a couple of days. Apple is covering free replacements through at least May 2011, even for Macs that are out of warranty.

By comparison, my sister-in-law had a Dell with the same problem. Support from Dell was a nightmare. She was out a computer for well over a month and ended up receiving a different model entirely (newer but also larger, heavier, and louder). Luckily we made a full backup of her hard drive before we sent it in.
posted by jedicus at 2:43 PM on November 23, 2010

nvidia owes me several thousand in mental anguish for having to deal with customers with busted laptops over the phone for months. Can we tack that rider onto the lawsuit please?
posted by Mister Fabulous at 2:44 PM on November 23, 2010

Apple replaced the 8600M on my 2008 MacBook Pro a few months after I bought it and extended the hardware warranty for the chip to 3 years.

Are the links saying that the replacement is bound to fail and that my laptop has limited resale value?
posted by bonobothegreat at 2:47 PM on November 23, 2010

During the second quarter of fiscal 2009

Known elsewhere in the world as May-July 2008 -- Fiscal X in Nvidia land goes from January X-1 to January X.
posted by Monday, stony Monday at 2:48 PM on November 23, 2010
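
For anyone who wants to check that fiscal-to-calendar mapping, here is a minimal Python sketch. It assumes the fiscal year is approximated by whole calendar months running February X-1 through January X, which is consistent with the May-July 2008 figure above; the real quarter boundaries fall near, not exactly on, month ends. The function name is made up for illustration.

```python
from typing import List, Tuple


def fiscal_quarter_months(fiscal_year: int, quarter: int) -> List[Tuple[int, int]]:
    """Return the (calendar_year, month) pairs covered by a fiscal quarter.

    Assumption: fiscal year X is approximated as February X-1 through
    January X, split into four three-month quarters.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")

    months = []
    for offset in range(3):
        index = (quarter - 1) * 3 + offset  # 0..11, where 0 is February
        month = 2 + index                   # 2..13
        year = fiscal_year - 1
        if month > 12:                      # roll January over into the next year
            month -= 12
            year += 1
        months.append((year, month))
    return months


print(fiscal_quarter_months(2009, 2))
```

Under that assumption, the second quarter of fiscal 2009 comes out as May, June and July of 2008.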

Apple fixed my MBP for free even though the warranty had expired. Fast turnaround, too.
posted by Blazecock Pileon at 2:48 PM on November 23, 2010

Ha, for once I get to feel superior for not having a real graphics card.

*goes back to not playing modern games on my integrated graphics laptop*
posted by kmz at 2:50 PM on November 23, 2010

If you had one of the defective chips, you have probably installed a bios update that caused your fan to run nonstop so your computer would die horribly juuuuuuust after your warranty expired at which point you would plunge into Customer Service Purgatory:

"Hey, uh, my computer had one of those defective NVIDIA chips. It meltered my motherboard. Plz halp"

"We've never heard of such a thing!"

"Uh, yes you have. I was trying to figure out why I suddenly have a $1200 paperweight and discovered instead that thousands of people who have the same model seem to have had the same thing happen to them,"

"LIES. Go fuck yourself! Thanks for buying HP! Have a nice day!"
posted by louche mustachio at 2:50 PM on November 23, 2010

Are the links saying that the replacement is bound to fail and that my laptop has limited resale value?

Yes and no. Nvidia never fixed the underlying problem, so the replacement parts have a lifespan of about a year to eighteen months. But Apple will continue to replace the defective parts for free for up to four years after the original purchase and "Apple will continue to evaluate the repair data and will provide further repair extensions as needed," so they may extend the program again.
posted by jedicus at 2:50 PM on November 23, 2010

Are the links saying that the replacement is bound to fail and that my laptop has limited resale value?

The links are mostly addressing the non-Apple fallout from this. Apple dealt with this promptly, while HP, Dell and numerous retailers perhaps need to sharpen their crisis management skills.
posted by sgt.serenity at 2:50 PM on November 23, 2010

So, what you've had is gullible people like myself being easily fobbed off until googling the laptop make and finding out the real reason.
posted by sgt.serenity at 2:53 PM on November 23, 2010

Ha, for once I get to feel superior for not having a real graphics card.

These failing GPUs were integrated, that is, soldered directly onto the motherboard. If it were a real graphics card, replacement cards could have been sent out and changed out instead of having to have hundreds of thousands of laptops shipped all over the place.

The real winners: FedEx and UPS.
posted by Mister Fabulous at 3:00 PM on November 23, 2010

Now that I think about it... shipping all of those laptops used untold gallons of fuel. Fuel that was purchased from the Middle East. Funding terrorists!


Okay, I'm done.
posted by Mister Fabulous at 3:03 PM on November 23, 2010

I am not pleased to discover that this is the graphics card on my aging MBP (ages out of Applecare in February). Knock on wood, it's done all right so far.
posted by immlass at 3:21 PM on November 23, 2010

Without this defect, I would not have been bequeathed a modern (for me) laptop, and I'd have considerably less completely-gut-a-laptop-and-then-put-it-back-together skills.

...And I'd be richer by about $300 in parts, and would not hate HP with a black fury.
posted by tmacdonald at 3:21 PM on November 23, 2010

I am not pleased to discover that this is the graphics card on my aging MBP (ages out of Applecare in February). Knock on wood, it's done all right so far.

It'll be fine. Apple's replacement program is fast, free, and independent of AppleCare and the regular warranty.
posted by jedicus at 3:25 PM on November 23, 2010

To save others digging to find out which chips are affected (from the MCP link), the affected GPUs are the nvidia geforce 8400M and 8600M chipsets.
posted by ArkhanJG at 3:38 PM on November 23, 2010 [1 favorite]

I've used a hot air station to reflow the graphics chip on a couple of HP laptops that had this defect. One of them failed again 6 months later; I reflowed it again, and so far so good.

While I'm sure there is a specific Nvidia defect, it is also a problem that affects many many BGA chips made around 2007-2008. The Xbox 360's famous Red Ring of Death problem is a BGA soldering defect, and I think the Linksys WRT310N has a similar problem on the wireless chip. Basically, any BGA chip that goes through a lot of extreme thermal cycling is more prone to failure than it should be.

I'll try to find a good BGA assembly video on youtube.. The videos are there, and probably demonstrate what a BGA is more effectively than me trying to type it out :)
posted by Chuckles at 3:41 PM on November 23, 2010 [3 favorites]

Looks like my Dell is affected. Haven't noticed any problems though, and I've run it so hot it's caused warping and cracks in my wooden table (that would seem to indicate a fairly serious cooling design problem, but that's another story.)

I had just ordered a new laptop this morning for other reasons, so hopefully it sticks it out another week or so. The 330M isn't affected, I hope?
posted by chundo at 3:44 PM on November 23, 2010

In technical circles, it's known as bumpgate. Nvidia's manufacturing of the chip was suspect and caused many failures.

It was only on that generation of chips; it appears to be rectified on all chips since that problematic 8000-series.
posted by SirOmega at 3:51 PM on November 23, 2010 [1 favorite]

Here's a non irritating video that shows a technique for BGA soldering by hand. I'd like an industrial video too, but I couldn't find one.
posted by Chuckles at 4:01 PM on November 23, 2010

In related news, Unsealed Lawsuit Indicates Dell Hid Faults of Computers. Not nVidia chips, in this case, but "capacitors made in Asia with a bad chemical recipe". Good old bulging caps. The crazy thing is how hard Dell worked to deny problems and avoid responsibility for so long.

And in other related news, my Internet is currently flaking out every few hours. Because of, wait for it, a bad capacitor in the DSL modem! Awesome. Fortunately my ISP tech support is fantastic and a replacement modem is promised for tomorrow.

Capacitors: $0.01, but priceless.
posted by Nelson at 4:04 PM on November 23, 2010 [2 favorites]

The extent to which hp went to deny the problem was amazing. hp's official support sites had thousands of comments by people affected who were brushed off.

I guess that's what Hurd's cost cutting meant.
posted by stratastar at 4:14 PM on November 23, 2010 [1 favorite]

ArkhanJG: "To save others digging to find out which chips are affected (from the MCP link), the affected GPUs are the nvidia geforce 8400M and 8600M chipsets."

The GeForce Go 7600M also has the problem, but those were put in some of the first Core2Duo laptops and are now "too old" for HP to bother dealing with. My Pavilion DV9200CTO has this exact same problem, but HP's never bothered to try to remedy it.

My new ThinkPad, on the other hand, has a 4 year extended warranty. Screw you HP.
posted by fireoyster at 4:25 PM on November 23, 2010

Chuckles explains the issue pretty well. Basically, the problem stems from the ROHS Directive, which forced electronics manufacturers to switch to lead-free solder. Lead-free solder melts at a much higher temperature, and is more brittle than real solder, and as such, there are a number of engineering challenges to using it. Assembly lines, however, are ridiculously expensive to retool, so production continued as if they were still using leaded solder.

In contrast to a DIP chip -- the standard rectangular chip with pins down the sides -- a BGA chip is a big flat square with a bunch of metal dots on the bottom. On each one of those dots is a ball of solder. The chip is placed on the board and heated, ostensibly melting the solder and fusing each dot to a corresponding one on the circuit board. Since these chips generate a fair amount of heat in daily use, they go through a lot of heating and cooling cycles, which can make the chip expand and contract slightly. This puts stress on those little solder balls, and since they're using this new brittle solder, they tend to lift off their pads, partially detaching the chip from the board.

In my shop, we basically just strip the laptop down to the motherboard, point a heat gun and an IR thermometer at the chip, and wait till it hits ~210 Celsius. This melts the solder balls under the chip, "reflowing" the solder and sticking the chip back to the board. It's actually pretty simple, and if you're out of warranty and have nothing to lose, I do suggest giving it a try.

I've fixed literally dozens of HP DV9000's, and DV6000's this way, as well as a few desktop graphics cards. The usual culprit is the GeForce 8200 or 8400 GPU, but really, any BGA chip that uses ROHS solder can be affected. The Xbox360's RROD problem is probably the most famous example.
posted by inedible at 4:33 PM on November 23, 2010 [28 favorites]
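
The "heat until the IR thermometer reads about 210" step described above can be sketched as a simple threshold check over a series of readings. This is only an illustration: the function name, the sampling interval and the sample readings are all hypothetical, and the 210 figure is just the approximate surface temperature quoted in the comment, not a validated reflow profile.

```python
from typing import Iterable, Optional

# Approximate chip surface temperature quoted in the comment above (Celsius).
TARGET_C = 210.0


def seconds_until_target(
    readings_c: Iterable[float],
    interval_s: float = 1.0,
    target_c: float = TARGET_C,
) -> Optional[float]:
    """Return the elapsed time at the first reading that reaches the target.

    readings_c holds thermometer readings taken every interval_s seconds,
    starting at time zero. Returns None if the target is never reached.
    """
    for i, temp in enumerate(readings_c):
        if temp >= target_c:
            return i * interval_s
    return None


# Hypothetical readings taken every 5 seconds while heating.
sample = [25.0, 90.0, 150.0, 205.0, 211.0, 214.0]
print(seconds_until_target(sample, interval_s=5.0))
```

In practice the point is the same as in the comment: stop applying heat as soon as the reading crosses the target rather than heating for a fixed time.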

That last link is a massive repository of human pain. If I ever feel bad I can just go read that and know someone else is worse off.
posted by jewzilla at 4:39 PM on November 23, 2010

Holy hell was that informative. Thanks, inedible!
posted by Civil_Disobedient at 4:44 PM on November 23, 2010

It saddens me to see NVIDIA have a setback like this, I hope they can keep up the innovation.
posted by StickyCarpet at 5:09 PM on November 23, 2010

Wow, inedible and Chuckles, between the two of you, that was probably the clearest explanation I've seen of this to date. Thank you for that!
posted by limeonaire at 5:24 PM on November 23, 2010

sgt.serenity: True enough. It's "mostly" a temporary solution, though like I said I've fixed dozens, and have only had one that failed again within 6mo (our warranty we put on the fix). You can improve the odds by adding some sort of shim between the chip and heatsink, putting more pressure on the chip (some ghetto solutions use pennies or other coins). That's the same "solution" as the "x-clamp" xbox rrod fix.

I'm not saying it's the preferred solution, or that HP/Apple/Nvidia/Etc shouldn't be taking more responsibility here, but I do work at a shop that performs these sorts of repairs. We're straightforward about it and tell the customer beforehand the potential caveats and pitfalls, and a lot of them are happy to pay $250 to enjoy their 2 year old laptop for another year. (I'll point out that any laptop with a dedicated GPU was not a cheap laptop when it was bought.)

I also live in Canada, which means any of the class action lawsuits that arise over this are completely irrelevant here. If my shop was in the US, my customers could get my repair bill reimbursed through HP, but judgments in US lawsuits stop at the border. These people truly are completely screwed and my shop really is their last hope at getting a working laptop.

Oh, and if you ARE in the US, you do have a couple options. Apparently you can apply here for a reimbursement, repair, or replacement. If your laptop is an HP, and it was purchased with a 12mo warranty, less than 24mo ago, your warranty has been extended by 12mo, and this IS covered, though you may have to talk to a manager to get satisfaction. (Incidentally, I had this page bookmarked, which talked about the 12mo extension, but it seems they've moved or removed the page.)
posted by inedible at 5:59 PM on November 23, 2010
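
The HP rule as described in the comment above (12-month original warranty, bought less than 24 months ago) can be written out as a small check. This is a sketch of the commenter's description only, not HP's official terms; the function names are made up for illustration, and "months ago" is assumed to mean whole calendar months elapsed.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return months


def hp_extension_applies(
    purchased: date,
    today: date,
    original_warranty_months: int = 12,
) -> bool:
    """True if the purchase fits the rule as described in the thread:
    a 12-month original warranty and a purchase under 24 months ago."""
    return original_warranty_months == 12 and months_between(purchased, today) < 24


# A laptop bought in January 2009, checked on the date of this thread.
print(hp_extension_applies(date(2009, 1, 15), date(2010, 11, 23)))
```

A laptop bought exactly 24 months before the check date falls outside the window under this reading.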

Got bit by this twice. The first time they did some poking, nodded and repaired it.

The second time, I turned it on, they didn't even bother poking, just replaced the system board.

no problems since.
posted by mephron at 6:02 PM on November 23, 2010

'reflow' repair as a temporary solution at best:

I think that's way too simplistic. After all, repairs are "temporary at best" at the best of times. Look at those bumpgate articles, you see a lot of talk about the careful and detailed engineering work and the precision and everything.. Sure, true enough, except that 90% of that care and precision is "we tried a bunch of stuff, and this worked".

For an industrious individual faced with the problem, reflowing is a fantastic solution.

As a business.. I personally avoided offering the repair as a service because I couldn't see a working business case. What if I screw it up and short a couple of balls, what if it fails again in a few weeks, etc. At $250 a shot, I can see a business case, but as a basement engineer, I don't think I could charge that much.

My solution was to lease the one extra laptop to somebody looking for a temporary home PC--it breaks, stop paying me, no big deal--and use the other as my media centre PC.

Funny thing is, the market price for these failed laptops is quite high, so I only ever had to face the question a couple of times. Or, maybe I just never shopped the right places at the right time :)
posted by Chuckles at 6:45 PM on November 23, 2010

What I decided not to say initially was "sounds like that guy has a vested interest in discrediting reflow", but reading the link:
And if You decide to do a proper repair - make sure its a new GPU with a DC Code 09+
Not going to advertise myself and its been honest post for all of You.
Knew it :)
Which is not to say I disagree, just that he makes it sound all cut and dried. Trust me I'm the expert and everything.. Sure, his repair is better, but how much better? He probably doesn't know himself.
posted by Chuckles at 6:55 PM on November 23, 2010

These people truly are completely screwed and my shop really is their last hope at getting a working laptop.

Canadians have no consumer rights ? I think that might not be the case - did you tell any of the customers that came to you they may be actually legally entitled to a refund/repair ?
posted by sgt.serenity at 7:08 PM on November 23, 2010

Having said that, though, as we all know, no consumer rights exist in Scotland, and if anyone in the area wishes me to attempt to resuscitate their inherently flawed laptop with a hammer and some shortbread (for the knockdown price of £250), the email's in my profile.
posted by sgt.serenity at 7:16 PM on November 23, 2010

has this recall been offered in canada?
posted by paradroid at 7:37 PM on November 23, 2010

Canadians have no consumer rights ? I think that might not be the case - did you tell any of the customers that came to you they may be actually legally entitled to a refund/repair ?

We have plenty of consumer protection laws in Canada, but none of them say that if a product with a 1 year warranty breaks after a year, you're entitled to a new one for free.

Yes. As soon as we were made aware of the issue, we started printing out the HP extended warranty info and giving it to people. I always turned away customers that were within warranty, even if they're within some special secret warranty that only I know about. Just recently, after the settlement was given one of the last approval stamps in the chain, the day this website was created to process claims, I sent the link out to a few clients informing them they're entitled to a reimbursement. It was only hours later that I was corrected by one of our clients who runs a law firm, that if you read the fine print, it's only applicable in the US.

Apparently, as a Canadian, doing business entirely within Canada, who has no involvement with the US company, and has never done business with this US based company, you have no right to sue them. Sure, you can try to start up a class action lawsuit within Canada, to try and find other Canadians who bought Canadian laptops from Dell Canada and get them to join you in a long and costly legal battle, but you can't join in the existing lawsuit with US citizens who bought US laptops from Dell America. Crazy, huh? The rub here is that Canadians aren't really the litigious type, so a class action lawsuit for this would never get off the ground. Consumer rights didn't help here, this is a civil suit.
posted by inedible at 7:50 PM on November 23, 2010

Well, so many theories as to the source of the problem presented in these links.

A little bird told me that it was a combination of things: some of these parts from NVidia turned out to run much hotter than they expected from early builds, and in addition their packaging supplier changed their formula in the glue inside so that it couldn't handle these higher temperatures (and supposedly the supplier admitted fault to NVidia).

NVidia's first response was BIOS fixes to run the fans faster. The problem is that laptops are a lot harder to cool, and they fail even when the fan runs at maximum (which does wonders for battery life).

I hadn't heard the ROHS/soldering issue, but if reflow is fixing any of these chips, then that definitely indicates a soldering problem. I can't believe there are manufacturers out there so foolish as to not build the boards correctly with the lead-free solder--it is pretty well established how to do this, but if it is some fly-by-night ultracheap manufacturer, I guess you never know. In any case, the board manufacturing is something out of NVidia's control. If their overly-hot chips are causing poor solder joints to fail, I wouldn't give them 100% of the blame.

Nevertheless, most failures were in the chips themselves.

That BGA-by-hand video only shows one step of many for soldering on a BGA--something I wouldn't recommend for a chip the size of NVidia's processor (especially with lead-free solder).
posted by eye of newt at 8:42 PM on November 23, 2010

I had the problem with my Lenovo Thinkpad T61p. Lenovo is not included in the lawsuit. Lenovo refused to do any repair work without me paying over $600 because it was out of warranty, and even that wasn't the board replacement that would truly fix the problem so my $600+ could buy me a dead laptop again in a few months. Some people were getting theirs fixed for free via a helpful guy (Mark at Lenovo) in their forums, but he never got back to me and phone support was a dead end. Basically Lenovo doesn't want to acknowledge there was a defect they (or nVidia) were responsible for. I got completely frustrated. In the end I bought a cheap $250 netbook at Costco to get me through while I decide if I will ever give Lenovo any money again. Many months later I'm still content with this netbook (mostly because I don't do any PC gaming anymore) but soon I will be deciding who to buy from next and the customer service response to this issue by the different manufacturers will be something I take largely into account.
posted by girlhacker at 8:54 PM on November 23, 2010

Here's a good article on the Nvidia failure from The Inquirer.

They say that the packaging materials were all Nvidia's choices, where I had heard that the packaging company made these (poor) choices. Who knows? In any case I think the real issue is that the chips ran way too hot.
posted by eye of newt at 9:06 PM on November 23, 2010

girlhacker, what did you do with the dead laptop? :)

eye of newt: Nevertheless, most failures were in the chips themselves.

I think most are fixed by the reflow process. I guess it is possible that the reflow process helps some of the internal die to case bonding connections, but it isn't supposed to..
posted by Chuckles at 9:37 PM on November 23, 2010

Oddly enough, I've yet to encounter this problem professionally, though all the laptops we bought in that timeframe were cheapies with integrated intel graphics, and we're admittedly not equipped for significant laptop repairs inhouse. It was also about that time I switched to ATI solutions personally, though that was mainly because of the absolutely dire nvidia drivers in vista.

I did have the infamous RROD on my own 360 though, which got fixed under warranty. My mate's was bought 2nd hand, and his RROD'd at the same time (gears of war 2 scuppered both of ours on the same day; guess it runs the GPU that bit harder and hotter).

Since his was 2nd hand, he attempted a ghetto repair. After the usual attempts (X-clamp etc) he found running the thing with the case open and the fans off managed to get the GPU hot enough to partially reflow the solder so it worked again. He now runs it with the DVD drive external to the case, and a 120mm PC case fan cable tied to the miserably small GPU heatsink. So far, it's still working...

Microsoft's first attempt at fixing it was strapping an extra flyaway heatsink onto the GPU heatsink, which is desperately small as it's under the DVD drive. They did switch to a 65nm version of the GPU - instead of the original hot 90nm version - which draws less power (and generates less heat) and the current slim uses a 45nm SoC (system-on-a-chip), again reducing heat load and putting less stress on the solder joints, though presumably they've also taken advantage of the packaging improvements.
posted by ArkhanJG at 11:33 PM on November 23, 2010

girlhacker, what did you do with the dead laptop? :)

It's holding down the dining room table we use only once a year ...uhh that's coming up on Thursday so I'm going to have to move it. Can I really get a lot for it? :-)
posted by girlhacker at 12:21 AM on November 24, 2010

Huh. I'd not heard of this until last week, when the Macbook Pro screen refused to work after applying the latest security patches. Apple is replacing the entire logic board, of course. I suspect eventually nVidia will be paying this cost.
posted by clvrmnky at 7:16 AM on November 24, 2010

What lucky-for-me synchronicity; I just had my hp dv95xx laptop's video die, trapping some pics from a recent trip on the machine. Well, lucky-for-me that I know what to try and fix it. Really lucky-for-me would have been no failure in the first place =p
posted by nomisxid at 7:40 AM on November 24, 2010

So, I was reading the claim. Technically, I am also one of those affected and the wonderful FAQ for the settlement states that if my computer happens to fail before the deadline I am fine. On the other hand, if it doesn't I seem to be screwed. What a wonderful day!
posted by lizarrd at 11:14 AM on November 24, 2010

Well, it is probably worth a little under $200 as it is.. Depends a lot on specific details, of course.
posted by Chuckles at 10:58 PM on November 24, 2010

