The Safe Mac

Understanding upgrade nightmares

Every time Apple releases any system update, online communities go wild with reports of problems. These problems are always blamed on bugs by some, and discussions degenerate into vitriolic arguments. The truth is that, although bugs can and do happen, there are many common problems that recur frequently and have nothing to do with bugs. This means that it is prudent for anyone having problems following an update to try troubleshooting those problems, rather than assuming they are caused by a bug and waiting for someone else to fix them.

Computers are very complex machines, and depend on a complex interaction of many hardware and software components.  Any little thing can potentially gum up the works, and when that happens, the average user has no idea where to turn.  In order to understand how to approach such a problem, it is first necessary to gain an understanding of the potential causes for problems.

Causes

The first and most obvious reason that an upgrade can cause problems is the presence of bugs.  This is the one everyone tends to blame most of the time, but actually, bugs are rarely the source of serious problems.  It does happen sometimes, though.  Consider, for example, the guest login bug in Mac OS X 10.6 and 10.6.1, which could cause an entire user account to disappear, taking all that user’s data with it.  That was a major bug, which Apple did acknowledge and eventually figured out how to fix in 10.6.2.  In general, however, system upgrades and updates are tested pretty thoroughly, and only less significant bugs are likely to slip through testing undiscovered.  Since major bugs are rare and you would have to wait for Apple to fix them anyway, it’s almost always more productive to seek other causes for update problems.

Another source of problems is incompatible third-party software.  Developers have documentation available to them that tells them how to do things, and they should follow those directions closely to ensure their programs function properly.  However, sometimes developers don’t follow those instructions closely enough, or they followed the instructions very closely ten years ago and the rules have changed by now.  In either case, changes to the system eventually catch up with them, and their code stops working properly.  This can cause all manner of problems, including (but not limited to) crashing, kernel panics, cosmetic bugs and even data loss in extreme cases.

A third possibility is corruption.  Corruption occurs whenever data is unintentionally modified, causing some or all of the data in question to be replaced by what is essentially random data.  This can cause very serious problems: failure of the computer to start up if a vital file becomes corrupt, data loss in a user’s documents or all manner of other issues.  Corruption can exist unnoticed until the installation of a new system writes thousands of files, worsening the corruption and further damaging a slightly damaged system.  Imagine going into a factory and removing one component from the machinery at random, replacing it with some random object.  Depending on the part that was removed, that could cause the entire factory to shut down or could go completely unnoticed until just the right set of circumstances occurred.

Lack of resources – RAM, hard drive space, processor speed, and so on – can be a cause for performance problems, and in extreme cases can even cause crashes.  Often, major new versions of the system are more demanding than the previous version.  Mac OS X 10.7 (Lion), for example, requires a minimum of 2 GB of RAM, while Mac OS X 10.6 (Snow Leopard) only required 1 GB and Mac OS X 10.5 (Leopard) needed a minimum of 512 MB.  Newer software with higher requirements is not likely to run as well as a previous version did on the same hardware.  (I, for example, had to upgrade my computer’s RAM to 8 GB after installing Lion, due to RAM-related issues that were not present in the less-demanding Snow Leopard.)

Causes can also include hardware failures, contrary to popular belief.  Sometimes hardware has a minor flaw that is not a problem until a new system stresses it in a new way.  Other times, certain hardware can fail as an unrelated side-effect of installing an upgrade.  For example, the act of writing thousands of files to the hard drive may push a drive on the verge of failure over the edge.  Finally, one should not discount the coincidental failure.  To many people, it seems wrong to say that a hardware failure that happens at exactly the same time as a system update is coincidence, but coincidences do happen.  In addition, the failure may have happened well before the update and simply gone unnoticed…  for example, someone who hasn’t used their optical drive in 6 months may try to burn the Mac OS X 10.7 (Lion) installer to a DVD and find their optical drive is broken.  This might lead to the update being blamed, even though the drive could have been broken at any point in that 6-month period without the user noticing.

Solving the Problems

Understanding the causes is all very well and good, but the average user with problems doesn’t care about all that.  He/she just wants to have a working computer again!  So what can you do if your system is having problems following an update or upgrade?

Note that, before you attempt any of the following fixes, you should have a good set of up-to-date backups!

Step 1: Check the hard drive and the RAM

Many problems can be traced to one of these things.  As a rule of thumb, your Mac needs roughly 10% of the hard drive kept free.  You can check this by selecting your hard drive in the Finder and choosing File -> Get Info.  In the window that appears, compare the Capacity and Available values.  If you have a drive with a total capacity of 500 GB, your available space should not fall below 50 GB.
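If you would rather not do the arithmetic by hand, the 10% rule of thumb can be sketched in a few lines of Python.  This is only an illustration: the function name has_enough_free_space and its min_free_percent parameter are made up for this example, and the figures Python reports may differ slightly from what Get Info displays.

```python
import shutil

def has_enough_free_space(capacity, available, min_free_percent=10):
    """Return True if at least min_free_percent of the drive is free.

    The 10% default is the rule of thumb from the article, not a hard limit.
    Integer math is used so the boundary case (exactly 10%) is exact.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return available * 100 >= capacity * min_free_percent

if __name__ == "__main__":
    # Assumption: "/" is the startup volume; pass another path for other drives.
    usage = shutil.disk_usage("/")
    verdict = "OK" if has_enough_free_space(usage.total, usage.free) else "running low"
    print(f"{usage.free / 1e9:.1f} GB free of {usage.total / 1e9:.1f} GB: {verdict}")
```

With the numbers from the example above, a 500 GB drive with 50 GB available just passes, while 49 GB does not.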

If you don’t have enough free space on your hard drive, your computer will begin to slow down, may start crashing and, if space gets tight enough, may even corrupt data.  The fix is easy, fortunately…  delete some stuff!  If there is nothing you can part with, you will want to either get an external drive and move some of your data to that or replace the internal drive with a larger one.  The former is a quick and easy solution for anyone, while the latter is not a do-it-yourself project for everyone and will take more time.  In any case, though, you’ve gotta free up some hard drive space!

If you don’t have enough RAM, that can cause your machine to slow to a crawl and make the spinning beach ball cursor a common sight.  Mac OS X 10.7 (Lion) in particular is very demanding when it comes to RAM.  Fortunately, there’s an easy check to see if you’re having RAM-related performance problems.  When you start having problems, open Activity Monitor (in /Applications/Utilities), click the System Memory tab and compare the page ins and page outs.  If page outs is 10% or more of the page ins, you need more RAM.  So, for example, if the page ins value is 500 MB and page outs is 50 MB, you probably need more RAM.  (The higher the page outs, the more desperately you need more RAM.)  However, if page outs is only 5 or 10 MB, you’re fine.
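The page ins versus page outs comparison is simple enough to write down as a tiny Python function.  Treat this as a sketch of the rule of thumb only: the name needs_more_ram and the threshold_percent parameter are invented for this example, and you still have to read the two values off Activity Monitor yourself.

```python
def needs_more_ram(page_ins_mb, page_outs_mb, threshold_percent=10):
    """Rule of thumb: page outs at or above 10% of page ins suggests too little RAM."""
    if page_ins_mb <= 0:
        # No page ins recorded yet, so there is nothing to compare against.
        return False
    return page_outs_mb * 100 >= page_ins_mb * threshold_percent

print(needs_more_ram(500, 50))  # True: page outs are 10% of page ins
print(needs_more_ram(500, 5))   # False: page outs are only 1% of page ins
```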

If that test indicates that you need more RAM, you have two options.  First, you can start being more cautious about what apps you are running at the same time.  That may not be sufficient if you use particularly “heavy” apps, though.  The other choice is to buy more RAM.  You’ll get a better price if you buy from a company like Crucial or Other World Computing and install the RAM yourself.  But if that intimidates you, pay the premium to buy it from Apple and have one of their techs install it for you.

Step 2: Examine kernel panic logs

If you are not having kernel panics, you can skip this step.

Kernel panics are usually caused by one of two things: bad third-party kernel extensions (kexts) or hardware problems.  The kernel panic logs can help you determine the cause.  Open the latest log and search for a line that reads “loaded kexts:”, and look directly below that for any lines that don’t start with “com.apple.”  Also look at what appears directly below the line reading “Kernel Extensions in backtrace:”, to help narrow things down.  If you recognize the names of any software you have installed in either of those places, try removing that software.

If you don’t see any third-party kernel extensions mentioned in the log, you may have a hardware problem.  You should run the rest of the tests, but be prepared to call Apple for service.
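If scanning a long panic log by eye sounds tedious, the search described in this step can be roughed out in Python.  This is a sketch under assumptions: it expects the kext list to follow a line reading “loaded kexts:” and to end at a blank line or a “Model:” line, which may not hold for every system version.  The function name third_party_kexts and the sample log (including com.example.driver.Widget) are made up for illustration.

```python
def third_party_kexts(panic_log_text):
    """Return bundle IDs under "loaded kexts:" that do not start with "com.apple."."""
    found = []
    in_section = False
    for raw_line in panic_log_text.splitlines():
        line = raw_line.strip()
        if not in_section:
            in_section = (line == "loaded kexts:")
            continue
        # Assumption: the list ends at a blank line or at the "Model:" line.
        if not line or line.startswith("Model:"):
            break
        bundle_id = line.split()[0]
        if not bundle_id.startswith("com.apple."):
            found.append(bundle_id)
    return found

# Made-up log excerpt; com.example.driver.Widget is a hypothetical third-party kext.
SAMPLE_LOG = """\
Kernel Extensions in backtrace:
com.example.driver.Widget(1.2.0)@0x5a1000->0x5a8fff
loaded kexts:
com.example.driver.Widget\t1.2.0
com.apple.driver.AppleHDA\t2.0.5
com.apple.iokit.IOUSBFamily\t4.4.0
Model: MacBookPro6,2
"""

print(third_party_kexts(SAMPLE_LOG))  # ['com.example.driver.Widget']
```

To try it on a real log, paste the log text into a file, read it with open(...).read() and pass the result to the function.  An empty list does not prove the hardware is at fault; it just means no third-party kexts were listed.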

Step 3: Repair permissions and the hard drive

Open Disk Utility, select your hard drive (the one the system is on if you have more than one) and click the Repair Disk Permissions button in the First Aid tab.  That will probably not help anything, but sometimes it does, and it’s a quick and easy thing to try.  Note that there are a number of messages that will appear every time and that can be ignored…  see Apple’s document on permissions messages for more information.

Next, you need to repair the hard drive, following Pondini’s instructions here:

http://pondini.org/OSX/DU6.html

If problems are found that cannot be repaired, you’ll need to either erase the hard drive and restore the system from a backup or try repairing with another tool, like DiskWarrior.

Of course, even if you get all the repairs done successfully, that does not necessarily mean your problems will go away.  A damaged hard drive can easily lead to corrupted data somewhere, so you’ll probably still need to keep chugging through these steps.  But at least you won’t be worsening the problem by continuing to use a damaged drive!

Oh, and one other note: if you still see your problems even when you have started up from the recovery partition or Mac OS X install disk, chances are good that it’s hardware.  You can certainly run the rest of these tests if you like, but I personally would take my machine to the nearest Apple Store ASAP, and would probably skip ahead to step 9 just to see if a hardware issue could be identified easily.

Step 4: Start up in Safe Mode

Mac OS X allows you to start up in Safe Mode by holding down the shift key at startup.  One of the things that Safe Mode does is disable all third-party software that runs at startup.  This test can tell you if problems are caused by third-party software.  If problems go away during Safe Mode, but come back as soon as you have restarted normally, then they’re almost certainly caused by third-party software.  The trick will then be sorting through all your software to find the culprit.  Once you have found the cause, you should check to see if a newer version is available, offering compatibility with the new system.  If the developer does not offer an update, you’ll have to either replace the software or revert to an older system (not an easy or fun task).

If the problem goes away after starting up in Safe Mode and stays away even after you reboot normally, then it may have simply been a cache-related problem or something else along those lines.  See the Safe Mode article on Apple’s web site for details about the other things Safe Mode does.

Step 5: Test a new account

Corruption of settings files in your account’s home folder can cause all manner of strange problems.  Open System Preferences (from the Apple menu), then click Accounts (in Mac OS X 10.6 or earlier) or Users & Groups (in Mac OS X 10.7).  Click the lock in the bottom left corner of the window to allow you to make changes (you will need an administrator account username and password).  Then click the + button below the user list.  Create a new user (a “Standard” account is sufficient).

Now log out and log in as the new user.  Run some tests and try to get your problems to happen.  If they still occur, the problem has nothing to do with one particular user, and you can skip to the next step.  If they don’t, the problem is specific to your user account.  Now you just have to try finding the cause.  One thing to try, if your user account is an admin user, would be to log back in to your regular account and execute the following command in the Terminal (found in /Applications/Utilities):

sudo plutil -s ~/Library/Preferences/*.plist

(Copy and paste it rather than trying to re-type it.)  When you hit return, you’ll be prompted for your account password…  don’t be startled when typing doesn’t display anything, that’s normal.  Hit return again, then wait until the Unix prompt reappears.  If nothing else is printed on the screen, it didn’t find any corrupt preferences (though that’s no guarantee).
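For the curious, a similar check can be roughed out in Python with the standard plistlib module.  This is not how plutil works internally, just a sketch of the same idea: try to parse every preference file and report the ones that fail.  The function name find_unreadable_plists is made up for this example, and plistlib only understands XML and binary property lists, so a file in some other format could be flagged even though the system reads it fine.

```python
import plistlib
from pathlib import Path

def find_unreadable_plists(folder):
    """Return (path, error message) pairs for .plist files that fail to parse."""
    problems = []
    for path in sorted(Path(folder).glob("*.plist")):
        try:
            with open(path, "rb") as handle:
                plistlib.load(handle)  # handles both XML and binary plists
        except Exception as error:  # parse failures come in several exception types
            problems.append((path, f"{type(error).__name__}: {error}"))
    return problems

if __name__ == "__main__":
    prefs = Path.home() / "Library" / "Preferences"
    for path, message in find_unreadable_plists(prefs):
        print(f"{path}: {message}")
```

As with the Terminal command, no output means nothing obviously corrupt was found, which is still no guarantee.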

Step 6: Clear caches

Clearing caches is not something that should be done for regular maintenance, as some people will tell you.  Caches are there to help keep things running fast.  However, sometimes caches can become corrupt, and this can cause all manner of strange behavior, including application crashes.  If the previous fixes haven’t helped, it’s time to try clearing the caches.  There are a number of utilities that can help you do this, some good and some best avoided.  To be safe, I recommend simply doing the job yourself.  You can safely delete everything in the following folders:

~/Library/Caches
/Library/Caches

The first of those is found in the user Library folder (which is invisible in Mac OS X 10.7), while the second is found at the root level of the hard drive.  If that’s Greek to you, just choose Go -> Go to Folder in the Finder and paste in the first path, then hit return.  Delete everything inside that folder.  Then repeat with the second folder.  Once you’re done, restart the computer.
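If you would like to see what is in those folders before deleting anything, a small Python script can list the contents by size.  This sketch only reads and reports; it deletes nothing.  The function name cache_entry_sizes is made up for this example, and entries you lack permission to read are simply counted as empty.

```python
from pathlib import Path

def _file_size(path):
    """Size of a regular file in bytes; 0 for anything unreadable or not a file."""
    try:
        return path.stat().st_size if path.is_file() else 0
    except OSError:
        return 0

def cache_entry_sizes(cache_folder):
    """Return (name, total bytes) for each top-level entry, largest first."""
    sizes = []
    for entry in Path(cache_folder).iterdir():
        if entry.is_dir():
            total = sum(_file_size(item) for item in entry.rglob("*"))
        else:
            total = _file_size(entry)
        sizes.append((entry.name, total))
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    # The two folders named in the article; skipped if they do not exist.
    for folder in (Path.home() / "Library" / "Caches", Path("/Library/Caches")):
        if folder.is_dir():
            print(folder)
            for name, size in cache_entry_sizes(folder)[:10]:
                print(f"  {size / 1e6:8.1f} MB  {name}")
```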

One caution: any time you delete stuff like this from your computer, it would be wise to have a good set of backups, just in case something goes wrong.

If this is too intimidating to try to do manually, download OnyX and use it to clear the caches.

Step 7: Reset the SMC

This is not a solution that should be applied to all problems.  But if you’re having trouble with sleep, starting up or shutting down, excessive fan speed or heat, battery issues or other similar problems, you should try following Apple’s instructions to reset the SMC.

Step 8: Zap the PRAM

PRAM, aka Parameter RAM, is a chip that maintains certain settings on the machine – things like clock settings, the selected beep sound, speaker volume, key repeat rates, etc.  This is kind of a last-ditch effort, a bit like sacrificing a chicken at the altar of the technology gods, but it can help in certain cases.  To reset the PRAM, start up the computer and immediately press and hold command-option-P-R.  When the computer restarts and you hear the startup chime a second time, you can release the keys.

I’ll be honest, though…  I haven’t zapped the PRAM on any of my Macs in more than a decade.  I only offer this as something to try because it’s quick and easy, and has been known to fix some things, but not because it’s likely to help in most cases.

Step 9: Run Apple Hardware Test

See Apple’s instructions for running Apple Hardware Test.  If any problems are found, contact Apple for service.  If problems are not found, however, that is not a guarantee that they don’t exist…  it just means none of the problems AHT is capable of checking for could be found.

Step 10: Reinstall the system

If all else has failed, you can try reinstalling the system.  In Mac OS X 10.6 and 10.7, reinstalling the system will simply replace system files with new copies, leaving everything else alone.  (Of course, be sure to back up just in case something goes wrong.)  If you have any corrupt system files lurking deep in the bowels of your hard drive, this will replace them with fresh copies.  Be sure to update the system with Software Update (in the Apple menu) after reinstalling.

Step 11: “Nuke and pave”

You’ve tried everything else, and you’re desperate for a solution.  Time to “nuke and pave”…  that’s nerd-speak for “erase the hard drive and reinstall everything from scratch.”  You’ll want to follow the instructions from step 3 for booting in recovery mode or from your Mac OS X install disk, then use Disk Utility to erase the hard drive.  This will destroy everything on the hard drive, so backups – and more than one of them – are absolutely critical!  Be sure one backup is a clone made with something like Carbon Copy Cloner.

Once the drive is clean, install the system.  DO NOT install any applications yet!  Use Software Update to bring it back up to date.  Copy your documents manually from the clone backup (don’t use Migration Assistant or Setup Assistant).  Don’t copy any preference files or anything else from the user Library folder.  For things like your iCal database, Address Book database, Mail data, iTunes library, iPhoto library, etc., use Google to find out where the important items are and copy those, if necessary.

Once the machine is reconfigured the way you want, and not exhibiting the problem(s), start installing your applications one at a time.  Test for a while to make sure the problem(s) don’t come back before installing the next application.

If the last step doesn’t solve your problem, there are really only three possibilities: you did something wrong, there’s a hardware problem or you’ve found a legitimate bug!

