Inactive Document, kept for historical information.
Version 1.0, 7/26/10; 1.1 8/13/10; 1.2 9/7/10; 1.3 11/30/10
This page describes problems with users not being able to log onto Windows 7 that we have been working on for a couple of months. Symptoms and circumventions are discussed, and this page will be updated when new information is obtained.
At this time there are several distinct problems, plus various other problems likely caused by intermittent network connectivity during logon.
Problem 1. Logon and Forced Logoff (Solved)
This has been seen on many types of Windows 7 builds, and for logons to a local account, dce.psu.edu accounts and Windows domain accounts. The userid and password are entered, the "Welcome" message is displayed for 3 to 4 seconds, and then the "Press CTRL + ALT + DELETE to log on" prompt is displayed. There are no error messages. The next logon attempt works. In some forums this was called "double logon". The Security Event log will show a valid logon followed by a forced logoff.
If the hotfix KB977074 is removed this problem goes away. Microsoft is aware of this and we believe they are working on a fix. Read about the hotfix at http://support.microsoft.com/kb/977074 before you decide to remove it.
The command line:
wusa.exe /quiet /uninstall /kb:977074
can be used to remove the hotfix. That will trigger a reboot.
The hotfix has been removed from CLM systems. It is still in the current version of the build key but the removal command will soon be put in a baseline for all systems.
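For anyone scripting the removal, the command line above can be assembled programmatically. This is only a minimal sketch in Python: the function names are hypothetical and this is not the mechanism used in the CLM baseline. The /norestart switch is a standard wusa.exe option that, combined with /quiet, suppresses the automatic reboot; whether to use it is an assumption left to the administrator.

```python
import subprocess
import sys


def build_wusa_uninstall(kb_number, no_restart=False):
    """Return the wusa.exe argument list that silently removes one hotfix."""
    args = ["wusa.exe", "/quiet", "/uninstall", "/kb:{}".format(kb_number)]
    if no_restart:
        # Optional: keep wusa from rebooting the machine on its own.
        args.append("/norestart")
    return args


def remove_hotfix(kb_number, no_restart=False):
    """Run the removal on a Windows machine and return wusa's exit code."""
    if sys.platform != "win32":
        raise RuntimeError("wusa.exe is only available on Windows")
    return subprocess.call(build_wusa_uninstall(kb_number, no_restart))
```

Calling build_wusa_uninstall(977074) yields the same arguments as the command shown above; remove_hotfix(977074) would actually run it and, without no_restart, trigger the reboot.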
Problem 2. The user name or password is incorrect (Circumventions)
When attempting to log onto a Windows 7 computer that has just resumed from sleep mode (standby) or hibernate mode, using the domain (realm) dce.psu.edu and a valid Penn State Access Account userid and correct password, the user gets "The user name or password is incorrect" multiple times. Sometimes the problem goes away in 10 minutes (or 1 minute if FarKdcTimeout is set to 1) but sometimes the problem persists until reboot. Logging on locally or via a Windows domain works fine.
Various tests, observations, and revelations include:
- The problem is almost always after resume from sleep and then attempting to log on immediately. Waiting 20-30 seconds usually avoids the issue.
- Dell Optiplex 960's and 980's log link up/down/up events, sometimes multiple times, on resume from sleep. Dell driver 188.8.131.52 for these models helps, but there is still one up/down/up after wakeup. The Optiplex 780 does this about 80% of the time.
- For Dell Optiplex 980, updating the BIOS from A01 to A02 eliminates the logging of the network link going down/up and seems to help or eliminate the problem.
- The network link up/down/up problem on Dell Optiplex 960's has been referred to Dell and the case has been escalated.
- Other makes and models may have intermittent network connectivity after resume from sleep that causes this problem, but we do not know how frequent those problems are.
- The expected response when attempting a network logon when the network is not up is "There are no logon servers available to service the logon request"; this is common with XP, and Windows 7 will display it at times too. You may soon be able to log on, or you may get "user name or password incorrect" even when the password was correct. Note that this is with the setting "cached credentials" set to 0, which is appropriate on a public computer and is what we set globally (most users do not log onto the same computer day after day, and cached credentials also create problems when the roaming profiles can't be loaded).
- A case with Microsoft has been escalated and they determined a problem is that the Kerberos client invalidates the dce.psu.edu realm if the network is not up at the time it attempts to resolve the KDC names for the realm. Setting a KDCNames registry entry for dce.psu.edu doesn't help when DNS lookups for these fail too. It should retry binding to the realm in X minutes, as set by the FarKdcTimeout parameter; the default is 10 minutes.
- This is not a DNS timeout problem; with no network up, there are no DNS servers configured, so the failure to bind to the realm or any of the KDCs is immediate.
- Sometimes users can log on after the realm is validated again, but it looks like Kerberos often fails to determine the supported encryption types set on the computer, and so it will allow only AES256. We expect Microsoft will accept this as a bug, but don't know how long it might take to get a fix.
- Penn State Access Accounts created before 1/6/10, or whose passwords were last changed prior to 1/6/10, cannot use AES256; these users can't log on but users who have changed their password after 1/6/10 can.
- If a random key is added or removed from the registry under HKLM\System\CurrentControlSet\Control\Lsa\Kerberos\Domains, Kerberos detects that change and resets itself; all users then can log on. This is as effective as reboot.
- Update 11/30/2010: we have tracked the problem for several months, logging all occurrences (about 0.5% of logins), and have given several more Kerberos traces to Microsoft. The problem also happens at times when the system has not resumed from sleep, and on all models and at all locations. The MS case manager says they are not likely to accept it as a bug but will try to put in an enhancement request for Windows 8 to retry binding to the domain when it fails (sooner than the minimal FarKdcTimeout setting of 1 minute) and to give the correct error message to the user. We still have a case open with Dell for the network problem with Optiplex 960's and Intel AMT.
- Users who have not changed their Access Account password since 1/6/10 can go to https://www.work.psu.edu and change it. So far this looks like it circumvents the problem when Kerberos has forgotten the supported encryption types, as it will use AES256 and the Access Account can now use that too. However, other problems (see below) due to intermittent network connectivity may persist.
- Administrators should update network adapter drivers to the latest version from the vendor, particularly if the System Event log has entries showing the network link going up/down/up after resume from sleep.
- Administrators need to be sure the correct encryption types are allowed; all can be set via a GPO or by setting: [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\Kerberos\Parameters]
- Administrators may consider setting the DWORD value FarKdcTimeout to 1 in HKLM\System\CurrentControlSet\Control\Lsa\Kerberos\Parameters to shorten the time that Kerberos waits to retry looking up the dce.psu.edu KDC names.
- On 8/12/10 we stumbled on the Kerberos parameter DefaultEncryptionType and found that setting it to 1 (DES) allows all accounts to log onto a computer that is in the state where it has forgotten the default encryption types. This is also in HKLM\System\CurrentControlSet\Control\Lsa\Kerberos\Parameters. I recommend setting both.
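The two registry circumventions in the list above (FarKdcTimeout and DefaultEncryptionType) and the add/remove-a-key reset can be expressed as reg.exe command lines. The following Python sketch only builds those command lines; the helper names and the scratch key name KerberosResetMarker are hypothetical, and it assumes both parameters are DWORD values as described above. It is an illustration, not the script deployed on CLM systems.

```python
KERBEROS_KEY = r"HKLM\System\CurrentControlSet\Control\Lsa\Kerberos"


def reg_add_dword(subkey, name, value):
    """Build a 'reg add' command that sets one DWORD value under the Kerberos key."""
    key = "\\".join((KERBEROS_KEY, subkey))
    return ["reg", "add", key, "/v", name,
            "/t", "REG_DWORD", "/d", str(value), "/f"]


def circumvention_commands():
    """Commands for the two Parameters values recommended in the list above."""
    return [
        # Retry the KDC name lookup after 1 minute instead of the default 10.
        reg_add_dword("Parameters", "FarKdcTimeout", 1),
        # 1 = DES, per the 8/12/10 note.
        reg_add_dword("Parameters", "DefaultEncryptionType", 1),
    ]


def kerberos_reset_commands(scratch_name="KerberosResetMarker"):
    """Add and then delete a throwaway key under Domains to make Kerberos reset.

    The scratch key name is an arbitrary placeholder; any unused name would do.
    """
    key = "\\".join((KERBEROS_KEY, "Domains", scratch_name))
    return [
        ["reg", "add", key, "/f"],
        ["reg", "delete", key, "/f"],
    ]
```

Each returned list can be passed to subprocess.call on an elevated Windows prompt; test on a single machine before putting anything like this in a baseline.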
On 8/13/10 we discovered disabling AMT (Intel Active Management Technology) as per this posting fixes the intermittent network connectivity problem after resume from sleep that the Optiplex 780 and Optiplex 960 have. This is part of the Intel Management Engine BIOS Extension (MEBx). To do this:
- Go to the MEBx settings by rebooting and pressing Ctrl-P when the Dell logo appears.
- Enter the default ME password admin (presuming you have never changed it).
- You must now change the ME password. Really. The new password must have a number, a symbol, and uppercase and lowercase letters, and must be 8 characters. For example Chris;1; works, but don't use that, it's mine.
- Press enter to select "Intel(R) ME Configuration".
- Press "y" for yes.
- Press enter to select "Intel(R) ME State Control".
- Press enter to select "[ ] Disabled".
- Press ESC to exit and reboot.
- This page has screen shots.
- We are looking for a way to automate this; we're not optimistic about finding one. Disabling the “Intel(R) Management Engine Interface” driver doesn't help.
- On 9/3 we deployed a "PSU WakeupFixer" program to selected rooms that can scan the Security Event log and report to a central log server when users cannot log on because of an "invalid account". It will watch the log for 5 minutes and report if the user can eventually log on (most of the time), or if Kerberos has to be "reset"; it then reports any successful logons and whether there was a Kerberos reset or not.
- On 9/7 we somehow managed to get a computer into the state that it would not allow users to log on unless they were set with the "AES256 encryption type", and the registry settings described above did not help. Such userids were logged on to this computer after it woke up and subsequently the same ones could not log on. Everyone changing their password might be the solution.
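The watch-for-5-minutes logic described for the WakeupFixer program can be illustrated with a small classifier. This is not the WakeupFixer source; it is a hedged sketch in Python that assumes the events have already been read from the Security Event log, and that logon failures and successes are identified by the Windows 7 event IDs 4625 and 4624. The LogonEvent type, the function name and the result labels are all hypothetical.

```python
from dataclasses import dataclass

WATCH_SECONDS = 5 * 60   # watch window after a failed logon
LOGON_SUCCESS = 4624     # Security log: an account was successfully logged on
LOGON_FAILURE = 4625     # Security log: an account failed to log on


@dataclass
class LogonEvent:
    time: float      # seconds since some reference point
    event_id: int
    user: str


def classify_failures(events):
    """Map each user who failed to log on to 'recovered' or 'needs_reset'.

    Only the first failure per user starts a watch window. A success inside
    that window marks the user as recovered; otherwise the machine is a
    candidate for a Kerberos reset.
    """
    first_failure = {}
    report = {}
    for ev in sorted(events, key=lambda e: e.time):
        if ev.event_id == LOGON_FAILURE:
            first_failure.setdefault(ev.user, ev.time)
            report.setdefault(ev.user, "needs_reset")
        elif ev.event_id == LOGON_SUCCESS and ev.user in first_failure:
            if ev.time - first_failure[ev.user] <= WATCH_SECONDS:
                report[ev.user] = "recovered"
    return report
```

A user who fails at time 0 and succeeds 40 seconds later would be reported as recovered, while one whose first success comes more than 5 minutes after the failure would be flagged for a reset.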
Problem 3. "Welcome" Displayed Forever (Solved)
This is relatively infrequent. After entering a valid userid and password, the "Welcome" message is displayed and then nothing happens (for hours, aka forever). CTRL-ALT-DEL does nothing, the keyboard and mouse are unresponsive, and the system has to be forcibly powered off. This might be triggered by entering an invalid user name, such as an FPS userid, but it is hard to reproduce. Sometimes it happens on the first logon attempt after reboot.
(8/13/10) I can reproduce the problem reliably by running a program that opens and reads from the security event log (it was written to try to detect Problem #2), so perhaps this has something to do with a bug in the EventLog service and the Security log.
(8/18/10) We can reproduce the problem by entering random characters for a userid and password. Later we found this only works if the default domain is dce.psu.edu. Thinking that pointed to Kerberos.dll, we installed the most recent hotfix that pertains to Kerberos.dll, and that appears to fix the problem. See http://support.microsoft.com/kb/981394/. We did not try any of the earlier versions listed here. As the KB article says, be sure you test thoroughly before putting that in production. Installing this hotfix requires a restart. A command line to install it would be:
wusa.exe _\Windows6.1-KB981394-x86.msu /quiet
Intermittent network connectivity during logon may also result in these problems:
- Roaming profile not loaded ("you were logged on with a temporary profile").
- Desktop not found (if the user's desktop was redirected to a network share, as they are for CLM systems).
- Logging on is very slow, but eventually ok.
So far we believe these are due to the network link going down after the