Hi All,
We're encountering an error on several of our production servers when running a development tool from a third-party vendor. After working closely with the vendor and doing our own analysis, we believe there is some kind of timeout/corner-case issue with file caching on the Windows side. The sequence of events in the tool leading up to the failure is:
- File on Remote Network Share Opened (using ReadFile).
- A line is read from the file
- A period of time elapses (we seem to have had luck with waiting exactly 5 minutes)
- Attempt to read another line from the same file.
- Reader Machine Hangs
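For anyone who wants to try reproducing this without the vendor's tool, the steps above can be approximated with a few lines of Python. This is only a rough sketch of the access pattern, not the tool's actual code: the function name `read_two_lines` and its parameters are made up for illustration, and the file is opened unbuffered on the assumption that the second read then has to go back through the OS rather than being served from Python's own buffer.

```python
import time


def read_two_lines(path, delay_seconds=300):
    """Read one line, stay idle, then read the next line from the same handle."""
    # buffering=0 stops Python from reading ahead, so the second readline()
    # must issue new reads to the OS (and, for a UNC path, to the redirector).
    with open(path, "rb", buffering=0) as handle:
        first = handle.readline()
        # Idle period; the hang was observed after waiting about 5 minutes.
        time.sleep(delay_seconds)
        # This is the read that never returned on the affected client.
        second = handle.readline()
    return first, second
```

Calling it with a UNC path to a file on the remote share (and the default 300 second delay) should follow the same open, read, wait, read sequence. Whether it triggers the same hang is untested.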
The machines in play are:
- (Reader/Client) Windows Server 2012 R2 with all updates installed. This is installed on a Hyper-V Machine
- ("Server") Windows 7 Enterprise SP1. This is a machine working at a remote office site several hundred miles away over a possibly unreliable connection.
The Client Machine (the Server 2012 R2 machine) hangs indefinitely at this point. We have attached a kernel debugger and produced a full dump. Our analysis of the dump yields the following (this is from the x64 perspective; the program in question is an x86 application):
kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

INTERRUPT_EXCEPTION_NOT_HANDLED (3d)
Arguments:
Arg1: fffff800eef3a260
Arg2: 0000000000000000
Arg3: 0000000000000000
Arg4: fffff800ed763b90

Debugging Details:
------------------

CONTEXT:  fffff800eef3a260 -- (.cxr 0xfffff800eef3a260)
rax=000000000006b901 rbx=fffff800ed8ef180 rcx=0000000000000001
rdx=0000044000000000 rsi=00000001d52c7b5a rdi=0000000000000001
rip=fffff800ed763b90 rsp=fffff800eef3ac98 rbp=000000000000c4c8
 r8=ffffe001092a9000  r9=0000000000000028 r10=0000000000800000
r11=0000000000000017 r12=0000000000000002 r13=0000000000000001
r14=0000000000000002 r15=0000000000000000
iopl=0         nv up ei ng nz na pe nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b  efl=00000282
nt!DbgBreakPointWithStatus:
fffff800`ed763b90 cc              int     3
Resetting default scope

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT
BUGCHECK_STR:  0x3D
PROCESS_NAME:  dbr.exe
CURRENT_IRQL:  d
TRAP_FRAME:  0000000000000001 -- (.trap 0x1)
Unable to read trap frame at 00000000`00000001
LAST_CONTROL_TRANSFER:  from fffff800ed79cb54 to fffff800ed763b90

STACK_TEXT:
fffff800`eef3ac98 fffff800`ed79cb54 : 00000000`00000002 00000000`00006c01 00058d15`e1762800 fffff800`ed70e07f : nt!DbgBreakPointWithStatus
fffff800`eef3aca0 fffff800`ed6bd678 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff800`00000000 : nt! ?? ::FNODOBFM::`string'+0x2f6a4
fffff800`eef3ad30 fffff800`edd9a67f : ffffe001`094ab080 00000000`00000000 fffff800`edde69b0 ffffe001`094ab080 : nt!KeClockInterruptNotify+0x788
fffff800`eef3af40 fffff800`ed6d8363 : fffff800`edde6900 fffff800`ed71073b 00000000`00000000 00000000`00000000 : hal!HalpTimerClockInterrupt+0x4f
fffff800`eef3af70 fffff800`ed75e42a : fffff800`edde6900 00000000`00000000 fffff800`ed8ef180 ffffd001`32e14340 : nt!KiCallInterruptServiceRoutine+0xa3
fffff800`eef3afb0 fffff800`ed75e80f : 00000000`00000000 ffffe001`0a5b6ba0 00000000`00000000 00001f80`009502a8 : nt!KiInterruptSubDispatchNoLockNoEtw+0xea
ffffd001`32e150d0 fffff800`ed794e1c : 00000000`00000000 00000000`00010008 00000000`00000001 00000000`20206f49 : nt!KiInterruptDispatchLBControl+0x11f
ffffd001`32e15260 fffff800`ed661416 : 00000000`80000000 00000000`00000001 00000000`00000000 00000000`00001000 : nt! ?? ::FNODOBFM::`string'+0x2796c
ffffd001`32e15370 fffff800`ed6dd173 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiDeliverApc+0x166
ffffd001`32e153f0 fffff800`ed62f712 : 00000000`00a6c460 00000000`00000000 00000000`00000000 00000000`6c526d4d : nt!KiCheckForKernelApcDelivery+0x23
ffffd001`32e15420 fffff800`ed6897e5 : 00000000`00002000 00000000`00002000 ffffb001`db7c0000 00000000`00000000 : nt!MmWaitForCacheManagerPrefetch+0xa6
ffffd001`32e15460 fffff800`eda0cfa3 : 00000000`c00000d8 00000000`00000100 ffffe001`0a4c74d0 ffffd001`32e15550 : nt!CcFetchDataForRead+0xe5
ffffd001`32e154b0 fffff800`ed68957e : ffffe001`0a4c74d0 00000000`00000f44 fffff801`0003f0bc ffffe001`0a4c74d0 : nt!CcMapAndCopyFromCache+0xc7
ffffd001`32e15540 fffff800`edad9b2f : ffffd001`32e15801 ffffd001`32e159b0 ffffe001`00000000 ffffd001`32e15901 : nt!CcCopyReadEx+0xfe
ffffd001`32e155b0 fffff800`edae7ff3 : 00000000`00000000 ffffe001`00000000 ffff850b`799ff556 ffffd001`32e15610 : nt!CcCopyRead+0x23
ffffd001`32e15600 fffff801`4e65422f : 00000000`00000002 00000000`00001044 ffffd001`32e15901 00000000`00000000 : nt!CcFastCopyRead+0x2b
ffffd001`32e15650 fffff801`4e654429 : ffffe001`0a4c74d0 ffffd001`32e158a0 00000000`00000100 ffffe001`0a4c7401 : rdbss!RxFastCopyRead+0x253
ffffd001`32e15700 fffff801`4e05b230 : ffffe001`0a4c74d0 ffffd001`32e158a0 00000000`00000100 ffffe001`0a4c7401 : rdbss!RxFastIoRead+0xe1
ffffd001`32e157a0 fffff801`4d7114dd : fffff801`4e055780 ffffe001`09572350 ffffe001`09572b80 fffff800`eda04b18 : mup!MupFastIoRead+0x80
ffffd001`32e15800 fffff801`4d738476 : 00000000`00000000 ffffd001`32e158d0 ffffe001`0a4c74d0 ffffd001`32e159b0 : fltmgr!FltpPerformFastIoCall+0xbd
ffffd001`32e15860 fffff800`eda0099b : ffffe001`0a4c74d0 00000000`00000000 00000000`00000002 00000000`00000003 : fltmgr!FltpFastIoRead+0x1c6
ffffd001`32e15910 fffff800`ed7687b3 : 00000000`00060158 00000000`00cbe0d4 00000000`00000000 fffff960`0021800e : nt!NtReadFile+0x44b
ffffd001`32e15a90 00000000`77d22772 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
00000000`00bbe7d8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x77d22772

FOLLOWUP_IP:
nt! ?? ::FNODOBFM::`string'+2f6a4
fffff800`ed79cb54 90              nop

SYMBOL_STACK_INDEX:  1
SYMBOL_NAME:  nt! ?? ::FNODOBFM::`string'+2f6a4
FOLLOWUP_NAME:  MachineOwner
MODULE_NAME: nt
IMAGE_NAME:  ntkrnlmp.exe
DEBUG_FLR_IMAGE_TIMESTAMP:  5318053f
STACK_COMMAND:  .cxr 0xfffff800eef3a260 ; kb
BUCKET_ID_FUNC_OFFSET:  2f6a4
FAILURE_BUCKET_ID:  0x3D_nt!_??_::FNODOBFM::_string_
BUCKET_ID:  0x3D_nt!_??_::FNODOBFM::_string_
Followup: MachineOwner
---------
I would be happy to provide the bugcheck analysis from the x86 perspective, but it does not show anything that we believe to be of interest. According to our vendor, the program itself is incapable of raising a WIN8_DRIVER_FAULT.
Things we've tried:
- Ensuring all Windows Updates are Installed
- Ensuring we've removed any antivirus software from the client side equation
- Turned off all firewalls
- Moved the server to another Hyper-V Host (to rule out Memory/Hardware issue)
- Disabling SMB2/3 (we are not confident that this occurred)
The issue is frustrating because it appears to be timeout related (5 minutes is too convenient a number to be random chance). It is especially frustrating when it occurs on a Terminal Server, as this causes the entire server to lock up.
We have opened a (paid) support case with Microsoft through our partnership benefits, but have not made much progress since opening the case on 07/01/2014, and frankly have been less than impressed with the level of service provided. In the latest call their suggestion was to turn off "Dell Backup and Recovery", which is NOT what this program is! We have been promised that a "Debug Analysis Team" would evaluate one of the many dumps we've uploaded, but they have been unable to do so for over a week (frankly I don't see what the holdup is).
Any thoughts? Has anyone seen anything even remotely similar to this? I've tried to work out which file-sharing timeouts are 5 minutes long (unfortunately there seem to be a fair number of them), but nothing looked promising. Is there something more I can evaluate in the memory dump?
Thanks