Process just disappears on Access Violation; no crash dump produced

r***@yahoo.com

2009-11-23 18:10:19 UTC

I have a problem with a production system: some of its processes have
recently been crashing because of a software error. On the production
machines I have set up Dr. Watson to capture a crash dump file for
post-mortem analysis. The problem is that, despite Dr. Watson being
correctly registered as the JIT debugger (AeDebug key), no crash dump
is actually produced, and the faulty process simply disappears without
leaving any trace of what happened.

The platform I am using is Windows XP Professional x64 Edition
(64-bit). The code is native C, compiled as a 64-bit release build with
Microsoft Visual C++ 8 (Visual Studio 2005).

I have managed to reproduce this behavior with a small 30-line C
program. The crucial point is a buffer overrun in one of the functions.
The overrun also overwrites the function's return address, so the
function returns to a bogus location, and the bad RIP value immediately
raises an Access Violation exception when the next instruction is
fetched.

This Access Violation exception is always caught when the faulty
process runs under a debugger. However, when the process runs freely
(not as a debuggee), the Access Violation cannot be caught by any means:
not by a function-level SEH __try/__except block, not by
UnhandledExceptionFilter, and not by the external JIT debugger. That
effectively rules out any post-mortem crash analysis.

As an experiment, I switched back to Win32 and found that everything
works properly there: the Access Violation exception is caught by the
function-level SEH frame, by UnhandledExceptionFilter and by the
external JIT debugger. Everything works on Win32; nothing works on x64.

Some research on the Internet showed that SEH on the Windows x64
platform was given a major overhaul and is implemented quite
differently from SEH on Win32.

I still can't find an answer, however: why can't an Access Violation
exception that stems from a buffer overrun and stack corruption be
caught by any means on x64? Having critical processes silently vanish
from a production system without leaving any trace behind is a big
concern to me.

The small 30-line C program that reproduces this behaviour follows.
Compile and link it with Microsoft Visual C++ 8 (console application,
/EHa, release mode, x64; the rest of the settings are pretty much
standard), set up a JIT debugger in the registry, and watch the process
simply vanish without giving the JIT tool any chance to get hold of it.

#include <stdio.h>
#include <stdlib.h>
#include <Windows.h>

int func_a()
{
    int res = 3;
    int arr[4000];
    int i;
    // Deliberate bug: write 12 extra bytes (3 ints) past the end of arr
    // so as to overwrite the function's return address.
    // Make this code complex enough that the optimizer does not eliminate it.
    for (i = 0; i < 4003; ++i)
        arr[i] = rand();
    for (i = 0; i < 4003; ++i)
        res += arr[i];
    return res;
}

int main()
{
    func_a();
    printf("Sleep\n");
    Sleep(2000);
    printf("Done\n");
    return 0;
}
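For contrast, a minimal sketch of the same function without the overrun. The repro loops run to 4003 over a 4000-element array, i.e. 3 ints past the end, which is the 12 bytes the comment mentions (assuming a 4-byte int, as with MSVC). Deriving the loop bound from the array itself removes that class of bug. func_a_fixed and ARRAY_LEN are hypothetical names, and widening the sum to long long is an added assumption: RAND_MAX is only 32767 with MSVC, but it is 2^31 - 1 in some other C libraries, where the original int sum could overflow.

```c
#include <stdlib.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Same work as func_a, but both loops are bounded by the array's own
   size, so nothing is written past arr[3999]. The sum is kept in a
   long long so it cannot overflow even when RAND_MAX is 2^31 - 1. */
long long func_a_fixed(void)
{
    long long res = 3;
    int arr[4000];
    size_t i;

    for (i = 0; i < ARRAY_LEN(arr); ++i)
        arr[i] = rand();
    for (i = 0; i < ARRAY_LEN(arr); ++i)
        res += arr[i];
    return res;
}
```

With the same srand() seed, func_a_fixed() always returns the same sum, and no stack memory beyond arr is touched.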

I understand that the stack is corrupt, so a stack trace might not be
available, but isn't there still lots of useful information in the
failed process for post-mortem analysis (a partially corrupt stack
trace, the whole memory image, registers, etc.)? Why can't a crash dump
be produced in this scenario?
Robert

Gerard O'Brien

2009-11-23 20:51:15 UTC

A 64-bit JIT debugger needs to be installed to catch problems in 64-bit
processes. This JIT debugger uses the
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug key.

A 32-bit JIT debugger needs to be installed to catch problems in 32-bit
processes. This JIT debugger uses the
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows NT\CurrentVersion\AeDebug key.

The debuggers will quite happily co-exist.
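Gerard's two rules amount to a lookup on the bitness of the crashing process. A minimal sketch, with aedebug_key_for and current_build_bits as hypothetical helpers; it only selects the key string (the paths are the ones given in Gerard's reply) and does not read the registry.

```c
#include <stddef.h>

/* AeDebug locations on 64-bit Windows: the native key is consulted for
   64-bit processes, the Wow6432Node key for 32-bit (WOW64) processes. */
static const char AEDEBUG_KEY_64[] =
    "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\AeDebug";
static const char AEDEBUG_KEY_32_ON_X64[] =
    "HKEY_LOCAL_MACHINE\\SOFTWARE\\Wow6432Node\\Microsoft\\Windows NT\\CurrentVersion\\AeDebug";

/* Return the AeDebug key that applies to a process of the given
   bitness on x64 Windows, or NULL for an unsupported value. */
const char *aedebug_key_for(int process_bits)
{
    switch (process_bits) {
    case 64: return AEDEBUG_KEY_64;
    case 32: return AEDEBUG_KEY_32_ON_X64;
    default: return NULL;
    }
}

/* Bitness of the build this code is compiled into (pointer width). */
int current_build_bits(void)
{
    return (int)(sizeof(void *) * 8);
}
```

For the x64 release build in the repro, current_build_bits() is 64, so the native AeDebug key applies; a Win32 build of the same program running on XP x64 would use the Wow6432Node key.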

r***@yahoo.com

2009-11-24 11:50:59 UTC

Post by Gerard O'Brien
A 64-bit JIT debugger needs to be installed to catch problems in 64-bit
processes. This JIT debugger uses the
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug key.

A 32-bit JIT debugger needs to be installed to catch problems in 32-bit
processes. This JIT debugger uses the
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows NT\CurrentVersion\AeDebug key.

The debuggers will quite happily co-exist.

Thanks for replying, Gerard.
That is true. I believe I have a correct JIT configuration for both
64-bit and 32-bit processes. I have just double-checked this by
inserting the following 2 lines right at the beginning of func_a in my
sample program:

int* p = (int*)7;
*p = 5;

This obviously produces another Access Violation exception (with no
stack corruption this time), and this exception is correctly caught by
my JIT debugger (cdb or WinDbg). This proves to me that my registry
settings are correct.

The original Access Violation (with the corrupt stack) is still not
caught by the JIT debugger.

Martin B.

2009-11-24 12:13:28 UTC

Post by r***@yahoo.com
The original Access Violation (with stack corrupt) is still not caught
by JIT debugger.

What is the exit code of the crashed process?
(You can start your test application from the console (or from a batch
file if it's not a console app) and then check what %ERRORLEVEL% contains.)
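The same exit-code check can be scripted from C. A minimal sketch, assuming a command interpreter is available; run_and_get_exit_code is a hypothetical helper. It relies on system() returning the interpreter's result with the Microsoft CRT, and a wait status (decoded with WIFEXITED/WEXITSTATUS) on POSIX systems.

```c
#include <stdlib.h>
#ifndef _WIN32
#include <sys/wait.h>
#endif

/* Run cmd through the command interpreter and return the child's exit
   code, or -1 if the interpreter could not be started or (on POSIX)
   the child did not exit normally. */
long run_and_get_exit_code(const char *cmd)
{
    int status = system(cmd);
    if (status == -1)
        return -1;
#ifdef _WIN32
    /* Microsoft CRT: system() returns the value cmd.exe returned,
       i.e. what %ERRORLEVEL% would show. */
    return status;
#else
    /* POSIX: system() returns a wait status that must be decoded. */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
#endif
}
```

For example, run_and_get_exit_code("repro.exe"), where repro.exe stands for the test program (hypothetical name), returns the value %ERRORLEVEL% would show. A child that genuinely exits with -1 cannot be told apart from a failure to start the interpreter.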

r***@yahoo.com

2009-11-24 17:00:29 UTC

Post by Martin B.

What is the exit code of the crashed process?
(You can start your test application from the console (or from a batch
file if it's not a console app) and then check what %ERRORLEVEL% contains.)

The exit code of the crashed process is 128.

Martin B.

2009-11-25 07:59:07 UTC

Post by r***@yahoo.com

The exit code of the crashed process is 128.

Now that is strange.
I have observed (on 32-bit XP) that when a Windows process is
terminated by a structured exception (SE), i.e. a "crash", the exit
code of the process is set to the code of the exception that crashed it.
e.g.
0xC0000005 == Access Violation => exit code = -1073741819
0xC0000094 == Div Zero => exit code = -1073741676
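The mapping in these examples is just the 32-bit exception code reinterpreted as a signed integer. A small sketch; exit_code_as_errorlevel and ntstatus_severity are hypothetical helper names, and the severity split (top two bits of an NTSTATUS: 0 success, 1 informational, 2 warning, 3 error) is the standard NTSTATUS layout. It also shows that 128 (0x80) has severity 0, so it does not look like the code of an unhandled exception.

```c
#include <stdint.h>

/* Reinterpret a 32-bit exit/exception code as the signed value that
   %ERRORLEVEL% prints (two's complement), without relying on an
   implementation-defined unsigned-to-signed conversion. */
int32_t exit_code_as_errorlevel(uint32_t code)
{
    if (code <= 0x7FFFFFFFu)
        return (int32_t)code;
    return (int32_t)(code - 0x80000000u) + INT32_MIN;
}

/* NTSTATUS severity lives in the top two bits:
   0 = success, 1 = informational, 2 = warning, 3 = error. */
unsigned ntstatus_severity(uint32_t code)
{
    return (unsigned)(code >> 30);
}
```

For instance, exit_code_as_errorlevel(0xC0000005u) is -1073741819, matching the Access Violation example, while ntstatus_severity(128u) is 0.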

br,
Martin

Scot Brennecke

2009-11-28 05:50:49 UTC

Scot Brennecke

2009-11-28 05:52:47 UTC

I know that you would most likely not go looking in this article on
MSDN, but I believe it explains the reason for the problem:
http://msdn.microsoft.com/en-us/library/ms633573.aspx

If your exception occurs during a user-mode callback, it will be
swallowed when the stack unwinds across the kernel-mode boundary.

r***@yahoo.com

2009-11-30 11:30:15 UTC

Post by Scot Brennecke

I know that you would most likely not go looking in this article on
MSDN, but I believe it explains the reason for the problem:
http://msdn.microsoft.com/en-us/library/ms633573.aspx
If your exception occurs during a user-mode callback, it will be
swallowed when the stack unwinds across the kernel-mode boundary.

Hi Scot,
yes, this is an interesting piece of information, and a very useful one
to me while I am trying to put all the bits and pieces together into a
complete model of how x64 SEH works.

I wonder whether the exception-processing restrictions mentioned on
that page apply only to WndProc callbacks or to all Windows WINAPI
callbacks. In the sample program I am not inside a WINAPI callback when
the first exception occurs.

But then, when the system reacts to the first exception by means of
RtlDispatchException, all the not-yet-understood magic happens and the
process gets terminated without the JIT tool being invoked.

There is also this post: http://www.nynaeve.net/?p=128, which suggests
that no non-trivial processing done as part of UnhandledExceptionFilter
can be relied upon, and that this also applies to the CreateProcess
call needed to start the JIT tool (the default Windows
UnhandledExceptionFilter calls CreateProcess itself).
Robert