For whatever it's worth, we've done similar stuff with node.js, operating from an actual OS-generated (i.e., not reinvented) core dump.[1] The difference in approach here is that we don't require the app to load anything, and we don't change its behavior -- we perform all of the inference from the dump itself, using extensive knowledge of V8.[2] It's great to see other dynamic languages discover the merits of postmortem debugging; for us, it's been essential for developing node.js-based services.
Does anyone know of a debugger that can be built into C apps during development, so that testers can log stack traces with values, or be able to continue on breakpoints?
I can't tell you how many times users have hit strange bugs, and it took me forever to reproduce them in my debugger. Giving them a limited debugger would have saved me countless hours. Remote debugging is not really an option (I'd like something that can be compiled in).
What you want are core dumps; they let you use the debugger after the fact. Use ulimit to raise the core dump size limit above 0 (I just use unlimited) and you'll get a core dump whenever the process faults. More specifically, a handful of signals cause a core file to be written, so you can also send one of those signals to the process (with kill, for example) to get a dump on demand.
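If the process happens to be a Python one, the same limit can be raised from inside the program. This is only a sketch, assuming a Unix-like system where the standard `resource` module is available; `enable_core_dumps` is a name made up for this example.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit.

    Roughly what `ulimit -c unlimited` does in a shell, except that it can
    only go as high as the hard limit the process already has.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


# Returns the (soft, hard) pair now in effect for this process.
limits = enable_core_dumps()
```

After that, a fault (or a signal whose default action is to dump core, such as SIGQUIT or SIGABRT) should leave a core file, provided the hard limit is above 0 and the OS is configured to write one.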
But unfortunately it doesn't give very descriptive stack traces, even with debugger symbols turned on in the project settings. I'm really looking for something that shows me a full view of the program's state, just as if I were in the debugger. It may be possible to extrapolate from the core dump, but I’m having a hard time figuring it out. This post summarizes how to do it with gdb:
But I’m thinking a huge opportunity has been lost here. This should be built into IDEs, and especially for mobile apps there should be a standard way of sending core dumps back to the developer when apps crash, particularly for ad hoc builds during testing.
Crittercism, Bugsense, Crashlytics and HockeyApp are all commercial providers that capture application crashes and upload them to their backend. A few of them have built their SDKs on the open-source PLCrashReporter. You can also look at KSCrash and Google Breakpad. KSCrash may be the most advanced. Google Breakpad captures the closest thing to a core dump (it gathers only the stack and registers though).
We've been using this internally for a few weeks now and it has been really, really awesome. Currently just Python, but it should be feasible for Ruby, PHP, and other dynamic languages too.
A somewhat related trick that paste.exceptions implemented (it may also be in weberror; all adopted from Zope): if you set the local variable __traceback_info__ to some value, that value is included in the traceback (which is emailed or whatever). There are other __traceback_* variables that allow more detailed additions to the report.
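For anyone who hasn't seen the convention, here is a rough sketch of how a reporter could pick up __traceback_info__. It is not the paste.exceptions or Zope code, just the idea; `collect_traceback_info`, `charge` and `demo` are made-up names.

```python
import sys


def collect_traceback_info(tb):
    """Return (function_name, info) for every frame that set __traceback_info__."""
    notes = []
    while tb is not None:
        frame = tb.tb_frame
        if "__traceback_info__" in frame.f_locals:
            notes.append((frame.f_code.co_name, frame.f_locals["__traceback_info__"]))
        tb = tb.tb_next
    return notes


def charge(order_id):
    # The annotation is an ordinary local; the reporter finds it by name.
    __traceback_info__ = "order_id=%r" % order_id
    raise ValueError("card declined")


def demo():
    try:
        charge(42)
    except ValueError:
        return collect_traceback_info(sys.exc_info()[2])
```

Calling `demo()` gives back one note, for the `charge` frame, which a real reporter would append to the emailed traceback.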
That's a nice trick, though it does require knowing ahead of time what data you want (and having lots of __traceback_info__ and __traceback_supplement__ statements in the code). The nice thing about grabbing all locals is that the data is collected without having to think about it.
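To make "grabbing all locals" concrete, here is a minimal sketch of walking a traceback and snapshotting each frame's locals. It is an illustration under my own assumptions, not the code from the library being discussed; `locals_by_frame` and the `max_len` truncation are hypothetical.

```python
import sys


def locals_by_frame(tb, max_len=200):
    """Snapshot repr() of every local in every frame of a traceback."""
    frames = []
    while tb is not None:
        frame = tb.tb_frame
        snapshot = {}
        for name, value in frame.f_locals.items():
            try:
                text = repr(value)
            except Exception:
                # A broken __repr__ should not break the error report.
                text = "<unrepresentable>"
            snapshot[name] = text[:max_len]
        frames.append({
            "function": frame.f_code.co_name,
            "line": tb.tb_lineno,
            "locals": snapshot,
        })
        tb = tb.tb_next
    return frames


def divide(numerator, denominator):
    return numerator / denominator


def demo():
    try:
        divide(1, 0)
    except ZeroDivisionError:
        return locals_by_frame(sys.exc_info()[2])
```

The last entry returned by `demo()` is the `divide` frame with both arguments captured, and nothing in `divide` had to be annotated ahead of time.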
It does not dump all local vars though, only those that appear on the line where the exception happened. That covers globals as well as locals, and also attribute lookups like `obj.field`. It does this in a somewhat hacky way, via some simple embedded Python parsing, but it works fine most of the time.
Very nice. Did you find globals to be useful often? We excluded those here because there can be a lot of them (e.g. imported modules), but they could be a good addition. `obj.field` is a nice touch too (our current approach will work if repr() shows the field, but that requires some code ahead of time).
That's why I only include those globals which are referred to on the line of the exception. And when they were referred to on that line, they were often useful too. Otherwise, you are right, there are way too many to show them all. I even found all the locals to be too many in many cases, which is why I went with that simple heuristic. Also, I just made this a `sys.excepthook` replacement, so it's just text: you cannot simply hide the locals away, and even a simple traceback would look too long/complicated.
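A rough sketch of that heuristic, for the curious. This is not the actual implementation being described; it swaps the embedded parsing for a regex over the source line, and `values_on_line` and `line_aware_excepthook` are names invented here. It will also match words inside string literals, and `getattr` can trigger properties, so treat it as an illustration only.

```python
import keyword
import linecache
import re
import traceback

# A name, optionally followed by attribute lookups (`obj.field.sub`), that
# does not itself start in the middle of another name or after a dot.
_NAME = re.compile(r"(?<![\w.])[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*")


def values_on_line(line, f_locals, f_globals):
    """Map each local/global name (and dotted attribute) on `line` to its repr."""
    found = {}
    for dotted in _NAME.findall(line):
        head, *attrs = dotted.split(".")
        if keyword.iskeyword(head):
            continue
        if head in f_locals:
            value = f_locals[head]
        elif head in f_globals:
            value = f_globals[head]
        else:
            continue  # builtin, unknown name, or a word inside a string
        label = head
        found[label] = repr(value)
        for attr in attrs:
            try:
                value = getattr(value, attr)
            except Exception:
                break
            label += "." + attr
            found[label] = repr(value)
    return found


def line_aware_excepthook(exc_type, exc, tb):
    """Print the usual traceback, then the names used on the failing line."""
    traceback.print_exception(exc_type, exc, tb)
    if tb is None:
        return
    while tb.tb_next is not None:  # innermost frame only, to keep this short
        tb = tb.tb_next
    frame = tb.tb_frame
    line = linecache.getline(frame.f_code.co_filename, tb.tb_lineno, frame.f_globals)
    for name, text in values_on_line(line, frame.f_locals, frame.f_globals).items():
        print("    %s = %s" % (name, text))


# To try it: import sys; sys.excepthook = line_aware_excepthook
```

Because only names on the failing line are looked up, unrelated locals and the many module-level globals stay out of the output.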
I open sourced something very similar a couple years ago. It is basically an error monitoring service that uses git blame to figure out which developer last touched the code that caused the exception. It then sends that developer a stack trace, along with all the values of all the variables in the stack frame and additionally the HTTP request if it is running as a django/pylons middleware.
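The git blame step could look something like the sketch below. This is a guess at the general shape, not the code from that project; `blame_line` and `parse_blame_porcelain` are hypothetical names, and it assumes `git` is on the PATH and the file lives in a git checkout.

```python
import subprocess


def parse_blame_porcelain(output):
    """Pull the author name and email out of `git blame --porcelain` output."""
    info = {}
    for line in output.splitlines():
        if line.startswith("author "):
            info["author"] = line[len("author "):]
        elif line.startswith("author-mail "):
            info["email"] = line[len("author-mail "):].strip("<>")
    return info


def blame_line(path, lineno, repo_dir="."):
    """Ask git who last touched a single line of a file."""
    result = subprocess.run(
        ["git", "blame", "--porcelain", "-L", "%d,%d" % (lineno, lineno), "--", path],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_blame_porcelain(result.stdout)
```

Feeding it the file name and line number from the innermost traceback frame gives the person to notify.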
Yep - the debug toolbars for Pyramid, Flask, etc. do as well, using the same facility as here. The cool thing about this feature is being able to see the local variables from real exceptions in production instead of just in your local dev environment.
Ruby: absolutely. It's not quite as straightforward as Python, but it should be feasible using binding_of_caller. If any Ruby people want to help us figure this out, please drop us a line or stop by https://github.com/rollbar/rollbar-gem/issues/117
Excluding sensitive data: yep. This feature uses the same scrub_field list that's used to scrub sensitive data from the request (GET/POST/headers/etc).
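To illustrate the idea (this is a hand-written sketch, not the library's actual scrubbing code; `SCRUB_FIELDS`, `scrub` and the mask string are assumptions made for the example), key-based scrubbing of captured locals could look like:

```python
# Hypothetical list of sensitive key names; a real one comes from configuration.
SCRUB_FIELDS = {"password", "passwd", "secret", "api_key", "credit_card"}


def scrub(data, scrub_fields=SCRUB_FIELDS, mask="******"):
    """Return a copy of `data` with the values of sensitive keys masked."""
    if isinstance(data, dict):
        return {
            key: mask if str(key).lower() in scrub_fields
            else scrub(value, scrub_fields, mask)
            for key, value in data.items()
        }
    if isinstance(data, (list, tuple)):
        return [scrub(item, scrub_fields, mask) for item in data]
    return data
```

Applying the same function to the request parameters and to each frame's locals keeps a single list in charge of what never leaves the process.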