freenode/#clasp - IRC Chatlog
2:49:09
drmeister
Ok, just to be clear - cclasp built, asdf built, when you start slime it compiles all of the slime code and then segfaults.
3:03:18
drmeister
I'm not aware of any OS X/Linux differences that could be tripping stuff up - but I was working with threading, MPS and thread local storage.
3:12:27
stassats
(mp:process-run-function nil (lambda () (load "quicklisp/setup.lisp"))) does it for me
3:19:02
stassats
there's core__catch_function, then there are some core__call_with_variable_bound, and then _Unwind_Find_FDE
3:24:09
drmeister
(mp:process-run-function nil #'(lambda () (block x (funcall #'(lambda () (return-from x nil))))))
3:26:04
drmeister
If you are using GDB on Linux or FreeBSD, run this command: handle SIGSEGV pass nostop noprint
3:27:44
drmeister
Bike: I haven't seen a problem on OS X - and I haven't used lldb on linux with MPS.
3:28:23
drmeister
whoman: That depends on what you want to do. Clasp is great for exposing C++ libraries.
3:30:26
stassats
#3 core::DynamicBinding::DynamicBinding (this=<optimized out>, sym=..., val=...) at ../../include/clasp/gctools/threadLocalStacks.h:27
3:34:55
drmeister
Once I reproduce the problem I have some tools to track it down. I can compile clasp with guards around every allocated object and I can symbolicate jitted frames - if gdb gives me return addresses and complete backtraces.
3:35:24
drmeister
Sure I wrote them - but they are disconnected stack frames - what does the entire backtrace look like?
3:36:38
drmeister
This is the crash with: (mp:process-run-function nil #'(lambda () (catch 'x (eval '(throw 'x 10))))) ?
3:40:14
drmeister
I can get this to crash in aclasp: (mp:process-run-function nil #'(lambda () (catch 'x (eval '(throw 'x 10)))))
3:41:21
stassats
i can get (loop repeat 100 do (mp:process-run-function nil #'(lambda ()))) to crash
3:41:54
drmeister
Yeah - I was seeing that on OS X as well - more than 40 threads or so had problems. I hoped that was a separate issue.
3:51:56
drmeister
When I do this in aclasp: (mp:process-run-function nil #'(lambda () (catch 'x (eval '(throw 'x 10)))))
4:01:40
drmeister
"You may need to make sure that the debugger isn’t entered on barrier(1) hits (because the MPS uses barriers to protect parts of memory, and barrier hits are common and expected)."
4:14:18
drmeister
start_thread is called when the thread is started and I create thread local allocation points, register the thread and register the thread stack.
4:16:41
stassats
if it's a valid address, then it knows to protect the region, but not that it belongs to it
4:23:19
drmeister
x/8xg 0x3e800005860 --> 0x3e800005860: Cannot access memory at address 0x3e800005860
4:23:51
drmeister
MPS puts hardware barriers on memory - when the program touches that memory, the fault signals to the MPS that the memory needs to be fixed.
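The barrier mechanism described here can be illustrated with a minimal sketch in C, assuming Linux and POSIX `mmap`/`mprotect`/`sigaction`. This is not MPS code: the names `guarded_page`, `on_barrier_hit`, and `barrier_demo` are hypothetical. It only shows the general idea of a protected page whose first access raises SIGSEGV, with the handler reading the faulting address from `si_addr`, lifting the protection, and letting the access retry.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names for illustration; none of this is MPS code. */
static char *guarded_page;                 /* the single page we protect */
static size_t page_size;
static volatile sig_atomic_t barrier_hits; /* how often the barrier fired */

/* SIGSEGV handler: a fault inside our page is an expected barrier hit. */
static void on_barrier_hit(int sig, siginfo_t *info, void *context) {
    (void)context;
    char *fault_addr = (char *)info->si_addr;  /* address that faulted */
    if (fault_addr >= guarded_page && fault_addr < guarded_page + page_size) {
        barrier_hits++;
        /* Lift the barrier; the faulting instruction is retried on return. */
        mprotect(guarded_page, page_size, PROT_READ | PROT_WRITE);
        return;
    }
    /* Not our memory: restore the default action so the retry crashes. */
    signal(sig, SIG_DFL);
}

/* Returns the byte read through the barrier, or -1 on any failure. */
int barrier_demo(void) {
    struct sigaction action, previous;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    guarded_page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guarded_page == MAP_FAILED) return -1;
    guarded_page[0] = 42;

    memset(&action, 0, sizeof action);
    action.sa_sigaction = on_barrier_hit;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    if (sigaction(SIGSEGV, &action, &previous) != 0) return -1;

    barrier_hits = 0;
    if (mprotect(guarded_page, page_size, PROT_NONE) != 0) return -1;
    int value = *(volatile char *)guarded_page;  /* faults once, then succeeds */

    sigaction(SIGSEGV, &previous, NULL);         /* put the old handler back */
    int hits = barrier_hits;
    munmap(guarded_page, page_size);
    return hits == 1 ? value : -1;
}
```

Calling `barrier_demo()` from a `main` should show why a debugger that stops on every SIGSEGV gets in the way: the signal here is part of normal operation, not a crash.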
4:27:52
drmeister
I don't see that in any of the registers in the frame above the <signal handler called> frame #2 -- although maybe I shouldn't expect to.
4:28:38
drmeister
Does info registers give you the values of registers as they were in the frame that I'm currently looking at - or only the current values of the registers in the top frame?
4:34:27
drmeister
In my morning (7 hours from now) one of my friends at Ravenbrook will be up - I'll ask them about this. They might have some advice. The problem is easy to reproduce.
4:38:47
drmeister
There is some stuff in here that I'm not sure if we are doing (or not doing). frgo set up a lot of signal handling code for clasp.
4:46:54
drmeister
It is a tagged pointer - how does it end up in $_siginfo._sifields._sigfault.si_addr and where is $_siginfo._sifields._sigfault.si_addr located in memory? Is it in kernel space (I'm guessing)?
4:50:01
drmeister
MPS has this function: mps_bool_t mps_arena_has_addr(mps_arena_t arena, mps_addr_t addr)
4:55:05
drmeister
Unless you have some further insight I was going to ask my friend at Ravenbrook about it in the morning.
4:55:44
drmeister
There's clearly something going on in Linux when we do non-trivial things in a child thread.
4:56:43
drmeister
I have one thread local pointer that points to a data structure at the top of each thread's stack.
4:57:18
drmeister
With MPS I have a second thread local structure that contains half a dozen MPS allocation points.
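The arrangement of a thread-local pointer to a per-thread structure living on that thread's own stack might look roughly like the following sketch. It uses plain pthreads and C11 `_Thread_local`; the struct `thread_local_state`, its fields, and the function names are hypothetical stand-ins, not Clasp's actual layout, and the MPS allocation points are left out entirely.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-thread structure; Clasp's real one is different. */
typedef struct {
    int thread_id;
    int binding_depth;
} thread_local_state;

/* One pointer per thread, each pointing into that thread's own stack. */
static _Thread_local thread_local_state *my_thread;

/* Any code running on the thread can reach its state via the pointer. */
static int read_own_id(void) {
    return my_thread->thread_id;
}

static void *thread_entry(void *arg) {
    int *slot = (int *)arg;
    /* The structure lives in the entry function's frame, i.e. near the
       top of this thread's stack, for the whole life of the thread. */
    thread_local_state state = { .thread_id = *slot, .binding_depth = 0 };
    my_thread = &state;
    *slot = read_own_id() * 10;   /* report back through the argument */
    my_thread = NULL;             /* the frame is about to go away */
    return NULL;
}

/* Starts four threads with ids 1..4 and sums what they report back. */
int thread_local_demo(void) {
    pthread_t threads[4];
    int slots[4];
    int sum = 0;

    for (int i = 0; i < 4; i++) {
        slots[i] = i + 1;
        if (pthread_create(&threads[i], NULL, thread_entry, &slots[i]) != 0)
            return -1;
    }
    for (int i = 0; i < 4; i++) {
        pthread_join(threads[i], NULL);
        sum += slots[i];
    }
    return sum;
}
```

Each thread sees only its own `my_thread`, so the four threads report 10, 20, 30 and 40 without any locking.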
4:57:40
drmeister
The first thread-local data structure stored at the top of each stack is described here:
4:57:41
stassats
looking at the actual faulting instruction and the memory it uses, it doesn't look like 0x3e800005860
4:57:56
drmeister
https://github.com/drmeister/clasp/blob/dev/include/clasp/gctools/threadlocal.h#L7
5:02:48
drmeister
You could use nm to check if the symbol for the function you are editing is in there.
5:03:12
drmeister
the mygc.c.4.o is a bitcode file. You could also llvm-dis it and look at the human readable .ll file.
5:16:54
drmeister
I wasn't clear. ./waf build_imps will rebuild iclasp-mps - you could run that. Alternatively you can use ./waf build_cmps and that will relink cclasp-mps - but that takes longer because it recompiles some Common Lisp and relinks everything with everything to make cclasp-mps.
5:18:18
drmeister
It shouldn't be different - but it may be - I don't understand this error and I don't know if iclasp-boehm will reproduce the same error as cclasp-boehm.
5:19:20
stassats
Add support for unknown (immediate?) object to lisp_instance_class obj = 0xffffffffffffffff
5:20:31
drmeister
My knee-jerk reaction is to turn on the guards and rebuild and see if we can track it down then. It's a lot more fun tracking down GC problems with the guards on.
5:22:54
drmeister
CONFIG_VAR_COOL turns on assertions in MPS and the other three cause clasp to check objects for validity. They don't slow things down too much.
5:27:20
stassats
sigHandle is being hit multiple times with 0x3e800004132, refuses to handle it and ultimately is hit with 0
12:25:46
frgo
::notify drmeister Re MPS and signals: I think we need to change behavior in file src/gctools/interrupt.cc: ADD_SIGNAL( SIGSEGV, "+SIGSEGV+", ext::_sym_segmentation_violation); - Am I right that this establishes a handler represented by the symbol ext::_sym_segmentation_violation? If so: when using MPS on Linux, we're not allowed to do that. If not: I'd like to know what this line actually does.
13:05:04
frgo
Because MPS on Linux relies on SIGSEGV not being handled by anyone else. It uses SIGSEGV to manage memory.
13:06:44
frgo
Yes, it does. But if you install another handler, then this leads to MPS being prevented from doing its job.
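The conflict frgo describes comes from `sigaction` keeping only one handler per signal: whoever installs last wins. One generic way for a later handler to coexist with an earlier one is to save the earlier action and forward to it. The sketch below shows that pattern with SIGUSR1 as a harmless stand-in for SIGSEGV. All names are hypothetical, and whether forwarding like this is appropriate for Clasp and MPS is not established anywhere in this log.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical names; SIGUSR1 stands in for SIGSEGV in this sketch. */
static struct sigaction earlier_action;   /* what was installed before us */
static volatile sig_atomic_t earlier_calls, later_calls;

/* Plays the role of the handler that was installed first. */
static void earlier_handler(int sig, siginfo_t *info, void *context) {
    (void)sig; (void)info; (void)context;
    earlier_calls++;
}

/* Installed second: does its own work, then forwards to the saved action. */
static void later_handler(int sig, siginfo_t *info, void *context) {
    later_calls++;
    if (earlier_action.sa_flags & SA_SIGINFO) {
        earlier_action.sa_sigaction(sig, info, context);
    } else if (earlier_action.sa_handler != SIG_DFL &&
               earlier_action.sa_handler != SIG_IGN) {
        earlier_action.sa_handler(sig);
    }
}

/* Installs fn for sig and stores the action it replaced in *saved. */
static int install(int sig, void (*fn)(int, siginfo_t *, void *),
                   struct sigaction *saved) {
    struct sigaction action;
    memset(&action, 0, sizeof action);
    action.sa_sigaction = fn;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    return sigaction(sig, &action, saved);
}

/* Returns 1 if both handlers ran exactly once, 0 if not, -1 on error. */
int chaining_demo(void) {
    struct sigaction original;

    earlier_calls = 0;
    later_calls = 0;
    if (install(SIGUSR1, earlier_handler, &original) != 0) return -1;
    if (install(SIGUSR1, later_handler, &earlier_action) != 0) return -1;

    raise(SIGUSR1);                       /* delivered before raise returns */

    sigaction(SIGUSR1, &original, NULL);  /* restore the initial state */
    return (earlier_calls == 1 && later_calls == 1) ? 1 : 0;
}
```

Without the forwarding branch in `later_handler`, `earlier_handler` would never run, which is the situation frgo is worried about.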
13:50:57
Colleen
drmeister: frgo said 1 hour, 25 minutes ago: Re MPS and signals: I think we need to change behavior in file src/gctools/interrupt.cc: ADD_SIGNAL( SIGSEGV, "+SIGSEGV+", ext::_sym_segmentation_violation); - Am I right that this establishes a handler represented by the symbol ext::_sym_segmentation_violation? If so: when using MPS on Linux, we're not allowed to do that. If not: I'd like to know what this line actually does.
13:54:36
drmeister
My friend at Ravenbrook got back to me and wants to see a backtrace. The machine I'm using is in the Amazon Cloud - so I can give him access as well.
13:58:48
drmeister
It doesn't reproduce the problem with the cases I tried last night - it works on simple cases.
13:59:57
drmeister
Nope - it does - I was using cclasp - it behaves differently. In aclasp it fails like it did last night.
14:06:28
drmeister
I can reproduce the problem and I passed it on to David along with stassats' observation.
14:12:41
frgo
Error >>>>>>>> In file included from /opt/common-lisp/lang/clasp/src/clasp/src/gctools/interrupt.cc:2:
14:12:42
frgo
In file included from /opt/common-lisp/lang/clasp/src/externals-clasp/llvm50/include/llvm/Support/ErrorHandling.h:18:
14:13:31
drmeister
I just realized something - I can create Amazon Cloud machines with Clasp running and give people access to them. Great for debugging.
14:14:01
drmeister
frgo: That was the same problem that you had last night - this is with the new externals-clasp build?