freenode/#clasp - IRC Chatlog

2:47:00 stassats Process inferior-lisp<1> segmentation fault

2:47:29 drmeister Is that when it starts up?

2:47:35 stassats yes

2:47:56 drmeister Does it compile anything?

2:48:10 stassats everything

2:49:09 drmeister Ok, just to be clear - cclasp built, asdf built, when you start slime it compiles all of the slime code and then segfaults.

2:49:21 stassats yes

2:49:48 drmeister What distro of linux are you running now?

2:50:00 stassats i won't tell you!

2:50:09 drmeister Why?

2:50:16 stassats don't really see how it's relevant

2:50:58 stassats and you won't be able to replicate my setup

2:50:58 drmeister I'll spin up an AWS system and check it out.

2:51:22 drmeister It ran well enough to compile asdf

2:52:04 Bike does it work for single threaded slime

2:52:31 drmeister Yeah - that would be useful to know.

2:52:55 drmeister Anyway - I'm spinning up a linux now.

2:53:08 stassats kinda

2:53:21 stassats COMMON-LISP-USER> Bad client pointer 0x7f0ed6a879b8

2:53:21 stassats The MPS detected a problem!

2:53:22 stassats ../../include/clasp/mps/code/lockli.c:139: MPS ASSERTION FAILED: res == 0

2:54:04 Bike that seems pretty bad.

2:54:07 drmeister Well, you are the first person to try clasp+MPS on linux. Congratulations.

2:54:19 stassats it segfaults again promptly after that

2:54:57 stassats is spawn the default now?

2:55:08 stassats cause i commented out setting it to :spawn, and got that assertion

2:55:22 stassats but when i set it explicitly to nil, i actually get slime to connect

2:56:21 stassats so, yes, :spawn is the default, but it does not always result in these assertions

3:03:18 drmeister I'm not aware of any OS X/Linux differences that could be tripping stuff up - but I was working with threading, MPS and thread local storage.

3:12:27 stassats (mp:process-run-function nil (lambda () (load "quicklisp/setup.lisp"))) does it for me

3:12:40 drmeister It's on its way now - I'll have more in an hour and a half or so.

3:12:54 drmeister Does what? Tickles your fancy? Or segfaults?

3:13:54 stassats segfaults

3:14:23 stassats and i like segfaults

3:14:24 drmeister Ok, that should be easy to reproduce. I'll bet it fails even in the interpreter.

3:14:53 drmeister (mp:process-run-function nil #'(lambda () (print "Hi - I'm going to segfault!")))

3:15:00 stassats segfaults at _Unwind_Find_FDE

3:15:11 stassats drmeister: no, that's not what my snippet says

3:15:46 drmeister No - that's what I'm going to try as soon as the interpreter finishes compiling.

3:16:12 drmeister Which will be soon - the AWS machines are pretty beefy.

3:16:26 stassats well, then it'll be lying, it's not going to segfault

3:16:56 drmeister You've demonstrated that?

3:17:59 stassats can i call for some lisp backtraces from gdb?

3:18:42 drmeister I don't know - I'm having issues with JITted code and backtraces on linux.

3:18:49 drmeister stassats: Have you ever used libunwind?

3:19:02 stassats there's core__catch_function, then there are some core__call_with_variable_bound, and then _Unwind_Find_FDE

3:19:13 stassats drmeister: no

3:21:17 stassats Thread 1 "cclasp-mps" received signal SIGXFSZ, File size limit exceeded.

3:21:30 stassats PC LOAD LETTER

3:22:13 drmeister Huh? File size limit exceeded? wth?

3:22:28 drmeister And PC LOAD LETTER? Is your printer empty?

3:23:06 drmeister Right - as you said - I can start processes in the interpreter with no problems.

3:24:01 stassats but it's not dying with SIGXFSZ, that's only in gdb

3:24:08 drmeister And simple tests unwinding the stack don't have problems.

3:24:09 drmeister (mp:process-run-function nil #'(lambda () (block x (funcall #'(lambda () (return-from x nil))))))

3:24:10 stassats is that a gc signal? (weird choice)

3:24:21 drmeister Oh - hang on - there's an issue with MPS and gdb.

3:24:30 stassats drmeister: does that unwind anything?

3:24:52 Bike it should, yeah.

3:24:56 drmeister In the interpreter I thought it would.

3:25:20 drmeister Yeah - block/return-from are done with C++ exception handling.

3:25:26 drmeister In the interpreter.

3:26:04 drmeister If you are using GDB on Linux or FreeBSD, run this command: handle SIGSEGV pass nostop noprint

3:26:14 drmeister http://www.ravenbrook.com/project/mps/master/manual/html/guide/debug.html

3:26:20 stassats (mp:process-run-function nil #'(lambda () (catch 'x (eval '(throw 'x 10)))))

3:26:21 stassats crashes

3:26:23 drmeister That's an MPS-specific thing.

3:26:23 whoman as a 'user', would i want to use clasp later on, if i am using ecl now ?

3:26:33 stassats sometimes

3:26:45 Bike how does that work on lldb

3:27:43 stassats no, the throw doesn't crash

3:27:44 drmeister Bike: I haven't seen a problem on OS X - and I haven't used lldb on linux with MPS.

3:28:23 drmeister whoman: That depends on what you want to do. Clasp is great for exposing C++ libraries.

3:30:26 stassats #3 core::DynamicBinding::DynamicBinding (this=<optimized out>, sym=..., val=...) at ../../include/clasp/gctools/threadLocalStacks.h:27

3:30:34 stassats #2 <signal handler called>

3:33:13 stassats #11 core::cl__read

3:33:50 drmeister I can't follow these.

3:34:16 stassats but you wrote them

3:34:55 drmeister Once I reproduce the problem I have some tools to track it down. I can compile clasp with guards around every allocated object and I can symbolicate jitted frames - if gdb gives me return addresses and complete backtraces.

3:35:24 drmeister Sure I wrote them - but they are disconnected stack frames - what does the entire backtrace look like?

3:35:44 stassats nothing interesting

3:36:38 drmeister This is the crash with: (mp:process-run-function nil #'(lambda () (catch 'x (eval '(throw 'x 10))))) ?

3:36:48 stassats no

3:36:57 stassats that was a misdirection

3:37:41 drmeister (mp:process-run-function nil (lambda () (load "quicklisp/setup.lisp"))) then

3:40:14 drmeister I can get this to crash in aclasp: (mp:process-run-function nil #'(lambda () (catch 'x (eval '(throw 'x 10)))))

3:41:21 stassats i can get (loop repeat 100 do (mp:process-run-function nil #'(lambda ()))) to crash

3:41:54 drmeister Yeah - I was seeing that on OS X as well - more than 40 threads or so had problems. I hoped that was a separate issue.

3:41:55 Bike is the eval required?

3:42:19 stassats it's to avoid optimization

3:42:32 Bike i know, but i don't think we do any.

3:51:56 drmeister When I do this in aclasp: (mp:process-run-function nil #'(lambda () (catch 'x (eval '(throw 'x 10)))))

3:52:08 drmeister it crashes. I run it under gdb ...

3:52:22 drmeister aw shoot - I forgot the signal thing... hang on.

3:52:40 stassats just dump a core

3:59:48 drmeister Ok - got it - What's up with the <signal handler called> ?

4:00:02 stassats a signal handler is being called

4:00:16 drmeister Ah - that illuminates things.

4:00:27 drmeister Reading MPS docs

4:00:28 stassats what more can be said about that?

4:01:08 stassats should be sigsegv

4:01:37 drmeister Well, MPS uses barriers and signals as part of its operation...

4:01:40 drmeister "You may need to make sure that the debugger isn’t entered on barrier(1) hits (because the MPS uses barriers to protect parts of memory, and barrier hits are common and expected)."

4:02:35 drmeister "On OS X, barrier hits do not use signals and so do not enter the debugger."

4:02:56 drmeister Clasp is currently the only system that uses MPS with multithreading.

4:03:29 drmeister Maybe there is an issue - or we have to accommodate MPS signals.

4:03:52 stassats why are you focusing on signals?

4:04:01 stassats you have a segfault, it's the same signal

4:05:08 stassats mps punts on it, not in its memory, should've printed something instead

4:05:39 stassats but now you'll have to inspect the context to see where it has faulted

4:06:59 drmeister I'm a bit slow and I don't know what's going on. What is the deal with segfault?

4:08:33 stassats so the address is 0x3e800003e8d

4:08:37 stassats not a zero or anything

4:12:14 stassats drmeister: are you registering thread memory regions correctly with mps?

4:12:57 drmeister I believe that I am ...

4:13:19 drmeister https://github.com/drmeister/clasp/blob/dev/src/core/mpPackage.cc#L117

4:14:18 drmeister start_thread is called when the thread is started and I create thread local allocation points, register the thread and register the thread stack.

4:16:41 stassats if it's a valid address, then it knows to protect the region, but not that it belongs to it

4:17:37 drmeister Where do you get the address: 0x3e800003e8d

4:17:52 stassats from the ucontext

4:18:04 drmeister This is the top of my stack:

4:18:07 drmeister https://www.irccloud.com/pastebin/XDjXWAGd/

4:20:23 stassats siginfo, rather

4:21:20 stassats p $_siginfo._sifields._sigfault.si_addr

4:22:07 drmeister (gdb) p $_siginfo._sifields._sigfault.si_addr -> $1 = (void *) 0x3e800005861

4:22:21 stassats that's close

4:23:19 drmeister x/8xg 0x3e800005860 --> 0x3e800005860: Cannot access memory at address 0x3e800005860

4:23:51 drmeister MPS puts hardware barriers on memory - when the program touches it it signals to the MPS that that memory needs to be fixed.

4:23:57 drmeister That's why I'm going on about signals.
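
The barrier mechanism drmeister describes can be illustrated with a small standalone C sketch for Linux. It is not MPS code: the names `barrier_page`, `barrier_on_segv` and `barrier_demo` are made up for this illustration, and a single `mmap`'d page stands in for collector-managed memory. The page is protected with `mprotect`, the first read faults, the SIGSEGV handler checks `si_addr` (the same field stassats prints in gdb), lifts the protection, and the read is retried.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names; one page stands in for collector-managed memory. */
static char *barrier_page;
static size_t barrier_page_len;
static volatile sig_atomic_t barrier_hits; /* faults that were fixed up */
static void *volatile barrier_fault_addr;  /* si_addr of the last fixed-up fault */

/* If the fault lies inside our page, lift the protection and return; the
   kernel then restarts the faulting instruction. Otherwise restore the
   default action so the retried access terminates the process. */
static void barrier_on_segv(int sig, siginfo_t *info, void *ucontext)
{
    char *addr = (char *)info->si_addr;
    (void)sig;
    (void)ucontext;
    if (addr >= barrier_page && addr < barrier_page + barrier_page_len) {
        barrier_fault_addr = addr;
        barrier_hits++;
        /* mprotect is a plain system call; calling it from a handler is
           common practice for this technique. */
        mprotect(barrier_page, barrier_page_len, PROT_READ | PROT_WRITE);
        return;
    }
    signal(SIGSEGV, SIG_DFL);
}

/* Returns 1 if exactly one barrier hit was taken and the read still saw 42. */
int barrier_demo(void)
{
    struct sigaction sa, old;
    volatile char *p;
    char value;
    int ok;

    barrier_hits = 0;
    barrier_page_len = (size_t)sysconf(_SC_PAGESIZE);
    barrier_page = mmap(NULL, barrier_page_len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(barrier_page != MAP_FAILED);
    barrier_page[8] = 42;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = barrier_on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    mprotect(barrier_page, barrier_page_len, PROT_NONE); /* raise the barrier */
    p = barrier_page;
    value = p[8]; /* faults once; the handler unprotects and the read is retried */

    sigaction(SIGSEGV, &old, NULL); /* put the previous disposition back */
    ok = barrier_hits == 1 && value == 42 &&
         barrier_fault_addr == (void *)(barrier_page + 8);
    munmap(barrier_page, barrier_page_len);
    return ok;
}
```

Calling `barrier_demo()` from a `main` exercises one barrier hit. Under gdb the same program would stop on the expected SIGSEGV unless `handle SIGSEGV pass nostop noprint` has been given, which is the point of the MPS debugging advice quoted earlier.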

4:24:39 stassats yes, i'm familiar with some common gc techniques

4:24:49 stassats but this is more likely a misdirection

4:25:12 drmeister Ok - sorry - I don't want to teach my gramma to suck eggs.

4:25:26 stassats well, it could very well be all of the above

4:25:28 drmeister What do you mean by a misdirection?

4:26:02 drmeister Other than - we might be being led down the wrong track

4:26:19 drmeister What is: p $_siginfo._sifields._sigfault.si_addr ?

4:26:29 stassats the fault address

4:26:43 drmeister The memory address that was read and that caused the fault?

4:27:52 drmeister I don't see that in any of the registers in the frame above the <signal handler called> frame #2 -- although maybe I shouldn't expect to.

4:28:38 drmeister Does info registers give you the values of registers as they were in the frame that I'm currently looking at - or only the current values of the registers in the top frame.

4:29:37 drmeister They change when I change frames - shows you how familiar I am with gdb.

4:34:27 drmeister In my morning (7 hours from now) one of my friends at Ravenbrook will be up - I'll ask them about this. They might have some advice. The problem is easy to reproduce.

4:38:17 drmeister http://www.ravenbrook.com/project/mps/master/manual/html/topic/thread.html

4:38:47 drmeister There is some stuff in here that I'm not sure if we are doing (or not doing). frgo set up a lot of signal handling code for clasp.

4:44:48 beach Good morning everyone!

4:45:00 stassats drmeister: this may be an old not updated pointer

4:45:38 drmeister Oh - now - that's possible - if it's not in MPS managed memory.

4:46:54 drmeister It is a tagged pointer - how does it end up in $_siginfo._sifields._sigfault.si_addr and where is $_siginfo._sifields._sigfault.si_addr located in memory? Is it in kernel space (I'm guessing)?

4:47:34 stassats that's not really relevant

4:47:42 stassats it's just an address that's not there

4:48:24 drmeister So it's not a real address?

4:48:44 stassats who knows
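
On where the value in `si_addr` comes from: in `siginfo_t`, `si_addr` is only meaningful when the kernel generated the signal for a memory fault. For a signal sent by another thread or process (`kill`, `pthread_kill`), the same storage holds `si_pid` and `si_uid`, and `si_code` tells the two cases apart. The log does not show `si_code`, so the following is only something worth ruling out: both reported values, 0x3e800003e8d and 0x3e800005861, have 0x3e8 (1000 decimal) in their upper half, which on x86-64 Linux is where `si_uid` would sit. Printing `$_siginfo.si_code` in gdb would settle it. The C sketch below (names such as `record_origin` and `sent_segv_demo` are hypothetical) sends SIGSEGV to the current process with `kill` and shows that the handler sees a sender pid and uid, not a fault address.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names; what the handler observed about the signal's origin. */
static volatile sig_atomic_t seen_code = 12345; /* sentinel until delivery */
static volatile sig_atomic_t seen_pid;
static volatile sig_atomic_t seen_uid_matches;

static void record_origin(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    seen_code = info->si_code;
    if (info->si_code <= 0) {
        /* si_code <= 0 (SI_USER, SI_QUEUE, SI_TKILL, ...) means a process
           sent the signal, so si_pid/si_uid are valid and si_addr is not. */
        seen_pid = info->si_pid;
        seen_uid_matches = (info->si_uid == getuid());
    }
    /* A positive si_code such as SEGV_MAPERR or SEGV_ACCERR would mean a
       real fault, and only then is info->si_addr the faulting address. */
}

/* Returns 1 if a kill()-sent SIGSEGV was reported as SI_USER from ourselves. */
int sent_segv_demo(void)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_origin;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    /* In a single-threaded process the signal is delivered before kill()
       returns, and the handler simply returns: there is no faulting
       instruction to retry. */
    kill(getpid(), SIGSEGV);

    sigaction(SIGSEGV, &old, NULL);
    return seen_code == SI_USER && seen_pid == getpid() && seen_uid_matches;
}
```

A handler that treats every SIGSEGV as a memory fault would misread such a signal as a fault at a nonsensical address, which is why checking `si_code` first is the safer design.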

4:50:01 drmeister MPS has this function: mps_bool_t mps_arena_has_addr(mps_arena_t arena, mps_addr_t addr)

4:50:17 drmeister I could use it to ask if the address is managed by MPS

4:50:24 drmeister Would that help?

4:51:22 stassats i doubt it

4:51:45 stassats it already decides to disavow it

4:53:03 drmeister I tried to run under gdb - but I don't get the same error.

4:54:30 drmeister I'm wiped out - I've been up for 18 hours working on this.

4:55:05 drmeister Unless you have some further insight I was going to ask my friend at Ravenbrook about it in the morning.

4:55:22 drmeister This works on OS X and it works in single threaded mode.

4:55:44 drmeister There's clearly something going on in Linux when we do non-trivial things in a child thread.

4:56:12 stassats how are your thread local bindings done?

4:56:43 drmeister I have one thread local pointer that points to a data structure at the top of each threads stack.

4:57:18 drmeister With MPS I have a second thread local structure that contains half a dozen MPS allocation points.

4:57:40 drmeister The first thread-local data structure stored at the top of each stack is described here:

4:57:41 stassats looking at the actual faulting instruction and the memory it uses, it doesn't look like 0x3e800005860

4:57:56 drmeister https://github.com/drmeister/clasp/blob/dev/include/clasp/gctools/threadlocal.h#L7

4:58:39 stassats how do i quickly recompile just protli.c?

4:58:54 drmeister ./waf build_imps

4:59:18 drmeister All of the MPS C code is #include'd within Clasp in mygc.c

4:59:29 drmeister There is no separate library.

4:59:47 stassats that ran too quickly

5:00:00 stassats leading me to believe that it did nothing

5:00:05 drmeister Did it recompile mygc.c?

5:00:14 stassats no

5:00:28 drmeister Ok, then it isn't registered as a dependency.

5:00:41 stassats i only modified protli.c

5:00:45 drmeister Hang on - I'll find you the mygc.c bitcode product.

5:00:53 drmeister You can delete that and it should rebuild it.

5:01:18 stassats ./build/mps/src/gctools/mygc.c.4.o ?

5:01:26 drmeister Yup

5:01:50 stassats it only says [ 7/356] Compiling src/gctools/mygc.c, but i need to rebuild protli.c

5:02:34 drmeister My understanding is that mygc.c includes ALL of the MPS code transitively.

5:02:47 stassats let's see

5:02:48 drmeister You could use nm to check if the symbol for the function you are editing is in there.

5:03:06 stassats waiting for the linker

5:03:12 drmeister the mygc.c.4.o is a bitcode file. You could also llvm-dis it and look at the human readable .ll file.

5:03:34 stassats i'm not that kind of human to be able to read .ll files

5:04:16 drmeister What is the name of the function you are editing?

5:04:39 stassats it's a secret

5:04:41 drmeister sigHandle?

5:05:20 drmeister Information wants to be free.

5:05:34 stassats blimey, didn't export LIBRARY_PATH=/usr/local/lib/ relinking again

5:05:43 stassats drmeister: wrong information needs to be chained down

5:06:58 stassats so slow linking

5:07:22 drmeister https://github.com/Ravenbrook/mps/blob//code/mps.c#L33

5:07:57 drmeister mps.c is included in mygc.c and mps.c includes EVERYTHING in MPS

5:08:00 stassats 404

5:09:27 drmeister Huh

5:10:00 drmeister https://github.com/Ravenbrook/mps/blob/master/code/mps.c#L33

5:14:00 stassats doesn't look like it got recompiled

5:14:03 stassats oh well

5:15:44 drmeister Oh - wait - are you using cclasp-mps and building with ./waf build_imps?

5:16:16 stassats yeah

5:16:54 drmeister I wasn't clear. ./waf build_imps will rebuild iclasp-mps - you could run that. Alternatively you can use ./waf build_cmps and that will relink cclasp-mps - but that takes longer because it recompiles some Common Lisp and relinks everything with everything to make cclasp-mps.

5:17:17 stassats what does iclasp-mps give me?

5:17:21 stassats everything?

5:17:36 drmeister It is the C++ executable and it loads the fasl/cclasp-mps-image.fasl.

5:18:18 drmeister It shouldn't be different - but it may be - I don't understand this error and I don't know if iclasp-boehm will reproduce the same error as cclasp-boehm.

5:18:30 stassats Bad client pointer 0x7f74c5bc7b68

5:18:30 stassats The MPS detected a problem!

5:18:45 drmeister Hmmm.

5:19:20 stassats Add support for unknown (immediate?) object to lisp_instance_class obj = 0xffffffffffffffff

5:19:39 drmeister Double Hmmm

5:20:31 drmeister My knee jerk reaction is to turn on the guards and rebuild and see if we can track it down then. It's a lot more fun tracking down GC problems with the guards on.

5:21:49 drmeister I added DEBUG_GUARD=1 to clasp/wscript.conf and...

5:21:58 drmeister I uncommented these in the wscript file:

5:22:02 drmeister https://www.irccloud.com/pastebin/YkDvnt5w/

5:22:54 drmeister CONFIG_VAR_COOL turns on assertions in MPS and the other three cause clasp to check objects for validity. They don't slow things down too much.

5:23:48 drmeister I've got to head to bed - I'm a zombie.

5:26:48 stassats huuuuh

5:27:20 stassats sigHandle is being hit multiple times with 0x3e800004132, refuses to handle it and ultimately is hit with 0

5:29:29 stassats gotta abort() the first time before it screws things up more

12:25:46 frgo ::notify drmeister Re MPS and signals: I think we need to change behavior in file src/gctools/interrupt.cc: ADD_SIGNAL( SIGSEGV, "+SIGSEGV+", ext::_sym_segmentation_violation); - Am I right that this establishes a handler represented by the symbol ext::_sym_segmentation_violation? If so: when using MPS on Linux, we're not allowed to do that. If not: I'd like to know what this line actually does.

12:25:46 Colleen frgo: Got it. I'll let drmeister know as soon as possible.

12:28:32 Shinmera Why are you not allowed to do that?

13:05:04 frgo Because MPS on Linux relies on SIGSEGV being not handled by someone else. It uses SIGSEGV to manage memory.

13:05:53 stassats mps handles sigsegv fine

13:06:08 stassats receives sigsegv fine, not sure as to how it handles it

13:06:44 frgo Yes, it does. But if you install another handler, then this leads to MPS being prevented from doing its job.

13:07:08 stassats that's not the case here
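
frgo's concern, in general terms, is that `sigaction` keeps only one handler per signal: whoever installs last replaces the earlier handler unless it saves the old action and forwards to it. Whether Clasp's `ADD_SIGNAL` actually does this is not established in the log, and stassats says it is not what is happening here. The C sketch below only illustrates the chaining pattern itself; `chain_collector_handler` stands in for a collector's barrier handler and `chain_app_handler` for an application handler installed afterwards, and all names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not Clasp or MPS code. */
static struct sigaction chain_previous; /* action that was installed before ours */
static volatile sig_atomic_t chain_collector_calls;
static volatile sig_atomic_t chain_app_calls;
static char *chain_page;
static size_t chain_page_len;

/* Stand-in for a collector's handler: lifts the barrier on its page. */
static void chain_collector_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)info;
    (void)ucontext;
    chain_collector_calls++;
    mprotect(chain_page, chain_page_len, PROT_READ | PROT_WRITE);
}

/* Application handler installed later. It owns no memory, so it forwards
   every fault to whatever handler was installed before it. */
static void chain_app_handler(int sig, siginfo_t *info, void *ucontext)
{
    chain_app_calls++;
    if (chain_previous.sa_flags & SA_SIGINFO)
        chain_previous.sa_sigaction(sig, info, ucontext);
    else if (chain_previous.sa_handler != SIG_DFL &&
             chain_previous.sa_handler != SIG_IGN)
        chain_previous.sa_handler(sig);
    else
        signal(sig, SIG_DFL); /* nobody to forward to: default action on retry */
}

/* Returns 1 if both handlers ran exactly once and the read was completed. */
int chaining_demo(void)
{
    struct sigaction sa, original;
    volatile char *p;
    char value;
    int ok;

    chain_collector_calls = 0;
    chain_app_calls = 0;
    chain_page_len = (size_t)sysconf(_SC_PAGESIZE);
    chain_page = mmap(NULL, chain_page_len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(chain_page != MAP_FAILED);
    chain_page[0] = 7;

    memset(&sa, 0, sizeof sa);
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    sa.sa_sigaction = chain_collector_handler; /* installed first */
    sigaction(SIGSEGV, &sa, &original);

    sa.sa_sigaction = chain_app_handler;       /* installed second, saves the first */
    sigaction(SIGSEGV, &sa, &chain_previous);

    mprotect(chain_page, chain_page_len, PROT_NONE);
    p = chain_page;
    value = p[0]; /* fault: app handler runs, forwards, collector unprotects */

    sigaction(SIGSEGV, &original, NULL);
    ok = chain_app_calls == 1 && chain_collector_calls == 1 && value == 7;
    munmap(chain_page, chain_page_len);
    return ok;
}
```

If `chain_app_handler` did not forward, the protected page would never be unprotected and the read would fault again on every retry, which is the failure mode frgo is worried about.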

13:50:57 drmeister Hello

13:50:57 Colleen drmeister: frgo said 1 hour, 25 minutes ago: Re MPS and signals: I think we need to change behavior in file src/gctools/interrupt.cc: ADD_SIGNAL( SIGSEGV, "+SIGSEGV+", ext::_sym_segmentation_violation); - Am I right that this establishes a handler represented by the symbol ext::_sym_segmentation_violation? If so: when using MPS on Linux, we're not allowed to do that. If not: I'd like to know what this line actually does.

13:52:30 drmeister I'll try this.

13:52:46 frgo Morning drmeister

13:53:18 frgo stassats already pointed out that MPS actually receives SIGSEGV.

13:53:30 stassats and it works on the main thread, so

13:53:42 frgo (I don't have linux here right now to test)

13:53:46 drmeister Ok.

13:54:36 drmeister My friend at Ravenbrook got back to me and wants to see a backtrace. The machine I'm using is in the Amazon Cloud - so I can give him access as well.

13:54:46 drmeister I'm just waking up - need tea.

13:55:04 stassats drmeister: what i've uncovered, it actually receives multiple faults

13:55:12 stassats and resignals them

13:55:17 stassats ending with a fault at zero

13:55:23 drmeister Interesting...

13:55:35 drmeister I built a version with guards on - it took 7 hours.

13:55:42 drmeister It's like the bad old days.

13:55:46 Shinmera Like good old times.

13:56:00 stassats you have different attitudes

13:56:09 Shinmera I'm just sarcastic.

13:58:48 drmeister It doesn't reproduce the problem with the cases I tried last night - it works on simple cases.

13:59:57 drmeister Nope - it does - I was using cclasp - it behaves differently. In aclasp it fails like it did last night.

14:06:28 drmeister I can reproduce the problem and I passed it on to David along with stassats' observation.

14:12:40 frgo drmeister: I seem to have a problem with wscript in clasp: I get:

14:12:41 frgo Error >>>>>>>> In file included from /opt/common-lisp/lang/clasp/src/clasp/src/gctools/interrupt.cc:2:

14:12:42 frgo In file included from /opt/common-lisp/lang/clasp/src/externals-clasp/llvm50/include/llvm/Support/ErrorHandling.h:18:

14:12:42 frgo #include "llvm/Config/llvm-config.h"

14:12:42 frgo ^~~~~~~~~~~~~~~~~~~~~~~~~~~

14:13:00 frgo well, the file *is* there.

14:13:31 drmeister I just realized something - I can create Amazon Cloud machines with Clasp running and give people access to them. Great for debugging.

14:13:46 frgo I don't see script setting the include path for llvm...

14:14:00 frgo s/script/wscript.

14:14:01 drmeister frgo: That was the same problem that you had last night - this is with the new externals-clasp build?

14:14:10 frgo Yes.

14:14:21 drmeister Could you paste your wscript.config file?

14:14:32 frgo Sure.