freenode/#clasp - IRC Chatlog

18:20:08 frgo drmeister: Wouldn't that be a good candidate for a clasp extension module?

18:21:13 drmeister Yes - the extension modules give you something in addition - they let you define GC managed C++ classes.

18:22:59 drmeister frgo: I have a question about signals - do you have a few minutes?

18:23:13 drmeister I'm pretty sure I know the answer, but I wanted to check with you.

18:25:35 drmeister What happens under the hood when I hit Control-C in Clasp?

18:27:21 drmeister And what should happen?

18:28:24 drmeister The way I see it is an asynchronous signal handler should get the SIGINT signal and should set a flag that some thread (which?) will occasionally check and then unwind the stack.

18:30:40 frgo drmeister: re signals: Shoot

18:31:10 frgo Yes - We need to setup signal handlers still for a few more sigs.

18:31:57 frgo Behavior depends on the signal and the desired outcome.

18:33:17 frgo E.g.: Do we want to stop clasp on sigkill? Certainly. But in a controlled way. Now: what is the "controlled way"? Send sigkill to all threads, meaning: call a function per thread that kills the thread? That would be one way to do it.

18:34:13 drmeister But the underlying mechanism - how does a thread recognize that it got a sigkill?

18:34:43 frgo It doesn't ..

18:35:00 frgo It's the process that gets the signal.

18:36:11 frgo we need to have a function registered in the thread struct that gets called when the process' signal handler is invoked. So:

18:37:20 frgo Signal -> Caught by signal handler in process -> That signal handler does for all threads: Check if a function is registered (a general one or a separate one for each signal)

18:37:28 frgo and then call that function.

18:38:01 frgo The called function may be set to something thread specific - but normally would just exit the thread.

18:38:08 frgo This then unwinds the stack.

18:40:21 frgo For applications that really need to shut down gracefully no matter what signal is received it is required to have an alternate stack in place (via function sigaltstack) that is used to handle signal handler function execution.
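
The alternate-stack setup frgo mentions can be sketched in C++ roughly as follows. This is a minimal illustration, not Clasp code: the function and variable names are hypothetical, SIGUSR1 stands in for whatever signal is being handled, and the 64 KiB stack size is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <signal.h>

// Hypothetical globals describing the alternate stack we install.
static char* g_alt_base = nullptr;
static std::size_t g_alt_size = 0;
static volatile sig_atomic_t g_ran_on_alt_stack = 0;

// Handler: records whether one of its own locals lives inside the alternate stack.
extern "C" void on_signal(int) {
    char marker;  // lives on whatever stack the handler is running on
    std::uintptr_t p = reinterpret_cast<std::uintptr_t>(&marker);
    std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(g_alt_base);
    g_ran_on_alt_stack = (p >= lo && p < lo + g_alt_size) ? 1 : 0;
}

// Installs an alternate signal stack for the calling thread, registers a
// handler with SA_ONSTACK, raises the signal, and reports where the handler ran.
bool handler_runs_on_alternate_stack() {
    g_alt_size = 64 * 1024;  // assumed size; must be at least MINSIGSTKSZ
    g_alt_base = static_cast<char*>(std::malloc(g_alt_size));
    if (g_alt_base == nullptr) return false;

    stack_t ss{};
    ss.ss_sp = g_alt_base;
    ss.ss_size = g_alt_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, nullptr) != 0) return false;

    struct sigaction sa{};
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;  // without this flag the normal stack is used
    if (sigaction(SIGUSR1, &sa, nullptr) != 0) return false;

    raise(SIGUSR1);  // delivered to this thread before raise() returns
    // The buffer is intentionally not freed: it stays installed as the alt stack.
    return g_ran_on_alt_stack == 1;
}
```

The alternate stack is a per-thread setting, so in a multithreaded program each thread that should survive a stack overflow would need its own call to sigaltstack.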

18:41:14 drmeister But you can't unwind the stack from a signal handler in linux or OS X

18:41:52 frgo It's not the signal handler that unwinds. It's the terminated thread that does.

18:42:06 drmeister How does the terminated thread recognize that it has to unwind?

18:42:46 frgo Yes. That is the question. For C++ only that's magic (for me) - for clasp: I don't know.

18:44:41 Bike if signals are process level they're hard to use. say we get a floating point exception, how could we know what thread triggered it?

18:44:44 drmeister The only way I know how is for C++ to occasionally check a flag (global or thread local) and proceed based on the results.

18:45:12 frgo When in C land, I had to use sigsetjmp() and siglongjmp() for unwinding.
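
For reference, the sigsetjmp()/siglongjmp() approach frgo describes looks roughly like this. It is a sketch under assumptions: the names are hypothetical, SIGUSR1 stands in for the real signal, and the jump skips C++ destructors, which is one reason this technique does not transfer cleanly from C to a C++ runtime like Clasp.

```cpp
#include <cassert>
#include <setjmp.h>
#include <signal.h>

// Hypothetical recovery state; a real program would keep this per thread.
static sigjmp_buf g_recover_point;
static volatile sig_atomic_t g_armed = 0;
static volatile sig_atomic_t g_caught_signal = 0;

// Handler: jumps back to the recovery point instead of returning normally.
extern "C" void jump_out_of_handler(int signo) {
    if (!g_armed) return;  // no recovery point is active, so do nothing
    g_armed = 0;
    g_caught_signal = signo;
    siglongjmp(g_recover_point, 1);
}

// Runs `risky`; returns 0 if it finished, or the signal number if the
// handler abandoned it. Returns -1 if the handler could not be installed.
int run_with_recovery(void (*risky)()) {
    struct sigaction sa{};
    sa.sa_handler = jump_out_of_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, nullptr) != 0) return -1;

    g_caught_signal = 0;
    // savemask = 1: the signal mask is restored on the jump, so SIGUSR1
    // is unblocked again after we leave the handler this way.
    if (sigsetjmp(g_recover_point, 1) != 0) {
        return g_caught_signal;
    }
    g_armed = 1;
    risky();
    g_armed = 0;
    return 0;
}

// Two sample workloads for demonstration.
void raises_sigusr1() { raise(SIGUSR1); }
void returns_normally() {}
```

Calling run_with_recovery(raises_sigusr1) returns SIGUSR1, while run_with_recovery(returns_normally) returns 0.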

18:45:52 drmeister Bike: Martin Cracauer brought that up - he said we shouldn't involve signals at all - they are too expensive, and bring problems like the one you are raising - they are the wrong mechanism.

18:45:55 frgo Well, a thread that does cause a non-recoverable error will be in a defunct state that we can check for.

18:46:03 Bike yeah, i get that

18:46:10 Bike but with C-c C-c what is to be done?

18:47:42 drmeister The only way I see is for every thread to occasionally check a flag to see if it should check further for directions.

18:49:01 frgo I've never done it, but: I would like to test if we can setup signal handlers per thread. Hang on ...

18:50:27 drmeister Even if you can - they are fundamentally asynchronous and the stack will be in a dirty state and shouldn't be touched, let alone unwound.

18:52:42 drmeister Unless someone has a better idea or understanding - I would say we have a thread-local variable that each thread checks as it leaves a function or at the bottom of loops (and could be generated or not using a (DECLARE (OPTIMIZE ...))) and take the hit in performance.
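
The flag-polling scheme drmeister proposes can be sketched in C++ as follows. This is only an illustration with hypothetical names: it uses one process-wide flag rather than a thread-local one, polls at the bottom of a loop by hand rather than via compiler-inserted checks, and uses raise(SIGINT) as a stand-in for Control-C.

```cpp
#include <cassert>
#include <signal.h>
#include <stdexcept>

// The only thing the asynchronous handler touches: a sig_atomic_t flag.
static volatile sig_atomic_t g_interrupt_requested = 0;

extern "C" void note_interrupt(int) { g_interrupt_requested = 1; }

// Thrown from a safe point, so ordinary C++ unwinding runs destructors.
struct Interrupted : std::runtime_error {
    Interrupted() : std::runtime_error("interrupted at a safe point") {}
};

// The check a compiler could insert at function exits or loop bottoms.
inline void poll_for_interrupt() {
    if (g_interrupt_requested) {
        g_interrupt_requested = 0;
        throw Interrupted();
    }
}

// Runs a loop that polls once per iteration. Returns the iteration at which
// the interrupt was noticed, or -1 if the loop ran to completion.
long run_loop(long iterations, long interrupt_at) {
    struct sigaction sa{};
    sa.sa_handler = note_interrupt;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, nullptr);

    long i = 0;
    try {
        for (; i < iterations; ++i) {
            if (i == interrupt_at) raise(SIGINT);  // stand-in for Control-C
            poll_for_interrupt();                  // bottom-of-loop check
        }
    } catch (const Interrupted&) {
        return i;  // the stack was unwound from a known-clean state
    }
    return -1;
}
```

The handler never unwinds anything itself; the unwinding happens synchronously in the polling thread, which is the property the conversation is after. The cost is one flag load per poll.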

18:53:57 frgo drmeister: Threads in clasp: Pthreads? Something else? (Didn't check, sorry)

18:54:04 drmeister pthreads

18:55:17 frgo So we have pthread_sigqueue to send a signal to a specific thread. will read up on that again ...
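
As a point of comparison, directing a signal at one specific thread can be sketched with the POSIX call pthread_kill (pthread_sigqueue is a GNU extension that additionally carries a value). The names below are hypothetical and SIGUSR1 is an arbitrary choice; the sketch only shows that the handler runs in the targeted thread.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <sched.h>
#include <signal.h>

static std::atomic<int> g_ready{0};
static std::atomic<int> g_handled{0};  // 0 = not yet, 1 = ran on worker, 2 = elsewhere
static pthread_t g_worker_self;        // written by the worker before it reports ready

// Handler: records which thread it is running on.
extern "C" void on_usr1(int) {
    g_handled.store(pthread_equal(pthread_self(), g_worker_self) ? 1 : 2);
}

static void* worker(void*) {
    g_worker_self = pthread_self();
    g_ready.store(1);
    while (g_handled.load() == 0) sched_yield();  // wait until the handler has run
    return nullptr;
}

// Starts a worker thread, sends it SIGUSR1 with pthread_kill, and reports
// whether the handler executed in that worker.
bool signal_reaches_chosen_thread() {
    g_ready.store(0);
    g_handled.store(0);

    struct sigaction sa{};
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, nullptr) != 0) return false;

    pthread_t tid;
    if (pthread_create(&tid, nullptr, worker, nullptr) != 0) return false;
    while (g_ready.load() == 0) sched_yield();

    pthread_kill(tid, SIGUSR1);  // thread-directed, unlike kill()
    pthread_join(tid, nullptr);
    return g_handled.load() == 1;
}
```

Note that the signal disposition (the handler) is still shared by the whole process; only the delivery target is per thread.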

19:02:39 drmeister Hmmph - I just figured out what I broke this morning - I moved GC managed pointers out of GC managed memory.

19:03:58 Bike TLS is managed, no?

19:07:12 drmeister I managed it in a special way - I defined a ThreadLocalState struct that contained all of the thread local data structures directly.

19:08:01 drmeister I allocated that at the top of the stack. With it on the stack it is managed, its roots are identified by the conservative stack scanner.

19:09:40 drmeister I ran into an issue with header file order now that I want to define the allocation points within the ThreadLocalState structure - so I defined ThreadLocalState to contain pointers to some of the thread local data structures - BRRRRAAAAPPPP! Wrong - don't do that.

19:10:02 Bike this sounds pretty arcane.

19:10:24 drmeister What I did wrong? Or what I was doing before?

19:10:47 drmeister Because I'm going back to doing what I was doing before.

19:11:21 drmeister I thought it would be messy to declare a bunch of roots for each thread.

19:11:41 drmeister It seemed cleaner to just toss them into the top of the stack.

19:12:11 drmeister As long as the thread is live the stack is active and the thread local roots are roots.

19:14:23 drmeister Bike: By the way - David Lovemore (Ravenbrook) says (re: Saving and restoring MPS memory) "It's not possible at the moment. But it ought to be possible."

19:14:29 drmeister So we are gonna need their help.

19:15:31 Bike i thought that was given :)

19:16:03 drmeister Pretty much - I was cross checking.

19:46:25 jackdaniel just as I thought, watching this movie was a great rest for the brain – I had enough visual effects to keep it at some low futile gear yet it has too little meaning to engage higher parts

20:59:47 drmeister Bike: Is there a way to identify the bottom of loops - to insert a check for notifications from the system to threads?

21:00:47 Bike we could detect obvious ones. that would cover most of it but obviously we can't identify all possibly nonterminating regions

21:01:46 drmeister Could we insert an invocation of an inlined test controlled by a (declare (optimize ...)) policy?

21:02:17 Bike don't see why not, but i'd have to write the logic

21:02:24 Bike i think beach already had something in mind for that

21:03:14 drmeister Ok - we can talk about it next week.

21:03:51 drmeister I think I resolved my MPS+threading thread local storage issues.

0:39:57 drmeister cclasp with MPS works with slime in :spawn mode

0:41:24 stassats will it work for me?

0:41:28 frgo Yay! Congrats!

0:41:49 Bike slime with :spawn works again at all? coo

0:41:56 drmeister stassats: You break everything I make.

0:42:40 drmeister https://www.irccloud.com/pastebin/LJ3gMTOU/

0:47:26 stassats just say when

0:47:27 drmeister I pushed everything to dev

0:47:34 stassats that was fast

0:47:39 drmeister ./waf build_cmps

0:47:40 Bike so dev builds mps now?

0:47:43 Bike oh. right

0:48:10 drmeister It's a little slow - I'm not sure why - it took 1h45m on my system here when a couple of days ago it was 1h30m

0:48:36 stassats ok, let's see where the build breaks

0:49:24 stassats "The project was not configured: run "waf configure" first!" that was fast

0:49:57 stassats Checking for program 'llvm-config' : not found

0:49:58 drmeister ./waf configure build_cmps

0:50:01 stassats lovely

0:50:15 drmeister That's a feature

0:50:30 stassats to behave unlike `make'?

0:51:01 Bike for the not found, you have to set the variable in wscript.config to point to the externals config llvm-config.

0:51:09 Bike (when are we going to be done with externals-clasp for real)

0:51:15 stassats i don't even have externals-clasp

0:51:22 stassats and `make' works

0:51:41 drmeister `make' shouldn't work

0:52:00 Bike why have we not deleted the makefile.

0:52:14 drmeister stassats: Put this in your wscript.config: LLVM_CONFIG_BINARY = '/Users/meister/Development/externals-clasp/build/release/bin/llvm-config'

0:52:20 stassats now you're deleting the only thing that works

0:52:23 stassats i don't have externals-clasp

0:52:36 drmeister But fix up the path to point to wherever your llvm-config is that you are using.

0:52:43 stassats this is what make says Checking for program 'llvm-config' : /usr/lib/llvm-5.0/bin/llvm-config

0:53:19 drmeister As long as /usr/lib/llvm-5.0/bin/llvm-config exists - it should be fine.

0:53:25 drmeister It's llvm-config and it's 5.0

0:53:39 stassats plain ./waf configure doesn't think so

0:58:33 drmeister So - it's not building for you?

0:58:47 drmeister Did you go and break my beautiful build system already?

0:59:44 stassats did sudo ln -s ../lib/llvm-5.0/bin/llvm-config llvm-config, now it's on to clang++

1:00:58 stassats that pacified it

1:01:21 Bike you can put the path in wscript.config, right.

1:01:50 drmeister How does sbcl get the number of processor cycles in TIME?

1:01:53 stassats i could, but everything worked with make before

1:02:05 drmeister ACTION has always wondered about that.

1:02:22 stassats and please, don't remove makefile altogether, i wouldn't remember the right build incantation

1:02:33 stassats drmeister: rdtsc

1:02:38 Bike it uses the cycle count instruction

1:02:39 Bike yeah.

1:02:45 Bike llvm has intrinsics for it

1:02:54 drmeister Does it now?

1:03:01 Bike pretty sure...

1:03:35 stassats something's building now

1:04:10 Bike llvm.readcyclecounter
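
A small C++ sketch of reading a cycle counter from user code. Clang exposes the llvm.readcyclecounter intrinsic as __builtin_readcyclecounter; to stay portable across GCC and Clang this sketch instead uses __rdtsc on x86 and, as an assumption for other architectures, falls back to std::chrono::steady_clock ticks (which are not CPU cycles). The function names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
// x86: the rdtsc instruction, readable from user mode by default.
static inline std::uint64_t read_ticks() { return __rdtsc(); }
#else
// Fallback (assumption): monotonic clock ticks, not true CPU cycles.
static inline std::uint64_t read_ticks() {
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
}
#endif

struct Timed {
    std::uint64_t sum;    // result of the measured work
    std::uint64_t ticks;  // counter difference across the work
};

// Sums 0..n-1 and reports how many ticks elapsed while doing so.
Timed timed_sum(std::uint64_t n) {
    std::uint64_t start = read_ticks();
    volatile std::uint64_t acc = 0;  // volatile so the loop is not folded away
    for (std::uint64_t i = 0; i < n; ++i) acc = acc + i;
    std::uint64_t end = read_ticks();
    return Timed{acc, end - start};
}
```

The tick count varies from run to run and, on machines without a synchronized invariant TSC, can be misleading if the thread migrates between cores.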

1:04:31 stassats what does it do on arm? cause i couldn't figure out a non-kernel approach

1:04:46 Bike dunno, the manual only mentions alpha and x86. lemme dig

1:05:31 drmeister Huh - make does still seem to work - but on my system it makes the boehm version.

1:05:49 Bike we could make it a dumb alias that just runs the appropriate waf command

1:06:25 Bike on arm it uses the "Performance Monitors Extension", whatever that is

1:06:25 drmeister So... all: ./waf configure build_cmps ?

1:06:38 Bike maybe update submodules too

1:06:46 drmeister Ok.

1:06:47 stassats and whatever it did to find llvm-config-5.0

1:06:51 Bike though i guess that's a git thing.

1:07:20 Bike yeah, looks like some arm cpus have a "performance monitoring unit" that has a few counters

1:07:50 Bike "the cycle-count is: mrc p15, #0, <Rt>, c9, c13, #0"

1:08:36 stassats /usr/bin/ld.gold: error: cannot find -lboost_filesystem

1:08:54 stassats Bike: and that's the system-level part, yes

1:09:46 Bike you should install boost, i guess.

1:09:56 Bike and ah. yeah it says readcyclecounter might need special perms

1:10:43 stassats i have boost installed

1:13:08 stassats it's in /usr/local/

1:14:19 drmeister I think waf should find it in /usr/local - wth

1:16:30 stassats export LIBRARY_PATH=/usr/local/lib/ works (and why do i have it in /usr/local? because i had to build it by hand for clasp to get linked by clang)

1:16:31 drmeister Bike: There's a guy at Temple who could help us set up some virtual servers to do continuous integration.

1:17:38 drmeister stassats: You had to build boost recently by hand? I've been using package managers to install boost for a couple of months.

1:19:46 frgo Build failed

1:19:46 frgo -> missing file: '/opt/common-lisp/lang/clasp/src/clasp/build/mps/src/gctools/interrupt.sif'

1:20:01 stassats drmeister: i haven't rebuilt it

1:20:11 stassats but i'm sure it'll break if i don't have it

1:20:41 drmeister frgo: Did you do a clean build? ./waf distclean configure build_cmps

1:20:58 frgo (I did pull from dev and ./waf distclean && ./waf configure before)

1:21:20 frgo So - yes

1:22:15 drmeister interrupt.sif should have been built by the scraper - I don't understand how it could be missing. It's automatic.

1:23:27 drmeister Are there any other .sif files in that directory?

1:24:29 frgo Yes - like:

1:24:31 frgo -rw-r--r-- 1 frgo admin 533959 19 Nov 02:19 boehmGarbageCollection.sif

1:24:31 frgo -rw-r--r-- 1 frgo admin 606155 19 Nov 02:19 gcFunctions.sif

1:24:31 frgo -rw-r--r-- 1 frgo admin 120942 19 Nov 02:19 gcStack.sif

1:24:31 frgo -rw-r--r-- 1 frgo admin 506450 19 Nov 02:19 gc_boot.sif

1:24:31 frgo -rw-r--r-- 1 frgo admin 3 19 Nov 02:19 gc_interface.sif

1:24:31 frgo -rw-r--r-- 1 frgo admin 120942 19 Nov 02:19 gcalloc.sif

1:24:31 frgo -rw-r--r-- 1 frgo admin 526259 19 Nov 02:19 gctoolsPackage.sif

1:24:32 frgo -rw-r--r-- 1 frgo admin 486530 19 Nov 02:19 gcweak.sif

1:24:32 frgo -rw-r--r-- 1 frgo admin 120942 19 Nov 02:19 globals.sif

1:24:33 frgo -rw-r--r-- 1 frgo admin 123020 19 Nov 02:19 hardErrors.sif

1:24:33 frgo -rw-r--r-- 1 frgo admin 535557 19 Nov 02:19 memoryManagement.sif

1:24:34 frgo -rw-r--r-- 1 frgo admin 615694 19 Nov 02:19 mpsGarbageCollection.sif

1:25:04 drmeister Yeah - all those.

1:25:21 drmeister When it was building clasp - did it report something like: [ 19/356] Scraping with preproc.scan src/gctools/interrupt.cc

1:25:22 frgo and they've been built just 5 mins ago.

1:25:51 drmeister There should be about 180 lines like: Scraping with preproc.scan src/gctools/...

1:26:05 drmeister Scraping with preproc.scan ...

1:26:37 drmeister It's using sbcl and running the preprocessor on each source file - I'm worried that it failed somehow for interrupt.cc

1:27:15 drmeister frgo: If you have the log of the build - could you paste it to gist.github.com?

1:27:59 frgo Yes, it did fail for interrupt.cc - see: https://gist.github.com/dg1sbg/5538a17e93b32543d3fa4bd33cc0569f#file-gistfile1-txt-L53

1:28:24 frgo So I need to pull for externals-clasp, too?

1:29:35 drmeister Interesting. How did you install llvm?

1:30:03 frgo ah - git pulled externals-clasp .

1:30:30 drmeister Um - I'm not following.

1:30:53 drmeister Are you using externals-clasp - or not?

1:30:59 frgo whatever externals-clasp does. It uses the LLVM in there.

1:31:02 drmeister Or did you have an old one?

1:31:15 drmeister An old version of externals-clasp.

1:31:22 stassats clasp-mps-FRGO_CLASP_DEV_FOREIGN_DATA_002-2046-g440b8c5c9

1:31:25 stassats a mouthful

1:31:48 drmeister Yes - I haven't figured out where that comes from (sigh)

1:32:14 stassats from the cloud

1:32:15 drmeister frgo is immortalized in the clasp executable name.

1:32:26 frgo Ah . I'm guilty for that one. I'll remove that tag.

1:32:47 drmeister Oh - you know where that is? At least someone does.

1:33:01 stassats drmeister: and SBCL is using my initials

1:33:17 stassats or am i?

1:33:22 frgo It's a git tag that I introduced decades ago.

1:33:42 drmeister ACTION thought that that was what it was called Stas-Boukarev's-Common-Lisp

1:34:11 drmeister Oh it's a git tag.

1:34:11 stassats drmeister: that's what i use as a pick up line at the bar

1:34:13 frgo So how again do I build externals-clasp? Man, I need to write that down somewhere.

1:34:22 Bike stas boukarev, count of lancaster

1:34:31 drmeister Yup - there it is.

1:34:31 Bike externals clasp is just make, isn't it

1:34:43 drmeister frgo: Pull it and 'make'

1:34:54 drmeister The master branch is up to date.

1:35:49 frgo Thx

1:35:57 frgo Building externals-clasp now...

1:36:57 frgo drmeister: For a later date: Signal handling the safe way - that's what we need to do in clasp: https://gist.github.com/dg1sbg/3b608bf1c880148cc06faffd641c8809

1:37:06 stassats what is GENERATE-DIRECT-CALL-DEFUN?

1:37:54 stassats the name is a bit funny

1:38:11 drmeister stassats: It's a macro that generates a wrapper around a C++ function so that it compiles the lambda list/argument handling with Cleavir rather than using the C++ LambdaListHandler_O class.

1:39:00 stassats all macros generate code, so prefixing with "generate-" is a tad strange

1:49:26 frgo To get rid of that FRGO... tag I need to assign a new one. How should we name the current state of affairs? Which version of clasp do we currently have? Like pre_0.5?

1:50:02 stassats and what's with all the "Read: (DEBUG-INLINE primop)"

1:50:04 drmeister Sure

1:50:22 Bike some debug thing. another thing to delete

1:51:23 stassats all uses of GENERATE-DIRECT-CALL-DEFUN use MAGIC-INTERN, maybe it should just be moved into generate-direct-call-defun?

1:52:36 frgo Tags removed

1:56:19 frgo That was locally in my forked github repo.

1:56:31 frgo drmeister: You'd need to execute

1:56:49 frgo git push --delete origin FRGO_CLASP_DEV_FOREIGN_DATA_001

1:56:51 frgo git push --delete origin FRGO_CLASP_DEV_FOREIGN_DATA_002

1:57:16 drmeister Done - thank you.

1:57:17 frgo if your github remote is also called "origin".

1:57:20 frgo Ok.

1:58:48 frgo Ok . 3 am here. time to get some sleep while externals-clasp is building ...

1:58:56 frgo See you all tomorrow.

1:59:01 drmeister Good night frgo

1:59:05 drmeister stassats: Ok

2:00:56 drmeister I changed it from generate-direct-call-defun to wrap-c++-function

2:01:42 drmeister There's a chance that it might not use magic-intern - but they are all using it at the present time. I think I'll leave that this way.

2:47:00 stassats Process inferior-lisp<1> segmentation fault

2:47:29 drmeister Is that when it starts up?

2:47:35 stassats yes

2:47:56 drmeister Does it compile anything?

2:48:10 stassats everything

2:49:09 drmeister Ok, just to be clear - cclasp built, asdf built, when you start slime it compiles all of the slime code and then segfaults.

2:49:21 stassats yes

2:49:48 drmeister What distro of linux are you running now?

2:50:00 stassats i won't tell you!

2:50:09 drmeister Why?

2:50:16 stassats don't really see how it's relevant

2:50:58 stassats and you won't be able to replicate my setup

2:50:58 drmeister I'll spin up an AWS system and check it out.

2:51:22 drmeister It ran well enough to compile asdf

2:52:04 Bike does it work for single threaded slime

2:52:31 drmeister Yeah - that would be useful to know.

2:52:55 drmeister Anyway - I'm spinning up a linux now.

2:53:08 stassats kinda

2:53:21 stassats COMMON-LISP-USER> Bad client pointer 0x7f0ed6a879b8

2:53:21 stassats The MPS detected a problem!

2:53:22 stassats ../../include/clasp/mps/code/lockli.c:139: MPS ASSERTION FAILED: res == 0

2:54:04 Bike that seems pretty bad.

2:54:07 drmeister Well, you are the first person to try clasp+MPS on linux. Congratulations.

2:54:19 stassats it segfaults again promptly after that

2:54:57 stassats is spawn the default now?

2:55:08 stassats cause i commented out setting it to :spawn, and got that assertion

2:55:22 stassats but when i set it explicitly to nil, i actually get slime to connect

2:56:21 stassats so, yes, :spawn is the default, but it does not always result in these assertions

3:03:18 drmeister I'm not aware of any OS X/Linux differences that could be tripping stuff up - but I was working with threading, MPS and thread local storage.

3:12:27 stassats (mp:process-run-function nil (lambda () (load "quicklisp/setup.lisp"))) does it for me

3:12:40 drmeister It's on its way now - I'll have more in an hour and a half or so.

3:12:54 drmeister Does what? Tickles your fancy? Or segfaults?

3:13:54 stassats segfaults

3:14:23 stassats and i like segfaults

3:14:24 drmeister Ok, that should be easy to reproduce. I'll bet it fails even in the interpreter.

3:14:53 drmeister (mp:process-run-function nil #'(lambda () (print "Hi - I'm going to segfault!")))

3:15:00 stassats segfaults at _Unwind_Find_FDE

3:15:11 stassats drmeister: no, that's not what my snippet says

3:15:46 drmeister No - that's what I'm going to try as soon as the interpreter finishes compiling.

3:16:12 drmeister Which will be soon - the AWS machines are pretty beefy.

3:16:26 stassats well, then it'll be lying, it's not going to segfault

3:16:56 drmeister You've demonstrated that?

3:17:59 stassats can i call for some lisp backtraces from gdb?

3:18:42 drmeister I don't know - I'm having issues with JITted code and backtraces on linux.

3:18:49 drmeister stassats: Have you ever used libunwind?

3:19:02 stassats there's core__catch_function, then there are some core__call_with_variable_bound, and then _Unwind_Find_FDE

3:19:13 stassats drmeister: no

3:21:17 stassats Thread 1 "cclasp-mps" received signal SIGXFSZ, File size limit exceeded.

3:21:30 stassats PC LOAD LETTER

3:22:13 drmeister Huh? File size limit exceeded? wth?

3:22:28 drmeister And PC LOAD LETTER? Is your printer empty?

3:23:06 drmeister Right - as you said - I can start processes in the interpreter with no problems.

3:24:01 stassats but it's not dying with SIGXFSZ, that's only in gdb

3:24:08 drmeister And simple tests unwinding the stack don't have problems.

3:24:09 drmeister (mp:process-run-function nil #'(lambda () (block x (funcall #'(lambda () (return-from x nil))))))

3:24:10 stassats is that a gc signal? (weird choice)

3:24:21 drmeister Oh - hang on - there's an issue with MPS and gdb.

3:24:30 stassats drmeister: does that unwind anything?

3:24:52 Bike it should, yeah.

3:24:56 drmeister In the interpreter I thought it would.

3:25:20 drmeister Yeah - block/return-from are done with C++ exception handling.

3:25:26 drmeister In the interpreter.
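
To illustrate what "block/return-from are done with C++ exception handling" can mean, here is a minimal C++ sketch. It is not Clasp's implementation: the ReturnFrom type, the block helper, and the use of a stack address as the block's identity are all assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical exception carrying the identity of the target block and a value.
struct ReturnFrom {
    const void* tag;
    int value;
};

// Runs `body` inside a fresh block. A ReturnFrom aimed at this activation is
// caught here; one aimed at an enclosing block is rethrown to keep unwinding.
template <typename Body>
int block(Body body) {
    char activation;                 // its address is unique while this call is live
    const void* tag = &activation;
    try {
        return body(tag);
    } catch (const ReturnFrom& exit) {
        if (exit.tag == tag) return exit.value;
        throw;
    }
}

// Like (block x (+ 100 (block y (return-from x 7)))): exits the outer block.
int exit_outer_block() {
    return block([](const void* x) -> int {
        int inner = block([x](const void*) -> int {
            throw ReturnFrom{x, 7};
        });
        return inner + 100;  // never reached
    });
}

// Like (block x (+ 100 (block y (return-from y 1)))): exits only the inner block.
int exit_inner_block() {
    return block([](const void*) -> int {
        int inner = block([](const void* y) -> int {
            throw ReturnFrom{y, 1};
        });
        return inner + 100;
    });
}
```

Here exit_outer_block() yields 7 and exit_inner_block() yields 101. Because the exit is an ordinary C++ exception, it relies on the platform unwinder, which is where stassats' backtrace (_Unwind_Find_FDE) was crashing.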

3:26:04 drmeister If you are using GDB on Linux or FreeBSD, run this command: handle SIGSEGV pass nostop noprint

3:26:14 drmeister http://www.ravenbrook.com/project/mps/master/manual/html/guide/debug.html

3:26:20 stassats (mp:process-run-function nil #'(lambda () (catch 'x (eval '(throw 'x 10)))))

3:26:21 stassats crashes

3:26:23 drmeister That's an MPS-specific thing.

3:26:23 whoman as a 'user', would i want to use clasp later on, if i am using ecl now ?

3:26:33 stassats sometimes

3:26:45 Bike how does that work on lldb

3:27:43 stassats no, the throw doesn't crash

3:27:44 drmeister Bike: I haven't seen a problem on OS X - and I haven't used lldb on linux with MPS.

3:28:23 drmeister whoman: That depends on what you want to do. Clasp is great for exposing C++ libraries.

3:30:26 stassats #3 core::DynamicBinding::DynamicBinding (this=<optimized out>, sym=..., val=...) at ../../include/clasp/gctools/threadLocalStacks.h:27

3:30:34 stassats #2 <signal handler called>

3:33:13 stassats #11 core::cl__read

3:33:50 drmeister I can't follow these.

3:34:16 stassats but you wrote them

3:34:55 drmeister Once I reproduce the problem I have some tools to track it down. I can compile clasp with guards around every allocated object and I can symbolicate jitted frames - if gdb gives me return addresses and complete backtraces.

3:35:24 drmeister Sure I wrote them - but they are disconnected stack frames - what does the entire backtrace look like?

3:35:44 stassats nothing interesting

3:36:38 drmeister This is the crash with: (mp:process-run-function nil #'(lambda () (catch 'x (eval '(throw 'x 10))))) ?

3:36:48 stassats no

3:36:57 stassats that was a misdirection

3:37:41 drmeister (mp:process-run-function nil (lambda () (load "quicklisp/setup.lisp"))) then

3:40:14 drmeister I can get this to crash in aclasp: (mp:process-run-function nil #'(lambda () (catch 'x (eval '(throw 'x 10)))))

3:41:21 stassats i can get (loop repeat 100 do (mp:process-run-function nil #'(lambda ()))) to crash

3:41:54 drmeister Yeah - I was seeing that on OS X as well - more than 40 threads or so had problems. I hoped that was a separate issue.

3:41:55 Bike is the eval required?

3:42:19 stassats it's to avoid optimization

3:42:32 Bike i know, but i don't think we do any.

3:51:56 drmeister When I do this in aclasp: (mp:process-run-function nil #'(lambda () (catch 'x (eval '(throw 'x 10)))))

3:52:08 drmeister it crashes. I run it under gdb ...

3:52:22 drmeister aw shoot - I forgot the signal thing... hang on.

3:52:40 stassats just dump a core

3:59:48 drmeister Ok - got it - What's up with the <signal handler called> ?

4:00:02 stassats a signal handler is being called

4:00:16 drmeister Ah - that illuminates things.

4:00:27 drmeister Reading MPS docs

4:00:28 stassats what more can be said about that?

4:01:08 stassats should be sigsegv

4:01:37 drmeister Well, MPS uses barriers and signals as part of its operation...

4:01:40 drmeister "You may need to make sure that the debugger isn’t entered on barrier(1) hits (because the MPS uses barriers to protect parts of memory, and barrier hits are common and expected)."

4:02:35 drmeister "On OS X, barrier hits do not use signals and so do not enter the debugger."

4:02:56 drmeister Clasp is currently the only system that uses MPS with multithreading.

4:03:29 drmeister Maybe there is an issue - or we have to accommodate MPS signals.

4:03:52 stassats why are you focusing on signals?

4:04:01 stassats you have a segfault, it's the same signal

4:05:08 stassats mps punts on it, not in its memory, should've printed something instead

4:05:39 stassats but now you'll have to inspect the context to see where it has faulted

4:06:59 drmeister I'm a bit slow and I don't know what's going on. What is the deal with segfault?

4:08:33 stassats so the address is 0x3e800003e8d

4:08:37 stassats not a zero or anything

4:12:14 stassats drmeister: are you registering thread memory regions correctly with mps?

4:12:57 drmeister I believe that I am ...

4:13:19 drmeister https://github.com/drmeister/clasp/blob/dev/src/core/mpPackage.cc#L117

4:14:18 drmeister start_thread is called when the thread is started and I create thread local allocation points, register the thread and register the thread stack.

4:16:41 stassats if it's a valid address, then it knows to protect the region, but not that it belongs to it

4:17:37 drmeister Where do you get the address: 0x3e800003e8d

4:17:52 stassats from the ucontext

4:18:04 drmeister This is the top of my stack:

4:18:07 drmeister https://www.irccloud.com/pastebin/XDjXWAGd/

4:20:23 stassats siginfo, rather

4:21:20 stassats p $_siginfo._sifields._sigfault.si_addr

4:22:07 drmeister (gdb) p $_siginfo._sifields._sigfault.si_addr -> $1 = (void *) 0x3e800005861

4:22:21 stassats that's close

4:23:19 drmeister x/8xg 0x3e800005860 --> 0x3e800005860: Cannot access memory at address 0x3e800005860

4:23:51 drmeister MPS puts hardware barriers on memory - when the program touches it it signals to the MPS that that memory needs to be fixed.
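
The general barrier mechanism drmeister describes (protect memory, take a fault on access, fix things up, resume) can be sketched in C++ on Linux as below. This is not MPS code: the names are hypothetical, a single page stands in for a protected region, and "fixing" the memory is reduced to unprotecting it. The handler also reads si_addr, the same field stassats inspects from gdb.

```cpp
#include <cassert>
#include <cstddef>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical description of the one protected page.
static char* g_page = nullptr;
static std::size_t g_page_size = 0;
static void* volatile g_fault_addr = nullptr;
static volatile sig_atomic_t g_barrier_hits = 0;

// SIGSEGV handler: if the fault is inside our page, treat it as a barrier hit,
// unprotect the page, and return so the faulting instruction is retried.
extern "C" void on_barrier_hit(int, siginfo_t* info, void*) {
    char* addr = static_cast<char*>(info->si_addr);
    if (g_page != nullptr && addr >= g_page && addr < g_page + g_page_size) {
        g_fault_addr = info->si_addr;
        g_barrier_hits = g_barrier_hits + 1;
        mprotect(g_page, g_page_size, PROT_READ | PROT_WRITE);
        return;
    }
    // Not our memory: restore the default action so the retried access crashes.
    signal(SIGSEGV, SIG_DFL);
}

struct BarrierResult {
    int hits;           // number of barrier hits observed
    long fault_offset;  // offset of the faulting address within the page
    int value_read;     // value the read finally produced
};

// Writes 42 at `offset`, protects the page, then reads the byte back through
// the barrier. `offset` must be smaller than the page size.
BarrierResult read_through_barrier(std::size_t offset) {
    g_page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* mem = mmap(nullptr, g_page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(mem != MAP_FAILED);
    g_page = static_cast<char*>(mem);
    g_page[offset] = 42;
    g_barrier_hits = 0;
    g_fault_addr = nullptr;

    struct sigaction sa{};
    sa.sa_sigaction = on_barrier_hit;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;  // needed to receive siginfo_t with si_addr
    sigaction(SIGSEGV, &sa, nullptr);

    mprotect(g_page, g_page_size, PROT_NONE);  // raise the barrier
    volatile char* p = g_page + offset;
    int value = *p;  // faults once; the handler lowers the barrier

    BarrierResult result{
        static_cast<int>(g_barrier_hits),
        static_cast<long>(static_cast<char*>(g_fault_addr) - g_page),
        value};
    munmap(g_page, g_page_size);
    g_page = nullptr;
    return result;
}
```

A fault whose address falls outside every region the handler knows about is the case stassats is pointing at: the collector declines it, and what remains is an ordinary segfault.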

4:23:57 drmeister That's why I'm going on about signals.

4:24:39 stassats yes, i'm familiar with some common gc techniques

4:24:49 stassats but this is more likely a misdirection

4:25:12 drmeister Ok - sorry - I don't want to teach my gramma to suck eggs.

4:25:26 stassats well, it could very well be all of the above

4:25:28 drmeister What do you mean by a misdirection?

4:26:02 drmeister Other than - we might be being led down the wrong track

4:26:19 drmeister What is: p $_siginfo._sifields._sigfault.si_addr ?

4:26:29 stassats the fault address

4:26:43 drmeister The memory address that was read and that caused the fault?

4:27:52 drmeister I don't see that in any of the registers in the frame above the <signal handler called> frame #2 -- although maybe I shouldn't expect to.

4:28:38 drmeister Does info registers give you the values of registers as they were in the frame that I'm currently looking at - or only the current values of the registers in the top frame?

4:29:37 drmeister They change when I change frames - shows you how familiar I am with gdb.

4:34:27 drmeister In my morning (7 hours from now) one of my friends at Ravenbrook will be up - I'll ask them about this. They might have some advice. The problem is easy to reproduce.

4:38:17 drmeister http://www.ravenbrook.com/project/mps/master/manual/html/topic/thread.html

4:38:47 drmeister There is some stuff in here that I'm not sure if we are doing (or not doing). frgo set up a lot of signal handling code for clasp.

4:44:48 beach Good morning everyone!

4:45:00 stassats drmeister: this may be an old not updated pointer

4:45:38 drmeister Oh - now - that's possible - if it's not in MPS managed memory.

4:46:54 drmeister It is a tagged pointer - how does it end up in $_siginfo._sifields._sigfault.si_addr and where is $_siginfo._sifields._sigfault.si_addr located in memory? Is it in kernel space (I'm guessing)?

4:47:34 stassats that's not really relevant

4:47:42 stassats it's just an address that's not there

4:48:24 drmeister So it's not a real address?

4:48:44 stassats who knows

4:50:01 drmeister MPS has this function: mps_bool_t mps_arena_has_addr(mps_arena_t arena, mps_addr_t addr)

4:50:17 drmeister I could use it to ask if the address is managed by MPS

4:50:24 drmeister Would that help?

4:51:22 stassats i doubt it

4:51:45 stassats it already decides to disavow it

4:53:03 drmeister I tried to run under gdb - but I don't get the same error.

4:54:30 drmeister I'm wiped out - I've been up for 18 hours working on this.

4:55:05 drmeister Unless you have some further insight I was going to ask my friend at Ravenbrook about it in the morning.

4:55:22 drmeister This works on OS X and it works in single threaded mode.

4:55:44 drmeister There's clearly something going on in Linux when we do non-trivial things in a child thread.

4:56:12 stassats how are your thread local bindings done?

4:56:43 drmeister I have one thread local pointer that points to a data structure at the top of each thread's stack.

4:57:18 drmeister With MPS I have a second thread local structure that contains half a dozen MPS allocation points.

4:57:40 drmeister The first thread-local data structure stored at the top of each stack is described here:

4:57:41 stassats looking at the actual faulting instruction and the memory it uses, it doesn't look like 0x3e800005860

4:57:56 drmeister https://github.com/drmeister/clasp/blob/dev/include/clasp/gctools/threadlocal.h#L7

4:58:39 stassats how do i quickly recompile just protli.c?

4:58:54 drmeister ./waf build_imps

4:59:18 drmeister All of the MPS C code is #include'd within Clasp in mygc.c

4:59:29 drmeister There is no separate library.

4:59:47 stassats that ran too quickly

5:00:00 stassats leading me to believe that it did nothing

5:00:05 drmeister Did it recompile mygc.c?

5:00:14 stassats no

5:00:28 drmeister Ok, then it isn't registered as a dependency.

5:00:41 stassats i only modified protli.c

5:00:45 drmeister Hang on - I'll find you the mygc.c bitcode product.

5:00:53 drmeister You can delete that and it should rebuild it.

5:01:18 stassats ./build/mps/src/gctools/mygc.c.4.o ?

5:01:26 drmeister Yup

5:01:50 stassats it only says [ 7/356] Compiling src/gctools/mygc.c, but i need to rebuild protli.c

5:02:34 drmeister My understanding is that mygc.c includes ALL of the MPS code transitively.

5:02:47 stassats let's see

5:02:48 drmeister You could use nm to check if the symbol for the function you are editing is in there.

5:03:06 stassats waiting for the linker

5:03:12 drmeister the mygc.c.4.o is a bitcode file. You could also llvm-dis it and look at the human readable .ll file.

5:03:34 stassats i'm not that kind of human to be able to read .ll files

5:04:16 drmeister What is the name of the function you are editing?

5:04:39 stassats it's a secret

5:04:41 drmeister sigHandle?

5:05:20 drmeister Information wants to be free.

5:05:34 stassats blimey, didn't export LIBRARY_PATH=/usr/local/lib/ relinking again

5:05:43 stassats drmeister: wrong information needs to be chained down

5:06:58 stassats so slow linking

5:07:22 drmeister https://github.com/Ravenbrook/mps/blob//code/mps.c#L33

5:07:57 drmeister mps.c is included in mygc.c and mps.c includes EVERYTHING in MPS

5:08:00 stassats 404

5:09:27 drmeister Huh

5:10:00 drmeister https://github.com/Ravenbrook/mps/blob/master/code/mps.c#L33

5:14:00 stassats doesn't look like it got recompiled

5:14:03 stassats oh well

5:15:44 drmeister Oh - wait - are you using cclasp-mps and building with ./waf build_imps?

5:16:16 stassats yeah