freenode/#clasp - IRC Chatlog
18:21:13
drmeister
Yes - the extension modules give you something in addition - they let you define GC managed C++ classes.
18:28:24
drmeister
The way I see it is an asynchronous signal handler should get the SIGINT signal and should set a flag that some thread (which?) will occasionally check and then unwind the stack.
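A minimal sketch of the flag-based scheme described above, not Clasp's actual implementation. The names `g_interrupt_pending`, `poll_interrupt`, `Interrupted` and `run_until_interrupted` are hypothetical, and the demo raises SIGINT itself in place of a real Ctrl-C. The handler only sets a flag; the unwind happens later, from ordinary code at a point chosen to be safe.

```cpp
#include <cassert>
#include <csignal>

// Written by the handler, read by normal code. volatile std::sig_atomic_t is
// the type the standard allows a signal handler to write safely.
static volatile std::sig_atomic_t g_interrupt_pending = 0;

// The handler does nothing except record that SIGINT arrived.
extern "C" void on_sigint(int) { g_interrupt_pending = 1; }

// Hypothetical exception type used to unwind from a safe point.
struct Interrupted {};

// Hypothetical safe point: call it only where unwinding is known to be safe.
inline void poll_interrupt() {
  if (g_interrupt_pending) {
    g_interrupt_pending = 0;
    throw Interrupted{};
  }
}

// Demo: loops until interrupted and returns the number of iterations run.
int run_until_interrupted() {
  auto previous = std::signal(SIGINT, on_sigint);
  assert(previous != SIG_ERR);
  int iterations = 0;
  try {
    for (;;) {
      ++iterations;
      if (iterations == 3) std::raise(SIGINT);  // stands in for Ctrl-C
      poll_interrupt();                         // check at the bottom of the loop
    }
  } catch (const Interrupted&) {
    // The stack was unwound by ordinary C++ exception handling,
    // not from inside the signal handler.
  }
  std::signal(SIGINT, previous);  // restore whatever was installed before
  return iterations;
}
```

Calling `run_until_interrupted()` from `main` exercises the whole path. Which thread does the polling is still the open question raised in the message above.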
18:33:17
frgo
E.g.: Do we want to stop clasp on sigkill? Certainly. But in a controlled way. Now: What is the "controlled way"? Send sigkill to all threads, meaning: calling a function per thread that kills the thread? That would be one way to do it.
18:34:13
drmeister
But the underlying mechanism - how does a thread recognize that it got a sigkill?
18:36:11
frgo
we need to have a function registered in the thread struct that gets called when the process' signal handler is invoked. So:
18:37:20
frgo
Signal -> Caught by signal handler in process -> That signal handler does for all threads: Check if function (generally or a separate one for each signal)
18:38:01
frgo
The called function may be set to something thread specific - but normally would just exit the thread.
18:40:21
frgo
For applications that really need to shut down gracefully no matter what signal is received, it is required to have an alternate stack in place (via the function sigaltstack) that is used for signal handler execution.
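A rough sketch of the `sigaltstack` setup mentioned above, assuming a POSIX system. The 64 KiB stack size, the choice of SIGUSR1 and the names `on_usr1` and `handler_runs_on_alternate_stack` are illustrative assumptions. The handler checks whether one of its own locals lies inside the alternate stack, which shows that `SA_ONSTACK` took effect.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <signal.h>

// Base address of the alternate stack, published before the handler can run.
static std::atomic<std::uintptr_t> g_alt_base{0};
static constexpr std::size_t kAltStackSize = 64 * 1024;  // assumed size
static volatile sig_atomic_t g_on_alt_stack = 0;

// Records whether the handler's own frame lies inside the alternate stack.
extern "C" void on_usr1(int) {
  char probe;
  auto here = reinterpret_cast<std::uintptr_t>(&probe);
  auto base = g_alt_base.load();
  g_on_alt_stack = (here >= base && here < base + kAltStackSize) ? 1 : 0;
}

bool handler_runs_on_alternate_stack() {
  // Left allocated on purpose: the stack stays installed for this thread.
  void* memory = std::malloc(kAltStackSize);
  assert(memory != nullptr);
  g_alt_base.store(reinterpret_cast<std::uintptr_t>(memory));

  stack_t alt{};
  alt.ss_sp = memory;
  alt.ss_size = kAltStackSize;
  alt.ss_flags = 0;
  if (sigaltstack(&alt, nullptr) != 0) return false;

  struct sigaction action{};
  action.sa_handler = on_usr1;
  sigemptyset(&action.sa_mask);
  action.sa_flags = SA_ONSTACK;  // ask for delivery on the alternate stack
  if (sigaction(SIGUSR1, &action, nullptr) != 0) return false;

  raise(SIGUSR1);  // the handler runs before raise() returns
  return g_on_alt_stack == 1;
}
```

The alternate stack is a per-thread setting, so in a multi-threaded program each thread that should handle signals this way would need its own call to `sigaltstack`.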
18:42:46
frgo
Yes. That is the question. For C++ only that's magic (for me) - for clasp: I don't know.
18:44:41
Bike
if signals are process level they're hard to use. say we get a floating point exception, how could we know what thread triggered it?
18:44:44
drmeister
The only way I know how is for C++ to occasionally check a flag (global or thread local) and proceed based on the results.
18:45:52
drmeister
Bike: Martin Cracauer brought that up - he said we shouldn't involve signals at all - they are too expensive, and bring problems like the one you are raising - they are the wrong mechanism.
18:45:55
frgo
Well, a thread that causes a non-recoverable error will be in a defunct state that we can check for.
18:47:42
drmeister
The only way I see is for every thread to occasionally check a flag to see if it should check further for directions.
18:49:01
frgo
I've never done it, but: I would like to test if we can set up signal handlers per thread. Hang on ...
18:50:27
drmeister
Even if you can - they are fundamentally asynchronous and the stack will be in a dirty state and shouldn't be touched, let alone unwound.
18:52:42
drmeister
Unless someone has a better idea or understanding - I would say we have a thread-local variable that each thread checks as it leaves a function or at the bottom of loops (and could be generated or not using a (DECLARE (OPTIMIZE ...))) and take the hit in performance.
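One way the thread-local check could look, as a sketch under assumptions rather than what Clasp generates. `ThreadState`, `t_state`, `poll_for_interrupt` and `ThreadInterrupted` are hypothetical names. Each thread publishes a pointer to its own state, another thread sets the flag, and the worker notices it at the bottom of its loop and unwinds with an ordinary exception.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical per-thread state that other threads are allowed to write to.
struct ThreadState {
  std::atomic<bool> interrupt_requested{false};
};

// Each thread points this at its own ThreadState when it starts.
thread_local ThreadState* t_state = nullptr;

// Hypothetical exception type used to unwind the interrupted thread.
struct ThreadInterrupted {};

// The check a compiler could emit at loop bottoms or function exits.
inline void poll_for_interrupt() {
  if (t_state && t_state->interrupt_requested.load(std::memory_order_relaxed))
    throw ThreadInterrupted{};
}

// Demo: starts a looping worker, asks it to stop, reports whether it unwound.
bool worker_unwinds_when_asked() {
  ThreadState state;
  std::atomic<bool> started{false};
  bool unwound = false;

  std::thread worker([&] {
    t_state = &state;
    try {
      for (;;) {
        started.store(true);
        std::this_thread::yield();  // stands in for real work
        poll_for_interrupt();       // bottom-of-loop check
      }
    } catch (const ThreadInterrupted&) {
      unwound = true;  // reached through normal stack unwinding
    }
  });

  while (!started.load()) std::this_thread::yield();
  state.interrupt_requested.store(true);  // request from another thread
  worker.join();
  assert(started.load());
  return unwound;
}
```

The cost per check is one thread-local load plus one relaxed atomic load, which is the performance hit the message above refers to.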
18:55:17
frgo
So we have pthread_sigqueue to send a signal to a specific thread. will read up on that again ...
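A small sketch of directing a signal at one specific thread. It uses POSIX `pthread_kill` in place of the GNU-specific `pthread_sigqueue` mentioned above; that substitution, the choice of SIGUSR1 and the names `on_usr1_targeted`, `worker_main` and `signal_reaches_target_thread` are assumptions made for illustration. The handler records which thread it ran on. `pthread_equal` is not on the formal async-signal-safe list, so treat its use in the handler as a demo shortcut.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <sched.h>
#include <signal.h>

static pthread_t g_worker;          // set by pthread_create before the signal is sent
static std::atomic<int> g_seen{0};  // 0 = not yet, 1 = ran on worker, 2 = ran elsewhere

// The disposition is process-wide, but delivery targets a single thread.
extern "C" void on_usr1_targeted(int) {
  g_seen.store(pthread_equal(pthread_self(), g_worker) ? 1 : 2);
}

// The worker just waits until the handler has run somewhere.
extern "C" void* worker_main(void*) {
  while (g_seen.load() == 0) sched_yield();
  return nullptr;
}

bool signal_reaches_target_thread() {
  struct sigaction action{};
  action.sa_handler = on_usr1_targeted;
  sigemptyset(&action.sa_mask);
  if (sigaction(SIGUSR1, &action, nullptr) != 0) return false;

  if (pthread_create(&g_worker, nullptr, worker_main, nullptr) != 0) return false;
  int rc = pthread_kill(g_worker, SIGUSR1);  // send to the worker only
  assert(rc == 0);
  pthread_join(g_worker, nullptr);
  return g_seen.load() == 1;
}
```

This only shows delivery to a chosen thread. As the discussion notes, the handler still runs asynchronously, so it should do no more than record a flag.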
19:02:39
drmeister
Hmmph - I just figured out what I broke this morning - I moved GC managed pointers out of GC managed memory.
19:07:12
drmeister
I managed it in a special way - I defined a ThreadLocalState struct that contained all of the thread local data structures directly.
19:08:01
drmeister
I allocated that at the top of the stack. With it on the stack it is managed, its roots are identified by the conservative stack scanner.
19:09:40
drmeister
I ran into an issue with header file order now that I want to define the allocation points within the ThreadLocalState structure - so I defined ThreadLocalState to contain pointers to some of the thread local data structures - BRRRRAAAAPPPP! Wrong - don't do that.
19:12:11
drmeister
As long as the thread is live the stack is active and the thread local roots are roots.
19:14:23
drmeister
Bike: By the way - David Lovemore (Ravenbrook) says (re: Saving and restoring MPS memory) "It's not possible at the moment. But it ought to be possible."
19:46:25
jackdaniel
just as I thought, watching this movie was a great rest for the brain – I had enough visual effects to keep it at some low futile gear yet it has too little meaning to engage higher parts
20:59:47
drmeister
Bike: Is there a way to identify the bottom of loops - to insert a check for notifications from the system to threads?
21:00:47
Bike
we could detect obvious ones. that would cover most of it but obviously we can't identify all possibly nonterminating regions
21:01:46
drmeister
Could we insert an invocation of an inlined test controlled by a (declare (optimize ...)) policy?
0:48:10
drmeister
It's a little slow - I'm not sure why - it took 1h45m on my system here when a couple of days ago it was 1h30m
0:51:01
Bike
for the not found, you have to set the variable in wscript.config to point to the externals config llvm-config.
0:52:14
drmeister
stassats: Put this in your wscript.config: LLVM_CONFIG_BINARY = '/Users/meister/Development/externals-clasp/build/release/bin/llvm-config'
0:52:43
stassats
this is what make says Checking for program 'llvm-config' : /usr/lib/llvm-5.0/bin/llvm-config
1:02:22
stassats
and please, don't remove makefile altogether, i wouldn't remember the right build incantation
1:07:20
Bike
yeah, looks like some arm cpus have a "performance monitoring unit" that has a few counters
1:16:30
stassats
export LIBRARY_PATH=/usr/local/lib/ works (and why do i have it in /usr/local? because i had to build it by hand for clasp to get linked by clang)
1:16:31
drmeister
Bike: There's a guy at Temple who could help us set up some virtual servers to do continuous integration.
1:17:38
drmeister
stassats: You had to build boost recently by hand? I've been using package managers to install boost for a couple of months.
1:19:46
frgo
-> missing file: '/opt/common-lisp/lang/clasp/src/clasp/build/mps/src/gctools/interrupt.sif'
1:22:15
drmeister
interrupt.sif should have been built by the scraper - I don't understand how it could be missing. It's automatic.
1:25:21
drmeister
When it was building clasp - did it report something like: [ 19/356] Scraping with preproc.scan src/gctools/interrupt.cc
1:26:37
drmeister
It's using sbcl and running the preprocessor on each source file - I'm worried that it failed somehow for interrupt.cc
1:27:59
frgo
Yes, it did fail for interrupt.cc - see: https://gist.github.com/dg1sbg/5538a17e93b32543d3fa4bd33cc0569f#file-gistfile1-txt-L53
1:36:57
frgo
drmeister: For a later date: Signal handling the safe way - that's what we need to do in clasp: https://gist.github.com/dg1sbg/3b608bf1c880148cc06faffd641c8809
1:38:11
drmeister
stassats: It's a macro that generates a wrapper around a C++ function so that it compiles the lambda list/argument handling with Cleavir rather than using the C++ LambdaListHandler_O class.
1:49:26
frgo
To get rid of that FRGO... tag I need to assign a new one. How should we name the current state of affairs? Which version of clasp do we currently have? Like pre_0.5?
1:51:23
stassats
all uses of GENERATE-DIRECT-CALL-DEFUN use MAGIC-INTERN, maybe it should just be moved into generate-direct-call-defun?
2:01:42
drmeister
There's a chance that it might not use magic-intern - but they are all using it at the present time. I think I'll leave that this way.
2:49:09
drmeister
Ok, just to be clear - cclasp built, asdf built, when you start slime it compiles all of the slime code and then segfaults.
3:03:18
drmeister
I'm not aware of any OS X/Linux differences that could be tripping stuff up - but I was working with threading, MPS and thread local storage.
3:12:27
stassats
(mp:process-run-function nil (lambda () (load "quicklisp/setup.lisp"))) does it for me
3:19:02
stassats
there's core__catch_function, then there are some core__call_with_variable_bound, and then _Unwind_Find_FDE
3:24:09
drmeister
(mp:process-run-function nil #'(lambda () (block x (funcall #'(lambda () (return-from x nil))))))
3:26:04
drmeister
If you are using GDB on Linux or FreeBSD, run this command: handle SIGSEGV pass nostop noprint
3:27:44
drmeister
Bike: I haven't seen a problem on OS X - and I haven't used lldb on linux with MPS.
3:28:23
drmeister
whoman: That depends on what you want to do. Clasp is great for exposing C++ libraries.
3:30:26
stassats
#3 core::DynamicBinding::DynamicBinding (this=<optimized out>, sym=..., val=...) at ../../include/clasp/gctools/threadLocalStacks.h:27
3:34:55
drmeister
Once I reproduce the problem I have some tools to track it down. I can compile clasp with guards around every allocated object and I can symbolicate jitted frames - if gdb gives me return addresses and complete backtraces.
3:35:24
drmeister
Sure I wrote them - but they are disconnected stack frames - what does the entire backtrace look like?
3:36:38
drmeister
This is the crash with: (mp:process-run-function nil #'(lambda () (catch 'x (eval '(throw 'x 10))))) ?
3:40:14
drmeister
I can get this to crash in aclasp: (mp:process-run-function nil #'(lambda () (catch 'x (eval '(throw 'x 10)))))
3:41:21
stassats
i can get (loop repeat 100 do (mp:process-run-function nil #'(lambda ()))) to crash
3:41:54
drmeister
Yeah - I was seeing that on OS X as well - more than 40 threads or so had problems. I hoped that was a separate issue.
3:51:56
drmeister
When I do this in aclasp: (mp:process-run-function nil #'(lambda () (catch 'x (eval '(throw 'x 10)))))
4:01:40
drmeister
"You may need to make sure that the debugger isn’t entered on barrier(1) hits (because the MPS uses barriers to protect parts of memory, and barrier hits are common and expected)."
4:14:18
drmeister
start_thread is called when the thread is started and I create thread local allocation points, register the thread and register the thread stack.
4:16:41
stassats
if it's a valid address, then it knows to protect the region, but not that it belongs to it
4:23:19
drmeister
x/8xg 0x3e800005860 --> 0x3e800005860: Cannot access memory at address 0x3e800005860
4:23:51
drmeister
MPS puts hardware barriers on memory - when the program touches it, that signals to the MPS that the memory needs to be fixed.
4:27:52
drmeister
I don't see that in any of the registers in the frame above the <signal handler called> frame #2 -- although maybe I shouldn't expect to.
4:28:38
drmeister
Does info registers give you the values of registers as they were in the frame that I'm currently looking at - or only the current values of the registers in the top frame?
4:34:27
drmeister
In my morning (7 hours from now) one of my friends at Ravenbrook will be up - I'll ask them about this. They might have some advice. The problem is easy to reproduce.
4:38:47
drmeister
There is some stuff in here that I'm not sure if we are doing (or not doing). frgo set up a lot of signal handling code for clasp.
4:46:54
drmeister
It is a tagged pointer - how does it end up in $_siginfo._sifields._sigfault.si_addr and where is $_siginfo._sifields._sigfault.si_addr located in memory? Is it in kernel space (I'm guessing)?
4:50:01
drmeister
MPS has this function: mps_bool_t mps_arena_has_addr(mps_arena_t arena, mps_addr_t addr)
4:55:05
drmeister
Unless you have some further insight I was going to ask my friend at Ravenbrook about it in the morning.
4:55:44
drmeister
There's clearly something going on in Linux when we do non-trivial things in a child thread.
4:56:43
drmeister
I have one thread local pointer that points to a data structure at the top of each thread's stack.
4:57:18
drmeister
With MPS I have a second thread local structure that contains half a dozen MPS allocation points.
4:57:40
drmeister
The first thread-local data structure stored at the top of each stack is described here:
4:57:41
stassats
looking at the actual faulting instruction and the memory it uses, it doesn't look like 0x3e800005860
4:57:56
drmeister
https://github.com/drmeister/clasp/blob/dev/include/clasp/gctools/threadlocal.h#L7
5:02:48
drmeister
You could use nm to check if the symbol for the function you are editing is in there.
5:03:12
drmeister
the mygc.c.4.o is a bitcode file. You could also llvm-dis it and look at the human readable .ll file.