freenode/#clasp - IRC Chatlog
14:56:55
kpoeck
I hope I have fixed this annoying "Warning: compiled-function-file expected a function as argument " in the debugger
15:14:48
drmeister
kpoeck: So what was the conclusion of that benchmark - was all the slowdown eliminated by switching back to "object" mode?
15:15:28
drmeister
So that's like a 400x slowdown because of "large" code model vs "small" code model?
15:17:39
drmeister
(1) stack unwinding on linux has a mutex and that seriously slows down compile-file-parallel.
15:18:02
drmeister
(2) We need to use the "large" code model for compile-file-parallel - and that slows down everything.
15:19:16
drmeister
(3) POIU doesn't appear to work - blocking us from using it to do more than single core quicklisp compilation.
15:22:06
kpoeck
Generally clasp is much faster than it used to be, the ansi-tests now run in 40 minutes on my 8 year old machine and require much less memory
15:27:16
kpoeck
I have 54 benchmarks result for clasp from the last 20 months, perhaps I can make a graph showing the evolution
15:45:31
Bike
i talked with beach a bit about call with variable bound and stuff last night. he's working on the sicl lowering of the new instructions about now, but told me i should still hold off on using cleavir2
15:45:41
Bike
i mean, i'll still look at how things go with the new instructions, like we talked about
15:48:21
drmeister
Ok, thank you - I think there is a major breakthrough to be had there once we can get rid of call-with-variable-bound.
15:52:09
drmeister
(5) Our memory managers both have mutexes in their allocators. MPS currently worse than Boehm.
16:04:45
drmeister
I'm going to extract the ctak example into a single C++ file and create a github project to test libgcc and unix implementations.
16:08:48
Bike
mm, well, he kind of has a parallel stack for special bindings and stuff, whereas we mostly use the regular control stack
16:09:01
Bike
i think it ought to work out, though, it's the same difference as we have in the current iteration of unwinding
16:12:03
Bike
it's just another way to do it... i mean, we don't push and pop, but we still need to perform operations at the same times to keep everything coherent
16:12:21
Bike
one advantage is that the popping can be done explicitly instead of only when exiting the function
16:12:58
Bike
like, if we have (tagbody loop (multiple-value-prog1 (form) ... (go loop) ...)), we'll keep allocating more space for the multiple values, and not freeing anything until the function exits
16:14:34
drmeister
I see, because the point is to do more of this control flow within functions and not just at the level of functions.
16:42:47
beach
While it is currently a parallel "stack" it is a stack allocated as a list of dynamic-environment entries on the heap. But the ultimate idea is to allocate those entries on the ordinary control stack.
16:45:45
beach
I think it will be easier to debug things if I don't introduce yet another allocation mechanism at first.
16:54:43
drmeister
There was a weird post that said that Ubuntu 14 server did not show the terrible multithreaded performance I see on the latest debian.
16:55:56
drmeister
It's a small test that exercises the multithreaded unwinding performance that seems to be causing us problems.
17:31:04
drmeister
I was following up on a clue I read about in that blog post that Ubuntu 14 server did something special with stack unwinding/multithreading.
17:33:06
drmeister
"Interestingly, I didn't see any evidence of this behavior on Ubuntu 14.04 on x86."
17:34:53
Bike
we have to dig into all this libgcc shit because C++ is in the sweet spot of being low level but also having all these implicit libraries that do mysterious things
17:34:53
drmeister
I spun up an AWS EC2 instance to test it... "ec2_version:Ubuntu 14.04 LTS (Trusty Tahr)"
17:35:26
drmeister
What I'm getting out of this is a massive thorn up my @ss and a deepening suspicion that Unix was a huge mistake.
17:36:29
Bike
maybe i should look at libcxxabi more closely to see why it's _not_ taking locks, actually
17:36:39
drmeister
The variants of Linux and FreeBSD we tried in the last couple of hours suffer this problem.
17:37:15
Bike
it looks like linux is doing kind of what you were doing for backtraces... like, looking through all the loaded libraries to see where the return address is from
17:37:32
drmeister
That's an idea! I can't link libcxx and libcxxabi with clasp - but I might be able to link it with this ctak example.
17:38:54
drmeister
But this is read only data - right? Why does it need a mutex? Is it just for the perverse situation where you load a library while you are doing the unwinding.
17:40:31
drmeister
But if libcxxabi links with ctak - then I will redouble my efforts to get libcxxabi to link with clasp.
17:42:00
drmeister
I didn't get it the first time that housel mentioned that dl_iterate_phdr. Now I see what he was getting at.
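To illustrate the lookup being discussed: a minimal sketch of using dl_iterate_phdr to find which loaded object contains a given address, roughly the kind of walk an unwinder does for a return address. It assumes glibc (or another libc that provides <link.h>); the names Lookup, find_object and address_is_in_loaded_object are hypothetical and not taken from libgcc or clasp.

```cpp
#include <link.h>    // dl_iterate_phdr, dl_phdr_info, ElfW (glibc; assumption)
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical search state passed through the callback's void* argument.
struct Lookup {
    std::uintptr_t addr;
    bool found;
    std::string object_name;
};

// Called once per loaded object; a nonzero return stops the iteration.
static int find_object(dl_phdr_info* info, std::size_t /*size*/, void* data) {
    auto* lookup = static_cast<Lookup*>(data);
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type != PT_LOAD) continue;  // only mapped segments
        std::uintptr_t start = info->dlpi_addr + ph.p_vaddr;
        if (lookup->addr >= start && lookup->addr < start + ph.p_memsz) {
            lookup->found = true;
            lookup->object_name = info->dlpi_name ? info->dlpi_name : "";
            return 1;
        }
    }
    return 0;
}

// Returns true if addr falls inside a PT_LOAD segment of any loaded object.
bool address_is_in_loaded_object(const void* addr,
                                 std::string* name_out = nullptr) {
    Lookup lookup{reinterpret_cast<std::uintptr_t>(addr), false, {}};
    dl_iterate_phdr(find_object, &lookup);
    if (lookup.found && name_out) *name_out = lookup.object_name;
    return lookup.found;
}
```

Every throw that goes through a walk like this touches the loader's shared list of objects, which is where the chat suspects the contention comes from.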
17:48:27
Bike
i think there's some kind of special API for an object to register unwind info, but it's uncommon... maybe.
17:48:34
drmeister
You make the pointer to a linked list atomic - then you can add to it atomically with compare-and-swap and you can search it atomically. You can't remove things from it atomically.
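A minimal sketch of the list described above, assuming an append-only design: the head pointer is atomic, insertion uses compare-and-swap, and readers walk the list without a lock. Node and AppendOnlyList are hypothetical names for illustration, not libgcc's or clasp's data structures. Nodes are never freed here, which sidesteps the removal problem rather than solving it.

```cpp
#include <atomic>

// Hypothetical node type for illustration only.
struct Node {
    int key;
    Node* next;
};

class AppendOnlyList {
public:
    // Push a new node at the head with compare-and-swap.
    void push(int key) {
        Node* node = new Node{key, head_.load(std::memory_order_relaxed)};
        // On failure, compare_exchange_weak stores the current head into
        // node->next, so the loop simply retries with the fresh value.
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Readers traverse whatever snapshot of the list they observe; no lock.
    bool contains(int key) const {
        for (const Node* n = head_.load(std::memory_order_acquire);
             n != nullptr; n = n->next) {
            if (n->key == key) return true;
        }
        return false;
    }

private:
    // Nodes are intentionally leaked: reclaiming them safely while readers
    // may still be traversing needs extra machinery (hazard pointers,
    // epochs, or similar).
    std::atomic<Node*> head_{nullptr};
};
```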
17:49:02
Bike
https://github.com/gcc-mirror/gcc/blob/master/libgcc/unwind-dw2-fde.c#L52 it's just an int
17:50:04
drmeister
I appreciate you posting this libgcc stuff. I'm not quite ready to dig into its intricacies - but I'm getting there - and then I'll go back and read the links again.
17:51:08
Bike
probably one of those pieces of code maybe twenty people have ever looked at, but it's core to a basic C++ language construct
17:55:51
drmeister
We look at how the sausage is made and we see that it's made out of more sausage.
18:04:14
kpoeck
all my cl-bench data for clasp is now in excel in https://kpoeck.github.io/clasp-report.xlsx
18:15:11
drmeister
I built an Ubuntu 18.04 server machine and compiled ctak and linked it with libcxx and libcxxabi
18:18:07
drmeister
I'm going to add some options to ctak so I can run the ctak example for lots of cycles with lots of threads and watch ptrace.
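For readers who have not seen it, a rough sketch of what such a test could look like. It is a C++ rendering of the classic ctak benchmark (tak with every return expressed as a throw), run on a configurable number of threads. This is not the actual ctak project mentioned here; CtakResult, ctak, ctak_aux and run_threads are hypothetical names.

```cpp
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical exception type carrying the "thrown" result.
struct CtakResult { int value; };

[[noreturn]] void ctak_aux(int x, int y, int z);

// Equivalent of (catch 'ctak (ctak-aux x y z)).
int ctak(int x, int y, int z) {
    try {
        ctak_aux(x, y, z);
    } catch (const CtakResult& r) {
        return r.value;
    }
}

// Every exit from ctak_aux is a throw, so each call exercises the unwinder.
[[noreturn]] void ctak_aux(int x, int y, int z) {
    if (!(y < x)) throw CtakResult{z};
    ctak_aux(ctak(x - 1, y, z), ctak(y - 1, z, x), ctak(z - 1, x, y));
}

// Runs ctak(18, 12, 6) `iterations` times on each of `threads` threads and
// returns the elapsed wall-clock time in seconds.
double run_threads(int threads, int iterations) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([iterations] {
            for (int i = 0; i < iterations; ++i) ctak(18, 12, 6);
        });
    }
    for (auto& th : pool) th.join();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

If the unwinder serializes on a lock, the time from run_threads would be expected to grow with the thread count rather than stay roughly flat on a machine with enough cores.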
18:22:24
drmeister
Bike: here’s a crazy idea. What if we set up longjmp and when we want to unwind we check if there are any c++ personality functions between the start and stop frames of the unwind. If there are none we use longjmp. Would that buy us anything? I know c-w-v-b will block this a lot right now.
18:35:54
Bike
i don't think i really understand how setjmp/longjmp fit into things. there's a whole part of the itanium ABI for them but they're defined to allow destructors to not run...
18:40:14
Bike
also the whole problem is when we unwind we do often have intervening cleanups, like bound specials.
19:08:31
drmeister
I think setjmp/longjmp would only fit into things if there were no intervening cleanups between the frame we start at and the one we end up at.
19:10:03
drmeister
We would have a runtime cost for, say, the BLOCK to use setjmp, and then the RETURN-FROM would do a longjmp, but only if we knew there were no intervening cleanups.
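A minimal sketch of that idea in C++, assuming the frames being skipped hold no non-trivial destructors or other cleanups (otherwise longjmp past them is undefined behavior in C++). BlockFrame, return_from, search and find_index are hypothetical names standing in for BLOCK and RETURN-FROM; this is not how clasp implements them.

```cpp
#include <csetjmp>

// Hypothetical stand-in for a Lisp BLOCK: a jump target plus a result slot.
struct BlockFrame {
    std::jmp_buf target;
    // volatile so the value written before longjmp is reliably visible
    // after setjmp returns the second time.
    volatile int result = -1;
};

// Stand-in for RETURN-FROM. Only valid when no frame between here and the
// BLOCK has cleanups (destructors, unwind-protect, special bindings).
[[noreturn]] void return_from(BlockFrame& frame, int value) {
    frame.result = value;
    std::longjmp(frame.target, 1);
}

// A callee with nothing to clean up, so skipping its frame is harmless.
void search(BlockFrame& frame, const int* data, int n, int wanted) {
    for (int i = 0; i < n; ++i) {
        if (data[i] == wanted) return_from(frame, i);
    }
}

// Returns the index of `wanted` in data[0..n), or -1 if it is absent.
int find_index(const int* data, int n, int wanted) {
    BlockFrame frame;
    if (setjmp(frame.target) == 0) {
        search(frame, data, n, wanted);  // may longjmp back to the setjmp
    }
    return frame.result;
}
```

The setjmp call is paid on every entry to the block, whether or not a non-local exit happens, which is the runtime cost mentioned above.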
23:13:16
drmeister
I just compiled gcc for yuks and the ATOMIC_FDE_FAST_PATH path is active in this gcc.
23:43:30
Bike
mm... that's good. "you have to custom compile libgcc" would be a pretty absurd build instruction even by our standards