freenode/#clasp - IRC Chatlog

14:36:34 selwyn i haven't had time yet to look at the flamegraphs - could the slow backtraces have been responsible for the poor flamegraph output? how do they look now?

14:55:41 Bike drmeister: https://github.com/slime/slime/blob/master/swank/clasp.lisp#L710 do you remember why the swank/clasp locks are recursive? things seem to work fine with non recursive locks

15:15:49 Bike there's an entire other thing for working atomically that i didn't know about. AtomicTHolder. cool

15:15:58 Bike don't think i like it much, but still

15:28:43 Bike if i do (nth-value 1 (ignore-errors (funcall (block nil (lambda () (return)))))) clasp still dies aaaaagh

15:31:23 drmeister Hello

15:35:06 selwyn hi

15:53:29 Bike man i do not understand our unicode-support

15:53:41 Bike (code-char 161) doesn't crash or anything but it displays as #\UA1

15:54:10 drmeister Howdy

15:54:58 Bike wait, make-hash-table has a :debug keyword argument?

15:55:58 Bike this manual thing continues to be very educational

16:06:29 drmeister What manual thing?

16:08:32 Bike the manual i'm writing

16:09:15 drmeister Yess

16:09:17 drmeister !

16:09:35 Bike there's a section titled "C++ Interface" which I am going to leave blank

16:09:51 drmeister Give me structure I can fill in.

16:12:46 Bike drmeister: https://github.com/slime/slime/blob/master/swank/clasp.lisp#L710 do you remember why the swank/clasp locks are recursive? things seem to work fine with non recursive locks

16:19:57 drmeister I don't recall - I think since I didn't have good data at the time I just played it safe.

16:21:12 Bike I turned it off and everything seems to work fine.

16:21:15 drmeister I've got to tell you about our new compiler memory management scheme. We were leaking llvm compiler objects. We aren't leaking them now - but there is some danger that hasn't presented itself yet.

16:21:30 Bike I ask because while I'm doing this I'm normalizing some interfaces and removed the `:recursive` keyword.

16:21:37 Bike (in my branch)

16:21:38 drmeister Ok.

16:21:50 Bike so if i merge it will break slime, which is bad, unless we remove that from swank/clasp.lisp

16:22:01 drmeister I can do that.

16:22:17 drmeister Sure. Are you using :spawn in slime?

16:22:31 Bike yep

16:23:03 Bike i mean what i did is look at the other implementation's swanks, and they're not recursive locks

16:23:04 drmeister Ok then. Try to do things that might invoke a recursive lock.

16:23:24 drmeister Ok.

16:24:24 drmeister Hmm, I just realized our llvm C++ object cleanup scheme is more robust than I feared.

16:24:38 Bike that's good

16:25:16 drmeister Nothing can get collected until we run (gctools:thread-local-cleanup) and then only if the GC collected its wrapper.

16:27:04 drmeister On another note.

16:27:34 drmeister We have some quicklisp systems for which compile-file-parallel is pathologically slow. I'm trying to figure out which ones.

16:31:41 Bike we export a whole lot of symbols from EXT that don't seem to be anything

16:32:00 Bike also things like run-dsymutil that i'm skeptical of the use of for programmers. but i can just not put those in the manual for now

17:50:42 drmeister Bike: Are you in the lab?

17:50:48 drmeister Can you drop by my office?

17:50:53 Bike coming.

17:50:57 drmeister I'm trying to get something to work and I'm being thwarted

17:51:25 kpoeck Hello

17:51:30 drmeister Hi kpoeck

17:51:50 kpoeck in https://kpoeck.github.io/report.html are cl-bench results of different lisps/clasp-releases

17:52:31 kpoeck The one to the right is newest clasp as of this morning

17:53:21 kpoeck The two on the left are clasp as of 3rd of january, with the bignum gc fix and without

17:54:29 kpoeck ctak used to take 40 seconds for 900 calls. now takes like 7800

17:55:41 kpoeck FRPOLY/FIXNUM and FRPOLY/FLOAT are also much worse

17:56:06 kpoeck and CLOS/methodcalls

17:56:45 kpoeck Will try to bisect to find out where that started, will first go back to monday morning last week, since a lot was merged that day

18:01:16 kpoeck will distclean with commit b4c487592b91cd3ef1a24908320d893148dc8311

18:01:34 kpoeck and satiation activated

18:02:07 kpoeck since this takes a while, off for shopping

18:52:43 kpoeck Bike (progn (write-char (code-char 161)) (values)) seems to do the right thing

18:54:17 kpoeck or (progn (write-char (code-char 297)) (values))

18:59:07 Bike yeah, it's char-name that's messed up, i guess.

19:02:24 kpoeck I believe I did that part, what do you expect?

19:03:33 Bike (char-name (code-char 161)) => "UA1". sbcl gives "INVERTED_EXCLAMATION_MARK", which I'd say is better. Of course that would mean including a couple kilobytes of UnicodeData.txt or whatnot

19:04:19 Bike not like it's anything we need to fix rapidly, of course

19:04:41 kpoeck Yes, since I didn't have that table

19:05:44 kpoeck But could borrow from sbcl or other

19:08:01 Bike https://www.unicode.org/Public/UCD/latest/ucd/ it's part of the unicode standard

19:08:14 Bike i guess i could file an issue as a reminder, but we have bigger problems

19:10:17 kpoeck yup

19:53:47 Bike drmeister: https://pastebin.com/L3NgX0Ck manual sketch. Mostly just wanna see if it has all the extensions Clasp has - I can't think of more. You can throw it at https://markdownlivepreview.com/ or something to see it rendered

20:14:48 kpoeck Well written!

20:28:00 kpoeck Perhaps add profiling with flamegraphs? Debugging with (core:btcl), (core:safe-backtrace) & (CORE:BT-ARGUMENT <frame> <index>)

20:29:38 Bike yeah, the debugger section needs work, in that it needs to exist

20:29:54 Bike we ought to have an actual backtrace interface, like a backtrace object and navigating frames

20:31:46 kpoeck I was starting to implement Shinmera's dissect, but got distracted. He also has this nice overview table with capabilities of the different cl implementations

20:32:46 kpoeck https://shinmera.github.io/portability/

20:38:36 Bike yeah, shinmera has a lotta those, it's nice

20:39:32 kpoeck in revision b4c487592b91cd3ef1a24908320d893148dc8311 ctak is still fast, but CLOS/methodcalls either are really slow or hang

20:41:35 kpoeck just doing gc

20:42:27 Bike this says clasp is 100% for trivial backtrace. huh

20:42:50 Bike oh, but it doesn't have frame navigation or anything

20:42:59 Bike just the really trivial "print the whole backtrace"

20:46:32 kpoeck I also did impl-map-backtrace

20:49:29 kpoeck but dissect seems to have more functionality

20:50:15 Bike let's see, should do definitions too

20:50:29 Shinmera kpoeck: Oh, I was going to do dissect at some point myself, but I'd be very happy if I didn't have to! :)

20:52:11 kpoeck cl-bench terminated, so did not hang

20:56:02 kpoeck https://kpoeck.github.io/report.html updated

21:07:29 kpoeck Bike will you put your sketch in the wiki?

21:22:20 Bike i want to fill it out and use it as documentation

22:12:45 drmeister Bike Are you still on campus?

22:13:41 Bike yeah i'm in the lab

22:14:16 drmeister I've got this _Unwind_Find_FDE issue under a microscope - if you are still here for a few min - could you drop by and take a look at it with me?

22:14:28 Bike ok

22:14:30 drmeister There are a couple of frustrating features about debugging this on linux

22:15:01 drmeister Backtraces are truncated. flame graphs don't work. Functions I can figure out are all called LAMBDA

22:15:13 drmeister So close yet so far.

23:52:50 drmeister Bike: I can't get that -lc++ -stdlib=libc++ to link on linux.

23:53:18 drmeister I also wrote a C++ example program and looked at the speed of unwinding and it's nowhere near as bad as clasp.

23:53:28 Bike well yeah, we already tested that, no?

23:53:29 drmeister I wonder if we are doing something wrong with unwinding.

23:53:32 Bike with the classes versus having IDs.

0:02:52 drmeister Yeah that’s slower

0:03:10 drmeister But I’m wondering if we are doing even worse than that

0:15:29 drmeister There's no real difference between libstdc++ and libc++ in these tests.

0:15:31 drmeister https://www.irccloud.com/pastebin/Xp4dgLaJ/

0:15:34 drmeister No

0:16:01 drmeister https://www.irccloud.com/pastebin/ElbgYG7O/

0:16:03 drmeister That.

0:17:01 drmeister Maybe we should focus on doing what we can to fix this problem.

0:18:47 drmeister Bike: You understand the unwinding stuff better than I do - do you have an idea for how to fix this?

0:19:36 Bike doesn't this just indicate that whatever unwind.cc is isn't doing unwinding in the same way clasp is?

0:20:01 drmeister Could you restate that?

0:20:32 drmeister https://www.irccloud.com/pastebin/Y0fL1RBA/

0:20:57 drmeister https://www.irccloud.com/pastebin/PQnyJONK/

0:21:20 Bike well, i mean, there's still some difference between mac and linux, no?

0:21:24 Bike with clasp.

0:21:38 drmeister Yes - there is a difference between mac and linux.

0:22:01 drmeister There doesn't appear to be a difference between libstdc++ and libc++ with this demo - not a significant one.

0:22:24 Bike right. so i'm saying it seems likely to me that this means the demo isn't an accurate reflection of clasp.

0:23:11 drmeister Yes - I'm with you there. That's where my vague "But I’m wondering if we are doing even worse than that" came from.

0:23:11 Bike unless there's some other reason for there to be this gulf between linux and mac performance

0:23:29 drmeister It seems to be something to do with unwinding.

0:23:34 Bike like mac performance seems okay to me. it's not amazing or anything but there hasn't been this bottleneck, yeah?

0:23:43 drmeister Not like this - no.

0:23:46 drmeister Mac is better.

0:24:09 Bike so clasp's approach _can_, at least, work out okay.

0:24:22 drmeister I can't link libc++ with clasp on linux - it throws up all kinds of missing symbols - I'm not sure why.

0:24:40 drmeister Right.

0:25:42 Bike and like, i've been talking about writing our own unwinder - and i still think that has some advantages - but _Unwind_Find_FDE, the big offender, is called by _Unwind_RaiseException, isn't it?

0:25:55 Bike _Unwind_RaiseException is exactly the underlying function our unwinder would still be calling

0:26:05 Bike it's part of the interface (_Unwind_Find_FDE is not)

0:26:40 drmeister Would we rewrite Unwind_RaiseException?

0:26:48 Bike no, that is part of the itanium interface.

0:26:57 drmeister Ok.

0:27:05 drmeister Is it part of libstdc++?

0:27:21 drmeister It sounds like No.

0:27:23 Bike i'm not actually sure what library the unwind interface is in. but i think it is, yes.

0:27:36 Bike let me look a bit

0:29:48 drmeister It's not

0:29:56 drmeister https://www.irccloud.com/pastebin/73euYjRZ/

0:30:26 Bike um.... isn't it right there?

0:30:42 drmeister It's "U" and not "T"

0:30:42 Bike https://github.com/gcc-mirror/gcc/blob/master/libgcc/unwind-seh.c#L328-L330 and here's the source

0:30:48 Bike i don't know how to read nm

0:31:11 drmeister U _Unwind_RaiseException

0:31:20 Bike i don't know what "U" means

0:31:34 drmeister That means undefined. "T" would mean external and defined

0:31:58 Bike ok so it's in libgcc, not libstdc++.

0:32:16 drmeister Ok.

0:32:38 drmeister So libgcc we can't replace. We are stuck with that on linux - right?

0:32:43 Bike no clue.

0:33:11 Bike i mean, i don't want to anyway, though.

0:33:24 drmeister https://gcc.gnu.org/onlinedocs/gccint/Exception-handling-routines.html#Exception-handling-routines

0:33:40 Bike yeah, that's itanium level two, or whatever.

0:33:58 drmeister Right - but any stack unwinding has to call these - are they the reason for the slowdown?

0:34:18 Bike well if _Unwind_Find_FDE is such a bottleneck, yes.

0:36:24 Bike there are two problems with unwinding. one is general slowness, independent of OS or implementation. that is what i have in mind when i talk about writing an unwinder.

0:36:54 Bike then there's this Mac/Linux thing, which is apparently due to Linux's implementation having this stupid linear search in it. Writing our own unwinder wouldn't help with that, probably.

0:37:09 Bike except insofar as it might improve things on both systems.

0:39:59 drmeister Ok

0:40:12 drmeister I don't understand what is going on now. Check this out...

0:40:13 drmeister (defun foo (num) (dotimes (i num) (block hhh (funcall (lambda () (return-from hhh))))))

0:40:18 drmeister Compile that on linux and mac

0:40:21 drmeister On mac...

0:40:49 drmeister (time (foo 10000)) -> 3.2 sec

0:40:53 drmeister On linux...

0:41:33 drmeister (time (foo 10000)) -> 0.24sec

0:41:48 Bike that will be inlined. there's no actual function boundary

0:41:56 Bike no idea why there's a difference

0:44:01 drmeister It's consing as well - on both the same amount

0:44:01 drmeister Time real(0.245 secs) run(0.245 secs) consed(560000 bytes)

0:44:24 Bike that's probably not unwinding related... sucks, though

0:50:16 drmeister If it's consing then there is a mutex involved.

0:50:41 Bike ...you think?

0:50:51 drmeister The GC has a mutex.

0:53:58 drmeister It's not inlined.

0:58:01 drmeister https://www.irccloud.com/pastebin/ced2plNa/

0:58:35 drmeister (defun foo (num) (dotimes (i num) (block hhh (funcall (lambda () (declare (core:lambda-name inner-foo)) (return-from hhh))))))

0:59:46 drmeister Is that surprising to you?

1:03:23 Bike that is pretty damn surprising, yes

1:03:46 drmeister Oh - ok.

1:04:03 Bike i might have to look at that. ugh.

1:04:04 drmeister So run with that - can you check that tomorrow?

1:04:18 Bike i mean that's stupid but it's nothing to do with unwind performance.

1:04:36 drmeister But on the other hand - this is testing unwind performance - right?

1:04:53 Bike it shouldn't be

1:04:59 Bike so i don't think it's a good example to use

1:05:20 drmeister What would be a good example?

1:05:58 Bike put in a special variable binding. that's realistic.

1:06:04 drmeister Got it.

1:06:33 drmeister Question though - the consing - are we allocating a closure on the heap or on the stack in this case?

1:07:20 Bike heap probably.

1:08:27 drmeister Here's a thought - we are doing a lot of unwinding. Unwinding is allocating closures on the heap. Allocating on the heap involves a mutex in the GC.

1:09:09 drmeister Mutexes on linux appear more expensive than on macOS.

1:09:19 Bike sorry, unwinding is allocating closures? what?

1:09:54 drmeister Unwinding requires a closure to pass the closed over handle for the unwind target?

1:10:28 Bike ok like, this is all deep and confusing stuff and we need to be exact. when you say "unwinding" i think like, what happens, at runtime, when you hit a return-from

1:10:34 Bike that is not what you're talking about.

1:11:37 drmeister Yes - I need to be more careful.

1:11:46 drmeister Thinking...

1:12:08 Bike anyway, although the underlying "unwind tag" has dynamic extent, the closure itself can still have unlimited extent, so we can't stack allocate it.

1:12:21 Bike we could stack allocate the particular cell for the "unwind tag", but we shouldn't even have a cell anyway really

1:12:33 Bike but also the unwind tag itself might be a heap allocated cons. i don't remember

1:12:52 Bike i think it was at one point and now it's not, actually

1:44:58 drmeister I'm using this code now

1:45:00 drmeister https://www.irccloud.com/pastebin/nDWlPzom/

1:45:34 drmeister On macOS (time (foo 10000)) -> Time real(3.058 secs) run(3.057 secs) consed(0 bytes)

1:45:46 drmeister So zero bytes consed now

1:46:31 drmeister On linux it's much faster

1:46:32 drmeister Time real(0.257 secs) run(0.257 secs) consed(0 bytes)

1:46:37 drmeister More than 10x faster

1:46:53 drmeister This is hyper confusing

1:49:12 drmeister On linux I run this:

1:49:15 drmeister (time (foo 100000000))

1:49:24 drmeister perf top -p <pid>

1:49:48 drmeister 85.30% libgcc_s.so.1 [.] _Unwind_Find_FDE