freenode/#clasp - IRC Chatlog
11:35:40
kpoeck
::notify drmeister Can you have a look at https://github.com/clasp-developers/clasp/pull/1012 It calls read-from-string 60,000 fewer times, so I hope it is faster
12:34:05
Colleen
drmeister: kpoeck said 58 minutes, 25 seconds ago: Can you have a look at https://github.com/clasp-developers/clasp/pull/1012 It calls read-from-string 60,000 fewer times, so I hope it is faster
12:34:35
drmeister
kpoeck: Thank you - I will implement that soon - but I want us to try and fix this in the compiler.
12:54:40
drmeister
Yesterday I had an issue that bordeaux-threads from quicklisp still refers to mp:lock.
12:55:11
drmeister
There was supposed to be a release of bordeaux-threads to quicklisp. I don't have any idea if or when that happens.
12:56:35
drmeister
2. waf REFUSES to generate output in one very frustrating situation. When I run waf in docker non-interactively - it just hangs when cando runs to build quicklisp code after install
12:57:29
drmeister
3. Clasp, when building generated-encodings.lsp within docker takes almost 700x (SEVEN HUNDRED TIMES) longer than on the exact same iMacPro running macOS.
12:58:09
drmeister
I'm trying to get a prototype of cando out in a docker container with jupyter widgets running so I can build a prototype of an application.
12:59:48
yitzi
drmeister: you are welcome. I've posted updated docker images to the hub. And updated the various branches
12:59:58
drmeister
2. I have spent the morning rerererererereading the waf documentation and then posting a question to the #waf IRC channel.
13:00:51
drmeister
3. Asked Bike and karlosz to develop a compiler optimization that will use setjmp/longjmp to unwind the stack in Common Lisp code that we can guarantee has no C++ code that could cause problems.
13:01:53
drmeister
kpoeck: I will try your new version. It's just that once I get past the 700x slowdown in the docker container I can ignore that problem for a while and hack the docker container so that I can get through the install.
13:02:39
drmeister
Also, I only discovered last night that it was a reproducible 700x slowdown. If that doesn't say FIX ME - I don't know what does.
13:02:46
kpoeck
that's why I proposed making the version that passes the problem to the runtime for the few users that use encodings
13:03:06
yitzi
drmeister: I have other info regarding cando on jupyter, but I can wait til you guys are done resolving this stuff.
13:03:47
drmeister
Nonononono. This problem has come up again and again and again - it's a problem that we have because we need to maintain compatibility with C++ from Common Lisp. I want to fix the problem.
13:05:02
drmeister
yitzi: cl-netcdf was missing and one needs to install lib-netcdf here... this is what I added...
13:05:35
drmeister
I also learned that I do see the output of ./waf install_cboehm but ONLY if I run: docker run -it <image-id>
13:07:56
drmeister
I may sound a little unhinged right now - I get that way - it's ok. It's part of how I get things done. I care intensely about getting cando working smoothly and we have put a LOT of work into it. This C++ stack unwinding problem keeps popping at the worst times, in the WORST ways. Up to 700x slowdown out of the blue with no freaking rhyme or reason.
13:09:02
drmeister
Well, there is a rhyme/reason. So called "zero-cost C++ exception handling" sucks and the effort put into optimizing it has been highly variable.
13:10:25
yitzi
I am writing a subclass of cl-jupyter kernel for cando. and I just got it to work in the docker!!! see above
13:12:10
drmeister
yitzi: Could you elaborate on what a "subclass of cl-jupyter kernel for cando" is? I thought that's what we had. (sorry to make that anticlimactic)
13:12:37
drmeister
You are running that from the command line - I've never seen that before. But I'm missing a finer point.
13:14:10
yitzi
Right now you are hooking into hooks that you guys added to cl-jupyter to do code evaluation and start the kernel in "cando-jupyter". In order to do that in my kernel you need to subclass "common-lisp-jupyter" and handle the appropriate methods.
13:14:50
yitzi
https://github.com/yitzchak/cando/blob/clj-migrate/src/lisp/cando-jupyter/kernel.lisp
13:15:49
drmeister
I see - in our approach with cl-jupyter the whole program was a jupyter kernel but we used crude hooks to achieve that.
13:16:06
drmeister
You have a jupyter kernel class and you subclassed it for cando. Yes - excellent.
13:16:34
drmeister
FYI: We have two kinds of kernels that we could generate - and one is more important than the other.
13:17:44
drmeister
We use what is called "leap script" and it lets you invoke Common Lisp code by putting it in parentheses.
13:18:27
yitzi
https://github.com/yitzchak/cando/blob/cb84c9da6901a814dd7861d1214c242c23035031/src/lisp/cando-jupyter/kernel.lisp#L52
13:18:54
yitzi
And yes that is what we do in maxima-jupyter, subclass the common-lisp-jupyter kernel.
13:19:00
drmeister
We do have a little bit of an issue with case. Leap script allowed mixed case and was case sensitive for variable names.
13:19:55
drmeister
I hope no one in the last 27 years defined variables like foo, Foo, fOO and so on - within a single script.
13:21:02
drmeister
yitzi: I want to retain that post_install thing. I know the expedient thing is to remove it - but the cando build system needs to run cando once after it is installed to build all the quicklisp code.
13:22:02
drmeister
When I run docker build - there is no output generated by the post_install command - are you experiencing that?
13:23:07
drmeister
I've run: docker run -it <image-id> several times now and then ./waf install_cboehm and I got it to build all the way through.
13:24:04
drmeister
It just occurred to me that we can force it to exit with a backtrace if it hangs.
13:24:21
drmeister
I'm going to change the wscript file and at least then it won't hang in the debugger.
13:25:06
yitzi
Ok, I'm going to go work on the cando kernel some more. Hopefully in an hour or so I'll have a working docker with cando and jupyter.
13:27:44
drmeister
Bike: yitzi's common-lisp-jupyter code uses 'ironclad' - it compiles pretty quickly on clasp now.
13:29:58
drmeister
Here's the evidence - although no timing. I'm building a docker image and it built ironclad in just the last few minutes.
13:32:26
drmeister
Sorry - in this docker container we install yitzi's jupyter code using pip3 and that invoked sbcl.
13:33:51
drmeister
I was just puzzled by two things but I glossed over them. 1. yitzi took out ironclad as a dependency of his common-lisp-jupyter code for clasp because clasp provides the hash generating facility that we would otherwise need from ironclad.
13:50:00
drmeister
yitzi: I've pushed a bunch of changes to the Dockerfile to my fork. Our two Dockerfiles now won't merge automatically.
13:55:42
drmeister
yitzi: We will have to do it by hand. I went through this because there are some things I wanted to make sure were in there.
14:04:01
drmeister
Bike: If we use setjmp/longjmp - we are going to have to do something with the map functions - correct?
14:04:23
drmeister
I am thinking 1. implement them in Common Lisp or 2. make sure they don't use RAII
14:06:16
Bike
it's okay for there to be intervening C++ frames, so long as they don't have catch blocks or nontrivial destructors that need to be run
14:16:20
Bike
maybe C++ has some kind of annotation so the compiler can check that. but that would probably be too convenient
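On the question of a compiler-checkable annotation: standard C++ has no attribute that marks a whole frame as safe to jump over, but `<type_traits>` can at least check at compile time that particular types have trivial destructors. A minimal sketch, assuming the locals of an intervening frame can be listed by hand; `PlainFrameData`, `OwningFrameData`, and `all_trivially_destructible` are hypothetical names for illustration and not part of Clasp:

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-ins for the locals of an intervening C++ frame.
struct PlainFrameData { int count; double total; };   // nothing to destroy
struct OwningFrameData { std::string name; };         // destructor frees memory

// True only if every listed type has a trivial destructor, so skipping
// the frame would not skip any cleanup for these types.
template <typename... Locals>
struct all_trivially_destructible;

template <>
struct all_trivially_destructible<> : std::true_type {};

template <typename First, typename... Rest>
struct all_trivially_destructible<First, Rest...>
    : std::integral_constant<
          bool,
          std::is_trivially_destructible<First>::value &&
              all_trivially_destructible<Rest...>::value> {};

static_assert(all_trivially_destructible<int, PlainFrameData>::value,
              "plain data can be jumped over");
static_assert(!all_trivially_destructible<int, OwningFrameData>::value,
              "a std::string member needs its destructor run");
```

This only covers the types that are listed. It does not detect catch blocks, and it does not see locals that were left out of the list, so it is a partial check at best.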
14:22:18
yitzi
drmeister: Sorry, but your email about the jupyter mockup and problems with radio buttons was in my spam. just saw it
14:22:34
drmeister
Do you think this will work? An optimization to detect when we can replace C++ exception handling with setjmp/longjmp?
14:23:02
drmeister
yitzi: No worries. You demonstrated radio buttons work in your latest docker image.
14:23:46
drmeister
yitzi: Take a look at my fork of cando-clj and I'll take a look at your Dockerfile - we can merge them.
14:24:11
yitzi
The issue is with the :options and :value keys. It's not part of the model spec. ipython simulates some of that stuff for select boxes. I've added some code to do that recently.
14:25:00
drmeister
Cando can start a swank server allowing an external slime to connect into it and modify code. So I build slime into the docker image.
14:25:05
yitzi
And yes, I am looking at your file. I think I have most of the stuff in mine with the exception of slime.
14:25:47
drmeister
So - what we did in the past is turn a docker image like this into a portable development environment that worked on macOS and Windows.
14:26:23
drmeister
You run the docker container and map in directories of source code from github overtop of the existing source directories.
14:26:41
drmeister
You also map the quicklisp cache directory from the host into the docker container.
14:27:28
drmeister
It lets you edit source code and commit it to github and generally do software development using everything built within the docker container.
14:28:03
drmeister
One problem we had though is that a couple of years ago when we did this mounted directories were significantly slower than internal directories.
14:28:33
drmeister
So compilation slowed way down and it was slow already. It sometimes took hours for the docker container/development system to get ready for development.
14:31:56
Bike
alright, i tried a very dumb test, with a simple and inefficient use of setjmp/longjmp, and the version with block/return-from runs 20 times slower
14:32:22
drmeister
I'll need to do some digging into the Dockerfile to see which of my assumptions were wrong.
14:32:46
drmeister
Bike: Yeah - in the docker image it's single threaded compilation of generated-encodings.lsp.
14:33:58
yitzi
drmeister: Also, you won't have a cando kernel till I finish with some stuff on my end. Currently building on my machine...
14:36:33
drmeister
kpoeck: Is 'enconding-strong-to-encoding-symbol' supposed to be 'encoding-string-to-encoding-symbol' ?
14:37:54
drmeister
Thank you. And thank you for making these changes. The "slow" one was valuable to push us to fix this problem and the "fast" one to get through the next couple of days.
14:39:24
drmeister
It's just on some machines it's slower - and not a few percent slower, 700x slower!
14:40:29
Bike
what we've learned is that C++ implementors do not consider using exceptions to be reasonable.
14:47:18
yitzi
drmeister: Is there a simple one-liner that I can use to test out the non-lisp cando syntax? Is it called leap?
14:50:22
Bike
well, __builtin_setjmp on clang has a different signature from the gcc one, on top of being undocumented
14:51:18
Bike
and i found a dev thread saying "It's not like this is a core language feature; it's completely acceptable to just not provide the builtins on certain targets that don't support it"
15:03:57
yitzi
The kernel crashes right afterward, which probably means the evaluation result isn't getting properly wrapped for the handoff to common-lisp-jupyter
15:04:58
yitzi
I stuck in a naive 'make-lisp-result' which probably isn't gonna work for your cando stuff.
15:06:28
yitzi
I pushed the docker image to yitzchak/cando-clj:nglview if you want to see it. No slime and a faulty cando kernel, but ... making progress.
16:23:04
Bike
i don't know. those are specifically listed as being for exception handling, and are generated internally rather than used in frontends. it might be unstable.
16:24:14
Bike
also when i tried to use the equivalent __builtin_setjmp it said it wanted a void**, whereas this says it wants an i64* in the text but an i8* in the signature
16:28:41
Bike
you can also see these jmp_bufs are heavier than the intrinsic ones, like they have a sigset_t, which i don't think we need
16:28:53
drmeister
attributes #3 = { nounwind returns_twice "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false"
16:32:30
Bike
setjmp is a macro and its return value can't be used normally, so calling out to a C++ function would be annoying
16:32:47
Bike
what i did in my test code is define a C++ function that takes two thunks, but that won't work for catch-instruction, really
16:33:05
Bike
i guess it could return a value and then code branches on the value. that kind of sucks though.
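The two-thunk arrangement described here can be sketched in portable C++ with `<csetjmp>`. This is not Bike's test code: `JumpPoint`, `call_with_setjmp`, `jump_to`, and `any_nonzero` are hypothetical names, and plain function pointers with a `void*` context are used so that no frame between the `setjmp` and the `longjmp` owns anything with a nontrivial destructor.

```cpp
#include <csetjmp>

// Hypothetical wrapper around a jmp_buf so it can be passed by pointer.
struct JumpPoint { std::jmp_buf buf; };

using Body = int (*)(JumpPoint*, void*);   // first thunk: normal path
using Handler = int (*)(void*);            // second thunk: runs after a jump

// Runs body; if body jumps to the point, runs on_jump instead.
// setjmp appears only as the operand of a comparison in an if condition,
// which is one of the few contexts where it is allowed.
int call_with_setjmp(Body body, Handler on_jump, void* ctx) {
    JumpPoint point;
    if (setjmp(point.buf) == 0) {
        return body(&point, ctx);
    }
    return on_jump(ctx);
}

[[noreturn]] void jump_to(JumpPoint* point) { std::longjmp(point->buf, 1); }

// Usage: leave a loop early as soon as a nonzero element is seen.
struct Search { const int* data; int size; };

int scan(JumpPoint* point, void* ctx) {
    Search* s = static_cast<Search*>(ctx);
    for (int i = 0; i < s->size; ++i) {
        if (s->data[i] != 0) jump_to(point);   // nonlocal exit
    }
    return 0;                                  // nothing found
}

int found(void*) { return 1; }

bool any_nonzero(const int* data, int size) {
    Search s{data, size};
    return call_with_setjmp(scan, found, &s) != 0;
}
```

Because the wrapper hides `setjmp` behind an ordinary call, the caller only sees a returned value and branches on it, which is the awkward part mentioned above.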
16:33:52
drmeister
kpoeck: I incorporated your change and docker is sitting compiling generated-encodings.lsp right now.
16:36:38
kpoeck
--35357(out)--> Writing :OBJECT kernel fasl file to: #P"/home/cracauer/work/clasp/deploy/02-build-cando/build/clasp/build/boehm/fasl/cclasp-boehm-bitcode/src/lisp/kernel/lsp/generated-encodings.fasl" --35357(out)--> Time run(21.027 secs) consed(483162960 bytes)
17:28:17
cracauer
That file is odd. I have one machine taking 26 seconds and another taking 2 seconds.
17:33:23
drmeister
cracauer: I'll call you to fill you in. We had a file that compiled 700x slower on one machine than another.
18:16:49
drmeister
I incorporated kpoeck's fix for faster compilation of that one file and I'm trying another fix for the no output in post_install.
18:19:13
yitzi
Still doing a local user install of cando dependencies since I don't want to wait for a complete clasp rebuild right now.
18:22:10
kpoeck
optimized define-unicode-tables.lisp (https://github.com/clasp-developers/clasp/pull/1013)
18:32:24
drmeister
kpoeck: Could you write an example of the previous code that will illustrate the problem?
19:32:25
karlosz
Bike: you suggested making the cleavir analysis weaker with respect to special bindings and stuff like that, right? because C++ unwind needs to take care of them. couldn't we reuse the same machinery that local unwind had?
19:33:08
karlosz
2. Could we maybe handle the mapc stuff and more by just testing the function for membership in the CL package? i suspect there won't be many ordinary functions in the CL package relying on C++ unwinding
19:33:41
Bike
C++ unwind doesn't take care of them, the compiler generates the code for it. it's just that that code is meant to run in the stack frame doing the binding; for example the old value of the variable is just a datum in the frame
19:33:47
karlosz
the problem with marking any kind of user function would be redefinition: you can't guarantee that the user won't put in some C++ function that requires unwinding
19:35:06
karlosz
Bike: i see. so i guess we'd have to "manually" unwind the stack and do every cleanup action in frame if that's the case
19:35:43
Bike
in other news, if i do map-ast-etc without consing, asdf with the ast-interpreter is now only slightly slower than using the compiler
19:36:08
Bike
without consing -> er, i mean without cleavir-ast:children and its consing, it still does the hash table thing
19:37:25
karlosz
cool. so just a bit more before it's hopefully faster. still wondrous to me that the hash tables are doing it
19:37:42
Bike
also on the topic of CL functions and unwinding, that might not actually be true. maphash takes a lock, for example
19:41:20
karlosz
ugh, doing specials binding with setjmp means we'd have to maintain a jmp buffer stack ourselves (sounding more and more like an ordinary machine code implementation)
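One conventional way to do that bookkeeping by hand is a binding stack whose height is recorded in each exit point and restored before the jump. The following is only a sketch of that general technique, not how Clasp or Cleavir handles special bindings; every name in it (`Binding`, `bind_special`, `ExitPoint`, `exit_to`, `run_demo`) is hypothetical, and the "special variable" is a plain global `int`.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstddef>

// One saved binding: which variable was rebound and its previous value.
struct Binding { int* variable; int old_value; };

constexpr std::size_t kMaxBindings = 64;
static Binding binding_stack[kMaxBindings];
static std::size_t binding_top = 0;

// Rebind a variable, remembering the old value on the binding stack.
void bind_special(int* variable, int new_value) {
    assert(binding_top < kMaxBindings);
    binding_stack[binding_top++] = {variable, *variable};
    *variable = new_value;
}

// Undo bindings, newest first, until the stack is back at the mark.
void unwind_bindings_to(std::size_t mark) {
    while (binding_top > mark) {
        --binding_top;
        *binding_stack[binding_top].variable = binding_stack[binding_top].old_value;
    }
}

// An exit point remembers where to jump and how tall the binding stack was.
struct ExitPoint { std::jmp_buf buf; std::size_t binding_mark; };

[[noreturn]] void exit_to(ExitPoint* exit_point) {
    unwind_bindings_to(exit_point->binding_mark);  // the manual cleanup step
    std::longjmp(exit_point->buf, 1);
}

int special_var = 1;

void inner(ExitPoint* exit_point) {
    bind_special(&special_var, 42);  // binding made in a deeper frame
    exit_to(exit_point);             // leave without returning normally
}

// Returns the value of special_var observed after the nonlocal exit.
int run_demo() {
    ExitPoint exit_point;
    exit_point.binding_mark = binding_top;  // recorded before setjmp
    if (setjmp(exit_point.buf) == 0) {
        inner(&exit_point);
        return -1;                          // not reached
    }
    return special_var;
}
```

Here `run_demo` gets the original value back because `exit_to` restores the binding before jumping. Other cleanup actions would need the same kind of explicit record, which is the cost being described.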
19:42:06
karlosz
and cleavir doesn't really emit instructions to maintain the stack at that level, maybe that only happens in MIR, idk how sicl handles that
19:42:58
Bike
which is why i want to start with the cases where there are no bindings, cos yeah, could be a mess
19:43:17
Bike
dunno if you saw, i did a really dumb prototype to see how setjmp/longjmp performs and it was like twenty times faster than exceptions.
19:48:11
Bike
the builtin operators take differently typed arguments from gcc, so i worry how that's going to work out
19:51:30
karlosz
oh okay. i guess i was confused by this: "see how setjmp/longjmp performs and it was like twenty times faster than exceptions."
20:13:41
Bike
which was (block nil (mapc (lambda (x) (when x (return t))) list) nil) versus (core:call-with-setjmp (lambda (p) (mapc (lambda (x) (when x (core:longjmp p))) list) nil) (lambda () t))
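For contrast with the core:call-with-setjmp form in the message above, the same early exit can be written in C++ with an exception, which per the discussion is the mechanism being compared against. A rough analogue only: `EarlyExit`, `for_each_int`, and `any_nonzero_with_exception` are hypothetical names, and this is not what Clasp generates for block/return-from.

```cpp
// Hypothetical tag type thrown to leave the loop early.
struct EarlyExit {};

// Stand-in for mapc: call fn on every element.
template <typename Fn>
void for_each_int(const int* data, int size, Fn fn) {
    for (int i = 0; i < size; ++i) {
        fn(data[i]);
    }
}

// Analogue of (block nil (mapc (lambda (x) (when x (return t))) list) nil):
// the throw unwinds through for_each_int back to the catch clause.
bool any_nonzero_with_exception(const int* data, int size) {
    try {
        for_each_int(data, size, [](int x) {
            if (x != 0) throw EarlyExit{};
        });
        return false;
    } catch (const EarlyExit&) {
        return true;
    }
}
```

Unlike a `longjmp`, the throw runs destructors and catch handlers in every frame it passes, which is what makes it safe across arbitrary C++ code.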
20:13:55
kpoeck
drmeister the code that run so slow on the buildbot is here: https://gist.github.com/kpoeck/5e9b64834283dccff701b4fd45272a27
20:28:02
Bike
eclector read-from-string uses signal internally. kpoeck is reducing the use of read-from-string, so less signaling occurs, so things are faster. right?
20:28:21
drmeister
In the docker container - where I expected it to take 700x longer it's 7.5 seconds
20:29:28
drmeister
This code should replicate the bad code using read-from-string - or did I misunderstand what kpoeck provided?
20:36:21
drmeister
I'm going to have to futz around with the previous commit to reproduce the problem.
20:36:43
drmeister
This is why I wasn't that excited about changing the code - I had a really good example of the problem there.
20:38:54
scymtym
drmeister: not in the sense of being merged into eclector master anytime soon, but it passes all important tests in eclector if i remember correctly
20:39:46
scymtym
in any case, READ-FROM-STRING should not be very different compared to READ unless WITH-INPUT-FROM-STRING does something crazy in Clasp
20:41:22
kpoeck
drmeister I wonder whether we need "fork" as in our build process, to make the example real slow
20:42:08
scymtym
yes, i'm saying that 70,000 READ calls should be just as (or maybe almost as) bad unless WITH-INPUT-FROM-STRING does something crazy
20:43:11
kpoeck
Still believe that the pattern is different when compiling: we read source code, do transformation, generate code, read again ...
20:45:36
drmeister
This was the commit with terrible performance in the docker container and the buildbot.
20:47:44
scymtym
kpoeck: the equivalent would be READing 70,000 toplevel forms. because READ-FROM-STRING is like a non-recursive READ call. that entails setting up circularity tracking and other pre- and post-processing
20:48:41
scymtym
kpoeck: i have plans to delay the circularity tracking setup until the first #N= is encountered. that should help with READ-FROM-STRING and non-recursive READ calls
20:55:30
drmeister
I don't know what the buildbot is. I hit the problem in docker and that was the worst case I've ever seen.
20:56:22
drmeister
kpoeck: You pointed this problem out a long time ago. One of the cl-bench examples runs 100x slower than expected.