freenode/#sbcl - IRC Chatlog

10:51:05 pfdietz I've wanted to be able to compile sbcl with coverage, so I could see what parts are not being tested. Also, random test generation + coverage => automatically generating minimized tests that extend coverage.

12:00:48 pkhuong pfdietz: the easiest way to get coverage everywhere might be editcore + HW/binary tracing like honggfuzz?

12:02:14 pfdietz I want something where the coverage is made available in a form the reducer can easily access. It would also help if the a snapshot of the coverage could be taken and the state rolled back to the snapshot.

12:03:08 pfdietz So: generate a test case, determine that it extends coverage, then reduce it to a minimal form that still extends coverage (and repeat that if the original had more coverage than the reduced form).

12:03:28 pkhuong do you care about human readability

12:03:46 pkhuong or do you just want unique branches / edges?

12:03:52 pfdietz This is inspired right now by Doug K's comment on vop coverage in the most recent commit.

12:04:07 pfdietz I don't care about human readability.

12:04:43 pfdietz If I were trying to locate points for manual test generation then I would.

12:08:28 pkhuong i don't have access to any machine with intel pt, but I do have BTS. i'll try to hack something up with intel's PMU today

12:17:26 pkhuong pfdietz: do you fork before compiling random inputs?

12:17:54 pfdietz No. The random tester does not fork.

12:28:00 pkhuong the hardest part will be dropping branches to / from the C runtime and newly generated code

12:31:42 pfdietz The approach I've taken on this sort of thing is instrumenting lisp when it's compiled. So if I can recompile part of the compiler, I can collect the information I want.

12:45:03 pfdietz Hmm. *macroexpand-hook* does not appear to be affecting COMPILE.

12:57:40 pfdietz Rather, not affecting the simple eval

13:46:12 pfdietz Ugh, obviously I was wrong. (let ((*macroexpand-hook* …)) <form>) does not affect the macroexpansions in <form>.

14:23:31 stassats not copying &rest above the frame is not without its problems

14:23:48 stassats since if there's nothing to move it touches more memory, to establish a new frame

14:25:49 stassats and tail-calling becomes a headache

14:26:56 stassats the other option is burning a register on a second frame pointer

14:29:03 stassats wonder if i should pursue it further or cut my losses

14:30:21 stassats on a large number of &rest it's a clear win, but it requires modifying a lot of call/return vops and is slightly slower when the number is small

14:34:33 flip214 Would it make sense to move &rest to some heap-allocated frame instead of holding them on the stack?

14:34:48 stassats absolutely not

14:35:49 flip214 apropos, I'd really appreciate a pair of functions that would allow to split a closure up into a code and an environment pointer, so that C callback structures that use a single (void*) argument for many functions (like a C++ vtable) can be easier handled

14:38:08 stassats why does it have to be a closure?

14:39:39 pfdietz Is (lambda (&rest r) … (apply #'fn r)) optimized as a tail call?

14:39:54 stassats yes

14:41:41 stassats it's not very optimized, as it has to copy R twice

14:44:59 flip214 stassats: because I need to pass some state around

14:45:12 stassats are closures the only way to pass state?

14:48:25 flip214 stassats: well, in this case I'm pushing multiple data items to C, and the C functions will run the callbacks at some later time, possibly one after another.

14:48:34 flip214 So I can't use a global or special variable.

14:49:00 stassats so other than closures or global variables, you can't pass anything?

14:49:30 flip214 and instead of allocating some struct manually, storing my data in there, and passing its locked address around as a void*, I hoped that I could do that implicotly

14:49:40 flip214 *implicitly via the environment of closures

14:50:06 stassats and if closures is all you have, what's wrong with them anyway?

14:52:08 flip214 instead of creating fresh (C-api) structures of closures all the time, I'd like to have one static structure with all the functions set up, and get the environment passed in via the (void*)

14:52:49 stassats i still don't understand because i guess you're describing a solution, not the problem

14:53:24 stassats is all you want a callback on #'funcall or something?

14:54:04 flip214 there's a (foreign) structure that stores 8 or 10 function pointers. I allocate one, store closures in there, and call the C api with it. many, many times.

14:54:45 flip214 I would have hoped that I could instead allocate _one_ of these structures, store function pointers in there, and pass that _single_ structure to the C api _every_ time.

14:55:21 flip214 but then I need the environment passed around via the available (void*)

14:56:07 flip214 as an implicit way to pass the required state on, instead of allocating some class with the data myself

14:57:01 stassats i don't get it, why can't you allocate it once now?

14:57:53 flip214 becauses these are queued up and called at some later time

14:58:15 stassats ok, and how can you do that without allocating anything then?

15:00:26 flip214 I'd like to avoid _manually_ allocating something to keep the state. The dynamic environment has all the information, so I'd like to pack that into a void* and send it on

15:00:58 stassats send closures?

15:01:29 stassats closures are the environment, so what else do you need?

15:02:44 flip214 I guess it would be more easy to explain in person.... sadly you won't be at SBCL20, but perhaps I'll get a chance at the next ELS

15:03:52 stassats i doubt it, i'm not even close to getting how the normal closures are not suitable

15:11:33 flip214 yeah, seems I can't explain coherently enough... thanks for the patience, anyway!

15:14:53 pfdietz One problem with closures is they can't be serialized.

15:15:19 stassats` for some value of can't

15:16:01 pfdietz Support for closure savings/loading in fasls would be really nice for some things I've tried to do.

15:16:30 stassats` fasls are not required to do so, so...

15:16:43 pfdietz Right.

15:31:03 puchacz hi, can somebody tell me please if I am on the right track, I try to prepare a bug report that happens only if I run my program for about 30 - 60 minutes on 32 (virtual) computer. I did not isolate it yet, but I tried to follow http://www.sbcl.org/manual/#Signal-Related-Bugs - I recompiled sbcl with :sb-show, :sb-show-assem, :sb-qshow, :sb-xref-for-internals and :sb-hash-table-debug

15:32:03 puchacz with these settings, I delivered image to the 32 core server and when the bug appeared, it did not corrupt the image this time, but I still tried to follow the steps with gdb and got this: https://paste.ubuntu.com/p/GJphMPXRwV/

15:33:36 puchacz before when hunchentoot unbound session secret appeared, and USOCKET:BAD-FILE-DESCRIPTOR-ERROR signal, the image was corrupted with the message "continuing with fingers crossed", e.g. https://paste.ubuntu.com/p/cbC7jWjsmz/

15:34:23 puchacz maybe this extra safety settings prevented image corruption or I did not wait for long enough

15:43:02 stassats` or because sb-show slows things down

15:43:25 puchacz stassats: shall I remove it?

15:43:35 puchacz sorry, stassats`

15:43:38 stassats` then you won't know what's going on

15:45:02 puchacz if I manage to isolate it, shall I submit the program that triggers it even if all I get is something like https://paste.ubuntu.com/p/cbC7jWjsmz/ or not worth it?

15:45:21 puchacz mind it never happened on my desktop PC, only 32 core computer

15:45:24 stassats` but you already know where the error happens, in 0x539f3163

15:46:06 puchacz I never debugged with ldb or gdb

15:46:19 stassats` you'll get the function name with (sb-di::code-header-from-pc (sb-sys:int-sap #x539f3163))

15:47:37 puchacz okay, I will recompile without these extras and try to provoke it again - I don't have the previous image anymore

15:47:49 stassats` yes, i you can reliably get cbC7jWjsmz/, then that's all that's needed

15:47:54 stassats` if you

15:48:27 stassats` also run with --lose-on-corruption

15:48:37 puchacz so it will quit

15:48:46 puchacz and then run again, ask for the function name, right?

15:48:53 stassats` no, it won't quit

15:49:01 stassats` it will descend into ldb

15:49:21 puchacz this is lisp, not ldb: (sb-di::code-header-from-pc (sb-sys:int-sap #x539f3163))

15:49:42 stassats` you'll have to run that in gdb

15:49:51 stassats` not that, but a different name

15:49:59 stassats` component_ptr_from_pc

15:50:43 stassats` gotta add that as a command to ldb

15:50:52 puchacz and then get name?

15:51:12 stassats` you'll get an address, you can then add a lowtag and print it

15:51:14 stassats` easy!

15:51:39 stassats` or with 15 and "print" the result in ldb

15:52:05 puchacz I don't know the syntax, I will stick to Lisp form: (sb-di::code-header-from-pc (sb-sys:int-sap #x539f3163))

15:52:11 puchacz with the right address of course

15:53:20 puchacz is there a manual for ldb?

15:54:29 stassats` help

15:55:34 puchacz tks, recompiling sbcl first with regular --fancy, nothing extra

16:01:04 stassats` another way to get rid of &rest copying, if the caller knows how much stack space the callee allocates and do that on its behalf, and then push all the arguments

16:03:08 stassats` difficult with all the different function types, closures, symbols, callable instances

17:31:01 puchacz okay, I got sbcl corruption and as stassats advised, I checked the function where it happened:

17:31:06 puchacz 0] (sb-di::code-header-from-pc (sb-sys:int-sap #X53af25c3))

17:31:06 puchacz #<code id=1175 [1] SB-IMPL::OUTPUT-UNSIGNED-BYTE-FULL-BUFFERED {53AF245F}>

17:31:30 puchacz it it useful or I still need to isolate a a small program that triggers it?

18:19:47 pfdietz Do you have more than one thread writing to the same stream?

18:20:38 puchacz pfdietz: it happens inside hunchentoot, and I think it is some sort of internet error, like I recorded yesterday, here: https://paste.ubuntu.com/p/cbC7jWjsmz/

18:20:44 puchacz see, bad-file-descriptor

18:22:09 puchacz so I don't do anything

18:25:31 flip214 puchacz: you are sure the machine is okay, hardware-wise? did you run a memtest already?

18:26:30 puchacz flip214 - it is a virtual computer at a provider, I rented it because I wanted to run a simulation on 32 cores

18:26:37 puchacz it never happened on my PC

18:28:15 puchacz but I deleted it and restored it few times, I tried 2 locations, so must be different hardware. unlikely it is a "hardware" (virtualised or otherwise) problem

18:29:57 pfdietz Does it work with CCL?

18:31:28 puchacz I haven't tried full setup, but it looked to me that for my problem CCL (when I ran on my PC) was about 10 times slower, so I gave up.

18:35:48 puchacz I still have it up, if you have a suggestion what I shall type into REPL

18:39:45 pfdietz Is it reproducibly that same failure, in SBCL?

18:39:58 puchacz it looks like it is the same failure, but

18:40:32 puchacz I did not isolate it, so I observed with my full program running and

18:40:44 puchacz it only happens on the 32 core virtual computer, not my PC

18:40:55 puchacz it happens after about an hour of running

18:43:23 flip214 puchacz: and you're running full-speed against hunchentoot there? or a limited amount of traffic?

18:45:11 puchacz full speed

18:45:50 puchacz hunchentoot is an "interface" between my program on that virtual computer that runs simulations with different parameters, and Mathematica running on my PC that provides these parameters via HTTP

18:46:25 puchacz so no simultaneous calls but hundreds and then thousands of calls in rapid succession.

18:47:22 flip214 so basically single-threaded call patterns? hmmm

18:47:35 puchacz yes, but my simulation is 64 threaded

18:47:59 puchacz split the work in hunchentoot handler into 64 pieces, then join them all and respond

18:48:08 flip214 via cl-parallel, or manually?

18:48:12 puchacz manually

18:48:21 flip214 do you know whether you trigger GC often or very seldom?

18:48:34 puchacz no idea, but I create a lot of garbage.

18:48:35 flip214 so using sb-threads:join-thread et al

18:48:48 puchacz I don't call anything like this explicitely

18:49:08 flip214 well, how do you run 64 threads in parallel then?

18:51:05 puchacz flip214: like this: https://paste.ubuntu.com/p/Zny76fJpyf/

18:51:38 puchacz locks and then notify when every function ends

18:51:51 puchacz so hunchentoot thread waits

19:51:50 puchacz okay, it seems I will need to prepare an isolated test case and send. Realistically - next weekend :)

19:58:54 stassats puchacz: running with --lose-on-corruption will give you a backtrace

20:00:44 stassats bad file descriptor, faulting when writing to a buffer, it appears you're touching a stream that has been already closed

20:02:18 puchacz stassats, I will try again with --lose-on-corruption

20:04:53 stassats puchacz: how many simultaneous connections does it have?

20:05:24 puchacz one at a time, but I spawn 64 threads.

20:05:38 puchacz let me re-paste what I explained before

20:05:56 puchacz hunchentoot is an "interface" between my program on that virtual computer that runs simulations with different parameters, and Mathematica running on my PC that provides these parameters via HTTP

20:06:09 puchacz so no simultaneous calls but hundreds and then thousands of calls in rapid succession.

20:06:26 puchacz my simulation is 64 threaded, split the work in hunchentoot handler into 64 pieces, then join them all and respond

20:06:28 puchacz done

20:08:12 stassats next i'd modify HUNCHENTOOT::*SUPPORTS-THREADS-P* to always default to NIL and try that

20:08:44 puchacz I will try to get ldb stack trace

20:08:48 puchacz first

20:09:12 puchacz but you are right, hunchentoot does not need to be parallel, it is not a bottleneck

20:09:14 stassats sure, but i have a feeling it's not going to help, but it's useful to know that it's not going to help anyway

20:09:39 puchacz stassats, maybe I will learn something about low level debugging :)

20:09:41 stassats does usocket use finalizers or something?

20:09:56 puchacz I did not check, all standard quicklisp

20:12:13 stassats sb-bsd-sockets uses finalizers, so, the answer would be yes

20:14:02 stassats but all that does is closing the socket, might explain the bad-file-descriptor-error, but not the segfault

20:14:37 puchacz stassats, did you see I recovered the function name as you advised?

20:14:38 puchacz #<code id=1175 [1] SB-IMPL::OUTPUT-UNSIGNED-BYTE-FULL-BUFFERED {53AF245F}>

20:14:45 stassats yes

20:14:48 puchacz ok

20:15:08 stassats not really useful

20:15:31 stassats but, paste the (disassemble #'SB-IMPL::OUTPUT-UNSIGNED-BYTE-FULL-BUFFERED) anyway

20:20:01 cosimone_ ** NICK cosimone

20:20:27 stassats ok, not necessary

20:20:33 puchacz just done it

20:20:35 stassats so, the buffer is also finalized

20:21:29 puchacz https://paste.ubuntu.com/p/qrYP9Wvq9Z/

20:22:57 stassats ok, faulting at (setf (sap-ref-8 (buffer-sap obuf) tail) byte)

20:24:09 puchacz (it is running the simulations now with --lose-on-corruption by the way, if we are lucky it will corrupt it in 30 minutes maybe)

20:24:44 puchacz it ALWAYS corrupted it on 32 cores, just with different times, but not too long

20:25:35 stassats but you're connecting more than once?

20:26:36 puchacz mathematica sends something like http://my-ip/simulate?x1=3423.324&x2=34.7&x3=.... etc. and waits

20:26:51 puchacz hunchentoot starts 64 threads to simulate with the x1 x2 x3 etc.

20:27:02 stassats how often does it send it?

20:27:17 puchacz when all threads complete, it sends back in the same handler one number as a result, simulation score

20:27:38 puchacz mathematica immediately after receiving it sends another http request, with different values of x1 x2 etc.

20:27:45 puchacz and so on

20:27:47 stassats ok

20:31:37 puchacz it seems to be completing one simulation (so one http request / 64 threads / response cycle) within say 3 seconds

20:31:59 stassats ok, don't really need the frequency, just that it's more than once

20:32:08 puchacz ok

20:35:54 cosimone_ ** NICK cosimone__

20:36:43 cosimone_ ** NICK cosimone