freenode/#sbcl - IRC Chatlog
14:30:21
stassats
on a large number of &rest it's a clear win, but it requires modifying a lot of call/return vops and is slightly slower when the number is small
14:34:33
flip214
Would it make sense to move &rest to some heap-allocated frame instead of holding them on the stack?
14:35:49
flip214
apropos, I'd really appreciate a pair of functions that would allow splitting a closure into a code pointer and an environment pointer, so that C callback structures that use a single (void*) argument for many functions (like a C++ vtable) can be handled more easily
14:48:25
flip214
stassats: well, in this case I'm pushing multiple data items to C, and the C functions will run the callbacks at some later time, possibly one after another.
14:49:30
flip214
and instead of allocating some struct manually, storing my data in there, and passing its locked address around as a void*, I hoped that I could do that implicitly
14:52:08
flip214
instead of creating fresh (C-api) structures of closures all the time, I'd like to have one static structure with all the functions set up, and get the environment passed in via the (void*)
14:52:49
stassats
i still don't understand because i guess you're describing a solution, not the problem
14:54:04
flip214
there's a (foreign) structure that stores 8 or 10 function pointers. I allocate one, store closures in there, and call the C api with it. many, many times.
14:54:45
flip214
I would have hoped that I could instead allocate _one_ of these structures, store function pointers in there, and pass that _single_ structure to the C api _every_ time.
14:56:07
flip214
as an implicit way to pass the required state on, instead of allocating some class with the data myself
15:00:26
flip214
I'd like to avoid _manually_ allocating something to keep the state. The dynamic environment has all the information, so I'd like to pack that into a void* and send it on
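
A minimal sketch of one common workaround for the setup flip214 describes, assuming the C API hands the (void*) back unchanged: keep the closures in a Lisp-side table and pass C only an integer handle, so that a single static callback can look the closure up. The names `*closure-table*`, `*next-handle*`, `register-closure`, `release-closure` and `invoke-handle` are hypothetical, and the FFI glue is left out. This still keeps the state in a manually managed table, which is the part flip214 hoped to avoid.

```lisp
;; Hypothetical sketch: closures keyed by integer handles that could be
;; passed through a C void* argument. No FFI calls are made here.
(defvar *closure-table* (make-hash-table)
  "Maps integer handles to live closures.")

(defvar *next-handle* 0
  "Last handle handed out; handles start at 1 so 0 can mean 'none'.")

(defun register-closure (fn)
  "Store FN and return an integer handle that could be cast to void*."
  ;; Not thread-safe as written; concurrent callers would need a lock.
  (let ((handle (incf *next-handle*)))
    (setf (gethash handle *closure-table*) fn)
    handle))

(defun release-closure (handle)
  "Forget the closure for HANDLE so it can be garbage collected."
  (remhash handle *closure-table*))

(defun invoke-handle (handle &rest args)
  "What the single static callback would do: look up the closure, call it."
  (let ((fn (gethash handle *closure-table*)))
    (unless fn
      (error "No closure registered for handle ~D" handle))
    (apply fn args)))
```

With this shape the foreign structure of function pointers is filled in once, and only the handle changes from call to call.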
15:02:44
flip214
I guess it would be easier to explain in person.... sadly you won't be at SBCL20, but perhaps I'll get a chance at the next ELS
15:03:52
stassats
i doubt it, i'm not even close to getting how the normal closures are not suitable
15:16:01
pfdietz
Support for closure savings/loading in fasls would be really nice for some things I've tried to do.
15:31:03
puchacz
hi, can somebody please tell me if I am on the right track? I am trying to prepare a report for a bug that happens only if I run my program for about 30 - 60 minutes on a 32-core (virtual) computer. I have not isolated it yet, but I tried to follow http://www.sbcl.org/manual/#Signal-Related-Bugs - I recompiled sbcl with :sb-show, :sb-show-assem, :sb-qshow, :sb-xref-for-internals and :sb-hash-table-debug
15:32:03
puchacz
with these settings, I delivered image to the 32 core server and when the bug appeared, it did not corrupt the image this time, but I still tried to follow the steps with gdb and got this: https://paste.ubuntu.com/p/GJphMPXRwV/
15:33:36
puchacz
before when hunchentoot unbound session secret appeared, and USOCKET:BAD-FILE-DESCRIPTOR-ERROR signal, the image was corrupted with the message "continuing with fingers crossed", e.g. https://paste.ubuntu.com/p/cbC7jWjsmz/
15:34:23
puchacz
maybe these extra safety settings prevented image corruption or I did not wait for long enough
15:45:02
puchacz
if I manage to isolate it, shall I submit the program that triggers it even if all I get is something like https://paste.ubuntu.com/p/cbC7jWjsmz/ or not worth it?
15:46:19
stassats`
you'll get the function name with (sb-di::code-header-from-pc (sb-sys:int-sap #x539f3163))
15:47:37
puchacz
okay, I will recompile without these extras and try to provoke it again - I don't have the previous image anymore
15:52:05
puchacz
I don't know the syntax, I will stick to Lisp form: (sb-di::code-header-from-pc (sb-sys:int-sap #x539f3163))
16:01:04
stassats`
another way to get rid of &rest copying is for the caller to know how much stack space the callee allocates, do that on its behalf, and then push all the arguments
16:03:08
stassats`
difficult with all the different function types, closures, symbols, callable instances
17:31:01
puchacz
okay, I got sbcl corruption and as stassats advised, I checked the function where it happened:
18:20:38
puchacz
pfdietz: it happens inside hunchentoot, and I think it is some sort of internet error, like I recorded yesterday, here: https://paste.ubuntu.com/p/cbC7jWjsmz/
18:25:31
flip214
puchacz: you are sure the machine is okay, hardware-wise? did you run a memtest already?
18:26:30
puchacz
flip214 - it is a virtual computer at a provider, I rented it because I wanted to run a simulation on 32 cores
18:28:15
puchacz
but I deleted it and restored it a few times, and I tried 2 locations, so it must be different hardware. unlikely it is a "hardware" (virtualised or otherwise) problem
18:31:28
puchacz
I haven't tried full setup, but it looked to me that for my problem CCL (when I ran on my PC) was about 10 times slower, so I gave up.
18:43:23
flip214
puchacz: and you're running full-speed against hunchentoot there? or a limited amount of traffic?
18:45:50
puchacz
hunchentoot is an "interface" between my program on that virtual computer that runs simulations with different parameters, and Mathematica running on my PC that provides these parameters via HTTP
18:46:25
puchacz
so no simultaneous calls but hundreds and then thousands of calls in rapid succession.
18:47:59
puchacz
split the work in hunchentoot handler into 64 pieces, then join them all and respond
19:51:50
puchacz
okay, it seems I will need to prepare an isolated test case and send. Realistically - next weekend :)
20:00:44
stassats
bad file descriptor, faulting when writing to a buffer, it appears you're touching a stream that has been already closed
20:05:56
puchacz
hunchentoot is an "interface" between my program on that virtual computer that runs simulations with different parameters, and Mathematica running on my PC that provides these parameters via HTTP
20:06:09
puchacz
so no simultaneous calls but hundreds and then thousands of calls in rapid succession.
20:06:26
puchacz
my simulation is 64 threaded, split the work in hunchentoot handler into 64 pieces, then join them all and respond
20:08:12
stassats
next i'd modify HUNCHENTOOT::*SUPPORTS-THREADS-P* to always default to NIL and try that
20:09:12
puchacz
but you are right, hunchentoot does not need to be parallel, it is not a bottleneck
20:09:14
stassats
sure, but i have a feeling it's not going to help, but it's useful to know that it's not going to help anyway
20:14:02
stassats
but all that does is closing the socket, might explain the bad-file-descriptor-error, but not the segfault
20:24:09
puchacz
(it is running the simulations now with --lose-on-corruption by the way, if we are lucky it will corrupt it in 30 minutes maybe)
20:26:36
puchacz
mathematica sends something like http://my-ip/simulate?x1=3423.324&x2=34.7&x3=.... etc. and waits
20:27:17
puchacz
when all threads complete, it sends back in the same handler one number as a result, simulation score
20:27:38
puchacz
mathematica immediately after receiving it sends another http request, with different values of x1 x2 etc.
20:31:37
puchacz
it seems to be completing one simulation (so one http request / 64 threads / response cycle) within say 3 seconds
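
The request cycle puchacz describes (split the work into 64 pieces, join them all, respond with one number) can be sketched as below, assuming an SBCL built with thread support. `simulate-piece` and `run-simulation` are hypothetical names, and summing the per-piece results into one score is an assumption made only for illustration; the log does not say how the score is combined.

```lisp
(defun simulate-piece (index params)
  "Hypothetical stand-in for one slice of the simulation; returns a number."
  (* index (reduce #'+ params)))

(defun run-simulation (params &key (pieces 64))
  "Run PIECES slices in parallel threads, wait for all, and sum the results."
  (let ((threads
          (loop for i below pieces
                collect (let ((i i)) ; fresh binding captured by each thread
                          (sb-thread:make-thread
                           (lambda () (simulate-piece i params)))))))
    ;; JOIN-THREAD blocks until the thread finishes and returns its value.
    (reduce #'+ threads :key #'sb-thread:join-thread)))
```

A handler would call `run-simulation` with the parsed x1, x2, ... values and write the returned number as the response body.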
21:23:35
puchacz
that it was another requester, totally different from Mathematica trying to reach my regular web application at "/"
21:24:23
puchacz
so the reason it worked on my PC was simply that there were no web spiders etc. randomly trying to knock at port 8080
21:25:23
puchacz
whereas on the rented computer at the provider site, it is being inspected by all sorts of spiders
21:25:59
puchacz
rather than saying something like no postgres connection, which is a socket protocol, no native library loaded
21:27:22
puchacz
in Lisp I have a habit of having one image.... just save-and-die, load it with all the extra baggage I don't need
21:29:43
pfdietz
There's a CL library for handling ELF format. The dream: a gdb replacement written in CL.
21:37:10
puchacz
(and sorry for confusion, it is a regular bug in my application after all, to leave the other handlers in)
21:43:22
|3b|
trying to execute the query even though there isn't a postgres server available does sound like user code bug though
21:43:23
puchacz
when I prepare the image for the simulation, I clear fasl cache, then I start everything up (which takes long as it has to compile all that is required from quicklisp again), then I load data from postgres into a global variable, and save-and-die
21:44:10
puchacz
it is just a strange habit to have the same image for a web application and numerical simulation :)
21:44:51
|3b|
nah, being lazy and reusing something that is already set up doesn't sound that strange :)
21:46:17
puchacz
anyway, if anybody wants a more isolated program that would trigger the same crash, I can try to prepare it next weekend, ping me at piotr.wasik@gmail.com
21:59:50
|3b|
yeah, looks like writing to streams left open from when image was saved gets corruption warning on linux
22:02:13
|3b|
file maybe could try to reopen and seek to same place or something, but i suspect that would be worse than erroring as often as not
22:05:08
|3b|
though in this case it looks like it is the (foreign?) buffer rather than the stream that isn't surviving the save
22:07:08
puchacz
I have enough knowledge now to run my simulations without undercover spy spiders interference
22:07:31
|3b|
maybe file a bug that writing to old streams after image save/reload should be handled better
22:45:07
pfdietz
I would recommend not building a deliverable from a development image. You should have a script that builds the deliverable from scratch.
22:47:34
|3b|
sounded like it was more or less doing that, just that part of the build involved grabbing data from a DB
22:48:00
|3b|
and it happened to also include code that would try to reuse that db connection in some unexpected cases
22:50:12
|3b|
arguably the db lib should clear its connection cache on image saving, but not many libs bother with that sort of thing :/
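
A minimal sketch of what |3b| suggests, assuming SBCL: `sb-ext:*save-hooks*` is a list of functions that SBCL calls before writing a saved core, so a library or application can drop cached connections there. `*connection-cache*` and `clear-connection-cache` are hypothetical stand-ins and not part of any real database library.

```lisp
(defvar *connection-cache* (make-hash-table :test #'equal)
  "Hypothetical cache: connection spec -> open connection object.")

(defun clear-connection-cache ()
  "Drop every cached connection so none survives into a saved image."
  ;; A real library would also close each connection before forgetting it.
  (clrhash *connection-cache*)
  (values))

;; SBCL runs the functions in SB-EXT:*SAVE-HOOKS* before SAVE-LISP-AND-DIE
;; writes the core (assumption: this form is loaded under SBCL).
#+sbcl
(pushnew 'clear-connection-cache sb-ext:*save-hooks*)
```

Code that later finds the cache empty then has to open a fresh connection, or fail with an ordinary error, instead of writing to a socket that no longer exists.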
23:55:59
pkhuong
BTS tracing isn't too bad https://gist.github.com/pkhuong/1ce34e33c6df4b9be3bc9beb22415a47