freenode/#sbcl - IRC Chatlog
16:36:42
slyrus
so if I add back :compact-instance-header boom, things crash. without it I can pound on the webserver indefinitely, both dockerized and not, without it crashing.
16:56:36
stassats
compact-instance-header may not survive the same memory corruption you're having without crashing
16:58:40
slyrus
possible, but it's also possible that the compact-instance-header stuff is the source of the problem
17:00:26
slyrus
in any event, the pathological behavior is always the same. A corrupt funcallable-instance.
17:01:08
slyrus
seems unlikely that some arbitrary memory corruption would manifest itself the same way every time
18:03:28
dougk
my original plan was to have the layout vector be an immobile object, allowing "mov result, [rax*8+address_of_constant_vector]". Never got there
18:04:36
dougk
either way, this is worth less than the issue i'm currently looking at, which is the GC failing to pin some objects referenced from the stack in a very obscure case that I can reproduce but can't explain why it's a reproduction
18:07:00
dougk
i have some memory that the GC does not believe is any GC-managed space or a stack space, which points to objects that clearly would merit pinning thereby, because one of the addresses shows up in a register some time later. So it must have been a stack. But it's not getting scanned in the stack scan
18:07:43
dougk
sounds eerily similar to whatever unexplained crash slyrus gets, though i don't know enough of those details to say. I know only what I see, which is exactly as I said
18:09:03
dougk
anyway, 100% reproducible for me, but only by enabling my mmap fuzzer, which nobody does except me, i presume
18:09:36
dougk
however the effect of the container plus automatic heap relocation *could* mean that people are getting the effect of that
18:09:57
dougk
the test would be for slyrus to disable relocatable heap and see what happens in the container
18:23:19
slyrus
dougk: happy to disable the relocatable heap. I suppose I should turn compact-instance-header back on though?
18:23:51
dougk
i'm suspicious that heap relocation is causing a problem, and not that compact-instance is
18:24:35
dougk
compact-instance has been in production for us for years, relocatable heap not at all. only to enable llvm sanitizers to work; and they don't, so ....
18:25:00
dougk
yes, either --without-relocatable-heap if using the command-line based option, or in customize-target-features if you use that
18:26:35
slyrus
plus I need to rebuild to get compact-instance-header back on, so, rebuilding anyway.
18:50:04
slyrus
dougk: without relocatable-heap and with compact-instance-header on I'm able to trigger a crash right away :(
19:06:15
stassats
basically, i need to be able to launch sbcl with --lose-on-corruption, have it crash in some manner and attach gdb to it
20:07:54
stassats
dougk: one theory: X is a funcallable instance, (funcall x) is mov rax, X then call rax, enter your trampoline, which has MOV RAX,[RIP-23] JMP [RAX-3]
20:08:28
stassats
if the GC hits after MOV RAX,[RIP-23] nothing pins the trampoline, PC doesn't think it's inside a code object
20:27:40
dougk
the aver in room happens before or after the crash? Because room's aver is actually unsafe and wrong, but most of the time works by accident
20:28:24
dougk
the thing room is trying to assert can't safely be asserted if the allocator is using a page in which a region was opened, then closed, then opened again because there were more available bytes on the page
20:30:16
stassats
ok, conservative_root_p doesn't seem to be pinning trampolines in funcallable-instance
20:32:18
dougk
fdefn's are definitely ok because they are only referenced from jumps, which means you're in a code component whose header necessarily referenced the fdefn that you're calling through
20:32:56
dougk
the code that did the call had to be itself live. As to funcallable-instances, the only ones with builtin trampolines should be immobile, but i believe they are live because again the code component header references the GF directly
20:34:24
slyrus
dougk: the aver happens after for sure, not sure about before, but I think it's clean before
20:39:33
dougk
i suppose the safe thing to do, if we thought there was a liveness issue with the immobile GFs, is to consider an untagged pointer (the program counter) as a root; and I can actually do that check after everything else as evidence that it was necessary to do the check at all. so perhaps random testing can prove that it's necessary; but it would be nicer to construct an argument that it isn't
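[editor's note] The idea of treating the untagged program counter as a root can be illustrated with a toy model. This is only a sketch under stated assumptions, not SBCL's collector: every name here (toy_object, toy_pin_from_roots, the tag value) is hypothetical, and the real conservative_root_p works on pages and object headers rather than a flat array. It shows that a PC sitting inside an object's embedded trampoline is not a properly tagged pointer to that object, so a tag-checking scan would not pin it, whereas an interior-pointer check would.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of conservative pinning; not SBCL's data structures. */
typedef struct {
    uintptr_t base;    /* address of the object's first word (16-byte aligned) */
    size_t    size;    /* extent of the object in bytes */
    bool      pinned;  /* set when some root keeps the object in place */
} toy_object;

#define TOY_FUN_TAG 0xB  /* illustrative lowtag, chosen only for this sketch */

/* Tag-checking test: the word must be exactly base|tag. */
static bool toy_tagged_root_p(const toy_object *o, uintptr_t word) {
    return word == (o->base | TOY_FUN_TAG);
}

/* Interior test: any address within [base, base+size) counts,
 * which is what an untagged program counter would need. */
static bool toy_interior_root_p(const toy_object *o, uintptr_t word) {
    return word >= o->base && word - o->base < o->size;
}

/* Scan candidate root words (registers, stack slots, the PC) and pin every
 * object they hit. Returns the number of objects pinned. */
size_t toy_pin_from_roots(toy_object *objs, size_t nobjs,
                          const uintptr_t *roots, size_t nroots,
                          bool allow_interior) {
    size_t count = 0;
    for (size_t i = 0; i < nobjs; i++) {
        objs[i].pinned = false;
        for (size_t j = 0; j < nroots; j++) {
            if (toy_tagged_root_p(&objs[i], roots[j]) ||
                (allow_interior && toy_interior_root_p(&objs[i], roots[j]))) {
                objs[i].pinned = true;
                break;
            }
        }
        if (objs[i].pinned) count++;
    }
    return count;
}
```

In this model, calling toy_pin_from_roots with a single root that points into the middle of an object pins nothing when allow_interior is false and pins that object when it is true, which is the difference the "check after everything else" experiment would be looking for.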
20:41:39
dougk
as to funcallable-instance-tramp, that's actually an interesting question - we're reloading RAX with the thing in the funcallable-instance, are we losing the instance itself in between the two instructions ?
20:43:56
stassats
normally it's not important, unless the funcallable-instance is holding the next instruction