freenode/#sbcl - IRC Chatlog

16:46:14 stassats the new wxallowed detection works

16:46:25 Xof NEWS entry please

16:47:27 stassats not my addition!

16:48:36 stassats is it really newsworthy? now, if it actually worked with w^x

16:49:06 joshe I didn't think a NEWS entry was worthwhile, it wasn't even a build fix

16:49:38 stassats default w^x would probably be not a good idea, but optionally it's somewhat trivially doable

16:50:00 stassats always allocate on new pages, the GC will coalesce them later

16:50:10 Xof we get flak on reddit for a trend decrease in the number of NEWS items

16:50:40 Xof I realise that this might not be the best reason for an extravagantly detailed NEWS

16:50:49 stassats well, right now lisp is getting flak on reddit for PRINTING IN UPPERCASE

16:50:59 joshe don't you mean LISP

16:51:10 Xof slightly more importantly, having this kind of thing in NEWS sends the signal that yes, we do care (a little bit) about things being broken

16:52:19 stassats if we're testing for w^x, why not for noexec too?

16:52:21 pfdietz More NEWS entries, especially if they boost my ego.

16:52:36 Xof at least to the extent of telling people that things are about to be broken

16:52:41 joshe /home isn't noexec in the default install

16:52:54 stassats well, not limited to openbsd

16:53:22 joshe and I'm not sure adding detection for an openbsd feature added two years ago really sends a message that someone really cares ;)

16:53:53 joshe but sure, I can add an entry

16:54:00 pfdietz Thinking "fix committed" is not enough state; should indicate if the bug had been present at the last release. Short term bug fixes of things that were never released do not belong in the release notes.

16:54:19 stassats noexec fails really early, so, no need

16:54:32 stassats make-config.sh: 263: make-config.sh: ./generate-version.sh: Permission denied

16:56:10 stassats i have some other stuff to work on, otherwise i would've tried my hand at w^x

16:57:14 joshe I tried to follow what was going on but quickly got lost

17:00:23 stassats i think the simplest thing to do is just to inflate the code object size at creation time to be a multiple of pagesize, then reprotect when gcing

17:00:40 stassats and probably reconstituting sanctify-for-execution on x86oids
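
The size inflation stassats describes comes down to rounding each code object's size up to a multiple of the page size, presumably so that each object can be reprotected without touching a neighbour. A minimal sketch of that rounding, assuming a 4096-byte page; `round_up_to_page` is a hypothetical helper for illustration, not an SBCL function:

```python
# Assumed page size for illustration; on a real system it can be queried
# with os.sysconf("SC_PAGE_SIZE").
PAGE_SIZE = 4096


def round_up_to_page(size, page_size=PAGE_SIZE):
    """Round size up to the next multiple of page_size.

    page_size must be a power of two for the mask trick to work.
    """
    return (size + page_size - 1) & ~(page_size - 1)


if __name__ == "__main__":
    for size in (1, 4096, 4097, 10000):
        print(size, "->", round_up_to_page(size))
```

The cost is the one pkhuong raises later in the log: small objects such as callbacks each occupy at least one whole page.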

17:01:35 stassats it does sound so simple i might actually try doing it

17:03:10 joshe that would be nice

17:07:32 pkhuong foreign callbacks will become even heftier

17:08:03 stassats well, it's optional, and openbsd doesn't mind performance degradation for performance reasons

17:08:08 stassats for security

17:09:40 joshe pkhuong: become heftier as in use more virtual address space?

18:15:36 pkhuong joshe:

18:15:47 pkhuong and actual memory

20:36:51 asarch What went wrong?: http://paste.scsys.co.uk/582188

20:37:25 stassats nothing, actually

20:38:19 stassats scratch that, that's not a usual sb-concurrency test failure

20:38:25 stassats not the usual

20:40:41 joshe make: *** [../asdf-module.mk:41: test] Illegal instruction (core dumped)

20:40:59 joshe that seems a bit odd

20:41:18 stassats seen that reported, but it builds fine here

20:41:39 stassats i unscrewed my virtualbox and installed 6.4

20:42:03 joshe the sb-concurrency tests frequently fail on openbsd but I don't recall how

20:43:13 stassats they frequently fail on risc

20:43:22 stassats but not by doing illegal things

20:43:50 joshe there are a few timing problems in various tests caused by openbsd's coarse scheduler granularity

20:49:56 asarch :'-(

20:53:51 asarch Can it be solved?

20:54:57 stassats it needs to be reproduced by me or joshe

20:54:59 stassats first

20:55:23 stassats or you need to send the core dump and the binary

20:55:32 asarch How could I get it?

20:55:51 stassats /cores? or your `pwd`?

20:56:57 asarch Ok, then tar vcf core-dump.tar sbcl-1.4.13/

20:57:14 asarch And then gzip --verbose --best core-dump.tar

20:57:22 asarch How could I send you? By email?

20:57:31 stassats i would only need the dump and the src/runtime/sbcl file

20:57:56 stassats did you find it?

20:58:32 asarch find . -type d -name cores <- Finds nothing

20:58:45 asarch find . -type f -name sbcl: ./src/runtime/sbcl

20:58:59 asarch There is no /cores dir

20:58:59 stassats well, it's not named "cores"

20:59:07 pkhuong asarch: you're looking for a file, probably named "core" or "core.$PID"

20:59:10 asarch D'oh!

20:59:53 asarch I found /output: after-xc.core, cold-sbcl.core and sbcl.core

20:59:59 stassats that's not that

21:00:02 stassats that's sbcl's

21:00:04 asarch ./contrib/sb-concurrency/sbcl.core

21:00:19 pkhuong asarch: you want the actual core dumped by the OS when the process crashed.

21:00:23 stassats that's probably it

21:00:57 asarch http://paste.scsys.co.uk/582189

21:01:06 asarch That is all the core I found

21:01:14 stassats ./contrib/sb-concurrency/sbcl.core is the one

21:01:24 asarch Ok

21:03:43 stassats what CPU do you have?

21:04:23 asarch The command line I used was: ./make.sh --fancy --prefix=$HOME/bin/sbcl

21:04:50 stassats oh, i did not build with threads

21:04:58 stassats just hold on, don't send me anything yet

21:05:14 asarch From dmesg: http://paste.scsys.co.uk/582190

21:05:20 asarch Do you want the full dmesg output?

21:05:43 asarch Ok

21:05:51 asarch ACTION waits for new instructions...

21:07:13 stassats that's an atom, but nothing about it is out of the ordinary

21:09:13 stassats sb-concurrency tests are kinda slow on openbsd, maybe i need to adjust the cpu number detector, but no sigills yet

21:10:05 stassats asarch: ok, i can't reproduce it now either, so, i'll need that core dump after all

21:12:33 asarch Ok, how could I send it to you?

21:13:15 stassats @gmail.com

21:14:24 asarch Ok. Warte bitte because I am from Mexico...

21:14:30 joshe I can reproduce on real hardware

21:14:47 stassats joshe: can you tell me which instruction it is then?

21:15:08 stassats asarch: they speak german in mexico?

21:15:22 joshe let's see

21:16:15 stassats i've been thinking the other day, why don't we catch sigill

21:16:22 stassats or do we?

21:16:47 joshe huh, gdb thinks it's in futex()

21:16:59 asarch A little :-P

21:17:00 joshe 0x00000002f137f510 <futex+0>: mov $0x53,%eax

21:17:00 joshe 0x00000002f137f515 <futex+5>: mov %rcx,%r10

21:17:01 joshe 0x00000002f137f518 <futex+8>: syscall

21:17:01 joshe 0x00000002f137f51a <futex+10>: retq

21:17:48 stassats openbsd has futexes?

21:17:59 stassats and sbcl uses them?

21:17:59 joshe oh nevermind, $pc is futex+10 which is into the trapsled

21:18:26 joshe I don't think sbcl uses them directly, but via the pthread_* functions, sure

21:18:43 stassats do you have a backtrace?

21:18:44 joshe http://man.openbsd.org/futex

21:19:19 joshe bad stack pointer, maybe:

21:19:19 joshe Cannot access memory at address 0x2f9c379d8

21:19:26 stassats if openbsd does have futex then sbcl should use them, because the non-futex path is really horrible

21:19:51 pkhuong joshe: is 6.2 old enough to assume no one runs older releases?

21:20:26 joshe 6.2 isn't supported by openbsd anymore, but some people may run it

21:20:30 stassats pkhuong: it could be make-config.sh detected, i don't think openbsd is big on binary compatibility across versions anyway

21:20:49 joshe yea, libc is already bumped with every 6-month release

21:21:08 joshe the libc major version, that is

21:21:16 stassats but i'd rather be improving the mutexes on darwin, not openbsd

21:21:20 stassats (cause that's what i use)

21:22:04 pkhuong stassats: I honestly think the old futex emulation code was the right approach... too bad reentrancy was so hard.

21:23:31 joshe https://gist.githubusercontent.com/jre/90bce0519cc928859282c36c7ba65494/raw/aba75d89ead5f9861371f42e35e96e51749f4379/gistfile1.txt

21:24:49 pkhuong joshe: can you print the instruction bytes?

21:26:28 joshe (gdb) x/16xb futex

21:26:28 joshe 0x2f137f510 <futex>: 0xb8 0x53 0x00 0x00 0x00 0x49 0x89 0xca

21:26:28 joshe 0x2f137f518 <futex+8>: 0x0f 0x05 0xc3 0xcc 0xcc 0xcc 0xcc 0xcc

21:26:35 Bicyclidine ** NICK bike

21:27:22 joshe http://ref.x86asm.net/coder32.html#xC3

21:28:05 joshe I suppose that should be coder64, but the opcode is the same either way
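
The byte dump can be lined up with the earlier disassembly to see what sits at futex+10. A small sketch that uses only the bytes and instruction boundaries pasted above; `describe` is a hypothetical helper, and reading the 0xcc bytes as int3 padding is standard x86 encoding rather than something stated in the log:

```python
# Bytes copied from the `x/16xb futex` output in the log.
FUTEX_BYTES = bytes([0xB8, 0x53, 0x00, 0x00, 0x00, 0x49, 0x89, 0xCA,
                     0x0F, 0x05, 0xC3, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC])

# Instruction boundaries as gdb reported them: (offset, length, mnemonic).
INSTRUCTIONS = [
    (0, 5, "mov $0x53,%eax"),
    (5, 3, "mov %rcx,%r10"),
    (8, 2, "syscall"),
    (10, 1, "retq"),
]


def describe(offset):
    """Return what the dump holds at futex+offset."""
    for start, length, text in INSTRUCTIONS:
        if start <= offset < start + length:
            return text
    if FUTEX_BYTES[offset] == 0xCC:
        return "int3 (padding)"
    return "unknown"


if __name__ == "__main__":
    for offset in (0, 8, 10, 11):
        print("futex+%d: 0x%02x %s" % (offset, FUTEX_BYTES[offset], describe(offset)))
```

So futex+10 is the 0xc3 `retq` itself, and the 0xcc bytes only start at futex+11.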

21:31:16 stassats disassemble /m ?

21:32:00 stassats or what was the option

21:32:29 pkhuong joshe: any other thread in that core?

21:32:47 stassats oh yeah, it's sb-concurrency

21:32:58 asarch Did you get it?

21:33:28 joshe oh, 12 other threads in _thread_sys_nanosleep

21:33:56 stassats asarch: not yet

21:35:20 joshe I updated that paste with disassemble /r and info threads

21:35:59 stassats you didn't link the paste, though

21:36:03 stassats but the raw file

21:36:23 joshe my bad https://gist.github.com/jre/90bce0519cc928859282c36c7ba65494

21:38:32 stassats it's all in os code, so, what gives?

21:38:36 asarch Maybe at the spam folder...

21:41:55 stassats ok, sigill does land in ldb

21:42:19 stassats asarch: nothing

21:42:41 stassats so, a sigill in a foreign thread would not be caught

21:43:26 stassats i only have 1 core in my vm, let's increase that

21:46:29 asarch ACTION whispers: "Damn Hotmail!"

21:47:15 stassats no failure with two cores, but it appears to be even slower

21:52:27 stassats but openbsd still thinks there's one core

21:52:57 joshe I see what you mean about being used to the old host-2 output

21:53:07 joshe I keep thinking it's about to dump the cold core

21:53:41 stassats when you stare at the same thing for over a decade you get used to it

22:01:41 asarch Gotcha!: "Blocked for security reasons!"

22:02:21 stassats can't be sending illegal instructions

22:07:15 joshe I wonder if the kernel would send SIGILL for other reasons

22:08:04 joshe oh, I think that's how something with an invalid stack pointer is killed

22:08:15 stassats unaligned?

22:08:35 stassats why is it surfacing with multiple threads only?

22:08:36 joshe outside of a region with a special mmap flag

22:08:49 stassats why doesn't it happen to me?

22:09:46 stassats do i need an smp kernel to get multiple cores or something?

22:10:37 stassats i had only one core during installation

22:10:37 asarch_ D'oh!

22:10:43 asarch_ The file was so big...

22:10:49 asarch_ This is the link from Dropbox: https://www.dropbox.com/s/wcwohmr0km2sdpv/debug.tar.gz?dl=0

22:11:00 joshe oh, yes if you added cores then you need to change kernels

22:11:12 stassats and how do i do that?

22:11:32 joshe mv /bsd /bsd.sp && mv /bsd.mp /bsd

22:12:17 joshe which is all the installer would do on the next upgrade anyway

22:12:44 stassats ok, now i need to find bsd.mp

22:12:57 joshe oh right, it wasn't installed

22:13:35 stassats i can just download it

22:14:21 asarch_ ** NICK asarch

22:14:42 asarch Did you get it?

22:15:16 joshe ftp http://cdn.openbsd.org/pub/OpenBSD/$(uname -r)/$(machine)/{bsd.mp,SHA256.sig}

22:15:18 joshe signify -C -p /etc/signify/openbsd-$(uname -r | tr -d .)-base.pub -x SHA256.sig bsd.mp

22:15:34 joshe ;)

22:19:48 stassats oh yeah, sigill here

22:22:09 joshe if I'm reading this right, you get killed with an uncatchable SIGILL if the kernel can't write to your stack while delivering a signal

22:22:54 joshe so where'd the bad RSP value come from?

22:23:00 stassats that shouldn't be the case, it can write to the stack

22:24:03 joshe gdb shows no memory mapped where RSP points in the core dump I have

22:25:07 stassats 1 is the main thread, we move the stack without informing the os

22:25:10 joshe also, you can pkg_add gdb as root to get a less-old gdb installed as 'egdb'

22:26:19 joshe hm

22:31:54 stassats and it's non deterministic

22:38:55 stassats well, $rsp is kinda not where the control stack is supposed to be, but it may be just sigaltstack

22:41:16 stassats but it has no trouble receiving signals

22:45:11 stassats well, if $rsp is not even dump

22:45:12 stassats ed

22:45:41 stassats i actually see some output in the console

22:46:20 stassats trap [sbcl]47135/509372 type 6: sp 220f3fdf8 not inside 220d50000-220f40000

22:46:20 stassats trap [sbcl]39758/313918 type 6: sp 244f77178 not inside 244d88000-244f78000

22:47:09 joshe oh huh, how about that

22:49:38 stassats (< #X244d88000 #x244f77178 #x244f78000) => T
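
The same comparison can be run over both console messages at once. A minimal sketch that parses the two lines quoted above and checks whether the reported sp falls inside the reported range; the function names and the half-open interval are assumptions made for illustration:

```python
import re

# The two kernel console messages quoted in the log.
MESSAGES = [
    "trap [sbcl]47135/509372 type 6: sp 220f3fdf8 not inside 220d50000-220f40000",
    "trap [sbcl]39758/313918 type 6: sp 244f77178 not inside 244d88000-244f78000",
]

PATTERN = re.compile(r"sp ([0-9a-f]+) not inside ([0-9a-f]+)-([0-9a-f]+)")


def parse(message):
    """Return (sp, start, end) as integers from one console message."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("unrecognised message: %r" % message)
    return tuple(int(group, 16) for group in match.groups())


def sp_in_range(message):
    """True if the reported sp lies within the reported [start, end)."""
    sp, start, end = parse(message)
    return start <= sp < end


if __name__ == "__main__":
    for message in MESSAGES:
        sp, start, end = parse(message)
        print(hex(sp), sp_in_range(message), "bytes below end:", end - sp)
```

Both messages report an sp that is numerically inside the range they print, which is exactly the contradiction being pointed out.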

22:55:31 asarch Sorry, sorry. dhclient went crazy

22:56:19 stassats that is pretty close to the end, but still, what is it talking about?

22:56:52 asarch Did you get the files?

22:57:02 stassats asarch: no need anymore

22:57:40 asarch :-(

22:57:57 asarch Can you fix the problem? :-)

22:58:58 joshe we're investigating

22:59:16 asarch Yeah!

22:59:24 asarch I'm glad I could help

22:59:33 asarch Take your time guys!

22:59:37 stassats i already pry into the internals of the thread struct on darwin

22:59:46 joshe anyway, it's just a test failure

22:59:52 joshe go ahead and install and use it

23:00:02 stassats to set the stack boundaries, maybe openbsd needs that as well

23:00:34 stassats even if MAP_STACK is used, the check appears to be expensive if it's out of specified bounds

23:01:57 stassats i probably don't have access to p_spstart

23:02:17 stassats i would assume it's in the kernel

23:04:03 joshe yes, in struct proc

23:04:11 stassats oh, it updates p->p_spstart

23:04:18 stassats uvm_map_check_stack_range

23:04:25 stassats so, it shouldn't be always expensive

23:05:22 stassats now, it only happens on dualcore, can it be that it's accessing p->p_spstart while it's being updated?

23:06:28 stassats that would explain (< #X244d88000 #x244f77178 #x244f78000) => T

23:07:10 joshe I'd guess it'll be covered by the big kernel lock or a process subsystem-wide lock

23:10:34 stassats and the kernel can't be interrupted, can it?

23:11:40 stassats well, the process has been running for some time, the main thread p_spstart should have settled down

23:12:06 stassats and new threads are already born with the right stack locations specified

23:13:56 stassats and it's fine for the main thread, or for "bsd"

23:23:00 stassats i have no more new clues

23:23:56 joshe so, is it correct that sbcl will use a lisp stack allocated out of the heap?

23:24:19 stassats well, you did change it to use MAP_STACK

23:24:24 joshe so that RSP might point outside the thread struct?

23:24:38 stassats no, that shouldn't happen

23:39:15 stassats threads.pure also fails

23:40:30 stassats it says type 6, which is sigsegv

23:40:53 stassats but the main thread should have received countless sigsegvs by that point

23:50:33 stassats https://gist.github.com/stassats/1081ccb72c9468f1f7f4c06698b7fc16

23:50:38 stassats all on that trap

23:51:18 stassats except for run-program

23:52:51 joshe run-program is a known problem on the openbsd side, the fix is to stop disabling PIE

23:53:12 stassats well, it's something about the environment

23:53:20 stassats anyway, i blame the OS

23:53:25 stassats (for the trap)

23:54:00 joshe http://openbsd-archive.7691.n7.nabble.com/Wrong-linkage-to-environ-with-Wl-nopie-td338325.html

23:56:41 stassats i think i would need to recompile the kernel to get more insight

23:57:06 joshe I've already added a few printfs but they weren't enlightening

23:57:44 joshe uvm_map_check_stack_range() is returning false because of the uvm_map_lookup_entry() case

23:57:44 stassats i wanted to know where exactly in https://github.com/openbsd/src/blob/master/sys/uvm/uvm_map.c#L1774 it returns

23:58:15 joshe maybe the stack page was unmapped somehow?

23:58:27 stassats not really possible

23:58:49 stassats well, for sbcl, not for the os to do

23:59:21 joshe or it ran off the end of the first MAP_STACK page and into a normal data page, and the kernel only noticed on the next signal delivery?

0:00:44 stassats you're mapping the whole thing map_stack

0:00:51 joshe anyway, in that file it's returning on line 1788

0:00:52 stassats there's nowhere it can go outside of it

0:01:45 stassats what is p->p_vmspace->vm_map.serial?

0:01:59 stassats so, i assume it gets there by p->p_vmspace->vm_map.serial != p->p_spserial

0:02:01 joshe would sbcl mprotect() any part of the stack later?

0:02:18 stassats the guard pages, but that's only on stack overflow, which is not the case here

0:02:46 joshe u_int serial; /* signals stack changes */

0:03:03 joshe that's all I know

0:03:16 stassats altstack, that we do use

0:03:29 stassats but it's inside the thread struct well

0:03:32 stassats as

0:04:47 joshe ah, I see

0:05:22 joshe it's a serial number which exists to invalidate those p_sp* members

0:05:36 stassats it's ++

0:05:47 joshe it's incremented when MAP_STACK is added or removed
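
The serial-number scheme described here can be pictured with a toy model. This is only a sketch built from what is said in this log (cached p_sp* bounds, a map serial bumped when MAP_STACK is added or removed, a failing lookup when sp is unmapped); every class and function name below is hypothetical and none of it is the actual OpenBSD code:

```python
class ToyMap:
    """Toy address space: (start, end, is_stack) entries plus a serial."""

    def __init__(self):
        self.entries = []
        self.serial = 0

    def map_stack(self, start, end):
        self.entries.append((start, end, True))
        self.serial += 1  # per the log: bumped when MAP_STACK is added...

    def unmap(self, start, end):
        for entry in list(self.entries):
            if entry[0] == start and entry[1] == end:
                self.entries.remove(entry)
                if entry[2]:
                    self.serial += 1  # ...or removed

    def lookup(self, addr):
        for entry in self.entries:
            if entry[0] <= addr < entry[1]:
                return entry
        return None


class ToyThread:
    """Holds the cached stack bounds and the serial they were valid for."""

    def __init__(self):
        self.sp_start = 0
        self.sp_end = 0
        self.sp_serial = -1


def check_stack_range(thread, vm, sp):
    # Fast path: cached bounds are trusted only while the serial matches.
    if thread.sp_serial == vm.serial and thread.sp_start <= sp < thread.sp_end:
        return True
    entry = vm.lookup(sp)
    if entry is None:
        return False  # sp is not mapped at all
    start, end, is_stack = entry
    if not is_stack:
        return False  # mapped, but not as a stack
    thread.sp_start, thread.sp_end, thread.sp_serial = start, end, vm.serial
    return True


if __name__ == "__main__":
    vm, thread = ToyMap(), ToyThread()
    vm.map_stack(0x1000, 0x2000)
    print(check_stack_range(thread, vm, 0x1800))  # stack mapped
    vm.unmap(0x1000, 0x2000)
    print(check_stack_range(thread, vm, 0x1800))  # stack unmapped
    print(hex(thread.sp_start), hex(thread.sp_end))  # stale cached bounds
```

In this toy model the check fails after the unmap while the cached bounds still contain sp. If the console message prints the cached bounds (an assumption), that would give an "sp not inside" message whose numbers look consistent.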

0:06:09 stassats new threads are MAP_STACKed

0:06:28 joshe p_spserial should be per-thread

0:06:29 stassats is that actually needed?

0:07:08 joshe there's also some magic to automatically apply MAP_STACK to whatever you pass to sigaltstack()

0:09:15 stassats i would assume pthread_attr_setstack does the job of map_stack, and map_stack is only needed for the main thread

0:20:58 stassats if you're saying 1788, then sp is completely unmapped, nevermind MAP_STACK

0:21:43 joshe right

0:25:37 stassats can it be caused by thread destruction?

0:26:08 joshe so I added a printf to sbcl of the initial thread struct address

0:26:09 joshe pid 36644 initial stack 0x290fbe000-0x2914b00b8

0:26:18 joshe versus the kernel message:

0:26:19 joshe trap [sbcl]36644/487632 type 6: sp 2cfa37a78 not inside 2cf848000-2cfa38000

0:26:21 stassats we schedule thread unmapping when another thread dies, but there's a window when a second thread may have just died but the os is still writing something

0:27:19 joshe hm

0:28:59 stassats something like 88a92a8129b6559c140e16e8d0e01bb7bba56f6a

0:34:15 stassats it would be good to know what's the PC when that "type 6" sigsegv is delivered

0:34:53 joshe oh good idea

0:41:53 joshe it's the same as in the core dump, <futex+10>

0:42:20 stassats makes sense

0:42:42 stassats well, it's in the main thread, the main thread shouldn't die, should it?

0:42:55 stassats why is it starting from 1 and not 0?

0:42:57 joshe if it's not in the initial thread then it must be in a thread struct which was unmapped after the thread finalized

0:43:44 joshe I think that thread numbering is gdb's, the OS uses random pid-like IDs

0:44:54 stassats gdb indeed starts from 1

0:45:13 stassats so, there's no way thread 1 is being deallocated

0:48:01 stassats a new message syscall [sbcl]83491/452323 sp 254937cd8 not inside 254748000-254938000

0:48:04 stassats syscall, not trap

0:48:18 joshe oh right, futex

0:49:28 joshe well I'm getting a trap, presumably the page fault from trying to retq with sp pointing to unmapped memory

0:50:26 joshe how is that happening though, the sp value it's being killed for isn't even inside the main thread's stack

0:55:41 stassats removed free_thread_struct(post_mortem);, can no longer crash in info.impure.lisp

1:00:49 stassats we are calling pthread_join, so it shouldn't be existing anymore at all

1:00:53 stassats before free_thread_struct

1:01:02 stassats so, how come and why the main thread is receiving it?

1:07:27 stassats well, i'm still puzzled and will leave it at that for today

1:12:15 joshe oh!

1:12:28 joshe this isn't happening in the main thread

1:12:50 stassats well, duh, i never believed that anyway

1:13:23 joshe the thread didn't make it into the core dump

2:19:54 slyrus1 ** NICK slyrus