freenode/#clasp - IRC Chatlog
17:17:17
Colleen
Bike: drmeister said 1 hour, 8 minutes ago: I found the type error at startup problem. I have a reproducer. It's probably type-inference running on dead code. Check the logs for the last two hours.
17:18:56
drmeister
It reads garbage out of NIL and tries to check the type of that garbage. Sometimes it keeps going and other times it spits up a type error.
17:19:27
Bike
do you know that it's due to type inference? like, does it never happen if type inference is disabled?
17:20:07
Bike
not sure i understand. why does it read garbage out of nil? Like it tries to treat NIL like a cons and takes its car, or something?
17:21:25
drmeister
I step through the instructions with udb and see that it first tests whether 'a' is a CONS. It's not, it's NIL, but then it loads the CDR of NIL anyway and ... kaboom.
17:22:05
drmeister
With udb I can back up and drive over the instructions over and over and over until even I can figure out what it's trying to do.
17:23:13
drmeister
I think that's what it's doing. It has the tagged pointer for NIL and then it does mov 0x5(%rax),%rdi where %rax is 0x7f2f7c010041 (NIL)
17:23:55
drmeister
I am assuming that mov 0x5(%rax),%rdi is (cdr rax) because that's how we should be doing it. But %rax contains a General_O tagged pointer - so that's a problem.
17:24:21
Bike
well, if that is the problem it'll be obvious in the post type inference hir, probably
17:24:27
drmeister
If %rax contained a cons tagged-ptr then it would end with 0x3 and 0x3+0x5 -> 0x8
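The address arithmetic in the last few messages can be checked with a small Python sketch. It takes the tag values from the chat (general object = 0x1, cons = 0x3) and assumes that tags occupy the low 3 bits and that the CDR is the second 8-byte word of an untagged cons. The 3-bit mask and the "cons at the same base address" are illustrative assumptions, not values read from Clasp's headers.

```python
# Sketch: where does `mov 0x5(%rax),%rdi` actually read?
# Assumptions taken from the chat, not from Clasp's headers:
#   general-object tag = 0x1, cons tag = 0x3, tags live in the low 3 bits,
#   and the CDR is the second 8-byte word of an untagged cons.
TAG_MASK = 0x7
GENERAL_TAG = 0x1
CONS_TAG = 0x3
CDR_DISPLACEMENT = 0x5  # displacement seen in the disassembly


def tag_of(tagged_ptr: int) -> int:
    """Low tag bits of a tagged pointer."""
    return tagged_ptr & TAG_MASK


def load_address(tagged_ptr: int, displacement: int = CDR_DISPLACEMENT) -> int:
    """Effective address of a load: register value plus displacement."""
    return tagged_ptr + displacement


def is_word_aligned(addr: int) -> bool:
    return addr % 8 == 0


nil = 0x7F2F7C010041                    # NIL as observed in %rax
cons = (nil & ~TAG_MASK) | CONS_TAG     # hypothetical cons at the same base address

for label, ptr in (("NIL", nil), ("cons", cons)):
    addr = load_address(ptr)
    print(label, hex(tag_of(ptr)), hex(addr),
          "aligned" if is_word_aligned(addr) else "misaligned")
```

With a cons pointer the load lands exactly one word past the untagged address (0x3 + 0x5 = 0x8). With NIL it lands 6 bytes into the object and reads an unaligned 8 bytes that straddle its first two words, which is why the value it gets back is garbage rather than a CDR.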
17:25:23
drmeister
Also, we should talk about the DWARF that is generated for this. Watch the movie I made.
17:25:44
drmeister
Once it hits the (first arguments) and (second arguments) it goes into a source info dead-zone.
17:26:04
drmeister
https://www.dropbox.com/s/2kzehok7x2kjfm5/Screen%20Recording%202020-08-02%20at%2010.39.40%20AM.mov?dl=0
17:29:39
drmeister
I can back up and go forward over instructions many times and see exactly what DWARF information is available and from the source info around the dead-zone figure out where we are.
17:30:48
drmeister
I can examine registers at any point and then figure out what kind of object they hold with the Python extension for gdb and udb that I extended yesterday.
17:32:17
drmeister
I commented out those useless bindings in cmpintrinsics.lsp and now I can't get it to crash with that type error. Yippeee.
17:36:03
Bike
so to be clear on the utility of the undo thing - the crash is in some region that doesn't have source info for whatever stupid reason, so you just rewind until you end up somewhere that does have source info?
17:36:41
drmeister
In the movie I backed up to near the top of the LET* and then nexti down into the dead-zone.
17:36:59
drmeister
You can see the Common Lisp source lines highlighted in the top window as I move through them.
17:38:00
drmeister
Later I hit the error inside of a call and then I 'reverse-finish' out of the error back into the LET* code but in the dead-zone.
17:38:35
drmeister
'reverse-finish' is like 'finish' but in reverse. It rewinds to the instruction just BEFORE the call into the function that you are in.
17:39:14
drmeister
The udb debugger recreates the state of the machine anywhere between when I attached to the process and where the error happens.
17:41:23
drmeister
Right now I'm looking for a better GUI for udb. The 'tui' interface is primitive but serviceable. I'd rather use the 'gud' interface in emacs - but it's giving me some trouble.
17:42:02
drmeister
I can't get X-forwarding to work through the ThirdLaw VPN we have - I'll need cracauer's help with that.
17:42:28
drmeister
If I got X-forwarding to work I should be able to get the 'ddd' debugger working.
17:43:37
drmeister
Some kind of GUI is better than none because I like having a source view, a disassembly view, and register and local-variable views up at the same time.
17:44:41
drmeister
I can give you access to this linux machine (hermes) and teach you how to use it. It's so, so powerful.
17:45:20
drmeister
The Undo company is about to start a "developers program" - up until now they have only sold this thing to large enterprises.
17:45:48
drmeister
My connection to Gareth (who works there now) got me in with the folks at Undo and they gave me an evaluation license for a month.
18:00:34
drmeister
Bike: When you inline at the HIR level - you keep a chain of source-location information for XXX inlined into YYY inlined into ZZZ and so on - yes?
18:00:47
drmeister
I remember we talked about it but I don't recall if you figured out how to do it.
18:04:18
drmeister
But like CONSP would be inlined into CDR, which would be inlined into SECOND, and that gets inlined into this code - sort of thing.
18:05:43
drmeister
I don't get a lot of those Fixpoint iterations exceeded threshold limit warnings. It's just that stretch before cl-ppcre where there are several of them.
18:06:01
drmeister
But: include/clasp/gctools/gcalloc.h:274 Bad size calc header@0x7f798513c030 header->stamp_wtag_mtag._value(7692) obj_skip(stamp 480) allocate_size -> 160 obj_skip -> 96 delta -> 64
18:08:36
drmeister
And I wipe out the cl-ppcre compiled code in the quicklisp cache and start icando-mps up again and it reproduces the problem right away.
18:11:35
drmeister
Ok, but you get the idea - that chain of inlines is like the chain of inlines that you get in C and C++. The DWARF spec knows how to handle it. We just need to tell it about our chains.
18:22:23
Bike
https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/cmp/debuginfo.lsp#L254-L259
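The inline chain drmeister describes can be modeled as a linked list of source locations in which each inlined body points at the call site it was inlined into. That is the same shape DWARF uses (nested DW_TAG_inlined_subroutine entries carrying DW_AT_call_file and DW_AT_call_line) and that LLVM's DILocation expresses with its `inlinedAt` field. The Python below is only a sketch of the data structure; `SourceLoc`, `inline_chain`, the file names and the line numbers are hypothetical and are not taken from debuginfo.lsp.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SourceLoc:
    function: str
    file: str
    line: int
    # Location of the call site this code was inlined into (None = not inlined).
    inlined_at: Optional["SourceLoc"] = None


def inline_chain(loc: SourceLoc) -> List[str]:
    """Walk from the innermost inlined body out to the real (outermost) function."""
    chain = []
    cur: Optional[SourceLoc] = loc
    while cur is not None:
        chain.append(f"{cur.function} at {cur.file}:{cur.line}")
        cur = cur.inlined_at
    return chain


# Hypothetical files and lines mirroring CONSP -> CDR -> SECOND -> user code.
user = SourceLoc("USER-FUNCTION", "example.lisp", 12)
second = SourceLoc("SECOND", "cl.lisp", 40, inlined_at=user)
cdr = SourceLoc("CDR", "cl.lisp", 20, inlined_at=second)
consp = SourceLoc("CONSP", "cl.lisp", 5, inlined_at=cdr)

for frame in inline_chain(consp):
    print(frame)
```

A debugger that understands the chain can show the CONSP line and still report that it is running inside SECOND inside the user's function, instead of falling into a source-info dead-zone.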
20:02:36
drmeister
Why does Sunday afternoon debugging always seem so painful? Everything slows down. I'm compiling hot functions with __attribute__((optnone)) and linking takes 20 min. Connecting the debugger and getting to the breakpoint takes 20 minutes. And I have to do this repeatedly to get to the state where I can figure out what is going on. GARAHGHAGH
20:05:55
drmeister
Meanwhile I'm screaming at my computer to hurry up. Well, that's what I'm doing in my mind.
21:06:27
drmeister
Actually I know - it allocated thousands of bytes for a BitVectorNs_O and only used 64 of them.
21:07:55
drmeister
Spot checked array_int2.h and array_int4.h - they are correct. It was just array_bit.h
21:09:35
drmeister
I also wasted a couple of hours today because I was editing files in two repositories cando/ and clasp-badge/ and making changes in one repo and thought they were in the other one. So things were building weird and my changes were not having any effect. I tend to do that Sunday afternoons.
21:20:23
drmeister
I don't think it causes the type inference problem. With boehm I think the bug would be invisible. It wreaks havoc in MPS though.
21:21:56
drmeister
This bug creates a mismatch between the amount of memory allocated for a BitVectorNs_O and what obj_skip thinks the size of the BitVectorNs_O should be.
21:24:28
drmeister
I'm not going to try and work it through my noodle. I've fixed it and I'll merge it and then we can see.
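The `Bad size calc` message earlier in the log reports allocate_size 160, obj_skip 96 and delta 64, which points at the invariant in play: the bytes requested when the object is allocated must equal the size obj_skip later recomputes for it. The Python below is a sketch of that check under stated assumptions. `HEADER_BYTES`, the 64-bits-per-word packing, and the one-byte-per-bit over-allocation are hypothetical illustrations, not the actual BitVectorNs_O layout or the actual array_bit.h bug.

```python
# Sketch of the invariant behind the "Bad size calc" assertion in gcalloc.h:
# the allocation size must equal the size obj_skip recomputes later.
# Layout numbers below are hypothetical, not Clasp's real BitVectorNs_O layout.
WORD_BYTES = 8
WORD_BITS = 64
HEADER_BYTES = 32  # hypothetical header + fixed slots


def bitvector_size_from_bits(nbits: int) -> int:
    """Bytes for a bit vector that packs 64 bits into each 8-byte word."""
    words = (nbits + WORD_BITS - 1) // WORD_BITS
    return HEADER_BYTES + words * WORD_BYTES


def bitvector_size_one_byte_per_bit(nbits: int) -> int:
    """Hypothetical over-allocating calculation: reserves a byte for every bit."""
    return HEADER_BYTES + nbits


def check_size(allocate_size: int, obj_skip_size: int) -> int:
    """Mirror the allocator's consistency check; raises if the sizes disagree."""
    delta = allocate_size - obj_skip_size
    if delta != 0:
        raise AssertionError(
            f"Bad size calc allocate_size -> {allocate_size} "
            f"obj_skip -> {obj_skip_size} delta -> {delta}"
        )
    return delta


nbits = 1000
allocated = bitvector_size_one_byte_per_bit(nbits)   # what a buggy allocator asks for
skipped = bitvector_size_from_bits(nbits)            # what obj_skip would step over
print(allocated, skipped)
```

If the two functions disagree, a collector that walks the heap with obj_skip steps into the middle of the over-sized allocation and misreads what it finds there, which fits drmeister's remark that the bug is invisible under Boehm but wreaks havoc in MPS.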
21:25:32
Colleen
karlosz: drmeister said 5 hours, 16 minutes ago: I found the type error at startup problem. I have a reproducer. It's probably type-inference running on dead code. Check the logs for the last two hours.
21:33:10
karlosz
has anyone tried just turning off type inference via the variable and testing to see if it reproduces
21:33:34
karlosz
if it's buggy type inference, test code and the bad hir will make it clear whether type inference is doing something it's not supposed to be doing
21:36:13
drmeister
The type error is intermittent. It depends on whatever junk it reads out of the badge in NIL.
21:37:46
Bike
i don't think it has anything to do with badges, i saw it in a branch of master i have
21:38:19
drmeister
Bike: it's not anything to do with badges, it's just if it tries to read the CDR out of NIL in my system it will get the badge.
21:39:53
drmeister
I can start icando-mps fine - but when I exit I get a bunch of MPS assertions. (sigh). My work is never done.
21:41:50
Bike
first, second, third don't have compiler macros, they're just defined as inlinable functions
21:42:50
Bike
car is defined as (if (typeq x cons) (primop:car x) (if (null x) nil (error ...))), first as the obvious cars and cdrs
21:45:03
drmeister
I can compile-file the example and then start up a new clasp and load the fasp file and run it.
21:47:44
Bike
i mean i don't see any way that eliminating typeq CONS in this code could be valid (without path replication or whatever, which we are not doing)
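Bike's point can be seen in a toy Python model of his CAR definition, with CDR and SECOND built the same way. `None` stands in for NIL and `Cons` is a hypothetical stand-in for a cons cell; this is an illustration of the semantics, not Clasp's code.

```python
# Toy model of (if (typeq x cons) (primop:car x) (if (null x) nil (error ...))).
class Cons:
    __slots__ = ("car", "cdr")

    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr


NIL = None


def lisp_car(x):
    if isinstance(x, Cons):       # (typeq x cons)
        return x.car              # primop:car: raw load, valid only on a cons
    if x is NIL:                  # (null x)
        return NIL
    raise TypeError(f"{x!r} is not a list")


def lisp_cdr(x):
    if isinstance(x, Cons):
        return x.cdr              # raw load, analogous to primop:car
    if x is NIL:
        return NIL
    raise TypeError(f"{x!r} is not a list")


def lisp_second(x):
    # SECOND as (car (cdr x)); after inlining, each call keeps its own CONS test.
    return lisp_car(lisp_cdr(x))


def cdr_with_typeq_eliminated(x):
    # What the miscompiled code effectively did: skip the CONS test, do the raw load.
    return x.cdr
```

In this model, dropping the test turns a defined result (NIL) into an exception. Compiled code has no such guard, so the raw load simply reads whatever memory sits at the CDR offset of NIL, which matches the intermittent type error reported above.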
21:48:00
Bike
i don't know how those variables work any more. i just put cleavir-ir-graphviz calls in my-hir-transformations
21:48:02
drmeister
I don't think so - but I always have to dig into the code to find the arcane incantation necessary to generate HIR graphs.
21:48:50
drmeister
We have a new tool that we haven't used yet. Yitzi wrote a HIR viewer with cytoscape.
21:50:59
drmeister
seg.c: gcseg->buffer == NULL. --> The client program destroyed a pool without first destroying all the allocation points created on that pool. The allocation points must be destroyed first.
21:55:34
drmeister
I'll get this MPS assertion put to bed and then I'll generate the HIR and post it.
21:56:11
drmeister
I've been working on getting the MPS to work with Cando for years. I'm monomaniacal.
22:02:27
Bike
so if i set *eliminate-typeq* to nil the bug doesn't happen, right? is there a sensible way i can set that up for build while i'm working on other stuff?
22:12:21
karlosz
OK, my current hypothesis is that i made an assumption that maphash iterates over entries in the order in which the key-value pairs were last added to the table
22:14:04
karlosz
that would be the easiest way to eliminate non determinacy from the algorithm i guess
23:14:12
Bike
or, well, if the keys are hir instructions or something, you'd have to have some kind of order that preserves their relative position between compiles
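Common Lisp leaves MAPHASH iteration order unspecified, so one way to get the stable order Bike describes is to give each key a serial number when it is created and sort by that instead of relying on the table. The Python below sketches that idea; `Instruction`, `serial` and `ordered_items` are hypothetical names, not Cleavir's API. Python dicts happen to iterate in insertion order, so the model does not reproduce the nondeterminism itself; it only shows the fix.

```python
import itertools
from dataclasses import dataclass, field

# Serial numbers are handed out at creation time, so they follow program order,
# not hashing or table layout.
_next_serial = itertools.count()


@dataclass(eq=False)  # eq=False keeps identity-based hashing, like an EQ hash table
class Instruction:
    name: str
    serial: int = field(default_factory=lambda: next(_next_serial))


def ordered_items(table):
    """Iterate a table keyed by instructions in creation order, independent of table order."""
    return sorted(table.items(), key=lambda item: item[0].serial)


a = Instruction("a")
b = Instruction("b")
c = Instruction("c")
table = {c: "third", a: "first", b: "second"}   # arbitrary insertion order
print([inst.name for inst, _ in ordered_items(table)])
```

This only preserves relative position between compiles if instructions are created in the same order each time, which is the property Bike's comment asks for.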
1:10:04
drmeister
I need to start up an lparallel kernel when cando starts, and shut it down when cando finishes.
1:11:37
drmeister
It's expensive to start lparallel kernels, so the lparallel docs say a kernel is expected to live for the lifetime of the lisp session.
1:12:42
drmeister
I gotta figure out how to ensure that (lparallel:end-kernel lparallel:*kernel*) is evaluated when cando shuts down.
1:14:50
Bike
we'd only need to do that to avoid some warnings, right? i mean, cando is already shutting down
1:21:17
drmeister
So I'll implement something like atexit - but it has to happen at the right time, before all of the MPS teardown starts happening - otherwise I get those assertion failures.
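The requirement here is an atexit-like registry whose hooks are guaranteed to run before the GC teardown starts. The Python below is a hypothetical sketch of that ordering, not an existing Clasp or Cando facility; the "end-kernel" hook stands in for evaluating (lparallel:end-kernel lparallel:*kernel*). Hooks run in reverse registration order, as C's atexit and Python's atexit module both do.

```python
from typing import Callable, List


class ShutdownSequence:
    """Hypothetical atexit-style registry that runs user hooks before GC teardown."""

    def __init__(self, gc_teardown: Callable[[], None]):
        self._hooks: List[Callable[[], None]] = []
        self._gc_teardown = gc_teardown
        self._done = False

    def register(self, hook: Callable[[], None]) -> Callable[[], None]:
        self._hooks.append(hook)
        return hook

    def run(self) -> None:
        if self._done:                       # make shutdown idempotent
            return
        self._done = True
        for hook in reversed(self._hooks):   # LIFO, like C atexit
            hook()
        self._gc_teardown()                  # only now tear down MPS pools and arena


events = []
shutdown = ShutdownSequence(gc_teardown=lambda: events.append("mps-teardown"))
# Stands in for evaluating (lparallel:end-kernel lparallel:*kernel*).
shutdown.register(lambda: events.append("end-kernel"))
shutdown.run()
print(events)
```

A real version would probably catch errors from individual hooks, so that one failing hook cannot skip the GC teardown.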
3:43:03
drmeister
::notify Bike We can set breakpoints with the bogus DWARF info. Start up iclasp-boehm, connect with udb and do: b cmpintrinsics.lsp:9995 / b cmpintrinsics.lsp:999993, then compile-file something; it breaks when it hits that code.
3:43:27
drmeister
::notify Bike This will help us find it and fix the compiler to generate better DWARF info.