freenode/#clasp - IRC Chatlog
14:36:19
drmeister
https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/cmp/cmpintrinsics.lsp#L1100
14:37:39
drmeister
One problem with debugging this is that the (let (... (values (first arguments)) (size (second arguments)) (gf-args (second arguments))) assignment is a source info dead-zone.
14:38:54
drmeister
But there is more than just this bug. The source info dead-zone is a problem as well.
14:41:39
drmeister
https://www.dropbox.com/s/2kzehok7x2kjfm5/Screen%20Recording%202020-08-02%20at%2010.39.40%20AM.mov?dl=0
14:50:20
drmeister
inline core::T_O *raw_() const { return reinterpret_cast<core::T_O *>(this->theObject); }
14:51:42
drmeister
Calling SingleDispatchGenericFunctions_O needs to be changed. It's really inefficient.
15:06:48
drmeister
In both cases gf-args reads the same entry (second arguments) as size (second arguments)
15:08:20
drmeister
AND then finally arguments is NIL and either (values (first arguments)) or (size (second arguments)) or (gf-args (second arguments)) is signalling a type error.
15:11:42
drmeister
It might be that since values (also a terrible name for a variable in CL) is not being used, type inference is removing a type check and then it's trying to read the CAR of NIL?
15:13:00
drmeister
It makes it so much easier to debug this problem because I can back up and go forward over the error over and over again.
15:17:43
drmeister
So it gets NIL from the literal vector. (Aside, udb + my python extension = very powerful)
15:21:08
drmeister
ASIDE: Now - why would it do that? Whatever arguments is - there is no reason to check if it's T
15:25:04
drmeister
That doesn't make sense for NIL - but it would make sense for a CONS cell. 0x5(cons-tagged-ptr) would get the CDR
15:26:07
drmeister
Yeah - then the next thing it does is test if the result in %rdi is another cons-tagged-ptr.
15:33:55
drmeister
kpoeck: I think I see why it's intermittent. It's possible that what it reads out of NIL might avoid a type error and since the results don't get used it keeps going.
15:38:10
drmeister
Here's what mine looks like now. I changed the upper 'values' variable name to 'arg-values' as well.
15:39:52
drmeister
I want to show this to bike and karlosz so they can figure out if it's type inference going awry on dead code.
15:42:44
kpoeck
sbcl would complain about the unused variables, I wonder whether we should try to compile - not execute - our compiler with sbcl
16:04:19
drmeister
I've been learning udb and changing the lldb-clasp extension so that it works with gdb.
16:06:26
drmeister
It's much better than debugging from the command line because you can take in lots of information from the different frames.
16:07:24
drmeister
I've been forced to use lldb through the command line because I don't use xcode and emacs doesn't have an lldb GUI interface AFAIK
16:09:13
drmeister
::notify Bike I found the type error at startup problem. I have a reproducer. It's probably type-inference running on dead code. Check the logs for the last two hours.
16:09:27
drmeister
::notify karlosz I found the type error at startup problem. I have a reproducer. It's probably type-inference running on dead code. Check the logs for the last two hours.
16:21:15
drmeister
crash means I started up clasp, (load (compile-file "/tmp/foo.lisp")) (foo) -> crash
16:28:50
drmeister
I'd like to use emacs with udb and I've heard there is this gud interface in emacs but I've never used it.
16:30:23
kpoeck
My disassembly starts with ; disassemble-assembly Size: 18446744069233803343 Origin: #<POINTER :ptr 0x10ac67fb0>
16:37:31
drmeister
But you can't see what is going wrong unless you single step through the instructions for the (let* (...) ...)
16:38:27
drmeister
It's made more difficult because we don't have good source info when we generate the inline code for (first a ) (second a) ...
16:39:22
drmeister
Because we can back up and go forward over instructions many times and see what the debug info is and decide what we want to see at that point.
17:17:17
Colleen
Bike: drmeister said 1 hour, 8 minutes ago: I found the type error at startup problem. I have a reproducer. It's probably type-inference running on dead code. Check the logs for the last two hours.
17:18:56
drmeister
It reads garbage out of NIL and tries to check the type of that garbage. Sometimes it keeps going and other times it spits up a type error.
17:19:27
Bike
do you know that it's due to type inference? like, does it never happen if type inference is disabled?
17:20:07
Bike
not sure i understand. why does it read garbage out of nil? Like it tries to treat NIL like a cons and takes its car, or something?
17:21:25
drmeister
I step through the instructions with udb and see that it first tests if 'a' is a CONS and it's not, it's NIL but then it loads the CDR of NIL and ... kaboom.
17:22:05
drmeister
With udb I can back up and drive over the instructions over and over and over until even I can figure out what it's trying to do.
17:23:13
drmeister
I think that's what it's doing. It has the tagged pointer for NIL and then it does mov 0x5(%rax),%rdi where %rax is 0x7f2f7c010041 (NIL)
17:23:55
drmeister
I am assuming that mov 0x5(%rax),%rdi is (cdr rax) because that's how we should be doing it. But %rax contains a General_O tagged pointer - so that's a problem.
17:24:21
Bike
well, if that is the problem it'll be obvious in the post type inference hir, probably
17:24:27
drmeister
If %rax contained a cons tagged-ptr then it would end with 0x3 and 0x3+0x5 -> 0x8
17:25:23
drmeister
Also, we should talk about the DWARF that is generated for this. Watch the movie I made.
17:25:44
drmeister
Once it hits the (first arguments) and (second arguments) it goes into a source info dead-zone.
17:26:04
drmeister
https://www.dropbox.com/s/2kzehok7x2kjfm5/Screen%20Recording%202020-08-02%20at%2010.39.40%20AM.mov?dl=0
17:29:39
drmeister
I can back up and go forward over instructions many times and see exactly what DWARF information is available and from the source info around the dead-zone figure out where we are.
17:30:48
drmeister
I can examine registers at any point and then figure out what kind of object they are with the python extension that I extended yesterday to work with gdb and udb.
17:32:17
drmeister
I commented out those useless bindings in cmpintrinsics.lsp and now I can't get it to crash with that type error. Yippeee.
17:36:03
Bike
so to be clear on the utility of the undo thing - the crash is in some region that doesn't have source info for whatever stupid reason, so you just rewind until you end up somewhere that does have source info?
17:36:41
drmeister
In the movie I backed up to near the top of the LET* and then nexti down into the dead-zone.
17:36:59
drmeister
You can see the Common Lisp source lines highlighted in the top window as I move through them.
17:38:00
drmeister
Later I hit the error inside of a call and then I 'reverse-finish' out of the error back into the LET* code but in the dead-zone.
17:38:35
drmeister
'reverse-finish' is like 'finish' but in reverse. It rewinds to the instructions just BEFORE the call that you are in.
17:39:14
drmeister
The udb debugger recreates the state of the machine anywhere between when I attached to the process and where the error happens.
17:41:23
drmeister
Right now I'm looking for a better GUI for udb. The 'tui' interface is primitive but serviceable. I'd rather use the 'gud' interface in emacs - but it's giving me some trouble.
17:42:02
drmeister
I can't get X-forwarding to work through the ThirdLaw VPN we have - I'll need cracauer's help with that.
17:42:28
drmeister
If I got X-forwarding to work I should be able to get the 'ddd' debugger working.
17:43:37
drmeister
Some kind of GUI is better than none because I like having a source view, a disassembly view, registers and local variables views up at the same time.
17:44:41
drmeister
I can give you access to this linux machine (hermes) and teach you how to use it. It's so, so powerful.
17:45:20
drmeister
The Undo company is about to start a "developers program" - up until now they have only sold this thing to large enterprises.
17:45:48
drmeister
My connection to Gareth (who works there now) got me in with the folks at Undo and they gave me an evaluation license for a month.
18:00:34
drmeister
Bike: When you inline at the HIR level - you keep a chain of source-location information for XXX inlined into YYY inlined into ZZZ and so on - yes?
18:00:47
drmeister
I remember we talked about it but I don't recall if you figured out how to do it.
18:04:18
drmeister
But like CONSP would be inlined in CDR, which would be inlined into SECOND and that gets inlined into this code - sort of thing.
18:05:43
drmeister
I don't get a lot of those Fixpoint iterations exceeded threshold limit warnings. It's just that stretch before cl-ppcre where there are several of them.
18:06:01
drmeister
But: include/clasp/gctools/gcalloc.h:274 Bad size calc header@0x7f798513c030 header->stamp_wtag_mtag._value(7692) obj_skip(stamp 480) allocate_size -> 160 obj_skip -> 96 delta -> 64
18:08:36
drmeister
And I wipe out the cl-ppcre compiled code in the quicklisp cache and start icando-mps up again and it reproduces the problem right away.
18:11:35
drmeister
Ok, but you get the idea - that chain of inlines is like the chain of inlines that you get in C and C++. The DWARF spec knows how to handle it. We just need to tell it about our chains.
18:22:23
Bike
https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/cmp/debuginfo.lsp#L254-L259
20:02:36
drmeister
Why does Sunday afternoon debugging always seem so painful? Everything slows down. I'm compiling hot functions with __attribute__((optnone)) and linking takes 20 min. Connecting the debugger and getting to the break point takes 20 minutes. And I have to do this repeatedly to get to the state where I can figure out what is going on. GARAHGHAGH
20:05:55
drmeister
Meanwhile I'm screaming at my computer to hurry up. Well, that's what I'm doing in my mind.
21:06:27
drmeister
Actually I know - it allocated thousands of bytes for a BitVectorNs_O and only used 64 of them.
21:07:55
drmeister
Spot checked array_int2.h and array_int4.h - they are correct. It was just array_bit.h
21:09:35
drmeister
I also wasted a couple of hours today because I was editing files in two repositories cando/ and clasp-badge/ and making changes in one repo and thought they were in the other one. So things were building weird and my changes were not having any effect. I tend to do that Sunday afternoons.
21:20:23
drmeister
I don't think it causes the type inference problem. With boehm I think the bug would be invisible. It wreaks havoc in MPS though.