freenode/#clasp - IRC Chatlog
15:09:15
drmeister
For some reason dlsym is returning NULL for everything at snapshot load time. But it works fine when I call (core:dlsym ...) from the CL repl. Grrrrr
15:48:52
drmeister
In the snapshot save function I call dladdr on every address that I need to identify the symbol name and then I dlsym on the symbol name and at 6073/9084 the dlsym calls start failing.
15:51:55
drmeister
But I want to turn my head to the sky and just start screaming incoherently. This is so frustrating.
15:53:14
beach
I guess it's because you are trying to do it the Unix/C++ way with the standard dynamic linker.
15:53:20
drmeister
It's navigating the unix dynamic linking layer and ELF files and macho files and my own stupid bugs.
15:54:42
drmeister
It's because I have to use symbols and basically implement my own linker because I need this to be robust to linking new executables if I want to embed the snapshot within an executable.
15:56:55
drmeister
This looks like it's probably my fault - everything works for about 6000 symbols and then it starts failing. That smells like my problem.
15:58:20
drmeister
This appears to work on linux - but I'm moving the code back into linux to look at the output of this address -> dladdr -> dlsym -> address test.
16:56:13
frgo
drmeister: Some code to stare at? Do you use dlinfo to check symbol visibility? Hey, btw.
18:23:29
drmeister
I have lots of function pointers, method pointers and vtable pointers from objects in memory to addresses in the executable (and later to dynamic libraries).
18:24:18
drmeister
So when I save the snapshot I now use 'dladdr' to convert those addresses to mangled names and I save the mangled names in a symbol table and reference them with an integer offset.
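A minimal sketch of the kind of name table described here, assuming each distinct mangled name is stored once and referenced by its index. The class `SymbolTable` and its methods are hypothetical and are not taken from the clasp source.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical name table: each distinct mangled name is stored once and
// referred to by its integer index.
class SymbolTable {
public:
  // Return the index of `name`, adding it if it has not been seen before.
  std::size_t intern(const std::string& name) {
    auto found = index_.find(name);
    if (found != index_.end()) return found->second;
    std::size_t id = names_.size();
    names_.push_back(name);
    index_.emplace(name, id);
    return id;
  }

  // Return the name stored at `id`; throws std::out_of_range if invalid.
  const std::string& name(std::size_t id) const { return names_.at(id); }

  std::size_t size() const { return names_.size(); }

private:
  std::vector<std::string> names_;
  std::unordered_map<std::string, std::size_t> index_;
};
```

Interning the same name twice returns the same index, so pointers that share a target also share one table entry.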
18:24:49
drmeister
Then when the snapshot loads I'll pass the name to 'dlsym' to recover the address of the function/method/vtable pointer at load time.
18:25:47
drmeister
I've struggled with this for weeks because of reasons I can go into later. But now I think this idea of save-address -> dladdr -> name -> dlsym -> load-address should work.
18:26:54
drmeister
I'm testing this when I save the snapshot - I'm checking if save-address -> dladdr -> name -> dlsym -> save-address works. I'm running into a problem where after about 6050 addresses (out of about 9050) the dlsym starts returning NULL.
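The save-time check described above can be sketched roughly as follows. This is an illustration, not the code in imageSaveLoad.cc: the function name `roundTrips` is hypothetical, and it assumes the lookup goes through `RTLD_DEFAULT`.

```cpp
#include <dlfcn.h>

// Returns true when address -> dladdr -> name -> dlsym gives back the
// exact same address.
bool roundTrips(void* address) {
  Dl_info info{};                                  // zeroed for every call
  if (dladdr(address, &info) == 0) return false;   // no image contains it
  if (info.dli_sname == nullptr) return false;     // no symbol name found
  void* again = dlsym(RTLD_DEFAULT, info.dli_sname);
  return again == address;
}
```

A `false` result does not say which step failed. Separating the `dladdr` failure from the `dlsym` failure is what narrows the problem down.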
18:27:56
drmeister
I just got into my office before I started talking and I'm testing one idea and cleaning up the code a bit and then I'll push and post the link.
18:29:43
drmeister
https://github.com/clasp-developers/clasp/blob/future/src/gctools/imageSaveLoad.cc#L1262
18:40:42
drmeister
Backwards - the same ones are failing as forwards - ok - so this isn't too crazy.
18:42:12
frgo
Just by staring at the source for the first time: Move the statement Dl_info info; inside the for loop.
18:42:35
drmeister
frgo: Is RTLD_DEFAULT a good idea? Should I be using the handle of the executable instead?
18:44:07
drmeister
What is your thinking about moving the Dl_info into the loop? I think that would be immaterial.
18:45:14
drmeister
And if you take the time to look at my code and make a suggestion I'll definitely give it a whirl.
18:45:48
drmeister
Reporting back on that - it doesn't make a difference to the result when I move Dl_info into the loop.
18:46:36
frgo
On line 1266: int ret = dladdr( (void*)address, &info ); the struct is re-used again and again without being reset first. We may end up with garbage if dladdr doesn't set all slots.
18:48:20
drmeister
These errors are really weird: OFFSET-FAIL! Address 9071/9084 save the address 0x10771a770 resolved to the symbol and then dlsym'd back to 0x10771a760 delta: 18446744073709551600 symbol: _ZTVN4core11Readtable_OE
18:51:04
drmeister
This only applies to a subset of the failures. With vtable pointers I see that they don't point to symbol/addresses - but rather they point to symbol/address+0x10
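The reported delta, 18446744073709551600, is 2^64 - 16: the difference 0x10771a760 - 0x10771a770 = -0x10 printed as an unsigned 64-bit value. One way to handle pointers that land a fixed distance past a symbol is to save the symbol name together with the byte offset from `dli_saddr`, then re-apply that offset after `dlsym`. The sketch below assumes that approach; `SavedPointer`, `savePointer` and `loadPointer` are hypothetical names, not clasp functions.

```cpp
#include <cstdint>
#include <dlfcn.h>
#include <string>

// Hypothetical saved form of a pointer: nearest symbol plus byte offset.
struct SavedPointer {
  std::string symbol;
  std::uintptr_t offset = 0;
};

// Save time: describe `address` relative to the symbol dladdr reports.
bool savePointer(void* address, SavedPointer& out) {
  Dl_info info{};
  if (dladdr(address, &info) == 0 || info.dli_sname == nullptr) return false;
  out.symbol = info.dli_sname;
  out.offset = reinterpret_cast<std::uintptr_t>(address) -
               reinterpret_cast<std::uintptr_t>(info.dli_saddr);
  return true;
}

// Load time: look the symbol up again and re-apply the offset.
void* loadPointer(const SavedPointer& saved) {
  void* base = dlsym(RTLD_DEFAULT, saved.symbol.c_str());
  if (base == nullptr) return nullptr;   // symbol not visible to dlsym
  return static_cast<char*>(base) + saved.offset;
}
```

For a vtable pointer of the kind shown in the log, `offset` would come out as 0x10. This still depends on `dlsym` being able to see the symbol at all.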
18:56:23
drmeister
What's weird here is I can resolve 0x1145b7de0 to _ZTVN4core6Lisp_OE but I can't resolve that symbol back to the address.
19:00:51
frgo
Could you do a dladdr call immediately after the dlsym call? Just to check if a pure call to dlsym doesn't affect location?
19:01:41
drmeister
With the address passed to the first dladdr? Should I check the result or anything?
19:05:37
frgo
So dladdr(address) -> dlsym(symbolname) -> dladdr(address) then check what dl_info tells us.
19:06:34
frgo
It might be that the call to dlsym affects a dyld-internal table and we get changing info on every dlsym call.
19:12:40
frgo
A "Library" object is a clasp compiled object, not some third party library - is that right?
19:14:49
frgo
Also, I'd be curious to see what dlerror(); returns as a message when we don't get a symbol name back.
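A small sketch of the `dlerror` check suggested here. The helper name `lookupWithError` is hypothetical. It clears any stale error, calls `dlsym`, and then captures whatever message `dlerror` reports.

```cpp
#include <dlfcn.h>
#include <string>

// Look up `name`; on failure, copy the dlerror() message into `message`.
void* lookupWithError(const char* name, std::string& message) {
  dlerror();                                  // clear any stale error
  void* address = dlsym(RTLD_DEFAULT, name);
  const char* error = dlerror();              // null if no error occurred
  message = (error != nullptr) ? error : "";
  return address;
}
```

The message text is platform specific, so it is better printed for a human than parsed.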
19:19:29
drmeister
https://github.com/clasp-developers/clasp/blob/future/src/gctools/imageSaveLoad.cc#L1262
19:30:14
drmeister
I mean I can call dlsym with one symbol and get an address and call with another and get NULL.
19:32:35
frgo
Yeah I see that. I was looking at dlfcn.h here: https://opensource.apple.com/source/dyld/dyld-832.7.3/include/dlfcn.h.auto.html
19:33:24
drmeister
Ok so some of these symbols are external and some are local? I'm back to that huh?
19:35:58
drmeister
I don't know why one is external and the other is not - I'm looking at more of them now...
19:40:17
drmeister
Every case that I checked - when dlsym can't resolve the name it is a 's'/local symbol
19:41:33
drmeister
But dladdr will give you the symbol whether or not it is external - and that is helpful at least.
19:42:16
drmeister
Ok - so I have to either figure out why and then make every one of these symbols external - or I need to use '-exported_symbols_list'
19:43:46
drmeister
So why would __ZTVN4core26DerivableCxxClassCreator_OE be external and __ZTVN4core13ClassHolder_OE be local?
19:46:20
drmeister
Yeah: echo __ZTVN4core26DerivableCxxClassCreator_OE | c++filt --> "vtable for core::DerivableCxxClassCreator_O"
19:46:39
drmeister
So I'll call these vtable pointers (although in memory the pointer is actually to this + 16 bytes)
19:52:43
drmeister
Huh - so far every one of these local symbols is a vtable pointer - like: _ZTVN4core15VariadicMethoidILi0ENS_6policy5claspEMN4chem19StereoInformation_OEFvvEEE
19:59:20
drmeister
I have about 9050 addresses that I need to resolve to symbols with dladdr at snapshot save time.
20:00:40
drmeister
On linux - it's not a problem. Either they are all external linkage or dlsym on linux doesn't care
20:03:52
frgo
A completely different route on macOS may be to use the ImageLoader class -> https://opensource.apple.com/source/dyld/dyld-832.7.3/src/ImageLoader.h.auto.html
20:05:41
drmeister
Nah - I've already got this. The problem looks like a small one now - just figure out how to get these vtable pointers with external linkage.
20:13:10
frgo
So, there are vtable pointers that are exported? If so, I seem to see that you can ignore the ones that are not exported.
20:14:32
drmeister
I can't save a snapshot and then load it back on macOS until I figure out how to make these external.
20:15:23
drmeister
I'm going to collect them in a file and try adding '-exported_symbols_list <symbol-list>' to the link command line.
20:22:21
drmeister
That appears to define the ENTIRE symbol list to export. All sorts of new problems arise if I just list the symbols I want to add to the list of exported symbols.
20:24:06
frgo
Add in the fact that macOS is based on BSD - which, strictly speaking - is not a Unix and even has a Mach kernel, then yeah - Oh for crying out loud.
20:30:48
drmeister
If I can use dladdr/dlsym - I'd rather do that. Figuring out the absolute value of symbols that I read from symbol tables from ELF and macho files is a massive headache.
20:31:43
drmeister
For the last couple of weeks I've been trying one thing after another and every approach has had challenges and nightmares.
20:32:40
drmeister
Right now the problem appears to be reduced to "macos vtable pointers are sometimes external and sometimes internal linkage - how do I control it?"
21:01:54
frgo
drmeister: Does clasp use clang with -fvisibility=hidden? (I can't recall if it is even the default now). If so, explicitly making a symbol external is done using something like #define EXTERNAL __attribute__((visibility("default"))) and use that to mark the failing symbols with that. Just a possibility - and not a good one as it needs code changes.
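For illustration, this is roughly what the suggested attribute looks like on a class with GCC or clang. `Example_O` is a hypothetical class, not a clasp type. Nothing in this log establishes whether the attribute changes what the macOS linker does with the vtable symbol.

```cpp
// Macro name taken from the suggestion above.
#define EXTERNAL __attribute__((visibility("default")))

// With GCC and clang, a visibility attribute on the class also applies
// to its vtable and typeinfo symbols.
class EXTERNAL Example_O {
public:
  virtual ~Example_O() = default;
  virtual int answer() const { return 42; }
};
```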
21:04:22
drmeister
I am looking at classes and I can't see any obvious difference between classes that have external vtable symbols and those that have local vtable symbols.
21:11:41
drmeister
Templated ones are a big part of the problem - but I see regular classes as part of the problem as well.
21:27:37
drmeister
I'm writing small test cases now and compiling them. vtable symbols are all external
22:03:56
drmeister
The vtable symbol for my classes starts out in an object file with external visibility but in the executable some of them become local - so it's the linker (macOS) that makes the call. Googling how linkers decide symbols have external or local visibility.