freenode/#clasp - IRC Chatlog

15:47:48 drmeister Hello everybody

15:48:03 Bike good morning.

15:48:14 drmeister In llvm9 - this is what the layout of a Code_O object would look like by default...

15:48:21 drmeister https://www.irccloud.com/pastebin/MDve6YJV/

15:49:36 drmeister It starts with a big chunk of code followed by the __bss section <<-- this contains the literal table.

15:50:17 drmeister The last line cc_initialize_gcroots_in_module shows where the roots are - they start at 0x125cf3040 and there are 5 roots.

15:50:24 drmeister That's all I need to scan.

15:50:35 karlosz Bike: you were right to suspect negative handling is the issue. currently i got the dumping and loading code working for positives but negatives are all loaded back as 0

15:50:44 karlosz is there anything to watch out for with mp and signed integers?

15:50:54 drmeister The maintainer for boehm got back to me. It looks like I understand the tags and boehm header word.

15:51:09 Bike the trick is that bignums are represented as sign-magnitude instead of two's complement. the sign is indicated as the sign of the length

15:51:20 Bike so -1 would be represented as one 0x1 byte and a size of -1
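A minimal sketch of the sign-magnitude layout Bike describes. This is a hypothetical struct loosely modelled on GMP's convention, where the magnitude sits in unsigned limbs and the sign of the length carries the sign of the number. It is not clasp's Bignum_O. It only handles values that fit in one 64-bit limb, which is enough to show why a reader that assumes two's complement goes wrong on negatives.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sign-magnitude bignum: limbs hold |value| (least significant
// limb first) and the sign of `size` is the sign of the value.
struct SignMagnitude {
    int64_t size;                 // limb count; negative for negative values
    std::vector<uint64_t> limbs;  // magnitude only, never two's complement
};

SignMagnitude from_int64(int64_t value) {
    SignMagnitude out{0, {}};
    if (value == 0) return out;   // zero: size 0, no limbs
    // Take |value| in unsigned arithmetic so INT64_MIN does not overflow.
    uint64_t magnitude = value < 0 ? 0 - static_cast<uint64_t>(value)
                                   : static_cast<uint64_t>(value);
    out.limbs.push_back(magnitude);
    out.size = value < 0 ? -1 : 1;
    return out;
}

int64_t to_int64(const SignMagnitude& n) {
    if (n.size == 0) return 0;
    uint64_t magnitude = n.limbs[0];
    // Negate in unsigned space, then convert back to a signed value.
    return n.size < 0 ? static_cast<int64_t>(0 - magnitude)
                      : static_cast<int64_t>(magnitude);
}
```

With this layout, `from_int64(-1)` yields `size == -1` and a single limb equal to 1, matching the "-1 is 0x1 with a size of -1" description.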

15:51:29 karlosz ohhhh

15:51:34 karlosz that would explain it

15:51:53 Bike are you using ash and stuff instead of grabbing the mp_limb_t's directly?

15:51:55 drmeister There is one that describes a length in bytes, which can be smaller than the size of the objects. It describes the amount of memory that needs to be scanned at the beginning of the object.

15:52:10 drmeister If I could get the literals at the beginning of the object I could use that.

15:52:22 drmeister Instead I'll use a marking procedure.

15:52:34 karlosz Bike: this is how i'm setting the bytes of a 0 bignum to initialize it: https://paste.gnome.org/p4li5jv9t

15:53:00 karlosz oh wait

15:53:10 karlosz that could be a lot simpler by casting the pointer to a char*

15:53:14 drmeister I'm going to generate a "boehm header" for every class that the static analyzer handles.

15:53:37 Bike shouldn't you be able to work with the mp_limb_t directly? like write out limbs to the file instead of bytes? I'm pretty sure we write and read other multi byte integers okay

15:54:13 drmeister I'll use boehm-bitmaps for regular objects, boehm-lengths for vectors of gc managed pointers and a marking procedure for code objects.

15:54:30 Bike oh so you figured out how to mark vectors in boehm, huh

15:54:35 drmeister It's nice that they set it up to be so flexible.

15:55:01 drmeister Yeah - I learned some stuff as well. I should be using bitmaps more.

15:55:20 drmeister Even for MPS - it will simplify things.

15:56:53 drmeister They use bitmaps where the most significant bit represents the first word and they shift left, treat the word like a signed int64 and if the value is negative they mark the corresponding word.

15:57:09 drmeister If the bitmap becomes zero - they stop.

15:57:27 drmeister It leads to nice, tight loops.
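The bitmap loop drmeister describes can be sketched as follows. This is a hypothetical illustration of the technique, not Boehm's source: it ignores how a real Boehm descriptor is encoded and takes the bitmap as a plain 64-bit word, and `mark_word` is a placeholder callback standing in for the collector's marking step.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bit 63 (the most significant bit) describes word 0 of the object,
// bit 62 describes word 1, and so on. mark_word is called for every word
// whose bit is set.
template <typename MarkFn>
void scan_with_bitmap(const uintptr_t* object, uint64_t bitmap, MarkFn mark_word) {
    const uintptr_t* word = object;
    // Viewed as a signed int64, the bitmap is negative exactly when its top bit is set.
    int64_t bits = static_cast<int64_t>(bitmap);
    while (bits != 0) {              // no set bits remain: stop early
        if (bits < 0) mark_word(*word);
        // Shift in unsigned space to avoid undefined behaviour on negative values.
        bits = static_cast<int64_t>(static_cast<uint64_t>(bits) << 1);
        ++word;
    }
}
```

The early exit on a zero bitmap is what keeps the loop tight: an object whose pointer fields all sit near the start is scanned without touching its trailing words.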

15:59:06 drmeister Precise mode will work with Boehm. Then I'll be well on my way to implementing image save/load.

15:59:33 drmeister I have a lot of other stuff to deal with so I'm trying to multitask on this.

16:08:41 karlosz Bike: it doesn't seem like we dump out any other integers besides fixnums, which aren't handled the same way as mp_limbs. is there a way to get at the limbs directly in lisp? the dumping code is in cmpliteral

16:09:01 Bike i don't think so, but it could be written pretty easily

16:09:02 karlosz so i don't see a way to frob the raw limbs from lisp without consing strings

16:09:13 karlosz ah right

16:09:16 karlosz by exposing it

16:09:18 Bike it just hasn't been needed until now

16:12:07 karlosz i guess there's no way to avoid assuming what an mp_limb_t is typewise though

16:16:23 Bike maybe not.

16:18:37 karlosz and we can't return the mp_limb_t directly because it doesn't fit in a fixnum

16:18:52 karlosz so i actually don't see how to expose getting the limbs directly to lisp

16:23:01 Bike CL_DEFUN mp_size_t bignum_size(Bignum_sp big) { return big->length(); }

16:23:07 Bike er, core__bignum_size

16:23:10 Bike that kind of thing.

16:23:58 karlosz yeah, the length i get

16:24:18 karlosz but in terms of the actual limb data, i guess we'd have to cons a bignum

16:24:41 karlosz or do return values in CL_DEFUNs automatically get bignum boxed?

16:26:49 karlosz i guess i should just try it

16:29:36 drmeister Why do you need to expose mp_limb_t to lisp?

16:30:27 drmeister No need to answer - I'll trust that you do.

16:30:40 Bike so that cmpliteral can dump them directly

16:31:26 Bike but yes, for numbers outside of fixnum range it would have to cons a bignum

16:31:39 drmeister Ah - can we not do this all in C++? Because yeah - they won't fit in a fixnum.

16:31:42 Bike you can have the limb retrieval return an Integer_sp and then use Integer_O::create

16:32:02 drmeister I've tried over and over again to fit 64 bits into 62 - no go.

16:32:45 drmeister How are they coded right now? As strings?

16:33:57 karlosz drmeister: yeah, they are dumped as strings, which actually has a measurable performance impact

16:34:57 drmeister Yeesh - that is unexpected (performance impact). I thought this would be rare.

16:35:20 drmeister We can write ltvc_xxx functions to read and write vectors of limbs - then everything is done in C++.

16:35:47 karlosz some numeric libraries have lots of (signed-byte 64) and (unsigned-byte 64) declarations, which causes +/-2^64 to be dumped a lot

16:36:25 karlosz it's mostly macros and type declarations generating the bignums literally

16:37:50 drmeister The literal runtime uses C++ functions to read and write objects for the literal bytecode interpreter.

16:38:58 drmeister We could add a reader/writer for vectors of limbs. I think we could code them in the bytecode stream as a size followed by size number of 64bit words.

16:39:07 drmeister The bytecode stream is a vector of raw bytes.

16:40:01 drmeister This function reads and initializes a literal bignum...

16:40:02 drmeister https://github.com/clasp-developers/clasp/blob/dispatch/src/core/byte-code-interpreter.cc#L114

16:41:00 drmeister The byte_index is where in the literal bytecode vector it starts reading.

16:41:23 drmeister It reads a tag - I think this tells the GCRootsInModule what to do with the object that we are about to create.

16:41:45 drmeister Objects can be put in the literal vector at a particular index, or they can be put in a scratch vector at a particular index.

16:42:10 drmeister The literal vector is the final product. The scratch vector is used to store temporary values.

16:42:27 drmeister This: size_t index = ltvc_read_size_t( fin, log, byte_index );

16:42:51 drmeister Reads the index where the new bignum will be stored (depending on tag)

16:43:04 drmeister This: T_O* arg2 = ltvc_read_object(roots, fin, log, byte_index );

16:43:23 drmeister I think this is reading and creating the lisp string that will be converted to the bignum.

16:43:30 drmeister This is the part we want to change.

16:43:36 karlosz yup

16:43:50 drmeister Right - and the next line creates the bignum and puts it where it's supposed to go.

16:44:13 karlosz yeah, so i've been thinking about how to do this without consing at all

16:45:20 drmeister This is where the functions that read and write things out of the literal bytecode are defined... It's all C++

16:45:32 drmeister https://github.com/clasp-developers/clasp/blob/dispatch/src/core/compiler.cc#L1466

16:45:52 karlosz right, except cmpliteral needs to actually write out the byte codes

16:46:05 drmeister Why does it need to write out the byte codes?

16:46:35 karlosz let me find it...

16:47:08 drmeister The way it's currently set up, the cmpliteral.lsp file generates C++ code that works with these C++ functions.

16:47:56 karlosz https://github.com/clasp-developers/clasp/blob/d9b83576506a9156a3fa2a83d4967f60938cdc9f/src/lisp/kernel/cmp/cmpliteral.lsp#L339

16:48:01 karlosz that's where we convert the bignum to a string and then construct the bytecode call to ltvc

16:48:11 karlosz passing in the handle to the string that gets dumped by cmpliteral

16:48:35 karlosz so we need to somehow pass the vector of limbs there

16:48:48 karlosz or create an empty bignum and set the limbs with side-effect-calls

16:49:44 karlosz but the argument to ltv/bignum is a lisp object - a Bignum_O

16:50:11 drmeister Yeah - so we don't do this.

16:50:27 drmeister We change the underlying code to work with bignums directly.

16:50:45 karlosz right, so i think that entails changing how parse_ltvc_make_next_bignum works

16:50:54 karlosz it can just take the pointer to the Bignum_O directly

16:51:25 karlosz and parse out the limbs from there?

16:51:28 drmeister ltvc_make_next_bignum will basically become a stub.

16:52:13 drmeister https://www.irccloud.com/pastebin/zceZsLNL/

16:52:26 drmeister That's to get started.

16:52:50 karlosz eh, isn't that an infinite recursion?

16:55:41 drmeister Hmmm, thinking.

16:58:03 karlosz oh, maybe we can just add a clause for bignums here: https://github.com/clasp-developers/clasp/blob/d9b83576506a9156a3fa2a83d4967f60938cdc9f/src/lisp/kernel/cmp/cmpliteral.lsp#L558

16:58:17 drmeister I see your point.

16:58:21 karlosz and then the ltv/bignum method will look like the ltv/fixnum method

16:58:28 karlosz writing the bignum into the stream is handled by the c++ functions

16:58:31 karlosz look, no more consing

16:58:44 karlosz and we can just read out the limbs and write into the stream from C++ directly

16:58:55 karlosz going to try that now

16:59:13 drmeister Yeah - there you go.

16:59:54 drmeister We want a C++ function that will write out a vector of limbs and then read it back.

17:00:35 drmeister So it will need to write out: [ N | limb0 .... limbN-1]
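A minimal sketch of the `[ N | limb0 .... limbN-1 ]` layout over a raw byte stream. The function names are hypothetical and are not clasp's actual ltvc_* functions. The sketch assumes 64-bit limbs, carries the bignum's sign in the sign of N (following Bike's sign-magnitude description), and writes every word at full width, least significant byte first, so the format does not depend on host endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Append one full 64-bit word, least significant byte first.
void write_word(std::vector<uint8_t>& out, uint64_t w) {
    for (int i = 0; i < 8; ++i) out.push_back(static_cast<uint8_t>(w >> (8 * i)));
}

// Read one full 64-bit word starting at byte_index and advance past it.
uint64_t read_word(const std::vector<uint8_t>& in, size_t& byte_index) {
    if (byte_index + 8 > in.size()) throw std::out_of_range("truncated stream");
    uint64_t w = 0;
    for (int i = 0; i < 8; ++i) w |= static_cast<uint64_t>(in[byte_index + i]) << (8 * i);
    byte_index += 8;
    return w;
}

// Layout: [ N | limb0 ... limb|N|-1 ]; N is negative for negative bignums.
void write_limbs(std::vector<uint8_t>& out, bool negative, const std::vector<uint64_t>& limbs) {
    int64_t n = static_cast<int64_t>(limbs.size());
    write_word(out, static_cast<uint64_t>(negative ? -n : n));
    for (uint64_t limb : limbs) write_word(out, limb);
}

// Returns the limbs and stores the signed limb count in `size`.
std::vector<uint64_t> read_limbs(const std::vector<uint8_t>& in, size_t& byte_index, int64_t& size) {
    size = static_cast<int64_t>(read_word(in, byte_index));
    uint64_t count = size < 0 ? 0 - static_cast<uint64_t>(size) : static_cast<uint64_t>(size);
    std::vector<uint64_t> limbs;
    for (uint64_t i = 0; i < count; ++i) limbs.push_back(read_word(in, byte_index));
    return limbs;
}
```

For example, -2^64 has the magnitude limbs {0, 1}, so `write_limbs(buf, true, {0, 1})` produces 24 bytes: the count -2 followed by two full words. Reading them back restores both the limbs and the sign with no intermediate string.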

17:01:33 drmeister Alternatively you could write it out as a string of bytes. I _think_ you can have any length string containing any characters.

17:02:43 drmeister Note - I have a facility in there so that size_t values are written out in a sort of compressed format

17:03:43 drmeister Look at compact_write_size_t and compact_read_size_t

17:04:09 drmeister I don't think the limbs will be able to use this - they should just write out the full word.

17:04:27 drmeister because they should be random bits - right?

17:04:47 karlosz ah, sounds good, i'll watch out for the compression

17:05:07 karlosz yeah, they will just be full words in general

17:07:57 drmeister the compact_xxx_size_t functions are to take advantage of the fact that we often write out small numbers like 1 and 3 and 9. So why write out 0x0000000000000001 when you can write out '1' followed by the byte 0x01?

17:08:02 drmeister I think that's how it works.

17:08:31 karlosz right, makes sense

17:08:41 drmeister So - in the worst case it looks like: '8' followed by 8 bytes - so nine bytes total.

17:10:05 karlosz right, for bignums that'd be a pessimization

17:10:24 karlosz except for the last word i guess
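A hypothetical length-prefixed encoding modelled on drmeister's description (he hedges that he only thinks this is how it works). It is not the actual compact_write_size_t/compact_read_size_t code. One byte gives the number of payload bytes, followed by the value's significant bytes, least significant first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Write a count byte, then only the significant bytes of value.
void compact_write(std::vector<uint8_t>& out, uint64_t value) {
    uint8_t nbytes = 0;
    for (uint64_t v = value; v != 0; v >>= 8) ++nbytes;
    if (nbytes == 0) nbytes = 1;  // zero still gets one payload byte
    out.push_back(nbytes);
    for (uint8_t i = 0; i < nbytes; ++i) out.push_back(static_cast<uint8_t>(value >> (8 * i)));
}

// Read the count byte, then reassemble that many payload bytes.
uint64_t compact_read(const std::vector<uint8_t>& in, size_t& byte_index) {
    uint8_t nbytes = in[byte_index++];
    uint64_t value = 0;
    for (uint8_t i = 0; i < nbytes; ++i)
        value |= static_cast<uint64_t>(in[byte_index++]) << (8 * i);
    return value;
}
```

Under this scheme the value 1 costs 2 bytes instead of 8, but a word with a nonzero top byte costs 9 bytes instead of 8. Limbs of a typical bignum are effectively random bits, so almost every limb would pay that extra byte, which is why the limbs should be written as full words.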

17:11:18 drmeister What kind of bignums are being written out so often?

17:11:35 drmeister I'm still a bit shocked

17:13:23 karlosz 2^64 and -2^64 mostly

17:13:41 karlosz like when you have things like (signed-byte 64) and (unsigned-byte 64) declarations from numeric libraries

17:13:48 karlosz since those are the bounds on word sizes

17:13:58 karlosz the compiler will dump those out as literals

17:14:25 karlosz so quicklisp libraries that do a lot of number crunching with word-size arithmetic will have those declarations
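A quick check of why word-size type bounds end up as bignum literals. It assumes 62-bit two's-complement fixnums, which is what drmeister's "fit 64 bits into 62" remark suggests; the constant names are made up for illustration. In Common Lisp, (signed-byte 64) is (integer -2^63 2^63-1), and (unsigned-byte 64) reaches 2^64-1, which does not even fit in an int64_t.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: fixnums are 62-bit two's-complement integers.
constexpr int kFixnumBits = 62;
constexpr int64_t kMostPositiveFixnum = (int64_t{1} << (kFixnumBits - 1)) - 1;  // 2^61 - 1
constexpr int64_t kMostNegativeFixnum = -(int64_t{1} << (kFixnumBits - 1));     // -2^61

constexpr bool fits_fixnum(int64_t v) {
    return v >= kMostNegativeFixnum && v <= kMostPositiveFixnum;
}

// Both bounds of (signed-byte 64) fall outside the fixnum range,
// so the compiler has to dump them as bignum literals.
static_assert(!fits_fixnum(INT64_MIN), "-2^63 needs a bignum");
static_assert(!fits_fixnum(INT64_MAX), "2^63 - 1 needs a bignum");
```

Every file that declares such types therefore carries a few bignum literals, which is how the string-based dumping becomes measurable in numeric code.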