libera/commonlisp - IRC Chatlog
12:23:43
hayley
Currently reading Mark Stuart Johnstone's thesis (supervised by Paul Wilson, for those playing along at home), in which he estimates, in 1997: "We think it is likely that the widespread use of poor allocators incurs a loss of main and cache memory (and CPU cycles) of over a billion and a half US dollars worldwide per year". lisp123's prior estimate of productivity lost to not using Common Lisp seems quite small in comparison.
12:31:28
beach
Heh, that kind of calculation sounds familiar. I heard it a lot when I spent the year with them.
12:32:13
hayley
It would appear the copy at <https://www.cs.utexas.edu/ftp/techreports/tr97-29.pdf> is cut short though.
12:32:39
beach
Paul Wilson also created a system for compressed memory where paging was done in two levels. The first level was compression and the second was to disk. He did a similar estimate for how much RAM his system would save.
12:33:22
Nilby
hayley: I'm sure that's likely. I think the losses due to many other software practices are even more staggering. But sadly, in practice sbcl hogs more unused memory on my system than even a browser.
12:33:54
hayley
Guess I have to go through my university for access. "Your library or institution may give you access to the complete full text for this document in ProQuest." Yes, that's why I went on ProQuest, thanks.
12:35:28
hayley
Nilby: I don't think I would be able to reproduce that, without configuring SBCL to collect quite infrequently. But some have wanted SBCL to collect more frequently, to reduce the amount of floating garbage.
12:36:34
Nilby
Unfortunately, I have to set dynamic space to physical memory to prevent hard crashes when running out.
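For context, SBCL's heap ceiling is fixed at startup with the --dynamic-space-size runtime option, and a running image can report what was reserved (SBCL-specific; a minimal sketch):
  ;; Bytes reserved for the dynamic space, as set at startup.
  (sb-ext:dynamic-space-size)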
12:39:50
beach
As I recall, any system of automatic memory management needs quite a lot of additional memory, i.e., way more than 6%, so that the collector won't be triggered too often.
12:42:17
Nilby
beach: yes, technically i have like thousands of percent overhead, but the active pages are the only thing that causes trouble
12:42:20
hayley
The time taken in garbage collection is inversely proportional to the space overhead allowed for floating garbage, so in a way no particular value is needed. But 100% to 200% space overhead is common.
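A back-of-the-envelope model of that trade-off (my own sketch, not from the thread; the constants are illustrative): a copying collection does work proportional to the live bytes, and between collections the program allocates (heap - live) fresh bytes, so the GC work per allocated byte shrinks as the headroom grows:
  ;; GC work per byte allocated, for LIVE bytes of live data and a
  ;; heap ceiling of HEAP bytes. Doubling the headroom halves the work.
  (defun gc-work-per-byte (live heap)
    (/ live (- heap live)))

  (gc-work-per-byte 100 200) ; => 1   (100% overhead)
  (gc-work-per-byte 100 300) ; => 1/2 (200% overhead)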
12:43:50
Nilby
unfortunately, it has little to do with real program memory usage; it's just to prevent fatal crashes before "really" running out of memory
12:47:03
hayley
If your maximum heap size is much larger than the memory used, even after including space overhead, it is quite likely most of the heap has no physical memory mapped; last I checked, SBCL does unmap unused pages when possible.
12:49:48
hayley
...right around https://github.com/sbcl/sbcl/blob/master/src/runtime/gencgc.c#L4698-L4706
12:50:14
Nilby
hayley: right, i'm only really concerned with mapped and recently used memory, which the os considers "resident"
12:51:19
hayley
Then your resident memory usage should be somewhere between the size of all live objects in your program, and the maximum heap size.
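For anyone wanting to measure this from inside the image, standard CL offers ROOM (output format varies by implementation), and SBCL has an internal helper (a usage sketch; sb-kernel:dynamic-usage is not a supported interface):
  (room)                     ; prints a heap usage breakdown
  (sb-kernel:dynamic-usage)  ; SBCL-internal: bytes allocated in dynamic space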
12:54:12
hayley
Hm, maybe beach is referring to how much memory is needed to do a copying collection in the worst case (that nearly all objects survive and need to be copied). The worst case would require 100% space overhead, but the worst case doesn't happen too often. And generational collection tends to reduce the amount of memory that must be copied at a time, too.
12:55:46
beach
What is the point of unmapping pages? Is it to avoid having them migrate to secondary memory?
12:56:19
hayley
I can't reproduce that. After loading McCLIM, I see 110MB (nitpick: b for bits, B for bytes) of resident memory, and 1212MB of virtual memory.
12:57:02
Nilby
unmapping probably doesn't make that much of a practical difference, it just makes the OS's job a little easier
12:58:00
Nilby
when you multiply the practical overhead of about 6-10% of physical memory by 50-100 processes, it's pretty bad when it's mostly unused.
12:58:07
beach
Yes, so it doesn't have to migrate it to secondary memory. Is there any other reason?
12:58:57
hayley
For comparison, HotSpot is much lazier about unmapping, cf. <https://shipilev.net/jvm/anatomy-quarks/21-heap-uncommit/>.
13:01:41
hayley
beach: I think not having to migrate garbage pages to secondary memory is enough of a reason. At least, there is a similar problem when using bump-allocation with cache memory; even though everything after the allocation pointer is garbage, attempting to allocate will require garbage to be pulled into cache for seemingly no reason.
13:02:41
hayley
Cliff Click claimed that avoiding the latter phenomenon, by using special instructions in the Azul hardware, reduced the memory bandwidth of Java programs by 30% or so.
13:03:42
hayley
Nilby: In the same conversation with David Moon and Dan Weinreb, Cliff also stated that "swapping is death for GC."
13:05:04
Nilby
yes, i think old lisps, and lisps on older hardware (like azul), were more careful with that
13:05:34
hayley
<https://web.archive.org/web/20100302033837/http://blogs.azulsystems.com/cliff/2008/11/a-brief-conversation-with-david-moon.html> is still a fascinating read.
13:07:53
Nilby
i think it would just be lovely if someone made memory with room for tag bits. you would think current ecc memory could do it, but of course the architecture would have to be modded
13:12:34
Nilby
well, masking out tag bits is not without cost, but also, knowing you always have those bits, you can omit a number of things that lisp has to do that, say, C doesn't
13:13:42
beach
Nilby: It is rare that you actually have to mask out the tag bits in current systems.
13:15:16
beach
Nilby: If the tag 0 is used for fixnums, then addition still works. And in many architectures, it is possible to include the tag as a small constant offset in memory operations.
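A sketch of the fixnum case beach mentions (simulating tagged words as plain integers; these helper names are made up for illustration): with a low tag of 0 occupying one bit, a fixnum n is stored as n << 1, and (a << 1) + (b << 1) = (a + b) << 1, so tagged addition is just machine addition:
  (defun tag-fixnum (n) (ash n 1))     ; n -> tagged word (low bit 0)
  (defun untag-fixnum (w) (ash w -1))  ; tagged word -> n

  (untag-fixnum (+ (tag-fixnum 3) (tag-fixnum 4))) ; => 7, no masking needed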
13:17:35
hayley
I believe it is quite rare. On many processors a constant offset can be added to an address when performing loads and stores to memory. Suppose we have a CONS tag of 7 (as in SBCL); then the instruction for CAR needs to look like Ra <- load (Rb - 7). Similarly, CDR adds 1 (an 8-byte slot offset minus the tag of 7).
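The same arithmetic spelled out, treating addresses as plain integers (an illustration using the constants from hayley's example, not SBCL's actual code):
  (defconstant +cons-tag+ 7)  ; list-pointer lowtag, as in 64-bit SBCL
  (defconstant +word-size+ 8) ; bytes per slot

  ;; A tagged cons pointer P has its car slot at P - 7 and its cdr slot
  ;; at P - 7 + 8 = P + 1; both constants fold into the load instruction.
  (defun car-slot (p) (- p +cons-tag+))
  (defun cdr-slot (p) (+ p (- +word-size+ +cons-tag+)))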
13:19:05
hayley
Though I have seen SBCL be less clever than it could be, sometimes repeatedly unboxing array indices.
13:20:15
Nilby
well, right now i only have one compiler that's fast enough, and y'all know which one it is, and you can check for yourself, and especially try comparing to the output of gcc/llvm
13:25:50
hayley
gilberth and I came up with the idea to put parallel type-checking (and auto-increment) hardware on an old microprocessor, to see if we could make a low-budget "Lisp machine" in some useful sense of the term.
13:28:38
hayley
Depending on the sort of tag check, though, it is likely that most do not slow down the program much. A branch predictor can easily predict that type checks will not fail, and a superscalar processor can then run the (probably unnecessary) check in parallel with other code.
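Continuing the integer sketch above (a simplified scheme where the low three bits hold the tag; not SBCL's exact layout), the check hayley describes might look like this, with an error branch that the predictor learns is almost never taken:
  (defconstant +lowtag-mask+ #b111)

  (defun checked-car-slot (p)
    ;; Predicted not-taken; a superscalar core overlaps it with the load.
    (unless (= (logand p +lowtag-mask+) +cons-tag+)
      (error "not a cons pointer"))
    (- p +cons-tag+))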
13:30:12
Nilby
hayley: I think a small set of mods to current archs could be worthwhile. one doesn't have to go "whole hog" like the lisp machines. just making the typical lisp function call take fewer ops, and maybe mild type/tag things, would be great.
13:32:00
Nilby
current toolchain compilers do so many freakish optimizations, it's weird to see what they generate
13:32:09
hayley
My wishlist for hardware features is quite similar to what Azul did: a read barrier in hardware, and hardware transactional memory. Everything else can be Sufficiently Smart Compiler-ed.
13:34:44
hayley
(It is also worth mentioning that, in my domain, some unboxing ops are inconsequential compared to having to "interpret" the matching automaton. So I still win despite the unboxing overhead.)
13:42:26
Nilby
yes, even the increasing use of llvm as a library can't really be as nice as cl:compile
13:46:24
hayley
If you want some cool architectural changes, I'd recommend reading <https://dl.acm.org/doi/10.1145/3297858.3304006>.
13:55:29
Nilby
hayley: nice paper. that makes me think that if someone just added a memory/object compiler to a lisp, we'd jump 20% in speed
13:58:05
hayley
The architecture described in that paper would be implemented in hardware. Similarly though, in a general sense, CDR coding is a limited sort of compression, and Henry Baker said CDR coding should return due to memory bandwidth limits.
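A toy model of the idea (my own illustration; real CDR-coding schemes tag each cell in the heap): lay a list out in consecutive cells, with a per-cell code saying whether the cdr is simply the next cell, so each element costs about one word instead of a two-word cons:
  ;; Encode a proper list as a vector of (value . code) cells, where
  ;; :NEXT means "the cdr is the following cell" and :NIL ends the list.
  (defun cdr-encode (list)
    (coerce (loop for (x . rest) on list
                  collect (cons x (if rest :next :nil)))
            'vector))

  (defun cdr-decode (v)
    (loop for cell across v collect (car cell)))

  (cdr-decode (cdr-encode '(1 2 3))) ; => (1 2 3)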
13:58:50
Nilby
something that would analyze objects and their use, and make them less filled with zero bits
14:00:28
hayley
For what it's worth, I think Cliff Click did in-memory compression for tabular data that could "decompress into registers". This compression would also reduce memory bandwidth substantially.
14:00:41
Nilby
yes, but with compiler knowledge, e.g. when it can be reasoned that some bits will never be used
14:03:51
Nilby
because i'm weird and like to scroll around memory, it's very obvious how much is wasted, with the exception of compressed media
14:47:17
pjb
hayley: macOS does in-memory compression; before swapping out pages to disk, it swaps them out to compressed memory.
16:22:20
dbotton
Is there a lisp function to return the file name without path and extension from a path or string?
17:20:34
pjb
dbotton: actually: (let ((path "~/.foo")) (or (pathname-name path) (concatenate 'string "." (pathname-type path))))
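For the common case, PATHNAME-NAME alone answers the question; pjb's OR covers implementations that parse a dotfile like "~/.foo" as a type with no name:
  (pathname-name #p"/tmp/dir/file.txt") ; => "file"
  (pathname-name "notes.org")           ; => "notes"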