Wednesday, 17th of May 2017, 18:23:59 UTC
19:14:14
phoe
drmeister: Networking issues on origin should be resolved now.
19:14:19
phoe
If they aren't - let me know immediately.
23:50:35
drmeister
Why would an llvm 'add' be converted to an 'or'?
23:51:13
stassats
sometimes they're equivalent
23:51:24
stassats
when some of the bits are clear
23:52:00
drmeister
https://www.irccloud.com/pastebin/arOT3lvV/
23:52:19
drmeister
closure is a tagged pointer with the low bit 1
23:53:18
drmeister
It's being converted to this llvm-ir
23:53:19
drmeister
https://www.irccloud.com/pastebin/vvcqmpVq/
23:53:43
stassats
but OR and ADD have the same throughput and latency on, say, haswell
23:53:55
stassats
maybe some other arches have cheaper OR
23:54:12
drmeister
but they aren't equivalent here.
23:54:38
drmeister
I need a: add i64 %ptrtoint, 7
23:54:47
drmeister
I asked for an add - I need an add.
23:58:24
drmeister
I do. Essentially I have a tagged pointer with low bits 0001, I need to add 7 (0111), I need to get 1000
23:58:35
drmeister
The 'or' will give me 0111
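[Editor's note: the arithmetic above can be checked with a minimal C sketch. The helper names offset_add and offset_or are hypothetical and exist only for illustration; they are not part of Clasp or LLVM. The sketch assumes 64-bit addresses held in a uint64_t. When the low three bits of the input are clear, adding 7 and OR-ing 7 agree because no carry occurs; when the low bit is set, as with the tagged closure, they differ.]

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* What was asked for: add 7 to the (possibly tagged) address. */
uint64_t offset_add(uint64_t p) { return p + 7; }

/* What the optimizer emitted: OR in 7. This equals the add only if
   the low three bits of p are already zero (no carry can occur). */
uint64_t offset_or(uint64_t p) { return p | 7; }
```

[For an 8-byte-aligned input such as 0x1000 both helpers return 0x1007. For the tagged input 0x1001 the add returns 0x1008 (low bits 1000) while the OR returns 0x1007 (low bits 0111), which is the mismatch described above.]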
0:03:19
stassats
then your types are incorrect
0:04:53
drmeister
Hmmm, maybe my types are incorrect.
0:05:21
stassats
do they assume alignment?
0:05:26
drmeister
My types are wrong.
0:05:41
drmeister
I need to dereference closure
0:05:44
stassats
llvm may also be broken
0:06:24
stassats
doesn't llvm have a better instruction for offsets in load instructions?
0:06:43
stassats
cause effective addresses can express addition and whatnot
0:08:19
stassats
you have %entry-point-addr, align 8
0:08:24
stassats
what does align 8 mean?
0:09:38
stassats
googling stuff just confirms my opinion that llvm is badly documented and has a terrible API
0:11:17
drmeister
No, I needed to dereference the pointer. The {}** must be being treated as aligned, and 'or' is equivalent to 'add' in that case.
0:11:49
stassats
but there should be no OR in real machine code
0:12:07
drmeister
align 8 means what it says - the pointer is aligned to 8-byte words
0:12:11
stassats
it should be MOV ABC, [PTR+7]
0:18:10
stassats
just tried to look at what ((int*)x)[10] from C would look like in IR
0:18:19
stassats
it's %4 = getelementptr inbounds i32, i32* %3, i64 10 %5 = load i32, i32* %4, align 4
0:23:59
stassats
yours should look something like %4 = getelementptr inbounds i8, i8* %3, i64 7
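[Editor's note: a rough C analogue of the byte-offset approach suggested above, as a sketch under stated assumptions. The closure_t layout (a header word followed by an entry-point word), the tag value 1, and the helper names are all hypothetical; the log only says the closure is a tagged pointer with the low bit set and that 7 must be added. The point is that the offset is applied as byte-pointer arithmetic on the tagged address, so the compiler sees an address computation rather than integer math on an assumed-aligned value.]

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: two 8-byte slots, so the second slot sits
   at byte offset 8 from the untagged base address. */
typedef struct {
    uint64_t header;
    uint64_t entry_point;
} closure_t;

/* Assumed tag: low bit set, as described in the log. */
#define CLOSURE_TAG ((uintptr_t)1)

uintptr_t tag_pointer(closure_t *c) {
    return (uintptr_t)c | CLOSURE_TAG;
}

/* Read 8 bytes at byte offset 7 from the tagged address. Because the
   tag contributes 1, this is byte offset 8 from the real object, i.e.
   the entry_point slot. memcpy avoids alignment and aliasing issues. */
uint64_t load_entry_point(uintptr_t tagged) {
    uint64_t value;
    memcpy(&value, (const char *)tagged + 7, sizeof value);
    return value;
}
```

[With a closure_t initialized to {0x11, 0x22}, load_entry_point(tag_pointer(&c)) reads the second slot and yields 0x22. On x86-64 a compiler can typically fold such a constant offset into the load's addressing mode.]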
0:25:20
drmeister
Yeah - but casting and pointer arithmetic is easier on my brain at the moment.
0:25:52
drmeister
I'll do a search for ptrtoint later and change things to getelementptr - there's only a handful of these.
0:28:06
stassats
well, presumably this gep thing will allow encoding the offset in the load instruction and save on a temporary register
0:32:27
stassats
just looking at return (x & ~7) + 3;
0:32:34
stassats
%2 = and i32 %0, -8 %3 = or i32 %2, 3
0:32:44
stassats
so it does convert to OR when it thinks the low bits are clear
0:33:13
stassats
but that's in IR, shouldn't that be the job of whatever optimizes stuff to machine code?
0:34:20
drmeister
Optimization happens at the IR level - it may happen at others - but I'm really familiar with the IR level.
0:35:29
stassats
ADD to OR is a bit silly, at least in this case
0:35:38
stassats
though i can see (x & ~7) + 7 just going to OR 7
0:36:41
drmeister
The dereference did the trick - now cleavir is compiling things.
0:37:09
drmeister
I still have an exception handling bug. I have a stack unwind that is skipping a landing pad.
0:37:25
drmeister
This almost certainly means I have a CALL where I need an INVOKE.
0:37:31
drmeister
These are tough to find.
0:38:43
stassats
now google isn't working for me, great
0:41:47
stassats
"It's not you, it's us Bing isn't available right now, but everything should be back to normal very soon."
0:41:51
stassats
are you kidding me
0:43:13
Bike
how mysterious, it works here, except that getelementptr returns the wikipedia page on praseodymium
0:43:41
stassats
things work intermittently
0:48:01
stassats
i doubt any architecture would have different OR and ADD performance characteristics
0:50:04
stassats
on x86-64, that OR goes down to leaq 3(%rdi), %rax
0:50:32
stassats
so it does pass around the information about set bits
1:13:08
drmeister
Bike: dictionary.lisp - there is an apply with no test for call-arguments-limit
1:14:04
drmeister
What do I do with that?
1:14:18
drmeister
Clasp now has a 64 argument limit for funcalls.
1:16:18
drmeister
https://www.irccloud.com/pastebin/i6iA2Bc0/
1:17:07
drmeister
The issue is I have to set the limit somewhere and what do I do with APPLY's like this?
1:17:22
stassats
64 is far too small
1:17:46
drmeister
A few days ago you said that was enough for anyone?!?
1:17:56
drmeister
I based my life on your teachings.
1:18:15
stassats
you're mixing me up with Bill Gates
1:18:18
drmeister
But whatever I set it to - someone is going to hit the limit.
1:19:07
stassats
that's why you make it unlimited
1:19:12
drmeister
What would you set it to if you had to set a limit?
1:19:31
stassats
65535, if i really had to
1:19:43
drmeister
The only way I can see to make it unlimited would be to generate them for higher arities as they are needed and cache them.
1:19:54
stassats
but if you can make it 64, you can make it any number
1:20:12
drmeister
I have these monsters though...
1:21:13
drmeister
https://gist.github.com/drmeister/7495fed5dff16eb9203da7e7062449b2
1:21:52
stassats
is that to prove my point that llvm is poorly designed?
1:22:05
drmeister
That's more of a C/C++ problem
1:22:31
drmeister
I think so. ECL has the same thing.
1:22:46
stassats
ecl doesn't have access to assembly
1:22:59
drmeister
Neither does llvm.
1:23:18
drmeister
It generates assembly - but it doesn't have any better access to it.
1:25:03
stassats
well, what can i say
1:25:12
stassats
this is all really bad
1:26:16
stassats
i'd inline some assembly
1:33:48
Bike
ecl has the same thing but has a higher call-arguments-limit somehow.
1:34:13
drmeister
it may be a limit in the byte compiler but not the C compiler.
1:34:18
drmeister
I'm guessing here.
1:38:46
drmeister
Bike: Could we convert that into something that doesn't blow APPLY?
1:40:14
stassats
(reduce #'bag-join (cleavir-ir:predecessors instruction) :key (lambda (pred) (arc-bag pred instruction dictionary))) you mean?
1:40:21
Bike
ecl allows 200 arguments every way i can think of to compile something
1:40:38
Bike
and yes, that could be done, but i'd rather there be a higher call-arguments-limit
1:41:10
drmeister
What I'm wondering - but haven't stated explicitly is...
1:42:13
drmeister
The way it's written - using APPLY, isn't it possible that since there's no limit to the number of predecessors of an instruction (large functions) that there is no safe, low limit for funcall that will work?
1:43:19
drmeister
I can set the limit to 140 - but above that the Google 'pump' script goes nuts for some reason. Google 'pump' is a Python program that generates code based on a template.
1:45:30
drmeister
At a limit of 140 I can compile-file predlib.lsp
1:45:39
drmeister
I'm going to try and build cclasp with this.
1:45:41
stassats
reduce will not cons
1:46:55
drmeister
Bike: Would this be ok?
1:46:56
drmeister
https://www.irccloud.com/pastebin/jA40Pktp/
1:47:28
Bike
yeah. if it works i'll just push it to beach
1:48:15
Bike
little surprised it came up there. i guess an instruction would have a lot of predecessors if you had a tagbody with a lot of tags, maybe
3:12:28
drmeister
cclasp is building again
3:13:09
drmeister
It's not super fast
3:13:56
drmeister
But this is bclasp - the last big hurdle is all lexical variables in activation frames.
3:14:53
drmeister
The impact should be seen in cclasp.
4:27:21
beach
Good morning everyone!
Thursday, 18th of May 2017, 6:23:59 UTC