freenode/lisp - IRC Chatlog
14:32:03
heisig
Sounds like just a few lines of code. Build an EQUAL hash table of forms, walk all files, populate the table and count their appearance.
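As a rough illustration of the hash-table approach heisig describes, here is a minimal Python sketch. It is not Common Lisp and not anyone's actual tool: nested tuples stand in for forms (they compare and hash by value, much as EQUAL compares conses structurally), and the tiny reader handles only parentheses and bare atoms, with no strings, comments, or reader macros. The names `tokenize`, `read_forms`, `subforms`, `count_duplicates`, and the `min_size` cutoff are assumptions made for the example, as is the choice to count nested subforms rather than only top-level forms.

```python
from collections import Counter


def tokenize(text):
    """Split s-expression text into parens and atoms (no strings or comments)."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read_forms(tokens):
    """Parse a flat token list into nested tuples, which hash by value."""
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            done = tuple(stack.pop())
            stack[-1].append(done)
        else:
            # Mimic the default CL reader, which upcases symbol names.
            stack[-1].append(tok.upper())
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def subforms(form):
    """Yield a compound form and every compound form nested inside it."""
    if isinstance(form, tuple):
        yield form
        for child in form:
            yield from subforms(child)


def count_duplicates(sources, min_size=3):
    """Count compound subforms across all sources; keep those seen more than once.

    min_size is a hypothetical cutoff that ignores tiny forms such as (x).
    """
    counts = Counter()
    for text in sources:
        for top in read_forms(tokenize(text)):
            for form in subforms(top):
                if len(form) >= min_size:
                    counts[form] += 1
    return {form: n for form, n in counts.items() if n > 1}
```

Calling `count_duplicates` on a list of file contents returns a dictionary from each repeated form to the number of times it appeared. As heisig notes, this only finds exact duplicates; forms that differ in a single symbol are treated as unrelated.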
14:35:19
heisig
It would reliably detect all duplicate code. Or do you mean similar code? Detecting that would be much harder.
14:37:13
heisig
Then you would need a code walker (e.g., agnostic-lizard) and a metric for similarity.
14:41:52
heisig
Maybe you could use a generic plagiarism checker. It might not be tailored towards CL, but could work reasonably well.
14:48:30
shka_
then go on: calculate the hashes of the direct child operands, multiply each with the hash of the parent, and put that into the filter too
14:49:22
shka_
this way you would end up with a fixed-size data structure that holds an estimate of the whole structure
14:51:53
shka_
the base case is when you are only considering two pieces of code while checking for duplication
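The start of shka_'s suggestion is not in this log, so the following Python sketch rests on guesses: "filter" is read as a Bloom-filter-style bit set, and the two-snippet base case is read as comparing two such filters. Each parent/child edge is hashed by multiplying the parent's hash with the child's hash, and one bit is set per edge. `FILTER_BITS`, `node_hash`, `fingerprint`, and `similarity` are hypothetical names, CRC32 is an arbitrary choice of hash, and forms are represented as nested tuples.

```python
import zlib

FILTER_BITS = 1024  # hypothetical fixed size of the filter


def node_hash(form):
    """Hash the operator of a compound form, or the atom itself."""
    head = form[0] if isinstance(form, tuple) and form else form
    if isinstance(head, tuple):
        head = "<form>"  # placeholder for an operator that is itself a form
    # Force the value odd so a product of hashes is never zero modulo 2**32.
    return zlib.crc32(str(head).encode()) | 1


def fingerprint(form, bits=0):
    """Set one bit per parent/child edge; the int result is used as a bit set."""
    if isinstance(form, tuple):
        parent = node_hash(form)
        for child in form[1:]:
            edge = (parent * node_hash(child)) % (1 << 32)
            # Use the high bits, since the low bit of an odd product is always 1.
            bits |= 1 << ((edge >> 16) % FILTER_BITS)
            bits = fingerprint(child, bits)
    return bits


def similarity(a, b):
    """Jaccard overlap of two filters: shared set bits over all set bits."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0
```

Two snippets can then be compared with `similarity(fingerprint(a), fingerprint(b))`. Because multiplication is commutative and distinct edges can land on the same bit, the result is only an estimate of structural overlap, not a proof of duplication.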
21:28:35
fiddlerwoaroof
Yeah, it's a Jane Street project and so it's probably related to an OCaml codebase
21:43:47
no-defun-allowed
Well, at the moment beach is finalising a paper for ELS and I think there's some discussion about improving the compiler.
21:44:29
no-defun-allowed
You can test it, it only takes about two minutes to compile with low debug settings.
21:48:31
no-defun-allowed
Then you'd need to wire in the compiler and make a code generator too. I don't know how SICL manages that, or if the test environment/REPL goes through Cleavir even.
21:53:55
no-defun-allowed
From memory, however, that just goes to a backend which generates more Common Lisp to run on the host.
21:54:30
francogrex
I was surprised to learn that Clozure CL has always seeded itself from a previous image. Are there few Lisp implementations that can be built from the ground up with just an assembler?
21:55:54
verisimilitude
That's how I want to do it. For one, you avoid the malicious compiler attack.
21:56:57
francogrex
computers work with assembly instructions and it is more transparent and clear to know what is going on
21:57:21
verisimilitude
With how I'd do it, there'd be no building at all; you'd have the machine code and that loads the rest of the Common Lisp package and there you have it.
21:57:47
verisimilitude
I wouldn't be harmed by a malicious assembler, because I'm writing my own machine code development tools.
21:58:09
no-defun-allowed
It's less transparent, there's much more to read and assembler requires a lot of state to remember.
21:58:42
verisimilitude
All you need to do is implement the very base everything else builds from, no-defun-allowed; you wouldn't even need the base to include the GC.
22:01:29
francogrex
but yes, I expect that would be only the very base; everything else in Lisp, sure
22:01:40
verisimilitude
Yes; I like programming without any complex building stages; it's disgusting to see software that wants over an hour to compile on a machine with well over a gigahertz of speed and gigabytes of working memory.
22:02:30
verisimilitude
If you want me to go in a tad more detail, I'll tell you about the general scheme I've thought up for this implementation I want to eventually find myself writing, francogrex.
22:02:49
francogrex
yes that being said, writing and loading machine code is not the greatest fun either
22:04:56
verisimilitude
Well, I have an article concerning the machine code development tool I've been working on, if you'd want to see it.
22:06:26
verisimilitude
Now, how I'd like to do it is have a simple machine code program that implements READ and a few other necessary CL functions. You could have an EXT package that exposes everything that needs to be there, but isn't in CL.
22:07:20
verisimilitude
As an example, you could expose the data structures used for GC in this EXT package and after loading the functions that will manipulate it, UNINTERN this symbol. Then, it can't be accessed further and such a reference won't ever be created again.
22:07:58
pjb
francogrex: Toaster from assembler (from scratch): https://www.youtube.com/watch?v=R3Qn98bE880 Toaster from sophisticated tools: https://www.youtube.com/watch?v=crc8n7D4kYg
22:08:16
verisimilitude
You'd need a loading program, which is what's actually invoked. This could also contain the version and help information. This would be written in, say, sh.
22:09:14
verisimilitude
In closing, all the machine code would do is READ, and one of the first things the loading program would do is feed in the rest of the CL package before locking it. Then you have a working implementation. One thing this would permit is customizing how most of the CL package is compiled, if that's desired.
22:10:03
pjb
verisimilitude: if you're building your assembler using stuff you didn't build yourself, you're fucked. See Reflections on Trusting Trust. https://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf
22:10:47
verisimilitude
Clearly, I'm eventually going to need to use an iteration of the tool to write a version of itself in machine code and then manually inspect that.
22:14:03
verisimilitude
I eventually want a 6502 version of this tool I can run on a C64, which is trustworthy enough. But that's then tangential.
22:14:18
verisimilitude
I'm glad you found it interesting, francogrex; did you refer to what I wrote here or the article I linked to?
22:14:47
francogrex
ok guys. I will save this to read it again later. verisimilitude yes i saved for reading it. thanks
22:15:34
verisimilitude
It's no issue. So, you haven't read the article yet; in that case, I anticipate your thoughts concerning that.
22:17:18
francogrex
not yet. I will, and I'll come back here hoping to discuss it with you. I will try to understand :)