13:34:51
ravenousmoose
** NICK ravenousmoose[aw
14:27:34
ebrasca
Do you know of a CL library for detecting duplicate code?
14:32:03
heisig
Sounds like just a few lines of code. Build an EQUAL hash table of forms, walk all files, populate the table, and count how often each form appears.
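A minimal sketch of what heisig describes, assuming only top-level forms are counted (subforms could be added by recursing into each form). The names READ-FORMS, COUNT-FORMS and DUPLICATE-FORMS are hypothetical and chosen for illustration. Note that reading a file this way requires the packages it mentions to exist, and that EQUAL compares symbols by identity, so the same text read in different packages does not match.

```lisp
(defun read-forms (pathname)
  "Read all top-level forms from PATHNAME without evaluating them."
  (with-open-file (in pathname)
    (let ((*read-eval* nil)          ; do not run #. reader macros
          (eof (gensym "EOF")))      ; unique end-of-file marker
      (loop for form = (read in nil eof)
            until (eq form eof)
            collect form))))

(defun count-forms (forms &optional (table (make-hash-table :test #'equal)))
  "Record each form in FORMS in an EQUAL hash table mapping form -> count."
  (dolist (form forms table)
    (incf (gethash form table 0))))

(defun duplicate-forms (table)
  "Return a list of (form . count) for every form seen more than once."
  (loop for form being the hash-keys of table using (hash-value n)
        when (> n 1)
          collect (cons form n)))
```

To cover several files, pass the same table to COUNT-FORMS for each file, for example `(count-forms (read-forms file) table)`, and call DUPLICATE-FORMS at the end.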
14:33:49
ebrasca
heisig: That doesn't sound like a very intelligent system.
14:35:19
heisig
It would reliably detect all duplicate code. Or do you mean similar code? Detecting that would be much harder.
14:35:19
pjb
well, duplicate code can differ by the symbol names.
14:35:39
pjb
(lambda (x) (* 3 x)) and (lambda (y) (* 3 y)) are duplicates…
14:37:13
heisig
Then you would need a code walker (e.g., agnostic-lizard) and a metric for similarity.
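To illustrate pjb's example for the simplest case only, here is a naive sketch that renames the parameters of a LAMBDA form to positional placeholders before comparing with EQUAL. It is not a code walker: it ignores shadowing, the function namespace, lambda-list keywords and macros, which is why a real walker such as the one heisig mentions would be needed in practice. RENAME-SYMBOLS and NORMALIZE-LAMBDA are hypothetical names.

```lisp
(defun rename-symbols (form alist)
  "Replace every symbol in FORM that has an entry in ALIST, recursively."
  (cond ((symbolp form)
         (let ((entry (assoc form alist)))
           (if entry (cdr entry) form)))
        ((consp form)
         (cons (rename-symbols (car form) alist)
               (rename-symbols (cdr form) alist)))
        (t form)))

(defun normalize-lambda (form)
  "Rename the parameters of (lambda (params...) body...) to :ARG0, :ARG1, ...
Forms that are not LAMBDA expressions are returned unchanged."
  (if (and (consp form)
           (eq (first form) 'lambda)
           (listp (second form)))
      (let ((alist (loop for param in (second form)
                         for i from 0
                         collect (cons param
                                       (intern (format nil "ARG~D" i)
                                               :keyword)))))
        (rename-symbols form alist))
      form))
```

With this, `(equal (normalize-lambda '(lambda (x) (* 3 x))) (normalize-lambda '(lambda (y) (* 3 y))))` holds, while changing the constant 3 to 4 in one of them makes the comparison fail.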
14:39:12
ebrasca
I am not interested in writing one.
14:41:52
heisig
Maybe you could use a generic plagiarism checker. It might not be tailored towards CL, but could work reasonably well.
14:42:55
ebrasca
Do you have a recommendation?
14:45:05
shka_
i think this could perhaps be done slightly differently
14:45:26
heisig
Some people at our university use jplag: https://jplag.ipd.kit.edu/
14:45:37
heisig
It seems to work for Scheme code...
14:45:40
shka_
so, lisp code basically always follows this (operator arg1 arg2 ...) convention
14:46:17
shka_
therefore it would be possible to do the following
14:46:35
shka_
first, construct the concrete tree of a piece of code
14:47:01
shka_
secondly, get root of the tree
14:47:13
shka_
now, build large bloom filter
14:47:32
shka_
then go and hash operand of the root
14:48:30
shka_
then calculate the hashes of the direct children's operands, multiply each with the hash of the parent, and put those into the filter too
14:49:03
ebrasca
shka_: Would you like to implement it?
14:49:22
shka_
this way you would end up with a fixed-size data structure that holds an estimate of the whole structure
14:50:11
shka_
actual detection would just be based on the jaccard metric of two filters
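A rough sketch of shka_'s idea, under several assumptions that are not in the discussion: "operand" is read as the head of each (operator arg1 arg2 ...) list, a single hash function (SXHASH) is used instead of the several a textbook bloom filter would have, and the filter size and modulus are arbitrary primes. FORM-FILTER and JACCARD are hypothetical names. Every cons is treated as an operator call, so binding lists and quoted data are hashed like code.

```lisp
(defparameter *filter-size* 1021
  "Number of bits per filter; an arbitrary prime chosen for this sketch.")

(defparameter *hash-modulus* 2147483647
  "The prime 2^31 - 1, used to keep combined hashes bounded and nonzero.")

(defun form-filter (form &optional (size *filter-size*))
  "Return a bit vector summarizing the operator structure of FORM."
  (let ((filter (make-array size :element-type 'bit :initial-element 0)))
    (labels ((walk (form parent-hash)
               (when (consp form)
                 ;; Multiply the parent's hash by a nonzero factor derived
                 ;; from this node's head, then set the matching bit.
                 (let* ((factor (1+ (mod (sxhash (car form))
                                         (1- *hash-modulus*))))
                        (hash (mod (* parent-hash factor) *hash-modulus*)))
                   (setf (sbit filter (mod hash size)) 1)
                   ;; Recurse into the arguments, tolerating dotted lists.
                   (loop for tail = (cdr form) then (cdr tail)
                         while (consp tail)
                         do (walk (car tail) hash))))))
      (walk form 1))
    filter))

(defun jaccard (a b)
  "Jaccard index of two equal-length bit vectors: |A and B| / |A or B|.
Two empty filters are defined here to have similarity 1."
  (let ((either (count 1 (bit-ior a b))))
    (if (zerop either)
        1
        (/ (count 1 (bit-and a b)) either))))
```

Calling `(jaccard (form-filter form-1) (form-filter form-2))` yields a rational between 0 and 1, with 1 for identical forms. Since SXHASH values are implementation-dependent, the filters are only comparable within one Lisp implementation.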
14:50:27
shka_
ebrasca: nah, already busy
14:50:40
ebrasca
shka_: What about duplication within the same file?
14:51:09
shka_
files are irrelevant, this would work on individual form level
14:51:53
shka_
the base case is when you only consider two pieces of code while checking for duplication
14:52:50
shka_
anyway, this is my idea, not sure if it would work but it sounds reasonable to me
14:53:48
shka_
would be kinda broken for large code trees i think
14:54:02
shka_
because leaves would quickly dominate the root
14:54:22
shka_
but perhaps this is how it should work…
15:04:03
ebrasca
shka_: What about macros?
15:04:42
shka_
ebrasca: i would just treat those like everything else
15:57:03
ravenousmoose
** NICK ravenousmoose[aw
16:51:45
ravenousmoose[aw
** NICK ravenousmoose
16:51:53
ravenousmoose
** NICK ravenousmoose[aw
19:24:14
fiddlerwoaroof
shka_: there was a post about writing an s-exp diff tool on HN a while back
19:24:21
fiddlerwoaroof
I think it was implemented in racket, though
19:25:11
fiddlerwoaroof
Hmm, maybe it was Ocaml
19:25:12
fiddlerwoaroof
http://thume.ca/2017/06/17/tree-diffing/
19:26:16
fiddlerwoaroof
two strikes, it's rust
19:26:18
fiddlerwoaroof
https://github.com/trishume/seqalign_pathing/blob/master/src/lib.rs
19:40:04
loli
Ocaml really enjoys using sexps
21:28:35
fiddlerwoaroof
Yeah, it's a jane street project and so it's probably related to an ocaml codebase
21:41:49
francogrex
dears, any plans to release SICL soon? updates are almost daily
21:43:25
francogrex
what's funny is that it's 67% lisp and 30% tex! (+ other)
21:43:47
no-defun-allowed
Well, at the moment beach is finalising a paper for ELS and I think there's some discussion about improving the compiler.
21:44:29
no-defun-allowed
You can test it, it only takes about two minutes to compile with low debug settings.
21:45:06
francogrex
yes I know, using sbcl. but not standalone yet
21:46:50
Bike
there is no backend. i do not think beach has set a target release date.