libera/#clasp - IRC Chatlog
1:59:52
drmeister
It's in compile-file-parallel. I added a 'form' slot to ast-job and I dump it here...
2:09:07
Bike
hmm. i'm going to guess the problem is the loop ecase, since that could lead to block deletion and is in a loop
2:11:24
drmeister
Sometimes I have to compile three times, sometimes five times - but I can trigger it with the serial compiler.
2:13:49
Bike
does it only work in a file or can you do it at the repl? but yeah, there should probably be enough
2:32:40
Bike
there have been a few issues with code that never returns like that. the whole time, really, it's kind of a little unusual inherently
3:20:54
drmeister
DNA is amazing stuff. I've got 48 million sequences to analyze and I had a weird result show up 2775 times. It turns out it's a sequence that's been floating around for months at the lab that did the DNA amplification and sequencing.
3:22:30
Bike
there are some pretty crazy aspects. my girlfriend does RNA stuff and she has to do all these cleanliness measures because there are RNases just all over everything since skin is covered in them and such
3:23:52
drmeister
We made these DNA encoded libraries and I've been writing code and analyzing it for the last couple of weeks.
3:55:52
drmeister
I asked our collaborator about 19012A01 and they said - "Oh yeah that was one of our first sequences and it keeps cropping up - you should see the entire thing 11-11-11-11-11" So I went looking.
3:56:54
drmeister
I'm being kind of sloppy with the codes because the 19012A01-style codes are a stupid scheme developed by some biologist and they take a while to type out.
3:58:21
drmeister
Anyway - I gathered up all the full sequences with the 19012A01 library code and found one sequence...
3:59:48
drmeister
So that means this sample was probably contaminated by a SINGLE strand of DNA that led to this sequence and it got amplified in the PCR so that we see it in the 48M sequencing reads 2,775 times.
4:02:27
drmeister
I've been trying to figure out the difference between sequencing noise and low-copy contaminating DNA sequences that got amplified with everything else. This gives me a kind of lower limit on that.
4:04:28
drmeister
The y-axis is the log10 of the number of times a sequence shows up and X-axis is just the index of the sequence.
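A minimal sketch of how the values for a plot like that could be computed, assuming the reads are already available as plain strings. The function name `log10_counts` and the toy reads are hypothetical, and ordering the sequences from most to least frequent is an assumption; the log only says the x-axis is the sequence index.

```python
import math
from collections import Counter

def log10_counts(reads):
    """Return log10 of each distinct sequence's read count.

    Sequences are ordered most frequent first (an assumption made here
    for illustration), so the list index can serve as the x-axis.
    """
    counts = Counter(reads)
    return [math.log10(n) for _, n in counts.most_common()]

# Toy data, not real sequencing reads.
reads = ["ACGT"] * 100 + ["TTGA"] * 10 + ["GGCC"]
ys = log10_counts(reads)
```

Plotting `ys` against `range(len(ys))` gives the kind of curve described, where a low-copy contaminant would sit well above the single-read noise floor.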
4:06:13
drmeister
They have quality data - the "phred" score for each base. I use that to filter out sequences that I consider too noisy to be reliable.
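A rough sketch of a phred-based filter, not the actual Cando code. It assumes the quality data is a FASTQ-style string in the common Phred+33 encoding, and that a read is rejected if any single base falls below a cutoff. The cutoff of 20 and both function names are hypothetical; the log does not say which rule or threshold is used.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into per-base phred scores.

    Assumes Phred+33 encoding: score = ASCII code - 33.
    """
    return [ord(ch) - offset for ch in quality_string]

def is_reliable(quality_string, min_quality=20):
    """Keep a read only if every base meets the cutoff.

    The cutoff of 20 (about a 1% error probability per base) is an
    illustrative assumption, not a value taken from the log.
    """
    scores = phred_scores(quality_string)
    if not scores:
        return False  # an empty read carries no usable information
    return min(scores) >= min_quality
```

For example, `is_reliable("IIII")` accepts a read whose bases all score 40, while `is_reliable("II#I")` rejects one containing a base with score 2.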
4:08:49
drmeister
Each 8-base stretch codes for a number from 1-10 using only 10 sequences chosen carefully from the 4^8=65536 possible sequences.
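To illustrate the idea of a sparse code, here is a small sketch that maps an 8-base stretch to a number from 1 to 10. The codebook below is invented for illustration (the real 10 sequences are not given in the log), and tolerating a single mismatch is an assumption about how such a code might be decoded.

```python
# Hypothetical codebook: 10 codewords out of the 4**8 = 65536 possible 8-mers.
CODEBOOK = {
    "AAAAAAAA": 1, "CCCCCCCC": 2, "GGGGGGGG": 3, "TTTTTTTT": 4,
    "AACCGGTT": 5, "CCAATTGG": 6, "GGTTAACC": 7, "TTGGCCAA": 8,
    "ACGTACGT": 9, "CATGCATG": 10,
}

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(stretch, max_mismatches=1):
    """Return the number encoded by an 8-base stretch, or None.

    A stretch decodes only if exactly one codeword lies within
    max_mismatches of it; anything else is treated as noise.
    """
    if len(stretch) != 8:
        return None
    hits = [num for word, num in CODEBOOK.items()
            if hamming(stretch, word) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None
```

Because only 10 of the 65536 possible sequences are valid, most sequencing errors land on a non-codeword, so they can be detected or corrected rather than silently read as a different number.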
4:09:45
drmeister
I exposed the SeqAn C++ library to align short sequences to the overlapping 3+8+3 "codons".
4:11:55
drmeister
It's all running in Cando in a Jupyter notebook. I'm going to show it to our collaborators tomorrow. It kicks ass.
4:12:38
drmeister
They have some R scripts that take 2.5 days to analyze this same data. Mine takes 1-2 hours.