libera/#clasp - IRC Chatlog

0:22:12 drmeister Bike: I hit an infinite loop of this: DELETE-IBLOCK^CLEAVIR-BIR^FN^^

0:22:42 drmeister It blows the stack. I'm working on getting a backtrace.

0:23:27 drmeister Here's the infinite part of it.

0:23:29 drmeister https://www.irccloud.com/pastebin/0ecguCPS/

0:30:54 drmeister Here is the bottom.

0:30:55 drmeister https://gist.github.com/drmeister/957c3235e799ebd5bc4614d3c8201d47

1:18:59 drmeister https://gist.github.com/drmeister/55fc3e92523fdcc9168981e724781a0e

1:19:40 drmeister Bike: I changed delete-iblock to this...

1:20:04 drmeister https://gist.github.com/drmeister/0f744e11bc68ad46762e066893f861d2

1:20:55 drmeister It's somewhat reproducible.

1:21:04 drmeister Like 50% of the time.

1:38:10 Bike hmm. like it's following a loop around. that's weird.

1:40:13 Bike although now that i look at it i'm not sure how that's prevented

1:40:18 Bike what code are you compiling that causes this?

1:58:59 drmeister I've been tracking it down and maybe with some success.

1:59:52 drmeister It's in compile-file-parallel. I added a 'form' slot to ast-job and I dump it here...

2:00:10 drmeister https://www.irccloud.com/pastebin/8t1wUcYC/

2:00:17 Bike i usually just switch to serial compilation when tracking down these things

2:00:32 Bike i guess you could also just print the source info of the block

2:00:33 drmeister Yeah - I wasn't sure it would reproduce. So I tried this...

2:01:05 drmeister It's not completely reproducible.

2:01:38 Bike yeah, probably it's deleting in different orders or some crap like that

2:01:46 drmeister https://www.irccloud.com/pastebin/rlRvziCC/

2:01:55 drmeister No.

2:02:18 drmeister This...

2:02:20 drmeister https://www.irccloud.com/pastebin/xYPilKq0/

2:06:24 drmeister I commented out part of the code on line 14 and then I got this...

2:06:33 drmeister https://www.irccloud.com/pastebin/rT6keric/

2:06:53 drmeister That looks like an interesting order.

2:09:07 Bike hmm. i'm going to guess the problem is the loop ecase, since that could lead to block deletion and is in a loop

2:09:49 drmeister I can get it to happen with the serial compiler.

2:09:55 drmeister I have to do it a couple of times though.

2:10:18 drmeister Do you have enough to go on?

2:11:24 drmeister Sometimes I have to compile three times, sometimes five times - but I can trigger it with the serial compiler.

2:13:18 drmeister I swapped the ecase for a cond but it still happens.

2:13:49 Bike does it only work in a file or can you do it at the repl? but yeah, there should probably be enough

2:14:42 drmeister Oh - wait - my loop can't exit.

2:15:06 Bike ah, true. i assume it's supposed to

2:28:53 drmeister Yeah

2:29:16 drmeister This compiles many times without a problem - so I'm going to assume it's ok.

2:29:40 drmeister https://www.irccloud.com/pastebin/5qzHSPXe/

2:32:40 Bike there have been a few issues with code that never returns like that. the whole time, really, it's kind of a little unusual inherently

2:32:49 Bike still, that should not happen ever, of course. i will look at it

2:36:58 drmeister Thanks.

3:10:27 beach Good morning everyone!

3:18:42 drmeister Things are much more stable now that I changed that code.

3:20:54 drmeister DNA is amazing stuff. I've got 48 million sequences to analyze and I had a weird result show up 2775 times. It turns out it's a sequence that's been floating around for months at the lab that did the DNA amplification and sequencing.

3:21:30 drmeister It's an old control sequence from when they started 11-11-11-11

3:22:30 Bike there are some pretty crazy aspects. my girlfriend does RNA stuff and she has to do all these cleanliness measures because there are RNases just all over everything since skin is covered in them and such

3:22:43 drmeister Yeah

3:23:02 drmeister That's part of our immune system

3:23:52 drmeister We made these DNA encoded libraries and I've been writing code and analyzing it for the last couple of weeks.

3:24:35 drmeister Each DNA code has 8 parts: aaaabbbb-ccccdddd-eeeeffff-gggghhhh

3:25:15 drmeister Oh wait - 10 parts. I dropped a part I don't think about much.

3:25:20 drmeister Example: (("11072202" "13062405" "15032608" "17022804" "19012A09") . 27262)

3:25:47 drmeister The last part is called the library code. Our library code is "19022A02"

3:25:59 drmeister Here is a histogram of all the library codes we saw...

3:26:15 drmeister https://www.irccloud.com/pastebin/HC5mWDCr/

3:53:52 drmeister Got called away.

3:54:21 drmeister Our library code is 19022A02 - we see that code 39.6M times.

3:54:43 drmeister Then there is 19012A01 and then a bunch of noise.

3:55:52 drmeister I asked our collaborator about 19012A01 and they said - "Oh yeah that was one of our first sequences and it keeps cropping up - you should see the entire thing 11-11-11-11-11" So I went looking.

3:56:54 drmeister I'm being kind of sloppy with the codes because the 19012A01-style code is a stupid code developed by some biologist and it takes a while to type out.

3:57:20 drmeister Basically you got 1x01-1x10 and 2x01-2x10

3:57:36 drmeister I say "Zero" have you heard of it?

3:57:58 Bike i don't recognize this code, no

3:58:10 Bike i'm used to just the atcggctagc stuff

3:58:21 drmeister Anyway - I gathered up all the full sequences with the 19012A01 library code and found one sequence...

3:58:30 Bike two bits per character is rather suboptimal, i spose

3:58:38 drmeister ((("11012201" "13012401" "15012601" "17012801" "19012A01") . 2775))

3:58:57 drmeister That's the 11-11-11-11-11 code - that's the only one I see.
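
The tallying step described above can be sketched as follows. This is only an illustration in Python, not the Cando code from the log: the three reads are a tiny made-up sample built from the code strings quoted above, and `with_library_code` is a hypothetical helper name.

```python
from collections import Counter

# Tiny synthetic sample (NOT the real data): each read is a tuple of code
# strings whose last element is the library code.
reads = [
    ("11012201", "13012401", "15012601", "17012801", "19012A01"),
    ("11072202", "13062405", "15032608", "17022804", "19012A09"),
    ("11012201", "13012401", "15012601", "17012801", "19012A01"),
]

# Count how many times each full code tuple was seen.
counts = Counter(reads)


def with_library_code(counts, library_code):
    """Hypothetical helper: keep only entries whose last part matches."""
    return {codes: n for codes, n in counts.items() if codes[-1] == library_code}


contaminant = with_library_code(counts, "19012A01")
```

If a library code maps to a single distinct tuple, as reported here for 19012A01, the resulting dictionary has exactly one entry.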

3:59:48 drmeister So that means this sample was probably contaminated by a SINGLE strand of DNA that led to this sequence and it got amplified in the PCR so that we see it in the 48M sequencing reads 2,775 times.

3:59:53 drmeister That's kinda neat.

4:00:31 Bike yeah, cool. shame it's messing up the read though.

4:01:04 drmeister It's not - it's literally seen in only 2775/48000000 reads.

4:01:26 drmeister They said that it was a bigger problem a couple of months ago.

4:02:27 drmeister I've been trying to figure out the difference between sequencing noise and low copy contaminating DNA sequence that got amplified with everything else. This gives me a kind of lower limit on that.

4:03:31 drmeister Here's a histogram of the number of times different sequences show up...

4:03:49 drmeister https://usercontent.irccloud-cdn.com/file/lSJpe8jG/image.png

4:04:28 drmeister The y-axis is the log10 of the number of times a sequence shows up and X-axis is just the index of the sequence.

4:05:03 Bike nice x axis

4:05:07 Bike how long are these sequences?

4:05:14 drmeister 167 bases.

4:05:26 Bike and they're all the same length? i see...

4:05:35 drmeister Yes.

4:06:13 drmeister They have quality data - the "phred" score for each base. I use that to filter out sequences that I consider too noisy to be reliable.
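
A Phred-based filter of this kind might look roughly like the sketch below. Everything specific is an assumption: the log does not say which quality encoding the reads use (Phred+33 is assumed here), nor what filtering rule or threshold drmeister applies, so the mean-score cutoff of 30 and the function names are purely illustrative.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 ASCII encoding; the log does not state the encoding.
    """
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def is_reliable(quality_string, min_mean_q=30.0):
    """Hypothetical rule: keep a read if its mean Phred score is high enough."""
    scores = phred_scores(quality_string)
    if not scores:
        return False  # an empty read carries no usable information
    return sum(scores) / len(scores) >= min_mean_q


# Two made-up reads: 'I' decodes to Q40 and '#' to Q2 under Phred+33.
reads = [
    ("ACGTACGT", "IIIIIIII"),
    ("ACGTACGT", "########"),
]
kept = [seq for seq, qual in reads if is_reliable(qual)]
```

Other rules, such as rejecting a read when any single base falls below a minimum score, fit the same shape; only `is_reliable` would change.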

4:06:49 drmeister Here's how it's organized...

4:06:50 drmeister https://usercontent.irccloud-cdn.com/file/NxYOH0Vz/image.png

4:07:12 drmeister I aligned about 200 of them in emacs and lined up the columns.

4:07:32 drmeister From left to right there's a ~40 base constant/forward-primer.

4:08:04 drmeister Then (3-bases-8-bases)x10-3-bases

4:08:49 drmeister Each 8-base stretch codes for a number from 1-10 using only 10 sequences chosen carefully from the 4^8=65536 possible sequences.
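
The layout just described, (3 bases + 8 bases) x 10 followed by 3 bases, can be sliced with a few lines of Python. This is a sketch under assumptions: the input is taken to be the coding region that starts right after the forward primer, finding that start in a real 167-base read is not shown, and `split_codewords` is a hypothetical name.

```python
SPACER = 3     # bases between codewords, per the log
CODEWORD = 8   # bases per codeword, per the log
N_PARTS = 10   # number of codewords, per the log


def split_codewords(region):
    """Return the ten 8-base codewords of a (3+8)x10+3 coding region.

    `region` is assumed to begin immediately after the forward primer.
    """
    expected = (SPACER + CODEWORD) * N_PARTS + SPACER  # 113 bases
    if len(region) != expected:
        raise ValueError(f"expected {expected} bases, got {len(region)}")
    words = []
    for i in range(N_PARTS):
        # Skip i complete (spacer + codeword) units, then one more spacer.
        start = i * (SPACER + CODEWORD) + SPACER
        words.append(region[start:start + CODEWORD])
    return words


# Synthetic region for readability only: 'n' marks spacers and each
# codeword is a repeated letter (not real DNA).
region = "".join("nnn" + letter * 8 for letter in "ABCDEFGHIJ") + "nnn"
words = split_codewords(region)
```

The strict length check only suits clean input; reads with insertions or deletions need an alignment step instead of fixed offsets.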

4:09:04 drmeister They are chosen to be at least 3 apart by Hamming distance.
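
A minimum Hamming distance of 3 means any single substitution still leaves a read closer to its original codeword than to any other. The sketch below checks that property and decodes by nearest codeword. It is a simplified stand-in, not the SeqAn alignment used in the log, and the four codewords are invented for illustration; the real ten are not given.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(codewords):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(hamming(a, b) for a, b in combinations(codewords, 2))


def decode(word, codewords, max_errors=1):
    """Return the index of the unique codeword within max_errors, else None.

    With a minimum pairwise distance of at least 3, at most one codeword
    can lie within distance 1 of any observed word.
    """
    hits = [i for i, c in enumerate(codewords) if hamming(word, c) <= max_errors]
    return hits[0] if len(hits) == 1 else None


# Invented codewords (NOT the real set); every pair differs in >= 3 places.
codewords = ["AAAAAAAA", "CCCCCCCC", "AAAACCCC", "CCCCAAAA"]
assert min_pairwise_distance(codewords) >= 3

one_error = decode("AAAAAAAT", codewords)   # one substitution from codeword 0
unmatched = decode("AACCAACC", codewords)   # far from every codeword
```

Hamming decoding handles substitutions only, which is one reason to prefer an aligner that also scores gaps for real reads.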

4:09:45 drmeister I exposed the SeqAn C++ library to align short sequences to the overlapping 3+8+3 "codons".

4:10:13 drmeister SeqAn scores mismatches and gaps in a consistent way.

4:10:28 drmeister I end up throwing out about 2/5 of the sequences.

4:10:52 drmeister In the end it gave us two top molecules that we are resynthesizing.

4:11:02 drmeister We have more but there are two that really stand out.

4:11:55 drmeister It's all running in Cando in a Jupyter notebook. I'm going to show it to our collaborators tomorrow. It kicks ass.

4:12:06 drmeister It's also about 50x faster than what they have.

4:12:38 drmeister They have some R scripts that take 2.5 days to analyze this same data. Mine takes 1-2 hours.

4:14:44 drmeister I've got this kind of stuff going in the jupyter notebook:

4:14:46 drmeister https://usercontent.irccloud-cdn.com/file/H2AzF26G/image.png