libera/#clasp - IRC Chatlog
14:09:11
yitzi
You have a 4-core processor. That is the default number of parts it breaks stuff up into. When you do PMAP over (j1 j2 j3 j4 j5 j6 j7) then C1 gets (j1 j2), C2 gets (j3 j4), C3 gets (j5 j6) and C4 gets (j7) ....
14:10:13
yitzi
If you do :parts 7 then C1 gets (j1), C2 gets (j2), C3 gets (j3), C4 gets (j4), and (j5), (j6), and (j7) go into the queue
14:14:23
drmeister
The default for :parts is the number of workers. Saying `:parts (length x)` would give one job to each worker if there were `(length x)` workers.
14:15:20
yitzi
Yes, I am assuming that if the length is greater than the number of workers then lparallel will queue them
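A minimal sketch of the partitioning described above, assuming a 4-worker kernel; `run-job` here is a hypothetical stand-in for the real per-job work:

```lisp
(ql:quickload :lparallel)

(setf lparallel:*kernel* (lparallel:make-kernel 4))

;; Hypothetical placeholder for the actual work done on each job.
(defun run-job (job)
  (* job job))

;; Default partitioning: the 7 jobs are split into 4 chunks,
;; roughly two per worker.
(lparallel:pmap 'list #'run-job '(1 2 3 4 5 6 7))

;; :parts 7 makes each job its own chunk; jobs 5-7 wait in the
;; queue until a worker frees up.
(lparallel:pmap 'list #'run-job :parts 7 '(1 2 3 4 5 6 7))
```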
14:17:11
drmeister
I have a problem in the search that gets smaller and smaller the more searching I do.
14:17:43
drmeister
It's a bit difficult to describe but imagine I'm generating puzzle pieces that must connect to other puzzle pieces.
14:17:47
yitzi
I seem to recall that is how the "channels" in lparallel work. I think there is a bug in lparallel in that the next job in the queue won't start if you don't retrieve the result waiting on the channel. But that shouldn't be a problem for PMAP. I ran into this issue in the TIRUN app when we were sketching the ligands.
14:18:27
drmeister
With a short search - of say 20 - about 1% of the puzzle pieces don't fit a following piece.
14:18:53
drmeister
With a search of 200 - about 0.4% of the puzzle pieces don't fit a following piece.
14:26:45
drmeister
The measurement for each node is not very good - or lparallel is doing crazy things.
14:27:56
drmeister
I am assuming I need to watch the trend - and the trend looks like there is still a long tail.
14:41:02
yitzi
There are some examples of using futures in https://github.com/cando-developers/cando/blob/0fc1fa09ee22521403bd46e1b8298f82ae2d94f5/src/lisp/cando-widgets/molecule-select.lisp
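For reference, the core future idiom in lparallel looks roughly like this, assuming `*kernel*` is already set up; `expensive-computation` is a hypothetical placeholder:

```lisp
;; The future runs on a kernel worker as soon as one is free.
(let ((f (lparallel:future (expensive-computation))))
  ;; ... do other work while the future runs ...
  (lparallel:force f))  ; blocks until the result is available
```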
14:42:59
yitzi
drmeister: yes...that was me. You could also keep it simple: https://lparallel.org/kernel/
14:46:26
yitzi
drmeister: I am pretty sure you just add all the tasks and then idle while waiting for the results ... which just indicate that the job completed.
15:24:16
drmeister
So you just open a channel and submit-task's to it and they automatically go to the *kernel* and then you call receive-result for each task?
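That is the pattern the lparallel kernel docs describe; a sketch, where `run-job` and `jobs` are hypothetical placeholders:

```lisp
;; Submit all tasks to a channel (they run on *kernel* workers),
;; then collect one result per task.
(let ((channel (lparallel:make-channel)))
  (dolist (job jobs)
    (lparallel:submit-task channel #'run-job job))
  ;; receive-result blocks; results arrive in completion order,
  ;; not submission order.
  (loop repeat (length jobs)
        collect (lparallel:receive-result channel)))
```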
16:30:58
drmeister
I added per-node/per-thread logging and my attempt at load balancing is absolute shite.
16:31:37
drmeister
I was sorting the jobs based on the number of atoms - figuring more atoms take more time.
16:32:34
drmeister
That's not at all the case - the amount of time varies hugely. Now I suspect that some non-linear optimizations are getting trapped and I'm letting them wander too long.
19:37:10
stassats
I would have made a queue of jobs from which each thread repeatedly gets a job (or a batch of jobs, if each individual one is very small)
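The channel pattern above already gives this self-scheduling behavior; a hand-rolled version of the same idea, sketched with bordeaux-threads (all names here are illustrative):

```lisp
(ql:quickload :bordeaux-threads)

(defun run-self-scheduling (jobs worker-fn n-threads)
  "Run WORKER-FN over JOBS using N-THREADS self-scheduling threads."
  (let ((lock (bt:make-lock))
        (threads '()))
    (flet ((next-job ()
             (bt:with-lock-held (lock)
               (pop jobs))))
      (dotimes (i n-threads)
        (push (bt:make-thread
               (lambda ()
                 ;; Each thread repeatedly takes the next job until
                 ;; the queue is empty, so slow jobs load-balance
                 ;; themselves without any up-front sorting.
                 (loop for job = (next-job)
                       while job
                       do (funcall worker-fn job))))
              threads)))
    (mapc #'bt:join-thread threads)
    (values)))
```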
20:04:50
drmeister
It's not a burning issue - it looks like I can push MPI into the future a bit because I think I solved the issue with the tail. I had an almost-infinite loop of error/error-handling.
20:24:06
yitzi
If it is not already in the container, then just add that to the apt-get install in the def file
22:18:46
drmeister
It was a handler that recognized 3 or 4 linear atoms (a problem for non-linear optimization), caught the error, and tried to shake up the 3 or 4 linear atoms. It doesn't work very well, probably because the rest of the structure forces the atoms back into a linear arrangement.
22:19:28
drmeister
There was a potential infinite loop of handling the error, restarting the calculation, and having it generate the error again. It would very occasionally knock itself out of that cycle.
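One common way to break that cycle is to cap the number of retries; a sketch, where `minimize`, `shake-linear-atoms`, and `linear-atoms-error` are hypothetical names standing in for the real minimizer, perturbation, and condition:

```lisp
;; Hypothetical condition signaled when linear atoms are detected.
(define-condition linear-atoms-error (error) ())

(defun minimize-with-retry-cap (structure &key (max-retries 3))
  (dotimes (attempt max-retries
            (error "Linear atoms persisted after ~D retries" max-retries))
    (declare (ignorable attempt))
    (handler-case
        (return-from minimize-with-retry-cap (minimize structure))
      (linear-atoms-error ()
        ;; Perturb the offending atoms and try again -- but only a
        ;; bounded number of times, so the handler can't loop forever.
        (shake-linear-atoms structure)))))
```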