Sep 10, 2019 10:09 PM | comic
nat::xform_brain for Skeletons in Python
Hi all,
I am working on an HPC pipeline for processing millions of skeleton (neuron) objects that need to be transformed through a known registration. Our colleagues are working in R / Matlab and were using nat::xform_brain. I dug through the code a bit to find where nat::xformpoints was defined, but I had some trouble understanding how it interacts with the C++ CMTK codebase.
Our pipeline is in Python / C++ and ideally I would wrap the relevant parts of CMTK in Cython or pybind11 and integrate this functionality into our pipeline without the need for expensive IPC / IO.
Would someone be able to point me in the right direction?
Thanks!
Sep 11, 2019 11:09 PM | Greg Jefferis
RE: nat::xform_brain for Skeletons in Python
Dear Comic,
Originally posted by comic:
I am working on an HPC pipeline for processing millions of skeleton (neuron) objects that need to be transformed through a known registration.
I'm guessing you're collaborating with Mala Murthy's group on the flywire project ...
Originally posted by comic:
Our colleagues are working in R / Matlab and were using nat::xform_brain. I dug through the code a bit to find where nat::xformpoints was defined, but I had some trouble understanding how it interacts with the C++ CMTK codebase.
https://github.com/schlegelp/navis/blob/...
The eventual end point is that R shells out to call the cmtk streamxform tool. The functions that you would want to inspect are here:
https://github.com/natverse/nat/blob/336...
in particular there is a private function cmtk.streamxform that prepares the command line and uses nat::cmtk.system2 to run it. In R, you could try setting
debug(nat::cmtk.system2)
and then running a single transform to see what the streamxform command line looks like. But ... see end of my response.
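Since your pipeline is in Python, you could also mimic that same shell-out pattern directly with the standard library while you evaluate whether a native wrapper is worth it. This is a minimal sketch; the helper names are mine and the exact streamxform flags and I/O format (whitespace-separated x y z triples on stdin/stdout) are assumptions based on the description above, so check against the command line you see from debug(nat::cmtk.system2):

```python
import subprocess

def build_streamxform_cmd(registration, binary="streamxform"):
    # Assumed invocation: streamxform -- <registration>; verify locally
    # against the command line that nat::cmtk.system2 actually runs.
    return [binary, "--", registration]

def format_points(points):
    # streamxform is assumed to read one "x y z" triple per line on stdin.
    return "\n".join("{:g} {:g} {:g}".format(x, y, z) for x, y, z in points)

def parse_points(stdout_text):
    # ... and to write transformed triples, one per line, on stdout.
    return [tuple(float(v) for v in line.split()[:3])
            for line in stdout_text.splitlines() if line.strip()]

def streamxform_points(points, registration):
    proc = subprocess.run(build_streamxform_cmd(registration),
                          input=format_points(points),
                          capture_output=True, text=True, check=True)
    return parse_points(proc.stdout)
```

Note that this still pays the process-spawn and text-serialisation overhead on every call, which is exactly the cost the experiment below tries to measure.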
Originally posted by comic:
Our pipeline is in Python / C++ and ideally I would wrap the relevant parts of CMTK in Cython or pybind11
You have two options here. One is to call the R xform_brain function from Python using rpy2. See
https://github.com/schlegelp/navis/blob/...
This may seem inelegant but xform_brain does look after quite a lot of details under the hood and more importantly your bottleneck may not be where you think it is. If you do want an example of wrapping CMTK directly, take a look at:
https://github.com/jefferis/cmtkr/
The key point is that it provides a C++ wrapper of the CMTK streamxform tool:
https://github.com/jefferis/cmtkr/blob/m...
Originally posted by comic:
and integrate this functionality into our pipeline without the need for expensive IPC / IO.
But if you take a look at the README for cmtkr, you'll see I never took this forward, both because there were some glitches getting it to integrate nicely with R that would have taken time to work out *and* because the speed-ups were relatively modest in normal use: the majority of the time was spent inside CMTK computing the transform rather than in the overhead.
So now, I come to what may be my most useful advice. I'm not going to quote Knuth at you, but I would recommend that you do a little experiment. You need to know how much time is spent by CMTK computing (nonrigid) transforms versus all the other stuff. I would suggest that the easiest way to do this is to make CMTK only apply the affine part of the transforms in your full transformation chain. As you can imagine, the affine computation time is negligible compared with the nonrigid component. Imagine your call in R looks like this:
xform_brain(myneurons, sample=FAFB14, reference=FCWB)
Try doing:
xform_brain(myneurons, sample=FAFB14, reference=FCWB, transformtype="affine")
and timing both with system.time or bench::mark. Here is an example using some published EM data:
vfbcatmaid=catmaid_login(server='https://fafb.catmaid.virtualflybrain.org')
dapns=read.neurons.catmaid('name:PN glomerulus DA', conn=vfbcatmaid)
length(dapns)
## [1] 16
sum(nvertices(dapns))
## [1] 109498
system.time(dapns.pts.fcwb <- xform_brain(xyzmatrix(dapns), sample = FAFB14, reference = FCWB))
## user system elapsed
## 10.889 0.665 9.030
system.time(dapns.pts.fcwb.aff <- xform_brain(xyzmatrix(dapns), sample = FAFB14, reference = FCWB, transformtype="affine"))
## user system elapsed
## 6.515 0.723 4.599
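One back-of-envelope way to read those timings, assuming the affine run's elapsed time is dominated by the shell/IO overhead plus a negligible affine computation:

```python
# Elapsed times from the example above (seconds)
full_elapsed = 9.030    # full nonrigid transform, including shell overhead
affine_elapsed = 4.599  # affine-only run: roughly overhead + cheap affine

# Time attributable to the nonrigid computation itself, which even a
# native in-process wrapper would still have to pay
nonrigid = full_elapsed - affine_elapsed          # ~4.4 s

# Best case if all overhead were eliminated but CMTK's nonrigid
# computation were kept as-is
best_case_speedup = full_elapsed / nonrigid       # ~2x
```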
Bottom line: I think you will find that you might get on the order of a 2x speed-up by avoiding the shell step. If you need to make big speed gains (think 100x), you will need to change the way the transformations themselves are represented. I'm happy to advise, but I think that's a longer conversation and I would need to know the details of the transforms that you want to apply.
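One generic way to change the representation (a sketch of the idea, not how CMTK or nat store the transform internally): sample the deformation once on a coarse regular grid, e.g. by pushing the grid points through streamxform a single time, cache the resulting displacement field, and then transform the millions of points by trilinear interpolation. The function name and grid conventions here are mine:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def make_field_xform(grid_origin, grid_spacing, displacement):
    """Build a point transform from a cached displacement field.

    displacement: (nx, ny, nz, 3) array of displacement vectors sampled on
    a regular grid (e.g., obtained by running the grid points through
    streamxform once). Returns a function mapping an (N, 3) point array to
    transformed points by trilinear interpolation of the field.
    """
    origin = np.asarray(grid_origin, dtype=float)
    spacing = np.asarray(grid_spacing, dtype=float)

    def xform(points):
        pts = np.asarray(points, dtype=float)
        # Convert physical coordinates to fractional grid indices.
        idx = (pts - origin) / spacing       # (N, 3)
        coords = idx.T                       # (3, N), as map_coordinates expects
        out = np.empty_like(pts)
        for axis in range(3):                # interpolate each vector component
            out[:, axis] = map_coordinates(displacement[..., axis], coords,
                                           order=1, mode="nearest")
        return pts + out
    return xform
```

The accuracy/speed trade-off is then controlled by the grid spacing, and the per-point cost becomes a handful of array lookups rather than a CMTK B-spline evaluation.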
All the best,
Greg.