Why is my simple vec3 PyO3 pyclass so much slower than PyGLM's equivalent class at construction and multiplication?

I'm trying to create a Python extension with PyO3 that provides a type similar to a vec3 (such as GLM's vec3), but implemented in Rust.

I’ve created the following directory structure:

.
├── Cargo.toml
├── pyproject.toml
├── python
│  └── main.py
└── src
   └── lib.rs

Only lib.rs and main.py are hand-written; the rest was generated by maturin init.

lib.rs is :

use pyo3::prelude::*;

#[pyclass]
#[derive(Clone)]
struct Float3{
    #[pyo3(get, set)]
    x : f64,
    #[pyo3(get, set)]
    y : f64,
    #[pyo3(get, set)]
    z : f64,
}
#[pymethods]
impl Float3 {
    #[new]
    fn py_new(x : f64, y : f64, z : f64) -> Self {
        Float3 { x, y, z}
    }
    
    // Reflected multiply: handles `scalar * Float3`.
    fn __rmul__(&self, lhs: f64) -> Self {
        Float3 { x: self.x * lhs, y: self.y * lhs, z: self.z * lhs }
    }

    // Handles `Float3 * scalar`.
    fn __mul__(&self, rhs: f64) -> Self {
        Float3 { x: self.x * rhs, y: self.y * rhs, z: self.z * rhs }
    }

    fn __add__(&self, rhs: &Self) -> Self {
        Float3 { x: self.x + rhs.x, y: self.y + rhs.y, z: self.z + rhs.z }
    }

    fn __sub__(&self, rhs: &Self) -> Self {
        Float3 { x: self.x - rhs.x, y: self.y - rhs.y, z: self.z - rhs.z }
    }

    fn __iadd__(&mut self, rhs: &Self) {
        *self = Float3 { x: self.x + rhs.x, y: self.y + rhs.y, z: self.z + rhs.z };
    }

    fn __isub__(&mut self, rhs: &Self) {
        *self = Float3 { x: self.x - rhs.x, y: self.y - rhs.y, z: self.z - rhs.z };
    }
}


/// A Python module implemented in Rust.
#[pymodule]
fn test_pyo3(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<Float3>()?; 
    Ok(())
}

And main.py is:

import test_pyo3
import glm 
import time

def main():

    samples = 100000
    tic = time.time()
    for i in range(samples):
        temp = glm.dvec3(0.0, 0.0, 9.81)
    print("average time per construction glm {}".format((time.time() - tic) / samples))    
    tic = time.time()
    for i in range(samples):
        temp = test_pyo3.Float3(0.0, 0.0, 9.81)
    print("average time per construction test_pyo3 {}".format((time.time() - tic) / samples))    
    tic = time.time()
    for i in range(samples):
        temp = 1.5 * glm.dvec3(0.0, 0.0, 9.81)
    print("average time per multiply operation glm {}".format((time.time() - tic) / samples))    
    tic = time.time()
    for i in range(samples):
        temp = 1.5 *  test_pyo3.Float3(0.0, 0.0, 9.81)
    print("average time per multiply operation test_pyo3 {}".format((time.time() - tic) / samples))    
    pass




if __name__ == '__main__':
    main()

main.py runs a benchmark on constructing glm.dvec3, multiplying with a scalar, and the equivalent for my class.
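As a cross-check on the time.time() loops, here is a minimal sketch of the same measurement using the standard library's timeit module, which repeats each trial and lets you keep the fastest one. The per_call_seconds helper and the PyFloat3 class are hypothetical names introduced only so the sketch runs without either extension installed; in practice the lambdas would call glm.dvec3 and test_pyo3.Float3 instead. Note that wrapping each call in a lambda adds a small constant overhead to every measurement.

```python
import timeit


def per_call_seconds(fn, number=100_000, repeat=5):
    """Best observed average wall time of one fn() call, in seconds."""
    # timeit.repeat returns one total time per trial; the minimum trial is
    # the one least disturbed by other activity on the machine.
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number


class PyFloat3:
    """Hypothetical pure-Python stand-in for glm.dvec3 / test_pyo3.Float3."""

    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def __mul__(self, scalar):
        return PyFloat3(self.x * scalar, self.y * scalar, self.z * scalar)

    # `1.5 * vec` falls back to __rmul__ because float does not know PyFloat3.
    __rmul__ = __mul__


if __name__ == "__main__":
    construct = per_call_seconds(lambda: PyFloat3(0.0, 0.0, 9.81))
    multiply = per_call_seconds(lambda: 1.5 * PyFloat3(0.0, 0.0, 9.81))
    print("construction: {:.3e} s per call".format(construct))
    print("multiply:     {:.3e} s per call".format(multiply))
```

Taking the minimum over several trials tends to give more stable numbers than a single pass, which matters when the differences being compared are well under a microsecond.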

The output is:

average time per construction glm 2.125263214111328e-07
average time per construction test_pyo3 2.834796905517578e-07
average time per multiply operation glm 2.486872673034668e-07
average time per multiply operation test_pyo3 4.966330528259277e-07

I compile the Rust side with "maturin develop -r", which should build in release mode, and I set pyo3 = "0.21.2" in my Cargo.toml.

Construction is roughly 33% slower, and simple scalar multiplication is nearly 2x slower. When looking at the call graph using py-spy, all I see is "trampoline" everywhere with a bunch of seemingly random numbers, and it's not clear what is actually going on. Regardless, this should be an apples-to-apples comparison: PyGLM just uses native C/C++ bindings directly rather than Rust. I wouldn't expect a large performance difference in such a simple test case, since both should be doing the same kind of work.
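For reference, the relative slowdown can be worked out directly from the four timings printed above. This sketch takes the third and fourth output values as the multiplication timings, since that is the order in which main.py prints them; the variable names are only illustrative.

```python
# Timings copied from the benchmark output, in seconds per operation.
glm_construct = 2.125263214111328e-07
pyo3_construct = 2.834796905517578e-07
glm_multiply = 2.486872673034668e-07
pyo3_multiply = 4.966330528259277e-07

# Ratio > 1 means the PyO3 class is slower than PyGLM.
construct_ratio = pyo3_construct / glm_construct
multiply_ratio = pyo3_multiply / glm_multiply

# Absolute extra cost per operation, in nanoseconds.
construct_extra_ns = (pyo3_construct - glm_construct) * 1e9
multiply_extra_ns = (pyo3_multiply - glm_multiply) * 1e9

print(f"construction: {construct_ratio:.2f}x, +{construct_extra_ns:.0f} ns")  # 1.33x, +71 ns
print(f"multiply:     {multiply_ratio:.2f}x, +{multiply_extra_ns:.0f} ns")    # 2.00x, +248 ns
```

In absolute terms the gap is on the order of tens to a few hundred nanoseconds per call, so per-call overhead in the binding layer is a plausible place to look, rather than the arithmetic itself.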

Is there a way to bring the performance of my simple class more in line with PyGLM?
