Support GPU execution engine in LazyFrame profiler #20039

Open
Matt711 opened this issue Nov 27, 2024 · 0 comments · May be fixed by #20693
Labels
A-gpu Area: gpu engine enhancement New feature or an improvement of an existing feature

Matt711 commented Nov 27, 2024

Description

The new API would look similar to collect, e.g.:

lf = pl.LazyFrame(
    {
        "a": ["a", "b", "a", "b", "b", "c"],
        "b": [1, 2, 3, 4, 5, 6],
        "c": [6, 5, 4, 3, 2, 1],
    }
)
q = lf.group_by("a", maintain_order=True).agg(pl.all().sum()).sort(
    "a"
)
df, df_times = q.profile(engine="gpu")  # or a GPU configuration object

If we allow the profile functions in the Polars Rust layer to accept a callback (in the same way we do for collect), we can get the timing information. We'd need to change the profile function in the Rust layer like so:

// PyLazyFrame::profile
fn profile(&self, py: Python, lambda_post_opt: Option<PyObject>) -> PyResult<(PyDataFrame, PyDataFrame)> {
    // follow the logic in collect
}

And we'd need to add an engine kwarg to the profile function in Polars:

def profile(
    self,
    *,
    ...
    engine: EngineType = "cpu",
) -> tuple[DataFrame, DataFrame]:
    ...
    # Following the logic in collect

We'd also need to somehow pass a callable from Rust to Python that emulates state.node_timer.store, so we can time the nodes correctly. For example, in python_scan.rs, we'd want:

let args = (
    python_scan_function,
    with_columns.map(|x| x.into_iter().map(|x| x.to_string()).collect::<Vec<_>>()),
    predicate,
    n_rows,
    batch_size,
    state.node_timer.store, // <<< add this
);
callable.call1(args).map_err(to_compute_err)
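On the Python side, the scan callable could then wrap its work with the handed-in timer. A hedged sketch, assuming the store callable takes (start, end, name) as NodeTimer.store does:

```python
import time


def timed_python_scan(scan_fn, node_timer_store, node_name: str):
    # hypothetical wrapper: record start/end around the actual scan and
    # report the interval through the store callable passed from Rust
    start = time.perf_counter_ns()
    result = scan_fn()
    end = time.perf_counter_ns()
    node_timer_store(start, end, node_name)
    return result
```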

And we'd make the node_timer bridge in pyo3 look something like:

#[pyclass]
struct PyNodeTimer {
    // wraps the Rust-side NodeTimer from the execution state
    inner: Option<NodeTimer>,
}

#[pymethods]
impl PyNodeTimer {
    fn __call__(&self, py: Python<'_>, start: StartInstant, end: EndInstant, name: String) {
        ...
    }
}

Any advice or comments on the approach are appreciated!

cc. @wence-

@Matt711 Matt711 added the enhancement New feature or an improvement of an existing feature label Nov 27, 2024
@stinodego stinodego added the A-gpu Area: gpu engine label Nov 28, 2024