feat(signal schema): serialize base classes for custom types #777

Merged
shcheklein merged 1 commit into main from fix-studio-11140/child-classes-support
Jan 6, 2025

@shcheklein shcheklein commented Jan 4, 2025

Fixes https://github.com/iterative/studio/issues/11140 (private repo).

If we have a class hierarchy:

from typing import Iterator
from datachain import DataChain, File

dc = DataChain.from_storage("gs://mybucket")

class VideoInterval(File):
    start: int
    end: int


def get_actions(file: File) -> Iterator[VideoInterval]:
    yield VideoInterval(start=6, end=15, **file.model_dump())
    yield VideoInterval(start=39, end=51, **file.model_dump())
    yield VideoInterval(start=145, end=166, **file.model_dump())


dc = dc.gen(action=get_actions).save("res")

We want the VideoInterval class to carry enough information about its base class File that this information is still available after we serialize and deserialize it, so that downstream tools can work with it like a File (e.g. preview in Studio).

Also, this PR fixes an existing schema issue: complex types like list[MyType] were not handled properly during serialization / deserialization and came back as list[Any].
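To illustrate the list[MyType] case, here is a minimal, stdlib-only sketch of resolving a serialized type name back to a type. The resolve_type helper and the custom_types mapping are hypothetical names for this example; this is not the code in signal_schema.py, just one way the nested name can be kept instead of collapsing to list[Any].

```python
from typing import Any


def resolve_type(type_name: str, custom_types: dict[str, type]) -> Any:
    """Resolve names such as 'int', 'MyType' or 'list[MyType]' to a type.

    Hypothetical helper: `custom_types` maps a serialized name to a class.
    """
    builtin_types = {"int": int, "str": str, "float": float, "bool": bool}
    type_name = type_name.strip()

    # Recurse into the element type instead of giving up on the whole list.
    if type_name.startswith("list[") and type_name.endswith("]"):
        inner = resolve_type(type_name[len("list["):-1], custom_types)
        return list[inner]

    if type_name in builtin_types:
        return builtin_types[type_name]

    # Only names that are truly unknown fall back to Any.
    return custom_types.get(type_name, Any)


class MyType:
    value: int


restored = resolve_type("list[MyType]", {"MyType": MyType})
unknown = resolve_type("list[Unknown]", {})
```

Here restored keeps MyType as the element type, while unknown degrades to list[Any] only because the element name cannot be resolved.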

Fix Description

It adds a bases field to the _custom_types dict that we serialize to preserve fields and their types. bases lists all base classes of a custom type, with their names, etc. When we deserialize, we take the first base that is registered in the ModelStore and pass it as the base to Pydantic's create_model.
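A rough sketch of the idea, using only the standard library: REGISTRY stands in for the ModelStore, the built-in type() stands in for Pydantic's create_model, and the payload layout ("fields" plus a flat list of base names) is a simplification assumed for this example, not the exact format written by the PR.

```python
from typing import Any

# Hypothetical stand-in for the ModelStore: registered name -> class.
REGISTRY: dict[str, type] = {}


def register(cls: type) -> type:
    REGISTRY[cls.__name__] = cls
    return cls


@register
class File:
    path: str = ""


class VideoInterval(File):  # custom type, deliberately NOT registered
    start: int
    end: int


def serialize_custom_type(cls: type) -> dict[str, Any]:
    """Record the class's own fields plus the names of all its base classes."""
    return {
        "fields": {name: tp.__name__ for name, tp in cls.__annotations__.items()},
        # Walk the MRO, skipping the class itself and `object`.
        "bases": [base.__name__ for base in cls.__mro__[1:] if base is not object],
    }


def deserialize_custom_type(name: str, payload: dict[str, Any]) -> type:
    """Rebuild the class on top of the first base found in the registry."""
    base = next((REGISTRY[b] for b in payload["bases"] if b in REGISTRY), object)
    builtin_types = {"int": int, "str": str, "float": float, "bool": bool}
    annotations = {f: builtin_types[t] for f, t in payload["fields"].items()}
    return type(name, (base,), {"__annotations__": annotations})


payload = serialize_custom_type(VideoInterval)
Restored = deserialize_custom_type("VideoInterval", payload)
```

Because the restored class is built on File, a downstream check such as issubclass(Restored, File) succeeds even though VideoInterval itself was never registered.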

Demo

Screen.Recording.2025-01-03.at.8.31.02.PM.mov

TODO:

  • Add more tests
  • Refactor a bit to reduce the complexity (remove noqa)

@shcheklein shcheklein self-assigned this Jan 4, 2025
@shcheklein shcheklein force-pushed the fix-studio-11140/child-classes-support branch from 3e7173a to 9a7d439 Compare January 4, 2025 02:36

cloudflare-workers-and-pages bot commented Jan 4, 2025

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 7d188a6
Status: ✅  Deploy successful!
Preview URL: https://e9f05e3b.datachain-documentation.pages.dev
Branch Preview URL: https://fix-studio-11140-child-class.datachain-documentation.pages.dev

codecov bot commented Jan 4, 2025

Codecov Report

Attention: Patch coverage is 98.07692% with 1 line in your changes missing coverage. Please review.

Project coverage is 87.36%. Comparing base (8dfa4ff) to head (7d188a6).
Report is 7 commits behind head on main.

Files with missing lines | Patch % | Lines
src/datachain/lib/signal_schema.py | 98.07% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #777      +/-   ##
==========================================
+ Coverage   87.33%   87.36%   +0.02%     
==========================================
  Files         116      116              
  Lines       11147    11179      +32     
  Branches     1532     1539       +7     
==========================================
+ Hits         9735     9766      +31     
  Misses       1032     1032              
- Partials      380      381       +1     
Flag | Coverage Δ
datachain | 87.29% <98.07%> (+0.02%) ⬆️

@shcheklein shcheklein force-pushed the fix-studio-11140/child-classes-support branch from 9a7d439 to a7d5da1 Compare January 4, 2025 04:21
@shcheklein shcheklein requested a review from a team January 4, 2025 04:32
@shcheklein shcheklein added the bug Something isn't working label Jan 4, 2025
@shcheklein shcheklein force-pushed the fix-studio-11140/child-classes-support branch from a7d5da1 to 67b9843 Compare January 4, 2025 05:03
@shcheklein shcheklein force-pushed the fix-studio-11140/child-classes-support branch from 67b9843 to a8e673f Compare January 4, 2025 18:10
@shcheklein shcheklein force-pushed the fix-studio-11140/child-classes-support branch from a8e673f to eb8b9cf Compare January 4, 2025 18:29
@shcheklein shcheklein force-pushed the fix-studio-11140/child-classes-support branch from eb8b9cf to a2d3742 Compare January 4, 2025 18:34
@shcheklein shcheklein force-pushed the fix-studio-11140/child-classes-support branch from a2d3742 to 6ac622b Compare January 4, 2025 20:48
@shcheklein shcheklein force-pushed the fix-studio-11140/child-classes-support branch from 6ac622b to 7d188a6 Compare January 4, 2025 21:18

@dreadatour dreadatour left a comment

Thank you for this update, looks good, tests are awesome! 👍

@shcheklein shcheklein merged commit 2a81eb6 into main Jan 6, 2025
34 checks passed
@shcheklein shcheklein deleted the fix-studio-11140/child-classes-support branch January 6, 2025 16:03
2 participants