This repository has been archived by the owner on Jul 18, 2024. It is now read-only.

[ISSUE-510] Refactor doc_loader.py to load documents concurrently using Ray actor… #511

Merged 1 commit on Jan 4, 2024

Conversation

chaojun-zhang
Contributor

…s or Spark tasks, instead of loading them all at once and then putting them into a dataset

What changes were proposed in this pull request?

Refactor doc_loader.py to load documents concurrently using Ray actors or Spark tasks, instead of loading them all at once and then putting them into a dataset.
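For readers without the diff at hand, here is a minimal sketch of the general idea, not the code in this PR. It uses a standard-library thread pool as a stand-in for Ray actors or Spark tasks so that it runs without a cluster. The names `load_one`, `load_all_at_once`, `load_concurrently`, and `make_sample_files` are hypothetical and do not come from doc_loader.py.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_one(path):
    """Hypothetical per-file loader: read one text file into a record."""
    p = Path(path)
    return {"source": p.name, "text": p.read_text(encoding="utf-8")}


def load_all_at_once(paths):
    """Baseline: load every document sequentially in the calling process."""
    return [load_one(p) for p in paths]


def load_concurrently(paths, max_workers=4):
    """Fan per-file loading out to worker threads.

    The thread pool is only a stand-in for Ray actors or Spark tasks
    (an assumption made for this sketch). pool.map yields results in
    input order, so the output matches the sequential baseline.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load_one, paths))


def make_sample_files(directory, count=3):
    """Write a few small text files for demonstration and return their paths."""
    paths = []
    for i in range(count):
        p = Path(directory) / f"doc_{i}.txt"
        p.write_text(f"document {i}", encoding="utf-8")
        paths.append(str(p))
    return paths
```

In a distributed setting the same shape would presumably apply: each worker receives file paths rather than already-loaded documents, so loading happens on the workers instead of in the driver.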

Why are the changes needed?

Performance improvement: documents are loaded concurrently instead of all at once before being put into a dataset.

How was this patch tested?

Existing unit tests.


#510

Contributor

@xuechendi xuechendi left a comment

LGTM

@xuechendi xuechendi merged commit c475413 into intel:main Jan 4, 2024
5 checks passed