Remote File fetcher and 401/404 HTML body responses #193

Closed
DiegoPino opened this issue Apr 8, 2024 · 2 comments
Assignees: DiegoPino
Labels: API, bug, CSV Processing, File processing, Ingest Setup, Reporting
Milestone: 0.8.0

Comments

@DiegoPino
Member

What?

Bug. Say you fetch Files remotely via AMI and those URLs are, e.g., protected. Then you figure it out, open up access to the remotes, and run AMI again. Even with the re-download option enabled during processing, we end up re-using the already downloaded files (HTML... all wrong). Why? Because the file persister actually fetches/downloads the bodies, BUT what it saves is an HTML page (served by the 401 controller). Those files do end up in the composter, but until the composter runs, we will keep re-using the already saved one.

That, ladies and gentlemen, is the definition of a 🐞

@alliomeria I will fix this so we never have to deal with it again. Pull request coming tomorrow morning.

DiegoPino self-assigned this Apr 8, 2024
DiegoPino added the bug, Reporting, Ingest Setup, File processing, CSV Processing, and API labels Apr 8, 2024
DiegoPino added this to the 0.8.0 milestone Apr 8, 2024
@DiegoPino
Member Author

The problem happens here.

$response = $this->httpClient->get($uri, ['sink' => $path, 'timeout' => round($max_time,2)]);

The combination of a sink (which means the response body is always downloaded to a file) with a 401/404 response, served the way Drupal does it (as an HTML page), leads to a File being saved.
The solution could be one of:
1.- Download (i.e., keep the sink), but delete the file after checking the status code.
2.- Better: do a HEAD request first. If the HEAD response has no size or is a 4XX, don't even attempt the download.
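A minimal PHP sketch of the two options above, assuming the status code and Content-Length have already been read from the response (for a PSR-7 response such as Guzzle returns, via `getStatusCode()` and `getHeaderLine('Content-Length')`). The function names `keepSinkFileIfSuccessful()` and `shouldDownloadAfterHead()` are hypothetical and not taken from AMI; the actual fix is the commit referenced below. This sketch also treats any non-2xx status as a failure, which is slightly broader than "4XX".

```php
<?php

/**
 * Option 1 (hypothetical helper): the body was already written to $sinkPath
 * (e.g. via a Guzzle 'sink'). Keep it only for a 2xx status; otherwise delete
 * it so a later run cannot re-use an HTML 401/404 page as if it were the file.
 * Returns true if the file was kept.
 */
function keepSinkFileIfSuccessful(int $statusCode, string $sinkPath): bool
{
    if ($statusCode >= 200 && $statusCode < 300) {
        return true;
    }
    if (is_file($sinkPath)) {
        unlink($sinkPath); // drop the error page that the sink saved
    }
    return false;
}

/**
 * Option 2 (hypothetical helper): decide from a HEAD response whether a GET
 * download is worth attempting. $contentLength is null if the header is absent.
 */
function shouldDownloadAfterHead(int $statusCode, ?int $contentLength): bool
{
    if ($statusCode < 200 || $statusCode >= 300) {
        return false; // 4XX/5XX etc.: the body would be an error page, not the asset
    }
    // No size (missing or zero Content-Length): don't even attempt the download.
    return $contentLength !== null && $contentLength > 0;
}

// Demo without network: simulate a 401 HTML body written to a temp sink file.
$tmp = tempnam(sys_get_temp_dir(), 'ami');
file_put_contents($tmp, '<html>401 Unauthorized</html>');
var_dump(keepSinkFileIfSuccessful(401, $tmp)); // bool(false)
var_dump(file_exists($tmp));                   // bool(false)
var_dump(shouldDownloadAfterHead(404, 1024));  // bool(false)
```

One caveat: with Guzzle's default `http_errors` request option, a 4xx/5xx response throws a `ClientException`/`ServerException` after the body has already been written to the sink, so if the client is used that way, the option 1 cleanup would also need to run in the `catch` block.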

@DiegoPino
Member Author

Resolved via 88899a7
