-
-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable accumulating files in memory while using attachFieldsToBody #385
Comments
How would you implement this feature? |
I think something like this is simple enough. |
Thanks for reporting! Would you like to send a Pull Request to address this issue? Remember to add unit tests. |
Since this usage does not accumulate:
I would support this new usage instead of adding a new option:
|
We have a need to decide on how to process files by stream inside the API method itself. From what I understand, this usage requires the files be processed before it even reaches the API method. Isn't this the case? Did I miss something, and it's possible to let the API method handle the file by stream even though an onFile callback is specified? |
Well you have somewhere to store the data. There is no rewind for these fileStreams. So you have to store them somewere if you want to still be able to process them. |
This was my motivation for creating the issue. The case where you can't both use attachFieldsToBody and streams in the API method at the same time. (This is why I went with a disableAccumulation option) Currently the only to use streams in this way is with attachFieldsToBody off |
I am really confused. What is what you want to achieve? You want to attach the fields to the body but on the other hand you want to stream the file content. Ok. So maybe you dont wantthat the files get attached to the body as strings? Is that what you want? |
Yeah, that's right. I'd like to have the fields attached to body, but don't want files handled at all. I want to directly stream the files myself in the API method (not before it reaches it) Because I can't do both, I've had to re-implement attachFieldsToBody, but it would be much cleaner to let the library do this (after all, the option exists in the first place). This way I can use fastify schema validations on the API method, rather than having to manually validate afterwards inside the API method itself (which results in messier code) |
Well imho there is way to do tihs. You can not "stop" the stream. You have to store it somewhere. The only solution I know is to store the file into the memory, or to store it into redis. With RedisWritable, you could store the uploaded file into redis as onFile, and then in the handler, you can read it again with the RedisReadable. |
Yes, in that case you would have to handle the file first, but as long as you order the fields first and have the file field be the very last one, this should be fine |
Well I think it will not be possible to implement it. |
For the time being I disabled attachFieldsToBody and attach fields in a loop like this:
What might go wrong if the async generator function Of course, |
Maybe, we can custom the onFile option , process the file stream, store files at local disk, and then attach local paths of files instead of files to body. |
To be honest... This is the implementers duty to handle the streams. and tbh there are so many ready to use stream Implementations... An implementer just needs to correctly pipe the file streams. |
yes. we can do this just by passing a custom onFile to save to a local disk.
However, I suggest whether the onFile configuration item can add several parameters, such as the options ( registration), or add an array parameter to collect all files (called requestFiles, for example, by which we can easily get files or clean them, or maybe add the logic to clean them in the cleanRequestFiles func) to enhance the customization ability of the onFile configuration item |
I have the exact same issue. Here what I would like to achieve :
I even think it should be the default behavior. Fields are named in the request formdata so i expect to retrieve all fields (file and non-file) by their name in the body. the attachFieldsToBody flag have probably been created for this purpose. I'm not sure it fits the usual need. I think binary files should be usable as stream and metadata should be available in value. So I tried to do it with an onFile function: // does not work
onFile: function(part) {
part.value = part;
} But this does not work as onFile must consume the stream in order to reach the next field or the end. The API method is only called when all files are consumed. |
I'm in the same situation as you @felicienfrancois. I think a "helper" function would be a good thing, allowing us to do some variable destructuring. const { fields, filename, mimetype } = await request.file(); // Actual behavior
const { myNonFileField, myOtherField } = fields.getFields(); // would be nice (but may return files)
// const { myNonFileField, myOtherField } = fields.getFields({ includeFiles: false }); // would be even nicer Since actual
The |
This technically reads the stream while keeping a "readable" interface to be consumed inside await app.register(fastifyMultipart, {
attachFieldsToBody: true,
onFile(part) {
part.file = part.file.pipe(new PassThrough())
},
}) However I'm pretty sure this also accumulates the file in memory, is that right @gurgunday? |
Yes, you only add an extra layer of stream. |
The solution to this problem is to save the files on disk instead of memory. Then, load them from disk once they are being read. This is possible, it needs a volunteer. |
Other than what @mcollina suggested, I don't know how to achieve this But the idea looks really solid, saving files to disk and reading them as a stream when needed Most places allow this |
https://github.com/kaka-ng/fastify-plugins/tree/main/packages/multipart For reference, I like to provide the users choice. |
As far as i understand, this issue is not about where to store / to accumulate. It is about stoping accumulating / storing when not necessary. |
Then, shouldn't the new API solves the problem? I don't know how it can be possible to disable accumulate with |
From my understanding, the request here is retrieve all the fields first and let the file handle in later. In generally case, you never know the order of fields and files sent by the client. This is also why we propose to have it accumulated in other places first. |
Or redis |
Most of my current problem today is how can I NOT accumulate the file in memory while also being able to use other fastify features, such as schema validation, other fields on body and swagger/openapi. Which all of them does not work without |
The purpose of streams is not to use it later. It's the opposite. It's using it while receiving and not at the end. So the solution is not to accumulate somewhere else. TLDR streaming would benefit to most real world use cases. It would reduce response time (and reduce memory footprint), which is basically why we choosed fastify over express or others. Then the question is how can be benefit at the same time from :
The biggest issue here is that fields can be appended in random order in the body (not truely random, it appears in the order the frontend appended it in the formData) and there is no "header" listing the fields that will be encountered in the body. Two ideas: But if appended last it would still accumulate and be called at the end but with (full) streams instead of buffer which ease usage in streaming libs If text field are appended in the middle it would still benefit the performance by calling handler in the middle. This solution would require to know in advance what text fields are expected. Either via Fastify Schema or via additional options. 2nd idea: async getters for everything
This solution is more flexible, does not requires a schema definition (if implemented with Object Proxy), but is more complicated to understand how to uses in an optimized way (properties should be accessed in the order they are appended in the form). That's just ideas for now, it may not be easy to implement with fastify and busboy constraints. |
I don't see how it is possible for both ideas without buffering in somewhere (Memory or File). Idea 1: If you don't rely on Idea 2: For all your assumption is the order of fields must be guarantee, but from my point of view. |
A short summary,
Personal advice, use my plugin import FastifyMultipart from '@kakang/fastify-multipart'
import { BusboyAdapter } from '@kakang/fastify-multipart/lib/adapter/busboy'
import { FileStorage } from '@kakang/fastify-multipart/lib/storage/file'
fastify.register(FastifyMultipart, {
addContentTypeParser: true,
adapter: BusboyAdapter,
storage: FileStorage,
storageOption: {
uploadDir: '/'
}
})
You can already archive with the fastify.post('/upload', async function (request, reply) {
for await (const part of request.parts()) {
// you can determine the action by field name
switch(part.fieldname) {
}
}
return ''
}) If you use my plugin, it is also similar. fastify.post('/upload', async function (request, reply) {
for await (const { type, name, value, info } of request.multipart()) {
// you can determine the action by field name
switch(name) {
}
}
} Another API that I can think of is providing an object that have all handler. request.multipart({
foo: 'string' // short alias to validate the type
bar(value) {
// direct handling
}
}) or more simple just provide handler of files in hooks. fastify.get('/', {
config: {
multipart: {
hello(stream) {
// you must return stream here
return stream
}
}
}
}, async function(request, reply) {
await requset.parseMultipart({
// maybe here too
hello(stream) {
// you must return stream here
return stream
}
})
// access in body or files
request.body.other
request.files.hello // stream
}) |
This is the fundamental reason why |
Hi all, is there a way to specify a dedicated behavior for a specific route? If I can be able to specify a different onFile function for a specific route I think the issue is solved because I can parse the files in a specific way and let other fields go in the body. |
Prerequisites
🚀 Feature Proposal
Allow a flag in options to disable accumulation of files in memory, and let the caller handle the files themselves
Currently, if you want to use attachFieldsToBody, you're forced to either allow the library to accumulate files in memory, or provide an onFile function... Which also forces you to process the file (whether by streaming or accumulating).
I'd like to use the attachFieldsToBody feature, but I don't want any of my files accumulated or processed before the api function, because I need to handle them separately later and do streaming.
Motivation
attachFieldsToBody is a useful feature, and I'd rather not have to re-implement it so I can disable the accumulation
Example
fastify.register(require('@fastify/multipart'), { attachFieldsToBody: 'keyValues', disableAccumulation: true })
The text was updated successfully, but these errors were encountered: