
MsgPackLite: support InputStream as first-class object #219

Open
nicktorwald opened this issue Aug 14, 2019 · 2 comments
Labels
feature A new functionality

Comments


nicktorwald commented Aug 14, 2019

In scope of #208, @Totktonada stated

I think we can write a potentially large request chunk-by-chunk to a socket. To achieve that we need to learn our msgpack encoder to work in a lazy manner: receive a stream as an input parameter and return a stream as a result.

My main concern is that we should be able to avoid materializing streams in memory.

JDBC stream support was added for PreparedStatement in #208. Because neither IPROTO nor the MsgPackLite implementation supports streams, the driver materializes a stream in memory before it is encoded and sent to the socket.

Rationale: it is possible to send/receive quite long binary data (for instance, MP_BIN32 stores a byte array whose length is up to 2^32 - 1 bytes, i.e. almost 4 GiB). As a result, it would be convenient to push/fetch such enormous data partially, chunk by chunk. An InputStream, especially a BufferedInputStream, looks like a suitable abstraction over bytes "on demand".
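To make the chunk-by-chunk idea concrete, here is a minimal sketch of encoding a known-length InputStream as MP_BIN32 without materializing it. This is not the driver's actual code; `packBin32`, the class name, and the chunk size are illustrative, only the MP_BIN32 wire layout (type byte 0xc6 followed by a big-endian 32-bit length) comes from the msgpack spec:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical sketch: stream a known-length payload as MP_BIN32. */
public class StreamingBinEncoder {
    private static final int CHUNK = 8192;

    public static void packBin32(InputStream in, long length, OutputStream out)
            throws IOException {
        DataOutputStream dos = new DataOutputStream(out);
        dos.writeByte(0xc6);          // MP_BIN32 type byte
        dos.writeInt((int) length);   // big-endian 32-bit payload length
        byte[] buf = new byte[CHUNK];
        long remaining = length;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) {
                throw new IOException("stream ended before " + length + " bytes");
            }
            dos.write(buf, 0, n);
            remaining -= n;
        }
        dos.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[100_000];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        packBin32(new ByteArrayInputStream(payload), payload.length, out);
        // 5 header bytes + 100,000 payload bytes
        System.out.println(out.size()); // 100005
    }
}
```

The point is that only CHUNK bytes are ever buffered at once, regardless of the payload size.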

There are a few open questions:
- Do we support both byte arrays and streams to represent binary data?
- Maybe it is worth providing stream support only for mp_bin16/32 and using byte[] for mp_bin8.
- The current implementation requires that a packet be fully encoded before it is sent; conversely, a response must be completely read and decoded before it is returned. How to support laziness here is a big question.

It seems the last point is the most important one to clarify.

@Totktonada
Member

Do we support both byte arrays and streams to represent binary data?
Maybe it is worth providing stream support only for mp_bin16/32 and using byte[] for mp_bin8.

Is this a question about the msgpack encoder/decoder API, the client API, or the JDBC client API? Can you expand on it a bit, please?

The current implementation requires that a packet be fully encoded before it is sent; conversely, a response must be completely read and decoded before it is returned. How to support laziness here is a big question.

If this is a question about how to support it with the Tarantool binary protocol, then let me cite my comment on PR #208:

I would not say that the protocol does not support streaming. At least if you know the size of your data, you can calculate the packet size before encoding it, send the packet header, and then do a bunch of writes to the socket. When the size of the data is known, it is quite easy to calculate the msgpack size. However, you are right in the case when the size of the data is not known: a client has to materialize the data or calculate its size somehow before it starts sending.

If the question is about the connector implementation, then I guess it requires implementing size calculation without actual encoding, plus support for chunked encoding in our msgpack encoder. The same goes for the decoder.
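For binary payloads of known length, such a size calculation without actual encoding is straightforward; a hedged sketch (illustrative names, not the connector's API, only the msgpack bin8/16/32 header sizes are from the spec):

```java
/** Hypothetical sketch: msgpack-encoded size of a binary payload,
    computed without encoding, so a packet header can be written first. */
public class MsgPackSizes {
    public static long binSize(long payloadLength) {
        if (payloadLength <= 0xff)   return 2 + payloadLength; // mp_bin8:  0xc4 + 1-byte length
        if (payloadLength <= 0xffff) return 3 + payloadLength; // mp_bin16: 0xc5 + 2-byte length
        return 5 + payloadLength;                              // mp_bin32: 0xc6 + 4-byte length
    }

    public static void main(String[] args) {
        System.out.println(binSize(200));     // 202
        System.out.println(binSize(70_000));  // 70005
    }
}
```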

When I commented on PR #208, I didn't consider that the data size may be unknown, which is possible under the JDBC API. However, we need the data size to create a packet header. Since the JDBC API allows sending streams without a known size, those streams would need to be materialized anyway. So now I think that support for lazy msgpack encoding has much less value. More on that: Tarantool is not intended to be used as storage for large data records (at least for now).

So I think it is not worth implementing this issue.

@nicktorwald Are there questions that need to be clarified (maybe the first one), or can we close this?

@Totktonada added the feature (A new functionality) label Aug 21, 2019
@nicktorwald
Author

Is this a question about the msgpack encoder/decoder API, the client API, or the JDBC client API? Can you expand on it a bit, please?

That question isn't that important; it was about the MsgPack API: should it support both byte[] for mp_bin8 and InputStream for bigger data (mp_bin16/32)?
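As an illustration of that split (hypothetical names, not the actual MsgPackLite API), the choice between the bin formats is driven purely by the payload length, so byte[] naturally covers the mp_bin8 range while the larger formats could additionally accept an InputStream:

```java
public class BinFormat {
    // msgpack type bytes for binary payloads
    static final int MP_BIN8 = 0xc4, MP_BIN16 = 0xc5, MP_BIN32 = 0xc6;

    /** Smallest bin format whose length field fits the payload. */
    static int formatFor(long length) {
        if (length <= 0xff)   return MP_BIN8;  // small: byte[] is fine
        if (length <= 0xffff) return MP_BIN16; // could also accept InputStream
        return MP_BIN32;                       // could also accept InputStream
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(formatFor(100)));     // c4
        System.out.println(Integer.toHexString(formatFor(1 << 20))); // c6
    }
}
```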

I think we can close the issue now.
