-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ARRAY support to PostgresqlReader #264
Comments
Arrays in PG are tricky because they don't contain a type in information_schema.columns and they can have an arbitrary number of dimensions (int[] is not one dimensional; any number of dimensions can be stored in it). |
Hey Chris. I'm using recap and looking to leverage array functionality with postgres. I encountered this issue and had some questions about implementing this functionality. It seems it's possible to derive the innermost type of from |
Hi! I'll take a look at this tomorrow (hopefully). Spent all day fighting Hive and I'm out of gas now. :P |
Was reading over this just now.
So the |
After Googling around a bit more, it still seems like there's no way to get the number of dimensions from ARRAY columns. And as you pointed out, UDT_NAME in the Next question is how to represent this in Recap. Maybe: type: list
name: baz
alias: foo.bar.Baz
values:
- type: int32
- type: foo.bar.Baz This defines a list that can have either int32 (if the column has udf_name=_int4) or a list of the same. The cyclic reference makes it possible to have arbitrarily deep nesting as well as mixed lengths. I think this is what PG supports. WDYT? The downside to this (IMO) is that it's pretty complex to deal with for the user. But it is actually what you can get from PG, so... |
I think this makes sense! It's unfortunate dimensionality isn't a part of the schema, but given that I agree cyclic reference is way to go. PG doesn't support mixed lengths within a particular dimension, but I don't think there's a good user-friendly way that represent that constraint. |
So, another observation. For types like |
Oh interesting, it sounds like all this is actually doable with some edge-casing. We can also use |
Recap's PostgreSQL converter now supports ARRAY column types. The implementation uses `UDT_NAME` in `information_schema.columns` to get the type of the array elements. There are some weird behaviors with the implementation because of the way PostgreSQL handles arrays. Importantly, the element size is ignored (int(4)[] is just treated as int[]). PostgreSQL also treats all arrays as having an arbitrary number of dimensions, so int[] is the same as int[][], and so on. See #264 for more discussion on the nuances of this implementation. Closes #264
Here's my PR right now: I am not planning to implement the pg_attribute stuff right now. It just uses maximal defaults. If you want to submit a follow-on PR to add that support, you're welcome to. :) |
Recap's PostgreSQL converter now supports ARRAY column types. The implementation uses `UDT_NAME` in `information_schema.columns` to get the type of the array elements. There are some weird behaviors with the implementation because of the way PostgreSQL handles arrays. Importantly, the element size is ignored (int(4)[] is just treated as int[]). PostgreSQL also treats all arrays as having an arbitrary number of dimensions, so int[] is the same as int[][], and so on. See #264 for more discussion on the nuances of this implementation. Closes #264
Recap's PostgreSQL converter now supports ARRAY column types. The implementation uses `UDT_NAME` in `information_schema.columns` to get the type of the array elements. There are some weird behaviors with the implementation because of the way PostgreSQL handles arrays. Importantly, the element size is ignored (int(4)[] is just treated as int[]). PostgreSQL also treats all arrays as having an arbitrary number of dimensions, so int[] is the same as int[][], and so on. See #264 for more discussion on the nuances of this implementation. Closes #264
Recap's PostgreSQL converter now supports ARRAY column types. The implementation uses `UDT_NAME` in `information_schema.columns` to get the type of the array elements. There are some weird behaviors with the implementation because of the way PostgreSQL handles arrays. Importantly, the element size is ignored (int(4)[] is just treated as int[]). PostgreSQL also treats all arrays as having an arbitrary number of dimensions, so int[] is the same as int[][], and so on. See #264 for more discussion on the nuances of this implementation. Closes #264
Recap's PostgreSQL converter now supports ARRAY column types. The implementation uses `UDT_NAME` in `information_schema.columns` to get the type of the array elements. There are some weird behaviors with the implementation because of the way PostgreSQL handles arrays. Importantly, the element size is ignored (int(4)[] is just treated as int[]). PostgreSQL also treats all arrays as having an arbitrary number of dimensions, so int[] is the same as int[][], and so on. See #264 for more discussion on the nuances of this implementation. Closes #264
@alexdemeo recap-core 0.9.5 is on pypi with the latest changes for PostgreSQL arrays. Let me know if you have any issues with it. I'll try and update the docs this afternoon. |
Add array support to postgres reader/converter with dimensionality read from `pg_attribute.attndims`. This is grabbed by joining on `information_schema.columns.column_name = pg_attribute.attname` This a followup to #410 which lays a tiny bit of groundwork for #264 (comment)
Recap's PostgresqlReader does not currently support PG arrays as documented here:
https://www.postgresql.org/docs/current/arrays.html
The reader should be updated to support this type.
The text was updated successfully, but these errors were encountered: