
Add ARRAY support to PostgresqlReader #264

Closed
criccomini opened this issue Jun 14, 2023 · 11 comments · Fixed by #410

Comments

@criccomini
Contributor

Recap's PostgresqlReader does not currently support PG arrays as documented here:

https://www.postgresql.org/docs/current/arrays.html

The reader should be updated to support this type.

@criccomini
Contributor Author

Arrays in PG are tricky: information_schema.columns doesn't report their element type, and they can have an arbitrary number of dimensions (int[] is not one-dimensional; a column declared int[] can store arrays with any number of dimensions).

https://www.postgresql.org/docs/current/arrays.html

@alexdemeo
Contributor

alexdemeo commented Dec 18, 2023

Hey Chris. I'm using Recap and looking to leverage array functionality with Postgres. I encountered this issue and had some questions about implementing this functionality. It seems it's possible to derive the innermost type from UDT_NAME (slicing off the underscore prefix, e.g. "UDT_NAME": "_int4"). Does that sound accurate? As for the multidimensional array issue, do you see a path forward for adding array support that's agnostic to the dimensionality?

@criccomini
Contributor Author

Hi! I'll take a look at this tomorrow (hopefully). Spent all day fighting Hive and I'm out of gas now. :P

@criccomini
Contributor Author

Was reading over this just now.

data_type character_data
Data type of the column, if it is a built-in type, or ARRAY if it is some array (in that case, see the view element_types), ...

So the element_types view seems promising. I'm curious how it handles multi-dimensional arrays, if at all. Need to poke around more.
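For reference, the join from columns to element_types might look roughly like this (an untested sketch based on the information_schema docs; the exact join columns should be double-checked):

```python
# Untested sketch: per the PostgreSQL information_schema docs, element_types
# rows are matched back to the owning column via the object_* columns and by
# joining the column's dtd_identifier to collection_type_identifier.
ELEMENT_TYPES_SQL = """
SELECT c.table_name, c.column_name, e.data_type AS element_type
FROM information_schema.columns AS c
JOIN information_schema.element_types AS e
  ON c.table_catalog  = e.object_catalog
 AND c.table_schema   = e.object_schema
 AND c.table_name     = e.object_name
 AND e.object_type    = 'TABLE'
 AND c.dtd_identifier = e.collection_type_identifier
WHERE c.data_type = 'ARRAY'
"""
```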

@criccomini
Contributor Author

After Googling around a bit more, it still seems like there's no way to get the number of dimensions from ARRAY columns. And as you pointed out, UDT_NAME in the columns view has what we need for the types.

Next question is how to represent this in Recap. Maybe:

type: list
name: baz
alias: foo.bar.Baz
values:
    - type: int32
    - type: foo.bar.Baz

This defines a list that can hold either int32 (if the column has udt_name=_int4) or a list of the same. The cyclic reference makes it possible to have arbitrarily deep nesting as well as mixed lengths. I think this is what PG supports.

WDYT? The downside to this (IMO) is that it's pretty complex to deal with for the user. But it is actually what you can get from PG, so...
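To make the cyclic reference concrete, here's a minimal sketch (plain Python dicts, not Recap's type system) showing how a self-referencing list type admits arbitrary depth and mixed lengths:

```python
# Minimal sketch (plain Python, not Recap's type system): a list type whose
# "values" include an alias pointing back to itself admits arbitrary nesting.
ALIASES = {
    "foo.bar.Baz": {
        "type": "list",
        "values": [{"type": "int32"}, {"type": "foo.bar.Baz"}],
    }
}

def resolve(schema):
    # Replace an alias reference with the schema it names.
    return ALIASES.get(schema["type"], schema)

def matches(value, schema):
    schema = resolve(schema)
    if schema["type"] == "int32":
        return isinstance(value, int)
    if schema["type"] == "list":
        return isinstance(value, list) and all(
            any(matches(v, s) for s in schema["values"]) for v in value
        )
    return False

baz = ALIASES["foo.bar.Baz"]
print(matches([1, 2, 3], baz))      # True: one dimension
print(matches([[1], [2, 3]], baz))  # True: nested, mixed lengths
print(matches([1, [2]], baz))       # True: mixed depth is also admitted
```

The last case illustrates the looseness discussed above: the recursive schema can't rule out mixed depths or lengths within a dimension, even though PG itself rejects them.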

@alexdemeo
Contributor

I think this makes sense! It's unfortunate dimensionality isn't part of the schema, but given that, I agree the cyclic reference is the way to go. PG doesn't support mixed lengths within a particular dimension, but I don't think there's a good user-friendly way to represent that constraint.

@criccomini
Contributor Author

So, another observation. For types like varchar(255)[], there's no way in PG to get the 255 octet length back. It allows you to define tables that way, but I don't think it's enforced?

@criccomini
Contributor Author

This is interesting, from ChatGPT:

[Screenshot of a ChatGPT response, 2023-12-20]

I was wondering how PostgreSQL handles bit(8)[] columns. It sounds like it doesn't convert it to bit varying[] the way it does with VARCHAR. But it also doesn't use CHARACTER_MAXIMUM_LENGTH the way bit(8) columns would.

@alexdemeo
Contributor

Oh interesting, it sounds like all this is actually doable with some edge-casing. We can also use pg_attribute.attndims to get the dimensionality. I guess it is stored, just not enforced.

criccomini added a commit that referenced this issue Dec 20, 2023
Recap's PostgreSQL converter now supports ARRAY column types. The
implementation uses `UDT_NAME` in `information_schema.columns` to get the type
of the array elements.

There are some weird behaviors with the implementation because of the way
PostgreSQL handles arrays. Importantly, the element size is ignored (int(4)[]
is just treated as int[]). PostgreSQL also treats all arrays as having an
arbitrary number of dimensions, so int[] is the same as int[][], and so on.

See #264 for more discussion on the
nuances of this implementation.

Closes #264
@criccomini
Contributor Author

Here's my PR right now:

#410

I am not planning to implement the pg_attribute stuff right now. It just uses maximal defaults. If you want to submit a follow-on PR to add that support, you're welcome to. :)

@criccomini
Contributor Author

@alexdemeo recap-core 0.9.5 is on pypi with the latest changes for PostgreSQL arrays. Let me know if you have any issues with it. I'll try and update the docs this afternoon.

criccomini pushed a commit that referenced this issue Jan 3, 2024
Add array support to postgres reader/converter with dimensionality read
from `pg_attribute.attndims`. This is grabbed by joining on
`information_schema.columns.column_name = pg_attribute.attname`.

This is a follow-up to #410, which
lays a tiny bit of groundwork for
#264 (comment)
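The join described in that commit might look roughly like this (an untested sketch; the follow-up PR's actual query may differ, and matching on attname alone is ambiguous across tables, so pg_class is joined in here to scope the match):

```python
# Untested sketch of the described join. attname/column_name matching alone
# is ambiguous across relations, so pg_class scopes it to the right table.
ATTNDIMS_SQL = """
SELECT c.table_name, c.column_name, c.udt_name, a.attndims
FROM information_schema.columns AS c
JOIN pg_catalog.pg_class AS r
  ON r.relname = c.table_name
JOIN pg_catalog.pg_attribute AS a
  ON a.attrelid = r.oid
 AND a.attname  = c.column_name
WHERE c.data_type = 'ARRAY'
"""
```

Note that `attndims` records the number of dimensions from the column declaration, but as discussed above, PostgreSQL doesn't enforce it at write time.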