Releases: apache/iceberg-python

pyiceberg-0.8.1

06 Dec 19:43

Full Changelog: pyiceberg-0.8.0...pyiceberg-0.8.1

Patch Release PR: #1384

What's Changed

Table.name now returns the table name without the catalog name. This is part of a broader effort to remove references to the catalog name in pyiceberg.

  • Replace usage of Table.identifier with Table.name which returns the table name without the catalog name
  • Replace the use of a deprecated function (identifier_to_tuple_without_catalog) in pyiceberg; remove unnecessary warnings
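
The effect of this change can be pictured with a small, library-free sketch. It assumes identifiers are tuples of strings whose first element may be the catalog name. The helper `strip_catalog` and the names `my_catalog`, `db`, and `events` are hypothetical and are not part of pyiceberg.

```python
from typing import Tuple

Identifier = Tuple[str, ...]


def strip_catalog(identifier: Identifier, catalog_name: str) -> Identifier:
    """Hypothetical helper: drop a leading catalog name from an identifier tuple."""
    if len(identifier) > 1 and identifier[0] == catalog_name:
        return identifier[1:]
    # Identifier is already in the catalog-free form (or is a bare table name).
    return identifier


# Catalog-qualified form versus the catalog-free form described in this release.
full = ("my_catalog", "db", "events")
short = strip_catalog(full, "my_catalog")
print(short)
```

Code that compares or stores table identifiers may need to account for the shorter form after upgrading.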

Documentation updates are included to reflect the updated process in https://py.iceberg.apache.org/

  • Update "how to release" documentation
  • 0.8.0 post-release steps

Bug fixes

  • Fix add_files for parquet files without column stats
  • Allow leading underscore in column name used in row filter
  • Ignore tables without table_type property from Glue and Hive
  • Write null in manifest list metadata when there is no parent-snapshot-id
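
As an illustration of the Glue and Hive fix above, the following is a minimal sketch of filtering a table listing so that entries without a table_type property are skipped rather than raising a KeyError. The dictionary shape (`Name`, `Parameters`) and the function `iceberg_table_names` are assumptions made for this example; this is not pyiceberg's implementation.

```python
from typing import Any, Dict, List


def iceberg_table_names(tables: List[Dict[str, Any]]) -> List[str]:
    """Return names of entries whose table_type parameter marks them as Iceberg."""
    names = []
    for table in tables:
        # Assumed shape: parameters may be missing or empty for non-Iceberg entries.
        params = table.get("Parameters") or {}
        # dict.get avoids a KeyError when table_type is absent.
        if params.get("table_type", "").lower() == "iceberg":
            names.append(table["Name"])
    return names


tables = [
    {"Name": "events", "Parameters": {"table_type": "ICEBERG"}},
    {"Name": "legacy_view", "Parameters": {}},
    {"Name": "raw_csv"},
]
print(iceberg_table_names(tables))
```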

Remove upper bound restrictions for dependency libraries; allow early testing of new versions

  • Remove Python library version upper bound restriction; allow Python 3.13
  • Remove fsspec library version upper bound restriction

Commits

36 new commits since the 0.8.0 release.

12 new commits will be included in 0.8.1

  • 11 commits cherry-picked as bug fixes (listed below)
  • 1 commit to bump version to 0.8.1

11 bug fixes (cherry-picked)

acbd071 Write null when there is no parent-snapshot-id (#1383)
bb078cf Add instruction for patch release (#1373)
ab43c6c fix KeyError raised by add_files when parquet file does not have column stats (#1354)
cc1ab2c Improve documentation for "how to release" (#1359)
64dc6fe Remove Python 3.13 upper bound restriction (#1355)
d86ab6e Allow leading underscore in column name used in row filter (#1358)
7a4734e Replace reference of Table.identifier with Table.name (#1346)
a66ddc0 Ignore tables without table_type from Glue and Hive (#1332)
2cbc77d Drop upper bounds for fsspec and its implementations (#1341)
7660a5b 0.8.0 post release steps (#1334)
b2f0a9e use the non-deprecated func (#1326)

pyiceberg-0.8.0

18 Nov 19:35
3ccdc44

What's Changed

PR

  • Update PyIceberg Verify Release doc by @chinmay-bhat in #976
  • DOCS: Add Github Actions Screenshots to Release Notes by @sungwy in #975
  • Bump up version in dev Dockerfile and Issue Template by @ndrluis in #981
  • Fix pydantic warning in the commit process by @ndrluis in #972
  • Bump up Iceberg version to 1.6.0 by @ndrluis in #982
  • Bug Fix: use appropriate partition spec for delete by @sungwy in #984
  • [Bug Fix]Use self.table_metadata when in transaction by @HonahX in #985
  • DOCS: Add more post release notes by @sungwy in #983
  • Treat warning as error in CI/Dev by @ndrluis in #973
  • Use 'strtobool' instead of comparing with a string. by @ndrluis in #988
  • Fix: accept empty arrays in struct field lookup by @grobgl in #997
  • Add ndrluis as collaborator by @sungwy in #1009
  • Fix list namespace response in rest catalog by @ndrluis in #995
  • Pyarrow IO property for configuring large v small types on read by @sungwy in #986
  • Update metadata-log for non-rest catalogs by @soumya-ghosh in #977
  • Exclude Python 3.9.7 due to import error in catalog module by @ndrluis in #526
  • Deprecate rest.authorization-url in favor of oauth2-server-uri by @ndrluis in #962
  • Allow setting write.parquet.row-group-limit by @Fokko in #1016
  • Deprecate Redundant Identifier Support in TableIdentifier, and row_filter by @sungwy in #994
  • Fix: Handle Empty RecordBatch within _task_to_record_batches, fix correctness issue with positional deletes by @sungwy in #1026
  • Fix overwrite when filtering all the data by @ndrluis in #1023
  • Allow setting write.parquet.page-row-limit by @Fokko in #1017
  • DOCS: Remove older row for write.parquet.row-group-limit by @sungwy in #1030
  • Improve test_version_format() error message for version mismatches by @laksh-krishna-sharma in #1015
  • Bump version to 0.7.1 by @sungwy in #1034
  • Support s3.signer.endpoint for nessie by @guitcastro in #1029
  • [bug] fix reading with to_arrow_batch_reader and limit by @kevinjqliu in #1042
  • Use VisitorWithPartner for name-mapping by @Fokko in #1014
  • Fix tracing existing entries when there are deletes by @Fokko in #1046
  • Coverage Run unit tests first before docker containers are set up by @Minfante377 in #1055
  • Update "verify release" instruction by @kevinjqliu in #1064
  • Fix Install Issues with docutils = 0.21.post1 and exclude 3.12 from supported python dependencies by @sungwy in #1067
  • Post Release 0.7.1 version updates by @sungwy in #1073
  • Update create table doc to clarify ID re-assignment by @paulcichonski in #1072
  • Refactor PyArrow DataFiles Projection functions by @sungwy in #1043
  • DOCS: Exclude signature files from twine upload by @sungwy in #1071
  • Increase the minimal required pyarrow version to 14.0.0 by @ndrluis in #1090
  • Fix table_exists behavior in REST catalog by @ndrluis in #1096
  • fix: improve makefile by @TiansuYu in #1091
  • fix (issue-1079): allow update_column to set doc as '' by @TiansuYu in #1083
  • prevent adding duplicate files by @amitgilad3 in #1036
  • Add list_views to rest catalog by @ndrluis in #817
  • Emit warnings instead of failing when seeing unsupported configuration by @Fokko in #1111
  • Use markdownlint instead of mdformat by @kevinjqliu in #1118
  • Add drop_view to the rest catalog by @ndrluis in #820
  • Support python 3.12 by @kevinjqliu in #1068
  • Make commit_table public by @Fokko in #1112
  • Refactoring: Break down very large table/__init__.py module by @sungwy in #1144
  • fix: Invert case_sensitive logic in StructType by @AnthonyLam in #1147
  • Bump duckdb to version 1.1.0 by @kevinjqliu in #1149
  • Deprecate ADLFS prefix in favor of ADLS by @ndrluis in #961
  • Cache Manifest files by @chinmay-bhat in #787
  • Use the correct spec when rewriting existing manifests by @Fokko in #1157
  • Bug Fix: Use historical partition field name by @sungwy in #1161
  • fix: remove old, incorrect docstring by @dataders in #1166
  • Preserve Backward compatibility in 0.8.0 for #1144 by @sungwy in #1151
  • follow up for more cleanup by @dataders in #1168
  • [bug] [REST] Don't remove identifier root by @kevinjqliu in #1172
  • fix: support MonthTransform for partitioning by @felixscherz in #1176
  • Add metadata tables for data_files and delete_files by @soumya-ghosh in #1066
  • Use ArrowScan.to_table to replace project_table by @JE-Chen in #1180
  • Add Docstrings to pyiceberg/table/__init__.py by @sungwy in #1189
  • Support python 3.12 in poetry by @kevinjqliu in #1192
  • Use cachetools's LRUCache to cache manifest list by @kevinjqliu in #1187
  • HA HMS support by @awdavidson in #752
  • Bug Fix: Position Deletes + row_filter yields less data when the DataFile is large by @sungwy in #1141
  • Remove dead loom link by @kevinjqliu in #1213
  • Drop support for Python 3.8 by @raulcd in #1221
  • Add clarifying docs to transform result types by @kevinzwang in #1211
  • Add flag to allow disabling creation of catalog tables by @isc-patrick in #1155
  • Bug Fix: Glue and Hive catalog return only Iceberg tables by @mark-major in #1145
  • Move snapshot history expire table properties to constants by @ndrluis in #1217
  • Abort the whole table transaction if any update in the transaction has failed by @stevie9868 in #1246
  • PyArrow: Pass in null-mask by @Fokko in #1264
  • Bump PyArrow to 18.0.0 by @Fokko in #1256
  • Remove numpy as a hard dependency by @Fokko in #1270
  • Allow for missing operation by @Fokko in #1263
  • fix: list_tables method in glue catalog now only return tables. by @omkenge in #1258
  • Replace numpy usage and remove from pyproject.toml by @kevinjqliu in #1272
  • Bump version to 0.8.0 by @Fokko in #1276
  • Remove initial_change when CreateTableTransaction apply table updates on an empty metadata by @HonahX in #1219
  • Deprecate for 0.8.0 release by @kevinjqliu in #1269
  • Pass table-token to commit endpoint by @Fokko in #1278
  • Updating configuration docs by @Samreay in #1292
  • Allow union of {int,long}, {float,double}, etc by @Fokko in #1283
  • Allow passing in ARN Role and Session name to the PyArrowFileIO by @Fokko in #1...

pyiceberg-0.8.0rc2

14 Nov 20:47
3ccdc44
pyiceberg-0.8.0rc2 Pre-release

What's Changed

PR

pyiceberg-0.8.0-rc1

07 Nov 20:49
0eaadb9
pyiceberg-0.8.0-rc1 Pre-release

What's Changed

PRs

pyiceberg-0.7.1

19 Aug 18:34

What's Changed

  • Fix delete to trace existing manifests when a data file is partially rewritten by @Fokko in #1046
  • Fix 'to_arrow_batch_reader' to respect the limit input arg by @kevinjqliu in #1042
  • Fix correctness of applying positional deletes on Merge-On-Read tables by @sungwy in #1026
  • Fix overwrite when filtering data by @ndrluis in #1023
  • Bug fix for deletes across multiple partition specs on partition evolution by @sungwy in #984
  • Fix evolving the table and writing in the same transaction by @HonahX in #985
  • Fix scans when result is empty by @grobgl in #997
  • Fix ListNamespace response in REST Catalog by @ndrluis in #995
  • Exclude Python 3.9.7 from list of supported versions by @ndrluis in #526
  • Allow setting write.parquet.row-group-limit by @Fokko in #1016
  • Allow setting write.parquet.page-row-limit by @Fokko in #1017
  • Fix pydantic warning during commit by @ndrluis in #972

Full Changelog: pyiceberg-0.7.0...pyiceberg-0.7.1

pyiceberg-0.7.0

30 Jul 23:44
be5c426

What's Changed

PyIceberg 0.6.1

PyIceberg 0.6.0

20 Feb 10:34
cc44926

What's Changed

PyIceberg 0.5.1

30 Oct 13:49