Bump to 8.0 #839

Open: wants to merge 90 commits into base: master

Conversation

misiugodfrey
Collaborator

@misiugodfrey misiugodfrey commented Aug 29, 2024

Catching up OS repo to a point that uses 8.0 deps.

A couple of commits here required custom fixes. Specifically, some commits added dependencies on libraries that are not in the OS deps list (GfxDriver and CPR). The last two commits contain custom fixes that remove these extra deps, as the functionality should not be necessary in the OS repo.

Update: The CPR dep has been pulled into OS, so I've re-added that code. The only major deviation should now be the GfxDriver changes.

simoneves and others added 30 commits August 27, 2024 11:17
* Remove copy of GDAL_DATA files in main source tree
* Copy both GDAL and PROJ helper files from deps
* Copy to sibling gdal-data and proj-data dirs
* Use CPL API to set config, instead of pushing to the process environment
* Add importer-additional-proj-data-path config
* Set either PROJ_DATA or PROJ_LIB config, dependent on GDAL/PROJ version

Signed-off-by: Misiu Godfrey <[email protected]>
…that is not a multiple of the page size

Signed-off-by: Misiu Godfrey <[email protected]>
* Fixup check logic

* Avoid duplicated expr in the

Signed-off-by: Misiu Godfrey <[email protected]>
* Enable the device extension
* Enable the validation layer feature
* Add compile option (default OFF)

Signed-off-by: Misiu Godfrey <[email protected]>
* Fix path of cpack install of gdal and proj from deps to ThirdParty
Keep gdal and proj directory names for simplicity
Revert to setenv() as CPLSetConfigOption() does NOT work as documented

* Add OSDependent/heavyai_env
Use heavyai::setenv()

* Add heavyai::env_path_separator()
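The environment helpers named in this commit could look roughly like the sketch below. It is only an illustration: the `sketch` namespace, the function signatures, and the `join_paths` helper are assumptions, not the actual `heavyai::setenv()` or `heavyai::env_path_separator()` implementations.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdlib.h>  // ::setenv on POSIX, _putenv_s on Windows
#include <string>

namespace sketch {

// Hypothetical stand-in for heavyai::env_path_separator():
// the character that separates entries in PATH-like variables.
inline char env_path_separator() {
#ifdef _WIN32
  return ';';
#else
  return ':';
#endif
}

// Hypothetical stand-in for heavyai::setenv(): sets a variable in the
// process environment and returns true on success. When overwrite is
// false, an existing value is left untouched.
inline bool set_env(const std::string& name,
                    const std::string& value,
                    bool overwrite = true) {
#ifdef _WIN32
  if (!overwrite && std::getenv(name.c_str()) != nullptr) {
    return true;
  }
  return _putenv_s(name.c_str(), value.c_str()) == 0;
#else
  return ::setenv(name.c_str(), value.c_str(), overwrite ? 1 : 0) == 0;
#endif
}

// Illustrative helper: joins two search paths with the platform separator,
// skipping whichever side is empty.
inline std::string join_paths(const std::string& a, const std::string& b) {
  if (a.empty()) {
    return b;
  }
  if (b.empty()) {
    return a;
  }
  return a + env_path_separator() + b;
}

}  // namespace sketch
```

A caller could build a combined data path with `sketch::join_paths(...)` and publish it with `sketch::set_env(...)` before the geospatial libraries are initialized.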

Signed-off-by: Misiu Godfrey <[email protected]>
… cases (#7618)

* Modify isDirectColumnarConversionPossible logic

* Simplify op

* make auto_parallel_row_count_threshold be a global constant

* Add logs

* Add test

* Fixup dist test

Signed-off-by: Misiu Godfrey <[email protected]>
* Add artificial project node between filter and left-join nodes

* Change rewriter function name

* Add comments

* Add test

* Cover more edge cases

* Address comment

* Pass reference of the node_list to the query rewriting function

Signed-off-by: Misiu Godfrey <[email protected]>
* Remove num tuple threshold

* Address nullptr issue of EXPECT_TRUE

* Remove global g_is_test_env from PerfectJoinHashTable

* Modify comment

* Fixup join column bucketized entry count

* Fixup test failures: re-introduce g_is_test_env to control the behavior

Signed-off-by: Misiu Godfrey <[email protected]>
Signed-off-by: Misiu Godfrey <[email protected]>
* Add deps include path when compiling extensions functions with nvcc

Change the `custom_command` for extension function compilation to include
CMAKE_PREFIX_PATH/include to allow nvcc to find (and prefer) the custom
build of boost on Ubuntu (required to update boost to 1.84)

* Disable boost::serialization support for std::optional

Define BOOST_NO_CXX17_HDR_OPTIONAL before including boost/serialization/optional.hpp
to disable boost's new native std::optional serialization as it results in compile
errors when using boost::serialization::split_free

Signed-off-by: Misiu Godfrey <[email protected]>
* Add row-level llm_transform UDF WIP

* Support LLM_TRANSFORM

* Return response string, not an entire response JSON

* Add newline after answer keyword

* Improve error handling logic

* Support dictionary type

* Support llm_transform as filter expr

* Move nlohmann json to thirdparty folder

* Use iq-url instead of hard-coded url

* Convert LLM_TRANSFORM to string function

* Add test

* Allow controlling the number of concurrent requests

* Check input operands

* Use translation_cache to avoid double translation

* Remove duplicated json impl

* Fixup compilation failure

* Add missing close character for prompt

* Add missing newline

* Use RapidJSON

* Use robinhood unordered map

* Allow non-encoded string input

* Update llm call endpoint

* Change the default value of g_max_concurrent_llm_transform_call to 16

* Modify the exception msg

* Use shared/unique lock to access translation_cache

* Address comments

* Add cpr lib

* Make an individual test

* Address comments v2
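The "shared/unique lock to access translation_cache" step above can be pictured with the following minimal sketch. The `TranslationCache` class, its `get` method, and the injected translator callback are hypothetical names for illustration; the commit's real cache and its LLM call are not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache: many readers may look up concurrently under a shared
// lock; inserting a new entry takes the exclusive (unique) lock.
class TranslationCache {
 public:
  using Translator = std::function<std::string(const std::string&)>;

  explicit TranslationCache(Translator translate)
      : translate_(std::move(translate)) {}

  std::string get(const std::string& input) {
    {
      std::shared_lock<std::shared_mutex> read_lock(mutex_);
      auto it = cache_.find(input);
      if (it != cache_.end()) {
        return it->second;  // cache hit: the translator is not called again
      }
    }
    // The (potentially slow) translation runs outside any lock. Two threads
    // may race on the same input; emplace keeps whichever result lands first.
    std::string result = translate_(input);
    std::unique_lock<std::shared_mutex> write_lock(mutex_);
    return cache_.emplace(input, std::move(result)).first->second;
  }

  std::size_t size() const {
    std::shared_lock<std::shared_mutex> read_lock(mutex_);
    return cache_.size();
  }

 private:
  Translator translate_;
  mutable std::shared_mutex mutex_;
  std::unordered_map<std::string, std::string> cache_;
};
```

This needs C++17 for `std::shared_mutex`. Running the translator outside the lock trades an occasional duplicate call for never blocking readers on a slow request.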

---------

Co-authored-by: Todd Mostak <[email protected]>
Signed-off-by: Misiu Godfrey <[email protected]>
* Add log

* Remove unnecessary logging about row_size

* Address comments

* Address comments

Signed-off-by: Misiu Godfrey <[email protected]>
BE-6491 Geospatial Dependencies Update for 8.0

Update geospatial libs as follows:

GDAL 3.7.3
PROJ 9.3.1
TIFF 4.5.1
GeoTIFF 1.7.1
GEOS 3.12.1

Adds GDAL support for JPEG2000/JP2 raster, using OpenJPEG library. Also include additional dependent libs LittleCMS2 and WebP. Removed now-superfluous GDAL patch.

Disabled ImportExportTest ExportTest/Shapefile/MULTIPOLYGON for now, as the new GDAL seems to write Shapefiles slightly differently, which breaks a comparison. New comparison Shapefiles are included in this PR, but the test is disabled. The test can be re-enabled once the deps transition is complete.

Fix the GEOS version check which enables ST_ConcaveHull, and fix the test of ST_ConcaveHull, both broken due to not actually including the GEOS version header, so the pre-processor logic was silently failing.
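The silent failure described above follows from a preprocessor rule: an identifier that is not defined evaluates to 0 inside `#if`, so a version check compiles cleanly even when the version header was never included. The sketch below shows the effect with hypothetical `SKETCH_GEO_VERSION_*` macros; the macro names and the version threshold are illustrative, not the actual GEOS check.

```cpp
#include <cassert>

// SKETCH_GEO_VERSION_MAJOR / SKETCH_GEO_VERSION_MINOR are hypothetical and
// deliberately NOT defined here, mimicking a missing version header.
// Undefined identifiers evaluate to 0 in #if, so this test is simply false.
#if SKETCH_GEO_VERSION_MAJOR > 3 || \
    (SKETCH_GEO_VERSION_MAJOR == 3 && SKETCH_GEO_VERSION_MINOR >= 11)
#define SKETCH_HAS_CONCAVE_HULL 1
#else
#define SKETCH_HAS_CONCAVE_HULL 0
#endif

// Reports whether the unguarded check enabled the feature.
constexpr bool unguarded_check_enabled() {
  return SKETCH_HAS_CONCAVE_HULL == 1;
}

// A guarded variant detects the missing header explicitly. Real code would
// more likely use #error here so that the build fails loudly.
#if !defined(SKETCH_GEO_VERSION_MAJOR) || !defined(SKETCH_GEO_VERSION_MINOR)
#define SKETCH_VERSION_HEADER_MISSING 1
#else
#define SKETCH_VERSION_HEADER_MISSING 0
#endif

// Reports whether the version macros were absent at compile time.
constexpr bool version_header_missing() {
  return SKETCH_VERSION_HEADER_MISSING == 1;
}
```

Compiling with `-Wundef` on GCC or Clang also flags the unguarded form.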

Changed use of boost::process::system() to call ogrinfo to use the more explicit form, as the simple form does not work with the new GDAL.

Added PDAL patch for GDAL API const change. PDAL version is unchanged.

Added fix for static GDAL data destruction (avoids ASAN error on exit in GeospatialTest).

Added separate shared build of PROJ and GDAL tools, so they are still present on a CentOS build.

Install PROJ and GDAL run-time data files directly from deps.

Moved LLVM builds after geo libs build to allow faster iteration. Move back again once dust settles.

NOTE: There is a crash in GDAL FlatGeobuf geo export which we are still tracking down. Work on this is ongoing, but we are rolling the minimal deps change out anyway so that proper testing for 8.0 can proceed. For now, there is a new build option to disable FlatGeobuf export, defaulting to disabled. Interactive export will error, and the corresponding tests in ImportExportTest are disabled.

---------

Co-authored-by: Steve Blackmon <[email protected]>
Signed-off-by: Misiu Godfrey <[email protected]>
… in RelWithDebInfo mode w/ GCC 13.2 and LLVM 14 (#7621)

Signed-off-by: Misiu Godfrey <[email protected]>
…e geometry projection (#7678)

* Throw an appropriate exception to avoid the crash

* Add test

Signed-off-by: Misiu Godfrey <[email protected]>
* Add Calcite query rewrite rule to pushdown any expr on a column expr used in a join qual

* Add tests

* Address comment

Signed-off-by: Misiu Godfrey <[email protected]>
* Allow the `THREADS` key to be passed to data wrappers.

Signed-off-by: Misiu Godfrey <[email protected]>
… python cleanup, and more (#7683)

* Bump centos gcc from 11.1 to 11.4 to match Ubuntu

* Add baseline support for 22.04 / 23.10 and remove pre-20.04 support

 - Use 22.04 prebuilts for 23.10

* Remove legacy python and use modern cmake find_package

  - Remove python, python2 and python-yaml from the Ubuntu package
    installs, and install python3 and python3-yaml

  - Remove python-devel from CentOS

  - Replace the cmake find_package(PythonInterp) with the new
    find_package(Python). The latter defines slightly different vars
    (e.g. PYTHON_EXECUTABLE becomes Python_EXECUTABLE)

* Bump boost to 1.84

 - Build boost on ubuntu and remove libboost-all-dev from packages

 - Add libnuma-dev (for memkind)

* Bump folly to 2023.01.16.00 and FMT to 9.1.0

 - Disable exception tracer in folly, required for static linking libstdc++
   with the new version

* Update to OpenSSL 3.0.10

 - Build openssl and curl on all ubuntu versions

* Misc dependency script cleanup

 - Remove LLVM 9 patches from scripts

 - Change prebuilt script to always overwrite mapd-deps.sh and not append.
   Fixes the double entries

 - Fix several interactive prompts

Signed-off-by: Misiu Godfrey <[email protected]>
* Add ASAN suppression for ForeignTableDmlTest shutdown leak

Leak in CRYPTO_zalloc during test shutdown

* Add TSAN suppression for lock inversion in GeospatialTest

* Add TSAN suppression for gdal lock inversion (RasterImporter)

* TSAN - Disable GeoJoinTest SingleAndMultipleBins

* TSAN - Disable MLRegressionTests

Signed-off-by: Misiu Godfrey <[email protected]>
steveblackmon-mapd and others added 23 commits August 27, 2024 11:49
* Replace boost::multi_index with AliasInfoMap class

* Move SQLType <-> BufferAttrType conversions to QueryRenderer/TypeUtils

* Move table id and col id extraction to AnalyzerUtils

* Use SQLTypeInfo equality operator for AttrAliasInfo

Signed-off-by: Misiu Godfrey <[email protected]>
* Changed catalog function to return table names instead of table descriptors

Signed-off-by: Misiu Godfrey <[email protected]>
* Remove exitHandler again and put GDAL::init() inside static map accessor
* Also fix a couple of other unrelated TSAN warning sources
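Placing initialization inside a static accessor, as the first bullet describes, usually relies on function-local statics: C++11 guarantees they are initialized exactly once, thread-safely, on first use. The sketch below illustrates the pattern; `init_library`, `driver_map`, and the map contents are hypothetical and stand in for `GDAL::init()` and the real map.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace sketch {

// Counts how many times the (hypothetical) library init ran.
inline int& init_count() {
  static int count = 0;
  return count;
}

// Hypothetical stand-in for a one-time library initialization call.
inline void init_library() {
  ++init_count();
}

// The accessor owns a function-local static. Its initializer runs once, on
// the first call, so the library is initialized before the map is ever
// used, and no exit handler or global constructor ordering is involved.
inline const std::map<std::string, std::string>& driver_map() {
  static const std::map<std::string, std::string> drivers = [] {
    init_library();
    return std::map<std::string, std::string>{
        {"shp", "ESRI Shapefile"},  // illustrative entries only
        {"geojson", "GeoJSON"},
    };
  }();
  return drivers;
}

}  // namespace sketch
```

Repeated calls to `sketch::driver_map()` return the same map without re-running the initialization.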

Co-authored-by: Matt Pulver <[email protected]>
Signed-off-by: Misiu Godfrey <[email protected]>
* Replace LOG(FATAL) with just return kNULLT in low-level comparison
* Abort function match attempt on receipt of kNULLT indicating type incompatibility
* Add second variant of ST_Contains_MultiPolygon_Point that does not require bounds
* Add simple tests
This may be replaced by a fix to carry bounds with temp geo, after discussion with Pearu
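The first two bullets describe turning a fatal error into a sentinel return that the caller treats as "no match". A minimal sketch of that control flow follows; `SketchType`, `common_type`, `Candidate`, and `match_function` are all hypothetical and much simpler than the real type system and function binder.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, reduced type enum; kNULLT doubles as "incompatible".
enum class SketchType { kNULLT, kINT, kDOUBLE, kTEXT };

// Low-level comparison: returns the common type of a and b, or kNULLT when
// they cannot be reconciled, instead of aborting the process.
inline SketchType common_type(SketchType a, SketchType b) {
  if (a == b) {
    return a;
  }
  const bool a_numeric = a == SketchType::kINT || a == SketchType::kDOUBLE;
  const bool b_numeric = b == SketchType::kINT || b == SketchType::kDOUBLE;
  if (a_numeric && b_numeric) {
    return SketchType::kDOUBLE;
  }
  return SketchType::kNULLT;
}

struct Candidate {
  std::string name;
  std::vector<SketchType> params;
};

// Tries each candidate in order. A kNULLT from the comparison abandons only
// that candidate, so the search continues with the next one.
inline std::optional<std::string> match_function(
    const std::vector<Candidate>& candidates,
    const std::vector<SketchType>& args) {
  for (const auto& candidate : candidates) {
    if (candidate.params.size() != args.size()) {
      continue;
    }
    bool compatible = true;
    for (std::size_t i = 0; i < args.size(); ++i) {
      if (common_type(candidate.params[i], args[i]) == SketchType::kNULLT) {
        compatible = false;
        break;
      }
    }
    if (compatible) {
      return candidate.name;
    }
  }
  return std::nullopt;
}
```

Returning `std::nullopt` lets the caller report an ordinary "no matching function" error rather than crashing the server.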

Signed-off-by: Misiu Godfrey <[email protected]>
* Fixup (incorrect) assumption that reading batches from parquet files
always guarantees a fixed number of levels read

* Improve logging during detect in the case where no valid data is
obtained

Signed-off-by: Misiu Godfrey <[email protected]>
This reverts commit 51ebdc38485b5208762ce6b046cff19b30301bfe.

Signed-off-by: Misiu Godfrey <[email protected]>
* Add buffer holders for GPU execution

* Rename structures used to codegen

* Introduce WindowFunctionCtx namespace

* Add preparation for GPU execution in window ctx

* Cleanup & improve WindowFunctionContext::compute()

* Improve the logic to build the aggregate tree, with support for reuse

* Improve segment tree constructor

* Rebase

* Address comments #1

* Address comments #2: refactor bool param functions

* Address comments #3

* Address comments #4: tbb

* Address comments #5

* Address comments #6

* Fixup test failures

Signed-off-by: Misiu Godfrey <[email protected]>
* Remove DeviceClock class and unnecessary timing event

* Add trailing underscore for variables

* Refactor query timing logic defined in QueryExecutionContext

* Address comments

Signed-off-by: Misiu Godfrey <[email protected]>
* Register new threads created during S3 import with debug timers.

Signed-off-by: Misiu Godfrey <[email protected]>
* Equivalent of OSGeo/gdal#9313 which will be in the next GDAL
* Resolves clash with the other default-namespaced flatbuffers symbols in Arrow 9.0.0

Signed-off-by: Misiu Godfrey <[email protected]>
* Reinstate fatal error which is useful in future development to catch unhandled types
* Check for pointer type to avoid getting to the fatal error in the first place

Signed-off-by: Misiu Godfrey <[email protected]>
Extract what was BaseQueryDataTableSQL into a standalone
concrete class

Signed-off-by: Misiu Godfrey <[email protected]>
…(#7732)

* Skip logging D2H for varlen resultset row value fetching during resultset iteration.

* Address comment

Signed-off-by: Misiu Godfrey <[email protected]>
* Move GfxContext creation to DBHandler
* Plumb GfxContext to TableFunctionExecutionContext
* DeviceContexts now owned by GfxContext
* Implement example graphics TF (headless DrawOneTriangle)
also (unrelated)
* Remove a self-#include

---------

Co-authored-by: Steve Blackmon <[email protected]>
Signed-off-by: Misiu Godfrey <[email protected]>
First iteration of raster data wrapper.

Signed-off-by: Misiu Godfrey <[email protected]>
* Add a mutex to VulkanQueue submission

The lock is also invoked prior to checking the internal device_lost_
state flag

* Add CommandExecutionContext and Vulkan implementation

Holds command components that must be unique per execution thread:
CommandList, CommandExecutor, VulkanCommandPool

* includes cleanup
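The queue-submission locking in the first bullet can be sketched as below. `QueueSketch` and its members are hypothetical; the real `VulkanQueue` submits command buffers through the Vulkan API, which is replaced here by a plain vector so the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical queue: one mutex guards both the device-lost flag and the
// submission itself, so a thread can never pass the flag check and then
// submit after another thread has marked the device as lost.
class QueueSketch {
 public:
  // Returns false without submitting if the device was already lost.
  bool submit(int command) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (device_lost_) {
      return false;
    }
    submitted_.push_back(command);
    return true;
  }

  void markDeviceLost() {
    std::lock_guard<std::mutex> lock(mutex_);
    device_lost_ = true;
  }

  // The flag is read under the same lock as submission.
  bool isDeviceLost() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return device_lost_;
  }

  std::size_t submittedCount() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return submitted_.size();
  }

 private:
  mutable std::mutex mutex_;
  bool device_lost_{false};
  std::vector<int> submitted_;
};
```

Per-thread state such as command pools would live outside this class, which matches the separate CommandExecutionContext bullet.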

Signed-off-by: Misiu Godfrey <[email protected]>
Signed-off-by: Misiu Godfrey <[email protected]>
@misiugodfrey misiugodfrey requested a review from jack-mapd August 29, 2024 19:04
@CLAassistant

CLAassistant commented Aug 29, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
6 out of 9 committers have signed the CLA.

✅ paul-aiyedun
✅ yoonminnam
✅ mattpulver
✅ misiugodfrey
✅ jack-mapd
✅ mattgara
❌ simoneves
❌ steveblackmon-mapd
❌ Misiu Godfrey


Misiu Godfrey seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@mapd-bot-os
Contributor

clang-format failed

@mapd-bot-os
Contributor

clang-format failed

10 participants