h3 index stored as varchar in the parquet file is read as Uint8Array and breaks H3HexagonLayer in deck.gl #3122

Open
shaunakv1 opened this issue Oct 7, 2024 · 1 comment
Labels
bug Something isn't working

Comments


shaunakv1 commented Oct 7, 2024

Loader

import { ParquetLoader } from '@loaders.gl/parquet';

Description

Consider the following code, which renders counts against h3 indexes in a parquet file. As written, it works, but only because (at the line marked in the code below) I have to convert each h3 index value from Uint8Array(15) using d.h3.toString(). Without that conversion, H3HexagonLayer doesn't render!

<script>
import { MapboxOverlay } from '@deck.gl/mapbox';
import { Map } from 'maplibre-gl';
import 'maplibre-gl/dist/maplibre-gl.css';
import { H3HexagonLayer } from '@deck.gl/geo-layers';
import { ParquetLoader } from '@loaders.gl/parquet';
import { ZstdCodec } from 'zstd-codec';
let first = true;
export default {
    data() {
        return {
            msg: 'Hello World!'
        }
    },
    async mounted() {
        const layer = new H3HexagonLayer({
            id: 'H3HexagonLayer',
            //data: 'https://raw.githubusercontent.com/visgl/deck.gl-data/master/website/sf.h3cells.json',
            data: '/ais_h3_8.parquet',
            loadOptions: {
                worker: false, //see if we can turn on the worker. see https://deck.gl/docs/developer-guide/loading-data#loaders-and-web-workers
                modules: {
                    'zstd-codec': ZstdCodec
                }
            },
            loaders: [ParquetLoader],
            extruded: false,
            getHexagon: d => {
                //this is very inefficient, see if parquet can be correctly encoded to yield string
                if (first) {
                    console.log(d); // prints {count: 7153252, h3: Uint8Array(15)}
                    first = false;
                }
                return d.h3.toString(); // <<<<<<------- IMP: without this conversion the rendering breaks! 
            },
            gpuAggregation: true,
            getFillColor: d => [255, (1 - d.count / 500) * 255, 0],
            pickable: false,
            beforeId: 'watername_ocean',
            opacity: 0.5
        });

        const map = new Map({
            container: 'map',
            style: 'https://basemaps.cartocdn.com/gl/dark-matter-gl-style/style.json',
            center: [-122.45, 37.8],
            zoom: 8
        });

        await map.once('load');

        const deckOverlay = new MapboxOverlay({
            interleaved: true,
            layers: [
                layer
            ]
        });

        map.addControl(deckOverlay);
    }
}
</script>

<template>
    <div id="map" class="map">
    </div>
</template>

<style scoped>
.map {
    height: 100%;
    width: 100%;
}
</style>

Here are the details of my parquet file, which was created with DuckDB:

select (*) from read_parquet('data/parquet-files/ais_h3_8.parquet') limit 10;
┌─────────┬─────────────────┐
│  count  │       h3        │
│  int64  │     varchar     │
├─────────┼─────────────────┤
│ 7153252 │ 8828d5476bfffff │
select * from parquet_schema('data/parquet-files/ais_h3.parquet');
┌───────────────────────────────────┬───────────────┬────────────┬─────────────┬─────────────────┬──────────────┬────────────────┬───────┬───────────┬──────────┬──────────────┐
│             file_name             │     name      │    type    │ type_length │ repetition_type │ num_children │ converted_type │ scale │ precision │ field_id │ logical_type │
│              varchar              │    varchar    │  varchar   │   varchar   │     varchar     │    int64     │    varchar     │ int64 │   int64   │  int64   │   varchar    │
├───────────────────────────────────┼───────────────┼────────────┼─────────────┼─────────────────┼──────────────┼────────────────┼───────┼───────────┼──────────┼──────────────┤
│ data/parquet-files/ais_h3.parquet │ duckdb_schema │            │             │ REQUIRED        │            2 │                │       │           │          │              │
│ data/parquet-files/ais_h3.parquet │ count         │ INT64      │             │ OPTIONAL        │              │ INT_64         │       │           │          │              │
│ data/parquet-files/ais_h3.parquet │ h3            │ BYTE_ARRAY │             │ OPTIONAL        │              │ UTF8           │       │           │          │              │
└───────────────────────────────────┴───────────────┴────────────┴─────────────┴─────────────────┴──────────────┴────────────────┴───────┴───────────┴──────────┴──────────────┘

Expected Behavior

Since the parquet file stores the h3 indexes as varchar, I expect the deck.gl loader code to work without having to explicitly convert every single value to a string.

I am wondering whether this is a bug somewhere in deck.gl or loaders.gl, or whether I need to change something in the parquet file itself, which uses zstd compression.
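Until the loader returns strings, one possible way to avoid converting inside getHexagon on every access is to decode the column once, right after loading. The following is only a minimal sketch: `decodeH3Rows` is a hypothetical helper name, and it assumes the loaded data is a plain array of row objects shaped like the `{count, h3}` row logged by the code above.

```javascript
// TextDecoder is a standard global in browsers and Node.js.
const utf8Decoder = new TextDecoder('utf-8');

// Hypothetical helper: returns new row objects whose `h3` field is a string.
// Assumes `rows` is an array of objects like {count, h3}, where `h3` is
// either already a string or a Uint8Array/Buffer of UTF-8 bytes.
function decodeH3Rows(rows) {
  return rows.map(row => ({
    ...row,
    h3: typeof row.h3 === 'string' ? row.h3 : utf8Decoder.decode(row.h3)
  }));
}

// Example with a small made-up row (not taken from the real file).
const decodedRows = decodeH3Rows([
  { count: 1, h3: Uint8Array.from([56, 56]) }
]);
console.log(decodedRows[0].h3);
```

If the array-of-rows assumption holds, a helper like this could be passed to the layer through deck.gl's `dataTransform` prop, after which `getHexagon: d => d.h3` would need no conversion. Whether the ParquetLoader output reaches `dataTransform` in exactly this shape has not been verified here.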


Steps to Reproduce

The steps to reproduce are in the example above.

Environment

Node Version: 20.0.15
Browser: Chrome
OS: MacOS Sequoia (M1 Max)

"dependencies": {
"@deck.gl/core": "^9.0.32",
"@deck.gl/geo-layers": "^9.0.32",
"@deck.gl/layers": "^9.0.32",
"@deck.gl/mapbox": "^9.0.32",
"@loaders.gl/compression": "^4.2.5",
"@loaders.gl/core": "^4.2.5",
"@loaders.gl/parquet": "^4.2.5",
"maplibre-gl": "^4.7.1",
"vue": "^3.4.29",
"zstd-codec": "^0.1.5"
},

Logs

One row from parquet file on console:

{
    "count": 7153252,
    "h3": {
        "type": "Buffer",
        "data": [
            56,
            56,
            50,
            56,
            100,
            53,
            52,
            55,
            54,
            98,
            102,
            102,
            102,
            102,
            102
        ]
    }
}
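
For reference, the bytes in the logged row are the UTF-8 (ASCII) codes of the hex string shown in the DuckDB output. The sketch below decodes them with the standard `TextDecoder` API. The logged value reports its type as "Buffer", which would explain why `d.h3.toString()` yields the index string: a Buffer's `toString()` decodes its bytes as UTF-8, whereas a plain Uint8Array's `toString()` joins the numbers with commas. The variable names are illustrative only.

```javascript
// Bytes copied from the logged row above.
const h3Bytes = Uint8Array.from([
  56, 56, 50, 56, 100, 53, 52, 55, 54, 98, 102, 102, 102, 102, 102
]);

// Decode the bytes as UTF-8 text.
const h3Index = new TextDecoder('utf-8').decode(h3Bytes);
console.log(h3Index); // 8828d5476bfffff

// A plain Uint8Array stringifies to comma-separated numbers instead,
// which is not a valid H3 index.
const joined = h3Bytes.toString();
console.log(joined.startsWith('56,56,50')); // true
```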

shaunakv1 added the bug (Something isn't working) label Oct 7, 2024

Collaborator

ibgreen commented Oct 7, 2024

  • We currently have two implementations of the ParquetLoader: one in JS and one based on Rust compiled to WASM.
  • While it would be nice to improve the JS loader, no maintainer is actively working on it right now.
  • So the strategy for the ParquetLoader is to focus on the WASM-based ParquetWasmLoader going forward.
