Skip to content

Q3 2023 Release

Compare
Choose a tag to compare
@zachmu zachmu released this 05 Jul 20:29
· 2318 commits to main since this release
812b68d

This is the quarterly roll-up release, containing many new features and bug fixes.

Interfaces will not be stable until 1.0.

Merged PRs

go-mysql-server

  • 1861: chore: remove refs to deprecated io/ioutil
  • 1860: chore: unnecessary use of fmt.Sprintf
  • 1859: chore: use copy(to, from) instead of a loop
  • 1856: Support IPV6 loopback address for looking up user credentials
    Map "::1" and "127.0.0.1" to localhost when looking up users.
    There don't appear to be tests for this code path. TBD if I'll add some.
    Related to: dolthub/dolt#6239
  • 1854: Prevent loops in stored procedures from returning multiple result sets
    The query in dolthub/dolt#6230 was causing rows from many result sets to be returned from a stored procedure. We already have code that limits BEGIN/END blocks to return the last SELECTed result set; this PR extends that logic to loop constructs as well.
    Fixes: dolthub/dolt#6230
    Dolt CI Checks: dolthub/dolt#6245
  • 1853: chore: slice replace loop
  • 1852: Alter stored procedure execution to deal with statements that commit transactions
    This change adds checks to begin a new transaction whenever there isn't one during stored procedure execution. This lets things like dolt_commit() execute correctly in stored procedures.
  • 1851: memo.Literal has different type than lookup
    This panics on dolt:
    CREATE TABLE tab2(pk INTEGER PRIMARY KEY, col0 INTEGER, col1 FLOAT, col2 TEXT, col3 INTEGER, col4 FLOAT, col5 TEXT);
    CREATE UNIQUE INDEX idx_tab2_0 ON tab2 (col1 DESC,col4 DESC);
    CREATE INDEX idx_tab2_1 ON tab2 (col1,col0);
    CREATE INDEX idx_tab2_2 ON tab2 (col4,col0);
    CREATE INDEX idx_tab2_3 ON tab2 (col3 DESC);
    INSERT INTO tab2 VALUES(0,344,171.98,'nwowg',833,149.54,'wjiif');
    INSERT INTO tab2 VALUES(1,353,589.18,'femmh',44,621.85,'qedct');
    SELECT pk FROM tab2 WHERE ((((((col0 IN (SELECT col3 FROM tab2 WHERE ((col1 = 672.71)) AND col4 IN (SELECT col1 FROM tab2 WHERE ((col4 > 169.88 OR col0 > 939 AND ((col3 > 578))))) AND col0 >= 377) AND col4 >= 817.87 AND (col4 > 597.59)) OR col4 >= 434.59 AND ((col4 < 158.43)))))) AND col0 < 303) OR ((col0 > 549)) AND (col4 BETWEEN 816.92 AND 983.96) OR (col3 BETWEEN 421 AND 96);
    The PutField function expects the value to match the tuple descriptor exactly, and will panic if it does not.
    The section of code in memo that creates a new range uses the type from the expression, but in other places it uses the index column expression types.
    An alternative solution would be to have some logic in dolt to convert to the corresponding sql.Type based off the val.Enc
  • 1848: IntDiv.Type() should always return either uint64 or int64
    Previously, our IntDiv.convertLeftRight() used IntDiv.Type() to determine the larger type between IntDiv.Left.Type() and IntDiv.Right.Type() to avoid precision loss when doing internal calculations. Now, that logic is moved from IntDiv.Type() to IntDiv.convertLeftRight(), and IntDiv.Type() can only return uint64 or int64.
    This should fix the sql correctness regression from #1834
  • 1847: Fix TargetSchema.Resolved() to check targetSchema column default expressions
    A couple SchemaTarget implementations weren't checking if the targetSchema was resolved as part of the Resolved() method. Added tests, audited the other implementations, and simplified the logic to use a new method on Schema to check that column default expressions are resolved.
    Fixes: dolthub/dolt#6206
    Dolt CI Run: dolthub/dolt#6213
  • 1846: update information_schema.processlist to correctly display status of processes and databases
    We used to hardcode "Query", now we reference process.Command
    Additionally, we now get the database from the current session and use that variable.
    fix for: dolthub/dolt#6023
  • 1844: fix panic for group by binary type
    We made a bad type assertion for sql.StringType.
    Additionally, this fixes a issue where UnaryExpressions with GetFields would incorrectly throw a functional dependency error with ONLY_FULL_GROUP_BY enabled.
    Fix for second part of: dolthub/dolt#6179
  • 1843: Improvements to CAST and CONVERT functions
    This PR adds support for casting/converting to FLOAT and DOUBLE types with the CAST and CONVERT functions. It also adds support for length (aka precision) and scale type constraints (e.g. CAST(1.2345 AS DECIMAL(3,2))).
    Parser support for DOUBLE and FLOAT with CAST and CONVERT: dolthub/vitess#249
    Fixes: dolthub/dolt#5835
  • 1841: adding version and version_comment values
    @@version now returns 8.0.11
    @@version_comment now returns "Dolt"; in mysql, this appears to be dependent on OS / method of install
    • Some people get MySQL Community Server - GPL
    • Others get Homebrew
      Fix for first part of: dolthub/dolt#6179
  • 1840: deduplicate (hash) intuple for and queries
    This PR was originally supposed to fix it: original fix: #1677, but AND statements weren't covered.
    fix for: dolthub/dolt#6189
  • 1839: Slow degenerate semi join, hoist select opt
    This enables recursive subquery decorrelations, and adds a hash join execution option for semi joins that is equivalent to cached subquery existence checks.
  • 1838: resolve aliases in subqueries in function arguments
    The rule reorderProjection also replaces subqueries with getfields in projections when they are used by subqueries, but it did not check for function expressions.
    This meant that aliases in subqueries as arguments to functions threw a "x" could not be found error.
    This PR just has the section of reorderProjection that is supposed to find deferredColumns also look at the arguments of functions recursively (because we can nest functions).
    Additionally, there was another schema type bug:
    tmp> select 0 as foo, if((select foo), 123, 456);
    +-----+----------------------------+
    | foo | if((select foo), 123, 456) |
    +-----+----------------------------+
    | 0   | 127                        |
    +-----+----------------------------+
    1 row in set (0.00 sec)
    MySQL returns an Integer type for if statement, and if either argument is a String, it always returns a String.
    fix for: dolthub/dolt#6174
  • 1836: update cached table count in prepared statements
    Prepared statements were caching table counts. We need to update the table count when finalizing prepared statements to bring table count up to date with any intermediate edits.
  • 1834: fix expected schema for sum(literal)
    The code path we take when print rows to shell is different than spooling from server.
    In the sql case, we ignore the schema we get from analysis.
    In the server case, we actually read the schema, and ensure that the rows are of that type.
    When doing sum(literal), we use the type of the literal. In this issue, the literal was 1, so an INT8, which caps out at 127.
    sum() is always supposed to return a float64, so I made a change to do that.
    I checked by starting mysql with --column-type-info option, and it does appear that any columns coming from sum() has a DECIMAL type.
    Fix for: dolthub/dolt#6120
  • 1830: Use SO_REUSEADDR and SO_REUSEPORT options when creating the sql server on Unix
    This prevents a transient error we've been seeing where the server sometimes fails to start, and the OS claims port already in use, even though we've already confirmed that the port is not in use prior to running dolt sql-server.
  • 1829: plan.TableCountLookup short circuits count()
    In many cases it is unnecessary to read an entire table to report count(*). We can use the RowCount() interface to jump to the answer.
  • 1828: Consolidated collation maps
    Main file to check is the generate/main.go file. After running the updated generation program, these are the consolidated files:
    common_utf8mb4_es_0900_ai_ci_Weights: [utf8mb4_es_0900_ai_ci_Weights, utf8mb4_es_trad_0900_ai_ci_Weights]
    common_utf8mb4_es_0900_as_cs_Weights: [utf8mb4_es_0900_as_cs_Weights, utf8mb4_es_trad_0900_as_cs_Weights]
    common_utf_croatian_ci_Weights: [utf16_croatian_ci_Weights, utf32_croatian_ci_Weights, utf8mb3_croatian_ci_Weights, utf8mb4_croatian_ci_Weights]
    common_utf_czech_ci_Weights: [utf16_czech_ci_Weights, utf32_czech_ci_Weights, utf8mb3_czech_ci_Weights, utf8mb4_czech_ci_Weights]
    common_utf_danish_ci_Weights: [utf16_danish_ci_Weights, utf32_danish_ci_Weights, utf8mb3_danish_ci_Weights, utf8mb4_danish_ci_Weights]
    common_utf_esperanto_ci_Weights: [utf16_esperanto_ci_Weights, utf32_esperanto_ci_Weights, utf8mb3_esperanto_ci_Weights, utf8mb4_esperanto_ci_Weights]
    common_utf_estonian_ci_Weights: [utf16_estonian_ci_Weights, utf32_estonian_ci_Weights, utf8mb3_estonian_ci_Weights, utf8mb4_estonian_ci_Weights]
    common_utf_german2_ci_Weights: [utf16_german2_ci_Weights, utf32_german2_ci_Weights, utf8mb3_german2_ci_Weights, utf8mb4_german2_ci_Weights]
    common_utf_hungarian_ci_Weights: [utf16_hungarian_ci_Weights, utf32_hungarian_ci_Weights, utf8mb3_hungarian_ci_Weights, utf8mb4_hungarian_ci_Weights]
    common_utf_icelandic_ci_Weights: [utf16_icelandic_ci_Weights, utf32_icelandic_ci_Weights, utf8mb3_icelandic_ci_Weights, utf8mb4_icelandic_ci_Weights]
    common_utf_latvian_ci_Weights: [utf16_latvian_ci_Weights, utf32_latvian_ci_Weights, utf8mb3_latvian_ci_Weights, utf8mb4_latvian_ci_Weights]
    common_utf_lithuanian_ci_Weights: [utf16_lithuanian_ci_Weights, utf32_lithuanian_ci_Weights, utf8mb3_lithuanian_ci_Weights, utf8mb4_lithuanian_ci_Weights]
    common_utf_persian_ci_Weights: [utf16_persian_ci_Weights, utf32_persian_ci_Weights, utf8mb3_persian_ci_Weights, utf8mb4_persian_ci_Weights]
    common_utf_polish_ci_Weights: [utf16_polish_ci_Weights, utf32_polish_ci_Weights, utf8mb3_polish_ci_Weights, utf8mb4_polish_ci_Weights]
    common_utf_roman_ci_Weights: [utf16_roman_ci_Weights, utf32_roman_ci_Weights, utf8mb3_roman_ci_Weights, utf8mb4_roman_ci_Weights]
    common_utf_romanian_ci_Weights: [utf16_romanian_ci_Weights, utf32_romanian_ci_Weights, utf8mb3_romanian_ci_Weights, utf8mb4_romanian_ci_Weights]
    common_utf_sinhala_ci_Weights: [utf16_sinhala_ci_Weights, utf32_sinhala_ci_Weights, utf8mb3_sinhala_ci_Weights, utf8mb4_sinhala_ci_Weights]
    common_utf_slovak_ci_Weights: [utf16_slovak_ci_Weights, utf32_slovak_ci_Weights, utf8mb3_slovak_ci_Weights, utf8mb4_slovak_ci_Weights]
    common_utf_slovenian_ci_Weights: [utf16_slovenian_ci_Weights, utf32_slovenian_ci_Weights, utf8mb3_slovenian_ci_Weights, utf8mb4_slovenian_ci_Weights]
    common_utf_spanish2_ci_Weights: [utf16_spanish2_ci_Weights, utf16_spanish_ci_Weights, utf32_spanish2_ci_Weights, utf32_spanish_ci_Weights, utf8mb3_spanish2_ci_Weights, utf8mb3_spanish_ci_Weights, utf8mb4_spanish2_ci_Weights, utf8mb4_spanish_ci_Weights]
    common_utf_swedish_ci_Weights: [utf16_swedish_ci_Weights, utf32_swedish_ci_Weights, utf8mb3_swedish_ci_Weights, utf8mb4_swedish_ci_Weights]
    common_utf_turkish_ci_Weights: [utf16_turkish_ci_Weights, utf32_turkish_ci_Weights, utf8mb3_turkish_ci_Weights, utf8mb4_turkish_ci_Weights]
    common_utf_unicode_520_ci_Weights: [utf16_unicode_520_ci_Weights, utf32_unicode_520_ci_Weights, utf8mb4_unicode_520_ci_Weights]
    common_utf_unicode_ci_Weights: [utf16_unicode_ci_Weights, utf32_unicode_ci_Weights, utf8mb3_unicode_ci_Weights, utf8mb4_unicode_ci_Weights]
    common_utf_vietnamese_ci_Weights: [utf16_vietnamese_ci_Weights, utf32_vietnamese_ci_Weights, utf8mb3_vietnamese_ci_Weights, utf8mb4_vietnamese_ci_Weights]
    
  • 1827: Remove db-specific transaction interfaces / logic
  • 1825: implement create spatial ref sys
    This implements the create spatial reference system ..., which lets users add custom SRID to the information schema.
    MySQL docs: https://dev.mysql.com/doc/refman/8.0/en/create-spatial-reference-system.html
    MySQL is much more restrictive when it comes to what is a valid DEFINITION for an entry in this table, and the rules are unclear, so we are much more permissive for now.
    Additionally, this information persist in MySQL between server restarts, which we do not do. However, MySQL does throw a warning stating that updating may discard any changes the user makes.
    Lastly, the values persist between test runs, and we don't support deleting from information_schema, so some tests are modified.
    fix for: dolthub/dolt#6002
  • 1823: Trim spaces and empty statements to the right in planbuilder.Parse
  • 1822: join filter closure and constant join lookups
    This PR adds a set of join planning improvements.
    1. Table aliases can accept multi column indexes
      We have never been able to choose a multi-expression range scan through table aliases.
      Before:
    tmp2> explain select * from t alias where a = 1 and b = 1 and c = 1;
    +-----------------------------------------------------------+
    | plan                                                      |
    +-----------------------------------------------------------+
    | Filter                                                    |
    |  ├─ (((alias.a = 1) AND (alias.b = 1)) AND (alias.c = 1)) |
    |  └─ TableAlias(alias)                                     |
    |      └─ IndexedTableAccess(t)                             |
    |          ├─ index: [t.a]                                  |
    |          ├─ filters: [{[1, 1]}]                           |
    |          └─ columns: [a b c]                              |
    +-----------------------------------------------------------+
    After:
    tmp2> explain select * from t alias where a = 1 and b = 1 and c = 1;
    +-----------------------------------------------------------+
    | plan                                                      |
    +-----------------------------------------------------------+
    | Filter                                                    |
    |  ├─ (((alias.a = 1) AND (alias.b = 1)) AND (alias.c = 1)) |
    |  └─ TableAlias(alias)                                     |
    |      └─ IndexedTableAccess(t)                             |
    |          ├─ index: [t.a,t.b,t.c]                          |
    |          ├─ filters: [{[1, 1], [1, 1], [1, 1]}]           |
    |          └─ columns: [a b c]                              |
    +-----------------------------------------------------------+
    This has silently been impacting join performance in particular, where table aliases are more common. This is a small change but I'd expect this to have a broad positive impact for customers.
    2. Join equivalence closure
    A join like select * from xy join uv on x = u join ab on u = a has two initial join edges, x = u and u = a. Those edges create expression groupings xy x uv, uv x ab, xy x uv x ab. There misses a transitive edge, x = a, with a corresponding join group xy x ab. We should generate plans for most transitive edges now (transitive edges in apply joins are harder).
    For joins with many tables this will unlock many potential join paths.
    3. Use functional dependencies to find more lookup and merge joins
    We can use constants and aggregated equivalency sets (equal filters) to be more aggressive with lookup join selection. Previously we only searched the current join ON equal conditions for expressions that match an index prefix for a lookup join, but constants are also valid lookup keys.
    Refer to dolthub/dolt#5993 and dolthub/dolt#3797 for in-depth examples.
    4. Use functional dependencies to do better lookup join costing
    Even though we do not have index statistics, we can still use functional dependencies on indexes to detect whether a lookup will have MAX_1_ROW. Two examples where we can detect MAX_1_ROW: our lookup index is the primary key, and our lookup key provides a constant or equals expression for every pk column; our lookup index is unique, our lookup key has constants or equal expressions for every column, and we can prove that every key expression is non-nullable.
    MAX_1_ROW lookups are a rare binary condition, most of the time selectivity is in the continuous range 0-1. When they do occur they are usually the most efficient access pattern. Many of the test changes from HASH_JOIN or MERGE_JOIN to LOOKUP_JOIN are a result of this improvement. The issues linked above in (3) have practical examples.
  • 1821: Bug fix: The result schema of SELECT INTO should always be an empty schema
    SELECT INTO currently returns it's child node's schema as its result schema, but it doesn't actually return row data in that schema. This causes a problem over a SQL connection when clients see a result schema and then see row data that doesn't match that schema. This causes clients to freak out and close the connection from their side. Since SELECT INTO always sends its results to a file or SQL vars (and NOT over the SQL connection), its result schema should always be the empty schema.
    Fixes: dolthub/dolt#6105
  • 1819: Lazy load large character set encoding weight maps.
    Improves dolt startup times substantially.
  • 1818: Throw correct error for non-existent qualified column on table function
    Fix for: dolthub/dolt#6101
  • 1817: Ignore FULLTEXT in CREATE TABLE
    This change allows us to ignore any FULLTEXT keys when using CREATE TABLE. This should unblock customers who just need their statements to parse, but don't actually need the functionality of FULLTEXT. We still error when trying to manually add a FULLTEXT index using ALTER or CREATE INDEX. Since this isn't really "correct" behavior, I did not add any tests.
  • 1816: Support TableFunction aliasing
    Added string field to expression.UnresolvedTableFunction to so an alias can be specified.
    Removed the rule disambiguate_table_functions, TableFunctions will default to using function name as table name when alias isn't provided.
    As a result we now throw error ambiguous table name on
    select * from dolt_diff(...) join dolt_diff(...);
    which is also what happens with
    select * from t join t;
    Aliases are required for the this to correctly execute
    select * from dolt_diff(...) a join dolt_diff(...) b;
    Companion PR: dolthub/vitess#244
    Fix for: dolthub/dolt#5928
  • 1814: Add filters to memo
    Scalar expressions added to memo along with scalar properties, expression ids, filter closures. Goal here is equivalent behavior to before, just with filters represented differently. Filter organization mostly mirrors the plan package, except scalar and relational expressions are both represented as expression groups here. Done in a rush, still back and forth on whether there should be an interface there.
    Additionally:
    scalar expressions added to memo along with scalar properties, expression id
    rewrites join planning and costing to use bitset representations of filters
    refactors codegen so definition files are yaml, source is compiled independently from target code
    The organization is bit wonky b/c this should be using my name resolution symbol tables, and the entire tree should be memoized not just the join tree (used temporary solutions for the problems created by both of these).
    Re: dolthub/dolt#5993
  • 1812: Support load data ignore/replace
    Vitess PR: dolthub/vitess#243
    Here are the docs for load data with ignore/replace modifiers: https://dev.mysql.com/doc/refman/8.0/en/load-data.html#load-data-error-handling
    This also changes LOCAL to have the same effect as IGNORE to match mysql
  • 1811: Hoist subquery filters bug
    Hoist filters is supposed to move filters that do not reference tables in the current sope upwards. We did not descend subqueries when checking for that condition, mistakenly hoisting filters in some cases.
    Re: dolthub/dolt#6089
  • 1810: Added many collations and character sets
    Added a lot of missing collations and character sets. I had to manually type in all of the new entries in charactersets.go and collations.go, so it's possible I made a mistake. I double checked them all, but I still might have overlooked one.
  • 1809: Add support for JSON -> and ->> operators
    MySQL column->path Documentation
    MySQL column->>path Documentation
    Fixes: dolthub/dolt#5662
  • 1808: escape special characters in strings
  • 1807: add support for more JSON_TABLE() functionality
    This PR adds support for:
    • FOR ORDINALITY columns, which is just an auto increment
    • DEFAULT <value> ON ERROR/EMPTY , which fills in values when encountering either an error or a missing value
    • when this isn't specified, NULL is used
    • ERROR <value> on ERROR/EMPTY, which throws an error when encountering either an error or a missing value
    • when this isn't specified, we ignore errors and fill in values with NULL
    • NESTED columns, which is a way to extract data from objects within objects and so on
    • when there are multiple NESTED columns, they are "sibling" nested, they take turns being NULL
      Note: there is a skipped test highlighting a bug in either our jsonpath implementation or it's something here...
      Companion PR: dolthub/vitess#240
      MySQL docs: https://dev.mysql.com/doc/refman/8.0/en/json-table-functions.html
  • 1806: improve conversion from JsonDocument to string
  • 1804: Reworked implicit PK handling for referenced foreign keys
    Builds on #1798, introduces an interface which exposes internal (and implicit) primary keys on indexes. Not all integrators will have the functionality, so it's an optional interface to expand compatibility. Also quite a bit simpler.
  • 1803: Added "utf8mb3_czech_ci" collation, fixed missing collation check for enum/set
    Fixes #1801
    Adds the requested collation, and fixes the panic. The panic came from an oversight when checking for a collation's implementation. enum and set use the collation during type creation, which occurs before we've verified the collation's implementation. The other string types do not use the collation during type creation, so we return the appropriate error as a result.
  • 1802: Update README.md to sync with _example/main.go
  • 1799: show all indexes, and prevent creating indexes named 'primary'
    We had some logic in SHOW CREATE TABLE to prevent PRIMARY KEYs from showing up twice in because they are within tables IndexCollection. This logic relied on checking if all the columns in the index were part of the Primary Key. MySQL allows and shows SECONDARY INDEXES that are identical to the primary key. It appears to differentiate them by naming the PK index "PRIMARY". Additionally, MySQL prevents users from creating SECONDARY INDEXES named "primary".
    MySQL names PK index Primary
    fix for: dolthub/dolt#6049
  • 1798: include primary key in index mapping for foreign keys
    MySQL does some behind the scenes magic and appends the primary key columns to secondary indexes.
    This creates not obvious prefixes on secondary indexes for foreign keys.
    A plus is that we save on creating secondary indexes when we don't need to.
    Note: it appears this special prefix matching applies when looking for a secondary index on the referenced table, but not on the child table itself.
    Fix for: dolthub/dolt#6038
  • 1797: Add filters to memo
    Scalar expressions added to memo along with scalar properties, expression ids, filter closures. Goal here is equivalent behavior to before, just with filters represented differently. Filter organization mostly mirrors the plan package, except scalar and relational expressions are both represented as expression groups here. Done in a rush, still back and forth on whether there should be an interface there.
    Additionally:
    • scalar expressions added to memo along with scalar properties, expression id
    • rewrites join planning and costing to use bitset representations of filters
    • refactors codegen so definition files are yaml, source is compiled independently from target code
      The organization is bit wonky b/c this should be using my name resolution symbol tables, and the entire tree should be memoized not just the join tree (used temporary solutions for the problems created by both of these).
      Re: dolthub/dolt#5993
  • 1796: Added a method to SystemVariable to let them be compared using their underlying types, used to this to fix least / greatest not working with system vars
    Fixes dolthub/dolt#6022
  • 1795: qualify json_table columns
    Likely due to improved aliasing code, it's simple to qualify columns for json_table.
    This PR
    1. adds skipped tests for currently unsupported functionality for json_tables
    • for ordinality
    • nested paths
    • default
    • error
    1. adds prepared tests for existing json table script and query tests
    2. reorganizes json tests
      MySQL docs for missing functionality: https://dev.mysql.com/doc/refman/8.0/en/json-table-functions.html
      Fix for: dolthub/dolt#6004
  • 1791: Functional dependencies
    Functional dependencies track 1) key uniqueness, constant propagation, column equivalence, nullability sets.
    This information is built bottom-up from tables scans through projections, and is used to answer certain questions about relational nodes:
    1. What is the equivalence closure for a join condition?
    2. Are a set of filters redundant?
    3. Do a set of index expressions comprise a strict key for a LOOKUP_JOIN?
    4. Does a subquery decorrelation scope have a strict/null-safe key for an ANTI_JOIN?
    5. Are the grouping columns a strict key of the table (only_full_group_by is unnecessary)
    6. Is the relation sorted on a given column set? (is a Sort already enforced)
    7. Is a relation constant? (Max1Row)
      Questions (1) and (3) contribute towards fixing this issue: dolthub/dolt#5993. Question (2) contributes to filter pruning. Question (4) is relevant for this issue: dolthub/dolt#5954.
      This master's thesis explains how to build the derivation graph starting at page 113: https://cs.uwaterloo.ca/research/tr/2000/11/CS-2000-11.thesis.pdf. The graph is composed of (determinant) -> (dependent) relationships on columns to track these properties. They color edges and nodes to differentiate constant, nullability, equivalence attributes. Any set of of columns uniquely determines the value of constants, so they have empty determinants: () -> (colSet). We differentiate strict keys (set of columns unique and non-nullable index) from lax keys (index that maybe be non-unique or nullable).
      Cockroach implemented a version that uses flattened to/from sets rather than individual nodes for determinant/dependents, and makes optimizations for quickly computing candidate keys: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/props/func_dep.go.
      My encoding is a little different. First, I assume that attributes trickle down from nullability -> constant -> equivalence -> functional dependencies. An FD built in this order simplifies the upstream additions in a way that avoids having to recompute dependency closures (ex: nullability, constant, and equiv columns don't recompute keys). Second, I assume FDs will be limited to primary and secondary key indexes; keys will have either strict or lax determinants, and the dependents are always assumed to be the rest of the table. So far this drops LEFT_JOIN right-equivalence relations that translate to lax-keys after the join, which could opportunistically be converted back to strict keys by downstream operators. If this was a mistake we can undo that, add back in dependent column sets.
      We need to support a handful of operators to use FDs in the join memo:
    • Table scan
    • Cross join
    • Inner join
    • Left join
    • Project (Distinct)
    • Filter
      Missing:
    • Full outer join
    • Synthesized columns
      Additionally:
    • the memo needs to embed equal filters in a format with expression ids
    • join reordering should compute equivalence closures for join edges
    • join selection should use functional dependencies to check if lookup expressions are valid
      Missing practical considerations:
    • when we determine a lookup expression comprises a strict key for a table, we need a way to backfill constants and equivalences used to make that decision
    • filters should maybe be represented in memo selection nodes to support redundancy elimination
  • 1790: Reverting @@system_time_zone support for returning system timezone
    I'm not sure yet why this change causes Dolt's MySQL Connector J integ test to fail, but after testing commits yesterday and testing this fix this morning, this is definitely causing some problem. In CI, but not locally, the MySQL Connector J integ test is failing with a generic/empty SQLException each time, immediately after executing the first query. Dolt logs look fine, but there's obviously something happening that the MySQL Connector J can't deal with. I have a few more repro ideas that I'll follow up with, but wanted to get the GMS -> Dolt pipeline cleared up first.
    Until then, this PR reverts @@system_time_zone from returning the actual timezone in use by the runtime and OS.
    This change was able to get the Dolt MySQL Connector integ suite passing again: https://github.com/dolthub/dolt/actions/runs/5059419161/jobs/9081665231
  • 1789: foreign key index match should be on prefix not subset
    Turns out the logic for using an existing secondary index for foreign keys was matching on a subset instead of a prefix.
    For example:
    fks> create table parent (fk1 int, fk2 int, primary key (fk1, fk2));
    fks> create table child (id int unique, fk1 int, fk2, primary key (fk2, fk1, id), foreign key (fk1, fk2) references parent (fk1, fk2));
    
    This should produce a child table that creates a new secondary index over the columns (fk1, fk2).
    But, we don't do that and instead reference some other index (not sure what), causing errors when calling dolt constraints verify --all
    This PR ensures that we only pick an existing secondary index as the underlying index for a foreign key if that index is a prefix for the foreign key.
  • 1788: Divide pushdown
    Pushdown currently performs two sets of transforms.
    1. Push filters as low in the tree as possible:
    Filter(x=1) -> Join(Tablescan(x), Tablescan(y), x=y)
    =>
    Join(Filter(x=1)->Tablescan(x), Tablescan(y), x=y)
    
    1. Convert filters into static index scans:
    Filter(x=1)->Tablescan(x)
    =>
    Indexscan(x, x in [1,1])
    
    There are now two rules, pushFilters and generateIndexScans. Running pushFilters
    before join planning lets join planning make better use of functional dependencies.
    Also:
    • Small rule reorganizations
    • Small correctness changes exposed by refactor
    • Remove pushdown tests
    • Filter pushdown appears to work in more cases, see plan changes
    • sql.FilteredTable interface is now unused, marked as deprecated but did not delete related functions
  • 1787: Changes to USE and prepared statements
    This introduces two changes to how databases are resolved:
    1. USE statements now are handled by the Session with a new interface
    2. Tables in prepared statements now retain a copy of their Database implementation, rather than re-resolving it by name during execution.
      Both of these changes are to support Dolt's new database name semantics.
  • 1785: Copy MySQL information_schema.st_spatial_reference_systems table
    This PR essentially copies the information_schema.st_spatial_reference_systems table from MySQL.
    Small refactor to move constants and help function to types package from functions/spatial package.
    Additionally, changes show create table statement to print SRIDs inside a MySQL special comment.
    Example:
    tmp> show create table t2;
    +-------+------------------------------------------------------------------+
    | Table | Create Table                                                     |
    +-------+------------------------------------------------------------------+
    | t2    | CREATE TABLE `t2` (                                              |
    |       |   `p` point /*!80003 SRID 2000 */,                               |
    |       |   `l` linestring /*!80003 SRID 2001 */                           |
    |       | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin |
    +-------+------------------------------------------------------------------+
    1 row in set (0.00 sec)
    
    Partially Addresses this: dolthub/dolt#5973
  • 1781: Small aggregation perf improvements
    • We used to make a new error object for every cache.Get() miss. Use shared object instead.
    • Add reuse pool for collation weight buffers rather than making thousands of [4]byte.
      before:
    image after: image
  • 1779: Add extra filters to AntiJoin to guarentee correct behavior around NULLs.
    This is a correctness fix: generating AntiJoins is not currently equivalent to MySql if either column being used in the not in expression contains NULL.
    This will break a lot of regression tests. If this doesn't break Turbine's tests, we should submit this while we work on a fix.
  • 1777: Add SRID 3857 to list of "supported" SRIDs
    The ogr2ogr tool performs some information gathering queries beforehand, which change the queries it generates later on to populate MySQL tables.
    When running ogr2ogr -f MySQL "MYSQL:ogr,user=root,port=3307,host=127.0.0.1" "data.gpkg", ogr2ogr sends these queries
    SELECT VERSION();
    SHOW TABLES;
    SELECT SRS_ID FROM INFORMATION_SCHEMA.ST_SPATIAL_REFERENCE_SYSTEMS WHERE ORGANIZATION = 'EPSG' AND ORGANIZATION_COORDSYS_ID = 3857;
    SELECT SRS_ID FROM INFORMATION_SCHEMA.ST_SPATIAL_REFERENCE_SYSTEMS WHERE DEFINITION = 'PROJCS["WGS 84 / Pseudo-Mercator",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.25722356
    3,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Mercator_1SP"],PARAMETER["central_meridian",0],PARAMETER["scale_factor",1],PARAMETER["false_easting",0],PARAMETER[
    "false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],EXTENSION["PROJ4","+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs"],AUTHORITY["EPSG","3857"]]'
    However, Dolt's information_schema.st_spatial_reference_system only contains SRID 0 and 4326, as a result both queries ogr2ogr makes return Empty Set.
    Consequently, this causes ogr2ogr to generate the create table queries like so
    CREATE TABLE `aeropuertos` ( fid INT UNIQUE NOT NULL AUTO_INCREMENT, geom GEOMETRY NOT NULL)
    When it should be
    CREATE TABLE `aeropuertos` ( fid INT UNIQUE NOT NULL AUTO_INCREMENT, geom GEOMETRY NOT NULL /*!80003 SRID 3857 */)
    On initial insert, this causes ogr2ogr to insert with SRID 0
    On further inserts/appends, this causes ogr2ogr to just not fill in SRID, leaving it as -2 aka not initialized
    This PR adds SRID 3857 to dolt's information_schema.st_spatial_reference_system.
    Now, the ogr2ogr command loudly fails with a useful error:
    ERROR 1: MySQL error message:unsupported feature: unsupported SRID value Description: CREATE TABLE `aeropuertos` (    fid INT UNIQUE NOT NULL AUTO_INCREMENT,    geom GEOMETRY NOT NULL /*!80003 SRID 3857 */)
    ERROR 1: Terminating translation prematurely after failed
    translation of layer aeropuertos (use -skipfailures to skip errors)
    Addresses these issues:
  • 1776: Returning @@system_time_zone based on OS time zone
    The @@system_time_zone global system variable is a read-only system variable that shows the timezone that the server is running in. MySQL and Golang both load the system timezone from the OS, typically using the TZ env var or /etc/timezone. This change exposes that system timezone information through the @@system_time_zone variable, which was previously hardcoded to UTC.
    Because timezone settings can change while the system is running (thanks daylight savings time!), we need to check with the runtime for the system timezone whenever it is requested. I gave SystemVariable the ability to include a function that gets executed to return the sys var value and moved the uptime sys var over to that, too.
  • 1775: Error for out of range SRIDs
    This PR fixes an error inconsistency between Dolt and MySQL.
    For SRIDs that are out of range [0, MAX_UINT32], MySQL throws a ERROR: 1690: SRID value is out of range in '<func_name>'.
    For SRIDS that are in range, but just don't exist, MySQL throws a ERROR: 3548: There's no spatial reference system with SRID <invalid srid>.
    Dolt used to only throw ERROR 3548, and would incorrectly report the negative numbers as overflowed positive uint32s
    partial fix for: dolthub/dolt#5948
  • 1774: Added utf8mb3_tolower_ci
  • 1773: Add plan option for AntiJoins to be LeftJoin
    This PR adds a plan option to convert ANTI_JOINs to LEFT_JOINs with an additional filter condition.
    In some cases, the LEFT_JOIN performs better than the ANTI_JOIN.
    Additionally, this PR adds a new join hint for LEFT_OUTER_LOOKUP_JOINs.
    Fix for: dolthub/dolt#5825
  • 1770: Union offset+limit bug
    Unions were dropping limit, applying offset as limit.
  • 1768: CREATE LIKE statements should preserve check-constraints on the new table.
  • 1767: Timezone improvements
    This PR starts handling some timezone differences between GMS and MySQL:
    • The NOW() and CURTIME() functions now respect the sessions's current time zone
    • Allows CONVERT_TZ to mix use of timezone names (e.g. UTC) and timezone offset strings (e.g. +00:00)
    • Adds some skipped tests for timestamp input/output session timezone conversion
  • 1763: prevent large varbinary column from being created
    We don't enforce column size limit for varbinary, causing panics.
    fix for: dolthub/dolt#5059
  • 1762: Add support for ALTER TABLE <table> MODIFY COLUMN <col> <type> UNIQUE
    This is a MySql syntax that is effectively syntactic sugar for
    ALTER TABLE <table> MODIFY COLUMN <col> <type>;
    ALTER TABLE <table> ADD UNIQUE INDEX `<col>` (col);
  • 1760: sql/core.go: SystemVariable: Add a NotifyChanged field which can be used to be notified when the variable value changes.
  • 1759: server/server.go: Log a message when we start accepting connections.
  • 1756: Allow all built-in functions to be used in column default value expressions
    We currently maintain an allow list of functions that can be used in column default value expressions. MySQL has changed what they support in column default value expressions over time and the current support allows all built-in functions to be used:

    Literals, built-in functions (both deterministic and nondeterministic), and operators are permitted.
     
    This PR removes our allow list so that any function registered as a built-in function can be used in a default expression. This has a couple of implications, which I think is overall a good tradeoff:

    • we don't have to maintain the allow list anymore (nice, but not a huge deal, since this hasn't been updated often)
    • if customers typo a function name in a column default expression, they will get a better error message (this is what motivated this change; for more details see: dolthub/dolt#5887)
    • any Dolt functions that we register as built-ins will be eligible for column default value expressions. I think this makes sense and might be useful for some customers (e.g. a column with a default value that inserts active_branch() to record the branch a row was created on) Related, any functions other GMS integrators register will also be eligible for use in column default value expressions.
      My only concern with this change is that we don't currently distinguish between built-in functions and customer functions, which should not be allowed in column default value expressions. However, that seems like a moot point right now, since we don't support CREATE FUNCTION in GMS so it doesn't seem like there's a way to trigger that. I also added a TODO to the code to point out that when we do have support for user created functions, we need to check for that when in the resolve column defaults code.
      Fixes dolthub/dolt#5887
  • 1754: support ALTER EVENT statement
    Support for ALTER EVENT statements excluding case for moving events across databases using RENAME TO clause.
    Depends on dolthub/vitess#233
  • 1751: Unions finalize subquery expressions
    The SUBQUERY -> UNION -> SUBQUERY_EXPR pattern was subject to an edge case where the outer SUBQUERY disabled recursively finalizing the inner SUBQUERY_EXPR. The intervening UNION node blocked the outer reach, but failed to re-enable finalizeSubquery during finalizeUnion. This PR edits the finalizeUnionSelector to explicitly re-enable finalize nested subqueries.
    This PR also touches hoistOutOfScopeFilters for the same edge case.
    Issue: dolthub/dolt#5875
  • 1747: Added latin1_spanish_ci and latin1_danish_ci collations
    Fixes dolthub/dolt#5866
  • 1746: update json_set support for edge cases
    updated logic of json_set to support more input edge cases
  • 1744: Adding tests for altering keyless tables
    Added tests to cover basic keyless table column alterations, to test Dolt change in dolthub/dolt#5867
  • 1743: Optimize division with integers by using floats internally instead of decimal.Decimal
    Fix for dolthub/dolt#5832
    Seems unlikely to break anything in Dolt, but I went ahead and ran Dolt CI tests: dolthub/dolt#5857
  • 1742: convert sort over pks with index
    This PR fixes some issues with the implementation of the rule replaceSortPk. Namely, the rule now works with both TableAliases, ColumnAliases, and SubqueriyAliases.
    Additionally, the rule is able to be applied when sorting in Descending order.
    Overally, this means queries like select * from t order by pk desc limit 1 will just be a Limit over an IndexTabledAccess
    The rule does skip for Distinct and Join nodes, but it might be possible to get it to work for those.
    The plans with Distinct causes issues with IndexedTableAccess, but it definitely could work as PKs are distinct by definition.
    It is also possible to get this optimization to work with Joins, but I'm not convinced on their correctness, especially across different kinds of joins, so I decided to not implement it here.
    Fix for: dolthub/dolt#5812
  • 1741: allow renaming views with RENAME TABLE statement
    This is the same PR as #1712 (it was reverted)
    • allows renaming views using RENAME TABLE statement
    • fails on renaming views using ALTER TABLE ... RENAME ... statement
  • 1739: Populating the Decimal property in Field response metadata
    Fixes dolthub/dolt#5834
    In addition to the unit tests here, I'm also working on a Dolt PR to update the mysql connector library integration tests to test the Ruby mysql2 library.
  • 1738: ICU Regex Implementation
    To try and prevent memory leaks, I'm having the regex only work under a callback. The idea is that we'll do all of our matches under the callback, using a node placed by an analyzer rule. I think this approach should work, and it'll expand to any other functions within the regex that need to hold memory that will be freed later.
    For now though, this portion works, and I have a small test showing such.
  • 1737: Coalesce.Type() needs to handle type types.Null
    Coalesce.Type() only checks if its arguments to have type nil. But NULL constants have type types.Null, and we need to check for that too.
  • 1736: order multi-alter statements
    When running multi-alter statements, MySQL reorders the alters.
    This PR adds a sort function to organize these statements; the precedence order is currently:
    1. RENAME COLUMN
    2. DROP COLUMN
    3. ADD COLUMN
    4. ALTER INDEX
      Note: When this is supposed to work, it does, but when it doesn't our error messages are different than the ones returned by MySQL. There are existing alter tests that also return the incorrect error, but I don't think this is worth fixing right now.
      I also unskipped one BrokenScriptTest and fix the expected results for the others.
      Companion PR: dolthub/dolt#5831
  • 1735: only hash expression.UnresolvedColumn for OnDuplicateExpressions
    Special logic is used to qualify columns for InsertInto.OnDuplicateExprs. Since qualify uses transform.OneNodeExprWithNode it will try to qualify all expressions (Literals, Tuples, etc). This change makes it so that we only try to qualify Columns.
    fix for: dolthub/dolt#5799
  • 1731: sql/parse: expose utility functions to convert parsed index and check constraint definitions
  • 1728: hashjoin indexing
    When we have plan of the following format:
    InSubquery
    ...
    CrossJoin
    Left: Table
    Right: SubqueryAlias
    OuterScopeVisibility: true
    ...
    HashJoin
    HashLookup
    source: ...
    target: TableAlias
    ...
    
    The indexes we assign to GetFields during analysis don't align with the indexes of the actual columns in each row during execution time. This is a result of StripNode, PrependNode, and the nested Joins with SubqueryAlias.
    This error wasn't caught sooner as the incorrect indexes are too low, so they never threw IndexOutOfBounds errors and just returned potentially incorrect results instead.
    The fix was to correct these indexes at analysis time.
    Firstly, SubqueryAlias nodes with OuterScopeVisibilty = true inside joins need to see the left sibling node (in addition to the parent nodes). So Scope was modified to include some new fields, specifically for sibling nodes. Additionally, the file finalizeSubquery was changed to track the parent as well, so we could detect when we're analyzing a SubqueryAlias on the right side of a join, and add the left child to the scope.
    Additionally, pushdownFilters was modified to not undo all the changes to the Analyzer for HashLookups.
    At runtime, the PrependRow nodes cache the rows outside the InSubquery, while the buildJoinIter for CrossJoin would include both the outside and the left row. This meant that some parts of the inner HashJoin would receive extra columns while others didn't. The fix here was to alter the scope.InJoin depending on which parts of HashJoin we were building.
    Lastly, to have these changes not break for PreparedStatements, we just needed to not redo finalizeUnions in postPrepared, as we don't replan joins in postPrepared, so we don't know if we're in a join or not, and the correct indexes are set in prePrepared.
    Along the way, we discovered a query that panics, but the cause is different than the purpose of this fix, and it panicked the same way before these changes, so it is left as a skipped test.
    Fix for: dolthub/dolt#5714
  • 1727: implement json_set function
    implements the json_set function with a few edge cases outstanding
    fixes: dolthub/dolt#5680
  • 1726: subquery indexing tests
  • 1724: Fix README.md
    Replace spaces with tabs in code indentation.
  • 1722: go.mod: Use dolthub/flatbuffers/v23 instead of google/flatbuffers.
  • 1721: go.mod: Move oliveagle/jsonpath -> dolthub/jsonpath.
  • 1720: memory: extract rangeFilterExpr into expression package
    rangeFilterExpr contains a complex set of logic to build a sql expression given the list of sql ranges and the list of expressions on the index.
    Extract the majority of this function into NewRangeFilterExpr in the expression package. Replace the Or and And helper functions with JoinOr and JoinAnd.
    Update the call in the memory/ package to call the new expression function.
    Fix JoinOr and JoinAnd to check if the expressions are nil.
    Remove ineffective nil check in JoinAnd.
    Move nil checks into NewOr and NewAnd.
    Add comment explaining parameters to NewFilterRangeExpr.
  • 1719: Added serving tray and bowtie

vitess

  • 251: Fixed missing support for collations as strings
    Fixes issue dolthub/dolt#6192
    We only allowed collations to be declared after the CREATE TABLE portion in their non-string form. This adds support for the string form.
  • 249: Various small parser improvements
    Various small parser improvements:
    • Allow column definitions to use the MySQL INVISIBLE keyword. The implementation still ignores the INVISIBLE keyword, but it will no longer cause a parser error.
    • Support DOUBLE and FLOAT in the CAST and CONVERT functions.
    • Allow a trigger body to be a single CALL statement.
  • 248: Support for index hint in foreign key definition
  • 247: Support FK definitions inline in column definitions
    Adds support for declaring FK references inline in column definitions. Does not support ON DELETE and ON UPDATE yet. Example: ALTER TABLE t ADD COLUMN col2 int REFERENCES other_table(id);
    Also cleaned up a few rules around non-reserved keywords to enable event to be used unquoted in ALTER TABLE statements.
  • 246: support CREATE SPATIAL REFERENCE SYSTEM ... syntax
    Syntax for: dolthub/dolt#6002
  • 245: allow event as table and column name
    The PR allows EVENT non-reserved keyword to be used as table and column name without quoting.
    The missing edge case includes using EVENT for user name or host name.
  • 244: parse table_functions with aliases
    Syntax support for: dolthub/dolt#5928
  • 243: Add ignore/replace modifiers to load data
  • 242: allow EVENTS to be parsed as non-reserved keyword
    Transferred EVENTS keywords into non_reserved_keyword list, allowing statements using information_schema.events table to parse.
    For some reason EVENT cannot be transferred into non_reserved_keyword, causing shift/reduce and reduce/reduce conflicts.
  • 241: Walking sub-nodes for SHOW TABLE statements
    When preparing a SHOW TABLES statement with a bound variable in the filter clause (e.g. SHOW TABLES FROM mydb WHERE Tables_in_mydb = ?;) GMS and Vitess were identifying the bound variable parameters differently and causing the SQL client on the other end to panic. Vitess code in conn.go walks the parsed tree and looks for SQLVal instances to identify the parameters and then returns that metadata over the SQL connection. The SHOW TABLES statement above fails because the sqlparser AST wasn't including all the members of SHOW TABLES node in the walk. This case is a little tricky to test directly in go-mysql-server, because it only repros in a running sql-server when running over a Vitess conn.
    The GMS and Vitess layers are both calculating bind variable metadata, with two different techniques, and whenever they get out of sync, we will see issues like this that only appear when running over a SQL connection. Longer term, we may want consider allowing GMS to return its bind variable metadata and avoid Vitess needing to re-calculate it, if we see more instances of this problem.
    Fixes: #1793
  • 240: Support more JSON_TABLE functionality
    Source: https://dev.mysql.com/doc/refman/8.0/en/json-table-functions.html
    JSON_TABLE(
    expr,
    path COLUMNS (column_list)
    )   [AS] alias
    column_list:
    column[, column][, ...]
    column:
    name FOR ORDINALITY
    |  name type PATH string path [on_empty] [on_error]
    |  name type EXISTS PATH string path
    |  NESTED [PATH] path COLUMNS (column_list)
    on_empty:
    {NULL | DEFAULT json_string | ERROR} ON EMPTY
    on_error:
    {NULL | DEFAULT json_string | ERROR} ON ERROR
    
    Note: the MySQL docs indicate that PATH is optional in the NESTED case, but it doesn't seem that way.
    I chose to follow what they say rather than what they do.
  • 238: Fix for charset introducers in default values
    Fixes dolthub/dolt#5970 by adding an additional default expression rule that handles charset introducers.
  • 237: Small fix for DEFAULT CHARSET when creating tables
    Super small fix for dolthub/dolt#5749. We were referencing the wrong variable.
  • 235: Add support for INSERT INTO <table> VALUE ...;
    Adds support for VALUE as a synonym of VALUES in INSERT INTO <table> VALUES ..., to match MySQL's syntax.
    Fixes #1750
  • 234: go/netutil/conn.go: Avoid panicing when ConnWithTimeouts has a Set{,}Deadline method called.
    Clients does not expect setting deadlines on connections to panic. In particular, the standard library's TLS implementation adopts an existing net.Conn and will call SetWriteDeadline on it in certain cases.
    It makes more sense to allow the deadlines to be managed by the client when they see fit. This changes the behavior to simply forward the deadlines along as soon as the client code has shown an interest in managing the deadlines.
  • 233: support parsing ALTER EVENT statements
    Supports parsing ALTER EVENT statements
  • 231: support 'show events' statement parsing
    • Added support for SHOW EVENTS statement parsing
    • Added support for SHOW CREATE EVENT statement parsing
    • Removed FULL option from SHOW TRIGGERS as it's not supported in MySQL.

Closed Issues

  • 1800: Extracting update query columns
  • 1801: Creating an enum column with collation utf8_czech_ci causes panic
  • 1793: PrepareStatement got incorrect paramsCount in response packet from mock mysql server.
  • 1749: [Feature Request] Support for JSON operator
  • 1771: utf8mb3_tolower_ci not implemented
  • 1750: [Feature Request] Support for INSERT INTO ... VALUE