Q3 2023 Release
This is the quarterly roll-up release, containing many new features and bug fixes.
Interfaces will not be stable until 1.0.
Merged PRs
go-mysql-server
- 1861: chore: remove refs to deprecated io/ioutil
- 1860: chore: unnecessary use of fmt.Sprintf
- 1859: chore: use copy(to, from) instead of a loop
- 1856: Support IPV6 loopback address for looking up user credentials
Map "::1" and "127.0.0.1" to localhost when looking up users.
There don't appear to be tests for this code path. TBD if I'll add some.
Related to: dolthub/dolt#6239 - 1854: Prevent loops in stored procedures from returning multiple result sets
The query in dolthub/dolt#6230 was causing rows from many result sets to be returned from a stored procedure. We already have code that limitsBEGIN/END
blocks to return the last SELECTed result set; this PR extends that logic to loop constructs as well.
Fixes: dolthub/dolt#6230
Dolt CI Checks: dolthub/dolt#6245 - 1853: chore: slice replace loop
- 1852: Alter stored procedure execution to deal with statements that commit transactions
This change adds checks to begin a new transaction whenever there isn't one during stored procedure execution. This lets things likedolt_commit()
execute correctly in stored procedures. - 1851:
memo.Literal
has different type than lookup
This panics on dolt:The PutField function expects the value to match the tuple descriptor exactly, and will panic if it does not.CREATE TABLE tab2(pk INTEGER PRIMARY KEY, col0 INTEGER, col1 FLOAT, col2 TEXT, col3 INTEGER, col4 FLOAT, col5 TEXT); CREATE UNIQUE INDEX idx_tab2_0 ON tab2 (col1 DESC,col4 DESC); CREATE INDEX idx_tab2_1 ON tab2 (col1,col0); CREATE INDEX idx_tab2_2 ON tab2 (col4,col0); CREATE INDEX idx_tab2_3 ON tab2 (col3 DESC); INSERT INTO tab2 VALUES(0,344,171.98,'nwowg',833,149.54,'wjiif'); INSERT INTO tab2 VALUES(1,353,589.18,'femmh',44,621.85,'qedct'); SELECT pk FROM tab2 WHERE ((((((col0 IN (SELECT col3 FROM tab2 WHERE ((col1 = 672.71)) AND col4 IN (SELECT col1 FROM tab2 WHERE ((col4 > 169.88 OR col0 > 939 AND ((col3 > 578))))) AND col0 >= 377) AND col4 >= 817.87 AND (col4 > 597.59)) OR col4 >= 434.59 AND ((col4 < 158.43)))))) AND col0 < 303) OR ((col0 > 549)) AND (col4 BETWEEN 816.92 AND 983.96) OR (col3 BETWEEN 421 AND 96);
The section of code in memo that creates a new range uses the type from the expression, but in other places it uses the index column expression types.
An alternative solution would be to have some logic in dolt to convert to the correspondingsql.Type
based off theval.Enc
- 1848:
IntDiv.Type()
should always return eitheruint64
orint64
Previously, ourIntDiv.convertLeftRight()
usedIntDiv.Type()
to determine the larger type betweenIntDiv.Left.Type()
andIntDiv.Right.Type()
to avoid precision loss when doing internal calculations. Now, that logic is moved fromIntDiv.Type()
toIntDiv.convertLeftRight()
, andIntDiv.Type()
can only returnuint64
orint64
.
This should fix the sql correctness regression from #1834 - 1847: Fix TargetSchema.Resolved() to check targetSchema column default expressions
A coupleSchemaTarget
implementations weren't checking if thetargetSchema
was resolved as part of theResolved()
method. Added tests, audited the other implementations, and simplified the logic to use a new method onSchema
to check that column default expressions are resolved.
Fixes: dolthub/dolt#6206
Dolt CI Run: dolthub/dolt#6213 - 1846: update
information_schema.processlist
to correctly display status of processes and databases
We used to hardcode"Query"
, now we referenceprocess.Command
Additionally, we now get the database from the current session and use that variable.
fix for: dolthub/dolt#6023 - 1844: fix panic for group by binary type
We made a bad type assertion forsql.StringType
.
Additionally, this fixes a issue whereUnaryExpressions
withGetFields
would incorrectly throw a functional dependency error withONLY_FULL_GROUP_BY
enabled.
Fix for second part of: dolthub/dolt#6179 - 1843: Improvements to
CAST
andCONVERT
functions
This PR adds support for casting/converting toFLOAT
andDOUBLE
types with theCAST
andCONVERT
functions. It also adds support for length (aka precision) and scale type constraints (e.g.CAST(1.2345 AS DECIMAL(3,2))
).
Parser support forDOUBLE
andFLOAT
withCAST
andCONVERT
: dolthub/vitess#249
Fixes: dolthub/dolt#5835 - 1841: adding
version
andversion_comment
values
@@version
now returns8.0.11
@@version_comment
now returns "Dolt"; in mysql, this appears to be dependent on OS / method of install- Some people get
MySQL Community Server - GPL
- Others get
Homebrew
Fix for first part of: dolthub/dolt#6179
- Some people get
- 1840: deduplicate (hash) intuple for and queries
This PR was originally supposed to fix it: original fix: #1677, butAND
statements weren't covered.
fix for: dolthub/dolt#6189 - 1839: Slow degenerate semi join, hoist select opt
This enables recursive subquery decorrelations, and adds a hash join execution option for semi joins that is equivalent to cached subquery existence checks. - 1838: resolve aliases in subqueries in function arguments
The rulereorderProjection
also replaces subqueries with getfields in projections when they are used by subqueries, but it did not check for function expressions.
This meant that aliases in subqueries as arguments to functions threw a"x" could not be found
error.
This PR just has the section ofreorderProjection
that is supposed to find deferredColumns also look at the arguments of functions recursively (because we can nest functions).
Additionally, there was another schema type bug:MySQL returns an Integer type for if statement, and if either argument is a String, it always returns a String.tmp> select 0 as foo, if((select foo), 123, 456); +-----+----------------------------+ | foo | if((select foo), 123, 456) | +-----+----------------------------+ | 0 | 127 | +-----+----------------------------+ 1 row in set (0.00 sec)
fix for: dolthub/dolt#6174 - 1836: update cached table count in prepared statements
Prepared statements were caching table counts. We need to update the table count when finalizing prepared statements to bring table count up to date with any intermediate edits. - 1834: fix expected schema for
sum(literal)
The code path we take when print rows to shell is different than spooling from server.
In the sql case, we ignore the schema we get from analysis.
In the server case, we actually read the schema, and ensure that the rows are of that type.
When doingsum(literal)
, we use the type of the literal. In this issue, the literal was1
, so anINT8
, which caps out at127
.
sum()
is always supposed to return a float64, so I made a change to do that.
I checked by starting mysql with--column-type-info
option, and it does appear that any columns coming fromsum()
has aDECIMAL
type.
Fix for: dolthub/dolt#6120 - 1830: Use SO_REUSEADDR and SO_REUSEPORT options when creating the sql server on Unix
This prevents a transient error we've been seeing where the server sometimes fails to start, and the OS claimsport already in use
, even though we've already confirmed that the port is not in use prior to runningdolt sql-server
. - 1829: plan.TableCountLookup short circuits count()
In many cases it is unnecessary to read an entire table to report count(*). We can use the RowCount() interface to jump to the answer. - 1828: Consolidated collation maps
Main file to check is thegenerate/main.go
file. After running the updated generation program, these are the consolidated files:common_utf8mb4_es_0900_ai_ci_Weights: [utf8mb4_es_0900_ai_ci_Weights, utf8mb4_es_trad_0900_ai_ci_Weights] common_utf8mb4_es_0900_as_cs_Weights: [utf8mb4_es_0900_as_cs_Weights, utf8mb4_es_trad_0900_as_cs_Weights] common_utf_croatian_ci_Weights: [utf16_croatian_ci_Weights, utf32_croatian_ci_Weights, utf8mb3_croatian_ci_Weights, utf8mb4_croatian_ci_Weights] common_utf_czech_ci_Weights: [utf16_czech_ci_Weights, utf32_czech_ci_Weights, utf8mb3_czech_ci_Weights, utf8mb4_czech_ci_Weights] common_utf_danish_ci_Weights: [utf16_danish_ci_Weights, utf32_danish_ci_Weights, utf8mb3_danish_ci_Weights, utf8mb4_danish_ci_Weights] common_utf_esperanto_ci_Weights: [utf16_esperanto_ci_Weights, utf32_esperanto_ci_Weights, utf8mb3_esperanto_ci_Weights, utf8mb4_esperanto_ci_Weights] common_utf_estonian_ci_Weights: [utf16_estonian_ci_Weights, utf32_estonian_ci_Weights, utf8mb3_estonian_ci_Weights, utf8mb4_estonian_ci_Weights] common_utf_german2_ci_Weights: [utf16_german2_ci_Weights, utf32_german2_ci_Weights, utf8mb3_german2_ci_Weights, utf8mb4_german2_ci_Weights] common_utf_hungarian_ci_Weights: [utf16_hungarian_ci_Weights, utf32_hungarian_ci_Weights, utf8mb3_hungarian_ci_Weights, utf8mb4_hungarian_ci_Weights] common_utf_icelandic_ci_Weights: [utf16_icelandic_ci_Weights, utf32_icelandic_ci_Weights, utf8mb3_icelandic_ci_Weights, utf8mb4_icelandic_ci_Weights] common_utf_latvian_ci_Weights: [utf16_latvian_ci_Weights, utf32_latvian_ci_Weights, utf8mb3_latvian_ci_Weights, utf8mb4_latvian_ci_Weights] common_utf_lithuanian_ci_Weights: [utf16_lithuanian_ci_Weights, utf32_lithuanian_ci_Weights, utf8mb3_lithuanian_ci_Weights, utf8mb4_lithuanian_ci_Weights] common_utf_persian_ci_Weights: [utf16_persian_ci_Weights, utf32_persian_ci_Weights, utf8mb3_persian_ci_Weights, utf8mb4_persian_ci_Weights] common_utf_polish_ci_Weights: [utf16_polish_ci_Weights, utf32_polish_ci_Weights, utf8mb3_polish_ci_Weights, utf8mb4_polish_ci_Weights] common_utf_roman_ci_Weights: [utf16_roman_ci_Weights, utf32_roman_ci_Weights, utf8mb3_roman_ci_Weights, utf8mb4_roman_ci_Weights] common_utf_romanian_ci_Weights: [utf16_romanian_ci_Weights, utf32_romanian_ci_Weights, utf8mb3_romanian_ci_Weights, utf8mb4_romanian_ci_Weights] common_utf_sinhala_ci_Weights: [utf16_sinhala_ci_Weights, utf32_sinhala_ci_Weights, utf8mb3_sinhala_ci_Weights, utf8mb4_sinhala_ci_Weights] common_utf_slovak_ci_Weights: [utf16_slovak_ci_Weights, utf32_slovak_ci_Weights, utf8mb3_slovak_ci_Weights, utf8mb4_slovak_ci_Weights] common_utf_slovenian_ci_Weights: [utf16_slovenian_ci_Weights, utf32_slovenian_ci_Weights, utf8mb3_slovenian_ci_Weights, utf8mb4_slovenian_ci_Weights] common_utf_spanish2_ci_Weights: [utf16_spanish2_ci_Weights, utf16_spanish_ci_Weights, utf32_spanish2_ci_Weights, utf32_spanish_ci_Weights, utf8mb3_spanish2_ci_Weights, utf8mb3_spanish_ci_Weights, utf8mb4_spanish2_ci_Weights, utf8mb4_spanish_ci_Weights] common_utf_swedish_ci_Weights: [utf16_swedish_ci_Weights, utf32_swedish_ci_Weights, utf8mb3_swedish_ci_Weights, utf8mb4_swedish_ci_Weights] common_utf_turkish_ci_Weights: [utf16_turkish_ci_Weights, utf32_turkish_ci_Weights, utf8mb3_turkish_ci_Weights, utf8mb4_turkish_ci_Weights] common_utf_unicode_520_ci_Weights: [utf16_unicode_520_ci_Weights, utf32_unicode_520_ci_Weights, utf8mb4_unicode_520_ci_Weights] common_utf_unicode_ci_Weights: [utf16_unicode_ci_Weights, utf32_unicode_ci_Weights, utf8mb3_unicode_ci_Weights, utf8mb4_unicode_ci_Weights] common_utf_vietnamese_ci_Weights: [utf16_vietnamese_ci_Weights, utf32_vietnamese_ci_Weights, utf8mb3_vietnamese_ci_Weights, utf8mb4_vietnamese_ci_Weights]
- 1827: Remove db-specific transaction interfaces / logic
- 1825: implement create spatial ref sys
This implements thecreate spatial reference system ...
, which lets users add custom SRID to the information schema.
MySQL docs: https://dev.mysql.com/doc/refman/8.0/en/create-spatial-reference-system.html
MySQL is much more restrictive when it comes to what is a validDEFINITION
for an entry in this table, and the rules are unclear, so we are much more permissive for now.
Additionally, this information persist in MySQL between server restarts, which we do not do. However, MySQL does throw a warning stating that updating may discard any changes the user makes.
Lastly, the values persist between test runs, and we don't support deleting frominformation_schema
, so some tests are modified.
fix for: dolthub/dolt#6002 - 1823: Trim spaces and empty statements to the right in planbuilder.Parse
- 1822: join filter closure and constant join lookups
This PR adds a set of join planning improvements.- Table aliases can accept multi column indexes
We have never been able to choose a multi-expression range scan through table aliases.
Before:
After:tmp2> explain select * from t alias where a = 1 and b = 1 and c = 1; +-----------------------------------------------------------+ | plan | +-----------------------------------------------------------+ | Filter | | ├─ (((alias.a = 1) AND (alias.b = 1)) AND (alias.c = 1)) | | └─ TableAlias(alias) | | └─ IndexedTableAccess(t) | | ├─ index: [t.a] | | ├─ filters: [{[1, 1]}] | | └─ columns: [a b c] | +-----------------------------------------------------------+
This has silently been impacting join performance in particular, where table aliases are more common. This is a small change but I'd expect this to have a broad positive impact for customers.tmp2> explain select * from t alias where a = 1 and b = 1 and c = 1; +-----------------------------------------------------------+ | plan | +-----------------------------------------------------------+ | Filter | | ├─ (((alias.a = 1) AND (alias.b = 1)) AND (alias.c = 1)) | | └─ TableAlias(alias) | | └─ IndexedTableAccess(t) | | ├─ index: [t.a,t.b,t.c] | | ├─ filters: [{[1, 1], [1, 1], [1, 1]}] | | └─ columns: [a b c] | +-----------------------------------------------------------+
2. Join equivalence closure
A join likeselect * from xy join uv on x = u join ab on u = a
has two initial join edges,x = u
andu = a
. Those edges create expression groupingsxy x uv
,uv x ab
,xy x uv x ab
. There misses a transitive edge,x = a
, with a corresponding join groupxy x ab
. We should generate plans for most transitive edges now (transitive edges in apply joins are harder).
For joins with many tables this will unlock many potential join paths.
3. Use functional dependencies to find more lookup and merge joins
We can use constants and aggregated equivalency sets (equal filters) to be more aggressive with lookup join selection. Previously we only searched the current join ON equal conditions for expressions that match an index prefix for a lookup join, but constants are also valid lookup keys.
Refer to dolthub/dolt#5993 and dolthub/dolt#3797 for in-depth examples.
4. Use functional dependencies to do better lookup join costing
Even though we do not have index statistics, we can still use functional dependencies on indexes to detect whether a lookup will have MAX_1_ROW. Two examples where we can detect MAX_1_ROW: our lookup index is the primary key, and our lookup key provides a constant or equals expression for every pk column; our lookup index is unique, our lookup key has constants or equal expressions for every column, and we can prove that every key expression is non-nullable.
MAX_1_ROW lookups are a rare binary condition, most of the time selectivity is in the continuous range 0-1. When they do occur they are usually the most efficient access pattern. Many of the test changes from HASH_JOIN or MERGE_JOIN to LOOKUP_JOIN are a result of this improvement. The issues linked above in (3) have practical examples. - Table aliases can accept multi column indexes
- 1821: Bug fix: The result schema of
SELECT INTO
should always be an empty schema
SELECT INTO
currently returns it's child node's schema as its result schema, but it doesn't actually return row data in that schema. This causes a problem over a SQL connection when clients see a result schema and then see row data that doesn't match that schema. This causes clients to freak out and close the connection from their side. SinceSELECT INTO
always sends its results to a file or SQL vars (and NOT over the SQL connection), its result schema should always be the empty schema.
Fixes: dolthub/dolt#6105 - 1819: Lazy load large character set encoding weight maps.
Improves dolt startup times substantially. - 1818: Throw correct error for non-existent qualified column on table function
Fix for: dolthub/dolt#6101 - 1817: Ignore FULLTEXT in CREATE TABLE
This change allows us to ignore any FULLTEXT keys when usingCREATE TABLE
. This should unblock customers who just need their statements to parse, but don't actually need the functionality of FULLTEXT. We still error when trying to manually add a FULLTEXT index usingALTER
orCREATE INDEX
. Since this isn't really "correct" behavior, I did not add any tests. - 1816: Support
TableFunction
aliasing
Added string field toexpression.UnresolvedTableFunction
to so an alias can be specified.
Removed the ruledisambiguate_table_functions
, TableFunctions will default to using function name as table name when alias isn't provided.
As a result we now throwerror ambiguous table name
onwhich is also what happens withselect * from dolt_diff(...) join dolt_diff(...);
Aliases are required for the this to correctly executeselect * from t join t;
Companion PR: dolthub/vitess#244select * from dolt_diff(...) a join dolt_diff(...) b;
Fix for: dolthub/dolt#5928 - 1814: Add filters to memo
Scalar expressions added to memo along with scalar properties, expression ids, filter closures. Goal here is equivalent behavior to before, just with filters represented differently. Filter organization mostly mirrors the plan package, except scalar and relational expressions are both represented as expression groups here. Done in a rush, still back and forth on whether there should be an interface there.
Additionally:
scalar expressions added to memo along with scalar properties, expression id
rewrites join planning and costing to use bitset representations of filters
refactors codegen so definition files are yaml, source is compiled independently from target code
The organization is bit wonky b/c this should be using my name resolution symbol tables, and the entire tree should be memoized not just the join tree (used temporary solutions for the problems created by both of these).
Re: dolthub/dolt#5993 - 1812: Support load data ignore/replace
Vitess PR: dolthub/vitess#243
Here are the docs for load data with ignore/replace modifiers: https://dev.mysql.com/doc/refman/8.0/en/load-data.html#load-data-error-handling
This also changesLOCAL
to have the same effect asIGNORE
to match mysql - 1811: Hoist subquery filters bug
Hoist filters is supposed to move filters that do not reference tables in the current sope upwards. We did not descend subqueries when checking for that condition, mistakenly hoisting filters in some cases.
Re: dolthub/dolt#6089 - 1810: Added many collations and character sets
Added a lot of missing collations and character sets. I had to manually type in all of the new entries incharactersets.go
andcollations.go
, so it's possible I made a mistake. I double checked them all, but I still might have overlooked one. - 1809: Add support for JSON
->
and->>
operators
MySQLcolumn->path
Documentation
MySQLcolumn->>path
Documentation
Fixes: dolthub/dolt#5662 - 1808: escape special characters in strings
- 1807: add support for more
JSON_TABLE()
functionality
This PR adds support for:FOR ORDINALITY
columns, which is just an auto incrementDEFAULT <value> ON ERROR/EMPTY
, which fills in values when encountering either an error or a missing value- when this isn't specified,
NULL
is used ERROR <value> on ERROR/EMPTY
, which throws an error when encountering either an error or a missing value- when this isn't specified, we ignore errors and fill in values with
NULL
NESTED
columns, which is a way to extract data from objects within objects and so on- when there are multiple
NESTED
columns, they are "sibling" nested, they take turns beingNULL
Note: there is a skipped test highlighting a bug in either our jsonpath implementation or it's something here...
Companion PR: dolthub/vitess#240
MySQL docs: https://dev.mysql.com/doc/refman/8.0/en/json-table-functions.html
- 1806: improve conversion from JsonDocument to string
- 1804: Reworked implicit PK handling for referenced foreign keys
Builds on #1798, introduces an interface which exposes internal (and implicit) primary keys on indexes. Not all integrators will have the functionality, so it's an optional interface to expand compatibility. Also quite a bit simpler. - 1803: Added "utf8mb3_czech_ci" collation, fixed missing collation check for enum/set
Fixes #1801
Adds the requested collation, and fixes the panic. The panic came from an oversight when checking for a collation's implementation.enum
andset
use the collation during type creation, which occurs before we've verified the collation's implementation. The other string types do not use the collation during type creation, so we return the appropriate error as a result. - 1802: Update README.md to sync with _example/main.go
- 1799: show all indexes, and prevent creating indexes named
'primary'
We had some logic inSHOW CREATE TABLE
to preventPRIMARY KEY
s from showing up twice in because they are within tables IndexCollection. This logic relied on checking if all the columns in the index were part of the Primary Key. MySQL allows and showsSECONDARY INDEXES
that are identical to the primary key. It appears to differentiate them by naming the PK index "PRIMARY". Additionally, MySQL prevents users from creatingSECONDARY INDEXES
named"primary"
.
MySQL names PK index Primary
fix for: dolthub/dolt#6049 - 1798: include primary key in index mapping for foreign keys
MySQL does some behind the scenes magic and appends the primary key columns to secondary indexes.
This creates not obvious prefixes on secondary indexes for foreign keys.
A plus is that we save on creating secondary indexes when we don't need to.
Note: it appears this special prefix matching applies when looking for a secondary index on the referenced table, but not on the child table itself.
Fix for: dolthub/dolt#6038 - 1797: Add filters to memo
Scalar expressions added to memo along with scalar properties, expression ids, filter closures. Goal here is equivalent behavior to before, just with filters represented differently. Filter organization mostly mirrors the plan package, except scalar and relational expressions are both represented as expression groups here. Done in a rush, still back and forth on whether there should be an interface there.
Additionally:- scalar expressions added to memo along with scalar properties, expression id
- rewrites join planning and costing to use bitset representations of filters
- refactors codegen so definition files are yaml, source is compiled independently from target code
The organization is bit wonky b/c this should be using my name resolution symbol tables, and the entire tree should be memoized not just the join tree (used temporary solutions for the problems created by both of these).
Re: dolthub/dolt#5993
- 1796: Added a method to SystemVariable to let them be compared using their underlying types, used to this to fix least / greatest not working with system vars
Fixes dolthub/dolt#6022 - 1795: qualify json_table columns
Likely due to improved aliasing code, it's simple to qualify columns forjson_table
.
This PR- adds skipped tests for currently unsupported functionality for json_tables
for ordinality
nested paths
default
error
- adds prepared tests for existing json table script and query tests
- reorganizes json tests
MySQL docs for missing functionality: https://dev.mysql.com/doc/refman/8.0/en/json-table-functions.html
Fix for: dolthub/dolt#6004
- 1791: Functional dependencies
Functional dependencies track 1) key uniqueness, constant propagation, column equivalence, nullability sets.
This information is built bottom-up from tables scans through projections, and is used to answer certain questions about relational nodes:- What is the equivalence closure for a join condition?
- Are a set of filters redundant?
- Do a set of index expressions comprise a strict key for a LOOKUP_JOIN?
- Does a subquery decorrelation scope have a strict/null-safe key for an ANTI_JOIN?
- Are the grouping columns a strict key of the table (only_full_group_by is unnecessary)
- Is the relation sorted on a given column set? (is a Sort already enforced)
- Is a relation constant? (Max1Row)
Questions (1) and (3) contribute towards fixing this issue: dolthub/dolt#5993. Question (2) contributes to filter pruning. Question (4) is relevant for this issue: dolthub/dolt#5954.
This master's thesis explains how to build the derivation graph starting at page 113: https://cs.uwaterloo.ca/research/tr/2000/11/CS-2000-11.thesis.pdf. The graph is composed of(determinant) -> (dependent)
relationships on columns to track these properties. They color edges and nodes to differentiate constant, nullability, equivalence attributes. Any set of of columns uniquely determines the value of constants, so they have empty determinants:() -> (colSet)
. We differentiate strict keys (set of columns unique and non-nullable index) from lax keys (index that maybe be non-unique or nullable).
Cockroach implemented a version that uses flattened to/from sets rather than individual nodes for determinant/dependents, and makes optimizations for quickly computing candidate keys: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/props/func_dep.go.
My encoding is a little different. First, I assume that attributes trickle down from nullability -> constant -> equivalence -> functional dependencies. An FD built in this order simplifies the upstream additions in a way that avoids having to recompute dependency closures (ex: nullability, constant, and equiv columns don't recompute keys). Second, I assume FDs will be limited to primary and secondary key indexes; keys will have either strict or lax determinants, and the dependents are always assumed to be the rest of the table. So far this drops LEFT_JOIN right-equivalence relations that translate to lax-keys after the join, which could opportunistically be converted back to strict keys by downstream operators. If this was a mistake we can undo that, add back in dependent column sets.
We need to support a handful of operators to use FDs in the join memo:
- Table scan
- Cross join
- Inner join
- Left join
- Project (Distinct)
- Filter
Missing: - Full outer join
- Synthesized columns
Additionally: - the memo needs to embed equal filters in a format with expression ids
- join reordering should compute equivalence closures for join edges
- join selection should use functional dependencies to check if lookup expressions are valid
Missing practical considerations: - when we determine a lookup expression comprises a strict key for a table, we need a way to backfill constants and equivalences used to make that decision
- filters should maybe be represented in memo selection nodes to support redundancy elimination
- 1790: Reverting
@@system_time_zone
support for returning system timezone
I'm not sure yet why this change causes Dolt's MySQL Connector J integ test to fail, but after testing commits yesterday and testing this fix this morning, this is definitely causing some problem. In CI, but not locally, the MySQL Connector J integ test is failing with a generic/emptySQLException
each time, immediately after executing the first query. Dolt logs look fine, but there's obviously something happening that the MySQL Connector J can't deal with. I have a few more repro ideas that I'll follow up with, but wanted to get the GMS -> Dolt pipeline cleared up first.
Until then, this PR reverts@@system_time_zone
from returning the actual timezone in use by the runtime and OS.
This change was able to get the Dolt MySQL Connector integ suite passing again: https://github.com/dolthub/dolt/actions/runs/5059419161/jobs/9081665231 - 1789: foreign key index match should be on prefix not subset
Turns out the logic for using an existing secondary index for foreign keys was matching on a subset instead of a prefix.
For example:This should produce a child table that creates a new secondary index over the columns (fk1, fk2).fks> create table parent (fk1 int, fk2 int, primary key (fk1, fk2)); fks> create table child (id int unique, fk1 int, fk2, primary key (fk2, fk1, id), foreign key (fk1, fk2) references parent (fk1, fk2));
But, we don't do that and instead reference some other index (not sure what), causing errors when callingdolt constraints verify --all
This PR ensures that we only pick an existing secondary index as the underlying index for a foreign key if that index is a prefix for the foreign key. - 1788: Divide pushdown
Pushdown currently performs two sets of transforms.- Push filters as low in the tree as possible:
Filter(x=1) -> Join(Tablescan(x), Tablescan(y), x=y) => Join(Filter(x=1)->Tablescan(x), Tablescan(y), x=y)
- Convert filters into static index scans:
There are now two rules,Filter(x=1)->Tablescan(x) => Indexscan(x, x in [1,1])
pushFilters
andgenerateIndexScans
. RunningpushFilters
before join planning lets join planning make better use of functional dependencies.
Also:- Small rule reorganizations
- Small correctness changes exposed by refactor
- Remove pushdown tests
- Filter pushdown appears to work in more cases, see plan changes
sql.FilteredTable
interface is now unused, marked as deprecated but did not delete related functions
- 1787: Changes to USE and prepared statements
This introduces two changes to how databases are resolved:- USE statements now are handled by the Session with a new interface
- Tables in prepared statements now retain a copy of their Database implementation, rather than re-resolving it by name during execution.
Both of these changes are to support Dolt's new database name semantics.
- 1785: Copy MySQL
information_schema.st_spatial_reference_systems
table
This PR essentially copies theinformation_schema.st_spatial_reference_systems
table from MySQL.
Small refactor to move constants and help function totypes
package fromfunctions/spatial
package.
Additionally, changesshow create table
statement to print SRIDs inside a MySQL special comment.
Example:Partially Addresses this: dolthub/dolt#5973tmp> show create table t2; +-------+------------------------------------------------------------------+ | Table | Create Table | +-------+------------------------------------------------------------------+ | t2 | CREATE TABLE `t2` ( | | | `p` point /*!80003 SRID 2000 */, | | | `l` linestring /*!80003 SRID 2001 */ | | | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin | +-------+------------------------------------------------------------------+ 1 row in set (0.00 sec)
- 1781: Small aggregation perf improvements
- We used to make a new error object for every
cache.Get()
miss. Use shared object instead. - Add reuse pool for collation weight buffers rather than making thousands of
[4]byte
.
before:
- We used to make a new error object for every
- 1779: Add extra filters to AntiJoin to guarentee correct behavior around NULLs.
This is a correctness fix: generating AntiJoins is not currently equivalent to MySql if either column being used in thenot in
expression contains NULL.
This will break a lot of regression tests. If this doesn't break Turbine's tests, we should submit this while we work on a fix. - 1777: Add SRID 3857 to list of "supported" SRIDs
Theogr2ogr
tool performs some information gathering queries beforehand, which change the queries it generates later on to populate MySQL tables.
When runningogr2ogr -f MySQL "MYSQL:ogr,user=root,port=3307,host=127.0.0.1" "data.gpkg"
,ogr2ogr
sends these queriesHowever, Dolt'sSELECT VERSION(); SHOW TABLES; SELECT SRS_ID FROM INFORMATION_SCHEMA.ST_SPATIAL_REFERENCE_SYSTEMS WHERE ORGANIZATION = 'EPSG' AND ORGANIZATION_COORDSYS_ID = 3857; SELECT SRS_ID FROM INFORMATION_SCHEMA.ST_SPATIAL_REFERENCE_SYSTEMS WHERE DEFINITION = 'PROJCS["WGS 84 / Pseudo-Mercator",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.25722356 3,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Mercator_1SP"],PARAMETER["central_meridian",0],PARAMETER["scale_factor",1],PARAMETER["false_easting",0],PARAMETER[ "false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],EXTENSION["PROJ4","+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs"],AUTHORITY["EPSG","3857"]]'
information_schema.st_spatial_reference_system
only contains SRID 0 and 4326, as a result both queries ogr2ogr makes return Empty Set.
Consequently, this causes ogr2ogr to generate the create table queries like soWhen it should beCREATE TABLE `aeropuertos` ( fid INT UNIQUE NOT NULL AUTO_INCREMENT, geom GEOMETRY NOT NULL)
On initial insert, this causes ogr2ogr to insert with SRID 0CREATE TABLE `aeropuertos` ( fid INT UNIQUE NOT NULL AUTO_INCREMENT, geom GEOMETRY NOT NULL /*!80003 SRID 3857 */)
On further inserts/appends, this causes ogr2ogr to just not fill in SRID, leaving it as-2
akanot initialized
This PR adds SRID 3857 to dolt'sinformation_schema.st_spatial_reference_system
.
Now, the ogr2ogr command loudly fails with a useful error:Addresses these issues:ERROR 1: MySQL error message:unsupported feature: unsupported SRID value Description: CREATE TABLE `aeropuertos` ( fid INT UNIQUE NOT NULL AUTO_INCREMENT, geom GEOMETRY NOT NULL /*!80003 SRID 3857 */) ERROR 1: Terminating translation prematurely after failed translation of layer aeropuertos (use -skipfailures to skip errors)
- 1776: Returning
@@system_time_zone
based on OS time zone
The@@system_time_zone
global system variable is a read-only system variable that shows the timezone that the server is running in. MySQL and Golang both load the system timezone from the OS, typically using theTZ
env var or/etc/timezone
. This change exposes that system timezone information through the@@system_time_zone
variable, which was previously hardcoded toUTC
.
Because timezone settings can change while the system is running (thanks daylight savings time!), we need to check with the runtime for the system timezone whenever it is requested. I gaveSystemVariable
the ability to include a function that gets executed to return the sys var value and moved theuptime
sys var over to that, too. - 1775: Error for out of range SRIDs
This PR fixes an error inconsistency between Dolt and MySQL.
For SRIDs that are out of range[0, MAX_UINT32]
, MySQL throws aERROR: 1690: SRID value is out of range in '<func_name>'
.
For SRIDS that are in range, but just don't exist, MySQL throws aERROR: 3548: There's no spatial reference system with SRID <invalid srid>.
Dolt used to only throwERROR 3548
, and would incorrectly report the negative numbers as overflowed positiveuint32
s
partial fix for: dolthub/dolt#5948 - 1774: Added utf8mb3_tolower_ci
- 1773: Add plan option for
AntiJoin
s to beLeftJoin
This PR adds a plan option to convertANTI_JOIN
s toLEFT_JOIN
s with an additional filter condition.
In some cases, theLEFT_JOIN
performs better than theANTI_JOIN
.
Additionally, this PR adds a new join hint forLEFT_OUTER_LOOKUP_JOIN
s.
Fix for: dolthub/dolt#5825 - 1770: Union offset+limit bug
Unions were dropping limit, applying offset as limit. - 1768:
CREATE LIKE
statements should preserve check-constraints on the new table. - 1767: Timezone improvements
This PR starts handling some timezone differences between GMS and MySQL:- The
NOW()
andCURTIME()
functions now respect the sessions's current time zone - Allows
CONVERT_TZ
to mix use of timezone names (e.g.UTC
) and timezone offset strings (e.g.+00:00
) - Adds some skipped tests for
timestamp
input/output session timezone conversion
- The
- 1763: prevent large varbinary column from being created
We don't enforce column size limit forvarbinary
, causing panics.
fix for: dolthub/dolt#5059 - 1762: Add support for
ALTER TABLE <table> MODIFY COLUMN <col> <type> UNIQUE
This is a MySql syntax that is effectively syntactic sugar forALTER TABLE <table> MODIFY COLUMN <col> <type>; ALTER TABLE <table> ADD UNIQUE INDEX `<col>` (col);
- 1760: sql/core.go: SystemVariable: Add a NotifyChanged field which can be used to be notified when the variable value changes.
- 1759: server/server.go: Log a message when we start accepting connections.
- 1756: Allow all built-in functions to be used in column default value expressions
We currently maintain an allow list of functions that can be used in column default value expressions. MySQL has changed what they support in column default value expressions over time and the current support allows all built-in functions to be used:Literals, built-in functions (both deterministic and nondeterministic), and operators are permitted.
This PR removes our allow list so that any function registered as a built-in function can be used in a default expression. This has a couple of implications, which I think is overall a good tradeoff:- we don't have to maintain the allow list anymore (nice, but not a huge deal, since this hasn't been updated often)
- if customers typo a function name in a column default expression, they will get a better error message (this is what motivated this change; for more details see: dolthub/dolt#5887)
- any Dolt functions that we register as built-ins will be eligible for column default value expressions. I think this makes sense and might be useful for some customers (e.g. a column with a default value that inserts
active_branch()
to record the branch a row was created on) Related, any functions other GMS integrators register will also be eligible for use in column default value expressions.
My only concern with this change is that we don't currently distinguish between built-in functions and customer functions, which should not be allowed in column default value expressions. However, that seems like a moot point right now, since we don't supportCREATE FUNCTION
in GMS so it doesn't seem like there's a way to trigger that. I also added a TODO to the code to point out that when we do have support for user created functions, we need to check for that when in the resolve column defaults code.
Fixes dolthub/dolt#5887
- 1754: support
ALTER EVENT
statement
Support forALTER EVENT
statements excluding case for moving events across databases usingRENAME TO
clause.
Depends on dolthub/vitess#233 - 1751: Unions finalize subquery expressions
The SUBQUERY -> UNION -> SUBQUERY_EXPR pattern was subject to an edge case where the outer SUBQUERY disabled recursively finalizing the inner SUBQUERY_EXPR. The intervening UNION node blocked the outer reach, but failed to re-enable finalizeSubquery during finalizeUnion. This PR edits the finalizeUnionSelector to explicitly re-enable finalize nested subqueries.
This PR also toucheshoistOutOfScopeFilters
for the same edge case.
Issue: dolthub/dolt#5875 - 1747: Added latin1_spanish_ci and latin1_danish_ci collations
Fixes dolthub/dolt#5866 - 1746: update
json_set
support for edge cases
updated logic ofjson_set
to support more input edge cases - 1744: Adding tests for altering keyless tables
Added tests to cover basic keyless table column alterations, to test Dolt change in dolthub/dolt#5867 - 1743: Optimize division with integers by using floats internally instead of decimal.Decimal
Fix for dolthub/dolt#5832
Seems unlikely to break anything in Dolt, but I went ahead and ran Dolt CI tests: dolthub/dolt#5857 - 1742: convert sort over pks with index
This PR fixes some issues with the implementation of the rulereplaceSortPk
. Namely, the rule now works with bothTableAliases
,ColumnAliases
, andSubqueriyAliases
.
Additionally, the rule is able to be applied when sorting inDescending
order.
Overally, this means queries likeselect * from t order by pk desc limit 1
will just be aLimit
over anIndexTabledAccess
The rule does skip forDistinct
andJoin
nodes, but it might be possible to get it to work for those.
The plans withDistinct
causes issues withIndexedTableAccess
, but it definitely could work as PKs are distinct by definition.
It is also possible to get this optimization to work withJoin
s, but I'm not convinced on their correctness, especially across different kinds of joins, so I decided to not implement it here.
Fix for: dolthub/dolt#5812 - 1741: allow renaming views with
RENAME TABLE
statement
This is the same PR as #1712 (it was reverted)- allows renaming views using
RENAME TABLE
statement - fails on renaming views using
ALTER TABLE ... RENAME ...
statement
- allows renaming views using
- 1739: Populating the
Decimal
property in Field response metadata
Fixes dolthub/dolt#5834
In addition to the unit tests here, I'm also working on a Dolt PR to update the mysql connector library integration tests to test the Rubymysql2
library. - 1738: ICU Regex Implementation
To try and prevent memory leaks, I'm having the regex only work under a callback. The idea is that we'll do all of our matches under the callback, using a node placed by an analyzer rule. I think this approach should work, and it'll expand to any other functions within the regex that need to hold memory that will be freed later.
For now though, this portion works, and I have a small test showing such. - 1737: Coalesce.Type() needs to handle type types.Null
Coalesce.Type() only checks if its arguments to have typenil
. But NULL constants have typetypes.Null
, and we need to check for that too. - 1736: order multi-alter statements
When running multi-alter statements, MySQL reorders the alters.
This PR adds a sort function to organize these statements; the precedence order is currently:RENAME COLUMN
DROP COLUMN
ADD COLUMN
ALTER INDEX
Note: When this is supposed to work, it does, but when it doesn't our error messages are different than the ones returned by MySQL. There are existing alter tests that also return the incorrect error, but I don't think this is worth fixing right now.
I also unskipped one BrokenScriptTest and fix the expected results for the others.
Companion PR: dolthub/dolt#5831
- 1735: only hash
expression.UnresolvedColumn
for OnDuplicateExpressions
Special logic is used to qualify columns forInsertInto.OnDuplicateExprs
. Since qualify usestransform.OneNodeExprWithNode
it will try to qualify all expressions (Literals
,Tuples
, etc). This change makes it so that we only try to qualify Columns.
fix for: dolthub/dolt#5799 - 1731: sql/parse: expose utility functions to convert parsed index and check constraint definitions
- 1728: hashjoin indexing
When we have plan of the following format:The indexes we assign to GetFields during analysis don't align with the indexes of the actual columns in each row during execution time. This is a result of StripNode, PrependNode, and the nested Joins with SubqueryAlias.InSubquery ... CrossJoin Left: Table Right: SubqueryAlias OuterScopeVisibility: true ... HashJoin HashLookup source: ... target: TableAlias ...
This error wasn't caught sooner as the incorrect indexes are too low, so they never threwIndexOutOfBounds
errors and just returned potentially incorrect results instead.
The fix was to correct these indexes at analysis time.
Firstly,SubqueryAlias
nodes withOuterScopeVisibilty = true
inside joins need to see the left sibling node (in addition to the parent nodes). SoScope
was modified to include some new fields, specifically for sibling nodes. Additionally, the filefinalizeSubquery
was changed to track the parent as well, so we could detect when we're analyzing aSubqueryAlias
on the right side of a join, and add the left child to the scope.
Additionally, pushdownFilters was modified to not undo all the changes to the Analyzer for HashLookups.
At runtime, thePrependRow
nodes cache the rows outside theInSubquery
, while the buildJoinIter forCrossJoin
would include both the outside and the left row. This meant that some parts of the inner HashJoin would receive extra columns while others didn't. The fix here was to alter thescope.InJoin
depending on which parts of HashJoin we were building.
Lastly, to have these changes not break for PreparedStatements, we just needed to not redofinalizeUnions
inpostPrepared
, as we don't replan joins inpostPrepared
, so we don't know if we're in a join or not, and the correct indexes are set inprePrepared
.
Along the way, we discovered a query that panics, but the cause is different than the purpose of this fix, and it panicked the same way before these changes, so it is left as a skipped test.
Fix for: dolthub/dolt#5714 - 1727: implement json_set function
implements thejson_set
function with a few edge cases outstanding
fixes: dolthub/dolt#5680 - 1726: subquery indexing tests
- 1724: Fix README.md
Replace spaces with tabs in code indentation. - 1722: go.mod: Use dolthub/flatbuffers/v23 instead of google/flatbuffers.
- 1721: go.mod: Move oliveagle/jsonpath -> dolthub/jsonpath.
- 1720: memory: extract rangeFilterExpr into expression package
rangeFilterExpr contains a complex set of logic to build a sql expression given the list of sql ranges and the list of expressions on the index.
Extract the majority of this function into NewRangeFilterExpr in the expression package. Replace the Or and And helper functions with JoinOr and JoinAnd.
Update the call in the memory/ package to call the new expression function.
Fix JoinOr and JoinAnd to check if the expressions are nil.
Remove ineffective nil check in JoinAnd.
Move nil checks into NewOr and NewAnd.
Add comment explaining parameters to NewFilterRangeExpr. - 1719: Added serving tray and bowtie
vitess
- 251: Fixed missing support for collations as strings
Fixes issue dolthub/dolt#6192
We only allowed collations to be declared after theCREATE TABLE
portion in their non-string form. This adds support for the string form. - 249: Various small parser improvements
Various small parser improvements:- Allow column definitions to use the MySQL
INVISIBLE
keyword. The implementation still ignores theINVISIBLE
keyword, but it will no longer cause a parser error. - Support
DOUBLE
andFLOAT
in theCAST
andCONVERT
functions. - Allow a trigger body to be a single
CALL
statement.
- Allow column definitions to use the MySQL
- 248: Support for index hint in foreign key definition
- 247: Support FK definitions inline in column definitions
Adds support for declaring FK references inline in column definitions. Does not supportON DELETE
andON UPDATE
yet. Example:ALTER TABLE t ADD COLUMN col2 int REFERENCES other_table(id);
Also cleaned up a few rules around non-reserved keywords to enableevent
to be used unquoted inALTER TABLE
statements. - 246: support
CREATE SPATIAL REFERENCE SYSTEM ...
syntax
Syntax for: dolthub/dolt#6002 - 245: allow
event
as table and column name
The PR allowsEVENT
non-reserved keyword to be used as table and column name without quoting.
The missing edge case includes usingEVENT
for user name or host name. - 244: parse
table_functions
with aliases
Syntax support for: dolthub/dolt#5928 - 243: Add ignore/replace modifiers to load data
- 242: allow
EVENTS
to be parsed as non-reserved keyword
TransferredEVENTS
keywords intonon_reserved_keyword
list, allowing statements usinginformation_schema.events
table to parse.
For some reasonEVENT
cannot be transferred intonon_reserved_keyword
, causing shift/reduce and reduce/reduce conflicts. - 241: Walking sub-nodes for
SHOW TABLE
statements
When preparing aSHOW TABLES
statement with a bound variable in the filter clause (e.g.SHOW TABLES FROM mydb WHERE Tables_in_mydb = ?;
) GMS and Vitess were identifying the bound variable parameters differently and causing the SQL client on the other end to panic. Vitess code inconn.go
walks the parsed tree and looks forSQLVal
instances to identify the parameters and then returns that metadata over the SQL connection. TheSHOW TABLES
statement above fails because thesqlparser
AST wasn't including all the members ofSHOW TABLES
node in the walk. This case is a little tricky to test directly in go-mysql-server, because it only repros in a running sql-server when running over a Vitessconn
.
The GMS and Vitess layers are both calculating bind variable metadata, with two different techniques, and whenever they get out of sync, we will see issues like this that only appear when running over a SQL connection. Longer term, we may want consider allowing GMS to return its bind variable metadata and avoid Vitess needing to re-calculate it, if we see more instances of this problem.
Fixes: #1793 - 240: Support more
JSON_TABLE
functionality
Source: https://dev.mysql.com/doc/refman/8.0/en/json-table-functions.htmlNote: the MySQL docs indicate thatJSON_TABLE( expr, path COLUMNS (column_list) ) [AS] alias column_list: column[, column][, ...] column: name FOR ORDINALITY | name type PATH string path [on_empty] [on_error] | name type EXISTS PATH string path | NESTED [PATH] path COLUMNS (column_list) on_empty: {NULL | DEFAULT json_string | ERROR} ON EMPTY on_error: {NULL | DEFAULT json_string | ERROR} ON ERROR
PATH
is optional in theNESTED
case, but it doesn't seem that way.
I chose to follow what they say rather than what they do. - 238: Fix for charset introducers in default values
Fixes dolthub/dolt#5970 by adding an additional default expression rule that handles charset introducers. - 237: Small fix for DEFAULT CHARSET when creating tables
Super small fix for dolthub/dolt#5749. We were referencing the wrong variable. - 235: Add support for
INSERT INTO <table> VALUE ...
;
Adds support forVALUE
as a synonym ofVALUES
inINSERT INTO <table> VALUES ...
, to match MySQL's syntax.
Fixes #1750 - 234: go/netutil/conn.go: Avoid panicing when ConnWithTimeouts has a Set{,}Deadline method called.
Clients does not expect setting deadlines on connections to panic. In particular, the standard library's TLS implementation adopts an existing net.Conn and will call SetWriteDeadline on it in certain cases.
It makes more sense to allow the deadlines to be managed by the client when they see fit. This changes the behavior to simply forward the deadlines along as soon as the client code has shown an interest in managing the deadlines. - 233: support parsing
ALTER EVENT
statements
Supports parsingALTER EVENT
statements - 231: support 'show events' statement parsing
- Added support for
SHOW EVENTS
statement parsing - Added support for
SHOW CREATE EVENT
statement parsing - Removed
FULL
option fromSHOW TRIGGERS
as it's not supported in MySQL.
- Added support for
Closed Issues
- 1800: Extracting update query columns
- 1801: Creating an enum column with collation utf8_czech_ci causes panic
- 1793: PrepareStatement got incorrect paramsCount in response packet from mock mysql server.
- 1749: [Feature Request] Support for JSON operator
- 1771: utf8mb3_tolower_ci not implemented
- 1750: [Feature Request] Support for
INSERT INTO ... VALUE