-
across(everything())
doesn't select grouping columns created via.by
insummarise()
(@mgirlich, #1493). -
mutate()
now supports the argument.keep = "transmute"
.
-
Specification of table names with schema/catalogs has been overhauled to make it simpler. This includes the following features and fixes:
-
The simplest way to refer to a qualified table is now to wrap it in
I()
, e.g.I("schema_name.table_name")
. -
Use of
sql()
andident_q()
insidein_catalog()
andin_schema()
is once again supported (#1388). -
It's ok to use
ident_q()
once again (#1413) and you should no longer see unsuppressable warnings about usingin_schema()
(#1408). -
The names of the arguments to
Id()
no longer matter, only their order (#1416). Additionally, thanks to changes to the DBI package, you no longer need to name each argument. -
If you accidentally pass a named vector to any of the database identifer functions, those names will be automatically stripped (#1404).
-
tbl_sql(check_from)
is now deprecated.
-
-
dbplyr now exports some tools to work with the internal
table_path
class which is useful for certain backends that need to work with this data structure (#1300).
-
New translations for clock functions
add_years()
,add_days()
,date_build()
,get_year()
,get_month()
,get_day()
, andbase::difftime()
on SQL server, Redshift, Snowflake, and Postgres. -
select()
will keep computed columns used toarrange()
subqueries that are eliminated by a subsequent select (@ejneer, #1437). -
semi_join()
will no longer inline away an aggregate filter (i.e.HAVING
clause) that was followed by aselect()
(@ejneer, #1474) -
Improved function translations:
-
Functions qualified with the base namespace are now also translated, e.g.
base::paste0(x, "_1")
is now translated (@mgirlich, #1022). -
-1 + x
now generates a translation instead erroring (#1420). -
x$name
never attempts to evaluatename
(#1368). -
You can once again use
NULL
on the LHS of an infix operator in order to generate SQL with unusual syntax (#1345). -
Namespaced calls now error if the function doesn't exist, or a translation is not available (#1426).
-
lead()
translation coercesn
to an integer.
-
-
Databricks: now supports creating non-temporary tables too (#1418).
-
Oracle:
-
db_explain()
now works (@thomashulst, #1353). -
as.Date()
works when applied to a string (#1389). -
head()
is once again translated toFETCH FIRST
. This does require Oracle 12c or newer, but it actually works, compared to the approach usingROWNUM
from #1292 (#1436). -
Added support for
str_replace()
andstr_replace_all()
viaREGEXP_REPLACE()
(@thomashulst, #1402).
-
-
Snowflake (@nathanhaigh, #1406)
-
Added support for
str_starts()
andstr_ends()
viaREGEXP_INSTR()
-
Refactored
str_detect()
to useREGEXP_INSTR()
so now supports regular expressions. -
Refactored
grepl()
to useREGEXP_INSTR()
so now supports case-insensitive matching throughgrepl(..., ignore.case = TRUE)
-
-
SQL server:
-
Now products a clear error if you attempt to use
n_distinct()
inmutate()
orfilter()
(#1366). -
filter()
does a better job of converting logical vectors from bit to boolean (@ejneer, #1288).
-
-
MySQL:
as.integer()
gets correct translation (@krlmlr, #1375).
-
Deprecation status of functions deprecated in previous versions (at least 2 years old) have been advanced. In particular,
src_sql()
is now defunct, as is the use ofpartial_eval()
with characterdata
. -
Database errors now show the generated SQL, which hopefully will make it faster to track down problems (#1401).
-
When dbplyr creates an index on a table in a schema (e.g.
schema.table
), it now only includes the table name in the index name, not the schema name. -
The class of remote sources now includes all S4 class names, not just the first (#918).
-
compute()
passes additional arguments all the way tosql_query_save()
-methods (@rsund). -
db_sql_render()
correctly passes on...
when re-calling withsql_options
set (#1394). -
reframe()
now gives an informative error that it isn't supported (#1148). -
rows_patch(in_place = FALSE)
now works when more than one column should be patched (@gorcha, #1443). -
New
simulate_mariadb()
(@krlmlr, #1375). -
sql_translator()
now checks for duplicated definitions (@krlmlr, #1374).
-
Using
compute(temporary = FALSE)
without providing a name is now deprecated (@mgirlich, #1154). -
ntile()
's first argument has been renamed fromorder_by
tox
to match the interface ofdplyr::ntile()
(@mgirlich, #1242). -
simulate_vars()
andsimulate_vars_is_typed()
were removed as they weren't used and tidyselect now offerstidyselect_data_proxy()
andtidyselect_data_has_predicates()
(@mgirllich, #1199). -
sql_not_supported()
now expects a function name without parentheses. -
sql_query_append()
,sql_query_insert()
,sql_query_update()
,sql_query_upsert()
, andsql_query_delete()
changed their arguments to make them more consistent to the othersql_query_*()
functions:x_name
was renamed totable
.y
was renamed tofrom
and must now be a table identifier or SQL instead of a lazy table.sql_query_append()
andsql_query_insert()
have gained the argumentcols
.
-
remote_name()
now returns a string with the name of the table. To get the qualified identifier use the newly addedremote_table()
(@mgirlich, #1280). -
tbl_lazy()
losessrc
argument after it has been deprecated for years (@mgirlich, #1208). -
translate_sql()
now requires thecon
argument (@mgirlich, #1311). Thevars
argument has been removed after it threw an error for the last 7 years (@mgirlich).
-
Preliminary databricks Spark SQL backend (#1377).
-
Joins
-
*_join()
afterfull_join()
works again (@mgirlich, #1178). -
*_join()
now allows specifying the relationship argument. It must beNULL
or"many-to-many"
(@bairdj, #1305). -
Queries now qualify
*
with the table alias for better compatibility (@mgirlich, #1003). -
full_join()
can now handle column names that only differ in case (@ejneer, #1255). -
The
na_matches
argument ofsemi_join()
andanti_join()
works again (@mgirlich, #1211). -
A
semi/anti_join()
on fitleredy
is inlined when possible (@mgirlich, #884). -
Joins now work again for Pool and Oracle connections (@mgirlich, #1177, #1181).
-
-
A sequence of
union()
resp.union_all()
now produces a flat query instead of subqueries (@mgirlich, #1269). -
Added translations for:
nzchar()
(@MichaelChirico, @mgirlich, #1094).str_detect()
,str_starts()
andstr_ends()
with fixed patterns (@mgirlich, #1009).runif()
(@mgirlich, #1200).
-
if_any()
andif_all()
translations are now wrapped in parentheses. This makes sure it can be combined via&
with other conditions (@mgirlich, #1153). -
nth()
,first()
, andlast()
now support thena_rm
argument (@mgirlich, #1193).
-
across()
now supports namespaced functions, e.g.across(x, dplyr::dense_rank)
(@mgirlich, #1231). -
db_copy_to(overwrite = TRUE)
now actually works. -
db_copy_to()
's...
are now passed todb_write_table()
(@mgirlich, #1237). -
Added
db_supports_table_alias_with_as()
to customise whether a backend supports specifying a table alias withAS
or not (@mgirlich). -
db_write_table()
anddb_save_query()
gain theoverwrite
argument. -
dbplyr_pivot_wider_spec()
is now exported. Unlikepivot_wider()
this can be lazy. Note that this will be removed soon afterpivot_wider_spec()
becomes a generic (@mgirlich). -
filter()
ing with window functions now generates columns calledcol01
rather thanq01
(@mgirlich, #1258). -
pivot_wider()
now matches tidyrNA
column handling (@ejneer #1238). -
select()
can once again be used afterarrange(desc(x))
(@ejneer, #1240). -
show_query()
andremote_query()
gain the argumentsql_options
that allows to control how the SQL is generated. It can be created viasql_options()
which has the following arguments:cte
: use common table expressions?use_star
: useSELECT *
or explicitly select every column?qualify_all_columns
: qualify all columns in a join or only the ambiguous ones? (@mgirlich, #1146).
Consequently the
cte
argument ofshow_query()
andremote_query()
has been deprecated (@mgirlich, #1146). -
slice_min/max()
can now order by multiple variables like dplyr, e.g. useslice_min(lf, tibble(x, y))
(@mgirlich, #1167). -
slice_*()
now supports the data masking pronouns.env
and.data
(@mgirlich, #1294). -
sql_join_suffix()
gains the argumentsuffix
so that methods can check whether the suffix is valid for the backend (@mgirlich). -
sql_random()
is now deprecated. It was used to powerslice_sample()
which is now done via the translation forrunif()
(@mgirlich, #1200). -
tbl()
now informs when the user probably forgot to wrap the table identifier within_schema()
orsql()
(@mgirlich, #1287).
-
Access
- Added translation for
!=
to<>
(@erikvona, #1219).
- Added translation for
-
DuckDB
- now supports the
returning
argument ofrows_*()
.
- now supports the
-
MySQL/MariaDB:
rows_update()
androws_patch()
now give an informative error when the unsupportedreturning
argument is used (@mgirlich, #1279).rows_upsert()
now gives an informative error that it isn't supported (@mgirlich, #1279).rows_*()
use the column types ofx
when auto copyingy
(@mgirlich, #1327).copy_inline()
now works (@mgirlich, #1188).- Fix translation of
as.numeric()
,as.POSIXct()
,as_datetime()
, andas.integer64()
(@avsdev-cw, #1189).
-
MS SQL:
row_number()
now works when no order is specified (@ejneer, @fh-mthomson, #1332)
-
Oracle:
- Fix translation of
rows_upsert()
(@mgirlich, @TBlackmore, #1286) head(n)
is now translated toWHERE ROWNUM <= n
to also support old versions <= 11.2 (@JeremyPasco, #1292).
- Fix translation of
-
Postgres
- The
rows_*()
functions now also work inside a transaction (@mgirlich, #1183).
- The
-
SQLite
- Subqueries now also get an alias. This makes it consistent with other backends and simplifies the implementation.
-
SQL Server
distinct(.keep_all = TRUE)
now works (@mgirlich, #1053).- The translation of
between()
now also works when used inmutate()
(@mgirlich, #1241). any()
andall()
now work (@ejneer, #1273).- Fixed negation of bit (boolean) fields (@ejneer, #1239)
-
Snowflake:
na.rm = TRUE
is now respected inpmin()
andpmax()
instead of being silently ignored (@fh-mthomson, #1329)row_number()
now works when no order is specified (@fh-mthomson, #1332)
-
Teradata
distinct()
+head()
now work (@mgirlich, #685).as.Date(x)
is now translate toCAST(x AS DATE)
again unlessx
is a string (@mgirlich, #1285).row_number()
no longer defaults to partitioning by groups (now aligned with other databases when no order is specified:ROW_NUMBER()
defaults toORDER BY (SELECT NULL)
) (@fh-mthomson, #1331)
- Hot patch release to resolve R CMD check failures.
- Hot patch to fix R CMD check issues
- Hot patch to fix R CMD check issues
window_order()
now only accepts bare symbols or symbols wrapped indesc()
. This breaking change is necessary to allowselect()
to drop and rename variables used inwindow_order()
(@mgirlich, #1103).
-
quantile()
andmedian()
now error for SQL Server when used insummarise()
and for PostgreSQL when used inmutate()
as they can't be properly translated (@mgirlich, #1110). -
Added an informative error for unsupported join arguments
unmatched
andmultiple
(@mgirlich). -
Using predicates, e.g.
where(is.integer)
, inacross()
now produces an error as they never worked anyway (@mgirlich, #1169). -
Catch unsupported argument
pivot_wider(id_expand = TRUE)
andpivot_longer(cols_vary)
(@mgirlich, #1109).
-
Fixed an issue when using a window function after a
summarise()
andselect()
(@mgirlich, #1104). -
Fixed an issue when there where at least 3 joins and renamed variables (@mgirlich, #1101).
-
mutate()
andselect()
afterdistinct()
now again produce a subquery to generate the correct translation (@mgirlich, #1119, #1141). -
Fixed an issue when using
filter()
on a summarised variable (@mgirlich, #1128). -
mutate()
+filter()
now again produces a new query if themutate()
uses a window function or SQL (@mgirlich, #1135). -
across()
andpick()
can be used (again) indistinct()
(@mgirlich, #1125). -
The
rows_*()
function work again for tables in a schema in PostgreSQL (@mgirlich, #1133).
-
sql()
now evaluates its arguments locally also when used inacross()
(@mgirlich, #1039). -
The rank functions (
row_number()
,min_rank()
,rank()
,dense_rank()
,percent_rank()
, andcume_dist()
) now support multiple variables by wrapping them intibble()
, e.g.rank(tibble(x, y))
(@mgirlich, #1118). -
pull()
now supports the argumentname
(@mgirlich, #1136). -
Added support for
join_by()
added in dplyr 1.1.0 (@mgirlich, #1074). -
Using
by = character()
to perform a cross join is now soft-deprecated in favor ofcross_join()
. -
full_join()
andright_join()
are now translated directly toFULL JOIN
andRIGHT JOIN
for SQLite as native support was finally added (@mgirlich, #1150). -
case_match()
now works with strings on the left hand side (@mgirlich, #1143). -
The rank functions (
row_number()
,min_rank()
,rank()
,dense_rank()
,percent_rank()
, andcume_dist()
) now work again for variables wrapped indesc()
, e.g.row_number(desc(x))
(@mgirlich, #1118). -
Moved argument
auto_index
after...
in*_join()
(@mgirlich, #1115). -
Removed dependency on assertthat (@mgirlich, #1112).
-
across()
now uses the original value when a column is overridden to match the behaviour of dplyr. For examplemutate(df, across(c(x, y), ~ .x / x))
now producesSELECT `x` / `x` AS `x`, `y` / `x` AS `y` FROM `df`
instead of
SELECT `x`, `y` / `x` AS `y` FROM ( SELECT `x` / `x` AS `x`, `y` FROM `df` )
(@mgirlich, #1015).
-
Restricted length of table aliases to avoid truncation on certain backends (e.g., Postgres) (@fh-mthomson, #1096)
- Compatibility with purrr 1.0.0 (@mgirlich, #1085).
-
stringr::str_like()
(new in 1.5.0) is translated to the closestLIKE
equivalent (@rjpat, #509) -
In preparation for dplyr 1.1.0:
- The
.by
argument is supported (@mgirlich, #1051). - Passing
...
toacross()
is deprecated because the evaluation timing of...
is ambiguous. Now instead of (e.g.)across(a:b, mean, na.rm = TRUE)
useacross(a:b, \(x) mean(x, na.rm = TRUE)
pick()
is translated (@mgirlich, #1044).case_match()
is translated (@mgirlich, #1020).case_when()
now supports the.default
argument (@mgirlich, #1017).
- The
-
Variables that aren't found in either the data or in the environment now produce an error (@mgirlich, #907).
-
dbplyr now produces fewer subqueries resulting in shorter, more readable, and, in some cases, faster SQL. The following combination of verbs now avoids a subquery if possible:
*_join()
+select()
(@mgirlich, #876).select()
+*_join()
(@mgirlich, #875).mutate()
+filter()
andfilter()
+filter()
(@mgirlich, #792).distinct()
(@mgirlich, #880).summarise()
+filter()
now translates toHAVING
(@mgirlich, #877).left/inner_join()
+left/inner_join()
(@mgirlich, #865).
-
dbplyr now uses
SELECT *
after a join instead of explicitly selecting every column, where possible (@mgirlich, #898). -
Joins only use the table aliases ("LHS" and "RHS") if necessary (@mgirlich).
-
When using common table expressions, the results of joins and set operations are now reused (@mgirlich, #978).
-
Many errors have been improved and now show the function where the error happened instead of a helper function (@mgirlich, #907).
-
Errors produced by the database, e.g. in
collect()
orrows_*()
, now show the verb where the error happened (@mgirlich). -
window_order()
now produces a better error message when applied to a data frame (@mgirlich, #947). -
Using a named
across()
now gives a clear error message (@mgirlich, #761).
-
Keyword highlighting can now be customised via the option
dbplyr_highlight
. Turn it off viaoptions(dbplyr_highlight = FALSE)
or pass a custom ansi style, e.g.options(dbplyr_highlight = cli::combine_ansi_styles("bold", "cyan"))
(@mgirlich, #974). -
The rank functions (
row_number()
,min_rank()
,rank()
,dense_rank()
,percent_rank()
, andcume_dist()
) now give missing values the rank NA to match the behaviour of dplyr (@mgirlich, #991). -
NA
s inblob()
s are correctly translated toNULL
(#983). -
copy_inline()
gains atypes
argument to specify the SQL column types (@mgirlich, #963). -
cur_column()
is now supported (@mgirlich, #951). -
distinct()
returns columns ordered the way you request, not the same as the input data (@mgirlich). -
fill()
can now fill "downup" and "updown" (@mgirlich, #1057), and now order by non-numeric columns also in the up direction (@mgirlich, #1057). -
filter()
now works when using a window function and an external vector (#1048). -
group_by()
+ renamed columns works once again (@mgirlich, #928). -
last()
is correctly translated when no window frame is specified (@mgirlich, #1063). -
setOldClass()
uses a namespace, fixing an installation issue (@mgirlich, #927). -
sql()
is now translated differently. The...
are now evaluated locally instead of being translated withtranslate_sql()
(@mgirlich, #952).
-
HANA:
- Correctly translates
as.character()
(#1027). copy_inline()
now works for Hana (#950)
- Correctly translates
-
MySQL:
str_flatten()
usescollapse = ""
by default (@fh-afrachioni, #993)
-
Oracle:
slice_sample()
now works for Oracle (@mgirlich, #986).copy_inline()
now works for Oracle (#972)
-
PostgreSQL:
- Generates correct literals for Dates (#727).
str_flatten()
usescollapse = ""
by default (@fh-afrachioni, #993)rows_*()
use the column types ofx
when auto copying (@mgirlich, #909).
-
Redshift:
round()
now respects thedigits
argument (@owenjonesuob, #1033).- No longer tries to use named windows anymore (@owenjonesuob, #1035).
copy_inline()
now works for Redshift (#949, thanks to @ejneer for an initial implementation).str_flatten()
usescollapse = ""
by default (@fh-afrachioni, #993)
-
Snowflake:
-
numeric functions:
all()
,any()
,log10()
,round()
,cor()
,cov()
andsd()
. -
date functions:
day()
,mday()
,wday()
,yday()
,week()
,isoweek()
,month()
,quarter()
,isoyear()
,seconds()
,minutes()
,hours()
,days()
,weeks()
,months()
,years()
andfloor_date()
. -
string functions:
grepl()
,paste()
,paste0()
,str_c()
,str_locate()
,str_detect()
,str_replace()
,str_replace_all()
,str_remove()
,str_remove_all()
,str_trim()
,str_squish()
andstr_flatten()
(@fh-afrachioni, #860). -
str_flatten()
usescollapse = ""
by default (@fh-afrachioni, #993) -
SQLite:
quantile()
gives a better error saying that it is not supported (@mgirlich, #1000).
-
SQL server:
as.POSIXct()
now translated correctly (@krlmlr, #1011).median()
now translated correctly (#1008).pivot_wider()
works again for MS SQL (@mgirlich, #929).- Always use 1 and 0 as literals for logicals (@krlmlr, #934).
-
Teradata:
- Querying works again. Unfortunately, the fix requires every column to once again by explicitly selected (@mgirlich, #966).
- New translations for
as.Date()
,week()
,quarter()
,paste()
,startsWith()
,row_number()
,weighted.mean()
,lead()
,lag()
, andcumsum()
(@overmar, #913).
-
Querying Oracle databases works again. Unfortunately, the fix requires every column to be explicitly selected again (@mgirlich, #908).
-
semi_join()
andanti_join()
work again for Spark (@mgirlich, #915). -
str_c()
is now translated to||
in Oracle (@mgirlich, #921). -
sd()
,var()
,cor()
andcov()
now give clear error messages on databases that don't support them. -
any()
andall()
gain default translations for all backends.
-
SQL formatting has been considerably improved with new wrapping and indenting.
show_query()
creates more readable queries by printing the keywords in blue (@mgirlich, #644). When possible dbplyr now usesSELECT *
instead of explicitly selecting every column (@mgirlich). -
Added support for
rows_insert()
,rows_append()
,rows_update()
,rows_patch()
,rows_upsert()
, androws_delete()
(@mgirlich, #736). -
Added
copy_inline()
as acopy_to()
equivalent that does not need write access (@mgirlich, #628). -
remote_query()
,show_query()
,compute()
andcollect()
have an experimentalcte
argument. IfTRUE
the SQL query will use common table expressions instead of nested queries (@mgirlich, #638). -
New
in_catalog()
, which works likein_schema()
, but allows creation of table identifiers consisting of three components: catalog, schema, name (#806, @krlmlr).
-
When possible, dbplyr now uses
SELECT *
instead of explicitly selecting every column (@mgirlich). -
New translation for
cut()
(@mgirlich, #697). -
Improved translations for specific backends:
as.Date()
for Oracle (@mgirlich, #661).case_when()
with a final clause of the formTRUE ~ ...
usesELSE ...
for SQLite (@mgirlich, #754).day()
,week()
,isoweek()
, andisoyear()
for Postgres (@mgirlich, #675).explain()
for ROracle (@mgirlich).fill()
for SQL Server (#651, @mgirlich) and RPostgreSQL (@mgirlich).quantile()
for SQL Server (@mgirlich, #620).str_flatten()
for Redshift (@hdplsa, #804)slice_sample()
for MySQL/MariaDB and SQL Server (@mgirlich, #617).union()
for Hive (@mgirlich, #663).
-
The backend function
dbplyr_fill0()
(used for databases that lackIGNORE NULLS
support) now respects database specific translations (@rsund, #753). -
Calls of the form
stringr::foo()
orlubridate::foo()
are now evaluated in the database, rather than locally (#197). -
Unary plus (e.g.
db %>% filter(x == +1)
) now works (@mgirlich, #674). -
is.na()
,ifelse()
,if_else()
,case_when()
, andif()
generate slightly more compact SQL (@mgirlich, #738). -
if_else()
now supports themissing
argument (@mgirlich, #641). -
n()
now respects the window frame (@mgirlich, #700). -
quantile()
no longer errors when using thena.rm
argument (@mgirlich, #600). -
remote_name()
now returns a name in more cases where it makes sense (@mgirlich, #850). -
The partial evaluation code is now more aligned with
dtplyr
. This makes it easier to transfer bug fixes and new features from one package to the other. In this process the second argument ofpartial_eval()
was changed to a lazy frame instead of a character vector of variables (@mgirlich, #766). Partially evaluated expressions with infix operations are now correctly translated. For exampletranslate_sql(!!expr(2 - 1) * x)
now works (@mgirlich, #634).
-
New
pillar::tbl_format_header()
method for lazy tables: Printing a lazy table where all rows are displayed also shows the exact number of rows in the header. The threshold is controlled bygetOption("pillar.print_min")
, with a default of 10 (#796, @krlmlr). -
The 1st edition extension mechanism is formally deprecated (#507).
-
across()
,if_any()
andif_all()
now defaults to.cols = everything()
(@mgirlich, #760). If.fns
is not providedif_any()
andif_all()
work like a parallel version ofany()
/any()
(@mgirlich, #734). -
across()
,if_any()
, andif_all()
can now translate evaluated lists and functions (@mgirlich, #796), and accept the name of a list of functions (@mgirlich, #817). -
Multiple
across()
calls inmutate()
andtransmute()
can now access freshly created variables (@mgirlich, #802). -
add_count()
now doesn't change the groups of the input (@mgirlich, #614). -
compute()
can now handle whenname
is named by unnaming it first (@mgirlich, #623), and now works whentemporary = TRUE
for Oracle (@mgirlich, #621). -
distinct()
now supports.keep_all = TRUE
(@mgirlich, #756). -
expand()
now works in DuckDB (@mgirlich, #712). -
explain()
passes...
to methods (@mgirlich, #783), and works for Redshift (@mgirlich, #740). -
filter()
throws an error if you supply a named argument (@mgirlich, #764). -
Joins disambiguates columns that only differ in case (@mgirlich, #702). New arguments
x_as
andy_as
allow you to control the table alias used in SQL query (@mgirlich, #637). Joins withna_matches = "na"
now work for DuckDB (@mgirlich, #704). -
mutate()
andtransmute()
use named windows if a window definition is used at least twice and the backend supports named windows (@mgirlich, #624). -
mutate()
now supports the arguments.keep
,.before
, and.after
(@mgirlich, #802). -
na.rm = FALSE
only warns once every 8 hours across all functions (#899). -
nesting()
now supports the.name_repair
argument (@mgirlich, #654). -
pivot_longer()
can now pivot a column namedname
(@mgirlich, #692), can repair names (@mgirlich, #694), and can work with multiplenames_from
columns (@mgirlich, #693). -
pivot_wider(values_fn = )
andpivot_longer(values_transform = )
can now be formulas (@mgirlich, #745). -
pivot_wider()
now supports the argumentsnames_vary
,names_expand
, andunused_fn
(@mgirlich, #774). -
remote_name()
now returns a name in more cases where it makes sense (@mgirlich, #850). -
sql_random()
is now exported. -
ungroup()
removes variables in...
from grouping (@mgirlich, #689). -
transmute()
now keeps grouping variables (@mgirlich, #802).
-
New support for Snowflake (@edgararuiz)
-
compute()
,sql_table_index()
, andsql_query_wrap()
now work with schemas (@mgirlich, #595). -
if_any()
andif_all()
are now translated. -
group_by()
now ungroups when the dots argument is empty and.add
isFALSE
(@mgirlich, #615). -
sql_escape_date()
andsql_escape_datetime
gain methods for MS Access (@erikvona, #608).
-
Thanks to @mgirlich, dbplyr gains support for key verbs from tidyr:
pivot_longer()
(#532),pivot_wider()
(#543),expand()
(#538),complete()
(#538),replace_na()
(#538),fill()
(#566). -
@mgirlich is now a dbplyr author in recognition of his significant and sustained contributions.
-
across()
implementation has been rewritten to support more inputs: it now translates formulas (#525), works with SQL functions that don't have R translations (#534), and work withNULL
(#554) -
summarise()
now supports argument.groups
(@mgirlich, #584).
-
All backends:
str_sub()
,substr()
andsubstring()
get better translations (#577). Most importantly, the results of using negative locations should match the underlying R implementations more closely. -
MS SQL:
-
as.integer()
andas.integer64()
translations cast first toNUMERIC
to avoid CASTing weirdness (@DavidPatShuiFong, #496). -
Assumes a boolean context inside of
[
(#546) -
str_sub()
withend = -1
now works (#577).
-
-
Redshift:
lag()
andlead()
lose thedefault
parameter since it's not supported (@hdplsa, #548). -
SQLite: custom translation of
full_join()
andright_join()
(@mgirlich, #536).
-
RPostgreSQL backend warns if
temporary = TRUE
since temporary tables are not supported byRPostgreSQL::dbWriteTable()
(#574). -
count()
method provides closer match to dplyr semantics (#347). -
distinct()
now respects grouping (@mgirlich, #535). -
db_connection_describe()
no longer uses partial matching (@mgirlich, #564). -
pull()
no longerselect()
s the result when there's already only one variable (#562). -
select()
no longer relocates grouping variables to the front (@mgirlich, #568). and informs when adding missing grouping variables (@mgirlich, #559). -
tbl.src_dbi(...)
now passed on totbl_sql()
(#530).
-
across()
is now translated into individual SQL statements (#480). -
rename()
andselect()
support dplyr 1.0.0 tidyselect syntax (apart from predicate functions which can't easily work on computed queries) (#502). -
relocate()
makes it easy to move columns (#494) andrename_with()
makes it easy to rename columns programmatically (#502). -
slice_min()
,slice_max()
, andslice_order()
are now supported.slice_head()
andslice_tail()
throw clear error messages (#394)
-
Documentation has been radically improved with new topics for each major verb and each backend giving more details about the SQL translation.
-
intersect()
,union()
andsetdiff()
gain anall
argument to add theALL
argument (#414). -
Join functions gains a
na_matches
argument that allows you to control whether or notNA
(NULL
) values match otherNA
values. The default is"never"
, which is the usual behaviour in databases. You can setna_matches = "na"
to match R's usual join behaviour (#180). Additional arguments error (instead of being silently swallowed) (#382). -
Joins now only use aliases where needed to disambiguate columns; this should make generated queries more readable.
-
Subqueries no longer include an
ORDER BY
clause. This is not part of the SQL spec, and has very limited support across databases. Now such queries generate a warning suggesting that you move yourarrange()
call later in the pipeline (#276). (There's one exception:ORDER BY
is still generated ifLIMIT
is present; this tends to affect the returns rows but not necessarily their order). -
Subquery names are now scoped within the query. This makes query text deterministic which helps some query optimisers/cachers (#336).
-
sql_optimise()
now can partially optimise a pipeline; due to an unfortunate bug it previously gave up too easily. -
in_schema()
quotes each input individually (#287) (usesql()
to opt out of quoting, if needed). AndDBI::Id()
should work anywhere thatin_schema()
does.
-
Experimental new SAP HANA backend (#233). Requires the latest version of odbc.
-
All backends:
-
You can now use
::
in translations, so that (e.g.)dbplyr::n()
is translated tocount(*)
(#207). -
[[
can now also translate numeric indices (#520). -
%/%
now generates a clear error message; previously it was translated to/
which is not correct (#108). -
n()
is translated tocount(*)
instead ofcount()
(#343). -
sub_str()
translation is more consistent in edge cases (@ianmcook). -
All
median()
(@lorenzwalthert, #483),pmin()
,pmax()
(#479),sd()
andvar()
functions have anna.rm
argument that warns once when notTRUE
. This makes them consistent withmean()
andsum()
. -
substring()
is now translated the same way assubstr()
(#378).
-
-
blob vectors can now be used with
!!
and!!!
operators, for example infilter()
(@okhoma, #433) -
MySQL uses standard SQL for index creation.
-
MS SQL translation does better a distinguishing between bit and boolean (#377, #318).
if
andifelse
once again generateIIF
, creating simpler expressions.as.*()
function usesTRY_CAST()
instead ofCAST()
for version 11+ (2012+) (@DavidPatShuiFong, #380). -
odbc no longer translates
count()
; this was an accidental inclusion. -
Oracle translation now depends on Oracle 12c, and uses a "row-limiting" clause for
head()
. It gains translations fortoday()
andnow()
, and improvedas.Date()
translation (@rlh1994, #267). -
PostgreSQL: new translations for lubridate period functions
years()
,months()
,days()
, andfloor_date()
(@bkkkk, #333) and stringr functionsstr_squish()
,str_remove()
, andstr_remove_all()
(@shosaco). -
New RedShift translations when used with
RPostgres::Redshift()
.-
str_replace()
errors since there's no Redshift translation, andstr_replace_all()
usesREGEXP_REPLACE()
(#446). -
paste()
andpaste0()
use||
(#458). -
as.numeric()
andas.double()
cast toFLOAT
(#408). -
substr()
andstr_sub()
useSUBSTRING()
(#327).
-
-
SQLite gains translations for lubridate functions
today()
,now()
,year()
,month()
,day()
,hour()
,minute()
,second()
,yday()
(#262), and correct translation formedian()
(#357).
If you are the author of a dbplyr backend, please see vignette("backend-2")
for details.
-
New
dbplyr_edition()
generic allows you to opt-in to the 2nd edition of the dbplyr API. -
db_write_table()
now callsDBI::dbWriteTable()
instead of nine generics that formerly each did a small part:db_create_indexes()
,db_begin()
,db_rollback()
,db_commit()
,db_list_tables()
,drop_drop_table()
,db_has_table()
,db_create_table()
, anddb_data_types()
. You can now delete the methods for these generics.db_query_rows()
is no longer used; it appears that it hasn't been used for some time, so if you have a method, you can delete it. -
DBI::dbQuoteIdentifier()
is now used instead ofsql_escape_ident()
andDBI::dbQuoteString()
instead ofsql_escape_string()
. -
A number of
db_*
generics have been replaced with new SQL generation generics:dplyr::db_analyze()
->dbplyr::sql_table_analyze()
dplyr::db_create_index()
->dbplyr::sql_table_index()
dplyr::db_explain()
->dbplyr::sql_queriy_explain()
dplyr::db_query_fields()
->dbplyr::sql_query_fields()
dplyr::db_save_query()
->dbplyr::sql_query_save()
This makes them easier to test and is an important part of the process of moving all database generics in dbplyr (#284).
-
A number of other generics have been renamed to facilitate the move from dplyr to dbplyr:
dplyr::sql_select()
->dbplyr::sql_query_select()
dplyr::sql_join()
->dbplyr::sql_query_join()
dplyr::sql_semi_join()
->dbplyr::sql_query_semi_join()
dplyr::sql_set_op()
->dbplyr::sql_query_set_op()
dplyr::sql_subquery()
->dbplyr::sql_query_wrap()
dplyr::db_desc()
->dbplyr::db_connection_describe()
-
New
db_temporary_table()
generic makes it easier to work with databases that require temporary tables to be specially named. -
New
sql_expr_matches()
generic allows databases to use more efficient alternatives when determine if two values "match" (i.e. like equality but a pair ofNULL
s will also match). For more details, see https://modern-sql.com/feature/is-distinct-from -
New
sql_join_suffix()
allows backends to control the default suffixes used (#254).
-
All old lazy eval shims have been removed. These have been deprecated for some time.
-
Date-time escaping methods for Athena and Presto have moved to the packages where they belong.
-
Attempting to embed a Shiny reactive in a query now gives a helpful error (#439).
-
copy_lahman()
andcopy_nycflights13()
(and hencenycflights13_sqlite()
) and friends now return DBI connections rather than the now deprecatedsrc_dbi()
(#440). -
copy_to()
can nowoverwrite
when table is specified with schema (#489), and gains anin_transaction
argument used to optionally suppress the transaction wrapper (#368). -
distinct()
no longer duplicates column if grouped (#354). -
transmute()
now correctly tracks variables it needs when creating subqueries (#313). -
mutate()
grouping variables no longer generates a downstream error (#396) -
mutate()
correctly generates subqueries when you re-use the same variable three or more times (#412). -
window_order()
overrides ordering, rather than appending to it.
-
Internally
DBI::dbExecute()
now usesimmediate = TRUE
; this improves support for session-scoped temporary tables in MS SQL (@krlmlr, #438). -
Subqueries with
ORDER BY
useTOP 9223372036854775807
instead ofTOP 100 PERCENT
on SQL Server for compatibility with Azure Data Warehouse (#337, @alexkyllo). -
escape()
now supportsblob
vectors using newsql_escape_raw()
generic. It enables using blob variables in dplyr verbs, for example to filter nvarchar values by UTF-16 blobs (see r-dbi/DBI#215 (comment)). (@okhoma, #433) -
Added
setOldClass()
calls for"ident"
and"ident_q"
classes for compatibility with dplyr 1.0.0 (#448, @krlmlr). -
Postgres
str_detect()
translation uses same argument names as stringr, and gains anegate
argument (#444). -
semi_join()
andanti_join()
now correctly support thesql_on
argument (#443, @krlmlr).
-
dbplyr now uses RPostgres (instead of RPostgreSQL) and RMariaDB (instead of RMySQL) for its internal tests and data functions (#427).
-
The Date and POSIXt methods for
escape()
now use exportedsql_escape_date()
andsql_escape_datetime()
generics to allow backend specific formatting of date and datetime literals. These are used to provide methods for Athena and Presto backends (@OssiLehtinen, #384, #391). -
first()
,last()
,nth()
,lead()
andlag()
now respect thewindow_frame()
(@krlmlr, #366). -
SQL server: new translations for
str_flatten()
(@PauloJhonny, #405). -
SQL server: temporary datasets are now session-local, not global (#401).
-
Postgres: correct
str_detect()
,str_replace()
andstr_replace_all()
translation (@shosaco, #362).
-
Fix bug when partially evaluating unquoting quosure containing a single symbol (#317)
-
Fixes for rlang and dpylr compatibility.
Minor improvements to SQL generation
-
x %in% y
strips names ofy
(#269). -
Enhancements for scoped verbs (
mutate_all()
,summarise_if()
,filter_at()
etc) (#296, #306). -
MS SQL use
TOP 100 PERCENT
as stop-gap to allow subqueries withORDER BY
(#277). -
Window functions now translated correctly for Hive (#293, @cderv).
-
Error: `con` must not be NULL
: If you see this error, it probably means that you have forgotten to passcon
down to a dbplyr function. Previously, dbplyr defaulted to usingsimulate_dbi()
which introduced subtle escaping bugs. (It's also possible I have forgotten to pass it somewhere that the dbplyr tests don't pick up, so if you can't figure it out, please let me know). -
Subsetting (
[[
,$
, and[
) functions are no longer evaluated locally. This makes the translation more consistent and enables useful new idioms for modern databases (#200).
-
MySQL/MariaDB (https://mariadb.com/kb/en/library/window-functions/) and SQLite (https://www.sqlite.org/windowfunctions.html) translations gain support for window functions, available in Maria DB 10.2, MySQL 8.0, and SQLite 3.25 (#191).
-
Overall, dplyr generates many fewer subqueries:
-
Joins and semi-joins no longer add an unneeded subquery (#236). This is facilitated by the new
bare_identifier_ok
argument tosql_render()
; the previous argument was calledroot
and confused me. -
Many sequences of
select()
,rename()
,mutate()
, andtransmute()
can be collapsed into a single query, instead of always generating a subquery (#213).
-
-
New
vignette("sql")
describes some advantages of dbplyr over SQL (#205) and gives some advice about writing literal SQL inside of dplyr, when you need to (#196). -
New
vignette("reprex")
gives some hints on creating reprexes that work anywhere (#117). This is supported by a newtbl_memdb()
that matches the existingtbl_lazy()
. -
All
..._join()
functions gain ansql_on
argument that allows specifying arbitrary join predicates in SQL code (#146, @krlmlr).
-
New translations for some lubridate functions:
today()
,now()
,year()
,month()
,day()
,hour()
,minute()
,second()
,quarter()
,yday()
(@colearendt, @derekmorr). Also added new translation foras.POSIXct()
. -
New translations for stringr functions:
str_c()
,str_sub()
,str_length()
,str_to_upper()
,str_to_lower()
, andstr_to_title()
(@colearendt). Non-translated stringr functions throw a clear error. -
New translations for bitwise operations:
bitwNot()
,bitwAnd()
,bitwOr()
,bitwXor()
,bitwShiftL()
, andbitwShiftR()
. Unlike the base R functions, the translations do not coerce arguments to integers (@davidchall, #235). -
New translation for
x[y]
toCASE WHEN y THEN x END
. This enablessum(a[b == 0])
to work as you expect from R (#202).y
needs to be a logical expression; if not you will likely get a type error from your database. -
New translations for
x$y
andx[["y"]]
tox.y
, enabling you to index into nested fields in databases that provide them (#158). -
The
.data
and.env
pronouns of tidy evaluation are correctly translated (#132). -
New translation for
median()
andquantile()
. Works for all ANSI compliant databases (SQL Server, Postgres, MariaDB, Teradata) and has custom translations for Hive. Thanks to @edavidaja for researching the SQL variants! (#169) -
na_if()
is correct translated toNULLIF()
(rather thanNULL_IF
) (#211). -
n_distinct()
translation throws an error when given more than one argument. (#101, #133). -
New default translations for
paste()
,paste0()
, and the hyperbolic functions (these previously were only available for ODBC databases). -
Corrected translations of
pmin()
andpmax()
toLEAST()
andGREATEST()
for ANSI compliant databases (#118), toMIN()
andMAX()
for SQLite, and to an error for SQL server. -
New translation for
switch()
to the simple form ofCASE WHEN
(#192).
SQL simulation makes it possible to see what dbplyr will translate SQL to, without having an active database connection, and is used for testing and generating reprexes.
-
SQL simulation has been overhauled. It now works reliably, is better documented, and always uses ANSI escaping (i.e.
`
for field names and'
for strings). -
tbl_lazy()
now actually puts adbplyr::src
in the$src
field. This shouldn't affect any downstream code unless you were previously working around this weird difference betweentbl_lazy
andtbl_sql
classes. It also includes thesrc
class in its class, and when printed, shows the generated SQL (#111).
-
MySQL/MariaDB
-
Translations also applied to connections via the odbc package (@colearendt, #238)
-
Basic support for regular expressions via
str_detect()
and
str_replace_all()
(@colearendt, #168). -
Improved translation for
as.logical(x)
toIF(x, TRUE, FALSE)
.
-
-
Oracle
- New custom translation for
paste()
andpaste0()
(@cderv, #221)
- New custom translation for
-
Postgres
- Basic support for regular expressions via
str_detect()
and
str_replace_all()
(@colearendt, #168).
- Basic support for regular expressions via
-
SQLite
explain()
translation now generatesEXPLAIN QUERY PLAN
which generates a higher-level, more human friendly explanation.
-
SQL server
-
Improved translation for
as.logical(x)
toCAST(x as BIT)
(#250). -
Translates
paste()
,paste0()
, andstr_c()
to+
. -
copy_to()
method applies temporary table name transformation earlier so that you can now overwrite temporary tables (#258). -
db_write_table()
method uses correct argument name for passing along field types (#251).
-
-
Aggregation functions only warn once per session about the use of
na.rm = TRUE
(#216). -
table names generated by
random_table_name()
have the prefix "dbplyr_", which makes it easier to find them programmatically (@mattle24, #111) -
Functions that are only available in a windowed (
mutate()
) query now throw an error when called in a aggregate (summarise()
) query (#129) -
arrange()
understands the.by_group
argument, making it possible sort by groups if desired. The default isFALSE
(#115) -
distinct()
now handles computed variables likedistinct(df, y = x + y)
(#154). -
escape()
,sql_expr()
andbuild_sql()
no longer acceptcon = NULL
as a shortcut forcon = simulate_dbi()
. This made it too easy to forget to passcon
along, introducing extremely subtle escaping bugs.win_over()
gains acon
argument for the same reason. -
New
escape_ansi()
always uses ANSI SQL 92 standard escaping (for use in examples and documentation). -
mutate(df, x = NULL)
dropsx
from the output, just like when working with local data frames (#194). -
partial_eval()
processes inlined functions (including rlang lambda functions). This makes dbplyr work with more forms of scoped verbs likedf %>% summarise_all(~ mean(.))
,df %>% summarise_all(list(mean))
(#134). -
sql_aggregate()
now takes an optional argumentf_r
for passing tocheck_na_rm()
. This allows the warning to show the R function name rather than the SQL function name (@sverchkov, #153). -
sql_infix()
gains apad
argument for the rare operator that doesn't need to be surrounded by spaces. -
sql_prefix()
no longer turns SQL functions into uppercase, allowing for correct translation of case-sensitive SQL functions (#181, @mtoto). -
summarise()
gives a clear error message if you refer to a variable created in that samesummarise()
(#114). -
New
sql_call2()
which is torlang::call2()
assql_expr()
is torlang::expr()
. -
show_query()
andexplain()
usecat()
rather than message. -
union()
,union_all()
,setdiff()
andintersect()
do a better job of matching columns across backends (#183).
- Now supports for dplyr 0.8.0 (#190) and R 3.1.0
-
Calls of the form
dplyr::foo()
are now evaluated in the database, rather than locally (#197). -
vars
argument totbl_sql()
has been formally deprecated; it hasn't actually done anything for a while (#3254). -
src
andtbl
objects now include a class generated from the class of the underlying connection object. This makes it possible for dplyr backends to implement different behaviour at the dplyr level, when needed. (#2293)
-
x %in% y
is now translated toFALSE
ify
is empty (@mgirlich, #160). -
New
as.integer64(x)
translation toCAST(x AS BIGINT)
(#3305) -
case_when
now translates with a ELSE clause if a formula of the formTRUE~<RHS>
is provided . (@cderv, #112) -
cummean()
now generatesAVG()
notMEAN()
(#157) -
str_detect()
now uses correct parameter order (#3397) -
MS SQL
- Cumulative summary functions now work (#157)
ifelse()
usesCASE WHEN
instead ofIIF
; this allows more complex operations, such as%in%
, to work properly (#93)
-
Oracle
- Custom
db_drop_table()
now only drops tables if they exist (#3306) - Custom
setdiff()
translation (#3493) - Custom
db_explain()
translation (#3471)
- Custom
-
SQLite
- Correct translation for
as.numeric()
/as.double()
(@chris-park, #171).
- Correct translation for
-
Redshift
substr()
translation improved (#3339)
-
copy_to()
will only remove existing table whenoverwrite = TRUE
and the table already exists, eliminating a confusing "NOTICE" from PostgreSQL (#3197). -
partial_eval()
handles unevaluated formulas (#184). -
pull.tbl_sql()
now extracts correctly from grouped tables (#3562). -
sql_render.op()
now correctly forwards thecon
argument (@kevinykuo, #73).
- R CMD check fixes
- Forward compatibility fixes for rlang 0.2.0
-
New translations for
- MS Access (#2946) (@DavisVaughan)
- Oracle, via odbc or ROracle (#2928, #2732, @edgararuiz)
- Teradata.
- Redshift.
-
dbplyr now supplies appropriate translations for the RMariaDB and RPostgres packages (#3154). We generally recommend using these packages in favour of the older RMySQL and RPostgreSQL packages as they are fully DBI compliant and tested with DBItest.
-
copy_to()
can now "copy" tbl_sql in the same src, providing another way to cache a query into a temporary table (#3064). You can alsocopy_to
tbl_sqls from another source, andcopy_to()
will automatically collect then copy. -
Initial support for stringr functions:
str_length()
,str_to_upper()
,str_to_lower()
,str_replace_all()
,str_detect()
,str_trim()
. Regular expression support varies from database to database, but most simple regular expressions should be ok.
-
db_compute()
gains ananalyze
argument to matchdb_copy_to()
. -
New
remote_name()
,remote_con()
,remote_src()
,remote_query()
andremote_query_plan()
provide a standard API for get metadata about a remote tbl (#3130, #2923, #2824). -
New
sql_expr()
is a more convenient building block for low-level SQL translation (#3169). -
New
sql_aggregate()
andwin_aggregate()
for generating SQL and windowed SQL functions for aggregates. These take one argument,x
, and warn ifna.rm
is notTRUE
(#3155).win_recycled()
is equivalent towin_aggregate()
and has been soft-deprecated. -
db_write_table
now needs to return the table name
-
Multiple
head()
calls in a row now collapse to a single call. This avoids a printing problem with MS SQL (#3084). -
escape()
now works with integer64 values from the bit64 package (#3230) -
if
,ifelse()
, andif_else()
now correctly scope the false condition so that it only applies to non-NULL conditions (#3157) -
ident()
andident_q()
handle 0-length inputs better, and should be easier to use with S3 (#3212) -
in_schema()
should now work in more places, particularly incopy_to()
(#3013, @baileych) -
SQL generation for joins no longer gets stuck in a endless loop if you request an empty suffix (#3220).
-
mutate()
has better logic for splitting a single mutate into multiple subqueries (#3095). -
Improved
paste()
andpaste0()
support in MySQL, PostgreSQL (#3168), and RSQLite (#3176). MySQL and PostgreSQL gain support forstr_flatten()
which behaves likepaste(x, collapse = "-")
(but for technical reasons can't be implemented as a straightforward translation ofpaste()
). -
same_src.tbl_sql()
now performs correct comparison instead of always returningTRUE
. This means thatcopy = TRUE
once again allows you to perform cross-database joins (#3002). -
select()
queries no longer alias column names unnecessarily (#2968, @DavisVaughan). -
select()
andrename()
are now powered by tidyselect, fixing a few renaming bugs (#3132, #2943, #2860). -
summarise()
once again performs partial evaluation before database submission (#3148). -
test_src()
makes it easier to access a single test source.
-
MS SQL
-
Better support for temporary tables (@Hong-Revo)
-
Different translations for filter/mutate contexts for:
NULL
evaluation (is.na()
,is.null()
), logical operators (!
,&
,&&
,|
,||
), and comparison operators (==
,!=
,<
,>
,>=
,<=
)
-
-
MySQL:
copy_to()
(viadb_write_table()
) correctly translates logical variables to integers (#3151). -
odbc: improved
n()
translation in windowed context. -
SQLite: improved
na_if
translation (@cwarden) -
PostgreSQL: translation for
grepl()
added (@zozlak) -
Oracle: changed VARVHAR to VARCHAR2 datatype (@washcycle, #66)
-
full_join()
over non-overlapping columnsby = character()
translated toCROSS JOIN
(#2924). -
case_when()
now translates to SQL "CASE WHEN" (#2894) -
x %in% c(1)
now generates the same SQL asx %in% 1
(#2898). -
New
window_order()
andwindow_frame()
give you finer control over the window functions that dplyr creates (#2874, #2593). -
Added SQL translations for Oracle (@edgararuiz).
-
x %in% c(1)
now generates the same SQL asx %in% 1
(#2898). -
head(tbl, 0)
is now supported (#2863). -
select()
ing zero columns gives a more information error message (#2863). -
Variables created in a join are now disambiguated against other variables in the same table, not just variables in the other table (#2823).
-
PostgreSQL gains a better translation for
round()
(#60). -
Added custom
db_analyze_table()
for MS SQL, Oracle, Hive and Impala (@edgararuiz) -
Added support for
sd()
for aggregate and window functions (#2887) (@edgararuiz) -
You can now use the magrittr pipe within expressions, e.g.
mutate(mtcars, cyl %>% as.character())
. -
If a translation was supplied for a summarise function, but not for the equivalent windowed variant, the expression would be translated to
NULL
with a warning. Nowsql_variant()
checks that all aggregate functions have matching window functions so that correct translations or clean errors will be generated (#2887)
-
tbl()
andcopy_to()
now work directly with DBI connections (#2423, #2576), so there is no longer a need to generate a dplyr src.library(dplyr) con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") copy_to(con, mtcars) mtcars2 <- tbl(con, "mtcars") mtcars2
-
glimpse()
now works with remote tables (#2665) -
dplyr has gained a basic SQL optimiser, which collapses certain nested SELECT queries into a single query (#1979). This will improve query execution performance for databases with less sophisticated query optimisers, and fixes certain problems with ordering and limits in subqueries (#1979). A big thanks goes to @hhoeflin for figuring out this optimisation.
-
compute()
andcollapse()
now preserve the "ordering" of rows. This only affects the computation of window functions, as the rest of SQL does not care about row order (#2281). -
copy_to()
gains anoverwrite
argument which allows you to overwrite an existing table. Use with care! (#2296) -
New
in_schema()
function makes it easy to refer to tables in schema:in_schema("my_schema_name", "my_table_name")
.
query()
is no longer exported. It hasn't been useful for a while so this shouldn't break any code.
-
Partial evaluation occurs immediately when you execute a verb (like
filter()
ormutate()
) rather than happening when the query is executed (#2370). -
mutate.tbl_sql()
will now generate as many subqueries as necessary so that you can refer to variables that you just created (like in mutate with regular dataframes) (#2481, #2483). -
SQL joins have been improved:
-
SQL joins always use the
ON ...
syntax, avoidingUSING ...
even for natural joins. Improved handling of tables with columns of the same name (#1997, @javierluraschi). They now generate SQL more similar to what you'd write by hand, eliminating a layer or two of subqueries (#2333) -
[API] They now follow the same rules for including duplicated key variables that the data frame methods do, namely that key variables are only kept from
x
, and never fromy
(#2410) -
[API] The
sql_join()
generic now gains avars
argument which lists the variables taken from the left and right sides of the join. If you have a customsql_join()
method, you'll need to update how your code generates joins, following the template insql_join.generic()
. -
full_join()
throws a clear error when you attempt to use it with a MySQL backend (#2045) -
right_join()
andfull_join()
now return results consistent with local data frame sources when there are records in the right table with no match in the left table.right_join()
returns values ofby
columns from the right table.full_join()
returns coalesced values ofby
columns from the left and right tables (#2578, @ianmcook)
-
-
group_by()
can now perform an inline mutate for database backends (#2422). -
The SQL generation set operations (
intersect()
,setdiff()
,union()
, andunion_all()
) have been considerably improved.By default, the component SELECT are surrounded with parentheses, except on SQLite. The SQLite backend will now throw an error if you attempt a set operation on a query that contains a LIMIT, as that is not supported in SQLite (#2270).
All set operations match column names across inputs, filling in non-matching variables with NULL (#2556).
-
rename()
andgroup_by()
now combine correctly (#1962) -
tbl_lazy()
andlazy_tbl()
have been exported. These help you test generated SQL with out an active database connection. -
ungroup()
correctly resets grouping variables (#2704).
-
New
as.sql()
safely coerces an input to SQL. -
More translators for
as.character()
,as.integer()
andas.double()
(#2775). -
New
ident_q()
makes it possible to specifier identifiers that do not need to be quoted. -
Translation of inline scalars:
-
Logical values are now translated differently depending on the backend. The default is to use "true" and "false" which is the SQL-99 standard, but not widely support. SQLite translates to "0" and "1" (#2052).
-
Inf
and-Inf
are correctly escaped -
Better test for whether or not a double is similar to an integer and hence needs a trailing 0.0 added (#2004).
-
Quoting defaults to
DBI::dbEscapeString()
andDBI::dbQuoteIdentifier()
respectively.
-
-
::
and:::
are handled correctly (#2321) -
x %in% 1
is now correctly translated tox IN (1)
(#511). -
ifelse()
andif_else()
use correct argument names in SQL translation (#2225). -
ident()
now returns an object with classc("ident", "character")
. It no longer contains "sql" to indicate that this is not already escaped. -
is.na()
andis.null()
gain extra parens in SQL translation to preserve correct precedence (#2302). -
[API]
log(x, b)
is now correctly translated to the SQLlog(b, x)
(#2288). SQLite does not support the 2-argument log function so it is translated tolog(x) / log(b)
. -
nth(x, i)
is now correctly translated tonth_value(x, i)
. -
n_distinct()
now accepts multiple variables (#2148). -
[API]
substr()
is now translated to SQL, correcting for the difference in the third argument. In R, it's the position of the last character, in SQL it's the length of the string (#2536). -
win_over()
escapes expression using current database rules.
-
copy_to()
now usesdb_write_table()
instead ofdb_create_table()
anddb_insert_into()
.db_write_table.DBIConnection()
usesdbWriteTable()
. -
New
db_copy_to()
,db_compute()
anddb_collect()
allow backends to override the entire database process behindcopy_to()
,compute()
andcollect()
.db_sql_render()
allow additional control over the SQL rendering process. -
All generics whose behaviour can vary from database to database now provide a DBIConnection method. That means that you can easily scan the NAMESPACE to see the extension points.
-
sql_escape_logical()
allows you to control the translation of literal logicals (#2614). -
src_desc()
has been replaced bydb_desc()
and now dispatches on the connection, eliminating the last method that required dispatch on the class of the src. -
win_over()
,win_rank()
,win_recycled()
,win_cumulative()
,win_current_group()
andwin_current_order()
are now exported. This should make it easier to provide customised SQL for window functions (#2051, #2126). -
SQL translation for Microsoft SQL Server (@edgararuiz)
-
SQL translation for Apache Hive (@edgararuiz)
-
SQL translation for Apache Impala (@edgararuiz)
-
collect()
once again defaults to return all rows in the data (#1968). This makes it behave the same asas.data.frame()
andas_tibble()
. -
collect()
only regroups by variables present in the data (#2156) -
collect()
will automatically LIMIT the result to then
, the number of rows requested. This will provide the query planner with more information that it may be able to use to improve execution time (#2083). -
common_by()
gets a better error message for unexpected inputs (#2091) -
copy_to()
no longer checks that the table doesn't exist before creation, instead preferring to fall back on the database for error messages. This should reduce both false positives and false negative (#1470) -
copy_to()
now succeeds for MySQL if a character column containsNA
(#1975, #2256, #2263, #2381, @demorenoc, @eduardgrebe). -
copy_to()
now returns its output invisibly (since you're often just calling for the side-effect). -
distinct()
reports improved variable information for SQL backends. This means that it is more likely to work in the middle of a pipeline (#2359). -
Ungrouped
do()
on database backends now collects all data locally first (#2392). -
Call
dbFetch()
instead of the deprecatedfetch()
(#2134). UseDBI::dbExecute()
for non-query SQL commands (#1912) -
explain()
andshow_query()
now invisibly return the first argument, making them easier to use inside a pipeline. -
print.tbl_sql()
displays ordering (#2287) and prints table name, if known. -
print(df, n = Inf)
andhead(df, n = Inf)
now work with remote tables (#2580). -
db_desc()
andsql_translate_env()
get defaults for DBIConnection. -
Formatting now works by overriding the
tbl_sum()
generic instead ofprint()
. This means that the output is more consistent with tibble, and thatformat()
is now supported also for SQL sources (tidyverse#14).
-
[API] The signature of
op_base
has changed toop_base(x, vars, class)
-
[API]
translate_sql()
andpartial_eval()
have been refined:-
translate_sql()
no longer takes a vars argument; instead callpartial_eval()
yourself. -
Because it no longer needs the environment
translate_sql()_
now works with a list of dots, rather than alazy_dots
. -
partial_eval()
now takes a character vector of variable names rather than a tbl. -
This leads to a simplification of the
op
data structure: dots is now a list of expressions rather than alazy_dots
.
-
-
[API]
op_vars()
now returns a list of quoted expressions. This enables escaping to happen at the correct time (i.e. when the connection is known).