Replies: 6 comments 20 replies
-
One concern I had was around array literals, but I think they would actually be OK. So instead of

    from [
      {`small number`=1e-10, `large number`=1e10},
      {`small number`=2e-10, `large number`=2e10},
    ]
    select {`small number`, `large number`}

we would write:

    from [
      {[small number]=1e-10, [large number]=1e10},
      {[small number]=2e-10, [large number]=2e10},
    ]
    select {[small number], [large number]}

The brackets in the
-
What does this mean?
-
That seems like the parser is going to have to be very loose to allow that: it's not going to know whether

We don't use arrays much yet, and I guess there aren't many times we have an array of columns, so I agree it's possible to shoehorn in at the moment. But mixing syntax like this arguably makes a language quite difficult to understand for people as well as for the compiler. "What do brackets do?" "They're for arrays, unless there's a single item, in which case they can escape column names." I would argue that's not simple for a very basic question.

I do agree backticks have a disadvantage in shells or markdown. But if it really were a big issue, I would much sooner repurpose single or double quotes than create ambiguity with arrays.
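To make the ambiguity concrete, here is a minimal Python sketch. The rules in it are purely hypothetical (they are not PRQL's actual grammar, and `interpretations` is a name invented for this illustration): a bracketed token is read as an array when every comma-separated item is a bare identifier, and as a quoted name when it contains no comma.

```python
def interpretations(token: str) -> list[str]:
    """List the plausible readings of a bracketed token.

    Hypothetical rules for illustration only, not PRQL's real grammar.
    """
    if not (token.startswith("[") and token.endswith("]")):
        raise ValueError("expected a bracketed token")
    inner = token[1:-1].strip()
    items = [part.strip() for part in inner.split(",") if part.strip()]

    readings = []
    # Array reading: every item has to look like a bare identifier.
    if items and all(item.isidentifier() for item in items):
        readings.append("array")
    # Quoted-name reading: one name, possibly with spaces, and no commas.
    if inner and "," not in inner:
        readings.append("quoted name")
    return readings


print(interpretations("[a, b]"))            # ['array']
print(interpretations("[invoice amount]"))  # ['quoted name']
print(interpretations("[a]"))               # ['array', 'quoted name']
```

Under these assumed rules, only the single-identifier case gets two readings, which is exactly the case the "unless there's a single item" answer has to paper over.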
-
Is this related to the question "A scalar is equivalent to an array of length 1?"
-
Tables can be thought of as lists of tuples/records (the traditional OLTP model) or as tuples of arrays/columns (the OLAP model). Take for example

    >>> import pandas as pd
    >>> d1 = pd.DataFrame.from_records([{'a': 1, 'b': 2}])
    >>> d2 = pd.DataFrame.from_dict({'a': [1], 'b': [2]})
    >>> d1 == d2
          a     b
    0  True  True
    >>> d1
       a  b
    0  1  2

Hopefully you can already see hints of where I want to go. If we ignore the constructor methods and just look at the arguments, we roughly have that

    [ {'a':1, 'b':2 } ] ~ { 'a':[1], 'b':[2] }

In PRQL we already have the array literal syntax, so you can do

    from [{a=1, b=2}]
    select {a, b}  # or select {`a`, `b`}

With the proposal above, you could instead also write

    from [{a=1, b=2}]
    select {[a], [b]}

My larger point is that this actually clearly communicates the structure of the data as a tuple of column arrays. I think this is quite elegant!

I think this also ties in with the discussion with @aljazerzen in #2723 around how to think of the arguments to aggregation and window functions, especially this comment #2723 (comment). I think the proposed notation makes some of these notions actually much clearer:

    let tax_rate = 0.15
    from invoices
    derive [tax amount] = tax_rate * [invoice amount]
    aggregate {
      [invoice total] = sum [invoice amount],
      [tax total] = sum [tax amount],
    }

Let's look at the following line in detail to explain how I suggest one should think about this:

    derive [tax amount] = tax_rate * [invoice amount]

I read that as the

After seeing this

EDIT: I changed
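As an aside on the records-versus-columns equivalence from the start of this comment, here is a small dependency-free Python sketch of the two conversions. The function names `records_to_columns` and `columns_to_records` are made up for this illustration, and the sketch assumes every record has the same keys and every column has the same length.

```python
def records_to_columns(records):
    """Turn a list of records (row view) into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def columns_to_records(columns):
    """Turn a dict of equal-length column lists back into a list of records."""
    names = list(columns)
    return [dict(zip(names, row)) for row in zip(*columns.values())]


rows = [{"a": 1, "b": 2}]
cols = records_to_columns(rows)
print(cols)                      # {'a': [1], 'b': [2]}
print(columns_to_records(cols))  # [{'a': 1, 'b': 2}]
```

The round trip gives back the original rows, which is the sense in which the two shapes describe the same table.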
-
To put it bluntly: I'm not in favor of this proposal, -1.
-
One great use case for PRQL is in working with data in the terminal; see for example prql-query and my recent tweet on turning UK Census data csv files into parquet files. The code from that is (slightly edited):
The problem is that when table and column names have to be quoted with backticks, like `table`, the backticks `` have a special meaning in the shell (subshell invocation). In order to avoid that behaviour you then have to quote your PRQL code with single quotes '', which in turn means you miss out on many of the string interpolation features that make shells such great tools for interactive work.

So what alternative quoting characters could we look at? Single quotes and double quotes are already heavily used, and there are reasons why we don't use them for this, so we don't need to revisit that. This Modern SQL Style Guide reminded me that SQL Server uses brackets for this, like [table].[column]. I've never been much of a fan of this, but after exploring some thoughts around it, I've come to the conclusion that it's actually a very good option.

Your first concern about this might be that we want to use brackets for arrays/lists, so won't this create an ambiguity/inconsistency? I have two responses to this:

[my column] vs ["my string"].

[column] would actually aid in expressing this visually.

Take for example the first example from the aggregation tutorial page:
with the proposed syntax this could be written as:
which I think actually quite nicely communicates that you are summing an array of values.
Of course the bracket quoting wouldn't be necessary in the example above and would only be required when there is something like a space in the column name like:
Of course this would resolve the CLI usage problem with the backticks.
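To illustrate the shell behaviour described above, here is a rough Python sketch that hands three small scripts to a POSIX `sh` through `subprocess`. It assumes `sh` is on the PATH; the `run_in_shell` helper and the query fragments are invented for this example and are not taken from prql-query.

```python
import subprocess


def run_in_shell(script: str) -> str:
    """Run a script with the POSIX shell and return what it wrote to stdout."""
    result = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
    return result.stdout


# Inside double quotes the shell treats backticks as command substitution,
# so the "quoted name" is executed as a command and replaced by its output.
backtick_query = 'printf "%s" "select `echo oops`"'

# Brackets have no special meaning inside double quotes and pass through.
bracket_query = 'printf "%s" "select [my column]"'

# Double quotes still allow variable interpolation, which single quotes block.
interpolated_query = 'col="my column"; printf "%s" "select [$col]"'

print(run_in_shell(backtick_query))      # select oops
print(run_in_shell(bracket_query))       # select [my column]
print(run_in_shell(interpolated_query))  # select [my column]
```

The first script shows the query being silently altered by the shell, while the bracketed versions reach the program unchanged and can still be combined with shell variables.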
One question that arose for me is what would we use on the LHS of assignments? Would we use the same? Some languages have different rules for this but I think it would make sense to stay consistent.
I still want to test this out in a few more scenarios to see how it interacts with other parts of the language, but so far it is looking very promising to me.