[SPARK-31753][SQL][DOCS][FOLLOW-UP] Add missing keywords in the SQL docs
### What changes were proposed in this pull request?
Update the sql-ref docs. This PR adds the following keywords:

CLUSTERED BY
SORTED BY
INTO num_buckets BUCKETS

### Why are the changes needed?
So that more users know how to use these SQL keywords.

### Does this PR introduce _any_ user-facing change?
No
![image](https://user-images.githubusercontent.com/46367746/94428281-0a6b8080-01c3-11eb-9ff3-899f8da602ca.png)
![image](https://user-images.githubusercontent.com/46367746/94428285-0d667100-01c3-11eb-8a54-90e7641d917b.png)
![image](https://user-images.githubusercontent.com/46367746/94428288-0f303480-01c3-11eb-9e1d-023538aa6e2d.png)

### How was this patch tested?
Tested by generating the HTML docs.

Closes apache#29883 from GuoPhilipse/add-sql-missing-keywords.

Lead-authored-by: GuoPhilipse <[email protected]>
Co-authored-by: GuoPhilipse <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
2 people authored and maropu committed Sep 30, 2020
1 parent ece8d8e commit 3bdbb55
Showing 2 changed files with 38 additions and 1 deletion.
7 changes: 6 additions & 1 deletion docs/sql-ref-syntax-ddl-create-table-datasource.md
@@ -67,7 +67,12 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI

* **SORTED BY**

Determines the order in which the data is stored in buckets. Default is Ascending order.
Specifies the ordering of the bucket columns. Optionally, use `ASC` for ascending order or `DESC` for descending order after any column name in the `SORTED BY` clause.
If not specified, `ASC` is assumed.

* **INTO num_buckets BUCKETS**

Specifies the number of buckets, which is used in the `CLUSTERED BY` clause.

* **LOCATION**

32 changes: 32 additions & 0 deletions docs/sql-ref-syntax-ddl-create-table-hiveformat.md
@@ -31,6 +31,9 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
[ COMMENT table_comment ]
[ PARTITIONED BY ( col_name2[:] col_type2 [ COMMENT col_comment2 ], ... )
| ( col_name1, col_name2, ... ) ]
[ CLUSTERED BY ( col_name1, col_name2, ...)
[ SORTED BY ( col_name1 [ ASC | DESC ], col_name2 [ ASC | DESC ], ... ) ]
INTO num_buckets BUCKETS ]
[ ROW FORMAT row_format ]
[ STORED AS file_format ]
[ LOCATION path ]
@@ -65,6 +68,21 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI

Partitions are created on the table, based on the columns specified.

* **CLUSTERED BY**

Partitions created on the table will be bucketed into a fixed number of buckets based on the columns specified for bucketing.

**NOTE:** Bucketing is an optimization technique that uses buckets (and bucketing columns) to determine data partitioning and avoid data shuffle.

* **SORTED BY**

Specifies the ordering of the bucket columns. Optionally, use `ASC` for ascending order or `DESC` for descending order after any column name in the `SORTED BY` clause.
If not specified, `ASC` is assumed.

* **INTO num_buckets BUCKETS**

Specifies the number of buckets, which is used in the `CLUSTERED BY` clause.

* **row_format**

Use the `SERDE` clause to specify a custom SerDe for one table. Otherwise, use the `DELIMITED` clause to use the native SerDe and specify the delimiter, escape character, null character and so on.
@@ -203,6 +221,20 @@ CREATE EXTERNAL TABLE family (id INT, name STRING)
STORED AS INPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleInputFormat'
OUTPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleOutputFormat'
LOCATION '/tmp/family/';

-- Use the `CLUSTERED BY` clause to create a bucketed table without `SORTED BY`
CREATE TABLE clustered_by_test1 (ID INT, AGE STRING)
CLUSTERED BY (ID)
INTO 4 BUCKETS
STORED AS ORC;

-- Use the `CLUSTERED BY` clause to create a bucketed table with `SORTED BY`
CREATE TABLE clustered_by_test2 (ID INT, NAME STRING)
PARTITIONED BY (YEAR STRING)
CLUSTERED BY (ID, NAME)
SORTED BY (ID ASC)
INTO 3 BUCKETS
STORED AS PARQUET;
```
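
For readers new to bucketing, the effect of `CLUSTERED BY ... SORTED BY ... INTO num_buckets BUCKETS` can be sketched conceptually in plain Python. This is only an illustration under stated assumptions, not how Spark or Hive implement it: the helper names `stand_in_hash` and `bucket_rows` are hypothetical, and the CRC32-based hash is a placeholder for whatever hash function the engine actually uses. The sketch assigns each row to one of a fixed number of buckets from the hash of its clustering columns, then orders the rows inside each bucket.

```python
import zlib
from collections import defaultdict


def stand_in_hash(key):
    # Deterministic placeholder hash (an assumption for this sketch);
    # it is NOT the hash function Spark or Hive uses for bucketing.
    return zlib.crc32(repr(key).encode("utf-8"))


def bucket_rows(rows, cluster_cols, num_buckets, sort_cols=()):
    """Assign rows to buckets by hashing cluster_cols, then sort each bucket.

    sort_cols is a sequence of (column, ascending) pairs, mirroring
    SORTED BY (col [ASC | DESC], ...).
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    buckets = defaultdict(list)
    for row in rows:
        # CLUSTERED BY: rows with equal clustering values share a bucket.
        key = tuple(row[col] for col in cluster_cols)
        buckets[stand_in_hash(key) % num_buckets].append(row)
    for bucket in buckets.values():
        # SORTED BY: apply keys from last to first; list.sort is stable,
        # so earlier keys end up taking precedence.
        for col, ascending in reversed(sort_cols):
            bucket.sort(key=lambda r, c=col: r[c], reverse=not ascending)
    return dict(buckets)


# Roughly mirrors: CLUSTERED BY (ID, NAME) SORTED BY (ID ASC) INTO 3 BUCKETS
rows = [{"ID": i % 5, "NAME": f"user{i}"} for i in range(20)]
buckets = bucket_rows(rows, ["ID", "NAME"], 3, sort_cols=[("ID", True)])
```

Because the bucket for a row depends only on its clustering columns and the bucket count, two tables bucketed the same way place matching keys in matching buckets, which is the property the note above relies on to avoid a shuffle.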

### Related Statements