[SPARK-29458][SQL][DOCS] Add a paragraph for scalar function in sql getting started

### What changes were proposed in this pull request?
Add a paragraph on scalar functions to the SQL Getting Started guide.

### Why are the changes needed?
To make the 3.0 documentation complete.

### Does this PR introduce any user-facing change?
before:
<img width="870" alt="Screen Shot 2020-04-21 at 10 11 12 PM" src="https://user-images.githubusercontent.com/13592258/79943182-16d1fd00-841d-11ea-9744-9cdd58d83f81.png">

after:
<img width="865" alt="Screen Shot 2020-04-22 at 11 49 59 PM" src="https://user-images.githubusercontent.com/13592258/80068256-26704500-84f4-11ea-9845-c835927c027e.png">

<img width="1033" alt="Screen Shot 2020-04-23 at 6 22 53 PM" src="https://user-images.githubusercontent.com/13592258/80165100-82d47280-858f-11ea-8c84-1ef702cc1bff.png">

### How was this patch tested?

Closes apache#28290 from huaxingao/scalar.

Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
huaxingao authored and srowen committed Apr 28, 2020
1 parent 54996be commit dcc0902
Showing 2 changed files with 10 additions and 10 deletions.
13 changes: 5 additions & 8 deletions docs/sql-getting-started.md
@@ -347,16 +347,13 @@ For example:
</div>

## Scalar Functions
-(to be filled soon)

-## Aggregations
+Scalar functions are functions that return a single value per row, as opposed to aggregation functions, which return a value for a group of rows. Spark SQL supports a variety of [Built-in Scalar Functions](sql-ref-functions.html#scalar-functions). It also supports [User Defined Scalar Functions](sql-ref-functions-udf-scalar.html).

-The [built-in DataFrames functions](api/scala/org/apache/spark/sql/functions$.html) provide common
-aggregations such as `count()`, `countDistinct()`, `avg()`, `max()`, `min()`, etc.
-While those functions are designed for DataFrames, Spark SQL also has type-safe versions for some of them in
-[Scala](api/scala/org/apache/spark/sql/expressions/scalalang/typed$.html) and
-[Java](api/java/org/apache/spark/sql/expressions/javalang/typed.html) to work with strongly typed Datasets.
-Moreover, users are not limited to the predefined aggregate functions and can create their own. For more details
+## Aggregate Functions
+
+Aggregate functions are functions that return a single value on a group of rows. The [Built-in Aggregation Functions](sql-ref-functions-builtin.html#aggregate-functions) provide common aggregations such as `count()`, `countDistinct()`, `avg()`, `max()`, `min()`, etc.
+Users are not limited to the predefined aggregate functions and can create their own. For more details
about user defined aggregate functions, please refer to the documentation of
[User Defined Aggregate Functions](sql-ref-functions-udf-aggregate.html).
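
Editor's note: the distinction drawn in this hunk (a scalar function returns one value per row, an aggregate function returns one value per group of rows) can be illustrated with a minimal sketch. It is not part of the commit and does not run on Spark: it assumes Python's standard-library `sqlite3` module as a stand-in SQL engine, and the `people` table and its rows are hypothetical. The functions it calls (`upper`, `length`, `count`, `avg`, `max`, `min`) are chosen because they are common SQL functions; exact behavior in Spark SQL may differ.

```python
import sqlite3

# Stand-in engine: SQLite from Python's standard library, NOT Spark.
# The table name and sample rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, dept TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("alice", "eng", 30), ("bob", "eng", 40), ("carol", "ops", 50)],
)

# Scalar functions: one result per input row (3 rows in, 3 rows out).
scalar_rows = conn.execute(
    "SELECT upper(name), length(name) FROM people ORDER BY name"
).fetchall()

# Aggregate functions: one result per group of rows (2 groups, 2 rows out).
aggregate_rows = conn.execute(
    "SELECT dept, count(*), avg(age), max(age), min(age) "
    "FROM people GROUP BY dept ORDER BY dept"
).fetchall()

print(scalar_rows)
print(aggregate_rows)
```

The scalar query returns as many rows as the table holds, while the aggregate query returns one row per `dept` value.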

7 changes: 5 additions & 2 deletions docs/sql-ref-functions.md
@@ -27,13 +27,16 @@ Built-in functions are commonly used routines that Spark SQL predefines and a co
Spark SQL has some categories of frequently-used built-in functions for aggregation, arrays/maps, date/timestamp, and JSON data.
This subsection presents the usages and descriptions of these functions.

-* [Aggregate Functions](sql-ref-functions-builtin.html#aggregate-functions)
-* [Window Functions](sql-ref-functions-builtin.html#window-functions)
+#### Scalar Functions
* [Array Functions](sql-ref-functions-builtin.html#array-functions)
* [Map Functions](sql-ref-functions-builtin.html#map-functions)
* [Date and Timestamp Functions](sql-ref-functions-builtin.html#date-and-timestamp-functions)
* [JSON Functions](sql-ref-functions-builtin.html#json-functions)

+#### Aggregate-like Functions
+* [Aggregate Functions](sql-ref-functions-builtin.html#aggregate-functions)
+* [Window Functions](sql-ref-functions-builtin.html#window-functions)
+
### UDFs (User-Defined Functions)

User-Defined Functions (UDFs) are a feature of Spark SQL that allows users to define their own functions when the system's built-in functions are not enough to perform the desired task. To use UDFs in Spark SQL, users must first define the function, then register the function with Spark, and finally call the registered function. The User-Defined Functions can act on a single row or act on multiple rows at once. Spark SQL also supports integration of existing Hive implementations of UDFs, UDAFs and UDTFs.
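
Editor's note: the define, register, call sequence described in the paragraph above can be sketched as follows. This is not part of the commit and is not Spark code: it assumes Python's standard-library `sqlite3` module as a stand-in engine, where `create_function` and `create_aggregate` play the role that UDF registration plays in Spark (for example, `spark.udf.register` in PySpark). The names `add_one`, `ValueRange`, `value_range`, and the table `t` are hypothetical.

```python
import sqlite3


# Step 1: define the functions (all names here are hypothetical).
def add_one(x):
    """Scalar UDF: acts on a single row at a time."""
    return x + 1


class ValueRange:
    """Aggregate UDF: acts on multiple rows and returns max - min."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def step(self, value):
        # Called once per input row in the group.
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def finalize(self):
        # Called once per group to produce the single result.
        return None if self.lo is None else self.hi - self.lo


# Stand-in engine: SQLite, NOT Spark.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (5,), (9,)])

# Step 2: register the functions under SQL-visible names.
conn.create_function("add_one", 1, add_one)
conn.create_aggregate("value_range", 1, ValueRange)

# Step 3: call the registered functions from SQL.
per_row = conn.execute("SELECT add_one(v) FROM t ORDER BY v").fetchall()
per_group = conn.execute("SELECT value_range(v) FROM t").fetchone()

print(per_row)
print(per_group)
```

The scalar UDF produces one value for each of the three rows, while the aggregate UDF folds all three rows into a single value.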