Can someone explain Redshift to me in detail?

EXPLAIN

Displays the execution plan for a query statement without executing the query.

Syntax

EXPLAIN [ VERBOSE ] query

Parameters

VERBOSE

Displays the full query plan, not just a summary.

query

The query statement to be explained. The query can be a SELECT, INSERT, CREATE TABLE AS, UPDATE, or DELETE statement.

Usage notes

EXPLAIN performance is sometimes affected by the time it takes to create temporary tables. For example, a query that uses the common subexpression optimization requires temporary tables to be created and analyzed before the EXPLAIN output can be returned. The query plan depends on the schema and statistics of those temporary tables. As a result, the EXPLAIN command for this type of query may take longer than expected to complete.

You can only use EXPLAIN for the following commands:

  • SELECT

  • SELECT INTO

  • CREATE TABLE AS

  • INSERT

  • UPDATE

  • DELETE

The EXPLAIN command does not succeed if you use it for other SQL commands, such as data definition language (DDL) or database operations.
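
As a rough illustration of the list above, the statements below sketch two supported uses and one unsupported use. The EVENT table and its EVENTID and EVENTNAME columns are taken from the Examples section; the table name venue_copy and the literal value 1 are hypothetical and only there for illustration.

```sql
-- Supported: EXPLAIN on a SELECT returns the plan without running the query.
explain
select eventid, eventname
from event;

-- Supported: EXPLAIN on a DELETE only shows the plan; no rows are removed,
-- because EXPLAIN does not execute the statement.
explain
delete from event
where eventid = 1;

-- Not supported: a plain CREATE TABLE is DDL, so this EXPLAIN is expected to fail.
explain
create table venue_copy (venueid integer);
```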

Query planning and execution steps

The execution plan for a given Amazon Redshift query statement breaks down the execution and computation of a query into a discrete sequence of steps and table operations that ultimately return a final result set for the query. For information about query planning, see Processing Queries.

The following table summarizes the steps Amazon Redshift can use in developing an execution plan for a query that a user submits for execution.

EXPLAIN operator            Execution step  Description

SCAN:
Sequential Scan             scan            Amazon Redshift relation scan or table scan operator or step. Scans the whole table sequentially from beginning to end and evaluates the query constraints for every row (Filter) when a WHERE clause is specified. Also used to run INSERT, UPDATE, and DELETE statements.

JOINS: Amazon Redshift chooses among different join operators based on the physical design of the tables being joined, the location of the data needed for the join, and the specific attributes of the query itself. Subquery Scan: Subquery Scan and Append are used to run UNION queries.
Nested Loop                 nloop           The least optimal join; mainly used for cross joins (Cartesian products, with no join condition) and some inequality joins.
Hash Join                   hjoin           Used for inner joins and for left and right outer joins; typically faster than a nested loop join. Hash Join reads the outer table, hashes the joining column, and looks for matches in the inner hash table. This step can spill to disk. (The inner input of hjoin is a hash step, which can be disk-based.)
Merge Join                  mjoin           Also used for inner and outer joins (for join tables that are both distributed and sorted on the joining columns). Typically the fastest Amazon Redshift join algorithm, leaving other cost considerations aside.

AGGREGATION: Operators and steps used in queries that involve aggregate functions and GROUP BY operations.
Aggregate                   aggr            Operator/step for scalar aggregate functions.
HashAggregate               aggr            Operator/step for grouped aggregate functions. Can operate from disk when the hash table spills to disk.
GroupAggregate              aggr            Operator sometimes chosen for grouped aggregate queries when the Amazon Redshift force_hash_grouping configuration setting is off.

SORT: Operators and steps used when queries need to sort or merge result sets.
Sort                        sort            Performs the sorting specified by the ORDER BY clause, as well as sorting for other operations such as UNIONs and joins. Can operate from disk.
Merge                       merge           Produces the final sorted results of a query from intermediate sorted results derived from operations performed in parallel.

EXCEPT, INTERSECT, and UNION operations:
SetOp Except [Distinct]     hjoin           Used for EXCEPT queries. Can operate from disk because the input hash can be disk-based.
Hash Intersect [Distinct]   hjoin           Used for INTERSECT queries. Can operate from disk because the input hash can be disk-based.
Append [All | Distinct]     save            Append used with Subquery Scan to implement UNION and UNION ALL queries. Can operate from disk by virtue of the "save" step.

Miscellaneous/Other:
Hash                        hash            Used for inner joins and for left and right outer joins (provides the input to a hash join). The Hash operator creates the hash table for the inner table of a join. (The inner table is the table that is checked for matches and, in a join of two tables, is usually the smaller of the two.)
Limit                       limit           Evaluates the LIMIT clause.
Materialize                 save            Materializes rows for input to nested loop joins and some merge joins. Can operate from disk.
--                          parse           Used to parse textual input data during a load.
--                          project         Used to rearrange columns and compute expressions, that is, to project data.
Result                      --              Runs scalar functions that do not involve any table access.
--                          return          Returns rows to the leader or client.
Subplan                     --              Used for certain subqueries.
Unique                      unique          Eliminates duplicates from SELECT DISTINCT and UNION queries.
Window                      window          Computes aggregate and ranking window functions. Can operate from disk.

Network operations:
Network (Broadcast)         bcast           Broadcast is also an attribute of Join Explain operators and steps.
Network (Distribute)        dist            Distributes rows to compute nodes for parallel processing by the data warehouse cluster.
Network (Send to Leader)    return          Sends results back to the leader for further processing.

DML operations (operators that modify data):
Insert (using Result)       insert          Inserts data.
Delete (Scan + Filter)      delete          Deletes data. Can operate from disk.
Update (Scan + Filter)      delete, insert  Implemented as a delete followed by an insert.
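
To connect the operators in the table to actual statements, here is a small sketch using the EVENT and VENUE tables from the Examples section. Going by the table, a join with no join condition would be expected to show a Nested Loop, and a grouped aggregate would be expected to show HashAggregate or GroupAggregate. The plan you actually get depends on table design, statistics, and configuration, so treat these as expectations rather than guaranteed output.

```sql
-- Cross join (no join condition): per the table, this is the typical
-- case in which the planner falls back to a Nested Loop.
explain
select eventname, venuename
from event, venue;

-- Grouped aggregate: expected to appear as HashAggregate or GroupAggregate,
-- depending on the force_hash_grouping setting.
explain
select venueid, count(*)
from event
group by venueid;
```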

Examples

For these examples, the sample output may differ depending on the Amazon Redshift configuration.

The following example returns the query plan for a query that selects EVENTID, EVENTNAME, VENUEID, and VENUENAME from the EVENT and VENUE tables:
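
A statement of that shape might look like the sketch below. The join condition on VENUEID is an assumption about how the two tables relate, and the plan output is left out because it varies with the cluster and its configuration.

```sql
-- Plan for a two-table join; EXPLAIN does not run the query itself.
explain
select eventid, eventname, event.venueid, venuename
from event, venue
where event.venueid = venue.venueid;  -- assumed join key
```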

The following example returns the query plan for the same query with verbose output:
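
A sketch of the verbose form follows, again assuming that EVENT and VENUE join on VENUEID. The only change is the VERBOSE keyword, which asks for the full query plan instead of the summary. Output is omitted because it depends on the configuration.

```sql
-- Same join as before, with VERBOSE to request the full plan.
explain verbose
select eventid, eventname, event.venueid, venuename
from event, venue
where event.venueid = venue.venueid;  -- assumed join key
```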

The following example returns the query plan for a CREATE TABLE AS (CTAS) statement:
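
A CTAS statement can be explained the same way. In this sketch the target table name venue_nonulls and the VENUESEATS column are assumptions made for illustration. Because EXPLAIN does not execute the statement, no table is created.

```sql
-- Plan for a CREATE TABLE AS statement; the new table is not actually created.
explain
create table venue_nonulls as   -- hypothetical target table name
select *
from venue
where venueseats is not null;   -- assumed column
```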