sridharavulapati
Spark’s Logical and Physical plans … When, Why, How and Beyond.
Execution plan aim
An execution plan is the set of operations executed to translate a query language statement (SQL, Spark SQL, DataFrame operations, etc.) into a set of optimized logical and physical operations.
In short, it is the sequence of steps that takes a SQL (or Spark SQL) statement to the DAG that will be sent to the Spark executors.
If you don’t know what a DAG is, it stands for “Directed Acyclic Graph”: a graph whose edges have a direction and which contains no cycles. In Spark, the DAGScheduler works on this graph. Its vertices represent RDDs, and its edges represent the operations (transformations and actions) performed on them.
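To make the vertices-and-edges idea concrete, here is a minimal sketch in plain Python (no Spark involved). The RDD names, the `EDGES` list and the two helper functions are made up for illustration; they are not part of Spark's API. The sketch only shows what "directed" and "acyclic" buy you: an order in which every RDD comes after the RDDs it depends on.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical lineage, one tuple per edge: (parent RDD, operation, child RDD).
EDGES = [
    ("lines", "flatMap", "words"),
    ("words", "map", "pairs"),
    ("pairs", "reduceByKey", "counts"),
    ("counts", "join", "joined"),
    ("lookup", "join", "joined"),
]


def execution_order(edges):
    """Return the vertices so that every parent comes before its children."""
    parents = {}
    for parent, _operation, child in edges:
        parents.setdefault(parent, set())
        parents.setdefault(child, set()).add(parent)
    # static_order() raises CycleError if the graph contains a cycle.
    return list(TopologicalSorter(parents).static_order())


def is_acyclic(edges):
    """Return True if the edges form a valid DAG, False if they contain a cycle."""
    try:
        execution_order(edges)
        return True
    except CycleError:
        return False
```

Calling `execution_order(EDGES)` yields an order in which `lines` always precedes `words`, `words` precedes `pairs`, and so on. Adding an edge such as `("joined", "map", "lines")` would close a loop, and `is_acyclic` would then return `False`.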
In Spark, the optimizer is named “Catalyst” and can be represented by the diagram below. It produces different types of plans:

The operation names are:
- Analysis
- Logical Optimization
- Physical Planning
- Cost Model Analysis
- Code Generation
These operations produce the following plans:
- Unresolved logical plan
- Resolved logical plan
- Optimized logical plan
- Physical plans
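You can look at these plans yourself with `DataFrame.explain(extended=True)`, which prints four sections headed `== Parsed Logical Plan ==`, `== Analyzed Logical Plan ==`, `== Optimized Logical Plan ==` and `== Physical Plan ==`. The first two correspond to the unresolved and resolved logical plans listed above. Below is a small PySpark sketch, assuming `pyspark` is installed and a local session is acceptable. The helper names (`split_plan_sections`, `explain_sections`, `main`) and the sample query are made up for illustration.

```python
import contextlib
import io
import re


def split_plan_sections(explain_text):
    """Split extended explain output into {section header: plan text}."""
    sections = {}
    current = None
    for line in explain_text.splitlines():
        match = re.fullmatch(r"== (.+) ==", line.strip())
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def explain_sections(df):
    """Capture what df.explain(extended=True) prints and split it by plan."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        df.explain(extended=True)
    return split_plan_sections(buffer.getvalue())


def main():
    # Imported here so the helpers above can be used without Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]").appName("plans-demo").getOrCreate()
    )
    # Arbitrary example query: keep even ids, then double them.
    df = spark.range(100).filter("id % 2 = 0").selectExpr("id * 2 AS doubled")
    for name, plan in explain_sections(df).items():
        print(f"--- {name} ---")
        print(plan)
    spark.stop()
```

Calling `main()` prints each plan under its own header, which makes it easy to compare what the query looks like before analysis, after optimization, and once a physical plan has been chosen. The exact plan text depends on your Spark version and configuration.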