Chapter 19

Internals of ggplot2

Author

Aditya Dahiya

Published

September 26, 2024

This chapter has no exercises, so I will try to summarize some of its concepts using Mermaid flowcharts in Quarto.

19.1 The ggplot2 Plot Rendering Process

Figure 1 illustrates the five key steps involved in rendering a ggplot2 object into an image. It was generated using Quarto’s native support for Mermaid diagrams. (Credits: code help from ChatGPT.)

Code
flowchart TD
    A["1. Create ggplot object"] --> B["2. ggplot_build(): Prepare data for each layer"]
    B --> C["3. ggplot_gtable(): Convert data to graphical elements (gtable)"]
    C --> D["4. grid::grid.newpage(): Create new image page"]
    D --> E["5. grid::grid.draw(): Draw gtable on the image"]

Figure 1: A flowchart of the five steps by which a ggplot2 object is drawn as a graphic.
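
The five steps in Figure 1 can also be run by hand instead of relying on printing the plot. The following is a minimal sketch, assuming the mpg dataset that ships with ggplot2; the scatter plot itself is only an illustrative choice.

```r
library(ggplot2)
library(grid)

# Step 1: create the plot object (nothing is drawn yet)
p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Step 2: build the plot, preparing the data for each layer
built <- ggplot_build(p)

# Step 3: convert the built plot into a gtable of graphical objects
gt <- ggplot_gtable(built)

# Steps 4 and 5: open a fresh page on the graphics device and draw the gtable
grid.newpage()
grid.draw(gt)
```

Printing `p` at the console should produce the same image, since printing a plot goes through these same steps.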

19.2 Steps in the ggplot2 Build Process

  • Data Preparation:
    • ggplot_build() starts by preparing data for each layer of the plot.
    • Each layer can provide its own data, inherit the global data, or use a function to generate data.
    • The data is passed through the plot’s layout, which organizes coordinate systems and facets (different sections of the plot).
    • The PANEL column is added to the data, which ensures that each row is linked to a specific plot panel.
  • Data Transformation:
    • Any transformations (e.g., log scaling) specified in the scales are applied to the data first.
    • Position aesthetics are mapped next. For continuous positions, the scale's out-of-bounds (oob) function is applied, which by default replaces values outside the limits with NA; discrete positions are matched to integers, and binned positions are cut into bins.
    • Statistical transformations (e.g., smoothing or binning) are then computed by each layer's stat.
    • After this, each geom gets a chance to modify the data, and the layer's position adjustment (e.g., jittering, stacking, or dodging) is applied.
    • Finally, all non-positional aesthetics (e.g., colors, line types) are trained and mapped, and the data is ready for rendering.
  • Final Output:
    • The result of ggplot_build() is a list-like object of class ggplot_built. It holds the computed data for each layer, a Layout object describing the trained coordinate system and faceting, and a copy of the original plot object, now with trained scales.
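
The data preparation steps above can be checked on a small faceted plot. This is a sketch only, assuming the mpg dataset from ggplot2; layer_data() runs the build step and returns the computed data frame for one layer.

```r
library(ggplot2)

# A faceted plot, so that there is more than one panel
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~drv)

# Build the plot and extract the data frame of the first layer
pts <- layer_data(p, 1)

# The aesthetics have been evaluated into x and y, and every row
# carries the PANEL it belongs to
head(pts[, c("x", "y", "PANEL", "group")])

# Number of rows assigned to each panel
table(pts$PANEL)
```

Each level of PANEL corresponds to one facet, which is how later steps know where every row should be drawn.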
Code
flowchart TD
    A{{Data Preparation}} --> B{{Gather Data}}
    B --> C{{Add PANEL Column}}
    C --> D{{Data Transformation}}
    D --> E{{Apply Scale Transformations}}
    E --> F{{Map Position Aesthetics}}
    F --> G{{Perform Statistical Transformations}}
    G --> H{{Adjust Geometry Positions}}
    H --> I{{Map Non-Positional Aesthetics}}
    I --> J{{Final Output}}

Figure 2: Flowchart illustrating the step-by-step process of data preparation and transformation in ggplot2, leading to the final output of a plot.
Code
flowchart TD
    A{{Start}} --> B{{Layer Data Frame}}
    B --> C{{Add PANEL Column}}
    C --> D{{Coordinate Transformed Data}}
    D --> E{{Faceted Data}}
    E --> F{{Calculated Aesthetics}}
    F --> G{{Position Mapped Data}}
    G --> H{{Statistical Transformed Data}}
    H --> I{{Position Adjusted Data}}
    I --> J{{"Final Aesthetic Mapped Data (a list object)"}}

Figure 3: Flowchart listing each intermediate step along with the data frames that are created or transformed.
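
Two of these intermediate data frames are easy to inspect directly. The sketch below, again assuming the mpg dataset, shows the effect of a scale transformation and of a statistical transformation on the layer data; the object names are illustrative.

```r
library(ggplot2)

# Scale transformation: with a log10 x scale, the x column of the
# layer data holds log10(displ) rather than displ
p_log <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_x_log10()
pts <- layer_data(p_log, 1)
range(pts$x)
range(log10(mpg$displ))

# Statistical transformation: geom_bar() uses a counting stat, so the
# layer data has one row per class instead of one row per car
p_bar <- ggplot(mpg, aes(class)) + geom_bar()
bars <- layer_data(p_bar, 1)
bars[, c("x", "count")]
```

In the bar chart, the discrete class values have also been mapped to integer positions in the x column, as described for discrete position scales above.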

19.3 The gtable step

This section explains how ggplot_gtable() converts the output of the build step into a graphical table (gtable) for rendering. Figure 4 illustrates how plot data is transformed into graphical objects (grobs), how panels and legends are assembled, and how final elements like titles and margins are added to produce a complete gtable that is ready to be drawn.

Code
flowchart TD
    A("ggplot_gtable()") --> B("Convert Data to Grobs")
    B --> C("Split Data by PANEL and Group")
    C --> D("Coordinate Transformation (Normalize Data)")
    D --> E("Convert Layers to gList of Grobs")
    E --> F("Facet Collects Grobs per Panel")
    F --> G("Assemble Panels into gtable")
    G --> H("Render Axes and Panels")
    H --> I("Train and Merge Legends")
    I --> J("Create Key Grobs for Legends")
    J --> K("Assemble Legend gtable")
    K --> L("Add Title, Subtitle, Caption, Tag")
    L --> M("Add Background and Margins")
    M --> N("Final gtable Object")

Figure 4: Flowchart illustrating the gtable step in ggplot2, where data is transformed into graphical objects, panels and legends are assembled, and final plot adornments are added, resulting in a complete gtable ready for grid-based drawing.
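
The result of the gtable step can be inspected before anything is drawn. The following sketch assumes the mpg dataset and an arbitrary title; the exact set of cell names may differ between ggplot2 versions.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  labs(title = "Engine size and highway mileage")

# Run the build step, then the gtable step
gt <- ggplot_gtable(ggplot_build(p))

# The result is a gtable, which is itself a grob
class(gt)

# Each row of the layout names one cell of the table, for example
# the background, the panel, the axes, the legend, and the title
gt$layout$name

# There is one grob for every row of the layout
length(gt$grobs) == nrow(gt$layout)
```

The names in the layout correspond to the later boxes in Figure 4: panels and axes are assembled first, and legends, titles, and the background are added around them.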

19.4 Introducing ggproto

What is a ggproto Object?

  • ggproto is the object-oriented system used inside ggplot2. It creates R objects that bundle data (fields) and functions (methods), so that state and functionality are kept together in a controlled manner.
  • ggproto supports familiar object-oriented programming (OOP) concepts such as inheritance, encapsulation, and polymorphism, which makes it easier to build complex and reusable components.

Structure of ggproto Objects

  • A ggproto object is created using the ggproto() function. It typically includes:
    • Fields: These can store data or parameters relevant to the object.
    • Methods: These define the functions that can be performed on the object or by the object.
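
As a small illustration of fields and methods, the sketch below defines a toy object. Counter and its members are hypothetical names invented for this example and are not part of ggplot2; only ggproto() itself is.

```r
library(ggplot2)

# A hypothetical ggproto object with one field and two methods.
# Methods that take `self` as their first argument can read and
# modify the fields of the object they are called on.
Counter <- ggproto("Counter", NULL,
  count = 0,                          # field
  add = function(self, n = 1) {       # method that updates the field
    self$count <- self$count + n
    invisible(self)
  },
  current = function(self) {          # method that reads the field
    self$count
  }
)

Counter$add(2)
Counter$current()
```

Because ggproto objects are mutable, the call to add() changes Counter itself rather than returning a modified copy.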

Key Characteristics

  1. Inheritance:
    • ggproto allows one object to inherit properties and methods from another. For example, GeomPoint, the ggproto object behind geom_point(), inherits from the more general Geom object, sharing common functionality while allowing for customization.
  2. Encapsulation:
    • Each ggproto object can have its own internal state (data and methods), making it self-contained. This helps manage complexity by organizing related functions and data together.
  3. Polymorphism:
    • Different ggproto objects can have methods with the same name, allowing them to behave differently based on their class. This means you can call the same function on different ggproto objects, and they will respond according to their specific implementation.
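
Inheritance and polymorphism can be sketched with two toy objects. Animal and Dog are hypothetical names used only for illustration; the final lines look at a real ggplot2 object.

```r
library(ggplot2)

# A hypothetical parent object
Animal <- ggproto("Animal", NULL,
  name = "animal",
  sound = function(self) "...",
  describe = function(self) paste(self$name, "says", self$sound())
)

# A child that inherits describe() but overrides name and sound()
Dog <- ggproto("Dog", Animal,
  name = "dog",
  sound = function(self) "woof"
)

# The same method call behaves differently depending on the object
Animal$describe()
Dog$describe()

# Real ggplot2 objects follow the same pattern
class(GeomPoint)
inherits(GeomPoint, "Geom")
```

Dog never defines describe(); the method is found on Animal, but it runs with self set to Dog, so it picks up the overridden name and sound().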

Use in ggplot2

  1. Geoms, Stats, and Coordinates:
    • Each component of a plot in ggplot2 (like geoms, stats, and coordinate systems) is represented as a ggproto object. For instance:
      • GeomPoint, the ggproto object used by geom_point(), defines how to draw points on a plot.
      • StatSmooth, the ggproto object used by stat_smooth(), defines a statistical transformation that fits a smoother to the data.
  2. Customization:
    • Users can create their own custom geoms or stats by defining a new ggproto object that extends the existing ones, making it easy to customize the behavior of plots without altering the core ggplot2 functionality.
  3. Efficiency:
    • ggproto is a lightweight system. According to its documentation, it was inspired by the proto package but was designed to be faster and to cleanly support inheritance across packages.
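
To make the customization point concrete, here is a sketch of a custom stat that keeps only the points on the convex hull of the data, a pattern commonly used to introduce ggplot2 extensions. StatChull and stat_chull() are illustrative names defined here and are not exported by ggplot2.

```r
library(ggplot2)

# A new stat that extends the base Stat object. compute_group() receives
# the data for one group and returns only the rows on its convex hull.
StatChull <- ggproto("StatChull", Stat,
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  },
  required_aes = c("x", "y")
)

# A constructor function that wraps the stat in a layer
stat_chull <- function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# Usage: draw the points and outline their convex hull
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_chull(fill = NA, colour = "black")

# The second layer's data contains only the hull points
hull <- layer_data(p, 2)
nrow(hull)
```

Only compute_group() and required_aes are defined here; everything else is inherited from Stat, which is what keeps the extension short.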

Thus, understanding ggproto is essential for anyone looking to customize or extend the ggplot2 system effectively.

Imagine ggproto as the quirky, mad scientist of the ggplot2 lab, mixing up wild potions of plots and graphs. Each ggproto object is like a unique recipe card that says, “Add a dash of data here, sprinkle some fancy methods there,” allowing you to whip up custom visualizations that are both deliciously informative and visually appealing.