<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Julia Community 🟣: Roland Schätzle</title>
    <description>The latest articles on Julia Community 🟣 by Roland Schätzle (@rolandka).</description>
    <link>https://forem.julialang.org/rolandka</link>
    <image>
      <url>https://forem.julialang.org/images/MuyDgePE6VZoo6upJ_5pcJFjr7p8vgC895njW41sUDk/rs:fill:90:90/g:sm/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L3VzZXIvcHJvZmls/ZV9pbWFnZS84OTQv/OTZlNzIzOWUtMzFm/Yy00OTgyLTliN2Ut/ZGNhYmIzZTNlMmZk/LmpwZWc</url>
      <title>Julia Community 🟣: Roland Schätzle</title>
      <link>https://forem.julialang.org/rolandka</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.julialang.org/feed/rolandka"/>
    <language>en</language>
    <item>
      <title>Statistical Plotting with Julia: VegaLite.jl</title>
      <dc:creator>Roland Schätzle</dc:creator>
      <pubDate>Fri, 09 Dec 2022 16:17:37 +0000</pubDate>
      <link>https://forem.julialang.org/rolandka/statistical-plotting-with-julia-vegalitejl-3f85</link>
      <guid>https://forem.julialang.org/rolandka/statistical-plotting-with-julia-vegalitejl-3f85</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;How to create statistical plots using the VegaLite.jl package&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the &lt;em&gt;second&lt;/em&gt; of several articles where I compare different Julia graphics packages for creating statistical plots. I've started with the Gadfly package (&lt;a href="https://forem.julialang.org/rolandka/statistical-plotting-with-julia-gadflyjl-2lmo"&gt;&lt;em&gt;Statistical Plotting with Julia: Gadfly.jl&lt;/em&gt;&lt;/a&gt;, [SPJ02]) and continue the series here with the VegaLite package.&lt;/p&gt;

&lt;p&gt;In the introduction to the series (&lt;a href="https://forem.julialang.org/rolandka/the-grammar-of-graphics-or-how-to-do-ggplot-style-plotting-in-julia-1fkp"&gt;&lt;em&gt;The Grammar of Graphics or how to do ggplot-style plotting in Julia&lt;/em&gt;&lt;/a&gt;, [SPJ01]), I've explained the Grammar of Graphics (GoG) which is the conceptual base for these graphics packages. There I've also introduced the data which will be used for the plotting examples.&lt;/p&gt;

&lt;p&gt;The objective of this article (and the ones which will follow in the series) is to reproduce the visualizations from [SPJ02] using the exact same data, but each time of course with another graphics package in order to achieve a 1:1 comparison of all packages.&lt;/p&gt;

&lt;p&gt;Therefore the descriptions of these visualizations in the following text are identical to the ones in [SPJ02], i.e. this article is self-contained (you can read and understand it without having read [SPJ02]). It also has the same structure (headlines etc.) as [SPJ02], so that it is easy to make a side-by-side comparison.&lt;/p&gt;

&lt;h2&gt;
  
  
  VegaLite
&lt;/h2&gt;

&lt;p&gt;VegaLite.jl is (like Gadfly.jl) a very complete implementation of the Grammar of Graphics (GoG). It has been written by a group of more than 20 &lt;a href="https://github.com/queryverse/VegaLite.jl/graphs/contributors"&gt;contributors&lt;/a&gt; led by Prof. David Anthoff (University of California, Berkeley). VegaLite is part of a larger ecosystem of data science packages (called &lt;a href="https://www.queryverse.org/"&gt;Queryverse&lt;/a&gt;) which includes query languages (Query.jl), tools for file IO and UI tools (ElectronDisplay.jl).&lt;/p&gt;

&lt;p&gt;Technically VegaLite takes quite a different approach: Whereas Gadfly is completely written in Julia, VegaLite is more like a language interface for the &lt;a href="https://vega.github.io/vega-lite/"&gt;&lt;em&gt;Vega-Lite&lt;/em&gt;&lt;/a&gt; graphics package (note the dash in its name in contrast to &lt;em&gt;VegaLite&lt;/em&gt;, which denotes the Julia package). Vega-Lite takes &lt;em&gt;specifications&lt;/em&gt; of visualizations in JSON format as inputs which the Vega-Lite compiler transforms into the corresponding visualizations. &lt;/p&gt;

&lt;p&gt;Vega-Lite is completely independent of the Julia ecosystem and apart from VegaLite there exist interfaces for other languages like JavaScript, Python, R or Scala (see "&lt;a href="https://vega.github.io/vega-lite/ecosystem.html"&gt;Vega-Lite Ecosystem&lt;/a&gt;" for a complete list).&lt;/p&gt;

&lt;p&gt;As Vega-Lite uses JSON as its input format, these specifications have a rather declarative nature. VegaLite tries to mimic this format with the &lt;code&gt;@vlplot&lt;/code&gt;-macro, which is the basis for all visualizations, as we will see in the following examples. This makes it less Julian than e.g. Gadfly, but on the other hand it has the advantage that somebody who is familiar with Vega-Lite will easily learn how to use VegaLite. And if something is missing in the VegaLite documentation, it is often easy to find the corresponding part in the Vega-Lite docs.&lt;/p&gt;

&lt;p&gt;A distinguishing feature of Vega-Lite (as well as VegaLite) is its interactivity. Its specifications may describe not only a visualization but also events, points of interest and rules about how to react to these events. But this feature is beyond the scope of this article. For readers interested in this aspect, I recommend having a look at the &lt;a href="https://vega.github.io/vega-lite/"&gt;Vega-Lite home page&lt;/a&gt; or the paper "&lt;a href="https://ieeexplore.ieee.org/document/7539624"&gt;Vega-Lite: A Grammar of Interactive Graphics&lt;/a&gt;".&lt;/p&gt;

&lt;h2&gt;
  
  
  Example Plots
&lt;/h2&gt;

&lt;p&gt;As in the preceding article, I will use for the comparison a few diagram types (or geometries as they are called by the GoG) which are commonly used in data science, namely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;bar plots&lt;/li&gt;
&lt;li&gt;scatter plots&lt;/li&gt;
&lt;li&gt;histograms&lt;/li&gt;
&lt;li&gt;box plots&lt;/li&gt;
&lt;li&gt;violin plots&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;VegaLite of course offers many more types, as you can see in this &lt;a href="https://vega.github.io/vega-lite/examples/"&gt;gallery&lt;/a&gt;. But in order to obtain a 1:1 comparison between all packages, I stuck with the types listed above.&lt;/p&gt;

&lt;p&gt;The data for the examples is assumed to be ready in the DataFrames structures &lt;code&gt;countries&lt;/code&gt;, &lt;code&gt;subregions_cum&lt;/code&gt; and &lt;code&gt;regions_cum&lt;/code&gt; presented in the &lt;a href="https://forem.julialang.org/rolandka/the-grammar-of-graphics-or-how-to-do-ggplot-style-plotting-in-julia-1fkp"&gt;introductory article&lt;/a&gt; [SPJ01] to the series.&lt;/p&gt;

&lt;p&gt;Most plots are first presented in a basic version, using the defaults of the graphics package, and are then refined using customized attributes (for labels, background color etc.).&lt;/p&gt;

&lt;h2&gt;
  
  
  Bar Plots
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Population by Region
&lt;/h3&gt;

&lt;p&gt;As in [SPJ02] we start with a simple bar chart that shows population size (in 2019) by region. This is done using the following &lt;code&gt;@vlplot&lt;/code&gt;-command, mapping data to aesthetics and using a bar geometry as we learned in the introductory article about the Grammar of Graphics. Julia’s pipeline syntax (&lt;code&gt;|&amp;gt;&lt;/code&gt;) is used to specify the &lt;code&gt;regions_cum&lt;/code&gt;-DataFrame as the input to &lt;code&gt;@vlplot&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;regions_cum&lt;/span&gt; &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt;
    &lt;span class="nd"&gt;@vlplot&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;
         &lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
         &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;bar&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
         &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Region&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Pop2019&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Region&lt;/span&gt;
     &lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This results in the following bar chart:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/OY9ySgNB1Dl9RnGNWShHY-ba6_1Qi1VzU73-5C6oN84/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzLzJ6/ZDRpYzJxNHRia3Nr/cGs2czE1LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/OY9ySgNB1Dl9RnGNWShHY-ba6_1Qi1VzU73-5C6oN84/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzLzJ6/ZDRpYzJxNHRia3Nr/cGs2czE1LnBuZw" alt="region by population - 1" width="880" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a second version we don’t rely on defaults, but set axis labels, title and background color manually. Apart from that, we want the bar labels on the x-axis with a horizontal orientation for better readability. This leads to the following code, where &lt;code&gt;title&lt;/code&gt;-attributes are used for the labels as well as the diagram title, an &lt;code&gt;axis&lt;/code&gt;-attribute for changing the orientation of the bar labels and a &lt;code&gt;config&lt;/code&gt; for general attributes like background color.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
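
&lt;p&gt;A minimal sketch of such a specification might look as follows. The title texts and the &lt;code&gt;ghostwhite&lt;/code&gt; background are assumptions chosen for illustration; &lt;code&gt;labelAngle&lt;/code&gt; and &lt;code&gt;background&lt;/code&gt; are the Vega-Lite names for the label orientation and the background color.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;using VegaLite   # regions_cum is the DataFrame from [SPJ01]

regions_cum |&amp;gt;
    @vlplot(
        width = 600, height = 300,
        # diagram title (the text is an assumption)
        title = "Population by Region, 2019",
        :bar,
        # labelAngle = 0 prints the bar labels horizontally
        x = {:Region, title = "Region", axis = {labelAngle = 0}},
        y = {:Pop2019, title = "Population"},
        color = :Region,
        # general attributes like the background color go into config
        config = {background = "ghostwhite"}
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;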



&lt;p&gt;… creating the following beautified bar chart:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/Wa3193HvuEvyXiWyGjUlA4cDczlPR7EXsQIGl2WtNSg/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3Q4/eGc1amRicndnZ3Fo/OW13YWVvLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/Wa3193HvuEvyXiWyGjUlA4cDczlPR7EXsQIGl2WtNSg/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3Q4/eGc1amRicndnZ3Fo/OW13YWVvLnBuZw" alt="region by population - 2" width="880" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Population by Subregion
&lt;/h3&gt;

&lt;p&gt;The next bar chart depicts population by subregion using the following &lt;code&gt;@vlplot&lt;/code&gt;-command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;subregions_cum&lt;/span&gt; &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt;
    &lt;span class="nd"&gt;@vlplot&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;bar&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Subregion&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Pop2019&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Region&lt;/span&gt;
    &lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;… resulting in the following bar chart:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/XnYD3K9vRwYx_AiZL5zF9ZU_AnpuBTy_Tj429Tvv7nM/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2N4/eG1namlrbGJjN255/eHRpaWtnLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/XnYD3K9vRwYx_AiZL5zF9ZU_AnpuBTy_Tj429Tvv7nM/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2N4/eG1namlrbGJjN255/eHRpaWtnLnBuZw" alt="population by subregion - 1" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that there is room for improvement: As there are quite a few subregions and their names are relatively long, a horizontal bar diagram might be more readable. Apart from this, we again adapt labels, title, background color etc., leading to the following code, where we switch to a horizontal layout just by swapping the data attributes for the x- and the y-axis:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
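
&lt;p&gt;As a sketch under the same assumptions as before (illustrative title texts and background color), the swap of the two axes could be written like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;using VegaLite   # subregions_cum is the DataFrame from [SPJ01]

subregions_cum |&amp;gt;
    @vlplot(
        width = 600, height = 300,
        title = "Population by Subregion, 2019",   # assumed title text
        :bar,
        # the numerical column now goes to the x-axis ...
        x = {:Pop2019, title = "Population"},
        # ... and the categorical column to the y-axis, giving horizontal bars
        y = {:Subregion, title = "Subregion"},
        color = :Region,
        config = {background = "ghostwhite"}
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;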



&lt;p&gt;… resulting indeed in a more readable bar chart:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/NGV6y3PRy61U50_GiVAxMeWzUQcD5xkdLJlC02YRric/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2cz/cTJrcWtxbGFpcjEy/em9jM2VoLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/NGV6y3PRy61U50_GiVAxMeWzUQcD5xkdLJlC02YRric/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2cz/cTJrcWtxbGFpcjEy/em9jM2VoLnBuZw" alt="population by subregion - 2" width="880" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It gets even more readable if we sort the subregions by population size before rendering the diagram. We could sort the &lt;code&gt;subregions_cum&lt;/code&gt;-DataFrame using Julia (as we did in the Gadfly example), but VegaLite offers the possibility to sort the data in the graphics engine using the &lt;code&gt;sort&lt;/code&gt;-attribute.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
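
&lt;p&gt;One way to express this, sketched here with assumed title texts, is to give the y-encoding a &lt;code&gt;sort&lt;/code&gt;-attribute that refers to the x-encoding. In Vega-Lite the value &lt;code&gt;"-x"&lt;/code&gt; requests a descending order by the x-values (&lt;code&gt;"x"&lt;/code&gt; would be ascending):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;using VegaLite   # subregions_cum is the DataFrame from [SPJ01]

subregions_cum |&amp;gt;
    @vlplot(
        width = 600, height = 300,
        title = "Population by Subregion, 2019",   # assumed title text
        :bar,
        x = {:Pop2019, title = "Population"},
        # sort the subregions by the value on the x-axis, largest first
        y = {:Subregion, title = "Subregion", sort = "-x"},
        color = :Region,
        config = {background = "ghostwhite"}
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;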


&lt;p&gt;If we apply this code we finally get:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/PYLCGI-5KqHR9rvNiZu7awzjgG0nR3QV3TjEssSGOm4/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2d6/a2FtYXU4YTQ4Mnhm/dzdvNWFiLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/PYLCGI-5KqHR9rvNiZu7awzjgG0nR3QV3TjEssSGOm4/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2d6/a2FtYXU4YTQ4Mnhm/dzdvNWFiLnBuZw" alt="population by subregion - 3" width="880" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A word of caution at this point: While it is possible to sort data within the graphics engine, I wouldn’t recommend it for larger data sets, because it is considerably slower than sorting the DataFrame directly in Julia.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scatter Plots
&lt;/h2&gt;

&lt;p&gt;In the next step we have a look at the population at the country level in relation to the growth rate. A scatter plot is a good way to visualize this relationship. We get one, using a point geometry as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;countries&lt;/span&gt; &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt;
    &lt;span class="nd"&gt;@vlplot&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Pop2019&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;PopChangePct&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Region&lt;/span&gt;
    &lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;… resulting in this scatter plot:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/-di-TvWYG_r0Xr6LV5Ya5wZRNOgFJwkdV2KTyUoYJi0/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2p2/OGp1cDM5YXVvdjM4/NHFmdGh6LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/-di-TvWYG_r0Xr6LV5Ya5wZRNOgFJwkdV2KTyUoYJi0/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2p2/OGp1cDM5YXVvdjM4/NHFmdGh6LnBuZw" alt="population in relation to growth rate - 1" width="880" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we also mapped the region to the color aesthetic, we get a more differentiated picture that includes region information as well.&lt;/p&gt;

&lt;p&gt;But the distribution of the data is quite skewed: most countries have a population below 200 million. So a logarithmic scale on the x-axis might give a better insight into the data. And again, we add some labels, background color etc., leading to the following code:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
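
&lt;p&gt;A sketch of such a specification, again with assumed label texts, could use the &lt;code&gt;scale&lt;/code&gt;-attribute of the x-encoding to switch to a logarithmic scale:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;using VegaLite   # countries is the DataFrame from [SPJ01]

countries |&amp;gt;
    @vlplot(
        width = 600, height = 300,
        title = "Population in relation to growth rate",   # assumed title text
        :point,
        # type = "log" replaces the default linear scale on the x-axis
        x = {:Pop2019, title = "Population (log scale)", scale = {type = "log"}},
        y = {:PopChangePct, title = "Growth rate [%]"},
        color = :Region,
        config = {background = "ghostwhite"}
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;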



&lt;p&gt;… giving us the following improved scatter plot:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/GH87SfWgEM6pXJJm6gY74g4YflINfbMRlKb7-du2Nek/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3J6/MGcwYWgyZ2ZycmU0/b3dscnV3LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/GH87SfWgEM6pXJJm6gY74g4YflINfbMRlKb7-du2Nek/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3J6/MGcwYWgyZ2ZycmU0/b3dscnV3LnBuZw" alt="population in relation to growth rate - 2" width="880" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Histograms
&lt;/h2&gt;

&lt;p&gt;Bar plots and histograms have the same geometry (in the sense of the “Grammar of Graphics”). But in order to get categorical data on the x-axis, the data used for a histogram has to be mapped to (artificial) categories in a process called &lt;em&gt;binning&lt;/em&gt;. In the GoG this is done using a so-called bin statistic.&lt;/p&gt;

&lt;p&gt;VegaLite follows the GoG strictly. So we get a histogram that shows the distribution of GDP per capita among the different countries with the following &lt;code&gt;@vlplot&lt;/code&gt;-command using a bar geometry with the parameter &lt;code&gt;bin&lt;/code&gt; set to true:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;countries&lt;/span&gt; &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt;
    &lt;span class="nd"&gt;@vlplot&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;bar&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="x"&gt;{&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;GDPperCapita&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bin&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="x"&gt;},&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;“count&lt;/span&gt;&lt;span class="x"&gt;()&lt;/span&gt;&lt;span class="n"&gt;”&lt;/span&gt;
    &lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;… resulting in this histogram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/bjuBLW4twA3AsBJYpQXLUB8rhyGI_qSLKzXEU_Z3uhU/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3R6/amVhZ29qOG82cmx5/aDM3MmZtLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/bjuBLW4twA3AsBJYpQXLUB8rhyGI_qSLKzXEU_Z3uhU/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3R6/amVhZ29qOG82cmx5/aDM3MmZtLnBuZw" alt="distribution of GDP per capita - 1" width="880" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A reasonable bin size has been chosen by default (which wasn’t the case with Gadfly).&lt;/p&gt;

&lt;p&gt;Again we can add labels etc. In order to have exactly the same number of bins as in the Gadfly example, we set it explicitly to 20 using the following code:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
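
&lt;p&gt;A sketch with assumed label texts might replace &lt;code&gt;bin = true&lt;/code&gt; by a bin specification. Note that Vega-Lite’s &lt;code&gt;maxbins&lt;/code&gt; parameter, which is used here, is an upper bound on the number of bins, so the engine may still choose fewer bins to obtain round bin boundaries:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;using VegaLite   # countries is the DataFrame from [SPJ01]

countries |&amp;gt;
    @vlplot(
        width = 600, height = 300,
        title = "Distribution of GDP per capita",   # assumed title text
        :bar,
        # bin statistic with at most 20 bins
        x = {:GDPperCapita, bin = {maxbins = 20}, title = "GDP per capita"},
        # count() aggregates the number of countries per bin
        y = {"count()", title = "Number of countries"},
        config = {background = "ghostwhite"}
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;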



&lt;p&gt;… leading to the following improved histogram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/q3UYbNXvH4PM1j8laIo17ktfIFvPWZZJwTscAoar4bM/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3hj/azBvenRvNDF0cWdq/azIwNWRrLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/q3UYbNXvH4PM1j8laIo17ktfIFvPWZZJwTscAoar4bM/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3hj/azBvenRvNDF0cWdq/azIwNWRrLnBuZw" alt="distribution of GDP per capita - 2" width="880" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Box Plots and Violin Plots
&lt;/h2&gt;

&lt;p&gt;To obtain an insight into the distribution of some numerical data, box plots or violin plots are typically used. Each of these diagram types has its specific virtues. So let’s visualize the distribution of the GDP per capita for each region using these plots.&lt;/p&gt;

&lt;h3&gt;
  
  
  Box Plot
&lt;/h3&gt;

&lt;p&gt;Let’s immediately use the ‘beautified’ version based on a &lt;code&gt;boxplot&lt;/code&gt;-geometry:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
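
&lt;p&gt;As a sketch (title and label texts are assumptions, as is the choice to put the regions on the x-axis), a box plot specification differs from the bar charts above mainly in the geometry:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;using VegaLite   # countries is the DataFrame from [SPJ01]

countries |&amp;gt;
    @vlplot(
        width = 600, height = 300,
        title = "Distribution of GDP per capita by region",   # assumed title text
        :boxplot,
        # one box per region
        x = {:Region, title = "Region", axis = {labelAngle = 0}},
        y = {:GDPperCapita, title = "GDP per capita"},
        color = :Region,
        config = {background = "ghostwhite"}
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;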


&lt;p&gt;… giving us the following box plot:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/30bzSGOVpuCbZP4FszLecT20KXVcVqKd_8i_L9UB1qM/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3Bt/YzdkZnhsbDJtOWtx/YTVkOXFmLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/30bzSGOVpuCbZP4FszLecT20KXVcVqKd_8i_L9UB1qM/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3Bt/YzdkZnhsbDJtOWtx/YTVkOXFmLnBuZw" alt="distribution of GDP per capita by region - 1" width="880" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Violin Plot
&lt;/h3&gt;

&lt;p&gt;As VegaLite doesn’t support violin plots as a geometry on its own, they have to be constructed using &lt;em&gt;density plots&lt;/em&gt; (one for each region) which are lined up horizontally. This leads to the following, rather complicated specification:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
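
&lt;p&gt;A sketch of such a specification, modelled on the violin plot in the Vega-Lite example gallery, might look as follows. The label texts and the styling of the axis and headers are assumptions. &lt;code&gt;value&lt;/code&gt; and &lt;code&gt;density&lt;/code&gt; are the default names of the fields produced by Vega-Lite’s density transform:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;using VegaLite   # countries is the DataFrame from [SPJ01]

countries |&amp;gt;
    @vlplot(
        # area geometry, oriented so that the density extends horizontally
        mark = {:area, orient = "horizontal"},
        # group by region and estimate the density of GDPperCapita per group
        transform = [
            {density = "GDPperCapita", groupby = ["Region"]}
        ],
        y = {"value:q", title = "GDP per capita"},
        # density on the x-axis, centered to get the symmetric violin shape
        x = {"density:q", stack = "center", impute = nothing, title = nothing,
             axis = {labels = false, grid = false, ticks = false}},
        # one density plot per region, lined up horizontally
        column = {"Region:n", header = {titleOrient = "bottom", labelOrient = "bottom"}},
        color = "Region:n",
        config = {view = {stroke = nothing}, background = "ghostwhite"},
        width = 120, spacing = 0
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;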


&lt;p&gt;The basic geometry used to create the density plots is an &lt;em&gt;area&lt;/em&gt; geometry. The data is then grouped by region and for each group the density is computed. This is done using a &lt;code&gt;transform&lt;/code&gt;-operation. Assigning the density to the x-axis results in vertical density plots. In the next step all five density plots are lined up horizontally using the &lt;code&gt;column&lt;/code&gt;-attribute.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;width&lt;/code&gt; and &lt;code&gt;spacing&lt;/code&gt; attributes in the last line define each column (i.e. each density plot) to have a width of 120 pixels horizontally and to leave no space between these plots.&lt;/p&gt;

&lt;p&gt;So we finally get the following violin plot:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/Vjjbm0HOyqdkGfh4McH0ymU9ImOMyJ7JU69uonME55c/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3N5/bWw3YmUzcWh2aHdm/Mmdlcng5LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/Vjjbm0HOyqdkGfh4McH0ymU9ImOMyJ7JU69uonME55c/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3N5/bWw3YmUzcWh2aHdm/Mmdlcng5LnBuZw" alt="distribution of GDP per capita by region - 2" width="880" height="544"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Zooming in
&lt;/h2&gt;

&lt;p&gt;As in the Gadfly examples, we note that the really interesting part of the distributions lies in the range from 0 to $100,000. Therefore we want to restrict the plot to that range on the y-axis, doing a sort of zoom-in.&lt;/p&gt;

&lt;p&gt;In the Gadfly example we restricted the values on the y-axis to this range to achieve the desired effect. Such a restriction can also be specified in VegaLite using &lt;code&gt;scale = {domain = [0, 100000]}&lt;/code&gt;. Unfortunately this doesn’t give us the result we want: The axis is restricted to this range, but the plots themselves still use the whole range up to $200,000 and thus get partly plotted outside the diagram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/ghhRxYVWmLNoLYFUc20--vdeMSUTTNj1z6T0x4NTDXM/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3E5/YW0xZWJ5N2djNXJz/Y2w3d2hzLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/ghhRxYVWmLNoLYFUc20--vdeMSUTTNj1z6T0x4NTDXM/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3E5/YW0xZWJ5N2djNXJz/Y2w3d2hzLnBuZw" alt="distribution of GDP per capita by region - 3" width="880" height="746"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The only way to get a roughly similar result in VegaLite would be to restrict the &lt;em&gt;data&lt;/em&gt; to values up to $100,000 using a &lt;code&gt;filter&lt;/code&gt; expression. But be aware: this is conceptually something different, because the densities are then computed on the filtered data only, so the plots are not exactly the same as a zoom into the plots of the whole dataset. So we don’t have a real solution for this visualization.&lt;/p&gt;
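
&lt;p&gt;To illustrate the difference, here is a sketch of the filter variant for the violin plot (the styling is an assumption). The filter is applied before the density transform, so the density estimate never sees the countries above the threshold:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;using VegaLite   # countries is the DataFrame from [SPJ01]

countries |&amp;gt;
    @vlplot(
        mark = {:area, orient = "horizontal"},
        transform = [
            # keep only countries with a GDP per capita of at most 100,000
            {filter = "datum.GDPperCapita &amp;lt;= 100000"},
            # the density is now estimated on the filtered data only
            {density = "GDPperCapita", groupby = ["Region"]}
        ],
        y = {"value:q", title = "GDP per capita"},
        x = {"density:q", stack = "center", impute = nothing, title = nothing,
             axis = {labels = false, grid = false, ticks = false}},
        column = {"Region:n", header = {titleOrient = "bottom", labelOrient = "bottom"}},
        color = "Region:n",
        width = 120, spacing = 0
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;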

&lt;p&gt;This may be just a problem of the VegaLite documentation, where I couldn’t find any other solution (or my fault for not doing enough research and e.g. using also the extensive documentation of Vega-Lite).&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;As we can see, VegaLite follows the concepts of the Grammar of Graphics quite closely most of the time (even more closely than Gadfly does). That’s one of the reasons why the plot specifications are so consistent (the same things are always specified in the same way, independent of context) and thus easy to learn and to memorize.&lt;/p&gt;

&lt;p&gt;But as we can see with the violin plot, if things are not predefined, the specifications can become quite complex. Together with the rather non-Julian syntax which needs some time to learn and to get used to, I wouldn’t recommend VegaLite to occasional users. It needs some learning and training. But if you invest that time and effort, you get a really powerful (and interactive) visualization tool.&lt;/p&gt;

&lt;p&gt;An interesting add-on to VegaLite, which I would like to mention, is the interactive data explorer &lt;em&gt;Voyager&lt;/em&gt; (see: &lt;a href="https://github.com/queryverse/DataVoyager.jl"&gt;DataVoyager.jl&lt;/a&gt;). It’s an application that allows you to load data and create a variety of visualizations without any programming.&lt;/p&gt;

&lt;p&gt;If you want to try out the examples above yourself, you can get a &lt;a href="https://github.com/roland-KA/StatisticalPlotsWithJulia/blob/main/notebooks/DV-Basics-VegaLite.jl"&gt;Pluto notebook&lt;/a&gt;, which is a sort of executable variant of this article, from my GitHub repository.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Statistical Plotting with Julia: Gadfly.jl</title>
      <dc:creator>Roland Schätzle</dc:creator>
      <pubDate>Mon, 05 Dec 2022 15:58:09 +0000</pubDate>
      <link>https://forem.julialang.org/rolandka/statistical-plotting-with-julia-gadflyjl-2lmo</link>
      <guid>https://forem.julialang.org/rolandka/statistical-plotting-with-julia-gadflyjl-2lmo</guid>
      <description>&lt;p&gt;&lt;small&gt; This article appeared in &lt;a href="https://medium.com/towards-data-science/statistical-plotting-with-julia-gadfly-jl-39582f91d7cc"&gt;Towards Data Science&lt;/a&gt; on Aug 25th, 2022 &lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;How to create statistical plots using the Gadfly.jl package&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the first of several articles where I compare different Julia graphics packages for creating statistical plots. I start the series here with the &lt;a href="http://gadflyjl.org/stable/"&gt;Gadfly-package&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the introduction to the series (&lt;a href="https://forem.julialang.org/rolandka/the-grammar-of-graphics-or-how-to-do-ggplot-style-plotting-in-julia-1fkp"&gt;The Grammar of Graphics or how to do ggplot-style plotting in Julia&lt;/a&gt;), I’ve explained the Grammar of Graphics (GoG) which is the conceptual base for these graphics packages. In that article I’ve also introduced the data which will be used for the plotting examples.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gadfly
&lt;/h2&gt;

&lt;p&gt;Gadfly is a very complete implementation of the Grammar of Graphics. Its original author is &lt;a href="https://github.com/dcjones"&gt;Daniel C. Jones&lt;/a&gt;, but the package currently has more than 100 contributors listed on GitHub. The first versions appeared in 2014. By now it is a very mature package with only a few new releases per year.&lt;/p&gt;

&lt;p&gt;It’s completely written in Julia and plays well with the rest of the Julia ecosystem. There is e.g. a tight integration with &lt;em&gt;DataFrames.jl&lt;/em&gt;, and via the &lt;em&gt;IJulia&lt;/em&gt; package it can be used directly within Jupyter notebooks.&lt;/p&gt;

&lt;p&gt;For the rendering of publication quality graphics it’s able to render SVG out of the box and using &lt;em&gt;Cairo.jl&lt;/em&gt; and &lt;em&gt;Fontconfig.jl&lt;/em&gt; it can also produce formats like PNG, PDF, PS and PGF.&lt;/p&gt;

&lt;p&gt;The plots produced by Gadfly offer some interactivity like panning, zooming and toggling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example Plots
&lt;/h2&gt;

&lt;p&gt;For the comparison I will use a few diagram types (or &lt;em&gt;geometries&lt;/em&gt; as they are called by the GoG) which are commonly used in data science, namely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;bar plots&lt;/li&gt;
&lt;li&gt;scatter plots&lt;/li&gt;
&lt;li&gt;histograms&lt;/li&gt;
&lt;li&gt;box plots&lt;/li&gt;
&lt;li&gt;violin plots&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gadfly of course offers many more types, as you can see in this &lt;a href="http://gadflyjl.org/stable/gallery/geometries/"&gt;gallery&lt;/a&gt;. But in order to obtain a 1:1 comparison between all packages, I stuck with the types listed above.&lt;/p&gt;

&lt;p&gt;The data for the examples is assumed to be ready in the DataFrames structures &lt;code&gt;countries&lt;/code&gt;, &lt;code&gt;subregions_cum&lt;/code&gt; and &lt;code&gt;regions_cum&lt;/code&gt; presented in the introductory article to the series.&lt;/p&gt;

&lt;p&gt;Most plots are first presented in a basic version, using the defaults of the graphics package, and are then refined using customized attributes (for labels, background color etc.).&lt;/p&gt;

&lt;h2&gt;
  
  
  Bar Plots
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Population by Region
&lt;/h3&gt;

&lt;p&gt;We start with a simple bar chart that shows population size (in 2019) by region. This is done using the following &lt;code&gt;plot&lt;/code&gt;-command, which maps data to aesthetics and uses a bar geometry, as we learned in the introductory article about the Grammar of Graphics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regions_cum&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Region&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Pop2019&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Region&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;Geom&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bar&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;… resulting in the following bar chart:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/46YV4HrH3Rhl9rk_29Mx4SvTjWGW1z62zxgbj9lp9ss/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL25w/eGN1ZjhyZHJvMjgy/dm81eGY2LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/46YV4HrH3Rhl9rk_29Mx4SvTjWGW1z62zxgbj9lp9ss/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL25w/eGN1ZjhyZHJvMjgy/dm81eGY2LnBuZw" alt="region by population - 1" width="880" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a second version we don’t rely on defaults, but set axis labels, title and background color manually. Apart from that we don’t want the numbers on the y-axis in scientific format and there should be some space between the bars (to conform to the definition of a bar chart). This leads to the following code, where &lt;code&gt;Guide&lt;/code&gt;-elements are used for the labels, a &lt;code&gt;Scale&lt;/code&gt; for changing the numbers on the y-axis and a &lt;code&gt;Theme&lt;/code&gt; for general attributes like background color or bar spacing.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
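&lt;p&gt;The embedded code is not reproduced here, so the following is only a minimal sketch of what such a call might look like, assuming the &lt;code&gt;Guide&lt;/code&gt;, &lt;code&gt;Scale&lt;/code&gt; and &lt;code&gt;Theme&lt;/code&gt; elements named above. The small &lt;code&gt;regions_cum&lt;/code&gt; table, the label texts, the color &lt;code&gt;"ivory"&lt;/code&gt; and the spacing of &lt;code&gt;1mm&lt;/code&gt; are placeholders chosen for illustration, not values taken from the article:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;using DataFrames, Gadfly

# Placeholder data standing in for the article's `regions_cum` (made-up values)
regions_cum = DataFrame(
    Region  = ["Region A", "Region B", "Region C"],
    Pop2019 = [1_200_000_000, 750_000_000, 40_000_000])

p = plot(regions_cum,
        x = :Region, y = :Pop2019, color = :Region,
        Geom.bar,
        Guide.xlabel("Region"),                   # axis labels
        Guide.ylabel("Population"),
        Guide.title("Population by Region, 2019"),
        Scale.y_continuous(format = :plain),      # no scientific notation on the y-axis
        Theme(background_color = "ivory",         # background color (placeholder)
              bar_spacing = 1mm))                 # space between the bars (placeholder)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;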



&lt;p&gt;… creating the following beautified bar chart:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/b2_Nkki9fl70tSUepN-G-6bgaJc0b2GTqrJ9k_jMH1U/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzLzgw/ejM5bnpjbmhzNTJq/OG82b2Q0LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/b2_Nkki9fl70tSUepN-G-6bgaJc0b2GTqrJ9k_jMH1U/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzLzgw/ejM5bnpjbmhzNTJq/OG82b2Q0LnBuZw" alt="population by region - 2" width="880" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Population by Subregion
&lt;/h3&gt;

&lt;p&gt;The next bar chart depicts population by subregion using the following &lt;code&gt;plot&lt;/code&gt;-command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subregions_cum&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Subregion&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Pop2019&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Region&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;Geom&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bar&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;… resulting in the following bar chart:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/wPjTEl7HgWQalrJxMnImX0OkI066j7YldVmLI46dFSU/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2p4/ZDM2Y211aXJkZmp5/Njdtbjl1LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/wPjTEl7HgWQalrJxMnImX0OkI066j7YldVmLI46dFSU/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2p4/ZDM2Y211aXJkZmp5/Njdtbjl1LnBuZw" alt="population by subregion - 1" width="880" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that there is room for improvement: as there are quite a few subregions and their names are relatively long, a horizontal bar diagram might be more readable. Apart from this, we again adapt labels, title, background color etc., leading to the following code, where we switch to a horizontal layout using the parameter &lt;code&gt;orientation&lt;/code&gt; on the bar geometry:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
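&lt;p&gt;As the embedded code is not reproduced here, this is a hedged sketch of how the horizontal variant might be written. It assumes that, for a horizontal layout, the numerical column goes to the x-axis and the subregion names to the y-axis; the data, label texts and theme values are again placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;using DataFrames, Gadfly

# Placeholder data standing in for the article's `subregions_cum` (made-up values)
subregions_cum = DataFrame(
    Region    = ["Region A", "Region A", "Region B"],
    Subregion = ["Subregion A1", "Subregion A2", "Subregion B1"],
    Pop2019   = [600_000_000, 450_000_000, 300_000_000])

p = plot(subregions_cum,
        x = :Pop2019, y = :Subregion, color = :Region,   # x and y swapped for the horizontal layout
        Geom.bar(orientation = :horizontal),
        Guide.xlabel("Population"), Guide.ylabel("Subregion"),
        Guide.title("Population by Subregion, 2019"),
        Scale.x_continuous(format = :plain),             # the numbers are now on the x-axis
        Theme(background_color = "ivory", bar_spacing = 1mm))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;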



&lt;p&gt;… resulting indeed in a more readable bar chart:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/MSB36Ul0BhgKeNC0EeZDyOubO2jTdOmXM6toTDrNWSI/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3Fk/czhvNjhkaDZyMjRl/dTJjbTYzLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/MSB36Ul0BhgKeNC0EeZDyOubO2jTdOmXM6toTDrNWSI/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3Fk/czhvNjhkaDZyMjRl/dTJjbTYzLnBuZw" alt="population by subregion - 2" width="880" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It gets even more readable if we sort the subregions in &lt;code&gt;subregions_cum&lt;/code&gt; by population size (&lt;code&gt;Pop2019&lt;/code&gt;) before rendering the diagram, using the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;subregions_cum_sorted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subregions_cum&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Pop2019&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;If we apply the plot command from above to the sorted data &lt;code&gt;subregions_cum_sorted&lt;/code&gt; we finally get:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/El9c82ML1245L13ShXvSxx8a2IYqL_57QSphFhQoXh0/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzLzMz/Z251N2czNmgxdXI5/ZTI2cnIzLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/El9c82ML1245L13ShXvSxx8a2IYqL_57QSphFhQoXh0/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzLzMz/Z251N2czNmgxdXI5/ZTI2cnIzLnBuZw" alt="population by subregion - 3" width="880" height="475"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Scatter Plots
&lt;/h2&gt;

&lt;p&gt;In the next step we have a look at the population at the country level in relation to the growth rate. A scatter plot is a good way to visualize this relationship. We get one using a point geometry as follows:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;countries&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Pop2019&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;PopChangePct&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Region&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Geom&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;… resulting in this scatter plot:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/k3gqlQOwncs85q0V9PZ5k1m1bgEBJX2eXLhwXIYD4oU/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2hi/aWYyejFlcGpyd2M3/OW5uYXV3LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/k3gqlQOwncs85q0V9PZ5k1m1bgEBJX2eXLhwXIYD4oU/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2hi/aWYyejFlcGpyd2M3/OW5uYXV3LnBuZw" alt="Population in relation to growth rate - 1" width="880" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because we also mapped the region to the color aesthetic, the plot additionally shows which region each country belongs to.&lt;/p&gt;

&lt;p&gt;But the distribution of the data is quite skewed: most countries have a population below 200 million. So a logarithmic scale on the x-axis might give a better insight into the data. And again, we add some labels, background color etc., leading to the following code:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;



&lt;p&gt;… giving us the following improved scatter plot:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/pP6fYGnApByJRT0SjGoUvxXnhrTtYbwq7wfJINH6_Ek/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2c4/NDQ1dHEybjk1aGV3/ZTd5dTExLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/pP6fYGnApByJRT0SjGoUvxXnhrTtYbwq7wfJINH6_Ek/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2c4/NDQ1dHEybjk1aGV3/ZTd5dTExLnBuZw" alt="Population in relation to growth rate - 2" width="880" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;labels&lt;/code&gt; parameter of the log scale needs a bit of explanation: without it, the x-axis would show the logarithms (to base 10), which many people find hard to read. Instead we want plain population numbers (e.g. 100.0 instead of 2). So we pass a function to &lt;code&gt;labels&lt;/code&gt; which calculates the ‘correct’ labels. The log value &lt;code&gt;x&lt;/code&gt; is converted to 10&lt;sup&gt;x&lt;/sup&gt; to get a ‘readable’ number, then rounded to two digits and finally converted to a string (which is the expected type for a label).&lt;/p&gt;
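&lt;p&gt;The conversion described here can be sketched as a small function in plain Julia. The name &lt;code&gt;population_label&lt;/code&gt; is hypothetical, and the exact function used in the article may differ:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;# Hypothetical helper: turns a base-10 logarithm back into a readable label
population_label(x) = string(round(10.0^x, digits = 2))

population_label(2)     # "100.0"
population_label(0.5)   # "3.16"

# It could then be passed to the log scale, e.g.
# Scale.x_log10(labels = population_label)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;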
&lt;h2&gt;
  
  
  Histograms
&lt;/h2&gt;

&lt;p&gt;Bar plots and histograms have the same geometry (in the sense of the “Grammar of Graphics”). But in order to get categorical data on the x-axis the data used for a histogram has to be mapped to (artificial) categories in a process called ‘binning’. In the GoG this is done using a so-called &lt;em&gt;bin statistic&lt;/em&gt;.&lt;/p&gt;
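&lt;p&gt;To illustrate what such a bin statistic computes, here is a minimal sketch in plain Julia. The function name &lt;code&gt;bin_counts&lt;/code&gt; and the choice of equal-width bins are assumptions made for illustration; this is not how any particular graphics package implements binning. It also assumes that the values are not all identical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;# Count how many values fall into each of `nbins` equal-width bins
function bin_counts(values, nbins)
    lo, hi = extrema(values)
    width = (hi - lo) / nbins
    counts = zeros(Int, nbins)
    for v in values
        # index of the bin containing v; the maximum is put into the last bin
        i = min(floor(Int, (v - lo) / width) + 1, nbins)
        counts[i] += 1
    end
    return counts
end

bin_counts([1, 2, 2, 3, 9, 10], 3)   # [4, 0, 2]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each bin then plays the role of one (artificial) category on the x-axis, and its count becomes the height of the bar.&lt;/p&gt;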

&lt;p&gt;Gadfly doesn’t follow (or at least doesn’t show) the theory at this point. Instead it introduces a separate geometry for histograms (which might be more practical for everyday use).&lt;/p&gt;

&lt;p&gt;So we get a histogram that shows the distribution of GDP per capita among the different countries with the following &lt;code&gt;plot&lt;/code&gt;-command using a histogram geometry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;countries&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;GDPperCapita&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Geom&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;histogram&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;… resulting in this histogram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/0vRYWmDIlm22VjOVUQNn3GU7DTXsYjv9IiPCBYEPeCs/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL25k/Zm9jc3F3OGVmYnJr/NjJsOXlpLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/0vRYWmDIlm22VjOVUQNn3GU7DTXsYjv9IiPCBYEPeCs/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL25k/Zm9jc3F3OGVmYnJr/NjJsOXlpLnBuZw" alt="distribution of GDP per capita - 1" width="880" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The number of bins used can be controlled by the &lt;code&gt;bincount&lt;/code&gt;-parameter of the histogram geometry. And again we can add labels etc. resulting in the following code:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
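&lt;p&gt;The embedded code is not reproduced here; a minimal sketch of such a call, assuming the &lt;code&gt;bincount&lt;/code&gt;-parameter mentioned above, might look as follows. The data, the bin count of 5 and the label texts are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;using DataFrames, Gadfly

# Placeholder data standing in for the article's `countries` (made-up values)
countries = DataFrame(GDPperCapita = [800, 2_500, 4_000, 9_000, 15_000, 42_000, 65_000])

p = plot(countries, x = :GDPperCapita,
        Geom.histogram(bincount = 5),             # number of bins (placeholder value)
        Guide.xlabel("GDP per capita"), Guide.ylabel("Number of countries"),
        Guide.title("Distribution of GDP per capita"),
        Theme(background_color = "ivory"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;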



&lt;p&gt;… leading to the following improved histogram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/JfUZMoIXFrLAHi2I29vXOnRlYxtsShzmwZvF1EgO8rg/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3lv/NDM1dnV3ZzlqcTQ2/dmRpZXVsLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/JfUZMoIXFrLAHi2I29vXOnRlYxtsShzmwZvF1EgO8rg/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3lv/NDM1dnV3ZzlqcTQ2/dmRpZXVsLnBuZw" alt="distribution of GDP per capita - 2" width="880" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Box Plots and Violin Plots
&lt;/h2&gt;

&lt;p&gt;To obtain an insight into the distribution of some numerical data, box plots or violin plots are typically used. Each of these diagram types has its specific virtues. So let’s visualize the distribution of the GDP per capita for each region using these plots.&lt;/p&gt;

&lt;h3&gt;
  
  
  Box Plot
&lt;/h3&gt;

&lt;p&gt;Let’s immediately use the ‘beautified’ version using a &lt;code&gt;boxplot&lt;/code&gt;-geometry:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
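&lt;p&gt;Since the embedded code is not reproduced here, the following is only a sketch of what a call with a &lt;code&gt;boxplot&lt;/code&gt;-geometry might look like. The data, label texts and theme values are placeholders chosen for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;using DataFrames, Gadfly

# Placeholder data standing in for the article's `countries` (made-up values)
countries = DataFrame(
    Region       = ["Region A", "Region A", "Region A", "Region B", "Region B", "Region B"],
    GDPperCapita = [1_000, 4_000, 9_000, 20_000, 45_000, 70_000])

p = plot(countries, x = :Region, y = :GDPperCapita,
        Geom.boxplot,                             # one box per region
        Guide.xlabel("Region"), Guide.ylabel("GDP per capita"),
        Guide.title("Distribution of GDP per capita by region"),
        Scale.y_continuous(format = :plain),
        Theme(background_color = "ivory"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;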


&lt;p&gt;… giving us the following box plot:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/aOtnM3TRttRG8xgA6BCQi3oJzBrhEThcFIAVRPe7ZZk/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3Jl/ZTRxa3Uwenczdjlr/YzZ3d3ZxLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/aOtnM3TRttRG8xgA6BCQi3oJzBrhEThcFIAVRPe7ZZk/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3Jl/ZTRxa3Uwenczdjlr/YzZ3d3ZxLnBuZw" alt="distribution of GDP per capita by region - 1" width="880" height="474"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Violin Plot
&lt;/h3&gt;

&lt;p&gt;The code for a violin plot for this visualization looks quite similar, the only difference being the use of a &lt;code&gt;violin&lt;/code&gt;-geometry (instead of a &lt;code&gt;boxplot&lt;/code&gt;):&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;… leading to the following violin plot:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/DkdPo35mlIP-GN610yJM2CZ_J89tDYPo7p0G8gAb-bE/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzLzBl/NGlxYmpoeWdjMTJq/YzhyZXdpLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/DkdPo35mlIP-GN610yJM2CZ_J89tDYPo7p0G8gAb-bE/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzLzBl/NGlxYmpoeWdjMTJq/YzhyZXdpLnBuZw" alt="distribution of GDP per capita by region - 2" width="880" height="474"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here we note that the defaults for the scaling of the y-axis don’t work as well as with the box plot. Apart from that, the really interesting part of the distribution lies in the range from 0 to 100,000. Therefore we want to restrict the plot to that range on the y-axis, doing a sort of zoom-in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zooming in
&lt;/h3&gt;

&lt;p&gt;This can easily be achieved by adding the following line to the list of &lt;code&gt;plot&lt;/code&gt;-parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;Coord&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cartesian&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ymin&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ymax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="x"&gt;),&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
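&lt;p&gt;In context, a complete call with this restriction might look like the following sketch. Only the &lt;code&gt;Coord.cartesian&lt;/code&gt; line is taken from the text; the data, label texts and theme values are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;using DataFrames, Gadfly

# Placeholder data standing in for the article's `countries` (made-up values)
countries = DataFrame(
    Region       = ["Region A", "Region A", "Region A", "Region B", "Region B", "Region B"],
    GDPperCapita = [1_000, 4_000, 9_000, 20_000, 45_000, 170_000])

p = plot(countries, x = :Region, y = :GDPperCapita,
        Geom.violin,
        Coord.cartesian(ymin = 0, ymax = 100000),   # show only the range 0 to 100,000
        Guide.xlabel("Region"), Guide.ylabel("GDP per capita"),
        Guide.title("Distribution of GDP per capita by region"),
        Theme(background_color = "ivory"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;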



&lt;p&gt;… leading to the following violin diagram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/JG8Tzk4JyNM1QNwTEO2U6NhGOvTfux2H8GbpimES3cI/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2Jt/d2U2eDVtOG15YTFp/NnNoZ3cwLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/JG8Tzk4JyNM1QNwTEO2U6NhGOvTfux2H8GbpimES3cI/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2Jt/d2U2eDVtOG15YTFp/NnNoZ3cwLnBuZw" alt="distribution of GDP per capita by region - 3" width="880" height="474"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The same restriction to the y-axis can be applied to the box plot:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/uDuWjRVSam2ySRqw5r8V5faIIGO6eSn23_sbSgVt3sU/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3o4/YWkwaWU3aGtxMmtj/cmgyem81LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/uDuWjRVSam2ySRqw5r8V5faIIGO6eSn23_sbSgVt3sU/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3o4/YWkwaWU3aGtxMmtj/cmgyem81LnBuZw" alt="distribution of GDP per capita by region - 4" width="880" height="474"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;As we can see, Gadfly follows the concepts of the Grammar of Graphics quite closely most of the time. That’s one of the reasons why the plot specifications are so consistent (the same things are always specified in the same way, independent of context) and thus easy to learn and to memorize.&lt;/p&gt;

&lt;p&gt;You only reach its limits when it comes to edge cases. For example, if you specify a scatter plot with a mapping to the x-axis but none to the y-axis, then according to the GoG you should get points distributed on a line (the x-axis). That doesn’t work with Gadfly. There is also no polar coordinate system implemented (though one could be added in the future).&lt;/p&gt;

&lt;p&gt;But if your visualization needs are centered around the (large) list of geometries which are implemented in Gadfly and you don’t need rather exotic customizations of these diagrams, then you will be quite happy with Gadfly.&lt;/p&gt;

&lt;p&gt;If you want to try out the examples yourself, you can get a &lt;a href="https://github.com/roland-KA/StatisticalPlotsWithJulia/blob/main/notebooks/DV-Basics-Gadfly.jl"&gt;Pluto notebook&lt;/a&gt;, which is a sort of executable variant of this article, from my GitHub repository.&lt;/p&gt;

</description>
      <category>graphics</category>
      <category>comparison</category>
      <category>ggplot</category>
    </item>
    <item>
      <title>The Grammar of Graphics or how to do ggplot-style plotting in Julia</title>
      <dc:creator>Roland Schätzle</dc:creator>
      <pubDate>Thu, 01 Dec 2022 18:01:30 +0000</pubDate>
      <link>https://forem.julialang.org/rolandka/the-grammar-of-graphics-or-how-to-do-ggplot-style-plotting-in-julia-1fkp</link>
      <guid>https://forem.julialang.org/rolandka/the-grammar-of-graphics-or-how-to-do-ggplot-style-plotting-in-julia-1fkp</guid>
      <description>&lt;p&gt;&lt;small&gt; This article appeared in &lt;a href="https://towardsdatascience.com/the-grammar-of-graphics-or-how-to-do-ggplot-style-plotting-in-julia-1b0ac2162c82"&gt;Towards Data Science&lt;/a&gt; on Aug 12th, 2022 &lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Introduction to a comparison of Julia graphics packages for statistical plotting&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;Grammar of Graphics (GoG)&lt;/em&gt; is a concept that was developed by Leland Wilkinson (&lt;a href="https://books.google.de/books/about/The_Grammar_of_Graphics.html?id=_kRX4LoFfGQC&amp;amp;redir_esc=y"&gt;The Grammar of Graphics, Springer, 1999&lt;/a&gt;) and refined by Hadley Wickham (&lt;a href="https://vita.had.co.nz/papers/layered-grammar.html"&gt;A Layered Grammar of Graphics, &lt;em&gt;Journal of Computational and Graphical Statistics&lt;/em&gt;, vol. 19, no. 1, pp. 3–28, 2010&lt;/a&gt;; &lt;a href="https://byrneslab.net/classes/biol607/readings/wickham_layered-grammar.pdf"&gt;pdf&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Its main idea is that every statistical plot can be created by a combination of a few basic building blocks (or mechanisms). This allows &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a simple and concise definition of a visualization&lt;/li&gt;
&lt;li&gt;an easy, modular adaptation of a visualization by exchanging only the building blocks which are affected&lt;/li&gt;
&lt;li&gt;reusable specifications (the same visualization can e.g. be applied to different data)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Wickham showed that this concept is not only a nice theory. He implemented it in the R-package &lt;code&gt;ggplot2&lt;/code&gt; which became quite popular. Several GoG-implementations are also available for the Julia programming language.&lt;/p&gt;

&lt;p&gt;In this article I will first explain the basic concepts and ideas of the Grammar of Graphics. In follow-up articles I will then present the following four Julia graphics packages which are based (completely or partially) on the GoG:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://gadflyjl.org/stable/"&gt;Gadfly.jl&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.queryverse.org/VegaLite.jl/stable/"&gt;VegaLite.jl&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.juliaplots.org/stable/"&gt;Plots.jl&lt;/a&gt; (with &lt;a href="https://github.com/JuliaPlots/StatsPlots.jl"&gt;StatsPlots.jl&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://makie.juliaplots.org/stable/"&gt;Makie.jl&lt;/a&gt; and &lt;a href="http://juliaplots.org/AlgebraOfGraphics.jl/stable/"&gt;AlgebraOfGraphics.jl&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To allow a 1:1 comparison of these Julia packages, I will use the same example plots and the same underlying data for each article. In the second part of this article, I will present the data used for the examples, so I don’t have to repeat that in each of the follow-up articles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Grammar of Graphics
&lt;/h2&gt;

&lt;p&gt;In the next sections I will explain the basic ideas of “The Grammar of Graphics” by Wilkinson as well as “A Layered Grammar of Graphics” by Wickham. I won’t go into every detail and in aspects where both concepts differ, I will deliberately pick one and give a rather “unified” view.&lt;/p&gt;

&lt;p&gt;For the code examples, I’m using Julia’s Gadfly-package (vers. 1.3.4 &amp;amp; Julia 1.7.3).&lt;/p&gt;

&lt;h3&gt;
  
  
  The main ingredients
&lt;/h3&gt;

&lt;p&gt;The main building blocks for a visualization are&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;data&lt;/li&gt;
&lt;li&gt;aesthetics&lt;/li&gt;
&lt;li&gt;geometry&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Data
&lt;/h4&gt;

&lt;p&gt;The most familiar of these three concepts is probably &lt;em&gt;data&lt;/em&gt;. We assume here that data comes in tabular form (like a database table). For a visualization it’s important to distinguish between &lt;em&gt;numerical&lt;/em&gt; and &lt;em&gt;categorical&lt;/em&gt; data.&lt;/p&gt;

&lt;p&gt;Here we have e.g. the inventory list of a fruit dealer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Row │ quantity  fruit   price 
────────────────────────────────
 1  │    3     apples   2.5
 2  │   20     oranges  3.9
 3  │    8     bananas  1.9

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;It consists of the three variables &lt;code&gt;quantity&lt;/code&gt;, &lt;code&gt;fruit&lt;/code&gt; and &lt;code&gt;price&lt;/code&gt;. &lt;code&gt;fruit&lt;/code&gt; is a &lt;em&gt;categorical&lt;/em&gt; variable whereas the other two variables are &lt;em&gt;numerical&lt;/em&gt;.&lt;/p&gt;
&lt;h4&gt;
  
  
  Aesthetics
&lt;/h4&gt;

&lt;p&gt;To visualize a data variable, it is mapped to one or more &lt;em&gt;aesthetics&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Numerical variables can be mapped e.g. to a&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;position&lt;/em&gt; on the x-, y- or z-axis&lt;/li&gt;
&lt;li&gt;&lt;em&gt;size&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Categorical variables can be mapped e.g. to a&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;color&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;shape&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;texture&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Geometry
&lt;/h4&gt;

&lt;p&gt;Apart from data variables and aesthetics we need at least a &lt;em&gt;geometry&lt;/em&gt; to specify a complete visualization. The geometry tells us basically which type of diagram we want. Some examples are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;line&lt;/em&gt; (= line diagram)&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;point&lt;/em&gt; (= scatter plot)&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;bar&lt;/em&gt; (= bar plot)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Basic examples
&lt;/h3&gt;

&lt;p&gt;Now we have enough information to build our first visualizations based on the Grammar of Graphics. For the code examples using the Gadfly-package we assume that the inventory table above is in a variable named &lt;code&gt;inventory&lt;/code&gt; of type &lt;code&gt;DataFrame&lt;/code&gt;.&lt;/p&gt;
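&lt;p&gt;How &lt;code&gt;inventory&lt;/code&gt; is created is not shown here; one possible way, assuming the &lt;em&gt;DataFrames.jl&lt;/em&gt; package is installed, is to build it directly from the three columns of the table:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;using DataFrames

# The inventory table from above as a DataFrame, one keyword argument per column
inventory = DataFrame(
    quantity = [3, 20, 8],                        # numerical
    fruit    = ["apples", "oranges", "bananas"],  # categorical
    price    = [2.5, 3.9, 1.9])                   # numerical
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;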

&lt;p&gt;First we want to see how the quantities are distributed by price. Depending on the geometry chosen, we get either a scatter plot or a line diagram:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Map &lt;strong&gt;price&lt;/strong&gt; to the &lt;strong&gt;x-axis&lt;/strong&gt;, &lt;strong&gt;quantity&lt;/strong&gt; to the &lt;strong&gt;y-axis&lt;/strong&gt; using a &lt;strong&gt;point geometry&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Gadfly: &lt;code&gt;plot(inventory, x = :price, y = :quantity, Geom.point)&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/b4ZrF9WNAnthsnPZi8e9PfoiXvo8vLwOt3MM_lwqTgU/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL28w/N2czMDA0c21ydGFk/OXQycHN0LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/b4ZrF9WNAnthsnPZi8e9PfoiXvo8vLwOt3MM_lwqTgU/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL28w/N2czMDA0c21ydGFk/OXQycHN0LnBuZw" alt="scatter plot" width="416" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Map &lt;strong&gt;price&lt;/strong&gt; to the &lt;strong&gt;x-axis&lt;/strong&gt;, &lt;strong&gt;quantity&lt;/strong&gt; to the &lt;strong&gt;y-axis&lt;/strong&gt; using a &lt;strong&gt;line geometry&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Gadfly: &lt;code&gt;plot(inventory, x = :price, y = :quantity, Geom.line)&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/j6luyU6HxlF4fplz7t8HfKXSjTttGLYWk6IrsBnC-FQ/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3Zo/eHI2aW0zcHcyNTFz/NWF2a2t3LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/j6luyU6HxlF4fplz7t8HfKXSjTttGLYWk6IrsBnC-FQ/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3Zo/eHI2aW0zcHcyNTFz/NWF2a2t3LnBuZw" alt="line diagram" width="416" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the next step we additionally want to see which fruits are involved. So we have to map &lt;code&gt;fruit&lt;/code&gt; to a suitable aesthetic too. In the following two examples, first a &lt;em&gt;shape&lt;/em&gt; is used and then a &lt;em&gt;color&lt;/em&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Map &lt;strong&gt;price&lt;/strong&gt; to the &lt;strong&gt;x-axis&lt;/strong&gt;, &lt;strong&gt;quantity&lt;/strong&gt; to the &lt;strong&gt;y-axis&lt;/strong&gt;, &lt;strong&gt;fruit&lt;/strong&gt; to a &lt;strong&gt;shape&lt;/strong&gt; using a &lt;strong&gt;point geometry&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Gadfly: &lt;code&gt;plot(inventory, x = :price, y = :quantity, shape = :fruit, Geom.point)&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/8hvRPDn9yEdQFAhLArr91UKxWSBd-kD6kP-GnTGjclU/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2hv/MG8yY2pvbTFsOXV5/dndoMzZuLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/8hvRPDn9yEdQFAhLArr91UKxWSBd-kD6kP-GnTGjclU/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2hv/MG8yY2pvbTFsOXV5/dndoMzZuLnBuZw" alt="fruit mapped to shape" width="416" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Map &lt;strong&gt;price&lt;/strong&gt; to the &lt;strong&gt;x-axis&lt;/strong&gt;, &lt;strong&gt;quantity&lt;/strong&gt; to the &lt;strong&gt;y-axis&lt;/strong&gt;, &lt;strong&gt;fruit&lt;/strong&gt; to a &lt;strong&gt;color&lt;/strong&gt; using a &lt;strong&gt;point geometry&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Gadfly: &lt;code&gt;plot(inventory, x = :price, y = :quantity, color = :fruit, Geom.point)&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/Pn0oLaSXlHSzJINjCJeGnLMb_Lz71QlnTuDEoo_z6Fk/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2Zs/cjJ2cWRheWNkNnRy/aWt5MXN2LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/Pn0oLaSXlHSzJINjCJeGnLMb_Lz71QlnTuDEoo_z6Fk/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2Zs/cjJ2cWRheWNkNnRy/aWt5MXN2LnBuZw" alt="fruit mapped to color" width="416" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is also possible to map &lt;em&gt;one&lt;/em&gt; variable to &lt;em&gt;several&lt;/em&gt; aesthetics. For example, we can map &lt;code&gt;fruit&lt;/code&gt; to &lt;em&gt;shape&lt;/em&gt; as well as &lt;em&gt;color&lt;/em&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Map &lt;strong&gt;price&lt;/strong&gt; to the &lt;strong&gt;x-axis&lt;/strong&gt;, &lt;strong&gt;quantity&lt;/strong&gt; to the &lt;strong&gt;y-axis&lt;/strong&gt;, &lt;strong&gt;fruit&lt;/strong&gt; to a &lt;strong&gt;shape&lt;/strong&gt;, &lt;strong&gt;fruit&lt;/strong&gt; to a &lt;strong&gt;color&lt;/strong&gt;, using a &lt;strong&gt;point geometry&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Gadfly: &lt;code&gt;plot(inventory, x = :price, y = :quantity,&lt;br&gt;
shape = :fruit, color = :fruit, Geom.point)&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/d-Z0GdFo6dicTM0haaP2cCjKAYgQZpwW0fluHOAa9E0/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzLzZ1/eHE2dTMyMGZoaGVj/aWZ5MGxhLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/d-Z0GdFo6dicTM0haaP2cCjKAYgQZpwW0fluHOAa9E0/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzLzZ1/eHE2dTMyMGZoaGVj/aWZ5MGxhLnBuZw" alt="fruit mapped to shape and color" width="416" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using a &lt;em&gt;bar geometry&lt;/em&gt;, we can plot the quantities in stock for each fruit. Here we map a categorical variable (&lt;em&gt;fruit&lt;/em&gt;) to positions on the x-axis.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Map &lt;strong&gt;fruit&lt;/strong&gt; to the &lt;strong&gt;x-axis&lt;/strong&gt;, &lt;strong&gt;quantity&lt;/strong&gt; to the &lt;strong&gt;y-axis&lt;/strong&gt; using a &lt;strong&gt;bar geometry&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Gadfly: &lt;code&gt;plot(inventory, x = :fruit, y = :quantity, Geom.bar)&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/jVf8_rzUvozaLGdp96l-9v8beq9XSKRr7lPv5HlQg38/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2tz/MzZuZTQycm1naXFp/bDZpMHgzLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/jVf8_rzUvozaLGdp96l-9v8beq9XSKRr7lPv5HlQg38/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2tz/MzZuZTQycm1naXFp/bDZpMHgzLnBuZw" alt="bar geometry and fruit mapped to color" width="416" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These basic examples show nicely how a visualization can be specified using a few simple building blocks, thus making up a powerful visualization language.&lt;/p&gt;

&lt;p&gt;They also show that these specifications enable a graphics package to derive meaningful defaults for a variety of aspects of a visualization which aren’t given explicitly.&lt;/p&gt;

&lt;p&gt;All the examples had&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;meaningful &lt;em&gt;scales&lt;/em&gt; for the x- and y-axis (typically using a slightly larger interval than that of the data variable given)&lt;/li&gt;
&lt;li&gt;together with appropriate &lt;em&gt;ticks&lt;/em&gt; and &lt;em&gt;axis&lt;/em&gt; labeling&lt;/li&gt;
&lt;li&gt;as well as a descriptive &lt;em&gt;label&lt;/em&gt; (simply using the variable name)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some examples even had an automatically generated &lt;em&gt;legend&lt;/em&gt;. This is possible because a legend is simply the inverse function of a data mapping to an aesthetic. If we e.g. map the variable &lt;em&gt;fruit&lt;/em&gt; to a &lt;em&gt;color&lt;/em&gt;, then the corresponding legend is the reverse mapping from &lt;em&gt;color&lt;/em&gt; to &lt;em&gt;fruit&lt;/em&gt;.&lt;/p&gt;
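
&lt;p&gt;As a minimal sketch in plain Julia (the fruit names and colors are made up for illustration and are not taken from any graphics package), the mapping and its legend can be written as two dictionaries, one being the inverse of the other:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical aesthetic mapping: every fruit is assigned a color
fruit_to_color = Dict("apple" =&amp;gt; "red", "banana" =&amp;gt; "yellow", "kiwi" =&amp;gt; "green")

# The legend answers the opposite question: which fruit does a color stand for?
legend = Dict(color =&amp;gt; fruit for (fruit, color) in fruit_to_color)

legend["yellow"]   # returns "banana"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The inversion is only unambiguous because each fruit gets its own color. This is also the reason why a graphics package can generate the legend without further input.&lt;/p&gt;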
&lt;h3&gt;
  
  
  More ingredients
&lt;/h3&gt;

&lt;p&gt;To be honest, we need a few more elements than just data, &lt;em&gt;aesthetics&lt;/em&gt; and a &lt;em&gt;geometry&lt;/em&gt; for a complete visualization.&lt;/p&gt;
&lt;h4&gt;
  
  
  Scale
&lt;/h4&gt;

&lt;p&gt;In order to map numerical variables, e.g. to positional aesthetics (like the positions on the x- or y-axis), we also need a &lt;em&gt;scale&lt;/em&gt; which maps the data units to physical units (e.g. of the screen, a window or a web page).&lt;/p&gt;

&lt;p&gt;In the examples above, a &lt;em&gt;linear scale&lt;/em&gt; was used by default. But we could also replace it, e.g. with a &lt;em&gt;logarithmic scale&lt;/em&gt;.&lt;/p&gt;
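
&lt;p&gt;What a scale does can be sketched in a few lines of plain Julia. The function names and the pixel range below are assumptions chosen for illustration; graphics packages implement scales internally and in a more general way.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Linear scale: map a value v from the data interval [dmin, dmax]
# to the physical interval [pmin, pmax] (e.g. pixels on the screen)
linear_scale(v, dmin, dmax, pmin, pmax) =
    pmin + (v - dmin) / (dmax - dmin) * (pmax - pmin)

# Logarithmic scale: the same mapping, applied to the logarithms of the values
log_scale(v, dmin, dmax, pmin, pmax) =
    linear_scale(log10(v), log10(dmin), log10(dmax), pmin, pmax)

linear_scale(5.0, 0.0, 10.0, 0.0, 400.0)   # returns 200.0
log_scale(10.0, 1.0, 100.0, 0.0, 400.0)    # returns 200.0
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;On the linear scale the value 5 lies in the middle of the interval from 0 to 10, on the logarithmic scale the value 10 lies in the middle of the interval from 1 to 100. In Gadfly, a logarithmic x-axis can be requested by adding &lt;code&gt;Scale.x_log10&lt;/code&gt; to the plot specification.&lt;/p&gt;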

&lt;p&gt;It’s also possible to map a numerical variable to a color. Then a &lt;em&gt;continuous color scale&lt;/em&gt; is used for that mapping as we can see in the following example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Map &lt;strong&gt;price&lt;/strong&gt; to the &lt;strong&gt;x-axis&lt;/strong&gt;, &lt;strong&gt;quantity&lt;/strong&gt; to the &lt;strong&gt;y-axis&lt;/strong&gt;, &lt;strong&gt;quantity&lt;/strong&gt; to a &lt;strong&gt;color&lt;/strong&gt; using a &lt;strong&gt;point geometry&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Gadfly: &lt;code&gt;plot(inventory, x = :price, y = :quantity, color = :quantity, Geom.point)&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/OpXXwJFW6JWRwTkjuboeIjpn63kwElpxg2W3Exkwv9Y/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3cz/c2J2d3k0N3FjcGJ6/MnliZ3ZzLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/OpXXwJFW6JWRwTkjuboeIjpn63kwElpxg2W3Exkwv9Y/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL3cz/c2J2d3k0N3FjcGJ6/MnliZ3ZzLnBuZw" alt="using a color scale" width="416" height="340"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Coordinate system
&lt;/h4&gt;

&lt;p&gt;Closely related to a scale is the concept of a &lt;em&gt;coordinate system&lt;/em&gt;, which defines how positional values are mapped onto the plotting plane. In the examples above, the &lt;em&gt;Cartesian coordinate system&lt;/em&gt; has been used by default. Other possibilities are &lt;em&gt;polar&lt;/em&gt; or &lt;em&gt;barycentric&lt;/em&gt; coordinate systems or the various systems which are used for map projections.&lt;/p&gt;

&lt;p&gt;Interestingly, we can produce different types of diagrams from the same data and aesthetic mappings just by changing the coordinate system. A bar plot, for example, is based on the Cartesian coordinate system. If we replace that with a polar system, we get a Coxcomb chart, as the following example from &lt;a href="https://r4ds.had.co.nz/index.html"&gt;R for Data Science&lt;/a&gt; (by Hadley Wickham and Garrett Grolemund, O’Reilly, 2017) shows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/PwdW3RFpXrYhgT4GEBHu4lY9jlJLJtpQA4hg2pOlb7Q/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL28w/eHh5ZGFuemE5dmIy/b3Q5bnR3LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/PwdW3RFpXrYhgT4GEBHu4lY9jlJLJtpQA4hg2pOlb7Q/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL28w/eHh5ZGFuemE5dmIy/b3Q5bnR3LnBuZw" alt="Bar plot and Coxcomb chart" width="880" height="381"&gt;&lt;/a&gt;&lt;/p&gt;
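
&lt;p&gt;The effect of exchanging the coordinate system can be sketched in plain Julia. The two helper functions below are hypothetical and only compute where the tip of a bar ends up. They assume that in the polar case each of the &lt;code&gt;n&lt;/code&gt; bars gets a sector of equal angle and that the bar height becomes the radius.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Cartesian: bar number i is placed at x = i, its height h is the y-position
cartesian_tip(i, h) = (float(i), float(h))

# Polar: bar number i (out of n) is turned into an angle, its height h into a radius
function polar_tip(i, h, n)
    θ = 2π * (i - 0.5) / n            # centre angle of the i-th sector
    return (h * cos(θ), h * sin(θ))   # position on the plotting plane
end

cartesian_tip(1, 2.0)   # same data ...
polar_tip(1, 2.0, 4)    # ... different position on the plane
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Data and aesthetic mappings are identical in both calls. Only the function that turns the positional values into a location on the plane differs.&lt;/p&gt;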
&lt;h3&gt;
  
  
  Conclusions
&lt;/h3&gt;

&lt;p&gt;With these two additional concepts we now have a complete picture of the basic GoG. In this short article I could of course only present a subset of all possible &lt;em&gt;aesthetics&lt;/em&gt; and &lt;em&gt;geometries&lt;/em&gt;, and there are more elements to the GoG like &lt;em&gt;statistics&lt;/em&gt; and &lt;em&gt;facets&lt;/em&gt;. But what we have seen so far is the core of the Grammar of Graphics and should be enough to grasp the main ideas.&lt;/p&gt;
&lt;h2&gt;
  
  
  Comparison of Julia graphics packages
&lt;/h2&gt;

&lt;p&gt;Let’s now switch to the comparison of different Julia graphics packages, which I will present in several follow-up articles. As a preparation, I will now present the data used for the example plots in these articles and give an outlook on the sorts of diagrams I will use for the comparison. The examples are inspired by the YouTube tutorial &lt;a href="https://youtu.be/s7ZRVCvdKAo"&gt;Julia Analysis for Beginners&lt;/a&gt; from the channel &lt;a href="https://www.youtube.com/c/juliafortalentedamateurs"&gt;julia for talented amateurs&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Countries by GDP
&lt;/h3&gt;

&lt;p&gt;The basis of the data used for the plotting examples is a list of all countries with their GDP and population size for the years 2018 and 2019. It’s from this &lt;a href="https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)"&gt;Wikipedia page&lt;/a&gt; (which got the data from a database of the IMF and the United Nations). The data is also available in &lt;a href="https://raw.githubusercontent.com/roland-KA/StatisticalPlotsWithJulia/main/data/countries.csv"&gt;CSV format&lt;/a&gt; from my GitHub repository.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/0AtQqYN4n7ccFl3j_cfNjaRu_2-7LpWpmhaDOS-osR4/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzLzdn/NWx3bGNraGJ6MWlt/MmlsMWEwLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/0AtQqYN4n7ccFl3j_cfNjaRu_2-7LpWpmhaDOS-osR4/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzLzdn/NWx3bGNraGJ6MWlt/MmlsMWEwLnBuZw" alt="excerpt from country list" width="880" height="357"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The columns of the list have the following meaning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ID&lt;/code&gt;: unique identifier&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Region&lt;/code&gt;: the continent where the country is located&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Subregion&lt;/code&gt;: each continent is divided into several subregions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Pop2018&lt;/code&gt;: population of the country in 2018 [million people]&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Pop2019&lt;/code&gt;: population of the country in 2019 [million people]&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PopChangeAbs&lt;/code&gt;: change in population from 2018 to 2019 in absolute numbers [million people]&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PopChangePct&lt;/code&gt;: like &lt;code&gt;PopChangeAbs&lt;/code&gt; but as a percentage [%]&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GDP&lt;/code&gt;: gross domestic product of the country in 2019 [million USD]&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GDPperCapita&lt;/code&gt;: &lt;code&gt;GDP&lt;/code&gt; divided by the number of people living in the country [USD/person]; this column is not in the source file, but will be computed (see below)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The file is downloaded and converted to a &lt;code&gt;DataFrame&lt;/code&gt; using the following Julia code:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;



&lt;p&gt;Line 7 computes the new column &lt;code&gt;GDPperCapita&lt;/code&gt; mentioned above and adds it to the &lt;code&gt;countries&lt;/code&gt;-DataFrame.&lt;/p&gt;
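
&lt;p&gt;The computation of such a column can be sketched as follows. This is not the original code: the small table is a made-up stand-in for the real data, and dividing by &lt;code&gt;Pop2019&lt;/code&gt; (rather than &lt;code&gt;Pop2018&lt;/code&gt;) is an assumption based on &lt;code&gt;GDP&lt;/code&gt; referring to 2019.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;using DataFrames

# Toy stand-in for the real table; all names and values are made up
countries = DataFrame(
    Country   = ["A", "B", "C"],
    Region    = ["North", "North", "South"],
    Subregion = ["N1", "N2", "S1"],
    Pop2019   = [10.0, 20.0, 5.0],                # million people
    GDP       = [200_000.0, 100_000.0, 50_000.0], # million USD
)

# million USD divided by million people gives USD per person
countries.GDPperCapita = countries.GDP ./ countries.Pop2019
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because both columns are given in millions, the units cancel and no further conversion factor is needed.&lt;/p&gt;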

&lt;h3&gt;
  
  
  Aggregated data
&lt;/h3&gt;

&lt;p&gt;The detailed list, which has one row per country (210 rows in total), will be grouped and aggregated on two levels (using &lt;code&gt;DataFrame&lt;/code&gt; functions):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 1 — Regions&lt;/strong&gt;: The following code groups the list by &lt;code&gt;Region&lt;/code&gt; (i.e. continent) omitting the columns &lt;code&gt;Country&lt;/code&gt; and &lt;code&gt;Subregion&lt;/code&gt; (using a nested &lt;code&gt;select&lt;/code&gt;) in line 1 and then creates an aggregation summing up all numerical columns (lines 2–5).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 2 — Subregions&lt;/strong&gt;: The same operations are applied on the subregion level in lines 7–11. First the countries are grouped by &lt;code&gt;Subregion&lt;/code&gt; omitting column &lt;code&gt;Country&lt;/code&gt; (line 7) and then an aggregation is created on that data; again summing up all numerical columns. Besides, the name of the region is picked from each subgroup (&lt;code&gt;:Region =&amp;gt; first&lt;/code&gt;)&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;The resulting DataFrames &lt;code&gt;regions_cum&lt;/code&gt; and &lt;code&gt;subregions_cum&lt;/code&gt; look as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/b0Zrw2PHcpjrsYb8rQNPI63IQRDc_4Wo99-GcenEys0/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2Nv/YndicDVoY3MyMW54/d2gyMTVvLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/b0Zrw2PHcpjrsYb8rQNPI63IQRDc_4Wo99-GcenEys0/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2Nv/YndicDVoY3MyMW54/d2gyMTVvLnBuZw" alt="aggregation by region" width="880" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/DbhWP8e7eAIsOZeoPn9Q29DYYxb1tdTIypRUtGOcRR0/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2F6/NXVvNmJjYWhhNHpu/bnlqemw4LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/DbhWP8e7eAIsOZeoPn9Q29DYYxb1tdTIypRUtGOcRR0/w:880/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2F6/NXVvNmJjYWhhNHpu/bnlqemw4LnBuZw" alt="aggregation by subregion" width="880" height="492"&gt;&lt;/a&gt;&lt;/p&gt;
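
&lt;p&gt;The two aggregation levels described above can be sketched with DataFrames.jl as follows. This is a reconstruction under assumptions, not the original code: the table is a made-up toy example with only two numerical columns, and the intermediate names &lt;code&gt;regions&lt;/code&gt; and &lt;code&gt;subregions&lt;/code&gt; are chosen for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;using DataFrames

# Toy stand-in for the real table; all names and values are made up
countries = DataFrame(
    Country   = ["A", "B", "C", "D"],
    Region    = ["North", "North", "South", "South"],
    Subregion = ["N1", "N1", "S1", "S2"],
    Pop2019   = [10.0, 20.0, 5.0, 15.0],
    GDP       = [200.0, 100.0, 50.0, 150.0],
)

# Level 1: drop the columns that are not needed, group by region,
# then sum up the numerical columns (renamecols = false keeps the column names)
regions = groupby(select(countries, Not([:Country, :Subregion])), :Region)
regions_cum = combine(regions, [:Pop2019, :GDP] .=&amp;gt; sum; renamecols = false)

# Level 2: group by subregion, sum up the numerical columns
# and pick the region name from the first row of each group
subregions = groupby(select(countries, Not(:Country)), :Subregion)
subregions_cum = combine(subregions, :Region =&amp;gt; first,
                         [:Pop2019, :GDP] .=&amp;gt; sum; renamecols = false)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Picking the region with &lt;code&gt;first&lt;/code&gt; is safe here because all countries of a subregion belong to the same region.&lt;/p&gt;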

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;The DataFrames &lt;code&gt;countries&lt;/code&gt;, &lt;code&gt;subregions_cum&lt;/code&gt; and &lt;code&gt;regions_cum&lt;/code&gt; are the basis for the plotting examples in the forthcoming articles about the different Julia graphics packages. In these articles we will see how to create&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;bar plots&lt;/li&gt;
&lt;li&gt;scatter plots&lt;/li&gt;
&lt;li&gt;histograms&lt;/li&gt;
&lt;li&gt;box plots and violin plots &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;in each of these graphics packages.&lt;/p&gt;

&lt;p&gt;The first article will present Gadfly. So stay tuned!&lt;/p&gt;

</description>
      <category>graphics</category>
      <category>plotting</category>
      <category>comparison</category>
      <category>ggplot</category>
    </item>
  </channel>
</rss>
