<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Julia Community 🟣: Machine Learning Julia (MLJ.jl)</title>
    <description>The latest articles on Julia Community 🟣 by Machine Learning Julia (MLJ.jl) (@mlj).</description>
    <link>https://forem.julialang.org/mlj</link>
    <image>
      <url>https://forem.julialang.org/images/OEYWw3sEizCAn-9iAMsJ-qUeN3OaV8V44KbTA5mOiQM/rs:fill:90:90/g:sm/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L29yZ2FuaXphdGlv/bi9wcm9maWxlX2lt/YWdlLzUvNjkxM2Yx/N2EtYjdlNS00NTll/LWE3N2UtNDQ4ODVj/MTY2YjljLnBuZw</url>
      <title>Julia Community 🟣: Machine Learning Julia (MLJ.jl)</title>
      <link>https://forem.julialang.org/mlj</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.julialang.org/feed/mlj"/>
    <language>en</language>
    <item>
      <title>Julia Boards the Titanic - A brief introduction to the MLJ.jl package</title>
      <dc:creator>Anthony Blaom, PhD</dc:creator>
      <pubDate>Wed, 15 Feb 2023 22:24:07 +0000</pubDate>
      <link>https://forem.julialang.org/mlj/julia-boards-the-titanic-1ne8</link>
      <guid>https://forem.julialang.org/mlj/julia-boards-the-titanic-1ne8</guid>
      <description>&lt;p&gt;This is a gentle introduction to Julia's machine learning toolbox &lt;a href="https://JuliaAI.github.io/MLJ.jl/stable/" rel="noopener noreferrer"&gt;MLJ&lt;/a&gt; focused on users new to Julia. In it we train a decision tree to predict whether a new passenger would survive a hypothetical replay of the Titanic disaster. The blog is loosely based on &lt;a href="https://github.com/ablaom/HelloJulia.jl/tree/dev/notebooks/03_machine_learning" rel="noopener noreferrer"&gt;these notebooks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt; No prior experience with Julia, but you should know how to open a Julia REPL session in some terminal or console. A nodding acquaintance with &lt;a href="https://www.digitalocean.com/community/tutorials/an-introduction-to-machine-learning" rel="noopener noreferrer"&gt;supervised machine learning&lt;/a&gt; would be helpful.&lt;/p&gt;

&lt;p&gt;Experienced data scientists may want to check out the more advanced tutorial, &lt;a href="https://juliaai.github.io/DataScienceTutorials.jl/end-to-end/telco/" rel="noopener noreferrer"&gt;MLJ for Data Scientists in Two Hours&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision Trees
&lt;/h2&gt;

&lt;p&gt;Generally, &lt;a href="https://en.wikipedia.org/wiki/Decision_tree" rel="noopener noreferrer"&gt;decision trees&lt;/a&gt; are not the best performing machine learning models. However, they are extremely fast to train, easy to interpret, and have flexible data requirements. They are also the building blocks of more advanced models, such as &lt;a href="https://en.wikipedia.org/wiki/Random_forest" rel="noopener noreferrer"&gt;random forests&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Gradient_boosting" rel="noopener noreferrer"&gt;gradient boosted trees&lt;/a&gt;, which are among the most successful and widely applied classes of machine learning models today. All these models are available in the MLJ toolbox and are trained in the same way as the decision tree.&lt;/p&gt;

&lt;p&gt;Here's a diagram representing what a decision tree, trained on the Titanic dataset, might look like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/rAMWj5qLXC85eBy1eV6IAnWIMtAtR3xvoVeIQfg1Wus/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL29w/am9jaW50NGJta3Rl/N3JwbHc5LmpwZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/rAMWj5qLXC85eBy1eV6IAnWIMtAtR3xvoVeIQfg1Wus/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL29w/am9jaW50NGJta3Rl/N3JwbHc5LmpwZw" alt="decision tree" width="465" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, in this model, a male over the age of 9.5 is predicted to die, having a survival probability of 0.17.&lt;/p&gt;

&lt;h2&gt;
  
  
  Package installation
&lt;/h2&gt;

&lt;p&gt;We start by creating a new Julia package environment called &lt;code&gt;titanic&lt;/code&gt;, for tracking versions of the packages we will need. Do this by typing these commands at the &lt;code&gt;julia&amp;gt;&lt;/code&gt; prompt, pressing the return key at the end of each line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;Pkg&lt;/span&gt;
&lt;span class="n"&gt;Pkg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;activate&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"titanic"&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shared&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To add the packages we need to your environment, enter the &lt;code&gt;]&lt;/code&gt; character at the &lt;code&gt;julia&amp;gt;&lt;/code&gt; prompt, to change it to &lt;code&gt;(titanic) pkg&amp;gt;&lt;/code&gt;. Then enter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;add&lt;/span&gt; &lt;span class="n"&gt;MLJ&lt;/span&gt; &lt;span class="n"&gt;DataFrames&lt;/span&gt; &lt;span class="n"&gt;BetaML&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It may take a few minutes for these packages to be installed and "precompiled".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tip.&lt;/strong&gt; Next time you want to use exactly the same combination of packages in a new Julia session, you can skip the &lt;code&gt;add&lt;/code&gt; command and instead just enter the two lines above it (&lt;code&gt;using Pkg&lt;/code&gt; and the &lt;code&gt;Pkg.activate&lt;/code&gt; call).&lt;/p&gt;

&lt;p&gt;When the &lt;code&gt;(titanic) pkg&amp;gt;&lt;/code&gt; prompt returns, enter &lt;code&gt;status&lt;/code&gt; to see the package versions that were installed. Here's what each package does:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://JuliaAI.github.io/MLJ.jl/stable/" rel="noopener noreferrer"&gt;MLJ&lt;/a&gt; (machine learning toolbox): provides a common interface for interacting with models provided by different packages, and for automating common model-generic tasks, such as &lt;a href="https://en.wikipedia.org/wiki/Hyperparameter_optimization" rel="noopener noreferrer"&gt;hyperparameter optimization&lt;/a&gt; demonstrated at the end of this blog.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://dataframes.juliadata.org/stable/" rel="noopener noreferrer"&gt;DataFrames&lt;/a&gt;: Allows you to manipulate tabular data that fits into memory. &lt;strong&gt;Tip.&lt;/strong&gt; Checkout these &lt;a href="https://ahsmart.com/pub/data-wrangling-with-data-frames-jl-cheat-sheet/index.html" rel="noopener noreferrer"&gt;cheatsheets&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/sylvaticus/BetaML.jl" rel="noopener noreferrer"&gt;BetaML&lt;/a&gt;: Provides the core decision algorithm we will be building for Titanic prediction.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Learn more about Julia package management &lt;a href="https://docs.julialang.org/en/v1/stdlib/Pkg/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For now, return to the &lt;code&gt;julia&amp;gt;&lt;/code&gt; prompt by pressing the "delete" or "backspace" key.&lt;/p&gt;

&lt;h2&gt;
  
  
  Establishing correct data representation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;MLJ&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataFrames&lt;/span&gt; &lt;span class="n"&gt;as&lt;/span&gt; &lt;span class="n"&gt;DF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After entering the first line above we are ready to use any function in MLJ's documentation as it appears there. After the second, we can use functions from DataFrames, but must qualify the function names with a prefix &lt;code&gt;DF.&lt;/code&gt;, as we'll see later.&lt;/p&gt;

&lt;p&gt;In MLJ, and some other statistics packages, a &lt;a href="https://juliaai.github.io/ScientificTypes.jl/dev/" rel="noopener noreferrer"&gt;"scientific type"&lt;/a&gt; or &lt;em&gt;scitype&lt;/em&gt; indicates how MLJ will &lt;em&gt;interpret&lt;/em&gt; data (as opposed to how it is represented on your machine). For example, while we have&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;julia&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;typeof&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;Int64&lt;/span&gt;

&lt;span class="n"&gt;julia&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;typeof&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;Bool&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;we have&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;julia&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;scitype&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Count&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;but also&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;julia&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;scitype&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Count&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tip.&lt;/strong&gt; To learn more about a Julia command, use the &lt;code&gt;?&lt;/code&gt; character. For example, try typing &lt;code&gt;?scitype&lt;/code&gt; at the &lt;code&gt;julia&amp;gt;&lt;/code&gt; prompt.&lt;/p&gt;

&lt;p&gt;In MLJ, model data requirements are articulated using scitypes, which allows you to focus on what your data represents in the real world, instead of how it is stored on your computer.&lt;/p&gt;
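
&lt;p&gt;For example, assuming MLJ's default scitype conventions, a raw string is interpreted as free text, while &lt;code&gt;coerce&lt;/code&gt; converts a vector to a categorical representation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;scitype("male")   # Textual: strings are interpreted as text, not categories
coerce(["male", "female", "male"], Multiclass)  # categorical vector, scitype AbstractVector{Multiclass{2}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;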

&lt;p&gt;Here are the most common "scalar" scitypes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forem.julialang.org/images/jsGnG784ey3efPt5lpB_zq_oo7Y_lSvM3rNwPTkfu6Q/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2Fk/MXpsaXF0emY2ZDY1/eHBjcWNtLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://forem.julialang.org/images/jsGnG784ey3efPt5lpB_zq_oo7Y_lSvM3rNwPTkfu6Q/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L2FydGljbGVzL2Fk/MXpsaXF0emY2ZDY1/eHBjcWNtLnBuZw" alt="scalar scitypes" width="598" height="81"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We'll grab our Titanic data set from &lt;a href="https://www.openml.org" rel="noopener noreferrer"&gt;OpenML&lt;/a&gt;, a platform for sharing machine learning datasets and workflows. The second line below converts the downloaded data into a dataframe.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;OpenML&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42638&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DF&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can use DataFrames to get summary statistics for the features in our dataset:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;DF&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;describe&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Row&lt;/th&gt;
&lt;th&gt;variable&lt;/th&gt;
&lt;th&gt;mean&lt;/th&gt;
&lt;th&gt;min&lt;/th&gt;
&lt;th&gt;median&lt;/th&gt;
&lt;th&gt;max&lt;/th&gt;
&lt;th&gt;nmissing&lt;/th&gt;
&lt;th&gt;eltype&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;pclass&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;CategoricalValue{String, UInt32}&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;sex&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;female&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;male&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;CategoricalValue{String, UInt32}&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;age&lt;/td&gt;
&lt;td&gt;29.7589&lt;/td&gt;
&lt;td&gt;0.42&lt;/td&gt;
&lt;td&gt;30.0&lt;/td&gt;
&lt;td&gt;80.0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Float64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;sibsp&lt;/td&gt;
&lt;td&gt;0.523008&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;8.0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Float64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;fare&lt;/td&gt;
&lt;td&gt;32.2042&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;14.4542&lt;/td&gt;
&lt;td&gt;512.329&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Float64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;cabin&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;E31&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;C148&lt;/td&gt;
&lt;td&gt;687&lt;/td&gt;
&lt;td&gt;Union{Missing, CategoricalValue{…&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;embarked&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Union{Missing, CategoricalValue{…&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;survived&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;CategoricalValue{String, UInt32}&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In particular, we see that &lt;code&gt;cabin&lt;/code&gt; has a lot of missing values, and we'll shortly drop it for simplicity.&lt;/p&gt;

&lt;p&gt;To get a summary of feature scitypes, we use &lt;code&gt;schema&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Row&lt;/th&gt;
&lt;th&gt;names&lt;/th&gt;
&lt;th&gt;scitypes&lt;/th&gt;
&lt;th&gt;types&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;pclass&lt;/td&gt;
&lt;td&gt;Multiclass{3}&lt;/td&gt;
&lt;td&gt;CategoricalValue{String, UInt32}&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;sex&lt;/td&gt;
&lt;td&gt;Multiclass{2}&lt;/td&gt;
&lt;td&gt;CategoricalValue{String, UInt32}&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;age&lt;/td&gt;
&lt;td&gt;Continuous&lt;/td&gt;
&lt;td&gt;Float64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;sibsp&lt;/td&gt;
&lt;td&gt;Continuous&lt;/td&gt;
&lt;td&gt;Float64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;fare&lt;/td&gt;
&lt;td&gt;Continuous&lt;/td&gt;
&lt;td&gt;Float64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;cabin&lt;/td&gt;
&lt;td&gt;Union{Missing, Multiclass{186}}&lt;/td&gt;
&lt;td&gt;Union{Missing, CategoricalValue{…&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;embarked&lt;/td&gt;
&lt;td&gt;Union{Missing, Multiclass{3}}&lt;/td&gt;
&lt;td&gt;Union{Missing, CategoricalValue{…&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;survived&lt;/td&gt;
&lt;td&gt;Multiclass{2}&lt;/td&gt;
&lt;td&gt;CategoricalValue{String, UInt32}&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;However, &lt;code&gt;sibsp&lt;/code&gt; represents the number of siblings/spouses aboard, which is a count rather than a continuous variable. We fix its scitype like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;coerce!&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;sibsp&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Count&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Call &lt;code&gt;schema(df)&lt;/code&gt; again, to check a successful change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Splitting into train and test sets
&lt;/h2&gt;

&lt;p&gt;To objectively evaluate the performance of our final model, we split off 30% of our data into a &lt;em&gt;holdout set&lt;/em&gt;, called &lt;code&gt;df_test&lt;/code&gt;, which will not be used for training:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;partition&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can check the number of observations in each set with &lt;code&gt;DF.nrow(df)&lt;/code&gt; and &lt;code&gt;DF.nrow(df_test)&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Splitting data into input features and target
&lt;/h2&gt;

&lt;p&gt;In supervised learning, the &lt;em&gt;target&lt;/em&gt; is the variable we want to predict, in this case &lt;code&gt;survived&lt;/code&gt;. The other features will be inputs to our predictor. The following code puts the &lt;code&gt;df&lt;/code&gt; column with name &lt;code&gt;survived&lt;/code&gt; into the vector &lt;code&gt;y&lt;/code&gt; (the target) and everything else, except &lt;code&gt;cabin&lt;/code&gt;, which we're dropping, into a new dataframe called &lt;code&gt;X&lt;/code&gt; (the input features).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;unpack&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;survived&lt;/span&gt;&lt;span class="x"&gt;),&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;cabin&lt;/span&gt;&lt;span class="x"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can check &lt;code&gt;X&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; have the expected form by doing &lt;code&gt;schema(X)&lt;/code&gt; and &lt;code&gt;scitype(y)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We'll want to do the same for the holdout test set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;unpack&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_test&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;survived&lt;/span&gt;&lt;span class="x"&gt;),&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;cabin&lt;/span&gt;&lt;span class="x"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Choosing a supervised model
&lt;/h2&gt;

&lt;p&gt;There are not many models that can directly handle missing values and a mixture of scitypes, as we have here. Here's how to list the ones that can:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;julia&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matching&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="x"&gt;))&lt;/span&gt;
 &lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ConstantClassifier&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;package_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MLJModels&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="x"&gt;)&lt;/span&gt;
 &lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DecisionTreeClassifier&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;package_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BetaML&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="x"&gt;)&lt;/span&gt;
 &lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DeterministicConstantClassifier&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;package_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MLJModels&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="x"&gt;)&lt;/span&gt;
 &lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RandomForestClassifier&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;package_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BetaML&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shortcoming can be addressed with data preprocessing &lt;a href="https://JuliaAI.github.io/MLJ.jl/stable/model_browser/#Model-Browser" rel="noopener noreferrer"&gt;provided by MLJ&lt;/a&gt; but not covered here, such as one-hot encoding and missing value imputation. We'll settle for the indicated decision tree.&lt;/p&gt;
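
&lt;p&gt;For a taste of what such preprocessing looks like, here is a rough sketch using MLJ's pipeline syntax (not needed for this tutorial; &lt;code&gt;some_classifier&lt;/code&gt; is a placeholder for any classifier requiring complete, continuous input):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;# impute missing values, then one-hot encode the categorical features:
pipe = FillImputer() |&amp;gt; ContinuousEncoder() |&amp;gt; some_classifier
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;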

&lt;p&gt;The code for the decision tree model is not available until we explicitly load it, but we can already inspect its documentation. Do this by entering &lt;code&gt;doc("DecisionTreeClassifier", pkg="BetaML")&lt;/code&gt;. (To browse &lt;em&gt;all&lt;/em&gt; MLJ model documentation use the &lt;a href="https://JuliaAI.github.io/MLJ.jl/stable/model_browser/#Model-Browser" rel="noopener noreferrer"&gt;Model Browser&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;An MLJ-specific method for loading the model code (and necessary packages) is shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;Tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;@load&lt;/span&gt; &lt;span class="n"&gt;DecisionTreeClassifier&lt;/span&gt; &lt;span class="n"&gt;pkg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BetaML&lt;/span&gt;
&lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Tree&lt;/span&gt;&lt;span class="x"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first line loads the model &lt;em&gt;type&lt;/em&gt;, which we've called &lt;code&gt;Tree&lt;/code&gt;; the second creates an object storing default hyperparameters for a &lt;code&gt;Tree&lt;/code&gt; model. This &lt;code&gt;tree&lt;/code&gt; will be displayed thus:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;DecisionTreeClassifier&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;max_depth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;min_gain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;min_records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;max_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;splitting_criterion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BetaML&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Utils&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gini&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;rng&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_GLOBAL_RNG&lt;/span&gt;&lt;span class="x"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can specify different hyperparameters like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Tree&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Training the model
&lt;/h2&gt;

&lt;p&gt;We now bind the data to be used for training and the hyperparameter object &lt;code&gt;tree&lt;/code&gt; we just created in a new object called a &lt;em&gt;machine&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;mach&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;machine&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We train the model on all bound data by calling &lt;code&gt;fit!&lt;/code&gt; on the machine. The exclamation mark &lt;code&gt;!&lt;/code&gt; in &lt;code&gt;fit!&lt;/code&gt; tells us that &lt;code&gt;fit!&lt;/code&gt; mutates (changes) its argument. In this case the model's learned parameters (the actual decision tree) are stored in the &lt;code&gt;mach&lt;/code&gt; object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;fit!&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mach&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before getting predictions for new inputs, let's start by looking at predictions for the inputs we trained on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mach&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that these are &lt;em&gt;probabilistic&lt;/em&gt; predictions. For example, we have&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;julia&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="x"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="x"&gt;]&lt;/span&gt;
           &lt;span class="n"&gt;UnivariateFinite&lt;/span&gt;&lt;span class="x"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Multiclass&lt;/span&gt;&lt;span class="x"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="x"&gt;}}&lt;/span&gt;
     &lt;span class="n"&gt;┌&lt;/span&gt;                                        &lt;span class="n"&gt;┐&lt;/span&gt;
   &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="n"&gt;┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■&lt;/span&gt; &lt;span class="mf"&gt;0.914894&lt;/span&gt;
   &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;┤■■■&lt;/span&gt; &lt;span class="mf"&gt;0.0851064&lt;/span&gt;
     &lt;span class="n"&gt;└&lt;/span&gt;                                        &lt;span class="n"&gt;┘&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Extracting a raw probability requires an extra step. For example, to get the survival probability (&lt;code&gt;1&lt;/code&gt; corresponding to survival and &lt;code&gt;0&lt;/code&gt; to death), we do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;julia&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pdf&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="x"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="x"&gt;],&lt;/span&gt; &lt;span class="s"&gt;"1"&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;span class="mf"&gt;0.0851063829787234&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
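&lt;p&gt;Broadcasting &lt;code&gt;pdf&lt;/code&gt; over the prediction vector extracts the survival probability for every passenger at once (a sketch, assuming &lt;code&gt;p&lt;/code&gt; is the vector of probabilistic predictions obtained above):&lt;/p&gt;

```julia
# one survival probability per passenger:
survival_probs = pdf.(p, "1")
```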



&lt;p&gt;We can also get "point" predictions using the &lt;code&gt;mode&lt;/code&gt; function and Julia's broadcasting syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;julia&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;yhat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;julia&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;yhat&lt;/span&gt;&lt;span class="x"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="x"&gt;]&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt; &lt;span class="n"&gt;CategoricalArrays&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CategoricalArray&lt;/span&gt;&lt;span class="x"&gt;{&lt;/span&gt;&lt;span class="kt"&gt;String&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;UInt32&lt;/span&gt;&lt;span class="x"&gt;}&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="s"&gt;"0"&lt;/span&gt;
 &lt;span class="s"&gt;"0"&lt;/span&gt;
 &lt;span class="s"&gt;"1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Evaluating model performance
&lt;/h2&gt;

&lt;p&gt;Let's see how accurate our model is at predicting on the data it trained on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;julia&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;accuracy&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;yhat&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;span class="mf"&gt;0.921474358974359&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Over 90% accuracy! Better check the accuracy on the test data that the model hasn't seen:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;julia&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;yhat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mach&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="x"&gt;));&lt;/span&gt;
&lt;span class="n"&gt;julia&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;accuracy&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;yhat&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;span class="mf"&gt;0.7790262172284644&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Oh dear. We are most likely &lt;a href="https://en.wikipedia.org/wiki/Overfitting" rel="noopener noreferrer"&gt;overfitting&lt;/a&gt; the model. Still, not a bad first step.&lt;/p&gt;

&lt;p&gt;The evaluation we have just performed is known as &lt;em&gt;holdout&lt;/em&gt; evaluation. MLJ provides tools for automating such evaluations, as well as more sophisticated ones, such as &lt;a href="https://en.wikipedia.org/wiki/Cross-validation_(statistics)" rel="noopener noreferrer"&gt;cross-validation&lt;/a&gt;. See &lt;a href="https://JuliaAI.github.io/MLJ.jl/stable/getting_started/#Getting-Started" rel="noopener noreferrer"&gt;this simple example&lt;/a&gt; and &lt;a href="https://JuliaAI.github.io/MLJ.jl/stable/evaluating_model_performance/" rel="noopener noreferrer"&gt;the detailed documentation&lt;/a&gt; for more information.&lt;/p&gt;
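&lt;p&gt;To give a flavor of those automated tools, here is a sketch of how the evaluation above might be carried out with MLJ's &lt;code&gt;evaluate&lt;/code&gt; function (assuming &lt;code&gt;tree&lt;/code&gt;, &lt;code&gt;X&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; are as defined earlier in this post; consult the linked documentation for the definitive keyword options):&lt;/p&gt;

```julia
using MLJ

# automate the holdout evaluation performed manually above:
evaluate(tree, X, y,
    resampling=Holdout(fraction_train=0.7),
    measure=accuracy)

# or use 6-fold cross-validation instead:
evaluate(tree, X, y,
    resampling=CV(nfolds=6, shuffle=true),
    measure=accuracy)
```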

&lt;h2&gt;
  
  
  Tuning the model
&lt;/h2&gt;

&lt;p&gt;Changing any hyperparameter of our model will alter its performance. In particular, changing certain hyperparameters may mitigate overfitting.&lt;/p&gt;

&lt;p&gt;In MLJ we can "wrap" the model to make it automatically optimize a given hyperparameter, which it does by internally creating its own holdout set for evaluation (or using some other resampling scheme, such as cross-validation) and systematically searching over a specified range of one or more hyperparameters. Let's do that now for our decision tree.&lt;/p&gt;

&lt;p&gt;First, we define a hyperparameter range over which to search:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;range&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;upper&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;According to the document string for the decision tree (which we can retrieve with &lt;code&gt;?Tree&lt;/code&gt;), &lt;code&gt;0&lt;/code&gt; here means "no limit on &lt;code&gt;max_depth&lt;/code&gt;".&lt;/p&gt;

&lt;p&gt;Next, we apply MLJ's &lt;code&gt;TunedModel&lt;/code&gt; &lt;a href="https://JuliaAI.github.io/MLJ.jl/stable/tuning_models/"&gt;wrapper&lt;/a&gt; to our tree, specifying the hyperparameter range and the performance measure to serve as the basis for optimization, the resampling strategy, and the search method (a grid search in this case).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;tuned_tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TunedModel&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tuning&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Grid&lt;/span&gt;&lt;span class="x"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;measure&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;accuracy&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resampling&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Holdout&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fraction_train&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="x"&gt;),&lt;/span&gt;
&lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The new model &lt;code&gt;tuned_tree&lt;/code&gt; behaves like the old one, except that the &lt;code&gt;max_depth&lt;/code&gt; hyperparameter effectively becomes a &lt;em&gt;learned&lt;/em&gt; parameter.&lt;/p&gt;

&lt;p&gt;Training this &lt;code&gt;tuned_tree&lt;/code&gt; actually performs two operations, under the hood:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Search for the best model using an internally constructed holdout set&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Retrain the "best" model on &lt;em&gt;all&lt;/em&gt; available data&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mach2 = machine(tuned_tree, X, y)
fit!(mach2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's how we can see what the optimal model actually is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;julia&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;fitted_params&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mach2&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_model&lt;/span&gt;
&lt;span class="n"&gt;DecisionTreeClassifier&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;max_depth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;min_gain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;min_records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;max_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;splitting_criterion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BetaML&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Utils&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gini&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;rng&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_GLOBAL_RNG&lt;/span&gt;&lt;span class="x"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, let's test the self-tuning model on our existing holdout set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="n"&gt;yhat2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mach2&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="x"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;accuracy&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;yhat2&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;span class="mf"&gt;0.8164794007490637&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although we cannot assign statistical significance to this outcome without a more detailed analysis, it appears to be an improvement on our original &lt;code&gt;depth=10&lt;/code&gt; model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Learning more
&lt;/h2&gt;

&lt;p&gt;Suggestions for learning more about Julia and MLJ are &lt;a href="https://JuliaAI.github.io/MLJ.jl/stable/learning_mlj/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>mlj</category>
      <category>tutorial</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Case Study: Documenting machine learning models in a Julia ML framework</title>
      <dc:creator>Logan Kilpatrick</dc:creator>
      <pubDate>Wed, 30 Nov 2022 17:06:52 +0000</pubDate>
      <link>https://forem.julialang.org/mlj/case-study-documenting-machine-learning-models-in-a-julia-ml-framework-190a</link>
      <guid>https://forem.julialang.org/mlj/case-study-documenting-machine-learning-models-in-a-julia-ml-framework-190a</guid>
      <description>&lt;p&gt;Julia is a relatively new, general purpose programming language. MLJ (Machine Learning in Julia) is a toolbox written in Julia providing a common interface and meta-algorithms for selecting, tuning, evaluating, composing and &lt;a href="https://alan-turing-institute.github.io/MLJ.jl/dev/list_of_supported_models/"&gt;comparing a variety of machine learning models&lt;/a&gt; implemented in Julia and other languages. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Authors:&lt;/em&gt; Anthony Blaom, Logan Kilpatrick and David Josephs&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem Statement
&lt;/h2&gt;




&lt;p&gt;While MLJ provides detailed documentation for its model-generic functionality (e.g., hyperparameter optimization), users previously relied on third-party package providers for model-specific documentation. That documentation was physically scattered, occasionally terse, and not in any standard format. This was viewed as a barrier to adoption, especially by users new to machine learning, which is a large demographic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Proposal Abstract
&lt;/h2&gt;

&lt;p&gt;Having decided on a standard for model document strings, this project’s goal was to roll out model document strings for individual models. For a suitably identified technical writer, this was to involve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Learning to use MLJ for data science projects&lt;/li&gt;
&lt;li&gt;Understanding the document string specification&lt;/li&gt;
&lt;li&gt;Reading and understanding third party model documentation &lt;/li&gt;
&lt;li&gt;Boosting machine learning knowledge where appropriate to inform accurate document strings&lt;/li&gt;
&lt;li&gt;Collaborating through code reviews in the writing of new document strings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Details of the proposal are &lt;a href="https://julialang.org/jsoc/gsod/2022/proposal/"&gt;on the Julia website&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Description
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Creating the proposal
&lt;/h3&gt;

&lt;p&gt;Our Google Season of Docs process always starts with an open solicitation to the community for project ideas. These are generally crowd-sourced and added to the Julia website. From there, the core Julia team evaluates each proposal based on the level of contributor interest, impact on the community, and enthusiasm of the mentor. As we have learned with Google Summer of Code over the last 10 years, the contributor experience is profoundly shaped by the mentor, so we work hard to make sure there is someone with expertise and adequate time to support each project if selected. &lt;/p&gt;

&lt;p&gt;This year, we were lucky enough to have a project that checked all three boxes. MLJ’s usage in the Julia ecosystem has expanded significantly over time, so it seemed like a worthwhile investment to support the project with documentation help, especially around something as critical as model information. &lt;/p&gt;

&lt;p&gt;Once we officially announced that the MLJ project was the one selected, we shared this widely with the community for input. Generally, unless people are close to the proposed project itself, they don’t have much to say. Nonetheless, this process is still critical for transparency in the open source community. &lt;/p&gt;

&lt;h3&gt;
  
  
  Budget
&lt;/h3&gt;

&lt;p&gt;Our budget was estimated based on previous years of supporting technical writers in similar domains and scopes of work. Estimating is always more of an art than science which is why we tend to add a buffer of time/budget to support unexpected hiccups. &lt;/p&gt;

&lt;p&gt;Initially, we intended to have two main mentors but, due to mentor availability, we only ended up with one person (Anthony), who did most of the mentoring work. We ended up spending the full amount allocated for the project, per our expectations (except ordering our wrap-up t-shirts, which is still in progress). &lt;/p&gt;

&lt;h3&gt;
  
  
  Participants
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;List the project participants.&lt;/em&gt; MLJ’s co-creator and lead developer Anthony Blaom managed the project, reviewed contributions, and provided mentorship to the technical writer David Josephs. Several third-party model package developers/authors were also involved in documentation review, including GitHub users @ExpandingMan, @sylvaticus, @davnn, @tlienart, and &lt;a class="mentioned-user" href="https://forem.julialang.org/okonsamuel"&gt;@okonsamuel&lt;/a&gt;. Logan Kilpatrick co-wrote the proposal, helped with recruitment, and took care of project administration.&lt;/p&gt;

&lt;p&gt;When we knew we would be getting funding, we immediately shared the hiring details with the community on Slack and Discourse, and posted a job listing on LinkedIn to cast the widest possible net. Prospective candidates were asked to write a little about their background and describe previous technical writing experience and open-source contributions. This information, together with published examples of their technical writing, was evaluated. Two candidates were invited for one-on-one Zoom interviews, which followed up on the written application and gave candidates an opportunity to demonstrate oral communication skills, which were deemed essential. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Did anyone drop out?&lt;/em&gt; No.&lt;/p&gt;

&lt;p&gt;Since familiarity with Julia was strongly preferred, and some data science proficiency essential, it was challenging to find a large pool of candidates. In the end we selected a candidate who was strong in data science but less experienced with Julia. That said, our writer David had just started working for a company that codes in Julia, and that worked out nicely for us. David was quickly up to speed with the Julia proficiency we needed. Our experience reaffirms the importance of scientific domain knowledge (machine learning) and good communication skills over specific technical skills, such as proficiency with a certain tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  Timeline
&lt;/h3&gt;

&lt;p&gt;Our original proposal details a timeline. Our initial ambition included documentation for all models, with the exception of the sk-learn models; time was divided equally among model-providing packages. In hindsight, this was a poor allocation, as some packages provide many more models than others. Gauging progress was further complicated by the fact that some models had vastly more hyper-parameters to document.&lt;/p&gt;




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;A &lt;a href="https://github.com/alan-turing-institute/MLJ.jl/issues/913"&gt;tracking issue&lt;/a&gt; nicely summarizes results of the project and its status going forward beyond Google Season of Docs 2022. Documentation additions were made in the following packages, linked to the relevant pull requests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/JuliaAI/MLJTSVDInterface.jl/pull/14"&gt;MLJTSVDInterface.jl (truncated singular value decomposition) - Part 1&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/JuliaAI/MLJTSVDInterface.jl/pull/15"&gt;MLJTSVDInterface.jl - Part 2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/JuliaAI/MLJText.jl/pull/22"&gt;MLJText.jl (text analysis) - Part1&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/JuliaAI/MLJText.jl/pull/23"&gt;MLJText.jl - Part 2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/JuliaAI/MLJModels.jl/pull/472"&gt;MLJModels.jl (transformers)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/JuliaAI/MLJNaiveBayesInterface.jl"&gt;MLJNaiveBayesInterface.jl&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/FluxML/MLJFlux.jl/pull/207"&gt;MLJFlux.jl (neural networks) - Part 1&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/FluxML/MLJFlux.jl/pull/209"&gt;MLJFlux.jl - Part 2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/JuliaAI/MLJGLMInterface.jl/pull/26"&gt;MLJGLMInterface.jl (generalized linear models) - Part 1&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/JuliaAI/MLJGLMInterface.jl/pull/29"&gt;MLJGLMInterface.jl - Part 2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/JuliaAI/MLJGLMInterface.jl/pull/31"&gt;MLJGLMInterface.jl - Part 3&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/JuliaAI/MLJClusteringInterface.jl/pull/15"&gt;MLJClusteringInterface.jl&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/JuliaAI/MLJXGBoostInterface.jl/pull/21"&gt;MLJXGBoostInterface.jl - minus examples&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/IQVIA-ML/LightGBM.jl/pull/130"&gt;LightGBM.jl (gradient boosting machines) - very nearly complete&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/OutlierDetectionJL/OutlierDetectionNeighbors.jl/pull/3"&gt;OutlierDetectionNeighbors.jl - nearly complete&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The technical writer also made the following code additions, which synthesize multi-target supervised learning datasets, to improve some doc-string examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/JuliaAI/MLJBase.jl/pull/780"&gt;MLJBase.jl (multi-target data synthesis) - Part 1&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/JuliaAI/MLJBase.jl/pull/811"&gt;MLJBase.jl - Part 2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Were there any deliverables in the proposal that did not get created?&lt;/em&gt; The following packages did not get new docstrings, but were included in the original proposal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/JuliaAI/MLJLinearModels.jl"&gt;MLJLinearModels.jl&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/JuliaAI/NearestNeighborModels.jl"&gt;NearestNeighborModels.jl&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/PyDataBlog/ParallelKMeans.jl"&gt;ParallelKMeans.jl&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/lalvim/PartialLeastSquaresRegressor.jl/pull/30"&gt;PartialLeastSquaresRegressor.jl&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Did this project result in any new or updated processes or procedures in your organization?&lt;/em&gt; No.&lt;/p&gt;

&lt;h2&gt;
  
  
  Metrics
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;What metrics did you choose to measure the success of the project? Were you able to collect those metrics? Did the metrics correlate well or poorly with the behaviors or outcomes you wanted for the project? Did your metrics change since your proposal? Did you add or remove any metrics? How often do you intend to collect metrics going forward?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Initially, progress was measured by the number of third-party packages documented, but, as described above, a better measure was the proportion of individual models documented. As the project is quite close to being finished, I don’t imagine we need to rethink our metrics for this project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Analysis
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;What went well? What was unexpected? What hurdles or setbacks did you face? Do you consider your project successful? Why or why not? (If it's too early to tell, explain when you expect to be able to judge the success of your project.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This documentation project was always going to have some tedium associated with it, and it was fantastic to have help. Our technical writer was super enthusiastic and eager to learn things beyond the project remit. This enthusiasm helped me (Anthony) a lot to boost my own engagement. All in all, the communication side of things went very well.&lt;/p&gt;

&lt;p&gt;I think having our writer David working at a Julia shop (a startup using Julia) was an unexpected benefit, as it increased the exposure of the MLJ project. We had a few volunteer contributions from a co-worker, for example. Of course, our project and David’s company shared the goal of boosting David’s Julia proficiency quickly. I believe David’s new expertise in MLJ is a definite benefit for his company, which currently builds Julia deep learning models. &lt;/p&gt;

&lt;p&gt;Another benefit of the project was that the process of documentation occasionally highlighted issues or improvements with the software, which were then addressed or tagged for later projects. Moreover, David provided valuable feedback on his own experience with the software, as a new user. &lt;/p&gt;

&lt;p&gt;As manager of the project, I did not anticipate how much time pull-request reviews would take. I’ve learned that reviewing documentation is at least as intensive as code review. In doc review there’s no set of tests to provide extra reassurance; you really need to carefully check every word.&lt;/p&gt;

&lt;p&gt;Fortunately, there were no big setbacks. I would definitely rate the project as a success: We were able to achieve most of our goals, and this is certain to smooth out the on-ramp for new MLJ users. The final analysis will come over time, as we check our engagement levels, and check user feedback. A survey has been prepared and is to be rolled out soon. &lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;In 2-4 paragraphs, summarize your project experience. Highlight what you learned, and what you would choose to do differently in the future. What advice would you give to other projects trying to solve a similar problem with documentation?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this project a Google Season of Docs Technical Writer added document strings to models provided by most of the machine learning packages interfacing with the MLJ machine learning framework. This writing was primarily supervised and reviewed by one other contributor, the framework’s lead author and co-creator. &lt;/p&gt;

&lt;p&gt;The main lesson for the MLJ team has been that creating good docstrings is a lot of work, with the review process as intensive as code review. It is easy to underestimate the resources needed for good documentation. Recruiting for short-term Julia-related development is challenging, given the language’s young age. &lt;/p&gt;

&lt;p&gt;In recruitment, it pays to value domain knowledge and good oral and written communication skills over specific skills, like proficiency in a particular language, assuming you have more than a few months of engagement. Doing so in this case led to a satisfying outcome. (By contrast, we have found a lack of Julia proficiency in GSoC projects more challenging.) &lt;/p&gt;




&lt;h2&gt;
  
  
  Appendix
&lt;/h2&gt;

&lt;p&gt;A &lt;a href="https://forem.julialang.org/josephsdavid/my-experience-working-as-a-technical-writer-for-mlj-1hk4"&gt;blog post describes&lt;/a&gt; our technical writer’s experience working on the project. &lt;/p&gt;

&lt;h3&gt;
  
  
  Acknowledgements
&lt;/h3&gt;

&lt;p&gt;Anthony Blaom acknowledges the support of a &lt;a href="https://www.mbie.govt.nz/science-and-technology/science-and-innovation/funding-information-and-opportunities/investment-funds/strategic-science-investment-fund/ssif-funded-programmes/university-of-auckland/"&gt;New Zealand Strategic Science Investment&lt;/a&gt; awarded to the University of Auckland, which funded his work on MLJ during the project. &lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>docs</category>
      <category>technicalwriting</category>
      <category>launch</category>
    </item>
    <item>
      <title>My experience working as a technical writer for MLJ</title>
      <dc:creator>David Josephs</dc:creator>
      <pubDate>Tue, 22 Nov 2022 05:32:36 +0000</pubDate>
      <link>https://forem.julialang.org/mlj/my-experience-working-as-a-technical-writer-for-mlj-1hk4</link>
      <guid>https://forem.julialang.org/mlj/my-experience-working-as-a-technical-writer-for-mlj-1hk4</guid>
      <description>&lt;p&gt;For the last six or so months, I have had the great honor and pleasure of being a technical writer for MLJ as part of Google's Season of Docs.&lt;/p&gt;

&lt;p&gt;At the start of this year, I made some big changes in my life: getting a new job at a company that aims to do a lot of good in the world, and switching from Python to Julia. Almost immediately, I fell in love with Julia and wanted to get involved in the open source community. Since I lacked confidence in my ability to actually write Julia code, I decided to sign up for Google Season of Docs for MLJ! Now that this is coming to an end, I would like to share my experiences from the last six months, and hopefully encourage other Julia learners to get involved with projects they care about (and write docstrings).&lt;/p&gt;

&lt;h2&gt;
  
  
  Documenting MLJ
&lt;/h2&gt;

&lt;p&gt;At the start of this all, MLJ didn't really have a problem with a &lt;em&gt;lack&lt;/em&gt; of docstrings; it was much more a lack of &lt;em&gt;consistent&lt;/em&gt; and &lt;em&gt;helpful&lt;/em&gt; docstrings. This problem arises because, at the highest level, MLJ essentially provides a convenient, unified frontend to other packages and algorithms (yes, there is much more to the story here, but bear with me!). This means the code is distributed throughout several locations, with different owners and different levels of required maintenance. To resolve this, MLJ rolled out the &lt;a href="https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/#The-document-string-standard"&gt;MLJ document string standard&lt;/a&gt;. For my "season" of docs, I spent my time bringing the docstrings for all the existing MLJ models up to this standard.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to write an MLJ docstring
&lt;/h2&gt;

&lt;p&gt;I think the most useful thing I can share from the last six months is the process I used to write MLJ docstrings without taking too long!&lt;/p&gt;

&lt;p&gt;Probably the easiest part of the MLJ docstring is the "header", which basically describes what the model is and what it does. So for example, let's say I have a classification model which uses some sort of separating hyperplane to do binary classification. At a minimum, my header would look something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;
&lt;span class="s"&gt;"""
`SomeSortOfHyperPlaneClassifier`: A classification model that uses some sort of separating hyperplane to do binary classification,
as first described in [link to some paper that describes it]. Maybe we put a few details specific to our implementation here.
"""&lt;/span&gt;
&lt;span class="n"&gt;SomeSortOfHyperPlaneClassifier&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After you have your header, generally what I do next is document all the hyperparameters. To do this, typically I open up the source code and search for the name of the model, looking for a struct definition with the same name. Maybe it will look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="nd"&gt;@mlj_model&lt;/span&gt; &lt;span class="k"&gt;mutable struct&lt;/span&gt;&lt;span class="nc"&gt; SomeSortOfHyperPlaneClassifier&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;:&lt;/span&gt; &lt;span class="n"&gt;MMI&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Deterministic&lt;/span&gt;
    &lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;Bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All the fields of the struct are the model's hyperparameters! Once you have found these, the task is to figure out what they do. This can be accomplished in a few ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Already knowing what they do&lt;/li&gt;
&lt;li&gt;Looking them up in the documentation for the package MLJ is interfacing with&lt;/li&gt;
&lt;li&gt;Reading source code! (hooray Julia for readable source code!)&lt;/li&gt;
&lt;/ol&gt;
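&lt;p&gt;Alongside these, a quick REPL trick is to instantiate the model and inspect its fields directly. A small sketch using a real model (assuming MLJ and MLJDecisionTreeInterface are in your environment; the model choice is just for illustration):&lt;/p&gt;

```julia
using MLJ

# Load a model type from the registry (the interface package
# MLJDecisionTreeInterface must already be installed):
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0

model = Tree()             # construct with default hyperparameters
fieldnames(typeof(model))  # the hyperparameter names
model                      # in the REPL, displaying a model shows the defaults
```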

&lt;p&gt;Once you have these documented, it is time for the fun part! You can now open a REPL and load &lt;code&gt;MLJ&lt;/code&gt; along with the MLJ interface package you are working on. Following this contrived example, we could do something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;MLJ&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MLJSomeSortOfHyperPlaneModelsInterface&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we start to work out our example, because the rest of the documentation essentially exists to show what you need to get your model up and running!&lt;/p&gt;

&lt;p&gt;The first step, now that you have a REPL loaded, is to figure out what sort of input types the model accepts and how the data needs to look. We can figure this out in one of two ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Using MLJ's model metadata, which should live somewhere in the source code, looking something like:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;
 &lt;span class="n"&gt;metadata_model&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;
     &lt;span class="n"&gt;SomeSortOfHyperPlaneClassifier&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
     &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Table&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Continuous&lt;/span&gt;&lt;span class="x"&gt;),&lt;/span&gt;
     &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kt"&gt;AbstractVector&lt;/span&gt;&lt;span class="x"&gt;{&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;:&lt;/span&gt;&lt;span class="n"&gt;Finite&lt;/span&gt;&lt;span class="x"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="x"&gt;}},&lt;/span&gt;
     &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
     &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(PKG)&lt;/span&gt;&lt;span class="s"&gt;.SomeSortOfHyperPlaneClassifier"&lt;/span&gt;
 &lt;span class="x"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This model, for example, takes a table of continuous values as input and is trained on a two-class finite target (binary classification).&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Trial and error, thanks to MLJ's incredibly helpful error messages: if you feed the model an inappropriate &lt;a href="https://juliaai.github.io/ScientificTypes.jl/dev/"&gt;scientific type&lt;/a&gt;, they will tell you exactly what types the model received and what types it expected.&lt;/li&gt;
&lt;/ol&gt;
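&lt;p&gt;Either way, you can verify the scitypes of your own data at the REPL before binding it to a machine. A minimal sketch, assuming MLJ is installed (the column names here are made up):&lt;/p&gt;

```julia
using MLJ

# Any Tables.jl-compatible table works; a named tuple of vectors is simplest:
X = (sepal_length = [5.1, 4.9, 6.2],
     sepal_width  = [3.5, 3.0, 2.9])
y = coerce(["setosa", "setosa", "virginica"], Multiclass)

schema(X)   # per-column scitypes; every column here is Continuous
scitype(y)  # AbstractVector{Multiclass{2}}, which satisfies the Finite{2} requirement
```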

&lt;p&gt;With this information figured out, you can fill out the information in the second section of the MLJ docstring, as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Training data&lt;/span&gt;
In MLJ or MLJBase, bind an instance &lt;span class="sb"&gt;`model`&lt;/span&gt; to data with one of:&lt;span class="sb"&gt;

    mach = machine(model, X, y)
    mach = machine(model, X, y, w)

&lt;/span&gt;Here
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`X`&lt;/span&gt;: is any table of input features (eg, a &lt;span class="sb"&gt;`DataFrame`&lt;/span&gt;) whose columns
are of scitype &lt;span class="sb"&gt;`SCIENTIFIC INPUT TYPE HERE`&lt;/span&gt;; check the scitype with &lt;span class="sb"&gt;`schema(X)`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`y`&lt;/span&gt;: is the target, which can be any &lt;span class="sb"&gt;`AbstractVector`&lt;/span&gt; whose element
scitype is &lt;span class="sb"&gt;`SCIENTIFIC OUTPUT TYPE HERE`&lt;/span&gt;; check the scitype with &lt;span class="sb"&gt;`scitype(y)`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`w`&lt;/span&gt;: is a vector of &lt;span class="sb"&gt;`Real`&lt;/span&gt; per-observation weights

Train the machine using &lt;span class="sb"&gt;`fit!(mach, rows=...)`&lt;/span&gt;.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we pick the data for the example. Since the example should be easily understood by beginners, it is advisable to use standard datasets like &lt;code&gt;iris&lt;/code&gt;, &lt;code&gt;mnist&lt;/code&gt;, &lt;code&gt;crabs&lt;/code&gt;, or &lt;code&gt;boston_housing&lt;/code&gt;. For edge cases like multi-target regression, there is &lt;a href="https://github.com/JuliaAI/MLJBase.jl/pull/811"&gt;the &lt;code&gt;make_regression&lt;/code&gt;&lt;/a&gt; function. In some cases, for example if you are documenting a model that is heavily used in a specific domain (e.g. independent component analysis and signals, or naive Bayes and simple text classification), it is good to pick or create a second dataset showing how the model would appropriately be used in that domain.&lt;/p&gt;

&lt;p&gt;Once you have chosen the data, you can go ahead and train your model. Next, check out &lt;code&gt;fitted_params(my_trained_machine)&lt;/code&gt; and &lt;code&gt;report(my_trained_machine)&lt;/code&gt;. These bits are typically easy, and they go into the corresponding sections of your docstring.&lt;/p&gt;

&lt;p&gt;Finally, for its existence to matter, the model also needs to be able to do inference, so you need to find out what sort of predictions it makes. Does it return probability distributions when you call &lt;code&gt;predict&lt;/code&gt;? If so, does it implement &lt;code&gt;predict_mode&lt;/code&gt; or &lt;code&gt;predict_mean&lt;/code&gt; to return point predictions? Is it a decomposition model that projects data into a lower-dimensional space? If so, it probably implements a &lt;code&gt;transform&lt;/code&gt; method! Whatever methods it implements, these get documented in the &lt;code&gt;Operations&lt;/code&gt; section of your docstring.&lt;/p&gt;

&lt;p&gt;Now all you have to do is copy your code from the REPL into the example section of the docstring, add references to related models, and you are done! It is not so bad, and at the end of the day you have a docstring that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Has all the information necessary for someone who is relatively new to Julia or machine learning, while being easy to read and digest&lt;/li&gt;
&lt;li&gt;Has an example that people can play around with&lt;/li&gt;
&lt;li&gt;Has a clear description of hyperparameters so it can be tuned&lt;/li&gt;
&lt;/ul&gt;
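&lt;p&gt;The whole train-inspect-predict loop above can be sketched end to end with a real model and a standard dataset (assuming MLJ and MLJDecisionTreeInterface are installed; the particular model is just for illustration):&lt;/p&gt;

```julia
using MLJ

X, y = @load_iris   # a standard, beginner-friendly dataset

Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
mach = machine(Tree(), X, y)
fit!(mach, verbosity=0)

fitted_params(mach)      # learned parameters, for the "Fitted parameters" section
report(mach)             # training by-products, for the "Report" section

yhat = predict(mach, X)  # probabilistic predictions (a vector of distributions)
predict_mode(mach, X)    # point predictions; both belong in "Operations"
```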

&lt;h2&gt;
  
  
  Staying organized
&lt;/h2&gt;

&lt;p&gt;There are a few things the above section didn't cover that may not be relevant to absolutely everyone's workflow, but were definitely helpful to me. The biggest one is staying organized! Some interfaces implement a &lt;strong&gt;LOT&lt;/strong&gt; of models &lt;a href="https://github.com/JuliaAI/MLJMultivariateStatsInterface.jl/pull/39/files"&gt;(looking at you, MLJMultivariateStatsInterface)&lt;/a&gt;. &lt;em&gt;Writing&lt;/em&gt; all the docstrings in one file that scrolls on for miles is extremely difficult: if you accidentally scroll, it may take you some time to figure out which model you are looking at, and you may put some bits in the wrong places. Also, while you certainly can get spellcheck and syntax highlighting in Julia docstrings with &lt;a href="https://tree-sitter.github.io/tree-sitter/"&gt;treesitter&lt;/a&gt; (&lt;a href="https://github.com/josephsdavid/neovim2/blob/ff32d93e7f5b31a07c18d3a16d122f7000654f12/queries/julia/injections.scm%23L1"&gt;using language injections!&lt;/a&gt;), it certainly isn't the easiest way and does not guarantee you have well-formatted docstrings. Instead, it is best to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Make a checklist of all the docstrings you want to write&lt;/li&gt;
&lt;li&gt;Write them all in separate markdown files&lt;/li&gt;
&lt;li&gt;Paste them into the source code as docstrings&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Otherwise, if you are even slightly like me, you will likely get lost in a wall of docstrings with a lot of very similar-looking words (thanks to standardization!).&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Writing these docstrings is not too hard, and it is a great way both to learn more about the thing you are documenting and to get comfortable reading and writing Julia code! If you are relatively new to Julia, I cannot recommend enough looking at packages you either use or want to contribute to and checking out their docstrings. Odds are, because maintaining code is hard and maintaining code and docstrings is harder, the owners of the code would be happy to have an extra brain thinking about their documentation, and will be nice to you when you make a PR (I know it is a little scary when you first start making changes to someone else's code). If you are a package maintainer and don't have a ton of time to keep your docstrings up to date, or they aren't as thorough as you want them to be, open an issue on GitHub so people know you are open to receiving documentation help!&lt;/p&gt;

&lt;h2&gt;
  
  
  Running list of PRs
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/IQVIA-ML/LightGBM.jl/pull/130"&gt;lightgbm (wip)&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/OutlierDetectionJL/OutlierDetectionNeighbors.jl/pull/3"&gt;outlier detection neighbors (wip)&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/JuliaAI/MLJTSVDInterface.jl/pull/14"&gt;MLJTSVDInterface&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/JuliaAI/MLJTSVDInterface.jl/pull/15"&gt;MLJTSVDInterface part 2&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/JuliaAI/MLJText.jl/pull/22"&gt;MLJText&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/JuliaAI/MLJText.jl/pull/23"&gt;MLJText part 2&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/JuliaAI/MLJModels.jl/pull/472"&gt;MLJModels&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/JuliaAI/MLJNaiveBayesInterface.jl"&gt;MLJNaiveBayesInterface&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/FluxML/MLJFlux.jl/pull/207"&gt;MLJFlux&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/FluxML/MLJFlux.jl/pull/209"&gt;MLJFlux part 2&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/JuliaAI/MLJGLMInterface.jl/pull/26"&gt;MLJGLMInterface&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/JuliaAI/MLJGLMInterface.jl/pull/29"&gt;MLJGLMInterface part 2&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/JuliaAI/MLJGLMInterface.jl/pull/31"&gt;MLJGLMInterface part 3&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/JuliaAI/MLJClusteringInterface.jl/pull/15"&gt;MLJClusteringInterface&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/JuliaAI/MLJBase.jl/pull/780"&gt;MLJBase multi-target &lt;code&gt;make_regression&lt;/code&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/JuliaAI/MLJBase.jl/pull/811"&gt;multi-target make_regression part 2&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/JuliaAI/MLJMultivariateStatsInterface.jl/pull/39"&gt;The big MLJMultivariateStatsInterface pr&lt;/a&gt;&lt;/p&gt;

</description>
      <category>mlj</category>
      <category>ml</category>
      <category>jsoc</category>
      <category>technicalwriter</category>
    </item>
  </channel>
</rss>
