<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Julia Community 🟣: Tim Besard</title>
    <description>The latest articles on Julia Community 🟣 by Tim Besard (@maleadt).</description>
    <link>https://forem.julialang.org/maleadt</link>
    <image>
      <url>https://forem.julialang.org/images/Qa559pxlY_RJDhwherAbxfJcW0NmYSThYUWAwQZ0RPY/rs:fill:90:90/g:sm/mb:500000/ar:1/aHR0cHM6Ly9mb3Jl/bS5qdWxpYWxhbmcu/b3JnL3JlbW90ZWlt/YWdlcy91cGxvYWRz/L3VzZXIvcHJvZmls/ZV9pbWFnZS82MDYv/YmRkZWQxZDQtYjcz/Zi00ZjM3LWExNTEt/NTIyN2IwN2RlZWVm/LmpwZw</url>
      <title>Julia Community 🟣: Tim Besard</title>
      <link>https://forem.julialang.org/maleadt</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.julialang.org/feed/maleadt"/>
    <language>en</language>
    <item>
      <title>Technical preview: Programming Apple M1 GPUs in Julia with Metal.jl</title>
      <dc:creator>Tim Besard</dc:creator>
      <pubDate>Tue, 05 Jul 2022 18:58:34 +0000</pubDate>
      <link>https://forem.julialang.org/maleadt/technical-preview-programming-apple-m1-gpus-in-julia-with-metaljl-2e3e</link>
      <guid>https://forem.julialang.org/maleadt/technical-preview-programming-apple-m1-gpus-in-julia-with-metaljl-2e3e</guid>
      <description>&lt;p&gt;Julia has gained a new GPU back-end: &lt;a href="https://github.com/JuliaGPU/Metal.jl"&gt;Metal.jl&lt;/a&gt;, for working with Apple's M1 GPUs. The back-end is built on the same foundations that make up existing GPU packages like &lt;a href="https://github.com/JuliaGPU/CUDA.jl"&gt;CUDA.jl&lt;/a&gt; and &lt;a href="https://github.com/JuliaGPU/AMDGPU.jl"&gt;AMDGPU.jl&lt;/a&gt;, so it should be familiar to anybody who's already programmed GPUs in Julia. In the following post I'll demonstrate some of that functionality and explain how it works.&lt;/p&gt;

&lt;p&gt;But first, note that &lt;strong&gt;Metal.jl is under heavy development&lt;/strong&gt;: the package is considered experimental for now, as we're still working on squashing bugs and adding essential functionality. We also haven't optimized for performance yet. If you're interested in using Metal.jl, please consider contributing to its development! Most of the package is written in Julia, and checking out the source code is a single &lt;code&gt;Pkg.develop&lt;/code&gt; away :-)&lt;/p&gt;
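&lt;p&gt;Concretely, checking out the sources for development could look as follows (a sketch using the standard Pkg workflow; the clone ends up under &lt;code&gt;~/.julia/dev&lt;/code&gt; by default):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;julia&amp;gt; using Pkg

# clone Metal.jl for local development
julia&amp;gt; Pkg.develop("Metal")

# run the package's test suite against your changes
julia&amp;gt; Pkg.test("Metal")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;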

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;

&lt;p&gt;Start by getting a hold of the upcoming &lt;a href="https://julialang.org/downloads/#upcoming_release"&gt;Julia 1.8&lt;/a&gt;, launch it, and enter the package manager by pressing &lt;code&gt;]&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;julia&amp;gt; ]

pkg&amp;gt; add Metal
  Installed Metal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Installation is as easy as that, and we'll automatically download the necessary binary artifacts (a C wrapper for the Metal APIs, and an LLVM back-end). Then, leave the package manager by pressing backspace, import the Metal package, and e.g. call the &lt;code&gt;versioninfo()&lt;/code&gt; function for some details on the toolchain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;julia&amp;gt; using Metal

julia&amp;gt; Metal.versioninfo()
macOS 13.0.0, Darwin 21.3.0

Toolchain:
- Julia: 1.8.0-rc1
- LLVM: 13.0.1

1 device:
- Apple M1 Pro (64.000 KiB allocated)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And there we go! You'll note here that I'm using the upcoming macOS 13 (Ventura); this is currently the only supported operating system. We also only support M-series GPUs, even though Metal does support other GPUs. These choices were made to simplify development, and aren't technical limitations. In fact, Metal.jl &lt;em&gt;does&lt;/em&gt; work on e.g. macOS Monterey with an Intel GPU, but it's an untested combination that may suffer from bugs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Array programming
&lt;/h2&gt;

&lt;p&gt;Just like our other GPU back-ends, Metal.jl offers an array abstraction that greatly simplifies GPU programming. The abstraction centers around the &lt;code&gt;MtlArray&lt;/code&gt; type that can be used to manage memory and perform GPU computations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# allocate + initialize
julia&amp;gt; a = MtlArray(rand(Float32, 2, 2))
2×2 MtlArray{Float32, 2}:
 0.158752  0.836366
 0.535798  0.153554

# perform some GPU-accelerated operations
julia&amp;gt; b = a * a
2×2 MtlArray{Float32, 2}:
 0.473325  0.261202
 0.167333  0.471702

# back to the CPU
julia&amp;gt; Array(b)
2×2 Matrix{Float32}:
 0.473325  0.261202
 0.167333  0.471702
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Beyond these simple operations, Julia's higher-order array abstractions can be used to express more complex operations without ever having to write a kernel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;julia&amp;gt; mapreduce(sin, +, a; dims=1)
1×2 MtlArray{Float32, 2}:
 1.15276  0.584146

julia&amp;gt; cos.(a .+ 2) .* 3
2×2 MtlArray{Float32, 2}:
 -2.0472   -1.25332
 -2.96594  -2.60351
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Much of this functionality comes from the &lt;a href="https://github.com/JuliaGPU/GPUArrays.jl/"&gt;GPUArrays.jl&lt;/a&gt; package, which provides vendor-neutral implementations of common array operations. As a result, &lt;code&gt;MtlArray&lt;/code&gt; is already pretty capable, and should be usable with realistic array-based applications.&lt;/p&gt;
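&lt;p&gt;As a quick illustration, generic operations from Base also just work on an &lt;code&gt;MtlArray&lt;/code&gt; thanks to these vendor-neutral fallbacks (a sketch; the values follow from the random matrix shown earlier):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# full reduction, executed on the GPU
julia&amp;gt; sum(a)
1.68447f0

# partial reduction along a dimension
julia&amp;gt; maximum(a; dims=2)
2×1 MtlArray{Float32, 2}:
 0.836366
 0.535798
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;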

&lt;h2&gt;
  
  
  Kernel programming
&lt;/h2&gt;

&lt;p&gt;Metal.jl's array operations are implemented in Julia, using our native kernel programming capabilities and accompanying JIT-compiler. A small demonstration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="c"&gt;# a simple kernel that sets elements of an array to a value&lt;/span&gt;
&lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="nf"&gt; memset_kernel&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;thread_position_in_grid_1d&lt;/span&gt;&lt;span class="x"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;length&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
    &lt;span class="nd"&gt;@inbounds&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="x"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="x"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MtlArray&lt;/span&gt;&lt;span class="x"&gt;{&lt;/span&gt;&lt;span class="kt"&gt;Float32&lt;/span&gt;&lt;span class="x"&gt;}(&lt;/span&gt;&lt;span class="nb"&gt;undef&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@metal&lt;/span&gt; &lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt; &lt;span class="n"&gt;grid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="n"&gt;memset_kernel&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# verify&lt;/span&gt;
&lt;span class="nd"&gt;@assert&lt;/span&gt; &lt;span class="n"&gt;all&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;isequal&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="x"&gt;),&lt;/span&gt; &lt;span class="kt"&gt;Array&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="x"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As can be seen here, we've opted to deviate slightly from the Metal Shading Language, instead providing a programming experience that's similar to Julia's existing back-ends. Some key differences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we use intrinsic functions instead of special kernel function arguments to access properties like the thread position, grid size, ...;&lt;/li&gt;
&lt;li&gt;all types of arguments (buffers, indirect buffers, value-typed inputs) are transparently converted to a GPU-compatible structure&lt;sup id="fnref1"&gt;1&lt;/sup&gt;;&lt;/li&gt;
&lt;li&gt;global (task-bound) state is used to keep track of the active device and a queue;&lt;/li&gt;
&lt;li&gt;compute pipeline set-up and command encoding is hidden behind a single macro.&lt;/li&gt;
&lt;/ul&gt;
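&lt;p&gt;In practice, the launch configuration is usually derived from the array size rather than hard-coded. A minimal sketch building on the kernel above (&lt;code&gt;cld&lt;/code&gt; is ceiling division from Base; the helper name is ours):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# launch enough thread groups to cover an array of arbitrary length
function launch_memset!(array, value)
  threads = 512
  groups = cld(length(array), threads)
  @metal threads=threads grid=groups memset_kernel(array, value)
end

launch_memset!(MtlArray{Float32}(undef, 1000), 42)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;i &amp;lt;= length(array)&lt;/code&gt; bounds check in the kernel makes the overshoot of the last thread group safe.&lt;/p&gt;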

&lt;p&gt;Behind the scenes, we compile Julia to LLVM IR and use a &lt;a href="https://github.com/JuliaGPU/llvm-metal"&gt;tiny LLVM back-end&lt;/a&gt; (based on &lt;a href="https://github.com/a2flo"&gt;@a2flo&lt;/a&gt;'s &lt;a href="https://github.com/a2flo/floor"&gt;libfloor&lt;/a&gt;) that (re)writes the bitcode to a Metal-compatible library containing LLVM 5 bitcode. You can inspect the generated IR using &lt;code&gt;@device_code_metal&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;julia&amp;gt; @device_code_metal @metal threads=512 grid=2 memset_kernel(a, 42)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[header]
program_count: 1
...

[program]
name: julia_memset_kernel
type: kernel
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight llvm"&gt;&lt;code&gt;&lt;span class="k"&gt;target&lt;/span&gt; &lt;span class="k"&gt;datalayout&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"..."&lt;/span&gt;
&lt;span class="k"&gt;target&lt;/span&gt; &lt;span class="k"&gt;triple&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"air64-apple-macosx13.0.0"&lt;/span&gt;

&lt;span class="c1"&gt;; the (rewritten) kernel function:&lt;/span&gt;
&lt;span class="c1"&gt;;  - %value argument passed by reference&lt;/span&gt;
&lt;span class="c1"&gt;;  - %thread_position_in_grid argument added&lt;/span&gt;
&lt;span class="c1"&gt;;  - sitofp rewritten to AIR-specific intrinsic&lt;/span&gt;
&lt;span class="k"&gt;define&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="vg"&gt;@julia_memset_kernel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="kt"&gt;i8&lt;/span&gt; &lt;span class="k"&gt;addrspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)*,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="p"&gt;x&lt;/span&gt; &lt;span class="kt"&gt;i64&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;addrspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)*&lt;/span&gt; &lt;span class="nv"&gt;%array&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="k"&gt;addrspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)*&lt;/span&gt; &lt;span class="nv"&gt;%value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kt"&gt;i32&lt;/span&gt; &lt;span class="nv"&gt;%thread_position_in_grid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
  &lt;span class="nv"&gt;%9&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;tail&lt;/span&gt; &lt;span class="k"&gt;call&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="vg"&gt;@air.convert.f.f32.s.i64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="nv"&gt;%7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
  &lt;span class="k"&gt;ret&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;; minimal required argument metadata&lt;/span&gt;
&lt;span class="nv"&gt;!air.kernel&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;!{&lt;/span&gt;&lt;span class="nv"&gt;!10&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nv"&gt;!10&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;!{&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="kt"&gt;i8&lt;/span&gt; &lt;span class="k"&gt;addrspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)*,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="p"&gt;x&lt;/span&gt; &lt;span class="kt"&gt;i64&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;addrspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)*,&lt;/span&gt;
              &lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="k"&gt;addrspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)*,&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;)*&lt;/span&gt; &lt;span class="vg"&gt;@julia_memset_kernel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;!11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;!12&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nv"&gt;!12&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;!{&lt;/span&gt;&lt;span class="nv"&gt;!13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;!14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;!15&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nv"&gt;!13&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;!{&lt;/span&gt;&lt;span class="kt"&gt;i32&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;!"air.buffer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;!"air.location_index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="nv"&gt;!"air.read_write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;!"air.address_space"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="nv"&gt;!"air.arg_type_size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;!"air.arg_type_align_size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nv"&gt;!14&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;!{&lt;/span&gt;&lt;span class="kt"&gt;i32&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;!"air.buffer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;!"air.location_index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="nv"&gt;!"air.read_write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;!"air.address_space"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="nv"&gt;!"air.arg_type_size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;!"air.arg_type_align_size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nv"&gt;!15&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;!{&lt;/span&gt;&lt;span class="kt"&gt;i32&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;!"air.thread_position_in_grid"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;; other metadata not shown, for brevity&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Shout-out to &lt;a href="https://github.com/max-Hawkins"&gt;@max-Hawkins&lt;/a&gt; for exploring Metal code generation during his internship at Julia Computing!&lt;/p&gt;

&lt;h2&gt;
  
  
  Metal APIs in Julia
&lt;/h2&gt;

&lt;p&gt;Lacking an Objective-C or C++ FFI, we interface with the Metal libraries using &lt;a href="https://github.com/recp/cmt"&gt;a shim C library&lt;/a&gt;. Most users won't have to interface with Metal directly -- the array abstraction is sufficient for many -- but more experienced developers can make use of the high-level wrappers that we've designed for the Metal APIs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;julia&amp;gt; dev = MtlDevice(1)
MtlDevice:
  name:             Apple M1 Pro
  lowpower:         false
  headless:         false
  removable:        false
  unified memory:   true

julia&amp;gt; desc = MtlHeapDescriptor()
MtlHeapDescriptor:
  type:             MtHeapTypeAutomatic
  storageMode:      MtStorageModePrivate
  size:             0

julia&amp;gt; desc.size = 16384
16384

julia&amp;gt; heap = MtlHeap(dev, desc)
MtlHeap:
  type:                 MtHeapTypeAutomatic
  size:                 16384
  usedSize:             0
  currentAllocatedSize: 16384

# etc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These wrappers are based on &lt;a href="https://github.com/PhilipVinc"&gt;@PhilipVinc&lt;/a&gt;'s excellent work on MetalCore.jl, which formed the basis for (and has been folded into) Metal.jl.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next?
&lt;/h2&gt;

&lt;p&gt;The current release of Metal.jl focuses on code generation capabilities, and is meant as a preview for users and developers to try out on their system or with their specific GPU application. It is not production-ready yet, and lacks some crucial features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;performance optimization&lt;/li&gt;
&lt;li&gt;integration with Metal Performance Shaders&lt;/li&gt;
&lt;li&gt;integration / documentation for use with Xcode tools&lt;/li&gt;
&lt;li&gt;fleshing out the array abstraction based on user feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Please consider helping out with any of these!&lt;/strong&gt; Since Metal.jl and its dependencies are almost entirely implemented in Julia, any experience with the language is sufficient to contribute. If you're not certain, or have any questions, please drop by the &lt;code&gt;#gpu&lt;/code&gt; channel on &lt;a href="https://julialang.org/slack/"&gt;the JuliaLang Slack&lt;/a&gt;, ask questions on our &lt;a href="https://discourse.julialang.org/c/domain/gpu/11"&gt;Discourse&lt;/a&gt;, or chat to us during the &lt;a href="https://julialang.org/community/#events"&gt;GPU office hours&lt;/a&gt; every other Monday.&lt;/p&gt;

&lt;p&gt;If you encounter any bugs, feel free to let us know on the &lt;a href="https://github.com/JuliaGPU/Metal.jl/issues"&gt;Metal.jl issue tracker&lt;/a&gt;. For information on upcoming releases, &lt;a href="https://juliagpu.org/post/"&gt;subscribe&lt;/a&gt; to this website's blog where we post about significant developments in Julia's GPU ecosystem.&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;This relies on Metal 3 from macOS 13, which introduced bindless argument buffers; we haven't yet figured out how to reliably encode arbitrarily-nested indirect buffers using argument encoder metadata. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>gpu</category>
      <category>metal</category>
      <category>launch</category>
    </item>
  </channel>
</rss>
