Raw Vulkan

Alain Galvan
18 min read · May 26, 2017

Vulkan is a new low level Graphics API released February 2016 by the Khronos Group that maps directly to the design of modern GPUs.

Vulkan is used by Game Developers, Rendering Engineers and Scientists looking to do real-time rendering, raytracing, data visualization, GPGPU computations, machine learning, physics simulations, etc.

Graphics Processing Units (GPUs) were originally simple Application Specific Integrated Circuits (ASICs), but since then they have become programmable computational units of their own with a focus on throughput over latency. Older APIs like OpenGL or DirectX 11 and below were designed for hardware that’s drastically changed since the early 90s when they were first released, so Vulkan was designed from scratch to match the way GPUs are engineered today.

Currently Vulkan 1.x supports the following platforms:

  • 🖼️ Windows
  • 🐧 Linux
  • 🤖 Android

Apple’s macOS, iOS, and iPadOS also support Vulkan through MoltenVK, a Vulkan-Metal compatibility layer that’s licensed as Apache 2.0:

  • 🍎 Mac OS
  • 📱 iOS / iPad OS

Vulkan also runs on other surprising platforms such as TVs, game consoles, etc.:

  • 🎮 Nintendo Switch
  • 📺 NVIDIA Shield
  • 🌐 Google Stadia
  • And many more!

Vulkan can be used from languages such as:

  • C — Through the official bindings for Vulkan, as C is Vulkan’s official language.
  • C++ — Through Vulkan-Hpp the official Vulkan C++ library.
  • Rust — Through Vulkano, an intuitive Rust wrapper with a heavy focus on compile time safety.
  • JavaScript — Through Node Vulkan, node.js bindings for native web applications.
  • Python — Through pyVulkan, a Python FFI to the C implementation of Vulkan.

I’ve prepared a Github Repo with everything we need to get started. We’re going to walk through a Hello Triangle app in modern C++ 17, a program that creates a triangle, processes it with a shader, and displays it on a window.

Setup

First install:

Then type the following in your terminal.

# 🐑 Clone the repo
git clone https://github.com/alaingalvan/vulkan-seed --recurse-submodules
# 💿 go inside the folder
cd vulkan-seed
# 👯 If you forget to `recurse-submodules` you can always run:
git submodule update --init
# 👷 Make a build folder
mkdir build
cd build
# 🖼️ To build your Visual Studio solution on Windows x64
cmake .. -A x64
# 🍎 To build your XCode project on Mac OS
cmake .. -G Xcode
# 🐧 To build your .make file on Linux
cmake ..
# 🔨 Build on any platform:
cmake --build .

Refer to this blog post on designing C++ libraries and apps for more details on CMake, Git Submodules, etc.

Project Layout

As your project becomes more complex, you’ll want to separate files and organize your application into something more akin to a game or renderer. Check out this post on game engine architecture and this one on real time renderer architecture for more details.

├─ 📂 external/                    # 👶 Dependencies
│ ├─ 📁 crosswindow/ # 🖼️ OS Windows
│ ├─ 📁 crosswindow-graphics/ # 🎨 Vulkan Surface Creation
│ └─ 📁 glm/ # ➕ Linear Algebra
├─ 📂 src/ # 🌟 Source Files
│ ├─ 📄 Utils.h # ⚙️ Utilities (Load Files, Check Shaders, etc.)
│ ├─ 📄 Renderer.h # 🔺 Triangle Draw Code
│ ├─ 📄 Renderer.cpp # -
│ └─ 📄 Main.cpp # 🏁 Application Main
├─ 📄 .gitignore # 👁️ Ignore certain files in git repo
├─ 📄 CMakeLists.txt # 🔨 Build Script
├─ 📄 license.md # ⚖️ Your License (Unlicense)
└─ 📃 readme.md # 📖 Read Me!

Dependencies

  • CrossWindow — A cross platform system abstraction library written in C++ for managing windows and performing OS tasks.
  • CrossWindow-Graphics — A library to simplify creating a Vulkan Surface with CrossWindow.
  • Vulkan SDK — The official Vulkan SDK distributed by LunarG. This should be installed separately.
  • GLM — A C++ library that allows users to write GLSL-like C++ code, with types for vectors, matrices, etc.

We’ll be writing our application using Vulkan’s C++ API through vulkan.hpp, a type safe abstraction of vulkan.h.

Overview

In this application we will need to do the following:

  1. Initialize the API — Create a Vulkan Instance to access inner functions of the Vulkan API. Pick the best Physical Device from every device that supports Vulkan on your machine. Create a Logical Device, Surface, Queue, Command Pool, Semaphores, Fences.
  2. Create Commands — Describe everything that’ll be rendered on the current frame in your command buffers.
  3. Initialize Resources — Create a Descriptor Pool, Descriptor Set Layout, Pipeline Layout, Vertex Buffer/Index Buffer and send it to GPU Accessible Memory, describe our Input Attributes, create a Uniform Buffer, Render Pass, Frame Buffers, Shader Modules, and Pipeline State.
  4. Setup Commands for each command buffer to set the GPU state to render the triangles.
  5. Render — Use an Update Loop to switch between different frames in your swapchain as well as to poll input devices/window events.
  6. Destroy any data structures once the application is asked to close.

The following will explain snippets that can be found in the Github repo, with certain parts omitted, and member variables (mMemberVariable) declared inline without the m prefix so their type is easier to see and the examples here can work on their own.

Window Creation

We’re using CrossWindow to handle cross platform window creation, so creating a window and updating it is very easy:

#include "CrossWindow/CrossWindow.h"
#include "Renderer.h"
#include <iostream>
void xmain(int argc, const char** argv)
{
// 🖼 Create Window
xwin::WindowDesc wdesc;
wdesc.title = "Vulkan Seed";
wdesc.name = "MainWindow";
wdesc.visible = true;
wdesc.width = 640;
wdesc.height = 640;
wdesc.fullscreen = false;
xwin::Window window;
xwin::EventQueue eventQueue;
if (!window.create(wdesc, eventQueue))
{ return; };
// 🌋 Create a renderer
Renderer renderer(window);
// 🏁 Engine loop
bool isRunning = true;
while (isRunning)
{
bool shouldRender = true;
// ♻️ Update the event queue
eventQueue.update();
// 🎈 Iterate through that queue:
while (!eventQueue.empty())
{
//Update Events
const xwin::Event& event = eventQueue.front();
// 💗 On Resize:
if (event.type == xwin::EventType::Resize)
{
const xwin::ResizeData data = event.data.resize;
renderer.resize(data.width, data.height);
shouldRender = false;
}
// ❌ On Close:
if (event.type == xwin::EventType::Close)
{
window.close();
shouldRender = false;
isRunning = false;
}
eventQueue.pop();
}
// ✨ Update Visuals
if (shouldRender)
{
renderer.render();
}
}
}

Initialize API

Instances

Similar to the OpenGL context, a Vulkan application begins when you create an instance. This instance must be loaded with some information about the program such as its name, engine, and minimum Vulkan version, as well as any extensions and layers you want to load.

void findBestExtensions(const std::vector<vk::ExtensionProperties>& installed,
const std::vector<const char*>& wanted,
std::vector<const char*>& out)
{
for (const char* const& w : wanted)
{
for (vk::ExtensionProperties const& i : installed)
{
if (std::string(i.extensionName).compare(w) == 0)
{
out.emplace_back(w);
break;
}
}
}
}
void findBestLayers(const std::vector<vk::LayerProperties>& installed,
const std::vector<const char*>& wanted,
std::vector<const char*>& out)
{
for (const char* const& w : wanted)
{
for (vk::LayerProperties const& i : installed)
{
if (std::string(i.layerName).compare(w) == 0)
{
out.emplace_back(w);
break;
}
}
}
}
uint32_t getQueueIndex(vk::PhysicalDevice& physicalDevice,
vk::QueueFlagBits flags)
{
std::vector<vk::QueueFamilyProperties> queueProps =
physicalDevice.getQueueFamilyProperties();
for (size_t i = 0; i < queueProps.size(); ++i)
{
if (queueProps[i].queueFlags & flags)
{
return static_cast<uint32_t>(i);
}
}
// Default queue index
return 0;
}
uint32_t getMemoryTypeIndex(vk::PhysicalDevice& physicalDevice,
uint32_t typeBits,
vk::MemoryPropertyFlags properties)
{
auto gpuMemoryProps = physicalDevice.getMemoryProperties();
for (uint32_t i = 0; i < gpuMemoryProps.memoryTypeCount; i++)
{
if ((typeBits & 1) == 1)
{
if ((gpuMemoryProps.memoryTypes[i].propertyFlags & properties) ==
properties)
{
return i;
}
}
typeBits >>= 1;
}
return 0;
}
  • Extension — Anything that adds extra functionality to Vulkan, such as support for Win32 windows, or enabling drawing onto a target.
  • Layer — Middleware between existing Vulkan functionality, such as checking for errors. Layers can range from runtime debugging checks like LunarG’s Standard Validation tools to hooks to the Steam renderer so your game can behave better when you Ctrl + Shift to switch to the Steam overlay.

You’ll want to begin by determining which extensions/layers you want, and comparing that list with the ones Vulkan reports as available.

// 👋 Declare handles
vk::Instance instance;
// 🔍 Find the best Instance Extensions
std::vector<vk::ExtensionProperties> installedExtensions =
vk::enumerateInstanceExtensionProperties();
std::vector<const char*> wantedExtensions =
{
VK_KHR_SURFACE_EXTENSION_NAME,
// Only the first matching platform macro contributes a surface extension
#if defined(VK_USE_PLATFORM_WIN32_KHR)
VK_KHR_WIN32_SURFACE_EXTENSION_NAME
#elif defined(VK_USE_PLATFORM_MACOS_MVK)
VK_MVK_MACOS_SURFACE_EXTENSION_NAME
#elif defined(VK_USE_PLATFORM_XCB_KHR)
VK_KHR_XCB_SURFACE_EXTENSION_NAME
#elif defined(VK_USE_PLATFORM_ANDROID_KHR)
VK_KHR_ANDROID_SURFACE_EXTENSION_NAME
#elif defined(VK_USE_PLATFORM_XLIB_KHR)
VK_KHR_XLIB_SURFACE_EXTENSION_NAME
#elif defined(VK_USE_PLATFORM_WAYLAND_KHR)
VK_KHR_WAYLAND_SURFACE_EXTENSION_NAME
#elif defined(VK_USE_PLATFORM_MIR_KHR) || defined(VK_USE_PLATFORM_DISPLAY_KHR)
VK_KHR_DISPLAY_EXTENSION_NAME
#elif defined(VK_USE_PLATFORM_IOS_MVK)
VK_MVK_IOS_SURFACE_EXTENSION_NAME
#endif
};
std::vector<const char*> extensions = {};
findBestExtensions(installedExtensions, wantedExtensions, extensions);
// 🔎 Find the best Instance Layers
std::vector<vk::LayerProperties> installedLayers =
vk::enumerateInstanceLayerProperties();
std::vector<const char*> wantedLayers = {
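#ifdef _DEBUG
// Newer SDKs ship VK_LAYER_KHRONOS_validation, which replaces the
// deprecated VK_LAYER_LUNARG_standard_validation. findBestLayers
// drops any name that is not installed.
"VK_LAYER_KHRONOS_validation",
"VK_LAYER_LUNARG_standard_validation"
#endif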
};
std::vector<const char*> layers = {};
findBestLayers(installedLayers, wantedLayers, layers);
// ⚪ Create an Instance
// Constructor arguments: application name and version,
// engine name and version, then the Vulkan API version.
vk::ApplicationInfo appInfo(
"MyApp",
VK_MAKE_VERSION(1, 0, 0),
"MyAppEngine",
VK_MAKE_VERSION(1, 0, 0),
VK_API_VERSION_1_2);
vk::InstanceCreateInfo ci = vk::InstanceCreateInfo(
vk::InstanceCreateFlags(), &appInfo, layers, extensions);
// The handle was declared above, so assign to it rather than redeclare it
instance = vk::createInstance(ci);
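
The version fields passed to vk::ApplicationInfo are packed 32-bit integers. As a minimal sketch of what VK_MAKE_VERSION computes, the helpers below reproduce the packing without the Vulkan headers so the example compiles on its own. The names makeVersion, versionMajor, versionMinor and versionPatch are hypothetical stand-ins, not part of the Vulkan API.

```cpp
#include <cstdint>

// Hypothetical stand-in mirroring the VK_MAKE_VERSION macro:
// major lives in bits 22 and up, minor in bits 12..21, patch in bits 0..11.
constexpr std::uint32_t makeVersion(std::uint32_t major,
                                    std::uint32_t minor,
                                    std::uint32_t patch)
{
    return (major << 22) | (minor << 12) | patch;
}

// Hypothetical stand-ins for unpacking each field again.
constexpr std::uint32_t versionMajor(std::uint32_t v) { return v >> 22; }
constexpr std::uint32_t versionMinor(std::uint32_t v) { return (v >> 12) & 0x3FFu; }
constexpr std::uint32_t versionPatch(std::uint32_t v) { return v & 0xFFFu; }
```

Unpacking is handy when you compare the version a device reports against the minimum version your application asked for.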

Physical Devices

In Vulkan, you have access to all enumerable devices that support it, and can query for information like their name, the number of heaps they support, their manufacturer, etc.

// 👋 Declare handles
vk::PhysicalDevice physicalDevice;
// 💡 Initialize Devices
std::vector<vk::PhysicalDevice> physicalDevices = instance.enumeratePhysicalDevices();
physicalDevice = physicalDevices[0];

Note — Enumerating devices like this lets you choose the fastest one to use. For multi-GPU processing you could instead use device groups, which were introduced as the KHX_device_group extension presented at GDC 2017 and have been part of core Vulkan since 1.1.
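
The snippet above simply takes the first enumerated device. A minimal sketch of ranking devices instead could look like the following. DeviceInfo and DeviceType are hypothetical stand-ins for the fields you would read from vk::PhysicalDeviceProperties and vk::PhysicalDeviceMemoryProperties, and the ranking itself (discrete over integrated, then larger device-local memory) is an assumption, not the approach used in the repo.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the device type reported by the driver.
enum class DeviceType { Other, IntegratedGpu, DiscreteGpu, VirtualGpu, Cpu };

// Hypothetical summary of one physical device.
struct DeviceInfo
{
    DeviceType type;
    std::uint64_t deviceLocalBytes; // size of the largest device-local heap
};

// Higher rank means a more desirable kind of device.
int typeRank(DeviceType type)
{
    switch (type)
    {
    case DeviceType::DiscreteGpu:   return 3;
    case DeviceType::IntegratedGpu: return 2;
    case DeviceType::VirtualGpu:    return 1;
    default:                        return 0;
    }
}

// Returns the index of the preferred device, or -1 if the list is empty.
// Ties on device type are broken by the amount of device-local memory.
int pickBestDevice(const std::vector<DeviceInfo>& devices)
{
    int best = -1;
    for (std::size_t i = 0; i < devices.size(); ++i)
    {
        if (best < 0)
        {
            best = static_cast<int>(i);
            continue;
        }
        const DeviceInfo& candidate = devices[i];
        const DeviceInfo& current = devices[static_cast<std::size_t>(best)];
        const int candidateRank = typeRank(candidate.type);
        const int currentRank = typeRank(current.type);
        if (candidateRank > currentRank ||
            (candidateRank == currentRank &&
             candidate.deviceLocalBytes > current.deviceLocalBytes))
        {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

The returned index would then be used in place of the hard-coded 0 when picking from physicalDevices.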

Logical Devices

You can then create a logical device from a physical device handle. A logical device can be loaded with its own extensions/layers, and you choose whether it works with graphics, GPGPU computations, sparse memory and/or memory transfers by creating queues from the matching queue families.

A logical device is your interface to the GPU, and allows you to allocate data and queue up tasks.

// 👋 Declare handles
uint32_t queueFamilyIndex;
vk::SurfaceKHR surface;
vk::Device device;
// 👪 Queue Family
queueFamilyIndex = getQueueIndex(physicalDevice, vk::QueueFlagBits::eGraphics);
// ⏹ Get Vulkan Surface with CrossWindowGraphics
surface = xgfx::getSurface(&window, instance);
if (!physicalDevice.getSurfaceSupportKHR(queueFamilyIndex, surface))
{
// Check if queueFamily supports this surface
return;
}
// 📦 Queue Creation
std::vector<vk::DeviceQueueCreateInfo> queueCreateInfos;
float queuePriority = 0.5f;
// Constructor arguments: flags, queue family index, queue count, priorities
vk::DeviceQueueCreateInfo qcinfo(
vk::DeviceQueueCreateFlags(),
queueFamilyIndex,
1,
&queuePriority);
queueCreateInfos.emplace_back(qcinfo);
// 🎮 Logical Device
std::vector<vk::ExtensionProperties> installedDeviceExtensions =
physicalDevice.enumerateDeviceExtensionProperties();
std::vector<const char*> wantedDeviceExtensions = {
VK_KHR_SWAPCHAIN_EXTENSION_NAME
};
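std::vector<const char*> deviceExtensions = {};
findBestExtensions(installedDeviceExtensions,
wantedDeviceExtensions,
deviceExtensions);
// The third constructor argument is the device layer list, which stays empty
// here so the extensions land in the fourth argument.
std::vector<const char*> deviceLayers = {};
vk::DeviceCreateInfo dinfo(
vk::DeviceCreateFlags(), queueCreateInfos, deviceLayers, deviceExtensions);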
device = physicalDevice.createDevice(dinfo);

Queue

Once you have a logical device, you can access the queues you requested when you created it:

// 👋 Declare handles
vk::Queue queue;
// 📦 We only allocated one queue earlier,
//so there's only one available on index 0.
queue = device.getQueue(queueFamilyIndex, 0);

If the surface changes so that it no longer matches your swapchain (for example after a window resize), acquiring or presenting an image will throw a vk::OutOfDateKHRError error, requiring you to recreate your swapchain.

Command Pool

A command pool is a means of allocating command buffers. Any number of command buffers can be made from command pools, with you as the developer responsible for managing when and how they’re created and what is loaded in each.

A command pool must not be used by multiple threads at the same time, but you can create one for each thread and manage them on a per thread level.

// 👋 Declare handles
vk::CommandPool commandPool;
// 🏊 Create a command pool
vk::CommandPoolCreateInfo commandPoolInfo = vk::CommandPoolCreateInfo(
vk::CommandPoolCreateFlags(vk::CommandPoolCreateFlagBits::eResetCommandBuffer),
queueFamilyIndex
);
commandPool = device.createCommandPool(commandPoolInfo);
// Later, once your ⛓️ vk::Swapchain has been created
// Lets allocate 1 command buffer for each swapchain image.
std::vector<vk::CommandBuffer> commandBuffers = device.allocateCommandBuffers(
vk::CommandBufferAllocateInfo(
commandPool,
vk::CommandBufferLevel::ePrimary,
swapchainBuffers.size()
)
);

Descriptor Pool

A descriptor pool is a means of allocating Descriptor Sets, a set of data structures containing implementation-specific descriptions of resources. To make a descriptor pool, you need to describe exactly how many of each type of descriptor you need to allocate.

To do that you need to provide a collection of the size of each descriptor type.

// 👋 Declare handles
vk::DescriptorPool descriptorPool;
std::vector<vk::DescriptorPoolSize> dpsizes =
{
vk::DescriptorPoolSize(
vk::DescriptorType::eUniformBuffer,
1
)
};
// 🎱 Create Descriptor Pool
vk::DescriptorPoolCreateInfo dpci({}, 1, dpsizes);
descriptorPool = device.createDescriptorPool(dpci);

Like command buffers, we’ll come back to descriptor sets later.

Color Formats

Knowing what Color formats your GPU supports will play a crucial role in determining what you can display and what kind of buffers you can allocate.

// 👋 Declare handles
vk::Format surfaceColorFormat;
vk::ColorSpaceKHR surfaceColorSpace;
vk::Format surfaceDepthFormat;
// 🔴🟢🔵 Check to see if we can display rgb colors.
std::vector<vk::SurfaceFormatKHR> surfaceFormats = physicalDevice.getSurfaceFormatsKHR(surface);
if (surfaceFormats.size() == 1 && surfaceFormats[0].format == vk::Format::eUndefined)
{
// The surface has no preferred format, so pick a common one
surfaceColorFormat = vk::Format::eB8G8R8A8Unorm;
}
else
{
surfaceColorFormat = surfaceFormats[0].format;
}
surfaceColorSpace = surfaceFormats[0].colorSpace;
// Since all depth formats may be optional, we need to find a suitable depth format to use
// Start with the highest precision packed format
std::vector<vk::Format> depthFormats =
{
vk::Format::eD32SfloatS8Uint,
vk::Format::eD32Sfloat,
vk::Format::eD24UnormS8Uint,
vk::Format::eD16UnormS8Uint,
vk::Format::eD16Unorm
};
for (vk::Format& format : depthFormats)
{
vk::FormatProperties depthFormatProperties = physicalDevice.getFormatProperties(format);
// Format must support depth stencil attachment for optimal tiling
if (depthFormatProperties.optimalTilingFeatures & vk::FormatFeatureFlagBits::eDepthStencilAttachment)
{
surfaceDepthFormat = format;
break;
}
}

Swapchain

A Swapchain is a structure that manages the allocation of frame buffers to be cycled through by your application. It’s here that your application sets up V-Sync via double buffering or triple buffering.

One approach to setting this up is to take in a JSON file at the start of your application, say config.json, which determines if you'll be using V-Sync, your screen resolution, and any other global data you want to configure.
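
As a rough sketch of how such a V-Sync setting could map onto a present mode, consider the following. PresentMode is a hypothetical stand-in for vk::PresentModeKHR so the example compiles without the Vulkan headers, and Config is a hypothetical struct standing in for whatever you parse out of config.json. The preference order is an assumption. The only hard rule relied on here is that FIFO is the one present mode every Vulkan implementation must support.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for vk::PresentModeKHR.
enum class PresentMode { Immediate, Mailbox, Fifo, FifoRelaxed };

// Hypothetical settings read from a config file.
struct Config
{
    bool vsync = true;
};

// Picks a present mode from the ones the surface reports as available.
PresentMode choosePresentMode(const std::vector<PresentMode>& available,
                              const Config& config)
{
    auto has = [&available](PresentMode mode) {
        return std::find(available.begin(), available.end(), mode) != available.end();
    };
    if (config.vsync)
    {
        // Mailbox does not tear and keeps only the newest queued image.
        if (has(PresentMode::Mailbox))
        {
            return PresentMode::Mailbox;
        }
        return PresentMode::Fifo;
    }
    // Without V-Sync, present as soon as possible when the surface allows it.
    if (has(PresentMode::Immediate))
    {
        return PresentMode::Immediate;
    }
    // Fall back to the mode that is always supported.
    return PresentMode::Fifo;
}
```

In the real application the available list would come from physicalDevice.getSurfacePresentModesKHR(surface).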

// 👋 Declare handles
vk::Rect2D renderArea;
vk::Extent2D surfaceSize;
vk::Viewport viewport;
vk::SwapchainKHR swapchain;
void setupSwapchain(unsigned width, unsigned height)
{
// Setup viewports, Vsync
vk::Extent2D swapchainSize = vk::Extent2D(width, height);
// All framebuffers / attachments will be the same size as the surface
vk::SurfaceCapabilitiesKHR surfaceCapabilities = physicalDevice.getSurfaceCapabilitiesKHR(surface);
if (!(surfaceCapabilities.currentExtent.width == -1 || surfaceCapabilities.currentExtent.height == -1)) {
swapchainSize = surfaceCapabilities.currentExtent;
renderArea = vk::Rect2D(vk::Offset2D(), swapchainSize);
viewport = vk::Viewport(0.0f, 0.0f, static_cast<float>(swapchainSize.width), static_cast<float>(swapchainSize.height), 0, 1.0f);
}
// VSync
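std::vector<vk::PresentModeKHR> surfacePresentModes = physicalDevice.getSurfacePresentModesKHR(surface);
// eFifo is the only present mode every Vulkan implementation must support,
// so use it as the fallback when eMailbox is not available
vk::PresentModeKHR presentMode = vk::PresentModeKHR::eFifo;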
for (vk::PresentModeKHR& pm : surfacePresentModes) {
if (pm == vk::PresentModeKHR::eMailbox) {
presentMode = vk::PresentModeKHR::eMailbox;
break;
}
}
// ⛓️ Create Swapchain, Images, Frame Buffers
device.waitIdle();
vk::SwapchainKHR oldSwapchain = swapchain;
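// Some devices can support more than 2 buffers,
// but during my tests they would crash on fullscreen
// Tested on an NVIDIA 1080 and 165 Hz 2K display ~ @alainxyz
// Ask for 2 images, but never fewer than the surface requires.
// A maxImageCount of 0 means there is no upper limit.
uint32_t backbufferCount = std::max(surfaceCapabilities.minImageCount, 2U);
if (surfaceCapabilities.maxImageCount > 0)
{
backbufferCount = std::min(backbufferCount, surfaceCapabilities.maxImageCount);
}
swapchain = device.createSwapchainKHR(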
vk::SwapchainCreateInfoKHR(
vk::SwapchainCreateFlagsKHR(),
surface,
backbufferCount,
surfaceColorFormat,
surfaceColorSpace,
swapchainSize,
1,
vk::ImageUsageFlagBits::eColorAttachment,
vk::SharingMode::eExclusive,
1,
&queueFamilyIndex,
vk::SurfaceTransformFlagBitsKHR::eIdentity,
vk::CompositeAlphaFlagBitsKHR::eOpaque,
presentMode,
VK_TRUE,
oldSwapchain
)
);
surfaceSize = vk::Extent2D(std::clamp(swapchainSize.width, 1U, 8192U), std::clamp(swapchainSize.height, 1U, 8192U));
renderArea = vk::Rect2D(vk::Offset2D(), surfaceSize);
viewport = vk::Viewport(0.0f, 0.0f, static_cast<float>(surfaceSize.width), static_cast<float>(surfaceSize.height), 0, 1.0f);
// Destroy previous swapchain
if (oldSwapchain != vk::SwapchainKHR(nullptr))
{
device.destroySwapchainKHR(oldSwapchain);
}
// Resize swapchain buffers for use later
swapchainBuffers.resize(backbufferCount);
}

View Structures

A View in Vulkan is a handle to a particular resource on a GPU, such as an Image or a Buffer, and provides information on how that resource should be processed.

// 👋 Declare handles
vk::ImageView depthImageView;
depthImageView = device.createImageView(
vk::ImageViewCreateInfo(
vk::ImageViewCreateFlags(),
depthImage,
vk::ImageViewType::e2D,
surfaceDepthFormat,
vk::ComponentMapping(),
vk::ImageSubresourceRange(
vk::ImageAspectFlagBits::eDepth | vk::ImageAspectFlagBits::eStencil,
0,
1,
0,
1
)
)
);

Render Pass

A render pass describes the attachments that are expected to be used when executing a graphics pipeline and their relationship with each other.

// 👋 Declare handles
vk::RenderPass renderPass;
void createRenderPass()
{
std::vector<vk::AttachmentDescription> attachmentDescriptions =
{
vk::AttachmentDescription(
vk::AttachmentDescriptionFlags(),
surfaceColorFormat,
vk::SampleCountFlagBits::e1,
vk::AttachmentLoadOp::eClear,
vk::AttachmentStoreOp::eStore,
vk::AttachmentLoadOp::eDontCare,
vk::AttachmentStoreOp::eDontCare,
vk::ImageLayout::eUndefined,
vk::ImageLayout::ePresentSrcKHR
),
vk::AttachmentDescription(
vk::AttachmentDescriptionFlags(),
surfaceDepthFormat,
vk::SampleCountFlagBits::e1,
vk::AttachmentLoadOp::eClear,
vk::AttachmentStoreOp::eDontCare,
vk::AttachmentLoadOp::eDontCare,
vk::AttachmentStoreOp::eDontCare,
vk::ImageLayout::eUndefined,
vk::ImageLayout::eDepthStencilAttachmentOptimal
)
};
std::vector<vk::AttachmentReference> colorReferences =
{
vk::AttachmentReference(0, vk::ImageLayout::eColorAttachmentOptimal)
};
std::vector<vk::AttachmentReference> depthReferences = {
vk::AttachmentReference(1, vk::ImageLayout::eDepthStencilAttachmentOptimal)
};
std::vector<vk::SubpassDescription> subpasses =
{
vk::SubpassDescription(
vk::SubpassDescriptionFlags(),
vk::PipelineBindPoint::eGraphics,
0,
nullptr,
static_cast<uint32_t>(colorReferences.size()),
colorReferences.data(),
nullptr,
depthReferences.data(),
0,
nullptr
)
};
std::vector<vk::SubpassDependency> dependencies =
{
vk::SubpassDependency(
~0U,
0,
vk::PipelineStageFlagBits::eBottomOfPipe,
vk::PipelineStageFlagBits::eColorAttachmentOutput,
vk::AccessFlagBits::eMemoryRead,
vk::AccessFlagBits::eColorAttachmentRead | vk::AccessFlagBits::eColorAttachmentWrite,
vk::DependencyFlagBits::eByRegion
),
vk::SubpassDependency(
0,
~0U,
vk::PipelineStageFlagBits::eColorAttachmentOutput,
vk::PipelineStageFlagBits::eBottomOfPipe,
vk::AccessFlagBits::eColorAttachmentRead | vk::AccessFlagBits::eColorAttachmentWrite,
vk::AccessFlagBits::eMemoryRead,
vk::DependencyFlagBits::eByRegion
)
};
renderPass = device.createRenderPass(
vk::RenderPassCreateInfo(
vk::RenderPassCreateFlags(),
static_cast<uint32_t>(attachmentDescriptions.size()),
attachmentDescriptions.data(),
static_cast<uint32_t>(subpasses.size()),
subpasses.data(),
static_cast<uint32_t>(dependencies.size()),
dependencies.data()
)
);
}

Frame Buffers

A frame buffer in Vulkan is a container of Image Views that are bound to a specific render pass.

// ⛓️ The swapchain handles allocating frame images.
std::vector<vk::Image> swapchainImages = device.getSwapchainImagesKHR(swapchain);
// ↘️ Create Depth Image Data
vk::Image depthImage = device.createImage(
vk::ImageCreateInfo(
vk::ImageCreateFlags(),
vk::ImageType::e2D,
surfaceDepthFormat,
vk::Extent3D(surfaceSize.width, surfaceSize.height, 1),
1,
1,
vk::SampleCountFlagBits::e1,
vk::ImageTiling::eOptimal,
vk::ImageUsageFlagBits::eDepthStencilAttachment | vk::ImageUsageFlagBits::eTransferSrc,
vk::SharingMode::eExclusive,
queueFamilyIndices.size(),
queueFamilyIndices.data(),
vk::ImageLayout::eUndefined
)
);
// Search through GPU memory properties to see if this can be device local.
vk::MemoryRequirements depthMemoryReq = device.getImageMemoryRequirements(depthImage);
vk::DeviceMemory depthMemory = device.allocateMemory(vk::MemoryAllocateInfo(
depthMemoryReq.size,
getMemoryTypeIndex(physicalDevice, depthMemoryReq.memoryTypeBits,
vk::MemoryPropertyFlagBits::eDeviceLocal)));
device.bindImageMemory(
depthImage,
depthMemory,
0
);
vk::ImageView depthImageView = device.createImageView(
vk::ImageViewCreateInfo(
vk::ImageViewCreateFlags(),
depthImage,
vk::ImageViewType::e2D,
surfaceDepthFormat,
vk::ComponentMapping(),
vk::ImageSubresourceRange(
vk::ImageAspectFlagBits::eDepth | vk::ImageAspectFlagBits::eStencil,
0,
1,
0,
1
)
)
);
struct SwapChainBuffer {
vk::Image image;
std::array<vk::ImageView, 2> views;
vk::Framebuffer frameBuffer;
};
std::vector<SwapChainBuffer> swapchainBuffers;
swapchainBuffers.resize(swapchainImages.size());
for (int i = 0; i < swapchainImages.size(); i++)
{
swapchainBuffers[i].image = swapchainImages[i];
// 🌈 Color
swapchainBuffers[i].views[0] =
device.createImageView(
vk::ImageViewCreateInfo(
vk::ImageViewCreateFlags(),
swapchainImages[i],
vk::ImageViewType::e2D, // Swapchain images are 2D
surfaceColorFormat,
vk::ComponentMapping(),
vk::ImageSubresourceRange(
vk::ImageAspectFlagBits::eColor,
0,
1,
0,
1
)
)
);
// ↘️ Depth
swapchainBuffers[i].views[1] = depthImageView;
swapchainBuffers[i].frameBuffer = device.createFramebuffer(
vk::FramebufferCreateInfo(
vk::FramebufferCreateFlags(),
renderPass,
swapchainBuffers[i].views.size(),
swapchainBuffers[i].views.data(),
surfaceSize.width,
surfaceSize.height,
1
)
);
}

Synchronization

Vulkan was designed with concurrency in mind: on the CPU you’re free to use mutexes, while built-in Vulkan Semaphores and Fences handle GPU level Synchronization.

Semaphores order operations submitted to queues on the GPU, while fences let the CPU wait for submitted work to finish.

// 🎌 Semaphore used to ensure that image presentation is complete before starting to submit again
vk::Semaphore presentCompleteSemaphore = device.createSemaphore(vk::SemaphoreCreateInfo());
// 🎌 Semaphore used to ensure that all commands submitted have been finished before submitting the image to the queue
vk::Semaphore renderCompleteSemaphore = device.createSemaphore(vk::SemaphoreCreateInfo());
// 🚧 Fence for command buffer completion
std::vector<vk::Fence> waitFences;
waitFences.resize(swapchainBuffers.size());
for (int i = 0; i < waitFences.size(); i++)
{
waitFences[i] = device.createFence(vk::FenceCreateInfo(vk::FenceCreateFlagBits::eSignaled));
}

You should try to have the minimum number of command buffers possible in your application.

One possible setup could be taking a flat collection of renderable objects (like a scene), distributing it across as many threads as the computer’s CPU allows, allocating a command buffer for each object, creating a pipeline for each object, and finishing by sending an ending buffer to start up the process.

We’ll come back to the command buffers we made here later in our app.

Initialize Resources

Vertex Buffers

The fundamental problem of graphics is how to manage large sets of data. A vertex buffer is an array of rows of relevant vertex information, such as its position, normal, color, etc. Unlike OpenGL, which handles allocating and managing memory for you, in Vulkan you must:

  1. Allocate all the memory related to your buffer.
  2. Map that data to a host visible handle.
  3. Copy that data to your GPU.
  4. Bind your buffer to that block of memory.

For buffers that you want to be accessible only by the GPU, you’ll also need to copy that host visible (staging) buffer into a GPU exclusive, device local buffer.
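
Both kinds of memory are chosen by searching the device's memory types for one that the buffer allows and that has the properties you need. The following is a minimal sketch of that search, written against plain integers so it compiles without the Vulkan headers. The flag values match the VK_MEMORY_PROPERTY_* bits in vulkan.h, while findMemoryType and kNoMemoryType are hypothetical names. Unlike the getMemoryTypeIndex helper shown earlier, this variant reports failure instead of silently returning index 0.

```cpp
#include <cstdint>
#include <vector>

// Values match VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
// VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and VK_MEMORY_PROPERTY_HOST_COHERENT_BIT.
constexpr std::uint32_t kDeviceLocal = 0x1;
constexpr std::uint32_t kHostVisible = 0x2;
constexpr std::uint32_t kHostCoherent = 0x4;

// Hypothetical sentinel meaning "no suitable memory type".
constexpr std::uint32_t kNoMemoryType = 0xFFFFFFFFu;

// typeFlags[i] holds the property flags of memory type i.
// typeBits comes from the resource's memory requirements:
// bit i set means memory type i may back the resource.
std::uint32_t findMemoryType(const std::vector<std::uint32_t>& typeFlags,
                             std::uint32_t typeBits,
                             std::uint32_t required)
{
    for (std::uint32_t i = 0; i < typeFlags.size() && i < 32; ++i)
    {
        const bool allowed = (typeBits & (1u << i)) != 0;
        const bool suitable = (typeFlags[i] & required) == required;
        if (allowed && suitable)
        {
            return i;
        }
    }
    return kNoMemoryType;
}
```

A staging buffer would ask for kHostVisible | kHostCoherent so it can be mapped and written from the CPU, while the final vertex buffer would ask for kDeviceLocal.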

Descriptor Sets

Descriptor Sets store the resources bound to the binding points in a shader (Basically Uniforms). They connect the binding points of a shader with the buffers and images used for those bindings.

In React Fiber there’s the idea of a frequently updated view and a not frequently updated view. Unreal Engine 4 shares this with two global uniform families for frequently (called variable parameters) and not frequently (constant parameters) updated uniforms. Descriptor Sets are where you would make this distinction in Vulkan.

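Descriptor sets are allocated according to Descriptor Set Layouts, which are in turn composed of Descriptor Set Layout Bindings, the individual bindings a shader expects.

In Vulkan, uniform blocks are contiguous structs of data. Under the default std140 layout, vectors of three or four components, matrix columns, and nested structs are aligned to 128 bits (16 bytes), so it’s simplest to build them from SIMD vector sized blocks.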
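
As a small sketch of that sizing rule, here is a CPU-side struct that could mirror the UBO block used by the vertex shader later in this post. UniformData is a hypothetical name, and plain float arrays stand in for the glm::mat4 members a real application would more likely use. The static assertions check the layout at compile time.

```cpp
#include <cstddef>

// Hypothetical CPU-side mirror of the shader's UBO block.
// Each mat4 is 16 floats (64 bytes), stored as four 16 byte columns.
struct alignas(16) UniformData
{
    float projectionMatrix[16];
    float modelMatrix[16];
    float viewMatrix[16];
};

static_assert(sizeof(UniformData) == 3 * 16 * sizeof(float),
              "three tightly packed 4x4 float matrices");
static_assert(sizeof(UniformData) % 16 == 0,
              "total size is a multiple of 128 bits");
static_assert(offsetof(UniformData, modelMatrix) == 64,
              "second matrix starts on a 16 byte boundary");
```

The member order has to match the order of the members in the shader's uniform block, since the GPU reads the buffer purely by offset.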

// Binding 0: Uniform buffer (Vertex shader)
std::vector<vk::DescriptorSetLayoutBinding> descriptorSetLayoutBindings =
{
vk::DescriptorSetLayoutBinding(
0,
vk::DescriptorType::eUniformBuffer,
1,
vk::ShaderStageFlagBits::eVertex,
nullptr
)
};
std::vector<vk::DescriptorSetLayout> descriptorSetLayouts = {
device.createDescriptorSetLayout(
vk::DescriptorSetLayoutCreateInfo(
vk::DescriptorSetLayoutCreateFlags(),
descriptorSetLayoutBindings.size(),
descriptorSetLayoutBindings.data()
)
)
};
std::vector<vk::DescriptorSet> descriptorSets = device.allocateDescriptorSets(
vk::DescriptorSetAllocateInfo(
descriptorPool,
descriptorSetLayouts.size(),
descriptorSetLayouts.data()
)
);

Pipeline Layouts

Pipeline layouts are a collection of descriptor set layouts, the bindings to a shader program. In OpenGL, in order to bind a shader to a set of data, you needed to describe how the inputs and outputs are organized in memory (their spacing, size, etc.).

Access to descriptor sets from a pipeline is accomplished through a pipeline layout. Zero or more descriptor set layouts and zero or more push constant ranges are combined to form a pipeline layout object which describes the complete set of resources that can be accessed by a pipeline. The pipeline layout represents a sequence of descriptor sets with each having a specific layout. This sequence of layouts is used to determine the interface between shader stages and shader resources. Each pipeline is created using a pipeline layout.

Pipeline State Objects

Pipelines are basically a mix of hardware and software functions that do a particular task on the GPU. In Vulkan, there are 4 types:

  • Graphics Pipelines
  • Compute Pipelines
  • Ray-Tracing Pipelines
  • Tensor Pipelines

Graphics Pipeline

  • Color Blending — The function that controls how two objects draw on top of each other.
  • Depth Stencil — How depth and stencil information is tested and written.
  • Vertex Input — The actual vertex data you’ll be using in your shader.
  • Shaders — What shaders will be loaded in.

And many more. Pipelines can even be cached! This state is grouped together because changing it in older graphics APIs could trigger shader recompilation.
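
Taking the Vertex Input state as an example, the pipeline needs to know the stride of one vertex and the byte offset of each attribute inside it. The sketch below works that out for a position plus color vertex. Vertex and AttributeDesc are hypothetical names, with AttributeDesc standing in for vk::VertexInputAttributeDescription, and the triangle coordinates are illustrative values only.

```cpp
#include <cstddef>
#include <cstdint>

// One row of the vertex buffer: position followed by color.
struct Vertex
{
    float position[3];
    float color[3];
};

// Hypothetical stand-in for vk::VertexInputAttributeDescription.
struct AttributeDesc
{
    std::uint32_t location; // matches layout (location = N) in the shader
    std::uint32_t binding;  // which vertex buffer binding the data comes from
    std::uint32_t offset;   // byte offset of the attribute inside one vertex
};

// Illustrative triangle: one red, one green and one blue corner.
const Vertex kTriangle[3] = {
    {{1.0f, -1.0f, 0.0f}, {1.0f, 0.0f, 0.0f}},
    {{-1.0f, -1.0f, 0.0f}, {0.0f, 1.0f, 0.0f}},
    {{0.0f, 1.0f, 0.0f}, {0.0f, 0.0f, 1.0f}},
};
const std::uint32_t kIndices[3] = {0, 1, 2};

// Distance in bytes between two consecutive vertices.
constexpr std::uint32_t kVertexStride = static_cast<std::uint32_t>(sizeof(Vertex));

// Location 0 carries the position and location 1 the color.
const AttributeDesc kAttributes[2] = {
    {0, 0, static_cast<std::uint32_t>(offsetof(Vertex, position))},
    {1, 0, static_cast<std::uint32_t>(offsetof(Vertex, color))},
};
```

Using sizeof and offsetof keeps the description in sync with the struct if you later add a normal or texture coordinate.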

Pipeline Cache

A pipeline cache stores the results of previously created pipelines for reuse later. Since pipelines don’t change often, this lets you quickly create another one when you need it.

// 👋 Declare handles
vk::PipelineCache pipelineCache;
// 💵 Create Pipeline Cache
vk::PipelineCacheCreateInfo pcci;
pipelineCache = device.createPipelineCache(pcci);

You’re even able to compile the pipeline down into binary, and write the pipeline to a file. This is part of the reason why DOOM 2016 takes a while to first start up when running it on Vulkan [Lottes 2016], with Doom Eternal downloading Vulkan binaries separately in Steam.

Shaders

Shaders must be passed to Vulkan as SPIR-V binary, so any compiler that can make SPIR-V is allowed. Shaders are pre-compiled, loaded into memory, transferred to a shader module, bundled in a set of pipelineShaderStages, which is then put into a graphics pipeline.

Shaders are compiled using the glslangValidator tool bundled with the Vulkan SDK provided by LunarG.

glslangValidator -V shader.vert -o shader.vert.spv
glslangValidator -V shader.frag -o shader.frag.spv

Vulkan’s GLSL code is nearly the same as OpenGL 4.5’s:

// Vertex Shader
#version 450
#extension GL_ARB_separate_shader_objects : enable
#extension GL_ARB_shading_language_420pack : enable
// Uniforms now come in the form of input layouts
// Each location has a 128 bit alignment,
// so matrices/arrays mean larger strides in location.
layout (location = 0) in vec3 inPos;
layout (location = 1) in vec3 inColor;
layout (binding = 0) uniform UBO
{
mat4 projectionMatrix;
mat4 modelMatrix;
mat4 viewMatrix;
} ubo;
layout (location = 0) out vec3 outColor;
out gl_PerVertex
{
vec4 gl_Position;
};
void main()
{
outColor = inColor;
gl_Position = ubo.projectionMatrix * ubo.viewMatrix * ubo.modelMatrix * vec4(inPos.xyz, 1.0);
}
// Fragment Shader
#version 450
#extension GL_ARB_separate_shader_objects : enable
#extension GL_ARB_shading_language_420pack : enable
layout (location = 0) in vec3 inColor;
layout (location = 0) out vec4 outFragColor;
void main()
{
outFragColor = vec4(inColor, 1.0);
}

Shaders are loaded into shader modules, which are attached to a pipeline that a command buffer then binds.

// 📈 Create your shader module handles
vk::ShaderModule vertModule = device.createShaderModule(
vk::ShaderModuleCreateInfo(
vk::ShaderModuleCreateFlags(),
vertexShader.size(),
vertexShader.data()
)
);
vk::ShaderModule fragModule = device.createShaderModule(
vk::ShaderModuleCreateInfo(
vk::ShaderModuleCreateFlags(),
fragShader.size(),
fragShader.data()
)
);
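
The vertexShader and fragShader buffers above have to be filled from the compiled .spv files first. A minimal sketch of such a loader follows, using only the standard library. loadSpirv and kSpirvMagic are hypothetical names (the repo's own helper lives in Utils.h and may differ), and the magic number check assumes the file was written with the same byte order as the machine reading it.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// First word of every SPIR-V module.
constexpr std::uint32_t kSpirvMagic = 0x07230203u;

// Reads a SPIR-V file into 32 bit words.
// Returns an empty vector if the file is missing, its size is not
// a multiple of 4 bytes, or it does not start with the SPIR-V magic number.
std::vector<std::uint32_t> loadSpirv(const std::string& path)
{
    // Open at the end so tellg() reports the file size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file)
    {
        return {};
    }
    const std::streamsize bytes = file.tellg();
    if (bytes < 4 || bytes % 4 != 0)
    {
        return {};
    }
    std::vector<std::uint32_t> words(static_cast<std::size_t>(bytes) / 4);
    file.seekg(0);
    file.read(reinterpret_cast<char*>(words.data()), bytes);
    if (!file || words[0] != kSpirvMagic)
    {
        return {};
    }
    return words;
}
```

When passing the result to vk::ShaderModuleCreateInfo, keep in mind that the code size is given in bytes, so it would be words.size() * sizeof(std::uint32_t), with words.data() as the code pointer.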

Command Buffer

A command buffer is a container of GPU commands. This is where you would see commands similar to OpenGL’s state commands:

  • setViewport
  • setScissor
  • blitImage
  • bindPipeline

A common pattern for building a command buffer is:

  1. Start Render Pass
  2. Bind Resources
     • Descriptor Sets
     • Vertex and Index Buffers
     • Pipeline State
  3. Modify Dynamic State
  4. Draw
  5. Repeat 2 Through 4 as Needed
  6. End Render Pass

Separate command pools allow multiple threads to generate command buffers at the same time, so you could allocate a thread for each core on the CPU and split rendering tasks across them. This could be used to distribute rendering of individual objects, deferred rendering passes, physics calculations with compute buffers, etc.
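
A rough sketch of that idea, using only the standard library, is shown below. CommandList is a hypothetical stand-in for a command buffer (here just a list of recorded draw names), and recordInParallel is a hypothetical helper. Each worker thread writes only to its own list, which mirrors the rule of one command pool per thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a command buffer: a list of recorded draws.
using CommandList = std::vector<std::string>;

// Splits `objects` into contiguous chunks, one per worker, and records
// each chunk on its own thread. No two threads touch the same list.
std::vector<CommandList> recordInParallel(const std::vector<std::string>& objects,
                                          unsigned workerCount)
{
    workerCount = std::max(1u, workerCount);
    std::vector<CommandList> lists(workerCount);
    std::vector<std::thread> workers;
    // Round up so every object lands in some chunk.
    const std::size_t chunk = (objects.size() + workerCount - 1) / workerCount;
    for (unsigned w = 0; w < workerCount; ++w)
    {
        workers.emplace_back([&objects, &lists, chunk, w] {
            const std::size_t begin = std::min(objects.size(), w * chunk);
            const std::size_t end = std::min(objects.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i)
            {
                lists[w].push_back("draw " + objects[i]);
            }
        });
    }
    // Wait for every worker before the lists are read.
    for (std::thread& worker : workers)
    {
        worker.join();
    }
    return lists;
}
```

A natural worker count is std::thread::hardware_concurrency(), which may return 0 when the value is unknown. That is why the function clamps the count to at least one worker.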

Conclusion

Vulkan is a pretty complicated API to wrap your head around, and while this post attempts to make it simple, there’s still a lot to bear in mind that other graphics APIs deal with for you. Aspects of the API like memory management, queue indices, descriptor sets, don’t exist in other APIs but exist here to make this API much faster at the cost of added complexity to your renderer.

You’ll find all the source code described in this post in the Github repo here.

Alain Galvan

https://Alain.xyz | Graphics Software Engineer @ AMD, Previously @ Marmoset.co. Guest lecturer talking about 🛆 Computer Graphics, ✍ tech author.