Nvidia DeepStream 101: A step-by-step guide to creating your first DeepStream application

Chirag Shetty

Welcome back to our DeepStream tutorial series! In the last blog, we covered the basics of DeepStream and how to get it up and running on your machine. Now it’s time to dive a little deeper into the world of DeepStream and see what it can do.

What the heck is GStreamer?

GStreamer is a powerful open-source multimedia framework that helps you build audio and video processing pipelines. And when it comes to DeepStream, GStreamer pipelines are kind of a big deal. They’re the driving force behind DeepStream, and they’re what allow you to process video streams in real-time.

But what makes up a GStreamer pipeline? Glad you asked! There are four main pieces you’ll be working with: elements, pads, loops, and pipelines.

Elements: These are the building blocks of GStreamer pipelines. They’re like the bricks and mortar of your pipeline, and they can do all sorts of cool stuff. An element is a single unit of functionality in GStreamer, such as reading a file or decoding a video stream. You can even create your own elements if you’re feeling ambitious (or just really, really need a specific element).

What are elements?

For the application programmer, elements are best visualized as black boxes: you put something in at one end, the element does something with it, and something else comes out at the other side. For a decoder element, for example, you’d feed in encoded data and get decoded data out.

Elements in GStreamer

The diagram below illustrates the most common types of elements that you’ll interact with while building a pipeline. There are three main types of elements in a GStreamer pipeline: source elements, filter elements, and sink elements. Source elements produce data, filter elements transform or modify it, and sink elements consume it.

A Filter element with one source pad and one sink pad

Pads: The connections between elements are called pads, and they allow data to flow between elements. Pads are an element’s interface to the outside world: data streams from one element’s source pad to another element’s sink pad, and the specific types of media an element can handle are exposed through its pads’ capabilities. Note that the terminology of source and sink is reversed for pads: a sink pad receives input from the upstream element, while a source pad pushes output to the downstream element. An element can have any number of sink and source pads, depending on how it handles the data.
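
If you want to see a pad’s capabilities for yourself, here’s a minimal standalone sketch (separate from the tutorial app we’ll build below) that asks an element for one of its static pads and prints the caps that pad advertises; h264parse is used purely as an example element:

#include <gst/gst.h>

int main (int argc, char *argv[])
{
  gst_init (&argc, &argv);

  /* h264parse is used here only as an example element */
  GstElement *parser = gst_element_factory_make ("h264parse", "example-parser");
  if (!parser) {
    g_printerr ("h264parse could not be created. Exiting.\n");
    return -1;
  }

  /* Grab the element's sink pad and query the media types it can accept */
  GstPad *sinkpad = gst_element_get_static_pad (parser, "sink");
  GstCaps *caps = gst_pad_query_caps (sinkpad, NULL);
  gchar *caps_str = gst_caps_to_string (caps);
  g_print ("h264parse sink pad capabilities: %s\n", caps_str);

  g_free (caps_str);
  gst_caps_unref (caps);
  gst_object_unref (sinkpad);
  gst_object_unref (parser);
  return 0;
}

Running it prints the video/x-h264 formats the parser’s sink pad accepts, which is exactly the information GStreamer uses when it negotiates a link between two pads.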

Loops: In a GStreamer application, the loop you’ll actually deal with is GLib’s GMainLoop, the event loop that keeps your program alive while the pipeline runs and dispatches messages (errors, end-of-stream and so on) to your callbacks. Your application typically sits inside this loop until the pipeline is done. Just be careful how you quit it, or you might get stuck in an infinite loop (unless you’re into that sort of thing).

Pipeline: Finally, we have pipelines. A GStreamer pipeline is essentially a directed acyclic graph (DAG) of elements that work together to process a video stream; the elements are connected in a specific order to form the pipeline. The diagram below is an example of a GStreamer pipeline that uses DeepStream plugins (apologies for the small text).

A GStreamer pipeline for Object Detection and Tracking that saves output to an mp4 file.

How do you create a Pipeline in DeepStream?

We’re gonna create a simple pipeline that takes an mp4 file as input, runs object detection and tracking on it, and saves the annotated output to an mp4 file.

filesrc -> qtdemux -> h264parse -> nvv4l2decoder -> nvstreammux -> nvinfer -> nvtracker -> nvvideoconvert -> nvdsosd -> nvvideoconvert -> nvv4l2h264enc -> h264parse -> qtmux -> filesink
  • filesrc: This element is used to read in the input mp4 file. The location attribute specifies the path to the input mp4 file on the system.
  • qtdemux: This element is used to demultiplex the audio and video streams in the input mp4 file. It separates the audio and video streams into different pads, which can then be processed separately.
  • h264parse: This element is used to parse the H.264 video stream from the qtdemux element. It extracts information such as frame rate and resolution, which is used to configure the video pipeline.
  • nvv4l2decoder: This element is used to decode the H.264 video stream from the h264parse element. It converts the compressed video into raw video frames that can be processed by the pipeline.
  • nvstreammux: This element batches frames from one or more decoded streams (here, just the one coming from the nvv4l2decoder element) into a single batched buffer and attaches the batch-level metadata that downstream elements fill in. The batch-size attribute specifies the number of frames to be processed in each batch.
  • nvinfer: This element is used to run the object detection model on the frames from the nvstreammux element. The config-file-path attribute specifies the path to the configuration file for the object detection model. We’re going to use the config file from the deepstream-test1 sample app. It’s a ResNet Caffe model that detects four classes.
  • nvtracker: This element is used to track objects detected by the nvinfer element. The ll-lib-file attribute specifies the path to the library file for the multi-object tracking algorithm. We’re using the default tracker that is supplied with DeepStream — NvMultiObjectTracker.
  • nvvideoconvert: This element is used to convert the video frames between different video formats (Nv12 and RGBA formats). It is used to convert the frames from the nvtracker element to a format that can be processed by the nvdsosd element.
  • nvdsosd: This element is used to overlay the bounding boxes and labels on the video frames from the nvvideoconvert element. It draws the bounding boxes and labels on the frames based on the metadata generated by the nvtracker and nvinfer elements.
  • nvv4l2h264enc: This element is used to encode the video frames from the nvvideoconvert element into H.264 format.
  • h264parse: This element is used to parse the H.264 video stream from the nvv4l2h264enc element.
  • qtmux: This element is used to multiplex the video and audio streams into an mp4 file.
  • filesink: This element is used to write the output mp4 file to a location on the system. The location attribute specifies the path to the output file.
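
By the way, if you ever want to see exactly what pads, capabilities, and properties any of these elements expose, the gst-inspect-1.0 tool that ships with GStreamer will print all of it, which is handy for double-checking property names like batch-size or ll-lib-file before wiring them into code:

gst-inspect-1.0 nvstreammux
gst-inspect-1.0 nvinfer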

First off, we got our trusty filesrc element, which is like the gatekeeper of the pipeline. It’s the one responsible for reading in the mp4 file and feeding its data to the downstream elements. Next up, we have the qtdemux element, aka the party-starter. It separates the audio and video streams, so we can work on them separately. Then we got the h264parse element, the brain of the operation. It extracts all the important information like frame rate and resolution, so we can configure the video pipeline. Now, we got the nvv4l2decoder, which decodes the H.264 video stream and converts it into raw video frames for the pipeline to work its magic.

After that, we got the nvstreammux, the multitasker. It batches frames from one or more video streams into a single buffer and carries the DeepStream metadata along with them. We then got the nvinfer, which runs the object detection model on the frames and adds the list of detected objects to the pipeline’s metadata. Next up, we got the nvtracker, the ninja of the pipeline. It tracks the objects detected by the nvinfer element.

Then we got the nvvideoconvert, which converts the video frames between different video formats so nvdsosd can work seamlessly. After that, we got the nvdsosd, the artist of the pipeline. It overlays the bounding boxes and labels on the video frames. Then we got another nvvideoconvert, to convert the frames back to a format the downstream elements accept. Now, we got the nvv4l2h264enc, which encodes the video frames into H.264 format. We then got the h264parse element, the quality checker. It parses the H.264 video stream one last time to make sure everything is in order. Lastly, we got the qtmux and filesink elements, the dynamic duo. They mux the video stream into an mp4 container and save it to a location on the system, so we can all enjoy the final output.

Code!!!

Here’s the code for creating the GStreamer and DeepStream pipeline in C (the full source is on GitHub). First, include the necessary GStreamer, GLib, and DeepStream header files, and define some constants for the nvstreammux output width, height, and batch timeout (don’t worry, we’ll go into detail about these properties later in the series).

#include <gst/gst.h>
#include <glib.h>
#include <stdio.h>
#include <string.h> /* for strcmp() in the pad-added callback */
#include <cuda_runtime_api.h>
#include "gstnvdsmeta.h"

/* The muxer output resolution must be set if the input streams will be of
* different resolution. The muxer will scale all the input frames to this
* resolution. */
#define MUXER_OUTPUT_WIDTH 1920
#define MUXER_OUTPUT_HEIGHT 1080

/* Muxer batch formation timeout, for e.g. 40 millisec. Should ideally be set
* based on the fastest source's framerate. */
#define MUXER_BATCH_TIMEOUT_USEC 40000

We’ll declare all the elements of the pipeline, along with the main loop and the bus

GMainLoop *loop = NULL;
GstElement *pipeline = NULL, *source = NULL, *h264parser = NULL, *nvv4l2h264enc = NULL, *qtdemux = NULL,
*nvv4l2decoder = NULL, *streammux = NULL, *sink = NULL, *nvvidconv = NULL, *qtmux = NULL,
*pgie = NULL, *tracker = NULL, *nvvidconv2 = NULL, *nvosd = NULL, *h264parser2 = NULL;

GstElement *transform = NULL;
GstBus *bus = NULL;
guint bus_watch_id;

/* Check input arguments */
if (argc != 2) {
g_printerr ("Usage: %s </path/to/input/video.mp4>\n", argv[0]);
return -1;
}

The code starts by creating a GMainLoop, the GLib main event loop that the application will run in, and a GstElement pipeline, which is the top-level pipeline that connects all the elements. We first initialise GStreamer with gst_init (&argc, &argv); and then create the loop with loop = g_main_loop_new (NULL, FALSE);.

/* Standard GStreamer initialization */
gst_init (&argc, &argv);
loop = g_main_loop_new (NULL, FALSE);

The pipeline elements are created using the gst_element_factory_make function, which creates an instance of the specified element.

/* Create gstreamer elements */
/* Create Pipeline element that will form a connection of other elements */
pipeline = gst_pipeline_new ("deepstream_tutorial_app1");

/* Input File source element */
source = gst_element_factory_make ("filesrc", "file-source");

/* QTDemux for demuxing different type of input streams */
qtdemux = gst_element_factory_make ("qtdemux", "qtdemux");

/* The demuxed video stream is an elementary h264 stream,
* so we need a h264parser */
h264parser = gst_element_factory_make ("h264parse", "h264-parser");

/* Use nvdec_h264 for hardware accelerated decode on GPU */
nvv4l2decoder = gst_element_factory_make ("nvv4l2decoder", "nvv4l2-decoder");

/* Create nvstreammux instance to form batches from one or more sources. */
streammux = gst_element_factory_make ("nvstreammux", "stream-muxer");

/* Use nvinfer to run inferencing on decoder's output,
* behaviour of inferencing is set through config file */
pgie = gst_element_factory_make ("nvinfer", "primary-nvinference-engine");

/* Assigns track ids to detected bounding boxes*/
tracker = gst_element_factory_make ("nvtracker", "tracker");

/* Use convertor to convert from NV12 to RGBA as required by nvosd */
nvvidconv = gst_element_factory_make ("nvvideoconvert", "nvvideo-converter");

/* Create OSD to draw on the converted RGBA buffer */
nvosd = gst_element_factory_make ("nvdsosd", "nv-onscreendisplay");

/* Use convertor to convert from RGBA back to NV12 as required by the encoder */
nvvidconv2 = gst_element_factory_make ("nvvideoconvert", "nvvideo-converter2");

/* Use nvv4l2h264enc to encode the NV12 frames into H.264 */
nvv4l2h264enc = gst_element_factory_make ("nvv4l2h264enc", "nvv4l2h264enc");

/* Parse the encoded h264 stream before
* muxing it into the mp4 container */
h264parser2 = gst_element_factory_make ("h264parse", "h264parser2");

qtmux = gst_element_factory_make ("qtmux", "qtmux");

sink = gst_element_factory_make ("filesink", "filesink");

if (!pipeline || !source || !h264parser || !qtdemux ||
!nvv4l2decoder || !streammux || !pgie || !tracker ||
!nvvidconv || !nvosd || !nvvidconv2 || !nvv4l2h264enc ||
!h264parser2 || !qtmux || !sink) {
g_printerr ("One element could not be created. Exiting.\n");
return -1;
}

We need to set properties on the elements using the g_object_set function.


/* we set the input filename to the source element */
g_object_set (
G_OBJECT (source),
"location",
argv[1],
NULL
);

g_object_set (
G_OBJECT (streammux),
"batch-size",
1,
"width",
MUXER_OUTPUT_WIDTH,
"height",
MUXER_OUTPUT_HEIGHT,
"batched-push-timeout",
MUXER_BATCH_TIMEOUT_USEC, NULL
);

/* Set all the necessary properties of the nvinfer element,
* the necessary ones are : */
g_object_set (
G_OBJECT (pgie),
"config-file-path",
"/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test1/dstest1_pgie_config.txt",
NULL
);

/* Set all the necessary properties of the nvtracker element,
* the necessary ones are : */
g_object_set (
G_OBJECT (tracker),
"ll-lib-file",
"/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so",
NULL
);

/* Set output file location */
g_object_set (
G_OBJECT (sink),
"location",
"output.mp4",
NULL
);
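
One more thing before we move on: the declarations above include a bus and a bus_watch_id, and the cleanup code at the end removes that watch, but the listing never actually attaches it. Here’s a sketch of the missing piece, modelled on the DeepStream sample apps; bus_call is just our own name for the callback, which should be defined above main():

/* Bus callback: stops the main loop on end-of-stream or on an error */
static gboolean bus_call (GstBus *bus, GstMessage *msg, gpointer data)
{
  GMainLoop *loop = (GMainLoop *) data;
  switch (GST_MESSAGE_TYPE (msg)) {
    case GST_MESSAGE_EOS:
      g_print ("End of stream\n");
      g_main_loop_quit (loop);
      break;
    case GST_MESSAGE_ERROR: {
      gchar *debug = NULL;
      GError *error = NULL;
      gst_message_parse_error (msg, &error, &debug);
      g_printerr ("ERROR from element %s: %s\n",
          GST_OBJECT_NAME (msg->src), error->message);
      g_free (debug);
      g_error_free (error);
      g_main_loop_quit (loop);
      break;
    }
    default:
      break;
  }
  return TRUE;
}

/* ...and inside main(), once the pipeline exists: attach the watch so that
 * bus_watch_id is valid when we call g_source_remove() during cleanup */
bus = gst_pipeline_get_bus (GST_PIPELINE (pipeline));
bus_watch_id = gst_bus_add_watch (bus, bus_call, loop);
gst_object_unref (bus);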

Before linking the elements together, they all need to be added to the same bin, in this case the pipeline itself, using gst_bin_add_many.

/* Set up the pipeline */
/* we add all elements into the pipeline */
gst_bin_add_many (
GST_BIN (pipeline),
source,
qtdemux,
h264parser,
nvv4l2decoder,
streammux,
pgie,
tracker,
nvvidconv,
nvosd,
nvvidconv2,
nvv4l2h264enc,
h264parser2,
qtmux,
sink,
NULL
);

Not all the elements in the pipeline can be linked together with the gst_element_link_many function. For example, the video_%u and audio_%u pads on the qtdemux element are only created once qtdemux detects the corresponding video and audio streams in the input. When a new pad appears, qtdemux emits the “pad-added” signal, and we use a callback function, cb_new_pad, to link the new pad to the downstream element, h264parser. The cb_new_pad function is registered as the callback for the “pad-added” signal on the qtdemux element, which means it is called every time a new pad is added to that element.

The cb_new_pad function receives the qtdemux element, the new pad, and (as user data) the downstream element, h264parser. It checks the name of the new pad and, if it is the video_0 pad, links it to the sink pad of h264parser, printing an error if the link fails. The %u in the video_%u and audio_%u pad names is a placeholder for a stream index, so every new video or audio pad gets a unique number, for example video_0 or audio_0.

void cb_new_pad (GstElement *qtdemux, GstPad* pad, gpointer data) {
  GstElement* h264parser = (GstElement*) data;
  gchar *name = gst_pad_get_name (pad);
  /* Only link the demuxed video stream; any audio pads are ignored */
  if (strcmp (name, "video_0") == 0 &&
      !gst_element_link_pads (qtdemux, name, h264parser, "sink")) {
    g_printerr ("Could not link %s pad of qtdemux to sink pad of h264parser\n", name);
  }
  g_free (name);
}

The callback is added to the element using g_signal_connect.

/* Link "video_0" pad of qtdemux to sink pad of h264Parse
* "video_0" pad of qtdemux is created only when
* a valid video stream is found in the input
* in that case only the pipeline will be linked */
g_signal_connect (qtdemux, "pad-added", G_CALLBACK (cb_new_pad), h264parser);

Similarly, the sink_%u pad on the nvstreammux element is a request pad: it does not exist until the application explicitly asks for it. This is done through dynamic (request) pads, which are created on demand based on the requirements of the pipeline.

In this pipeline, the nvstreammux element batches the decoded frames coming out of the nvv4l2decoder element and attaches the batch-level metadata that downstream elements like nvinfer and nvtracker fill in. Because its sink pads are request pads, we ask nvstreammux for a sink_0 pad and then link the decoder’s src pad to it, which lets the data flow from the nvv4l2decoder element into the nvstreammux element.

The %u in the sink_%u pad name is a placeholder for the index of the stream being batched, so every new sink pad gets a unique number, for example sink_0 for stream 0 and so on.

/* 
* Dynamic linking
* sink_0 pad of nvstreammux is only created on request
* and hence cannot be linked automatically
* Need to request it to create it and then link it
* to the upstream element in the pipeline
*/
GstPad *sinkpad, *srcpad;
gchar pad_name_sink[16] = "sink_0";
gchar pad_name_src[16] = "src";

/* Dynamically created pad */
sinkpad = gst_element_get_request_pad (streammux, pad_name_sink);
if (!sinkpad) {
g_printerr ("Streammux request sink pad failed. Exiting.\n");
return -1;
}

/* Statically created pad */
srcpad = gst_element_get_static_pad (nvv4l2decoder, pad_name_src);
if (!srcpad) {
g_printerr ("Decoder request src pad failed. Exiting.\n");
return -1;
}

/* Linking the pads */
if (gst_pad_link (srcpad, sinkpad) != GST_PAD_LINK_OK) {
g_printerr ("Failed to link decoder to stream muxer. Exiting.\n");
return -1;
}

/* Unreference the object */
gst_object_unref (sinkpad);
gst_object_unref (srcpad);

qtdemux and h264parse will be linked inside cb_new_pad. We have also linked nvv4l2decoder and streammux by requesting a sink pad on streammux and linking it to the decoder’s static src pad. Now we are ready to link the rest of the elements.

/* 
* we link the elements together
* file-source -> qtdemux -> h264-parser -> nvh264-decoder ->
* streammux -> nvinfer -> tracker -> nvvidconv -> nvosd ->
* nvvidconv2 -> nvh264-encoder -> h264-parser2 -> qtmux -> filesink */
if (!gst_element_link_many (source, qtdemux, NULL)) {
g_printerr ("Source and QTDemux could not be linked: 1. Exiting.\n");
return -1;
}

if (!gst_element_link_many (h264parser, nvv4l2decoder, NULL)) {
g_printerr ("H264Parse and NvV4l2-Decoder could not be linked: 2. Exiting.\n");
return -1;
}

if (!gst_element_link_many (streammux, pgie, tracker, nvvidconv, nvosd, nvvidconv2, nvv4l2h264enc, h264parser2, qtmux, sink, NULL)) {
g_printerr ("Rest of the pipeline elements could not be linked: 3. Exiting.\n");
return -1;
}

We are ready to start playing the pipeline!

/* Set the pipeline to "playing" state */
g_print ("Using file: %s\n", argv[1]);
gst_element_set_state (pipeline, GST_STATE_PLAYING);

/* Wait till pipeline encounters an error or EOS */
g_print ("Running...\n");
g_main_loop_run (loop);

Wait for the pipeline to finish and clean up!

/* Out of the main loop, clean up nicely */
g_print ("Returned, stopping playback\n");
gst_element_set_state (pipeline, GST_STATE_NULL);
g_print ("Deleting pipeline\n");
gst_object_unref (GST_OBJECT (pipeline));
g_source_remove (bus_watch_id);
g_main_loop_unref (loop);

return 0;

In order to compile and run the code, you’ll need to create this Makefile (remember that the command lines under each rule must be indented with a tab).


# Set appropriate CUDA Version here
# Use command `nvcc --version` to check your default CUDA version
CUDA_VER:=11.6

APP:= deepstream_tutorial_app1

TARGET_DEVICE = $(shell gcc -dumpmachine | cut -f1 -d -)

LIB_INSTALL_DIR?=/opt/nvidia/deepstream/deepstream/lib/
APP_INSTALL_DIR?=/opt/nvidia/deepstream/deepstream/bin/

ifeq ($(TARGET_DEVICE),aarch64)
CFLAGS:= -DPLATFORM_TEGRA
endif

SRCS:= $(wildcard *.c)

INCS:= $(wildcard *.h)

PKGS:= gstreamer-1.0

OBJS:= $(SRCS:.c=.o)

CFLAGS+= -I /opt/nvidia/deepstream/deepstream/sources/includes \
-I /usr/local/cuda-$(CUDA_VER)/include

CFLAGS+= $(shell pkg-config --cflags $(PKGS))

LIBS:= $(shell pkg-config --libs $(PKGS))

LIBS+= -L/usr/local/cuda-$(CUDA_VER)/lib64/ -lcudart \
-L$(LIB_INSTALL_DIR) -lnvdsgst_meta -lnvds_meta -lnvds_yml_parser \
-lcuda -Wl,-rpath,$(LIB_INSTALL_DIR)

all: $(APP)

%.o: %.c $(INCS) Makefile
	$(CC) -c -o $@ $(CFLAGS) $<

$(APP): $(OBJS) Makefile
	$(CC) -o $(APP) $(OBJS) $(LIBS)

install: $(APP)
	cp -rv $(APP) $(APP_INSTALL_DIR)

clean:
	rm -rf $(OBJS) $(APP)

To compile the code, set the proper CUDA_VER in the Makefile and run this command.

make

To run the program, use the deepstream_tutorial_app1 executable and provide the absolute path to any mp4 file. I’ve used one of the sample streams that ships with DeepStream for this purpose.

./deepstream_tutorial_app1 /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4

After execution there should be an output.mp4 file created in the same directory. Here’s the output that was generated.
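
If you’d rather not open a video player just to check that the file is valid, gst-discoverer-1.0 (one of the standard GStreamer command-line tools) prints the container and stream information of the generated file:

gst-discoverer-1.0 output.mp4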

Bonus Section!

What if I told you that you could have skipped the whole section above and created the pipeline with one big CLI command? I know, you’d kill me for it, but the reason for writing all this code was to introduce some important GStreamer concepts: elements, pads, linking (static and dynamic), setting the state of the pipeline, and so on. For anything that isn’t too complicated, though, the same pipeline can be described on the command line; once you need custom logic or fine-grained control, you’ll be back to writing it in C. Without further ado, here’s the gst-launch command for creating the same GStreamer pipeline.

gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 \
! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 \
! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test1/dstest1_pgie_config.txt \
! nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so \
! nvvideoconvert ! nvdsosd ! nvvideoconvert ! nvv4l2h264enc ! h264parse ! qtmux ! filesink location=output.mp4

Well folks, that’s all for this tutorial! You’ve learned how to set up a pipeline using GStreamer elements and link them together. You’ve also learned how to link dynamically created pads. But wait, don’t go anywhere just yet! We’ve only just scratched the surface of what DeepStream can do. In the next tutorial, we’ll be diving deep(stream) into the world of probes and how to attach them to your pipeline. So, hold on to your horses and get ready for some serious pipeline shenanigans. See you in the next tutorial!
