5 Tools to monitor the performance of your Deep Learning Stack!
Introducing DML Notes, plus an overview of popular vision foundation models, a toolset for vision data engineering, and tools for monitoring the performance of DL pipelines.
Decoding ML Notes
Hey everyone, Happy Saturday!
Today marks the beginning of an exciting new journey.
We're introducing "DML Notes", a weekly series dedicated to covering summaries, tips and tricks, and advice on ML & MLOps engineering in a short-form format.
Every Saturday, "DML Notes" will bring you a concise, easy-to-digest roundup of the most significant concepts and practices in MLE, Deep Learning, and MLOps. We aim to craft this series to enhance your understanding and keep you updated while respecting your time (2-3 minutes) and curiosity.
Let's start with the first iteration and cover a few key elements of working with Deep Learning Vision Systems.
This week's topics:
5 Vision Foundation Models to keep an eye on!
Top 10 FFMPEG commands when working with image/video!
5 Tools to monitor the performance of your Deep Learning Stack!
5 Vision Foundation Models
With LLMs in the spotlight, let's not forget the foundation models for vision, which use the same Transformer + Attention mechanisms but, instead of text, process image patches as tokens.
Here are the Top 5:
SAM (Segment Anything)
From: Meta AI
Used For: Semantic Segmentation
Input: Images

CLIP (Contrastive Language-Image Pretraining)
From: OpenAI
Used For: Embedding Extraction
Input: Images + Text

DINOv2
From: Meta AI
Used For: Embedding Extraction
Input: Images

OWL-ViT (Vision Transformer for Open-World Localization)
From: Google Research
Used For: Zero-Shot Object Detection
Input: Images + Text

DETR (Detection Transformer)
From: Meta AI
Used For: Object Detection
Input: Images
What is Embedding Extraction - projecting the image into a pre-learned latent feature space, yielding a compact and accurate representation of the image's content.
What is Zero-Shot Object Detection - zero-shot means the model requires no prior training on the specific label set it is queried with. Such a model can find objects it has never seen in its training set.
What is Semantic Segmentation - it can be thought of as per-pixel classification. Each pixel is assigned an object label, yielding accurate delineations of the different objects in an image.
Some of these models were used as vision encoders in projects like LLaVA (Vision Transformer + LLaMA), a multi-modal text + image model that can process and understand images - similar to GPT-4.
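To make the "image patches as tokens" idea concrete, here is a minimal NumPy sketch of how a ViT-style model turns an image into a sequence of patch tokens (16x16 patches on a 224x224 image; the learned linear projection and position embeddings that follow in a real ViT are omitted):

```python
import numpy as np

def patchify(image: np.ndarray, patch: int = 16) -> np.ndarray:
    # Split an (H, W, C) image into a sequence of flattened patches,
    # one "token" per row, shaped (num_patches, patch * patch * C).
    h, w, c = image.shape
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)        # (H/p, W/p, p, p, C)
    return grid.reshape(-1, patch * patch * c)

tokens = patchify(np.zeros((224, 224, 3)))
print(tokens.shape)  # (196, 768): 14x14 patch tokens, 16*16*3 values each
```

Each of those 196 rows plays the role a word token plays in an LLM: after a learned projection, they are fed to the same attention layers.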
Top 10 FFMPEG commands you must know!
As a Computer Vision engineer, I use FFMPEG 90% of the time when I'm working with data. Be it to generate datasets, split/merge videos and images, or inspect metadata - I can do everything related to media data manipulation with it.
Here are 10 commands to get you started:
1. Basic conversion: ffmpeg -i input.mp4 output.avi
2. Extracting frames: ffmpeg -i video.mp4 -r 1/1 $filename%03d.bmp
3. Resizing videos: ffmpeg -i input.mp4 -vf scale=320:240 output.mp4
4. Adjusting framerate: ffmpeg -i input.mp4 -r 30 output.mp4
5. Trimming videos: ffmpeg -i input.mp4 -ss 00:00:10 -to 00:00:20 -c copy output.mp4
6. Compressing videos: ffmpeg -i input.mp4 -vcodec h264 -acodec mp2 output.mp4
7. Adjusting aspect ratio: ffmpeg -i input.mp4 -aspect 1.7777 output.mp4
8. Extracting audio: ffmpeg -i video.mp4 -q:a 0 -map a audio.mp3
9. Playing a remote video: ssh [HOST]@[IP] "ffmpeg -i [REMOTE_PATH] -c copy -f nut pipe:1" | ffplay -i pipe:0
10. Turning images into a video: ffmpeg -framerate 1 -i img%03d.jpg -c:v libx264 -r 30 -pix_fmt yuv420p out.mp4
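Commands like #2 often end up inside dataset-generation scripts. A minimal sketch of building that frame-extraction command from Python (the file names here are illustrative placeholders):

```python
import shlex

def extract_frames_cmd(video: str, out_pattern: str, rate: str = "1/1") -> list:
    # Mirrors command 2 above: dump frames from `video` at `rate` fps
    # into numbered files matching `out_pattern` (e.g. frame001.bmp, ...).
    return ["ffmpeg", "-i", video, "-r", rate, out_pattern]

cmd = extract_frames_cmd("video.mp4", "frame%03d.bmp")
print(shlex.join(cmd))  # ffmpeg -i video.mp4 -r 1/1 frame%03d.bmp
```

Passing the argv list to subprocess.run(cmd, check=True) executes it without shell-quoting pitfalls around spaces or % signs in paths.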
5 Tools to monitor the performance of your Deep Learning Stack!
This is the toolset that I use frequently to set up a performance monitoring pipeline for the Computer Vision Edge stacks I deploy. It allows us to identify various issues in the flow, like:
Idle GPU time - spot when the GPU sits idle with nothing to process.
Inference times - especially important in multi-batch prediction, where you can see how stressed the system becomes.
CPU/RAM usage - especially important in real-time video processing applications. Video reading and decoding eat a lot of CPU, and you have to pay close attention to memory management when working with video frames to avoid memory build-up or leaks.
Hereโs an overview of the tooling:
1. cAdvisor
Used to scrape/monitor individual container metrics.
2. Prometheus
Configured to monitor and scrape cAdvisor metrics (CPU/RAM) and Triton Inference Server GPU metrics, like the following:
- Latency - the average time it takes to complete a request
- QPS (queries per second) - a useful metric for measuring the speed of the model when performing inference
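As a sketch, the scrape section of a prometheus.yml for this setup might look like the following (the service names and ports are assumptions based on common defaults: cAdvisor on 8080, Triton metrics on 8002):

```yaml
scrape_configs:
  - job_name: cadvisor          # per-container CPU/RAM metrics
    static_configs:
      - targets: ["cadvisor:8080"]
  - job_name: triton            # GPU and inference metrics
    static_configs:
      - targets: ["triton:8002"]
```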
3. Docker Compose
Used to wrap up and contain all these services.
Itโs easy to manage and allows you to control the actions for each container at once.
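A hedged sketch of what the compose file wiring these services together could look like (image tags, ports, and the mounted config path are illustrative assumptions, not a tested deployment):

```yaml
services:
  triton:
    image: nvcr.io/nvidia/tritonserver:24.01-py3   # illustrative tag
    command: tritonserver --model-repository=/models
    ports: ["8000:8000", "8001:8001", "8002:8002"] # 8002 = metrics
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    ports: ["8080:8080"]
  prometheus:
    image: prom/prometheus:latest
    volumes: ["./prometheus.yml:/etc/prometheus/prometheus.yml"]
    ports: ["9090:9090"]
  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
```

A single docker compose up -d then starts (and docker compose down stops) the whole monitoring stack at once.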
4. Grafana
The visualization dashboard and the metrics-consuming endpoint. UI panels can be defined and saved as a .json file to be shared or re-imported into other Grafana configurations.
A few recommended metrics to monitor:
- CPU/RAM usage per container
- GPU usage %
- GPU Active Memory/IDLE
- Inference throughput and batching frequency.
5. Triton Inference Server
Used as the model-serving framework. A big advantage is the integrated Prometheus metrics port (:8002), which exposes a multitude of GPU-specific metrics.
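To see what those metrics look like, here is a sketch that parses a couple of sample lines in the Prometheus text format that Triton serves on :8002 (the values are made up; nv_gpu_utilization and nv_inference_request_success are examples of the metric names Triton exposes):

```python
import re

# Sample lines in Prometheus text exposition format (values are made up).
SAMPLE = """\
# HELP nv_gpu_utilization GPU utilization rate [0.0 - 1.0)
nv_gpu_utilization{gpu_uuid="GPU-0"} 0.45
nv_inference_request_success{model="my_model",version="1"} 1302
"""

METRIC = re.compile(r'^(?P<name>[A-Za-z_:][\w:]*)(?:\{[^}]*\})?\s+(?P<value>\S+)$')

def parse_metrics(text: str) -> dict:
    # Map metric name -> value, skipping comment lines and ignoring labels.
    out = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        m = METRIC.match(line)
        if m:
            out[m.group("name")] = float(m.group("value"))
    return out

print(parse_metrics(SAMPLE))
```

In practice you would fetch the raw text with an HTTP GET to http://<host>:8002/metrics; Prometheus does that scraping for you, so a hand-rolled parser like this is mainly useful for quick debugging.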
Missed the post on NVIDIA Triton?
I've covered Triton Inference Server in depth, from environment setup through model deployment to running inference on an image of a pizza.
NVIDIA Triton Inference Server
Donโt hesitate to share your thoughts - we would love to hear them.
Remember, when ML looks encoded - we'll help you decode it.
From Decoding ML, every Thursday and Saturday!