A Developer’s Intro to Computer Vision Libraries

By Jason Koo Oct 07, 2019

Developers have several libraries available to make building and deploying computer vision applications simpler and more effective. Most of these libraries are written in C/C++ for fast execution. In most cases, however, a Python API wraps the C++ implementation, because Python has become the go-to language for prototyping and developing deep neural networks. An extremely popular and versatile language, Python enables interactive development, and its rich set of libraries and frameworks accelerates the prototyping of complex software. Pure Python code typically runs a fair bit slower than native C++, so when speed is an issue, C++ is usually preferred over Python.

The libraries and frameworks used in the community fall into three bins with somewhat overlapping functionality.

The first bin holds the deep learning frameworks, such as TensorFlow, PyTorch, CNTK, and MXNet. The second is specific to computer vision and image processing. The third relates to machine learning and data handling. This article provides an overview of the deep learning libraries and a brief introduction to the other libraries.



A Brief History of Deep Learning Frameworks

Many frameworks are available for developing deep learning models, including TensorFlow, PyTorch, CNTK, and MXNet. Essentially, these frameworks share the following elements: a high-level API, often available in multiple languages, for model definition, training, testing, and deployment; a native C++ implementation that enables fast execution on a variety of hardware; and sometimes a target runtime for execution, such as a browser or smartphone.

Theano, a Python library that is no longer under active development, started out as the first of these libraries in 2008. It used a NumPy-like syntax to define computations on multi-dimensional arrays and execute them efficiently, including on GPUs. In 2012, AlexNet won the ImageNet competition and interest in deep learning spiked. Caffe emerged as a fantastic and fast library for academics to develop deep networks. Inspired by the popularity of Caffe, Google open sourced TensorFlow in 2015; until then it had been developed and used internally at Google as the successor to a system known as DistBelief. With the backing of Google, TensorFlow offered an equally intuitive Python interface, borrowed some of the strongest features of Theano, such as auto-differentiation, and quickly gained significant momentum in the deep learning community. Many academics and researchers latched on to TensorFlow and, as a result, a large share of new networks are available in the TensorFlow model zoo. This is especially true for the research coming out of the Google Brain team, which is the source of many state-of-the-art deep learning models.

While TensorFlow had the weight of Google behind it, Facebook released PyTorch, a Python library based on Torch, which was itself written in Lua. Later, Facebook started another project called Caffe2, but eventually merged it into PyTorch. While TensorFlow was designed to run large production-grade models, PyTorch was designed (and rewritten) specifically for DNN developers using Python. It had some significant advantages over TensorFlow with respect to interactive model development and more framework-like functionality. It has continuously gained popularity over the last couple of years, and most of the research coming out of Facebook’s AI Research is implemented in PyTorch.

While TensorFlow remains quite popular today, it has a steep learning curve compared with some of the modern frameworks now available. Developers are required, for example, to define the entire computation graph before evaluating any part of it. TensorFlow provides great support for deployment and control of execution, but the user has to understand concepts such as sessions and graphs. In addition, TensorFlow offers many differing APIs for different use cases, from small-scale prototyping to large dataset-based training, which adds to the confusion. In comparison, PyTorch seems much more straightforward and “Pythonic.” TensorFlow developers have recognized these limitations, and TensorFlow 2.0 is intended to address them.
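To make the define-then-run idea concrete, here is a minimal pure-Python sketch. The names `Node`, `placeholder`, and `run` are hypothetical and are not TensorFlow's API; they only loosely mirror the roles that graphs, placeholders, and sessions play. The same arithmetic is then repeated in the eager style that PyTorch users are accustomed to.

```python
import operator


class Node:
    """A symbolic graph node: records an operation without computing it."""

    def __init__(self, op=None, inputs=(), name=None):
        self.op = op          # function to apply, or None for an input
        self.inputs = inputs  # upstream nodes
        self.name = name      # set only for placeholders

    def __add__(self, other):
        return Node(operator.add, (self, other))

    def __mul__(self, other):
        return Node(operator.mul, (self, other))


def placeholder(name):
    """Declare an input whose value is supplied later, at run time."""
    return Node(name=name)


def run(node, feed):
    """Evaluate a node by recursively evaluating its inputs (a toy 'session')."""
    if node.op is None:
        return feed[node.name]
    return node.op(*(run(upstream, feed) for upstream in node.inputs))


# Define-then-run: building the graph computes nothing yet.
x = placeholder("x")
w = placeholder("w")
y = x * w + x  # y is a Node, not a number

graph_result = run(y, {"x": 3.0, "w": 2.0})

# Eager style: ordinary Python values are computed line by line.
x_val, w_val = 3.0, 2.0
eager_result = x_val * w_val + x_val

print(graph_result, eager_result)
```

In the graph style, nothing can be inspected until `run` is called with concrete inputs, which is part of what makes debugging less interactive than in the eager style.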

Besides PyTorch and TensorFlow, there are other frameworks such as MXNet (Amazon), CNTK (Microsoft), and Chainer. Each has strengths the others lack, and the choice depends on the desired features and deployment target. Deep learning support also exists for other languages, such as deep learning for JavaScript (d4js).


Framework APIs

Many deep learning frameworks now come with a high-level API that makes networks easy to define. For example, Keras is the preferred API for defining TensorFlow models, and networks defined in Keras can be executed with TensorFlow, Theano, or CNTK. Keras also provides many important features, such as data augmentation for training the network. Similarly, Gluon is a high-level Python API introduced by Amazon and supported by Microsoft that works with MXNet and CNTK networks.


Choosing a Framework

At the end of the day, there is an overwhelming number of frameworks. If you are starting a new project, choose wisely by taking into account your application’s training and deployment requirements. For example, support for reinforcement learning and RNNs varies across libraries, whereas most libraries provide similar levels of support for CNNs. Also, research how well your cloud provider of choice supports the framework. For example, Google provides TPU-based training of models in its cloud, which can increase speed significantly. TensorFlow models, however, generally run slower than those of other libraries at inference time unless they run on an accelerator such as the Edge TPU. You can also train your models in one library and convert them to another using a format such as ONNX (Open Neural Network Exchange), which enables a developer to move models between popular frameworks and make use of different runtimes, compilers, and visualization tools. Conversion may not always be possible, however, as some neural network functions in one library may not be available in another, or may not be supported by the tools that translate network definitions.
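The conversion caveat comes down to operator coverage: a model converts cleanly only if every operation it uses is supported by the target. The sketch below illustrates that check with plain Python sets. The function `unsupported_ops` and both operator lists are hypothetical and made up for illustration; they are not taken from ONNX or from any real converter.

```python
def unsupported_ops(model_ops, target_supported_ops):
    """Return the operators a model uses that the target cannot handle, sorted."""
    return sorted(set(model_ops) - set(target_supported_ops))


# Hypothetical operator lists, for illustration only.
model_ops = ["Conv", "Relu", "MaxPool", "CustomNMS"]
target_supported_ops = {"Conv", "Relu", "MaxPool", "Gemm", "Softmax"}

missing = unsupported_ops(model_ops, target_supported_ops)
if missing:
    print("Conversion would fail or need custom handling for:", missing)
else:
    print("All operators are covered by the target.")
```

Running a check of this kind before investing in training can reveal early that a custom layer has no equivalent on the deployment side.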

Many of the frameworks provide specialized Git repositories for important computer vision tasks. For example, the TensorFlow Object Detection API has many pre-trained networks that can be fine-tuned for user-defined datasets.

For the users of alwaysAI, it is important to ensure that the developed models run efficiently on the target IoT board. For example, if you are using a Google Coral board or an Edge TPU accelerator, it is recommended to stay within the TensorFlow ecosystem. Similarly, if you are using a Jetson Nano board, choose a framework whose models can be parsed by NVIDIA’s TensorRT platform. It is critical to ensure board-level support for the deep learning framework before you invest in model development.


Computer Vision and Image Processing Libraries

The second set of libraries relates to computer vision and image processing. The first and most important of these is OpenCV. With OpenCV, developers can perform a variety of computer vision tasks, from image and video acquisition and stabilization to analysis such as motion estimation, camera calibration, and 3D reconstruction. It also supports machine learning and deep neural networks, and provides tools to run networks defined in the Caffe, TensorFlow, and Darknet libraries. Note, however, that not all networks defined in these libraries can be run in OpenCV.

In general, any sophisticated computer vision application will need some features from OpenCV. Another important aspect of object detection applications is object tracking. For this, users will find a library called dlib useful. In some cases, libraries such as PIL for image manipulation and SciPy for non-linear filtering, object measurement, etc. can also be quite useful.
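To give a feel for what tracking adds on top of detection, here is a minimal pure-Python sketch that links boxes from one frame to the next by intersection over union (IoU). This is not dlib's API or algorithm; it is a deliberately simple greedy matcher, and the function names, the `threshold` value, and the sample boxes are all assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(tracks, detections, threshold=0.3):
    """Greedily give each tracked box the best-overlapping unused detection.

    Returns {track_id: detection_index}; tracks with no match are omitted.
    """
    assignments, used = {}, set()
    for track_id, box in tracks.items():
        best_index, best_score = None, threshold
        for index, detection in enumerate(detections):
            score = iou(box, detection)
            if index not in used and score >= best_score:
                best_index, best_score = index, score
        if best_index is not None:
            assignments[track_id] = best_index
            used.add(best_index)
    return assignments


# Boxes from the previous frame, keyed by a made-up track id.
tracks = {"person-1": (10, 10, 50, 50), "person-2": (100, 100, 140, 140)}
# Detections in the current frame, in arbitrary order.
detections = [(102, 98, 142, 138), (12, 11, 52, 51)]

matches = match_detections(tracks, detections)
print(matches)
```

A dedicated tracking library handles the harder cases this sketch ignores, such as occlusion, missed detections, and objects that change appearance between frames.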


Machine Learning and Data Management Libraries

The third set of libraries relates to data management and machine learning. For example, scikit-learn can be very useful for building machine learning models for classification, regression, and clustering outside of the deep learning frameworks, although many of the frameworks also support traditional machine learning model development. Similarly, familiarity with libraries from the SciPy ecosystem, such as Matplotlib for visualization, pandas for data analysis, and NumPy for mathematical processing, can be very handy. For iterative development, familiarity with Jupyter Notebook can be very helpful.



A sophisticated computer vision application will require a developer to utilize many of the libraries mentioned in this article. While a comprehensive overview of these is beyond the scope of this article, we hope you found the information useful.
