Fundamental Issues In Computer Vision Still Unresolved
Abstract
The article discusses the current state and challenges of computer vision, a cornerstone of many applications from self-driving cars to medical diagnosis and robotics. It covers the evolution of computer vision from rule-based systems to neural networks and the latest advancements in convolutional neural networks (CNNs) and transformers. The article also explores issues such as dataset biases, corner cases, hardware limitations, and the need for more interpretable algorithms.
Q&A
[01] Industry and Academia Addressing Next Steps
1. What are the key weak points of computer vision that need to be mitigated?
- The inability to identify corner cases, and the fragility of algorithms trained on shallow datasets
- Fundamental technical issues that require further research
2. How has the term "computer vision" evolved compared to "machine vision"?
- "Computer vision" has become the dominant term, replacing "machine vision" which was more focused on the hardware embodiment of vision.
- Computer vision started as an academic field combining neuroscience and AI research, and has now become the preferred term even in the robotics field.
3. What are the three main categories of problems that academic and industry researchers work on in computer vision?
- Image classification/detection
- Object recognition
- Scene understanding
4. How did the shift from rule-based systems to neural networks impact the progress of computer vision?
- The switch from rule-based systems to neural networks, which could learn salient features from image datasets, enabled the transition from lab projects to everyday applications of computer vision.
[02] Transformers and Convolutional Neural Networks
1. What are the key differences between convolutional neural networks (CNNs) and transformers in computer vision?
- CNNs are very good at distinguishing local features, while transformers perceive a more globalized picture.
- Transformers are a natural evolution from CNNs and recurrent neural networks, offering greater accuracy at the cost of more computation and power consumption (the contrast is sketched below).
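To make the difference concrete, here is a minimal, illustrative PyTorch sketch (hypothetical shapes and layer sizes, not a production architecture): a convolution mixes only a small local neighborhood per output, while a single self-attention layer lets every image patch attend to every other patch.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)  # one RGB image

# CNN: each output activation sees only a 3x3 neighborhood of the input.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
local_features = conv(x)  # (1, 64, 224, 224)

# Transformer: cut the image into 16x16 patches, flatten them into tokens,
# and let every patch attend to every other patch in a single layer.
patches = x.unfold(2, 16, 16).unfold(3, 16, 16)        # (1, 3, 14, 14, 16, 16)
tokens = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, 196, 768)
attn = nn.MultiheadAttention(embed_dim=768, num_heads=8, batch_first=True)
global_features, _ = attn(tokens, tokens, tokens)      # (1, 196, 768)
```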
2. How are the two parallel trends reshaping computer vision systems?
- Transformer networks for object detection and recognition are entering production, offering greater accuracy and usability than their convolution-based predecessors.
- Computer vision experts are reinventing classical image signal processor (ISP) functions, such as denoising, with neural-network- and transformer-based models that deliver superior results (a minimal sketch follows).
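As an illustration of the idea, and an assumption rather than any specific vendor's design, the sketch below replaces one classical ISP stage, denoising, with a small residual CNN in the style of DnCNN.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """DnCNN-style residual denoiser: predict the noise, then subtract it."""
    def __init__(self, channels=3, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, noisy):
        return noisy - self.body(noisy)  # residual learning

model = TinyDenoiser()
noisy = torch.rand(1, 3, 128, 128)    # stand-in for a noisy sensor frame
clean_estimate = model(noisy)         # would be trained on (noisy, clean) pairs
```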
[03] Computer Vision for Inspection
1. How is computer vision being used for inspection applications?
- Computer vision has helped detect everything from cancer tumors to manufacturing errors and critical flaws in the built environment, such as cracks in bridges.
- By combining vision transformers with self-supervised learning, the annotation requirement for these inspection tasks is vastly reduced (see the sketch below).
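As a hedged sketch of how this plays out in practice, assuming a DINO-pretrained ViT-S/16 from torch.hub and a hypothetical two-class defect dataset: the self-supervised backbone stays frozen, and only a small linear head is fit on a few dozen labels.

```python
import torch
import torch.nn as nn

# Backbone pretrained without labels (self-supervised DINO).
backbone = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False  # no annotation needed to train the backbone

head = nn.Linear(384, 2)  # e.g., "defect" vs "no defect" (hypothetical task)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# A few dozen labeled examples, not thousands (random stand-in data here).
images, labels = torch.randn(32, 3, 224, 224), torch.randint(0, 2, (32,))
with torch.no_grad():
    feats = backbone(images)  # (32, 384) CLS embeddings
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(head(feats), labels)
    loss.backward()
    opt.step()
```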
2. What is the "visual prompting" approach introduced by IBM, and how does it help reduce the need for extensive labeling?
- "Visual prompting" allows the AI to be taught to make the correct distinctions with limited supervision by using "in-context learning," such as a scribble as a prompt.
- This approach can significantly reduce the amount of labeling work compared to traditional CNNs, where hundreds or thousands of labels would be required (a conceptual sketch follows).
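The article does not detail IBM's implementation; the following is only a conceptual sketch of scribble-style visual prompting, with an assumed feature extractor and similarity threshold: patches of a query image are labeled by how closely their features match the patches the user scribbled over in a single support image.

```python
import torch
import torch.nn.functional as F

def prompt_segment(support_feats, scribble_mask, query_feats, thresh=0.7):
    """support_feats, query_feats: (num_patches, dim) patch embeddings.
    scribble_mask: (num_patches,) bool, True where the user scribbled."""
    prompt = support_feats[scribble_mask].mean(dim=0)   # prototype of the scribbled region
    sims = F.cosine_similarity(query_feats, prompt.unsqueeze(0), dim=1)
    return sims > thresh                                # predicted mask over query patches

# Hypothetical patch features (e.g., from a frozen ViT) for demonstration.
support = torch.randn(196, 384)
query = torch.randn(196, 384)
scribble = torch.zeros(196, dtype=torch.bool)
scribble[:10] = True
mask = prompt_segment(support, scribble, query)
```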
[04] Challenges and Limitations
1. What are the issues with dataset biases and their impact on computer vision algorithms?
- Datasets, both general and domain-specific, can be subject to human and technical biases, leading to issues like Google's racial identification gaffes.
- Allowing contradictory annotations from multiple experts into the dataset, rather than forcing a single consensus label, can help reduce bias (see the sketch below).
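A minimal sketch of one way to do this (an assumption, not the article's method): keep the disagreement as a soft label distribution and train against it directly.

```python
import torch
import torch.nn.functional as F

# Three experts labeled one image; two said class 1, one said class 0.
votes = torch.tensor([1, 1, 0])
soft_label = torch.bincount(votes, minlength=3).float()
soft_label /= soft_label.sum()                   # [0.33, 0.67, 0.0]

logits = torch.randn(1, 3, requires_grad=True)   # model output (hypothetical)
loss = F.cross_entropy(logits, soft_label.unsqueeze(0))  # soft targets
loss.backward()
```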
2. What are "corner cases" in computer vision, and why are they a challenge?
- Corner cases are rare events that are likely not included in the training dataset, but can have critical importance, such as a baby in the road for a self-driving car.
- Algorithms can exploit biases in the dataset and fail to generalize to these out-of-domain situations, potentially with fatal results (a simple runtime safeguard is sketched below).
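Corner cases cannot all be enumerated in advance, but a simple, admittedly imperfect, runtime safeguard is to flag inputs the model is unsure about. The sketch below uses the maximum softmax probability as the signal; the 0.5 threshold is a hypothetical choice.

```python
import torch

def flag_out_of_domain(logits, threshold=0.5):
    probs = torch.softmax(logits, dim=-1)
    confidence, _ = probs.max(dim=-1)
    return confidence < threshold  # True = defer to a fallback / human

logits = torch.tensor([[4.0, 0.1, 0.2],    # confident, in-domain-looking
                       [0.9, 1.0, 1.1]])   # ambiguous, possibly out-of-domain
print(flag_out_of_domain(logits))          # tensor([False,  True])
```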
3. How do the limitations of current computer vision algorithms compare to human visual processing?
- Humans learn about objects through multi-sensory experiences over years, while current algorithms are trained as classifiers on 2D images without proper 3D understanding.
- This lack of 3D knowledge makes the algorithms vulnerable to degradation when images are contaminated or occluded, unlike human vision.
4. What are the hardware challenges in implementing advanced computer vision algorithms?
- The complexity of transformer-based models, whose attention cost scales quadratically with the number of input tokens, is a bottleneck that requires more powerful hardware to overcome (a back-of-the-envelope illustration appears below).
- Designing the full optical path, from lens to sensor to processing, is crucial for achieving the necessary visual fidelity and dynamic range for applications like passenger monitoring.
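To see why the quadratic scaling bites, consider this back-of-the-envelope calculation (patch size and resolutions are hypothetical): doubling image resolution quadruples the token count and multiplies the attention matrix by sixteen.

```python
patch = 16
for res in (224, 448, 896):
    tokens = (res // patch) ** 2
    print(f"{res}x{res} image -> {tokens:5d} tokens -> "
          f"{tokens**2:,} attention entries per head per layer")
# 224x224 image ->   196 tokens -> 38,416 attention entries per head per layer
# 448x448 image ->   784 tokens -> 614,656 attention entries per head per layer
# 896x896 image ->  3136 tokens -> 9,834,496 attention entries per head per layer
```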
5. Why is the "black box" nature of transformer-based models a concern, and what is needed to address it?
- The lack of interpretability of transformer-based models makes it difficult to understand their failure modes and test them thoroughly.
- There is a need for more interpretable algorithms that can be evaluated by actively searching for the specific images or situations that break them, rather than by random sampling alone (one such probe is sketched below).
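One way to probe failure modes deliberately rather than by random sampling is to search for the smallest perturbation that flips a prediction, the classic fast gradient sign method (FGSM). The sketch below is a generic testing harness, not the evaluation procedure the article proposes; `model` stands for any differentiable classifier over images in [0, 1].

```python
import torch
import torch.nn.functional as F

def fgsm_break(model, image, label, eps=0.01):
    """Nudge `image` in the direction that most increases the loss."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + eps * image.grad.sign()
    return adversarial.detach().clamp(0, 1)

# Usage (hypothetical model and data):
# adv = fgsm_break(model, image, label)
# if model(adv).argmax(1) != label:
#     report_failure_case(adv)  # a concrete input that breaks the model
```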