Teaching machines to see

Automation by cognitive artificial vision already advances industry, space, health care, and infrastructure inspection. Many problems nevertheless remain hard: noisy and scarce data make it difficult for neural systems to infer reliable and useful signals at scale. Our lab explores learning formulations based on multi-tasking, self-supervision, and weak supervision to improve machine autonomy and human-computer interfaces in real-world settings.

Multi-task Video Enhancement for Dental Interventions

A microcamera firmly attached to a dental handpiece allows dentists to continuously monitor the progress of conservative dental procedures. Video enhancement in video-assisted dental interventions alleviates the low light, noise, blur, and camera shake that collectively degrade visual comfort. To this end, we introduce a novel deep network for multi-task video enhancement that enables macro-visualization of dental scenes. In particular, the proposed network jointly leverages video restoration and temporal alignment in a multi-scale manner for effective video enhancement. Our experiments on videos of natural teeth in phantom scenes demonstrate that the proposed network achieves state-of-the-art results in multiple tasks with near real-time processing.

Leveraging spatio-temporal features for joint deblurring and segmentation of instruments in dental video microscopy

In dentistry, microscopes have become indispensable optical devices for high-quality treatment and micro-invasive surgery, especially in endodontics. Recent advances in machine vision enable real-time applications such as dental video deblurring and workflow analysis based on metadata derived from instrument motion trajectories. To this end, the proposed work addresses dental video deblurring and instrument segmentation in a multi-task learning fashion, leveraging spatio-temporal adaptive kernels via a recurrent design. The task-specific branches of our architecture use the responses of those kernels to recover sharper video frames and to produce the dental instrument segmentation mask. We demonstrate that the proposed method improves deblurring while retaining segmentation performance at a low computational footprint.

Video and image restoration

We study the influence of motion estimation and concurrent tasks on video restoration performance. We develop tailored neural architectures that efficiently address multiple vision tasks such as video denoising, deblurring, stabilization, and segmentation.

BP-EVD: Forward Block-Output Propagation for Efficient Video Denoising

Denoising videos in real time is critical in many applications, including robotics and medicine, where varying light conditions, miniaturized sensors, and optics can substantially compromise image quality. This work proposes the first video denoising method based on a deep neural network that achieves state-of-the-art performance on dynamic scenes while running in real time at VGA video resolution with no frame latency. The backbone of our method is a novel, remarkably simple temporal network of cascaded blocks with forward block-output propagation. We train our architecture with short, long, and global residual connections by minimizing the restoration loss of pairs of frames, which leads to more effective training across noise levels. The method is robust to heavy noise that follows Poisson-Gaussian statistics, and it is evaluated on RAW and RGB data. The algorithm requires no future frames to denoise the current frame, which considerably reduces its latency. The visual and quantitative results show that our algorithm achieves state-of-the-art performance among efficient algorithms, with speed-ups ranging from two-fold to two orders of magnitude on standard benchmarks for video denoising.
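As an illustration of the Poisson-Gaussian noise statistics mentioned above, the following is a minimal sketch of how such noise is commonly simulated on a clean frame. It is not the authors' code: the function names and the `gain` and `read_std` values are hypothetical and chosen only for illustration. The model assumed here is the standard one in which shot noise is Poisson-distributed and scales with the signal, while read noise is zero-mean Gaussian and independent of the signal, so the per-pixel variance is approximately gain * intensity + read_std^2.

```python
import numpy as np


def add_poisson_gaussian_noise(clean, gain=0.01, read_std=0.02, rng=None):
    """Return a noisy copy of `clean` under a Poisson-Gaussian model.

    clean    : array of non-negative intensities (for example in [0, 1]).
    gain     : hypothetical sensor gain; larger values mean stronger shot noise.
    read_std : hypothetical standard deviation of the Gaussian read noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)
    # Shot noise: draw photon counts from a Poisson law, then rescale by the gain.
    shot = gain * rng.poisson(clean / gain)
    # Read noise: zero-mean Gaussian that does not depend on the signal.
    read = rng.normal(0.0, read_std, size=clean.shape)
    return shot + read


def expected_noise_variance(clean, gain=0.01, read_std=0.02):
    """Per-pixel variance implied by the model: gain * intensity + read_std**2."""
    return gain * np.asarray(clean, dtype=np.float64) + read_std ** 2
```

Applied to a constant frame, for example `add_poisson_gaussian_noise(np.full((256, 256), 0.5))`, the sample mean should stay close to 0.5 and the sample variance close to `expected_noise_variance(0.5)`. Darker regions receive less shot noise than brighter ones, which is what distinguishes this model from purely additive Gaussian noise.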

Cameras and Algorithms Lab

We conduct research on cognitive and cybernetic vision systems to automate processes. The group develops models for visual information processing and algorithms that learn in a supervised and self-supervised manner from continuous and limited data as well as from clean and noisy data. Our work is firmly grounded in applications. The lab focuses on transferring the developed technologies to innovative single- and multi-camera systems, aided by additional modalities, actuators, and efficient cloud and edge computing.
