#CVPR


Link
futurride

Nvidia Research kicks off CVPR with Autonomous Grand Challenge win
futurride.com
Text
govindhtech

Computer Vision at CVPR 2024 with Qualcomm AI research

CVPR 2024 workshops

Qualcomm Technologies is excited to present five research papers on the main track, 11 at co-located workshops, 10 demos, two co-organized workshops, and other research community events at the Computer Vision and Pattern Recognition Conference (CVPR) 2024, starting Monday, June 17. With an acceptance rate of about 25%, CVPR honors the best research and previews the future of generative artificial intelligence (AI) for photos and videos, XR, automotive, computational photography, robotic vision, and more.

At top conferences like CVPR, meticulously peer-reviewed papers establish the new state of the art (SOTA) and contribute to the community. Qualcomm Technologies has had several papers accepted in generative AI and perception that advance computer vision research.

Generative AI research

Text-to-image diffusion models enable fast, high-definition visual content creation, but at high computational expense. The paper “Clockwork UNets for Efficient Diffusion Models” introduces Clockwork Diffusion to improve model efficiency. These models run computationally intensive UNet-based denoising at every generation step; however, the paper observes that not all operations affect output quality equally.

UNet layers that operate on high-resolution feature maps are sensitive to slight perturbations, while low-resolution feature maps affect the output less. Based on this observation, Clockwork Diffusion periodically reuses computation from previous denoising steps to approximate the low-resolution feature maps.

The method reduces FLOPs for Stable Diffusion v1.5 by 32% while maintaining or improving perceptual scores. It also runs on low-power edge devices, making it scalable for real-world applications. Clockwork is covered in this blog post on efficient generative AI for photos and videos.
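The core trick is a caching schedule over denoising steps. Below is a minimal, hypothetical Python sketch of that idea; the split into `unet_lo` and `unet_hi`, the clock period, and the toy update rule are illustrative assumptions, not the paper’s implementation.

```python
import torch

def clockwork_denoise(unet_hi, unet_lo, x, timesteps, clock=4):
    """Clockwork-style caching (sketch): recompute the expensive
    low-resolution UNet path only every `clock` steps and reuse the
    cached features in between, while the high-res path runs every step."""
    cached_lo = None
    for i, t in enumerate(timesteps):
        if i % clock == 0:
            cached_lo = unet_lo(x, t)       # full low-res computation
        eps = unet_hi(x, t, cached_lo)      # cheap step reusing low-res features
        x = x - 0.1 * eps                   # toy update; a real sampler uses a scheduler step
    return x

# Toy stand-ins so the sketch runs end to end.
lo = lambda x, t: torch.tanh(x)
hi = lambda x, t, feats: x - feats
out = clockwork_denoise(hi, lo, torch.randn(1, 4, 64, 64), range(20))
```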

CVPR papers

Solving data scarcity in optical flow estimation

The Qualcomm paper “OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation” proposes a novel method, OCAI, to handle data scarcity in optical flow estimation:

  • Generates intermediate video frames and optical flows for robust frame interpolation.
  • Uses occlusion-aware forward-warping to resolve pixel value discrepancies and fill in missing values (sketched below).

This research also introduces a teacher-student semi-supervised learning strategy that trains a student model on the interpolated frames and flows, boosting optical flow accuracy. Benchmarks show that OCAI delivers perceptually better interpolation and higher optical flow accuracy.

Better optical flow accuracy can improve robot navigation, car movement detection, and more.
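To make the occlusion-aware forward-warping bullet above concrete, here is a toy NumPy sketch of forward warping with weighted splatting and a hole mask; the weighting and hole handling are illustrative assumptions, not OCAI’s actual method.

```python
import numpy as np

def forward_warp(img, flow, weight=None):
    """Toy occlusion-aware forward warping: splat each source pixel to its
    flow target, resolve collisions with per-pixel weights, and report the
    unreached pixels (holes) that an interpolator would need to fill."""
    H, W, C = img.shape
    out = np.zeros((H, W, C), dtype=np.float64)
    acc = np.zeros((H, W, 1), dtype=np.float64)
    w = weight if weight is not None else np.ones((H, W, 1))
    for y in range(H):
        for x in range(W):
            ty = int(round(y + flow[y, x, 1]))   # target row
            tx = int(round(x + flow[y, x, 0]))   # target column
            if 0 <= ty < H and 0 <= tx < W:
                out[ty, tx] += w[y, x] * img[y, x]
                acc[ty, tx] += w[y, x]
    hit = acc[..., 0] > 0
    out[hit] /= acc[hit]
    return out, ~hit                             # warped image and hole mask

warped, holes = forward_warp(np.random.rand(32, 32, 3), np.zeros((32, 32, 2)))
```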

Enhancing depth completion accuracy

With the paper “DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions,” Qualcomm AI Research achieves state-of-the-art depth completion. The work combines 2D and 3D attention to improve accuracy without iterative spatial propagation. Applying attention to 2D features in the bottleneck and skip connections:

  • The baseline convolutional depth completion model is improved to match sophisticated transformer-based models.
  • The authors lift the 2D features to generate a 3D point cloud and process it with a 3D point transformer, letting the model explicitly learn and utilize 3D geometric features (see the sketch below).

DeCoTR shows superior generalizability over current techniques by setting new SOTA performance on depth completion benchmarks. This has potential applications in autonomous driving, robotics, and augmented reality, where depth estimation is critical.
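As a rough illustration of the lifting step described above, the following PyTorch sketch unprojects pixel features into 3D points using depth and camera intrinsics, then attends over the resulting tokens; the module structure is a toy assumption, not DeCoTR’s architecture.

```python
import torch
import torch.nn as nn

class Lifted3DAttention(nn.Module):
    """Sketch of 2D->3D lifting: back-project the pixel grid with depth and
    intrinsics K into a point cloud, add a learned encoding of the 3D
    positions to the 2D features, and run attention over the points.
    Assumes `dim` equals the feature channel count."""
    def __init__(self, dim):
        super().__init__()
        self.pos = nn.Linear(3, dim)                        # encode 3D position
        self.attn = nn.MultiheadAttention(dim, 4, batch_first=True)

    def forward(self, feats2d, depth, K):                   # feats2d: (B,C,H,W), depth: (B,H,W)
        B, C, H, W = feats2d.shape
        ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
        pix = torch.stack([xs, ys, torch.ones_like(xs)], -1).float().reshape(-1, 3)
        rays = pix @ torch.linalg.inv(K).T                  # back-project pixel grid
        pts = rays.unsqueeze(0) * depth.reshape(B, -1, 1)   # scale rays by depth -> 3D points
        tokens = feats2d.flatten(2).transpose(1, 2) + self.pos(pts)
        out, _ = self.attn(tokens, tokens, tokens)          # attention over 3D points
        return out

m = Lifted3DAttention(16)
y = m(torch.randn(1, 16, 8, 8), torch.rand(1, 8, 8), torch.eye(3))
```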

Enhancing stereo video compression

Emerging technologies like virtual reality and autonomous driving have increased the need for efficient multi-view video compression.

Existing stereo video compression methods compress the left and right views sequentially, which limits parallelization and runtime efficiency. Low-Latency Neural Stereo Streaming (LLSS) is a new parallel stereo video coding method that addresses this:

  • A novel bidirectional feature-shifting module exploits the mutual information between views, paired with a joint cross-view prior model for entropy coding (a toy version is sketched below).
  • LLSS processes the left and right views simultaneously, reducing latency and improving rate-distortion performance over both neural and conventional codecs.

The approach saves 15.8% to 50.6% in bitrate on benchmark datasets, outperforming the SOTA.
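A toy sketch of what a bidirectional feature-shifting module might look like, under the assumption that the two views exchange a fraction of their channels shifted along the stereo baseline; this is illustrative only, not LLSS’s actual module.

```python
import torch
import torch.nn as nn

class BidirectionalShift(nn.Module):
    """Toy bidirectional feature shifting: a fraction of channels is shifted
    horizontally (mimicking disparity) and swapped between views, so both
    views can be processed in parallel while sharing mutual information."""
    def __init__(self, shift=4, frac=0.25):
        super().__init__()
        self.shift, self.frac = shift, frac

    def forward(self, left, right):                        # (B,C,H,W) each
        c = int(left.shape[1] * self.frac)
        l2r = torch.roll(left[:, :c], self.shift, dims=-1)
        r2l = torch.roll(right[:, :c], -self.shift, dims=-1)
        left_out = torch.cat([r2l, left[:, c:]], dim=1)    # right features into left
        right_out = torch.cat([l2r, right[:, c:]], dim=1)  # left features into right
        return left_out, right_out

l, r = BidirectionalShift()(torch.randn(1, 8, 16, 16), torch.randn(1, 8, 16, 16))
```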

Enabling natural facial expressions

With so many AI-generated faces and characters, facial expressions and expression transfer need to look more natural; this matters for gaming and content creation. Facial action unit (AU) intensity quantifies fine-grained expression behaviors, making it well suited for facial expression manipulation. In “AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement,” the researchers accurately manipulate AU intensity in high-resolution synthetic facial images.

  • AUEditNet, trained on only 18 subjects, manipulates intensity well across 12 AUs.
  • The authors use a dual-branch architecture to disentangle facial attributes and identity without extra loss functions or large batch sizes.
  • AUEditNet can condition the edit on either intensity values or a target image, without retraining the network or adding estimators.

This approach to facial attribute editing shows promise despite the dataset’s small subject count; a toy sketch of the dual-branch idea follows.
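For intuition, here is a minimal sketch of dual-branch conditioning: identity and AU intensities are encoded by separate branches and recombined by a decoder, so intensity can be edited while identity stays fixed. Every layer choice here is hypothetical, not AUEditNet’s design.

```python
import torch
import torch.nn as nn

class DualBranchEditor(nn.Module):
    """Toy dual-branch editor: one branch encodes identity features, the
    other encodes a 12-dimensional AU intensity vector; the decoder fuses
    them, so changing the intensities edits expression, not identity."""
    def __init__(self, feat_dim=256, n_aus=12, z=64):
        super().__init__()
        self.id_enc = nn.Linear(feat_dim, z)    # identity branch
        self.au_enc = nn.Linear(n_aus, z)       # attribute (AU intensity) branch
        self.dec = nn.Linear(2 * z, feat_dim)

    def forward(self, face_feats, au_intensities):
        zid = self.id_enc(face_feats)
        zau = self.au_enc(au_intensities)
        return self.dec(torch.cat([zid, zau], dim=-1))

edit = DualBranchEditor()(torch.randn(1, 256), torch.rand(1, 12))
```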

Technology demos

Qualcomm demonstrates its generative AI and computer vision research live to prove its practicality. Visit booth #1931 to try these technologies. Here are some of the demos on show.

Generative AI

The generative AI demos show Stable Diffusion with LoRA adapters on Android smartphones. LoRA adapters let users create high-quality Stable Diffusion images tailored to personal or creative preferences: users choose a LoRA adapter and set its strength to get the desired image.
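For readers unfamiliar with the strength knob, here is a generic LoRA sketch (not Qualcomm’s on-device implementation): the frozen base weight gets a low-rank update scaled by a `strength` factor, which is what an adapter-strength slider would modulate.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA layer: y = W x + strength * (B A) x, with W frozen and
    the low-rank matrices A, B trained (or shipped as an adapter)."""
    def __init__(self, base: nn.Linear, rank=8, strength=1.0):
        super().__init__()
        self.base = base.requires_grad_(False)   # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.strength = strength                 # the slider in the demo

    def forward(self, x):
        return self.base(x) + self.strength * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(320, 320), rank=8, strength=0.7)
y = layer(torch.randn(2, 320))
```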

A multimodal LLM on an Android phone is also shown. Qualcomm presents the Large Language and Vision Assistant (LLaVA), an LMM with more than 7 billion parameters that can hold multi-turn conversations about images using text and photographs. A reference design powered by the Snapdragon 8 Gen 3 mobile platform runs LLaVA on a smartphone at a responsive token rate.

Computer vision

Other interesting computer vision demonstrations at the booth:

  • Improved real-time, precise segmentation on edge devices: segments known and unknown classes in photos and videos using user inputs such as touching, drawing a box, or entering text.
  • Generative video portrait relighting: works with generative AI background replacement to improve facial realism in video chat.
  • On-device AI assistant: uses an audio-driven 3D avatar with lip-syncing and expressive facial motions to engage users.

Qualcomm also demonstrates a cutting-edge face performance capture and retargeting system that captures complex facial expressions and transfers them onto digital avatars, making digital interactions more realistic and personalized.

Autonomous vehicle (AV) technology

Qualcomm’s Snapdragon Ride advanced driver-assistance system demos cover perception, mapping, fusion, and vehicle control using a modular stack approach for automotive applications. The showcase also includes driver monitoring system improvements and generative AI-augmented camera sensor data for model training in challenging settings.

Collaborative workshops

  • Efficient Large Vision Models (eLVM): Many computer vision tasks now rely on large vision models (LVMs). Qualcomm AI Research is co-organizing the Efficient Large Vision Models Workshop to make LVMs more accessible to researchers and practitioners by focusing on computational efficiency. Exploring ideas for adapting LVMs to downstream tasks and domains without expensive training and fine-tuning empowers the community to do research on a limited computational budget, and accelerating LVM inference allows them to be used in real-time applications on low-compute platforms like cars and phones.
  • Omnidirectional Computer Vision (OmniCV): A wide field of view makes omnidirectional cameras popular in automotive, security, photography, and augmented/virtual reality. To connect omnidirectional vision research and practice, Qualcomm AI Research is co-organizing the Omnidirectional Computer Vision Workshop. The workshop bridges the formative research behind these achievements and the commercial products that use this technology, and it promotes novel imaging-modality methods and applications that will advance the field.

Join us in advancing generative AI and computer vision

This is just a sampling of Qualcomm’s CVPR 2024 highlights. Come to Qualcomm booth #1931 to learn about their research, see their demos, and discuss AI careers.

Read more on Govindhtech.com

Text
earhartsplane

Since we’re all talking about plagiarism now, I’d like to share this video which came out last year about a paper accepted at CVPR 2022:

For the people not in the know, the Computer Vision and Pattern Recognition conference is the biggest conference in computer science. Last year, in 2022, the paper featured in the video got accepted. A few days later, this video was posted. The first author, a PhD student, apologized and the paper was retracted and removed from the proceedings. Hilariously, the first reaction of the co-authors, including a professor at Seoul National University, was to say that they had nothing to do with it.


My point here is that scientific papers are not rigorously checked for plagiarism, and a background in academia tells you absolutely nothing about whether or not someone will be diligent in avoiding plagiarism. The biggest difference is that there are consequences if you’re caught.

I also don’t want people to be too harsh on the first author of this paper, or to think the situation is equivalent to the whole Somerton debacle. For starters, you don’t get paid for publishing papers; you (or more commonly your university) pay the publishers. But the phrase publish or perish exists for a reason, and everyone in the field wants to get published at CVPR, because it’s supposed to show that you’re great at research. Additionally, the number of papers and the prestige of the venues they’re published in are criteria on which you will be evaluated as a researcher and a university employee.

The way I see it, there are basically two kinds of plagiarism shown in the video. The first one concerns sentences that are lifted completely unchanged from other papers. This is bad, and it is plagiarism, but I can see how this would happen. Most instances of this appear in the introduction and in background information, so if you’re insecure about your mastery of English and it’s not about your contribution anyway, I can understand how you would take the shortcut of copy-pasting and tell yourself that it’s just so that the rest of the paper makes sense, and why waste time on phrasing things differently if others have done it already, and it’s not like there are a million ways to write these equations anyway.

Let me be clear. I don’t approve, or condone. It’s still erasing the work of the people who took the time and pain to phrase these things. It’s still plagiarism. But I understand how you could get to that point.

The second kind of plagiarism is a way bigger deal in my opinion. At 0:37, we can see that one of the contributions of the paper is also lifted from another paper. Egregiously, the passage includes “To the best of our knowledge, this is the first […]”, which is a hell of a thing to copy-paste. So this is not only lazily passing off other people’s words as your own, it’s also pretending that you’re making a contribution you damn well know other people have already made. I also wasn’t able to find a version of the plagiarized article that had been published in a peer-reviewed venue, which might mean that the authors submitted it, got rejected, and published it on arXiv (a website on which authors can put their papers so that they’re accessible to the public, but which doesn’t “count” as a publication because it’s not peer-reviewed. You can also put papers that are under review or have been published there, as long as you’re careful with the copyrights and the double-blind process). And then parts of it were published at CVPR under someone else’s name.

I think there’s also a third kind of plagiarism going on here, one that is incredibly common in academia, but that is not shown in the video. That’s the FIVE other authors, including a professor, who were apparently happy to add their name to the paper but obviously didn’t do anything meaningful since they didn’t notice how much plagiarism was going on.

Text
samsungnewsonline

Samsung Research Gets 20 Paper Acceptances for CVPR 2022 – Samsung Global Newsroom


Samsung Research R&D centers around the world will present a total of 20 papers at this year’s Computer Vision and Pattern Recognition (CVPR) conference.
CVPR is a world-renowned international conference on artificial intelligence (AI), co-organized by the Institute of Electrical and Electronics Engineers (IEEE) and the Computer Vision Foundation (CVF), that has been held since 1983. CVPR…



Text
negativemind

Kubric: a data-generation pipeline for annotated machine learning videos


It has been a while since my last computer vision topic.

Google Research has open-sourced a tool for generating annotated videos for machine learning ↓
Kubric

Kubric is a data-generation pipeline for creating semi-realistic videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.

 ※ This project is still in alpha and may change significantly.

Motivation and design
Training and evaluating machine learning systems, especially for unsupervised multi-object video understanding, requires better data. Existing systems rely on toy…



Text
negativemind

hloc: an SfM implementation with accuracy boosted by SuperGlue

A Structure-from-Motion (SfM) topic.

An implementation has been released that applies SuperGlue, which uses machine learning to improve feature-point matching accuracy, to SfM ↓

hloc – the hierarchical localization toolbox

hloc is a modular toolbox for state-of-the-art 6-DoF visual localization.
It implements Hierarchical Localization, leveraging image retrieval and feature matching, and it is fast, accurate, and scalable.
This codebase won the indoor/outdoor localization challenge at CVPR 2020. SuperGlue is a graph neural network for feature matching.

With hloc, you can:

  • CVPR…
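The post is cut off above, but for reference, here is a sketch of an hloc SfM run modeled on the repository’s documented examples; function names and signatures vary between versions, so treat every call as an assumption to verify against the repo’s demo notebooks.

```python
from pathlib import Path
from hloc import extract_features, match_features, reconstruction

images = Path("datasets/my_scene/images/")
outputs = Path("outputs/my_scene/")
sfm_pairs = outputs / "pairs-sfm.txt"  # image pairs, e.g. from retrieval or exhaustive matching

# SuperPoint features matched with SuperGlue, as in the README examples.
feature_conf = extract_features.confs["superpoint_aachen"]
matcher_conf = match_features.confs["superglue"]

features = extract_features.main(feature_conf, images, outputs)
matches = match_features.main(matcher_conf, sfm_pairs, feature_conf["output"], outputs)
model = reconstruction.main(outputs / "sfm", images, sfm_pairs, features, matches)
```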


Photo
pavel-nosok

R&D Roundup: Tech giants unveil breakthroughs at computer vision summit

Computer vision summit CVPR has just (virtually) taken place, and like other CV-focused conferences, there are quite a few interesting papers.

Text
pavel-nosok

Facebook detection challenge winners spot deepfakes with 82% accuracy

The top-performing AI model from Facebook’s deepfake detection challenge can pick out distorted videos with 82% accuracy.


Text
negativemind

Pix2Pix: image-to-image translation with CGANs

Continuing my study of GAN methods after GAN, DCGAN, and CGAN.
https://blog.negativemind.com/2019/10/05/conditional-gan/

Writing these articles in order, I have finally reached Pix2Pix.

Pix2Pix

Pix2Pix is a generative method proposed in the paper Image-to-Image Translation with Conditional Adversarial Networks, presented at CVPR 2017.

In a broad sense, Pix2Pix is also a kind of CGAN (conditional GAN).
Where CGAN learns correspondences from pairs of condition vectors and images as training data, Pix2Pix learns correspondences from pairs of condition images and images. In other words, Pix2Pix uses a condition image instead of a condition vector, turning the task into image-to-image translation…
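Since the post is truncated, here is a self-contained toy sketch of the core idea it describes: the Pix2Pix discriminator judges (condition image, image) pairs instead of CGAN’s (condition vector, image) pairs. The layers are deliberately minimal stand-ins for the paper’s U-Net generator and PatchGAN discriminator.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator; the real Pix2Pix uses a U-Net and a PatchGAN.
G = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())
D = nn.Sequential(nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(64, 1, 4, stride=2, padding=1))   # patch-wise logits

cond = torch.randn(1, 3, 64, 64)        # condition image (e.g. a sketch or label map)
real = torch.randn(1, 3, 64, 64)        # ground-truth target image
fake = G(cond)

bce = nn.BCEWithLogitsLoss()
d_real = D(torch.cat([cond, real], 1))  # the discriminator sees (condition, image) pairs
d_fake = D(torch.cat([cond, fake], 1))
loss_D = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
# Pix2Pix also adds an L1 term pulling the generated image toward the target.
loss_G = bce(d_fake, torch.ones_like(d_fake)) + nn.functional.l1_loss(fake, real)
```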


Text
abdulaziz2026

A CVPR presentation on speech and gesture prediction.

CVPR = Computer Vision and Pattern Recognition

Text
ssblog33

Smart Technologies That Can See Behind Walls Are Violating Privacy!

Author: Phillip Schneider, ACTIVIST POST, July 9, 2018

Translated by: Ercan Caner, Sun Savunma Net, September 3, 2017

A research team from MIT (Massachusetts Institute of Technology), using artificial intelligence, has developed a system that can monitor the movements of people beyond solid walls, and their locations…


Link
youjirec

visualslam.ai

Time   Title
08:30  Welcome: Ronald Clark (Imperial)
08:45  Invited Talk: Jitendra Malik (FAIR & UC Berkeley)
09:15  Invited Talk: Simon Lucey (CMU Robotics Institute)
09:45  Oral 1: SuperPoint: Self-Supervised Interest Point Detection and Description
10:15  Morning Break
10:35  Invited Talk: Jan Kautz (NVIDIA)
11:05  Invited Talk: Daniel Cremers (TUM)
11:35  Invited Talk: Katerina Fragkiadaki (CMU)
12:05  Oral 2: Global Pose Estimation with an Attention-based Recurrent Network
12:35  Lunch and Poster Session
14:35  Invited Talk: Noah Snavely (Cornell / Google)
15:00  Invited Talk: Vladlen Koltun (Intel)
15:30  Invited Talk: Matthias Niessner (TUM)
16:30  Panel Discussion: Challenges, Potentials and The Future of Machine Learning for SLAM. Chair: Andrew Davison (Imperial)
17:30  Dinner

Link
youjirec

cvpaper.challenge
cvpaperchallenge.github.io
Link
youjirec

cvpaperchallenge/CVPR2018_Survey
github.com
Photo
scitechman

Finding Faces in a Crowd

Spotting a face in a crowd, or recognizing any small or distant object within a large image, is a major challenge for…

Link
disktnk

10 Papers from ICML and CVPR
leotam.github.io
Text
hurutoriya

CVPR2016 notes

Note: my own notes on CVPR2016.

http://cvpr2016.thecvf.com/program/

Acceptance Rates

This year, we received a record 2145 valid submissions to the main conference, of which 1865 were fully reviewed (the others were either administratively rejected for technical or ethical reasons or withdrawn before review). A total of 643 papers were accepted (29.9% of valid submissions). 83 of these were allocated long oral presentations (3.9% of valid submissions) and 123 were allocated short spotlight presentations (total of 9.7% of valid submissions have live presentations). All papers will appear also in the interactive session where posters are presented.

Tutorials

  • 6/26/16 (full day) Low-Rank and Sparse Modeling for Visual Analytics. René Vidal, Ehsan Elhamifar, Zhouchen Lin, Jiashi Feng, Sheng Li, Yun Fu
  • 6/26/16 (PM only) Computer Vision and Applied Deep Learning with Mathematica. Sebastian Bodenstein, Matthias Odisio
  • 6/26/16 (PM only) Mathematics of Deep Learning. René Vidal, Guillermo Sapiro, Joan Bruna, Benjamin Haeffele, Raja Giryes
  • 6/26/16 (PM only) Recent Image Search Techniques. Sungeui Yoon and Zhe Lin

Oral

  • 2 Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
  • 3 You Only Look Once: Unified, Real-Time Object Detection. Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
  • 13 Unsupervised Learning of Edges. Yin Li, Manohar Paluri, James M. Rehg, Piotr Dollár
  • 15 Deeply-Recursive Convolutional Network for Image Super-Resolution. Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee
  • 16 Accurate Image Super-Resolution Using Very Deep Convolutional Networks. Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee
  • 15 Self-Adaptive Matrix Completion for Heart Rate Estimation From Face Videos Under Realistic Conditions. Sergey Tulyakov, Xavier Alameda-Pineda, Elisa Ricci, Lijun Yin, Jeffrey F. Cohn, Nicu Sebe
  • 4 Personalizing Human Video Pose Estimation. James Charles, Tomas Pfister, Derek Magee, David Hogg, Andrew Zisserman
  • 14 Scalable Sparse Subspace Clustering by Orthogonal Matching Pursuit. Chong You, Daniel Robinson, René Vidal

Spotlight Session

  • 19 Affinity CNN: Learning Pixel-Centric Pairwise Relations for Figure/Ground Embedding. Michael Maire, Takuya Narihira, Stella X. Yu
  • 21 Groupwise Tracking of Crowded Similar-Appearance Targets From Low-Continuity Image Sequences. Hongkai Yu, Youjie Zhou, Jeff Simmons, Craig P. Przybyla, Yuewei Lin, Xiaochuan Fan, Yang Mi, Song Wang
  • 23 What Players Do With the Ball: A Physically Constrained Interaction Modeling. Andrii Maksai, Xinchao Wang, Pascal Fua
  • 6 We Are Humor Beings: Understanding and Predicting Visual Humor. Arjun Chandrasekaran, Ashwin K. Vijayakumar, Stanislaw Antol, Mohit Bansal, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh
  • 7 Where to Look: Focus Regions for Visual Question Answering. Kevin J. Shih, Saurabh Singh, Derek Hoiem
  • 6 Simultaneous Clustering and Model Selection for Tensor Affinities. Zhuwen Li, Shuoguang Yang, Loong-Fah Cheong, Kim-Chuan Toh
  • 7 Discriminatively Embedded K-Means for Multi-View Clustering. Jinglin Xu, Junwei Han, Feiping Nie

Poster

  • 40 Pairwise Matching Through Max-Weight Bipartite Belief Propagation. Zhen Zhang, Qinfeng Shi, Julian McAuley, Wei Wei, Yanning Zhang, Anton van den Hengel
  • 26 Video2GIF: Automatic Generation of Animated GIFs From Video. Michael Gygli, Yale Song, Liangliang Cao
  • 37 SketchNet: Sketch Classification With Web Images. Hua Zhang, Si Liu, Changqing Zhang, Wenqi Ren, Rui Wang, Xiaochun Cao
  • 49 Similarity Learning With Spatial Constraints for Person Re-Identification. Dapeng Chen, Zejian Yuan, Badong Chen, Nanning Zheng
  • 62 Efficient Large-Scale Similarity Search Using Matrix Factorization. Ahmet Iscen, Michael Rabbat, Teddy Furon
  • 73 Eye Tracking for Everyone. Kyle Krafka, Aditya Khosla, Petr Kellnhofer, Suchendra Bhandarkar, Wojciech Matusik, Antonio Torralba
  • 78 iLab-20M: A Large-Scale Controlled Object Dataset to Investigate Deep Learning. Ali Borji, Saeed Izadi, Laurent Itti
  • 68 Iterative Instance Segmentation. Ke Li, Bharath Hariharan, Jitendra Malik

Link
seonhooh

OpenFace

This is a Python and Torch implementation of the DNN-based face recognition paper1 presented at CVPR 2015.


  1. Florian Schroff, Dmitry Kalenichenko, and James Philbin, FaceNet: A Unified Embedding for Face Recognition and Clustering ↩︎
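The cited FaceNet paper trains its embedding with a triplet loss over L2-normalized 128-D embeddings (margin 0.2 in the paper). A minimal PyTorch sketch of that loss:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss: pull the anchor embedding toward a
    positive (same identity) and push it from a negative (different
    identity) by at least `margin`, on L2-normalized embeddings."""
    a, p, n = (F.normalize(x, dim=-1) for x in (anchor, positive, negative))
    d_ap = (a - p).pow(2).sum(-1)   # squared distance anchor-positive
    d_an = (a - n).pow(2).sum(-1)   # squared distance anchor-negative
    return F.relu(d_ap - d_an + margin).mean()

loss = triplet_loss(torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128))
```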

Text
hurutoriya

The motivation behind the CVPR Challenge is wonderful

The CVPR Challenge is an initiative run jointly by Kataoka-san of AIST and the Akio Nakamura Laboratory at Tokyo Denki University, aiming to read through all of CVPR 2015. The papers they read are summarized and shared on SlideShare, but above all I thought the motivation for starting it was great, so I wrote this post. [CVPR Challenge in SlideShare](http://www.slideshare.net/cvpaperchallenge) [Kataoka-san's blog entry](http://hirokatsu16.blog.fc2.com/blog-entry-108.html)