#CVPR


Link
futurride

Nvidia Research kicks off CVPR with Autonomous Grand Challenge win
futurride.com
Text
govindhtech

Computer Vision at CVPR 2024 with Qualcomm AI research

CVPR 2024 workshops

Qualcomm Technologies is excited to present five research papers on the main track, 11 at co-located workshops, 10 demos, two co-organized workshops, and other research community events at the Computer Vision and Pattern Recognition Conference (CVPR) 2024, starting Monday, June 17. With an acceptance rate of about 25%, CVPR honors the best research and previews the future of generative artificial intelligence (AI) for photos and videos, XR, automotive, computational photography, robotic vision, and more.

At top conferences like CVPR, meticulously peer-reviewed papers establish the new state of the art (SOTA) and contribute to the community. Qualcomm Technologies has had several papers accepted in generative AI and perception that advance computer vision research.

Generative AI research

Text-to-image diffusion models enable fast, high-definition visual content creation, but at high computational expense. The paper “Clockwork UNets for Efficient Diffusion Models” introduces Clockwork Diffusion to improve model efficiency. These models run computationally intensive UNet-based denoising at every generation step; however, the paper observes that not all operations affect output quality equally.

UNet layers that operate on high-resolution feature maps are sensitive to slight perturbations, while low-resolution feature maps affect the output less. Based on this observation, Clockwork Diffusion periodically reuses computation from previous denoising steps to approximate the low-resolution feature maps.

The method reduces FLOPs for Stable Diffusion v1.5 by 32% while maintaining or improving perceptual scores. It also runs on low-power edge devices, making it scalable for real-world applications. Clockwork is covered in this blog post on efficient generative AI for photos and videos.
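The core trick is a caching schedule over denoising steps. Below is a minimal, hypothetical Python sketch of that idea; the split into `unet_lo` and `unet_hi`, the clock period, and the toy update rule are illustrative assumptions, not the paper’s implementation.

```python
import torch

def clockwork_denoise(unet_hi, unet_lo, x, timesteps, clock=4):
    """Clockwork-style caching (sketch): recompute the expensive
    low-resolution UNet path only every `clock` steps and reuse the
    cached features in between, while the high-res path runs every step."""
    cached_lo = None
    for i, t in enumerate(timesteps):
        if i % clock == 0:
            cached_lo = unet_lo(x, t)       # full low-res computation
        eps = unet_hi(x, t, cached_lo)      # cheap step reusing low-res features
        x = x - 0.1 * eps                   # toy update; a real sampler uses a scheduler step
    return x

# Toy stand-ins so the sketch runs end to end.
lo = lambda x, t: torch.tanh(x)
hi = lambda x, t, feats: x - feats
out = clockwork_denoise(hi, lo, torch.randn(1, 4, 64, 64), range(20))
```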

CVPR papers

Solving data scarcity in optical flow estimation

The Qualcomm paper “OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation” proposes a novel method, OCAI, to handle data scarcity in optical flow estimation:

  • Generates intermediate video frames and optical flows for robust frame interpolation.
  • Uses occlusion-aware forward-warping to resolve pixel value discrepancies and fill in missing values (sketched below).

This research also introduces a teacher-student semi-supervised learning strategy that trains a student model on the interpolated frames and flows, boosting optical flow accuracy. Benchmarks show that OCAI delivers perceptually better interpolation and higher optical flow accuracy.

Better optical flow accuracy can improve robot navigation, car movement detection, and more.
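To make the occlusion-aware forward-warping bullet above concrete, here is a toy NumPy sketch of forward warping with weighted splatting and a hole mask; the weighting and hole handling are illustrative assumptions, not OCAI’s actual method.

```python
import numpy as np

def forward_warp(img, flow, weight=None):
    """Toy occlusion-aware forward warping: splat each source pixel to its
    flow target, resolve collisions with per-pixel weights, and report the
    unreached pixels (holes) that an interpolator would need to fill."""
    H, W, C = img.shape
    out = np.zeros((H, W, C), dtype=np.float64)
    acc = np.zeros((H, W, 1), dtype=np.float64)
    w = weight if weight is not None else np.ones((H, W, 1))
    for y in range(H):
        for x in range(W):
            ty = int(round(y + flow[y, x, 1]))   # target row
            tx = int(round(x + flow[y, x, 0]))   # target column
            if 0 <= ty < H and 0 <= tx < W:
                out[ty, tx] += w[y, x] * img[y, x]
                acc[ty, tx] += w[y, x]
    hit = acc[..., 0] > 0
    out[hit] /= acc[hit]
    return out, ~hit                             # warped image and hole mask

warped, holes = forward_warp(np.random.rand(32, 32, 3), np.zeros((32, 32, 2)))
```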

Enhancing depth completion accuracy

With the paper “DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions,” Qualcomm AI Research achieves state-of-the-art depth completion. The work combines 2D and 3D attention to improve accuracy without iterative spatial propagation. Applying attention to 2D features in the bottleneck and skip connections:

  • The baseline convolutional depth completion model is improved to match sophisticated transformer-based models.
  • The authors lift the 2D features to generate a 3D point cloud and process it with a 3D point transformer, letting the model explicitly learn and utilize 3D geometric features (see the sketch below).

DeCoTR shows superior generalizability over current techniques by setting new SOTA performance on depth completion benchmarks. This has potential applications in autonomous driving, robotics, and augmented reality, where depth estimation is critical.
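As a rough illustration of the lifting step described above, the following PyTorch sketch unprojects pixel features into 3D points using depth and camera intrinsics, then attends over the resulting tokens; the module structure is a toy assumption, not DeCoTR’s architecture.

```python
import torch
import torch.nn as nn

class Lifted3DAttention(nn.Module):
    """Sketch of 2D->3D lifting: back-project the pixel grid with depth and
    intrinsics K into a point cloud, add a learned encoding of the 3D
    positions to the 2D features, and run attention over the points.
    Assumes `dim` equals the feature channel count."""
    def __init__(self, dim):
        super().__init__()
        self.pos = nn.Linear(3, dim)                        # encode 3D position
        self.attn = nn.MultiheadAttention(dim, 4, batch_first=True)

    def forward(self, feats2d, depth, K):                   # feats2d: (B,C,H,W), depth: (B,H,W)
        B, C, H, W = feats2d.shape
        ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
        pix = torch.stack([xs, ys, torch.ones_like(xs)], -1).float().reshape(-1, 3)
        rays = pix @ torch.linalg.inv(K).T                  # back-project pixel grid
        pts = rays.unsqueeze(0) * depth.reshape(B, -1, 1)   # scale rays by depth -> 3D points
        tokens = feats2d.flatten(2).transpose(1, 2) + self.pos(pts)
        out, _ = self.attn(tokens, tokens, tokens)          # attention over 3D points
        return out

m = Lifted3DAttention(16)
y = m(torch.randn(1, 16, 8, 8), torch.rand(1, 8, 8), torch.eye(3))
```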

Enhancing stereo video compression

Emerging technologies like virtual reality and autonomous driving have increased the need for efficient multi-view video compression.

Existing stereo video compression methods compress the left and right views sequentially, which limits parallelization and runtime efficiency. Low-Latency Neural Stereo Streaming (LLSS) is a new parallel stereo video coding method that addresses this:

  • A novel bidirectional feature-shifting module exploits the mutual information between views, paired with a joint cross-view prior model for entropy coding (a toy version is sketched below).
  • LLSS processes the left and right views simultaneously, reducing latency and improving rate-distortion performance over both neural and conventional codecs.

The approach saves 15.8% to 50.6% in bitrate on benchmark datasets, outperforming the SOTA.
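A toy sketch of what a bidirectional feature-shifting module might look like, under the assumption that the two views exchange a fraction of their channels shifted along the stereo baseline; this is illustrative only, not LLSS’s actual module.

```python
import torch
import torch.nn as nn

class BidirectionalShift(nn.Module):
    """Toy bidirectional feature shifting: a fraction of channels is shifted
    horizontally (mimicking disparity) and swapped between views, so both
    views can be processed in parallel while sharing mutual information."""
    def __init__(self, shift=4, frac=0.25):
        super().__init__()
        self.shift, self.frac = shift, frac

    def forward(self, left, right):                        # (B,C,H,W) each
        c = int(left.shape[1] * self.frac)
        l2r = torch.roll(left[:, :c], self.shift, dims=-1)
        r2l = torch.roll(right[:, :c], -self.shift, dims=-1)
        left_out = torch.cat([r2l, left[:, c:]], dim=1)    # right features into left
        right_out = torch.cat([l2r, right[:, c:]], dim=1)  # left features into right
        return left_out, right_out

l, r = BidirectionalShift()(torch.randn(1, 8, 16, 16), torch.randn(1, 8, 16, 16))
```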

Enabling natural facial expressions

With so many AI-generated faces and characters, facial expressions and expression transfer need to look more natural; this matters for gaming and content creation. Facial action unit (AU) intensity quantifies fine-grained expression behaviors, making it well suited for facial expression manipulation. In “AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement,” the researchers accurately manipulate AU intensity in high-resolution synthetic facial images.

  • AUEditNet, trained on only 18 subjects, manipulates intensity well across 12 AUs.
  • The authors use a dual-branch architecture to disentangle facial attributes and identity without extra loss functions or large batch sizes.
  • AUEditNet can condition the edit on either intensity values or a target image, without retraining the network or adding estimators.

This approach to facial attribute editing shows promise despite the dataset’s small subject count; a toy sketch of the dual-branch idea follows.
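For intuition, here is a minimal sketch of dual-branch conditioning: identity and AU intensities are encoded by separate branches and recombined by a decoder, so intensity can be edited while identity stays fixed. Every layer choice here is hypothetical, not AUEditNet’s design.

```python
import torch
import torch.nn as nn

class DualBranchEditor(nn.Module):
    """Toy dual-branch editor: one branch encodes identity features, the
    other encodes a 12-dimensional AU intensity vector; the decoder fuses
    them, so changing the intensities edits expression, not identity."""
    def __init__(self, feat_dim=256, n_aus=12, z=64):
        super().__init__()
        self.id_enc = nn.Linear(feat_dim, z)    # identity branch
        self.au_enc = nn.Linear(n_aus, z)       # attribute (AU intensity) branch
        self.dec = nn.Linear(2 * z, feat_dim)

    def forward(self, face_feats, au_intensities):
        zid = self.id_enc(face_feats)
        zau = self.au_enc(au_intensities)
        return self.dec(torch.cat([zid, zau], dim=-1))

edit = DualBranchEditor()(torch.randn(1, 256), torch.rand(1, 12))
```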

Technology demos

Qualcomm demonstrates its generative AI and computer vision research live to prove its practicality. Visit booth #1931 to try these technologies. Here are some of the demos on show.

Generative AI

The generative AI demos show Stable Diffusion with LoRA adapters on Android smartphones. LoRA adapters let users create high-quality Stable Diffusion images tailored to personal or creative preferences: users choose a LoRA adapter and set its strength to get the desired image.
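For readers unfamiliar with the strength knob, here is a generic LoRA sketch (not Qualcomm’s on-device implementation): the frozen base weight gets a low-rank update scaled by a `strength` factor, which is what an adapter-strength slider would modulate.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA layer: y = W x + strength * (B A) x, with W frozen and
    the low-rank matrices A, B trained (or shipped as an adapter)."""
    def __init__(self, base: nn.Linear, rank=8, strength=1.0):
        super().__init__()
        self.base = base.requires_grad_(False)   # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.strength = strength                 # the slider in the demo

    def forward(self, x):
        return self.base(x) + self.strength * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(320, 320), rank=8, strength=0.7)
y = layer(torch.randn(2, 320))
```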

A multimodal LLM on an Android phone is also shown. Qualcomm presents the Large Language and Vision Assistant (LLaVA), an LMM with more than 7 billion parameters that can hold multi-turn conversations about images using text and photographs. A reference design powered by the Snapdragon 8 Gen 3 mobile platform runs LLaVA on a smartphone at a responsive token rate.

Computer vision

Other interesting computer vision demonstrations at the booth:

  • Improved real-time, precise segmentation on edge devices: segments known and unknown classes in photos and videos using user inputs such as touching, drawing a box, or entering text.
  • Generative video portrait relighting: works with generative AI background replacement to improve facial realism in video chat.
  • On-device AI assistant: uses an audio-driven 3D avatar with lip-syncing and expressive facial motions to engage users.

Qualcomm also demonstrates a cutting-edge face performance capture and retargeting system that captures complex facial expressions and transfers them onto digital avatars, making digital interactions more realistic and personalized.

Autonomous vehicle (AV) technology

Qualcomm’s Snapdragon Ride advanced driver-assistance system demos cover perception, mapping, fusion, and vehicle control using a modular stack approach for automotive applications. The showcase also includes driver monitoring system improvements and generative AI-augmented camera sensor data for model training in challenging settings.

Collaborative workshops

  • Efficient Large Vision Models (eLVM): Many computer vision tasks now rely on large vision models (LVMs). Qualcomm AI Research is co-organizing the Efficient Large Vision Models Workshop to make LVMs more accessible to researchers and practitioners by focusing on computational efficiency. Exploring ideas for adapting LVMs to downstream tasks and domains without expensive training and fine-tuning empowers the community to do research on a limited computational budget, and accelerating LVM inference allows them to be used in real-time applications on low-compute platforms like cars and phones.
  • Omnidirectional Computer Vision (OmniCV): A wide field of view makes omnidirectional cameras popular in automotive, security, photography, and augmented/virtual reality. To connect omnidirectional vision research and practice, Qualcomm AI Research is co-organizing the Omnidirectional Computer Vision Workshop. The workshop bridges the formative research behind these achievements and the commercial products that use this technology, and it promotes novel imaging-modality methods and applications that will advance the field.

Join us in advancing generative AI and computer vision

This is just a sampling of Qualcomm’s CVPR 2024 highlights. Come to Qualcomm booth #1931 to learn about their research, see their demos, and discuss AI careers.

Read more on Govindhtech.com

Text
earhartsplane

Since we’re all talking about plagiarism now, I’d like to share this video which came out last year about a paper accepted at CVPR 2022:

For the people not in the know, the Computer Vision and Pattern Recognition conference is the biggest conference in computer science. Last year, in 2022, the paper featured in the video got accepted. A few days later, this video was posted. The first author, a PhD student, apologized and the paper was retracted and removed from the proceedings. Hilariously, the first reaction of the co-authors, including a professor at Seoul National University, was to say that they had nothing to do with it.


My point here is that scientific papers are not rigorously checked for plagiarism, and a background in academia tells you absolutely nothing about whether or not someone will be diligent in avoiding plagiarism. The biggest difference is that there are consequences if you’re caught.

I also don’t want people to be too harsh on the first author of this paper, or to think the situation is equivalent to the whole Somerton debacle. For starters, you don’t get paid for publishing papers; you (or more commonly your university) pay the publishers. But the phrase publish or perish exists for a reason, and everyone in the field wants to get published at CVPR, because it’s supposed to show that you’re great at research. Additionally, the number of papers and the prestige of the venues they’re published in are criteria on which you will be evaluated as a researcher and a university employee.

The way I see it, there are basically two kinds of plagiarism shown in the video. The first one concerns sentences that are lifted completely unchanged from other papers. This is bad, and it is plagiarism, but I can see how this would happen. Most instances of this appear in the introduction and in background information, so if you’re insecure about your mastery of English and it’s not about your contribution anyway, I can understand how you would take the shortcut of copy-pasting and tell yourself that it’s just so that the rest of the paper makes sense, and why waste time on phrasing things differently if others have done it already, and it’s not like there are a million ways to write these equations anyway.

Let me be clear. I don’t approve, or condone. It’s still erasing the work of the people who took the time and pain to phrase these things. It’s still plagiarism. But I understand how you could get to that point.

The second kind of plagiarism is a way bigger deal in my opinion. At 0:37, we can see that one of the contributions of the paper is also lifted from another paper. Egregiously, the passage includes “To the best of our knowledge, this is the first […]”, which is a hell of a thing to copy-paste. So this is not only lazily passing off other people’s words as your own, it’s also pretending that you’re making a contribution you damn well know other people have already made. I also wasn’t able to find a version of the plagiarized article that had been published in a peer-reviewed venue, which might mean that the authors submitted it, got rejected, and published it on arXiv (a website on which authors can put their papers so that they’re accessible to the public, but which doesn’t “count” as a publication because it’s not peer-reviewed. You can also put papers that are under review or have been published there, as long as you’re careful with the copyrights and the double-blind process). And then parts of it were published at CVPR under someone else’s name.

I think there’s also a third kind of plagiarism going on here, one that is incredibly common in academia, but that is not shown in the video. That’s the FIVE other authors, including a professor, who were apparently happy to add their name to the paper but obviously didn’t do anything meaningful since they didn’t notice how much plagiarism was going on.

Text
samsungnewsonline

Samsung Research Gets 20 Paper Acceptances for CVPR 2022 – Samsung Global Newsroom


Samsung Research R&D centers around the world will present a total of 20 papers at this year’s Computer Vision and Pattern Recognition (CVPR) conference.
CVPR is a world-renowned international conference on artificial intelligence (AI), co-organized by the Institute of Electrical and Electronics Engineers (IEEE) and the Computer Vision Foundation (CVF), that has been held since 1983. CVPR…



Text
negativemind

Kubric: a data-generation pipeline for annotated machine learning videos


It has been a while since my last computer vision topic.

Google Research has open-sourced a tool for generating annotated videos for machine learning ↓
Kubric

Kubric is a data-generation pipeline for creating semi-realistic videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.

 ※ This project is still in alpha and may change significantly.

Motivation and design
Training and evaluating machine learning systems, especially for unsupervised multi-object video understanding, requires better data. Existing systems rely on toy…



Text
negativemind

hloc: an SfM implementation with accuracy boosted by SuperGlue

A Structure-from-Motion (SfM) topic.

An implementation has been released that applies SuperGlue, which uses machine learning to improve feature-point matching accuracy, to SfM ↓

hloc – the hierarchical localization toolbox

hloc is a modular toolbox for state-of-the-art 6-DoF visual localization.
It implements Hierarchical Localization, leveraging image retrieval and feature matching, and it is fast, accurate, and scalable.
This codebase won the indoor/outdoor localization challenge at CVPR 2020. SuperGlue is a graph neural network for feature matching.

With hloc, you can:

  • CVPR…
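The post is cut off above, but for reference, here is a sketch of an hloc SfM run modeled on the repository’s documented examples; function names and signatures vary between versions, so treat every call as an assumption to verify against the repo’s demo notebooks.

```python
from pathlib import Path
from hloc import extract_features, match_features, reconstruction

images = Path("datasets/my_scene/images/")
outputs = Path("outputs/my_scene/")
sfm_pairs = outputs / "pairs-sfm.txt"  # image pairs, e.g. from retrieval or exhaustive matching

# SuperPoint features matched with SuperGlue, as in the README examples.
feature_conf = extract_features.confs["superpoint_aachen"]
matcher_conf = match_features.confs["superglue"]

features = extract_features.main(feature_conf, images, outputs)
matches = match_features.main(matcher_conf, sfm_pairs, feature_conf["output"], outputs)
model = reconstruction.main(outputs / "sfm", images, sfm_pairs, features, matches)
```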


Photo
pavel-nosok

R&D Roundup: Tech giants unveil breakthroughs at computer vision summit

Computer vision summit CVPR has just (virtually) taken place, and like other CV-focused conferences, there are quite a few interesting papers.

Text
pavel-nosok

Facebook detection challenge winners spot deepfakes with 82% accuracy

The top-performing AI model from Facebook’s deepfake detection challenge can pick out distorted videos with 82% accuracy.


Text
negativemind

Pix2Pix: image-to-image translation with CGANs

Continuing my study of GAN methods after GAN, DCGAN, and CGAN.
https://blog.negativemind.com/2019/10/05/conditional-gan/

Writing these articles in order, I have finally reached Pix2Pix.

Pix2Pix

Pix2Pix is a generative method proposed in the paper Image-to-Image Translation with Conditional Adversarial Networks, presented at CVPR 2017.

In a broad sense, Pix2Pix is also a kind of CGAN (conditional GAN).
Where CGAN learns correspondences from pairs of condition vectors and images as training data, Pix2Pix learns correspondences from pairs of condition images and images. In other words, Pix2Pix uses a condition image instead of a condition vector, turning the task into image-to-image translation…
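Since the post is truncated, here is a self-contained toy sketch of the core idea it describes: the Pix2Pix discriminator judges (condition image, image) pairs instead of CGAN’s (condition vector, image) pairs. The layers are deliberately minimal stand-ins for the paper’s U-Net generator and PatchGAN discriminator.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator; the real Pix2Pix uses a U-Net and a PatchGAN.
G = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())
D = nn.Sequential(nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(64, 1, 4, stride=2, padding=1))   # patch-wise logits

cond = torch.randn(1, 3, 64, 64)        # condition image (e.g. a sketch or label map)
real = torch.randn(1, 3, 64, 64)        # ground-truth target image
fake = G(cond)

bce = nn.BCEWithLogitsLoss()
d_real = D(torch.cat([cond, real], 1))  # the discriminator sees (condition, image) pairs
d_fake = D(torch.cat([cond, fake], 1))
loss_D = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
# Pix2Pix also adds an L1 term pulling the generated image toward the target.
loss_G = bce(d_fake, torch.ones_like(d_fake)) + nn.functional.l1_loss(fake, real)
```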


Text
abdulaziz2026

A CVPR presentation on speech and gesture prediction.

CVPR = Computer Vision and Pattern Recognition

Text
ssblog33

Smart Technologies That Can See Behind Walls Are Violating Privacy!

Author: Phillip Schneider, ACTIVIST POST, July 9, 2018

Translated by: Ercan Caner, Sun Savunma Net, September 3, 2017

A research team from MIT (Massachusetts Institute of Technology), using artificial intelligence, has developed a system that can monitor the movements of people beyond solid walls, and their locations…


Link
youjirec

visualslam.ai

Time   Title
08:30  Welcome: Ronald Clark (Imperial)
08:45  Invited Talk: Jitendra Malik (FAIR & UC Berkeley)
09:15  Invited Talk: Simon Lucey (CMU Robotics Institute)
09:45  Oral 1: SuperPoint: Self-Supervised Interest Point Detection and Description
10:15  Morning Break
10:35  Invited Talk: Jan Kautz (NVIDIA)
11:05  Invited Talk: Daniel Cremers (TUM)
11:35  Invited Talk: Katerina Fragkiadaki (CMU)
12:05  Oral 2: Global Pose Estimation with an Attention-based Recurrent Network
12:35  Lunch and Poster Session
14:35  Invited Talk: Noah Snavely (Cornell / Google)
15:00  Invited Talk: Vladlen Koltun (Intel)
15:30  Invited Talk: Matthias Niessner (TUM)
16:30  Panel Discussion: Challenges, Potentials and The Future of Machine Learning for SLAM. Chair: Andrew Davison (Imperial)
17:30  Dinner

Link
youjirec

cvpaper.challenge
cvpaperchallenge.github.io
Link
youjirec

cvpaperchallenge/CVPR2018_Survey
github.com
Photo
scitechman

Finding Faces in a Crowd

Spotting a face in a crowd, or recognizing any small or distant object within a large image, is a major challenge for…

Link
disktnk

10 Papers from ICML and CVPR
leotam.github.io
Text
hurutoriya

CVPR2016 notes

Note: my own notes on CVPR2016.

http://cvpr2016.thecvf.com/program/

Acceptance Rates

This year, we received a record 2145 valid submissions to the main conference, of which 1865 were fully reviewed (the others were either administratively rejected for technical or ethical reasons or withdrawn before review). A total of 643 papers were accepted (29.9% of valid submissions). 83 of these were allocated long oral presentations (3.9% of valid submissions) and 123 were allocated short spotlight presentations (total of 9.7% of valid submissions have live presentations). All papers will appear also in the interactive session where posters are presented.

Tutorials

  • 6/26/16 (full day) Low-Rank and Sparse Modeling for Visual Analytics. René Vidal, Ehsan Elhamifar, Zhouchen Lin, Jiashi Feng, Sheng Li, Yun Fu
  • 6/26/16 (PM only) Computer Vision and Applied Deep Learning with Mathematica. Sebastian Bodenstein, Matthias Odisio
  • 6/26/16 (PM only) Mathematics of Deep Learning. René Vidal, Guillermo Sapiro, Joan Bruna, Benjamin Haeffele, Raja Giryes
  • 6/26/16 (PM only) Recent Image Search Techniques. Sungeui Yoon and Zhe Lin

Oral

  • 2 Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
  • 3 You Only Look Once: Unified, Real-Time Object Detection. Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
  • 13 Unsupervised Learning of Edges. Yin Li, Manohar Paluri, James M. Rehg, Piotr Dollár
  • 15 Deeply-Recursive Convolutional Network for Image Super-Resolution. Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee
  • 16 Accurate Image Super-Resolution Using Very Deep Convolutional Networks. Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee
  • 15 Self-Adaptive Matrix Completion for Heart Rate Estimation From Face Videos Under Realistic Conditions. Sergey Tulyakov, Xavier Alameda-Pineda, Elisa Ricci, Lijun Yin, Jeffrey F. Cohn, Nicu Sebe
  • 4 Personalizing Human Video Pose Estimation. James Charles, Tomas Pfister, Derek Magee, David Hogg, Andrew Zisserman
  • 14 Scalable Sparse Subspace Clustering by Orthogonal Matching Pursuit. Chong You, Daniel Robinson, René Vidal

Spotlight Session

  • 19 Affinity CNN: Learning Pixel-Centric Pairwise Relations for Figure/Ground Embedding. Michael Maire, Takuya Narihira, Stella X. Yu
  • 21 Groupwise Tracking of Crowded Similar-Appearance Targets From Low-Continuity Image Sequences. Hongkai Yu, Youjie Zhou, Jeff Simmons, Craig P. Przybyla, Yuewei Lin, Xiaochuan Fan, Yang Mi, Song Wang
  • 23 What Players Do With the Ball: A Physically Constrained Interaction Modeling. Andrii Maksai, Xinchao Wang, Pascal Fua
  • 6 We Are Humor Beings: Understanding and Predicting Visual Humor. Arjun Chandrasekaran, Ashwin K. Vijayakumar, Stanislaw Antol, Mohit Bansal, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh
  • 7 Where to Look: Focus Regions for Visual Question Answering. Kevin J. Shih, Saurabh Singh, Derek Hoiem
  • 6 Simultaneous Clustering and Model Selection for Tensor Affinities. Zhuwen Li, Shuoguang Yang, Loong-Fah Cheong, Kim-Chuan Toh
  • 7 Discriminatively Embedded K-Means for Multi-View Clustering. Jinglin Xu, Junwei Han, Feiping Nie

Poster

  • 40 Pairwise Matching Through Max-Weight Bipartite Belief Propagation. Zhen Zhang, Qinfeng Shi, Julian McAuley, Wei Wei, Yanning Zhang, Anton van den Hengel
  • 26 Video2GIF: Automatic Generation of Animated GIFs From Video. Michael Gygli, Yale Song, Liangliang Cao
  • 37 SketchNet: Sketch Classification With Web Images. Hua Zhang, Si Liu, Changqing Zhang, Wenqi Ren, Rui Wang, Xiaochun Cao
  • 49 Similarity Learning With Spatial Constraints for Person Re-Identification. Dapeng Chen, Zejian Yuan, Badong Chen, Nanning Zheng
  • 62 Efficient Large-Scale Similarity Search Using Matrix Factorization. Ahmet Iscen, Michael Rabbat, Teddy Furon
  • 73 Eye Tracking for Everyone. Kyle Krafka, Aditya Khosla, Petr Kellnhofer, Suchendra Bhandarkar, Wojciech Matusik, Antonio Torralba
  • 78 iLab-20M: A Large-Scale Controlled Object Dataset to Investigate Deep Learning. Ali Borji, Saeed Izadi, Laurent Itti
  • 68 Iterative Instance Segmentation. Ke Li, Bharath Hariharan, Jitendra Malik

Link
seonhooh

OpenFace

This is a Python and Torch implementation of the DNN-based face recognition paper1 presented at CVPR 2015.


  1. Florian Schroff, Dmitry Kalenichenko, and James Philbin, FaceNet: A Unified Embedding for Face Recognition and Clustering ↩︎
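The cited FaceNet paper trains its embedding with a triplet loss over L2-normalized 128-D embeddings (margin 0.2 in the paper). A minimal PyTorch sketch of that loss:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss: pull the anchor embedding toward a
    positive (same identity) and push it from a negative (different
    identity) by at least `margin`, on L2-normalized embeddings."""
    a, p, n = (F.normalize(x, dim=-1) for x in (anchor, positive, negative))
    d_ap = (a - p).pow(2).sum(-1)   # squared distance anchor-positive
    d_an = (a - n).pow(2).sum(-1)   # squared distance anchor-negative
    return F.relu(d_ap - d_an + margin).mean()

loss = triplet_loss(torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128))
```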

Text
hurutoriya

The motivation behind the CVPR Challenge is wonderful

The CVPR Challenge is an initiative run jointly by Kataoka-san of AIST and the Akio Nakamura Laboratory at Tokyo Denki University, aiming to read through all of CVPR 2015. The papers they read are summarized and shared on SlideShare, but above all I thought the motivation for starting it was great, so I wrote this post. [CVPR Challenge in SlideShare](http://www.slideshare.net/cvpaperchallenge) [Kataoka-san's blog entry](http://hirokatsu16.blog.fc2.com/blog-entry-108.html)