15-16 April, 2025
3/F, Main Library, The University of Hong Kong, HONG KONG SAR
David Alexander Forsyth holds the Fulton-Watson-Copp Chair in Computer Science at the University of Illinois at Urbana-Champaign, a position he has occupied since 2014; he moved there from U.C. Berkeley, where he was also a full professor. He has published over 170 papers on computer vision, computer graphics and machine learning. He has served as program co-chair for IEEE Computer Vision and Pattern Recognition (CVPR) in 2000, 2011, 2018 and 2021, general co-chair for CVPR 2006 and 2015 and ICCV 2019, and program co-chair for the European Conference on Computer Vision 2008, and is a regular member of the program committee of all major international conferences on computer vision. He has served six years on the SIGGRAPH program committee and is a regular reviewer for that conference. He has received best paper awards at the International Conference on Computer Vision and at the European Conference on Computer Vision, and received an IEEE Technical Achievement Award in 2005 for his research. Prof. Forsyth became an IEEE Fellow in 2009 and an ACM Fellow in 2014. His textbook, "Computer Vision: A Modern Approach" (joint with J. Ponce and published by Prentice Hall), is widely adopted as a course text (adoptions include MIT, U. Wisconsin-Madison, UIUC, Georgia Tech and U.C. Berkeley). A further textbook, "Probability and Statistics for Computer Science", is in print; yet another ("Applied Machine Learning") has just appeared. He has served two terms as Editor in Chief of IEEE TPAMI and has served on a number of scientific advisory boards.
Dr. Yinqiang Zheng received his Doctoral degree in engineering from the Department of Mechanical and Control Engineering, Tokyo Institute of Technology, Tokyo, Japan, in 2013. He is currently a Full Professor in the Next Generation Artificial Intelligence Research Center, The University of Tokyo, Japan, leading the Optical Sensing and Camera System Laboratory (OSCARS Lab.). He has published a series of research papers that bridge AI and optical imaging, centered on the novel paradigms of ‘Optics for Better AI’ and ‘AI for Best Optics’. He has served as area chair for CVPR, ICCV, ICML, ICLR, NeurIPS, MM, 3DV, ACCV, ISAIR, DICTA and MVA. He is a foreign fellow of the Engineering Academy of Japan, and the recipient of the Konica Minolta Image Science Award and the Funai Academic Award.
Seung-Hwan Baek is an Assistant Professor of Computer Science and Engineering at POSTECH, also affiliated with the Graduate School of AI, POSTECH. He leads the POSTECH Computational Imaging Group and serves as a faculty co-director of the POSTECH Computer Graphics Lab. Prof. Baek worked as a postdoctoral researcher at Princeton University, and obtained his Ph.D. in Computer Science from KAIST. Prof. Baek's research, situated at the intersection of computer graphics, vision, AI, and optics, focuses on capturing, modeling, and analyzing high-dimensional visual data originating from complex interplays between light, material appearance, and geometry. Prof. Baek has received awards including the Frontiers of Science Award from the International Congress of Basic Science, the Outstanding Ph.D. Thesis Award in IT from the Korean Academy of Science and Technology, the SIGGRAPH Asia Doctoral Consortium, a Microsoft Research Asia Ph.D. Fellowship, and both the ACCV Best Application Paper Award and Best Demo Award.
Yuanmu Yang is an Associate Professor in the Department of Precision Instrument at Tsinghua University. He earned his B.Eng. in Optoelectronics from Tianjin University and Nankai University in 2011 and his Ph.D. in Interdisciplinary Materials Science from Vanderbilt University in 2015. He was a postdoctoral researcher at Sandia National Laboratories, USA, from 2015 to 2017. He then worked at Intellectual Ventures, a Seattle-based startup incubator, from 2017 to 2018, and was a founding team member of the metasurface-based solid-state lidar startup Lumotive. His research focuses on meta-optics. He has published more than 50 journal articles, including 2 in Nature Photonics and 1 in Nature Physics, received over 6000 citations according to Google Scholar, and was selected as a “Highly-cited Researcher in China” by Elsevier in 2023. He has been granted over 10 Chinese and US patents. His recognitions include the Jin-Guofan Young Scientist Award from the China Instrument and Control Society and the Forbes China “30 under 30” in 2018.
Dr. He Sun is an Assistant Professor in the National Biomedical Imaging Center, Peking University, China. Prior to starting at Peking University, he was a postdoctoral researcher in the Department of Computing and Mathematical Sciences at the California Institute of Technology. He received his Ph.D. in Mechanical and Aerospace Engineering from Princeton University in 2019 and his bachelor's degree in Engineering Mechanics and Economics from Peking University in 2014. His research primarily focuses on computational imaging, which tightly integrates optics, control, signal processing and machine learning to push the boundaries of scientific imaging. His past work has contributed to multiple scientific missions, including the Event Horizon Telescope for interferometric imaging of black holes.
Wenzheng Chen is a tenure-track Assistant Professor at the Wangxuan Institute of Computer Technology, Peking University, where he specializes in computational photography and 3D vision. He is also affiliated with the Visual Computing and Learning Lab. Prior to joining Peking University, he was a research scientist at the NVIDIA Toronto AI Lab. He received his Ph.D. from the University of Toronto and earned both his Master's and Bachelor's degrees from Shandong University. His research primarily explores the integration of various imaging systems—including digital cameras, LiDAR, structured light, and SPADs—with deep learning frameworks to enhance 3D perception by accurately predicting attributes such as geometry, texture, surface material, and environmental lighting. His work on differentiable rendering has been transformed into an Omniverse product, his structured light research earned the ICCP 2021 Best Poster Award, and his work on non-line-of-sight imaging has been exhibited at the Princeton Art of Science Exhibition.
Hongzhi Wu is a Professor in the State Key Lab of CAD&CG, Zhejiang University, China. He obtained his Ph.D. from Yale University, and was awarded NSFC Excellent Young Scholar and the Luzengyong CAD&CG High Tech Award (1st prize). Hongzhi focuses on the acquisition and reconstruction of physical information, including complex high-dimensional appearance, 3D surfaces and volumes. His research has led to publications in top venues including SIGGRAPH/SIGGRAPH Asia, CVPR and ICCV, two published books, as well as a number of state-of-the-art high-performance illumination multiplexing devices. He is on the editorial boards of IEEE TVCG, JCAD & VCIBA, and has served on the program committees of various international conferences, including SIGGRAPH, SIGGRAPH Asia, VR, EG, PG, EGSR, I3D and HPG. More details can be found at https://svbrdf.github.io/.
Dr. Tan is a Professor in the Department of Electronic and Computer Engineering at the Hong Kong University of Science and Technology (HKUST). Before joining HKUST, he served as the director of the XR Lab at Alibaba DAMO Academy from 2019 to 2022, an Associate Professor at Simon Fraser University (SFU) in Canada from 2014 to 2019, and an Assistant and then Associate Professor at the National University of Singapore (NUS) from 2007 to 2014. Dr. Tan received his PhD from HKUST in 2007 and his Master's and Bachelor's degrees from Shanghai Jiao Tong University (SJTU) in 2003 and 2000, respectively. He specializes in computer vision, computer graphics, and robotics, with a research focus on 3D vision.
Professor Tianfan Xue is an Assistant Professor in the Department of Information Engineering at the Chinese University of Hong Kong. Prior to this, he spent over five years as a Senior Staff Engineer at Google Research. He earned his Ph.D. from the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT) under the supervision of Professor William T. Freeman. In 2011, he obtained his M.Phil. from the Chinese University of Hong Kong under the guidance of Professor Xiaoou Tang, and in 2009, he received his bachelor's degree from Tsinghua University. His research focuses on computational photography, computer vision and graphics, and machine learning. His work on reflectance technology has been adopted by Google Photoscan, an app with over 10 million users; his fast bilateral learning technique has been integrated into Google Tensor chips; his night photography algorithm won DPReview's Best Innovation Award; and his bilateral NeRF algorithm received a Best Paper honorable mention at SIGGRAPH. He also serves as a reviewer for various top conferences and journals and has been a web chair or area chair for conferences such as CVPR, ICCV, WACV, ACM MM, and NeurIPS.
Xiaojuan Qi is currently an Assistant Professor in the Department of Electrical and Electronic Engineering, the University of Hong Kong (HKU). Before joining HKU, she was a postdoctoral researcher at the University of Oxford, UK. She received her Ph.D. from the Chinese University of Hong Kong (CUHK) in 2018 and her B.Eng. from Shanghai Jiao Tong University (SJTU) in 2014. From September 2016 to November 2016, she was a visiting student in the Machine Learning Group, University of Toronto, and from May 2017 to November 2017 she was an intern at the Intel Intelligent Systems Lab. She has won several awards, including first place in the ImageNet Semantic Parsing Challenge, Outstanding Reviewer awards at ICCV'17 and ICCV'19, a CVPR'18 Doctoral Consortium Travel Award, and a Hong Kong PhD Fellowship (2014–2018).
Taku Komura joined the University of Hong Kong in 2020. Before joining HKU, he worked at the University of Edinburgh (2006-2020), City University of Hong Kong (2002-2006) and RIKEN (2000-2002). He received his BSc, MSc and PhD in Information Science from the University of Tokyo. His research has focused on data-driven character animation, physically-based character animation, crowd simulation, 3D modelling, cloth animation, anatomy-based modelling and robotics. Recently, his main research interests have been physically-based animation and the application of machine learning techniques to animation synthesis. He received a Royal Society Industry Fellowship (2014) and a Google AR/VR Research Award (2017).
Hongbo Fu is a Professor in the Division of Arts and Machine Creativity at HKUST, serving as Technical Papers Assistant Chair for SIGGRAPH Asia 2025. Prior to this, he worked at the School of Creative Media, City University of Hong Kong, for over 15 years. He conducted postdoctoral research at the Imager Lab, University of British Columbia, Canada, and the Department of Computer Graphics, Max-Planck-Institut Informatik, Germany. He received his Ph.D. in Computer Science from HKUST in 2007 and BS in Information Sciences from Peking University, China, in 2002. His primary research interests include computer graphics, human-computer interaction, and computer vision. His work has resulted in over 100 scientific publications, including 70+ papers in top graphics/vision journals and 30+ in leading vision/HCI conferences. His recent research received a Silver Medal at Special Edition 2022 Inventions Geneva Evaluation Days, Best Demo awards from the Emerging Technologies program at ACM SIGGRAPH Asia 2013 and 2014, and Best Paper awards at CAD/Graphics 2015 and UIST 2019.
I will discuss three mysteries at the heart of the current vision agenda: What do image generators "know" and not "know"? What should be represented explicitly? How can we tell how well an image generator works?
I will show strong evidence that depth, normal and albedo can be extracted from two kinds of image generator, with minimal inconvenience or training data. This suggests that image generators might "know" more about scene appearance than we realize. I will show that there are important scene properties that image generators very reliably get wrong. These include shadow geometry and perspective geometry. Similarly, video generators get object constancy and properties like momentum conservation wrong.
If generators know intrinsics, why extract them from images? Perhaps because some other downstream process -- rendering, movement, whatever -- will use them. But rendering seems to work much better if you use latents, and numerous systems move using entirely latent representations. The only good argument in favor of an exposed image representation is that it might be easy to interact or compute with. This suggests paying more attention to representations that suppress detail in favor of capturing major effects in a compact form. I will show a method capable of computing such representations very efficiently for vast numbers of images.
Fixing image generators requires an evaluation procedure that will tell you whether the generator got better. Image generators are evaluated using entirely irrational procedures ("doesn't this look good?"; FID; and so on). But, looked at closely, most image generators and conditional image generators don't work very well. Part of the problem is that we simply don't know how to evaluate them in a sensible way. I will discuss some progress in this respect.
Diffusion models excel in solving imaging inverse problems due to their exceptional ability to model complex image priors. When integrated with image formation models, they enable a physics-guided diffusion process—termed diffusion posterior sampling (DPS)—that effectively retrieves high-dimensional posterior distributions of images from corrupted observations. However, DPS depends on clean diffusion priors and accurate imaging physics, which limits its practical utility when clean data is scarce for pre-training diffusion models or when the underlying physical models are inaccurate or unknown. In this talk, Prof. Sun will introduce an expectation-maximization (EM) framework that adapts DPS to scenarios with inaccurate priors or physics. This method alternates between an E-step, in which clean images are reconstructed from corrupted data assuming a known diffusion prior and physical model, and an M-step, in which these reconstructions are used to refine and recalibrate the prior or physical model. This iterative process progressively aligns the learned image prior with the true clean distribution and adapts the physical model for enhanced accuracy. They validate the approach through extensive experiments across diverse computational imaging tasks—including inpainting, deblurring, denoising, and realistic microscopic imaging—demonstrating new state-of-the-art performance.
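To make the E/M alternation concrete, here is a deliberately small sketch in which the diffusion prior is replaced by a simple Gaussian prior whose mean is learned from corrupted data, the physics is a known Gaussian blur, and DPS is approximated by plain MAP gradient descent. Everything here (the stand-in prior, the optimizer, and all parameter values) is an illustrative assumption, not the method presented in the talk.

```python
# Minimal EM sketch: E-step reconstructs images under the current prior,
# M-step refits the prior from those reconstructions. Illustrative only.
import numpy as np
from scipy.ndimage import gaussian_filter

BLUR = 2.0    # known physics: Gaussian blur width
LAM = 0.2     # weight of the (stand-in) prior term

def forward(x):
    return gaussian_filter(x, BLUR)

def e_step(y, mu, n_iter=100, lr=0.5):
    """E-step: reconstruct one image under the current prior.
    A crude MAP gradient descent stands in for diffusion posterior sampling."""
    x = y.copy()
    for _ in range(n_iter):
        grad_data = gaussian_filter(forward(x) - y, BLUR)  # blur is self-adjoint
        grad_prior = LAM * (x - mu)                        # pull toward the prior mean
        x -= lr * (grad_data + grad_prior)
    return x

def m_step(recons):
    """M-step: refit the prior (here, just its mean) from the reconstructions."""
    return np.mean(recons, axis=0)

# Toy data: several noisy, blurred observations of one underlying clean image.
rng = np.random.default_rng(0)
clean = np.zeros((64, 64)); clean[16:48, 16:48] = 1.0
ys = [forward(clean) + 0.05 * rng.standard_normal(clean.shape) for _ in range(8)]

mu = np.zeros_like(clean)                    # start from an uninformative prior
for it in range(5):                          # EM iterations
    recons = [e_step(y, mu) for y in ys]     # E-step: reconstruct every image
    mu = m_step(recons)                      # M-step: update the prior
    print(f"iter {it}: prior-mean error = {np.abs(mu - clean).mean():.3f}")
```

The same alternation applies when the physics rather than the prior is uncertain: the M-step would then recalibrate the forward model instead of the prior mean.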
Humans navigate a rich 3D world, continuously acquiring diverse skills and engaging in various activities through perception, understanding, and interaction. The long-term research goal of Prof. Qi's group is to simulate the dynamic 3D world and equip AI systems with 3D spatial understanding. A fundamental challenge in achieving this lies in how to effectively represent 3D structures. In this talk, Prof. Qi will present a series of research efforts from her group focused on learning 3D representations from videos for surface reconstruction, novel view synthesis, and 4D modeling of dynamic scenes, with applications in video processing. She will also discuss future directions toward generating editable 4D worlds from casually captured videos.
AI algorithms for computer-based visual understanding have advanced significantly, thanks to the prevalence of deep learning and large-scale visual datasets in the RGB domain, yet they have also been proven vulnerable to digital and physical adversarial attacks. To deal with complex scenarios, many other imaging modalities beyond the visibility scope of human eyes, such as near-infrared (NIR), short-wavelength infrared (SWIR), thermal infrared (TIR), polarization, and neuromorphic pulse sensing, have been introduced, yet the vulnerabilities of visual AI based on these non-RGB modalities have not received due attention. In this talk, Prof. Zheng will show that typical AI algorithms, such as object detection and segmentation, can be even more fragile in these modalities than in the RGB domain, and that properly crafted attacks can hardly be noticed by the naked eye. He will showcase a series of physics-based attacks in the NIR, SWIR, and TIR domains, a printable attack on event-based human detection, and a projection-based attack on polarization-based reconstruction and segmentation.
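For readers unfamiliar with adversarial examples, the sketch below shows the basic digital mechanism on a toy linear "detector": a perturbation bounded by a few percent of the dynamic range is aligned with the model's gradient and typically flips its decision. This is a generic FGSM-style illustration with made-up weights and inputs; the talk itself concerns physically realizable attacks in non-RGB modalities, which are considerably harder to craft.

```python
# Toy FGSM-style attack on a linear scorer standing in for a detector on a
# single-channel (e.g., thermal or NIR) image. All values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(64 * 64) * 0.05   # "trained" weights of a linear scorer
b = 0.0

def score(x):                 # positive score -> "person", negative -> "background"
    return float(x.ravel() @ w + b)

x = np.clip(rng.standard_normal((64, 64)) * 0.1 + 0.5, 0, 1)   # benign input
print("benign score:", round(score(x), 3))

# FGSM: step against the current decision, with each pixel changed by at most eps.
eps = 0.03                                   # ~3% of the dynamic range
grad = w.reshape(64, 64)                     # d(score)/dx for a linear model
x_adv = np.clip(x - eps * np.sign(grad) * np.sign(score(x)), 0, 1)

print("adversarial score:", round(score(x_adv), 3))   # typically changes sign
print("max pixel change:", round(float(np.abs(x_adv - x).max()), 3))  # about eps
```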
High-quality, efficient acquisition of complex appearance and geometry is a classic problem in computer graphics and vision. Differentiable acquisition maps both physical acquisition and computational reconstruction to a single differentiable pipeline, enabling fully automatic, joint optimization of hardware and software, which significantly improves modeling efficiency and quality over existing work. In this talk, Prof. Wu will briefly introduce his recent research on differentiable acquisition, including OpenSVBRDF, the first large-scale measured SVBRDF database (SIGAsia 2023a), the first dynamic visible-light tomography system (SIGAsia 2023b), as well as a system for real-time acquisition and reconstruction of dynamic 3D volumes (CVPR 2024). He will also describe ongoing research on building a dedicated high-performance device for acquiring the appearance of cultural relics in the National Museum of China.
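As a rough intuition for what "joint optimization of hardware and software" means, the toy below treats the acquisition stage as a trainable illumination/multiplexing matrix and the reconstruction stage as a trainable linear decoder, and back-propagates a reconstruction loss through both. The linear model, the random subspace standing in for scene structure, and all parameter values are assumptions for illustration only; they are not the devices or pipelines described in the talk, where physical constraints on the hardware would also apply.

```python
# Joint gradient-based optimization of "hardware" (L) and "software" (R)
# through a simulated, differentiable acquisition model. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d, k, r, n = 32, 8, 8, 256     # signal dim, measurements, latent dim, batch size
sigma = 0.05                   # measurement noise level
B = rng.standard_normal((d, r)) / np.sqrt(r)   # scenes lie in a low-dim subspace

L = 0.1 * rng.standard_normal((k, d))   # "hardware": lighting/multiplexing patterns
R = 0.1 * rng.standard_normal((d, k))   # "software": linear reconstruction
lr = 0.05

for step in range(3001):
    s = B @ rng.standard_normal((r, n))              # batch of scene signals
    m = L @ s + sigma * rng.standard_normal((k, n))  # simulated noisy acquisition
    err = R @ m - s                                  # reconstruction error
    loss = np.mean(err ** 2)

    # Analytic gradients of the mean-squared error w.r.t. both stages,
    # i.e. back-propagation through the differentiable acquisition model.
    gR = 2.0 / err.size * err @ m.T
    gL = 2.0 / err.size * R.T @ err @ s.T
    R -= lr * gR
    L -= lr * gL

    if step % 500 == 0:
        print(f"step {step}: reconstruction MSE = {loss:.4f}")
```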
Computational photography is widely used in our daily lives, from the convenience of smartphone photography, image enhancement, and short-video filters to more advanced applications like autonomous driving, medical imaging, and scientific observation. With the rapid development of generative models and neural rendering, image generation has made significant strides, raising a challenging question: can traditional computational photography be fully replaced by large models?
In this talk, we will explore how to combine traditional computational photography techniques with foundation and generative models. Specifically, we will address the following four aspects: first, how to use generative models to enhance existing image processing, such as UltraFusion HDR, where we demonstrate, for the first time, high dynamic range fusion across up to 9 stops using generative models; second, how to leverage generative model priors to improve the imaging quality of novel camera systems (e.g., lensless cameras and event cameras); third, how foundation models can automate camera system development, including automated evaluation (DepictQA) and automated debugging (RLPixTuner); and finally, the application of large models in 3D reconstruction and video frame interpolation.
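For context on what multi-stop HDR fusion involves, here is a classical exposure-merging baseline (a weighted average of an exposure bracket in linear radiance space) on synthetic data. It is standard textbook practice, not UltraFusion HDR, which replaces this kind of hand-crafted merging with a generative prior; all scene values, exposure times, and weights below are illustrative.

```python
# Classical HDR merge of a ~9-stop exposure bracket (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
true_radiance = np.exp(rng.uniform(np.log(5e-3), np.log(50.0), size=(64, 64)))
exposure_times = [1/64, 1/8, 1.0, 8.0]     # shortest-to-longest ratio 512 (~9 stops)

def capture(radiance, t):
    """Simulate an 8-bit capture at exposure time t (clipping + quantization)."""
    z = np.clip(radiance * t, 0.0, 1.0)
    return np.round(z * 255) / 255

def merge(frames, times):
    """Weight well-exposed pixels highly; ignore clipped shadows/highlights."""
    num = np.zeros_like(frames[0])
    den = np.zeros_like(frames[0])
    for z, t in zip(frames, times):
        w = np.exp(-4.0 * (z - 0.5) ** 2) * (z > 1/255) * (z < 254/255)
        num += w * z / t            # convert each exposure back to radiance
        den += w
    return num / np.maximum(den, 1e-8)

frames = [capture(true_radiance, t) for t in exposure_times]
hdr = merge(frames, exposure_times)
rel_err = np.abs(hdr - true_radiance) / true_radiance
print("median relative error of merged radiance:", round(float(np.median(rel_err)), 4))
```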
Conventional cameras can only acquire light intensity in two dimensions. To further obtain the depth, polarization, and spectral information of a target object, or to achieve imaging resolution beyond the diffraction limit, bulky and expensive instruments are often required. Here, Prof. Yang will present his group's recent efforts to replace conventional camera lenses with flat diffractive optical elements or metasurfaces. By leveraging the unique capability of metasurfaces to tailor the vectorial field of light, in combination with advanced image retrieval algorithms, they aim to build compact camera systems that can capture multi-dimensional light-field information of a target scene in a single shot under ambient illumination conditions. Specifically, he will show how they use flat optics to build a monocular camera that captures a 4D image, including 2D all-in-focus intensity, depth, and polarization of a target scene, in a single shot over extended scenes. He will also present their effort to construct flat-optics-based monocular 3D camera modules for real-world applications.
Simulating rain in a realistic and physically accurate manner has long been a challenge—especially in complex, uncontrolled outdoor environments. Traditional physics-based simulations produce high-quality rain effects like splashes and droplets, but require meticulous scene setup and lack scalability. On the other hand, recent advances in 3D reconstruction, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), enable flexible scene modeling and novel view synthesis, yet fall short when it comes to dynamic scene editing like rain simulation.
In this talk, I will present RainyGS, a novel method that bridges this gap by combining physically-based rain simulation with the efficiency and flexibility of the 3DGS framework. RainyGS allows for real-time, photorealistic rendering of rain effects—from light drizzles to heavy storms—on in-the-wild scenes, all without manual scene setup. By integrating shallow water dynamics, raindrop motion, and realistic reflections into 3DGS, our method achieves over 30 fps rendering and supports diverse open-world scenarios, including urban driving scenes. I'll showcase how RainyGS opens up new possibilities for dynamic environmental simulation in computer vision, virtual reality, and autonomous driving research.
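To give a feel for the water-surface dynamics that such a rain simulator layers onto a reconstructed scene, here is a textbook height-field ripple update: raindrops are injected as small depressions, and waves propagate under a damped finite-difference scheme. This toy is a stand-in rather than the RainyGS implementation, which couples shallow water dynamics, raindrop motion, and reflections with 3DGS rendering.

```python
# Toy height-field ripple simulation (standard graphics trick, not RainyGS).
import numpy as np

H = np.zeros((128, 128))      # water surface height field
V = np.zeros_like(H)          # vertical velocity of the surface
c, damping = 0.25, 0.99       # wave propagation factor and per-step energy loss
rng = np.random.default_rng(0)

def step(H, V):
    # Average of the four neighbours: a simple finite-difference Laplacian.
    avg = 0.25 * (np.roll(H, 1, 0) + np.roll(H, -1, 0) +
                  np.roll(H, 1, 1) + np.roll(H, -1, 1))
    V = (V + c * (avg - H)) * damping
    return H + V, V

for t in range(200):
    if t % 10 == 0:                             # a raindrop hits every 10 steps
        i, j = rng.integers(10, 118, size=2)
        H[i - 1:i + 2, j - 1:j + 2] -= 0.5      # small local depression
    H, V = step(H, V)

print("surface height range after 200 steps:",
      round(float(H.min()), 3), "to", round(float(H.max()), 3))
```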
3D Gaussian Splatting (GS) excels in real-time novel view synthesis but faces challenges in geometric reconstruction fidelity and volumetric rendering accuracy. This talk explores two advancements addressing these limitations: RaDe-GS and Volumetric Gaussian Splatting. RaDe-GS introduces rasterized depth and surface normal rendering for unstructured 3D Gaussians, optimizing shape reconstruction through depth consistency to achieve state-of-the-art geometric accuracy (e.g., Neuralangelo-level Chamfer distance on DTU) while retaining the original framework's computational efficiency. Complementing this, Volumetric Gaussian Splatting redefines GS primitives as stochastic solids, enabling volumetric rendering with an attenuation function that eliminates visual artifacts (e.g., popping effects) and ensures continuity in parameter optimization. This approach precisely models light interactions in scattering media and overlapping primitives, outperforming existing methods in both generic and volumetric scenes. Together, these works advance GS's capabilities—RaDe-GS bridges its rendering quality with precise 3D reconstruction, while Volumetric GS unlocks artifact-free, physically grounded rendering for complex media—demonstrating GS's potential as a unified framework for high-fidelity 3D scene modeling.
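As background for the depth-rendering side of this discussion, the sketch below shows generic front-to-back alpha compositing along one camera ray for a handful of Gaussian primitives, including the alpha-weighted expected depth that depth-consistency objectives can build on. The numbers are made up and the formulation is the textbook one, not the specific RaDe-GS rasterization or the Volumetric GS attenuation model.

```python
# Generic front-to-back alpha compositing along one ray, with expected depth.
import numpy as np

# Per-primitive properties after projection onto this ray (illustrative values):
depths = np.array([2.0, 2.5, 4.0])          # distance along the ray
alphas = np.array([0.4, 0.6, 0.8])          # opacity contributed at this pixel
colors = np.array([[1.0, 0.0, 0.0],         # RGB of each primitive
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

order = np.argsort(depths)                  # front-to-back traversal
T = 1.0                                     # transmittance accumulated so far
pixel = np.zeros(3)
expected_depth = 0.0
weight_sum = 0.0

for i in order:
    w = T * alphas[i]                       # contribution weight of primitive i
    pixel += w * colors[i]
    expected_depth += w * depths[i]
    weight_sum += w
    T *= (1.0 - alphas[i])                  # light remaining for primitives behind

expected_depth /= max(weight_sum, 1e-8)     # normalize by accumulated opacity
print("pixel color:", pixel.round(3))
print("alpha-composited depth:", round(float(expected_depth), 3))
```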
In this talk, Prof. Baek will discuss his recent work on designing, controlling, and utilizing computational imaging systems for inverse rendering, robotic vision in extreme conditions, transparent and metameric material analysis, and advanced VR/AR systems. The core idea is to take a holistic approach, considering rendering (the transformation of multi-dimensional light properties as they travel from the source through a scene), the interaction of that light with imaging systems, and the computational processes for reconstruction and restoration.