We contribute a novel dataset for human pose estimation and localization in metric space for complex urban scenes. We provide high-resolution stereo images along with LiDAR data captured from a parked car at busy traffic intersections. We also provide 3D pose, location, and shape data for pedestrians, as well as 2D instance-level segmentations, tracking IDs, and 2D coordinates of body joints in images. Our dataset opens up new research avenues in understanding pedestrian behavior in real metric space, which is crucial for the development of autonomous vehicles.

A Probabilistic Framework for Intrinsic Image Decomposition from RGB-D Streams
Wonhui Kim, Matthew Johnson-Roberson
In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017

Lighting and shading can have a great impact on a robot’s ability to recognize, match, and classify objects in indoor scenes. Additionally, the realism of augmented reality applications benefits greatly from understanding the lighting and shading in a scene. To aid in extracting this information from a mobile platform, we introduce a novel framework to solve the intrinsic image decomposition problem for RGB-D streams. In our pipeline, the task is formulated as a Bayesian estimation problem. Compared to frame-based methods that must solve a full conditional random field (CRF) optimization problem at each time step, our framework can utilize the knowledge of past frames to predict the intrinsic images at a given frame. Our approach produces more reliable and consistent predictions over time, and our filtering-based framework achieves significant performance gains. Furthermore, our framework can be easily integrated into standard perception loops in many robotic systems that use a similar recursive filtering structure. We show qualitative results on real data and generate quantitative results using ground truth from a photorealistic synthetic dataset produced using a state-of-the-art ray tracer and a high-fidelity 3D model.
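The recursive filtering structure mentioned above can be illustrated with a minimal sketch. This is not the paper's model: it assumes a single scalar quantity per pixel (for example, a reflectance value) with a Gaussian belief, and the function names and noise parameters (`obs_var`, `process_var`) are hypothetical choices made only for illustration. It shows the general predict/update pattern in which the belief from past frames is carried forward and fused with the current frame's observation.

```python
def predict(mean, var, process_var):
    """Carry the belief to the next frame; uncertainty grows by process_var."""
    return mean, var + process_var


def update(mean, var, obs, obs_var):
    """Fuse the predicted belief with the current frame's noisy observation."""
    gain = var / (var + obs_var)          # how much to trust the new observation
    new_mean = mean + gain * (obs - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var


def filter_stream(observations, obs_var=0.04, process_var=0.001,
                  init_mean=0.5, init_var=1.0):
    """Run the recursive filter over a sequence of per-frame observations."""
    mean, var = init_mean, init_var
    estimates = []
    for obs in observations:
        mean, var = predict(mean, var, process_var)
        mean, var = update(mean, var, obs, obs_var)
        estimates.append(mean)
    return estimates, var


if __name__ == "__main__":
    # Hypothetical noisy per-frame estimates of one pixel's value.
    frames = [0.62, 0.58, 0.61, 0.57, 0.63, 0.60, 0.59]
    estimates, final_var = filter_stream(frames)
    print(estimates[-1], final_var)
```

Each frame costs one predict step and one update step, which is the kind of per-frame recursion that a frame-by-frame optimization does not exploit.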

ObjectNet3D: A Large Scale Database for 3D Object Recognition
Yu Xiang, Wonhui Kim, Wei Chen, Jingwei Ji, Christopher Choy, Hao Su, Roozbeh Mottaghi, Leonidas Guibas and Silvio Savarese
In European Conference on Computer Vision (ECCV), 2016.
PDF | BibTex | Spotlight Oral

We contribute a large-scale database for 3D object recognition, named ObjectNet3D, that consists of 100 categories, 90,127 images, 201,888 objects in these images, and 44,147 3D shapes. Objects in the images in our database are aligned with the 3D shapes, and the alignment provides both accurate 3D pose annotation and the closest 3D shape annotation for each 2D object. Consequently, our database is useful for recognizing the 3D pose and 3D shape of objects from 2D images. We also report experiments on four tasks: region proposal generation, 2D object detection, joint 2D detection and 3D object pose estimation, and image-based 3D shape retrieval, which can serve as baselines for future research using our database.

High Dynamic Range Image Tone Mapping Using a Local Edge-Preserving Multiscale Decomposition
Wonhui Kim, Hongki Lim
Final project of 2016 Winter EECS 556: Image Processing
Article | PDF | Winning the Second Prize sponsored by KLA-Tencor

A High Dynamic Range (HDR) image has a large ratio between the maximum and minimum intensities of the image. Since this ratio usually exceeds the dynamic range of standard displays, we need a tone mapping process. The key to HDR tone mapping is to preserve details while compressing the unimportant image components. Most state-of-the-art approaches to HDR tone mapping involve separating an image into base and detail layers. The base layer can be obtained by applying a smoothing filter to the image, which usually causes artifacts around edges. A general solution is to formulate an energy minimization problem in terms of the base layer with an adaptive edge-preserving penalty term. In this project, we propose a joint base-detail decomposition that places additional constraints on the detail layers, which increases both the sharpness and the naturalness of the resulting tone-mapped image.
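The base/detail idea can be sketched on a single scanline of luminance values. This is a simplified illustration, not the project's method: it assumes a plain moving-average filter in place of an edge-preserving decomposition, works in the log domain, and uses hypothetical names (`box_smooth`, `tone_map`) and a hypothetical target contrast of 100:1. Only the base layer is compressed; the detail layer is added back unchanged.

```python
import math


def box_smooth(signal, radius):
    """Moving average with a window clipped at the borders.

    A stand-in for an edge-preserving filter; it blurs across edges.
    """
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def tone_map(luminance, radius=2, target_range=math.log(100.0)):
    """Compress the base layer in the log domain and keep the detail layer."""
    log_l = [math.log(v) for v in luminance]
    base = box_smooth(log_l, radius)
    detail = [l - b for l, b in zip(log_l, base)]

    # Scale the base layer so that its range fits the target contrast.
    base_range = max(base) - min(base)
    scale = min(1.0, target_range / base_range) if base_range > 0 else 1.0

    # Anchor the brightest base value at 1.0, then add the detail back.
    top = max(base)
    mapped = [math.exp(scale * (b - top) + d) for b, d in zip(base, detail)]
    return mapped, scale


if __name__ == "__main__":
    # Hypothetical scanline: a dark region next to a very bright one.
    scanline = [0.01] * 10 + [1000.0] * 10
    mapped, scale = tone_map(scanline)
    print(scale)
    print(mapped[8:12])
```

In this example the flat bright region maps to 1.0, while the first bright sample next to the step overshoots it. That overshoot is the kind of edge artifact a plain smoothing filter introduces, and it is the motivation for edge-preserving formulations of the base layer.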