Data-driven character animation based on motion capture can produce highly naturalistic behaviors and, when combined with physics simulation, can provide for natural procedural responses to physical perturbations, environmental changes, and morphological discrepancies. Motion capture remains the most popular source of motion data, but collecting mocap data typically requires heavily instrumented environments and actors. In this paper, we propose a method that enables physically simulated characters to learn skills from videos (SFV). Our approach, based on deep pose estimation and deep reinforcement learning, allows data-driven animation to leverage the abundance of publicly available video clips from the web, such as those from YouTube. This has the potential to enable fast and easy design of character controllers simply by querying for video recordings of the desired behavior. The resulting controllers are robust to perturbations, can be adapted to new settings, can perform basic object interactions, and can be retargeted to new morphologies via reinforcement learning. We further demonstrate that our method can predict potential human motions from still images, by forward simulation of learned controllers initialized from the observed pose. Our framework is able to learn a broad range of dynamic skills, including locomotion, acrobatics, and martial arts.
Creating animation of a character putting on clothing is challenging due to the complex interactions between the character and the simulated garment. We take a model-free deep reinforcement learning (deepRL) approach to automatically discovering robust dressing control policies represented by neural networks. While deepRL has demonstrated several successes in learning complex motor skills, the data-demanding nature of the learning algorithms is at odds with the computationally costly cloth simulation required by the dressing task. This paper is the first to demonstrate that, with an appropriately designed input state space and a reward function, it is possible to incorporate cloth simulation in the deepRL framework to learn a robust dressing control policy. We introduce a salient representation of haptic information to guide the dressing process and utilize it in the reward function to provide learning signals during training. In order to learn a prolonged sequence of motion involving a diverse set of manipulation skills, such as grasping the edge of the shirt or pulling on a sleeve, we find it necessary to separate the dressing task into several subtasks and learn a control policy for each subtask. We introduce a policy sequencing algorithm that matches the distribution of output states from one task to the input distribution for the next task in the sequence. We have used this approach to produce character controllers for several dressing tasks: putting on a t-shirt, putting on a jacket, and robot-assisted dressing of a sleeve.
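To make the policy-sequencing idea above concrete, here is a minimal sketch in Python, assuming toy stand-in routines (`train_policy` and `rollout` are hypothetical placeholders, not the paper's deep RL training or cloth simulation): the final states produced by each subtask policy become the initial-state distribution used to train the next subtask.

```python
import random

# Hypothetical stand-ins: each "subtask" is a toy 1-D system whose goal is to
# move the state toward a target value; a trained "policy" is just that target.
def rollout(policy_target, start_state, steps=20):
    s = start_state
    for _ in range(steps):
        s += 0.2 * (policy_target - s) + random.gauss(0.0, 0.01)  # noisy step toward target
    return s  # final state reached by this subtask

def train_policy(start_states, goal):
    # Placeholder "training": a real system would run deep RL with cloth simulation here.
    return goal

def sequence_policies(subtask_goals, n_rollouts=100):
    """Train subtasks in order; the output-state distribution of subtask k
    becomes the initial-state distribution of subtask k + 1."""
    start_states = [0.0] * n_rollouts          # initial distribution for the first subtask
    policies = []
    for goal in subtask_goals:
        policy = train_policy(start_states, goal)
        final_states = [rollout(policy, s) for s in start_states]
        policies.append(policy)
        start_states = final_states            # seed the next subtask with these states
    return policies

if __name__ == "__main__":
    print(sequence_policies([1.0, 2.5, 4.0]))
```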
We present an approach that learns to act from raw motion data for interactive character animation. Our motion generator takes a continuous stream of control inputs and generates the character's motion in an online manner. The key insight is modeling rich connections between a multitude of control objectives and a large repertoire of actions. The model is trained using a Recurrent Neural Network conditioned to deal with spatiotemporal constraints and structural variability in human motion. We also present a new data augmentation method that allows the model to be learned even from a small to moderate amount of training data. The learning process is fully automatic when learning the motion of a single character, and requires minimal user intervention when dealing with props and interactions between multiple characters.
Flying creatures in animated films often perform highly dynamic aerobatic maneuvers, which demand the extremes of their exercise capacity and skillful control. Designing physics-based controllers (a.k.a., control policies) for aerobatic maneuvers is very challenging because dynamic states remain in unstable equilibrium most of the time during aerobatics. Recently, Deep Reinforcement Learning (DRL) has shown its potential in constructing physics-based controllers. In this paper, we present a new concept, Self-Regulated Learning (SRL), which is combined with DRL to address the aerobatics control problem. The key idea of SRL is to allow the agent to take control over its own learning using an additional self-regulation policy. The policy allows the agent to regulate its goals according to the capability of the current control policy. The control and self-regulation policies are learned jointly along the progress of learning. Self-regulated learning can be viewed as building its own curriculum and seeking compromise on the goals. The effectiveness of our method is demonstrated with physically-simulated creatures performing aerobatic skills of sharp turning, rapid winding, rolling, soaring, and diving.
We propose a real-time method for the infrastructure-free estimation of articulated human motion. The approach leverages a swarm of camera-equipped flying robots and jointly optimizes the swarm's and skeletal states, which include the 3D joint positions and a set of bones. Our method allows us to track the motion of human subjects, for example an athlete, over long time horizons and long distances, in challenging settings and at large scale, where fixed infrastructure approaches are not applicable. The proposed algorithm uses active infra-red markers, runs in real-time and accurately estimates robot and human pose parameters online without the need for accurately calibrated or stationary mounted cameras. Our method i) estimates a global coordinate frame for the MAV swarm, ii) jointly optimizes the human pose and relative camera positions, and iii) estimates the length of the human bones. The entire swarm is then controlled via a model predictive controller to maximize visibility of the subject from multiple viewpoints even under fast motion such as jumping or jogging. We demonstrate our method in a number of difficult scenarios including capture of long locomotion sequences at the scale of a triplex gym, in non-planar terrain, while climbing and in outdoor scenarios.
Small unmanned aerial vehicles (UAVs) are ideal capturing devices for high-resolution urban 3D reconstructions using multi-view stereo. Nevertheless, practical considerations such as safety mean that access to the scan target is often only available for a short amount of time, especially in urban environments. It therefore becomes crucial to perform both view and path planning to minimize flight time while ensuring complete and accurate reconstructions.
In this work, we address the challenge of automatic view and path planning for UAV-based aerial imaging with the goal of urban reconstruction from multi-view stereo. To this end, we develop a novel continuous optimization approach using heuristics for multi-view stereo reconstruction quality and apply it to the problem of path planning. Even for large scan areas, our method generates paths in only a few minutes, and is therefore ideally suited for deployment in the field.
To evaluate our method, we introduce and describe a detailed benchmark dataset for UAV path planning in urban environments which can also be used to evaluate future research efforts on this topic. Using this dataset and both synthetic and real data, we demonstrate survey-grade urban reconstructions with ground resolutions of 1 cm or better on large areas (30 000 m²).
Ambient sounds arise from a massive superposition of chaotic events distributed over a large area or volume, such as waves breaking on a beach or rain hitting the ground. The directionality and loudness of these sounds as they propagate in complex 3D scenes vary with listener location, providing cues that distinguish indoors from outdoors and reveal portals and occluders. We show that ambient sources can be approximated using an ideal notion of spatio-temporal incoherence and develop a lightweight technique to capture their global propagation effects. Our approach precomputes a single FDTD simulation using a sustained source signal whose phase is randomized over frequency and source extent. It then extracts a spherical harmonic encoding of the resulting steady-state distribution of power over direction and position in the scene using an efficient flux density formulation. The resulting parameter fields are smooth and compressible, requiring only a few MB of memory per extended source. We also present a fast binaural rendering technique that exploits phase incoherence to reduce filtering cost.
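As a rough illustration of encoding a directional power distribution in spherical harmonics (a sketch only, assuming uniformly sampled directions and order-1 real SH; the paper's flux-density formulation and precomputed FDTD simulation are not reproduced here):

```python
import numpy as np

def sh_basis_l1(d):
    """Real spherical harmonics up to order 1 for unit directions d of shape (N, 3)."""
    x, y, z = d[:, 0], d[:, 1], d[:, 2]
    return np.stack([
        0.2820948 * np.ones_like(x),   # Y_0^0
        0.4886025 * y,                 # Y_1^-1
        0.4886025 * z,                 # Y_1^0
        0.4886025 * x,                 # Y_1^1
    ], axis=1)

def project_directional_power(directions, power):
    """Monte Carlo projection of a directional power distribution onto SH coefficients.
    directions: (N, 3) unit vectors drawn uniformly over the sphere; power: (N,) values."""
    Y = sh_basis_l1(directions)
    return (4.0 * np.pi / len(power)) * (Y.T @ power)

# Toy usage: power concentrated toward +z yields a strong Y_1^0 coefficient.
rng = np.random.default_rng(0)
d = rng.normal(size=(10000, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)
p = np.clip(d[:, 2], 0.0, None)
print(project_directional_power(d, p))
```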
We demonstrate a novel deep neural network capable of reconstructing human full body pose in real-time from 6 Inertial Measurement Units (IMUs) worn on the user's body. In doing so, we address several difficult challenges. First, the problem is severely under-constrained as multiple pose parameters produce the same IMU orientations. Second, capturing IMU data in conjunction with ground-truth poses is expensive and difficult to do in many target application scenarios (e.g., outdoors). Third, modeling temporal dependencies through non-linear optimization has proven effective in prior work but makes real-time prediction infeasible. To address this important limitation, we learn the temporal pose priors using deep learning. To learn from sufficient data, we synthesize IMU data from motion capture datasets. A bi-directional RNN architecture leverages past and future information that is available at training time. At test time, we deploy the network in a sliding window fashion, retaining real-time capabilities. To evaluate our method, we recorded DIP-IMU, a dataset consisting of 10 subjects wearing 17 IMUs for validation in 64 sequences with 330 000 time instants; this constitutes the largest IMU dataset publicly available. We quantitatively evaluate our approach on multiple datasets and show results from a real-time implementation. DIP-IMU and the code are available for research purposes.
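A minimal PyTorch sketch of the ingredients described above, with made-up feature and pose dimensions: a bidirectional RNN maps per-frame IMU features to pose parameters, and a sliding window is used at test time. This is an illustrative stand-in, not the DIP architecture or its training setup.

```python
import torch
import torch.nn as nn

class IMUPoseBiRNN(nn.Module):
    """Bidirectional RNN from per-frame IMU features to pose parameters.
    Feature and output sizes here are illustrative, not the paper's configuration."""
    def __init__(self, imu_dim=72, pose_dim=135, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(imu_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, pose_dim)

    def forward(self, x):            # x: (batch, time, imu_dim)
        h, _ = self.rnn(x)
        return self.out(h)           # (batch, time, pose_dim)

def sliding_window_inference(model, stream, win=60, hop=20):
    """Run the network over overlapping windows of the IMU stream and keep only
    the central frames of each window, which retains near real-time behaviour."""
    outputs = []
    for start in range(0, stream.shape[1] - win + 1, hop):
        with torch.no_grad():
            pred = model(stream[:, start:start + win])
        outputs.append(pred[:, win // 2 - hop // 2: win // 2 + hop // 2])
    return torch.cat(outputs, dim=1)

model = IMUPoseBiRNN()
fake_stream = torch.randn(1, 300, 72)      # synthetic stand-in for IMU features
print(sliding_window_inference(model, fake_stream).shape)
```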
Over the last two decades there has been a proliferation of methods for simulating crowds of humans. As the number of different methods and their complexity increases, it becomes increasingly unrealistic to expect researchers and users to keep up with all the possible options and trade-offs. We therefore see the need for tools that can facilitate both domain experts and non-expert users of crowd simulation in making high-level decisions about the best simulation methods to use in different scenarios. In this paper, we leverage trajectory data from human crowds and machine learning techniques to learn a manifold which captures representative local navigation scenarios that humans encounter in real life. We show the applicability of this manifold in crowd research, including analyzing trends in simulation accuracy, and creating automated systems to assist in choosing an appropriate simulation method for a given scenario.
Many analysis tasks for human motion rely on high-level similarity between sequences of motions that are not exact matches in joint angles, timing, or ordering of actions. Even the same movements performed by the same person can vary in duration and speed. Similar motions are characterized by similar sets of actions that appear frequently. In this paper we introduce motion motifs and motion signatures that are a succinct but descriptive representation of motion sequences. We first break the motion sequences into short-term movements called motion words, and then cluster the words in a high-dimensional feature space to find motifs. Hence, motifs are words that are both common and descriptive, and their distribution represents the motion sequence. To cluster words and find motifs, the challenge is to define an effective feature space, where the distances among motion words are semantically meaningful, and where variations in speed and duration are handled. To this end, we use a deep neural network to embed the motion words into feature space using a triplet loss function. To define a signature, we choose a finite set of motion-motifs, creating a bag-of-motifs representation for the sequence. Motion signatures are agnostic to movement order, speed or duration variations, and can distinguish fine-grained differences between motions of the same class. We illustrate examples of characterizing motion sequences by motifs, and demonstrate the use of motion signatures in a number of applications.
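The pipeline above can be sketched in a few lines under assumed, simplified dimensions (`embed`, `kmeans`, and `motion_signature` are illustrative stand-ins rather than the paper's networks): motion words are embedded with a triplet loss, clustered into motifs, and a sequence is summarized by its motif histogram.

```python
import numpy as np
import torch
import torch.nn as nn

# (1) Embed fixed-length "motion words" with a triplet loss.
embed = nn.Sequential(nn.Flatten(), nn.Linear(30 * 63, 256), nn.ReLU(), nn.Linear(256, 32))
triplet = nn.TripletMarginLoss(margin=0.2)
opt = torch.optim.Adam(embed.parameters(), lr=1e-3)

def train_step(anchor, positive, negative):
    # anchor/positive: variants of the same word (e.g. time-warped); negative: a different word.
    loss = triplet(embed(anchor), embed(positive), embed(negative))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# (2) Cluster embedded words into motifs and summarize a sequence by its motif histogram.
def kmeans(x, k, iters=50):
    centers = x[np.random.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(0)
    return centers

def motion_signature(word_embeddings, centers):
    """Bag-of-motifs: normalized histogram of nearest-motif assignments, order-agnostic."""
    labels = np.argmin(((word_embeddings[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Toy usage: embed random words, build 8 motifs, compare two sequences.
words = torch.randn(200, 30, 63)
z = embed(words).detach().numpy()
centers = kmeans(z, k=8)
print(motion_signature(z[:100], centers), motion_signature(z[100:], centers))
```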
We provide the first large dataset of human fixations on physical 3D objects presented in varying viewing conditions and made of different materials. Our experimental setup is carefully designed to allow for accurate calibration and measurement. We estimate a mapping from the pair of pupil positions to 3D coordinates in space and register the presented shape with the eye tracking setup. By modeling the fixated positions on 3D shapes as a probability distribution, we analyze the similarities among different conditions. The resulting data indicates that salient features depend on the viewing direction. Stable features across different viewing directions seem to be connected to semantically meaningful parts. We also show that it is possible to estimate the gaze density maps from view-dependent data. The dataset provides the necessary ground truth data for computational models of human perception in 3D.
We introduce a computational solution for cost-efficient 3D fabrication using universal building blocks. Our key idea is to employ a set of universal blocks, which can be massively prefabricated at a low cost, to quickly assemble and constitute a significant internal core of the target object, so that only the residual volume needs to be 3D printed online. We further improve the fabrication efficiency by decomposing the residual volume into a small number of printing-friendly pyramidal pieces. Computationally, we face a coupled decomposition problem: decomposing the input object into an internal core and residual, and decomposing the residual, to fulfill a combination of objectives for efficient 3D fabrication. To this end, we formulate an optimization that jointly minimizes the residual volume, the number of pyramidal residual pieces, and the amount of support waste when printing the residual pieces. To solve the optimization in a tractable manner, we start with a maximal internal core and iteratively refine it with local cuts to minimize the cost function. Moreover, to efficiently explore the large search space, we resort to cost estimates aided by pre-computation and avoid the need to explicitly construct pyramidal decompositions for each solution candidate. Results show that our method can iteratively reduce the estimated printing time and cost, as well as the support waste, saving hours of fabrication time and a substantial amount of material.
We study a new and elegant instance of geometric dissection of 2D shapes: reversible hinged dissection, which corresponds to a dual transform between two shapes where one of them can be dissected in its interior and then inverted inside-out, with hinges on the shape boundary, to reproduce the other shape, and vice versa. We call such a transform reversible inside-out transform or RIOT. Since it is rare for two shapes to possess even a rough RIOT, let alone an exact one, we develop both a RIOT construction algorithm and a quick filtering mechanism to pick, from a shape collection, potential shape pairs that are likely to possess the transform. Our construction algorithm is fully automatic. It computes an approximate RIOT between two given input 2D shapes, whose boundaries can undergo slight deformations, while the filtering scheme picks good inputs for the construction. Furthermore, we add properly designed hinges and connectors to the shape pieces and fabricate them using a 3D printer so that they can be played as an assembly puzzle. With many interesting and fun RIOT pairs constructed from shapes found online, we demonstrate that our method significantly expands the range of shapes to be considered for RIOT, a seemingly impossible shape transform, and offers a practical way to construct and physically realize these transforms.
Interlocking assemblies have a long history in the design of puzzles, furniture, architecture, and other complex geometric structures. The key defining property of interlocking assemblies is that all component parts are immobilized by their geometric arrangement, preventing the assembly from falling apart. Computer graphics research has recently contributed design tools that allow creating new interlocking assemblies. However, these tools focus on specific kinds of assemblies and explore only a limited space of interlocking configurations, which restricts their applicability for design.
In this paper, we propose a new general framework for designing interlocking assemblies. The core idea is to represent part relationships with a family of base Directional Blocking Graphs and leverage efficient graph analysis tools to compute an interlocking arrangement of parts. This avoids the exponential complexity of brute-force search. Our algorithm iteratively constructs the geometry of assembly components, taking advantage of all existing blocking relations for constructing successive parts. As a result, our approach supports a wider range of assembly forms compared to previous methods and provides significantly more design flexibility. We show that our framework facilitates efficient design of complex interlocking assemblies, including new solutions that cannot be achieved by state of the art approaches.
Existing online 3D shape repositories contain thousands of 3D models but lack photorealistic appearance. We present an approach to automatically assign high-quality, realistic appearance models to large scale 3D shape collections. The key idea is to jointly leverage three types of online data - shape collections, material collections, and photo collections, using the photos as reference to guide assignment of materials to shapes. By generating a large number of synthetic renderings, we train a convolutional neural network to classify materials in real photos, and employ 3D-2D alignment techniques to transfer materials to different parts of each shape model. Our system produces photorealistic, relightable, 3D shapes (PhotoShapes).
Augmented reality (AR) for smartphones has matured from a technology for early adopters, available only on select high-end phones, to one that is truly available to the general public. One of the key breakthroughs has been in low-compute methods for six degree of freedom (6DoF) tracking on phones using only the existing hardware (camera and inertial sensors). 6DoF tracking is the cornerstone of smartphone AR allowing virtual content to be precisely locked on top of the real world. However, to really give users the impression of believable AR, one requires mobile depth. Without depth, even simple effects such as a virtual object being correctly occluded by the real world are impossible. However, requiring a mobile depth sensor would severely restrict the access to such features. In this article, we provide a novel pipeline for mobile depth that supports a wide array of mobile phones, and uses only the existing monocular color sensor. Through several technical contributions, we provide the ability to compute low latency dense depth maps using only a single CPU core of a wide range of (medium-high) mobile phones. We demonstrate the capabilities of our approach on high-level AR applications including real-time navigation and shopping.
Current AR systems only track sparse geometric features but do not compute depth for all pixels. For this reason, most AR effects are pure overlays that can never be occluded by real objects. We present a novel algorithm that propagates sparse depth to every pixel in near real time. The produced depth maps are spatio-temporally smooth but exhibit sharp discontinuities at depth edges. This enables AR effects that can fully interact with and be occluded by the real scene. Our algorithm uses a video and a sparse SLAM reconstruction as input. It starts by estimating soft depth edges from the gradient of optical flow fields. Because optical flow is unreliable near occlusions, we compute forward and backward flow fields and fuse the resulting depth edges using a novel reliability measure. We then localize the depth edges by thinning and aligning them with image edges. Finally, we optimize the propagated depth to be smooth while encouraging discontinuities at the recovered depth edges. We present results for numerous real-world examples and demonstrate the effectiveness for several occlusion-aware AR video effects. To quantitatively evaluate our algorithm we characterize the properties that make depth maps desirable for AR applications, and present novel evaluation metrics that capture how well these are satisfied. Our results compare favorably to a set of competitive baseline algorithms in this context.
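For intuition, a small numpy sketch of edge-aware densification in the spirit of the above (not the paper's solver): known sparse depths are held fixed and diffused to the remaining pixels, with the smoothing attenuated across a given soft depth-edge map.

```python
import numpy as np

def propagate_depth(sparse_depth, edge_map, iters=500):
    """sparse_depth: (H, W), 0 where unknown; edge_map: (H, W) in [0, 1], 1 = strong depth edge.
    Jacobi-style diffusion of the known depths with edge-stopping neighbour weights."""
    known = sparse_depth > 0
    d = np.where(known, sparse_depth, sparse_depth[known].mean())
    w = 1.0 - edge_map                      # low weight across depth edges
    for _ in range(iters):
        num = np.zeros_like(d)
        den = np.zeros_like(d)
        for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
            wn = np.roll(w, shift, axis=axis)
            num += wn * np.roll(d, shift, axis=axis)
            den += wn
        d = np.where(known, sparse_depth, num / np.maximum(den, 1e-6))
    return d

# Toy usage: two depth layers separated by a vertical edge, sparse samples on both sides.
H, W = 64, 64
gt = np.where(np.arange(W)[None, :] < 32, 1.0, 3.0) * np.ones((H, 1))
sparse = np.zeros((H, W)); sparse[::8, ::8] = gt[::8, ::8]
edges = np.zeros((H, W)); edges[:, 31:33] = 1.0
edges[:, [0, -1]] = 1.0                      # block np.roll wrap-around at the image border
print(np.abs(propagate_depth(sparse, edges) - gt).mean())
```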
Holographic displays have great potential to realize mixed reality by modulating the wavefront of light in a fundamental manner. As a computational display, holographic displays offer a large degree of freedom, such as focus cue generation and vision correction. However, the limited bandwidth of the spatial light modulator imposes an inherent trade-off between the field of view and the eye-box size. To overcome this trade-off, we demonstrate the first practical eye-box expansion method for a holographic near-eye display. Instead of providing an intrinsic large exit-pupil, we shift the optical system's exit-pupil to cover the expanded eye-box area with pupil-tracking. For compact implementation, a pupil-shifting holographic optical element (PSHOE) is proposed that can reduce the form factor for exit-pupil shifting. A thorough analysis of the design parameters and display performance is provided. In particular, we provide a comprehensive analysis of the incorporation of the holographic optical element into a holographic display system. The influence of holographic optical elements on the intrinsic exit-pupil and pupil switching is revealed by numerical simulation and Wigner distribution function analysis.
The visual appearance of an object can be disguised by projecting virtual shading as if overwriting the material. However, conventional projection-mapping methods depend on markers on a target or a model of the target shape, which limits the types of targets and the visual quality. In this paper, we focus on the fact that the shading of a virtual material in a virtual scene is mainly characterized by surface normals of the target, and we attempt to realize markerless and modelless projection mapping for material representation. In order to deal with various targets, including static, dynamic, rigid, soft, and fluid objects, without any interference with visible light, we measure surface normals in the infrared region in real time and project material shading with a novel high-speed texturing algorithm in screen space. Our system achieved 500-fps high-speed projection mapping of a uniform material and a tileable-textured material with millisecond-order latency, and it realized dynamic and flexible material representation for unknown objects. We also demonstrated advanced applications and showed the expressive shading performance of our technique.
We present a system for acquiring, processing, and rendering panoramic light field still photography for display in Virtual Reality (VR). We acquire spherical light field datasets with two novel light field camera rigs designed for portable and efficient light field acquisition. We introduce a novel real-time light field reconstruction algorithm that uses a per-view geometry and a disk-based blending field. We also demonstrate how to use a light field prefiltering operation to project from a high-quality offline reconstruction model into our real-time model while suppressing artifacts. We introduce a practical approach for compressing light fields by modifying the VP9 video codec to provide high quality compression with real-time, random access decompression.
We combine these components into a complete light field system offering convenient acquisition, compact file size, and high-quality rendering while generating stereo views at 90Hz on commodity VR hardware. Using our system, we built a freely available light field experience application called Welcome to Light Fields featuring a library of panoramic light field stills for consumer VR which has been downloaded over 15,000 times.
We present a virtual reality display that is capable of generating a dense collection of depth/focal planes. This is achieved by driving a focus-tunable lens to sweep a range of focal lengths at a high frequency and, subsequently, tracking the focal length precisely at microsecond time resolutions using an optical module. Precise tracking of the focal length, coupled with a high-speed display, enables our lab prototype to generate 1600 focal planes per second. This enables a novel first-of-its-kind virtual reality multifocal display that is capable of resolving the vergence-accommodation conflict endemic to today's displays.
Streaming high quality rendering for virtual reality applications requires minimizing perceived latency. We introduce Shading Atlas Streaming (SAS), a novel object-space rendering framework suitable for streaming virtual reality content. SAS decouples server-side shading from client-side rendering, allowing the client to perform framerate upsampling and latency compensation autonomously for short periods of time. The shading information created by the server in object space is temporally coherent and can be efficiently compressed using standard MPEG encoding. Our results show that SAS compares favorably to previous methods for remote image-based rendering in terms of image quality and network bandwidth efficiency. SAS allows highly efficient parallel allocation in a virtualized-texture-like memory hierarchy, solving a common efficiency problem of object-space shading. With SAS, untethered virtual reality headsets can benefit from high quality rendering without paying in increased latency.
Addressing vergence-accommodation conflict in head-mounted displays (HMDs) requires resolving two interrelated problems. First, the hardware must support viewing sharp imagery over the full accommodation range of the user. Second, HMDs should accurately reproduce retinal defocus blur to correctly drive accommodation. A multitude of accommodation-supporting HMDs have been proposed, with three architectures receiving particular attention: varifocal, multifocal, and light field displays. These designs all extend depth of focus, but rely on computationally expensive rendering and optimization algorithms to reproduce accurate defocus blur (often limiting content complexity and interactive applications). To date, no unified framework has been proposed to support driving these emerging HMDs using commodity content. In this paper, we introduce DeepFocus, a generic, end-to-end convolutional neural network designed to efficiently solve the full range of computational tasks for accommodation-supporting HMDs. This network is demonstrated to accurately synthesize defocus blur, focal stacks, multilayer decompositions, and multiview imagery using only commonly available RGB-D images, enabling real-time, near-correct depictions of retinal blur with a broad set of accommodation-supporting HMDs.
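For context, a thin-lens sketch of the defocus blur such displays must reproduce from RGB-D input (a naive layered blur under assumed optical parameters, not DeepFocus itself, which learns this mapping with a CNN):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def circle_of_confusion_mm(depth_m, focus_m, focal_mm=17.0, aperture_mm=4.0):
    """Thin-lens circle-of-confusion diameter (mm) for an object at depth_m when
    the eye/camera is focused at focus_m."""
    f, A = focal_mm / 1000.0, aperture_mm / 1000.0
    return A * np.abs(depth_m - focus_m) / depth_m * f / (focus_m - f) * 1000.0

def naive_defocus(rgb, depth_m, focus_m, px_per_mm=60.0, n_layers=8):
    """Very rough layered blur: each depth slab is blurred by a Gaussian whose size
    follows its circle of confusion."""
    out = np.zeros_like(rgb)
    edges = np.linspace(depth_m.min(), depth_m.max(), n_layers + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (depth_m >= lo) & (depth_m <= hi)
        coc_px = circle_of_confusion_mm(0.5 * (lo + hi), focus_m) * px_per_mm
        blurred = gaussian_filter(rgb, sigma=(coc_px / 2.0, coc_px / 2.0, 0.0))
        out[mask] = blurred[mask]
    return out

# Toy usage: a depth ramp, focused at 1 m.
rgb = np.random.rand(128, 128, 3)
depth = np.linspace(0.3, 3.0, 128)[None, :].repeat(128, axis=0)
print(naive_defocus(rgb, depth, focus_m=1.0).shape)
```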
We propose an inverse strategy for modeling thin elastic shells physically, just from the observation of their geometry. Our algorithm takes as input an arbitrary target mesh, and interprets this configuration automatically as a stable equilibrium of a shell simulator under gravity and frictional contact constraints with a given external object. Unknowns are the natural shape of the shell (i.e., its shape without external forces) and the frictional contact forces at play, while the material properties (mass density, stiffness, friction coefficients) can be freely chosen by the user. Such an inverse problem can be formulated as an ill-posed nonlinear system subject to conical constraints. To select and compute a plausible solution, our inverse solver proceeds in two steps. In a first step, contacts are reduced to frictionless bilateral constraints and a natural shape is retrieved using the adjoint method. The second step uses this result as an initial guess and adjusts each bilateral force so that it projects onto the admissible Coulomb friction cone, while preserving global equilibrium. To better guide minimization towards the target, these two steps are applied iteratively using a degressive regularization of the shell energy.
We validate our approach on simulated examples with reference material parameters, and show that our method still converges well for material parameters lying within a reasonable range around the reference, and even in the case of arbitrary meshes that do not come from a simulation. We finally demonstrate practical inversion results on complex shell geometries freely modeled by an artist or automatically captured from real objects, such as posed garments or soft accessories.
We describe an interactive design tool for authoring, simulating, and adjusting yarn-level patterns for knitted and woven cloth. To achieve interactive performance for notoriously slow yarn-level simulations, we propose two acceleration schemes: (a) yarn-level periodic boundary conditions that enable the restricted simulation of only small periodic patches, thereby exploiting the spatial repetition of many cloth patterns in cardinal directions, and (b) a highly parallel GPU solver for efficient yarn-level simulation of the small patch. Our system supports interactive pattern editing and simulation, and runtime modification of parameters. To adjust the amount of material used (yarn take-up) we support "on the fly" modification of (a) local yarn rest-length adjustments for pattern specific edits, e.g., to tighten slip stitches, and (b) global yarn length by way of a novel yarn-radius similarity transformation. We demonstrate the tool's ability to support interactive modeling, by novice users, of a wide variety of yarn-level knit and woven patterns. Finally, to validate our approach, we compare dozens of generated patterns against reference images of actual woven or knitted cloth samples, and we release this corpus of digital patterns and simulated models as a public dataset to support future comparisons.
Designing real and virtual garments is becoming extremely demanding with rapidly changing fashion trends and an increasing need for synthesizing realistically dressed digital humans for various applications. This necessitates creating simple and effective workflows to facilitate authoring sewing patterns customized to garment and target body shapes to achieve desired looks. The traditional workflow involves a trial-and-error procedure wherein a mannequin is draped to judge the resultant folds and the sewing pattern is iteratively adjusted until the desired look is achieved. This requires time and experience. Instead, we present a data-driven approach wherein the user directly indicates desired fold patterns simply by sketching while our system estimates corresponding garment and body shape parameters at interactive rates. The recovered parameters can then be further edited and the updated draped garment previewed. Technically, we achieve this via a novel shared shape space that allows the user to seamlessly specify desired characteristics across multimodal input without requiring garment simulation to be run at design time. We evaluate our approach qualitatively via a user study and quantitatively against test datasets, and demonstrate how our system can generate a rich quality of on-body garments targeted for a range of body shapes while achieving desired fold characteristics. Code and data are available at our project webpage.
We present an incremental collision handling algorithm for GPU-based interactive cloth simulation. Our approach exploits the spatial and temporal coherence between successive iterations of an optimization-based solver for collision response computation. We present an incremental continuous collision detection algorithm that keeps track of deforming vertices and combine it with spatial hashing. We use a non-linear GPU-based impact zone solver to resolve the penetrations. We combine our collision handling algorithm with implicit integration to use large time steps. Our overall algorithm, I-Cloth, can simulate complex cloth deformation with a few hundred thousand vertices at 2 - 8 frames per second on a commodity GPU. We highlight its performance on different benchmarks and observe up to 7 - 10X speedup over prior algorithms.
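A minimal sketch of the spatial-hashing ingredient mentioned above (CPU Python rather than I-Cloth's GPU implementation): vertices are binned into uniform grid cells, and only vertices in the same or adjacent cells are considered candidate collision pairs.

```python
import numpy as np
from collections import defaultdict
from itertools import product

def build_spatial_hash(positions, cell_size):
    """Map each integer grid cell to the indices of the vertices it contains."""
    grid = defaultdict(list)
    cells = np.floor(positions / cell_size).astype(int)
    for i, c in enumerate(map(tuple, cells)):
        grid[c].append(i)
    return grid, cells

def candidate_pairs(positions, cell_size):
    """Vertex pairs whose cells are identical or adjacent (27-cell neighbourhood);
    only these need exact proximity/collision tests."""
    grid, cells = build_spatial_hash(positions, cell_size)
    pairs = set()
    for i, c in enumerate(map(tuple, cells)):
        for offset in product((-1, 0, 1), repeat=3):
            key = (c[0] + offset[0], c[1] + offset[1], c[2] + offset[2])
            for j in grid.get(key, ()):
                if j > i:
                    pairs.add((i, j))
    return pairs

# Toy usage: 1000 random vertices, candidate pairs within ~2 cm cells.
pts = np.random.rand(1000, 3)
print(len(candidate_pairs(pts, cell_size=0.02)))
```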
Creating realistic 3D hairs that closely match the real-world inputs remains challenging. With the increasing popularity of lightweight depth cameras featured in devices such as iPhone X, Intel RealSense and DJI drones, depth cues can be very helpful in consumer applications, for example, the Animated Emoji. In this paper, we introduce a fully automatic, data-driven approach to model the hair geometry and compute a complete strand-level 3D hair model that closely resembles the input from a single RGB-D camera. Our method heavily exploits the geometric cues contained in the depth channel and leverages exemplars in a 3D hair database for high-fidelity hair synthesis. The core of our method is a local-similarity based search and synthesis algorithm that simultaneously reasons about the hair geometry, strand connectivity, strand orientation, and hair structural plausibility. We demonstrate the efficacy of our method using a variety of complex hairstyles and compare our method with prior art.
Imagine taking a selfie video with your mobile phone and getting as output a 3D model of your head (face and 3D hair strands) that can be later used in VR, AR, and any other domain. State of the art hair reconstruction methods allow either a single photo (thus compromising 3D quality) or multiple views, but they require manual user interaction (manual hair segmentation and capture of fixed camera views that span full 360°). In this paper, we describe a system that can completely automatically create a reconstruction from any video (even a selfie video), and we don't require specific views, since taking your -90°, 90°, and full back views is not feasible in a selfie capture.
At the core of our system, in addition to the automatization components, hair strands are estimated and deformed in 3D (rather than in 2D as in prior state-of-the-art methods), thus enabling superior results. We provide qualitative, quantitative, and Mechanical Turk human studies that support the proposed system, and show results on a diverse variety of videos (8 different celebrity videos, 9 selfie mobile videos, spanning age, gender, hair length, type, and styling).
Recreating the appearance of humans in virtual environments for the purpose of movie, video game, or other types of production involves the acquisition of a geometric representation of the human body and its scattering parameters which express the interaction between the geometry and light propagated throughout the scene. The appearance of teeth is defined not only by light and surface interaction, but also by their internal geometry and the intra-oral environment, posing a unique set of challenges. Therefore, we present a system specifically designed for capturing the optical properties of live human teeth such that they can be realistically re-rendered in computer graphics. We acquire our data in vivo in a conventional multiple camera and light source setup and use exact geometry segmented from intra-oral scans. To simulate the complex interaction of light in the oral cavity during inverse rendering we employ a novel pipeline based on derivative path tracing with respect to both optical properties and geometry of the inner dentin surface. The resulting estimates of the global derivatives are used to extract parameters in a joint numerical optimization. The final appearance faithfully recreates the acquired data and can be directly used in conventional path tracing frameworks for rendering virtual humans.
Recent advances in single-view 3D hair digitization have made the creation of high-quality CG characters scalable and accessible to end-users, enabling new forms of personalized VR and gaming experiences. To handle the complexity and variety of hair structures, most cutting-edge techniques rely on the successful retrieval of a particular hair model from a comprehensive hair database. Not only are the aforementioned data-driven methods storage intensive, but they are also prone to failure for highly unconstrained input images, complicated hairstyles, and failed face detection. Instead of using a large collection of 3D hair models directly, we propose to represent the manifold of 3D hairstyles implicitly through a compact latent space of a volumetric variational autoencoder (VAE). This deep neural network is trained with volumetric orientation field representations of 3D hair models and can synthesize new hairstyles from a compressed code. To enable end-to-end 3D hair inference, we train an additional embedding network to predict the code in the VAE latent space from any input image. Strand-level hairstyles can then be generated from the predicted volumetric representation. Our fully automatic framework does not require any ad-hoc face fitting, intermediate classification and segmentation, or hairstyle database retrieval. Our hair synthesis approach is significantly more robust and can handle a much wider variation of hairstyles than state-of-the-art data-driven hair modeling techniques with challenging inputs, including photos that are low-resolution, overexposed, or contain extreme head poses. The storage requirements are minimal and a 3D hair model can be produced from an image in a second. Our evaluations also show that successful reconstructions are possible from highly stylized cartoon images, non-human subjects, and pictures taken from behind a person. Our approach is particularly well suited for continuous and plausible hair interpolation between very different hairstyles.
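A toy PyTorch sketch of a volumetric VAE of the kind described above, under assumed, much smaller dimensions (a 32³ three-channel orientation volume and a 64-D latent code; the paper's networks and losses differ): the encoder predicts a mean and log-variance, a reparameterized code is decoded, and training combines reconstruction and KL terms.

```python
import torch
import torch.nn as nn

class VolumetricVAE(nn.Module):
    """Toy 3D-convolutional VAE over a 3-channel orientation volume of size 32^3."""
    def __init__(self, latent=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(3, 16, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten())
        self.to_mu = nn.Linear(32 * 8 ** 3, latent)
        self.to_logvar = nn.Linear(32 * 8 ** 3, latent)
        self.dec = nn.Sequential(
            nn.Linear(latent, 32 * 8 ** 3), nn.Unflatten(1, (32, 8, 8, 8)),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 3, 4, stride=2, padding=1))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)    # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar, beta=1e-3):
    rec = nn.functional.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kl

model = VolumetricVAE()
vol = torch.randn(2, 3, 32, 32, 32)       # stand-in for volumetric orientation fields
recon, mu, logvar = model(vol)
print(vae_loss(recon, vol, mu, logvar).item())
```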
Object functionality is often expressed through part articulation - as when the two rigid parts of a pair of scissors pivot against each other to perform the cutting function. Such articulations are often similar across objects within the same functional category. In this paper we explore how the observation of different articulation states provides evidence for part structure and motion of 3D objects. Our method takes as input a pair of unsegmented shapes representing two different articulation states of two functionally related objects, and induces their common parts along with their underlying rigid motion. This is a challenging setting, as we assume no prior shape structure, no prior shape category information, no consistent shape orientation, the articulation states may belong to objects of different geometry, plus we allow inputs to be noisy and partial scans, or point clouds lifted from RGB images. Our method learns a neural network architecture with three modules that respectively propose correspondences, estimate 3D deformation flows, and perform segmentation. To achieve optimal performance, our architecture alternates between correspondence, deformation flow, and segmentation prediction iteratively in an ICP-like fashion. Our results demonstrate that our method significantly outperforms state-of-the-art techniques in the task of discovering articulated parts of objects. In addition, our part induction is object-class agnostic and successfully generalizes to new and unseen objects.
A majority of stock 3D models in modern shape repositories are assembled with many fine-grained components. Such data largely results from the component-wise modeling process widely practiced by human modelers. These modeling components thus inherently reflect some function-based shape decomposition the artist had in mind during modeling. On the other hand, modeling components represent an over-segmentation since a functional part is usually modeled as a multi-component assembly. Based on these observations, we advocate that labeled segmentation of stock 3D models should not overlook the modeling components and propose a learning solution to grouping and labeling of the fine-grained components. However, directly characterizing the shape of individual components for the purpose of labeling is unreliable, since they can be arbitrarily tiny and semantically meaningless. We propose to generate part hypotheses from the components based on a hierarchical grouping strategy, and perform labeling on those part groups instead of directly on the components. Part hypotheses are mid-level elements which are more probable to carry semantic information. A multi-scale 3D convolutional neural network is trained to extract context-aware features for the hypotheses. To accomplish a labeled segmentation of the whole shape, we formulate higher-order conditional random fields (CRFs) to infer an optimal label assignment for all components. Extensive experiments demonstrate that our method achieves consistently robust labeling results on raw 3D models from public shape repositories. Our work also contributes the first benchmark for component-wise labeling.
We introduce SCORES, a recursive neural network for shape composition. Our network takes as input sets of parts from two or more source 3D shapes and a rough initial placement of the parts. It outputs an optimized part structure for the composed shape, leading to high-quality geometry construction. A unique feature of our composition network is that it is not merely learning how to connect parts. Our goal is to produce a coherent and plausible 3D shape, despite large incompatibilities among the input parts. The network may significantly alter the geometry and structure of the input parts and synthesize a novel shape structure based on the inputs, while adding or removing parts to minimize a structure plausibility loss. We design SCORES as a recursive autoencoder network. During encoding, the input parts are recursively grouped to generate a root code. During synthesis, the root code is decoded, recursively, to produce a new, coherent part assembly. Assembled shape structures may be novel, with little global resemblance to training exemplars, yet have plausible substructures. SCORES therefore learns a hierarchical substructure shape prior based on per-node losses. It is trained on structured shapes from ShapeNet, and is applied iteratively to reduce the plausibility loss. We show results of shape composition from multiple sources over different categories of man-made shapes and compare with state-of-the-art alternatives, demonstrating that our network can significantly expand the range of composable shapes for assembly-based modeling.
We introduce a novel framework for using natural language to generate and edit 3D indoor scenes, harnessing scene semantics and text-scene grounding knowledge learned from large annotated 3D scene databases. The advantage of natural language editing interfaces is strongest when performing semantic operations at the sub-scene level, acting on groups of objects. We learn how to manipulate these sub-scenes by analyzing existing 3D scenes. We perform edits by first parsing a natural language command from the user and transforming it into a semantic scene graph that is used to retrieve corresponding sub-scenes from the databases that match the command. We then augment this retrieved sub-scene by incorporating other objects that may be implied by the scene context. Finally, a new 3D scene is synthesized by aligning the augmented sub-scene with the user's current scene, where new objects are spliced into the environment, possibly triggering appropriate adjustments to the existing scene arrangement. A suggestive modeling interface with multiple interpretations of user commands is used to alleviate ambiguities in natural language. We conduct studies comparing our approach against both prior text-to-scene work and artist-made scenes and find that our method significantly outperforms prior work and is comparable to handmade scenes even when complex and varied natural sentences are used.
While computer-aided design is a major part of many modern manufacturing pipelines, the design files typically generated describe raw geometry. Lost in this representation is the procedure by which these designs were generated. In this paper, we present a method for reverse-engineering the process by which 3D models may have been generated, in the language of constructive solid geometry (CSG). Observing that CSG is a formal grammar, we formulate this inverse CSG problem as a program synthesis problem. Our solution is an algorithm that couples geometric processing with state-of-the-art program synthesis techniques. In this scheme, geometric processing is used to convert the mixed discrete and continuous domain of CSG trees to a pure discrete domain where modern program synthesizers excel. We demonstrate the efficiency and scalability of our algorithm on several different examples, including those with over 100 primitive parts. We show that our algorithm is able to find simple programs which are close to the ground truth, and demonstrate our method's applicability in mesh re-editing. Finally, we compare our method to prior state-of-the-art. We demonstrate that our algorithm dominates previous methods in terms of resulting CSG compactness and runtime, and can handle far more complex input meshes than any previous method.
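A small sketch of the discretization idea, with assumed primitive types: a candidate CSG expression is evaluated as inside/outside occupancy on sample points, so that agreement with the target geometry becomes a purely discrete score a program synthesizer can optimize.

```python
import numpy as np

# Primitives and CSG operations evaluated as boolean occupancy over sample points.
def sphere(center, r):
    return lambda p: np.linalg.norm(p - center, axis=1) <= r

def box(lo, hi):
    return lambda p: np.all((p >= lo) & (p <= hi), axis=1)

def union(a, b):      return lambda p: a(p) | b(p)
def intersect(a, b):  return lambda p: a(p) & b(p)
def subtract(a, b):   return lambda p: a(p) & ~b(p)

def agreement(candidate, target_occupancy, points):
    """Fraction of sample points where a candidate CSG program matches the target."""
    return np.mean(candidate(points) == target_occupancy)

# Toy usage: the target is a box with a spherical hole; score two candidate programs.
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(20000, 3))
cube = box(np.full(3, -0.8), np.full(3, 0.8))
target = subtract(cube, sphere(np.zeros(3), 0.5))(pts)
print(agreement(subtract(cube, sphere(np.zeros(3), 0.5)), target, pts),  # exact match
      agreement(cube, target, pts))                                      # missing the hole
```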
We introduce a generative model for 3D man-made shapes. The presented method takes a global-to-local (G2L) approach. A generative adversarial network (GAN) is built first to construct the overall structure of the shape, segmented and labeled into parts. A novel conditional auto-encoder (AE) is then augmented to act as a part-level refiner. The GAN, associated with additional local discriminators and quality losses, synthesizes a voxel-based model, and assigns the voxels with part labels that are represented in separate channels. The AE is trained to amend the initial synthesis of the parts, yielding more plausible part geometries. We also introduce new means to measure and evaluate the performance of an adversarial generative model. We demonstrate that our global-to-local generative model produces significantly better results than a plain three-dimensional GAN, in terms of both their shape variety and the distribution with respect to the training data.
This paper introduces a 3D shape generative model based on deep neural networks. A new image-like (i.e., tensor) data representation for genus-zero 3D shapes is devised. It is based on the observation that complicated shapes can be well represented by multiple parameterizations (charts), each focusing on a different part of the shape. The new tensor data representation is used as input to Generative Adversarial Networks for the task of 3D shape generation.
The 3D shape tensor representation is based on a multi-chart structure that enjoys a shape covering property and scale-translation rigidity. Scale-translation rigidity facilitates high quality 3D shape learning and guarantees unique reconstruction. The multi-chart structure uses as input a dataset of 3D shapes (with arbitrary connectivity) and a sparse correspondence between them. The output of our algorithm is a generative model that learns the shape distribution and is able to generate novel shapes, interpolate shapes, and explore the generated shape space. The effectiveness of the method is demonstrated for the task of anatomic shape generation including human body and bone (teeth) shape generation.
Coarse building mass models are now routinely generated at scales ranging from individual buildings to whole cities. Such models can be abstracted from raw measurements, generated procedurally, or created manually. However, these models typically lack any meaningful geometric or texture details, making them unsuitable for direct display. We introduce the problem of automatically and realistically decorating such models by adding semantically consistent geometric details and textures. Building on the recent success of generative adversarial networks (GANs), we propose FrankenGAN.
We present an Adaptive Octree-based Convolutional Neural Network (Adaptive O-CNN) for efficient 3D shape encoding and decoding. Different from volumetric-based or octree-based CNN methods that represent a 3D shape with voxels in the same resolution, our method represents a 3D shape adaptively with octants at different levels and models the 3D shape within each octant with a planar patch. Based on this adaptive patch-based representation, we propose an Adaptive O-CNN encoder and decoder for encoding and decoding 3D shapes. The Adaptive O-CNN encoder takes the planar patch normal and displacement as input and performs 3D convolutions only at the octants at each level, while the Adaptive O-CNN decoder infers the shape occupancy and subdivision status of octants at each level and estimates the best plane normal and displacement for each leaf octant. As a general framework for 3D shape analysis and generation, the Adaptive O-CNN not only reduces the memory and computational cost, but also offers better shape generation capability than the existing 3D-CNN approaches. We validate Adaptive O-CNN in terms of efficiency and effectiveness on different shape analysis and generation tasks, including shape classification, 3D autoencoding, shape prediction from a single image, and shape completion for noisy and incomplete point clouds.
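As a rough illustration of the per-octant planar-patch representation (a standalone sketch with an assumed residual test, not the octree CNN itself): fit a least-squares plane to the points in an octant, and subdivide when the planar fit is poor.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane for the points in an octant: (unit normal, displacement, residual)."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                            # direction of least variance
    displacement = normal @ centroid           # plane: normal . x = displacement
    residual = np.abs((points - centroid) @ normal).max()
    return normal, displacement, residual

def should_subdivide(points, octant_size, tol=0.02):
    """Subdivide the octant when a single planar patch is a poor fit at this level."""
    if len(points) < 4:
        return False
    return fit_plane(points)[2] > tol * octant_size

# Toy usage: nearly planar points fit well; a curved patch triggers subdivision.
rng = np.random.default_rng(1)
xy = rng.uniform(0.0, 1.0, size=(200, 2))
flat = np.c_[xy, 0.3 + 0.001 * rng.standard_normal(200)]
curved = np.c_[xy, 0.5 * (xy[:, 0] ** 2 + xy[:, 1] ** 2)]
print(should_subdivide(flat, 1.0), should_subdivide(curved, 1.0))
```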
We introduce C
We introduce a learning-based method to reconstruct objects acquired in a casual handheld scanning setting with a depth camera. Our method is based on two core components. First, a deep network that provides a semantic segmentation and labeling of the frames of an input RGBD sequence. Second, an alignment and reconstruction method that employs the semantic labeling to reconstruct the acquired object from the frames. We demonstrate that the use of a semantic labeling improves the reconstructions of the objects, when compared to methods that use only the depth information of the frames. Moreover, since training a deep network requires a large amount of labeled data, a key contribution of our work is an active self-learning framework to simplify the creation of the training data. Specifically, we iteratively predict the labeling of frames with the neural network, reconstruct the object from the labeled frames, and evaluate the confidence of the labeling, to incrementally train the neural network while requiring only a small amount of user-provided annotations. We show that this method enables the creation of data for training a neural network with high accuracy, while requiring only little manual effort.
The advent of consumer depth cameras has incited the development of a new cohort of algorithms tackling challenging computer vision problems. The primary reason is that depth provides direct geometric information that is largely invariant to texture and illumination. As such, substantial progress has been made in human and object pose estimation, 3D reconstruction and simultaneous localization and mapping. Most of these algorithms naturally benefit from the ability to accurately track the pose of an object or scene of interest from one frame to the next. However, commercially available depth sensors (typically running at 30fps) can allow for large inter-frame motions to occur that make such tracking problematic. A high frame rate depth camera would thus greatly ameliorate these issues, and further increase the tractability of these computer vision problems. Nonetheless, the depth accuracy of recent systems for high-speed depth estimation [Fanello et al. 2017b] can degrade at high frame rates. This is because the active illumination employed produces a low SNR and thus a high exposure time is required to obtain a dense accurate depth image. Furthermore, in the presence of rapid motion, longer exposure times produce artifacts due to motion blur and necessitate a lower frame rate, which introduces large inter-frame motion that often yields tracking failures. In contrast, this paper proposes a novel combination of hardware and software components that avoids the need to compromise between a dense accurate depth map and a high frame rate. We document the creation of a full 3D capture system for high speed and quality depth estimation, and demonstrate its advantages in a variety of tracking and reconstruction tasks. We extend the state of the art active stereo algorithm presented in Fanello et al. [2017b] by adding a space-time feature in the matching phase. We also propose a machine learning based depth refinement step that is an order of magnitude faster than traditional postprocessing methods. We quantitatively and qualitatively demonstrate the benefits of the proposed algorithms in the acquisition of geometry in motion. Our pipeline executes in 1.1ms leveraging modern GPUs and off-the-shelf cameras and illumination components. We show how the sensor can be employed in many different applications, from [non-]rigid reconstructions to hand/face tracking. Further, we show many advantages over existing state-of-the-art depth camera technologies beyond frame rate, including latency, motion artifacts, multi-path errors, and multi-sensor interference.
The image processing pipeline boasts a wide variety of complex filters and effects. Translating an individual effect to operate on 3D surface geometry inevitably results in a bespoke algorithm. Instead, we propose a general-purpose back-end optimization that allows users to edit an input 3D surface by simply selecting an off-the-shelf image processing filter. We achieve this by constructing a differentiable triangle mesh renderer, with which we can back-propagate changes in the image domain to the 3D mesh vertex positions. The given image processing technique is applied to the entire shape via stochastic snapshots of the shape: hence, we call our method Paparazzi. We provide simple yet important design considerations to construct the Paparazzi renderer and optimization algorithms. The power of this rendering-based surface editing is demonstrated via the variety of image processing filters we apply. Each application uses an off-the-shelf implementation of an image processing method without requiring modification to the core Paparazzi algorithm.
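A toy analogue of this rendering-based editing loop, assuming a differentiable height-field shader in PyTorch in place of the paper's triangle-mesh renderer: shade the geometry, apply an off-the-shelf image-space operation (here a simple blur) to obtain a target image, and back-propagate the image-space difference to the geometry parameters.

```python
import torch
import torch.nn.functional as F

def shade(height):
    """Differentiable Lambertian shading of a height field from finite-difference normals."""
    dzdx = height[:, 1:, :-1] - height[:, :-1, :-1]
    dzdy = height[:, :-1, 1:] - height[:, :-1, :-1]
    n = torch.stack([-dzdx, -dzdy, torch.ones_like(dzdx)], dim=1)
    n = n / n.norm(dim=1, keepdim=True)
    light = torch.tensor([0.3, 0.3, 0.9])
    return (n * light.view(1, 3, 1, 1)).sum(1).clamp(min=0.0)   # (B, H-1, W-1)

height = (0.05 * torch.randn(1, 64, 64)).requires_grad_(True)
opt = torch.optim.Adam([height], lr=1e-2)

for step in range(200):
    image = shade(height)
    # Image-space "filter": here a blur; any image-processing result could serve as target.
    target = F.avg_pool2d(image.unsqueeze(1), 5, stride=1, padding=2).squeeze(1).detach()
    loss = ((image - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(loss.item())
```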
Gradient-based methods are becoming increasingly important for computer graphics, machine learning, and computer vision. The ability to compute gradients is crucial to optimization, inverse problems, and deep learning. In rendering, the gradient is required with respect to variables such as camera parameters, light sources, scene geometry, or material appearance. However, computing the gradient of rendering is challenging because the rendering integral includes visibility terms that are not differentiable. Previous work on differentiable rendering has focused on approximate solutions. They often do not handle secondary effects such as shadows or global illumination, or they do not provide the gradient with respect to variables other than pixel coordinates.
We introduce a general-purpose differentiable ray tracer, which, to our knowledge, is the first comprehensive solution that is able to compute derivatives of scalar functions over a rendered image with respect to arbitrary scene parameters such as camera pose, scene geometry, materials, and lighting parameters. The key to our method is a novel edge sampling algorithm that directly samples the Dirac delta functions introduced by the derivatives of the discontinuous integrand. We also develop efficient importance sampling methods based on spatial hierarchies. Our method can generate gradients in times running from seconds to minutes depending on scene complexity and desired precision.
We interface our differentiable ray tracer with the deep learning library PyTorch and show prototype applications in inverse rendering and the generation of adversarial examples for neural networks.
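A one-dimensional illustration of the edge-sampling idea (not the renderer itself): for an integral whose integrand has a step discontinuity at a parameter-dependent location, area samples contribute no gradient, and the derivative comes entirely from an explicit boundary term evaluated at the edge.

```python
import numpy as np

g = lambda x: np.cos(3.0 * x) + 2.0          # smooth "shading" integrand
theta = 0.6                                  # parameter controlling the edge location

def integral(theta, n=200000):
    """I(theta) = integral_0^1 [x < theta] g(x) dx, estimated with area (Monte Carlo) samples."""
    x = np.random.rand(n)
    return np.mean((x < theta) * g(x))

# Differentiating the area samples w.r.t. theta gives zero almost surely, because the
# indicator is locally constant away from the edge x = theta.
area_term = 0.0
# Edge sampling adds the boundary (Dirac delta) term explicitly: dI/dtheta = g(theta).
edge_term = g(theta)

def finite_difference(theta, eps=1e-2, n=200000):
    x = np.random.rand(n)                    # common random numbers for both evaluations
    return np.mean(((x < theta + eps).astype(float) - (x < theta - eps).astype(float)) * g(x)) / (2 * eps)

print(area_term + edge_term, finite_difference(theta))
```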
Finding good global importance sampling strategies for Monte Carlo light transport is challenging. While estimators using local methods (such as BSDF sampling or next event estimation) often work well in the majority of a scene, small regions in path space can be sampled insufficiently (e.g. a reflected caustic). We propose a novel data-driven guided sampling method which selectively adapts to such problematic regions and complements the unguided estimator. It is based on complete transport paths, i.e. is able to resolve the correlation due to BSDFs and free flight distances in participating media. It is conceptually simple and places anisotropic truncated Gaussian distributions around guide paths to reconstruct a continuous probability density function (guided PDF). Guide paths are iteratively sampled from the guided as well as the unguided PDF and only recorded if they cause high variance in the current estimator. While plain Monte Carlo samples paths independently and Markov chain-based methods perturb a single current sample, we determine the reconstruction kernels by a set of neighbouring paths. This enables local exploration of the integrand without detailed balance constraints or the need for analytic derivatives. We show that our method can decompose the path space into a region that is well sampled by the unguided estimator and one that is handled by the new guided sampler. In realistic scenarios, we show 4× speedups over the unguided sampler.
We present in this paper a generic and parameter-free algorithm to efficiently build a wide variety of optical components, such as mirrors or lenses, that satisfy some light energy constraints. In all of our problems, one is given a collimated or point light source and a desired illumination after reflection or refraction, and the goal is to design the geometry of a mirror or lens which transports exactly the light emitted by the source onto the target. We first propose a general framework and show that eight different optical component design problems amount to solving a light energy conservation equation that involves the computation of visibility diagrams. We then show that these diagrams all have the same structure and can be obtained by intersecting a 3D Power diagram with a planar or spherical domain. This allows us to propose an efficient and fully generic algorithm capable of solving these eight optical component design problems. The support of the prescribed target illumination can be a set of directions or a set of points located at a finite distance. Our solutions satisfy design constraints such as convexity or concavity. We show the effectiveness of our algorithm on simulated and fabricated examples.
We develop a new theory of volumetric light transport for media with non-exponential free-flight distributions. Recent insights from atmospheric sciences and neutron transport demonstrate that such distributions arise in the presence of correlated scatterers, which are naturally produced by processes such as cloud condensation and fractal-pattern formation. Our theory formulates a non-exponential path integral as the result of averaging stochastic classical media, and we introduce practical models to solve the resulting averaging problem efficiently. Our theory results in a generalized path integral which allows us to handle non-exponential media using the full range of Monte Carlo rendering algorithms while enriching the range of achievable appearance. We propose parametric models for controlling the statistical correlations by leveraging work on stochastic processes, and we develop a method to combine such unresolved correlations (and the resulting non-exponential free-flight behavior) with explicitly modeled macroscopic heterogeneity. This provides a powerful authoring approach where artists can freely design the shape of the attenuation profile separately from the macroscopic heterogeneous density, while our theory provides a physically consistent interpretation in terms of a path space integral. We address important considerations for graphics including reciprocity and bidirectional rendering algorithms, all in the presence of surfaces and correlated media.
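For intuition, the snippet below samples free-flight distances under one simple non-exponential transmittance, a long-tailed power-law family chosen purely for illustration (it is not one of the paper's proposed parametric models); it reduces to the classical exponential law as the correlation parameter grows.

```python
# Toy illustration of free-flight sampling under a non-exponential transmittance
# T(t) = (1 + sigma_t * t / a)**(-a), a long-tailed family used only as an example.
import numpy as np

rng = np.random.default_rng(1)
sigma_t, a = 1.0, 2.0                       # extinction scale and correlation parameter

def transmittance(t):
    return (1.0 + sigma_t * t / a) ** (-a)

def sample_free_flight(u):
    # Invert u = T(t) analytically to draw a collision distance.
    return (a / sigma_t) * (u ** (-1.0 / a) - 1.0)

u = rng.random(100000)
t = sample_free_flight(u)

# Check: the empirical survival function of the sampled distances matches T(t).
for d in [0.5, 1.0, 2.0, 4.0]:
    print(d, (t > d).mean(), transmittance(d))
```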
Many geometric quantities can be computed efficiently for convex meshes. For general meshes, methods for approximate convex decomposition have been developed that decompose a static, non-convex object into a small set of approximately convex parts. The convex hulls of those parts can then be used as a piecewise convex approximation to the original mesh.
While previous work was only concerned with static meshes, we present a method for decomposing animated 3D meshes into temporally coherent approximately convex parts. Given a mesh and several training frames---that is, different spatial configurations of its vertices---we precompute an approximate convex decomposition that is independent of any specific frame. Such a decomposition can be transferred in real-time to novel, unseen frames. We apply our method to a variety of pre-animated meshes as well as a 3D character interactively controlled by a user's body pose. We further demonstrate that our method enables real-time physics simulations to interact with animated meshes.
We present a novel framework for creating Möbius-invariant subdivision operators with a simple conversion of existing linear subdivision operators. By doing so, we create a wide variety of subdivision surfaces that have properties derived from Möbius geometry; namely, reproducing spheres, circular arcs, and Möbius regularity. Our method is based on establishing a canonical form for each 1-ring in the mesh, representing the class of all 1-rings that are Möbius equivalent to that 1-ring. We perform a chosen linear subdivision operation on these canonical forms, and blend the positions contributed from adjacent 1-rings, using two novel Möbius-invariant operators, into new face and edge points. The generality of the method allows for easy coarse-to-fine mesh editing with diverse polygonal patterns, and with exact reproduction of circular and spherical features. Our operators are in closed-form and their computation is as local as the computation of the linear operators they correspond to, allowing for efficient subdivision mesh editing and optimization.
Discrete orthogonal geodesic nets (DOGs) are a quad mesh analogue of developable surfaces. In this work we study continuous deformations on these discrete objects. Our main theoretical contribution is the characterization of the shape space of DOGs for a given net connectivity. We show that generally, this space is locally a manifold of a fixed dimension, apart from a set of singularities, implying that DOGs are continuously deformable. Smooth flows can be constructed by a smooth choice of vectors on the manifold's tangent spaces, selected to minimize a desired objective function under a given metric. We show how to compute such vectors by solving a linear system, and we use our findings to devise a geometrically meaningful way to handle singular points. We base our shape space metric on a novel DOG Laplacian operator, which is proved to converge under sampling of an analytical orthogonal geodesic net. We further show how to extend the shape space of DOGs by supporting creases and curved folds and apply the developed tools in an editing system for developable surfaces that supports arbitrary bending, stretching, cutting, (curved) folds, as well as smoothing and subdivision operations.
Space coordinates offer an elegant, scalable and versatile framework to propagate (multi-)scalar functions from the boundary vertices of a 3-manifold, often called a cage, within its volume. These generalizations of the barycentric coordinate system have progressively expanded the range of eligible cages to triangle and planar polygon surface meshes with arbitrary topology, concave regions and a spatially-varying sampling ratio, while preserving a smooth diffusion of the prescribed on-surface functions. In spite of their potential for major computer graphics applications such as freeform deformation or volume texturing, current space coordinate systems have had only a moderate impact in applications. This largely follows from the constraint that cages must usually be composed of triangles only, while many application scenarios favor arbitrary (non-planar) quad meshes for their ability to align the surface structure with features and to naturally cope with anisotropic sampling. Currently, to use space coordinates with arbitrary quad cages, one must triangulate them, which results in large propagation distortion. Instead, we propose a generalization of a popular coordinate system - Mean Value Coordinates - to quad and tri-quad cages, bridging the gap between high-quality coarse meshing and volume diffusion through space coordinates. Our method can process non-planar quads, comes with a closed-form solution free from global optimization and reproduces the expected behavior of Mean Value Coordinates, namely smoothness within the cage volume and continuity everywhere. As a result, we show how these coordinates compare favorably to classical space coordinates on triangulated quad cages, in particular for freeform deformation.
A common operation in geometry processing is solving symmetric and positive semi-definite systems on a subset of a mesh, with conditions for the vertices at the boundary of the region. This is commonly done by setting up the linear system for the sub-mesh, factorizing the system (potentially applying a preordering to improve the sparsity of the factors), and then solving by back-substitution. This approach suffers from a comparatively high setup cost for each local operation. We propose to reuse factorizations defined on the full mesh to solve linear problems on sub-meshes. We show how an update on sparse matrices can be performed in a particularly efficient way to obtain the factorization of the operator on a sub-mesh, significantly outperforming general factor updates and complete refactorization. We analyze the resulting speedup for a variety of situations and demonstrate that our method outperforms factorization of a new matrix by a factor of up to 10 while never being slower in our experiments.
This paper introduces a novel method for real-time portrait animation from a single photo. Our method requires only a single portrait photo and a set of facial landmarks derived from a driving source (e.g., a photo or a video sequence), and generates an animated image with rich facial details. The core of our method is a warp-guided generative model that instantly fuses various fine facial details (e.g., creases and wrinkles), which are necessary to generate a high-fidelity facial expression, onto a pre-warped image. Our method factorizes out the nonlinear geometric transformations exhibited in facial expressions by lightweight 2D warps and leaves the appearance detail synthesis to conditional generative neural networks for high-fidelity facial animation generation. We show that such a factorization of geometric transformation and appearance synthesis largely helps the network better learn the high nonlinearity of the facial expression functions and also facilitates the design of the network architecture. Through extensive experiments on various portrait photos from the Internet, we show the significant efficacy of our method compared with prior art.
We present a method to acquire dynamic properties of facial skin appearance, including dynamic diffuse albedo encoding blood flow, dynamic specular intensity, and per-frame high resolution normal maps for a facial performance sequence. The method reconstructs these maps from a purely passive multi-camera setup, without the need for polarization or requiring temporally multiplexed illumination. Hence, it is very well suited for integration with existing passive systems for facial performance capture. To solve this seemingly underconstrained problem, we demonstrate that albedo dynamics during a facial performance can be modeled as a combination of: (1) a static, high-resolution base albedo map, modeling full skin pigmentation; and (2) a dynamic, one-dimensional component in the CIE L*a*b* color space, which explains changes in hemoglobin concentration due to blood flow. We leverage this albedo subspace and additional constraints on appearance and surface geometry to also estimate specular reflection parameters and resolve high-resolution normal maps with unprecedented detail in a passive capture system. These constraints are built into an inverse rendering framework that minimizes the difference of the rendered face to the captured images, incorporating constraints from multiple views for every texel on the face. The presented method is the first system capable of capturing high-quality dynamic appearance maps at full resolution and video framerates, providing a major step forward in the area of facial appearance acquisition.
Despite the popularity of real-time monocular face tracking systems in many successful applications, one overlooked problem with these systems is rigid instability. It occurs when the input facial motion can be explained by either head pose change or facial expression change, creating ambiguities that often lead to jittery and unstable rigid head poses under large expressions. Existing rigid stabilization methods either employ heavy, anatomically-motivated approaches that are unsuitable for real-time applications, or utilize heuristic-based rules that can be problematic under certain expressions. We propose the first rigid stabilization method for real-time monocular face tracking using a dynamic rigidity prior learned from realistic datasets. The prior is defined on a region-based face model and provides dynamic region-based adaptivity for rigid pose optimization during real-time performance. We introduce an effective offline training scheme to learn the dynamic rigidity prior by optimizing the convergence of the rigid pose optimization to the ground-truth poses in the training data. Our real-time face tracking system is an optimization framework that alternates between rigid pose optimization and expression optimization. To ensure tracking accuracy, we combine both robust, drift-free facial landmarks and dense optical flow into the optimization objectives. We evaluate our system extensively against state-of-the-art monocular face tracking systems and achieve significant improvement in tracking accuracy on the high-quality face tracking benchmark. Our system can improve facial-performance-based applications such as facial animation retargeting and virtual face makeup with accurate expression and stable pose. We further validate the dynamic rigidity prior by comparing it against other variants on tracking accuracy.
In this paper, we present an incremental learning framework for efficient and accurate facial performance tracking. Our approach alternates between a modeling step, which uses tracked meshes and texture maps to train our deep-learning-based statistical model, and a tracking step, which takes the geometry and texture our model infers from measured images and optimizes the predicted geometry by minimizing image, geometry, and facial landmark errors. Our Geo-Tex VAE model extends the convolutional variational autoencoder for face tracking, and jointly learns and represents deformations and variations in geometry and texture from tracked meshes and texture maps. To accurately model variations in facial geometry and texture, we introduce a decomposition layer in the Geo-Tex VAE architecture which decomposes the facial deformation into global and local components. We train the global deformation with a fully-connected network and the local deformations with convolutional layers. Despite running this model on each frame independently - thereby enabling a high degree of parallelization - we validate that our framework achieves sub-millimeter accuracy on synthetic data and outperforms existing methods. We also qualitatively demonstrate high-fidelity, long-duration facial performance tracking on several actors.
Deep learning systems extensively use convolution operations to process input data. Though convolution is clearly defined for structured data such as 2D images or 3D volumes, this is not true for other data types such as sparse point clouds. Previous techniques have developed approximations to convolutions for restricted conditions. Unfortunately, their applicability is limited and cannot be used for general point clouds. We propose an efficient and effective method to learn convolutions for non-uniformly sampled point clouds, as they are obtained with modern acquisition techniques. Learning is enabled by four key novelties: first, representing the convolution kernel itself as a multilayer perceptron; second, phrasing convolution as a Monte Carlo integration problem; third, using this notion to combine information from multiple samplings at different levels; and fourth, using Poisson disk sampling as a scalable means of hierarchical point cloud learning. The key idea across all these contributions is to guarantee adequate consideration of the underlying non-uniform sample distribution function from a Monte Carlo perspective. To make the proposed concepts applicable to real-world tasks, we furthermore propose an efficient implementation which significantly reduces the GPU memory required during the training process. By employing our method in hierarchical network architectures we can outperform most of the state-of-the-art networks on established point cloud segmentation, classification and normal estimation benchmarks. Furthermore, in contrast to most existing approaches, we also demonstrate the robustness of our method with respect to sampling variations, even when training with uniformly sampled data only. To support the direct application of these concepts, we provide a ready-to-use TensorFlow implementation of these layers at https://github.com/viscom-ulm/MCCNN.
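A naive, non-hierarchical sketch of such a Monte Carlo convolution layer is given below in PyTorch: the kernel is a small MLP over normalized relative positions, and each neighbor's feature is divided by a crude local density estimate before averaging, mirroring the Monte Carlo estimate of the convolution integral. The paper's Poisson-disk hierarchy and memory-efficient implementation are not reproduced here.

```python
# Monte Carlo convolution sketch: (f*g)(x) ~ mean_j g(x_j - x) f(x_j) / p(x_j),
# with the kernel g represented as an MLP and p estimated from neighbor counts.
import torch
import torch.nn as nn

class MCConv(nn.Module):
    def __init__(self, in_feats, out_feats, hidden=16):
        super().__init__()
        # Kernel as an MLP: R^3 relative position -> in_feats * out_feats weights.
        self.kernel = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, in_feats * out_feats))
        self.in_feats, self.out_feats = in_feats, out_feats

    def forward(self, pts, feats, radius=0.2):
        N = pts.shape[0]
        d = torch.cdist(pts, pts)                       # pairwise distances
        nbr = d < radius
        density = nbr.float().sum(dim=1, keepdim=True)  # crude density estimate
        out = feats.new_zeros(N, self.out_feats)
        for i in range(N):                              # naive O(N^2) gather
            j = torch.nonzero(nbr[i]).squeeze(1)
            rel = (pts[j] - pts[i]) / radius
            w = self.kernel(rel).view(-1, self.in_feats, self.out_feats)
            contrib = torch.einsum('nio,ni->no', w, feats[j] / density[j])
            out[i] = contrib.mean(dim=0)
        return out

pts = torch.rand(128, 3)
feats = torch.rand(128, 4)
layer = MCConv(4, 8)
print(layer(pts, feats).shape)   # torch.Size([128, 8])
```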
We propose a novel approach for performing convolution of signals on curved surfaces and show its utility in a variety of geometric deep learning applications. Key to our construction is the notion of directional functions defined on the surface, which extend the classic real-valued signals and which can be naturally convolved with real-valued template functions. As a result, rather than trying to fix a canonical orientation or only keeping the maximal response across all alignments of a 2D template at every point of the surface, as done in previous works, we show how information across all rotations can be kept across different layers of the neural network. Our construction, which we call multi-directional geodesic convolution, or directional convolution for short, allows us, in particular, to propagate and relate directional information across layers and thus different regions on the shape. We first define directional convolution in the continuous setting, prove its key properties and then show how it can be implemented in practice, for shapes represented as triangle meshes. We evaluate directional convolution in a wide variety of learning scenarios ranging from classification of signals on surfaces, to shape segmentation and shape matching, where we show a significant improvement over several baselines.
Transferring deformation from a source shape to a target shape is a very useful technique in computer graphics. State-of-the-art deformation transfer methods require either point-wise correspondences between source and target shapes, or pairs of deformed source and target shapes with corresponding deformations. However, in most cases, such correspondences are not available and cannot be reliably established using an automatic algorithm. Therefore, substantial user effort is needed to label the correspondences or to obtain and specify such shape sets. In this work, we propose a novel approach to automatic deformation transfer between two unpaired shape sets without correspondences. 3D deformation is represented in a high-dimensional space. To obtain a more compact and effective representation, two convolutional variational autoencoders are learned to encode source and target shapes to their latent spaces. We exploit a Generative Adversarial Network (GAN) to map deformed source shapes to deformed target shapes, both in the latent spaces, which ensures the obtained shapes from the mapping are indistinguishable from the target shapes. This is still an under-constrained problem, so we further utilize a reverse mapping from target shapes to source shapes and incorporate cycle consistency loss, i.e. applying both mappings should reverse to the input shape. This VAE-Cycle GAN (VC-GAN) architecture is used to build a reliable mapping between shape spaces. Finally, a similarity constraint is employed to ensure the mapping is consistent with visual similarity, achieved by learning a similarity neural network that takes the embedding vectors from the source and target latent spaces and predicts the light field distance between the corresponding shapes. Experimental results show that our fully automatic method is able to obtain high-quality deformation transfer results with unpaired data sets, comparable or better than existing methods where strict correspondences are required.
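The coupling between the two latent spaces can be illustrated with a small PyTorch sketch of the cycle-consistency term alone: two mappings between (stand-in) source and target latent codes are trained so that composing them returns the input code. The adversarial and visual-similarity terms described above are omitted, and the random codes below merely stand in for the VAE-encoded shapes.

```python
# Sketch of latent-space cycle consistency: F(G(z_s)) ~ z_s and G(F(z_t)) ~ z_t.
# Adversarial and similarity losses from the full VC-GAN architecture are omitted.
import torch
import torch.nn as nn

dim_s, dim_t = 32, 32
G_st = nn.Sequential(nn.Linear(dim_s, 64), nn.ReLU(), nn.Linear(64, dim_t))  # source -> target
G_ts = nn.Sequential(nn.Linear(dim_t, 64), nn.ReLU(), nn.Linear(64, dim_s))  # target -> source
opt = torch.optim.Adam(list(G_st.parameters()) + list(G_ts.parameters()), lr=1e-3)

for step in range(200):
    z_s = torch.randn(16, dim_s)          # stand-ins for encoded source shapes
    z_t = torch.randn(16, dim_t)          # stand-ins for encoded target shapes
    cycle = ((G_ts(G_st(z_s)) - z_s).abs().mean() +
             (G_st(G_ts(z_t)) - z_t).abs().mean())
    opt.zero_grad()
    cycle.backward()
    opt.step()
```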
Sketching provides an intuitive user interface for communicating free form shapes. While human observers can easily envision the shapes they intend to communicate, replicating this process algorithmically requires resolving numerous ambiguities. Existing sketch-based modeling methods resolve these ambiguities by either relying on expensive user annotations or by restricting the modeled shapes to specific narrow categories. We present an approach for modeling generic freeform 3D surfaces from sparse, expressive 2D sketches that overcomes both limitations by incorporating convolutional neural networks (CNNs) into the sketch processing workflow.
Given a 2D sketch of a 3D surface, we use CNNs to infer the depth and normal maps representing the surface. To combat ambiguity we introduce an intermediate CNN layer that models the dense curvature direction, or flow, field of the surface, and produce an additional confidence map alongside the depth and normal outputs. The flow field guides our subsequent surface reconstruction for improved regularity; the confidence map, trained without supervision, measures ambiguity and provides a robust estimator for data fitting. To reduce ambiguities in input sketches, users can refine their input by providing optional depth values at sparse points and curvature hints for strokes. Our CNN is trained on a large dataset generated by rendering sketches of various 3D shapes using a non-photorealistic line rendering (NPR) method that mimics human sketching of free-form shapes. We use the CNN model to process both single- and multi-view sketches. Using our multi-view framework, users progressively complete the shape by sketching in different views, generating complete closed shapes. For each new view, the modeling is assisted by partial sketches and depth cues provided by surfaces generated in earlier views. The partial surfaces are fused into a complete shape using predicted confidence levels as weights.
We validate our approach, compare it with previous methods and alternative structures, and evaluate its performance on various modeling tasks. The results demonstrate that our method efficiently models freeform shapes from succinct yet expressive 2D sketches.
Elastically deforming wire structures are lightweight, durable, and can be bent within minutes using CNC bending machines. We present a computational technique for the design of kinetic wire characters, tailored for fabrication on consumer-grade hardware. Our technique takes as input a network of curves or a skeletal animation, then estimates a cable-driven, compliant wire structure which matches user-selected targets or keyframes as closely as possible. To enable large localized deformations, we shape wire into functional spring-like entities at a discrete set of locations. We first detect regions where changes to local stiffness properties are needed, then insert bendable entities of varying shape and size. To avoid a discrete optimization, we first optimize stiffness properties of generic, non-fabricable entities which capture well the behavior of our bendable designs. To co-optimize stiffness properties and cable forces, we formulate an equilibrium-constrained minimization problem, safeguarding against inelastic deformations. We demonstrate our method on six fabricated examples, showcasing rich behavior including large deformations and complex, spatial motion.
We present a fully automatic method that finds a small number of machine fabricable wires with minimal overlap to reproduce a wire sculpture design as a 3D shape abstraction. Importantly, we consider non-planar wires, which can be fabricated by a wire bending machine, to enable efficient construction of complex 3D sculptures that cannot be achieved by previous works. We call our wires Eulerian wires, since they are as Eulerian as possible with small overlap to form the target design together. Finding such Eulerian wires is highly challenging, due to an enormous search space. After exploring a variety of optimization strategies, we formulate a population-based hybrid metaheuristic model, and design the join, bridge and split operators to refine the solution wire sets in the population. We start the exploration of each solution wire set in a bottom-up manner, and adopt an adaptive simulated annealing model to regulate the exploration. By further formulating a meta model on top to optimize the cooling schedule, and precomputing fabricable subwires, our method can efficiently find promising solutions with low wire count and overlap in one to two minutes. We demonstrate the efficiency of our method on a rich variety of wire sculptures, and physically fabricate several of them. Our results show clear improvements over other optimization alternatives in terms of solution quality, versatility, and scalability.
We propose FlexMaps, a novel framework for fabricating smooth shapes out of flat, flexible panels with tailored mechanical properties. We start by mapping the 3D surface onto a 2D domain as in traditional UV mapping to design a set of deformable flat panels called FlexMaps. For these panels, we design and obtain specific mechanical properties such that, once they are assembled, the static equilibrium configuration matches the desired 3D shape. FlexMaps can be fabricated from an almost rigid material, such as wood or plastic, and are made flexible in a controlled way by using computationally designed spiraling microstructures.
Wire art is the creation of three-dimensional sculptural art using wire strands. As the 2D projection of a 3D wire sculpture forms line drawing patterns, it is possible to craft multi-view wire sculpture art --- a static sculpture with multiple (potentially very different) interpretations when perceived at different viewpoints. Artists can effectively leverage this characteristic and produce compelling artistic effects. However, the creation of such multi-view wire sculpture is extremely time-consuming even for highly skilled artists. In this paper, we present a computational framework for the automatic creation of multi-view 3D wire sculpture. Our system takes two or three user-specified line drawings and the associated viewpoints as inputs. We start by producing a sparse set of voxels via a greedy selection approach such that their projections onto the virtual cameras cover all the contour pixels of the input line drawings. The sparse set of voxels, however, does not necessarily form a single connected component. We introduce a constrained 3D pathfinding algorithm to link isolated groups of voxels into a connected component while maintaining the similarity between the projected voxels and the line drawings. Using the reconstructed visual hull, we extract a curve skeleton and produce a collection of smooth 3D curves by fitting cubic splines and optimizing the curve deformation to best approximate the provided line drawings. We demonstrate the effectiveness of our system for creating compelling multi-view wire sculptures in both simulation and 3D physical printouts.
In this paper, we present a novel unsupervised learning method for pixelization. Due to the difficulty of creating pixel art, preparing paired training data for supervised learning is impractical. Instead, we propose an unsupervised learning framework to circumvent this difficulty. We leverage the dual nature of pixelization and depixelization, and model these two tasks in the same network in a bi-directional manner, with the input itself as training supervision. These two tasks are modeled as a cascaded network which consists of three stages for different purposes. GridNet transfers the input image into multi-scale grid-structured images with different aliasing effects. PixelNet, associated with GridNet, synthesizes pixel art with sharp edges and perceptually optimal local structures. DepixelNet connects to the previous networks and aims to recover the original image from the pixelized result. For the sake of unsupervised learning, a mirror loss is proposed to enforce the reversibility of feature representations in the process. In addition, adversarial, L1, and gradient losses are incorporated in the network to obtain pixel art while retaining color correctness and smoothness. We show that our technique can synthesize crisper and perceptually more appropriate pixel art than state-of-the-art image downscaling methods. We evaluate the proposed method with extensive experiments on many images. The proposed method outperforms state-of-the-art methods in terms of visual quality and user preference.
Facial caricature is an art form of drawing faces in an exaggerated way to convey humor or sarcasm. In this paper, we propose the first Generative Adversarial Network (GAN) for unpaired photo-to-caricature translation, which we call "CariGANs". It explicitly models geometric exaggeration and appearance stylization using two components: CariGeoGAN, which only models the geometry-to-geometry transformation from face photos to caricatures, and CariStyGAN, which transfers the style appearance from caricatures to face photos without any geometry deformation. In this way, a difficult cross-domain translation problem is decoupled into two easier tasks. A perceptual study shows that caricatures generated by our CariGANs are closer to hand-drawn ones, and at the same time better preserve the identity, compared to state-of-the-art methods. Moreover, our CariGANs allow users to control the degree of shape exaggeration and change the color/texture style by tuning the parameters or giving an example caricature.
We aim to generate high resolution shallow depth-of-field (DoF) images from a single all-in-focus image with controllable focal distance and aperture size. To achieve this, we propose a novel neural network model comprised of a depth prediction module, a lens blur module, and a guided upsampling module. All modules are differentiable and are learned from data. To train our depth prediction module, we collect a dataset of 2462 RGB-D images captured by mobile phones with a dual-lens camera, and use existing segmentation datasets to improve border prediction. We further leverage a synthetic dataset with known depth to supervise the lens blur and guided upsampling modules. The effectiveness of our system and training strategies are verified in the experiments. Our method can generate high-quality shallow DoF images at high resolution, and produces significantly fewer artifacts than the baselines and existing solutions for single image shallow DoF synthesis. Compared with the iPhone portrait mode, which is a state-of-the-art shallow DoF solution based on a dual-lens depth camera, our method generates comparable results, while allowing for greater flexibility to choose focal points and aperture size, and is not limited to one capture setup.
Once a color image is converted to grayscale, it is a common belief that the original color cannot be fully restored, even with the state-of-the-art colorization methods. In this paper, we propose an innovative method to synthesize invertible grayscale. It is a grayscale image that can fully restore its original color. The key idea here is to encode the original color information into the synthesized grayscale, in a way that users cannot recognize any anomalies. We propose to learn and embed the color-encoding scheme via a convolutional neural network (CNN). It consists of an encoding network to convert a color image to grayscale, and a decoding network to invert the grayscale to color. We then design a loss function to ensure the trained network possesses three required properties: (a) color invertibility, (b) grayscale conformity, and (c) resistance to quantization error. We have conducted intensive quantitative experiments and user studies over a large amount of color images to validate the proposed method. Regardless of the genre and content of the color input, convincing results are obtained in all cases.
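A minimal PyTorch sketch of the three training objectives follows, assuming toy two-layer encoder/decoder networks and placeholder loss weights: color invertibility, grayscale conformity to a luminance target, and resistance to 8-bit quantization, which is simulated here with additive noise so that gradients can flow.

```python
# Sketch of invertible-grayscale training losses for an encoder E (color -> gray)
# and decoder D (gray -> color). Architectures and weights are placeholders.
import torch
import torch.nn as nn

E = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
D = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid())

def losses(color):
    gray = E(color)
    # (c) resistance to quantization: perturb by up to half an 8-bit step.
    gray_q = gray + (torch.rand_like(gray) - 0.5) / 255.0
    restored = D(gray_q)
    luma = 0.299 * color[:, 0:1] + 0.587 * color[:, 1:2] + 0.114 * color[:, 2:3]
    invertibility = (restored - color).abs().mean()        # (a) color invertibility
    conformity = (gray - luma).abs().mean()                # (b) grayscale conformity
    return invertibility + 0.5 * conformity                # weights are placeholders

color = torch.rand(2, 3, 64, 64)
print(losses(color).item())
```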
Low-distortion mapping of three-dimensional surfaces to the plane is a critical problem in geometry processing. The intrinsic distortion introduced by these UV mappings is highly dependent on the choice of surface cuts that form seamlines which break mapping continuity. Parameterization applications typically require UV maps with an application-specific upper bound on distortion to avoid mapping artifacts; at the same time they seek to reduce cut lengths to minimize discontinuity artifacts. We propose OptCuts, an algorithm that jointly optimizes the parameterization and cutting of a three-dimensional mesh. OptCuts starts from an arbitrary initial embedding and a user-requested distortion bound. It requires no parameter setting and automatically seeks to minimize seam lengths subject to satisfying the distortion bound of the mapping computed using these seams. OptCuts alternates between topology and geometry update steps that consistently decrease distortion and seam length, producing a UV map with compact boundaries that strictly satisfies the distortion bound. OptCuts automatically produces high-quality, globally bijective UV maps without user intervention. While OptCuts can thus be a highly effective tool to create new mappings from scratch, we also show how it can be employed to improve pre-existing embeddings. Additionally, when semantic or other priors on seam placement are desired, OptCuts can be extended to respect these user preferences as constraints during optimization of the parameterization. We demonstrate the scalable performance of OptCuts on a wide range of challenging benchmark parameterization examples, as well as in comparisons with state-of-the-art UV methods and commercial tools.
We propose a method for efficiently computing orientation-preserving and approximately continuous correspondences between non-rigid shapes, using the functional maps framework. We first show how orientation preservation can be formulated directly in the functional (spectral) domain without using landmark or region correspondences and without relying on external symmetry information. This allows us to obtain functional maps that promote orientation preservation, even when using descriptors that are invariant to orientation changes. We then show how higher quality, approximately continuous and bijective pointwise correspondences can be obtained from initial functional maps by introducing a novel refinement technique that aims to simultaneously improve the maps both in the spectral and spatial domains. This leads to a general pipeline for computing correspondences between shapes that results in high-quality maps, while admitting an efficient optimization scheme. We show through extensive evaluation that our approach improves upon state-of-the-art results on challenging isometric and non-isometric correspondence benchmarks according to both measures of continuity and coverage as well as producing semantically meaningful correspondences as measured by the distance to ground truth maps.
In this paper, we introduce a novel and extremely fast algorithm to compute continuous transport maps between 2D probability densities discretized on uniform grids. The core of our method is a novel iterative solver computing the L2 optimal transport map from a grid to the uniform density in the 2D Euclidean plane. A transport map between arbitrary densities is then recovered through numerical inversion and composition. In this case, the resulting map is only approximately optimal, but it is continuous and density preserving. Our solver is derivative-free, and it converges in a few cheap iterations. We demonstrate interactive performance in various applications such as adaptive sampling, feature sensitive remeshing, and caustic design.
We propose a technique for interpolating between probability distributions on discrete surfaces, based on the theory of optimal transport. Unlike previous attempts that use linear programming, our method is based on a dynamical formulation of quadratic optimal transport proposed for flat domains by Benamou and Brenier [2000], adapted to discrete surfaces. Our structure-preserving construction yields a Riemannian metric on the (finite-dimensional) space of probability distributions on a discrete surface, which translates the so-called Otto calculus to discrete language. From a practical perspective, our technique provides a smooth interpolation between distributions on discrete surfaces with less diffusion than state-of-the-art algorithms involving entropic regularization. Beyond interpolation, we show how our discrete notion of optimal transport extends to other tasks, such as distribution-valued Dirichlet problems and time integration of gradient flows.
A variety of structures in nature exhibit sparse, thin, and intricate features. It is challenging to investigate these structural characteristics using conventional numerical approaches since such features require highly refined spatial resolution to capture and therefore they incur a prohibitively high computational cost. We present a novel computational framework for high-resolution topology optimization that delivers leaps in simulation capabilities, by two orders of magnitude, from the state-of-the-art approaches. Our technique accommodates computational domains with over one billion grid voxels on a single shared-memory multiprocessor platform, allowing automated emergence of structures with both rich geometric features and exceptional mechanical performance. To achieve this, we track the evolution of thin structures and simulate its elastic deformation in a dynamic narrow-band region around high-density sites to avoid wasted computational effort on large void regions. We have also designed a mixed-precision multigrid-preconditioned iterative solver that keeps the memory footprint of the simulation to a compact size while maintaining double-precision accuracy. We have demonstrated the efficacy of the algorithm through optimizing a variety of complex structures from both natural and engineering systems.
Large-scale binder jetting provides a promising alternative to manual sculpting of sandstone. The weak build material, however, severely limits its use in architectural ornamentation. We propose a structural optimization that jointly optimizes an ornament's strength-to-weight ratio and balance under self-weight, thermal, wind, and live loads. To account for the difference in the tensile and compressive strength of the build material, we turn the Bresler-Pister criterion into a failure potential, measuring the distance to failure. Integrated into an XFEM-based level set formulation, we minimize this potential by changing the topology and shape of the internal structure. To deal with uncertainties in the location of live loads, and the direction of wind loads, we first estimate loads that lead to the weakest structure, then minimize the potential of failure under identified worst-case loads. With the help of first-order optimality constraints, we unify our worst-case load estimation and structural optimization into a continuous optimization. We demonstrate applications in art, furniture design, and architectural ornamentation with three large-scale 3D printed examples.
Elastic parameter optimization has revealed its importance in 3D modeling, virtual reality, and additive manufacturing in recent years. Unfortunately, it is known to be computationally expensive, especially if there are many parameters and data samples. To address this challenge, we propose to introduce the inexactness into descent methods, by iteratively solving a forward simulation step and a parameter update step in an inexact manner. The development of such inexact descent methods is centered at two questions: 1) how accurate/inaccurate can the two steps be; and 2) what is the optimal way to implement an inexact descent method. The answers to these questions are in our convergence analysis, which proves the existence of relative error thresholds for the two inexact steps to ensure the convergence. This means we can simply solve each step by a fixed number of iterations, if the iterative solver is at least linearly convergent. While the use of the inexact idea speeds up many descent methods, we specifically favor a GPU-based one powered by state-of-the-art simulation techniques. Based on this method, we study a variety of implementation issues, including backtracking line search, initialization, regularization, and multiple data samples. We demonstrate the use of our inexact method in elasticity measurement and design applications. Our experiment shows the method is fast, reliable, memory-efficient, GPU-friendly, flexible with different elastic models, scalable to a large parameter space, and parallelizable for multiple data samples.
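The following NumPy toy conveys the inexactness idea on a deliberately simple stand-in problem: a scalar stiffness parameter of a 1D system is recovered while both the forward solve and the sensitivity solve are truncated to a fixed, small number of conjugate-gradient iterations. The paper's GPU solver, regularization, and multi-sample handling are not represented.

```python
# Toy illustration of inexact descent: forward and sensitivity solves are each
# limited to a fixed number of iterations of a linearly convergent solver (CG),
# yet the outer parameter update still converges to the true stiffness.
import numpy as np

n = 50
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # 1D Laplacian stand-in
f = np.ones(n)

def cg(A, b, iters=5):
    # Fixed-iteration conjugate gradients: deliberately inexact.
    x = np.zeros_like(b); r = b.copy(); d = r.copy()
    for _ in range(iters):
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)
        x = x + alpha * d
        r_new = r - alpha * Ad
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return x

p_true = 3.0
u_target = cg(p_true * L + np.eye(n), f)                  # "measured" displacement

p = 2.0                                                    # initial parameter guess
for it in range(20):
    A = p * L + np.eye(n)
    u = cg(A, f)                                           # inexact forward simulation
    g = -cg(A, L @ u)                                      # inexact sensitivity du/dp
    r = u - u_target
    p = max(p - (g @ r) / (g @ g), 0.1)                    # Gauss-Newton step, kept positive
print("recovered:", p, "true:", p_true)
```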
The Material Point Method (MPM) has been shown to facilitate effective simulations of physically complex and topologically challenging materials, with a wealth of emerging applications in computational engineering and visual computing. Owing to the regularity of its background grid, MPM offers attractive parallelization opportunities on modern high-performance multiprocessors. Parallelizing MPM in a way that fully leverages available computing resources, however, requires exploring an extensive design space of data structures and algorithms. Unlike conceptually simple CPU parallelization, where a coarse partitioning of tasks can be easily applied, it takes greater effort to reach hardware saturation on GPUs due to their many-core SIMT architecture. In this paper we introduce methods for addressing the computational challenges of MPM and extending the capabilities of general simulation systems based on MPM, particularly concentrating on GPU optimization. In addition to our open-source high-performance framework, we also conduct performance analyses and benchmark experiments to compare against alternative design choices which may superficially appear to be reasonable, but can suffer from suboptimal performance in practice. Our explicit and fully implicit GPU MPM solvers are further equipped with a Moving Least Squares MPM heat solver and a novel sand constitutive model to enable fast simulations of a wide range of materials. We demonstrate that more than an order of magnitude performance improvement can be achieved with our GPU solvers. Practical high-resolution examples with up to ten million particles run in less than one minute per frame.
Motivated by augmented and virtual reality applications such as telepresence, there has been a recent focus on real-time performance capture of humans under motion. However, given the real-time constraint, these systems often suffer from artifacts in geometry and texture such as holes and noise in the final rendering, poor lighting, and low-resolution textures. We take the novel approach of augmenting such real-time performance capture systems with a deep architecture that takes a rendering from an arbitrary viewpoint, and jointly performs completion, super resolution, and denoising of the imagery in real-time. We call this approach neural (re-)rendering, and our live system "LookinGood". Our deep architecture is trained to produce high resolution and high quality images from a coarse rendering in real-time. First, we propose a self-supervised training method that does not require manual ground-truth annotation. We contribute a specialized reconstruction error that uses semantic information to focus on relevant parts of the subject, e.g. the face. We also introduce a saliency-based reweighting scheme for the loss function that is able to discard outliers. We specifically design the system for virtual and augmented reality headsets, where the consistency between the left and right eye plays a crucial role in the final user experience. Finally, we generate temporally stable results by explicitly minimizing the difference between two consecutive frames. We tested the proposed system in two different scenarios: the first involving upper-body reconstruction of an actor from a single RGB-D sensor, and the second consisting of full-body 360° capture. Through extensive experimentation, we demonstrate how our system generalizes across unseen sequences and subjects.
We introduce a realtime compression architecture for 4D performance capture that is two orders of magnitude faster than current state-of-the-art techniques, yet achieves comparable visual quality and bitrate. We note how much of the algorithmic complexity in traditional 4D compression arises from the necessity to encode geometry using an explicit model (i.e. a triangle mesh). In contrast, we propose an encoder that leverages an implicit representation (namely a Signed Distance Function) to represent the observed geometry, as well as its changes through time. We demonstrate how SDFs, when defined over a small local region (i.e. a block), admit a low-dimensional embedding due to the innate geometric redundancies in their representation. We then propose an optimization that takes a Truncated SDF (i.e. a TSDF), such as those found in most rigid/non-rigid reconstruction pipelines, and efficiently projects each TSDF block onto the SDF latent space. This results in a collection of low entropy tuples that can be effectively quantized and symbolically encoded. On the decoder side, to avoid the typical artifacts of block-based coding, we also propose a variational optimization that compensates for quantization residuals in order to penalize unsightly discontinuities in the decompressed signal. This optimization is expressed in the SDF latent embedding, and hence can also be performed efficiently. We demonstrate our compression/decompression architecture by realizing, to the best of our knowledge, the first system for streaming a real-time captured 4D performance on consumer-level networks.
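To illustrate the low-dimensional embedding of TSDF blocks, here is a linear (PCA) sketch in NumPy: synthetic 8³ truncated-SDF blocks of random planes are projected onto a basis fitted to training blocks, and the coefficients are uniformly quantized into low-entropy integer tuples. The paper's learned embedding, entropy coding, and decoder-side variational refinement are beyond this sketch.

```python
# Block-wise TSDF compression via a linear (PCA) embedding plus quantization.
import numpy as np

rng = np.random.default_rng(0)
B, K = 8, 12                                   # block size, latent dimension

def random_plane_block():
    # Synthetic training block: truncated SDF of a random plane inside the block.
    n = rng.normal(size=3); n /= np.linalg.norm(n)
    o = rng.uniform(2, 6, size=3)
    g = np.stack(np.meshgrid(*[np.arange(B)] * 3, indexing='ij'), axis=-1)
    d = (g - o) @ n
    return np.clip(d, -2.0, 2.0).ravel()       # truncation band of +/- 2 voxels

train = np.stack([random_plane_block() for _ in range(2000)])
mean = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
basis = Vt[:K]                                  # K principal directions

def encode(block, step=0.05):
    coeff = basis @ (block.ravel() - mean)
    return np.round(coeff / step).astype(np.int16)   # quantized low-entropy tuple

def decode(q, step=0.05):
    return (mean + basis.T @ (q * step)).reshape(B, B, B)

blk = random_plane_block()
rec = decode(encode(blk))
print("max abs error:", np.abs(rec - blk.reshape(B, B, B)).max())
```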
Free-viewpoint image-based rendering (IBR) is a standing challenge. IBR methods combine warped versions of input photos to synthesize a novel view. The image quality of this combination is directly affected by geometric inaccuracies of multi-view stereo (MVS) reconstruction and by view- and image-dependent effects that produce artifacts when contributions from different input views are blended. We present a new deep learning approach to blending for IBR, in which we use held-out real image data to learn blending weights to combine input photo contributions. Our Deep Blending method requires us to address several challenges to achieve our goal of interactive free-viewpoint IBR navigation. We first need to provide sufficiently accurate geometry so the Convolutional Neural Network (CNN) can succeed in finding correct blending weights. We do this by combining two different MVS reconstructions with complementary accuracy vs. completeness tradeoffs. To tightly integrate learning in an interactive IBR system, we need to adapt our rendering algorithm to produce a fixed number of input layers that can then be blended by the CNN. We generate training data with a variety of captured scenes, using each input photo as ground truth in a held-out approach. We also design the network architecture and the training loss to provide high quality novel view synthesis, while reducing temporal flickering artifacts. Our results demonstrate free-viewpoint IBR in a wide variety of scenes, clearly surpassing previous methods in visual quality, especially when moving far from the input cameras.
With the rising interest in personalized VR and gaming experiences comes the need to create high quality 3D avatars that are both low-cost and variegated. Due to this, building dynamic avatars from a single unconstrained input image is becoming a popular application. While previous techniques that attempt this require multiple input images or rely on transferring dynamic facial appearance from a source actor, we are able to do so using only one 2D input image without any form of transfer from a source image. We achieve this using a new conditional Generative Adversarial Network design that allows fine-scale manipulation of any facial input image into a new expression while preserving its identity. Our photoreal avatar GAN (paGAN) can also synthesize the unseen mouth interior and control the eye-gaze direction of the output, as well as produce the final image from a novel viewpoint. The method is even capable of generating fully-controllable temporally stable video sequences, despite not using temporal information during training. After training, we can use our network to produce dynamic image-based avatars that are controllable on mobile devices in real time. To do this, we compute a fixed set of output images that correspond to key blendshapes, from which we extract textures in UV space. Using a subject's expression blendshapes at run-time, we can linearly blend these key textures together to achieve the desired appearance. Furthermore, we can use the mouth interior and eye textures produced by our network to synthesize on-the-fly avatar animations for those regions. Our work produces state-of-the-art quality image and video synthesis, and is the first to our knowledge that is able to generate a dynamically textured avatar with a mouth interior, all from a single image.
Image smoothing represents a fundamental component of many disparate computer vision and graphics applications. In this paper, we present a unified unsupervised (label-free) learning framework that facilitates generating flexible and high-quality smoothing effects by directly learning from data using deep convolutional neural networks (CNNs). The heart of the design is the training signal, a novel energy function that includes an edge-preserving regularizer which helps maintain important yet potentially vulnerable image structures, and a spatially-adaptive Lp flattening criterion which imposes different forms of regularization onto different image regions for better smoothing quality. We implement a diverse set of image smoothing solutions employing the unified framework targeting various applications such as image abstraction, pencil sketching, detail enhancement, texture removal and content-aware image manipulation, and obtain results comparable with or better than previous methods. Moreover, our method is extremely fast with a modern GPU (e.g., 200 fps for 1280×720 images).
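A compact PyTorch sketch of such an unsupervised training energy is shown below: a fidelity term, edge-preserving weights derived from input gradients, and a spatially adaptive Lp term on the output gradients, whose per-pixel exponent map would in practice be predicted or scheduled. The exact formulation and weights of the paper's energy are not reproduced.

```python
# Label-free smoothing energy: fidelity + edge-aware, spatially adaptive Lp flattening.
import torch

def smoothing_energy(pred, inp, p_map, eps=1e-3):
    # pred, inp: (B,3,H,W) network output and input; p_map: (B,1,H,W) exponents.
    fidelity = ((pred - inp) ** 2).mean()

    dx = lambda t: t[..., :, 1:] - t[..., :, :-1]
    dy = lambda t: t[..., 1:, :] - t[..., :-1, :]

    # Edge-preserving weights: small where the *input* has strong gradients.
    wx = torch.exp(-10.0 * dx(inp).abs().mean(1, keepdim=True))
    wy = torch.exp(-10.0 * dy(inp).abs().mean(1, keepdim=True))

    # Spatially adaptive Lp flattening on the output gradients.
    flat_x = ((dx(pred).abs() + eps) ** p_map[..., :, 1:]) * wx
    flat_y = ((dy(pred).abs() + eps) ** p_map[..., 1:, :]) * wy
    return fidelity + 0.1 * (flat_x.mean() + flat_y.mean())

pred = torch.rand(1, 3, 64, 64, requires_grad=True)
inp = torch.rand(1, 3, 64, 64)
p_map = torch.full((1, 1, 64, 64), 0.8)
loss = smoothing_energy(pred, inp, p_map)
loss.backward()
```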
Single image superresolution has been a popular research topic in the last two decades and has recently received a new wave of interest due to deep neural networks. In this paper, we approach this problem from a different perspective. With respect to a downsampled low resolution image, we model a high resolution image as a combination of two components, a deterministic component and a stochastic component. The deterministic component can be recovered from the low-frequency signals in the downsampled image. The stochastic component, on the other hand, contains the signals that have little correlation with the low resolution image. We adopt two complementary methods for generating these two components. While generative adversarial networks are used for the stochastic component, deterministic component reconstruction is formulated as a regression problem solved using deep neural networks. Since the deterministic component exhibits clearer local orientations, we design novel loss functions tailored for such properties for training the deep regression network. These two methods are first applied to the entire input image to produce two distinct high-resolution images. Afterwards, these two images are fused together using another deep neural network that also performs local statistical rectification, which tries to make the local statistics of the fused image match the same local statistics of the groundtruth image. Quantitative results and a user study indicate that the proposed method outperforms existing state-of-the-art algorithms with a clear margin.
Sketch or line art colorization is a research field with significant market demand. Different from photo colorization, which strongly relies on texture information, sketch colorization is more challenging, as sketches may not have texture. Even worse, color, texture, and gradients have to be generated from the abstract sketch lines. In this paper, we propose a semi-automatic learning-based framework to colorize sketches with proper color, texture, and gradients. Our framework consists of two stages. In the first drafting stage, our model guesses color regions and splashes a rich variety of colors over the sketch to obtain a color draft. In the second refinement stage, it detects unnatural colors and artifacts, and tries to fix and refine the result. Compared to existing approaches, this two-stage design effectively divides the complex colorization task into two simpler subtasks with clearer goals. This eases the learning and raises the quality of colorization. Our model resolves artifacts such as water-color blurring, color distortion, and dull textures.
We build an interactive software tool based on our model for evaluation. Users can iteratively edit and refine the colorization. We evaluate our learning model and the interactive system through an extensive user study. The statistics show that our method outperforms state-of-the-art techniques and industrial applications in several aspects, including visual quality, the degree of user control, and overall user experience.
We introduce an extremely scalable and efficient yet simple palette-based image decomposition algorithm. Given an RGB image and a set of palette colors, our algorithm decomposes the image into a set of additive mixing layers, each of which corresponds to a palette color applied with varying weight. Our approach is based on the geometry of images in RGBXY-space. This new geometric approach is orders of magnitude more efficient than previous work and requires no numerical optimization. We provide an implementation of the algorithm in 48 lines of Python code. We demonstrate a real-time layer decomposition tool in which users can interactively edit the palette to adjust the layers. After preprocessing, our algorithm can decompose 6 MP images into layers in 20 milliseconds.
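The geometric core can be sketched in a few lines of SciPy: each pixel's 5D RGBXY point is written as convex (barycentric) weights over the RGBXY convex-hull vertices via a Delaunay triangulation of those vertices. Composing these per-pixel weights with per-hull-vertex palette weights (the RGB-space step, omitted here) would yield the additive mixing layers; sparse storage and robustness details are also omitted.

```python
# Per-pixel convex weights over the RGBXY convex-hull vertices (geometric core only).
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

def rgbxy_weights(img):                           # img: (H, W, 3) floats in [0, 1]
    H, W, _ = img.shape
    yy, xx = np.mgrid[0:H, 0:W]
    pts = np.concatenate([img.reshape(-1, 3),
                          np.stack([xx.ravel() / W, yy.ravel() / H], axis=1)], axis=1)
    hull = ConvexHull(pts)
    verts = pts[hull.vertices]                    # the (few) 5D RGBXY hull vertices
    tri = Delaunay(verts)
    simplex = tri.find_simplex(pts, tol=1e-6)
    simplex = np.where(simplex < 0, 0, simplex)   # crude fallback for points flagged outside
    # Barycentric coordinates of each pixel inside its containing 5-simplex.
    T = tri.transform[simplex]                    # (N, 6, 5) affine transforms
    b = np.einsum('nij,nj->ni', T[:, :5], pts - T[:, 5])
    bary = np.concatenate([b, 1.0 - b.sum(axis=1, keepdims=True)], axis=1)
    # Scatter into a per-pixel weight matrix over hull vertices (kept sparse in practice).
    Wts = np.zeros((pts.shape[0], len(hull.vertices)))
    rows = np.repeat(np.arange(pts.shape[0]), 6)
    Wts[rows, tri.simplices[simplex].ravel()] = bary.ravel()
    return Wts

img = np.random.rand(32, 32, 3)
print(rgbxy_weights(img).sum(axis=1)[:4])         # per-pixel weights sum to ~1
```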
Delaunay meshes (DM) are a special type of manifold triangle meshes --- where the local Delaunay condition holds everywhere --- and find important applications in digital geometry processing. This paper addresses the general DM simplification problem: given an arbitrary manifold triangle mesh M with n vertices and a user-specified resolution m (< n), compute a Delaunay mesh M* with m vertices that has the least Hausdorff distance to M. To solve the problem, we abstract the simplification process using a 2D Cartesian grid model, in which each grid point corresponds to triangle meshes with a certain number of vertices and a simplification process is a monotonic path on the grid. We develop a novel differential-evolution-based method to compute a low-cost path, which leads to a high quality Delaunay mesh. Extensive evaluation shows that our method consistently outperforms the existing methods in terms of approximation error. In particular, our method is highly effective for small-scale CAD models and man-made objects with sharp features but fewer details. Moreover, our method is fully automatic, can preserve sharp features well, and can deal with models with multiple components, whereas the existing methods often fail.
Mimicking natural tessellation patterns is a fascinating multi-disciplinary problem. Geometric methods aiming at reproducing such partitions on surface meshes are commonly based on the Voronoi model and its variants, and are often faced with challenging issues such as metric estimation, geometric and topological complications, and most critically, parallelization. In this paper, we introduce an alternate model which may be of value for resolving these issues. We drop the assumption that regions need to be separated by lines. Instead, we regard region boundaries as narrow bands and we model the partition as a set of smooth functions layered over the surface. Given an initial set of seeds or regions, the partition emerges as the solution of a time dependent set of partial differential equations describing concurrently evolving fronts on the surface. Our solution does not require geodesic estimation, elaborate numerical solvers, or complicated bookkeeping data structures. The cost per time-iteration is dominated by the multiplication and addition of two sparse matrices. Extension of our approach in a Lloyd's algorithm fashion can be easily achieved and the extraction of the dual mesh can be conveniently performed in parallel through matrix algebra. As our approach relies mainly on basic linear algebra kernels, it lends itself to efficient implementation on modern graphics hardware.
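As a loose analogy on a toy grid graph rather than a surface mesh, the sketch below evolves one smooth indicator layer per region with a sparse averaging operator, renormalizes the layers so they compete for each vertex, and reads off the partition as a per-vertex argmax; only sparse matrix products are involved. The paper's PDE formulation and narrow-band boundaries are not reproduced here.

```python
# Region layers as smooth functions evolved by sparse matrix products on a toy grid graph.
import numpy as np
import scipy.sparse as sp

n = 64                                             # n x n grid standing in for a surface mesh
N = n * n
idx = lambda i, j: i * n + j
rows, cols = [], []
for i in range(n):
    for j in range(n):
        for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            if 0 <= i + di < n and 0 <= j + dj < n:
                rows.append(idx(i, j)); cols.append(idx(i + di, j + dj))
A = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(N, N))   # adjacency
deg = np.asarray(A.sum(axis=1)).ravel()
S = sp.diags(1.0 / (deg + 1.0)) @ (A + sp.eye(N))                     # row-stochastic averaging

seeds = np.random.default_rng(0).choice(N, size=8, replace=False)
U = np.zeros((N, len(seeds)))
U[seeds, np.arange(len(seeds))] = 1.0              # one smooth indicator layer per region

for _ in range(200):
    U = S @ U                                      # concurrently evolving fronts: one sparse product
    U /= U.sum(axis=1, keepdims=True) + 1e-12      # layers compete for each vertex

labels = U.argmax(axis=1)                          # resulting partition
print(np.bincount(labels, minlength=len(seeds)))
```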
We propose a GPU algorithm that computes a 3D Voronoi diagram. Our algorithm is tailored for applications that solely make use of the geometry of the Voronoi cells, such as Lloyd's relaxation used in meshing, or some numerical schemes used in fluid simulations and astrophysics. Since these applications only require the geometry of the Voronoi cells, they do not need the combinatorial mesh data structure computed by the classical algorithms (Bowyer-Watson). Thus, by exploiting the specific spatial distribution of the point-sets used in this type of application, our algorithm computes each cell independently, in parallel, based on its nearest neighbors. In addition, we show how to compute integrals over the Voronoi cells by decomposing them on the fly into tetrahedra, without needing to compute any global combinatorial information. The advantages of our algorithm are that it is fast, very simple to implement, has constant memory usage per thread, and does not need any synchronization primitives. These specificities make it particularly efficient on the GPU: it gains one order of magnitude as compared to the fastest state-of-the-art multi-core CPU implementations. To ease the reproducibility of our results, the full documented source code is included in the supplemental material.
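A CPU sketch of the per-cell construction (SciPy) is shown below: each cell is built independently by clipping a bounding box with the bisector half-spaces of the seed's nearest neighbors, and its volume is then integrated from the resulting convex polytope. A robust implementation must also verify that the retained neighbors are sufficient (e.g., via a radius check), which is omitted here along with all GPU-specific aspects.

```python
# Per-cell Voronoi construction by clipping with nearest-neighbor bisector half-spaces.
import numpy as np
from scipy.spatial import ConvexHull, HalfspaceIntersection, cKDTree

rng = np.random.default_rng(0)
seeds = rng.random((1000, 3))
tree = cKDTree(seeds)

def voronoi_cell_volume(i, k=30):
    p = seeds[i]
    _, nbrs = tree.query(p, k=k + 1)              # nearest neighbors; nbrs[0] is the seed itself
    hs = []
    for j in nbrs[1:]:
        q = seeds[j]
        n = q - p                                  # bisector normal
        m = 0.5 * (p + q)                          # bisector midpoint
        hs.append(np.append(n, -n @ m))            # keep n.x - n.m <= 0, i.e. p's side
    for d in range(3):                             # clip against the unit bounding box
        e = np.zeros(3); e[d] = 1.0
        hs.append(np.append(-e, 0.0))              # x_d >= 0
        hs.append(np.append(e, -1.0))              # x_d <= 1
    poly = HalfspaceIntersection(np.array(hs), p)  # the seed is strictly interior by construction
    return ConvexHull(poly.intersections).volume   # cell volume from its vertices

vols = [voronoi_cell_volume(i) for i in range(50)]
print("total volume of the first 50 cells:", sum(vols))
```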
This article answers an important theoretical question: how many different subdivisions of the hexahedron into tetrahedra are there? It is well known that the cube has five subdivisions into 6 tetrahedra and one subdivision into 5 tetrahedra. However, not all hexahedra are cubes, and moving the vertex positions increases the number of subdivisions. Recent hexahedral-dominant meshing methods try to take these configurations into account when combining tetrahedra into hexahedra, but fail to enumerate them all: they use only a set of 10 subdivisions among the 174 we found in this article.
The enumeration of these 174 subdivisions of the hexahedron into tetrahedra is our combinatorial result. Each of the 174 subdivisions has between 5 and 15 tetrahedra and is actually a class of 2 to 48 equivalent instances which are identical up to vertex relabeling. We further show that exactly 171 of these subdivisions have a geometrical realization, i.e., there exist coordinates of the eight hexahedron vertices in three-dimensional space such that the corresponding tetrahedral mesh is valid. We exhibit the tetrahedral meshes for these configurations and show, in particular, subdivisions of hexahedra with 15 tetrahedra that have a strictly positive Jacobian.
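The validity test behind "geometrical realization" boils down to checking that every tetrahedron of the candidate subdivision has strictly positive signed volume (i.e., a strictly positive Jacobian) for the chosen vertex coordinates. A minimal sketch follows; the cube coordinates and the classic 5-tetrahedra decomposition are illustrative, not one of the new configurations from the paper.

```python
import numpy as np

def signed_volume(a, b, c, d):
    """Signed volume of tetrahedron (a, b, c, d); positive iff the ordering
    is positively oriented (strictly positive Jacobian)."""
    return np.linalg.det(np.stack([b - a, c - a, d - a])) / 6.0

def is_valid_realization(verts, tets, eps=1e-12):
    """True if every tetrahedron has strictly positive volume for the
    given hexahedron vertex coordinates."""
    return all(signed_volume(*verts[list(t)]) > eps for t in tets)

# illustrative check: the unit cube split into 5 tetrahedra
cube = np.array([[0,0,0],[1,0,0],[1,1,0],[0,1,0],
                 [0,0,1],[1,0,1],[1,1,1],[0,1,1]], dtype=float)
five_tets = [(0,1,2,5), (0,2,3,7), (0,5,7,4), (2,7,5,6), (0,2,7,5)]
print(is_valid_realization(cube, five_tets))   # True; volumes sum to 1
```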
Capturing spatially-varying bidirectional reflectance distribution functions (SVBRDFs) of 3D objects with just a single, hand-held camera (such as an off-the-shelf smartphone or a DSLR camera) is a difficult, open problem. Previous works are either limited to planar geometry or rely on previously scanned 3D geometry, thus limiting their practicality. Several technical challenges need to be overcome: first, the built-in flash of a camera is almost colocated with the lens and at a fixed position; this severely hampers sampling procedures in the light-view space. Moreover, the near-field flash illuminates the object partially and unevenly. In terms of geometry, existing multiview stereo techniques assume diffuse reflectance only, which leads to overly smoothed 3D reconstructions, as we show in this paper. We present a simple yet powerful framework that removes the need for expensive, dedicated hardware, enabling practical acquisition of SVBRDF information from real-world 3D objects with a single, off-the-shelf camera with a built-in flash. In addition, by removing the diffuse-reflectance assumption and leveraging instead such SVBRDF information, our method outputs high-quality 3D geometry reconstructions, including more accurate high-frequency details than state-of-the-art multiview stereo techniques. We formulate the joint reconstruction of SVBRDFs, shading normals, and 3D geometry as a multi-stage, iterative inverse-rendering pipeline. Our method is also directly applicable to any existing multiview 3D reconstruction technique. We present results for captured objects with complex geometry and reflectance; we also validate our method numerically against other existing approaches that rely on dedicated hardware, additional sources of information, or both.
Capturing appearance often requires dense sampling in light-view space, which is usually achieved with specialized, expensive hardware setups. With the aim of realizing a compact acquisition setup without multiple angular samples of light and view, we leverage an alternative optical property of light: polarization. To this end, we capture a set of polarimetric images with linear polarizers in front of a single projector and camera to obtain the appearance and normals of real-world objects. We face two technical challenges: first, no complete polarimetric BRDF model is available for modeling the mixed polarization of both specular and diffuse reflection; second, existing polarization-based inverse rendering methods are not applicable to a single local illumination setup, since they are formulated under the assumption of spherical illumination. To address these challenges, we first present a complete polarimetric BRDF (pBRDF) model that can describe the mixed polarization of both specular and diffuse reflection. Second, by leveraging our pBRDF model, we propose a novel inverse-rendering method with joint optimization of pBRDF and normals to capture spatially-varying material appearance: per-material specular properties (including the refractive index, specular roughness, and specular coefficient), per-pixel diffuse albedo, and normals. Our method can solve the severely ill-posed inverse-rendering problem by carefully accounting for the physical relationship between polarimetric appearance and geometric properties. We demonstrate how our method overcomes limited sampling in light-view space for inverse rendering by means of polarization.
Reconstructing shape and reflectance properties from images is a highly under-constrained problem, and has previously been addressed by using specialized hardware to capture calibrated data or by assuming known (or highly constrained) shape or reflectance. In contrast, we demonstrate that we can recover non-Lambertian, spatially-varying BRDFs and complex geometry belonging to any arbitrary shape class, from a single RGB image captured under a combination of unknown environment illumination and flash lighting. We achieve this by training a deep neural network to regress shape and reflectance from the image. Our network is able to address this problem because of three novel contributions: first, we build a large-scale dataset of procedurally generated shapes and real-world complex SVBRDFs that approximates real-world appearance well. Second, single-image inverse rendering requires reasoning at multiple scales, and we propose a cascade network structure that allows this in a tractable manner. Finally, we incorporate an in-network rendering layer that aids the reconstruction task by handling global illumination effects that are important for real-world scenes. Together, these contributions allow us to tackle the entire inverse rendering problem in a holistic manner and produce state-of-the-art results on both synthetic and real data.
Relighting of human images has various applications in image synthesis. For relighting, we must infer albedo, shape, and illumination from a human portrait. Previous techniques rely on human faces for this inference, based on spherical harmonics (SH) lighting. However, because they often ignore light occlusion, inferred shapes are biased and relit images are unnaturally bright, particularly at hollowed regions such as armpits, crotches, or garment wrinkles. This paper introduces the first attempt to infer light occlusion directly in the SH formulation. Based on supervised learning using convolutional neural networks (CNNs), we infer not only an albedo map and illumination but also a light-transport map that encodes occlusion as nine SH coefficients per pixel. The main difficulty in this inference is the lack of training datasets compared to the unlimited variation of human portraits. Surprisingly, geometric information including occlusion can be inferred plausibly even with a small dataset of synthesized human figures, by carefully preparing the dataset so that the CNNs can exploit the data coherency. Our method accomplishes more realistic relighting than the occlusion-ignored formulation.
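A minimal sketch of the shading step implied by such a representation, under the assumption that each pixel stores an RGB albedo and nine SH transport coefficients (an occlusion-aware cosine lobe) and that the environment light is given as nine SH coefficients per color channel; the maps here are random stand-ins for the CNN outputs.

```python
import numpy as np

def relight(albedo, transport, light_sh):
    """albedo:    (H, W, 3)  per-pixel RGB albedo
       transport: (H, W, 9)  per-pixel SH light-transport coefficients
       light_sh:  (9, 3)     environment illumination as SH per channel
       Relit image = albedo * (transport . light), clamped at zero."""
    irradiance = np.einsum('hwk,kc->hwc', transport, light_sh)
    return albedo * np.clip(irradiance, 0.0, None)

# toy usage with random maps standing in for inferred quantities
H, W = 4, 4
img = relight(np.random.rand(H, W, 3),
              np.random.rand(H, W, 9),
              np.random.rand(9, 3))
```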
We propose a workflow for the spectral reproduction of paintings, which captures a painting's spectral color, invariant to illumination, and reproduces it using multi-material 3D printing. We take advantage of current 3D printers' ability to combine highly concentrated inks in a large number of layers to expand the spectral gamut of a set of inks. We use a data-driven method to both predict the spectrum of a printed ink stack and optimize for the stack layout that best matches a target spectrum. This bidirectional mapping is modeled using a pair of neural networks, which are optimized through a problem-specific multi-objective loss function. Our loss function helps find the best possible ink layout, striking a balance between spectral reproduction and colorimetric accuracy under a multitude of illuminants. In addition, we introduce a novel spectral vector error diffusion algorithm based on combining color contoning and halftoning, which simultaneously solves the layout discretization and color quantization problems accurately and efficiently. Our workflow outperforms the state-of-the-art models for spectral prediction and layout optimization. We demonstrate the reproduction of a number of real paintings and historically important pigments using our prototype implementation, which uses 10 custom inks with varying spectra and a resin-based 3D printer.
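For orientation, here is a generic, hedged sketch of vector error diffusion over spectra: each pixel is quantized to the printable spectrum (from a small palette, e.g. the network's predictions for candidate ink layouts) closest to the error-adjusted target, and the spectral residual is diffused with Floyd-Steinberg weights. This illustrates the principle only, not the paper's combined contoning-halftoning algorithm.

```python
import numpy as np

def spectral_error_diffusion(target, palette):
    """target:  (H, W, B) per-pixel target reflectance spectra (B bands)
       palette: (K, B)    printable spectra, e.g. predicted for K ink layouts
       Returns (H, W) indices of the chosen layout per pixel."""
    H, W, B = target.shape
    work = target.astype(float).copy()
    out = np.zeros((H, W), dtype=int)
    for y in range(H):
        for x in range(W):
            want = work[y, x]
            k = np.argmin(np.sum((palette - want) ** 2, axis=1))  # nearest printable spectrum
            out[y, x] = k
            err = want - palette[k]
            for dy, dx, w in ((0, 1, 7/16), (1, -1, 3/16), (1, 0, 5/16), (1, 1, 1/16)):
                if 0 <= y + dy < H and 0 <= x + dx < W:
                    work[y + dy, x + dx] += w * err            # diffuse spectral residual
    return out
```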
We present two novel and complementary approaches to measuring diffraction effects in commonly found planar spatially-varying holographic surfaces. Such surfaces are increasingly found in decorative materials such as gift bags, holographic papers, clothing, and security holograms, and produce impressive visual effects that have not previously been acquired for realistic rendering. These holographic surfaces are usually manufactured with one-dimensional diffraction gratings that vary in periodicity and orientation over the sample in order to produce a wide range of diffraction effects such as gradients and kinematic (rotational) effects. Our proposed methods estimate these two parameters and allow an accurate reproduction of these effects in real time. The first method simply uses a point light source to recover both the grating periodicity and orientation in the case of regular and stochastic textures. Under the assumption that the sample is made of the same repeated diffractive tile, good results can be obtained using just one to five photographs on a wide range of samples. The second method is based on polarization imaging and enables an independent high-resolution measurement of the grating orientation and relative periodicity at each surface point. This method requires a minimum of four photographs for accurate results, does not assume repetition of an exemplar tile, and can even reveal minor fabrication defects. We present point-light-source renderings with both approaches that qualitatively match photographs, as well as real-time renderings under complex environmental illumination.
The bidirectional reflectance distribution function (BRDF) is crucial for modeling the appearance of real-world materials. In production rendering, analytic BRDF models are often used to approximate surface appearance since they are compact and flexible. Measured BRDFs usually have a more realistic appearance, but consume much more storage and are hard to modify. In this paper, we propose a novel framework for connecting measured and analytic BRDFs. First, we develop a robust method for separating a measured BRDF into diffuse and specular components. This is commonly done for analytic models, but has previously been difficult to do explicitly for measured BRDFs. This diffuse-specular separation enables novel editing of measured BRDFs on the diffuse and specular parts separately. In addition, we analyze each part of the measured BRDF and derive a more intuitive and lower-dimensional PCA model than Nielsen et al. [2015]. In fact, our measured BRDF model has the same number of parameters (8) as commonly used analytic models such as GGX. Finally, we visualize analytic and measured BRDFs in the same space and directly demonstrate their similarities and differences. We also design an analytic fitting algorithm for two-lobe materials that is more robust, efficient, and simple than previous non-convex optimization-based analytic fitting methods.
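As a rough illustration of how a low-dimensional linear model is built from measured data, the sketch below performs PCA in a log space over densely tabulated BRDFs, in the spirit of Nielsen et al. [2015]; the data layout, the log mapping, and the eight-coefficient encoding are assumptions, not the paper's exact construction.

```python
import numpy as np

def brdf_pca(brdfs, n_components=8, eps=1e-3):
    """brdfs: (N, D) matrix, one densely tabulated BRDF per row.
       Returns the log-space mean and leading principal directions, so a
       BRDF can be encoded with n_components scalar coefficients."""
    X = np.log(brdfs + eps)                        # compress dynamic range
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]                 # basis: (n_components, D)

def encode(brdf, mean, basis, eps=1e-3):
    return basis @ (np.log(brdf + eps) - mean)

def decode(coeffs, mean, basis, eps=1e-3):
    return np.exp(mean + basis.T @ coeffs) - eps
```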
One of the key ingredients of any physically based rendering system is a detailed specification characterizing the interaction of light and matter of all materials present in a scene, typically via the Bidirectional Reflectance Distribution Function (BRDF). Despite their utility, access to real-world BRDF datasets remains limited: this is because measurements involve scanning a four-dimensional domain at sufficient resolution, a tedious and often infeasibly time-consuming process.
We propose a new parameterization that automatically adapts to the behavior of a material, warping the underlying 4D domain so that most of the volume maps to regions where the BRDF takes on non-negligible values, while irrelevant regions are strongly compressed. This adaptation only requires a brief 1D or 2D measurement of the material's retro-reflective properties. Our parameterization is unified in the sense that it combines several steps that previously required intermediate data conversions: the same mapping can simultaneously be used for BRDF acquisition and storage, and it supports efficient Monte Carlo sample generation.
We observe that the above desiderata are satisfied by a core operation present in modern rendering systems, which maps uniform variates to direction samples distributed proportionally to an analytic BRDF (see the sketch after this abstract). Based on this insight, we define our adaptive parameterization as an invertible, retro-reflectively driven mapping between the parametric and directional domains. We are able to create noise-free renderings of existing BRDF datasets after conversion into our representation, with the added benefit that the warped data is significantly more compact, requiring 16 KiB and 544 KiB per spectral channel for isotropic and anisotropic specimens, respectively.
Finally, we show how to modify an existing gonio-photometer to provide the needed retro-reflection measurements. Acquisition then proceeds within a 4D space that is warped by our parameterization. We demonstrate the efficacy of this scheme by acquiring the first set of spectral BRDFs of surfaces exhibiting arbitrary roughness, including anisotropy.
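A minimal sketch of the core operation referred to above: an invertible mapping between the unit square and directions distributed proportionally to an analytic microfacet density, using isotropic GGX as a stand-in. The paper's actual parameterization additionally adapts to measured retro-reflection data; roughness and conventions here are illustrative.

```python
import numpy as np

ALPHA = 0.3                    # GGX roughness (illustrative)

def square_to_ggx(u1, u2, alpha=ALPHA):
    """Map (u1, u2) in [0,1)^2 to a unit half-vector distributed
    proportionally to the GGX NDF times cos(theta)."""
    cos_t = np.sqrt((1.0 - u1) / (1.0 + (alpha * alpha - 1.0) * u1))
    sin_t = np.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    phi = 2.0 * np.pi * u2
    return np.array([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t])

def ggx_to_square(m, alpha=ALPHA):
    """Inverse mapping: recover the unit-square coordinates of a direction."""
    tan2 = (m[0] ** 2 + m[1] ** 2) / max(m[2] ** 2, 1e-12)
    u1 = tan2 / (alpha * alpha + tan2)
    u2 = (np.arctan2(m[1], m[0]) / (2.0 * np.pi)) % 1.0
    return u1, u2

# round trip: warp to the sphere and back
print(ggx_to_square(square_to_ggx(0.42, 0.77)))   # ~ (0.42, 0.77)
```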
Microfacet theory concisely models light transport over rough surfaces. Specular reflection is the result of single mirror reflections on each facet, while exact computation of multiple scattering is either neglected or modeled using costly importance sampling techniques. Practical but accurate simulation of multiple scattering in microfacet theory thus remains an open challenge. In this work, we revisit the traditional V-groove cavity model and derive an analytical, cost-effective solution for multiple scattering in rough surfaces. Our kaleidoscopic model is made up of both real and virtual V-grooves, and allows us to calculate higher-order scattering in the microfacets in an analytical fashion. We then extend our model to include nonsymmetric grooves, allowing for additional degrees of freedom on the surface geometry and improving multiple reflections at grazing angles, while remaining backward compatible with traditional normal distribution functions. We validate the accuracy of our model against ground-truth Monte Carlo simulations and demonstrate its flexibility on anisotropic and textured materials. Our model is analytical, does not introduce significant cost or variance, can be seamlessly integrated in any rendering engine, preserves reciprocity and energy conservation, and is suitable for bidirectional methods.
Microfacet-based reflection models are the most common way to represent reflection from rough surfaces. However, a major limitation of these models is that they only account for single scattering; unfortunately, single-scattering models do not preserve energy. In this paper, we develop a microfacet BRDF for specular V-grooves that includes multiple scattering. Our approach is based on previous work by Zipin, who showed that the number of reflections inside a specular V-groove is bounded and analytically computable. Using this insight, we present a closed-form solution for the BRDF and its probability density function (PDF), as well as a method for importance sampling the BRDF. As a result, our BRDF can easily be used within a path-traced rendering system such as PBRT. The model supports any microfacet distribution function and spatially-varying surface roughness. The images produced by the model have a pleasing appearance compared to traditional single-scattering models.
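The bounded-bounce property is easy to observe numerically: trace a 2D ray inside a symmetric mirror V-groove and count specular reflections until it escapes. The sketch below is a brute-force illustration with an arbitrary entry point, not the closed-form count derived in these papers.

```python
import numpy as np

def vgroove_bounces(beta_deg, incident_deg, length=1.0):
    """Count mirror reflections of a 2D ray inside a V-groove whose facets
    rise at beta_deg above the horizontal; incident_deg is measured from
    the downward vertical (0 = straight down). Geometry is illustrative."""
    beta = np.radians(beta_deg)
    facets = [(np.array([ np.cos(beta), np.sin(beta)]),   # right facet direction
               np.array([-np.sin(beta), np.cos(beta)])),  # its inward normal
              (np.array([-np.cos(beta), np.sin(beta)]),   # left facet direction
               np.array([ np.sin(beta), np.cos(beta)]))]
    th = np.radians(incident_deg)
    d = np.array([np.sin(th), -np.cos(th)])               # entering ray direction
    o = np.array([0.3 * length * np.cos(beta),            # entry point inside the
                  length * np.sin(beta)])                 # opening, at rim height

    def cross(a, b):
        return a[0] * b[1] - a[1] * b[0]

    bounces = 0
    for _ in range(64):                                   # safety cap
        best = None
        for f, n in facets:
            if d @ n >= 0.0 or abs(cross(d, f)) < 1e-12:
                continue                                  # facet not hit from the front
            t = -cross(o, f) / cross(d, f)                # ray parameter at the facet line
            s = cross(o, d) / cross(f, d)                 # position along the facet
            if t > 1e-9 and 0.0 < s <= length and (best is None or t < best[0]):
                best = (t, n)
        if best is None:
            return bounces                                # ray has escaped the groove
        t, n = best
        o = o + t * d
        d = d - 2.0 * (d @ n) * n                         # mirror reflection
        bounces += 1
    return bounces

print(vgroove_bounces(60.0, 0.0))   # steep groove: 3 bounces for this entry ray
print(vgroove_bounces(20.0, 0.0))   # shallow groove: 1 bounce for this entry ray
```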
Realistic rendering with materials that exhibit high-frequency spatial variation remains a challenge, as eliminating spatial and temporal aliasing requires prohibitively high sampling rates. Recent work has made the problem more tractable; however, existing methods remain prohibitively expensive when using large environment lights and/or (correctly filtered) global illumination. We present an appearance model with explicit high-frequency micro-normal variation, and a filtering approach that scales to multi-dimensional shading integrals. By combining a novel and compact half-vector histogram scheme with a directional basis expansion, we accurately compute the integral of filtered high-frequency reflectance over large lights with angularly varying emission. Our approach is scalable, rendering images indistinguishable from ground truth at over 10× the speed of the state of the art and with only 15% of the memory footprint. When filtering appearance with global illumination, we outperform the state of the art by ~30×.
Markov chain Monte Carlo (MCMC) rendering utilizes a sequence of correlated path samples obtained by iteratively mutating the current state into the next. The efficiency of MCMC rendering depends on how well the mutation strategy is adapted to the local structure of the state space. We present a novel MCMC rendering method that automatically adapts the step sizes of the mutations to the geometry of the rendered scene. Our geometry-aware path-space perturbation largely avoids tentative samples with zero contribution due to occlusion. Our method limits the mutation step size by estimating the maximum opening angle of a cone, centered around a segment of a light-transport path, within which no geometry obstructs visibility. This geometry-aware mutation increases the acceptance rate while not degrading the sampling quality. As naive cone estimation introduces considerable overhead, we discuss and analyze fast approximate methods for cone-angle estimation that reuse the acceleration structure already present for ray-geometry intersection. Our new approach, integrated into the framework of Metropolis light transport, achieves results with lower error and fewer artifacts in equal time compared to current path-space mutation techniques.
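To make the cone idea concrete, here is a brute-force, hedged sketch: sample rays around a path segment at increasing angles and report the largest ring that is fully unoccluded. The `occluded(origin, direction, t_max)` query is a hypothetical stand-in for the renderer's ray-tracing acceleration structure, and the ray-sampling strategy is not the paper's fast approximation.

```python
import numpy as np

def max_clear_cone_angle(x0, x1, occluded, n_angles=8, n_phi=8,
                         max_angle=np.radians(30.0)):
    """Estimate the largest half-angle of a cone around segment x0 -> x1
    such that rays from x0 at that angle stay unoccluded."""
    d = x1 - x0
    dist = np.linalg.norm(d)
    d /= dist
    a = np.array([1.0, 0.0, 0.0]) if abs(d[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t = np.cross(d, a); t /= np.linalg.norm(t)            # frame (t, b, d)
    b = np.cross(d, t)
    clear = 0.0
    for angle in np.linspace(max_angle / n_angles, max_angle, n_angles):
        for phi in np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False):
            dir_ = np.cos(angle) * d + np.sin(angle) * (np.cos(phi) * t + np.sin(phi) * b)
            if occluded(x0, dir_, dist / np.cos(angle)):
                return clear                              # first blocked ring bounds the cone
        clear = angle
    return clear

# toy visibility query: a sphere of radius 0.2 at (0.5, 0.3, 0)
def occluded(o, d, t_max, c=np.array([0.5, 0.3, 0.0]), r=0.2):
    oc = o - c
    b = oc @ d
    disc = b * b - (oc @ oc - r * r)
    if disc < 0.0:
        return False
    t = -b - np.sqrt(disc)
    return 1e-6 < t < t_max

# the mutation step size would then be clamped proportionally to this angle
print(max_clear_cone_angle(np.zeros(3), np.array([1.0, 0.0, 0.0]), occluded))
```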
Real-world materials are often layered: metallic paints, biological tissues, and many others. Variation in the interface and volumetric scattering properties of the layers leads to a rich diversity of material appearance, from anisotropic highlights to complex textures and relief patterns. However, simulating light-layer interactions is a challenging problem. Past analytical or numerical solutions either introduce several approximations and limitations, or rely on expensive operations on discretized BSDFs, preventing the ability to freely vary the layer properties spatially. We introduce a new unbiased layered BSDF model based on Monte Carlo simulation, whose only assumption is the layered structure itself. Our novel position-free path formulation is fundamentally more powerful at constructing light-transport paths than generic light-transport algorithms applied to the special case of flat layers, since it is based on a product of solid-angle rather than area measures and therefore does not contain the high-variance geometry terms required in the standard formulation. We introduce two techniques for sampling the position-free path integral: a forward path tracer with next-event estimation and a full bidirectional estimator. We show a number of examples featuring multiple layers with surface and volumetric scattering, surface and phase-function anisotropy, and spatial variation in all parameters.
For a given PDE problem, three main factors affect the accuracy of FEM solutions: basis order, mesh resolution, and mesh element quality. The first two factors are easy to control, while controlling element shape quality is a challenge, with fundamental limitations on what can be achieved.
We propose to use p-refinement (increasing element degree) to decouple the approximation error of the finite element method from the domain mesh quality for elliptic PDEs.
Our technique produces an accurate solution even on meshes with badly shaped elements, with a slightly higher running time due to the higher cost of high-order elements. We demonstrate that it is able to automatically adapt the basis to badly shaped elements, ensuring an error consistent with high-quality meshing, without any per-mesh parameter tuning. Our construction reduces to traditional fixed-degree FEM methods on high-quality meshes with identical performance.
Our construction decreases the burden on meshing algorithms, reducing the need for often-expensive mesh optimization, and it automatically compensates for badly shaped elements, which arise due to boundary constraints or limitations of current meshing methods. By tackling mesh generation and finite element simulation jointly, we obtain a pipeline that is both more efficient and more robust than combinations of existing state-of-the-art meshing and FEM algorithms.
We propose a novel discrete scheme for simulating viscous thin films at real-time frame rates. Our scheme is based on a new formulation of the gradient-flow approach that leads to a discretization based on local stencils that are easily computable on the GPU. Our approach has physical fidelity: the total mass is guaranteed to be preserved, an appropriate discrete energy is controlled, and the film height is guaranteed to be non-negative at all times. In addition, and unlike all existing methods for thin-film simulation, it is fast enough to allow real-time interaction with the flow, for designing initial conditions and controlling the forces during the simulation.
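For context, the sketch below advances the classical lubrication (thin-film) equation with a generic explicit, flux-form stencil on a periodic grid; writing the update in divergence form conserves total mass exactly. This is a textbook-style stand-in, not the paper's scheme, which additionally guarantees energy control and non-negativity of the height.

```python
import numpy as np

def thin_film_step(h, dt=1e-4, dx=1.0):
    """One explicit step of  h_t = -div( (h^3/3) grad(-laplace h) )
    on a periodic grid, with fluxes on staggered faces."""
    lap = (np.roll(h, 1, 0) + np.roll(h, -1, 0) +
           np.roll(h, 1, 1) + np.roll(h, -1, 1) - 4.0 * h) / dx**2
    p = -lap                                          # capillary pressure
    mx = 0.5 * (h**3 + np.roll(h, -1, 0)**3) / 3.0    # mobility on x-faces
    my = 0.5 * (h**3 + np.roll(h, -1, 1)**3) / 3.0    # mobility on y-faces
    fx = -mx * (np.roll(p, -1, 0) - p) / dx           # flux through x-faces
    fy = -my * (np.roll(p, -1, 1) - p) / dx
    div = (fx - np.roll(fx, 1, 0) + fy - np.roll(fy, 1, 1)) / dx
    return h - dt * div

# toy usage: a bump relaxing under surface tension; mass stays constant
h = 1.0 + 0.1 * np.exp(-np.linspace(-3, 3, 64)[:, None]**2
                       - np.linspace(-3, 3, 64)[None, :]**2)
m0 = h.sum()
for _ in range(100):
    h = thin_film_step(h)
print(abs(h.sum() - m0))    # ~ 0 up to floating point
```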
We propose a method for accurately simulating dissipative forces in deformable bodies when using optimization-based integrators. We represent such forces using dissipation functions which may be nonlinear in both positions and velocities, enabling us to model a range of dissipative effects including Coulomb friction, Rayleigh damping, and power-law dissipation. We propose a general method for incorporating dissipative forces into optimization-based time integration schemes, which hitherto have been applied almost exclusively to systems with only conservative forces. To improve accuracy and minimize artificial damping, we provide an optimization-based version of the second-order accurate TR-BDF2 integrator. Finally, we present a method for modifying arbitrary dissipation functions to conserve linear and angular momentum, allowing us to eliminate the artificial angular momentum loss caused by Rayleigh damping.
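A minimal sketch of the general idea of folding a dissipation potential into an optimization-based step, here with plain implicit Euler on a single damped spring; the paper's TR-BDF2 variant and its momentum-preserving modification are not reproduced, and the Rayleigh-style potential below could be swapped for a power law.

```python
import numpy as np
from scipy.optimize import minimize

# toy system: one 2D particle on a spring anchored at the origin, under gravity
M = np.diag([1.0, 1.0])                 # mass matrix
k, c = 50.0, 2.0                        # spring stiffness, damping coefficient
g = np.array([0.0, -9.8])

def E(x):                               # conservative potential: spring + gravity
    return 0.5 * k * (x @ x) - g @ x

def R(v):                               # dissipation potential (Rayleigh-style);
    return 0.5 * c * (v @ v)            # a power law would be (c/p) * |v|**p

def step(x0, v0, h):
    """Optimization-based implicit-Euler step with a dissipation potential:
       x1 = argmin_x  1/(2 h^2) |x - (x0 + h v0)|_M^2 + E(x) + h R((x - x0)/h)"""
    x_pred = x0 + h * v0
    def objective(x):
        v = (x - x0) / h
        return (0.5 / h**2) * (x - x_pred) @ M @ (x - x_pred) + E(x) + h * R(v)
    x1 = minimize(objective, x_pred, method="L-BFGS-B").x
    return x1, (x1 - x0) / h

x, v = np.array([0.5, 0.0]), np.zeros(2)
for _ in range(200):                    # the oscillation decays toward equilibrium
    x, v = step(x, v, h=1e-2)
print(x)
```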
We propose a technique to simulate granular materials that exploits the dual strengths of discrete and continuum treatments. Discrete element simulations provide unmatched levels of detail and generality, but prove excessively costly when applied to large scale systems. Continuum approaches are computationally tractable, but limited in applicability due to built-in modeling assumptions; e.g., models suitable for granular flows typically fail to capture clogging, bouncing and ballistic motion. In our hybrid approach, an oracle dynamically partitions the domain into continuum regions where safe, and discrete regions where necessary. The domains overlap along transition zones, where a Lagrangian dynamics mass-splitting coupling principle enforces agreement between the two simulation states. Enrichment and homogenization operations allow the partitions to evolve over time. This approach accurately and efficiently simulates scenarios that previously required an entirely discrete treatment.