Optimizing distortion energies over a mesh, in two or three dimensions, is a common and critical problem in physical simulation and geometry processing. We present three new improvements to the state of the art: a barrier-aware line-search filter that cures blocked descent steps caused by element barrier terms and so enables rapid progress; an energy proxy model that adaptively blends the Sobolev (inverse-Laplacian-processed) gradient and L-BFGS descent to gain the advantages of both, while avoiding L-BFGS's current limitations in distortion optimization tasks; and a characteristic gradient norm providing a robust and largely mesh- and energy-independent convergence criterion that avoids premature termination when algorithms temporarily slow their progress. Together these improvements form the basis for Blended Cured Quasi-Newton (BCQN), a new distortion optimization algorithm. Over a wide range of problems across all scales we show that BCQN is generally the fastest and most robust method available, making some previously intractable problems practical while offering up to an order of magnitude improvement on others.
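Below is a minimal, illustrative sketch (not the authors' exact blending rule) of how an inverse-Laplacian-processed gradient can be combined with a standard L-BFGS direction; the Laplacian `L`, the curvature pairs `s_hist`/`y_hist`, and the blending weight `beta` are assumed placeholders for this example.

```python
import numpy as np
import scipy.sparse.linalg as spla

def blended_direction(grad, L, s_hist, y_hist, beta):
    """Sketch: blend a Sobolev (inverse-Laplacian-processed) gradient with an
    L-BFGS direction. L is a sparse mesh Laplacian, assumed made positive
    definite (e.g., via pinned vertices or a small regularizer); beta in [0, 1]."""
    d_sobolev = spla.spsolve(L.tocsc(), grad)      # Sobolev gradient: solve L d = g

    # standard L-BFGS two-loop recursion (identity initial Hessian for simplicity)
    q = grad.copy()
    alphas, rhos = [], []
    for s, y in zip(reversed(s_hist), reversed(y_hist)):   # newest pair first
        rho = 1.0 / y.dot(s)
        a = rho * s.dot(q)
        q -= a * y
        alphas.append(a); rhos.append(rho)
    r = q
    for (s, y), a, rho in zip(zip(s_hist, y_hist), reversed(alphas), reversed(rhos)):
        b = rho * y.dot(r)
        r += s * (a - b)

    # blended descent direction: beta -> 1 favors the Sobolev gradient,
    # beta -> 0 favors the quasi-Newton (L-BFGS) direction
    return -((1.0 - beta) * r + beta * d_sobolev)
```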
We propose a novel approach, called Progressive Parameterizations, to compute foldover-free parameterizations with low isometric distortion on disk-topology meshes. Instead of using the input mesh as the reference that defines the objective function, we introduce a progressive reference that has bounded distortion with respect to the parameterized mesh and is as close as possible to the input mesh. After optimizing the bounded distortion energy between the progressive reference and the parameterized mesh, the parameterized mesh easily approaches the progressive reference, thereby also coming close to the input. By iteratively generating the progressive reference and optimizing the bounded distortion energy to update the parameterized mesh, our algorithm achieves high-quality parameterizations with strong practical reliability and high efficiency. We demonstrate that our algorithm succeeds on a massive test data set containing over 20712 complex disk-topology meshes. Compared to state-of-the-art methods, our method achieves higher computational efficiency and practical reliability.
Many computer graphics problems require computing geometric shapes subject to certain constraints. This often results in non-linear and non-convex optimization problems with globally coupled variables, which pose a great challenge for interactive applications. Local-global solvers developed in recent years can quickly compute an approximate solution to such problems, making them an attractive choice for applications that prioritize efficiency over accuracy. However, these solvers suffer from a low convergence rate and may take a long time to compute an accurate result. In this paper, we propose a simple and effective technique to accelerate the convergence of such solvers. By treating each local-global step as a fixed-point iteration, we apply Anderson acceleration, a well-established technique for fixed-point solvers, to speed up the convergence of a local-global solver. To address the stability issues of classical Anderson acceleration, we propose a simple strategy that guarantees a decrease of the target energy and ensures global convergence. In addition, we analyze the connection between Anderson acceleration and quasi-Newton methods, and show that the canonical choice of its mixing parameter is suitable for accelerating local-global solvers. Moreover, our technique is effective beyond classical local-global solvers and can be applied to iterative methods with a common structure. We evaluate the performance of our technique on a variety of geometry optimization and physics simulation problems. Our approach significantly reduces the number of iterations required to compute an accurate result, with only a slight increase in computational cost per iteration. Its simplicity and effectiveness make it a promising tool for accelerating existing algorithms as well as for designing efficient new algorithms.
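As a concrete illustration of the idea, here is a minimal sketch (under assumed interfaces, not the paper's implementation) of Anderson acceleration wrapped around a generic fixed-point map `G` (e.g., one local-global step), with a simple energy-decrease safeguard that falls back to the un-accelerated update when acceleration would increase the energy.

```python
import numpy as np

def accelerated_fixed_point(G, energy, x0, m=5, iters=200, tol=1e-8):
    """Anderson acceleration (mixing parameter 1) with an energy safeguard.
    G(x): one fixed-point / local-global update; energy(x): target energy."""
    x = x0.copy()
    gx = G(x)
    G_hist, F_hist = [gx.copy()], [gx - x]
    for _ in range(iters):
        x_default = gx                              # plain local-global update
        mk = min(m, len(F_hist) - 1)
        if mk > 0:
            # least-squares combination of the last mk residual differences
            dF = np.stack([F_hist[-1] - F_hist[-2 - i] for i in range(mk)], axis=1)
            dG = np.stack([G_hist[-1] - G_hist[-2 - i] for i in range(mk)], axis=1)
            theta, *_ = np.linalg.lstsq(dF, F_hist[-1], rcond=None)
            x_acc = G_hist[-1] - dG @ theta         # accelerated iterate
        else:
            x_acc = x_default
        # safeguard: only accept the accelerated iterate if it does not
        # increase the energy relative to the un-accelerated update
        x = x_acc if energy(x_acc) <= energy(x_default) else x_default
        gx = G(x)
        f = gx - x
        G_hist.append(gx.copy()); F_hist.append(f.copy())
        if np.linalg.norm(f) < tol:
            break
    return x
```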
Inside-outside determination is a basic building block for higher-level geometry processing operations. Generalized winding numbers provide a robust answer for triangle meshes, regardless of defects such as self-intersections, holes, or degeneracies. In this paper, we further generalize the winding number to point clouds. Previous methods for evaluating the winding number are slow for completely disconnected surfaces, such as triangle soups or, in the extreme case, point clouds. We propose a tree-based algorithm that reduces the asymptotic complexity of generalized winding number computation while closely approximating the exact value. Armed with a fast evaluation, we demonstrate the winding number in a variety of new applications: voxelization, signing distances, generating 3D printer paths, defect-tolerant mesh booleans, and point set surfaces.
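For reference, the quantity being accelerated can be sketched as follows: a brute-force evaluation of the generalized winding number of an oriented point cloud, where each point carries a normal and an area weight. The O(N·Q) double loop below is illustrative only; the tree-based algorithm replaces it with a far-field approximation.

```python
import numpy as np

def winding_number(points, normals, areas, queries):
    """Brute-force generalized winding number of an oriented point cloud.
    points, normals: (N, 3); areas: (N,); queries: (Q, 3).
    Returns values near 1 inside and near 0 outside a closed, outward-oriented surface."""
    w = np.zeros(len(queries))
    for j, q in enumerate(queries):
        d = points - q                              # vectors from query to points
        r = np.maximum(np.linalg.norm(d, axis=1), 1e-12)
        # solid-angle (dipole) contribution of each oriented point
        w[j] = np.sum(areas * np.einsum('ij,ij->i', d, normals) / (4.0 * np.pi * r**3))
    return w
```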
We present a novel algorithm for computing the medial axes of 3D shapes. We make the observation that the medial axis of a voxel shape can be simply yet faithfully approximated by the interior Voronoi diagram of its boundary vertices, which we call the voxel core. We further show that voxel cores can approximate the medial axis of any smooth shape with homotopy equivalence and geometric convergence. These insights motivate an algorithm that is simple, efficient, numerically stable, and equipped with theoretical guarantees. Compared with existing voxel-based methods, our method inherits their simplicity but is more scalable and can process significantly larger inputs. Compared with sampling-based methods that offer similar theoretical guarantees, our method produces visually comparable results while more robustly capturing the topology of the input shape.
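A toy illustration of the underlying idea, using generic boundary samples rather than the paper's voxel boundary vertices: the Voronoi vertices that fall inside the shape approximate its medial axis (`inside_fn` is an assumed inside/outside test).

```python
import numpy as np
from scipy.spatial import Voronoi

def interior_voronoi_vertices(boundary_pts, inside_fn):
    """Keep only the Voronoi vertices of the boundary samples that lie inside
    the shape; these cluster around the medial axis."""
    vor = Voronoi(boundary_pts)
    keep = np.array([inside_fn(v) for v in vor.vertices])
    return vor.vertices[keep]

# 2D toy example: boundary samples of the unit circle
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
boundary = np.stack([np.cos(theta), np.sin(theta)], axis=1)
medial = interior_voronoi_vertices(boundary, lambda v: np.linalg.norm(v) < 1.0)
# for a circle, the interior Voronoi vertices concentrate near its center (the medial axis)
```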
Self-intersecting, or nearly self-intersecting, meshes are commonly found in 2D and 3D computer graphics practice. Self-intersections occur, for example, during manual work by artists, as a by-product of procedural methods for mesh generation, or due to modeling errors introduced by scanning equipment. If the space bounded by such inputs is meshed naively, the resulting mesh joins ("glues") self-overlapping parts, precluding efficient further modeling and animation of the underlying geometry. Similarly, near self-intersections force the simulation algorithm to employ an unnecessarily detailed mesh to separate the nearly self-intersecting regions. Our work addresses both of these challenges by giving an algorithm to generate an "un-glued" simulation mesh, of arbitrary user-chosen resolution, that properly accounts for self-intersections and near self-intersections. To achieve this result, we study the mathematical concept of immersion and give a deterministic and constructive algorithm to determine whether the input self-intersecting triangle mesh is the boundary of an immersion. For near self-intersections, we give a robust algorithm to properly duplicate mesh elements and correctly embed the underlying geometry into the mesh element copies. Both self-intersections and near self-intersections are handled by one algorithm that permits successful meshing at arbitrary resolution. Applications of our work include volumetric shape editing, physically based simulation and animation, and volumetric weight and geodesic distance computation on self-intersecting inputs.
Surface reconstruction is one of the central problems in computer graphics. Existing research on this problem has primarily focused on improving the geometric aspects of the reconstruction (e.g., smoothness, features, element quality, etc.), and little attention has been paid to ensure it also has desired topological properties (e.g., connectedness and genus). In this paper, we propose a novel and general optimization method for surface reconstruction under topological constraints. The input to our method is a prescribed genus for the reconstructed surface, a partition of the ambient volume into cells, and a set of possible surface candidates and their associated energy within each cell. Our method computes one candidate per cell so that their union is a connected surface with the prescribed genus that minimizes the total energy. We formulate the task as an integer program, and propose a novel solution that combines convex relaxations within a branch and bound framework. As our method is oblivious of the type of input cells, surface candidates, and energy, it can be applied to a variety of reconstruction scenarios, and we explore two of them in the paper: reconstruction from cross-section slices and iso-surfacing an intensity volume. In the first scenario, our method outperforms an existing topology-aware method particularly for complex inputs and higher genus constraints. In the second scenario, we demonstrate the benefit of topology control over classical topology-oblivious methods such as Marching Cubes.
We propose the first deep learning approach for exemplar-based local colorization. Given a reference color image, our convolutional neural network directly maps a grayscale image to an output colorized image. Rather than using hand-crafted rules as in traditional exemplar-based methods, our end-to-end colorization network learns how to select, propagate, and predict colors from large-scale data. The approach performs robustly and generalizes well even when using reference images that are unrelated to the input grayscale image. More importantly, as opposed to other learning-based colorization methods, our network allows the user to achieve customizable results by simply feeding it different references. To further reduce the manual effort of selecting references, the system automatically recommends references with our proposed image retrieval algorithm, which considers both semantic and luminance information. The colorization can be performed fully automatically by simply picking the top reference suggestion. Our approach is validated through a user study and favorable quantitative comparisons to state-of-the-art methods. Furthermore, our approach can be naturally extended to video colorization. Our code and models are freely available for public use.
A fully automatic method for descreening halftone images is presented, based on convolutional neural networks with end-to-end learning. By incorporating context-level information, the proposed method not only removes halftone artifacts but also synthesizes the fine details lost during halftoning. The method consists of two main stages. In the first stage, intrinsic features of the scene are extracted, the low-frequency reconstruction of the image is estimated, and halftone patterns are removed. For the intrinsic features, edges and object categories are estimated and fed to the next stage as strong visual and contextual cues. In the second stage, fine details are synthesized on top of the low-frequency output using a generative adversarial model. In addition, the novel problem of rescreening is addressed, where a natural input image is halftoned so as to be similar to a separately given reference halftone image. To this end, a two-stage convolutional neural network is also presented. Both networks are trained with millions of before-and-after example image pairs of various halftone styles. Qualitative and quantitative evaluations are provided, which demonstrate the effectiveness of the proposed methods.
The real world exhibits an abundance of non-stationary textures. Examples include textures with large scale structures, as well as spatially variant and inhomogeneous textures. While existing example-based texture synthesis methods can cope well with stationary textures, non-stationary textures still pose a considerable challenge, which remains unresolved. In this paper, we propose a new approach for example-based non-stationary texture synthesis. Our approach uses a generative adversarial network (GAN), trained to double the spatial extent of texture blocks extracted from a specific texture exemplar. Once trained, the fully convolutional generator is able to expand the size of the entire exemplar, as well as of any of its sub-blocks. We demonstrate that this conceptually simple approach is highly effective for capturing large scale structures, as well as other non-stationary attributes of the input exemplar. As a result, it can cope with challenging textures, which, to our knowledge, no other existing method can handle.
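To make the "doubling" structure concrete, here is a toy, untrained PyTorch stand-in for an exemplar-specific fully convolutional generator that maps a texture block to one of twice the spatial extent; the architecture and channel counts are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class DoublingGenerator(nn.Module):
    """Toy fully convolutional generator that doubles the spatial extent of its input."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1),  # 2x spatial upsampling
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):            # x: (N, 3, H, W) -> (N, 3, 2H, 2W)
        return self.net(x)

block = torch.rand(1, 3, 128, 128) * 2 - 1        # a texture block in [-1, 1]
expanded = DoublingGenerator()(block)             # shape (1, 3, 256, 256)
```

Because the generator is fully convolutional, the same (trained) network can be applied to the entire exemplar or any of its sub-blocks, which is the property the abstract relies on.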
We resolve the longstanding problem of simulating the contact-mediated interaction of cloth and sharp geometric features by introducing an Eulerian-on-Lagrangian (EOL) approach to cloth simulation. Unlike traditional Lagrangian approaches to cloth simulation, our EOL approach permits bending exactly at and sliding over sharp edges, avoiding parasitic locking caused by over-constraining contact constraints. Wherever the cloth is in contact with sharp features, we insert EOL vertices into the cloth, while the rest of the cloth is simulated in the standard Lagrangian fashion. Our algorithm manifests as new equations of motion for EOL vertices, a contact-conforming remesher, and a set of simple constraint assignment rules, all of which can be incorporated into existing state-of-the-art cloth simulators to enable smooth, inequality-constrained contact between cloth and objects in the world.
We propose a method for simulating the complex dynamics of partially and fully saturated woven and knit fabrics interacting with liquid, including the effects of buoyancy, nonlinear drag, pore (capillary) pressure, dripping, and convection-diffusion. Our model evolves the velocity fields of both the liquid and solid relying on mixture theory, as well as tracking a scalar saturation variable that affects the pore pressure forces in the fluid. We consider the porous microstructure implied by the fibers composing individual threads, and use it to derive homogenized drag and pore pressure models that faithfully reflect the anisotropy of fabrics. In addition to the bulk liquid and fabric motion, we derive a quasi-static flow model that accounts for liquid spreading within the fabric itself. Our implementation significantly extends standard numerical cloth and fluid models to support the diverse behaviors of wet fabric, and includes a numerical method tailored to cope with the challenging nonlinearities of the problem. We explore a range of fabric-water interactions to validate our model, including challenging animation scenarios involving splashing, wringing, and collisions with obstacles, along with qualitative comparisons against simple physical experiments.
Cloth dynamics plays an important role in the visual appearance of moving characters. Properly accounting for contact and friction is of utmost importance to avoid cloth-body and cloth-cloth penetration and to capture typical folding and stick-slip behavior due to dry friction. We present here the first method able to account for cloth contact with exact Coulomb friction, treating both cloth self-contacts and contacts occurring between the cloth and an underlying character. Our key contribution is to observe that for a nodal system like cloth, the frictional contact problem may be formulated with velocities as primary variables, without having to compute the costly Delassus operator. Then, by reversing the roles classically played by the velocities and the contact impulses, conical complementarity solvers from the literature can be adapted to solve for compatible velocities at nodes. To handle the full complexity of cloth dynamics scenarios, we have extended this base algorithm in two ways: first, toward the accurate treatment of frictional contact at any location of the cloth, through an adaptive node refinement strategy; second, toward the handling of multiple constraints at each node, through the duplication of constrained nodes and the addition of pin constraints between the duplicates. Our method allows us to handle the complex cloth-cloth and cloth-body interactions in full-size garments with an unprecedented level of realism compared to former methods, while maintaining reasonable computational timings.
Being able to customize sewing patterns for different human bodies without using any pre-defined adjustment rules will not only improve the realism of virtual humans in the entertainment industry, but also deeply affect the fashion industry by making fast fashion and made-to-measure garments more accessible. To meet the requirements set by the fashion industry, a sewing pattern adjustment system must be both efficient and precise, which unfortunately cannot be achieved by existing techniques. In this paper, we propose to solve sewing pattern adjustment directly as a single nonlinear optimization problem, rather than in two phases, a garment shape optimization phase and an inverse pattern design phase, as in previous systems. This allows us to directly minimize the objective function that evaluates the fitting quality of the garment sewn from a pattern, without any compromise caused by the nonexistence of a solution to inverse pattern design. To improve the efficiency of our system, we carry out systematic research on a variety of optimization topics, including pattern parametrization, initialization, an inexact strategy, acceleration, and CPU-GPU implementation. We verify the usability of our system through automatic grading tests and made-to-measure tests. Designers and pattern makers confirm that our pattern results preserve design details and that their fitting qualities are acceptable. In our computational experiments, the system further demonstrates its efficiency, reliability, and flexibility in handling various pattern designs. While our current system still needs to overcome certain limitations, we believe it is a crucial step toward fully automatic pattern design and adjustment in the future.
Spherical Harmonic (SH) lighting is widely used for real-time rendering within Precomputed Radiance Transfer (PRT) systems. SH coefficients are precomputed and stored at object vertices, and combined interactively with SH lighting coefficients to enable effects like soft shadows, interreflections, and glossy reflection. However, the most common PRT techniques assume distant, low-frequency environment lighting, for which SH lighting coefficients can easily be computed once per frame. There is currently limited support for near-field illumination and area lights, since it is non-trivial to compute the SH coefficients for an area light, and the incident lighting (SH coefficients) varies over the object geometry. We present an efficient closed-form solution for projection of uniform polygonal area lights to spherical harmonic coefficients of arbitrary order, enabling easy adoption of accurate area lighting in PRT systems, with no modifications required to the core PRT framework. Our method only requires computing zonal harmonic (ZH) coefficients, for which we introduce a novel recurrence relation. In practice, ZH coefficients are built up iteratively, with computation linear in the desired SH order. General SH coefficients can then be obtained by the recently developed sparse zonal harmonic rotation method.
Simulating combinations of depth-of-field and motion blur is an important factor in the cinematic quality of synthetic images, but it can take a long time to compute. Splatting the point-spread function (PSF) of every pixel is general and provides high quality, but requires prohibitive compute time. We accelerate this in two steps: in a pre-process, we optimize for sparse representations of the Laplacians of all possible PSFs, which we call spreadlets. At runtime, spreadlets can be splatted efficiently to the Laplacian of an image. Integrating this image produces the final result. Our approach scales faithfully to strong motion and large out-of-focus areas and compares favorably in speed and quality with off-line and interactive approaches. It is applicable both to synthesizing from pinhole images and to reconstructing from stochastic images, with or without layering.
We introduce a biomimetic framework for human sensorimotor control, which features a biomechanically simulated human musculoskeletal model actuated by numerous muscles, with eyes whose retinas have nonuniformly distributed photoreceptors. The virtual human's sensorimotor control system comprises 20 trained deep neural networks (DNNs), half constituting the neuromuscular motor subsystem, while the other half compose the visual sensory subsystem. Directly from the photoreceptor responses, 2 vision DNNs drive eye and head movements, while 8 vision DNNs extract visual information required to direct arm and leg actions. Ten DNNs achieve neuromuscular control---2 DNNs control the 216 neck muscles that actuate the cervicocephalic musculoskeletal complex to produce natural head movements, and 2 DNNs control each limb; i.e., the 29 muscles of each arm and 39 muscles of each leg. By synthesizing its own training data, our virtual human automatically learns efficient, online, active visuomotor control of its eyes, head, and limbs in order to perform nontrivial tasks involving the foveation and visual pursuit of target objects coupled with visually-guided limb-reaching actions to intercept the moving targets, as well as to carry out drawing and writing tasks.
We propose a framework for simulation and control of the human musculoskeletal system, capable of reproducing realistic animations of dexterous activities with high-level coordination. We present the first controllable system in this class that incorporates volumetric muscle actuators, tightly coupled with the motion controller, in contrast to the line-segment approximations to which prior art is overwhelmingly restricted. The theoretical framework put forth by our methodology computes all the Jacobians necessary for control, even with the drastically increased dimensionality of the state descriptors associated with three-dimensional, volumetric muscles. The direct coupling of volumetric actuators in the controller allows us to model muscular deficiencies that manifest in shape and geometry, in ways that cannot be captured with line-segment approximations. Our controller is coupled with a trajectory optimization framework, and its efficacy is demonstrated in complex motion tasks such as juggling and weightlifting sequences with variable anatomic parameters and interaction constraints.
Simulating how the human body deforms in contact with external objects, tight clothing, or other humans is of central importance to many fields. Despite great advances in numerical methods, the material properties required to accurately simulate the body of a real human have been sorely lacking. Here we show that mechanical properties of the human body can be directly measured using a novel hand-held device. We describe a complete pipeline for measurement, modeling, parameter estimation, and simulation using the finite element method. We introduce a phenomenological model (the sliding thick skin model) that is effective for both simulation and parameter estimation. Our data also provide new insights into how the human body actually behaves. The methods described here can be used to create personalized models of an individual human or of a population. Consequently, our methods have many potential applications in computer animation, product design, e-commerce, and medicine.
In computer graphics the motion of the jaw is commonly modelled by up-down and left-right rotation around a fixed pivot plus a forward-backward translation, yielding a three dimensional rig that is highly suited for intuitive artistic control. The anatomical motion of the jaw is, however, much more complex since the joints that connect the jaw to the skull exhibit both rotational and translational components. In reality the jaw does not move in a three dimensional subspace but on a constrained manifold in six dimensions. We analyze this manifold in the context of computer animation and show how the manifold can be parameterized with three degrees of freedom, providing a novel jaw rig that preserves the intuitive control while providing more accurate jaw positioning. The chosen parameterization furthermore places anatomically correct limits on the motion, preventing the rig from entering physiologically infeasible poses. Our new jaw rig is empirically designed from accurate capture data, and we provide a simple method to retarget the rig to new characters, both human and fantasy.
We propose a novel tetrahedral meshing technique that is unconditionally robust, requires no user interaction, and can directly convert a triangle soup into an analysis-ready volumetric mesh. The approach is based on several core principles: (1) an initial mesh construction based on a fully robust, yet efficient, filtered exact computation; (2) explicit (automatic or user-defined) tolerancing of the mesh relative to the surface input; and (3) iterative mesh improvement with guarantees, at every step, of the output's validity. The quality of the resulting mesh is a direct function of the target mesh size and the allowed tolerance: increasing the allowed deviation from the initial mesh and decreasing the target edge length both lead to higher mesh quality.
Our approach enables "black-box" analysis, i.e., it makes it possible to automatically solve partial differential equations on geometric models available in the wild, offering robustness and reliability comparable to, e.g., image processing algorithms, and opening the door to automatic, large-scale processing of real-world geometric data.
Meshes with curvilinear elements hold the appealing promise of enhanced geometric flexibility and higher-order numerical accuracy compared to their commonly used straight-edge counterparts. However, the generation of curved meshes remains a computationally expensive endeavor with current meshing approaches: high-order parametric elements are notoriously difficult to conform to a given boundary geometry, and enforcing a smooth and non-degenerate Jacobian everywhere brings additional numerical difficulties to the meshing of complex domains. In this paper, we propose an extension of Optimal Delaunay Triangulations (ODT) to curved and graded isotropic meshes. By exploiting a continuum mechanics interpretation of ODT instead of the usual approximation-theoretical foundations, we formulate a very robust geometry and topology optimization of Bézier meshes based on a new, simple functional promoting isotropic and uniform Jacobians throughout the domain. We demonstrate that our resulting curved meshes can adapt to complex domains with high precision, even for a small number of elements, thanks to the added flexibility afforded by more control points and higher-order basis functions.
This article presents a new method to compute a self-intersection-free high-dimensional Euclidean embedding (SIFHDE2) for surfaces and volumes equipped with an arbitrary Riemannian metric. It is already known that, given a high-dimensional (high-d) embedding, one can easily compute an anisotropic Voronoi diagram by back-mapping it to 3D space. We show here how to solve the inverse problem, i.e., given an input metric, compute a smooth, intersection-free high-d embedding of the input such that the pullback metric of the embedding matches the input metric. Our numerical solution mechanism matches the deformation gradient of the 3D → higher-d mapping with the given Riemannian metric. We demonstrate the applicability of our method by using it to construct anisotropic Restricted Voronoi Diagrams (RVDs) and anisotropic meshes, which are otherwise extremely difficult to compute. In the SIFHDE2 space constructed by our algorithm, difficult 3D anisotropic computations are replaced with simple Euclidean computations, resulting in an isotropic RVD and its dual mesh on this high-d embedding. Results are compared with the state of the art in anisotropic surface and volume meshing using several examples and evaluation metrics.
We study the isometric immersion problem for orientable surface triangle meshes endowed with only a metric: given the combinatorics of the mesh together with edge lengths, approximate an isometric immersion into R3. To address this challenge we develop a discrete theory for surface immersions into R3. It precisely characterizes a discrete immersion, up to subdivision and small perturbations. In particular our discrete theory correctly represents the topology of the space of immersions, i.e., the regular homotopy classes which represent its connected components. Our approach relies on unit quaternions to represent triangle orientations and to encode, in their parallel transport, the topology of the immersion. In unison with this theory we develop a computational apparatus based on a variational principle. Minimizing a non-linear Dirichlet energy optimally finds extrinsic geometry for the given intrinsic geometry and ensures low metric approximation error.
We demonstrate our algorithm with a number of applications from mathematical visualization and art directed isometric shape deformation, which mimics the behavior of thin materials with high membrane stiffness.
Shallow depth-of-field is commonly used by photographers to isolate a subject from a distracting background. However, standard cell phone cameras cannot produce such images optically, as their short focal lengths and small apertures capture nearly all-in-focus images. We present a system to computationally synthesize shallow depth-of-field images with a single mobile camera and a single button press. If the image is of a person, we use a person segmentation network to separate the person and their accessories from the background. If available, we also use dense dual-pixel auto-focus hardware, effectively a 2-sample light field with an approximately 1 millimeter baseline, to compute a dense depth map. These two signals are combined and used to render a defocused image. Our system can process a 5.4 megapixel image in 4 seconds on a mobile phone, is fully automatic, and is robust enough to be used by non-experts. The modular nature of our system allows it to degrade naturally in the absence of a dual-pixel sensor or a human subject.
The view synthesis problem---generating novel views of a scene from known imagery---has garnered recent attention due in part to compelling applications in virtual and augmented reality. In this paper, we explore an intriguing scenario for view synthesis: extrapolating views from imagery captured by narrow-baseline stereo cameras, including VR cameras and now-widespread dual-lens camera phones. We call this problem stereo magnification, and propose a learning framework that leverages a new layered representation that we call multiplane images (MPIs). Our method also uses a massive new data source for learning view extrapolation: online videos on YouTube. Using data mined from such videos, we train a deep network that predicts an MPI from an input stereo image pair. This inferred MPI can then be used to synthesize a range of novel views of the scene, including views that extrapolate significantly beyond the input baseline. We show that our method compares favorably with several recent view synthesis methods, and demonstrate applications in magnifying narrow-baseline stereo images.
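A minimal sketch of how an inferred MPI can be turned into an image: each fronto-parallel plane carries RGB and alpha, and the planes are composited back to front with the standard over operator. (For a novel view, each plane would first be warped into the target camera; that reprojection step is omitted here.)

```python
import numpy as np

def composite_mpi(layers):
    """Back-to-front 'over' compositing of multiplane image layers.
    layers: list of (rgb, alpha) pairs ordered from farthest to nearest,
            rgb of shape (H, W, 3), alpha of shape (H, W, 1) in [0, 1]."""
    out = np.zeros_like(layers[0][0])
    for rgb, alpha in layers:                  # farthest plane first
        out = rgb * alpha + out * (1.0 - alpha)
    return out
```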
Immersive computer-generated environments (also known as virtual reality, VR) are limited by the physical space around the user; e.g., enabling natural walking in VR is only possible with perceptually-inspired locomotion techniques such as redirected walking (RDW). We introduce a completely new approach to imperceptible position and orientation redirection that takes advantage of the fact that even healthy humans are functionally blind for circa ten percent of the time under normal circumstances, due to motor processes preventing light from reaching the retina (such as eye blinks) or perceptual processes suppressing degraded visual information (such as blink-induced suppression). During such periods of missing visual input, change blindness occurs, which denotes the inability to perceive a visual change such as the motion of an object or the self-motion of the observer. We show that this phenomenon can be exploited in VR by synchronizing the computer graphics rendering system with the human visual processes for imperceptible camera movements, in particular to implement position and orientation redirection. We analyzed human sensitivity to such visual changes with detection thresholds, which revealed that commercial off-the-shelf eye trackers and head-mounted displays suffice to translate a user by circa 4--9 cm and rotate the user by circa 2--5 degrees in any direction, gains that can be accumulated each time the user blinks. Moreover, we show the potential for RDW, whose performance could be improved by approximately 50% when using our technique.
Redirected walking techniques can enhance the immersion and visual-vestibular comfort of virtual reality (VR) navigation, but are often limited by the size, shape, and content of the physical environments.
We propose a redirected walking technique that can apply to small physical environments with static or dynamic obstacles. Via a head- and eye-tracking VR headset, our method detects saccadic suppression and redirects the users during the resulting temporary blindness. Our dynamic path planning runs in real-time on a GPU, and thus can avoid static and dynamic obstacles, including walls, furniture, and other VR users sharing the same physical space. To further enhance saccadic redirection, we propose subtle gaze direction methods tailored for VR perception.
We demonstrate that saccades can significantly increase the rotation gains during redirection without introducing visual distortions or simulator sickness. This allows our method to apply to large open virtual spaces and small physical environments for room-scale VR. We evaluate our system via numerical simulations and real user studies.
We introduce a deep appearance model for rendering the human face. Inspired by Active Appearance Models, we develop a data-driven rendering pipeline that learns a joint representation of facial geometry and appearance from a multiview capture setup. Vertex positions and view-specific textures are modeled using a deep variational autoencoder that captures complex nonlinear effects while producing a smooth and compact latent representation. View-specific textures enable the modeling of view-dependent effects such as specularity. They can also correct for imperfect geometry stemming from biased or low-resolution estimates. This is a significant departure from the traditional graphics pipeline, which requires highly accurate geometry as well as all elements of the shading model to achieve realism through physically inspired light transport. Acquiring such a high level of accuracy is difficult in practice, especially for complex and intricate parts of the face, such as eyelashes and the oral cavity. These are handled naturally by our approach, which does not rely on precise estimates of geometry. Instead, the shading model accommodates deficiencies in geometry through the flexibility afforded by the neural network employed. At inference time, we condition the decoding network on the viewpoint of the camera in order to generate the appropriate texture for rendering. The resulting system can be implemented simply in existing rendering engines through dynamic textures with flat lighting. This representation, together with a novel unsupervised technique for mapping images to facial states, results in a system that is naturally suited to real-time interactive settings such as Virtual Reality (VR).
Correspondence between images is a fundamental problem in computer vision, with a variety of graphics applications. This paper presents a novel method for sparse cross-domain correspondence. Our method is designed for pairs of images where the main objects of interest may belong to different semantic categories and differ drastically in shape and appearance, yet still contain semantically related or geometrically similar parts. Our approach operates on hierarchies of deep features, extracted from the input images by a pre-trained CNN. Specifically, starting from the coarsest layer in both hierarchies, we search for Neural Best Buddies (NBB): pairs of neurons that are mutual nearest neighbors. The key idea is then to percolate NBBs through the hierarchy, while narrowing down the search regions at each level and retaining only NBBs with significant activations. Furthermore, in order to overcome differences in appearance, each pair of search regions is transformed into a common appearance.
We evaluate our method via a user study, in addition to comparisons with alternative correspondence approaches. The usefulness of our method is demonstrated using a variety of graphics applications, including cross-domain image alignment, creation of hybrid images, automatic image morphing, and more.
We present a convolutional neural network based approach for indoor scene synthesis. By representing 3D scenes with a semantically-enriched image-based representation based on orthographic top-down views, we learn convolutional object placement priors from the entire context of a room. Our approach iteratively generates rooms from scratch, given only the room architecture as input. Through a series of perceptual studies we compare the plausibility of scenes generated using our method against baselines for object selection and object arrangement, as well as scenes modeled by people. We find that our method generates scenes that are preferred over the baselines, and in some cases are equally preferred to human-created scenes.
This paper presents Point Convolutional Neural Networks (PCNN): a novel framework for applying convolutional neural networks to point clouds. The framework consists of two operators: extension and restriction, mapping point cloud functions to volumetric functions and vice versa. A point cloud convolution is defined by pulling back the Euclidean volumetric convolution via this extension-restriction mechanism.
The point cloud convolution is computationally efficient, invariant to the order of points in the point cloud, robust to different samplings and varying densities, and translation invariant; that is, the same convolution kernel is used at all points. PCNN generalizes image CNNs and allows their architectures to be readily adapted to the point cloud setting.
Evaluations of PCNN on three central point cloud learning benchmarks show that it convincingly outperforms competing point cloud learning methods, as well as the vast majority of methods that work with more informative shape representations such as surfaces and/or normals.
Accurate representation of soft transitions between image regions is essential for high-quality image editing and compositing. Current techniques for generating such representations depend heavily on interaction by a skilled visual artist, as creating such accurate object selections is a tedious task. In this work, we introduce semantic soft segments, a set of layers that correspond to semantically meaningful regions in an image, with accurate soft transitions between different objects. We approach this problem from a spectral segmentation angle and propose a graph structure that embeds texture and color features from the image as well as higher-level semantic information generated by a neural network. The soft segments are generated fully automatically via eigendecomposition of the carefully constructed Laplacian matrix. We demonstrate that otherwise complex image editing tasks can be done with little effort using semantic soft segments.
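A generic sketch of the spectral core of such an approach (not the paper's exact pipeline): given a sparse, symmetric pixel-affinity matrix `W` (assumed precomputed from color, texture, and semantic features), the smallest eigenvectors of its graph Laplacian provide soft, smoothly varying segment indicators.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def soft_indicators(W, n_segments=5):
    """W: sparse symmetric affinity matrix over pixels.
    Returns per-pixel soft indicator vectors from the low end of the Laplacian spectrum."""
    d = np.asarray(W.sum(axis=1)).ravel()
    L = sp.diags(d) - W                                   # combinatorial graph Laplacian
    L = L + 1e-8 * sp.identity(L.shape[0])                # regularize the singular Laplacian
    vals, vecs = spla.eigsh(L, k=n_segments, sigma=0, which='LM')  # smallest eigenpairs
    lo, hi = vecs.min(axis=0), vecs.max(axis=0)
    vecs = (vecs - lo) / (hi - lo + 1e-12)                # normalize each eigenvector to [0, 1]
    return vecs / np.maximum(vecs.sum(axis=1, keepdims=True), 1e-12)  # rows sum to 1
```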
We derive a novel framework for the efficient analysis and computation of light transport within layered materials. Our derivation consists of two steps. First, we decompose light transport into a set of atomic operators that act on its directional statistics. Specifically, our operators consist of reflection, refraction, scattering, and absorption, whose combinations are sufficient to describe the statistics of light scattering multiple times within layered structures. We show that the first three directional moments (energy, mean, and variance) already provide an accurate summary. Second, we extend the adding-doubling method to support arbitrary combinations of such operators efficiently. During shading, we map the directional moments to BSDF lobes. We validate that the resulting BSDF closely matches the ground truth in a lightweight and efficient form. Unlike previous methods, we support an arbitrary number of textured layers and demonstrate a practical and accurate rendering of layered materials with both offline and real-time implementations that are free from per-material precomputation.
We present a versatile computational framework for modeling the reflective and transmissive properties of arbitrarily layered anisotropic material structures. Given a set of input layers, our model synthesizes an effective BSDF of the entire structure, which accounts for all orders of internal scattering and is efficient to sample and evaluate in modern rendering systems.
Our technique builds on the insight that reflectance data is sparse when expanded into a suitable frequency-space representation, and that this property extends to the class of anisotropic materials. This sparsity enables an efficient matrix calculus that admits the entire space of BSDFs and considerably expands the scope of prior work on layered material modeling. We show how both measured data and the popular class of microfacet models can be expressed in our representation, and how the presence of anisotropy leads to a weak coupling between Fourier orders in frequency space.
In addition to additive composition, our model supports subtractive composition, a fascinating new operation that reconstructs the BSDF of a material that can only be observed indirectly through another layer with known reflectance properties. The operation produces a new BSDF of the desired layer as if measured in isolation. Subtractive composition can be interpreted as a type of deconvolution that removes both internal scattering and blurring due to transmission through the known layer.
We experimentally demonstrate the accuracy and scope of our model and validate both additive and subtractive composition using measurements of real-world layered materials. Both the implementation and the data will be released to ensure full reproducibility of all of our results.
Simulation of light reflection from specular surfaces is a core problem of computer graphics. Existing solutions either make the approximation of providing only a large-area average solution in terms of a fixed BRDF (ignoring spatial detail), are specialized for specific microgeometry (e.g., 1D scratches), or are based only on geometric optics (which is an approximation to more accurate wave optics). We design the first rendering algorithm based on a wave optics model that is also able to compute spatially varying specular highlights with high-resolution detail on general surface microgeometry. We compute a wave optics reflection integral over the coherence area; our solution is based on approximating the phase-delay grating representation of a micron-resolution surface heightfield using Gabor kernels. We find that the appearance difference between the geometric and wave solutions is more dramatic when spatial detail is taken into account: the visualizations of the corresponding BRDF lobes differ significantly. Moreover, the wave optics solution varies as a function of wavelength, predicting noticeable color effects in the highlights. Our results show both single-wavelength and spectral solutions to reflection from common everyday objects, such as brushed, scratched, and bumpy metals.
We present a learning-based system for rapid mass-scale material synthesis that is useful for novice and expert users alike. The user preferences are learned via Gaussian Process Regression and can be easily sampled for new recommendations. Typically, each recommendation takes 40-60 seconds to render with global illumination, which makes this process impracticable for real-world workflows. Our neural network eliminates this bottleneck by providing high-quality image predictions in real time, after which it is possible to pick the desired materials from a gallery and assign them to a scene in an intuitive manner. Workflow timings against Disney's "principled" shader reveal that our system scales well with the number of sought materials, thus empowering even novice users to generate hundreds of high-quality material models without any expertise in material modeling. Similarly, expert users experience a significant decrease in the total modeling time when populating a scene with materials. Furthermore, our proposed solution also offers controllable recommendations and a novel latent space variant generation step to enable the real-time fine-tuning of materials without requiring any domain expertise.
Developable surfaces are those that can be made by smoothly bending flat pieces without stretching or shearing. We introduce a definition of developability for triangle meshes which exactly captures two key properties of smooth developable surfaces, namely flattenability and presence of straight ruling lines. This definition provides a starting point for algorithms in developable surface modeling---we consider a variational approach that drives a given mesh toward developable pieces separated by regular seam curves. Computation amounts to gradient descent on an energy with support in the vertex star, without the need to explicitly cluster patches or identify seams. We briefly explore applications to developable design and manufacturing.
Fabrication from developable parts is the basis for arts such as papercraft and needlework, as well as modern architecture and CAD in general, and it has inspired much research. We observe that the assembly of complex 3D shapes created by existing methods often requires first fabricating many small parts and then carefully following instructions to assemble them together. Despite its significance, this error-prone and tedious process is generally neglected in the discussion. We present the concept of zippables: single, two-dimensional, branching, ribbon-like pieces of fabric that can be quickly zipped up, without any instructions, to form 3D objects. Our inspiration comes from the so-called zipit bags [zipit 2017], which are made of a single, long ribbon with a zipper around its boundary. In order to "assemble" the bag, one simply needs to zip up the ribbon. Our method operates in the same fashion, but it can be used to approximate a wide variety of shapes. Given a 3D model, our algorithm produces plans for a single 2D shape that can be laser cut in a few parts from fabric or paper. A zipper can then be attached along the boundary by sewing, or by gluing using a custom-built fastening rig. We show physical and virtual results that demonstrate the capabilities of our method and the ease with which shapes can be assembled.
We propose a novel projection scheme that corrects energy fluctuations in simulations of deformable objects, thereby removing unwanted numerical dissipation and numerical "explosions". The key idea of our method is to first take a step using a conventional integrator, then project the result back to the constant energy-momentum manifold. We implement this strategy using fast projection, which only adds a small amount of overhead to existing physics-based solvers. We test our method with several implicit integration rules and demonstrate its benefits when used in conjunction with Position Based Dynamics and Projective Dynamics. When added to a dissipative integrator such as backward Euler, our method corrects the artificial damping and thus produces more vivid motion. Our projection scheme also effectively prevents instabilities that can arise due to approximate solves or large time steps. Our method is fast, stable, and easy to implement---traits that make it well-suited for real-time physics applications such as games or training simulators.
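A minimal sketch of the overall structure: the paper projects onto the constant energy-momentum manifold; for brevity, this example enforces only a single scalar energy constraint with a fast-projection-style, mass-weighted correction, and all interfaces (`step`, `total_energy`, `energy_grad`) are assumed placeholders.

```python
import numpy as np

def energy_projected_step(q, v, step, total_energy, energy_grad, inv_mass, H0, iters=3):
    """Take a conventional implicit step, then pull (q, v) back toward the
    manifold where total_energy(q, v) == H0 using a few constraint-projection iterations."""
    q, v = step(q, v)                                  # e.g., backward Euler step
    for _ in range(iters):
        C = total_energy(q, v) - H0                    # constraint violation
        if abs(C) < 1e-10:
            break
        dHdq, dHdv = energy_grad(q, v)                 # gradient of the total energy
        g = np.concatenate([dHdq, dHdv])
        w = np.concatenate([inv_mass, inv_mass])       # inverse-mass metric
        lam = C / (g @ (w * g) + 1e-20)                # single-constraint fast projection
        q -= lam * inv_mass * dHdq                     # mass-weighted correction
        v -= lam * inv_mass * dHdv
    return q, v
```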
We present a method for the real-time simulation of deformable objects that combines the robustness, generality, and high performance of Projective Dynamics with the efficiency and scalability offered by model reduction techniques. The method decouples the cost of time integration from the mesh resolution and can simulate large meshes in real time. The proposed hyper-reduction of Projective Dynamics combines a novel fast approximation method for constraint projections with a scalable construction of sparse subspace bases. The resulting system achieves real-time rates for large subspaces, enabling rich dynamics, and can resolve general user interactions, collision constraints, external forces, and changes to the materials. The construction of the hyper-reduced system does not require user interaction and refrains from using training data or modal analysis, which results in a fast preprocessing stage.
We introduce Dynamic Kelvinlets, a new analytical technique for real-time physically based animation of virtual elastic materials. Our formulation is based on the dynamic response to time-varying force distributions applied to an infinite elastic medium. The resulting displacements provide the plausibility of volumetric elasticity, the dynamics of compressive and shear waves, and the interactivity of closed-form expressions. Our approach builds upon the work of de Goes and James [2017] by presenting an extension of the regularized Kelvinlet solutions from elastostatics to the elastodynamic regime. To finely control our elastic deformations, we also describe the construction of compound solutions that resolve pointwise and keyframe constraints. We demonstrate the versatility and efficiency of our method with a series of examples in a production grade implementation.
Gradient-domain rendering can improve the convergence of surface-based light transport by exploiting smoothness in image space. Scenes with participating media exhibit similar smoothness and could potentially benefit from gradient-domain techniques. We introduce the first gradient-domain formulation of image synthesis with homogeneous participating media, including four novel and efficient gradient-domain volumetric density estimation algorithms. We show that naïve extensions of gradient-domain path-space and density estimation methods to volumetric media, while functional, can result in inefficient estimators. Focusing on point-, beam-, and plane-based gradient-domain estimators, we introduce a novel shift mapping that eliminates redundancies in the naïve formulations using spatial relaxation within the volume. We show that gradient-domain volumetric rendering improves convergence compared to primal-domain state-of-the-art methods across a suite of scenes. Our formulation and algorithms support progressive estimation and are easy to incorporate atop existing renderers.
We introduce a non-exponential radiative framework that takes into account the local spatial correlation of scattering particles in a medium. Most previous works in graphics have ignored this, assuming uncorrelated media with a uniform, random local distribution of particles. However, positive and negative correlation lead to slower- and faster-than-exponential attenuation respectively, which cannot be predicted by the Beer-Lambert law. As our results show, this has a major effect on extinction, and thus appearance. From recent advances in neutron transport, we first introduce our Extended Generalized Boltzmann Equation, and develop a general framework for light transport in correlated media. We lift the limitations of the original formulation, including an analysis of the boundary conditions, and present a model suitable for computer graphics, based on optical properties of the media and statistical distributions of scatterers. In addition, we present an analytic expression for transmittance in the case of positive correlation, and show how to incorporate it efficiently into a Monte Carlo renderer. We show results with a wide range of both positive and negative correlation, and demonstrate the differences compared to classic light transport.
Generating realistic fluid simulations remains computationally expensive, and animators can expend enormous effort trying to achieve a desired motion. To reduce such costs, several methods have been developed in which high-resolution turbulence is synthesized as a post-process. Since global motion can then be obtained using a fast, low-resolution simulation, less effort is needed to create a realistic animation with the desired behavior. While much research has focused on accelerating the low-resolution simulation, the problem of controlling the behavior of the turbulent, high-resolution motion has received little attention. In this paper, we show that style transfer methods from image editing can be adapted to transfer the turbulent style of an existing fluid simulation onto a new one. We do this by extending example-based image synthesis methods to handle velocity fields, using a combination of patch-based and optimization-based texture synthesis. This approach allows us to take into account the incompressibility condition, which we have found to be an important factor during synthesis. Using our method, a user can easily and intuitively create high-resolution fluid animations that have a desired turbulent motion.
Advection-projection methods for fluid animation are widely appreciated for their stability and efficiency. However, the projection step dissipates energy from the system, leading to artificial viscosity and suppression of small-scale details. We propose an alternative approach for detail-preserving fluid animation that is surprisingly simple and effective. We replace the energy-dissipating projection operator applied at the end of a simulation step by an energy-preserving reflection operator applied at mid-step. We show that doing so leads to two orders of magnitude reduction in energy loss, which in turn yields vastly improved detail-preservation. We evaluate our reflection solver on a set of 2D and 3D numerical experiments and show that it compares favorably to state-of-the-art methods. Finally, our method integrates seamlessly with existing projection-advection solvers and requires very little additional implementation.
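The structure of one such step can be sketched as follows, with `advect(u, w, dt)` and `project(u)` standing in for an existing semi-Lagrangian advection routine and pressure projection (both assumed to be provided by the host solver).

```python
def advection_reflection_step(u, dt, advect, project):
    """One reflection-solver step: advect to the half step, reflect the velocity
    about its divergence-free part, then advect the rest of the way and project."""
    u_half = advect(u, u, 0.5 * dt)                    # advect velocity by itself to t + dt/2
    u_half_df = project(u_half)                        # divergence-free part at mid-step
    u_reflected = 2.0 * u_half_df - u_half             # energy-preserving reflection
    u_end = advect(u_reflected, u_half_df, 0.5 * dt)   # advect the reflected field to t + dt
    return project(u_end)                              # final pressure projection
```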
We present a novel extended partitioned method for two-way solid-fluid coupling, where the fluid and solid solvers are treated as black boxes with limited exposed interfaces, facilitating modularity and code reusability. Our method achieves improved stability and extended range of applicability over standard partitioned approaches through three techniques. First, we couple the black-box solvers through a small, reduced-order monolithic system, which is constructed on the fly from input/output pairs generated by the solid and fluid solvers. Second, we use a conservative, impulse-based interaction term to couple the solid and fluid rather than typical pressure-based forces. We show that both of these techniques significantly improve stability and reduce the number of iterations needed for convergence. Finally, we propose a novel boundary pressure projection method that allows for the partitioned simulation of a fully enclosed fluid coupled to a dynamic solid, a scenario that has been problematic for partitioned methods. We demonstrate the benefits of our extended partitioned method by coupling Eulerian fluid solvers for smoke and water to Lagrangian solid solvers for volumetric and thin deformable and rigid objects in a variety of challenging scenarios. We further demonstrate our method by coupling a Lagrangian SPH fluid solver to a rigid body solver.
The Laplacian Eigenfunction method for fluid simulation, which we refer to as Eigenfluids, introduced an elegant new way to capture intricate fluid flows with near-zero viscosity. However, the approach does not scale well, as the memory cost grows prohibitively with the number of eigenfunctions. The method also lacks generality, because the dynamics are constrained to a closed box with Dirichlet boundaries, while open, Neumann boundaries are also needed in most practical scenarios. To address these limitations, we present a set of analytic eigenfunctions that supports uniform Neumann and Dirichlet conditions along each domain boundary, and show that by carefully applying the discrete sine and cosine transforms, the storage costs of the eigenfunctions can be made completely negligible. The resulting algorithm is both faster and more memory-efficient than previous approaches, and able to achieve lower viscosities than similar pseudo-spectral methods. We are able to surpass the scalability of the original Laplacian Eigenfunction approach by over two orders of magnitude when simulating rectangular domains. Finally, we show that the formulation allows forward scattering to be directed in a way that is not possible with any other method.
Capturing aerial videos with a quadrotor-mounted camera is a challenging creative task, as it requires the simultaneous control of the quadrotor's motion and the mounted camera's orientation. Letting the drone follow a pre-planned trajectory is a much more appealing option, and recent research has proposed a number of tools designed to automate the generation of feasible camera motion plans; however, these tools typically require the user to specify and edit the camera path, for example by providing a complete and ordered sequence of key viewpoints.
In this paper, we propose a higher level tool designed to enable even novice users to easily capture compelling aerial videos of large-scale outdoor scenes. Using a coarse 2.5D model of a scene, the user is only expected to specify starting and ending viewpoints and designate a set of landmarks, with or without a particular order. Our system automatically generates a diverse set of candidate local camera moves for observing each landmark, which are collision-free, smooth, and adapted to the shape of the landmark. These moves are guided by a landmark-centric view quality field, which combines visual interest and frame composition. An optimal global camera trajectory is then constructed that chains together a sequence of local camera moves, by choosing one move for each landmark and connecting them with suitable transition trajectories. This task is formulated and solved as an instance of the Set Traveling Salesman Problem.
We present a data-driven technique to instantly predict how fluid flows around various three-dimensional objects. Such simulation is useful for computational fabrication and engineering, but is usually computationally expensive since it requires solving the Navier-Stokes equation for many time steps. To accelerate the process, we propose a machine learning framework which predicts aerodynamic forces and velocity and pressure fields given a three-dimensional shape input. Handling detailed free-form three-dimensional shapes in a data-driven framework is challenging because machine learning approaches usually require a consistent parametrization of input and output. We present a novel PolyCube maps-based parametrization that can be computed for three-dimensional shapes at interactive rates. This allows us to efficiently learn the nonlinear response of the flow using a Gaussian process regression. We demonstrate the effectiveness of our approach for the interactive design and optimization of a car body.
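As an illustration of the regression step only (the PolyCube-map feature extraction is not shown), a hypothetical scikit-learn Gaussian process fit from fixed-length shape descriptors to a scalar aerodynamic quantity might look as follows; the descriptors, kernel, and target values are placeholders.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X_train = np.random.rand(200, 32)          # hypothetical fixed-length shape descriptors
y_train = np.random.rand(200)              # corresponding simulated drag values

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(1e-3),
                              normalize_y=True)
gp.fit(X_train, y_train)

x_query = np.random.rand(1, 32)            # descriptor of a new car body shape
drag_mean, drag_std = gp.predict(x_query, return_std=True)  # instant prediction + uncertainty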
In this paper we first contribute a large-scale online study (N ≈ 400) to better understand aesthetic perception of aerial video. The results indicate that it is paramount to optimize the smoothness of trajectories across all keyframes. However, for experts, timing control remains an essential tool. Satisfying this dual goal is technically challenging because it requires giving up desirable properties in the optimization formulation. Second, informed by this study, we propose a method that jointly optimizes positional and temporal reference fit. This allows us to generate globally smooth trajectories while retaining user control over reference timings. The formulation is posed as a variable, infinite-horizon, contour-following algorithm. Finally, a comparative lab study indicates that our optimization scheme outperforms the state of the art in terms of perceived usability and preference for the resulting videos. For novices, our method produces smoother and better-looking results, and experts also benefit from the generated timings.
We propose a new iterative algorithm for computing smooth cross fields on triangle meshes that is simple, easily parallelizable on the GPU, and finds solutions with lower energy and fewer cone singularities than state-of-the-art methods. Our approach is based on a formal equivalence, which we prove, between two formulations of the optimization problem. This equivalence allows us to eliminate the real variables and design an efficient grid search algorithm for the cone singularities. We leverage a recent graph-theoretical approximation of the resistance distance matrix of the triangle mesh to speed up the computation and enable a trade-off between the computation time and the smoothness of the output.
We introduce an approach to quadrilateral meshing of arbitrary triangulated surfaces that combines the theoretical guarantees of Morse-based approaches with the practical advantages of parameterization methods. We first construct, through an eigensolver followed by a few Gauss-Newton iterations, a periodic four-dimensional vector field that aligns with a user-provided frame field and/or a set of features over the input mesh. A field-aligned parameterization is then greedily computed along a spanning tree based on the Dirichlet energy of the optimal periodic vector field, from which quad elements are efficiently extracted over most of the surface. The few regions not yet covered by elements are then upsampled and the first component of the periodic vector field is used as a Morse function to extract the remaining quadrangles. This hybrid parameterization- and Morse-based quad meshing method is not only fast (the parameterization is greedily constructed, and the Morse function only needs to be upsampled in the few uncovered patches), but is guaranteed to provide a feature-aligned quad mesh with non-degenerate cells that closely matches the input frame field over an arbitrary surface. We show that our approach is much faster than Morse-based techniques since it does not require a densely tessellated input mesh, and is significantly more robust than parameterization-based techniques on models with complex features.
Despite high practical demand, algorithmic hexahedral meshing with guarantees on robustness and quality remains unsolved. A promising direction follows the idea of integer-grid maps, which pull back the Cartesian hexahedral grid formed by integer isoplanes from a parametric domain to a surface-conforming hexahedral mesh of the input object. Since directly optimizing for a high-quality integer-grid map is mathematically challenging, the construction is usually split into two steps: (1) generation of a surface-aligned octahedral field and (2) generation of an integer-grid map that best aligns to the octahedral field. The main robustness issue stems from the fact that smooth octahedral fields frequently exhibit singularity graphs that are not appropriate for hexahedral meshing and induce heavily degenerate integer-grid maps. The first contribution of this work is an enumeration of all local configurations that exist in hex meshes with bounded edge valence, and a generalization of the Hopf-Poincaré formula to octahedral fields, leading to necessary local and global conditions for the hex-meshability of an octahedral field in terms of its singularity graph. The second contribution is a novel algorithm to generate octahedral fields with prescribed hex-meshable singularity graphs, which requires the solution of a large nonlinear mixed-integer algebraic system. This algorithm is an important step toward robust automatic hexahedral meshing since it enables the generation of a hex-meshable octahedral field.
The current state of the art in real-time two-dimensional water wave simulation requires developers to choose between efficient Fourier-based methods, which lack interactions with moving obstacles, and finite-difference or finite element methods, which handle environmental interactions but are significantly more expensive. This paper attempts to bridge this long-standing gap between complexity and performance, by proposing a new wave simulation method that can faithfully simulate wave interactions with moving obstacles in real time while simultaneously preserving minute details and accommodating very large simulation domains.
Previous methods for simulating 2D water waves directly compute the change in height of the water surface, a strategy which imposes limitations based on the CFL condition (fast moving waves require small time steps) and Nyquist's limit (small wave details require closely-spaced simulation variables). This paper proposes a novel wavelet transformation that discretizes the liquid motion in terms of amplitude-like functions that vary over space, frequency, and direction, effectively generalizing Fourier-based methods to handle local interactions. Because these new variables change much more slowly over space than the original water height function, our change of variables drastically reduces the limitations of the CFL condition and Nyquist limit, allowing us to simulate highly detailed water waves at very large visual resolutions. Our discretization is amenable to fast summation and easy to parallelize. We also present basic extensions like pre-computed wave paths and two-way solid fluid coupling. Finally, we argue that our discretization provides a convenient set of variables for artistic manipulation, which we illustrate with a novel wave-painting interface.
We propose a temporally coherent generative model addressing the super-resolution problem for fluid flows. Our work represents a first approach to synthesize four-dimensional physics fields with neural networks. Based on a conditional generative adversarial network that is designed for the inference of three-dimensional volumetric data, our model generates consistent and detailed results by using a novel temporal discriminator, in addition to the commonly used spatial one. Our experiments show that the generator is able to infer more realistic high-resolution details by using additional physical quantities, such as low-resolution velocities or vorticities. Besides improvements in the training process and in the generated outputs, these inputs offer means for artistic control as well. We additionally employ a physics-aware data augmentation step, which is crucial to avoid overfitting and to reduce memory requirements. In this way, our network learns to generate advected quantities with highly detailed, realistic, and temporally coherent features. Our method works instantaneously, using only a single time-step of low-resolution fluid data. We demonstrate the abilities of our method using a variety of complex inputs and applications in two and three dimensions.
We present a learning-based method to control a coupled 2D system involving both fluid and rigid bodies. Our approach is used to modify the fluid/rigid simulator's behavior by applying control forces only at the simulation domain boundaries. The rest of the domain, corresponding to the interior, is governed by the Navier-Stokes equation for fluids and Newton-Euler's equation for the rigid bodies. We represent our controller using a general neural network, which is trained using deep reinforcement learning. Our formulation decomposes a control task into two stages: a precomputation training stage and an online generation stage. We utilize various fluid properties, e.g., the liquid's velocity field or the smoke's density field, to enhance the controller's performance. We set up our evaluation benchmark by letting the controller drive fluid jets that move along the domain boundary and shoot fluid toward a rigid body, accomplishing a set of challenging 2D tasks such as keeping a rigid body balanced, playing a two-player ping-pong game, and driving a rigid body to sequentially hit specified points on the wall. In practice, our approach can generate physically plausible animations.
When creating line drawings, artists frequently depict intended curves using multiple, tightly clustered, or overdrawn, strokes. Given such sketches, human observers can readily envision these intended, aggregate, curves, and mentally assemble the artist's envisioned 2D imagery. Algorithmic stroke consolidation---replacement of overdrawn stroke clusters by corresponding aggregate curves---can benefit a range of sketch processing and sketch-based modeling applications which are designed to operate on consolidated, intended curves. We propose StrokeAggregator, a novel stroke consolidation method that significantly improves on the state of the art, and produces aggregate curve drawings validated to be consistent with viewer expectations. Our framework clusters strokes into groups that jointly define intended aggregate curves by leveraging principles derived from human perception research and observation of artistic practices. We employ these principles within a coarse-to-fine clustering method that starts with an initial clustering based on pairwise stroke compatibility analysis, and then refines it by analyzing interactions both within and in-between clusters of strokes. We facilitate this analysis by computing a common 1D parameterization for groups of strokes via common aggregate curve fitting. We demonstrate our method on a large range of line drawings, and validate its ability to generate consolidated drawings that are consistent with viewer perception via qualitative user evaluation, and comparisons to manually consolidated drawings and algorithmic alternatives.
We present an interactive approach for inking, which is the process of turning a pencil rough sketch into a clean line drawing. The approach, which we call the Smart Inker, consists of several "smart" tools that intuitively react to user input, while guided by the input rough sketch, to efficiently and naturally connect lines, erase shading, and fine-tune the line drawing output. Our approach is data-driven: the tools are based on fully convolutional networks, which we train to exploit both the user edits and inaccurate rough sketch to produce accurate line drawings, allowing high-performance interactive editing in real-time on a variety of challenging rough sketch images. For the training of the tools, we developed two key techniques: one is the creation of training data by simulation of vague and quick user edits; the other is a line normalization based on learning from vector data. These techniques, in combination with our sketch-specific data augmentation, allow us to train the tools on heterogeneous data without actual user interaction. We validate our approach with an in-depth user study, comparing it with professional illustration software, and show that our approach is able to reduce inking time by a factor of 1.8X, while improving the results of amateur users.
We present a novel system for sketch-based face image editing, enabling users to edit images intuitively by sketching a few strokes on a region of interest. Our interface features tools to express a desired image manipulation by providing both geometry and color constraints as user-drawn strokes. As an alternative to the direct user input, our proposed system naturally supports a copy-paste mode, which allows users to edit a given image region by using parts of another exemplar image without the need of hand-drawn sketching at all. The proposed interface runs in real-time and facilitates an interactive and iterative workflow to quickly express the intended edits. Our system is based on a novel sketch domain and a convolutional neural network trained end-to-end to automatically learn to render image regions corresponding to the input strokes. To achieve high quality and semantically consistent results we train our neural network on two simultaneous tasks, namely image completion and image translation. To the best of our knowledge, we are the first to combine these two tasks in a unified framework for interactive image editing. Our results show that the proposed sketch domain, network architecture, and training procedure generalize well to real user input and enable high quality synthesis results without additional post-processing.
X-ray computed tomography (CT) is a valuable tool for analyzing objects with interesting internal structure or complex geometries that are not accessible with optical means. Unfortunately, tomographic reconstruction of complex shapes requires a multitude (often hundreds or thousands) of projections from different viewpoints. Such a large number of projections can only be acquired in a time-sequential fashion. This significantly limits the ability to use x-ray tomography for either objects that undergo uncontrolled shape change at the time scale of a scan, or else for analyzing dynamic phenomena, where the motion itself is under investigation.
In this work, we present a non-parametric space-time tomographic method for tackling such dynamic settings. Through a combination of a new CT image acquisition strategy, a space-time tomographic image formation model, and an alternating, multi-scale solver, we achieve a general approach that can be used to analyze a wide range of dynamic phenomena. We demonstrate our method with extensive experiments on both real and simulated data.
We present an algorithm for constructing 3D panoramas from a sequence of aligned color-and-depth image pairs. Such sequences can be conveniently captured using dual lens cell phone cameras that reconstruct depth maps from synchronized stereo image capture. Due to the small baseline and resulting triangulation error the depth maps are considerably degraded and contain low-frequency error, which prevents alignment using simple global transformations. We propose a novel optimization that jointly estimates the camera poses as well as spatially-varying adjustment maps that are applied to deform the depth maps and bring them into good alignment. When fusing the aligned images into a seamless mosaic we utilize a carefully designed data term and the high quality of our depth alignment to achieve two orders of magnitude speedup w.r.t. previous solutions that rely on discrete optimization by removing the need for label smoothness optimization. Our algorithm processes about one input image per second, resulting in an end-to-end runtime of about one minute for mid-sized panoramas. The final 3D panoramas are highly detailed and can be viewed with binocular and head motion parallax in VR.
Planar reflective surfaces such as glass and mirrors are notoriously hard to reconstruct for most current 3D scanning techniques. When treated naïvely, they introduce duplicate scene structures, effectively destroying the reconstruction altogether. Our key insight is that an easy to identify structure attached to the scanner---in our case an AprilTag---can yield reliable information about the existence and the geometry of glass and mirror surfaces in a scene. We introduce a fully automatic pipeline that allows us to reconstruct the geometry and extent of planar glass and mirror surfaces while being able to distinguish between the two. Furthermore, our system can automatically segment observations of multiple reflective surfaces in a scene based on their estimated planes and locations. In the proposed setup, minimal additional hardware is needed to create high-quality results. We demonstrate this using reconstructions of several scenes with a variety of real mirrors and glass.
Numerous techniques have been proposed for reconstructing 3D models for opaque objects in past decades. However, none of them can be directly applied to transparent objects. This paper presents a fully automatic approach for reconstructing complete 3D shapes of transparent objects. Through positioning an object on a turntable, its silhouettes and light refraction paths under different viewing directions are captured. Then, starting from an initial rough model generated from space carving, our algorithm progressively optimizes the model under three constraints: surface and refraction normal consistency, surface projection and silhouette consistency, and surface smoothness. Experimental results on both synthetic and real objects demonstrate that our method can successfully recover the complex shapes of transparent objects and faithfully reproduce their light refraction properties.
To carry out autonomous 3D scanning and online reconstruction of unknown indoor scenes, one has to find a balance between global exploration of the entire scene and local scanning of the objects within it. In this work, we propose a novel approach, which provides object-aware guidance for autoscanning, for exploring, reconstructing, and understanding an unknown scene within one navigation pass. Our approach interleaves between object analysis to identify the next best object (NBO) for global exploration, and object-aware information gain analysis to plan the next best view (NBV) for local scanning. First, an objectness-based segmentation method is introduced to extract semantic objects from the current scene surface via a multi-class graph cuts minimization. Then, an object of interest (OOI) is identified as the NBO which the robot aims to visit and scan. The robot then conducts fine scanning on the OOI with views determined by the NBV strategy. When the OOI is recognized as a full object, it can be replaced by its most similar 3D model in a shape database. The algorithm iterates until all of the objects are recognized and reconstructed in the scene. Various experiments and comparisons have shown the feasibility of our proposed approach.
Angle-preserving or conformal surface parameterization has proven to be a powerful tool across applications ranging from geometry processing, to digital manufacturing, to machine learning, yet conformal maps can still suffer from severe area distortion. Cone singularities provide a way to mitigate this distortion, but finding the best configuration of cones is notoriously difficult. This paper develops a strategy that is globally optimal in the sense that it minimizes total area distortion among all possible cone configurations (number, placement, and size) that have no more than a fixed total cone angle. A key insight is that, for the purpose of optimization, one should not work directly with curvature measures (which naturally represent cone configurations), but can instead apply Fenchel-Rockafellar duality to obtain a formulation involving only ordinary functions. The result is a convex optimization problem, which can be solved via a sequence of sparse linear systems easily built from the usual cotangent Laplacian. The method supports user-defined notions of importance, constraints on cone angles (e.g., positive, or within a given range), and sophisticated boundary conditions (e.g., convex, or polygonal). We compare our approach to previous techniques on a variety of challenging models, often achieving dramatically lower distortion, and demonstrating that global optimality leads to extreme robustness in the presence of noise or poor discretization.
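For readers unfamiliar with the "usual cotangent Laplacian" referred to above, a standard sparse assembly is sketched below; this is only the generic building block of the linear systems, not the paper's cone-finding algorithm itself.

# Standard cotangent Laplacian assembly for a triangle mesh.
# V: (n, 3) array of vertex positions, F: (m, 3) array of triangle vertex indices.
import numpy as np
import scipy.sparse as sp

def cotangent_laplacian(V, F):
    n = V.shape[0]
    I, J, W = [], [], []
    for tri in F:
        for a in range(3):
            i, j, k = tri[a], tri[(a + 1) % 3], tri[(a + 2) % 3]
            u, v = V[i] - V[k], V[j] - V[k]              # edges meeting at the apex k
            cot = np.dot(u, v) / np.linalg.norm(np.cross(u, v))
            I += [i, j]; J += [j, i]; W += [0.5 * cot] * 2
    Wmat = sp.coo_matrix((W, (I, J)), shape=(n, n)).tocsr()
    return sp.diags(np.asarray(Wmat.sum(axis=1)).ravel()) - Wmat   # L = D - W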
Deployable structures are physical mechanisms that can easily transition between two or more geometric configurations; such structures enable industrial, scientific, and consumer applications at a wide variety of scales. This paper develops novel deployable structures that can approximate a large class of doubly-curved surfaces and are easily actuated from a flat initial state via inflation or gravitational loading. The structures are based on two-dimensional rigid mechanical linkages that implicitly encode the curvature of the target shape via a user-programmable pattern that permits locally isotropic scaling under load. We explicitly characterize the shapes that can be realized by such structures---in particular, we show that they can approximate target surfaces of positive mean curvature and bounded scale distortion relative to a given reference domain. Based on this observation, we develop efficient computational design algorithms for approximating a given input geometry. The resulting designs can be rapidly manufactured via digital fabrication technologies such as laser cutting, CNC milling, or 3D printing. We validate our approach through a series of physical prototypes and present several application case studies, ranging from surgical implants to large-scale deployable architecture.
We present a framework to generate mesh patterns that consist of a hybrid of both triangles and quads. Given a 3D surface, the generated patterns fit the surface boundaries and curvatures. Such regular and near-regular triangle-quad hybrid meshes provide two key advantages: first, novel-looking polygonal patterns achieved by mixing different arrangements of triangles and quads; second, a finer discretization of angle deficits than triangles or quads alone can provide. Users have control over the generated patterns at both global and local levels. We demonstrate applications of our approach in architectural geometry and pattern design on surfaces.
Convincing audio for games and virtual reality requires modeling directional propagation effects. The initial sound's arrival direction is particularly salient and derives from multiply-diffracted paths in complex scenes. When source and listener straddle occluders, the initial sound and multiply-scattered reverberation stream through gaps and portals, helping the listener navigate. Geometry near the source and/or listener reveals its presence through anisotropic reflections. We propose the first precomputed wave technique to capture such directional effects in general scenes comprising millions of polygons. These effects are formally represented with the 9D directional response function of 3D source and listener location, time, and direction at the listener, making memory use the major concern. We propose a novel parametric encoder that compresses this function within a budget of ~100MB for large scenes, while capturing many salient acoustic effects indoors and outdoors. The encoder is complemented with a lightweight signal processing algorithm whose filtering cost is largely insensitive to the number of sound sources, resulting in an immediately practical system.
We explore an integrated approach to sound generation that supports a wide variety of physics-based simulation models and computer-animated phenomena. Targeting high-quality offline sound synthesis, we seek to resolve animation-driven sound radiation with near-field scattering and diffraction effects. The core of our approach is a sharp-interface finite-difference time-domain (FDTD) wavesolver, with a series of supporting algorithms to handle rapidly deforming and vibrating embedded interfaces arising in physics-based animation sound. Once the solver rasterizes these interfaces, it must evaluate acceleration boundary conditions (BCs) that involve model- and phenomena-specific computations. We introduce acoustic shaders as a mechanism to abstract away these complexities, and describe a variety of implementations for computer animation: near-rigid objects with ringing and acceleration noise, deformable (finite element) models such as thin shells, bubble-based water, and virtual characters. Since time-domain wave synthesis is expensive, we only simulate pressure waves in a small region about each sound source, then estimate a far-field pressure signal. To further improve scalability beyond multi-threading, we propose a fully time-parallel sound synthesis method that is demonstrated on commodity cloud computing resources. In addition to presenting results for multiple animation phenomena (water, rigid, shells, kinematic deformers, etc.) we also propose 3D automatic dialogue replacement (3DADR) for virtual characters so that pre-recorded dialogue can include character movement, and near-field shadowing and scattering sound effects.
Thin shells --- solids that are thin in one dimension compared to the other two --- often emit rich nonlinear sounds when struck. Strong excitations can even cause chaotic thin-shell vibrations, producing sounds whose energy spectrum diffuses from low to high frequencies over time --- a phenomenon known as wave turbulence. It is all these nonlinearities that grant shells such as cymbals and gongs their characteristic "glinting" sound. Yet, simulation models that efficiently capture these sound effects remain elusive.
We propose a physically based, multi-scale reduced simulation method to synthesize nonlinear thin-shell sounds. We first split nonlinear vibrations into two scales, with a small low-frequency part simulated in a fully nonlinear way, and a high-frequency part containing many more modes approximated through time-varying linearization. This allows us to capture interesting nonlinearities in the shells' deformation, tens of times faster than previous approaches. Furthermore, we propose a method that enriches simulated sounds with wave turbulent sound details through a phenomenological diffusion model in the frequency domain, and thereby sidestep the expensive simulation of chaotic high-frequency dynamics. We show several examples of our simulations, illustrating the efficiency and realism of our model.
Although 360° cameras ease the capture of panoramic footage, it remains challenging to add realistic 360° audio that blends into the captured scene and is synchronized with the camera motion. We present a method for adding scene-aware spatial audio to 360° videos in typical indoor scenes, using only a conventional mono-channel microphone and a speaker. We observe that the late reverberation of a room's impulse response is usually diffuse spatially and directionally. Exploiting this fact, we propose a method that synthesizes the directional impulse response between any source and listening locations by combining a synthesized early reverberation part and a measured late reverberation tail. The early reverberation is simulated using a geometric acoustic simulation and then enhanced using a frequency modulation method to capture room resonances. The late reverberation is extracted from a recorded impulse response, with a carefully chosen time duration that separates out the late reverberation from the early reverberation. In our validations, we show that our synthesized spatial audio matches closely with recordings using ambisonic microphones. Lastly, we demonstrate the strength of our method in several applications.
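A minimal sketch of the splicing idea, assuming a single mono channel; the directional (ambisonic) channels, the frequency modulation step, and the paper's choice of split time are omitted, and the crossover time and fade length below are arbitrary placeholders.

import numpy as np
from scipy.signal import fftconvolve

def combined_ir(early_ir, late_ir, sample_rate, t_split=0.08, fade=0.01):
    # Crossfade from the simulated early response to the measured late tail
    # around t_split seconds; both IRs are 1D arrays at the same sample rate.
    n = min(len(early_ir), len(late_ir))
    early, late = early_ir[:n], late_ir[:n]
    n_split, n_fade = int(t_split * sample_rate), int(fade * sample_rate)
    ramp = np.linspace(0.0, 1.0, n_fade)
    win_early = np.concatenate([np.ones(n_split), 1.0 - ramp,
                                np.zeros(n - n_split - n_fade)])
    return early * win_early + late * (1.0 - win_early)

# Example use: rendered = fftconvolve(dry_audio, combined_ir(early, late, 48000))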
We present a joint audio-visual model for isolating a single speech signal from a mixture of sounds such as other speakers and background noise. Solving this task using only audio as input is extremely challenging and does not provide an association of the separated speech signals with speakers in the video. In this paper, we present a deep network-based model that incorporates both visual and auditory signals to solve this task. The visual features are used to "focus" the audio on desired speakers in a scene and to improve the speech separation quality. To train our joint audio-visual model, we introduce AVSpeech, a new large-scale audio-visual dataset of speech clips collected from the Web.
Sensors which capture 3D scene information provide useful data for tasks in vehicle navigation, gesture recognition, human pose estimation, and geometric reconstruction. Active illumination time-of-flight sensors in particular have become widely used to estimate a 3D representation of a scene. However, the maximum range, density of acquired spatial samples, and overall acquisition time of these sensors is fundamentally limited by the minimum signal required to estimate depth reliably. In this paper, we propose a data-driven method for photon-efficient 3D imaging which leverages sensor fusion and computational reconstruction to rapidly and robustly estimate a dense depth map from low photon counts. Our sensor fusion approach uses measurements of single photon arrival times from a low-resolution single-photon detector array and an intensity image from a conventional high-resolution camera. Using a multi-scale deep convolutional network, we jointly process the raw measurements from both sensors and output a high-resolution depth map. To demonstrate the efficacy of our approach, we implement a hardware prototype and show results using captured data. At low signal-to-background levels, our depth reconstruction algorithm with sensor fusion outperforms other methods for depth estimation from noisy measurements of photon arrival times.
In typical cameras the optical system is designed first; once it is fixed, the parameters in the image processing algorithm are tuned to get good image reproduction. In contrast to this sequential design approach, we consider joint optimization of an optical system (for example, the physical shape of the lens) together with the parameters of the reconstruction algorithm. We build a fully-differentiable simulation model that maps the true source image to the reconstructed one. The model includes diffractive light propagation, depth and wavelength-dependent effects, noise and nonlinearities, and the image post-processing. We jointly optimize the optical parameters and the image processing algorithm parameters so as to minimize the deviation between the true and reconstructed image, over a large set of images. We implement our joint optimization method using autodifferentiation to efficiently compute parameter gradients in a stochastic optimization algorithm. We demonstrate the efficacy of this approach by applying it to achromatic extended depth of field and snapshot super-resolution imaging.
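A heavily simplified PyTorch stand-in for this joint-design idea: there is no diffractive propagation, depth, or wavelength dependence here; a learnable blur kernel stands in for the optical parameters and a tiny CNN for the reconstruction algorithm, purely to show gradients flowing through both parts.

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointCamera(nn.Module):
    def __init__(self, k=9):
        super().__init__()
        self.psf_logits = nn.Parameter(torch.zeros(1, 1, k, k))     # stand-in "optical" parameters
        self.recon = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(16, 1, 3, padding=1))  # stand-in reconstruction

    def forward(self, img):
        psf = torch.softmax(self.psf_logits.flatten(), 0).view_as(self.psf_logits)
        sensor = F.conv2d(img, psf, padding=psf.shape[-1] // 2)     # simulated capture
        sensor = sensor + 0.01 * torch.randn_like(sensor)           # sensor noise
        return self.recon(sensor)

model = JointCamera()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
img = torch.rand(8, 1, 64, 64)                                      # stand-in image batch
opt.zero_grad()
loss = F.mse_loss(model(img), img)                                  # reconstruction loss
loss.backward(); opt.step()                                         # gradients reach optics and reconstruction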
Adaptive optics has become a valuable tool for correcting minor optical aberrations in applications such as astronomy and microscopy. However, due to the limited resolution of both the wavefront sensing and the wavefront correction hardware, it has so far not been feasible to use adaptive optics for correcting large-scale wavefront deformations that occur naturally in regular photography and other imaging applications.
In this work, we demonstrate an adaptive optics system for regular cameras. We achieve a significant improvement in focus for large wavefront distortions by improving upon a recently developed high resolution coded wavefront sensor, and combining it with a spatial phase modulator to create a megapixel adaptive optics system with unprecedented capability to sense and correct large distortions.
Graphic designers often manipulate the overall look and feel of their designs to convey certain personalities (e.g., cute, mysterious and romantic) to impress potential audiences and achieve business goals. However, understanding the factors that determine the personality of a design is challenging, as a graphic design is often a result of thousands of decisions on numerous factors, such as font, color, image, and layout. In this paper, we aim to answer the question of what characterizes the personality of a graphic design. To this end, we propose a deep learning framework for exploring the effects of various design factors on the perceived personalities of graphic designs. Our framework learns a convolutional neural network (called personality scoring network) to estimate the personality scores of graphic designs by ranking the crawled web data. Our personality scoring network automatically learns a visual representation that captures the semantics necessary to predict graphic design personality. With our personality scoring network, we systematically and quantitatively investigate how various design factors (e.g., color, font, and layout) affect design personality across different scales (from pixels, regions to elements). We also demonstrate a number of practical application scenarios of our network, including element-level design suggestion and example-based personality transfer.
Flat design is a modern style of graphics design that minimizes the number of design attributes required to convey 3D shapes. This approach suits design contexts requiring simplicity and efficiency, such as mobile computing devices. This 'less-is-more' design inspiration has posed significant challenges in practice since it selects from a restricted range of design elements (e.g., color and resolution) to represent complex shapes. In this work, we investigate a means of computationally generating a specialized 2D flat representation, an image formed by black-and-white patches, from 3D shapes. We present a novel framework that automatically abstracts 3D man-made shapes into 2D binary images at multiple scales. Based on a set of identified design principles related to the inference of geometry and structure, our framework jointly analyzes the input 3D shape and its counterpart 2D representation, followed by executing a carefully devised layout optimization algorithm. The robustness and effectiveness of our method are demonstrated by testing it on a wide variety of man-made shapes and comparing the results with baseline methods via a pilot user study. We further present two practical applications that are likely to benefit from our work.
Artist-drawn images with distinctly colored, piecewise continuous boundaries, which we refer to as semi-structured imagery, are very common in online raster databases and typically allow for a perceptually unambiguous mental vector interpretation. Yet, perhaps surprisingly, existing vectorization algorithms frequently fail to generate these viewer-expected interpretations on such imagery. In particular, the vectorized region boundaries they produce frequently diverge from those anticipated by viewers. We propose a new approach to region boundary vectorization that targets semi-structured inputs and leverages observations about human perception of shapes to generate vector images consistent with viewer expectations. When viewing raster imagery observers expect the vector output to be an accurate representation of the raster input. However, perception studies suggest that viewers implicitly account for the lossy nature of the rasterization process and mentally smooth and simplify the observed boundaries. Our core algorithmic challenge is to balance these conflicting cues and obtain a piecewise continuous vectorization whose discontinuities, or corners, are aligned with human expectations.
Our framework centers around a simultaneous spline fitting and corner detection method that combines a learned metric, that approximates human perception of boundary discontinuities on raster inputs, with perception-driven algorithmic discontinuity analysis. The resulting method balances local cues provided by the learned metric with global cues obtained by balancing simplicity and continuity expectations. Given the finalized set of corners, our framework connects those using simple, continuous curves that capture input regularities. We demonstrate our method on a range of inputs and validate its superiority over existing alternatives via an extensive comparative user study.
Character rigs are procedural systems that compute the shape of an animated character for a given pose. They can be highly complex and must account for bulges, wrinkles, and other aspects of a character's appearance. When comparing film-quality character rigs with those designed for real-time applications, there is typically a substantial and readily apparent difference in the quality of the mesh deformations. Real-time rigs are limited by a computational budget and often trade realism for performance. Rigs for film do not have this same limitation, and character riggers can make the rig as complicated as necessary to achieve realistic deformations. However, increasing the rig complexity slows rig evaluation, and the animators working with it can become less efficient and may experience frustration. In this paper, we present a method to reduce the time required to compute mesh deformations for film-quality rigs, allowing better interactivity during animation authoring and use in real-time games and applications. Our approach learns the deformations from an existing rig by splitting the mesh deformation into linear and nonlinear portions. The linear deformations are computed directly from the transformations of the rig's underlying skeleton. We use deep learning methods to approximate the remaining nonlinear portion. In the examples we show from production rigs used to animate lead characters, our approach reduces the computational time spent on evaluating deformations by a factor of 5X-10X. This significant savings allows us to run the complex, film-quality rigs in real-time even when using a CPU-only implementation on a mobile device.
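A toy PyTorch sketch of the described split, in which plain linear blend skinning stands in for the paper's linear deformation layer and a small fully connected network approximates the nonlinear residual; all shapes, weights, and layer sizes are illustrative only.

import torch
import torch.nn as nn

class RigApproximator(nn.Module):
    def __init__(self, n_verts, n_bones, hidden=256):
        super().__init__()
        self.skin_weights = nn.Parameter(torch.rand(n_verts, n_bones))   # per-vertex bone weights
        self.residual = nn.Sequential(                                   # learned nonlinear correction
            nn.Linear(n_bones * 12, hidden), nn.ReLU(),
            nn.Linear(hidden, n_verts * 3))

    def forward(self, rest_verts, bone_transforms):
        # bone_transforms: (n_bones, 3, 4) affine matrices for the current pose
        R, t = bone_transforms[:, :, :3], bone_transforms[:, :, 3]
        per_bone = torch.einsum('bij,vj->bvi', R, rest_verts) + t[:, None, :]
        w = torch.softmax(self.skin_weights, dim=1)                       # normalized skinning weights
        linear = torch.einsum('vb,bvi->vi', w, per_bone)                  # linear (skinning) part
        delta = self.residual(bone_transforms.reshape(-1)).view(-1, 3)    # nonlinear residual from the net
        return linear + delta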
In this paper, an efficient and scalable approach for simulating inhomogeneous and non-linear elastic objects is introduced. Our numerical coarsening approach consists in optimizing non-conforming and matrix-valued shape functions to allow for predictive simulation of heterogeneous materials with non-linear constitutive laws even on coarse grids, thus saving orders of magnitude in computational time compared to traditional finite element computations. The set of local shape functions over coarse elements is carefully tailored in a preprocessing step to balance geometric continuity and local material stiffness. In particular, we do not impose continuity of our material-aware shape functions between neighboring elements to significantly reduce the fictitious numerical stiffness that conforming bases induce; however, we enforce crucial geometric and physical properties such as partition of unity and exact reproduction of representative fine displacements to eschew the use of discontinuous Galerkin methods. We demonstrate that we can simulate, with no parameter tuning, inhomogeneous and non-linear materials significantly better than previous approaches that traditionally try to homogenize the constitutive model instead.
The goal of this paper is to simulate the interactions between magnetic objects in a physically correct way. The simulation scheme is based on magnetization dynamics, which describes the temporal change of magnetic moments. For magnetization dynamics, the Landau-Lifshitz-Gilbert equation is adopted, which is widely used in micromagnetics. Through effectively-designed novel models of magnets, it is extended into the macro scale so as to be combined with real-time rigid-body dynamics. The overall simulation is stable and enables us to implement mutual induction and remanence that have not been tackled by the state-of-the-art technique in magnet simulation. The proposed method can be applied to various fields including magnet experiments in the virtual world.
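For reference, the standard form of the Landau-Lifshitz-Gilbert equation named above, with the magnetic moment normalized to unit length; this is textbook background, not the paper's macro-scale extension.

% Gilbert form: gamma is the gyromagnetic ratio, alpha the Gilbert damping
% constant, H_eff the effective field, and |m| = 1.
\frac{\mathrm{d}\mathbf{m}}{\mathrm{d}t}
  = -\gamma\, \mathbf{m} \times \mathbf{H}_{\mathrm{eff}}
    + \alpha\, \mathbf{m} \times \frac{\mathrm{d}\mathbf{m}}{\mathrm{d}t}
% Equivalent explicit (Landau--Lifshitz) form:
\frac{\mathrm{d}\mathbf{m}}{\mathrm{d}t}
  = -\frac{\gamma}{1+\alpha^{2}}
    \left[ \mathbf{m} \times \mathbf{H}_{\mathrm{eff}}
         + \alpha\, \mathbf{m} \times \bigl(\mathbf{m} \times \mathbf{H}_{\mathrm{eff}}\bigr) \right]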
We present a visual analogue for musical rhythm derived from an analysis of motion in video, and show that alignment of visual rhythm with its musical counterpart results in the appearance of dance. Central to our work is the concept of visual beats --- patterns of motion that can be shifted in time to control visual rhythm. By warping visual beats into alignment with musical beats, we can create or manipulate the appearance of dance in video. Using this approach we demonstrate a variety of retargeting applications that control musical synchronization of audio and video: we can change what song performers are dancing to, warp irregular motion into alignment with music so that it appears to be dancing, or search collections of video for moments of accidentally dance-like motion that can be used to synthesize musical performances.
Digital drawing is becoming a favorite technique for many artists. It allows for quick swaps between different materials, reverting changes, and applying selective modifications to finished artwork. These features enable artists to be more efficient and creative. A significant disadvantage of digital drawing is poor haptic feedback. Artists are usually limited to one surface and a few different stylus nibs, and finding a combination that suits their needs is typically challenging. In this work, we address this problem and propose a method for designing, evaluating, and optimizing different stylus designs. We begin with collecting a representative set of traditional drawing tools. We measure their physical properties and conduct a user experiment to build a perceptual space that encodes perceptually-relevant attributes of drawing materials. The space is optimized to both explain our experimental data and correlate it with measurable physical properties. To embed new drawing tool designs into the space without conducting additional experiments and measurements, we propose a new, data-driven simulation technique for characterizing stylus-surface interaction. We finally leverage the perceptual space, our simulation, and recent advancements in multi-material 3D printing to demonstrate the application of our system in the design of new digital drawing tools that mimic traditional drawing materials.
We present a modular convolutional architecture for denoising rendered images. We expand on the capabilities of kernel-predicting networks by combining them with a number of task-specific modules, and optimizing the assembly using an asymmetric loss. The source-aware encoder---the first module in the assembly---extracts low-level features and embeds them into a common feature space, enabling quick adaptation of a trained network to novel data. The spatial and temporal modules extract abstract, high-level features for kernel-based reconstruction, which is performed at three different spatial scales to reduce low-frequency artifacts. The complete network is trained using a class of asymmetric loss functions that are designed to preserve details and provide the user with a direct control over the variance-bias trade-off during inference. We also propose an error-predicting module for inferring reconstruction error maps that can be used to drive adaptive sampling. Finally, we present a theoretical analysis of convergence rates of kernel-predicting architectures, shedding light on why kernel prediction performs better than synthesizing the colors directly, complementing the empirical evidence presented in this and previous works. We demonstrate that our networks attain results that compare favorably to state-of-the-art methods in terms of detail preservation, low-frequency noise removal, and temporal stability on a variety of production and academic datasets.
Direct illumination calculation is an important component of any physically-based renderer, with a substantial impact on the overall performance. We present a novel adaptive solution for unbiased Monte Carlo direct illumination sampling, based on online learning of the light selection probability distributions. Our main contribution is a formulation of the learning process as Bayesian regression, based on a new, specifically designed statistical model of direct illumination. The net result is a set of regularization strategies to prevent over-fitting and ensure robustness even in early stages of calculation, when the observed information is sparse. The regression model captures spatial variation of illumination, which enables aggregating statistics over relatively large scene regions and, in turn, ensures a fast learning rate. We make the method scalable by adopting a light clustering strategy from the Lightcuts method, and further reduce variance through the use of control variates. As a main design feature, the resulting algorithm is virtually free of any preprocessing, which enables its use for interactive progressive rendering, while the online learning still enables super-linear convergence.
We present an image-based relighting method that can synthesize scene appearance under novel, distant illumination from the visible hemisphere, from only five images captured under pre-defined directional lights. Our method uses a deep convolutional neural network to regress the relit image from these five images; this relighting network is trained on a large synthetic dataset comprised of procedurally generated shapes with real-world reflectances. We show that by combining a custom-designed sampling network with the relighting network, we can jointly learn both the optimal input light directions and the relighting function. We present an extensive evaluation of our network, including an empirical analysis of reconstruction quality, optimal lighting configurations for different scenarios, and alternative network architectures. We demonstrate, on both synthetic and real scenes, that our method is able to reproduce complex, high-frequency lighting effects like specularities and cast shadows, and outperforms other image-based relighting methods that require an order of magnitude more images.
We propose a novel framework that automatically learns the lighting patterns for efficient reflectance acquisition, as well as how to faithfully reconstruct spatially varying anisotropic BRDFs and local frames from measurements under such patterns. The core of our framework is an asymmetric deep autoencoder, consisting of a nonnegative, linear encoder which directly corresponds to the lighting patterns used in physical acquisition, and a stacked, nonlinear decoder which computationally recovers the BRDF information from captured photographs. The autoencoder is trained with a large amount of synthetic reflectance data, and can adapt to various factors, including the geometry of the setup and the properties of appearance. We demonstrate the effectiveness of our framework on a wide range of physical materials, using as few as 16 ~ 32 lighting patterns, which correspond to 12 ~ 25 seconds of acquisition time. We also validate our results with the ground truth data and captured photographs. Our framework is useful for increasing the efficiency in both novel and existing acquisition setups.
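A toy PyTorch sketch of the asymmetric structure described above: a nonnegative linear encoder whose rows act as lighting patterns, followed by a nonlinear decoder. All sizes are illustrative, and the physical acquisition is reduced to a single matrix product with a per-pixel light-transport row.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LightingAutoencoder(nn.Module):
    def __init__(self, n_lights=10000, n_patterns=32, n_brdf_params=16):
        super().__init__()
        self.pattern_logits = nn.Parameter(torch.randn(n_patterns, n_lights))
        self.decoder = nn.Sequential(
            nn.Linear(n_patterns, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_brdf_params))

    def patterns(self):
        return F.softplus(self.pattern_logits)        # enforce nonnegative light intensities

    def forward(self, light_transport_row):
        # light_transport_row: (batch, n_lights) per-pixel response to each light source
        measurements = light_transport_row @ self.patterns().t()   # simulated captured photographs
        return self.decoder(measurements)                          # recovered BRDF parameters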
Texture, highlights, and shading are some of many visual cues that allow humans to perceive material appearance in single pictures. Yet, recovering spatially-varying bi-directional reflectance distribution functions (SVBRDFs) from a single image based on such cues has challenged researchers in computer graphics for decades. We tackle lightweight appearance capture by training a deep neural network to automatically extract and make sense of these visual cues. Once trained, our network is capable of recovering per-pixel normal, diffuse albedo, specular albedo and specular roughness from a single picture of a flat surface lit by a hand-held flash. We achieve this goal by introducing several innovations on training data acquisition and network design. For training, we leverage a large dataset of artist-created, procedural SVBRDFs which we sample and render under multiple lighting directions. We further amplify the data by material mixing to cover a wide diversity of shading effects, which allows our network to work across many material classes. Motivated by the observation that distant regions of a material sample often offer complementary visual cues, we design a network that combines an encoder-decoder convolutional track for local feature extraction with a fully-connected track for global feature extraction and propagation. Many important material effects are view-dependent, and as such ambiguous when observed in a single image. We tackle this challenge by defining the loss as a differentiable SVBRDF similarity metric that compares the renderings of the predicted maps against renderings of the ground truth from several lighting and viewing directions. Combined together, these novel ingredients bring clear improvement over state of the art methods for single-shot capture of spatially varying BRDFs.
A critical advantage of additive manufacturing is its ability to fabricate complex small-scale structures. These microstructures can be understood as a metamaterial: they exist at a much smaller scale than the volume they fill, and are collectively responsible for an average elastic behavior different from that of the base printing material, making the fabricated object lighter and/or more flexible along specific directions. In addition, the average behavior can be graded spatially by progressively modifying the microstructure geometry.
The definition of a microstructure is a careful trade-off between the geometric requirements of manufacturing and the properties one seeks to obtain within a shape: in our case, a wide range of elastic behaviors. Most existing microstructures are designed for stereolithography (SLA) and laser sintering (SLS) processes. The requirements are, however, different from those of continuous deposition systems such as fused filament fabrication (FFF), for which there is currently a lack of microstructures enabling graded elastic behaviors.
In this work we introduce a novel type of microstructure that strictly enforces all the requirements of FFF-like processes: continuity, self-support, and overhang angles. These microstructures offer a range of orthotropic elastic responses that can be graded spatially. This makes it possible to fabricate parts usually reserved for the most advanced technologies on widely available, inexpensive printers that also benefit from a continuously expanding range of materials.
We introduce the first fully automatic pipeline to convert arbitrary 3D shapes into knit models. Our pipeline is based on a global parametrization remeshing pipeline to produce an isotropic quad-dominant mesh aligned with a 2-RoSy field. The knitting directions over the surface are determined using a set of custom topological operations and a two-step global optimization that minimizes the number of irregularities. The resulting mesh is converted into a valid stitch mesh that represents the knit model. The yarn curves are generated from the stitch mesh and the final yarn geometry is computed using a yarn-level relaxation process. Thus, we produce topologically valid models that can be used with a yarn-level simulation. We validate our algorithm by automatically generating knit models from complex 3D shapes and processing over a hundred models with various shapes without any user input or parameter tuning. We also demonstrate applications of our approach for custom knit model generation using fabrication via 3D printing.
Typical design for manufacturing applications requires simultaneous optimization of conflicting performance objectives: Design variations that improve one performance metric may decrease another performance metric. In these scenarios, there is no unique optimal design but rather a set of designs that are optimal for different trade-offs (called Pareto-optimal). In this work, we propose a novel approach to discover the Pareto front, allowing designers to navigate the landscape of compromises efficiently. Our approach is based on a first-order approximation of the Pareto front, which allows entire neighborhoods rather than individual points on the Pareto front to be captured. In addition to allowing for efficient discovery of the Pareto front and the corresponding mapping to the design space, this approach allows us to represent the entire trade-off manifold as a small collection of patches that comprise a high-quality and piecewise-smooth approximation. We illustrate how this technique can be used for navigating performance trade-offs in computer-aided design (CAD) models.
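As background for the first-order approximation mentioned above (not the paper's exact formulation), recall the standard first-order condition for Pareto stationarity of k smooth, unconstrained objectives: some convex combination of the objective gradients must vanish at a Pareto-optimal design.

% First-order (KKT) condition for a Pareto-stationary design x* of objectives f_1, ..., f_k:
\exists\, w \in \mathbb{R}^{k},\quad w_i \ge 0,\quad \sum_{i=1}^{k} w_i = 1,
\qquad \sum_{i=1}^{k} w_i \,\nabla f_i(x^{*}) = 0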
Digital sculpting is a popular means to create 3D models but remains a challenging task. We propose a 3D sculpting system that assists users, especially novices, in freely creating models with reduced input labor and enhanced output quality. With an interactive sculpting interface, our system silently records and analyzes users' workflows, including brush strokes and camera movements, and predicts what they might do in the future. Users can accept, partially accept, or ignore the suggestions and thus retain full control and individual style. They can also explicitly select and clone past workflows over output model regions. Our key idea is to consider how a model is authored via dynamic workflows in addition to what is shaped in static geometry. This allows our method to analyze user intentions more accurately and to synthesize shape structures more generally than prior workflow- or geometry-based methods, for example handling large overlapping deformations. We evaluate our method via user feedback and the authored models.
While folds and pleats add interest to garments and cloth objects, incorporating them into an existing design manually or using existing software requires expertise and time. We present FoldSketch, a new system that supports simple and intuitive fold and pleat design. FoldSketch users specify the fold or pleat configuration they seek using a simple schematic sketching interface; the system then algorithmically generates both the fold-enhanced 3D garment geometry that conforms to user specifications, and the corresponding 2D patterns that reproduce this geometry within a simulation engine. While previous work aspired to compute the desired patterns for a given target 3D garment geometry, our main algorithmic challenge is that we do not have target geometry to start with. Real-life garment folds have complex profile shapes, and their exact geometry and location on a garment are intricately linked to a range of physical factors such as fabric properties and the garment's interaction with the wearer's body; it is therefore virtually impossible to predict the 3D shape of a fold-enhanced garment using purely geometric means. At the same time, using physical simulation to model folds requires appropriate 2D patterns and initial drape, neither of which can be easily provided by the user. We obtain both the 3D fold-enhanced garment and its corresponding patterns and initial drape via an alternating 2D-3D algorithm. We first expand the input patterns by allocating excess material for the expected fold formation; we then use these patterns to produce an estimated fold-enhanced drape geometry that balances designer expectations against physical reproducibility. We use the patterns and the estimated drape as input to a simulation generating an initial reproducible output. We improve the output's alignment with designer expectations by progressively refining the patterns and the estimated drape, converging to a final fully physically reproducible fold-enhanced garment. Our experiments confirm that FoldSketch reliably converges to a desired garment geometry and corresponding patterns and drape, and works well with different physical simulators. We demonstrate the versatility of our approach by showcasing a collection of garments augmented with diverse fold and pleat layouts specified via the FoldSketch interface, and further validate our approach via comparisons to alternative solutions and feedback from potential users.
This paper presents a new method to fabricate 3D models on a robotic printing system equipped with multi-axis motion. Materials are accumulated inside the volume along curved tool-paths so that the need for supporting structures can be greatly reduced - if not eliminated entirely - for all models. Our strategy to tackle the challenge of tool-path planning for multi-axis 3D printing is to perform two successive decompositions, first volume-to-surfaces and then surfaces-to-curves. The volume-to-surfaces decomposition is achieved by optimizing a scalar field within the volume that represents the fabrication sequence. The field is constrained such that its iso-values represent curved layers that are supported from below and present a convex surface affording collision-free navigation of the printer head. After extracting all curved layers, the surfaces-to-curves decomposition covers them with tool-paths while taking into account constraints from the robotic printing system. Our method successfully generates tool-paths for 3D printing models with large overhangs and high-genus topology. We fabricated several challenging cases on our robotic platform to verify and demonstrate its capabilities.
Molding is a popular mass production method, in which the initial expenses for the mold are offset by the low per-unit production cost. However, the physical fabrication constraints of the molding technique commonly restrict the shape of moldable objects. For a complex shape, a decomposition of the object into moldable parts is a common strategy to address these constraints, with plastic model kits being a popular and illustrative example. However, conducting such a decomposition requires considerable expertise, and it depends on the technical aspects of the fabrication technique as well as on aesthetic considerations. We present an interactive technique to create such decompositions for two-piece molding, in which each part of the object is cast between two rigid mold pieces. Given the surface description of an object, we decompose its thin-shell equivalent into moldable parts by first performing a coarse decomposition and then utilizing an active contour model for the boundaries between individual parts. Formulated as an optimization problem, the movement of the contours is guided by an energy reflecting fabrication constraints to ensure the moldability of each part. Simultaneously, the user is provided with editing capabilities to enforce aesthetic guidelines. Our interactive interface provides control of the contour positions by allowing, for example, the alignment of part boundaries with object features. Our technique enables a novel workflow, as it empowers novice users to explore the design space, and it generates fabrication-ready two-piece molds that can be used for either casting or industrial injection molding of free-form objects.
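For readers unfamiliar with active contours, the classical snake energy that such boundary curves minimize has the general form

\[
E(\mathbf{c}) \;=\; \int_0^1 \Big( \tfrac{\alpha}{2}\,\|\mathbf{c}'(s)\|^2 + \tfrac{\beta}{2}\,\|\mathbf{c}''(s)\|^2 \Big)\, \mathrm{d}s \;+\; \int_0^1 E_{\text{ext}}\big(\mathbf{c}(s)\big)\, \mathrm{d}s,
\]

where c(s) is a part boundary, the first two terms control its elasticity and rigidity, and the external term E_ext is where the fabrication-driven moldability constraints and user guidance would enter. This is the textbook formulation; the paper's exact energy terms may differ.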
We propose a new method for fabricating digital objects through reusable silicone molds. Molds are generated by casting liquid silicone into custom 3D printed containers called metamolds. Metamolds automatically define the cuts that are needed to extract the cast object from the silicone mold. The shape of metamolds is designed through a novel segmentation technique, which takes into account both geometric and topological constraints involved in the process of mold casting. Our technique is simple, does not require changing the shape or topology of the input objects, and only requires off-the-shelf materials and technologies. We successfully tested our method on a set of challenging examples with complex shapes and rich geometric detail.
We present an automatic algorithm for subtractive manufacturing of freeform 3D objects using high-speed machining (HSM) via CNC. A CNC machine operates a cylindrical cutter to carve off material from a 3D shape stock, following a tool path, to "expose" the target object. Our method decomposes the input object's surface into a small number of patches each of which is fully accessible and machinable by the CNC machine, in continuous fashion, under a fixed cutter-object setup configuration. This is achieved by covering the input surface with a minimum number of accessible regions and then extracting a set of machinable patches from each accessible region. For each patch obtained, we compute a continuous, space-filling, and iso-scallop tool path which conforms to the patch boundary, enabling efficient carving with high-quality surface finishing. The tool path is generated in the form of connected Fermat spirals, which have been generalized from a 2D fill pattern for layered manufacturing to work for curved surfaces. Furthermore, we develop a novel method to control the spacing of Fermat spirals based on directional surface curvature and adapt the heat method to obtain iso-scallop carving. We demonstrate automatic generation of accessible and machinable surface decompositions and iso-scallop Fermat spiral carving paths for freeform 3D objects. Comparisons are made to tool paths generated by commercial software in terms of real machining time and surface quality.
A growing number of visual computing applications depend on the analysis of large video collections. The challenge is that scaling applications to operate on these datasets requires efficient systems for pixel data access and parallel processing across large numbers of machines. Few programmers have the capability to operate efficiently at these scales, limiting the field's ability to explore new applications that leverage big video data. In response, we have created Scanner, a system for productive and efficient video analysis at scale. Scanner organizes video collections as tables in a data store optimized for sampling frames from compressed video, and executes pixel processing computations, expressed as dataflow graphs, on these frames. Scanner schedules video analysis applications expressed using these abstractions onto heterogeneous throughput computing hardware, such as multi-core CPUs, GPUs, and media processing ASICs, for high-throughput pixel processing. We demonstrate the productivity of Scanner by authoring a variety of video processing applications including the synthesis of stereo VR video streams from multi-camera rigs, markerless 3D human pose reconstruction from video, and data-mining big video datasets such as hundreds of feature-length films or over 70,000 hours of TV news. These applications achieve near-expert performance on a single machine and scale efficiently to hundreds of machines, enabling formerly long-running big video data analysis tasks to be carried out in minutes to hours.
Gradient-based optimization has enabled dramatic advances in computational imaging through techniques like deep learning and nonlinear optimization. These methods require gradients not just of simple mathematical functions, but of general programs which encode complex transformations of images and graphical data. Unfortunately, practitioners have traditionally been limited to either hand-deriving gradients of complex computations, or composing programs from a limited set of coarse-grained operators in deep learning frameworks. At the same time, writing programs with the level of performance needed for imaging and deep learning is prohibitively difficult for most programmers.
We extend the image processing language Halide with general reverse-mode automatic differentiation (AD), and the ability to automatically optimize the implementation of gradient computations. This enables automatic computation of the gradients of arbitrary Halide programs, at high performance, with little programmer effort. A key challenge is to structure the gradient code to retain parallelism. We define a simple algorithm to automatically schedule these pipelines, and show how Halide's existing scheduling primitives can express and extend the key AD optimization of "checkpointing."
Using this new tool, we show how to easily define new neural network layers which automatically compile to high-performance GPU implementations, and how to solve nonlinear inverse problems from computational imaging. Finally, we show how differentiable programming enables dramatically improving the quality of even traditional, feed-forward image processing algorithms, blurring the distinction between classical and deep methods.
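To make the checkpointing trade-off concrete, the following minimal Python sketch (independent of Halide and its actual API; the stage function, chain length, and checkpoint interval are illustrative assumptions) differentiates a chain of elementwise stages in reverse mode while storing only every K-th intermediate and recomputing the rest during the backward pass.

```python
import numpy as np

# Illustrative sketch of gradient checkpointing (not Halide's API):
# a chain of N elementwise stages, with checkpoints stored every K stages.
# Each stage computes y = A*x + sin(x); its adjoint is dx = dy * (A + cos(x)).

A = 0.9   # illustrative stage parameter
N = 16    # number of stages in the chain
K = 4     # checkpoint interval

def stage(x):
    return A * x + np.sin(x)

def stage_adjoint(x, dy):
    # d/dx (A*x + sin(x)) = A + cos(x)
    return dy * (A + np.cos(x))

def forward_with_checkpoints(x0):
    """Run the chain, storing only every K-th intermediate (plus the input)."""
    checkpoints = {0: x0}
    x = x0
    for i in range(N):
        x = stage(x)
        if (i + 1) % K == 0 and (i + 1) < N:
            checkpoints[i + 1] = x
    return x, checkpoints

def backward(checkpoints, dy):
    """Backward pass: recompute each segment's intermediates from its checkpoint."""
    for seg_start in sorted(checkpoints, reverse=True):
        seg_end = min(seg_start + K, N)
        xs = [checkpoints[seg_start]]          # inputs of the stages in this segment
        for _ in range(seg_start, seg_end - 1):
            xs.append(stage(xs[-1]))
        for x in reversed(xs):                 # propagate the adjoint backwards
            dy = stage_adjoint(x, dy)
    return dy

x0 = np.linspace(-1.0, 1.0, 8)                 # a tiny "image"
y, ckpts = forward_with_checkpoints(x0)
grad = backward(ckpts, np.ones_like(y))        # gradient of sum(y) w.r.t. x0

# Finite-difference check of one entry.
eps = 1e-6
xp = x0.copy(); xp[3] += eps
print(grad[3], (forward_with_checkpoints(xp)[0].sum() - y.sum()) / eps)
```

Storing fewer checkpoints trades extra recomputation in the backward pass for a smaller memory footprint, which is the same trade-off the scheduling of gradient pipelines has to expose.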
In this paper, we present a real-time graphics pipeline implemented entirely in software on a modern GPU. As opposed to previous work, our approach features a fully-concurrent, multi-stage, streaming design with dynamic load balancing, capable of operating efficiently within bounded memory. We address issues such as primitive order, vertex reuse, and screen-space derivatives of dependent variables, which are essential to real-world applications, but have largely been ignored by comparable work in the past. The power of a software approach lies in the ability to tailor the graphics pipeline to any given application. In exploration of this potential, we design and implement four novel pipeline modifications. Evaluation of the performance of our approach on more than 100 real-world scenes collected from video games shows rendering speeds within one order of magnitude of the hardware graphics pipeline as well as significant improvements over previous work, not only in terms of capabilities and performance, but also robustness.
Designers of real-time rendering engines must balance the conflicting goals of maintaining clear, extensible shading systems and achieving high rendering performance. In response, engine architects have established effective design patterns for authoring shading systems, and developed engine-specific code synthesis tools, ranging from preprocessor hacking to domain-specific shading languages, to productively implement these patterns. The problem is that proprietary tools add significant complexity to modern engines, lack advanced language features, and create additional challenges for learning and adoption. We argue that the advantages of engine-specific code generation tools can be achieved using the underlying GPU shading language directly, provided the shading language is extended with a small number of best-practice principles from modern, well-established programming languages. We identify that adding generics with interface constraints, associated types, and interface/structure extensions to existing C-like GPU shading languages enables real-time renderer developers to build shading systems that are extensible, maintainable, and execute efficiently on modern GPUs without the need for additional domain-specific tools. We embody these ideas in an extension of HLSL called Slang, and provide a reference design for a large, extensible shader library implemented using Slang's features. We rearchitect an open source renderer to use this library and Slang's compiler services, and demonstrate the resulting shading system is substantially simpler, easier to extend with new features, and achieves higher rendering performance than the original HLSL-based implementation.
Basketball is one of the world's most popular sports because of the agility and speed demonstrated by the players. This agility and speed makes designing controllers to realize robust control of basketball skills a challenge for physics-based character animation. The highly dynamic behaviors and precise manipulation of the ball that occur in the game are difficult to reproduce for simulated players. In this paper, we present an approach for learning robust basketball dribbling controllers from motion capture data. Our system decouples a basketball controller into locomotion control and arm control components and learns each component separately. To achieve robust control of the ball, we develop an efficient pipeline based on trajectory optimization and deep reinforcement learning and learn non-linear arm control policies. We also present a technique for learning skills and the transition between skills simultaneously. Our system is capable of learning robust controllers for various basketball dribbling skills, such as dribbling between the legs and crossover moves. The resulting control graphs enable a simulated player to perform transitions between these skills and respond to user interaction.
A longstanding goal in character animation is to combine data-driven specification of behavior with a system that can execute a similar behavior in a physical simulation, thus enabling realistic responses to perturbations and environmental variation. We show that well-known reinforcement learning (RL) methods can be adapted to learn robust control policies capable of imitating a broad range of example motion clips, while also learning complex recoveries, adapting to changes in morphology, and accomplishing user-specified goals. Our method handles keyframed motions, highly-dynamic actions such as motion-captured flips and spins, and retargeted motions. By combining a motion-imitation objective with a task objective, we can train characters that react intelligently in interactive settings, e.g., by walking in a desired direction or throwing a ball at a user-specified target. This approach thus combines the convenience and motion quality of using motion clips to define the desired style and appearance, with the flexibility and generality afforded by RL methods and physics-based animation. We further explore a number of methods for integrating multiple clips into the learning process to develop multi-skilled agents capable of performing a rich repertoire of diverse skills. We demonstrate results using multiple characters (human, Atlas robot, bipedal dinosaur, dragon) and a large variety of skills, including locomotion, acrobatics, and martial arts.
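Schematically (in our notation, which may differ from the paper's), the combination described above is a per-timestep reward

\[
r_t \;=\; \omega^{I} r^{I}_t \;+\; \omega^{G} r^{G}_t,
\]

where r^I_t measures similarity to the reference clip (e.g., pose, velocity, and end-effector terms), r^G_t measures progress on the user-specified task, and the weights \omega^I, \omega^G trade off motion style against task completion.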
Learning locomotion skills is a challenging problem. To generate realistic and smooth locomotion, existing methods use motion capture, finite state machines or morphology-specific knowledge to guide the motion generation algorithms. Deep reinforcement learning (DRL) is a promising approach for the automatic creation of locomotion control. Indeed, a standard benchmark for DRL is to automatically create a running controller for a biped character from a simple reward function [Duan et al. 2016]. Although several different DRL algorithms can successfully create a running controller, the resulting motions usually look nothing like a real runner. This paper takes a minimalist learning approach to the locomotion problem, without the use of motion examples, finite state machines, or morphology-specific knowledge. We introduce two modifications to the DRL approach that, when used together, produce locomotion behaviors that are symmetric, low-energy, and much closer to those of a real person. First, we introduce a new term to the loss function (not the reward function) that encourages symmetric actions. Second, we introduce a new curriculum learning method that provides modulated physical assistance to help the character with left/right balance and forward movement. The algorithm automatically computes appropriate assistance to the character and gradually relaxes this assistance, so that eventually the character learns to move entirely without help. Because our method does not make use of motion capture data, it can be applied to a variety of character morphologies. We demonstrate locomotion controllers for the lower half of a biped, a full humanoid, a quadruped, and a hexapod. Our results show that learned policies are able to produce symmetric, low-energy gaits. In addition, speed-appropriate gait patterns emerge without any guidance from motion examples or contact planning.
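The symmetry term added to the loss can be sketched as follows; the linear "policy", the mirroring matrices, and the batch of states are illustrative stand-ins written in plain NumPy rather than any particular DRL framework.

```python
import numpy as np

# Illustrative sketch of a mirror-symmetry loss for a locomotion policy.
# M_s mirrors a state (swaps left/right observations), M_a mirrors an action;
# both are assumed to be known permutation/sign matrices for the character.

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 10, 4

# Stand-in "policy": a single linear layer (a real policy would be a deep net).
W = rng.normal(scale=0.1, size=(ACTION_DIM, STATE_DIM))

def policy(s):
    return W @ s

# Illustrative mirroring operators (here: simple sign flips).
M_s = np.diag([1, -1] * (STATE_DIM // 2)).astype(float)
M_a = np.diag([1, -1] * (ACTION_DIM // 2)).astype(float)

def symmetry_loss(states):
    """Penalize the policy for acting differently on mirrored states."""
    total = 0.0
    for s in states:
        a = policy(s)
        a_mirrored = M_a @ policy(M_s @ s)   # mirror state, act, mirror action back
        total += np.sum((a - a_mirrored) ** 2)
    return total / len(states)

states = rng.normal(size=(32, STATE_DIM))
# In training, this term would be added (with a weight) to the usual policy loss.
print("symmetry loss:", symmetry_loss(states))
```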
Quadruped motion includes a wide variation of gaits such as walk, pace, trot and canter, and actions such as jumping, sitting, turning and idling. Applying existing data-driven character control frameworks to such data requires a significant amount of data preprocessing such as motion labeling and alignment. In this paper, we propose a novel neural network architecture called Mode-Adaptive Neural Networks for controlling quadruped characters. The system is composed of the motion prediction network and the gating network. At each frame, the motion prediction network computes the character state in the current frame given the state in the previous frame and the user-provided control signals. The gating network dynamically updates the weights of the motion prediction network by selecting and blending what we call the expert weights, each of which specializes in a particular movement. Due to the increased flexibility, the system can learn consistent expert weights across a wide range of non-periodic/periodic actions, from unstructured motion capture data, in an end-to-end fashion. In addition, users are relieved of performing complex labeling of phases in different gaits. We show that this architecture is suitable for encoding the multi-modality of quadruped locomotion and synthesizing responsive motion in real-time.
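The expert-blending step at the core of the architecture can be sketched as follows; the layer sizes, the choice of gating features, and the plain softmax gate are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

NUM_EXPERTS = 4              # number of expert weight sets (illustrative)
GATE_DIM, IN_DIM, HID = 12, 16, 32

# Expert parameters for one layer of the motion prediction network.
expert_W = rng.normal(scale=0.1, size=(NUM_EXPERTS, HID, IN_DIM))
expert_b = rng.normal(scale=0.1, size=(NUM_EXPERTS, HID))

# A stand-in gating network: one linear layer followed by a softmax.
gate_W = rng.normal(scale=0.1, size=(NUM_EXPERTS, GATE_DIM))

def blend_weights(gating_features):
    """Gating network output: convex blend coefficients over the experts."""
    logits = gate_W @ gating_features
    e = np.exp(logits - logits.max())
    return e / e.sum()

def blended_layer(x, alpha):
    """Motion-prediction layer whose weights are a blend of expert weights."""
    W = np.tensordot(alpha, expert_W, axes=1)   # sum_k alpha_k * W_k
    b = alpha @ expert_b
    return np.maximum(W @ x + b, 0.0)           # ReLU-style nonlinearity

gating_features = rng.normal(size=GATE_DIM)     # e.g., foot velocities, action label
alpha = blend_weights(gating_features)
state = rng.normal(size=IN_DIM)                 # previous character state + controls
hidden = blended_layer(state, alpha)
print("blend coefficients:", np.round(alpha, 3))
```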
We present a physically accurate low-order elastic shell model that incorporates active material response to dynamically changing stimuli such as heat, moisture, and growth. Our continuous formulation of the geometrically non-linear elastic energy derives from the principles of differential geometry, and as such naturally incorporates shell thickness, non-zero rest curvature, and physical material properties. By modeling the environmental stimulus as local, dynamic changes in the rest metric of the material, we are able to solve for the corresponding shape changes by integrating the equations of motion given this non-Euclidean rest state. We present models for differential growth and shrinking due to moisture and temperature gradients along and across the surface, and incorporate anisotropic growth by defining an intrinsic machine direction within the material. Comparisons with experiments and volumetric finite elements show that our simulations achieve excellent qualitative and quantitative agreement. By combining the reduced-order shell theory with appropriate physical models, our approach accurately captures all the physical phenomena while avoiding expensive volumetric discretization of the shell volume.
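For orientation, reduced non-Euclidean shell theories of the kind referenced above typically measure the elastic energy of a mid-surface with first and second fundamental forms a, b against stimulus-dependent rest forms ā, b̄ as

\[
E \;=\; \int_{\Omega} \frac{h}{8}\, \big\| \bar a^{-1} (a - \bar a) \big\|_e^2 \;+\; \frac{h^3}{24}\, \big\| \bar a^{-1} (b - \bar b) \big\|_e^2 \;\, \mathrm{d}A,
\]

where h is the shell thickness and \|\cdot\|_e is an elastic norm set by the material parameters; heat, moisture, or growth act by changing ā (and b̄) over time. The constants and norm here follow the standard reduced-shell literature and are not taken from the paper itself.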
We present a novel method for simulation of thin shells with frictional contact using a combination of the Material Point Method (MPM) and subdivision finite elements. The shell kinematics are assumed to follow a continuum shell model which is decomposed into a Kirchhoff-Love motion that rotates the mid-surface normals followed by shearing and compression/extension of the material along the mid-surface normal. We use this decomposition to design an elastoplastic constitutive model to resolve frictional contact by decoupling resistance to contact and shearing from the bending resistance components of stress. We show that by resolving frictional contact with a continuum approach, our hybrid Lagrangian/Eulerian approach is capable of simulating challenging shell contact scenarios with hundreds of thousands to millions of degrees of freedom. Without the need for collision detection or resolution, our method runs in a few minutes per frame in these high resolution examples. Furthermore we show that our technique naturally couples with other traditional MPM methods for simulating granular and related materials.
We propose a comprehensive approach to characterizing the mechanical properties of structured sheet materials, i.e., planar rod networks whose mechanics and aesthetics are inextricably linked. We establish a connection between the complex mesoscopic deformation behavior of such structures and their macroscopic elastic properties through numerical homogenization. Our approach leverages 3D Kirchhoff rod simulation in order to capture nonlinear effects for both in-plane and bending deformations. We apply our method to different families of structures based on isohedral tilings---a simple yet extensive and aesthetically interesting group of space-filling patterns. We show that these tilings admit a wide range of material properties, and our homogenization approach allows us to create concise and intuitive descriptions of a material's direction-dependent macromechanical behavior that are easy to communicate even to non-experts. We perform this characterization for an extensive set of structures and organize these data in a material browser to enable efficient forward exploration of the aesthetic-mechanical space of structured sheet materials. We also propose an inverse design method to automatically find structure parameters that best approximate a user-specified target behavior.
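The homogenization step can be summarized by the standard periodic cell problem (stated here generically; the paper works with Kirchhoff rods and separates in-plane from bending response): the macroscopic energy density of a tile Y under an imposed average strain \bar\varepsilon is obtained by relaxing over Y-periodic fluctuation fields u,

\[
W^{\mathrm{hom}}(\bar\varepsilon) \;=\; \min_{u \ \text{Y-periodic}} \; \frac{1}{|Y|} \int_{Y} W\big(\bar\varepsilon + \varepsilon(u)\big)\, \mathrm{d}Y,
\]

and the direction-dependent macroscopic behavior shown in the material browser corresponds to evaluating W^hom (or the stiffness it induces) for stretches and bends along different directions.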
In this paper, we present a mixed explicit and semi-implicit Material Point Method for simulating particle-laden flows. We develop a Multigrid Preconditioned fluid solver for the Locally Averaged Navier-Stokes equation. This is discretized purely on a semi-staggered standard MPM grid. Sedimentation is modeled with the Drucker-Prager elastoplasticity flow rule, enhanced by a novel particle density estimation method for converting particles between representations of either continuum or discrete points. Fluid and sediment are two-way coupled through a momentum exchange force that can be easily resolved with two MPM background grids. We present various results to demonstrate the efficacy of our method.
In this paper, we introduce the Moving Least Squares Material Point Method (MLS-MPM). MLS-MPM naturally leads to the formulation of Affine Particle-In-Cell (APIC) [Jiang et al. 2015] and Polynomial Particle-In-Cell [Fu et al. 2017] in a way that is consistent with a Galerkin-style weak form discretization of the governing equations. Additionally, it enables a new stress divergence discretization that effortlessly allows all MPM simulations to run two times faster than before. We also develop a Compatible Particle-In-Cell (CPIC) algorithm on top of MLS-MPM. Utilizing a colored distance field representation and a novel compatibility condition for particles and grid nodes, our framework enables the simulation of various new phenomena that are not previously supported by MPM, including material cutting, dynamic open boundaries, and two-way coupling with rigid bodies. MLS-MPM with CPIC is easy to implement and friendly to performance optimization.
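As an illustration of why the new discretization is cheap (our notation, schematic, and omitting plasticity and boundary handling): with quadratic B-spline weights the MLS moment matrix reduces to (\Delta x^2/4) I, so the APIC affine term and the stress divergence fuse into a single particle-to-grid momentum transfer,

\[
(m\mathbf{v})_i \;\mathrel{+}=\; \sum_p w_{ip} \Big[ m_p \mathbf{v}_p + \Big( m_p \mathbf{C}_p - \tfrac{4\,\Delta t}{\Delta x^2}\, V_p^0\, \mathbf{P}_p \mathbf{F}_p^{\top} \Big) (\mathbf{x}_i - \mathbf{x}_p) \Big],
\]

where C_p is the APIC affine velocity matrix, F_p the deformation gradient, P_p = \partial\Psi/\partial F the first Piola-Kirchhoff stress, and V_p^0 the initial particle volume; no separate loop over weight gradients is required, which is where much of the speedup comes from.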
Humans can predict the functionality of an object even without any surroundings, since their knowledge and experience would allow them to "hallucinate" the interaction or usage scenarios involving the object. We develop predictive and generative deep convolutional neural networks to replicate this feat. Specifically, our work focuses on functionalities of man-made 3D objects characterized by human-object or object-object interactions. Our networks are trained on a database of scene contexts, called interaction contexts, each consisting of a central object and one or more surrounding objects, that represent object functionalities. Given a 3D object in isolation, our functional similarity network (fSIM-NET), a variation of the triplet network, is trained to predict the functionality of the object by inferring functionality-revealing interaction contexts. fSIM-NET is complemented by a generative network (iGEN-NET) and a segmentation network (iSEG-NET). iGEN-NET takes a single voxelized 3D object with a functionality label and synthesizes a voxelized surround, i.e., the interaction context which visually demonstrates the corresponding functionality. iSEG-NET further separates the interacting objects into different groups according to their interaction types.
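For context, the standard triplet loss that fSIM-NET varies trains an embedding f so that an anchor a lies closer to a positive p than to a negative n by a margin m,

\[
L(a, p, n) \;=\; \max\!\big(0,\; d\big(f(a), f(p)\big) - d\big(f(a), f(n)\big) + m\big);
\]

in this setting the anchor would be the isolated object and the positive/negative examples interaction contexts that do or do not exhibit its functionality (the paper's exact variant and distance d may differ).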
We introduce P2P-NET, a general-purpose deep neural network which learns geometric transformations between point-based shape representations from two domains, e.g., meso-skeletons and surfaces, partial and complete scans, etc. The architecture of the P2P-NET is that of a bi-directional point displacement network, which transforms a source point set to a prediction of the target point set with the same cardinality, and vice versa, by applying point-wise displacement vectors learned from data. P2P-NET is trained on paired shapes from the source and target domains, but without relying on point-to-point correspondences between the source and target point sets. The training loss combines two uni-directional geometric losses, each enforcing a shape-wise similarity between the predicted and the target point sets, and a cross-regularization term to encourage consistency between displacement vectors going in opposite directions. We develop and present several different applications enabled by our general-purpose bidirectional P2P-NET to highlight the effectiveness, versatility, and potential of our network in solving a variety of point-based shape transformation problems.
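A minimal NumPy sketch of a training loss with this structure is given below: two uni-directional Chamfer-style shape terms plus a cross-regularization of the two displacement fields. This is our simplified reading of the abstract, not the paper's exact loss; the round-trip form of the cross term in particular is an assumption.

```python
import numpy as np

def chamfer(P, Q):
    """Symmetric Chamfer distance between point sets P (n,d) and Q (m,d)."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def p2p_loss(X, Y, disp_xy, disp_yx, lam=0.1):
    """X, Y: paired source/target point sets (same cardinality, no correspondence).
    disp_xy: predicted per-point displacements moving X toward the Y domain;
    disp_yx: predicted displacements moving Y toward the X domain."""
    X_pred = X + disp_xy                       # prediction of the target set
    Y_pred = Y + disp_yx                       # prediction of the source set
    shape_loss = chamfer(X_pred, Y) + chamfer(Y_pred, X)

    # One plausible cross-regularization: a point pushed from X toward Y and
    # then pulled back by the nearest reverse displacement should land near
    # where it started (the symmetric term is omitted here for brevity).
    nn = np.linalg.norm(X_pred[:, None, :] - Y[None, :, :], axis=-1).argmin(axis=1)
    round_trip = X_pred + disp_yx[nn]
    cross_reg = np.mean(np.sum((round_trip - X) ** 2, axis=1))
    return shape_loss + lam * cross_reg

rng = np.random.default_rng(2)
X = rng.normal(size=(128, 3))                  # e.g., meso-skeleton samples
Y = rng.normal(size=(128, 3))                  # e.g., surface samples
dxy = rng.normal(scale=0.01, size=X.shape)     # stand-ins for network outputs
dyx = rng.normal(scale=0.01, size=Y.shape)
print("loss:", p2p_loss(X, Y, dxy, dyx))
```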
Packed atlases, consisting of 2D parameterized charts, are ubiquitously used to store surface signals such as texture or normals. Tight packing is similarly used to arrange and cut out 2D panels for fabrication from sheet materials. Packing efficiency, or the ratio between the areas of the packed atlas and its bounding box, significantly impacts downstream applications.
We propose Box Cutter, a new method for optimizing packing efficiency suitable for both settings. Our algorithm improves packing efficiency without changing distortion by strategically cutting and repacking the atlas charts or panels. It preserves the local mapping between the 3D surface and the atlas charts and retains global mapping continuity across the newly formed cuts. We balance packing efficiency improvement against increase in chart boundary length and enable users to directly control the acceptable amount of boundary elongation. While the problem we address is NP-hard, we provide an effective practical solution by iteratively detecting large rectangular empty spaces, or void boxes, in the current atlas packing and eliminating them by first refining the atlas using strategically placed axis-aligned cuts and then repacking the refined charts. We repeat this process until no further improvement is possible, or until the desired balance between packing improvement and boundary elongation is achieved. Packed chart atlases are only useful for the applications we address if their charts are overlap-free; yet many popular parameterization methods, used as-is, produce atlases with global overlaps. Our pre-processing step eliminates all input overlaps while explicitly minimizing the boundary length of the resulting overlap-free charts. We demonstrate our combined strategy on a large range of input atlases produced by diverse parameterization methods, as well as on multiple sets of 2D fabrication panels. Our framework dramatically improves the output packing efficiency on all inputs; for instance with boundary length increase capped at 50% we improve packing efficiency by 68% on average.
Processing signals on surfaces often involves resampling the signal over the vertices of a dense mesh and applying mesh-based filtering operators. We present a framework to process a signal directly in a texture atlas domain. The benefits are twofold: avoiding resampling degradation and exploiting the regularity of the texture image grid. The main challenges are to preserve continuity across atlas chart boundaries and to adapt differential operators to the non-uniform parameterization. We introduce a novel function space and multigrid solver that jointly enable robust, interactive, and geometry-aware signal processing. We demonstrate our approach using several applications including smoothing and sharpening, multiview stitching, geodesic distance computation, and line integral convolution.
We introduce a practical pipeline to create UV T-layouts for real-world quad dominant semi-regular meshes. Our algorithm creates large rectangular patches by relaxing the notion of motorcycle graphs and making it insensitive to local irregularities in the mesh structure such as non-quad elements, redundant irregular vertices, T-junctions, and others. Each surface patch, which can contain multiple singularities and/or polygonal elements, is mapped to an axis-aligned rectangle, leading to a simple and efficient UV layout, which is ideal for texture mapping (allowing for mipmapping and artifact-free bilinear interpolation). We demonstrate that our algorithm is an ideal solution for both recent semi-regular, quad-dominant meshing methods, and for the low-poly meshes typically used in games and movies.
This paper develops a global variational approach to cutting curved surfaces so that they can be flattened into the plane with low metric distortion. Such cuts are a critical component in a variety of algorithms that seek to parameterize surfaces over flat domains, or fabricate structures from flat materials. Rather than evaluate the quality of a cut solely based on properties of the curve itself (e.g., its length or curvature), we formulate a flow that directly optimizes the distortion induced by cutting and flattening. Notably, we do not have to explicitly parameterize the surface in order to evaluate the cost of a cut, but can instead integrate a simple evolution equation defined on the cut curve itself. We arrive at this flow via a novel application of shape derivatives to the Yamabe equation from conformal geometry. We then develop an Eulerian numerical integrator on triangulated surfaces, which does not restrict cuts to mesh edges and can incorporate user-defined data such as importance or occlusion. The resulting cut curves can be used to drive distortion to arbitrarily low levels, and have a very different character from cuts obtained via purely discrete formulations. We briefly explore potential applications to computational design, as well as connections to space filling curves and the problem of uniform heat distribution.
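For reference, the conformal-geometry relation behind the flow is the two-dimensional Yamabe (curvature transformation) equation: if the flattening is a conformal rescaling \tilde g = e^{2u} g of the surface metric g, the new Gaussian curvature is

\[
\tilde K \;=\; e^{-2u}\big( K - \Delta u \big),
\]

so requiring the cut-open surface to be flat (\tilde K = 0) gives a Poisson-type equation \Delta u = K, and a norm of the log conformal factor u serves as the distortion cost whose shape derivative with respect to the cut drives the evolution equation described above.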
We present an efficient and scalable pipeline for fabricating full-colored objects with spatially-varying translucency from practical and accessible input data via multi-material 3D printing. Observing that the costs associated with BSSRDF measurement and processing are high, that the range of 3D printable BSSRDFs is severely limited, and that the human visual system relies only on simple high-level cues to perceive translucency, we propose a method based on reproducing perceptual translucency cues. The input to our pipeline is an RGBA signal defined on the surface of an object, making our approach accessible and practical for designers. We propose a framework for extending standard color management and profiling to combined color and translucency management using a gamut correspondence strategy we call opaque relative processing. We present an efficient streaming method to compute voxel-level material arrangements, achieving both realistic reproduction of measured translucent materials and artistic effects involving multiple fully or partially transparent geometries.
A great deal of attention has been devoted, both in industry and in the arts, to the fabrication of reflectors that can display different color images when viewed from different directions. Although such reflectors have previously been fabricated successfully, the number of images displayed has been limited to two, or the reflectors suffer from ghosting artifacts in which mixed images appear. Furthermore, previous methods require special hardware and/or materials to fabricate the reflectors, making those techniques unsuitable for printing reflectors on everyday personal objects made of different materials, such as name cards, letter sheets, envelopes, and plastic cases. To overcome these limitations, we propose a method for fabricating reflectors using a standard ultraviolet (UV) printer. A UV printer can render a specified 2D color pattern on an arbitrary material, and by overprinting, the printed pattern can be raised; that is, the pattern becomes a microstructure having both color and height. We use these microstructures to formulate a method for designing spatially varying reflections that display different target images when viewed from different directions. The microstructure is calculated by minimizing an objective function that measures the differences between the intensities of the light reflected from the reflector and those of the target images. We show several fabricated reflectors to demonstrate the usefulness of the proposed method.
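Schematically (our notation), the optimization described above seeks the microstructure's per-cell heights and colors h minimizing

\[
E(h) \;=\; \sum_{d \in \mathcal{D}} \sum_{q} \big( I_d(q;\, h) - T_d(q) \big)^2,
\]

where D is the set of viewing directions, q ranges over pixels, I_d is the predicted intensity of light reflected toward direction d by the printed microstructure, and T_d is the corresponding target image.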
Additive manufacturing has recently seen drastic improvements in resolution, making it now possible to fabricate features at scales of hundreds or even tens of nanometers, which previously required very expensive lithographic methods. As a result, additive manufacturing now seems poised for optical applications, including those relevant to computer graphics, such as material design, as well as display and imaging applications.
In this work, we explore the use of additive manufacturing for generating structural colors, where the structures are designed using a fabrication-aware optimization process. This requires a combination of full-wave simulation, a feasible parameterization of the design space, and a tailored optimization procedure. Many of these components should be re-usable for the design of other optical structures at this scale.
We show initial results of material samples fabricated based on our designs. While these suffer from the prototype character of state-of-the-art fabrication hardware, we believe they clearly demonstrate the potential of additive nanofabrication for structural colors and other graphics applications.
We present a computation-driven approach to design optimization and motion synthesis for robotic creatures that locomote using arbitrary arrangements of legs and wheels. Through an intuitive interface, designers first create unique robots by combining different types of servomotors, 3D printable connectors, wheels and feet in a mix-and-match manner. With the resulting robot as input, a novel trajectory optimization formulation generates walking, rolling, gliding and skating motions. These motions emerge naturally based on the components used to design each individual robot. We exploit the particular structure of our formulation and make targeted simplifications to significantly accelerate the underlying numerical solver without compromising quality. This allows designers to interactively choreograph stable, physically-valid motions that are agile and compelling. We furthermore develop a suite of user-guided, semi-automatic, and fully-automatic optimization tools that enable motion-aware edits of the robot's physical structure. We demonstrate the efficacy of our design methodology by creating a diverse array of hybrid legged/wheeled mobile robots which we validate using physics simulation and through fabricated prototypes.
We present a novel deep-learning based approach to producing animator-centric speech motion curves that drive a JALI or standard FACS-based production face-rig, directly from input audio. Our three-stage Long Short-Term Memory (LSTM) network architecture is motivated by psycho-linguistic insights: segmenting speech audio into a stream of phonetic-groups is sufficient for viseme construction; speech styles like mumbling or shouting are strongly correlated to the motion of facial landmarks; and animator style is encoded in viseme motion curve profiles. Our contribution is a solution for automatic, real-time lip synchronization from audio that integrates seamlessly into existing animation pipelines. We evaluate our results by: cross-validation to ground-truth data; animator critique and edits; visual comparison to recent deep-learning lip-synchronization solutions; and showing our approach to be resilient to diversity in speaker and language.
We present a deep learning-based technique to infer high-quality facial reflectance and geometry given a single unconstrained image of the subject, which may contain partial occlusions and arbitrary illumination conditions. The reconstructed high-resolution textures, which are generated in only a few seconds, include high-resolution skin surface reflectance maps, representing both the diffuse and specular albedo, and medium- and high-frequency displacement maps, thereby allowing us to render compelling digital avatars under novel lighting conditions. To extract this data, we train our deep neural networks with a high-quality skin reflectance and geometry database created with a state-of-the-art multi-view photometric stereo system using polarized gradient illumination. Given the raw facial texture map extracted from the input image, our neural networks synthesize complete reflectance and displacement maps and fill in the missing regions caused by occlusions. The completed textures exhibit consistent quality throughout the face due to our network architecture, which propagates texture features from the visible region, resulting in high-fidelity details that are consistent with those seen in visible regions. We describe how this highly underconstrained problem is made tractable by dividing the full inference into smaller tasks, which are addressed by dedicated neural networks. We demonstrate the effectiveness of our network design with robust texture completion from images of faces that are largely occluded. With the inferred reflectance and geometry data, we demonstrate the rendering of high-fidelity 3D avatars from a variety of subjects captured under different lighting conditions. In addition, we perform evaluations demonstrating that our method can infer plausible facial reflectance and geometric details comparable to those obtained from high-end capture devices, and outperform alternative approaches that require only a single unconstrained input image.
We present a novel approach that enables photo-realistic re-animation of portrait videos using only an input video. In contrast to existing approaches that are restricted to manipulations of facial expressions only, we are the first to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor. The core of our approach is a generative neural network with a novel space-time architecture. The network takes as input synthetic renderings of a parametric face model, based on which it predicts photo-realistic video frames for a given target actor. The realism in this rendering-to-video transfer is achieved by careful adversarial training, and as a result, we can create modified target videos that mimic the behavior of the synthetically-created input. In order to enable source-to-target video re-animation, we render a synthetic target video with the reconstructed head animation parameters from a source video, and feed it into the trained network - thus taking full control of the target. With the ability to freely recombine source and target parameters, we are able to demonstrate a large variety of video rewrite applications without explicitly modeling hair, body or background. For instance, we can reenact the full head using interactive user-controlled editing, and realize high-fidelity visual dubbing. To demonstrate the high quality of our output, we conduct an extensive series of experiments and evaluations, where for instance a user study shows that our video edits are hard to detect.
We propose HeadOn, the first real-time source-to-target reenactment approach for complete human portrait videos that enables transfer of torso and head motion, face expression, and eye gaze. Given a short RGB-D video of the target actor, we automatically construct a personalized geometry proxy that embeds a parametric head, eye, and kinematic torso model. A novel realtime reenactment algorithm employs this proxy to photo-realistically map the captured motion from the source actor to the target actor. On top of the coarse geometric proxy, we propose a video-based rendering technique that composites the modified target portrait video via view- and pose-dependent texturing, and creates photo-realistic imagery of the target actor under novel torso and head poses, facial expressions, and gaze directions. To this end, we propose a robust tracking of the face and torso of the source actor. We extensively evaluate our approach and show significant improvements in enabling much greater flexibility in creating realistic reenacted output videos.
Raw optical motion capture data often includes errors such as occluded markers, mislabeled markers, and high frequency noise or jitter. Typically these errors must be fixed by hand - an extremely time-consuming and tedious task. Due to this, there is a large demand for tools or techniques which can alleviate this burden. In this research we present a tool that sidesteps this problem, and produces joint transforms directly from raw marker data (a task commonly called "solving") in a way that is extremely robust to errors in the input data using the machine learning technique of denoising. Starting with a set of marker configurations, and a large database of skeletal motion data such as the CMU motion capture database [CMU 2013b], we synthetically reconstruct marker locations using linear blend skinning and apply a unique noise function for corrupting this marker data - randomly removing and shifting markers to dynamically produce billions of examples of poses with errors similar to those found in real motion capture data. We then train a deep denoising feed-forward neural network to learn a mapping from this corrupted marker data to the corresponding transforms of the joints. Once trained, our neural network can be used as a replacement for the solving part of the motion capture pipeline, and, as it is very robust to errors, it completely removes the need for any manual clean-up of data. Our system is accurate enough to be used in production, generally achieving precision to within a few millimeters, while additionally being extremely fast to compute with low memory requirements.
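The corruption step that generates training pairs can be sketched as follows; the probabilities, shift magnitude, and the convention of zeroing occluded markers are illustrative assumptions rather than the paper's exact noise model.

```python
import numpy as np

rng = np.random.default_rng(3)

def corrupt_markers(markers, p_occlude=0.1, p_shift=0.05, shift_scale=0.05):
    """Simulate mocap errors on clean marker positions (n_markers, 3):
    randomly occlude some markers (zeroed out here) and displace others."""
    noisy = markers.copy()
    n = len(markers)
    occluded = rng.random(n) < p_occlude
    shifted = (rng.random(n) < p_shift) & ~occluded
    noisy[shifted] += rng.normal(scale=shift_scale, size=(int(shifted.sum()), 3))
    noisy[occluded] = 0.0                      # stand-in encoding for "missing"
    return noisy, occluded

# Example: one synthetic training input from a clean marker configuration.
clean_markers = rng.normal(size=(41, 3))       # e.g., a 41-marker layout
noisy_markers, occlusion_mask = corrupt_markers(clean_markers)
print("occluded markers:", int(occlusion_mask.sum()))
```

In training, each corrupted marker set would be paired with the clean joint transforms that produced it via linear blend skinning, giving the denoising network its input/output examples.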
Optical marker-based motion capture is the dominant way for obtaining high-fidelity human body animation for special effects, movies, and video games. However, motion capture has seen limited application to the human hand due to the difficulty of automatically identifying (or labeling) identical markers on self-similar fingers. We propose a technique that frames the labeling problem as a keypoint regression problem conducive to a solution using convolutional neural networks. We demonstrate robustness of our labeling solution to occlusion, ghost markers, hand shape, and even motions involving two hands or handheld objects. Our technique is equally applicable to sparse or dense marker sets and can run in real-time to support interaction prototyping with high-fidelity hand tracking and hand presence in virtual reality.
We present a new example-based approach for synthesizing hand-colored cartoon animations. Our method produces results that preserve the specific visual appearance and stylized motion of manually authored animations without requiring artists to draw every frame from scratch. In our framework, the artist first stylizes a limited set of known source skeletal animations from which we extract a style-aware puppet that encodes the appearance and motion characteristics of the artwork. Given a new target skeletal motion, our method automatically transfers the style from the source examples to create a hand-colored target animation. Compared to previous work, our technique is the first to preserve both the detailed visual appearance and stylized motion of the original hand-drawn content. Our approach has numerous practical applications including traditional animation production and content creation for games.