Are Point Clouds Missing the Point?

Posted by Christi Kroll.

I have seen some beautiful 3D data in my day. Reality capture technology like drones, self-driving cars, and laser scanners have matured and proliferated incredibly in the past decade. My new iPhone can even spit out an accurate 3D model of anything I point it at - live!

However, reality capture's most popular formats -- meshes and point clouds -- are completely missing the point.

Point Cloud of San Francisco waterfront

These digital copies of the real world display actual places and unique objects that we humans can recognize immediately, like the above point cloud of the San Francisco waterfront. It's all there: the pointy silhouette of the Transamerica Pyramid; the historic Ferry Building; and the offices of Autodesk, Inc. Iconic buildings and memorable roads. It IS San Francisco, right?

Wrong! In fact, these point cloud data are so fundamentally disorganized that they are distinctly NOT MAPS: because they are not symbolic, or diagrammatic.

Map Definition

Source: Google.

If you are an experienced reality capturer, you already know this. End users are usually in search of a specific analytic or a CAD model; rarely the point cloud or mesh itself. For all the gigabytes of precision data collected by my great Skycatch customers in the construction industry, the ultimate deliverable is often a plain PDF. Point clouds keep missing the point!

What are the fundamental shortcomings of reality capture data, and how can we work differently?

The point cloud data type is wonderfully simple. If colorized, the format is just:


It's a three dimensional raster. Points with colors, when grouped in the millions, create the red roofs, green trees, and the rest of the urban jungle seen in this image. However, the fact that two adjacent green points are part of the same tree is contained nowhere in the data. Only we humans see distinct objects in the 3D scans. Critically, reality capture data is not machine readable.

Even more, capturing, saving, and displaying at the right resolution is a challenge: it's difficult to establish a spatial resolution that looks good at more than one dimensional order of magnitude. This is why Google Earth data looks incredible at the city scale, but terrifying at human scale -- like a melted, waxy, reality.

Google Earth zoomed in

Image: Google Earth

The solution to this is close at hand, where the original 3D data experts are: in video gaming. What if we could build the Earth like a video game designer would? Or a CAD designer?

Well, we could start with a library of sprites and primitives: objects and basic shapes to which explicit characteristics are assigned. Instead of the jumbled point cloud, or millions of mesh triangles, a tree could look like this:

Code Tree

With the right primitives, and the capability to derive their characteristics from massive amounts of reality capture data*, you can begin to derive a true digital twin: a representation of the real world that embodies much more than just its visual characteristics; and makes it machine-readable. My collaborators at Geopipe are doing exactly this, focused on optimizing this problem at the human scale and the city scale. It is borne out in the Geopipe mission statement: Creating the Authoritative, Whole-Earth Digital Twin.

* this is extremely hard; indeed, it is the kernel of the problem itself.

What will it take to make reality capture data smarter?

Solutions to this growing problem are being addressed largely along industrial verticals. Self-driving car companies have extensive, safety-critical programs based on 2D & 3D sensor fusion techniques as well as billions of dollars in support. Companies like Avvir and ClearEdge3D are targeting the $20 trillion construction industry. But in comparison to the explosion in reality capture methods, like Ouster’s recent IPO or the aforementioned iPhone 12 LiDAR, fusing symbolic and 3D data is in its infancy. A Google search for “3D segmentation” turns up mostly research papers -- not impactful off-the-shelf applications.

Geopipe's segmentation algorithm in an urban setting

Image: Geopipe's segmentation algorithm in an urban setting

Building a next generation 3D data stream requires fulfilling the needs of two quite different consumers: the human and the machine. Porting an entire city into an immersive VR experience means it has to look as the human would expect it; and, it has to be tagged with relevant characteristics, so computers can simulate its interaction with users, and the world.

Geopipe is leveraging massive public datasets and novel machine learning methods to advance this task, and has built some of the most comprehensive urban digital twins in existence. AEC customers like Gilbane appreciate Geopipe’s accurate and detailed Worlds. Others, focused on creating and selling machine learning training datasets, consume exclusively a metadata stream: no traditional 3D data at all.

That we can serve both customer bases simultaneously is an early sign of traction for this idea. We are helping to move customers away from fuzzy point clouds and lumpy meshes, to explicitly-defined 3D worlds with embedded metadata. This substantially reduces file sizes, makes simulations smarter, and enables human-scale interaction unlike any traditional data type. As we move into the future of reality capture and build the mirrorworld, I encourage my colleagues to demand more from their point clouds. They are a temporary, necessary evil -- just a means to a more structured end.

William Pryor is the Vice President of Business Development at Geopipe, and an Enterprise Applications Manager at Skycatch. He invites similarly minded 3D technologists to reach out and connect.