The Camera Is the Lidar (medium.com/angus.pacala)
200 points by nthuser on Sept 2, 2018 | 73 comments



The hardware is described, vaguely, at [1]. It's a rotating drum scanner with 16 or 64 lasers. $12,000 for the 64-laser model. $24,000 for the upcoming model with 200m range. Still a long way from a useful auto part.

There are about a dozen companies in this space now. Nobody has the price down yet. Continental, the European auto parts company, is the most likely winner. Quanergy made a lot of noise but didn't ship much.[2] There's a conference on automotive LIDAR this month in Detroit.[3] Many of the exhibitors are major semiconductor packaging companies, with various approaches to putting lots of little LIDAR units in a convenient package at a reasonable price.

[1] https://www.ouster.io/faq/ [2] https://news.ycombinator.com/item?id=17755183 [3] http://www.automotivelidar.com/


Why Continental?

Flash lidar seems to be a fundamentally broken concept to me wrt range.


One might think that, except that Advanced Scientific Concepts, which Continental bought, has had it working for a decade.[1] Their units work fine, but are expensive. They're mostly sold to DoD and used for space applications. The SpaceX Dragon spacecraft uses one for docking.

There's a tradeoff between field of view and range. Automotive systems will probably include a long-range narrow field of view unit and a shorter range wide field of view unit.

Flash LIDAR has some advantages. No moving parts. Can be fabbed with semiconductor processes. The one big laser is separate from the sensor array, which helps with cooling. Also, you can spread the outgoing beam, which helps with eye safety. (Eye safety depends on how much energy passes through an iris-sized cross section of the beam, roughly 1/4" across; if the beam is spread out, that energy density is lower.)

[1] https://ieeexplore.ieee.org/document/7268968/


ASC is a textbook example of what not to do as a company. I honestly think their technology will be better implemented by someone else in a few years, which is tragic given how much of a head start they had. Eh.


What do you see as the key challenges in driving down the cost of flash LIDAR?


The light from the flash spreads out with the inverse square law, so you get much less signal than with collimated laser beams. To compensate for that, you need much higher power. To get more power without blinding people, you need to move to 1550 nm. That requires large arrays of exotic indium gallium arsenide (InGaAs) semiconductors, which are expensive.
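A rough back-of-envelope comparison (illustrative numbers of my own, not from the article) of the irradiance a target sees from a collimated beam versus a flash spread over a wide cone:

    import math

    # Rough, illustrative comparison of target irradiance for a collimated
    # scanning beam vs. a flash illuminator, ignoring atmospheric losses.
    P = 1.0                              # peak optical power in watts (arbitrary)
    R = 50.0                             # target range in metres
    spot_diameter = 0.01                 # ~1 cm spot for a collimated beam (assumed)
    flash_half_angle = math.radians(30)  # flash spread over a 60 deg cone (assumed)

    collimated = P / (math.pi * (spot_diameter / 2) ** 2)            # ~constant with range
    flash = P / (math.pi * (R * math.tan(flash_half_angle)) ** 2)    # falls as 1/R^2

    print(f"collimated: {collimated:.3g} W/m^2")
    print(f"flash:      {flash:.3g} W/m^2")
    print(f"ratio:      {collimated / flash:.3g}x")
    # Both returns then lose another ~1/R^2 on the diffuse bounce back to the
    # receiver aperture, which is why flash units need so much more peak power.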


Also note that in space there's way less background noise.


This made me curious: how wide is Earth's shadow at the height the space station is at?


I doubt it'd be significantly smaller than the diameter of the Earth. I can't think of how to set up the geometry offhand, but the ISS is only ~250 mi above the Earth, so my guess would be only a couple hundred miles narrower than the Earth's diameter at most.


Almost exactly as wide as the Earth.
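A back-of-envelope check with standard textbook values (my numbers, not from the thread): the umbra converges at roughly the Sun's angular radius, so it barely narrows over a few hundred kilometres.

    # How much narrower is Earth's umbra a short distance behind the planet?
    R_sun   = 696_000.0      # km
    R_earth = 6_371.0        # km
    d_sun   = 149_600_000.0  # km, Earth-Sun distance

    half_angle = (R_sun - R_earth) / d_sun   # umbra convergence angle, ~0.0046 rad
    for x in (400.0, R_earth + 400.0):       # km behind the terminator / behind Earth's centre
        shrink = 2 * x * half_angle          # reduction in shadow diameter, km
        print(f"{x:8.0f} km back: umbra ~{2*R_earth - shrink:,.0f} km wide "
              f"(only {shrink:.0f} km narrower than Earth)")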


What's the power output for the unit in the paper?


14-16 W

Velodyne's 64-channel is 60 W


Why is this better than separate LIDAR and camera?

Because you're collecting NIR ambient light, your optics are wideband, meaning daylight has a more pronounced negative effect on system range (it's easier to saturate the photocells). It's also low resolution (as most LIDARs are), and there is no color segmentation data.

In an automotive application, I can't see a justification for unifying visual and LIDAR into a single sensor rather than having an extrinsically calibrated array of sensors. You can keep refining the calibration from the data over time if you're very concerned about system stability.

It seems like a nice party trick, but the vehicle LIDAR game is focused on long-range solid-state units, as that is what will get into mass production. The visual-band imagers in the car are a given for many other reasons anyway.


First, the optics are not wideband. It is only collecting narrow band NIR light. Saturation is not a problem. The CEO of Ouster explains this in a reddit comment [0]:

> We are not sacrificing lidar performance by adding ambient imaging functionality. The lidar subsystem has a short integration time that avoids saturation, and if anything our approach outperforms other lidars.

> As proof, the example videos linked to in the article show raw unedited point cloud data with the lidar operating in extremely sunny environments with plenty of specular reflectors. You can see lens flare in the ambient imagery as any camera would exhibit, but the lidar signal and range data are unaffected. In addition, if you point a velodyne directly at the sun its false positive rate increases significantly while our sensor's FPR does not. No lidar will return the distance to the sun so the only thing that matters is FPR in this scenario.

> We've independently verified the OS-1's range performance with customers under all levels of solar exposure and I guarantee you can't get a smaller, cheaper lidar with even close to this combination of resolution and performance. If you have any doubts, download the raw pcap files from our github page and play them back yourself. We stand behind our data, our pricing, and our spec!

Second, even if you do plan on adding extra cameras, the extrinsic calibration between camera and lidar may become easier if you have good quality ambient light measurement from the lidar. For example Jesse Levinson, cofounder of Zoox, computes extrinsic calibration between camera and Velodyne lidar by assuming that depth discontinuities are correlated with visual features [1]. But obviously the correlation between 850 nm images and visible light images would be way better.

[0] https://www.reddit.com/r/SelfDrivingCars/comments/9c60pe/the...

[1] http://www.roboticsproceedings.org/rss09/p29.pdf
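For anyone curious what that style of calibration search looks like, here is a minimal sketch in the spirit of [1] (all data and helpers are hypothetical placeholders, not Ouster's or Zoox's code): project lidar points with strong range discontinuities into the image and pick the extrinsics that best line them up with image edges.

    import numpy as np

    def score(points, weights, R, t, K, edge_map):
        """Sum of image edge strength at the projections of 'discontinuous' lidar points."""
        cam = (R @ points.T).T + t                      # lidar frame -> camera frame
        in_front = cam[:, 2] > 0.1
        cam, w = cam[in_front], weights[in_front]
        uv = (K @ cam.T).T
        u = (uv[:, 0] / uv[:, 2]).astype(int)
        v = (uv[:, 1] / uv[:, 2]).astype(int)
        h, wid = edge_map.shape
        ok = (u >= 0) & (u < wid) & (v >= 0) & (v < h)
        return float((edge_map[v[ok], u[ok]] * w[ok]).sum())

    def refine_yaw(points, weights, R0, t0, K, edge_map, span=np.radians(1.0), steps=41):
        """Toy 1-D search over yaw only; a real system searches all 6 DoF."""
        best = (None, -np.inf)
        for dyaw in np.linspace(-span, span, steps):
            c, s = np.cos(dyaw), np.sin(dyaw)
            dR = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
            val = score(points, weights, dR @ R0, t0, K, edge_map)
            if val > best[1]:
                best = (dyaw, val)
        return best

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        points = rng.uniform([-5, -1, 2], [5, 1, 20], size=(2000, 3))  # fake lidar points
        weights = rng.random(2000)                                     # fake discontinuity weights
        K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
        edge_map = rng.random((480, 640))                              # stand-in for an edge image
        print(refine_yaw(points, weights, np.eye(3), np.zeros(3), K, edge_map))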


>> First, the optics are not wideband. It is only collecting narrow band NIR light. Saturation is not a problem. The CEO of Ouster explains this in a reddit comment [0]:

They are probably wider band than would be required to read only the sensor's own illumination.

>> Second, even if you do plan on adding extra cameras, the extrinsic calibration between camera and lidar may become easier if you have good quality ambient light measurement from the lidar. For example Jesse Levinson, cofounder of Zoox, computes extrinsic calibration between camera and Velodyne lidar by assuming that depth discontinuities are correlated with visual features [1]. But obviously the correlation between 850 nm images and visible light images would be way better.

I agree with that, but you could probably go the other way around and correlate the LIDAR depth map with depth obtained through stereo imaging. The temporal synchronization between NIR and depth provided by this unit is nice, though.

Let me phrase this differently: while the videos are cool to watch, I don't think calibration is the problem in vehicles, nor are baseline artifacts between sensors when operating at such far ranges (whether your camera and LIDAR are perfectly aligned or translated 10 cm apart, it won't matter much looking 10 m down the road).

Having moving parts, however, won't get this system into a production model.


It's better because there's no need for calibration, you always have perfect calibration.

Solid state lidar has issues. The cofounder of Ouster, Angus Pacala previously cofounded Quanergy, a solid state lidar startup.


Solid state LIDAR certainly has issues - but someone is going to solve those and this is what will get into automotive, definitely not $10k units with moving parts.

There was an announcement of a partnership between BMW and Innoviz (an Israeli maker of solid-state LIDARs), with Magna as the Tier 1 supplier.

I'm not sure calibration is that big of a deal for this application. Sensors are going to be calibrated and tested in the factory or at a module level regardless, and the accuracy requirements in automotive are much lower than consumer products using similar technology.

You can't overcome not having colors (traffic lights, anyone?), limited ranging distance or sensor saturation due to ambient conditions.


Agreed. I spent time last year on a project fusing lidar with a rotating LWIR (thermal) camera alongside some smart people, and calibration took significant effort: mechanically, in electronic timing, and in the algorithmic fusion. This looks like a nice step forward.


I'm sure this works well in bright light, but I'm sceptical that it can perform at all well on overcast days or at night. The OS-1 device spins at 10 Hz and the LIDAR samples 2048 points over one 360-degree revolution. This means each column of pixels is sampled in 1/20480 of a second. Sampling that fast requires a lot of light, which is fine on a sunny day, but on a cloudy day you can have 100 times less light. And at night you would have essentially no ambient near-infrared light at all.
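For reference, the arithmetic behind that figure, using the spec values quoted above:

    rev_rate = 10          # Hz
    columns  = 2048        # horizontal samples per revolution
    dwell = 1 / (rev_rate * columns)
    print(f"{dwell * 1e6:.1f} microseconds per column")   # ~48.8 us, i.e. 1/20480 s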


Could you have a camera without the IR filter and then use IR LEDs to make up the difference? You would have to worry about other vehicles doing the same and blinding the cameras somehow, but that might be a solution.


The problem is that deep learning should not be allowed in safety critical systems, because (1) the accuracy is always less than 100% even in known test situations, and (2) we don't know how it works and under what conditions it breaks down.


I think it should be allowed but the tests it should pass must be far more strenuous than for traditional software. I'm happy with failure rates of around 1 catastrophic failure every million hours. Even humans sometimes fail catastrophically and black out at the wheel for no detectable reason.

That level of testing is well beyond what today's software and hardware are capable of. Waymo has to override their cars (a disengagement) approximately every 6,000 miles [1], which equates to about 200 hours of driving. To reach a confidence level of 1 in a million hours you would need a test fleet of a thousand vehicles operating for a whole year without any incident requiring human intervention. The costs for such testing would run into the hundreds of millions of dollars, which makes me feel like only the largest corporations in the world could develop this technology.

[1] https://www.dmv.ca.gov/portal/dmv/detail/vr/autonomous/disen...
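One common way to size that kind of test is the zero-failure "rule of three": with no critical failures in T hours, the ~95% upper confidence bound on the failure rate is about 3/T. That lands in the same ballpark as the thousand-vehicle-year estimate above (the numbers below are just that heuristic, not anything from the Waymo report):

    target_rate = 1e-6                 # failures per hour we want to demonstrate
    hours_needed = 3 / target_rate     # ~95% confidence with zero failures observed
    fleet = 1000                       # vehicles
    print(f"{hours_needed:,.0f} failure-free hours needed")                   # 3,000,000
    print(f"= {hours_needed / (fleet * 24 * 365):.2f} years for a {fleet}-vehicle fleet")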


And after certification, what if the company wants to push a quick update to all of its cars every now and then, through a remote update? Would that be allowed? How would we even know that it happens?


> And after certification, what if the company wants to push a quick update to all of its cars every now and then, through a remote update?

Late and a bit rough but here is my idea:

Install redundant self-driving units in at least a good number of the first few thousand cars of each generation.

When planning a release, push to the redundant unit in the cars already running in "production".

Use only the primary unit to drive the car as usual, but log the diffs between the new version and the old version the same way they now log driver interventions.

I think there are a number of issues this won't catch; off the top of my head, what if the new self-driving unit attempts to turn slightly faster on a slippery road, etc.

But it should be able to collect realistic feedback really fast, i.e. in a few months (crazy slow for modern application developers like me, but more than fast enough for anything that should be allowed to drive unsupervised, I guess :-)
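A minimal sketch of that "candidate runs in shadow, only diffs get logged" idea (all class names, thresholds, and fields are hypothetical placeholders):

    import json, time

    DISAGREEMENT_THRESHOLDS = {"steering_deg": 2.0, "brake_pct": 10.0, "throttle_pct": 10.0}

    def shadow_compare(primary, candidate, sensor_frames, log_path="shadow_diffs.jsonl"):
        """Primary stack drives the car; candidate sees the same frames, diffs get logged."""
        with open(log_path, "a") as log:
            for frame in sensor_frames:
                live = primary.plan(frame)        # actually actuated
                shadow = candidate.plan(frame)    # computed, never actuated
                diffs = {
                    k: (live[k], shadow[k])
                    for k, limit in DISAGREEMENT_THRESHOLDS.items()
                    if abs(live[k] - shadow[k]) > limit
                }
                if diffs:
                    log.write(json.dumps({"t": time.time(), "diffs": diffs}) + "\n")
                yield live                        # the vehicle only ever follows the primary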


I am fascinated by the testing methodology you described (current version and proposed upgrade run in parallel in production). I hope this isn't a dumb question, but is there a name for that style of testing, and do there exist categories of systems for which it is commonly used?


> I hope this isn't a dumb question

Definitely not : )

> but is there a name for that style of testing and do there exist categories of systems for which it is commonly used

I don't know, but I'd guess parallel deployment, production testing (hehe), or something like that.

I guess I learned about it here on HN but I've heard similar approaches elsewhere.

Three concrete systems I can think of that have been tested in similar but not identical ways:

- trajectory calculation systems for some space probe (I forget the details), where two separate vendors were tasked with writing the software and their implementations were then run in parallel over a simulation of possible trajectories to root out any bugs. Mentioned as an example of an extreme variant of testing, probably by someone here on HN or in something linked from HN.

- a vendor testing baggage-sorting equipment at an airport. Probably told to me by a colleague of mine at the time who knew them. AFAIK they'd rerun batches and verify the outputs, making sure the new system produced similar or better results.

- an API testing service, probably mentioned here on HN as well, that worked by firing three similar requests for each resource and method: two to a deployment of the old system and one to the new. The two fired against the old system would be used to find the parts of the response that changed from request to request (timestamps etc.), and the rest of the response would be used as a template to verify the response from the new system.

More generally: capturing and replaying production input, or running two sets of services in parallel in a data center (or across data centers), with the existing system running as usual and the system under test receiving only the input data.
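A toy version of that third approach, assuming JSON APIs and made-up URLs: hit the old deployment twice to find the fields that naturally vary (timestamps, request IDs), then compare only the stable fields against the new deployment.

    import requests

    def volatile_keys(a: dict, b: dict) -> set:
        """Fields that differ between two calls to the *same* old deployment
        (timestamps, request IDs, ...) and should be ignored in the comparison."""
        return {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}

    def check(path, old_base="https://api-old.example.com", new_base="https://api-new.example.com"):
        old1 = requests.get(old_base + path).json()
        old2 = requests.get(old_base + path).json()
        new  = requests.get(new_base + path).json()
        ignore = volatile_keys(old1, old2)
        mismatches = {
            k: (old1[k], new.get(k))
            for k in old1
            if k not in ignore and old1[k] != new.get(k)
        }
        return mismatches   # empty dict means the new deployment matches on stable fields

    # e.g. check("/v1/vehicles/123") -> {} if the stable fields agree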


Just make update transparency part of the initial certification.

Of course they might cheat, but then again, who really knows whether their car has an airbag where it's supposed to be?


> Just make update transparency part of the initial certification.

Somehow I doubt the authorities have the foresight.


Even when they do, corporations will attempt to subvert process and policy "interlocks". See: the industry-wide emissions-cheating scandal.


They do that sort of thing occasionally, but on the whole, over time, cars have improved with regard to emissions and safety, mostly due to regulation.


Perhaps, but it's going to be difficult to justify that logic to a jury in a wrongful death civil lawsuit.


I was in a car crash two years ago where a man went into a diabetic fit/seizure and sped through an intersection, ultimately hitting a building and my car and killing himself in the process. It is too bad his car did not have some of this deep learning that is not 100% accurate.

We don't know how humans work and under what conditions they break down, either.


I'm not in the market for a new car, but from what I read: there is something called "City Safety" by Volvo, and I know that Mercedes has similar tech (a friend learned that by not being run over by a distracted driver). So there are already technologies to prevent (or at least reduce the severity of) what happened to you (assuming he was below a certain speed threshold).

In contrast to the whole self-driving stuff DL is popular for: user input overrides the DL input.


There is no evidence that deep learning would give better performance than other collision avoidance algorithms in such a scenario.


But that doesn't mean we shouldn't try. I was agreeing with GP (u/amelius) because I had the same idea when reading the post, but the parent of your comment (u/simonsarris) makes a good point: we might not know deep learning as well as we might like to know it, given that it is being used in applications that have the potential to kill, but we also don't know our own brains that well.

Even if we don't understand deep learning to the degree that we would like, we can observe its safety record and compare it to humans'.


There's a difference between active and passive use of self-driving with the current technology.

Passive self-driving systems that take over when the human gets distracted/unwell are great, because human vision exceeds computer vision, whereas computers are always alert. This would capture the case you describe; I think it would also be a massive improvement for when bus/lorry drivers collapse at the wheel (Elon Musk used this as a valid use case for Tesla Autopilot in the Tesla Semi unveiling).

However, active self-driving systems (e.g. Tesla's Autopilot) are currently worse because they rely on computer vision and on humans always staying alert.


Every self driving car company is using neural networks. If neural nets are going to be used anyway, I'd rather have centimeter level range information per pixel than not.

By the way, although the article focuses on deep learning, there are many applications that don't involve deep learning. For example, although you can run the deep neural network based SuperPoint on the intensity data, you can also run any classical feature extraction algorithm such as SIFT, SURF, ORB, BRISK, FAST, AGAST, etc. Doing so provides an elegant solution to the problem of localizing in a geometrically sparse but visually rich environment, such as a smooth but well-illuminated tunnel.
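As a concrete (hedged) illustration: if you export one intensity frame from the sensor as a plain grayscale image, any off-the-shelf detector runs on it unchanged. The file name and frame layout below are assumptions, not Ouster's API:

    import cv2

    # Treat the lidar's ambient/intensity image like any grayscale camera frame
    # and run a classical feature detector on it (ORB here; SIFT, BRISK, etc.
    # work the same way). "intensity.png" is a placeholder for one exported frame.
    img = cv2.imread("intensity.png", cv2.IMREAD_GRAYSCALE)
    assert img is not None, "replace with a real exported intensity frame"

    orb = cv2.ORB_create(nfeatures=1000)
    keypoints, descriptors = orb.detectAndCompute(img, None)
    print(f"{len(keypoints)} keypoints")

    # Because every pixel also carries a range measurement, each keypoint can be
    # lifted directly to a 3D landmark, with no stereo matching or depth guessing.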


The status quo is neural nets already are allowed in safety critical systems - humans.

And with the amount of drivers snapchatting behind the wheel, I'd rather take my chances with a self-driving car.


> neural nets already are allowed in safety critical systems - humans.

I’d really like to see a demonstration that human behavior is just neural nets. Sadly, I think that’s still an open question.


Nothing you do will prevent a snapchatting driver from crashing into you. At that point I am sure you will observe with interest how a neural network deals with the situation.


You do realize that traditional computer vision doesn't work 100% of the time, right?

What would you rather use to interpret sensor data?


Somewhat related anecdote: in France, one of the options for toll roads is a device that handles the payment for you and opens the barrier as the car approaches (it works at up to 30 km/h).

A friend of mine was doing that in a new car with some pedestrian detection system that decided to detect the barrier as a human and slam the brakes to a complete stop. From what I've heard, it was not exactly pleasant.


So n=1 and it's not even the same technology.


Delaying the roll out of algorithms if they achieve superhuman performance in avoiding accidents might not be the moral high ground...


A semi-off topic question. Is it not possible to get an accurate depth map based on a two camera stereoscopic setup? Like human eyes? Perhaps combine it with video processing to isolate objects at different depths.


I was watching a talk from Cruise that mentions this. The main problem with cameras is dynamic range. Dealing with different lighting conditions that can change quickly is hard (the sun is really good at washing out colors). Lidar doesn't care about the current lighting conditions.

https://youtu.be/s-8cYj_eh8E?t=22m39s


Also heavy rain would be a problem for regular cameras. Not just seeing through the airborne droplets, but also (at a guess far more significantly) the water directly in contact with the windscreen causing severe random distortions.


I built a hacky prototype, combining:

- FLIR thermal camera

- 3 different small cameras manually set at different settings, models chosen for their qualities handling light levels.

Those four live feeds went into a small Blackmagic Design quad-layout device that turned them into a single HD feed in hardware, in real time. That was fed into a hardware capture device that stacked the quad arrangement, applied some other filters, and did hardware compression. At that point almost no latency had been introduced and I had a nice working base video feed, which was fed into a Linux box for processing.

The quad device created a sort of super-HDR video, and the thermal layer took it to the next level. All of the cameras had drawbacks, but combined, those were minimized.
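For what it's worth, the pure-software analogue of that "super HDR" step today would be exposure fusion over the differently exposed frames, something like the sketch below (file names are placeholders, and the thermal channel would still need its own handling):

    import cv2

    # Merge three frames captured at different exposure settings into one
    # detail-preserving image (Mertens exposure fusion, no HDR calibration needed).
    frames = [cv2.imread(f) for f in ("under.png", "normal.png", "over.png")]
    fused = cv2.createMergeMertens().process(frames)     # float image roughly in [0, 1]
    cv2.imwrite("fused.png", (fused * 255).clip(0, 255).astype("uint8"))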


But heavy rain and snow are problems for lidar, too.


The images need texture and edge detail to pick up on depth. Any portion of an image that is a solid colour would need to be interpolated in some way, possibly inaccurately. Occlusion is also a bit of a problem, resulting in gaps in the depth map that need to be filled somehow.

Here's an old comparison of algorithms. I imagine the state of the art has improved with deep neural nets recently.

http://vision.middlebury.edu/stereo/eval3/

edit: surprise! The page appears to be kept up to date with new algorithms and recent techniques, and indeed the top performer is from 2018.
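For anyone who wants to poke at this, the classical OpenCV baseline looks roughly like the sketch below (rectified input images assumed; parameters are generic defaults rather than tuned values). The holes from textureless regions and occlusions show up as invalid disparities:

    import cv2
    import numpy as np

    # Classical stereo depth from a rectified left/right pair.
    left  = cv2.imread("left.png",  cv2.IMREAD_GRAYSCALE)   # placeholder files
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,      # must be divisible by 16
        blockSize=5,
        P1=8 * 5 * 5,            # smoothness penalties (standard heuristics)
        P2=32 * 5 * 5,
        uniquenessRatio=10,
    )
    disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # fixed-point output

    focal_px, baseline_m = 700.0, 0.12       # assumed camera parameters
    valid = disparity > 0
    depth = np.zeros_like(disparity)
    depth[valid] = focal_px * baseline_m / disparity[valid]   # metres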


It is, and it works great; see Subaru's EyeSight.


This is so brilliant, I'm considering completely switching the type of software/hardware work I've been doing for the last decade.


The blog post is written badly enough that it's hard to figure out what these folks are really doing and what sets them apart. Here's my guess: they are detecting laser returns with a modified camera and using that camera to generate a visual image at the same time as the point cloud. It's still a mechanically spinning lidar at 10 Hz; the range is 120 m, and a resolution of 2048 horizontal samples per 360-degree revolution with 16 beams vertically looks pretty good, though not in a league of its own.


Something I don't understand: I've seen those kinds of videos (I think it was Nvidia, or maybe Waymo) where a signal (camera, lidar, or both) is processed in real time and boxes are drawn around cars, street lights, etc. It seems that outlining real-life objects from sensor data in real time has been working for a long time now.

What does this bring that's new?


Estimating depth and 3D bounding boxes using cameras only, while possible, is much less accurate than using actual range data from a lidar.

On the other hand if you fuse lidar and camera data such as with Waymo and others, there may be issues with the sensors being out of sync (as they run at different framerates, and the lidar continually spins) or physically offset (leading to parallax issues). Dealing with such issues is very difficult. Having a single sensor output both accurate range information and camera data makes it much nicer to work with.


Relevant discussion with some comments from Angus on reddit: https://www.reddit.com/r/SelfDrivingCars/comments/9c60pe/the...


So I'm not up on the lidar industry, but $12k for a sensor seems really expensive; then again, from my casual observations, lidar is just really expensive. Is there a physical / first-principles reason this is true, or is it just a really new technology?


The most general answer to your question is the famous management guru Peter Drucker's observation that every doubling of production of an item (cumulative over its lifetime, not yearly) results in a cost reduction of 20-30%.

So right now, with very few LIDARs produced, we have a high price, which will start dropping as more are produced.
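That experience curve compounds quickly. A quick illustration with made-up numbers (the $12k starting price and 25% learning rate are assumptions, not a forecast):

    # Experience-curve sketch: cost drops ~20-30% per doubling of cumulative volume.
    cost, units, learning = 12_000.0, 1_000, 0.25
    while cost > 500:
        units *= 2
        cost *= (1 - learning)
        print(f"{units:>12,} cumulative units -> ~${cost:,.0f}")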

You might find this interesting: a single transistor used to sell for roughly the equivalent of $8 USD in today's money; today the cheapest ones are 6 cents USD (price checked today on Mouser.com) in qty 1 pricing...

https://spectrum.ieee.org/tech-talk/semiconductors/devices/h...


At quantity 1, though, most of the cost is from the person who has to package it. Qty 100 will give you a much more accurate price.


You can buy 9.6 billion transistors for AU$599 (US$430) in the form of an AMD Threadripper 12 Core 1920X.

That's AU$0.000,000,062,395,833 (US$0.000,000,044,791,667) per transistor;


Each unit has 16 or 64 little laser rangefinders in it. They all have to have uniform response (which, from their pictures, Ouster hasn't achieved yet) and be lined up properly. Eventually somebody will develop high volume ways to do that, but in the prototype stage they're probably hand assembled.

Image orthicons for TV cameras once cost $10,000 each, and color cameras needed three of them, plus another $50K or so of electronics to drive them. Today a cell phone camera costs about $10.


I'm not either, but I think there's no reason for them not to charge a lot of money. I expect the goal is to advance their tech as rapidly as possible while claiming mindshare. The people buying this stuff are generally swimming in investor money, perceived value is often related to price, and it's much easier to lower prices than raise them. So as long as they're selling enough units to get useful real-world feedback that supports their development, in their shoes I'd basically gouge people.


Presumably the price is currently dominated by the effects of low volume production. The market is r&d for new products. When those start going to mass production, then this part can also go to mass production, bringing the price down.


Equivalent Velodyne lidars (the ones commonly used) cost $64k each.

And no, it's my understanding that it's just a matter of volume.


If that were true, they would already be selling a zillion of them.


I get really excited about this technology, yet the not-made-here-syndrome force is strong in this one. I wonder if there's an EU equivalent?


There are many EU lidar companies. SICK, Ibeo, Pepperl+Fuchs, Osram, Innoluce, Blickfeld, and so on.

However, none of those matches the capabilities of the Ouster OS-1 exactly.


Could hundreds of self driving cars cruising around a city with LIDAR affect the eyes of pedestrians?


It would not affect the eyes of pedestrians.

The Ouster OS-1 in the article, as well as all other automotive lidars that I know of, are class 1 laser eye-safe, meaning that it is safe even if you put your eye right up to it for hours.

The power also decreases dramatically once you get far away from it, since the laser beams spend most of their time pointed in different directions, and the collimation is not perfect.


Is it just me, or are we seeing yet another impressive leap in Computer Vision that's soon going to be hyped as an incremental step into Skynet?


Unless you think Skynet is impossible, all impressive leaps are an incremental step towards Skynet.


Not necessarily. My expectation is that Skynet is highly unlikely, a side branch we probably won't take.

Think of the 1920s-1950s version of robots, for example. They were machines shaped like people and that acted like people. In retrospect, they seem not scary but silly. The human shape isn't particularly useful or easy to build; our most common robots are vacuums shaped like hockey pucks.

Skynet is another "what if machines acted like people" fairy tale. It makes sense if you imagine yourself as a computer that wakes up; we wake up all the time, so it seems normal to us. But self-awareness and self-preservation are biological systems that evolved over very long time scales. Those are intricate systems, again not really useful or easy to build. And also not likely to randomly occur.

It could be that we'll build those kinds of systems, of course. But I think it will take a long time to get them right, and then it's not really the skynet story, it's the mad scientist with the robot army story.


None of this technology is new. It's been done to death as terrain following guidance systems for cruise missiles, now reapplied in a civilian context.

It's technology that already exists, but must be reinvented in a non-military context from scratch, since the tech transfer between weapons systems and civilian applications is likely locked up in policy. So, we know that this technology exists, and is proven, but we have to reinvent the wheel, because reasons.

The reason we see this interminable, slow-motion public struggle to bring it to consumer applications is likely that there are no controls in place that can actually prevent "contemporaneous discovery", wink wink.



