Bob's Adventures in Wireless and Video

Sunday, April 14, 2013

Getting through the hype at ISC West

It was interesting to walk the ISC West 2013 show floor and speak with vendors. Two wild claims were being made at this year's show, along with misrepresentations of what vendors were actually showing, that I think integrators need to be aware of.

Video Streaming Over 4G

Example of a matrix view of thumbnails on a VMS
The most outlandish claim came from one VMS vendor who said their product could transmit sixteen megapixel camera streams at 30fps and full resolution in 1Mbps of bandwidth over 3G/4G. This was a flat-out misrepresentation of what they were doing.

There is a huge difference between transporting 16 streams at full frame rate and full resolution and sending thumbnail views of camera streams. Let me explain.

A typical H.264 IP camera with 1.3 megapixel resolution will compress and stream video at roughly a 25.6KB frame size. At 30fps, that is 768KBps, or 6144Kbps, or about 6.1Mbps. This assumes some motion and so on, but it is a general ballpark. By turning up compression, you can get this down to around 2.5Mbps at the cost of image quality (artifacts).
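The ballpark above is just arithmetic, and it is worth checking yourself when a vendor quotes a bitrate. A quick sketch of the math, using the article's assumed 25.6KB average compressed frame:

```python
# Back-of-the-envelope bitrate estimate for a 1.3 MP H.264 camera.
# frame_size_kb is an assumed average compressed frame size.
frame_size_kb = 25.6   # KB per compressed frame (assumption from the text)
fps = 30

kbytes_per_sec = frame_size_kb * fps       # 768 KB/s
kbits_per_sec = kbytes_per_sec * 8         # 6144 Kbps
mbits_per_sec = kbits_per_sec / 1000       # ~6.1 Mbps

print(f"{kbytes_per_sec:.0f} KB/s = {kbits_per_sec:.0f} Kbps = {mbits_per_sec:.1f} Mbps")
```

Real cameras vary the frame size with motion and scene complexity, so treat this as an average rate, not a guarantee.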

In a typical security configuration the cameras send a primary stream over a high speed local area network to the VMS server without concern for bandwidth, recording at best quality and frame rate. Sending a 2.5 Mbps, or even a 6Mbps stream, between the camera and the server is not usually an issue. However, sending from the server out to a remote client such as a smartphone over a narrow cellular link can be an issue.

Example of how HauteSpot MVE Connects
If you are using an LTE phone you may be lucky and get 8Mbps or more down, depending on network conditions, but this is rarely sustainable over long periods. A more likely downstream speed is around 1 to 2Mbps.

If you tried to stream even one H.264 1.3 megapixel stream at full resolution, full frame rate and full image quality, the video would have little chance of getting through in real time. You would need to buffer for a long time (think YouTube, Hulu, etc.). Remember that most broadcast streaming is non-realtime, and those services use multipass encoding techniques to improve quality and reduce the size of the streams they send.

Further, a 1080p monitor can display 1920x1080 pixels, or approximately 2 megapixels, at any given time. So sending any more data than this from the server to your client is really useless.

So what most VMS systems do is set up a second, lower resolution stream from the camera for live viewing, while still recording the primary stream at full resolution. When someone connects to the VMS from a cell phone, they first get a thumbnail of each camera using the small substream. Then, when they click on a camera to look at it in detail, the VMS shifts from the small secondary stream to the primary full resolution stream. Even this may be scaled and transcoded on the server to fit the size of the device's screen.

So really, what you are getting from the VMS on the smartphone client is a 2 megapixel or smaller transcoded rendering of your cameras. When you are looking at 16 cameras, you are not receiving all 16 streams at full resolution; you are receiving either a bunch of small substreams or a single transcoded stream representing all of the substreams in 2 megapixels. This may then be additionally compressed to fit into a 1Mbps pipe. Think of it as an "image of your streams".
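The pixel math behind a 16-camera matrix view makes the point concretely: on a 1080p client, a 4x4 grid leaves each camera only a small tile, no matter how many megapixels the camera itself captures.

```python
# How many pixels each camera actually gets in a 16-up matrix view
# on a 1080p client display.
display_w, display_h = 1920, 1080
grid = 4  # 4x4 grid = 16 cameras

tile_w = display_w // grid   # 480
tile_h = display_h // grid   # 270

megapixels = tile_w * tile_h / 1e6
print(f"Each camera tile: {tile_w}x{tile_h} (~{megapixels:.2f} MP)")
```

So each "camera" in the mosaic is about 0.13 megapixels of actual picture, which is why a substream or a single composited stream is all that needs to cross the cellular link.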

Many VMS products do this, and the technique has existed for years. It is a commonly known method that is well documented in the public domain.

What is missing from most vendors' implementations are several critical elements, which are addressed by the HauteSpot MVE system:
  • Inbound Mobile Ingest of Cameras - It is all well and good to be able to view your cameras remotely over 4G, but connecting your cameras to the VMS over cellular is a different story. This requires uplink speed, which rarely exceeds 500Kbps and can be extremely variable. MVE dynamically adjusts the uplink rate to fit the available bandwidth; it is not preset to a low "live view" rate and size. If you have good bandwidth, you get a better image.
  • Persistence - Most VMS systems have no ability to deal with the constant disconnects of cellular. Roaming between cell towers, changing IP addresses, and lost signal all contribute to loss of connection. MVE makes and keeps persistent connections that have been proven to work on cellular networks in the worst conditions (Hurricane Irene, Tropical Storm Sandy, floods, and hot weather).
  • Remote Record with Transfer - The HauteSpot microNVR allows remote recording at full frame rate and resolution at the camera source. It then provides a persistent connection to the MVE server and dynamic rate adaptation that adjusts to changing available bandwidth. It also allows remote, chain-of-evidence transfer of the high resolution video from the remote site in batch.
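MVE's actual rate-adaptation algorithm is not public, but the general idea described above can be sketched as a simple control loop: measure what the cellular uplink can actually carry, then steer the encoder's target bitrate toward it with some headroom. Everything here (the 20% headroom, the smoothing factor, the limits) is an illustrative assumption, not MVE's implementation.

```python
# Hypothetical sketch of dynamic uplink rate adaptation for a cellular
# camera link. All constants are illustrative assumptions.
def adapt_bitrate(current_kbps: float, measured_throughput_kbps: float,
                  floor_kbps: float = 64, ceiling_kbps: float = 4000) -> float:
    """Pick a new target encoder bitrate from measured link throughput."""
    target = measured_throughput_kbps * 0.8          # leave ~20% headroom
    # Move halfway toward the target instead of jumping, to avoid oscillation.
    new_rate = current_kbps + 0.5 * (target - current_kbps)
    return max(floor_kbps, min(ceiling_kbps, new_rate))

# Example: the link degrades from ~2 Mbps to ~400 kbps, then partly recovers.
rate = 2000.0
for throughput in (2000, 1200, 400, 400, 900):
    rate = adapt_bitrate(rate, throughput)
    print(f"measured {throughput:4d} kbps -> encode at {rate:.0f} kbps")
```

The point of the floor is the one the bullet list makes: the stream degrades gracefully rather than being preset to one low "live view" size, and it climbs back up when bandwidth returns.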
There is a big difference between transmitting full resolution, full frame rate, full quality video over cellular and sending a transcoded or down-res version of video from megapixel sources. Know what you are getting.

VMS IP Video Decode Capacity Vs Record Capacity

As we walked the show we asked a simple question of a number of VMS vendors: "How many 1080p 30fps live streams can you simultaneously display (assuming a multiple monitor video card on the system)?" The answers ranged from 4 to 64, and they were coming from field application engineers, not salespeople.

A VMS provides a number of functions, including recording video streams to disk, displaying live and recorded streams, providing search capabilities, and so on. The answers we got indicated that the vendors really did not understand the difference between storing video and decoding video for display.

Again, IP VMS systems receive incoming video streams from cameras. These are typically H.264 or MJPEG streams with a fixed frame rate, resolution and compression profile as set on the camera. Most VMS systems receive the video stream and write it to disk. This function does not require significant CPU or GPU resources, since it is really just network IO. How many streams the system can support is limited only by the bandwidth of the network, since disk IO is generally faster than network IO.
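Because recording is network-bound, the ceiling on recorded streams is easy to estimate. A quick sketch, assuming a single gigabit NIC with (an assumed) 80% usable after protocol overhead, and the ~6.1Mbps full-quality stream from earlier:

```python
# Estimate how many full-quality camera streams one gigabit link can ingest.
# The 80% usable fraction is an assumption covering protocol overhead.
link_mbps = 1000          # gigabit Ethernet
usable_fraction = 0.8     # assumed usable share of the link
stream_mbps = 6.1         # full-quality 1.3 MP H.264 stream (from earlier)

max_streams = int(link_mbps * usable_fraction / stream_mbps)
print(f"~{max_streams} streams of {stream_mbps} Mbps on one GigE link")
```

Well over a hundred streams on one NIC, in other words, which is why "how many cameras can you record" and "how many streams can you display" are such different questions.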

Tom's Hardware Benchmark for H.264 encoding
Displaying a live or recorded stream does require CPU and/or GPU resources. A stream must be read and decoded for display. This is significant work for a computer.

The chart on the left shows just how much time it takes to encode 1080i video using hardware and software. This is an example of reading a raw source and sending it to a client to view; performance is similar for decoding. As you can see, using hardware such as the Intel Quick Sync technology built into Ivy Bridge based systems can significantly improve performance. Without Quick Sync, software encoding/decoding consumes approximately 25-50% of a quad core CPU for just one 1080p 30fps stream. Using high end video cards helps some, but they are targeted at single or maybe dual display systems, so they don't try to decode more than what they are capable of displaying.

If you were to use the hardware acceleration of Quick Sync, you could get about 120fps of 1080p video encoded or decoded for playback. So the best possible case for displaying full 1080p at 30fps is 4 simultaneous streams, and that is using hardware. If you are using software to decode/encode, as most VMS systems do, you are really down to about 1 stream, which makes sense given that most VMS systems only have one display attached.
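The decode-capacity arithmetic is worth spelling out, since it is the whole argument against the "64 simultaneous streams" answers:

```python
# Decode capacity: Quick Sync benchmarks at roughly 120 fps of 1080p H.264,
# and each live view needs 30 fps.
quick_sync_fps = 120   # approximate hardware decode throughput at 1080p
stream_fps = 30        # per live stream

hw_streams = quick_sync_fps // stream_fps
print(f"Hardware (Quick Sync): ~{hw_streams} simultaneous 1080p30 streams")

# Software decode at ~25-50% of a quad-core CPU per stream leaves room for
# roughly one full-quality stream once the VMS's other work is counted.
```

Four streams with hardware acceleration, roughly one without; anything far above that is either substreams, dropped frames, or marketing.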

So the vendors that claimed to be able to simultaneously decode 12 or even 64 streams are clearly not correct.

If you are writing a file to disk, or reading a file from disk and sending it out over the network, then yes, you can support many cameras. If you are trying to actually display the video, the capacity is much, much lower.

It is interesting that only a couple of products, like Network Optix HD Witness or HauteSpot's MVE system, take advantage of Intel Quick Sync hardware acceleration to increase performance and are designed for high definition video processing.

This is actually a good thing for companies like RGB Spectrum who make display wall processors that allow you to aggregate multiple server video outputs into a single consolidated display. Solutions like their QuadView HDx allow you to consolidate multiple video sources into one screen, so even if you can only get a single 1080p stream out of your VMS, you can combine it with other servers to create faster, full frame rate, full resolution systems. Watch for more on high resolution display, particularly 4K, when I review NAB 2013 in my next blog.


  1. Small comment on the number of video streams you can decode:
    a personal test with 1920x1080 p30 H.264 @ 6Mbps
    on Sandy Bridge gave 8 streams with no problems (I did not have enough screens/video outputs to show further images).

    Also, encoding/decoding H.264 is not symmetrical; a good encoder is much more "costly" than a decoder.

  2. Glad to see we weren't the only ones to notice the misrepresentation. Seems that the more things change, the more they stay the same: our industry runs rampant with false claims and snake-oil salesman types, and it appears to be common even among the biggest names in the industry.

  3. We stopped in at the Network Optix booth and had a discussion/demo with their CTO, who showed us frame stuttering and loss at more than 4 streams during decode. He pointed at 4 being the max. His video was by far the smoothest, and he demonstrated and could explain exactly what was going on. Other VMSs we looked at were clearly dropping frames at the same rates. So most of the evidence we saw supported 4 streams as about the limit. We need to do our own test some time, I suppose.

  4. Great description Bob! I've looked at some of the claims and wondered how truthful they were. Now I fully understand! Thanks!