
Audio/Video on the Network

Overview

Use of the global network marches on to ever more bandwidth-intensive traffic. The original long-range networks predominantly carried text messages, documents and e-mail. The use of FTP extended this to comparatively large numeric research databases and static graphic files. The advent of HTTP (the World Wide Web) made the combination of text, graphics and pictures the norm. Webcams added sequential pictures. RealAudio added stored and streaming sound. Lately RealVideo, Windows Media Player, QuickTime and other applications have made stored and streaming video the bandwidth hogs of the network. Video has not totally captured the bandwidth hog title; Napster, Kazaa and other applications have spurred the progress from the exchange of short sound files to full CDs. But the combination of motion pictures and audio can, without a doubt, form the largest repository of commonly exchanged information. And what peer-to-peer applications have done for music can easily be extended to DVD movies and digital video captures.

What the paper covers

This paper is written to assist MOREnet’s customers in making informed choices about how to handle audio/video traffic over the MOREnet network. The customers could be individual audio/video enthusiasts who want to get the best quality video while surfing the net, audio/video producers who want to share the best product possible or network administrators who need to know how much streaming audio/video traffic their WAN links can support. All groups predominantly deal with audio/video offerings from companies like Microsoft, Apple and RealNetworks.

What the paper does not cover

This paper is not written to cover dedicated audio/video connections and specialized research networks. A market of hardware, software and standards exists for broadcast quality (TV or movie studio) video. Very few networks need to deal with streaming broadcast quality video traffic and those that do are specifically designed for the task. High definition television (HDTV) broadcast quality delivers 1920 by 1080 pixels of video at 60 frames per second, including seven channels of synchronized audio. Some streaming HDTV variations are certainly possible over gigabit Ethernet in a home or dedicated LAN but not on the shared WAN links that make up MOREnet. Streaming HDTV has been tested on the Internet2 (I2) network at a sustained rate of over 200 Mbps per channel (see http://qos.internet2.edu/houston2000/proceedings/Gray/20000209-QoS2000-Gray.pdf). As network speeds increase, broadcast quality video may become a feature of the network, so the topic is mentioned in the Future section below.

This paper does not cover any of the audio/video transports involving the H.320 or H.323 standards. MOREnet has a group to assist with H.323 video conferencing for customers. Also, the emerging standards for IP phones using H.323 have previously been covered.


Background

General history

What consumes the net

Nothing > Text > Graphics > Photo > Audio > Animation > Video > Movie

When the first two computers were networked, they communicated what computers always communicate: bits. The bits can represent anything that can be digitized (i.e., converted to bits). But on the new and slow networks, human-to-human communication stayed with text because it was easy to digitize and transport. Text was transmitted in files, remote keyboard entries (telnet) and eventually worldwide e-mail systems. Varieties of text have grown over time: various markup additions have added color, font and typestyle, and text description languages like PostScript and TeX have dramatically expanded what computers can do. Even so, text is still the lowest common denominator in communication. It is efficient to transmit and can carry lots of meaning. Instant messaging has added short bursts of text to the network. Push technologies have added continuously updated text in applications like stock market tickers, but the network impact is still minimal.

But text was not enough. Humans also communicate with various types of images. The easiest image form to communicate on the growing computer networks was line drawings. Vector-based graphic languages like HPGL (Hewlett-Packard Graphics Language) and CAD (Computer Aided Design) files were some of the first graphic file exchanges. Other raster-based (lines of pixels) graphic formats developed to handle simple graphics that could not be represented as lines on a plotter. The communication of line graphics has not advanced much since then. Occasionally, a new line graphic standard is proposed for computer networking, but such standards rarely impact the traffic on existing networks. CAD formats have grown to include 3-D modeling with rich visual textures. Raster-based graphics have advanced in leaps and bounds with increasing resolution, depth of color and compression.

As graphics became more prominent, scanners started to load the network with art and photographic images. These images were first transferred as files by the File Transfer Protocol (FTP) then later attached to e-mail and a combined text-image form called the Hypertext Markup Language (HTML) on the World Wide Web. Photography and other raster imagery may not be the majority of the visual space on webpages but they far exceed text in their impact on the network. Many people have learned to surf or sort e-mail faster by turning images off or using text-only browsers. Many webpages are now designed where the graphics are required to understand the meaning or navigate the pages. Even traditional text services like e-mail have added pictures, graphics and animation.

Audio files were exchanged and music backgrounds were added to webpages. Initially sound was captured by directly digitizing the audio input into .wav files. Later music formats such as MIDI allowed audio to be expressed as instructions to a musical instrument. Sound has advanced to multiple channels (stereo and more), higher fidelity and more complex compression types. Peer-to-peer sharing of audio files has a significant impact on modern networks. Internet radio adds lots of streaming audio to the network but relatively few people bother to tap in.

Bandwidth grew and graphics became commonplace, so some users began to use dynamic or animated graphics. A common image format called Graphics Interchange Format (GIF) added the ability to change the image displayed, and another format, JPEG, was extended to motion-JPEG (M-JPEG). Animation itself has not had a major impact on the network. Animated GIF and M-JPEG files are many times larger than a simple image of the same screen size, but animation is harder to produce and difficult to use well. Webpages that overuse animation are considered jokes in many groups. But animation in limited quantity can add visual interest to a webpage. Many pages have "rollover" animation, and more extensive animation is often used in advertisements.

Inexpensive webcams led to images that update at fixed frequencies. The digital capture of video led to video files and streaming video on the network. While the simplest digitized photography can dwarf text in the impact on storage and the network, updating that image multiplies the problem. Updates have escalated from every minute to ten seconds, one second, 1/10 of a second, 1/30 of a second and 1/60 of a second. Compression techniques can reduce the impact but can never completely mitigate the network stress.

Standards that allowed video capture and transmission also integrated audio and video. DVD readers can flood the network with movies the way that audio CD readers and peer-to-peer networks have flooded the net with music transfers. The typical audio file is listed in kilobytes; the typical video file is listed in megabytes; the typical DVD is listed in gigabytes.

Network history shows continual advances in the speed and robustness of computer networks. It also shows that advances in bandwidth go hand-in-hand with increasing access, usage and complexity of expression. Understanding the advances in audio and video usage provides a glimpse not only of what currently stresses the network but also what is to come.

FourCCs

The growth of audio/video in computing can be roughly traced by the proliferation of the four character codes (FourCC) given in the headers of graphic files. Graphic standards have always been somewhat loose, and historically a four character code was placed at the start of the graphic file to indicate what software (codec) generated the file. As more and more codecs were created, information on the character codes was collected and indexed as de facto standards. For most users, file extensions such as .mov, .avi, .mp3 and .jpg are accepted as the file format standard, but many of the file extensions themselves incorporate multiple FourCCs within their specifications. For example, an .avi file can be created by over 40 different standards using multiple compression methods and format options.

For further information, go to http://www.fourcc.org/ and http://www.am-soft.ru/fourcc.html (for .avi information).
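As a rough illustration of where a FourCC lives, the sketch below scans the bytes of an .avi file for stream header ("strh") chunks and reads the two four character codes that follow: the stream type (such as "vids" for video) and the handler, which names the codec. It assumes the usual RIFF/AVI layout. The function name `find_stream_fourccs` and the simple byte search are illustrative shortcuts rather than a complete parser, and the sample bytes are synthetic, not taken from a real file.

```python
import struct


def find_stream_fourccs(data: bytes):
    """Return (stream_type, codec_fourcc) pairs found in AVI bytes.

    Assumption: each stream header chunk is laid out as
    b"strh" + 4-byte little-endian size + fccType + fccHandler.
    This is a plain byte scan, not a full RIFF chunk walker.
    """
    if data[0:4] != b"RIFF" or data[8:12] != b"AVI ":
        raise ValueError("not a RIFF/AVI file")
    results = []
    pos = data.find(b"strh", 12)
    while pos != -1 and pos + 16 <= len(data):
        stream_type = data[pos + 8:pos + 12].decode("ascii", "replace")
        codec = data[pos + 12:pos + 16].decode("ascii", "replace")
        results.append((stream_type, codec))
        pos = data.find(b"strh", pos + 16)
    return results


# Synthetic, minimal AVI-like header with one video stream (hypothetical data).
strh = b"strh" + struct.pack("<I", 8) + b"vids" + b"DIVX"
body = b"AVI " + b"LIST" + struct.pack("<I", 4 + len(strh)) + b"hdrl" + strh
sample = b"RIFF" + struct.pack("<I", len(body)) + body

print(find_stream_fourccs(sample))  # [('vids', 'DIVX')]
```

For a real file, the same function could be given the first few kilobytes read in binary mode; two .avi files with the same extension may report different codec codes, which is why one plays and the other does not.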

Lossless Formats

One critical feature of the various codecs and compression standards affects network usability and file quality: whether or not data is discarded in the preparation of the file. Creation or compression methods that preserve all the original data are called "lossless." Creation or compression methods that discard some data to produce a more compressed file are called "lossy." Computer users have become used to lossless compression utilities like ZIP and RAR because loss of data during compression/archival/decompression is unacceptable for spreadsheets, databases and text files. However, lossless methods are rarely used for audio/video files and graphics. Files with the extensions .wav, .bmp and .gif, along with some .jpg and .mov files, are produced and compressed by lossless methods. One example of a lossless compression method is Run Length Encoding (RLE), where long strings of equivalent values are replaced by one statement of the value and an indication of how long the value is repeated. Think of compressing a voice conversation by taking out periods of silence between words and replacing them with some code that indicates that there is a pause followed by how many milliseconds the pause should be.
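The RLE idea can be sketched in a few lines. This is a minimal illustration, not the encoding used by any particular file format; the function names and the sample values are hypothetical.

```python
def rle_encode(data: bytes):
    """Collapse each run of equal byte values into a (value, run_length) pair."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in runs)


# Two loud samples, six samples of "silence" (zeros), one more loud sample.
samples = bytes([7, 7, 0, 0, 0, 0, 0, 0, 9])
encoded = rle_encode(samples)
print(encoded)  # [(7, 2), (0, 6), (9, 1)]

# Lossless: decoding restores every original byte.
assert rle_decode(encoded) == samples
```

The round trip returns exactly the input, which is what makes the method lossless. The saving depends entirely on how long the runs are; data with few repeated values can come out larger than it went in.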

Lossy Formats

The number and utility of lossy creation and compression methods are a testimony to the stress that the large file sizes and high bandwidth of audio/video have placed on computers and networks. Lossy methods take away some data each and every time they are used. Below is a graph of the size of a compressed video file as it goes through many compression/decompression cycles.

File size vs. repeated compressions

Here are the specifics of the compression.

Video compression specifics.

The decompressed files are all the same length because they contained the same number of frames and pixels. The difference is in the richness of colors (larger "color palette" prior to compression) and image detail. Many years of extensive research and development have attempted to answer the question: what data can be discarded in audio/video processing and still produce an acceptable recording? The MPEG standards used for audio and video files have been refined many times for many purposes. The goal has been to achieve greater compression ratios (size of uncompressed file vs. the size of the compressed file) by eliminating the redundant or unchanged information between frames and deleting pixels or changes that are not normally noticed by a human observer. Eliminating the redundant information is a lossless compression technique similar to the RLE example above. Deleting information that is not normally noticed is what can reduce color richness and image detail.
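To make the size of the problem concrete, the sketch below computes the uncompressed size of a short clip and the compression ratio against a compressed file. The 24-bit color depth, the 30 frames per second and the 1,600,000 byte compressed size are assumptions chosen for illustration, not measurements from this paper, and audio is ignored.

```python
def raw_video_bytes(width, height, frames_per_second, seconds, bytes_per_pixel=3):
    """Uncompressed video size: every pixel of every frame is stored in full."""
    return width * height * bytes_per_pixel * frames_per_second * seconds


def compression_ratio(uncompressed_bytes, compressed_bytes):
    """Size of the uncompressed file divided by the size of the compressed file."""
    return uncompressed_bytes / compressed_bytes


# Assumed clip: 320x240 pixels, 24-bit color, 30 frames per second, 100 seconds.
raw = raw_video_bytes(320, 240, 30, 100)
print(raw)                                 # 691200000
print(compression_ratio(raw, 1_600_000))   # 432.0
```

The uncompressed size depends only on the frame count and pixel count, which is why every decompressed file in a test like the one above comes out the same length regardless of how much detail was discarded.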

Human perception is tunable and tricky to evaluate. Developments in fiber optic transmission and digital audio have underscored that the perception of silence can affect a conversation. Humans have become used to a certain amount of background noise when making a telephone call. When the first long distance fiber optic networks were in production, many callers would perceive silence as a lost connection rather than a pause in the conversation. Digital voice standards now have a "comfort noise" setting that will fill pauses with a low volume random noise ("white noise") to indicate that a connection still exists. Any lossy compression represents a trade-off between quality and file size/bandwidth.

Lossy compression or creation standards vary greatly by the intended bandwidth of the output. Image details that are noticeable in a 1024 by 1280 pixel window can be easily eliminated for a 240 by 320 pixel streaming file. Dropping the frame rate from 60 frames per second to 25 frames per second may allow viewers on slow links to get the transmission. Even compressed files can be created or compressed to a specific window size, frame rate and audio quality. Video codecs list the intended bandwidth of the stream or file created.


Current

Winners

The use of graphics and audio/video on the Internet changes with new developments and fads. The hot file types or technologies can change rapidly but here is a current unscientific guess:

  • JPG is the predominant graphic file type and it is likely that most of the files were generated by one of the lossy methods of creation.
  • MP3 is the predominant audio file type transmitted. This is hard to gauge because much of the audio file traffic is hidden.
  • RealAudio is the predominant audio streaming method.
  • MPG files are the predominant type of video, but they are produced by a wide mixture of codecs. Taken as a group, the types that can be played by the Windows Media Player clearly predominate.
  • Streaming video is split between Windows applications and RealNetworks applications.

Impact

Bandwidth Examples

Webcams

Webcams operate at anywhere from 1 second to 10 minute refresh and generally use TCP port 80 (World Wide Web) or a high TCP port connection. The slow refresh rate and lack of sound mean minimal impact on the connection. Faster refresh rates, larger images and richer image detail will use proportionally more bandwidth. In some cases, the refresh rate is not obvious. The image may be captured from the cam at a lesser rate than the web application is serving the images over the network. Thus, the same image is sent over and over before it changes. TCP is used because timing is not critical and knowledge of the status of the receiver is. Most webcam servers try to conserve their bandwidth by dropping connections that are not acknowledged by a return TCP ack packet. Websites use software like Webcam32, Webcam 1-2-3 or INetCam to capture and publish cam images on the web.

Screen of PacketPup analysis

Testing shows that slow-refresh webcams use about 13 kilobits per refresh for a typical 320x240 pixel image. The usage varies by the type of image and how well it compresses using the JPEG standard. The test webcams above were chosen for their rich detail in the image, which limits the effectiveness of the compression. With such an image it would take 1200 webcams refreshing at 10 second intervals to consume a T1 connection.
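The T1 estimate can be checked with simple arithmetic. The sketch below assumes a T1 line rate of 1.544 Mbps and ignores protocol overhead; the function name is illustrative.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate


def webcams_to_fill(link_bps, bits_per_refresh, refresh_seconds):
    """Number of webcams whose combined average traffic equals the link rate."""
    per_cam_bps = bits_per_refresh / refresh_seconds
    return link_bps / per_cam_bps


# 13 kilobits per refresh, one refresh every 10 seconds.
count = webcams_to_fill(T1_BITS_PER_SECOND, 13_000, 10)
print(round(count))  # 1188
```

Each camera averages 1.3 Kbps, so just under 1200 of them add up to the T1 rate. Halving the refresh interval or doubling the image size halves the number of cameras the link can carry.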

RealVideo

RealVideo streaming uses known TCP ports and high UDP ports. It is unusual, but possible, for servers and clients to use TCP port 80 or otherwise mask their traffic. Streams can be served at multiple bandwidths or kept for download, depending on the preparation of the file. The following is a chart of file sizes vs. selected bandwidth for a 100 second 320x240 pixel video.

RealVideo: File size vs. bandwidth.

The video quality varied from unwatchable at 12 Kbps to good quality at 128 Kbps. For a typical dial-up user, better quality can be delivered by preparing a file for downloading rather than using a video server.
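The trade-off for a dial-up user can be estimated from the encoding rate alone. The sketch below assumes that 1 Kbps means 1000 bits per second, that the file size is simply rate times duration, and that the dial-up link delivers a full 56 Kbps; real modems usually deliver less, and the function names are illustrative.

```python
def stream_file_bytes(kbps, seconds):
    """Approximate file size for a clip encoded at a constant bit rate."""
    return kbps * 1000 * seconds // 8


def download_seconds(file_bytes, link_kbps):
    """Time to transfer a file over a link, ignoring protocol overhead."""
    return file_bytes * 8 / (link_kbps * 1000)


DIAL_UP_KBPS = 56  # assumed best-case modem speed

for rate in (12, 128):
    size = stream_file_bytes(rate, 100)
    wait = download_seconds(size, DIAL_UP_KBPS)
    print(f"{rate} Kbps: {size} bytes, about {wait:.0f} s to download")
# 12 Kbps: 150000 bytes, about 21 s to download
# 128 Kbps: 1600000 bytes, about 229 s to download
```

Under these assumptions a dial-up user cannot stream the 128 Kbps version in real time, but can download it in under four minutes and then watch 100 seconds of good quality video.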

Windows media streams and files

Like RealVideo, Windows video streams and files are produced for various bandwidths. Low cost conversion utilities are available to resample a file from larger to smaller screen sizes so that reasonable quality video can be delivered to low-bandwidth users. More compression settings are available, so files prepared for the Windows Media Player can vary in quality and screen size for the same bandwidth. Windows streaming is known for being difficult to detect at startup. The following two screenshots show how a protocol analyzer initially misreads the Windows stream and then correctly identifies the traffic. The difference can be seen in the lower line chart. The first screenshot is from the delivery of an .asf format file.


The second screenshot is from a webpage-embedded .wmv format file.



Future

More of the same

In the future, we can expect larger quantities of the same material that is filling the networks now. Graphics, both integrated and downloadable, will be getting larger and more numerous. Gigapixel images are being generated (http://www.tawbaware.com/maxlyons/gigapixel.htm), and 360 degree panoramas are becoming common (http://www.panoramas.dk/fullscreen2/full41.html). With digital photography becoming increasingly available to consumers, the steps between photography and web publishing are much easier. More people enter the webcam community every day and there are many multiple webcam families. Webcams are now fulfilling a function previously reserved for expensive video-phone systems.

The popularity of exchanging music is proven. Legal and network administration problems may keep .mp3 file exchanges down for the short term but they will come back. It seems like every garage band has its own website including samples that have no exchange restrictions. For less than $500 a year, a band, would-be radio personality or community group can make audio tracks available internationally. For a little more they can broadcast live.

Continuing trends

Digital video cameras are becoming available and we can expect that the video trend will follow the graphics trend. Streaming digital video will start to replace webcams and chat with continuous motion and integrated sound. The underground file exchange community will transition from music to captured video and DVDs. The quality of the video will increase, possibly to the broadcast quality previously mentioned.

Combination/Integration

Websites will integrate more A/V material and A/V players will become full browsers. It is not hard to find webpages with an audio background, and pages with embedded video are becoming common. RealPlayer and Windows Media Player have led the way in becoming special purpose browsers first, and then more general browsers. Here is an example from RealPlayer.

RealPlayer window

Bandwidth increase

The available bandwidth on networks continues to climb as the cost of higher speed circuits declines. The total bandwidth of consumer access circuits is growing due to increasing online population, increasing usage and the movement from dial-up to DSL or cable where available.

Video storage

CNN has proven the popularity of stored and streaming video. Other mass market outlets will follow and public access stations will follow them. Many public and private video stores will be created for archival video or educational use. There are already companies that serve video files on purchase or as a pay-per-view scheme.

DVD exchange

As noted earlier, DVDs represent a huge repository of high bandwidth video that will be crossing future networks. Napster-like serving of DVDs is quite possible. And there are also many legal and legitimate reasons to send gigabits of video files or streams over the network.


Recommendations

Production

  • In producing audio/video files, archive a lossless original file. The file will be large but as available bandwidth increases, having it will allow updating the streaming media or download file. Any file splicing or editing should be done on a copy of this original file. Decompressing a lossy file to edit and compress again multiplies the amount of data discarded in the procedure.
  • Be careful to provide the legal context of the A/V file. If it is copyrighted, it should be properly marked and protected. Streaming the content does not protect it because there are multiple stream capture routines available.
  • Downloadable A/V files are generally a better option than A/V embedded in webpages or streaming A/V servers. Consider making the data file or audio/video stream available in several bandwidths so viewers can choose one that makes good use of their connections.
  • It is handy to list the codecs and options that you used to create A/V files to help users troubleshoot unreadable downloads.
  • Match your use of webcams with your available bandwidth. Serving webcams probably will not overload a network, but even 13 kilobits per refresh can add up if the subject becomes popular.
  • Match your use of A/V streaming to your available bandwidth. Your streaming software should tell you what the network requirements are per stream. On the high end, www.videolan.org recommends 0.5 to 4 Mbps for an MPEG-4 stream and 6 to 9 Mbps for a DVD.
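Matching streaming to available bandwidth comes down to dividing the usable part of the link by the per-stream rate. The sketch below is a rough planning aid under stated assumptions: the 75 percent usable fraction is an arbitrary allowance for other traffic, the 10 Mbps link is hypothetical, and the function name is illustrative.

```python
import math


def max_streams(link_mbps, stream_mbps, usable_fraction=0.75):
    """Whole number of simultaneous streams that fit in the usable share of a link."""
    return math.floor(link_mbps * usable_fraction / stream_mbps)


T1_MBPS = 1.544  # nominal T1 line rate

print(max_streams(T1_MBPS, 0.5))  # 2  (low-end MPEG-4 streams on a T1)
print(max_streams(T1_MBPS, 6))    # 0  (a DVD-rate stream does not fit on a T1)
print(max_streams(10, 6))         # 1  (hypothetical 10 Mbps link)
```

The per-stream figure should come from the streaming software in use; the rates shown are the low ends of the ranges quoted in the list above.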

Consumption

  • When listening to or watching audio/video material over the network, keep in mind that bandwidth and hard disk space are limited resources.
  • Audio and video file formats are complex. Computers may be able to play some .avi files but not others. To troubleshoot problems such as the inability to read an .avi file, get an analysis program like GSpot which can determine what codec the system might be missing.
  • Be careful of the legal context of A/V material. Is the material copyrighted? Are you within the fair use of the material? Finding the material on the net does not mean that you can legally keep it or even view it.
  • File downloads are generally easier on the network and provide a better quality listening or viewing experience than streaming video.

Network administration

  • In managing the bandwidth of audio/video transmissions, keep in mind that information is power. Network administrators must have some way to determine what traffic is on their networks. MOREnet provides some online tools to determine network utilization and a rough breakdown of the types of traffic. A local bandwidth analysis device or a smart switch to provide content analysis might be a good investment.
  • Control some traffic by using a well publicized acceptable use policy, firewall, caching device, bandwidth management device or a combination of these solutions. MOREnet has network quality of service information and consulting available.


Conclusion

Networks are often cursed by their own success. The Internet is vastly more educational and useful due to the graphic and audio/video content that is distributed. But the audio/video transfers choke networks that cannot become faster and smarter. MOREnet has experienced the collision of changes in network usage and demands for network quality. Audio/video producers, consumers and network administrators all need to consider the impact on the network in reacting to demand for quality delivery of material on a growing but still limited resource.


Acknowledgements

This paper was prepared using commonly available tools: some of the tools that came with Microsoft Windows XP, SnagIt Studio, PacketPup, AVSVideoConverter and GSpot.


Copyright © 2002 MOREnet. All rights reserved. Reviewed February 3, 2004.
Contact strategic-tech@more.net. DMCA and other copyright information.