Saturday, December 17, 2016

What kinds of projections do YouTube and Facebook use for VR?

Both platforms use the H.264 video coding standard and the MP4 container [1], but they use different projections:

Facebook uses the cubemap projection [2].
YouTube uses the equirectangular projection [3].


References
[1] Qian, Feng, et al. "Optimizing 360 video delivery over cellular networks." Proceedings of the 5th Workshop on All Things Cellular: Operations, Applications and Challenges. ACM, 2016.
[2] Under the hood: Building 360 video. https://code.facebook.com/posts/1638767863078802.
[3] YouTube live in 360 degrees encoder settings. https://support.google.com/youtube/answer/6396222.

VR Forecast

Virtual Reality (VR) technology is projected to form a $120 billion market by 2020 [1].


References
[1] Augmented/Virtual Reality revenue forecast revised to hit $120 billion by 2020. http://goo.gl/Lxf4Sy.

Friday, December 16, 2016

OpenVQ

OpenVQ is an independent implementation of the Perceptual Evaluation of Video Quality (PEVQ) metric described in ITU-T J.247, Annex B. Due to patent issues, OpenVQ does not implement PEVQ's temporal alignment impairments. OpenVQ is released under the terms of the GNU Affero General Public License (AGPL) version 3.

https://bitbucket.org/mpg_code/openvq

Thursday, December 15, 2016

Quality metrics for virtual reality videos



  • Quality Metric for Spherical Panoramic Video [1]
  • A Framework to Evaluate Omnidirectional Video Coding Schemes [2]
  • A perceptual quality metric for high-definition stereoscopic 3D video [3]
  • Towards reliable and reproducible 3D video quality assessment [4]



References
[1] Zakharchenko, Vladyslav, Kwang Pyo Choi, and Jeong Hoon Park. "Quality metric for spherical panoramic video." SPIE Optical Engineering+ Applications. International Society for Optics and Photonics, 2016.
[2] Yu, Matt, Haricharan Lakshman, and Bernd Girod. "A Framework to Evaluate Omnidirectional Video Coding Schemes." Mixed and Augmented Reality (ISMAR), 2015 IEEE International Symposium on. IEEE, 2015.
[3] Battisti, F., et al. "A perceptual quality metric for high-definition stereoscopic 3D video." SPIE/IS&T Electronic Imaging. International Society for Optics and Photonics, 2015.
[4] Goldmann, Lutz, and Touradj Ebrahimi. "Towards reliable and reproducible 3D video quality assessment." SPIE Defense, Security, and Sensing. International Society for Optics and Photonics, 2011.

Thursday, December 1, 2016

Quality-Bitrate Curve Fitting

It seems that PSNR-bitrate curves are better modeled by the formula "PSNR = a * log(b * Rate + c)". For SSIM-bitrate curves, "SSIM = a * (Rate ^ b) + c" seems to work better.
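As a sanity check, both formulas can be fitted with SciPy's curve_fit. The rate points and parameter values below are synthetic, chosen only to exercise the fit, not real encoder measurements:

```python
import numpy as np
from scipy.optimize import curve_fit

# Candidate models from the note above; a, b, c are free parameters.
def psnr_model(rate, a, b, c):
    return a * np.log(b * rate + c)

def ssim_model(rate, a, b, c):
    return a * np.power(rate, b) + c

# Synthetic RD samples generated from the models with assumed parameters.
rates = np.array([400.0, 800.0, 1600.0, 3200.0, 6400.0])
psnr = psnr_model(rates, 4.0, 9.0, 1.0)
ssim = ssim_model(rates, -20.0, -0.8, 0.99)

# Fit each model back to its samples and measure the residual error.
p_fit, _ = curve_fit(psnr_model, rates, psnr, p0=[3.0, 5.0, 0.5], maxfev=20000)
s_fit, _ = curve_fit(ssim_model, rates, ssim, p0=[-10.0, -0.5, 0.9], maxfev=20000)
psnr_rmse = np.sqrt(np.mean((psnr_model(rates, *p_fit) - psnr) ** 2))
ssim_rmse = np.sqrt(np.mean((ssim_model(rates, *s_fit) - ssim) ** 2))
```

With real RD data, comparing the two models' residuals on held-out rate points is a quick way to verify which form fits better.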

Wednesday, November 9, 2016

How to achieve max-min fairness in discrete domain?

A feasible solution is max-min fair if it is not possible to increase the utility of one user (1) while maintaining feasibility, and (2) without reducing the utility of another user that has equal or less utility.

In discrete domains, a max-min fair solution may not exist. For such cases, maximal fairness was defined in [1] as an alternative.

A maximally fair solution can be achieved using a progressive-filling approach: allocate each stream its lowest bitrate. Then, select the stream with the lowest utility value and upgrade it to the next higher bitrate, provided the new total of allocated bitrates does not exceed the link capacity. Repeat this step until no stream can be upgraded.
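The progressive-filling step can be sketched as follows. The data layout is an assumption for illustration: each stream has a bitrate ladder sorted in increasing order, with a parallel utility value for each level:

```python
def maximally_fair_allocation(bitrate_ladders, utilities, capacity):
    """Progressive filling for maximal fairness in a discrete domain.

    bitrate_ladders: one sorted (increasing) bitrate list per stream.
    utilities: parallel utility lists, same shape as the ladders.
    capacity: total link capacity.
    Returns the chosen ladder index per stream, or None if even the
    lowest bitrates do not fit.
    """
    levels = [0] * len(bitrate_ladders)  # start each stream at its lowest bitrate
    total = sum(ladder[0] for ladder in bitrate_ladders)
    if total > capacity:
        return None
    while True:
        # streams that can be upgraded without exceeding the capacity
        candidates = [
            i for i in range(len(bitrate_ladders))
            if levels[i] + 1 < len(bitrate_ladders[i])
            and total - bitrate_ladders[i][levels[i]]
                + bitrate_ladders[i][levels[i] + 1] <= capacity
        ]
        if not candidates:
            return levels
        # upgrade the candidate with the lowest current utility
        i = min(candidates, key=lambda j: utilities[j][levels[j]])
        total += bitrate_ladders[i][levels[i] + 1] - bitrate_ladders[i][levels[i]]
        levels[i] += 1
```

For example, two streams with ladders [1, 2, 3] and a capacity of 4 end up one level above the minimum each, since neither can be raised further without starving the other.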


References
[1] A. Mansy. Network and End-host support for HTTP Adaptive Video Streaming. PhD thesis, Georgia Institute of Technology, 2014.

Tuesday, October 11, 2016

Recency Effect

Higher quality at the end of a video clip leads to higher QoE.

HTTP Adaptive Streaming Standards


  • MPEG DASH Standard [1]
  • 3GP DASH Standard [2]
  • HbbTV DASH Recommendation [3]


A comparison in terms of data description format, video codec, audio codec, format, and segment length can be found in TABLE II of [4].



References
[1] Information Technology—Dynamic Adaptive Streaming Over HTTP (DASH)—Part 1: Media Presentation Description and Segment Formats, ISO/IEC 23009-1:2012, 2012.
[2] European Telecommunications Standard Institute (ETSI). (2009). Universal Mobile Telecommunication System (UMTS); LTE; Transparent end-to-end Packet-Switched Streaming Service (PSS); Protocols and Codecs, Sophia-Antipolis Cedex, France, 3GPP TS 26.234 Version 9.1.0 Release 9.
[3] HbbTV Specification, HbbTV Association, Erlangen, Germany, 2012.
[4] Seufert, Michael, et al. "A survey on quality of experience of http adaptive streaming." IEEE Communications Surveys & Tutorials 17.1 (2015): 469-492.

Saturday, October 8, 2016

VMAF (Video Multi-Method Assessment Fusion)

VMAF (Video Multi-Method Assessment Fusion) is a perceptual quality metric developed by Netflix in collaboration with University of Southern California researchers [1].


References
[1] http://techblog.netflix.com/2016/06/toward-practical-perceptual-video.html


Friday, October 7, 2016

Internet video traffic

Global Internet video traffic accounted for 15 EB per month in 2012, which was 57% of all consumer Internet traffic. By 2017, it is expected to reach 52 EB per month, or 69% of the entire consumer Internet traffic [1].


References
[1] “Cisco visual networking index: Forecast and methodology, 2012–2017,” San Jose, CA, USA, Tech. Rep., 2013.


Tuesday, October 4, 2016

Subjective Tests

  • Subjective Video Quality Assessment Methods for Multimedia Applications [1]
    • Absolute category rating (ACR)
    • Absolute category rating with hidden reference (ACR-HR)
    • Degradation category rating (DCR)
    • Pair comparison method (PC)


  • The method of limits [2]
    • In [3], the authors found the JND (Just Noticeable Difference) and JUD (Just Unacceptable Difference) using this method for mixing video tiles with different resolutions.


References
[1] Subjective Video Quality Assessment Methods for Multimedia Applications, ITU-T Recommendation P.910, April. 2008.
[2] George A. Gescheider. 1997. Psychophysics: The Fundamentals. Psychology Press.
[3] Wang, Hui, Mun Choon Chan, and Wei Tsang Ooi. "Wireless multicast for zoomable video streaming." ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 12.1 (2015): 5.

How to determine the reliability of the subjects?

Before analyzing the subjective test results, the reliability of the subjects can be assessed using Cronbach's alpha coefficient [1].
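A minimal sketch of Cronbach's alpha, assuming a score matrix with one row per test item (e.g., a rated video condition) and one column per subject:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (items x subjects) score matrix.

    alpha = k/(k-1) * (1 - sum of item variances / variance of totals),
    where k is the number of items and the totals are per-subject sums.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[0]
    item_vars = scores.var(axis=1, ddof=1)      # variance of each item's ratings
    total_var = scores.sum(axis=0).var(ddof=1)  # variance of per-subject totals
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)
```

Perfectly consistent subjects (each subject a constant offset of the others) yield alpha = 1; values near or below zero flag unreliable raters.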


References
[1] L. J. Cronbach, “Coefficient alpha and the internal structure of tests,” Psychometrika, vol. 16, no. 3, pp. 297–334, Sep. 1951.

Depth-Image-Based Rendering (DIBR)


  • Depth-based image processing for 3D video rendering applications [1][2]
  • Virtual view synthesis method and self-evaluation metrics for free viewpoint television and 3D video [3]
  • DIBR based view synthesis for free-viewpoint television [4]
  • Free-viewpoint depth image based rendering [5]
  • Free-viewpoint rendering algorithm for 3D TV [6]
  • View generation with 3D warping using depth information for FTV [7]
  • View synthesis with depth information based on graph cuts for FTV [8]
  • Symmetric bidirectional expansion algorithm to remove artifacts for view synthesis based DIBR [9]


References
[1] T. Zarb and C. J. Debono, “Depth-based image processing for 3D video rendering applications,” in Proc. IEEE Int. Conf. Syst. Signals Image Process., May 2014, pp. 215–218.
[2] Zarb, Terence, and Carl James Debono. "Broadcasting Free-Viewpoint Television Over Long-Term Evolution Networks." IEEE Systems Journal 10.2 (2016): 773-784.
[3] K. J. Oh, S. Yea, A. Vetro, and Y. S. Ho, “Virtual view synthesis method and self-evaluation metrics for free viewpoint television and 3D video,” Int. J. Imag. Syst. Technol., vol. 20, no. 4, pp. 378–390, Dec. 2010.
[4] X. Yang et al., “DIBR based view synthesis for free-viewpoint television,” in Proc. 3DTV Conf., May 2011, pp. 1–4.
[5] S. Zinger, L. Do, and P. H. N. de With, “Free-viewpoint depth image based rendering,” J. Vis. Commun. Image Represent., vol. 21, no. 5/6, pp. 533–541, Jul. 2010.
[6] P. H. N. de With and S. Zinger, “Free-viewpoint rendering algorithm for 3D TV,” in Proc. 2nd Int. Workshop Adv. Commun., May 2009, pp. 19–23.
[7] Y. Mori, N. Fukushima, T. Fujii, and M. Tanimoto, “View generation with 3D warping using depth information for FTV,” in Proc. 3DTV Conf., May 2008, pp. 229–232.
[8] A. T. Tran and K. Harada, “View synthesis with depth information based on graph cuts for FTV,” in Proc. 19th Korea-Japan Joint Workshop Frontiers Comput. Vis., Feb. 2013, pp. 289–294.
[9] H. Ding, Z. Li, and R. Hu, “Symmetric bidirectional expansion algorithm to remove artifacts for view synthesis based DIBR,” in Proc. Int. Conf. Multisensor Fus. Inf. Integr. Intell. Syst., Sep. 2014, pp. 1–4.


Friday, September 30, 2016

Bit allocation between texture and depth


  • Bit allocation for multiview image compression using cubic synthesized view distortion model [1]
  • Asymmetric coding of multi-view video plus depth based 3-D video for view rendering [2]
  • View and rate scalable multiview image coding with depth-image-based rendering [3]
  • Video and depth bitrate allocation in multiview compression [4]

References
[1] V. Velisavljevic, G. Cheung, and J. Chakareski, “Bit allocation for multiview image compression using cubic synthesized view distortion model,” in Proc. IEEE Int. Conf. Multimedia Expo., Jul. 2011, pp. 1–6.
[2] F. Shao, G. Jiang, M. Yu, K. Chen, and Y. S. Ho, “Asymmetric coding of multi-view video plus depth based 3-D video for view rendering,” IEEE Trans. Multimedia, vol. 14, no. 1, pp. 157–167, Feb. 2012.
[3] V. Velisavljevic, V. Stankovic, J. Chakareski, and G. Cheung, “View and rate scalable multiview image coding with depth-image-based rendering,” in Proc. IEEE Int. Conf. Digit. Signal Process., Jul. 2011, pp. 1–8.
[4]  K. Klimaszewski, K. Wegner, and M. Domanski, “Video and depth bitrate allocation in multiview compression,” in Proc. IEEE Int. Conf. Syst. Signals Image Process., May 2014, pp. 207–210.

How to code multiview depth maps?

Multiview depth map sequences can also be encoded using H.264/MVC. However, MVC is optimized for natural video sequences and does not take depth map characteristics into account. Depth coding methods that do consider these characteristics include:

  • Adaptive wavelet coding of the depth map for stereoscopic view synthesis [1]
  • Mesh-based depth coding for 3D video using hierarchical decomposition of depth maps [2]
  • Depth reconstruction filter and down/up sampling for depth coding in 3-D video [3]
  • The effect of depth compression on multiview rendering quality [4]

References
[1] I. Daribo, C. Tillier, and B. Pesquet-Popescu, “Adaptive wavelet coding of the depth map for stereoscopic view synthesis,” in Proc. IEEE Int. Workshop Multimedia Signal Process., Oct. 2008, pp. 413–417.
[2] S.-Y. Kim and Y.-S. Ho, “Mesh-based depth coding for 3D video using hierarchical decomposition of depth maps,” in Proc. IEEE Int. Conf. Image Process., Sep. 2007, pp. V-117–V-120.
[3] K.-J. Oh, S. Yea, A. Vetro, and Y.-S. Ho, “Depth reconstruction filter and down/up sampling for depth coding in 3-D video,” IEEE Signal Process. Lett., vol. 16, no. 9, pp. 747–750, Sep. 2009.
[4] P. Merkle et al., “The effect of depth compression on multiview rendering quality,” in Proc. 3DTV Conf., May 2008, pp. 245–248.

Tuesday, September 20, 2016

How to use an MDP to solve an optimization problem?

First, we need to define the sets of states, actions, rewards, and transition probabilities. The state definition must satisfy the Markov property (each state depends only on the immediately preceding state). Then, we can use dynamic programming under the reinforcement learning framework [1] to find the optimal state-value function, which maximizes the long-term reward.
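The dynamic-programming step can be sketched as tabular value iteration, assuming transition matrices P[a] and expected reward vectors R[a] per action:

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Tabular value iteration. P[a] is the (S x S) transition matrix for
    action a; R[a] is the length-S expected reward vector. Returns V*."""
    n_actions = len(P)
    V = np.zeros(len(R[0]))
    while True:
        # Bellman optimality backup for all actions at once
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```

As a sanity check, a single absorbing state with reward 1 per step and gamma = 0.5 has V* = 1 / (1 - 0.5) = 2.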


References
[1] R. Sutton and A. Barto. Reinforcement learning: An introduction. Cambridge Univ Press, 1998.

How to model the variation of wireless channel conditions?

A finite-state Markov model can be used to describe the variation of wireless channel conditions.


  • Finite-state Markov channel – a useful model for radio communication channels [1]
  • A packet-level model for UWB channel with people shadowing process based on angular spectrum analysis [2]
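As an illustration, a two-state (Gilbert-Elliott style) finite-state Markov channel with assumed transition probabilities can be written down directly, together with its closed-form stationary distribution:

```python
import numpy as np

# Two-state channel: state 0 = "good", state 1 = "bad".
# p and q are illustrative (assumed) transition probabilities.
p, q = 0.1, 0.4          # p: good -> bad, q: bad -> good
P = np.array([[1 - p, p],
              [q, 1 - q]])

# Stationary distribution of the two-state chain: pi = pi @ P.
pi = np.array([q / (p + q), p / (p + q)])
```

With these numbers, the channel spends q / (p + q) = 80% of the time in the good state; longer fade durations correspond to smaller q.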


References
[1] H. S. Wang and N. Moayeri. Finite-state Markov channel – a useful model for radio communication channels. IEEE Trans on Veh. Tech., 44(1):163–171, 1995.
[2] R. Zhang and L. Cai. A packet-level model for uwb channel with people shadowing process based on angular spectrum analysis. IEEE Trans. on Wireless Comm., 8(8):4048–55, Aug. 2009.



How to encode a video in a scalable manner?


  • The open source SVC coder (JSVM) [1] can be used to encode a video into multiple layers.
  • An SVC-enabled video player can be implemented using an open source SVC decoder [2].
  • A scalable video streaming testbed was prototyped in [3].



References
[1] J. Reichel, H. Schwarz, and M. Wien. Joint scalable video model 11 (JSVM 11). Joint Video Team, Doc. JVT- X, 2007.
[2] M. Blestel and M. Raulet. Open SVC decoder: a flexible SVC library. ACM MM ’10, pages 1463–1466, New York, NY, USA, 2010.
[3] S. Xiang. Scalable streaming. https://sites.google.com/site/svchttpstreaming/.


Friday, September 16, 2016

Adaptive Video Streaming Products


  • Adobe HTTP Dynamic Streaming (HDS) [1]
  • Apple HTTP Live Streaming (HLS) [2]
  • Microsoft Live Smooth Streaming [3]
  • Adaptive Scalable Video Streaming over HTTP [4]

For an experimental evaluation of rate-adaptation algorithms in adaptive streaming over HTTP, refer to [5].


References
[1] Adobe Systems: HTTP Dynamic Streaming (HDS), February 2015, Available: http://tiny.cc/HDS.
[2] Apple: HTTP Live Streaming (HLS), February 2015, Available: http://tiny.cc/HLS2015.
[3] Microsoft: Live Smooth Streaming, February 2015, Available: http://tiny.cc/MSSmooth.
[4] S. Xiang. Scalable streaming. https://sites.google.com/site/svchttpstreaming/.
[5] S. Akhshabi, A. C. Begen, and C. Dovrolis. An experimental evaluation of rate-adaptation algorithms in adaptive streaming over HTTP. In ACM MMSys’11, pages 157–168, New York, NY, USA, 2011.

Optimization Problem Solvers


  • Integer Programming
    • CPLEX [1]
    • GLPK [2]
  • Decentralized POMDP
    • Multi-Agent Decision Process (MADP) Toolbox [3]

For more information about optimization problems, refer to [4].


References
[1] CPLEX: IBM ILOG Optimizer, July 2009, Available: http://tiny.cc/CPLEX.
[2] GLPK: GNU Linear Programming Kit, June. 2012, http://tiny.cc/GLPK.
[3] F. A. Oliehoek, M. T. J. Spaan, and P. Robbel, “MultiAgent Decision Process (MADP) Toolbox 0.3,” 2014.
[4] Practical Optimization: A Gentle Introduction, http://www.sce.carleton.ca/faculty/chinneck/po.html

Monday, September 12, 2016

Streaming methods in cloud gaming systems


  • Video Streaming
    • A game attention model for efficient bit rate allocation in cloud gaming [1]
  • Graphics Streaming
  • Collaborative Rendering
    • High-quality mobile gaming using gpu offload. [2]

References
[1] Ahmadi, Hamed, et al. "A game attention model for efficient bit rate allocation in cloud gaming." Multimedia Systems 20.5 (2014): 485-501.
[2] E. Cuervo, A. Wolman, L. P. Cox, K. Lebeck, A. Razeen, S. Saroiu, and M. Musuvathi. Kahawai: High-quality mobile gaming using gpu offload. In Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys ’15, pages 121–135, New York, NY, USA, 2015. ACM.

Open source cloud gaming frameworks


  • GamingAnywhere [1]
    • Video Streaming
  • Cloud Gaming Testbed [2]
    • Video Streaming
  • Games@large [3]
    • Graphics Streaming

References
[1] Huang, Chun-Ying, et al. "GamingAnywhere: an open cloud gaming system." Proceedings of the 4th ACM multimedia systems conference. ACM, 2013.
[2] Ahmadi, Hamed, et al. "A game attention model for efficient bit rate allocation in cloud gaming." Multimedia Systems 20.5 (2014): 485-501.
[3] I. Nave, H. David, A. Shani, Y. Tzruya, A. Laikari, P. Eisert, and P. Fechteler. Games@large graphics streaming architecture. In 2008 IEEE International Symposium on Consumer Electronics, pages 1–4, April 2008.

Applying Visual Attention in Video Coding


  • A game attention model for efficient bit rate allocation in cloud gaming [1]
    • H.264/AVC Standard
    • Cloud Gaming Application
    • Proposed a hybrid bottom-up and top-down attention model
    • Assigning QPs based on attention map
    • Extension of [2]
  • Fast Mode Decision in the HEVC Video Coding Standard by Exploiting Region with Dominated Motion and Saliency Features [3]
    • HEVC
    • Deciding on modes based on attention map
    • Used GBVS (a bottom-up attention model)




References
[1] Ahmadi, Hamed, et al. "A game attention model for efficient bit rate allocation in cloud gaming." Multimedia Systems 20.5 (2014): 485-501.
[2] Ahmadi, Hamed, et al. "Efficient bitrate reduction using a game attention model in cloud gaming." Haptic Audio Visual Environments and Games (HAVE), 2013 IEEE International Symposium on. IEEE, 2013.
[3] Podder, Pallab Kanti, Manoranjan Paul, and Manzur Murshed. "Fast Mode Decision in the HEVC Video Coding Standard by Exploiting Region with Dominated Motion and Saliency Features." PloS one 11.3 (2016): e0150673.

How to compare two rate-distortion curves?

Rate-distortion (RD) curves are two-dimensional, so comparing them is not straightforward. The Bjøntegaard delta-rate (BD-rate) criterion [1] can be used for this comparison.
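A rough sketch of the BD-rate computation in the spirit of [1] (cubic fit of log-rate versus PSNR for each curve, then averaging the integrated log-rate gap over the overlapping quality range). This is a simplified reading of the method, not a reference implementation:

```python
import numpy as np

def bd_rate(rates1, psnr1, rates2, psnr2):
    """Bjøntegaard delta-rate: average bitrate difference (%) of curve 2
    versus curve 1 at equal quality, via cubic fits in the log-rate domain."""
    lr1, lr2 = np.log10(rates1), np.log10(rates2)
    # fit log-rate as a cubic function of PSNR for each curve
    p1 = np.polyfit(psnr1, lr1, 3)
    p2 = np.polyfit(psnr2, lr2, 3)
    # integrate both fits over the overlapping PSNR interval
    lo = max(min(psnr1), min(psnr2))
    hi = min(max(psnr1), max(psnr2))
    int1 = np.polyval(np.polyint(p1), hi) - np.polyval(np.polyint(p1), lo)
    int2 = np.polyval(np.polyint(p2), hi) - np.polyval(np.polyint(p2), lo)
    avg_diff = (int2 - int1) / (hi - lo)
    return (10 ** avg_diff - 1) * 100.0
```

Identical curves give 0%; a curve that reaches the same PSNR at half the bitrate gives -50%.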


References
[1] G. Bjøntegaard. “Calculation of average PSNR differences between RD-curves”. ITU-T Video Coding Experts group document VCEG-M33, April. 2001.

Friday, September 9, 2016

VQM: Video Quality Metric

VQM [1] is an objective video quality metric that correlates well with human perception. The VQM value is a number between 0 and 1; a lower value indicates better video quality.


References
[1] Pinson, Margaret H., and Stephen Wolf. "A new standardized method for objectively measuring video quality." IEEE Transactions on Broadcasting 50.3 (2004): 312-322.


Encoding Settings For Streaming Vancouver Olympics


Level   Bitrate (kbps)   Resolution   Frame rate (fps)
1       400              312x176      15
2       600              400x224      15
3       900              512x288      15
4       950              544x304      15
5       1250             640x360      25
6       1600             736x416      25
7       1950             848x480      25
8       3450             1280x720     30


References
[1] Jan Ozer, “Adaptive Streaming in the Field”, in Streaming Media Magazine, 2011.
[2] Liu, Yao, et al. "User experience modeling for DASH video." 2013 20th International Packet Video Workshop. IEEE, 2013.

Thursday, September 8, 2016

When to use Proportional Fairness?

Proportional fairness performs well when all users have the same utility function [1].


References
[1] Mu, Mu, et al. "User-level fairness delivered: network resource allocation for adaptive video streaming." 2015 IEEE 23rd International Symposium on Quality of Service (IWQoS). IEEE, 2015.

Wednesday, September 7, 2016

What affects the quality of adaptive video streams?

  • Quality Switches
    • Amplitude
    • Frequency
  • Stalls
    • Duration 
    • Frequency
  • Initial Startup Delay

There are some surveys on the quality of experience for adaptive video streaming:
  • Quality of Experience and HTTP adaptive streaming: A review of subjective studies [1]
  • A Survey on Quality of Experience of HTTP Adaptive Streaming [2]
  • User experience modeling for DASH video [3]

QoE is affected by spatial and temporal quality of the video stream. Initial delay, total stall duration, and number of stalls affect the temporal quality. Average video quality, number of switches, and average switch magnitude affect the spatial quality. In [3], the effect of each of these factors has been quantified.
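For intuition only, these factors can be combined into a toy linear impairment model. The weights below are hypothetical placeholders, not the coefficients fitted in [3]:

```python
def qoe_score(initial_delay_s, total_stall_s, n_stalls,
              avg_quality, n_switches, avg_switch_magnitude,
              weights=None):
    """Toy linear impairment model: start from the average spatial quality
    (on a 0-100 scale) and subtract temporal and spatial impairments.
    All default weights are illustrative placeholders, not fitted values."""
    w = weights or {"delay": 0.5, "stall": 2.0, "n_stalls": 3.0,
                    "switch": 1.0, "magnitude": 0.5}
    impairment = (w["delay"] * initial_delay_s        # initial startup delay
                  + w["stall"] * total_stall_s        # total stall duration
                  + w["n_stalls"] * n_stalls          # number of stalls
                  + w["switch"] * n_switches          # number of switches
                  + w["magnitude"] * avg_switch_magnitude)
    return max(0.0, avg_quality - impairment)
```

A model fitted to subjective scores (as in [3]) would replace these weights, and likely the linear form, with empirically derived terms.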


Some important notes:
  • Multiple gradual quality variations are preferred over abrupt variations.
  • Constant quality is usually preferred over varying quality.
  • In general, providing as high a bitrate as possible does not necessarily lead to the highest QoE.
  • All's well that ends well: the quality at the end of the video has a definite impact on the perceived quality.
  • The effect of spatial and temporal switching varies depending on the content type.
  • A single long stall is preferred over multiple short freezes.
  • Regular freezes are preferred over irregular freezes.
  • The tolerable startup delay depends on the type of application.
  • Users prefer to wait longer if it means less stalling.

Metrics

  • Number of Quality Changes (NoC) [5]
  • Number of Interruptions (NoI) [5]
  • Percentage of Interruptions (PoI) [5]
  • Impairment due to initial delay [3]
  • Impairment due to stall [3]
  • Impairment due to level fluctuations [3]
  • Impairment due to low level video quality [3]
  • Average playback Quality (APQ) [4]
  • Playback Smoothness (PS) [6]
  • Interruption Ratio [4]



References
[1] Garcia, M-N., et al. "Quality of experience and HTTP adaptive streaming: A review of subjective studies." Quality of Multimedia Experience (QoMEX), 2014 Sixth International Workshop on. IEEE, 2014.
[2] Seufert, Michael, et al. "A survey on quality of experience of http adaptive streaming." IEEE Communications Surveys & Tutorials 17.1 (2015): 469-492.
[3] Liu, Yao, et al. "User experience modeling for DASH video." 2013 20th International Packet Video Workshop. IEEE, 2013.
[4] S. Xiang, L. Cai, and J. Pan, “Adaptive scalable video streaming in wireless networks,” in Proc. of ACM MMSys, Feb. 2012, pp. 167–172.
[5] Yan, Zhisheng, Jingteng Xue, and Chang Wen Chen. "QoE continuum driven HTTP adaptive streaming over multi-client wireless networks." 2014 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2014.
[6] S. Nelakuditi, R. Harinath, E. Kusmierek, and Z. Zhang. Providing smoother quality layered video stream. In ACM NOSSDAV’00, June 2000.

What parameters affect the bitrate of a video?


  • Encoding Settings
  • Spatial Settings (such as resolution)
  • Temporal Settings (such as frame rate)

Tuesday, September 6, 2016

Should users receive the same bitrate or the same quality?

Since there is no linear correlation between the bitrate of a video stream and its perceptual quality [1], a fairness algorithm should ensure that users get similar qualities rather than similar bitrates.



References  
[1] G. Cermak, M. Pinson, and S. Wolf. The relationship among video quality, screen resolution, and bit rate. Broadcasting, IEEE Transactions on, 57(2):258–262, 2011.

Finding the optimal values for discrete parameters

One way, used in [1], is to apply this heuristic: solve the problem in the continuous domain, then search for the nearest valid (discrete) values in each dimension. With N parameters, there are at most 2^N candidate combinations. Apply the same utility function, or a different one, to pick the best candidate.

Another way, used in [2], is a branch-and-bound algorithm [3] with heuristic evaluation, which allows optimizing over a discrete data set in linear time. In [2], two versions (Promote and Boost) were utilized, but the results were similar.
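The first heuristic (rounding a continuous optimum to its at most 2^N discrete neighbors) can be sketched as follows, with a made-up utility for illustration:

```python
import itertools

def best_discrete_neighbor(continuous, grids, utility):
    """Given a continuous optimum, enumerate the nearest lower/upper grid
    value per parameter (at most 2^N combinations) and return the
    combination with the highest utility."""
    neighbor_sets = []
    for x, grid in zip(continuous, grids):
        below = max((g for g in grid if g <= x), default=min(grid))
        above = min((g for g in grid if g >= x), default=max(grid))
        neighbor_sets.append({below, above})  # a set, so equal bounds collapse
    return max(itertools.product(*neighbor_sets), key=utility)
```

In a real system the utility call would also check feasibility (e.g., the link capacity constraint) and skip infeasible combinations.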


References
[1] Mu, Mu, et al. "User-level fairness delivered: network resource allocation for adaptive video streaming." 2015 IEEE 23rd International Symposium on Quality of Service (IWQoS). IEEE, 2015.
[2] Georgopoulos, Panagiotis, et al. "Towards network-wide QoE fairness using openflow-assisted adaptive video streaming." Proceedings of the 2013 ACM SIGCOMM workshop on Future human-centric multimedia networking. ACM, 2013.
[3] R. J. Dakin. A tree-search algorithm for mixed integer programming problems. Comput. J., 8(3):250–255, 1965.

Forgiveness Effect

Refer to these papers:


  • Layerp2p: Using layered video chunks in p2p live streaming [1]
  • Forgiveness effect in subjective assessment of packet video [2]
  • Temporal characterization of forgiveness effect [3]
  • User-level fairness delivered: network resource allocation for adaptive video streaming [4]


References
[1] Z. Liu, Y. Shen, K. W. Ross, S. S. Panwar, and Y. Wang. Layerp2p: Using layered video chunks in p2p live streaming. IEEE Transactions on Multimedia, 11(7):1340–1352, 2009.
[2] V. Seferidis, M. Ghanbari, and D. Pearson. Forgiveness effect in subjective assessment of packet video. Electronics Letters, 28(21):2013–2014, 1992.
[3] D. Hands. Temporal characterization of forgiveness effect. Electronics Letters, 37, 2002.
[4] Mu, Mu, et al. "User-level fairness delivered: network resource allocation for adaptive video streaming." 2015 IEEE 23rd International Symposium on Quality of Service (IWQoS). IEEE, 2015.

Metrics for Adaptive Video Streaming

Metrics can be divided into user-level and network-level groups.


  • User-level Fairness Delivered: Network Resource Allocation for Adaptive Video Streaming [1]
    • They used three metrics: video quality, switching impact, and network cost (or utility)
    • They solved the problem just based on video quality metrics, and then adjusted the results based on a weighted average of the three metrics.
    • They considered that clients might have devices with varying resolutions, but didn't consider that the representations on the server might also have different resolutions.
    • To simulate wireless conditions, they randomly varied the available bandwidth between 500 kbps and 8 Mbps.
    • In the switching impact metric, they considered the forgiveness effect [2][3][4].
    • To enforce fairness, they minimized the Relative Standard Deviation (RSD) of the metrics.
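For reference, the RSD (also known as the coefficient of variation) that they minimize is simply:

```python
import numpy as np

def relative_standard_deviation(values):
    """RSD (coefficient of variation): standard deviation divided by mean.
    Driving the RSD of per-user quality toward zero pushes the allocation
    toward equal QoE across users."""
    values = np.asarray(values, dtype=float)
    return values.std() / values.mean()
```

Equal per-user values give an RSD of 0; larger values indicate a more unequal allocation.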



References
[1] Mu, Mu, et al. "User-level fairness delivered: network resource allocation for adaptive video streaming." 2015 IEEE 23rd International Symposium on Quality of Service (IWQoS). IEEE, 2015.
[2] Z. Liu, Y. Shen, K. W. Ross, S. S. Panwar, and Y. Wang. Layerp2p: Using layered video chunks in p2p live streaming. IEEE Transactions on Multimedia, 11(7):1340–1352, 2009.
[3] V. Seferidis, M. Ghanbari, and D. Pearson. Forgiveness effect in subjective assessment of packet video. Electronics Letters, 28(21):2013–2014, 1992.
[4] D. Hands. Temporal characterization of forgiveness effect. Electronics Letters, 37, 2002.

Monday, September 5, 2016

How to solve Nonlinear Integer Programming problems?

A nonlinear integer programming problem is NP-hard in general. If the objective function has a special form (e.g., convexity), Lagrangian relaxation [1] or dynamic programming [2] can be used. If the objective function is concave or log-concave, the steepest ascent algorithm [3] can be used.
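For separable concave utilities, the steepest-ascent idea amounts to a greedy marginal-gain allocation: repeatedly give the next resource unit to whichever user gains the most from it. The log utility below is an illustrative choice:

```python
import math

def greedy_concave_allocation(n_users, budget, utility):
    """Greedy steepest ascent for max sum_i utility(x_i) subject to
    sum x_i <= budget with integer x_i. Optimal when `utility` is concave,
    because marginal gains are non-increasing."""
    alloc = [0] * n_users
    for _ in range(budget):
        # give the next unit to the user with the largest marginal gain
        i = max(range(n_users),
                key=lambda j: utility(alloc[j] + 1) - utility(alloc[j]))
        alloc[i] += 1
    return alloc

# Example with a concave log utility: identical users split the budget evenly.
alloc = greedy_concave_allocation(3, 6, lambda x: math.log(1 + x))
```

For non-concave utilities this greedy step can get stuck, which is why the general problem needs branch-and-bound or relaxation techniques instead.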


References
[1] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge Univ. Press, 2004.
[2] T. Ibaraki and N. Katoh, Resource Allocation Problems: Algorithmic Approaches. The MIT Press, 1988.
[3] G. P. Akilov, L. V. Kantorovich, Functional Analysis, 2nd edition. Pergamon Press, 1982.

Tuesday, August 30, 2016

What good does HTTP Adaptive Streaming (HAS) do?

HAS utilizes the HTTP protocol for streaming video content and inherits the advantages of HTTP such as transparent caching and network address translation (NAT) traversal, while the underlying transport control protocol (TCP) over which HTTP objects are transported offers congestion control functionality [1].

HAS is adaptive in the sense that the quality of the video is adjusted based on the bandwidth or data rate available between the server and the client. This is a particularly useful feature for a wireless environment since the data rate of the wireless link can vary substantially over time because of physical mobility or time-varying channel impairments such as shadowing or multipath fading, and variations in other traffic served by the same base station [1].


References
[1] D. De Vleeschauwer, H. Viswanathan, A. Beck, S. Benno, G. Li, and R. Miller, “Optimization of HTTP adaptive streaming over mobile cellular networks,” in Proc. IEEE INFOCOM, 2013, pp. 898–997.

Sunday, August 28, 2016

Mixed Integer Nonlinear Programming

Mixed integer nonlinear programming (MINLP) refers to optimization problems with continuous and discrete variables and nonlinear functions in the objective function and/or the constraints [1].

In [2], such a problem was solved for HTTP adaptive streaming over LTE networks.


References
[1] http://www.neos-guide.org/content/mixed-integer-nonlinear-programming
[2] Cicalo, Sergio, et al. "Improving QoE and Fairness in HTTP Adaptive Streaming over LTE Network." (2015).

DASHEncoder: DASH content generation tool

DASHEncoder is open source and available at: https://github.com/slederer/DASHEncoder


References
[1] Stefan Lederer, Christopher Müller and Christian Timmerer, “Dynamic Adaptive Streaming over HTTP Dataset”, In Proceedings of the ACM Multimedia Systems Conference 2012, Chapel Hill, North Carolina, February 22-24, 2012.

How to estimate achievable rate for each UE in LTE?

If gamma is the average signal-to-noise ratio (SNR) experienced by a UE, its average rate per unit bandwidth is estimated as log2(1 + gamma) [1][2][3].
This is a simplified air-interface model in which the achievable rate for each UE is estimated from the average channel state information (CSI) of its link.
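In code, this simplified model is just the Shannon spectral-efficiency formula applied to the average SNR; converting the SNR from dB to linear scale is the only wrinkle:

```python
import math

def estimated_rate_bps(avg_snr_db, bandwidth_hz):
    """Simplified air-interface model: spectral efficiency log2(1 + gamma)
    in bit/s/Hz times the allocated bandwidth, where gamma is the
    average SNR in linear scale."""
    gamma = 10.0 ** (avg_snr_db / 10.0)  # dB -> linear
    return bandwidth_hz * math.log2(1.0 + gamma)
```

At 0 dB (gamma = 1) the model predicts exactly 1 bit/s/Hz; real LTE links fall short of this Shannon bound, so the estimate is optimistic.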


References
[1] D. De Vleeschauwer, H. Viswanathan, A. Beck, S. Benno, G. Li, and R. Miller, “Optimization of HTTP adaptive streaming over mobile cellular networks,” in Proc. IEEE INFOCOM, 2013, pp. 898–997.
[2] A. E. Essaili, D. Schroeder, D. Staehle, M. Shehada, W. Kellerer, and E. Steinbach, “Quality-of-experience driven adaptive HTTP media delivery,” in Proc. IEEE Int. Conf. on Commun. (ICC), Budapest, Hungary, Jun 2013.
[3] Cicalo, Sergio, et al. "Improving QoE and Fairness in HTTP Adaptive Streaming over LTE Network." (2015).

Quality Class Indicators (QCI)

"LTE supports different types of services including web browsing, video streaming, VoIP, online gaming, real-time video, etc., with standardized quality class indicators (QCI) [1]. Each QCI defines a set of requirements for quality of service (QoS) bearers, e.g., maximum tolerable delay, packet loss rate and/or guaranteed bit-rate (GBR). A GBR bearer allows to define a minimum bit-rate and a maximum bit-rate (MBR) to be allocated to a particular UE." an excerpt from [2].

Best Effort Scenario: All UEs are non-GBR with QCI equal to 9.

According to [1]: 

QoS class identifier (QCI): A scalar that is used as a reference to a specific packet forwarding behaviour (e.g. packet loss rate, packet delay budget) to be provided to a  Service Data Flow (SDF). This may be implemented in the access network by the QCI referencing node specific parameters that control packet forwarding treatment (e.g. scheduling weights, admission thresholds, queue management thresholds, link layer protocol configuration, etc.), that have been pre-configured by the operator at a specific node(s) (e.g. eNodeB). 




Services using a GBR QCI and sending at a rate smaller than or equal to GBR can in general assume that congestion related packet drops will not occur, and 98 percent of the packets shall not experience a delay exceeding the QCI's PDB.

Services using a Non-GBR QCI should be prepared to experience congestion related packet drops, and 98 percent of the packets that have not been dropped due to congestion should not experience a delay exceeding the QCI's PDB.

The Packet Error Loss Rate (PELR) defines an upper bound for the rate of SDUs (e.g. IP packets) that have been processed by the sender of a link layer protocol (e.g. RLC in E-UTRAN) but that are not successfully delivered by the corresponding receiver to the upper layer (e.g. PDCP in E-UTRAN). Thus, the PELR defines an upper bound for a rate of non congestion related packet losses.

In general, the rate of congestion related packet drops can not be controlled precisely for Non-GBR traffic. This rate is mainly determined by the current Non-GBR traffic load, the UE's current radio channel quality, and the configuration of user plane packet processing functions (e.g. scheduling, queue management, and rate shaping). 

An operator would choose GBR QCIs for services where the preferred user experience is "service blocking over service dropping", i.e. rather block a service request than risk degraded performance of an already admitted service request. 
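The 98-percent PDB requirement above can be checked with a few lines of code. The sketch below uses the 300 ms PDB that TS 23.203 assigns to QCI 9; the delay samples are made up for illustration:

```python
# Toy check of the 3GPP PDB requirement: at least 98 percent of the
# (non-dropped) packets must arrive within the QCI's packet delay budget.

def meets_pdb(delays_ms, pdb_ms, percentile=0.98):
    """Return True if at least `percentile` of packets are within pdb_ms."""
    if not delays_ms:
        return True
    within = sum(1 for d in delays_ms if d <= pdb_ms)
    return within / len(delays_ms) >= percentile

# QCI 9 (non-GBR, best effort) has a PDB of 300 ms in TS 23.203.
delays = [120, 80, 450, 90, 60, 210, 310, 70, 50, 100]
print(meets_pdb(delays, pdb_ms=300))  # 8/10 = 80% within budget -> False
```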


References

[1] 3GPP, “Policy and charging control architecture,” TS 23.203, v10.7.0, 2012.
[2] Cicalo, Sergio, et al. "Improving QoE and Fairness in HTTP Adaptive Streaming over LTE Network." (2015).

Comprehensive review of the MPEG DASH standard

A comprehensive review of the MPEG DASH standard for multimedia streaming over the internet can be found in [1].

References
[1] I. Sodagar, “The MPEG-DASH standard for multimedia streaming over the internet,” IEEE MultiMedia, vol. 18, no. 4, pp. 62–67, April 2011.

Adaptive Video Streaming Over LTE Networks


  • Improving QoE and Fairness in HTTP Adaptive Streaming over LTE Network [1]
    • The proposed algorithm can be used in two modes: client-side and network-assisted.
    • The authors consider three signals when assigning rates:
      • Client buffer size (the DASH standard allows clients to report it)
      • Channel condition
      • Video complexity
    • Experiments were conducted in ns-2.
    • In their simulations, the resources dedicated to streaming users are dynamically updated.
    • They use SSIM as the quality metric.

  • Optimization of HTTP adaptive streaming over mobile cellular networks [2]
    • They consider data and video users at the same time.
    • They use the same utility function for data and video users, albeit with different parameters.
      • All fairness utility functions share the same form [3].
    • They did not change the default proportional fair scheduler inside the eNB; instead, they steer it by providing GBR values.
    • They compare their work against Best Effort and traditional GBR schedulers.
      • All of these schedulers share the same principle, described in [4].
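As a toy illustration (not the algorithm from [1] or [2]), the three signals from the first paper (buffer size, channel condition, video complexity) could feed a rate picker like the sketch below; the bitrate ladder, safety heuristic, and all names are invented for the example:

```python
# Toy DASH rate selection from three signals: client buffer level,
# a channel throughput estimate, and a video complexity weight.
# This is an illustration, NOT the algorithm proposed in [1].

def pick_bitrate(ladder_kbps, throughput_kbps, buffer_s,
                 target_buffer_s=10.0, complexity=1.0):
    """Choose the highest bitrate that fits a safety-scaled budget."""
    # Be conservative when the buffer is low; spend more when it is full.
    safety = min(1.0, buffer_s / target_buffer_s)
    budget = throughput_kbps * safety / complexity
    feasible = [b for b in ladder_kbps if b <= budget]
    return max(feasible) if feasible else min(ladder_kbps)

ladder = [400, 800, 1500, 3000, 6000]
print(pick_bitrate(ladder, throughput_kbps=4000, buffer_s=8.0))  # -> 3000
```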


References

[1] Cicalo, Sergio, et al. "Improving QoE and Fairness in HTTP Adaptive Streaming over LTE Network." (2015).
[2] D. De Vleeschauwer, H. Viswanathan, A. Beck, S. Benno, G. Li, and R. Miller, “Optimization of HTTP adaptive streaming over mobile cellular networks,” in Proc. IEEE INFOCOM, 2013, pp. 898–997.
[3] M. Uchida, J. Kurose, “An Information-Theoretic Characterization of Weighted α-Proportional Fairness,” In Proceedings of IEEE INFOCOM'09, ( pp. 1053-1061)
[4] M. Andrews, L. Qian, A. Stlyar, “Optimal Utility Based Multi-user Throughput Allocation subject to Throughput Constraints,” In Proceedings of IEEE INFOCOM'05, Vol. 4, pp. 2415-2424, 2005


Friday, August 19, 2016

How to generate depth information

Free-viewpoint videos are captured using a camera array that records the scene from multiple viewpoints. To generate the depth information, an additional depth camera may be provided for each viewpoint of the array. Alternatively, depth maps can be generated later using one of the known depth estimation methods [1].
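To give a flavor of the classical depth estimation methods surveyed in [1], here is a minimal pure-Python sketch of block matching with a sum-of-absolute-differences (SAD) cost on one rectified scanline pair; real implementations use 2-D windows, regularization, and sub-pixel refinement:

```python
# Minimal SAD block matching on a single rectified scanline pair.
# Disparity is inversely proportional to depth, so this is the core of
# the simplest depth-from-stereo pipelines.

def disparity_row(left, right, block=3, max_disp=4):
    """Per-pixel disparity for one rectified left/right scanline pair."""
    half = block // 2
    w = len(left)
    disp = [0] * w
    for x in range(half, w - half):
        best_d, best_cost = 0, float("inf")
        for d in range(0, min(max_disp, x - half) + 1):
            cost = sum(abs(left[x + k] - right[x - d + k])
                       for k in range(-half, half + 1))
            if cost < best_cost:
                best_cost, best_d = cost, d
        disp[x] = best_d
    return disp

# The right scanline is the left one shifted by 2 pixels,
# so the matcher should recover a disparity of 2 around the peak.
left  = [0, 0, 10, 50, 90, 50, 10, 0, 0, 0]
right = [10, 50, 90, 50, 10, 0, 0, 0, 0, 0]
print(disparity_row(left, right)[3:6])  # -> [2, 2, 2]
```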


References
[1]  D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1-3):7–42, 2002.

Virtual View Distortion Models

You can find virtual view distortion models in these papers:

[1] A. Hamza and M. Hefeeda. A DASH-based free-viewpoint video streaming system. In Proc. of the ACM Workshop on Network and Operating Systems Support for Digital Audio and Video, pages 55–60, March 2014.
[2] Hamza, Ahmed, and Mohamed Hefeeda. "Adaptive streaming of interactive free viewpoint videos to heterogeneous clients." Proceedings of the 7th International Conference on Multimedia Systems. ACM, 2016.
[3] T.-Y. Chung, J.-Y. Sim, and C.-S. Kim. Bit allocation algorithm with novel view synthesis distortion model for multiview video plus depth coding. IEEE Transactions on Image Processing, 23(8):3254–3267, August 2014.
[4] V. Velisavljevi´c, G. Cheung, and J. Chakareski. Bit allocation for multiview image compression using cubic synthesized view distortion model. In Proc. of the IEEE International Conference on Multimedia and Expo, pages 1–6, July 2011.

Depth-Image-Based Rendering (DIBR)

DIBR is a technique for synthesizing a non-captured (virtual) view from some of the captured (reference) views of the 3D world.

Thursday, August 18, 2016

Recommended bitrates for video streaming

Take a look at [1], [2] and [3]. 

References 
[1] http://stackoverflow.com/questions/24198739/what-bitrate-is-used-for-each-of-the-youtube-video-qualities-360p-1080p-in 
[2] https://support.google.com/youtube/answer/2853702?hl=en
[3] Mansy, Ahmed, Marwan Fayed, and Mostafa Ammar. "Network-layer fairness for adaptive video streams." IFIP Networking Conference (IFIP Networking), 2015. IEEE, 2015.

Wednesday, August 17, 2016

Free Viewpoint Streaming

A DASH-based Free Viewpoint Video Streaming System [1]
  • They have proposed a framework for adaptive free-viewpoint video streaming.
  • They have developed an empirical rate-distortion model for MVD (multi-view-plus-depth) videos.
  • They have developed two DIBR implementations which exploit graphics processing units (GPUs) to speed up the view synthesis process. The first implementation performs double warping view synthesis for an arbitrary camera arrangement. The second implementation performs horizontal pixel shifting for 1D parallel camera arrangements or rectified camera views. Both implementations use the OpenGL graphics API to perform the different stages of the view synthesis process.
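The horizontal pixel shifting used by the second implementation can be sketched as below, assuming rectified views where disparity equals focal length times baseline over depth. The numbers are illustrative, and the paper's actual implementation runs this on the GPU via OpenGL:

```python
# Rough sketch of "horizontal pixel shifting" view synthesis for
# rectified (1D parallel) camera views: each pixel moves horizontally
# by a disparity d = focal * baseline / depth.

def shift_row(colors, depths, focal, baseline):
    """Warp one scanline to a virtual view at `baseline` offset."""
    w = len(colors)
    out = [None] * w  # None marks disocclusion holes
    for x in range(w):
        d = focal * baseline / depths[x]  # disparity in pixels
        tx = x + int(round(d))
        if 0 <= tx < w:
            out[tx] = colors[x]  # (a real warper z-buffers conflicts)
    return out

row    = [10, 20, 30, 40, 50]
depths = [100, 100, 50, 100, 100]  # pixel 2 is closer -> larger shift
print(shift_row(row, depths, focal=100, baseline=1.0))
# -> [None, 10, 20, None, 40]; the None entries are disocclusion holes
```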

Adaptive Streaming of Interactive Free Viewpoint Videos to Heterogeneous Clients [2]

  • This is an extended version of [1].
  • They predict the user's view requests using a simple location estimation technique known as dead reckoning.
  • They develop a virtual view distortion model based on mathematical analysis. It still needs some pre-measured rate-distortion points to fit the model's parameters.
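Dead reckoning itself is simple: extrapolate the next viewpoint from the last known position and an estimated velocity. A one-dimensional sketch (the coordinate, velocity, and time step are illustrative, not values from [2]):

```python
# Dead reckoning for view prediction: assume constant velocity and
# extrapolate where the user's viewpoint will be dt seconds ahead,
# so the matching views can be prefetched.

def estimate_velocity(p_prev, p_now, dt):
    """Estimate velocity from two consecutive viewpoint reports."""
    return (p_now - p_prev) / dt

def dead_reckon(pos, vel, dt):
    """Predict the viewpoint after dt seconds under constant velocity."""
    return pos + vel * dt

# Viewpoint moved from 1.5 to 2.0 (camera-array coordinate) in 1 s;
# predict 1 s ahead to prefetch views near 2.5.
v = estimate_velocity(1.5, 2.0, 1.0)
print(dead_reckon(2.0, v, 1.0))  # -> 2.5
```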

References
[1] A. Hamza and M. Hefeeda. A DASH-based free-viewpoint video streaming system. In Proc. of the ACM Workshop on Network and Operating Systems Support for Digital Audio and Video, pages 55–60, March 2014.
[2] Hamza, Ahmed, and Mohamed Hefeeda. "Adaptive streaming of interactive free viewpoint videos to heterogeneous clients." Proceedings of the 7th International Conference on Multimedia Systems. ACM, 2016.

The optimal ratio between texture and depth

According to [1], the optimal ratio between texture and depth data remains the same for any total target bit-rate.
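If the optimal texture-to-depth ratio is indeed independent of the target bit-rate, allocation reduces to a fixed split of any total budget. A small sketch (the 4:1 ratio is illustrative, not a value from [1]):

```python
# Split a total target bit-rate between texture and depth using a
# fixed texture:depth ratio, which [1] finds to be rate-independent.

def split_bitrate(total_kbps, texture_to_depth_ratio):
    """Return (texture_kbps, depth_kbps) for a given total budget."""
    texture = total_kbps * texture_to_depth_ratio / (1 + texture_to_depth_ratio)
    return texture, total_kbps - texture

# An illustrative 4:1 ratio yields the same split at any target rate:
print(split_bitrate(5000, 4))   # -> (4000.0, 1000.0)
print(split_bitrate(10000, 4))  # -> (8000.0, 2000.0)
```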


References
[1] E. Bosc, F. Racapé, V. Jantet, P. Riou, M. Pressigout, L. Morin, and V. Jantet. A study of depth/texture bit-rate allocation in multi-view video plus depth compression. annals of telecommunications - annales des télécommunications, pages 1–11, 2013.

How to avoid short-term bandwidth fluctuations?

When estimating network bandwidth, short-term fluctuations should be smoothed out so that momentary dips or spikes do not trigger abrupt quality switches.

In [1], they have used an exponential weighted moving average method.
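An exponential weighted moving average takes only a few lines; a minimal sketch (the smoothing factor alpha and the sample values are illustrative, not from [1]):

```python
# Exponentially weighted moving average over bandwidth samples:
# each new sample contributes a fraction alpha to the estimate,
# damping the reaction to short-term fluctuations.

def ewma(samples, alpha=0.2):
    """Smooth a sequence of bandwidth samples (kbps)."""
    est = samples[0]
    for s in samples[1:]:
        est = alpha * s + (1 - alpha) * est
    return est

# A one-chunk dip to 1000 kbps moves the estimate only modestly,
# instead of tracking the raw 80% drop.
samples = [5000, 5200, 4900, 1000, 5100]
print(round(ewma(samples)))  # -> 4388
```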


References
[1] A. Hamza and M. Hefeeda. A DASH-based free-viewpoint video streaming system. In Proc. of the ACM Workshop on Network and Operating Systems Support for Digital Audio and Video, pages 55–60, March 2014.


DIBR for FVV

In [1], Depth-Image-Based Rendering (DIBR) is used to synthesize non-captured (virtual) views from reference views in a free-viewpoint application.

References
[1] S. Zinger, L. Do, and P. de With. Free-viewpoint depth image based rendering. Journal of Visual Communication and Image Representation, 21(5-6):533–541, 2010. 

Tools to generate DASH segments

The following tools can be used to generate DASH segments out of a video source:


  • GPAC Multimedia Framework [1]


References
[1] GPAC multimedia framework. http://gpac.wp.mines-telecom.fr/.

Free-viewpoint video streaming challenges

Free-viewpoint video (FVV) streaming systems face several challenges [1]:


  • Responsiveness. Users should be able to interact with the system in real-time. The delay between a request for viewpoint change and the rendering of the target view should be minimized. This includes network-related delays as well as processing delays.
  • Scalability. The system should be able to handle a large number of concurrent clients that are possibly viewing the scene from different angles.
  • Adaptability. The system should provide the best possible quality to heterogeneous clients while handling network dynamics, such as bandwidth variation.
  • Immersiveness. A user should be able to choose between a large number of viewpoints and transition smoothly between them in order to provide a truly immersive experience.


References
[1] A. Hamza and M. Hefeeda. A DASH-based free-viewpoint video streaming system. In Proc. of the ACM Workshop on Network and Operating Systems Support for Digital Audio and Video, pages 55–60, March 2014.