It’s been a while since I wrote this blog on OCS 2007 media traversal: http://communicationsserverteam.com/archive/2008/03/25/133.aspx.
I’ve since left Microsoft to join a Unified Communications consulting
company called Unify Square, but media traversal is still near and
dear. This blog describes some of the improvements in media traversal
that have been implemented in OCS 2007 R2.
Some things haven’t changed
The overall architecture of media endpoints using ICE and the STUN/TURN
capabilities of the A/V Edge server has not changed. Signaling is still
protected by TLS encryption, media is still protected by SRTP
encryption. STUN/TURN allocations against the A/V Edge are still
protected by a digest authentication mechanism whose password rotates
every eight hours, and obtaining this allocation password is still
protected within a TLS encrypted SIP SERVICE message. That said, a lot
of improvements have been made in OCS 2007 R2. Let’s take a look at
some of them.
Support of Early Media
In OCS 2007, negotiation of a media path (i.e. ICE connectivity checks)
started when the called party answered the call. Specifically, ICE
candidates were sent by the caller in the INVITE and by the callee in
the 200 OK. This resulted in a slight delay between the called party
answering and when media would actually flow. (The one exception to
this was outbound calls to PSTN. To support PSTN gateways that started
sending audio before the 200 OK, the mediation server would actually send
ICE candidates in a 180 RINGING in addition to the 200 OK. This
enabled a poor man’s version of early media where one-way audio could be
transferred from the mediation server to the calling endpoint before
the full ICE negotiation occurred, preventing any initial “Hello?” audio
from being clipped.)
OCS 2007 R2 endpoints support early media, a feature which enables
negotiation of media before the call is accepted by the called party.
This addresses the audio clipping issue and enables a number of other
scenarios such as playing custom ring back tones to the caller.
Practically speaking, this means that ICE must be negotiated before the
200 OK. What you’ll notice is that the called party will send back ICE
candidates in a 183 SESSION PROGRESS message. Under the covers, this
triggers a full ICE negotiation, enabling the media path to be ready the
instant the called party actually answers the call. (Note that the
called party still sends candidates in the 200 OK message and a final
ICE negotiation still happens, though this rarely results in a switch of
the media path.)
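To make the signaling difference concrete, here is a minimal Python sketch. It is not OC code: the response tuples, their ordering, and the function names are made up for illustration. All it does is scan the responses to an INVITE and report which one first carries ICE candidates.

```python
from typing import List, Optional, Tuple

# Each response is modelled as (status_code, reason, carries_ice_candidates).
# These tuples are illustrative only, not captured traffic.
Response = Tuple[int, str, bool]

OCS_2007_CALL: List[Response] = [
    (180, "Ringing", False),
    (200, "OK", True),                 # candidates only arrive on answer
]

OCS_2007_R2_CALL: List[Response] = [
    (180, "Ringing", False),
    (183, "Session Progress", True),   # early media: candidates before answer
    (200, "OK", True),                 # candidates are sent again on answer
]


def first_ice_answer(responses: List[Response]) -> Optional[Response]:
    """Return the first response that carries ICE candidates, if any."""
    for response in responses:
        if response[2]:
            return response
    return None


def negotiated_before_answer(responses: List[Response]) -> bool:
    """True when ICE candidates arrive in a provisional (1xx) response."""
    first = first_ice_answer(responses)
    return first is not None and 100 <= first[0] < 200
```

With the sample data, first_ice_answer returns the 200 OK for the OCS 2007 call and the 183 for the R2 call, which is exactly the window early media buys you.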
If a called user has multiple R2 endpoints registered, each will allocate
ICE candidates and negotiate an early media ICE path with the caller.
However, as soon as the caller receives an audio packet from one of the
dialed endpoints, it will stop listening on the other early media
paths. In theory, the media path could switch after the final ICE
negotiation occurs with the 200 OK. (For example, let’s say an incoming call is
set to simulring a user’s OC endpoint and his cell phone. The cell
phone system generates a custom ring back tone, but the user ends up
answering on OC.) Typically, the endpoint that sent early media audio
packets will be the same endpoint that actually answers the call and
sends the 200 OK.
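The "first audio packet wins" behavior can be modelled in a few lines. This is a sketch under my own assumptions, not the actual OC logic: the class name, the endpoint labels, and the rule that paths offered after the lock are ignored are all hypothetical.

```python
from typing import Optional, Set


class EarlyMediaSelector:
    """Tracks early media paths to forked endpoints (hypothetical model)."""

    def __init__(self) -> None:
        self.open_paths: Set[str] = set()
        self.locked_path: Optional[str] = None

    def add_path(self, endpoint: str) -> None:
        # One early media ICE path per called endpoint that sent a 183.
        # Assumption: paths that show up after the lock are ignored.
        if self.locked_path is None:
            self.open_paths.add(endpoint)

    def on_audio_packet(self, endpoint: str) -> bool:
        """Return True if the packet should be played to the caller."""
        if self.locked_path is None and endpoint in self.open_paths:
            # First audio wins: stop listening on every other path.
            self.locked_path = endpoint
            self.open_paths = {endpoint}
        return endpoint == self.locked_path

    def on_answer(self, endpoint: str) -> None:
        # The 200 OK triggers the final negotiation. The path switches if
        # a different endpoint answered than the one that sent early audio.
        self.locked_path = endpoint
        self.open_paths = {endpoint}
```

In the simulring example, the cell phone path locks first because of its ring back tone, and on_answer("oc") then moves the media path to OC when the user picks up there.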
App Sharing Use of ICE/STUN/TURN
OCS 2007 R2 introduces a new modality called App Sharing, built upon the
same RDP protocol used in Terminal Services. Though functionally
similar to the desktop sharing feature in Live Meeting, it functions as a
totally separate modality outside of a Live Meeting conference. For
app sharing sessions involving two OC endpoints, the app sharing media
stream flows point to point. For conferences that use app sharing or if
a CWA endpoint is involved, the media flows through the new app sharing
MCU. In either case, the same ICE/STUN/TURN mechanism used to
negotiate an audio and video path is also used to negotiate an app
sharing media path…with one key difference. Unlike audio and video, the
RDP protocol is not designed to be run over an unreliable transport
protocol like UDP. Therefore, the app sharing modality uses
ICE/STUN/TURN in a TCP-only mode. One interesting note is that in this
TCP-only mode, TCP candidates are actually supported on the endpoint
hosts, enabling a point to point TCP media stream. For voice and video,
only a point to point UDP stream is possible.
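As a rough illustration of the TCP-only rule, the sketch below filters a gathered candidate list by modality. The modality string, the Candidate type, and the sample addresses (taken from the documentation address ranges) are assumptions of mine; the real endpoints make this decision internally.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    transport: str   # "UDP" or "TCP"
    kind: str        # "host", "srflx" or "relay"
    address: str
    port: int


# Made-up candidates for illustration only.
GATHERED: List[Candidate] = [
    Candidate("UDP", "host", "192.0.2.10", 50000),
    Candidate("TCP", "host", "192.0.2.10", 50001),
    Candidate("UDP", "relay", "198.51.100.5", 50002),
    Candidate("TCP", "relay", "198.51.100.5", 50003),
]


def candidates_for(modality: str, gathered: List[Candidate]) -> List[Candidate]:
    """Filter candidates by modality (illustrative rule set, not OC code)."""
    if modality == "applicationsharing":
        # RDP needs a reliable transport, so only TCP candidates are usable.
        # TCP host candidates stay in, which allows a point to point stream.
        return [c for c in gathered if c.transport == "TCP"]
    # Audio and video: a point to point path is only possible over UDP,
    # so TCP host candidates are dropped. Other TCP candidates remain.
    return [c for c in gathered
            if not (c.transport == "TCP" and c.kind == "host")]
```

Calling candidates_for("applicationsharing", GATHERED) keeps only the two TCP entries, while candidates_for("audio", GATHERED) keeps everything except the TCP host candidate.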
Support of ICE version 19
In OCS 2007 R2, all endpoints support ICE version 19. In actuality, OCS
2007 R2 endpoints support both ICE version 19 and the legacy ICE version
6 implemented in OCS 2007. Full treatment of the differences between
these two versions is beyond the scope of this blog and probably not
something you’ll ever need to know, but let’s look at an SDP fragment
from an R2 OC client to get a sense for some of the key differences:
------=_NextPart_000_0149_01C9A22E.BDA43360
Content-Type: application/sdp
Content-Transfer-Encoding: 7bit
Content-Disposition: session; handling=optional; ms-proxy-2007fallback
v=0
o=- 0 0 IN IP4 192.168.5.150
s=session
c=IN IP4 192.168.5.150
b=CT:99980
t=0 0
m=audio 50010 RTP/AVP 114 111 112 115 116 4 8 0 97 13 118 101
k=base64:ROFyvlcWFwsPej5xrWlQj+PFsw9Uyy0OSHoFv62mLTPvXdpnn5XvqcxI556k
a=candidate:Y821qEyRKswvPiFeMBgkQBTTL0vJDm//txizLAGyhKQ 1 o4IBYszjQDYWPTb58I7szQ UDP 0.830 192.168.5.150 50010
a=candidate:Y821qEyRKswvPiFeMBgkQBTTL0vJDm//txizLAGyhKQ 2 o4IBYszjQDYWPTb58I7szQ UDP 0.830 192.168.5.150 50008
a=candidate:VS7Zjeu4CJwh6kMO3xTuwAOhW6gGpoC9NpqEv7S8geA 1 9cJV/DeRmf+hwEws92rRNQ TCP 0.190 64.105.253.213 56653
a=candidate:VS7Zjeu4CJwh6kMO3xTuwAOhW6gGpoC9NpqEv7S8geA 2 9cJV/DeRmf+hwEws92rRNQ TCP 0.190 64.105.253.213 56653
a=candidate:cnsB1P6I85tVDpl/UgjTWRl8rFOYSkXOa8nPvnl2RJU 1 +Mkh11586TV6kN8IpnLVMQ UDP 0.490 64.105.253.213 58140
a=candidate:cnsB1P6I85tVDpl/UgjTWRl8rFOYSkXOa8nPvnl2RJU 2 +Mkh11586TV6kN8IpnLVMQ UDP 0.490 64.105.253.213 55208
a=candidate:/YhjMGvsupfnJrUraPnPUwnSUV3IsMpMLHwZIqW4aQI 1 Fvf+CecTZF6sVN/Svuunrg TCP 0.250 10.0.0.2 50014
a=candidate:/YhjMGvsupfnJrUraPnPUwnSUV3IsMpMLHwZIqW4aQI 2 Fvf+CecTZF6sVN/Svuunrg TCP 0.250 10.0.0.2 50014
a=candidate:VCZf8gadJG6G8Pb3xS7bj/4CVK/P+GeIhuew2tHBy9k 1 DIX0ZzFlrnlzdLGqfqWB0w UDP 0.550 10.0.0.2 50005
a=candidate:VCZf8gadJG6G8Pb3xS7bj/4CVK/P+GeIhuew2tHBy9k 2 DIX0ZzFlrnlzdLGqfqWB0w UDP 0.550 10.0.0.2 50017
a=cryptoscale:1 client AES_CM_128_HMAC_SHA1_80 inline:yEiOl3HA+vbDHvqSmvplV9BGpfg19jSxwjFElAPz|2^31|1:1
a=crypto:2 AES_CM_128_HMAC_SHA1_80 inline:HdnKHORdSJgC/rcYZ1y3uMRbKvybFruyFiD+UkoZ|2^31|1:1
a=maxptime:200
a=rtcp:50008
a=rtpmap:114 x-msrta/16000
a=fmtp:114 bitrate=29000
a=rtpmap:111 SIREN/16000
a=fmtp:111 bitrate=16000
a=rtpmap:112 G7221/16000
a=fmtp:112 bitrate=24000
a=rtpmap:115 x-msrta/8000
a=fmtp:115 bitrate=11800
a=rtpmap:116 AAL2-G726-32/8000
a=rtpmap:4 G723/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:97 RED/8000
a=rtpmap:13 CN/8000
a=rtpmap:118 CN/16000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=encryption:required
------=_NextPart_000_0149_01C9A22E.BDA43360
Content-Type: application/sdp
Content-Transfer-Encoding: 7bit
Content-Disposition: session; handling=optional
v=0
o=- 0 0 IN IP4 192.168.5.150
s=session
c=IN IP4 192.168.5.150
b=CT:99980
t=0 0
m=audio 50003 RTP/AVP 114 111 112 115 116 4 8 0 97 13 118 101
k=base64:ROFyvlcWFwsPej5xrWlQj+PFsw9Uyy0OSHoFv62mLTPvXdpnn5XvqcxI556k
a=ice-ufrag:VXim
a=ice-pwd:OKEB+HhXDUoNP4lrx8AH+syY
a=candidate:1 1 UDP 2130706431 192.168.5.150 50003 typ host
a=candidate:1 2 UDP 2130705918 192.168.5.150 50006 typ host
a=candidate:2 1 TCP-PASS 6556159 64.105.253.213 53119 typ relay raddr 64.105.253.213 rport 53119
a=candidate:2 2 TCP-PASS 6556158 64.105.253.213 53119 typ relay raddr 64.105.253.213 rport 53119
a=candidate:3 1 UDP 16648703 64.105.253.213 54183 typ relay raddr 64.105.253.213 rport 54183
a=candidate:3 2 UDP 16648702 64.105.253.213 51646 typ relay raddr 64.105.253.213 rport 51646
a=candidate:4 1 TCP-ACT 7076863 64.105.253.213 53119 typ relay raddr 64.105.253.213 rport 53119
a=candidate:4 2 TCP-ACT 7076350 64.105.253.213 53119 typ relay raddr 64.105.253.213 rport 53119
a=candidate:5 1 TCP-ACT 1684797951 10.0.0.2 50001 typ srflx raddr 192.168.5.150 rport 50001
a=candidate:5 2 TCP-ACT 1684797438 10.0.0.2 50001 typ srflx raddr 192.168.5.150 rport 50001
a=candidate:6 1 UDP 1694234623 10.0.0.2 50011 typ srflx raddr 192.168.5.150 rport 50011
a=candidate:6 2 UDP 1694234110 10.0.0.2 50009 typ srflx raddr 192.168.5.150 rport 50009
a=cryptoscale:1 client AES_CM_128_HMAC_SHA1_80 inline:yEiOl3HA+vbDHvqSmvplV9BGpfg19jSxwjFElAPz|2^31|1:1
a=crypto:2 AES_CM_128_HMAC_SHA1_80 inline:HdnKHORdSJgC/rcYZ1y3uMRbKvybFruyFiD+UkoZ|2^31|1:1
a=maxptime:200
a=rtcp:50006
a=rtpmap:114 x-msrta/16000
a=fmtp:114 bitrate=29000
a=rtpmap:111 SIREN/16000
a=fmtp:111 bitrate=16000
a=rtpmap:112 G7221/16000
a=fmtp:112 bitrate=24000
a=rtpmap:115 x-msrta/8000
a=fmtp:115 bitrate=11800
a=rtpmap:116 AAL2-G726-32/8000
a=rtpmap:4 G723/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:97 RED/8000
a=rtpmap:13 CN/8000
a=rtpmap:118 CN/16000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=encryption:required
------=_NextPart_000_0149_01C9A22E.BDA43360--
The first thing you notice is that this contains two complete sets of
SDP. The first SDP block contains a version 6 ICE candidate list and
the second contains one for version 19. You can see that the
“ms-proxy-2007fallback” string identifies which one is the legacy
block. This is called a multipart SDP and explains how OCS 2007 R2
endpoints are still able to negotiate media with Exchange 2007 UM and
other legacy OCS 2007 endpoints. If the caller is R2, both SDPs are
offered and the legacy endpoint responds with an ICE version 6 SDP only.
This tells the R2 endpoint to go into legacy mode. If the caller is
legacy and the callee is R2, the offer will contain just a legacy ICE SDP,
which tells the callee to respond with a legacy ICE SDP only. Keep in mind
that because app sharing is a new feature of OCS 2007 R2, you will never
see any app sharing candidate lists or media offer in a legacy SDP
block.
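If you ever need to pull the two blocks apart in a trace, something like the following Python sketch would do it. It is only an illustration: the helper names are mine, the sample body is a heavily trimmed copy of the capture above, and I treat any line starting with "Content-" as a part header rather than implementing full MIME parsing.

```python
from typing import Dict, List, Optional, Tuple

Part = Tuple[Dict[str, str], List[str]]   # (headers, SDP lines)

DELIMITER = "------=_NextPart_000_0149_01C9A22E.BDA43360"

# Heavily trimmed version of the body shown above (most lines omitted).
BODY = "\n".join([
    DELIMITER,
    "Content-Type: application/sdp",
    "Content-Disposition: session; handling=optional; ms-proxy-2007fallback",
    "v=0",
    "m=audio 50010 RTP/AVP 114 111 112 115 116 4 8 0 97 13 118 101",
    DELIMITER,
    "Content-Type: application/sdp",
    "Content-Disposition: session; handling=optional",
    "v=0",
    "m=audio 50003 RTP/AVP 114 111 112 115 116 4 8 0 97 13 118 101",
    "a=ice-ufrag:VXim",
    DELIMITER + "--",
])


def split_parts(body: str, delimiter: str) -> List[Part]:
    """Split a multipart body into (headers, sdp_lines) tuples."""
    parts: List[Part] = []
    current: Optional[Part] = None
    for line in body.splitlines():
        if line.startswith(delimiter):
            if current is not None:
                parts.append(current)
            # The closing delimiter ends with "--" and starts no new part.
            closing = line.rstrip() == delimiter + "--"
            current = None if closing else ({}, [])
        elif current is not None and line.strip():
            headers, sdp = current
            if line.startswith("Content-"):
                name, _, value = line.partition(":")
                headers[name] = value.strip()
            else:
                sdp.append(line)
    if current is not None:          # body without a closing delimiter
        parts.append(current)
    return parts


def pick_sdp(parts: List[Part], want_legacy: bool) -> Optional[List[str]]:
    """Return the legacy (version 6) or the version 19 SDP block."""
    for headers, sdp in parts:
        disposition = headers.get("Content-Disposition", "")
        if ("ms-proxy-2007fallback" in disposition) == want_legacy:
            return sdp
    return None
```

pick_sdp(split_parts(BODY, DELIMITER), want_legacy=True) hands back the block a legacy endpoint would answer, and want_legacy=False returns the version 19 block with the ice-ufrag line.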
You’ll also notice the version 19 candidate list is shorter and more
readable. Rather than encoding a unique username/password per
candidate, a common one is used for the entire set of candidates. The
type of ICE candidate is also encoded, where HOST is a candidate on the
endpoint itself, SRFLX (short for Server Reflexive) is a STUN candidate
on the NAT, and RELAY a candidate on the A/V Edge. You’ll also notice
that TCP candidates are denoted as ACT (Active) or PASS (Passive),
indicating whether the candidate will initiate or receive connectivity
check requests. In OCS 2007, TCP A/V Edge candidates behaved as both active
and passive, but TCP NAT candidates were passive only. However, this was
not apparent from looking at the candidate list SDP. Another
difference is the priority encoding. ICE version 6 used a three-digit
decimal to encode the priority and required floating point math to
compute the combined priority of a candidate pair. In ICE version 19,
the priority is now an integer, which makes the computation less
intensive.
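To show how regular the version 19 lines are, here is a small Python sketch that parses one and computes a pair priority with integer math only. The function names are mine. The pair priority formula is the one from the published ICE specification (RFC 5245); I am assuming, not asserting, that the OCS implementation computes it the same way.

```python
from typing import NamedTuple, Optional

PREFIX = "a=candidate:"


class Candidate(NamedTuple):
    foundation: str
    component: int
    transport: str            # UDP, TCP-ACT or TCP-PASS
    priority: int
    address: str
    port: int
    kind: str                 # host, srflx or relay
    raddr: Optional[str]
    rport: Optional[int]


def parse_candidate(line: str) -> Candidate:
    """Parse one ICE version 19 style a=candidate line."""
    if not line.startswith(PREFIX):
        raise ValueError("not a candidate line: " + line)
    fields = line[len(PREFIX):].split()
    foundation, component, transport, priority, address, port = fields[:6]
    # The rest comes in name/value pairs: typ, raddr, rport.
    extras = dict(zip(fields[6::2], fields[7::2]))
    rport = extras.get("rport")
    return Candidate(foundation, int(component), transport, int(priority),
                     address, int(port), extras["typ"],
                     extras.get("raddr"), int(rport) if rport else None)


def pair_priority(controlling: int, controlled: int) -> int:
    """Candidate pair priority per RFC 5245: integers only, no floats."""
    low = min(controlling, controlled)
    high = max(controlling, controlled)
    return (1 << 32) * low + 2 * high + (1 if controlling > controlled else 0)
```

Feeding it the "a=candidate:6 1 UDP 1694234623 ..." line from the SDP above yields kind "srflx" with raddr 192.168.5.150 and rport 50011.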
Again, the details of the SDP differences between the two ICE versions
are not terribly important. Just remember the multipart nature of the
SDP and how an R2 endpoint negotiates with legacy ICE endpoints.
Differences in A/V Edge 50,000 port range requirement
In OCS 2007, the external side of the A/V Edge server role required
ports 50,000-59,999 to be open for UDP and TCP in the inbound and
outbound direction. Although this was a secure solution (see my
original blog post), networking administrators perceived this to be a
security threat and were very resistant to deploying the A/V Edge role.
To mitigate this deployment hurdle, OCS 2007 R2 reduces the requirement
to just allowing ports 50,000-59,999 for TCP outbound only. Moreover,
the product documentation now states that this outbound TCP port support
is only required to support federation with other OCS 2007 R2 environments.
To support remote users only, opening ports UDP 3478 and TCP 443 is
sufficient. (This remote-only mode worked in OCS 2007, but was not
officially supported.) What changed in the A/V Edge? Well, the A/V
Edge now supports federation over a “tunneled” link.
Let’s say an R2 OC endpoint within the Contoso company network calls an
R2 OC endpoint within the Litware company network. Both endpoints still
advertise allocated ports in the 50,000-59,999 range in their candidate
lists. Now let’s say connectivity checks are happening and the Contoso
R2 A/V Edge receives a UDP STUN connectivity check destined for the
Litware A/V Edge. Instead of sending that to the Litware A/V Edge using
a source and destination port in the 50,000-59,999 range, the Contoso
A/V Edge actually encapsulates this connectivity check in a new TURN
tunnel message and sends it to the Litware A/V Edge using a UDP source
and destination port of 3478. Keep in mind that the intended source and
destination IP/port numbers are passed within this tunnel packet. When
the Litware R2 A/V Edge receives this tunnel packet, it unpacks the
message, looks at the intended source/destination IP/port info, and
treats the packet as if it came to the destination IP/port from the
source IP/port.
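Conceptually, the tunnel is just an envelope around the original packet. The Python sketch below models that idea only. The TunnelPacket type is a hypothetical stand-in, not the actual TURN tunnel message format, and the IP addresses are documentation addresses I picked for the example.

```python
from dataclasses import dataclass
from typing import Tuple

Endpoint = Tuple[str, int]   # (IP address, port)
TUNNEL_PORT = 3478           # UDP port used between the two A/V Edges


@dataclass(frozen=True)
class Packet:
    source: Endpoint
    destination: Endpoint
    payload: bytes


@dataclass(frozen=True)
class TunnelPacket:
    """Hypothetical stand-in for the TURN tunnel message."""
    outer_source: Endpoint
    outer_destination: Endpoint
    inner: Packet             # carries the intended source and destination


def encapsulate(check: Packet, local_edge_ip: str,
                remote_edge_ip: str) -> TunnelPacket:
    """Wrap a connectivity check so it travels from 3478 to 3478."""
    return TunnelPacket((local_edge_ip, TUNNEL_PORT),
                        (remote_edge_ip, TUNNEL_PORT), check)


def unpack(tunnel: TunnelPacket) -> Packet:
    """Recover the check; the receiver treats it as if it arrived directly."""
    return tunnel.inner


# Contoso's Edge wants to reach an allocated port on Litware's Edge.
check = Packet(("198.51.100.10", 52000), ("203.0.113.20", 53000),
               b"connectivity-check")
tunneled = encapsulate(check, "198.51.100.10", "203.0.113.20")
```

On the wire only port 3478 shows up, but unpack(tunneled) still reports the 52000 to 53000 flow that the candidate lists advertised.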
The idea is that conveying the knowledge of the intended source and
destination IP/port for this connectivity check provides the equivalent
security as actually sending the connectivity check along that route.
This explains why UDP ports in the 50,000-59,999 range are no longer
needed. Why is TCP needed in the outbound direction only? It turns out
TCP also supports the same tunneling mechanism. However, TCP’s
connection oriented nature means problems can arise if the listening
port is used as the source port when opening a TCP connection. So in
the connectivity check example used above, the Contoso A/V Edge opens a
TCP connection to port 443 on Litware’s A/V Edge, choosing an ephemeral
source port in the 50,000-59,999 port range.
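Picking a source port from a fixed range before connecting is ordinary socket programming. Here is a minimal Python sketch of the idea using the standard socket module. The function names, the random retry strategy, and the attempt count are my own choices and say nothing about how the A/V Edge actually selects its port.

```python
import random
import socket

PORT_RANGE = (50000, 59999)


def bind_source_port(sock: socket.socket, low: int = PORT_RANGE[0],
                     high: int = PORT_RANGE[1], attempts: int = 50) -> int:
    """Bind sock to a free local port in [low, high] and return the port."""
    for _ in range(attempts):
        port = random.randint(low, high)
        try:
            sock.bind(("", port))
            return port
        except OSError:
            continue   # port busy, try another one
    raise OSError("no free source port found in range")


def open_tunnel_connection(remote_edge: str,
                           remote_port: int = 443) -> socket.socket:
    """Connect to the remote edge on TCP 443 from a 50,000 range port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        bind_source_port(sock)
        sock.connect((remote_edge, remote_port))
    except OSError:
        sock.close()
        raise
    return sock
```

From a firewall's point of view, the result is an outbound TCP connection whose source port is in 50,000-59,999 and whose destination port is 443, which is why only the outbound TCP rule is needed.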
Supporting federation with legacy A/V Edge servers
The example above works for two R2 OCS deployments. What would happen
if Litware was still on OCS 2007? Again, both OC endpoints will
advertise A/V Edge candidates in the 50,000-59,999 port range. In order
for connectivity to succeed, Contoso’s R2 A/V Edge must be able to send
a connectivity check to Litware’s A/V Edge and vice versa. In the
former direction, Contoso doesn’t know that Litware’s A/V Edge is only on OCS
2007, so it first tries to send the tunneled connectivity check packet. Because
Litware’s A/V Edge is legacy, it drops these packets. Hearing no
response, the Contoso A/V Edge will then flip to direct mode where it
will send the packet using a source and destination port in the
50,000-59,999 port range. Similarly in the other direction, the Litware
A/V Edge has no ability to send a tunneled connectivity check, so it
sends directly in the 50,000-59,999 port range as well. The same logic
applies to TCP connectivity checks. You can now see why opening the
50,000-59,999 port range for UDP and TCP in the inbound and outbound
direction is required to support federation with legacy OCS 2007 A/V
Edge deployments.
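The "try the tunnel, then flip to direct" behavior boils down to a simple fallback. The sketch below is illustrative only: the function name, the callable signatures, and the retry count of 3 are assumptions, since the real retry and timeout values are not something I can quote here.

```python
from typing import Callable, Optional

# A sender returns the peer's response, or None if nothing came back.
Sender = Callable[[bytes], Optional[bytes]]


def send_connectivity_check(check: bytes, send_tunneled: Sender,
                            send_direct: Sender, retries: int = 3) -> str:
    """Try the tunneled path first, then fall back to direct mode."""
    for _ in range(retries):
        if send_tunneled(check) is not None:
            return "tunneled"
    # No answer over the tunnel: assume a legacy peer and send directly
    # using ports in the 50,000-59,999 range.
    if send_direct(check) is not None:
        return "direct"
    return "failed"


# Fake peers for illustration: a legacy Edge silently drops tunnel packets.
def legacy_tunnel(_check: bytes) -> Optional[bytes]:
    return None


def r2_tunnel(_check: bytes) -> Optional[bytes]:
    return b"response"


def direct_path(_check: bytes) -> Optional[bytes]:
    return b"response"
```

Against the fake legacy peer the function returns "direct", and against the fake R2 peer it returns "tunneled" without ever touching the 50,000 range.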
Port Range Implications
Supporting two versions of ICE in an INVITE does have implications
on the number of ports allocated at the start of a call. In the SDP
snippet above, you’ll notice the version 6 ICE candidates are totally
different than the version 19 ICE candidates, meaning two full candidate
sets are allocated instead of just one set in OCS 2007. Early media
could also have an impact on the number of allocated ports if a called
user has multiple points of presence. Each called endpoint will
allocate a set of candidates and perform a full ICE negotiation prior to
the call being answered. The fact that application sharing also uses ICE
can further increase the number of ports allocated.
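You can check the "two full candidate sets" claim against the SDP snippet itself. The Python sketch below lists the transport, address, and port of every candidate line from both blocks (TCP-ACT and TCP-PASS are folded into plain TCP, which is my simplification) and counts the distinct allocations.

```python
from typing import List, Set, Tuple

Allocation = Tuple[str, str, int]   # (transport, address, port)

# Transport/address/port copied from the version 6 block above.
V6: List[Allocation] = [
    ("UDP", "192.168.5.150", 50010), ("UDP", "192.168.5.150", 50008),
    ("TCP", "64.105.253.213", 56653), ("TCP", "64.105.253.213", 56653),
    ("UDP", "64.105.253.213", 58140), ("UDP", "64.105.253.213", 55208),
    ("TCP", "10.0.0.2", 50014), ("TCP", "10.0.0.2", 50014),
    ("UDP", "10.0.0.2", 50005), ("UDP", "10.0.0.2", 50017),
]

# Same for the version 19 block, with TCP-ACT/TCP-PASS folded into TCP.
V19: List[Allocation] = [
    ("UDP", "192.168.5.150", 50003), ("UDP", "192.168.5.150", 50006),
    ("TCP", "64.105.253.213", 53119), ("TCP", "64.105.253.213", 53119),
    ("UDP", "64.105.253.213", 54183), ("UDP", "64.105.253.213", 51646),
    ("TCP", "64.105.253.213", 53119), ("TCP", "64.105.253.213", 53119),
    ("TCP", "10.0.0.2", 50001), ("TCP", "10.0.0.2", 50001),
    ("UDP", "10.0.0.2", 50011), ("UDP", "10.0.0.2", 50009),
]


def distinct(allocations: List[Allocation]) -> Set[Allocation]:
    """Collapse candidate lines that share one transport/address/port."""
    return set(allocations)


shared = distinct(V6) & distinct(V19)
total = distinct(V6) | distinct(V19)
```

For this capture, each block accounts for 8 distinct allocations, none are shared, and the single audio offer therefore ties up 16 instead of 8.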
The majority of these ports are short lived and will be de-allocated
within 10 seconds of the call being answered. The only ports that
remain for the duration of a call are actually used to send and receive
media. Nonetheless, this increased port usage at the start of a call
could be an issue for enterprises that have narrowed the allowed port
range of their endpoints or reduced the number of ports in the A/V
Edge’s 50,000 port range. For these reasons, the OCS team recommends
that the media port range for R2 Office Communicator clients span at least
40 ports, twice the recommendation provided for OCS 2007.
Conclusion
Although the fundamental architecture of media traversal remains the
same in OCS 2007 R2, a number of enhancements have been made. Key impacts
include: faster negotiation of the media stream through early media ICE
negotiations, leveraging ICE/STUN/TURN for new modalities such as
application sharing, and easing the port range requirements on the A/V
Edge server through a tunneled federation mode. This revised
implementation of ICE/STUN/TURN will serve as a great foundation for
enabling connectivity of new media scenarios in future versions of the
Microsoft Unified Communications product line.
Alan Shen | Director