SIP: Session Initiation Protocol
The Session Initiation Protocol (SIP)
is "an application-layer control (signaling) protocol for creating,
modifying, and terminating sessions with one or more participants.
These sessions include Internet telephone calls, multimedia distribution,
and multimedia conferences" (RFC 3261). It was originally
designed by Henning Schulzrinne (Columbia University) and Mark Handley
(UCL) starting in 1996. The latest version of the specification
is RFC 3261 from the IETF SIP Working Group. In November 2000, SIP
was accepted as a 3GPP signaling protocol and permanent element
of the IMS architecture. It is widely used as a signaling protocol
for Voice over IP, along with H.323 and others.
SIP has the following characteristics:
-Transport-independent: SIP can be used with UDP, TCP, ATM and so on.
-Text-based, so humans can read SIP messages.
Protocol design
SIP clients use TCP or UDP (typically on port 5060) to connect to
SIP servers and other SIP endpoints. SIP is primarily used in setting
up and tearing down voice or video calls. However, it can be used
in any application where session initiation is a requirement. These
include event subscription and notification, terminal mobility and
so on. A large number of SIP-related RFCs define
behavior for such applications. The voice and video media itself is
carried by separate protocols, typically RTP.
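Because SIP is text-based, a request is just a few lines of text handed
to a UDP or TCP socket. The following is a minimal sketch, not a
complete user agent: the helper name build_request, the example.com
and example.org addresses, and the 192.0.2.10 host are assumptions made
for illustration, and only a handful of the headers RFC 3261 expects in
a request are produced.

```python
import uuid


def build_request(method, target_uri, from_uri, to_uri, local_host,
                  local_port=5060, transport="UDP"):
    """Return a bodiless SIP request as bytes (illustrative helper)."""
    # Branch values start with the RFC 3261 magic cookie "z9hG4bK".
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    lines = [
        f"{method} {target_uri} SIP/2.0",
        f"Via: SIP/2.0/{transport} {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{from_uri}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{to_uri}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_host}",
        f"CSeq: 1 {method}",
        "Content-Length: 0",
    ]
    # Every header line ends with CRLF; an empty line ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request(
    "OPTIONS",
    "sip:bob@example.com",      # hypothetical target
    "sip:alice@example.org",    # hypothetical caller
    "sip:bob@example.com",
    "192.0.2.10",               # documentation-range address
)
print(request.decode("ascii"))
```

The resulting bytes could be passed to a socket bound for port 5060;
the network step is left out so the sketch runs without a SIP server.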
A motivating goal for SIP was to provide a signalling
and call setup protocol for IP-based communications that can support
a superset of the call processing functions and features present
in the public switched telephone network (PSTN). SIP by itself does
not define these features; rather, its focus is call-setup and signalling.
However, it has been designed to enable the building of such features
in network elements known as Proxy Servers and User Agents. These
are features that permit familiar telephone-like operations: dialing
a number, causing a phone to ring, hearing ringback tones or a busy
signal. Implementation and terminology are different in the SIP
world but to the end-user, the behavior is similar.
SIP-enabled telephony networks can also implement
many of the more advanced call processing features present in Signalling
System 7 (SS7), though the two protocols themselves are very different.
SS7 is a highly centralized protocol, characterized by a highly
complex central network architecture and dumb endpoints (traditional
telephone handsets). SIP is a peer-to-peer protocol. As such it
requires only a very simple (and thus highly scalable) core network
with intelligence distributed to the network edge, embedded in endpoints
(terminating devices built in either hardware or software). SIP
features are implemented in the communicating endpoints (i.e. at
the edge of the network) as opposed to traditional SS7 features,
which are implemented in the network.
Although many other VoIP signalling protocols exist,
SIP is characterized by its proponents as having roots in the IP
community rather than the telecom industry. SIP has been standardized
and governed primarily by the IETF while the H.323 VoIP protocol
has been traditionally more associated with the ITU. However, the
two organizations have endorsed both protocols in some fashion.
SIP works in concert with several other protocols
and is only involved in the signalling portion of a communication
session. SIP acts as a carrier for the Session Description Protocol
(SDP), which describes the media content of the session, e.g. which
addresses and ports to use and which codecs are offered. In typical use, SIP "sessions"
are simply packet streams of the Real-time Transport Protocol (RTP).
RTP is the carrier for the actual voice or video content itself.
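To make the role of SDP concrete, the sketch below pulls the media
address, port and codec names out of a small SDP body of the kind a SIP
INVITE might carry. The function name parse_sdp and the sample session
(addresses, port 49170) are assumptions for illustration; only the
c=, m= and a=rtpmap lines are interpreted, and everything else is
ignored.

```python
EXAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
]) + "\r\n"


def parse_sdp(text):
    """Extract connection address, media ports and rtpmap codecs."""
    session = {"connection": None, "media": []}
    for line in text.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue  # not a "<type>=<value>" line
        kind, value = line[0], line[2:].strip()
        if kind == "c":
            address = value.split()[2]
            if session["media"]:
                # A c= line after an m= line applies to that media only.
                session["media"][-1]["address"] = address
            else:
                session["connection"] = address
        elif kind == "m":
            media, port, proto, *formats = value.split()
            session["media"].append({
                "type": media,
                "port": int(port.split("/")[0]),  # tolerate "port/count"
                "proto": proto,
                "formats": formats,
                "codecs": {},
                "address": session["connection"],
            })
        elif kind == "a" and value.startswith("rtpmap:") and session["media"]:
            payload_type, encoding = value[len("rtpmap:"):].split(None, 1)
            session["media"][-1]["codecs"][payload_type] = encoding
    return session


parsed = parse_sdp(EXAMPLE_SDP)
audio = parsed["media"][0]
print(audio["address"], audio["port"], audio["codecs"])
```

The endpoint that receives this description would send its RTP audio
to the listed address and port using one of the listed codecs.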
The first proposed standard version (SIP 2.0) was
defined in RFC 2543. The protocol was revised and clarified in RFC 3261,
which obsoletes RFC 2543, although many implementations are still using
interim draft versions. Note that the version number remains 2.0.
SIP is similar to HTTP and shares some of its design
principles: it is human-readable and structured as requests and responses.
SIP shares many HTTP status codes, such as the familiar "404 Not
Found". SIP proponents also claim it to be simpler than H.323. However,
some would counter that while SIP originally had a goal of simplicity,
in its current state it has become as complex as H.323. Others would
argue that SIP is a stateless protocol, hence making it possible
to easily implement failover and other features that are difficult
in stateful protocols such as H.323. SIP and H.323 are not limited
to voice communication but can mediate any kind of communication
session from voice to video or future, unrealized applications.
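The HTTP-like structure described above can be seen by parsing a
message: a start line, header lines, an empty line, then an optional
body. This is a simplified sketch under stated assumptions: the name
parse_sip_message is hypothetical, and header folding, compact header
names and repeated headers are not handled.

```python
def parse_sip_message(raw):
    """Split a SIP message into start line, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    if start_line.startswith("SIP/"):
        # Response: "SIP/2.0 <status code> <reason phrase>"
        version, code, reason = start_line.split(" ", 2)
        return {"kind": "response", "version": version,
                "status": int(code), "reason": reason,
                "headers": headers, "body": body}
    # Request: "<method> <request URI> SIP/2.0"
    method, uri, version = start_line.split(" ", 2)
    return {"kind": "request", "method": method, "uri": uri,
            "version": version, "headers": headers, "body": body}


response = parse_sip_message(
    "SIP/2.0 404 Not Found\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
invite = parse_sip_message(
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
print(response["status"], response["reason"])
print(invite["method"], invite["uri"])
```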
SIP network elements
Hardware endpoints, devices with the look, feel, and shape of a
traditional telephone that use SIP and RTP for communication,
are commercially available from several vendors. Some of these
can use E.164 Number Mapping (ENUM) or DUNDi to translate existing
phone numbers to SIP addresses, so calls to other SIP users can
bypass the telephone network, even though the service provider
might normally act as a gateway to the PSTN for traditional
phone numbers (and charge for it). Today, software SIP endpoints
are common.
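The first step of the ENUM translation mentioned above is mechanical:
the digits of the phone number are reversed, separated by dots, and
placed under the e164.arpa domain. The sketch below shows only that
step; the function name enum_domain is an assumption, and the DNS
query for NAPTR records that would follow (and that yields the SIP
address) is not performed.

```python
def enum_domain(e164_number, suffix="e164.arpa"):
    """Turn an E.164 phone number into its ENUM domain name."""
    # Keep only the digits, dropping "+", spaces and punctuation.
    digits = [ch for ch in e164_number if ch in "0123456789"]
    if not digits:
        raise ValueError("no digits in phone number")
    # Reverse the digits, join them with dots, append the ENUM suffix.
    return ".".join(reversed(digits)) + "." + suffix


print(enum_domain("+44 20 7946 0148"))
# prints 8.4.1.0.6.4.9.7.0.2.4.4.e164.arpa
```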
SIP also requires proxy and registrar network elements
to work as a practical service. Although two SIP endpoints can communicate
without any intervening SIP infrastructure, which is why the protocol
is described as peer-to-peer, this approach is impractical for a
public service. There are various implementations that can act as
proxy and registrar.
From the RFCs:
"SIP makes use of elements called proxy servers
to help route requests to the user's current location, authenticate
and authorize users for services, implement provider call-routing
policies, and provide features to users."
"SIP also provides a registration function that allows users
to upload their current locations for use by proxy servers."
"Since registrations play an important role in SIP, a User
Agent Server that handles a REGISTER is given the special name registrar."
"It is an important concept that the distinction between types
of SIP servers is logical, not physical."
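The registration function quoted above amounts to a table that maps a
user's public address to the contact addresses where the user can
currently be reached, each with an expiry time. The following toy
sketch illustrates the idea only: the class name LocationService and
its methods are hypothetical, storage is in memory, and no
authentication or SIP message handling is shown.

```python
import time


class LocationService:
    """In-memory bindings from a public SIP address to contact URIs."""

    def __init__(self):
        self._bindings = {}  # public address -> {contact URI: expiry time}

    def register(self, address, contact, expires=3600, now=None):
        """Store a binding, as a registrar would on receiving REGISTER."""
        now = time.time() if now is None else now
        contacts = self._bindings.setdefault(address, {})
        if expires <= 0:
            contacts.pop(contact, None)  # expiry of 0 removes the binding
        else:
            contacts[contact] = now + expires

    def lookup(self, address, now=None):
        """Return unexpired contacts, as a proxy would when routing."""
        now = time.time() if now is None else now
        contacts = self._bindings.get(address, {})
        return sorted(c for c, expiry in contacts.items() if expiry > now)


service = LocationService()
service.register("sip:bob@example.com", "sip:bob@192.0.2.4:5060",
                 expires=60, now=1000)
print(service.lookup("sip:bob@example.com", now=1030))
print(service.lookup("sip:bob@example.com", now=1061))
```

Keeping the registrar and proxy roles as two methods on one object
mirrors the point that the distinction between server types is
logical, not physical.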
Instant messaging (IM) and presence
A standard instant messaging protocol based on SIP, called SIMPLE,
has been proposed and is under development. SIMPLE can also carry
presence information, conveying a person's willingness and ability
to engage in communications. Presence information is most recognizable
today as buddy status in IM clients such as Yahoo! Messenger, AIM,
Skype, or the open standard XMPP.
Some efforts have been made to integrate SIP-based
VoIP with the XMPP specification used by Jabber. Most notably Google
Talk, which extends XMPP to support voice, plans to integrate SIP.
Google's XMPP extension is called Jingle and, like SIP, it acts
as a Session Description Protocol carrier.
SIP itself defines a method of passing instant messages
between endpoints, similar to SMS messages. This is not generally
supported by commercial operators.
Commercial applications
Firewalls often block UDP, the transport normally used for media packets.
One way around this is to tunnel the media packets within TCP or HTTP
to a relay in order to provide NAT and firewall traversal.
This solution uses additional functionality in conjunction with
SIP, and packages the media packets into a TCP stream which is then
sent to the relay. The relay then extracts the packets and sends
them on to the other endpoint. If the other endpoint is behind a
symmetric NAT, or a corporate firewall that does not allow VoIP
traffic, the relay transfers the packets to another tunnel.
One disadvantage of this approach is that TCP was not designed for
real-time traffic such as voice, so an optimized form of the protocol
is sometimes used.
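Packaging media packets into a TCP stream requires some way for the
relay to find the packet boundaries again, since TCP delivers a
continuous byte stream. The text above does not say which framing is
used; the sketch below assumes a 2-byte big-endian length prefix before
each packet, which is one common choice. The function names frame and
deframe are hypothetical.

```python
import struct


def frame(packet):
    """Prefix one media packet with its length for sending over TCP."""
    if len(packet) > 0xFFFF:
        raise ValueError("packet too large for a 16-bit length prefix")
    return struct.pack("!H", len(packet)) + packet


def deframe(buffer):
    """Split received bytes into complete packets plus leftover bytes."""
    packets = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from("!H", buffer, offset)
        if len(buffer) - offset - 2 < length:
            break  # the rest of this packet has not arrived yet
        start = offset + 2
        packets.append(bytes(buffer[start:start + length]))
        offset = start + length
    return packets, bytes(buffer[offset:])


# Two whole packets followed by the first bytes of a third.
stream = frame(b"abc") + frame(b"") + frame(b"hello")[:4]
packets, leftover = deframe(stream)
print(packets, leftover)
```

The relay would keep the leftover bytes and prepend them to the next
data it reads from the connection before calling deframe again.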
As envisioned by its originators, SIP's peer-to-peer
nature does not enable network-provided services. For example, the
network cannot easily support legal interception of calls (referred
to in the United States by the law governing wiretaps, CALEA). Emergency
calls (calls to E911 in the USA) are difficult to route: it is difficult
to identify the proper Public Safety Answering Point (PSAP) because
of the inherent mobility of IP end points and the lack of any network
location capability. However, as commercial SIP services begin to
take off, practical solutions to these problems are being proven.
Standards being developed by such organizations as 3GPP and 3GPP2
define applications of the basic SIP model which facilitate commercialization
and enable support for network-centric capabilities such as CALEA.
Many VoIP phone companies allow customers to bring
their own SIP devices, such as SIP-capable telephone sets or softphones.
The new market for consumer SIP devices continues to expand.
The free software community has started to provide more
and more of the SIP technology required to build end points
as well as proxy and registrar servers, leading to a commoditization
of the technology, which accelerates global adoption. SIPfoundry
has made available and actively develops a variety of SIP stacks,
client applications and SDKs, in addition to entire IP PBX solutions
that compete in the market against mostly proprietary IP PBX implementations
from established vendors.
The National Institute of Standards and Technology
(NIST), Advanced Networking Technologies Division provides a public
domain implementation of the Java standard for SIP, JAIN-SIP, which
serves as a reference implementation for the standard. The stack
can work in proxy server or user agent scenarios and has been used
in numerous commercial and research projects. It supports RFC 3261
in full and a number of extension RFCs, including RFC 3265 (Subscribe
/ Notify) and RFC 3262 (reliable provisional responses).
See also
-ISDN BRI and ISDN PRI Services
-FXO vs FXS
-Global System for Mobile Communications
-About VoIP
-SIP: Session Initiation Protocol
-List of commercial voice over IP network providers
-Mobile VoIP
-List of SIP software