H.323 was designed with a good understanding of the requirements for multimedia communication over IP networks, including audio, video, and data conferencing. It defines an entire, unified system for performing these functions, leveraging the strengths of the IETF and ITU-T protocols.
As a result, it might be reasonable for users to expect about the same level of robustness and interoperability as is found on the PSTN today, although this admittedly varies across the globe.
H.323 was designed to scale and to accommodate new functionality. The most widely deployed uses of H.323 are "Voice over IP" followed by "Videoconferencing", both of which are described in the H.323 specifications.
SIP was designed to set up a "session" between two points and to be a modular, flexible component of the Internet architecture. It has a loose concept of a call (a "session" with media streams), has no support for multimedia conferencing, and the integration of sometimes disparate standards is largely left up to each vendor.
As a result, SIP is now a 10-year-old protocol with a vast number of interoperability problems. While SIP has been successfully deployed in some environments, those are generally "closed" environments where PSTN gateways have been the means of interoperability.
H.323 has defined a number of features to handle failure of intermediate network entities, including "alternate gatekeepers", "alternate endpoints", and a means of recovering from connection failures.
SIP has not defined procedures for handling device failure. If a proxy fails, the user agent detects this only through timer expiration. It is then the responsibility of the user agent to re-send the INVITE to another proxy, leading to long delays in call establishment.
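To put a number on those "long delays", consider an INVITE sent over UDP that gets no response at all, with the RFC 3261 default timers (T1 = 500 ms) assumed. The INVITE client transaction retransmits each time Timer A fires, doubling the interval after every firing, and gives up only when Timer B (64*T1) fires. The minimal sketch below computes that schedule. The function name is hypothetical, and a real stack would stop retransmitting as soon as a provisional response arrived.

```python
from typing import List, Tuple


def invite_retransmit_schedule(t1: float = 0.5) -> Tuple[List[float], float]:
    """Return (send times, give-up time) in seconds for an unanswered INVITE.

    Assumes RFC 3261 INVITE client transaction behavior over UDP:
    Timer A starts at T1 and doubles after each firing; Timer B = 64*T1.
    """
    timer_b = 64 * t1          # transaction timeout: 32 s with the default T1
    sends = [0.0]              # initial transmission
    interval = t1              # current Timer A value
    next_send = interval
    while next_send < timer_b:
        sends.append(next_send)   # Timer A fired: retransmit the INVITE
        interval *= 2             # Timer A doubles for the next retransmission
        next_send += interval
    return sends, timer_b


if __name__ == "__main__":
    sends, give_up = invite_retransmit_schedule()
    print(f"INVITE sent at {sends} s; failover only after {give_up} s")
```

With the defaults this yields seven transmissions (t = 0, 0.5, 1.5, 3.5, 7.5, 15.5, and 31.5 s) and a 32-second wait before the user agent can try another proxy.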
H.323 - ASN.1, a standardized, extremely precise, easy-to-understand structural notation that is used by many other systems.
SIP - ABNF (Augmented Backus-Naur Form), a syntactical notation. SIP uses ABNF as defined in RFC 2234.
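The sketch below shows what an ABNF definition translates to in code. It checks a SIP Request-Line against the RFC 3261 rule "Request-Line = Method SP Request-URI SP SIP-Version CRLF" and simplifies that rule in three ways. Method is matched as an ABNF token. The Request-URI is reduced to any run of non-space characters instead of the full URI grammar. The function name is hypothetical.

```python
import re
from typing import Dict, Optional

# ABNF: token = 1*(alphanum / "-" / "." / "!" / "%" / "*" / "_" / "+" / "`" / "'" / "~")
TOKEN = r"[A-Za-z0-9\-.!%*_+`'~]+"

# ABNF: Request-Line = Method SP Request-URI SP SIP-Version CRLF
#       SIP-Version  = "SIP" "/" 1*DIGIT "." 1*DIGIT
# ABNF quoted strings are case-insensitive, hence (?i:SIP).
# Simplification (assumption): Request-URI is any run of non-space characters.
REQUEST_LINE = re.compile(
    rf"(?P<method>{TOKEN}) (?P<uri>\S+) (?P<version>(?i:SIP)/[0-9]+\.[0-9]+)\r\n"
)


def parse_request_line(line: str) -> Optional[Dict[str, str]]:
    """Return method, URI, and version if the whole line matches, else None."""
    match = REQUEST_LINE.fullmatch(line)
    return match.groupdict() if match else None
```

Because the grammar demands exactly one SP between elements and a trailing CRLF, a line with a doubled space or a bare LF is rejected.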
H.323 encodes messages in a compact binary format that is suitable for narrowband and broadband connections. Messages are efficiently encoded and decoded by machines, with decoders widely available (e.g., Ethereal, since renamed Wireshark).
SIP messages are encoded in ASCII text format, suitable for humans to read. As a consequence, the messages are large and less suitable for networks where bandwidth, delay, and/or processing are a concern.
SIP messages can get so large that they sometimes exceed the MTU size when going over WAN links, resulting in IP fragmentation, delays, packet loss, etc. As a result, effort has been made to compress SIP messages (e.g., RFC 3485 and RFC 3486, which apply Signaling Compression, SigComp, to SIP).
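RFC 3261 itself acknowledges the problem in Section 18.1.1. A request that is within 200 bytes of the path MTU, or larger than 1300 bytes when the path MTU is unknown, must be sent over a congestion-controlled transport such as TCP. A minimal sketch of that check follows. The function name is hypothetical, the message is measured as UTF-8 bytes, and reading "within 200 bytes" as "less than 200 bytes of headroom" is an assumption.

```python
from typing import Optional


def needs_congestion_controlled_transport(message: str,
                                          path_mtu: Optional[int] = None) -> bool:
    """Apply the RFC 3261 Section 18.1.1 size rule to a SIP request.

    Assumption: "within 200 bytes of the path MTU" is read as leaving
    less than 200 bytes of headroom below the MTU.
    """
    size = len(message.encode("utf-8"))   # bytes on the wire, not characters
    if path_mtu is None:
        return size > 1300                # MTU unknown: 1300-byte threshold
    return size > path_mtu - 200          # MTU known: keep 200 bytes of headroom
```

In practice, this means a deployment that relies on UDP may be pushed onto TCP for its larger requests.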
H.323 - RTP/RTCP, SRTP
SIP - RTP/RTCP, SRTP
Extensibility - Vendor Specific
H.323 is extended with non-standard features in such a way as to avoid conflicts between vendors. Globally unique identifiers prevent feature and data element collision.
SIP is extended by adding new header lines or message bodies that may be used by different vendors to serve different purposes, thus risking interoperability problems.
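The difference can be sketched as a dispatch table. This sketch keys vendor-specific data by a globally unique identifier, loosely modeled on the H.221 non-standard identifier (country code, extension, manufacturer code). The identifier values, handlers, and payload formats are all hypothetical. Data carrying an unknown identifier is simply ignored, so two vendors can each define a "priority" element without colliding. A bare SIP header name, by contrast, carries no information about which vendor defined it.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical identifier modeled on H.221 non-standard identifiers:
# (t35CountryCode, t35Extension, manufacturerCode). Values are made up.
NonStandardId = Tuple[int, int, int]

VENDOR_A: NonStandardId = (1, 0, 1)
VENDOR_B: NonStandardId = (2, 0, 66)

# Both vendors define a "priority" element, with different encodings.
HANDLERS: Dict[NonStandardId, Callable[[bytes], str]] = {
    VENDOR_A: lambda data: f"A priority level {data[0]}",         # one binary byte
    VENDOR_B: lambda data: f"B priority {data.decode('ascii')}",  # ASCII text
}


def handle_nonstandard(ident: NonStandardId, data: bytes) -> Optional[str]:
    """Dispatch vendor data by identifier; unknown identifiers are ignored."""
    handler = HANDLERS.get(ident)
    return handler(data) if handler is not None else None
```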
H.323 is extended by the standards community to add new features in such a way as to not impact existing features. New revisions of H.323 are published periodically and may introduce new mandatory functionality, but this is done in such a way as to preserve backward compatibility.
SIP is extended by the standards community to add new features in such a way as to not impact existing features. However, new revisions of SIP are potentially not backward compatible (e.g., RFC 3261 was not entirely compatible with RFC 2543). In addition, several extensions are treated as "mandatory" in some implementations, which causes interoperability problems.
Scalability - Load Balancing
H.323 has the ability to load balance endpoints across a number of alternate gatekeepers in order to scale a local point of presence. In addition, endpoints report their available and total capacity so that calls going to a set of gateways, for example, may be best distributed across those gateways.
SIP has no notion of load balancing, except "trial and error" across pre-provisioned devices or devices learned from DNS SRV records. There is no means of detecting the load on a particular gateway or of knowing whether a device has failed, meaning that proxies simply have to try a PSTN gateway, wait for the call to time out, and then try another.
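Capacity-based selection of the kind described for H.323 takes only a few lines. In this sketch, the Gateway type and pick_gateway function are hypothetical, and "available" and "total" stand in for the capacity values that endpoints report. Preferring the largest fraction of free capacity is one possible policy (an assumption), not something the standard mandates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gateway:
    name: str
    available: int  # calls the gateway reports it can still accept
    total: int      # total call capacity the gateway reports


def pick_gateway(gateways: List[Gateway]) -> Optional[Gateway]:
    """Choose the gateway with the largest fraction of free capacity."""
    candidates = [g for g in gateways if g.available > 0 and g.total > 0]
    if not candidates:
        return None  # every gateway reports itself full
    return max(candidates, key=lambda g: g.available / g.total)
```

Without reported capacity, the only comparable strategy is the trial-and-error loop described above.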
Scalability - Call Signaling
When an H.323 gatekeeper is used, it may simply provide address resolution through one RAS message exchange, or it may route all call signaling traffic. In large networks, the direct call model may be used so that endpoints connect directly to one another.
When using a SIP proxy to perform address resolution for the SIP device, the proxy is required to handle at least 3 full message exchanges for every call. In large networks, such as IMS networks, the number of messages on the wire may be excessive. A basic call between two users may require as many as 30 messages on the wire!
An H.323 gatekeeper can be stateless using the direct call model.
A SIP proxy can be stateless if it does not fork, use TCP, or use multicast.
Scalability - Address Resolution
H.323 defines an interface between the endpoint and gatekeeper for address resolution using ARQ or LRQ. The H.323 gatekeeper may use any number of protocols to discover the destination address of the callee, including LRQs to other gatekeepers, Annex G/H.225.0, TRIP, ENUM, and/or DNS. The endpoint does not have to be concerned with the mechanics of this process, and the processing requirements for address resolution placed on the gatekeeper by H.323 are for just a single message exchange.
While SIP has no address-resolution protocol, per se, a SIP user agent may route its INVITE message through a proxy or redirect server in order to resolve addresses. The SIP proxy may use various protocols to discover the destination address of the callee, including TRIP, ENUM, and/or DNS. The endpoint does not have to be concerned with the mechanics of this process. Unfortunately, the processing requirements placed on the SIP proxy are higher than with H.323 because at least 3 message exchanges must take place between the SIP device, SIP proxy, and the next hop.
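Both descriptions mention ENUM as one resolution protocol. The first step of an ENUM lookup turns an E.164 number into a DNS name: reverse the digits, separate them with dots, and append "e164.arpa" (RFC 3761, later obsoleted by RFC 6116). The sketch below covers only that conversion, not the NAPTR query that follows it. The function name is hypothetical.

```python
def enum_domain(e164_number: str, suffix: str = "e164.arpa") -> str:
    """Convert a '+'-prefixed E.164 number to its ENUM domain name."""
    if not e164_number.startswith("+"):
        raise ValueError("expected a full E.164 number starting with '+'")
    # Keep ASCII digits only; separators such as '-', ' ', '(' are dropped.
    digits = [c for c in e164_number if c in "0123456789"]
    return ".".join(reversed(digits)) + "." + suffix


if __name__ == "__main__":
    print(enum_domain("+1-555-123-4567"))  # 7.6.5.4.3.2.1.5.5.5.1.e164.arpa
```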
Flexible addressing mechanisms, including URIs, e-mail addresses, and E.164 numbers.
H.323 supports these aliases:
E.164 dialed digits
generic H.323 ID
ISUP number
H.323 also supports overlap sending with no additional overhead beyond conveying the newly received digits in a single message.
SIP only understands URI-style addresses. This works fine for SIP-to-SIP devices, but causes some confusion when trying to translate various dialed digits. The unofficial convention is that a "+" sign is inserted in the SIP URI (e.g., "sip:+<E.164 digits>@example.com") to indicate that the number is in E.164 format, as opposed to a user ID that might be numeric.
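The "+" convention can be illustrated with a hypothetical helper that builds a SIP URI from dialed digits. The helper assumes the caller already knows whether the digits form a full E.164 number (in practice this would come from a dial plan), and "example.com" is a placeholder domain.

```python
def to_sip_uri(dialed: str, domain: str, is_e164: bool) -> str:
    """Build a SIP URI, marking E.164 numbers with a leading '+'."""
    if is_e164:
        # Strip formatting characters and keep only the digits.
        digits = "".join(c for c in dialed if c in "0123456789")
        return f"sip:+{digits}@{domain}"
    # Otherwise treat the string as a user ID, even if it is numeric.
    return f"sip:{dialed}@{domain}"
```

Without the "+", the receiving side cannot tell whether "sip:4567@example.com" names a user ID or a number, which is the confusion described above.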
SIP supports overlapped signaling as defined in RFC 3578, though each additional digit received requires the transmission of three messages on the wire (a new INVITE, a 484 response indicating that the address is incomplete, and an ACK).
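The cost difference grows linearly with the number of digits sent in overlap mode. This small worked sketch uses the per-digit figures given above, assumes one digit per update, and counts only the messages triggered by those digits. The function name is hypothetical.

```python
from typing import Dict

H323_MSGS_PER_DIGIT = 1  # one message conveying the newly received digit(s)
SIP_MSGS_PER_DIGIT = 3   # new INVITE, 484 Address Incomplete, ACK (RFC 3578)


def overlap_messages(extra_digits: int) -> Dict[str, int]:
    """Messages on the wire needed to convey extra_digits additional digits."""
    if extra_digits < 0:
        raise ValueError("extra_digits must be non-negative")
    return {
        "H.323": extra_digits * H323_MSGS_PER_DIGIT,
        "SIP": extra_digits * SIP_MSGS_PER_DIGIT,
    }
```

For example, four trailing digits cost 4 messages in H.323 versus 12 in SIP.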
Even with H.323's direct call model, the ability to successfully bill for the call is not lost, because the endpoint reports the beginning and end time of the call to the gatekeeper via the RAS protocol. Various pieces of billing information may be present in the ARQ and DRQ messages at the start and end of the call.
If the SIP proxy wants to collect billing information, it has no choice but to stay in the call signaling path for the entire duration of the call so that it can detect when the call completes. Even then, the statistics are skewed because the call signaling may have been delayed. Otherwise, there is no mechanism in SIP to perform any accounting/billing function.
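Because the endpoint itself reports call start and end via RAS, a gatekeeper can compute billing data without staying in the signaling path. In this minimal sketch, the CallRecord fields are hypothetical stand-ins for the times reported at the start (ARQ) and end (DRQ) of the call. Rounding up to whole minutes is an assumed tariff rule.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CallRecord:
    call_id: str
    started: datetime  # hypothetical: start time reported via RAS (ARQ)
    ended: datetime    # hypothetical: end time reported via RAS (DRQ)


def billable_minutes(record: CallRecord) -> int:
    """Call duration rounded up to whole minutes (assumed tariff rule)."""
    seconds = (record.ended - record.started).total_seconds()
    if seconds < 0:
        raise ValueError("call ends before it starts")
    return math.ceil(seconds / 60)
```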
H.323 - A call can be established in as few as 1.5 round trips using UDP:
Setup ->
<- Connect
Ack ->
Of course, more elaborate call establishment procedures may be required to negotiate complex capabilities, negotiate complex video modes, etc.
SIP - A call can be established in as few as 1.5 round trips using UDP:
INVITE ->
<- 200 OK
ACK ->
Most real-world flows are more complex, as they often pass through one or more proxy devices, have intermediary response messages, and "negotiate" capabilities through a "trial and error" process that is far from scientific.
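A small timeline confirms the "1.5 round trips" figure. The sketch assumes zero processing time, no intermediaries, and no provisional responses, with each message sent as soon as the previous one arrives. The same arithmetic applies to both the H.323 and SIP flows above. The function name is hypothetical.

```python
from typing import List, Tuple


def arrival_times(messages: List[str], rtt_ms: float) -> List[Tuple[str, float]]:
    """Arrival time (ms) of each message in a strictly alternating exchange."""
    one_way = rtt_ms / 2  # assume symmetric paths
    return [(msg, (i + 1) * one_way) for i, msg in enumerate(messages)]


if __name__ == "__main__":
    for msg, t in arrival_times(["INVITE", "200 OK", "ACK"], rtt_ms=100):
        print(f"{msg:7} arrives at {t:.0f} ms")
```

With a 100 ms round-trip time, the final message arrives after 150 ms, or 1.5 times the RTT. Each proxy hop or intermediate response in a real flow adds to that figure.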
H.323 entities may exchange capabilities and negotiate which channels to open, including audio, video, and data channels. Individual channels may be opened and closed during the call without disrupting the other channels.
SIP entities have limited means of exchanging capabilities. RFC 3407 is the state of the art, but it is more or less a "declaration" mechanism, not a negotiation procedure. The end result is still a "trial and error" approach in case the called party does not support the proposed media.
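Both approaches aim for the same outcome: a common set of media capabilities. The sketch below computes that set as an ordered intersection that preserves the offerer's preference order. It is a generic illustration in the spirit of SDP offer/answer (RFC 3264), not the H.245 capability exchange procedure. The codec names and function name are illustrative assumptions. An empty result is the case where a SIP exchange falls back to trial and error.

```python
from typing import List


def common_codecs(offered: List[str], supported: List[str]) -> List[str]:
    """Codecs both sides support, in the offerer's preference order."""
    # Names are compared case-insensitively.
    supported_lower = {codec.lower() for codec in supported}
    return [codec for codec in offered if codec.lower() in supported_lower]
```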
An H.323 gatekeeper can control the call signaling and may fork the call to any number of devices simultaneously.
SIP proxies can control the call signaling and may fork the call to any number of devices simultaneously.
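Parallel forking, whether by a gatekeeper or a proxy, comes down to "first answer wins, cancel the rest." In this sketch, the answer delays are simulated inputs and the function name is hypothetical. In SIP, after forwarding a final response, a proxy generates CANCEL requests for the remaining pending branches (RFC 3261, Section 16.7).

```python
from typing import Dict, List, Optional, Tuple


def fork_call(answer_delays: Dict[str, Optional[float]]) -> Tuple[Optional[str], List[str]]:
    """Return the first device to answer and the branches to cancel.

    answer_delays maps device -> seconds until it answers, or None if it never does.
    """
    answered = {dev: t for dev, t in answer_delays.items() if t is not None}
    if not answered:
        return None, list(answer_delays)  # nobody answered: tear down every branch
    winner = min(answered, key=lambda dev: answered[dev])
    to_cancel = [dev for dev in answer_delays if dev != winner]
    return winner, to_cancel
```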
H.323 borrows from traditional PSTN protocols, e.g., Q.931, and is therefore well suited for PSTN integration. However, H.323 does not employ the PSTN's circuit-switched technology; like SIP, H.323 is completely packet-switched. How Media Gateway Controllers fit into the overall H.323 architecture is well defined within the standard.
SIP has no commonality with the PSTN, so PSTN signaling must be "shoe-horned" into SIP. SIP has no architecture that describes the decomposition of the gateway into the Media Gateway Controller and the Media Gateways; this has recently been studied by 3GPP and others in the form of IMS. Presently, there are about 4 "IMS" variants: 3GPP, ITU NGN, 3GPP2, and PacketCable. Pick the architecture you like best, I suppose.
In H.323, services may be provided to the endpoint through a web-browser interface using HTTP or by a feature server using Megaco/H.248. In addition, services may be provided to an endpoint as it places a call, as a call arrives, or during the middle of a call by a gatekeeper or other entity that routes the call signaling. As a result, H.323 is well suited to providing new services.
SIP devices can receive service from a SIP proxy as the endpoint places a call, as a call arrives, or during the middle of a call. There is no defined way within SIP of providing services via a web browser or a feature server, as everything is done within the context of a "session".
Video and Data Conferencing
H.323 fully supports video and data conferencing. Procedures are in place to provide control for the conference as well as lip synchronization of audio and video streams.
SIP has limited support for video and no support for data conferencing protocols like T.120. SIP has no protocol to control the conference and there is no mechanism within SIP for lip synchronization. There is no standard means of recovering from packet loss in a video stream (to parallel H.323's "video fast update" command).
H.323 does not require a gatekeeper. A call can be made directly between two endpoints.
However, most devices do utilize a gatekeeper for the purpose of registration and address resolution.
SIP does not require a proxy. A call can be made directly between two user agents.
However, most devices do utilize a SIP proxy for the purpose of registration, address resolution, and call routing.
H.323 supports any codec, standardized or proprietary. No registration authority is required to use any codec in H.323.
SIP supports any IANA-registered codec (as a legacy feature) or other codec whose name is mutually agreed upon.
H.323 - NAT/FW traversal is provided by an H.323 "proxy" or by the endpoint, in either case working in conjunction with a gatekeeper residing in the public network. Refer to H.460.17, H.460.18, and H.460.19.
SIP does not define a NAT/FW traversal mechanism, as this is left to other standards. Some standards that have been defined or are being defined are STUN, TURN, ANAT, and ICE. (All of this has been work in progress for years, with most workable solutions done by agreed convention.)
H.323 - Reliable or unreliable transport, e.g., TCP or UDP. Most H.323 entities use a reliable transport for signaling.
SIP - Reliable or unreliable transport, e.g., TCP or UDP. Most SIP entities use an unreliable transport for signaling.
Third-party Call Control
H.323 - Yes, through third-party pause and re-routing, which is defined within H.323. More sophisticated control is defined by the related H.450.x series of standards.
SIP - Yes, through SIP as described in RFC 3
H.323 - Yes. An MC (Multipoint Controller) is required for this, but it could be co-located in a participating endpoint, or all endpoints could contain an MC. A stand-alone conference bridge may also provide this functionality, and H.323 has well-defined procedures for such entities.
SIP - No; however, SIP user agents may perform conferencing themselves. A stand-alone conference bridge may also provide this functionality.
H.323 - Yes, via H.235.
SIP - Yes, via HTTP authentication (Digest and Basic), SSL, PGP, S/MIME, or various other means.
H.323 - Three ways, with the alphanumeric choice of the H.245 UserInputIndication message being the baseline carriage common to all H.323 endpoints.
SIP - Three ways. There is no baseline carriage, which presents interoperability issues. However, transport of DTMF via the INFO method and via RFC 2833 are the most common.
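Of the SIP options, RFC 2833 carries DTMF as RTP "telephone-event" packets. RFC 4733 later obsoleted RFC 2833 but keeps the same payload layout. The sketch below packs the 4-byte event payload: an 8-bit event code, the end bit, a reserved bit, a 6-bit volume, and a 16-bit duration in RTP timestamp units. The function name and default volume are assumptions, and the RTP header itself is not built here.

```python
import struct

# RFC 2833 DTMF event codes: '0'-'9' -> 0-9, '*' -> 10, '#' -> 11, 'A'-'D' -> 12-15
DTMF_EVENTS = {**{str(d): d for d in range(10)},
               "*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15}


def telephone_event_payload(digit: str, duration: int,
                            volume: int = 10, end: bool = False) -> bytes:
    """Pack event (8 bits) | E (1) | R (1) | volume (6) | duration (16), big-endian."""
    event = DTMF_EVENTS[digit.upper()]
    if not 0 <= volume <= 63:
        raise ValueError("volume is a 6-bit value (0 to -63 dBm0, sign dropped)")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration is a 16-bit value in timestamp units")
    flags = (0x80 if end else 0x00) | volume  # E bit set on the final packet; R bit left 0
    return struct.pack("!BBH", event, flags, duration)
```

At an 8 kHz RTP clock, a duration of 800 corresponds to 100 ms of tone.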