mediasoup (server side, i.e. the Node + C++ component you mean) does not implement "RTCPeerConnection". That's only needed in browsers. In mediasoup we don't use SDP but explicit RTP parameters, as defined here:
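To make the "RTP parameters instead of SDP" point concrete, here is a sketch of the kind of plain data structure involved. The field names follow the general shape of mediasoup's RtpParameters, but treat this as illustrative, not the exact schema; the values are invented for the example.

```javascript
// Hypothetical sketch of the kind of RTP parameters a mediasoup app
// exchanges over its own signaling channel instead of SDP.
// Field names approximate mediasoup's RtpParameters shape; values are made up.
const rtpParameters = {
  codecs: [
    {
      mimeType: 'video/VP8',
      payloadType: 101,
      clockRate: 90000,
      rtcpFeedback: [{ type: 'nack' }, { type: 'ccm', parameter: 'fir' }]
    }
  ],
  headerExtensions: [
    { uri: 'urn:ietf:params:rtp-hdrext:sdes:mid', id: 1 }
  ],
  encodings: [{ ssrc: 12345678 }]
};

// The app hands a structure like this to the server over whatever
// signaling it chooses (WebSocket, HTTP, etc.); no SDP is involved.
console.log(rtpParameters.codecs[0].mimeType);
```

The upshot is that signaling is entirely the app's business: mediasoup only consumes and produces these plain objects.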
Ah, okay, whenever I think webrtc I assume p2p with no server but I am now actually reading into what SFU is, etc. Makes sense. Thanks for pointer to these resources.
DataChannels are transmitted over the same UDP/ICE "connection" that carries the audio and video packets. So if you plan to send real-time data (for example real-time subtitles, metadata related to the current video position, etc.), sending it over a DataChannel means it reaches the remote peer without lagging behind the audio/video. If you use a WebSocket to transmit the data instead, audio/video and data may become desynchronized because they travel over different network paths.
mediasoup (on the server side) is a Node.js library, an NPM dependency that you integrate into your Node.js app. It does contain tons of C++ code but, from the point of view of the user, it's just yet another NPM dependency in your Node.js project.
Yep, two active developers, but since it's just a set of libraries that's good enough. We also get nice contributions (C++ fixes and optimizations) via PRs on GitHub. And we use mediasoup in different commercial products.
Jitsi is a full application (web app, backend servers) with a specific use case: meetings (similar to Zoom or Google Meet).
mediasoup is not an application but a set of low-level server and client libraries for building any kind of audio/video application (not just meetings). You don't "install mediasoup and configure it". You create your Node app and integrate mediasoup as you would any other NPM dependency. Same on the client side. More here:
Well, it's webRTC, so the main use case is live video streaming. But one would need to define 'live'. webRTC is really made for sub-second latency, which you need for conversations. If you don't require that, you're better off using HLS streaming, because achieving ultra-low latency comes with tradeoffs in complexity and quality.
webRTC is peer 2 peer, but that doesn't work well if you have a lot of peers. That is where an SFU like mediasoup comes into the picture: a kind of relay server, so you can still send to many peers over webRTC (thus with ultra-low latency). Also, if the peers are behind firewalls, peer 2 peer doesn't work either and you need a TURN server to relay the video.
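The fan-out an SFU performs can be illustrated with a toy function: one producer's packet is forwarded to every other peer, so each client uploads once instead of N-1 times as in a full-mesh P2P topology. All names here are invented for the example; a real SFU like mediasoup does this per RTP stream with congestion control and simulcast on top.

```javascript
// Toy illustration of an SFU's core job: relay one producer's packet
// to every other connected peer. Purely conceptual; names are made up.
function fanOut(packet, producerId, peers) {
  return peers
    .filter((peer) => peer.id !== producerId)
    .map((peer) => ({ to: peer.id, packet }));
}

const peers = [{ id: 'alice' }, { id: 'bob' }, { id: 'carol' }];
const sends = fanOut('<rtp packet>', 'alice', peers);
console.log(sends.length); // 2: bob and carol each get a copy
```

With a mesh, alice would have to encode and upload her stream once per receiver; with the SFU she uploads once and the server does the copying.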
I think there's nothing stopping you from attempting that, but you would need some pretty complex client software to get a good experience with P2P live streaming.
mediasoup is an SFU that must be deployed on a reachable server, so STUN is not needed at all. You may need a TURN server if a client is behind a restrictive firewall that blocks UDP. mediasoup is not a TURN server, but you can deploy a TURN server (e.g. coturn) in your backend.
Are there plans to support a TCP candidate so a TURN server isn't needed at all? It feels a bit wasteful to effectively use a TURN server as a TCP->UDP proxy for a publicly accessible server.
> TCP candidate so a TURN server isn't needed at all?
This is not true. A router may still block TCP traffic other than TLS, or traffic whose destination port is not 80 or 443. So ICE TCP candidates do not avoid the need for a TURN server in certain cases.
You cannot select a specific listening port for a specific transport, because each WebRTC transport requires, at minimum, its own listening port on the server:
Is there a reason for the restriction of one connection per port? I would have thought you would be able to use the same port for each peer source ip/port tuple?
Not doubting you - but I never experienced this limitation with other client/server applications. I have an http server serving over 200k concurrent websockets on port 443, for example.
rfc3550 states that it is per destination ip/port tuple. So you should be able to support multiple connections per local port. Is it possible this is an oversight in the current implementation? I appreciate this isn’t TCP, which is why I have just read through all relevant RFCs.
Why is that so important? As I said, choosing a specific port is not enough. This is not TLS. An aggressive firewall may drop those TCP connections because there is no TLS data on them.
The TLS port was just a thought, as I want to reduce the cases where a TURN server is used because of a scalability limitation (a 65k connection limit per TURN server due to a shared source IP). But our discussion has raised another issue regarding mediasoup's limitation of one source per local port, which compounds the problem.
I’m replacing a WebSocket server with a DataChannel server. If I use mediasoup then I will need to listen on 4 IPv4 addresses to support the 200k clients I can currently support on 1 IP address with WebSockets. Not a huge deal right now, but if I want to support millions of users it means managing 40 or so IP addresses instead of 1 or 2.
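The arithmetic behind those numbers is just clients divided by usable ports per IP, given the one-transport-per-port constraint discussed above. The 50k figure below is an assumption (a port space of 65535 minus system and ephemeral ranges), not a mediasoup-specified value.

```javascript
// Back-of-the-envelope: if each WebRTC transport needs its own server
// port, one IP address caps out below 65536 concurrent transports.
function ipAddressesNeeded(clients, usablePortsPerIp) {
  return Math.ceil(clients / usablePortsPerIp);
}

// Assuming ~50k usable ports per IP (an assumption for illustration):
console.log(ipAddressesNeeded(200000, 50000));  // 4
console.log(ipAddressesNeeded(2000000, 50000)); // 40
```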
Not knocking mediasoup at all, just now aware of a limitation that sounds like it doesn’t need to exist so seeing if we can do something about it.
This is RTP, not WebSocket or HTTP. Media servers need a separate port for each RTP communication. A hack could be done to make all WebRTC endpoints use a single port on the mediasoup side. However, mediasoup also supports plain RTP endpoints and, for those, you need to be ready to listen for RTP from any remote IP:port (you don't know it in advance due to NATs). In WebRTC we can use the ICE user/pwd (previously given to the server via signaling), but that's not possible with plain/regular RTP (no ICE).
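The ICE user/pwd point can be sketched as follows. In ICE, a STUN connectivity check carries a USERNAME attribute built from ufrags exchanged over signaling (roughly "receiverUfrag:senderUfrag"), so a server can map the very first packet from an unknown IP:port to a known transport; plain RTP carries no such credential. The function and transport records below are invented for illustration.

```javascript
// Sketch of demultiplexing on a shared port using the ICE ufrag.
// A STUN request's USERNAME is roughly "receiverUfrag:senderUfrag",
// so the server matches on the first fragment (its own local ufrag).
// Names are illustrative, not mediasoup internals.
function matchTransportByIceUsername(stunUsername, transports) {
  const localUfrag = stunUsername.split(':')[0];
  return transports.find((t) => t.localUfrag === localUfrag) || null;
}

const transports = [
  { id: 't1', localUfrag: 'aaaa' },
  { id: 't2', localUfrag: 'bbbb' }
];

const hit = matchTransportByIceUsername('bbbb:client123', transports);
console.log(hit.id); // "t2"
```

A plain RTP packet from an unknown NAT'd IP:port offers nothing equivalent to match on, which is why those endpoints effectively need their own listening port.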
Isn’t that what the SSRC is for? I.e. you use the SSRC (sent as part of the media description in the SDP) to identify the stream, rather than trusting that an authentic stream is the only one sending to an open port? At least, that is how I understood the (multiple) RFCs. Not an expert here by any means.
In the WebRTC spec (although not strictly mandatory, it's the current way to go), the client no longer signals its sending SSRCs in the SDP but rather a MID and optional RID values (if simulcast is in use), and those MID and RID values are not supposed to be unique across all participants (neither are SSRCs, by the way). The MID and RID values are signaled in the SDP and then included in RTP packets as header extensions. The remote matches RTP packets based on them and then learns the associated SSRC for a faster lookup of future packets.
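That match-by-MID-then-learn-the-SSRC flow can be sketched as a two-path lookup: first packets from an unknown SSRC are resolved via their MID header extension, and the SSRC is then cached so later packets take a fast path. The class and packet shape below are invented for illustration, not mediasoup internals.

```javascript
// Sketch of MID-based RTP stream matching with SSRC learning.
// Packet objects here are simplified stand-ins for parsed RTP packets.
class MidDemuxer {
  constructor() {
    this.bySsrc = new Map(); // ssrc -> mid, learned from earlier packets
  }

  route(packet) {
    // Fast path: SSRC already associated with a MID.
    const known = this.bySsrc.get(packet.ssrc);
    if (known !== undefined) return known;
    // Slow path: read the MID header extension and remember the SSRC.
    if (packet.midExtension !== undefined) {
      this.bySsrc.set(packet.ssrc, packet.midExtension);
      return packet.midExtension;
    }
    return null; // cannot route yet (no extension, SSRC unknown)
  }
}

const demux = new MidDemuxer();
console.log(demux.route({ ssrc: 111, midExtension: '0' })); // "0" (learned)
console.log(demux.route({ ssrc: 111 }));                    // "0" (fast path)
```

Once the association is learned, senders may even stop including the MID extension in every packet, which is why the cache matters.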
Anyway, WebRTC is not just about RTP. In fact, before RTP happens, ICE and DTLS must be done.
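The ordering mentioned above is strict: ICE must complete before the DTLS handshake, and DTLS must complete before SRTP media flows (the DTLS handshake is what yields the SRTP keying material). A minimal sketch of that dependency, with invented names:

```javascript
// Toy phase-ordering check for a WebRTC connection: ice -> dtls -> rtp.
// Illustrative only; real stacks track much richer state.
const order = ['ice', 'dtls', 'rtp'];

function canStart(phase, completed) {
  const idx = order.indexOf(phase);
  return order.slice(0, idx).every((p) => completed.includes(p));
}

console.log(canStart('rtp', ['ice']));         // false: DTLS not done yet
console.log(canStart('rtp', ['ice', 'dtls'])); // true
```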
I just tried the demo - btw, best UX so far from all that I've tried.
It works fine on the same wifi network, but won't connect if one of the devices is on a 4G network. Is this because no TURN server is set up? Is it easy to implement that?
The demo is just a demo. mediasoup is a low-level library (there is no UX in mediasoup itself). The online demo backend does not have any TURN server. For production it's obviously recommended to deploy one.