TCP can seem straightforward, especially in Go. You set up a net.Conn, call Write to send data and Read to receive it, and everything appears to work. It’s minimal and very Go-like. However, that simplicity hides details that are easy to overlook, and overlooking them leads to real bugs.
This post demystifies how TCP actually behaves, using practical Go examples to tackle common myths.
A Simple Starting Point
With Go’s net.Conn, getting something running takes only a few lines. For example:
Sender:
conn.Write([]byte("hello"))
Receiver:
buf := make([]byte, 1024)
n, _ := conn.Read(buf)
fmt.Println(string(buf[:n]))
In basic situations, this approach looks flawless: one write, one read. That makes it easy to assume the process will always be this simple.
From Simple Messages to File Transfers
Let’s take this further and try transferring a file:
Sender:
file, _ := os.Open("large_file.dat")
io.Copy(conn, file)
Receiver:
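// out is the destination file, assumed to be opened earlier (e.g. with os.Create).
buf := make([]byte, 1024)
// One Read returns at most len(buf) bytes, and possibly fewer.
n, _ := conn.Read(buf)
out.Write(buf[:n])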
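Suddenly, things break. The receiver calls Read only once, so it never sees more than 1024 bytes of the file, and depending on timing it may see even fewer. Transfers come out truncated, and small files arrive intact on some runs but not on others. This brings us to the actual nature of TCP.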
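The receiver above calls Read exactly once, so it can never collect more than 1024 bytes. Before adding any protocol, the smallest workaround is to keep reading until the sender closes the connection. The sketch below does that with io.Copy, which calls Read in a loop until io.EOF. It assumes one file per connection and that the sender half-closes with (*net.TCPConn).CloseWrite. The helper names sendUntilClose, receiveUntilEOF, and transferOverLoopback are hypothetical, and this EOF-delimited approach is a different technique from the length prefix this post builds toward.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net"
)

// sendUntilClose copies all of src to the connection, then half-closes it so
// the receiver's Read eventually returns io.EOF.
func sendUntilClose(conn *net.TCPConn, src io.Reader) error {
	if _, err := io.Copy(conn, src); err != nil {
		return err
	}
	return conn.CloseWrite()
}

// receiveUntilEOF lets io.Copy call Read in a loop until the sender's
// half-close arrives, so the payload size no longer matters.
func receiveUntilEOF(conn net.Conn, dst io.Writer) (int64, error) {
	return io.Copy(dst, conn)
}

// transferOverLoopback sends payload over a real TCP connection on
// 127.0.0.1 and returns what the receiver collected.
func transferOverLoopback(payload []byte) ([]byte, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	defer ln.Close()

	sendErr := make(chan error, 1)
	go func() {
		c, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			ln.Close() // unblock Accept below
			sendErr <- err
			return
		}
		defer c.Close()
		sendErr <- sendUntilClose(c.(*net.TCPConn), bytes.NewReader(payload))
	}()

	conn, err := ln.Accept()
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	var out bytes.Buffer
	if _, err := receiveUntilEOF(conn, &out); err != nil {
		return nil, err
	}
	if err := <-sendErr; err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

func main() {
	payload := bytes.Repeat([]byte("0123456789"), 100_000) // 1,000,000 bytes
	got, err := transferOverLoopback(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(got), bytes.Equal(got, payload)) // 1000000 true
}
```

The catch is that end-of-stream is the only boundary: one connection can carry only one file, and a sender that disconnects partway through looks the same as one that finished.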
The Nature of TCP: Byte Stream, Not Messages
TCP delivers a continuous stream of bytes, not the discrete messages you might expect. Here’s what that means:
- If you call conn.Write(A) and then conn.Write(B), the receiver could get AB as one combined read, A followed by B in separate reads, or A broken across multiple reads.
TCP has no notion of where one message ends and the next begins. It guarantees that bytes arrive intact and in the order they were sent, but not how they are grouped into reads.
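To watch the byte stream directly, here is a small sketch that performs two separate Writes ("hello" and "world") over a loopback TCP connection and records what each individual Read returns. The helper name readsFor is hypothetical, and the 4-byte read buffer is chosen deliberately so that no single Read can line up with a 5-byte write.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
)

// readsFor performs each of writes as a separate conn.Write on one end of a
// loopback TCP connection and records what every conn.Read returns on the
// other end, using a read buffer of bufSize bytes.
func readsFor(bufSize int, writes ...string) ([]string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	defer ln.Close()

	go func() {
		c, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			ln.Close() // unblock Accept below
			return
		}
		defer c.Close() // closing produces io.EOF on the reading side
		for _, w := range writes {
			if _, err := c.Write([]byte(w)); err != nil {
				return // a failed Write simply ends the demo early
			}
		}
	}()

	conn, err := ln.Accept()
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	var reads []string
	buf := make([]byte, bufSize)
	for {
		n, err := conn.Read(buf)
		if n > 0 {
			reads = append(reads, string(buf[:n]))
		}
		if errors.Is(err, io.EOF) {
			return reads, nil
		}
		if err != nil {
			return reads, err
		}
	}
}

func main() {
	reads, err := readsFor(4, "hello", "world")
	if err != nil {
		panic(err)
	}
	// Every Read returns at most 4 bytes, so none can equal "hello" or "world".
	// The exact split depends on timing; only the concatenation is guaranteed.
	fmt.Printf("%q\n", reads)
}
```

Only the concatenation of the reads, helloworld, is guaranteed. With a larger buffer you may instead see both writes merged into a single read.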
The 1024 Buffer: A Convenient Illusion
Why did the small example work? The five-byte message was small enough to arrive in one piece, and the timing happened to cooperate. Nothing guarantees that: each Read returns whatever bytes are available at that moment, up to len(buf). With larger data the illusion falls apart and exposes the underlying issue: buffer size is not a substitute for message boundaries.
Creating Reliable Transfers: Using Message Boundaries
To transfer data reliably, the receiver needs to know how many bytes belong to each message. One simple solution is to send the length of the data before the data itself:
[length][payload]
Sender:
data := []byte("hello world")
binary.Write(conn, binary.BigEndian, uint32(len(data)))
conn.Write(data)
Receiver:
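var length uint32
// Read the 4-byte big-endian length prefix.
binary.Read(conn, binary.BigEndian, &length)
buf := make([]byte, length)
// io.ReadFull keeps calling Read until buf is full, unlike a single conn.Read.
io.ReadFull(conn, buf)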
This method lets the receiver know exactly how many bytes to expect for each piece of data.
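The same framing can be wrapped in a pair of reusable helpers. This is a minimal sketch, not a standard API: writeFrame, readFrame, roundTrip, and the 16 MiB maxFrameSize cap are all assumptions of this example. The cap matters because the receiver allocates whatever size the length prefix announces.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// maxFrameSize is an arbitrary cap for this sketch. Without some limit, a
// corrupt or hostile length prefix could make the receiver allocate nearly 4 GiB.
const maxFrameSize = 16 << 20 // 16 MiB

// writeFrame sends a 4-byte big-endian length, then the payload itself.
func writeFrame(w io.Writer, payload []byte) error {
	if len(payload) > maxFrameSize {
		return fmt.Errorf("frame too large: %d bytes", len(payload))
	}
	var header [4]byte
	binary.BigEndian.PutUint32(header[:], uint32(len(payload)))
	if _, err := w.Write(header[:]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readFrame reads one frame. io.ReadFull keeps calling Read until the slice is
// full, however TCP happened to split the bytes.
func readFrame(r io.Reader) ([]byte, error) {
	var header [4]byte
	if _, err := io.ReadFull(r, header[:]); err != nil {
		return nil, err
	}
	n := binary.BigEndian.Uint32(header[:])
	if n > maxFrameSize {
		return nil, fmt.Errorf("frame too large: %d bytes", n)
	}
	payload := make([]byte, n)
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	return payload, nil
}

// roundTrip frames every message into one buffer, then reads them back.
// A bytes.Buffer stands in for the net.Conn: both satisfy io.Reader/io.Writer.
func roundTrip(msgs []string) ([]string, error) {
	var stream bytes.Buffer
	for _, m := range msgs {
		if err := writeFrame(&stream, []byte(m)); err != nil {
			return nil, err
		}
	}
	out := make([]string, 0, len(msgs))
	for range msgs {
		p, err := readFrame(&stream)
		if err != nil {
			return nil, err
		}
		out = append(out, string(p))
	}
	return out, nil
}

func main() {
	got, err := roundTrip([]string{"hello", "world", ""})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", got) // ["hello" "world" ""]
}
```

Because both helpers accept io.Reader and io.Writer, they work unchanged on a net.Conn. Writing the header and payload with two Write calls is safe: TCP keeps the bytes in order, and the receiver never depends on how they were grouped.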
Writing sendFile and receiveFile Functions
Let’s implement a file transfer in Go with message boundaries:
Sender:
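func sendFile(conn net.Conn, filePath string) error {
	file, err := os.Open(filePath)
	if err != nil {
		return err
	}
	defer file.Close()

	fileInfo, err := file.Stat()
	if err != nil {
		return err
	}
	// The prefix is a uint32, so it cannot describe files of 4 GiB or more.
	if fileInfo.Size() > math.MaxUint32 {
		return fmt.Errorf("file too large for a uint32 length prefix: %d bytes", fileInfo.Size())
	}
	fileSize := uint32(fileInfo.Size())

	// Send the 4-byte big-endian length prefix first...
	if err := binary.Write(conn, binary.BigEndian, fileSize); err != nil {
		return err
	}
	// ...then exactly fileSize bytes. io.CopyN stops there even if the file
	// has grown, and returns an error if it has shrunk.
	_, err = io.CopyN(conn, file, int64(fileSize))
	return err
}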
Receiver:
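func receiveFile(conn net.Conn, destPath string) error {
	// Read the 4-byte big-endian length prefix.
	var fileSize uint32
	if err := binary.Read(conn, binary.BigEndian, &fileSize); err != nil {
		return err
	}

	file, err := os.Create(destPath)
	if err != nil {
		return err
	}

	// Copy exactly fileSize bytes. io.CopyN keeps reading until it has them all,
	// streams the data instead of allocating fileSize bytes up front, and
	// returns io.EOF if the connection closes early.
	if _, err := io.CopyN(file, conn, int64(fileSize)); err != nil {
		file.Close()
		return err
	}
	// Closing explicitly surfaces write errors that a deferred Close would drop.
	return file.Close()
}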
Explaining conn.Write and binary.Write
There’s often confusion about why we use both conn.Write and binary.Write. Here’s a quick rundown:
conn.Write sends raw bytes exactly as you pass them. binary.Write encodes structured data, such as fixed-size integers, into bytes using a byte order you choose, then writes them to any io.Writer, including a net.Conn.
For example:
binary.Write(conn, binary.BigEndian, uint32(100))
This is equivalent to:
conn.Write([]byte{0x00, 0x00, 0x00, 0x64})
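To check the equivalence yourself, this sketch encodes 100 with binary.Write into a bytes.Buffer and compares the result with binary.BigEndian.PutUint32, which fills a byte slice directly. The function names viaBinaryWrite and viaPutUint32 are illustrative.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// viaBinaryWrite encodes v the way binary.Write(conn, binary.BigEndian, v)
// would, but into a bytes.Buffer so the result can be inspected.
func viaBinaryWrite(v uint32) []byte {
	var buf bytes.Buffer
	// Writing to a bytes.Buffer does not fail, so the error is ignored here.
	_ = binary.Write(&buf, binary.BigEndian, v)
	return buf.Bytes()
}

// viaPutUint32 fills a 4-byte slice directly with the same big-endian layout.
func viaPutUint32(v uint32) []byte {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, v)
	return b
}

func main() {
	fmt.Printf("% x\n", viaBinaryWrite(100))                         // 00 00 00 64
	fmt.Printf("% x\n", viaPutUint32(100))                           // 00 00 00 64
	fmt.Println(bytes.Equal(viaBinaryWrite(100), viaPutUint32(100))) // true
}
```

PutUint32 is handy when you want to build the header in a buffer you control, for example to combine header and payload into a single Write.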
Importance of Endianness
When you send numbers as bytes, the order matters:
- Big Endian: [00][00][00][64] (standard in networking)
- Little Endian: [64][00][00][00]
Ensuring both sender and receiver agree on this order is crucial for correct data interpretation.
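Here is a sketch of what goes wrong when the two sides disagree: the sender encodes 100 in big-endian order, and the same four bytes are then decoded with both byte orders. decodeBoth is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeBoth interprets the same 4 bytes with each byte order.
func decodeBoth(b []byte) (big, little uint32) {
	return binary.BigEndian.Uint32(b), binary.LittleEndian.Uint32(b)
}

func main() {
	wire := make([]byte, 4)
	binary.BigEndian.PutUint32(wire, 100) // sender uses big endian: 00 00 00 64

	big, little := decodeBoth(wire)
	fmt.Println(big)    // 100: receiver agrees with the sender
	fmt.Println(little) // 1677721600 (0x64000000): receiver disagrees
}
```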
Beyond Boundaries: Understanding Data Types
Knowing how many bytes to read is one thing, but understanding what those bytes mean is another. For instance, a byte sequence like [00 00 00 64] could be a number, a string, or part of a file.
When Structure Is Essential
If your system only handles one type of data, like file transfers, boundaries are enough. But if it handles several kinds of data (file chunks, messages, commands), each message also needs to say what it is:
[type][length][payload]
This helps the receiver understand what the data is and how to handle it.
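A minimal sketch of the [type][length][payload] layout, assuming a 1-byte type field and a 4-byte big-endian length. The type codes (msgFileChunk, msgText, msgCommand) and function names are hypothetical, and a bytes.Buffer stands in for the connection since both are io.Reader/io.Writer.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// Hypothetical type codes for this sketch; a real protocol defines its own.
const (
	msgFileChunk byte = 1
	msgText      byte = 2
	msgCommand   byte = 3
)

// writeMessage sends [type: 1 byte][length: 4 bytes, big-endian][payload].
func writeMessage(w io.Writer, typ byte, payload []byte) error {
	header := make([]byte, 5)
	header[0] = typ
	binary.BigEndian.PutUint32(header[1:], uint32(len(payload)))
	if _, err := w.Write(header); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readMessage reads one 5-byte header, then exactly length bytes of payload.
// A production receiver should also cap the length before allocating.
func readMessage(r io.Reader) (byte, []byte, error) {
	header := make([]byte, 5)
	if _, err := io.ReadFull(r, header); err != nil {
		return 0, nil, err
	}
	payload := make([]byte, binary.BigEndian.Uint32(header[1:]))
	if _, err := io.ReadFull(r, payload); err != nil {
		return 0, nil, err
	}
	return header[0], payload, nil
}

// decodeAll reads messages until a clean end of stream and dispatches on the
// type byte, which is what lets one connection carry several kinds of data.
func decodeAll(r io.Reader) ([]string, error) {
	var out []string
	for {
		typ, payload, err := readMessage(r)
		if err == io.EOF {
			return out, nil // stream ended exactly on a message boundary
		}
		if err != nil {
			return out, err
		}
		switch typ {
		case msgFileChunk:
			out = append(out, fmt.Sprintf("file chunk (%d bytes)", len(payload)))
		case msgText:
			out = append(out, "text: "+string(payload))
		case msgCommand:
			out = append(out, "command: "+string(payload))
		default:
			return out, fmt.Errorf("unknown message type %d", typ)
		}
	}
}

// demo encodes three differently typed messages into a buffer that stands in
// for the connection, then decodes them in order.
func demo() ([]string, error) {
	var stream bytes.Buffer
	if err := writeMessage(&stream, msgText, []byte("hello")); err != nil {
		return nil, err
	}
	if err := writeMessage(&stream, msgFileChunk, make([]byte, 512)); err != nil {
		return nil, err
	}
	if err := writeMessage(&stream, msgCommand, []byte("QUIT")); err != nil {
		return nil, err
	}
	return decodeAll(&stream)
}

func main() {
	lines, err := demo()
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```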
Managing Data: File Transfer vs. Streaming
An important aspect is knowing whether data exchanges have a defined end:
File Transfer:
[length][file]
- Has a clear start and finish.
Streaming:
[chunk][chunk][chunk][...]
- Can go on indefinitely, ending only when the connection is closed or a special signal indicates the end.
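One way to implement the "special signal" for streaming is sketched below, assuming a convention where every chunk carries its own 4-byte length and a zero-length chunk means "end of stream". That convention, and the names writeChunk, streamChunks, receiveChunks, and relay, are assumptions for this example; TCP itself has no such marker.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// writeChunk sends one [length: 4 bytes, big-endian][payload] chunk. In this
// sketch a zero-length chunk is reserved as the end-of-stream signal; that is
// a convention assumed here, not something TCP provides.
func writeChunk(w io.Writer, p []byte) error {
	var header [4]byte
	binary.BigEndian.PutUint32(header[:], uint32(len(p)))
	if _, err := w.Write(header[:]); err != nil {
		return err
	}
	_, err := w.Write(p)
	return err
}

// streamChunks sends src as a series of chunks of up to chunkSize bytes,
// without knowing the total size in advance, then sends the end marker.
func streamChunks(w io.Writer, src io.Reader, chunkSize int) error {
	if chunkSize <= 0 {
		return fmt.Errorf("chunkSize must be positive, got %d", chunkSize)
	}
	buf := make([]byte, chunkSize)
	for {
		n, err := src.Read(buf)
		if n > 0 { // never emit an empty data chunk: it would read as "end"
			if werr := writeChunk(w, buf[:n]); werr != nil {
				return werr
			}
		}
		if err == io.EOF {
			return writeChunk(w, nil)
		}
		if err != nil {
			return err
		}
	}
}

// receiveChunks copies chunk payloads to dst until the end marker arrives.
// If the connection closes first, io.ReadFull or io.CopyN reports an error,
// so a cut-off stream is distinguishable from a complete one.
func receiveChunks(dst io.Writer, r io.Reader) (int64, error) {
	var total int64
	var header [4]byte
	for {
		if _, err := io.ReadFull(r, header[:]); err != nil {
			return total, err
		}
		n := binary.BigEndian.Uint32(header[:])
		if n == 0 {
			return total, nil
		}
		copied, err := io.CopyN(dst, r, int64(n))
		total += copied
		if err != nil {
			return total, err
		}
	}
}

// relay runs both sides over an in-memory buffer standing in for the connection.
func relay(data []byte, chunkSize int) ([]byte, error) {
	var wire, out bytes.Buffer
	if err := streamChunks(&wire, bytes.NewReader(data), chunkSize); err != nil {
		return nil, err
	}
	if _, err := receiveChunks(&out, &wire); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

func main() {
	got, err := relay([]byte("a stream whose length nobody announced"), 8)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(got)) // a stream whose length nobody announced
}
```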
Practical File Transfers with Chunks
Instead of
[file size][huge file]
more practical systems use:
[chunk][chunk][chunk]
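Chunking lets the receiver report progress as chunks arrive, lets an interrupted transfer resume from the last chunk received instead of starting over, and avoids describing a huge file with a single up-front size.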
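To make progress tracking and resumption concrete, here is a sketch under several assumptions: each chunk header carries the payload's byte offset (8 bytes) and length (4 bytes), a zero-length chunk ends the transfer, and after a failure the receiver reports how many bytes it has so the sender can restart there. In a real system that resume offset would travel back over the connection; here the demo passes it directly, simulates the dropped connection by truncating the byte stream, and writes into a small in-memory memFile instead of an *os.File. All names and the header layout are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// Chunk layout assumed by this sketch: [offset: 8 bytes][length: 4 bytes]
// [payload], big-endian. A zero-length chunk marks the end of the transfer.
const headerSize = 12

// memFile is a minimal in-memory io.WriterAt for the demo (*os.File also has WriteAt).
type memFile struct{ data []byte }

func (m *memFile) WriteAt(p []byte, off int64) (int, error) {
	if end := int(off) + len(p); end > len(m.data) {
		m.data = append(m.data, make([]byte, end-len(m.data))...) // grow to fit
	}
	return copy(m.data[off:], p), nil
}

// sendFrom sends src[start:] as offset-tagged chunks, then the end marker.
func sendFrom(w io.Writer, src []byte, start, chunkSize int) error {
	if chunkSize <= 0 {
		return fmt.Errorf("chunkSize must be positive, got %d", chunkSize)
	}
	send := func(off int, p []byte) error {
		header := make([]byte, headerSize, headerSize+len(p))
		binary.BigEndian.PutUint64(header[:8], uint64(off))
		binary.BigEndian.PutUint32(header[8:], uint32(len(p)))
		_, err := w.Write(append(header, p...))
		return err
	}
	for off := start; off < len(src); off += chunkSize {
		end := off + chunkSize
		if end > len(src) {
			end = len(src)
		}
		if err := send(off, src[off:end]); err != nil {
			return err
		}
	}
	return send(len(src), nil)
}

// receiveInto writes each chunk at its offset, calls progress after each one,
// and returns how many leading bytes are complete: the point to resume from.
func receiveInto(dst io.WriterAt, r io.Reader, next int, progress func(int)) (int, error) {
	header := make([]byte, headerSize)
	for {
		if _, err := io.ReadFull(r, header); err != nil {
			return next, err // e.g. the connection dropped mid-transfer
		}
		off := int(binary.BigEndian.Uint64(header[:8]))
		payload := make([]byte, binary.BigEndian.Uint32(header[8:]))
		if len(payload) == 0 {
			return next, nil // end marker
		}
		if _, err := io.ReadFull(r, payload); err != nil {
			return next, err
		}
		if _, err := dst.WriteAt(payload, int64(off)); err != nil {
			return next, err
		}
		next = off + len(payload)
		progress(next)
	}
}

// resumeDemo truncates the first attempt's bytes after cutAt, as if the
// connection dropped, then restarts the sender at the receiver's offset.
func resumeDemo(data []byte, chunkSize, cutAt int, progress func(int)) ([]byte, int, error) {
	dst, wire := &memFile{}, &bytes.Buffer{}
	if err := sendFrom(wire, data, 0, chunkSize); err != nil {
		return nil, 0, err
	}
	if cutAt > wire.Len() {
		cutAt = wire.Len()
	}
	// Any error just means "resume"; if nothing was lost, only the end marker is resent.
	resumeAt, _ := receiveInto(dst, bytes.NewReader(wire.Bytes()[:cutAt]), 0, progress)
	wire.Reset()
	if err := sendFrom(wire, data, resumeAt, chunkSize); err != nil {
		return nil, 0, err
	}
	_, err := receiveInto(dst, wire, resumeAt, progress)
	return dst.data, resumeAt, err
}

func main() {
	data := []byte("the quick brown fox jumps")
	got, at, err := resumeDemo(data, 4, 53, func(n int) { fmt.Printf("%d/%d bytes\n", n, len(data)) })
	if err != nil {
		panic(err)
	}
	// Byte 53 falls inside the fourth chunk's header, so the retry resumes at byte 12.
	fmt.Printf("resumed at byte %d: %s\n", at, got)
}
```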
Conclusion: Building a Solid Understanding
Here’s the key takeaway:
TCP → Gives you bytes
Length → Tells you how many bytes to read
Structure → Explains what those bytes are
Keep these guidelines in mind:
- Never assume a Read corresponds to a complete message.
- Buffer size is not a boundary marker.
- Define clear message boundaries for complex data.
- Use length prefixes for simplicity.
- Incorporate type definitions only when necessary.
By keeping three things in focus (bytes, boundaries, and meaning), you can build more reliable networking systems. Keep this guide handy as you navigate TCP’s complexities.