-
Schema Definition Language (SDL):
- Protobuf schemas are written in .proto files using the Protocol Buffers language, a simple schema definition language (SDL).
- Schemas define the structure of your data using messages, fields, and data types.
syntax = "proto3";

message Person {
  string name = 1;
  int32 age = 2;
  repeated string emails = 3;
}
-
Data Types:
- Protobuf supports scalar types (integers, floats, booleans, strings, bytes) and composite types (enums and other messages). Any field can also be marked repeated to hold a list of values.
-
Defining Messages:
- Messages are the primary building blocks of Protobuf schemas.
- You define a message using the message keyword followed by a name.
- Inside a message, you define fields, each with a data type, a field name, and a unique field number.
message Person {
  string name = 1;
  int32 age = 2;
}
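- Each field declaration gives the field's type, its name, and its field number. The field number, not the name, is what identifies the field in the encoded message.
- Now, if I were to send the message name = Tahir, age = 100, it would be encoded as these 9 bytes (shown in hex):
- 0A 05 54 61 68 69 72 10 64
- 0A is the tag for the 1st field: the field number shifted left by 3 bits, combined with wire type 2 (length-delimited, used for strings). 05 is the length in bytes, followed by the UTF-8 bytes of Tahir.
- 10 is the tag for the 2nd field with wire type 0 (varint, used for int32). 64 is the value 100 as a one-byte varint, so an int32 does not always take 4 bytes.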
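The sketch below encodes name = Tahir, age = 100 by hand, to show how a tag (field number << 3 | wire type) precedes each value on the wire. It is only an illustration under simplifying assumptions: the names WireFormatSketch, writeVarint, writeTag, and encodePerson are made up for these notes and are not part of the Protobuf library. Real code would use the classes generated by protoc.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-rolled illustration of the wire format; not the Protobuf library API.
class WireFormatSketch {
    static final int VARINT = 0;           // wire type for int32, bool, enum, ...
    static final int LENGTH_DELIMITED = 2; // wire type for string, bytes, nested messages

    // Writes a base-128 varint: 7 bits per byte, lowest bits first,
    // with the high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // A tag packs the field number and the wire type into one varint.
    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    // Encodes Person { string name = 1; int32 age = 2; }.
    // Simplification: a real proto3 encoder omits fields that hold default values.
    static byte[] encodePerson(String name, int age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        writeTag(out, 1, LENGTH_DELIMITED);
        writeVarint(out, nameBytes.length);
        out.write(nameBytes, 0, nameBytes.length);
        writeTag(out, 2, VARINT);
        writeVarint(out, age);
        return out.toByteArray();
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(toHex(encodePerson("Tahir", 100)));
        // prints: 0a 05 54 61 68 69 72 10 64
    }
}
```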
-
Nested Messages:
- Messages can be nested within other messages to create hierarchical structures.
- This allows you to represent complex data structures.
message Address {
  string street = 1;
  string city = 2;
}

message Person {
  string name = 1;
  int32 age = 2;
  Address address = 3;
}
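On the wire, a nested message is written like a string: a tag with wire type 2, a byte length, and then the encoded bytes of the inner message. The sketch below shows that layout for the Address inside Person. All names in it (NestedMessageSketch, writeBytesField, encodeAddress, encodePerson) are hypothetical helpers for these notes, not library calls.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-rolled illustration of how a nested message is embedded; not the Protobuf API.
class NestedMessageSketch {
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Length-delimited field (wire type 2): tag, payload length, payload bytes.
    static void writeBytesField(ByteArrayOutputStream out, int fieldNumber, byte[] payload) {
        writeVarint(out, ((long) fieldNumber << 3) | 2);
        writeVarint(out, payload.length);
        out.write(payload, 0, payload.length);
    }

    static void writeStringField(ByteArrayOutputStream out, int fieldNumber, String s) {
        writeBytesField(out, fieldNumber, s.getBytes(StandardCharsets.UTF_8));
    }

    // Address { string street = 1; string city = 2; }
    static byte[] encodeAddress(String street, String city) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeStringField(out, 1, street);
        writeStringField(out, 2, city);
        return out.toByteArray();
    }

    // Person { string name = 1; int32 age = 2; Address address = 3; }
    static byte[] encodePerson(String name, int age, byte[] encodedAddress) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeStringField(out, 1, name);
        writeVarint(out, (2L << 3) | 0); // tag for field 2, wire type 0 (varint)
        writeVarint(out, age);
        writeBytesField(out, 3, encodedAddress); // the inner message travels as plain bytes
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] address = encodeAddress("Main", "Oslo");
        byte[] person = encodePerson("Al", 30, address);
        System.out.println(address.length + " address bytes inside " + person.length + " person bytes");
    }
}
```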
-
Enums:
- Protobuf supports enumerations (enums) for defining a set of named values.
- Enums are useful when a field can only have one of a predefined set of values.
enum Color {
  RED = 0;
  BLUE = 1;
  GREEN = 2;
}

message Person {
  string name = 1;
  int32 age = 2;
  Color favorite_color = 3;
}
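An enum value is sent as its number, not its name, so both sides need the same name-to-number mapping. The sketch below is a hand-written stand-in for what generated code might look like. ColorEnumSketch and encodeFavoriteColor are made-up names, and the getNumber/forNumber pair only mirrors the style of the generated Java enum API rather than reproducing it.

```java
// Hand-written stand-in for a generated enum; not the actual protoc output.
class ColorEnumSketch {
    enum Color {
        RED(0), BLUE(1), GREEN(2);

        private final int number;

        Color(int number) {
            this.number = number;
        }

        int getNumber() {
            return number;
        }

        // Returns null for a number this version of the schema does not know.
        static Color forNumber(int number) {
            for (Color c : values()) {
                if (c.number == number) {
                    return c;
                }
            }
            return null;
        }
    }

    // Encodes `Color favorite_color = 3;` as tag + value.
    // Assumption: the enum number is below 128, so it fits in a one-byte varint.
    static byte[] encodeFavoriteColor(Color color) {
        int tag = (3 << 3) | 0; // field 3, wire type 0 (varint)
        return new byte[] {(byte) tag, (byte) color.getNumber()};
    }

    public static void main(String[] args) {
        byte[] encoded = encodeFavoriteColor(Color.BLUE);
        System.out.println(Color.forNumber(encoded[1])); // prints: BLUE
    }
}
```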
-
Repeated Fields:
- Repeated fields allow you to have a field with multiple values of the same type.
- They are useful when you want to represent a collection of elements within a message.
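For a repeated string field such as emails = 3, each element is written as its own tag, length, and value, all with the same field number. The sketch below illustrates that layout. RepeatedFieldSketch and encodeEmails are hypothetical names for these notes, and the code assumes each string is shorter than 128 bytes so its length fits in one byte. (Repeated numeric fields are usually packed into a single record instead, which this sketch does not cover.)

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hand-rolled illustration of a repeated string field; not the Protobuf API.
class RepeatedFieldSketch {
    // Encodes `repeated string emails = 3;` one element at a time.
    static byte[] encodeEmails(List<String> emails) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String email : emails) {
            byte[] bytes = email.getBytes(StandardCharsets.UTF_8);
            out.write((3 << 3) | 2);           // tag 0x1a: field 3, length-delimited
            out.write(bytes.length);           // assumption: length < 128 (one-byte varint)
            out.write(bytes, 0, bytes.length); // the element itself
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] encoded = encodeEmails(Arrays.asList("ab", "cde"));
        // Two elements, so the tag byte 0x1a appears twice: at index 0 and index 4.
        System.out.println(encoded.length); // prints: 9
    }
}
```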
-
Code Generation:
- Protobuf schemas can be compiled into code in various programming languages.
- The compiled code provides APIs for serializing/deserializing data in the Protobuf format.
- After defining your Protobuf schema, you can generate code using the Protobuf compiler (protoc) and the corresponding language-specific plugin.
-
Serialization:
- Serialization refers to converting structured data into a binary format.
- With Protobuf, you can serialize messages to a binary format that can be efficiently transmitted over the network or stored on disk.
// Assume a Person message object (the email addresses are placeholders)
Person person = Person.newBuilder()
    .setName("Alice")
    .setAge(25)
    .addEmails("alice@example.com")
    .addEmails("alice.work@example.com")
    .build();
// Serialize the message to a byte array
byte[] serializedData = person.toByteArray();
-
Deserialization:
- Deserialization is the process of converting binary data back into structured objects.
- Protobuf provides methods to deserialize binary data into the corresponding message objects.
// Assume we have the serializedData byte array
// parseFrom throws InvalidProtocolBufferException if the bytes are not a valid message
Person person = Person.parseFrom(serializedData);
System.out.println(person.getName()); // "Alice"
System.out.println(person.getAge()); // 25
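To see roughly what a parser does with the bytes, the sketch below decodes a Person that has only name and age set. It reads a tag, splits it into field number and wire type, then reads the value that follows. PersonDecodeSketch, readVarint, and parse are made-up names for these notes. The generated parseFrom is far more complete (all wire types, unknown fields, validation).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hand-rolled illustration of decoding; not the Protobuf library API.
class PersonDecodeSketch {
    static final class Person {
        String name = ""; // default when the field is absent
        int age = 0;      // default when the field is absent
    }

    // Reads a base-128 varint: 7 payload bits per byte, lowest bits first.
    static long readVarint(ByteBuffer in) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = in.get();
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    static Person parse(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        Person p = new Person();
        while (in.hasRemaining()) {
            long tag = readVarint(in);
            int fieldNumber = (int) (tag >>> 3);
            int wireType = (int) (tag & 0x7);
            if (fieldNumber == 1 && wireType == 2) {        // string name = 1
                byte[] bytes = new byte[(int) readVarint(in)];
                in.get(bytes);
                p.name = new String(bytes, StandardCharsets.UTF_8);
            } else if (fieldNumber == 2 && wireType == 0) { // int32 age = 2
                p.age = (int) readVarint(in);
            } else {
                // Simplification: this sketch rejects anything else instead of skipping it.
                throw new IllegalArgumentException("unexpected field " + fieldNumber);
            }
        }
        return p;
    }

    public static void main(String[] args) {
        // Bytes for name = "Alice", age = 25.
        byte[] data = {0x0a, 0x05, 0x41, 0x6c, 0x69, 0x63, 0x65, 0x10, 0x19};
        Person p = parse(data);
        System.out.println(p.name); // prints: Alice
        System.out.println(p.age);  // prints: 25
    }
}
```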
-
Versioning and Compatibility:
- Protobuf supports forward and backward compatibility when evolving schemas.
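- You can add new fields without breaking existing code: old code skips fields it does not recognize in new messages, and new code reading old messages sees default values for the fields that are missing.
- For this to work, the number of an existing field must never be changed or reused for a different field.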
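Compatibility works because every value on the wire carries its field number and wire type, so a reader can step over fields it does not know. The sketch below plays the role of an old reader that only knows name = 1 and age = 2. UnknownFieldSketch, PersonV1, skipValue, and parseV1 are hypothetical names for these notes, not generated code, and only wire types 0, 1, 2, and 5 are handled.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hand-rolled illustration of skipping unknown fields; not the Protobuf library API.
class UnknownFieldSketch {
    // What a reader built from the old schema knows about.
    static final class PersonV1 {
        String name = "";
        int age = 0;
        int skippedFields = 0; // bookkeeping for this demo only
    }

    static long readVarint(ByteBuffer in) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = in.get();
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    // The wire type alone says how many bytes the value occupies.
    static void skipValue(ByteBuffer in, int wireType) {
        switch (wireType) {
            case 0: // varint
                readVarint(in);
                break;
            case 1: // fixed 64-bit
                in.position(in.position() + 8);
                break;
            case 2: { // length-delimited: read the length first, then jump over the payload
                int length = (int) readVarint(in);
                in.position(in.position() + length);
                break;
            }
            case 5: // fixed 32-bit
                in.position(in.position() + 4);
                break;
            default:
                throw new IllegalArgumentException("unsupported wire type " + wireType);
        }
    }

    static PersonV1 parseV1(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        PersonV1 p = new PersonV1();
        while (in.hasRemaining()) {
            long tag = readVarint(in);
            int fieldNumber = (int) (tag >>> 3);
            int wireType = (int) (tag & 0x7);
            if (fieldNumber == 1 && wireType == 2) {
                byte[] bytes = new byte[(int) readVarint(in)];
                in.get(bytes);
                p.name = new String(bytes, StandardCharsets.UTF_8);
            } else if (fieldNumber == 2 && wireType == 0) {
                p.age = (int) readVarint(in);
            } else {
                skipValue(in, wireType); // unknown to this reader: step over it
                p.skippedFields++;
            }
        }
        return p;
    }

    public static void main(String[] args) {
        // Written by a newer schema: name = "Al", age = 30, plus a string in field 3.
        byte[] newer = {0x0a, 0x02, 0x41, 0x6c, 0x10, 0x1e, 0x1a, 0x03, 0x78, 0x40, 0x79};
        PersonV1 p = parseV1(newer);
        System.out.println(p.name + " " + p.age + " skipped=" + p.skippedFields);
        // prints: Al 30 skipped=1
    }
}
```

The reverse direction needs no extra code: if a field never appears in the bytes, the reader's object simply keeps its default value (here an empty name or age 0).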
-
Interoperability:
- Protobuf is language-agnostic, meaning you can generate code in different programming languages.
- This allows you to exchange data between different systems implemented in different languages.
Protobuf offers a compact binary representation, faster serialization and deserialization than text formats such as JSON or XML, and flexible schema evolution, which makes it popular for data interchange between different systems.