Protocol Buffers

Protocol buffers are a language-neutral, platform-neutral, extensible mechanism for serializing structured data.
tl;dr
How do I start?
- Download and install the protocol buffer compiler.
- Read the overview.
- Try the tutorial for your chosen language (Python).
overview
https://developers.google.com/protocol-buffers/docs/overview
It’s like JSON, except it’s smaller and faster, and it generates native language bindings.
Protocol buffers are a combination of the definition language (created in .proto files), the code that the proto compiler generates to interface with data, language-specific runtime libraries, and the serialization format for data that is written to a file (or sent across a network connection).
These are protobuf’s two main components:
- Protoc (compiler)
  - It compiles the data format definitions.
  - It compiles .proto files.
- SDK
  - Support for each language.
  - The proto compiler is invoked at build time on .proto files to generate code in various programming languages.
  - Each generated class contains simple accessors for each field and methods to serialize and parse the whole structure to and from raw bytes.
languages
The following languages are supported directly in the protocol buffers compiler, protoc:
- C++
- C#
- Java
- Kotlin
- Objective-C
- PHP
- Python
- Ruby
The following languages are supported by Google, but the projects’ source code resides in GitHub repositories. The protoc compiler uses plugins for these languages:
- Dart
- Go
Pros
- Language- and platform-neutral (low coupling, a good fit for microservices)
- Compact data storage
- Fast parsing (compared to JSON)
- Availability in many programming languages
- Optimized functionality through automatically-generated classes
- You can update proto definitions without updating code, which means you can control version compatibility of the code (especially the data schema).
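To make the "compact data storage" point concrete, here is a minimal sketch that hand-encodes one small record in the protobuf wire format and compares its size with compact JSON. The message layout (`int32 id = 1; string name = 2;`), the `encode_person` helper, and the sample values are assumptions made up for illustration. Real code would use the classes generated by protoc rather than encoding bytes by hand.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_person(person_id: int, name: str) -> bytes:
    """Hand-encode a hypothetical message: int32 id = 1; string name = 2;"""
    name_bytes = name.encode("utf-8")
    return (
        bytes([0x08]) + encode_varint(person_id)          # tag: field 1, wire type 0 (varint)
        + bytes([0x12]) + encode_varint(len(name_bytes))  # tag: field 2, wire type 2 (length-delimited)
        + name_bytes
    )


record = {"id": 150, "name": "Ada"}
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")
as_wire = encode_person(record["id"], record["name"])

print("JSON bytes:    ", len(as_json))
print("protobuf bytes:", len(as_wire))
```

The saving comes mostly from replacing field names with small numeric tags, so the gap depends on the shape of the data. A payload dominated by long strings would shrink far less.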
Cons
- Protocol buffers tend to assume that entire messages can be loaded into memory at once and are not larger than an object graph. For data that exceeds a few megabytes, consider a different solution; when working with larger data, you may effectively end up with several copies of the data due to serialized copies, which can cause surprising spikes in memory usage.
- When protocol buffers are serialized, the same data can have many different binary serializations. You cannot compare two messages for equality without fully parsing them.
- Messages are not compressed. While messages can be zipped or gzipped like any other file, special-purpose compression algorithms like the ones used by JPEG and PNG will produce much smaller files for data of the appropriate type.
- Protocol buffer messages are less than maximally efficient in both size and speed for many scientific and engineering uses that involve large, multi-dimensional arrays of floating point numbers. For these applications, FITS and similar formats have less overhead.
- Protocol buffers are not well supported in non-object-oriented languages popular in scientific computing, such as Fortran and IDL.
- Protocol buffer messages don’t inherently self-describe their data, but they have a fully reflective schema that you can use to implement self-description. That is, you cannot fully interpret one without access to its corresponding .proto file.
- Protocol buffers are not a formal standard of any organization. This makes them unsuitable for use in environments with legal or other requirements to build on top of standards.
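The point above about one message having several valid binary serializations can be illustrated with a small sketch. It assumes a hypothetical message with two varint fields (field 1 and field 2) and uses a hand-written, varint-only parser (`parse_varint_fields` is a made-up helper, not part of any protobuf library). The two byte strings differ only in field order, which parsers are expected to accept.

```python
from typing import Dict, Tuple


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # no continuation bit: this was the last byte
            return result, pos
        shift += 7


def parse_varint_fields(buf: bytes) -> Dict[int, int]:
    """Parse a message that contains only varint fields (wire type 0)."""
    fields: Dict[int, int] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only understands varint fields")
        value, pos = decode_varint(buf, pos)
        fields[field_number] = value  # for singular fields the last value wins
    return fields


# Same logical content (field 1 = 150, field 2 = 1), two different byte orders.
encoding_a = bytes([0x08, 0x96, 0x01, 0x10, 0x01])
encoding_b = bytes([0x10, 0x01, 0x08, 0x96, 0x01])

print(encoding_a == encoding_b)  # the raw bytes differ
print(parse_varint_fields(encoding_a) == parse_varint_fields(encoding_b))
```

This is why comparing serialized bytes, or hashing them, is not a reliable equality check for messages.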
Flow
.proto definition syntax (3)
- optionality (field rules)
  - optional
  - repeated: repeated fields are represented as an object that acts like a Python sequence.
  - singular (proto3 default)
  - required (deprecated)
  - reserved
- field type
  - message: you can nest parts of the definition, such as for repeating sets of data.
  - enum: a set of values to choose from.
  - oneof: use it when a message has many optional fields and at most one field will be set at the same time.
  - map
- field number
- basic scalar type
- additional scalar type
Field numbers cannot be repurposed or reused. If you delete a field, you should reserve its field number to prevent someone from accidentally reusing the number.
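Field numbers are an important part of Protobuf. They identify fields in the binary-encoded data, which means they cannot change from one version of a service to the next. The advantage is that both backward and forward compatibility are possible. Clients and services ignore field numbers they do not know about, as long as the possibility of missing values is handled.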
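In the binary format, the field number is combined with a type identifier. Field numbers 1 through 15 can be encoded together with their type in a single byte. Numbers 16 through 2,047 take two bytes. You can go higher if, for whatever reason, a message needs more than 2,047 fields. The single-byte identifiers for field numbers 1 through 15 perform better, so use them for the most basic, most frequently used fields.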
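The byte counts above follow from how the tag is built: the field number is shifted left by three bits, combined with a 3-bit wire type, and written as a varint. The sketch below is a hand-written illustration (the helper names `encode_varint` and `encode_tag` are assumptions, not library functions) that prints the tag size for a few field numbers.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """Combine a field number with a 3-bit wire type and varint-encode it."""
    return encode_varint((field_number << 3) | wire_type)


WIRE_TYPE_VARINT = 0  # wire type used by int32, int64, bool, enum, ...

for field_number in (1, 15, 16, 2047, 2048):
    tag = encode_tag(field_number, WIRE_TYPE_VARINT)
    print(f"field {field_number:>4}: {len(tag)} byte(s) -> {tag.hex()}")
```

Because a single varint byte carries 7 payload bits and 3 of them go to the wire type, only 4 bits remain for the field number, which is where the limit of 15 comes from.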
Example
This is the status.proto file used by Google. (refs)
protobuf: python
https://developers.google.com/protocol-buffers/docs/reference/python-generated#invocation
protobuf vs …
- protobuf(grpc) vs thrift
- protobuf(grpc) vs graphql
- https://github.com/google/rejoiner
- Google open-sourced this library for GraphQL compatibility.
- https://medium.com/@lvdbrink/graphql-meets-protocol-buffers-in-go-cdbf11090934