gRPC for Data Engineers
If you’ve been around Data Engineering for a while, like me, you’ve noticed a few trends in the industry at large, and in individual data engineers themselves. There seem to be a few types of data engineers, and which type you are depends on where you’ve worked and what your projects have looked like. Some data engineers focus on general ETL, Data Warehousing, and the like; they move data around and transform it using a myriad of tools. The other set of data engineers are more focused on low-level infrastructure: they provide the underlying tools and services others use to move and transfer that data.
Which are you? One topic you may or may not be familiar with, depending on your background, is RPC, or more specifically gRPC. What is it?
What is gRPC?
This might be confusing, or not, but stay with me. RPC, and gRPC in particular, is a method of client-server communication, similar to, yet different from, the popular options you know as REST and OpenAPI. With those, a client and server communicate over HTTP in a way that is exposed to the developer, for example through the popular Python package requests. It usually comes down to URLs and sending data over port 80 via some URL-encoded call.
All code is available on GitHub.
Ok, so if that isn’t gRPC, what is it?
gRPC is that same client-server communication, usually using Protocol Buffers (protobufs), where the client can actually call a method on the server application … directly.
Many times in software architecture, if you have two applications sitting on different machines and one needs to pass information or data to the other so it can take an action … say, to add a new user … this would have been done by building a REST or OpenAPI service. The calling application would build JSON messages and call an API endpoint via HTTP methods.
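For contrast, here is roughly what that REST-style “add a new user” call looks like from Python. This is a minimal sketch that uses the standard library’s urllib.request instead of requests so it runs with no installs; the endpoint URL and the payload fields are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical REST endpoint for creating a user (not from the original article).
API_URL = "http://localhost:8080/api/users"


def build_add_user_request(name: str, email: str) -> urllib.request.Request:
    """Hand-build the JSON body and HTTP request a REST client would send."""
    payload = {"name": name, "email": email}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_add_user_request("Daniel Beach", "daniel@example.com")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Actually sending it would be urllib.request.urlopen(req), which needs a live server.
```

Every detail here (URL, HTTP method, headers, JSON field names) is something client and server must agree on by convention; gRPC moves that agreement into a .proto file.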
The benefit of gRPC is that it abstracts away some of that busy work and gets a little smarter and faster about the communication: using a pre-defined format, you “directly” exercise code on the target machine.
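To make the “call a method on the server directly” idea concrete without installing anything, note that Python’s standard library ships an older RPC mechanism, XML-RPC. It is not gRPC (it sends XML over plain HTTP rather than protobufs over HTTP/2), but the shape is the same: the server registers a function, and the client calls it as if it were local. The function name is_muad_dib and the use of a random free port are illustrative assumptions.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def is_muad_dib(name: str) -> str:
    """Server-side function that remote clients call by name."""
    return f"{name} is NOT the Muad'dib"


def start_server() -> SimpleXMLRPCServer:
    # Port 0 asks the OS for any free port; logRequests=False keeps output quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(is_muad_dib)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    port = server.server_address[1]
    # The proxy turns attribute access into remote calls over HTTP.
    with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
        print(proxy.is_muad_dib("Daniel Beach"))
    server.shutdown()
    server.server_close()
```

The client never builds a URL or a JSON body; it just calls proxy.is_muad_dib(...). gRPC keeps that calling style but adds a typed contract (the .proto file) and a compact binary format.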
gRPC pieces and parts.
gRPC uses Google’s open-sourced standard for serializing structured data, called Protocol Buffers, or protobufs.
You define what you want the serialized data messages to look like in a .proto file.
Once these definitions are complete, you use the compiler, protoc, to automatically generate the classes you need for your language of choice, like Python. These generated classes/files let you send and receive (serialize and deserialize) the data/messages in your code.
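To demystify “serializing structured data” a little, here is a hand-rolled sketch of how a protobuf message with a single string field numbered 1 is laid out on the wire: a tag (field number plus wire type 2, meaning length-delimited), a varint length, then the UTF-8 bytes. You would never write this in real code, since the protoc-generated classes do it for you; the helper names are hypothetical, and the sketch only handles this one case.

```python
from __future__ import annotations


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number: int, text: str) -> bytes:
    """Serialize one string field: tag, length, then UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def decode_string_field(data: bytes) -> tuple[int, str]:
    """Parse the single string field written by encode_string_field."""
    tag, pos = decode_varint(data, 0)
    field_number, wire_type = tag >> 3, tag & 0x07
    if wire_type != 2:
        raise ValueError("this sketch only handles length-delimited fields")
    length, pos = decode_varint(data, pos)
    return field_number, data[pos:pos + length].decode("utf-8")


if __name__ == "__main__":
    wire = encode_string_field(1, "Daniel Beach")
    print(wire)  # b'\n\x0cDaniel Beach'
    print(decode_string_field(wire))  # (1, 'Daniel Beach')
```

Notice the bytes carry no field names, only field numbers, which is why the “= 1” in a .proto field definition matters and should not be changed once clients depend on it.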
Next, you also use the .proto files to define services. This concept should be familiar to you: a service is a logical grouping of your methods and the messages they send and receive.
In review…
- define a “service”
- define your protobuf “messages”
- compile (protoc) your .proto file(s) into code for your language.
Try out gRPC with Python.
What better way to try out gRPC than with Python? First things first, install what we need.
pip3 install grpcio grpcio-tools
Let’s start with a simple example.
Learn gRPC with Dune … be an evil Harkonnen trying to find the Muad’dib.
We are going to learn to use gRPC with Python by playing a little Dune together. Since you are clearly an evil genius, let’s pretend you are the Mentat for the House of Harkonnen and your very life depends on finding that sneaky Muad’dib.
There are, of course, Harkonnen soldiers scouring the face of Arrakis in search of the Muad’dib. They send you a message asking whether the person they found is the Muad’dib or just some unfortunate Fremen. Being an evil-genius Mentat, you decide to write a gRPC service to respond to these requests.
The first step is to define, in our .proto file, the service and the request and response messages that go to and from that service.
syntax = "proto3";

package dune;

service Dune {
  rpc isMuadDib(WeFoundHim) returns (DidYouReally) {}
}

message WeFoundHim {
  string message = 1;
}

message DidYouReally {
  string message = 1;
}
You can see we defined a service, Dune, with a single RPC, isMuadDib, that takes a WeFoundHim message naming someone and returns a DidYouReally response. Pretty straightforward.
Our next step is to use our pip-installed grpc_tools to compile the .proto file and generate all the code needed to run this service.
python -m grpc_tools.protoc -I protos --python_out=. --grpc_python_out=. example.proto
Here I am simply saying that my proto file lives in a folder called protos (-I protos) and that the generated Python files should go into the current directory (--python_out=. and --grpc_python_out=.).
Two files were generated: example_pb2.py and example_pb2_grpc.py. The first contains the classes for the request and response messages; the second contains the client (stub) and server (servicer) classes.
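If you regenerate the stubs often, a tiny build script keeps that protoc command in one place. This is a sketch under a few assumptions: the layout is the one used here (protos/example.proto, output to the current directory), it shells out to the same python -m grpc_tools.protoc command shown above, and it only runs protoc when grpc_tools is importable and the .proto file exists. The function names are hypothetical.

```python
import importlib.util
import subprocess
import sys
from pathlib import Path


def protoc_command(proto_dir: str, proto_file: str, out_dir: str = ".") -> list:
    """Build the same grpc_tools.protoc invocation used in this article."""
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        "-I", proto_dir,
        f"--python_out={out_dir}",
        f"--grpc_python_out={out_dir}",
        proto_file,
    ]


def expected_outputs(proto_file: str, out_dir: str = ".") -> list:
    """The Python plugins name their output <stem>_pb2.py and <stem>_pb2_grpc.py."""
    stem = Path(proto_file).stem
    return [Path(out_dir) / f"{stem}_pb2.py", Path(out_dir) / f"{stem}_pb2_grpc.py"]


def regenerate(proto_dir: str = "protos", proto_file: str = "example.proto",
               out_dir: str = ".") -> bool:
    """Run protoc if its prerequisites are present; return True if it ran."""
    if importlib.util.find_spec("grpc_tools") is None:
        print("grpc_tools not installed: pip3 install grpcio grpcio-tools")
        return False
    if not (Path(proto_dir) / proto_file).exists():
        print(f"{proto_file} not found in {proto_dir}/")
        return False
    subprocess.run(protoc_command(proto_dir, proto_file, out_dir), check=True)
    return True


if __name__ == "__main__":
    if regenerate():
        for path in expected_outputs("example.proto"):
            print(path, "exists" if path.exists() else "MISSING")
```

Using sys.executable makes sure protoc runs under the same Python (and virtual environment) that has grpcio-tools installed.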
Now that the base code has been generated for us, we need to actually implement the logic of isMuadDib. Something has to happen when a Harkonnen soldier sends us a request asking whether he has found the Muad’dib.
Let’s create a new file called dune_server.py and populate it with a new class and method definition as follows.
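from concurrent import futures
import random

import grpc

import example_pb2
import example_pb2_grpc


class Dune(example_pb2_grpc.DuneServicer):
    """Implements the Dune service defined in example.proto."""

    def __init__(self):
        self.choices = ["yes", "no"]

    def isMuadDib(self, request, context):
        # request is a WeFoundHim message carrying the suspect's name.
        name = request.message
        # The Mentat's "reasoning": pick yes or no at random.
        response = random.choice(self.choices).upper()
        # Reply with the DidYouReally message defined in example.proto.
        return example_pb2.DidYouReally(
            message=f"Hello minion, You ask me if this {name} is the Muad'dib .... {response}!"
        )


def serve():
    # Serve RPCs from a pool of up to 10 worker threads.
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    example_pb2_grpc.add_DuneServicer_to_server(Dune(), server)
    # Listen on all interfaces, port 50051, without TLS (fine for local testing).
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()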
So we imported the files the compiler generated for us. We made a class, Dune, and implemented our isMuadDib method, which uses the power of a Mentat to decide whether this is the Muad’dib or not. Notice that our method returns the DidYouReally message defined in our .proto as the response. Also, our Dune class needs to inherit from the autogenerated DuneServicer class.
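The serve function can be taken almost as-is from the gRPC Python example and quick start guide, and it is straightforward: it creates a server backed by a thread pool, registers our Dune servicer, listens on port 50051, and blocks until the server is terminated.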
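Because isMuadDib picks its answer with random.choice, it is awkward to test. One option, sketched here with hypothetical names, is to pull the Mentat’s reasoning into a plain function that takes the random generator as a parameter; the servicer method would then only wrap the result in a DidYouReally message. This part needs nothing but the standard library, so it can be unit-tested without starting a server.

```python
import random

CHOICES = ("yes", "no")


def mentat_verdict(name: str, rng: random.Random) -> str:
    """Build the reply text the Dune servicer sends back for a given name."""
    answer = rng.choice(CHOICES).upper()
    return f"Hello minion, You ask me if this {name} is the Muad'dib .... {answer}!"


# Hypothetical use inside the servicer (with self.rng = random.Random() in __init__):
#   return example_pb2.DidYouReally(message=mentat_verdict(request.message, self.rng))

if __name__ == "__main__":
    # A seeded generator makes the "random" verdict reproducible in tests.
    print(mentat_verdict("Daniel Beach", random.Random(42)))
```

Passing the generator in, rather than calling the module-level random functions, is what lets a test pin the outcome with a fixed seed.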
Finally, we are going to make our last file … dune_client.py … and populate it to run this whole thing.
import grpc

import example_pb2
import example_pb2_grpc


def run():
    # Open a plaintext channel to the server started by dune_server.py.
    with grpc.insecure_channel("localhost:50051") as channel:
        dune = example_pb2_grpc.DuneStub(channel)
        # isMuadDib expects a WeFoundHim request and returns a DidYouReally reply.
        response = dune.isMuadDib(example_pb2.WeFoundHim(message="Daniel Beach"))
    print("Greeter client received: " + response.message)


if __name__ == "__main__":
    run()
The client simply opens a gRPC channel to the server, sends an isMuadDib request (a WeFoundHim message), and gets a DidYouReally response back. Simple enough.
Final task.
Start the server in a terminal window …
python3 dune_server.py
While that bugger is running, open a second terminal and run the client, which is going to check whether Daniel Beach is the Muad’dib or not.
python3 dune_client.py
And it works!!!
danielbeach@Daniels-MacBook-Pro gRPC % python3 dune_client.py
Greeter client received: Hello minion, You ask me if this Daniel Beach is the Muad'dib .... NO!
This is great … first, I’m not the Muad’dib and won’t be executed by the Harkonnen soldiers! Second, this gRPC wasn’t bad at all, was it?!
Musings on gRPC and Dune, Harkonnens and the Muad’dib.
Honestly, I was quite impressed with how easy it was to implement a gRPC project in Python. It was smooth and very simple.
It seems to me that if you’re building a service and reach for some out-of-the-box REST API framework, your code base will probably fill up with boilerplate and complexity just to add the REST layer. On the surface, at least, that doesn’t appear to be the case with gRPC.
Fewer lines of code are always good in my book. Fewer things to break and have to manage.
Also, the way you define a .proto file is genius. It distills into a single, simple file EXACTLY what you are trying to do, along with the expected calls and responses. I’ve had to dig through REST API code before, trying to figure out what was happening, and it was never this easy or straightforward.
Also, if you haven’t read Dune yet, shame on you. The movie is coming out soon, get on it.