GatewayD Internals: Dissecting a Potent Open-Source Database Gateway
A state-of-the-art approach involving machine learning and query-level tapping
Database gateways are relatively unknown to the average programmer, since there is no need for one on a local PC. In production, however, they are used for a variety of reasons. This post dives into the inner workings of GatewayD, an open-source database proxy with a bold vision. It covers everything from overcoming challenges to applying machine learning. Email readers, apologies: the post might be too long for your inbox.
What’s an API gateway, again?
As a man in the middle, an API gateway sits between clients and API servers.
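Because every request passes through it, an API gateway can serve many purposes. Here is a caching example.
The client sends a request.
The gateway checks whether a matching response is available in its cache.
If it is, the gateway retrieves the cached response.
It then sends that response back to the client without contacting the API server. On a cache miss, the request is forwarded to the server as usual.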
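The caching flow above can be sketched in a few lines of Go. This is a minimal illustration, not GatewayD code: `Backend`, `CachingGateway`, and `Handle` are hypothetical names, the backend is a plain function standing in for the API server, and the cache never expires entries.

```go
package main

import (
	"fmt"
	"sync"
)

// Backend stands in for the upstream API server (hypothetical).
type Backend func(path string) string

// CachingGateway answers from its cache when possible and only
// forwards cache misses to the backend.
type CachingGateway struct {
	mu      sync.Mutex
	cache   map[string]string
	backend Backend
	misses  int // number of requests forwarded to the backend
}

func NewCachingGateway(b Backend) *CachingGateway {
	return &CachingGateway{cache: make(map[string]string), backend: b}
}

// Handle is called for each client request.
func (g *CachingGateway) Handle(path string) string {
	g.mu.Lock()
	defer g.mu.Unlock()
	// Check whether a response is already cached.
	if resp, ok := g.cache[path]; ok {
		return resp // cache hit: reply without contacting the server
	}
	// Cache miss: forward to the server and remember the response.
	g.misses++
	resp := g.backend(path)
	g.cache[path] = resp
	return resp
}

func main() {
	gw := NewCachingGateway(func(path string) string {
		return "response for " + path
	})
	fmt.Println(gw.Handle("/users")) // miss: forwarded to the backend
	fmt.Println(gw.Handle("/users")) // hit: served from the cache
	fmt.Println("backend calls:", gw.misses)
}
```

For simplicity the lock is held while the backend is called, which serializes requests; a real gateway would avoid that and add expiry.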
For authentication, it can verify that requests are authenticated before forwarding them.
For analytics, it can collect insights on traffic.
You get the idea: since it sits in between, it can provide a lot of features.
What’s a database gateway?
A database gateway is similar. It sits between clients and databases.
To better understand the whole picture, let’s see what happens when sending a query to a database.
The client takes the query, encodes it through the driver into the database's wire format, and sends it to the server, which decodes it and acts upon it as needed. So, if we put a database gateway in the middle, it has access to this encoded form of the query. A database-agnostic gateway such as GatewayD therefore needs a query parser for each database it supports if it wants to understand queries and let users interact with them.
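To make "the converted form of the query" concrete, here is a sketch of PostgreSQL's simple-query message, one of the formats a Postgres-aware gateway has to parse. The layout (a `Q` type byte, a big-endian int32 length that counts itself, then NUL-terminated SQL) follows the PostgreSQL wire protocol; `EncodeQuery` and `DecodeQuery` are hypothetical helpers, and this is not how GatewayD itself is structured. Real traffic contains many other message types (extended-query `Parse`/`Bind`, startup messages, and so on) that this ignores.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// EncodeQuery builds a PostgreSQL simple-query ('Q') message the way a
// client driver would: 1 type byte, a big-endian int32 length that counts
// itself plus the body, then the NUL-terminated SQL text.
func EncodeQuery(sql string) []byte {
	body := append([]byte(sql), 0)
	msg := make([]byte, 1+4+len(body))
	msg[0] = 'Q'
	binary.BigEndian.PutUint32(msg[1:5], uint32(4+len(body)))
	copy(msg[5:], body)
	return msg
}

// DecodeQuery is what a gateway in the middle must do to recover
// the SQL text from the bytes it intercepts.
func DecodeQuery(msg []byte) (string, error) {
	if len(msg) < 5 || msg[0] != 'Q' {
		return "", errors.New("not a simple query message")
	}
	length := int(binary.BigEndian.Uint32(msg[1:5]))
	if length < 5 || 1+length > len(msg) {
		return "", errors.New("truncated message")
	}
	body := msg[5 : 1+length]
	end := bytes.IndexByte(body, 0)
	if end < 0 {
		return "", errors.New("missing NUL terminator")
	}
	return string(body[:end]), nil
}

func main() {
	wire := EncodeQuery("SELECT 1")
	fmt.Printf("% x\n", wire) // the bytes that travel over the network
	sql, err := DecodeQuery(wire)
	fmt.Println(sql, err)
}
```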
The evolution of GatewayD
GatewayD responds well to the challenges a gateway faces. The first one is performance: Go (Golang) was chosen to deliver good benchmark results.
Since GatewayD does not provide a condensed changelog, I crafted an emojilog to visualize when important features were added. You can view the complete changelog here if you wish.
Or, if you want a shorter version:
Features were added in the order they became necessary. First, a connection pool was implemented to manage connections.
Then a proxy was added to ferry requests from the client to the server and responses back.
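The "ferrying" a proxy does can be illustrated with Go's standard `net` and `io` packages. This is a bare byte relay under simplifying assumptions: one upstream connection is dialed per client, there is no connection pool, and an in-process echo server stands in for the database. The names `proxy`, `serve`, `echoServer`, and `demo` are hypothetical; GatewayD's own proxy is considerably more involved.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// proxy ferries bytes between a client connection and a fresh connection
// to the upstream server, in both directions.
func proxy(client net.Conn, upstreamAddr string) {
	defer client.Close()
	server, err := net.Dial("tcp", upstreamAddr)
	if err != nil {
		return
	}
	defer server.Close()
	done := make(chan struct{}, 2)
	go func() { io.Copy(server, client); done <- struct{}{} }() // client -> server
	go func() { io.Copy(client, server); done <- struct{}{} }() // server -> client
	<-done // when either side stops, the deferred Closes end the other copy
}

// serve accepts clients until the listener is closed.
func serve(ln net.Listener, upstreamAddr string) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		go proxy(conn, upstreamAddr)
	}
}

// echoServer stands in for the database: it sends back whatever it reads.
func echoServer(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		go func(c net.Conn) { defer c.Close(); io.Copy(c, c) }(conn)
	}
}

// demo sends msg through the proxy to the echo server and returns the reply.
func demo(msg string) (string, error) {
	upstream, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer upstream.Close()
	go echoServer(upstream)

	front, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer front.Close()
	go serve(front, upstream.Addr().String())

	conn, err := net.Dial("tcp", front.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	reply, err := demo("ping")
	fmt.Println(reply, err)
}
```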
Then hooks were added to signal when events occur, so that plugins can react to them.
Then the plugin system was implemented. Plugins are notified of the hooks they subscribe to.
Plugins can also modify traffic.
Then logging and configuration were added.
Then a Prometheus metrics system was added.
Then the order of logger loading was changed, and dev mode was added, as well as the admin API, which exposes a series of endpoints.
Then OpenTelemetry was added, followed by plugin commands, and finally TLS support was introduced.
Here is a recap of what we’ve been talking about.
Tackling waiting tasks
Now, let’s say you want to run a task when receiving a request. If the proxy waits for the task to complete before forwarding the request, it loses performance. When the task can run in the meantime, you can hand it to a goroutine.
TLS termination
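Postgres supports TLS connections natively. Sometimes, however, a TLS termination proxy is placed in front of a server to decrypt traffic, the same way proxies commonly turn HTTPS into plain HTTP for web servers.
Normally, to upgrade to an SSL connection, the client first sends an SSLRequest message, and the server answers with a single byte: S if it accepts, in which case the TLS handshake starts, or N if it refuses.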
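Here is a sketch of how a gateway might recognize that exchange. The SSLRequest layout (a big-endian int32 length of 8 followed by the code 80877103) and the one-byte `S`/`N` reply come from the PostgreSQL protocol; the helper names `IsSSLRequest`, `SSLResponse`, and `BuildSSLRequest` are hypothetical and not taken from GatewayD.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sslRequestCode is the magic number PostgreSQL clients send to ask for TLS.
const sslRequestCode = 80877103

// IsSSLRequest reports whether the first bytes from a client are an
// SSLRequest: a big-endian int32 length of 8 followed by the request code.
func IsSSLRequest(msg []byte) bool {
	return len(msg) == 8 &&
		binary.BigEndian.Uint32(msg[0:4]) == 8 &&
		binary.BigEndian.Uint32(msg[4:8]) == sslRequestCode
}

// SSLResponse returns the single byte the server sends back:
// 'S' to proceed with the TLS handshake, 'N' to continue unencrypted.
func SSLResponse(tlsEnabled bool) byte {
	if tlsEnabled {
		return 'S'
	}
	return 'N'
}

// BuildSSLRequest produces the 8-byte message a client would send.
func BuildSSLRequest() []byte {
	msg := make([]byte, 8)
	binary.BigEndian.PutUint32(msg[0:4], 8)
	binary.BigEndian.PutUint32(msg[4:8], sslRequestCode)
	return msg
}

func main() {
	req := BuildSSLRequest()
	if IsSSLRequest(req) {
		fmt.Printf("reply: %c\n", SSLResponse(true)) // reply: S
	}
}
```

After replying `S`, a Go server would typically wrap the raw connection with `tls.Server(conn, config)` from `crypto/tls` and run the handshake; that step needs certificates and is left out here.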
GatewayD supports TLS termination. You can see the GatewayD commit where this is discussed.
Difference between events and hooks
There is no real difference. Hooks fall into three categories: Config, Notification, and Traffic. Events are simply hooks of the Notification type.
Logging and multi-tenancy
GatewayD offers extensive logging configuration, ready for multi-tenant setups.
A note about pools
With a fixed proxy, two pools are used: the available connections pool and the busy connections pool. When a client connects, a connection is moved from the available pool to the busy pool, and it is moved back when the client disconnects.
GatewayD also has a plugin pool. When the user wants an elastic proxy instead, the available connections pool is not used.
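The two-pool bookkeeping of a fixed proxy can be sketched as follows. `Conn`, `FixedPool`, `Acquire`, `Release`, and `ErrExhausted` are hypothetical names, and `Conn` is a placeholder rather than a real server connection; the point is only the movement of a connection from available to busy and back.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Conn stands in for a server connection (hypothetical).
type Conn struct{ ID int }

// FixedPool keeps idle connections in available and connections
// assigned to a client in busy.
type FixedPool struct {
	mu        sync.Mutex
	available []*Conn
	busy      map[*Conn]bool
}

var ErrExhausted = errors.New("no available connection")

func NewFixedPool(size int) *FixedPool {
	p := &FixedPool{busy: make(map[*Conn]bool)}
	for i := 0; i < size; i++ {
		p.available = append(p.available, &Conn{ID: i})
	}
	return p
}

// Acquire moves a connection from available to busy when a client connects.
func (p *FixedPool) Acquire() (*Conn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.available) == 0 {
		return nil, ErrExhausted
	}
	c := p.available[len(p.available)-1]
	p.available = p.available[:len(p.available)-1]
	p.busy[c] = true
	return c, nil
}

// Release moves the connection back to available when the client disconnects.
func (p *FixedPool) Release(c *Conn) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.busy[c] {
		return // unknown or already released
	}
	delete(p.busy, c)
	p.available = append(p.available, c)
}

// Sizes returns the number of available and busy connections.
func (p *FixedPool) Sizes() (int, int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.available), len(p.busy)
}

func main() {
	pool := NewFixedPool(2)
	c, _ := pool.Acquire()
	fmt.Println(pool.Sizes()) // 1 1
	pool.Release(c)
	fmt.Println(pool.Sizes()) // 2 0
}
```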
Exciting uses
One work-in-progress feature is exposing a parsed query tree to plugins so that they can analyze queries.
This would also greatly help machine-learning-based tasks.
Conclusion
Though not flashy about marketing, GatewayD has been delivering great work, with a good release rhythm and growing contributor traction. It responds well to the challenges a proxy faces and improves the user experience with highly configurable behavior and a thorough observability setup. It tries hard to give plugins the right amount of power.
It’s solid IMO, on the same level as classic products like the Envoy proxy. The code is a joy to dive into and the documentation a joy to read. Hope you enjoyed this dive into a database gateway’s internals!