So just before I start, I know some of you, but for the rest, so who's a database person,
who's a hardware or computer architect person?
Maybe...
It's in the morning.
So who's a database person?
Okay, so if I'm too fast with some of the database things, just stop me and ask questions.
If something is unclear, I'm happy to answer in between.
With the rest, I'm also happy to take questions offline at the end.
So today, what I want to do is give you an overview of my work and of my group, what we
are doing at the, let's say, intersection of databases and modern network technology.
And this has different flavors.
So at the beginning, I will go more in the direction of high-speed networks, and at the
end, I will give some outlook on what we are currently working on: how we can use programmable networks,
so smart devices, smart network interface cards and switches, to support data processing.
But let's start with the first part of the talk.
So if you look just 10 years back, when the main memory revolution somehow happened, people
saw the network as an evil.
Because if you at that time compared the typical numbers for memory speed versus
network speed, you saw that the network was painfully slow compared to what you could get from memory.
So if you read something from a remote machine in a distributed database system,
your queries would have been really tremendously slowed down.
So the distributed database mantra was for many years, avoid the network if you can.
So locality first.
Try to cluster data, partition data across a distributed system such that you avoid network
communication.
And this resulted in a line of work in different directions, for example, complex partitioning
schemes, so trying to somehow capture the semantics of the application in the data
partitioning scheme such that you don't need to shuffle data around.
Or even, let's say, some more complex communication-avoiding schemes, so when you did query processing,
you were spending some extra cycles to find out what you actually need to ship and what
you don't need to ship over the network.
So the idea was: spend more computation in order to avoid communication.
And so this was, I would say, the design for many, many years in databases.
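To make the locality-first idea concrete, here is a minimal sketch of co-partitioning. It is an illustration only, not something shown in the talk: the `orders`/customer tables, the four-node cluster, and the modulo placement are all hypothetical. It counts how many order rows would have to cross the network to meet their matching customer row, once when orders are placed by their own id and once when they are placed by the join key.

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def partition(rows, key_index, num_nodes=NUM_NODES):
    """Place each row on a node by taking its key column modulo the node count."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[row[key_index] % num_nodes].append(row)
    return nodes


def customer_home(customer_id, num_nodes=NUM_NODES):
    """Node that stores the customer row (customers are placed by customer_id)."""
    return customer_id % num_nodes


def rows_to_ship(orders_by_node):
    """Count order rows stored on a different node than their matching customer."""
    shipped = 0
    for node, rows in orders_by_node.items():
        for _order_id, customer_id in rows:
            if customer_home(customer_id) != node:
                shipped += 1
    return shipped


# Hypothetical data: (order_id, customer_id) pairs.
orders = [(oid, (oid * 3 + 1) % 8) for oid in range(100, 124)]

# Placement by order_id ignores the join key, so joins need remote rows.
naive_shipped = rows_to_ship(partition(orders, key_index=0))

# Placement by customer_id puts every order next to its customer.
co_partitioned_shipped = rows_to_ship(partition(orders, key_index=1))

print("shipped, partitioned by order_id:   ", naive_shipped)
print("shipped, partitioned by customer_id:", co_partitioned_shipped)
```

With placement on the join key, the join runs entirely on local data; that is the kind of application knowledge the partitioning schemes mentioned above try to encode.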
But recently, if you look at what happened and revisit those numbers, you
see, OK, the network has evolved quite a bit.
And network technology that was for a long time used in high-performance clusters, which
provides high network bandwidth and low latencies, has now become affordable.
So what you get today, for example, is that if you compare Ethernet with high-speed networks
like InfiniBand, then you see that actually the prices are not that different.
So you can afford an InfiniBand network for a normal cluster today.
And as I said, the bandwidth of the network itself is getting close to what the memory
is providing.
So here's just a comparison, again, of DDR3 numbers for one memory channel,
so not the full bandwidth of a system which has multiple memory channels, and one port
of a modern InfiniBand network interface card.
So the current standard is even going higher.
But with one of the recent standards, you see that one memory
channel is comparable to what one network port can provide.
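As a rough back-of-the-envelope check of that comparison, the following sketch computes peak bandwidths from published link parameters. The talk does not say which DDR3 speed grade or which InfiniBand generation the slide used, so DDR3-1600 and FDR/EDR 4x are assumptions here, and the four-channel machine at the end is hypothetical. These are theoretical peaks, not measured throughput.

```python
import math

# Assumption: DDR3-1600, i.e. 1600 million transfers/s on a 64-bit (8-byte) channel.
DDR3_TRANSFERS_PER_S = 1600e6
CHANNEL_WIDTH_BYTES = 8
memory_channel_gb_s = DDR3_TRANSFERS_PER_S * CHANNEL_WIDTH_BYTES / 1e9


def ib_port_gb_s(lanes, lane_gbit_s, encoding_efficiency):
    """Peak data rate of one InfiniBand port in GB/s (signaling rate minus encoding overhead)."""
    return lanes * lane_gbit_s * encoding_efficiency / 8


# Assumption: 4x ports with 64b/66b encoding (used by both FDR and EDR).
fdr_4x_gb_s = ib_port_gb_s(lanes=4, lane_gbit_s=14.0625, encoding_efficiency=64 / 66)
edr_4x_gb_s = ib_port_gb_s(lanes=4, lane_gbit_s=25.78125, encoding_efficiency=64 / 66)

# Hypothetical machine with four memory channels: how many EDR ports would
# it take to reach the aggregate peak memory bandwidth?
MEMORY_CHANNELS = 4
ports_to_match = math.ceil(MEMORY_CHANNELS * memory_channel_gb_s / edr_4x_gb_s)

print(f"DDR3-1600, one channel: {memory_channel_gb_s:.1f} GB/s")
print(f"InfiniBand FDR 4x port: {fdr_4x_gb_s:.1f} GB/s")
print(f"InfiniBand EDR 4x port: {edr_4x_gb_s:.1f} GB/s")
print(f"EDR ports to match {MEMORY_CHANNELS} channels: {ports_to_match}")
```

Under these assumptions, one EDR port lands in the same range as one DDR3-1600 channel, which is the point of the comparison.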
And we did the experiment in our group.
So if you put multiple network cards into your machine, you can actually match the memory
Presenter: Prof. Carsten Binnig
Access: Open access
Duration: 00:40:17
Recording date: 2019-11-29
Uploaded: 2019-12-05 12:31:32
Language: de-DE
As data processing evolves towards large-scale, distributed platforms, the network will necessarily play a substantial role in achieving efficiency and performance. Modern high-speed networks such as InfiniBand, RoCE, or Omni-Path provide advanced features such as Remote Direct Memory Access (RDMA) that have been shown to improve the performance and scalability of distributed data processing systems. Furthermore, switches and network cards are becoming more flexible, while programmability at all levels (aka software-defined networks) opens up many possibilities to tailor the network to data processing applications and to push processing down to the network elements. In this talk, I will discuss opportunities and present our recent research results on redesigning scalable data management systems for the capabilities of modern networks.