Cross Media File Storage with Strata

The following content has been provided by the University of Erlangen-Nürnberg.

So yeah, hey everybody. The talk is basically in three parts. I'm going to take a bit of time in the beginning, just introduce myself to you and tell you a bit about the type of research that I do.

Then we're going to do a deep dive into Strata, which is a new file system that we're building at UT Austin in response to some new server hardware and software trends.

I'm going to tell you what those are. And then finally, at the end, I'm going to take just a little bit of time to tell you about some further trends in server computing and in data centers, and some ongoing projects that I have.

I should also acknowledge the students that have been working hard on this project: Youngjin, Henrique, and Tyler from UT Austin, as well as Waleed Reda, originally from KTH, who was my intern last semester.

And then Emmett Witchel from UT Austin, Tom Anderson from UW, and Marco Canini from KAUST, who help steer the project together with me.

Okay, so you probably already know I'm a systems researcher. I work primarily in operating systems and networks.

My research approach is typically that I try to understand an entire problem space, including hardware and application trends that are adjacent to this problem space.

So, for example, what I typically do is for a certain area, let's say data centers, I try to think myself five years into the future, look at the hardware trends, look at the application trends,

and then determine what they're going to look like in five years and how they might progress in the future after that. And then I say, well, what does that mean for the operating system, for example?

How should we be building it? And once I've made up my mind as to how to build it, then I go ahead and actually build it, a real system, evaluate it, and then publish on it, and then typically release it as open source afterwards so other people can benefit from it as well.

I'm typically driven by big problems. We're going to see a bit later what I mean by that. And all of these things together then hopefully allow me to publish in top conferences such as OSDI and SOSP.

I'm also pretty collaborative. I work with colleagues in industry and academia, which you've already seen in the title slide here.

I'm also the director of the Texas Systems Research Consortium, which I started at UT Austin, which is a platform that allows me and the other systems-minded faculty at UT Austin to collaborate better with industry colleagues, for example, to learn from them what they see are the big problems that are coming or new technologies that might be coming in the next couple of years.

And then we try to see if we can do some research together. It doesn't always work, but it works in many cases.

My current focus is on data centers, which I try to understand top to bottom. I've been working on pretty much everything from cabling, or how to wire a data center up for better fault tolerance, to routing problems such as balancing load and doing congestion control to raise the data center's network utilization, to operating systems and how to do more efficient I/O in servers, which is also going to be the topic of the rest of the talk, the Strata file system.

I've also worked on memory management, in particular address translation problems as DRAM sizes grow, large pages and things like that.

Scalability (as Vasha already mentioned, I was one of the founding members of the Barrelfish operating system), all the way out to security, or how we access a data center safely

over the Internet, for example, when there might be untrusted actors on the Internet, such as entire countries that control certain areas of the Internet and might be eavesdropping on our connections, and how we can route around those.

But also inside the data center, using technologies such as secure enclaves when the data center provider is untrusted.

Why do I work on data centers? Well, they drive the digital economy and so essentially every one of us is affected by things that we do in the data center these days.

Perhaps an anecdote on that: this is now perhaps a year old, roughly a year ago from today, I read in the New York Times that there was a watershed moment in the stock market, in that the top three to five

companies in the S&P 500, which is one big index, kind of the same thing as the DAX in Germany, are now all data center companies.

They used to be banks and oil companies; all of that has now been replaced by data. Data really runs the world.

And of course also being a technical person, there are a number of recent fundamental changes in the way the hardware works, but also from the application side, which makes it very interesting for an operating systems guy to tinker with these things.

And here are some of these trends.

Starting here on the application side, the types of applications that we increasingly see in the data center are often of an interactive type.

They're either web applications, where we want to return websites very quickly to a user, or they're things like real-time analytics frameworks, Hadoop, but also things like Apache Storm

or the later version of that, which is Heron; I think some of the people here have worked on those as well.

And these kinds of applications are really pushing the performance and scalability limits of our data centers.

They always want to execute more requests and work on larger data sets. There really seems to be no end in sight.

At the same time, they want to work with interactive response times. And in order to do all of this, they have to scale to many thousands of machines within the data center.

So this leads to a kind of interaction pattern among these different servers in the data center of very many small, frequent, oftentimes durable remote procedure calls.

So just viewing this from a networking perspective, we have kind of many-to-many communication here, which often has to be reliable in order.

Obviously, the network shouldn't drop anything for these applications to work. We've got to do congestion and flow control right.

Looking within a server, we often end up with many small, frequent random I/O, both to memory as well as to storage devices.

And I cannot repeat this often enough, latency is as important as throughput for all of these, which is a big shift from what we used to have,

where basically we could get by by just kind of batching up a bunch of things and then we would improve throughput with that.

But of course that came at a latency cost, and that doesn't really work anymore. These applications want low latencies and low tail latencies for their operations.
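To make that trade-off concrete, here is a minimal, hypothetical C sketch (not code from the talk or from Strata): it persists the same number of small records once with a single flush at the end, and once with a flush after every record. The file name demo.log, the record size, and the loop count are illustrative assumptions; the point is only that per-operation durability pays the flush cost on every small write, which is exactly what these latency-sensitive applications end up needing.

```c
/* Hypothetical sketch: batched vs. per-operation durable small writes. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define NOPS  1000
#define RECSZ 128            /* one small record, e.g. one RPC's worth of state */

static double now_s(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Batched path: buffer many records, flush once. Good throughput, but an
 * individual record is only durable after the final fsync. */
static void batched(int fd, const char *rec) {
    for (int i = 0; i < NOPS; i++)
        write(fd, rec, RECSZ);
    fsync(fd);
}

/* Per-operation path: each small record is made durable immediately.
 * Low latency per request, but every write pays the flush cost. */
static void per_op(int fd, const char *rec) {
    for (int i = 0; i < NOPS; i++) {
        write(fd, rec, RECSZ);
        fsync(fd);
    }
}

int main(void) {
    char rec[RECSZ];
    memset(rec, 'x', sizeof rec);

    int fd = open("demo.log", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    double t0 = now_s();
    batched(fd, rec);
    double t1 = now_s();
    per_op(fd, rec);
    double t2 = now_s();

    printf("batched: %.3fs  per-operation: %.3fs\n", t1 - t0, t2 - t1);
    close(fd);
    return 0;
}
```

On a hard disk or even an SSD, the per-operation variant is dominated by the flush cost, which is why systems historically batched; the rest of the talk is about getting per-operation durability without that penalty.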

So is this actually possible? Well, looking at the hardware side here, it's actually looking pretty good these days.

Networks are already fast and are growing faster. 25 gigabits per second is very standard in data centers these days.

40 gigabits per second is also available, oftentimes deployed already. And the network just continues to scale up.

400 gigabits per second networks are already specced. And if you look in the network research, then the researchers will tell you that a terabit per second network is not a big problem for them.

Beyond that, we don't know. But this is kind of as far as it goes right now. So we're looking pretty good.

This is a huge improvement, basically a Moore's law curve for networking performance.

On the storage side, this actually looks really similar recently. And this is, of course, big news for storage,

because storage used to be hard disk drives for a long, long time. But due to SSDs, and then most recently also due to the introduction of other non-volatile memory technologies

that live on the memory bus, such as Intel's 3D XPoint, which Intel officially announced as a product, I believe, about three weeks ago,

and they've now indicated that the first products are going to ship in the fourth quarter of this year,

storage performance has really improved tremendously. But this happens at a capacity cost, which means that these older storage devices

Presenters

Prof. Simon Peter

Accessible via

Open access

Duration

01:25:56 min

Recording date

2018-06-15

Uploaded on

2018-09-28 11:11:34

Language

de-DE

Current hardware and application storage trends put immense pressure on the operating system's storage subsystem. On the hardware side, the market for storage devices has diversified to a multi-layer storage topology spanning multiple orders of magnitude in cost and performance. Applications increasingly need to process small, random IO on vast data sets with low latency, high throughput, and simple crash consistency. File systems designed for a single storage layer cannot support all of these demands together. In this talk, I characterize these hardware and software trends and then present Strata, a cross-media file system that leverages the strengths of one storage medium to compensate for weaknesses of another. In doing so, Strata provides performance, capacity, and a simple, synchronous IO model all at once, while having a simpler design than that of file systems constrained by a single storage device. At its heart, Strata uses a log-structured approach with a novel split of responsibilities among user mode, kernel, and storage layers that separates the concerns of scalable, high-performance persistence from storage layer management. On common server workloads, Strata achieves up to 2.6x better IO latency and throughput than the state-of-the-art in low-latency and cross media file systems.
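As a rough illustration of that split of responsibilities, here is a hedged C sketch, not Strata's actual code: a user-level library appends each small operation to a private log, emulated here by an mmap'ed file standing in for an NVM region, so the operation becomes durable synchronously with a single append and flush; a separate digest step later replays the log into a shared file, standing in for the kernel-managed, layered storage. All names (applog_open, applog_append, applog_digest, app.log, shared.db) and sizes are illustrative assumptions.

```c
/* Hedged sketch of a user-level log with a separate digest step; this is
 * an illustration of the idea, not Strata's implementation. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define LOG_PATH "app.log"   /* stand-in for a per-application NVM log */
#define LOG_SIZE (1 << 20)

struct applog {
    uint8_t *base;           /* mapped log area */
    size_t   head;           /* append offset */
};

/* Map the log file; with real NVM this would be a DAX mapping. */
static int applog_open(struct applog *lg) {
    int fd = open(LOG_PATH, O_CREAT | O_RDWR, 0644);
    if (fd < 0 || ftruncate(fd, LOG_SIZE) < 0) return -1;
    lg->base = mmap(NULL, LOG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    lg->head = 0;
    close(fd);
    return lg->base == MAP_FAILED ? -1 : 0;
}

/* Synchronous append: the operation is durable when this returns. On NVM
 * this would be cache-line flushes plus a fence instead of msync. */
static int applog_append(struct applog *lg, const void *rec, uint32_t len) {
    if (lg->head + sizeof(len) + len > LOG_SIZE) return -1;
    memcpy(lg->base + lg->head, &len, sizeof(len));
    memcpy(lg->base + lg->head + sizeof(len), rec, len);
    size_t start = lg->head & ~(size_t)4095;     /* page-align for msync */
    lg->head += sizeof(len) + len;
    return msync(lg->base + start, lg->head - start, MS_SYNC);
}

/* Digest: replay logged records into the shared, layered store. In the
 * design described in the abstract this runs asynchronously in the kernel,
 * off the application's critical path. */
static void applog_digest(struct applog *lg, int shared_fd) {
    size_t off = 0;
    while (off < lg->head) {
        uint32_t len;
        memcpy(&len, lg->base + off, sizeof(len));
        write(shared_fd, lg->base + off + sizeof(len), len);
        off += sizeof(len) + len;
    }
    lg->head = 0;            /* log space can now be reclaimed */
}

int main(void) {
    struct applog lg;
    if (applog_open(&lg) < 0) { perror("applog_open"); return 1; }

    /* Small, synchronous updates: each is durable as soon as it returns. */
    applog_append(&lg, "put k1=v1", 10);
    applog_append(&lg, "put k2=v2", 10);

    /* Later, off the fast path: move the data to shared storage. */
    int shared = open("shared.db", O_CREAT | O_WRONLY | O_APPEND, 0644);
    if (shared < 0) { perror("open shared.db"); return 1; }
    applog_digest(&lg, shared);
    close(shared);
    return 0;
}
```

The design point the sketch tries to capture is that the synchronous persistence path stays in user mode and only ever appends sequentially, while data layout and movement across storage layers happen asynchronously, off the application's critical path.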

Tags

InvasIC SFB/TRR