5 - ML Pipeline [ID:58780]
We now talk about the machine learning pipeline and good practices.

So now we care about defining the concept of a pipeline that can guide our machine learning project from its very definition to, for example, a successful product.

A pipeline is a step-by-step process where every step has a goal and a method and hopefully also a way to evaluate the success of every operation in the individual step.
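
To make this definition concrete, here is a minimal sketch of a step that carries a goal, a method, and a success check. The names `Step` and `run_pipeline`, as well as the two toy steps, are hypothetical and chosen only for illustration; they are not part of the lecture.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Step:
    """One pipeline step: a goal, a method, and a success check (hypothetical)."""
    name: str
    goal: str                        # what the step should achieve
    method: Callable[[Any], Any]     # how the step transforms its input
    evaluate: Callable[[Any], bool]  # whether the step's output is acceptable


def run_pipeline(steps: List[Step], data: Any) -> Any:
    """Run the steps in order and stop as soon as one fails its own check."""
    for step in steps:
        data = step.method(data)
        if not step.evaluate(data):
            raise RuntimeError(f"Step '{step.name}' did not meet its goal: {step.goal}")
    return data


# Toy usage with two illustrative steps on a list of numbers.
steps = [
    Step("drop_missing", "no missing values remain",
         method=lambda xs: [x for x in xs if x is not None],
         evaluate=lambda xs: all(x is not None for x in xs)),
    Step("rescale", "all values lie in [0, 1]",
         method=lambda xs: [x / max(xs) for x in xs],
         evaluate=lambda xs: all(0.0 <= x <= 1.0 for x in xs)),
]
result = run_pipeline(steps, [2.0, None, 4.0, 1.0])
print(result)
```

Keeping the check next to the method means a failing step is reported where it happens, instead of surfacing later as a poor model.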

There exist many proposals for pipelines in the literature, even very complex ones, and some of them may be the right choice for you.

What we present here in this part of the lecture is a minimal, compact five-step pipeline that is general enough to cover every possible project.

Eventually, for your needs, you may want to define more steps, or to split some of these five steps into more steps, because your specific problem is better addressed that way.

But here we have the minimal proposal: a pipeline that you can really use everywhere and that can guide your machine learning project to a real success.

The pipeline starts on step number one with problem definition.

Our aim with machine learning is to develop a solution for a problem.

In order to develop a satisfying solution, we need to define the problem first.

How do we do this?

We should define what goals or tasks we want to solve and what kind of data we need.

This definition of the problem lays the foundation for our solution: it shows us what kind of data we need and what kind of algorithms we can use. For that reason, it is probably the most important step of our pipeline.

So let's give an example.

Our goal is to monitor an industrial machine and we want to predict future failures.

And this is important because it may allow us to schedule corrective maintenance in time.

Therefore, we first define our goal.

Our goal could be that of predicting failures a given amount of time in advance.

So we are making this quantitative and precise enough.

We could have alternative goals that serve the same purpose.

For example, we could define our goal as predicting the remaining useful lifetime of the machine at every point in time.

It's similar, but slightly different.

And maybe this will change the way we want to evaluate the specific performance.
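
The difference between the two goal formulations shows up directly in the targets a model would be trained on. The following is a minimal sketch, assuming a single recorded run with discrete time steps and one known failure step; the function names and the `horizon` parameter are hypothetical and only illustrate the idea.

```python
from typing import List


def failure_within_horizon(n_steps: int, failure_step: int, horizon: int) -> List[int]:
    """Binary target: 1 if the failure occurs at most `horizon` steps after t."""
    return [1 if 0 <= failure_step - t <= horizon else 0 for t in range(n_steps)]


def remaining_useful_life(n_steps: int, failure_step: int) -> List[int]:
    """Regression target: number of steps left until the failure, floored at 0."""
    return [max(failure_step - t, 0) for t in range(n_steps)]


# One illustrative run of 6 time steps in which the machine fails at step 5.
labels = failure_within_horizon(n_steps=6, failure_step=5, horizon=2)
rul = remaining_useful_life(n_steps=6, failure_step=5)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(rul)     # [5, 4, 3, 2, 1, 0]
```

The first target turns the problem into a classification task, which would typically be judged with measures such as precision and recall. The second turns it into a regression task, where an error in time units is the more natural measure.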

And the prediction is based on data.

What type of data would we use for this kind of problem?

Well, we would probably use data coming from sensors attached to the machine, or also data generated by the users of that machine, for example, logs.

The next step in the pipeline is data collection and pre-processing.

The phase of gathering the data and creating our data set is called data ingestion.

This is part of the data collection and pre-processing step.

So the data should contain the information necessary to solve the task.

We should make sure of that.

And there should also be enough data to describe all possible states of interest for our problem.
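
One simple way to check the second requirement is to count how often each state of interest appears in the collected data. This is a minimal sketch, assuming every record already carries a state label; the function name, the state names, and the `min_count` threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def underrepresented_states(observed: Iterable[str],
                            states_of_interest: List[str],
                            min_count: int = 1) -> Dict[str, int]:
    """Return the states of interest that appear fewer than `min_count` times."""
    counts = Counter(observed)  # missing states count as 0
    return {s: counts[s] for s in states_of_interest if counts[s] < min_count}


# Illustrative data set: many normal records, few degraded ones, no failures.
observed = ["normal"] * 5 + ["degraded"] * 2
missing = underrepresented_states(observed, ["normal", "degraded", "failure"], min_count=2)
print(missing)  # {'failure': 0}
```

A non-empty result suggests going back to data collection before any model is trained, since the model cannot learn a state it has never seen.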

After the data ingestion comes the data preparation.

The goal now is to make the data usable for our machine learning solution.

Part of a chapter: Introduction
Accessible via: Open access
Duration: 00:14:13 min
Recording date: 2025-10-06
Uploaded on: 2025-10-06 15:25:06
Language: en-US

A simple yet effective machine learning pipeline