We now talk about the machine learning pipeline and good practices.
So now we care about defining the concept of a pipeline that can guide our machine learning project from its very definition all the way to a successful product.
A pipeline is a step-by-step process where every step has a goal and a method, and hopefully also a way to evaluate the success of each individual step.
There exist many pipeline proposals in the literature, some of them very complex, and some of them may be the right choice for you.
What we present here in this part of the lecture is a minimal, compact five-step pipeline that is general enough to cover every possible project.
Possibly, for your needs, you may want to define more steps, or to split certain of these five steps into sub-steps, because your specific problem is better addressed that way.
But here we have the minimal proposal: a pipeline that you can really use everywhere and that can guide your machine learning project to real success.
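To make this idea concrete, here is a minimal sketch of a pipeline as a sequence of named steps, each carrying a goal, a method, and a success check, exactly the three ingredients we just defined. The `Step` and `run_pipeline` names are illustrative, not from any library.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Step:
    name: str
    goal: str
    run: Callable[[Any], Any]        # the method of this step
    evaluate: Callable[[Any], bool]  # success criterion for this step

def run_pipeline(steps: list[Step], state: Any) -> Any:
    """Execute each step in order, checking its success criterion."""
    for step in steps:
        state = step.run(state)
        if not step.evaluate(state):
            raise RuntimeError(f"Step '{step.name}' failed its success check")
    return state
```

The point of the explicit `evaluate` hook is that every step, not only the final model, gets judged against its own goal.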
The pipeline starts with step number one: problem definition.
Our aim with machine learning is to develop a solution for a problem.
In order to develop a satisfying solution, we need to define the problem first.
How do we do this?
We should define what goals or tasks we want to solve and what kind of data we need.
This definition of the problem lays the foundation for our solution: it shows us what kind of data we need and what kind of algorithms we can use, and for that reason it is probably the most important step of our pipeline.
So let's give an example.
Our goal is to monitor an industrial machine and we want to predict future failures.
This is important because it may allow us to schedule corrective maintenance in time.
Therefore, we first define our goal. Our goal could be to predict failures a fixed amount of time in advance, which makes the goal quantitative and precise enough.
We could have alternative goals that serve the same purpose.
For example, we could define our goal as predicting the remaining useful lifetime of the machine at every point in time. It's similar, but slightly different, and it may change the way we want to evaluate performance.
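As a minimal sketch of how these two goals translate into different prediction targets, assume we have per-machine sensor records together with the known failure time of each machine (available when building training labels). All column names and the horizon value here are illustrative.

```python
import pandas as pd

# Hypothetical sensor log: one row per machine per hour.
df = pd.DataFrame({
    "machine_id": [1, 1, 1, 1],
    "timestamp": pd.to_datetime(
        ["2025-01-01 00:00", "2025-01-01 01:00",
         "2025-01-01 02:00", "2025-01-01 03:00"]),
    "failure_time": pd.to_datetime(["2025-01-01 03:00"] * 4),
})

# Goal 1 (classification): will the machine fail within the next k hours?
horizon = pd.Timedelta(hours=2)
df["fails_within_horizon"] = (df["failure_time"] - df["timestamp"]) <= horizon

# Goal 2 (regression): remaining useful lifetime (RUL) at each point in time.
df["rul_hours"] = (df["failure_time"] - df["timestamp"]).dt.total_seconds() / 3600
```

The first framing calls for a classifier evaluated with metrics like precision and recall; the second calls for a regressor evaluated with an error measure on the predicted lifetime, which is exactly why the choice of goal changes how we evaluate performance.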
The prediction is based on data. What type of data would we use for this kind of problem? Well, we would probably use sensor data coming from sensors attached to the machine, and probably also data generated by the users of that machine, for example logs.
The next step in the pipeline is data collection and pre-processing. The phase of gathering the data and creating our data set is called data ingestion; it is part of the data collection and pre-processing step.
The data should contain the information necessary to solve the task; we should make sure of that. The data should also be enough to describe all possible states of interest for our problem.
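A minimal sketch of such a sanity check on an ingested data set, assuming the hypothetical columns from the earlier example plus an operating-mode column; the function name, column names, and the count threshold are all illustrative.

```python
def check_coverage(df, label_col="fails_within_horizon", mode_col="operating_mode"):
    """Sanity-check that the ingested data describes the states we care about."""
    # Both outcomes (failure / no failure) must be present,
    # otherwise a classifier has nothing to learn from.
    assert df[label_col].nunique() == 2, "data covers only one outcome"
    # Every operating mode of interest should appear a reasonable
    # number of times (threshold is illustrative).
    counts = df[mode_col].value_counts()
    rare = counts[counts < 30]
    if not rare.empty:
        print("Warning: sparsely covered modes:", list(rare.index))
```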
After data ingestion comes data preparation.
The goal now is to make the data usable for our machine learning solution.
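The lecture leaves "making the data usable" abstract here; as one illustration of what that typically involves for numeric sensor data, here is a minimal preparation sketch using scikit-learn. The choice of library and of these particular steps (imputing missing readings, then scaling) is an assumption, not something prescribed by the lecture.

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Typical preparation for numeric sensor data: fill gaps left by
# faulty readings, then bring all channels to a comparable scale.
prepare = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# X_raw would be the ingested sensor matrix (rows: time points,
# columns: sensor channels); the name is illustrative.
# X_ready = prepare.fit_transform(X_raw)
```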