Did you say clustering? How data analysis can help predict cow health and performance

16 April 2025 by
Did you say clustering? How data analysis can help predict cow health and performance
HoliCow

Thanks to transnational collaboration and the pooling of multiple databases, the HoliCow project gathered a very large number of data and prediction models [1] acquired and developed during previous projects(see diagram below). The aim, therefore, is not to create yet more new models but to combine information in order to offer farmers a holistic view of the condition of their cows. With all these resources, several stages were necessary in order to work optimally for the development of the final tool.


Step 1: Selection

Firstly, the scientists selected  the most relevant models for the project. Indeed, in the pooling and inventory carried out at the start of the project, redundancies were observed or several versions of the same model [2]  were in the database.

Step 2: Grouping

Each model provides its own information, which can be linked to a specific theme. Six working groups corresponding to these themes were therefore created to obtain a more accurate holistic view:

Step 3: The clusters

Great! But for each theme, we still had to find a way of working with all this data. That's where clustering comes in! Don't panic, behind this word, which may seem complicated, is simply a data analysis technique that brings together individuals with similar data. These individuals then form a group according to a certain data profile. Analysts generally refer to these groups as “clusters”.

To understand this better, let's take a simple example: below we look at the “colour” thematic group, which contains the “black colour” and “brown colour” clusters. We can therefore classify our cows in one cluster or the other according to the colour of their coat. However, this labelling only allows the animal to belong strictly to one or the other group (that’s what we call discrete distribution).


In the project, however, it made more sense to work with continuous distribution. If we take the ‘health/welfare’ theme group as an example, a cow starts by showing symptoms and develops an illness little by little. Or conversely, during her convalescence, she gets better and better. So it's a process that evolves over time. That's why we decided to work on the probabilities of belonging to one or another cluster. If we take the example of the “colour” group, we now have this information: ‘I have an x% probability of belonging to the “black colour” cluster’, or, in the example of the sick cow, ‘I have an x% probability of belonging to the “sick cow” cluster’. This percentage probability will therefore vary according to the stage of the disease.

Step 4: Towards the tool

These probabilities can now be communicated to farmers to support them in managing their herds. But how? Our aim is for these probability results to give information both at herd-level and at inter-herd-level. This will enable a farmer to compare his herd to others. Indeed, a high or low percentage of animals for a certain cluster within a given herd will have the effect of making this herd deviate from the others.

Conclusion

Each thematic group is therefore defined by several clusters. At this stage of the project, 4 thematic groups have completed the “clustering” part and will now enter the validation stage on pilot farms. It is important and essential to go through this validation stage in order to compare the researchers' developments with the reality of the field. A very concrete example would be to check whether a cow was really ill when the results predicted that she was likely to be. With this validation stage, we are therefore answering the question:

  1. ‘Are all the clusters in a thematic group relevant in the farms?’
  2. ‘Do the probabilities of an animal belonging to a cluster reflect reality?’

By answering these questions, we will then be able, in the future, to provide farmers with relevant alerts when a cow encounters problems, or, on the contrary, when a cow is performing well because it is likely to produce more or to produce milk that is highly suitable for processing, etc.

We'll keep you posted!


[1] Prediction models are calculations that use milk samples to estimate the cow's status in terms of multiple aspects (e.g. estimation of minerals, fatty acids, ketones, methane emissions, etc.). 

[2] A model can evolve over time, for example to become more robust through the acquisition of new data.

Share this post
Our blogs