New Clustering Method Simplifies Analysis of Large Data Sets

Researchers from HSE University and the Institute of Control Sciences of the Russian Academy of Sciences have proposed a new method of data analysis: tunnel clustering. It allows for the rapid identification of groups of similar objects and requires fewer computational resources than traditional methods. Depending on the data configuration, the algorithm can operate dozens of times faster than its counterparts. The study was published in the journal Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia.

Each year, the volume of information requiring processing continues to grow. Data comes from a variety of sources: scientific research, financial reports, medical examinations, and many others. Clustering methods—which group data based on similar characteristics—are used to detect patterns and organise information within such large datasets. These groupings are known as clusters.

One of the most widely used clustering methods is the k-means algorithm. It divides data into a predetermined number of clusters, starting from an initial choice of their centres (centroids). However, this method has a limitation: the number of clusters must be known beforehand, which is not always possible when dealing with complex data. Scientists from HSE University and the V.A. Trapeznikov Institute of Control Sciences have proposed a new approach that simplifies this process: tunnel clustering. Unlike the k-means method, this algorithm does not require the number of clusters to be set in advance; it determines the necessary number itself by analysing the data structure.
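To make the k-means limitation concrete, here is a minimal pure-Python sketch of the classic algorithm (Lloyd's iteration). It is only an illustration: the function name `k_means`, the toy points, and the choice of the first `k` points as starting centroids are assumptions made for this example and do not come from the study.

```python
from math import dist

def k_means(points, k, iterations=20):
    """Plain Lloyd's algorithm; the caller has to supply k."""
    # Assumption of this sketch: the first k points serve as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Toy data with three visually separate groups of two points each.
points = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0), (0.2, 0.1), (5.1, 4.9), (9.2, 0.3)]
labels_2, _ = k_means(points, k=2)  # forces two of the groups to merge
labels_3, _ = k_means(points, k=3)  # recovers the three groups
```

With `k=2` the algorithm has no choice but to merge two of the groups, which is why a wrong guess for the number of clusters distorts the result.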

‘The algorithm forms “tunnels” in the data—regions in multidimensional space where objects with similar characteristics group together,’ explained Fuad Aleskerov, Head of the Department of Mathematics at the HSE Faculty of Economic Sciences. ‘Users can choose from three modes of operation: with fixed cluster boundaries, with adaptive boundaries that adjust to the data structure, or a combined approach. This makes the method flexible and suitable for various types of tasks.’
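The article does not spell out the algorithm, so the following is only a rough sketch of what a fixed-boundary mode could look like, not the authors' implementation. It assumes that each feature axis is cut into intervals of equal width and that objects falling into the same interval on every axis share a tunnel. The function name `fixed_tunnels`, the `width` parameter, and the sample data are hypothetical.

```python
from collections import defaultdict
from math import floor

def fixed_tunnels(objects, width):
    """Group objects whose features all fall into the same fixed-width intervals."""
    tunnels = defaultdict(list)
    for index, features in enumerate(objects):
        # Signature: one interval number per feature (an assumption of this sketch).
        signature = tuple(floor(value / width) for value in features)
        tunnels[signature].append(index)
    return dict(tunnels)

objects = [(0.1, 0.2), (0.3, 0.4), (2.1, 2.7), (2.4, 2.2), (0.2, 2.5)]
tunnels = fixed_tunnels(objects, width=1.0)
# Maps each occupied signature to the indices of the objects inside it.
```

In this sketch the number of groups is not passed in; it is simply the number of occupied signatures, and the data are scanned once.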

The method was tested on a synthetic (artificially generated) dataset of 100,000 objects, as well as on real-world tasks in public administration and the banking sector.

Visualisation of the original data and the results of tunnel clustering in a four-dimensional parallel coordinates system.
© Aleskerov, F.T., Myachin, A.L. & Yakuba, V.I. Tunnel Clustering Method. Dokl. Math. 110, 474–479 (2024)

The main advantage of the new method is its speed. Unlike classical algorithms that demand significant computational resources, tunnel clustering can, depending on the data configuration, perform the analysis dozens of times faster.

In addition, the researchers introduced the concept of the ‘transition degree’—a parameter indicating how many characteristics of an object must change for it to be classified into a different cluster. This helps assess the clarity of cluster boundaries and identify objects situated at the intersection of different groups.
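One way to picture the transition degree is the sketch below. It rests on an assumption not stated in the article: that each cluster can be described by a tuple of interval numbers, one per characteristic. The transition degree between two clusters is then the count of positions where the tuples differ. The names `transition_degree` and `min_transition` are hypothetical.

```python
def transition_degree(signature_a, signature_b):
    """Count the characteristics that must change to move between two clusters."""
    if len(signature_a) != len(signature_b):
        raise ValueError("signatures must describe the same characteristics")
    return sum(a != b for a, b in zip(signature_a, signature_b))

def min_transition(own, all_signatures):
    """Smallest transition degree from `own` to any other cluster signature."""
    return min(transition_degree(own, other) for other in all_signatures if other != own)

# Hypothetical cluster signatures over two characteristics.
signatures = [(0, 0), (2, 2), (0, 2)]
far_apart = transition_degree((0, 0), (2, 2))   # both characteristics differ
borderline = min_transition((0, 2), signatures)  # one change reaches a neighbour
```

Under this reading, an object whose cluster has a low minimum transition degree sits close to a boundary, while a high value indicates a clearly separated group.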

‘People are generating more and more data, and the pace is only accelerating. According to the latest Digital 2025: Global Overview Report, as of early 2025, there were 5.56 billion internet users—nearly 68% of the global population. Adults spend an average of 6 hours and 38 minutes online each day, communicating, working, watching videos, and consuming content,’ said Alexey Myachin, Senior Research Fellow at the HSE International Centre for Decision Choice and Analysis. ‘Companies that ignore data analysis are losing vast sums of money.’

The authors continue to refine the algorithm. Their ongoing work includes research into dimensionality reduction, which will help further decrease the time required to identify patterns in data.

The study was carried out with partial support from the Russian Science Foundation.
