Statistical Modelling of Bot Detection in Social Media Using Logistic Regression and Numerical Algorithms
Abstract
The rapid growth of social networking sites has been accompanied by a proliferation of automated accounts, or bots, which can be used to manipulate public opinion, spread misinformation, and skew data-driven applications. This study develops a statistical framework for detecting such bots using logistic regression fitted with numerical optimization techniques. Combining computational mathematics and data science, the paper models user behaviour on Twitter and other social media platforms in order to classify accounts as bots or authentic users. The logistic regression model is fitted with gradient-based numerical solvers to improve classification performance. To keep the analysis empirically grounded, data are drawn from real, verified public datasets, including the PAN 2019 bots dataset and Twitter's bot repository. The results confirm that logistic regression can effectively learn a decision boundary between bots and humans, achieving 89.4% accuracy on test data. In addition, the interpretability of the model gives researchers insight into behavioural indicators such as tweet rate, retweet rate, posting-time entropy, and friend/follower ratio. The paper thus presents a mathematically grounded mechanism for social media monitoring that contributes to computational statistics and offers an efficient instrument for digital policy and cybersecurity interventions.
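As a rough illustration of the approach summarised in the abstract, the sketch below fits a logistic regression classifier by batch gradient descent on the mean cross-entropy loss. It is not the paper's implementation. The abstract does not name the solver, so plain gradient descent is an assumption. The data are synthetic rather than drawn from PAN 2019, and the four feature columns are assumed to stand for tweet rate, retweet rate, posting-time entropy, and friend/follower ratio, with arbitrarily chosen class means. The function names (`fit_logistic`, `predict`, `posting_time_entropy`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Logistic function written via tanh, which avoids overflow for large |z|.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def posting_time_entropy(hours):
    """Shannon entropy (in bits) of an account's hour-of-day posting histogram."""
    counts = np.bincount(np.asarray(hours, dtype=int) % 24, minlength=24)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=0.0):
    """Batch gradient descent on the mean cross-entropy loss (optional L2 penalty)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)                   # predicted P(bot | x)
        grad_w = X.T @ (p - y) / n + l2 * w      # gradient w.r.t. weights
        grad_b = np.mean(p - y)                  # gradient w.r.t. intercept
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b, threshold=0.5):
    # Label an account as a bot (1) when the predicted probability reaches the threshold.
    return (sigmoid(X @ w + b) >= threshold).astype(int)


# Synthetic stand-in data (assumption): columns are tweet rate, retweet rate,
# posting-time entropy, friend/follower ratio. Class means are arbitrary.
rng = np.random.default_rng(0)
n = 200
humans = rng.normal(loc=[-1.0, -1.0, 1.0, -1.0], scale=1.0, size=(n, 4))
bots = rng.normal(loc=[1.0, 1.0, -1.0, 1.0], scale=1.0, size=(n, 4))
X = np.vstack([humans, bots])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Standardise features so a single learning rate suits every column.
X = (X - X.mean(axis=0)) / X.std(axis=0)

w, b = fit_logistic(X, y)
accuracy = float(np.mean(predict(X, w, b) == y))
print("weights:", w, "intercept:", b, "training accuracy:", accuracy)
```

Because the model is linear in the features, the sign and size of each fitted weight indicate how that standardised indicator shifts the log-odds of an account being a bot, which is the kind of interpretability the abstract refers to. A real evaluation would hold out a separate test set instead of reporting training accuracy.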