Published in: Scientific journal «Студенческий», No. 21(359)
Journal section: Information Technology
A MULTI-TASK TRANSFORMER FRAMEWORK FOR ASPECT-BASED SENTIMENT ANALYSIS OF MOBILE APP REVIEWS
ABSTRACT
User reviews in mobile application stores provide essential insights for developers to improve software quality and user satisfaction. Standard sentiment analysis typically evaluates overall sentiment, but fails to capture opinions about specific aspects such as functionality, performance, user interface, and advertisements.
This study introduces a Multi-Task Aspect-Based Sentiment Analysis (MT-ABSA) framework using BERT as a shared encoder with separate output layers for aspect detection and sentiment classification. A domain-specific dataset of mobile app reviews was manually annotated with four aspects: Feature, Performance, User Interface (UI), and Advertisement. Experimental evaluation demonstrates that MT-ABSA outperforms conventional machine learning and single-task transformer models in Macro-F1 score, effectively capturing relationships between aspects and improving fine-grained sentiment understanding.
Keywords: Mobile app reviews; aspect-based sentiment analysis; BERT; multi-task learning; deep learning.
INTRODUCTION
The exponential growth of mobile applications has generated vast numbers of user reviews in app marketplaces. These reviews often contain information about software defects, feature requests, and overall user satisfaction [1, p. 125].
For example, a single review may state:
"The new features are great, but the app starts very slowly and shows too many advertisements."
This review expresses opinions across multiple aspects:
Table 1.
Opinions expressed in the example review
| Aspect | Sentiment |
|---|---|
| Feature | Positive |
| Performance | Negative |
| Advertisement | Negative |
Aspect-Based Sentiment Analysis (ABSA) aims to extract aspect terms and determine sentiment polarity for each aspect [2, p. 27]. Existing ABSA research largely focuses on restaurants and consumer products, leaving mobile app reviews relatively underexplored. Key challenges include:
- Multi-aspect reviews – multiple opinions within one sentence.
- Domain-specific vocabulary – e.g., "lag," "crash," "battery drain" [3].
- Aspect interdependence – performance issues may influence UI evaluation.
To address these challenges, we propose a multi-task transformer framework for ABSA of mobile app reviews, jointly learning aspect detection and sentiment classification.
RELATED WORK
2.1 Mobile App Review Analysis
Early work by Pagano and Maalej analyzed user feedback in app stores, highlighting that reviews are a rich source of information for software evolution [1, p. 127]. Maalej et al. developed automated methods to classify reviews into bug reports, feature requests, and user experiences [3, p. 311]. Guzman and Maalej further explored fine-grained sentiment analysis for app features [4, p. 153].
2.2 Aspect-Based Sentiment Analysis
ABSA commonly involves two steps: aspect extraction followed by sentiment classification [2, p. 30]. Pipeline approaches suffer from error propagation. Recent end-to-end transformer-based methods, such as BERT, achieve superior performance by modeling contextual representations [5, p. 4172; 6, p. 380].
2.3 Multi-Task Learning
Multi-task learning allows related tasks to share a representation, improving generalization [7, p. 41]. Aspect detection and sentiment classification are naturally related, motivating joint training frameworks [8, p. 2874].
PROPOSED METHOD
3.1 Problem Definition
Given a review, we aim to identify which aspects from the set {Feature, Performance, UI, Advertisement} it mentions and to assign a sentiment polarity to each identified aspect.
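One possible formalization of this task is sketched below. The symbols $R$, $w_i$, $\mathcal{A}$, $y_a$, $s_a$, and the sentiment label set $\mathcal{S}$ are notation assumed here for illustration only.

```latex
\[
R = (w_1, w_2, \dots, w_n), \qquad
\mathcal{A} = \{\text{Feature}, \text{Performance}, \text{UI}, \text{Advertisement}\}
\]
\[
y_a \in \{0, 1\} \ \text{for each } a \in \mathcal{A}, \qquad
s_a \in \mathcal{S} \ \text{for each } a \text{ with } y_a = 1
\]
```

Here $y_a = 1$ means that aspect $a$ is mentioned in review $R$, and $s_a$ is the sentiment assigned to that aspect.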
3.2 Model Architecture
The MT-ABSA model consists of:
- BERT encoder – generates contextualized representations of the review.
- Aspect detection head – multi-label classifier predicting the presence of each aspect.
- Sentiment classification head – predicts the sentiment for each detected aspect.
- Joint loss – weighted combination of the aspect and sentiment losses.
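The joint objective can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes binary cross-entropy over independent sigmoid outputs for aspect detection, softmax cross-entropy for sentiment computed only on aspects present in the gold annotation, and a single mixing weight `alpha`. The function names, the default `alpha`, the example logits, and the two-class sentiment label set are all hypothetical. In a full model the logits would come from output layers on top of the BERT encoder.

```python
import math

ASPECTS = ["Feature", "Performance", "UI", "Advertisement"]
SENTIMENTS = ["Negative", "Positive"]  # assumed label set (hypothetical)


def bce_with_logit(logit, y):
    """Numerically stable binary cross-entropy for one sigmoid output."""
    return max(logit, 0.0) - logit * y + math.log1p(math.exp(-abs(logit)))


def log_softmax(logits):
    """Log-probabilities of a softmax, shifted by the max for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def aspect_loss(aspect_logits, aspect_present):
    """Mean binary cross-entropy over all aspects (multi-label detection)."""
    losses = [bce_with_logit(z, y) for z, y in zip(aspect_logits, aspect_present)]
    return sum(losses) / len(losses)


def sentiment_loss(sentiment_logits, sentiment_labels, aspect_present):
    """Mean softmax cross-entropy over aspects present in the annotation."""
    losses = [
        -log_softmax(logits)[label]
        for logits, label, present in zip(sentiment_logits, sentiment_labels, aspect_present)
        if present
    ]
    return sum(losses) / len(losses) if losses else 0.0


def joint_loss(aspect_logits, aspect_present, sentiment_logits, sentiment_labels, alpha=0.5):
    """Assumed weighting: alpha * aspect loss + (1 - alpha) * sentiment loss."""
    l_aspect = aspect_loss(aspect_logits, aspect_present)
    l_sentiment = sentiment_loss(sentiment_logits, sentiment_labels, aspect_present)
    return alpha * l_aspect + (1.0 - alpha) * l_sentiment


# Example review: Feature positive, Performance negative, Advertisement negative, no UI.
aspect_present = [1, 1, 0, 1]
sentiment_labels = [1, 0, None, 0]  # index into SENTIMENTS; None where the aspect is absent
aspect_logits = [2.0, 1.5, -3.0, 0.5]  # made-up model outputs
sentiment_logits = [[-1.0, 2.0], [1.5, -0.5], [0.0, 0.0], [0.8, -0.2]]

print(joint_loss(aspect_logits, aspect_present, sentiment_logits, sentiment_labels))
```

Masking the sentiment loss to annotated aspects keeps the sentiment head from being penalized on aspects the review never mentions; other designs, such as adding a "not mentioned" sentiment class, are equally plausible.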
EXPERIMENTAL SETUP
4.1 Dataset
Source: Google Play reviews.
Categories: Social, Productivity, Finance, Gaming.
Annotated subset: 8,000 reviews labeled for 4 aspects and sentiment.
4.2 Annotation Example
Table 2.
Annotation Example
| Aspect | Expression | Sentiment |
|---|---|---|
| Feature | New features are excellent | Positive |
| Performance | Starts very slowly | Negative |
| UI | UI is intuitive | Positive |
| Advertisement | Too many ads | Negative |
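For training, annotations like those in Table 2 can be converted into one presence vector for the aspect head and one sentiment label per aspect. The sketch below shows one way to do this; the record layout, the `encode` function, and the label ids are assumptions made for illustration, not the dataset's actual schema.

```python
ASPECTS = ["Feature", "Performance", "UI", "Advertisement"]
SENTIMENT_IDS = {"Negative": 0, "Positive": 1}  # assumed label ids (hypothetical)


def encode(annotations):
    """Turn a list of annotation records into training targets.

    Returns (presence, sentiments): presence[k] is 1 if ASPECTS[k] is
    annotated, and sentiments[k] is its sentiment id or None if absent.
    """
    by_aspect = {}
    for record in annotations:
        aspect = record["aspect"]
        if aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {aspect}")
        by_aspect[aspect] = SENTIMENT_IDS[record["sentiment"]]

    presence = [1 if a in by_aspect else 0 for a in ASPECTS]
    sentiments = [by_aspect.get(a) for a in ASPECTS]
    return presence, sentiments


# The four rows of Table 2, written in the assumed record layout.
table_2 = [
    {"aspect": "Feature", "expression": "New features are excellent", "sentiment": "Positive"},
    {"aspect": "Performance", "expression": "Starts very slowly", "sentiment": "Negative"},
    {"aspect": "UI", "expression": "UI is intuitive", "sentiment": "Positive"},
    {"aspect": "Advertisement", "expression": "Too many ads", "sentiment": "Negative"},
]

presence, sentiments = encode(table_2)
print(presence, sentiments)
```

The expression text is kept in each record for traceability but is not used by `encode`, since both heads described in this paper operate at the aspect level.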
4.3 Baselines
SVM with TF-IDF.
BiLSTM with GloVe embeddings.
BERT-Single – single-task BERT model.
4.4 Metrics
Precision, Recall, F1-score.
Macro-F1 is used for multi-aspect evaluation.
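Macro-F1 is the unweighted mean of the per-class F1 scores, so each aspect counts equally regardless of how often it occurs. A minimal sketch for the multi-label aspect detection case follows; the function names and the toy labels are illustrative, and scoring F1 as 0 when a class has no true positives is an assumed convention.

```python
def f1_for_class(y_true, y_pred):
    """F1 for one binary class, given parallel lists of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # assumed convention when precision or recall is undefined or zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(true_rows, pred_rows):
    """Unweighted mean of per-class F1 over the columns of the label matrix."""
    n_classes = len(true_rows[0])
    scores = [
        f1_for_class([row[k] for row in true_rows], [row[k] for row in pred_rows])
        for k in range(n_classes)
    ]
    return sum(scores) / n_classes


# Toy data: columns are Feature, Performance, UI, Advertisement.
true_rows = [[1, 1, 0, 1], [0, 1, 1, 0], [1, 0, 0, 1]]
pred_rows = [[1, 1, 0, 1], [0, 1, 0, 0], [1, 0, 0, 0]]

print(macro_f1(true_rows, pred_rows))
```

In the toy data the single UI mention is missed, which pulls the macro average down sharply even though most individual predictions are correct. This sensitivity to rare classes is why macro averaging suits multi-aspect evaluation.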
RESULTS
5.1 Overall Performance
Table 3.
Overall Performance
| Model | Precision | Recall | Macro-F1 |
|---|---|---|---|
| SVM | 71.3 | 69.8 | 70.5 |
| BiLSTM | 76.8 | 75.5 | 76.1 |
| BERT-Single | 83.2 | 82.4 | 82.8 |
| MT-ABSA | 86.4 | 85.7 | 86.0 |
5.2 Aspect-wise Performance
Table 4.
Aspect-wise Performance
| Aspect | F1 |
|---|---|
| Feature | 88.1 |
| Performance | 86.7 |
| UI | 84.2 |
| Advertisement | 85.3 |
5.3 Case Study
"The new features are great, but the app starts very slowly and shows too many advertisements."
Table 5.
Predicted aspects and sentiments for the review
| Aspect | Sentiment |
|---|---|
| Feature | Positive |
| Performance | Negative |
| Advertisement | Negative |
DISCUSSION
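- Multi-task learning captures aspect-sentiment relationships: MT-ABSA raises Macro-F1 from 82.8 (BERT-Single) to 86.0 (Table 3).
- Domain-specific expressions such as "lag," "crash," and "battery drain" are crucial for identifying aspects and their sentiment in app reviews.
- Future work: multilingual analysis, large language models, and multi-modal ABSA (text + screenshots).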
CONCLUSION
The proposed MT-ABSA framework jointly learns aspect detection and sentiment classification for mobile app reviews. In our experiments it reaches a Macro-F1 of 86.0, compared with 82.8 for single-task BERT, 76.1 for BiLSTM, and 70.5 for SVM, enabling developers to extract actionable, aspect-level insights from user feedback.
References:
1. Pagano D., Maalej W. User feedback in the appstore: An empirical study // Proceedings of the 21st IEEE International Requirements Engineering Conference. Rio de Janeiro: IEEE, 2013. P. 125–134.
2. Pontiki M., Galanis D., Pavlopoulos J., et al. SemEval-2014 Task 4: Aspect Based Sentiment Analysis // Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). Dublin: Association for Computational Linguistics, 2014. P. 27–35.
3. Maalej W., Kurtanović Z., Nabil H., Stanik C. On the automatic classification of app reviews // Requirements Engineering. 2016. Vol. 21, No. 3. P. 311–331.
4. Guzman E., Maalej W. How do users like this feature? A fine grained sentiment analysis of app reviews // Proceedings of the 22nd IEEE International Requirements Engineering Conference. Karlskrona: IEEE, 2014. P. 153–162.
5. Devlin J., Chang M. W., Lee K., Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding // Proceedings of NAACL-HLT 2019. Minneapolis: Association for Computational Linguistics, 2019. P. 4171–4186.
6. Sun C., Huang L., Qiu X. Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence // Proceedings of NAACL-HLT 2019. Minneapolis: Association for Computational Linguistics, 2019. P. 380–385.
7. Caruana R. Multitask learning // Machine Learning. 1997. Vol. 28, No. 1. P. 41–75.
8. Liu P., Qiu X., Huang X. Recurrent neural network for text classification with multi-task learning // Proceedings of IJCAI 2016. New York: AAAI Press, 2016. P. 2873–2879.

