Development of a Mobile Augmented Reality Application with Scene Recognition Using a Convolutional Neural Network

Joseph Nathanael Witanto, Gregorius Satia Budhi, Liliana Liliana


The development of smartphone technology and people's need for information about places create an opportunity to use augmented reality (AR) to help people obtain information about a scene while viewing that scene through the camera. Sensor-based AR systems depend on GPS, which is not always available. This research therefore builds an AR system that uses a convolutional neural network (CNN) for scene recognition.

Forty-five network variations were tested to find the best combination of three parameters: architecture, initialization method, and activation function. The architectures are GoogLeNet and two simplified variants of GoogLeNet. The initialization methods are random initialization (Xavier and MSRA) and pretrained weights. The activation functions are ReLU, PReLU, and ELU. The data augmentations applied during training are random cropping, color balance, rotation, blur, sharpening, and brightness-contrast manipulation.
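To make the activation functions and the two random initializers concrete, here is a minimal, dependency-free Python sketch of their standard published definitions. It is not the authors' code. Assumptions: the PReLU slope `a` is fixed at 0.25 (He et al.'s suggested initial value; in a real network it is learned per channel), ELU uses `alpha = 1.0`, Xavier is the Glorot-Bengio uniform form, and MSRA is He et al.'s Gaussian form. Frameworks implement slightly different variants (Caffe's `xavier` filler, for instance, scales by fan-in only), and the abstract does not say which variant was used. The helper `init_weights` is hypothetical. The third option, pretrained weights, copies weights learned on another dataset and is not shown.

```python
import math
import random


def relu(x: float) -> float:
    """ReLU: max(0, x)."""
    return max(0.0, x)


def prelu(x: float, a: float = 0.25) -> float:
    """PReLU: x for x > 0, a * x otherwise.
    Assumption: `a` is a fixed constant here; in a network it is a learned parameter."""
    return x if x > 0 else a * x


def elu(x: float, alpha: float = 1.0) -> float:
    """ELU: x for x > 0, alpha * (exp(x) - 1) otherwise (saturates at -alpha)."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def xavier_uniform(fan_in: int, fan_out: int, rng: random.Random) -> float:
    """Glorot/Xavier uniform draw from U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit)


def msra_normal(fan_in: int, rng: random.Random) -> float:
    """He/MSRA Gaussian draw from N(0, 2 / fan_in), derived for rectifier units."""
    return rng.gauss(0.0, math.sqrt(2.0 / fan_in))


def init_weights(method: str, fan_in: int, fan_out: int, n: int, seed: int = 0) -> list[float]:
    """Hypothetical helper: draw n weights for one layer with the chosen scheme.
    For a k x k convolution with C_in input channels, fan_in = k * k * C_in."""
    rng = random.Random(seed)
    if method == "xavier":
        return [xavier_uniform(fan_in, fan_out, rng) for _ in range(n)]
    if method == "msra":
        return [msra_normal(fan_in, rng) for _ in range(n)]
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    for x in (-2.0, 0.0, 3.0):
        print(x, relu(x), prelu(x), elu(x))
    # A 3x3 convolution with 64 input and 64 output channels:
    # fan_in = 3 * 3 * 64 = 576, fan_out = 3 * 3 * 64 = 576.
    print(init_weights("msra", 576, 576, 5))
```

The practical difference shows up for negative inputs: ReLU outputs zero, PReLU keeps a scaled copy of the input, and ELU approaches `-alpha` smoothly. MSRA's larger variance (2 / fan_in) is designed to keep signal magnitudes stable through rectifier layers, which is why it is usually paired with ReLU-family activations.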

Of the 1,649 photos from 12 scene categories, 321 were used for testing, with variations in rotation (at 30-degree intervals), blur, sharpening, brightness, and contrast. Networks initialized by fine-tuning across all layers of the network and using the PReLU activation function achieved higher average accuracy than the other variations.
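The robustness test above can be pictured with a small, dependency-free Python sketch that applies each distortion to a test image and averages the per-variation accuracy. Everything specific here is an assumption for illustration, not taken from the paper: grayscale images stored as nested lists, nearest-neighbour rotation, a 3x3 box blur, a common sharpening kernel, a brightness offset of 40, a contrast gain of 1.5, and averaging accuracy with equal weight per variation. `classify` is a hypothetical stand-in for the trained CNN.

```python
import math
from collections.abc import Callable

# Assumption: grayscale image as rows of pixel values in [0, 255].
Image = list[list[float]]

# Rotation test angles at 30-degree intervals: 0, 30, ..., 330.
ROTATION_ANGLES = list(range(0, 360, 30))

# Illustrative kernels (hypothetical, not the paper's): box blur and a common sharpen kernel.
BLUR_KERNEL = [[1 / 9] * 3 for _ in range(3)]
SHARPEN_KERNEL = [[0.0, -1.0, 0.0], [-1.0, 5.0, -1.0], [0.0, -1.0, 0.0]]


def clamp(v: float) -> float:
    return max(0.0, min(255.0, v))


def filter3x3(img: Image, kernel: list[list[float]]) -> Image:
    """Apply a 3x3 kernel, replicating border pixels. Both kernels are symmetric,
    so correlation (computed here) and convolution give the same result."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            row.append(clamp(acc))
        out.append(row)
    return out


def brightness_contrast(img: Image, gain: float, offset: float) -> Image:
    """Scale contrast around mid-gray (128) by `gain`, then shift brightness by `offset`."""
    return [[clamp(gain * (p - 128.0) + 128.0 + offset) for p in row] for row in img]


def rotate(img: Image, degrees: float, fill: float = 0.0) -> Image:
    """Rotate about the image centre with inverse mapping and nearest-neighbour sampling;
    output pixels whose source falls outside the image get `fill`."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Apply the inverse rotation to find which source pixel lands on (x, y).
            ix = round(cos_t * dx + sin_t * dy + cx)
            iy = round(-sin_t * dx + cos_t * dy + cy)
            row.append(img[iy][ix] if 0 <= iy < h and 0 <= ix < w else fill)
        out.append(row)
    return out


def build_variations() -> dict[str, Callable[[Image], Image]]:
    """12 rotations plus blur, sharpen, brightness and contrast variants (16 in total)."""
    v = {f"rot{a}": (lambda img, a=a: rotate(img, a)) for a in ROTATION_ANGLES}
    v["blur"] = lambda img: filter3x3(img, BLUR_KERNEL)
    v["sharpen"] = lambda img: filter3x3(img, SHARPEN_KERNEL)
    v["brighter"] = lambda img: brightness_contrast(img, 1.0, 40.0)
    v["more_contrast"] = lambda img: brightness_contrast(img, 1.5, 0.0)
    return v


def average_accuracy(
    classify: Callable[[Image], int],
    samples: list[tuple[Image, int]],
    variations: dict[str, Callable[[Image], Image]],
) -> tuple[dict[str, float], float]:
    """Accuracy per variation, and their unweighted mean (samples must be non-empty)."""
    per_variation = {}
    for name, transform in variations.items():
        correct = sum(classify(transform(img)) == label for img, label in samples)
        per_variation[name] = correct / len(samples)
    return per_variation, sum(per_variation.values()) / len(per_variation)
```

Because every variation is applied to the same set of test photos, the unweighted mean over variations equals the overall fraction of correct predictions across all distorted images.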


Convolutional Neural Network; Scene Recognition; Mobile Augmented Reality; Neural Network Architecture
