Peer-Reviewed

Automatic Persian Text Summarizer Using Simulated Annealing and Genetic Algorithm

Received: 7 October 2014     Accepted: 11 October 2014     Published: 6 November 2014
Abstract

Automatic text summarization uses computer programs to reduce the volume of text documents, producing a summary that keeps the key terms of the originals. Because the amount of available information and data keeps growing, automatic summarization techniques are needed in many domains. Summarization shrinks a document without changing the context of the information it carries. The Persian text summarizer proposed in this paper stems the words and then weights the sentences with a combination of graph-based and TF-IDF methods. The summary is then built by SA-GA-based sentence selection. SA-GA is a hybrid algorithm that combines a Genetic Algorithm (GA) with Simulated Annealing (SA). Its fitness function is based on three factors: a Readability Factor, a Cohesion Factor, and a Topic-Relation Factor. Evaluation results demonstrated the efficiency of the proposed system.
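The abstract does not give formulas, so the following is only a minimal Python sketch of the two stages it names, written under stated assumptions rather than as the authors' implementation. Input sentences are assumed to be already tokenized and stemmed. Each sentence is treated as one document when computing IDF, and the graph-based weighting is omitted. The fitness used here is a plain sum of sentence weights, a placeholder for the Readability, Cohesion, and Topic-Relation factors, which the abstract does not define. The function names `tfidf_sentence_weights` and `sa_ga_select`, and all parameters, are hypothetical.

```python
import math
import random
from collections import Counter


def tfidf_sentence_weights(sentences):
    """Weight each tokenized sentence by the sum of its terms' TF-IDF scores.

    Assumption: every sentence counts as one "document" for the IDF term.
    """
    n = len(sentences)
    # Number of sentences in which each term occurs at least once.
    doc_freq = Counter(term for sent in sentences for term in set(sent))
    weights = []
    for sent in sentences:
        tf = Counter(sent)
        score = sum((count / len(sent)) * math.log(n / doc_freq[term])
                    for term, count in tf.items())
        weights.append(score)
    return weights


def sa_ga_select(weights, k, generations=200, pop_size=20,
                 t0=1.0, cooling=0.95, seed=0):
    """Pick k sentence indices with a GA whose offspring must pass an
    SA-style (Metropolis) acceptance test before replacing their parent."""
    n = len(weights)
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and the number of sentences")
    rng = random.Random(seed)

    def fitness(subset):
        # Placeholder fitness (assumption): total weight of chosen sentences.
        return sum(weights[i] for i in subset)

    def crossover(a, b):
        # Child draws k indices from the union of both parents.
        return frozenset(rng.sample(sorted(a | b), k))

    def mutate(subset):
        # Swap one selected sentence for one unselected sentence.
        child = set(subset)
        outside = [i for i in range(n) if i not in child]
        if outside and child:
            child.remove(rng.choice(sorted(child)))
            child.add(rng.choice(outside))
        return frozenset(child)

    population = [frozenset(rng.sample(range(n), k)) for _ in range(pop_size)]
    best = max(population, key=fitness)
    temperature = t0
    for _ in range(generations):
        next_population = []
        for parent in population:
            child = mutate(crossover(parent, rng.choice(population)))
            delta = fitness(child) - fitness(parent)
            # Always accept improvements; accept worse children with
            # probability exp(delta / temperature), which shrinks as it cools.
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                next_population.append(child)
            else:
                next_population.append(parent)
        population = next_population
        best = max(population + [best], key=fitness)
        temperature *= cooling
    return sorted(best)
```

Calling `sa_ga_select(tfidf_sentence_weights(sentences), k=3)` would return the indices of three selected sentences in document order. Returning indices rather than text keeps the summary in the original sentence order, a design choice of this sketch.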

Published in International Journal of Intelligent Information Systems (Volume 3, Issue 6-1)

This article belongs to the Special Issue Research and Practices in Information Systems and Technologies in Developing Countries

DOI 10.11648/j.ijiis.s.2014030601.26
Page(s) 84-90
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2014. Published by Science Publishing Group

Keywords

Automatic Text Summarization, Stemming, TF-IDF, Genetic Algorithm, Simulated Annealing

Cite This Article
  • APA Style

    Elham Mahdipour, Masoumeh Bagheri. (2014). Automatic Persian Text Summarizer Using Simulated Annealing and Genetic Algorithm. International Journal of Intelligent Information Systems, 3(6-1), 84-90. https://doi.org/10.11648/j.ijiis.s.2014030601.26

    ACS Style

    Elham Mahdipour; Masoumeh Bagheri. Automatic Persian Text Summarizer Using Simulated Annealing and Genetic Algorithm. Int. J. Intell. Inf. Syst. 2014, 3(6-1), 84-90. doi: 10.11648/j.ijiis.s.2014030601.26

    AMA Style

    Elham Mahdipour, Masoumeh Bagheri. Automatic Persian Text Summarizer Using Simulated Annealing and Genetic Algorithm. Int J Intell Inf Syst. 2014;3(6-1):84-90. doi: 10.11648/j.ijiis.s.2014030601.26

  • BibTeX

    @article{10.11648/j.ijiis.s.2014030601.26,
      author = {Elham Mahdipour and Masoumeh Bagheri},
      title = {Automatic Persian Text Summarizer Using Simulated Annealing and Genetic Algorithm},
      journal = {International Journal of Intelligent Information Systems},
      volume = {3},
      number = {6-1},
      pages = {84-90},
      doi = {10.11648/j.ijiis.s.2014030601.26},
      url = {https://doi.org/10.11648/j.ijiis.s.2014030601.26},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijiis.s.2014030601.26},
     year = {2014}
    }
    

  • RIS

    TY  - JOUR
    T1  - Automatic Persian Text Summarizer Using Simulated Annealing and Genetic Algorithm
    AU  - Elham Mahdipour
    AU  - Masoumeh Bagheri
    Y1  - 2014/11/06
    PY  - 2014
    N1  - https://doi.org/10.11648/j.ijiis.s.2014030601.26
    DO  - 10.11648/j.ijiis.s.2014030601.26
    T2  - International Journal of Intelligent Information Systems
    JF  - International Journal of Intelligent Information Systems
    JO  - International Journal of Intelligent Information Systems
    SP  - 84
    EP  - 90
    PB  - Science Publishing Group
    SN  - 2328-7683
    UR  - https://doi.org/10.11648/j.ijiis.s.2014030601.26
    VL  - 3
    IS  - 6-1
    ER  - 

Author Information
  • Elham Mahdipour, Computer Engineering Department, Khavaran Institute of Higher Education, Mashhad, Iran

  • Masoumeh Bagheri, Computer Engineering Department, Khavaran Institute of Higher Education, Mashhad, Iran
