Using Few-Shot Learning for Creating Phishing Chat Dataset

Please use this identifier to cite or link to this item: http://kmutnb-ir.kmutnb.ac.th/jspui/handle/123456789/301

Full metadata record

DC Field	Value	Language
dc.contributor	SARAN HANSAKUL	en
dc.contributor	ศรัณย์ หงสกุล	th
dc.contributor.advisor	NAWAPORN WISITPONGPHAN	en
dc.contributor.advisor	นวพร วิสิฐพงศ์พันธ์	th
dc.contributor.other	King Mongkut's University of Technology North Bangkok	en
dc.date.accessioned	2025-07-02T08:47:15Z	-
dc.date.available	2025-07-02T08:47:15Z	-
dc.date.created	2526
dc.date.issued	8/6/2526
dc.identifier.uri	http://kmutnb-ir.kmutnb.ac.th/jspui/handle/123456789/301	-
dc.description.abstract	Online businesses that rely on messaging services face increasingly severe phishing threats over chat, so they need to train employees to recognize malicious messages, but high employee turnover disrupts these efforts. Although many phishing detection tools exist for emails, websites, SMS, and similar channels, they cannot be applied to phishing in one-to-one chat, and no public phishing chat dataset of sufficient quality for training models is available. Advances in AI, particularly large language models (LLMs), now make it possible to augment limited real-world data through few-shot learning techniques. This study aims to create a phishing chat dataset from limited real-world data and to evaluate its quality for training phishing detection models. Using the GPT-3.5 Turbo model, we increased the number of chat messages tenfold. The GPT-4o model rated the dataset's quality at an average of 7.68 out of 10. When the dataset was tested with various machine learning models, Logistic Regression combined with MaxAbsScaler data transformation outperformed the other models, achieving 99.87% accuracy, 100% precision, and an F1 score of 0.99. These results suggest that the augmented dataset can be used effectively to train phishing detection models.	en
dc.description.abstract	ธุรกิจที่พึ่งพาการสื่อสารผ่านข้อความโต้ตอบบนแพลตฟอร์มออนไลน์ มักพบปัญหาข้อความแชทแบบฟิชชิ่งที่รุนแรงมากขึ้น ซึ่งต้องสร้างความตระหนักรู้ข้อความที่เป็นอันตรายให้กับพนักงาน แต่พนักงานเหล่านี้มักเปลี่ยนงานเร็วกว่าที่จะได้รับการอบรมเพียงพอ แม้ปัจจุบันมีการนำเทคนิคการเรียนรู้ด้วยเครื่องมาตรวจจับแล้วก็ตาม แต่ข้อความที่สามารถตรวจจับได้มักอยู่ในรูปแบบอื่น เช่น อีเมล เว็บไซต์ ข้อความสั้น ฯลฯ ซึ่งมีรูปแบบต่างจากการแชทโต้ตอบแบบ 1-1 กับลูกค้า และปัจจุบันยังไม่มีชุดข้อมูลข้อความแชทแบบสาธารณะที่นำมาใช้สร้างโมเดลสำหรับตรวจจับฟิชชิ่งได้ แต่ด้วยความสามารถของระบบ AI แบบ LLM จึงทำให้สามารถเพิ่มจำนวนข้อมูลตัวอย่างที่มีลักษณะคล้ายกันได้ แม้จะมีข้อมูลจริงที่นำมาใช้ตั้งต้นน้อย งานวิจัยนี้มีวัตถุประสงค์เพื่อจัดทำชุดข้อมูลข้อความแชทแบบฟิชชิ่งจากข้อมูลจริงที่มีจำนวนน้อยมาก แล้วนำมาทดสอบคุณภาพในการนำมาใช้เทรนโมเดลจริง ซึ่งพบว่า การใช้โมเดล LLM แบบ GPT-3.5 Turbo เพิ่มจำนวนแชทได้ 10 เท่า ได้คะแนนประเมินความสมจริงเฉลี่ยที่ 7.68 (เต็ม 10) เมื่อประเมินด้วยโมเดล GPT-4o เมื่อนำไปเทรนโมเดลแบบ Logistic Regression ที่แปลงข้อมูลแบบ MaxAbsScaler จะได้ประสิทธิภาพสูงที่สุด ได้ค่าความเที่ยงตรงที่ 99.87% ความแม่นยำที่ 100% (แบบไบนารี่) คะแนน F1 เท่ากับ 0.99 จึงถือว่าสามารถนำชุดข้อมูลแชทที่สร้างขึ้นนี้ไปใช้ประโยชน์ได้	th
dc.language.iso	th
dc.publisher	King Mongkut's University of Technology North Bangkok
dc.rights	King Mongkut's University of Technology North Bangkok
dc.subject	ข้อความแชท ฟิชชิ่ง แพลตฟอร์มออนไลน์ LLM Few-Shot การเพิ่มจำนวนข้อมูล เทคนิคการเรียนรู้ของเครื่อง	th
dc.subject	Chat Messages	en
dc.subject	Phishing	en
dc.subject	Online Platform	en
dc.subject	LLM	en
dc.subject	Few-Shot	en
dc.subject	Augmentation	en
dc.subject	Machine Learning	en
dc.subject.classification	Computer Science	en
dc.subject.classification	Information and communication	en
dc.title	Using Few-Shot Learning for Creating Phishing Chat Dataset	en
dc.title	การสร้างชุดข้อมูลข้อความแชทฟิชชิ่งด้วยโมเดลการเรียนรู้แบบฟิวช็อต	th
dc.type	Independent Study	en
dc.type	การค้นคว้าอิสระ	th
dc.contributor.coadvisor	NAWAPORN WISITPONGPHAN	en
dc.contributor.coadvisor	นวพร วิสิฐพงศ์พันธ์	th
dc.contributor.emailadvisor	nawaporn.w@itd.kmutnb.ac.th,nawapornn@kmutnb.ac.th
dc.contributor.emailcoadvisor	nawaporn.w@itd.kmutnb.ac.th,nawapornn@kmutnb.ac.th
dc.description.degreename	Master of Science (วท.ม.)	en
dc.description.degreename	วิทยาศาสตรมหาบัณฑิต (M.Sc.)	th
dc.description.degreelevel	Master's Degree	en
dc.description.degreelevel	ปริญญาโท	th
dc.description.degreediscipline	Data Communication and Networking	en
dc.description.degreediscipline	การสื่อสารข้อมูลและเครือข่าย	th
Appears in Collections:	FACULTY OF INFORMATION TECHNOLOGY AND DIGITAL INNOVATION

Files in This Item:

File	Description	Size	Format
s6607031857086.pdf		8.93 MB	Adobe PDF
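
The abstract describes augmenting a small set of real chat messages through few-shot prompting of an LLM. The record does not give the prompt format, so the following is only a minimal sketch of how such a prompt might be assembled. The function name `build_few_shot_prompt`, the instruction wording, the label names, and the placeholder seed messages are all assumptions made for illustration, not the author's method. The sketch stops at building the prompt text and does not call any model API.

```python
import random


def build_few_shot_prompt(seed_chats, k=3, rng=None):
    """Assemble a few-shot prompt from up to k labeled seed chats.

    seed_chats is a list of (label, text) pairs. The prompt layout is a
    hypothetical one chosen for this sketch, not taken from the study.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    shots = rng.sample(seed_chats, k=min(k, len(seed_chats)))

    lines = [
        "You write realistic one-to-one customer chat messages "
        "for a phishing-detection training dataset.",
        "Here are real examples, each with its label:",
        "",
    ]
    for i, (label, text) in enumerate(shots, start=1):
        lines.append(f"Example {i} [{label}]: {text}")
    lines += [
        "",
        "Write one new message in the same style, "
        "prefixed with its label in square brackets.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder seed data; a real run would use the collected chats.
    seeds = [
        ("phishing", "<seed phishing chat 1>"),
        ("phishing", "<seed phishing chat 2>"),
        ("legitimate", "<seed legitimate chat 1>"),
        ("legitimate", "<seed legitimate chat 2>"),
    ]
    print(build_few_shot_prompt(seeds, k=3))
```

Sampling a different subset of seed chats for each request is one simple way to vary the generated messages; whether the study did this is not stated in the record.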
