Large language models (LLMs) require substantial computational power and memory, which has largely confined their execution to cloud-based environments. Reliance on remote infrastructure, however, introduces challenges such as latency, privacy risks, and dependence on stable connectivity. This work examines the feasibility of performing LLM inference directly on consumer-grade edge devices by combining model compression through quantization with parallel inference on available hardware accelerators. Using the TinyLlama-1.1B model in Q4_K_M format, performance is evaluated on two representative platforms: a MacBook Air M2 with GPU acceleration via Metal, and an Android smartphone using CPU NEON vectorization. Results show that quantization substantially reduces memory requirements, while parallel execution enables interactive throughput of approximately 96 tokens per second on the laptop and 46 tokens per second on the smartphone. The analysis further indicates that the autoregressive decoding stage remains the dominant factor governing overall performance. In contrast to earlier studies that explored quantization and parallelism as separate strategies, this study provides empirical evidence of their combined impact on edge hardware. The findings demonstrate a practical approach to efficient, privacy-preserving, and scalable LLM applications at the edge.
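The measurement setup described in the abstract can be approximated with a short script. The sketch below is illustrative only, assuming the llama-cpp-python bindings and a locally downloaded TinyLlama-1.1B Q4_K_M GGUF file; the file path, prompt, and parameter values are assumptions, not the authors' exact configuration.

```python
# Minimal sketch: quantized TinyLlama inference with GPU offload and a
# simple tokens-per-second measurement. Assumes llama-cpp-python is
# installed and a Q4_K_M GGUF file is available locally (assumed path below).
import time
from llama_cpp import Llama

llm = Llama(
    model_path="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",  # assumed local file
    n_gpu_layers=-1,   # offload all layers to Metal on Apple Silicon; 0 = CPU only
    n_threads=4,       # CPU threads for any layers kept on the CPU
    n_ctx=2048,        # context window size
)

prompt = "Explain edge computing in one paragraph."
start = time.perf_counter()
out = llm(prompt, max_tokens=128)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.2f} s -> {generated / elapsed:.1f} tok/s")
```

On the smartphone side, the same GGUF file would typically be run through a llama.cpp build with NEON support enabled, with decode throughput governed chiefly by memory bandwidth during the autoregressive stage.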
Keywords:
Large Language Models, Edge Computing, Model Quantization, Parallel Inference, Inference Optimization
Cite Article:
"Quantization and Parallel Inference for Large Language Models on Edge Devices", International Journal for Research Trends and Innovation (www.ijrti.org), ISSN:2455-2631, Vol.10, Issue 11, page no.a1-a7, November-2025, Available :http://www.ijrti.org/papers/IJRTI2511001.pdf
Downloads:
000285
ISSN:
2456-3315 | Impact Factor: 8.14 (calculated by Google Scholar) | ESTD Year: 2016