IJRTI
International Journal for Research Trends and Innovation
International Peer Reviewed & Refereed Journals, Open Access Journal
ISSN Approved Journal No: 2456-3315 | Impact factor: 8.14 | ESTD Year: 2016
Scholarly open access, peer-reviewed, and refereed journal. Impact factor 8.14 (calculated by Google Scholar and Semantic Scholar | AI-Powered Research Tool). Multidisciplinary, monthly, indexed in all major databases and metadata, with citation generator and Digital Object Identifier (DOI).

Licence

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Published Paper Details
Paper Title: Quantization and Parallel Inference for Large Language Models on Edge Devices
Authors Name: Ana Dorthy L, Mounikha K V, N Priya, V Keerthika
Author Reg. ID: IJRTI_206959
Published Paper Id: IJRTI2511001
Published In: Volume 10 Issue 11, November-2025
DOI:
Abstract: Large language models (LLMs) require high computational power and memory, which has limited their execution to cloud-based environments. Reliance on remote infrastructure introduces challenges such as latency, privacy risks, and dependence on stable connectivity. This work examines the feasibility of performing LLM inference directly on consumer-grade edge devices by combining model compression through quantization with parallel inference on available hardware accelerators. Using the TinyLlama-1.1B model in Q4_K_M format, performance is evaluated on two representative platforms: a MacBook Air M2 with GPU acceleration via Metal, and an Android smartphone using CPU NEON vectorization. Results show that quantization substantially reduces memory requirements, while parallel execution enables interactive throughput of approximately 96 tokens per second on the laptop and 46 tokens per second on the smartphone. The analysis further indicates that the autoregressive decoding stage remains the main factor influencing performance. In contrast to earlier studies that explored quantization and parallelism as separate strategies, this study offers empirical evidence of their combined impact on edge hardware. The findings demonstrate a practical approach for enabling efficient, privacy-preserving, and scalable LLM applications at the edge.
Keywords: Large Language Models, Edge Computing, Model Quantization, Parallel Inference, Inference Optimization
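As a rough illustration of the figures quoted in the abstract, the sketch below estimates the weight memory of a 1.1B-parameter model at 16-bit and at a 4-bit quantized precision, and converts the reported throughputs into average per-token latency. The parameter count is rounded, and the 16-bit baseline and the value of about 4.5 effective bits per weight for the quantized format are assumptions chosen for illustration; neither is taken from the paper. The function names are hypothetical.

```python
# Back-of-the-envelope arithmetic for the abstract's claims.
# Assumptions (NOT from the paper): 1.1e9 parameters (rounded),
# a 16-bit baseline, and ~4.5 effective bits per weight after quantization.

PARAMS = 1.1e9          # TinyLlama-1.1B, rounded
FP16_BITS = 16.0        # assumed unquantized baseline
Q4_BITS = 4.5           # assumed effective bits per weight (illustrative)


def weight_memory_gib(params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return params * bits_per_weight / 8 / 2**30


def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average time per generated token implied by a throughput figure."""
    return 1000.0 / tokens_per_second


fp16_gib = weight_memory_gib(PARAMS, FP16_BITS)
q4_gib = weight_memory_gib(PARAMS, Q4_BITS)
reduction = fp16_gib / q4_gib

# Throughputs reported in the abstract (tokens per second).
laptop_ms = per_token_latency_ms(96)
phone_ms = per_token_latency_ms(46)

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f"4-bit weights:  {q4_gib:.2f} GiB ({reduction:.1f}x smaller)")
print(f"Laptop: {laptop_ms:.1f} ms/token, smartphone: {phone_ms:.1f} ms/token")
```

Under these assumptions the weights shrink by roughly 3.6x, and the reported throughputs correspond to about 10 ms per token on the laptop and about 22 ms per token on the smartphone. The estimate covers weights only; activation and KV-cache memory are not included.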
Cite Article: "Quantization and Parallel Inference for Large Language Models on Edge Devices", International Journal for Research Trends and Innovation (www.ijrti.org), ISSN: 2456-3315, Vol. 10, Issue 11, page no. a1-a7, November-2025, Available: http://www.ijrti.org/papers/IJRTI2511001.pdf
Publication Details:
Published Paper ID: IJRTI2511001
Registration ID: 206959
Published In: Volume 10 Issue 11, November-2025
DOI (Digital Object Identifier):
Page No: a1-a7
Country: Coimbatore, Tamilnadu, India
Research Area: Computer Science & Technology
Publisher: IJ Publication
Published Paper URL: https://www.ijrti.org/viewpaperforall?paper=IJRTI2511001
Published Paper PDF: https://www.ijrti.org/papers/IJRTI2511001
