Symbol Distributions in Semantic Communications: A Source-Channel Equilibrium Perspective

Why learned semantic symbols become heavy-tailed, and how to regularize them.

Hanju Yoo, Dongha Choi, Songkuk Kim, Chan-Byoung Chae, and Robert W. Heath, Jr.
  • School of Integrated Technology, Yonsei University, Seoul 03722, South Korea
  • Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA 92093, USA
Conceptual diagram of fixed-length semantic communication over an AWGN channel.

Abstract

Semantic communication systems often use end-to-end neural networks to map input data into continuous symbols. These symbols, which are essentially neural network features, have fixed dimensions and often exhibit heavy-tailed distributions. However, the mechanism behind this distributional shape remains underexplored due to the end-to-end nature of encoder training, hindering systematic analysis and design. In this paper, we propose a parametric model for semantic symbol distributions. We model end-to-end training as inducing two coupled pressures on the symbol distribution: a source pressure that favors power allocation minimizing the average description cost, and a channel pressure that favors distributions with higher channel utilization. Under surrogate objectives that capture these effects, we obtain a Student’s t-distribution as a model for the semantic symbols. Experiments on image-based semantic systems show that the model closely predicts how the shape parameter varies with (i) explicit symbol rate control and (ii) dataset entropy variability. Furthermore, enforcing a target symbol distribution via regularization (e.g., a Gaussian prior) improves training convergence, which is consistent with our hypothesis.
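For reference, a variance-normalized Student's t family can be written as follows. This is the standard location-zero parametrization; the normalization choice is our reading of "variance-normalized" and the paper's exact convention may differ. Smaller nu means heavier tails, and nu going to infinity recovers the Gaussian:

```latex
% Location-zero Student's t density with shape \nu and scale \sigma:
f(x;\nu,\sigma)
  = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}
         {\sqrt{\nu\pi}\,\sigma\,\Gamma\!\left(\frac{\nu}{2}\right)}
    \left(1+\frac{x^{2}}{\nu\sigma^{2}}\right)^{-\frac{\nu+1}{2}},
\qquad
\operatorname{Var}(X) = \sigma^{2}\,\frac{\nu}{\nu-2} \quad (\nu>2).
% Variance normalization: set \sigma^{2} = (\nu-2)/\nu so that Var(X) = 1;
% \nu \to \infty recovers the Gaussian, small \nu gives heavy tails.
```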

Rate decomposition for end-to-end semantic communication over AWGN at a fixed training SNR. The decomposition motivates the source-channel pressure view used to explain symbol shaping.

Why This Matters

Semantic communication systems are usually evaluated by reconstruction quality, while the transmitted symbols are treated as a black-box by-product of training. That is a problem for deployment: symbol statistics determine entropy, peak-to-average power behavior, fronthaul compressibility, RF front-end stress, and how well the learned interface matches a physical channel.

This work makes those statistics analyzable. It separates two forces that are usually entangled in end-to-end training: channel pressure, which prefers high-entropy channel-useful symbols, and source pressure, which prefers energy allocation that behaves like implicit variable-length coding.
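As a concrete illustration, the tail shape can be read off a trained system by fitting a Student-t to the flattened symbols. A minimal sketch with SciPy, using synthetic stand-in symbols (replace them with real encoder outputs):

```python
import numpy as np
from scipy import stats

# Stand-in for flattened real-valued channel symbols from a trained encoder;
# replace `symbols` with your own encoder outputs.
rng = np.random.default_rng(0)
symbols = rng.standard_t(df=3.0, size=100_000)

# Maximum-likelihood fit of a location-scale Student-t; the shape parameter
# nu (degrees of freedom) captures the tail: smaller nu = heavier tails.
nu, loc, scale = stats.t.fit(symbols)
print(f"fitted nu = {nu:.2f}  (nu -> infinity recovers a Gaussian)")
```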

What This Paper Does

The paper derives a Student-t symbol model from a source-channel tradeoff and validates it on image semantic communication systems. DeepJSCC, which has a fixed symbol budget, produces heavier-tailed symbols because rate adaptation must happen implicitly through symbol amplitudes. NTSCC, which has explicit symbol-rate control, shifts closer to a Gaussian-like regime because part of the rate adaptation is handled structurally.

The same logic explains dataset effects. ImageNet has much larger image-to-image entropy variability than CIFAR-10, so fixed-length transmission benefits more from implicit rate adaptation and learns heavier tails. CIFAR-10 is more uniform, so the learned distribution is closer to Gaussian.
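One hedged way to quantify this sample-to-sample variability is a lossless-compression proxy for per-image entropy; this is an illustrative metric of ours, not necessarily the measure used in the paper:

```python
import io
import numpy as np
from torchvision.datasets import CIFAR10

def png_bits_per_pixel(img) -> float:
    """Lossless PNG size as a crude per-image entropy proxy."""
    buf = io.BytesIO()
    img.save(buf, format="PNG")  # img is a PIL.Image from torchvision
    return 8.0 * buf.tell() / (img.width * img.height)

ds = CIFAR10(root="./data", train=True, download=True)  # items: (PIL image, label)
bpp = np.array([png_bits_per_pixel(ds[i][0]) for i in range(2000)])
print(f"CIFAR-10 entropy proxy: mean={bpp.mean():.2f} bpp, std={bpp.std():.2f} bpp")
```

Running the same loop over an ImageNet subset should show a noticeably larger standard deviation, which is exactly the variability a fixed-length system must absorb through its symbol amplitudes.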

  • nu = 2.84: DeepJSCC symbols, heavier-tailed under fixed-length signaling.
  • nu = 7.92: NTSCC symbols, more Gaussian-like with explicit rate control.
  • 4.8%: extra training time for the KDE regularizer in the CIFAR-10 setup.
Symbol distributions with respect to coding schemes. Fixed-length DeepJSCC and variable-rate NTSCC produce measurably different tails, matching the source-channel pressure interpretation.

Key Results

  • The learned symbol distribution is well described by a variance-normalized Student-t family.
  • Explicit symbol-rate control pushes symbols toward larger nu, i.e., a more Gaussian-like regime.
  • Larger sample-to-sample source entropy variability pushes fixed-length systems toward heavier tails.
  • Training SNR changes the fitted tail behavior, showing that the channel condition also shapes the latent law.
  • A weak Gaussian-prior regularizer improves convergence in source-coding-dominated regimes, supporting the idea that distribution shaping affects training dynamics.
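For intuition, a weak Gaussian-prior regularizer of the KDE flavor mentioned above might look like the following PyTorch sketch. The paper's exact loss may differ; the bandwidth, grid, and weight here are illustrative assumptions:

```python
import torch

def kde_gaussian_prior_loss(z: torch.Tensor,
                            bandwidth: float = 0.1,
                            n_grid: int = 256) -> torch.Tensor:
    """Penalize mismatch between a kernel density estimate of the symbols z
    and a standard-normal target density, evaluated on a fixed grid.
    Subsample z first if the batch is very large."""
    z = z.flatten()
    grid = torch.linspace(-4.0, 4.0, n_grid, device=z.device)
    # Gaussian-kernel KDE of the empirical symbol distribution.
    diffs = (grid[None, :] - z[:, None]) / bandwidth
    kde = torch.exp(-0.5 * diffs**2).mean(dim=0) / (bandwidth * (2 * torch.pi) ** 0.5)
    target = torch.exp(-0.5 * grid**2) / (2 * torch.pi) ** 0.5
    return ((kde - target) ** 2).mean()  # discretized L2 mismatch

# Usage (weight is illustrative):
#   total_loss = distortion_loss + 1e-3 * kde_gaussian_prior_loss(symbols)
```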
Symbol distributions with respect to training datasets. Dataset entropy variability shifts the learned tail behavior, with ImageNet-trained systems leaning more heavily toward a Cauchy-like tail than CIFAR-10-trained systems.
Training curves of the semantic system with and without the proposed loss on CIFAR-10. Weak distribution regularization mainly accelerates convergence and improves stability, especially when compression pressure is high.