Volcano and Jitter Plots
=========================

sjanpy provides two plotting functions for visualizing differential expression
results: a volcano plot and a per-cluster jitter plot with gene highlights.

Preparing DEG results
---------------------

These plots consume DataFrames produced by the DEG computation functions in
``sjanpy.tl``. Here we use PBMC 3k as an example:

.. code-block:: python

   import scanpy as sc
   from sjanpy.tl import fast_two_group_deg

   adata = sc.datasets.pbmc3k_processed()

   # Compare B cells vs CD4 T cells
   deg_results = fast_two_group_deg(
       adata,
       label_col='louvain',
       lst1=['B cells'],
       lst2=['CD4 T cells'],
   )
   print(deg_results.head())

Volcano plot
------------

.. code-block:: python

   from sjanpy.pl import plot_volcano

   plot_volcano(
       deg_results,
       logfc_col='log2FC',
       padj_col='padj',
       lfc_thr=1.0,
       adj_p_thr=0.05,
       title='B cells vs CD4 T cells',
   )

The plot labels each gene as Up (teal), Down (salmon), or NS (not significant,
grey) according to the log fold-change threshold (``lfc_thr``) and the adjusted
p-value threshold (``adj_p_thr``). Dashed lines mark the cutoffs.
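
As a rough illustration of that labelling rule, the sketch below classifies
genes in a toy DataFrame using pandas and NumPy. It is not sjanpy's
implementation: ``classify_volcano`` is a hypothetical helper, the toy gene
names are made up, and treating the fold-change cutoff as inclusive and the
p-value cutoff as strict is an assumption.

.. code-block:: python

   import numpy as np
   import pandas as pd


   def classify_volcano(df, logfc_col="log2FC", padj_col="padj",
                        lfc_thr=1.0, adj_p_thr=0.05):
       """Label each row 'Up', 'Down', or 'NS' (hypothetical helper, not sjanpy API)."""
       # Assumption: a gene is significant when padj is strictly below the cutoff
       significant = df[padj_col] < adj_p_thr
       # Assumption: the fold-change cutoff is inclusive on both sides
       up = significant & (df[logfc_col] >= lfc_thr)
       down = significant & (df[logfc_col] <= -lfc_thr)
       labels = np.select([up, down], ["Up", "Down"], default="NS")
       return pd.Series(labels, index=df.index, name="status")


   # Toy data with the same column names used in the plot_volcano call above
   toy = pd.DataFrame(
       {"log2FC": [2.5, -1.8, 0.3, 3.0], "padj": [0.001, 0.01, 0.0001, 0.2]},
       index=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
   )
   status = classify_volcano(toy)

Counting the labels with ``status.value_counts()`` is a quick way to check
whether the chosen thresholds leave a sensible number of Up and Down genes
before plotting.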

Cluster-level jitter plot
--------------------------

When you compute DEGs within each cluster using
:func:`~sjanpy.tl.deg.compute_nested_deg_df`, visualize the results with a
jitter plot:

.. code-block:: python

   from sjanpy.tl import compute_nested_deg_df, generate_highlight_dict
   from sjanpy.pl import plot_cluster_deg_jitter_highlight

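   # Compute within-cluster DEGs. This requires a condition column in adata.obs.
   # pbmc3k_processed() does not include one, so this example assumes you have
   # added a 'condition' column with the values 'Disease' and 'Control'.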
   nested_deg = compute_nested_deg_df(
       adata,
       cluster_key='louvain',
       condition_key='condition',
       target_condition='Disease',
       reference_condition='Control',
   )

   # Select genes to highlight
   highlights = generate_highlight_dict(
       nested_deg,
       strategies=['topn'],
       cluster_key='cluster',
       top_n=3,
       exclude_regex=[r'^MT-', r'^RP[SL]'],
   )

   # Plot
   plot_cluster_deg_jitter_highlight(
       nested_deg,
       cluster_key='cluster',
       target_name='Disease',
       reference_name='Control',
       highlight_dict=highlights,
       vrange=(-5, 5),
   )

Customizing highlight strategies
---------------------------------

``generate_highlight_dict`` supports three strategies that can be combined:

- ``'topn'``: top N genes by absolute logFC per cluster
- ``'ktimes'``: genes that are significant in at least k clusters
- ``'manual'``: explicitly specified gene list

.. code-block:: python

   highlights = generate_highlight_dict(
       nested_deg,
       strategies=['topn', 'ktimes', 'manual'],
       top_n=5,
       k=3,
       ktimes_poscut=1.0,
       ktimes_negcut=-1.0,
       manual_genes=['CD3D', 'MS4A1', 'LYZ'],
       exclude_regex=[r'^MT-', r'^RP[SL]', r'^AC\d+'],
   )
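
To make the ``'topn'`` and ``'ktimes'`` strategies described above more
concrete, here is a minimal pandas sketch on a toy table. It is an
illustration only, not how ``generate_highlight_dict`` is implemented: both
helper functions are hypothetical, the ``gene`` and ``log2FC`` column names are
assumed, and reading the positive and negative cutoffs as log fold-change
thresholds is also an assumption.

.. code-block:: python

   import pandas as pd


   def top_n_per_cluster(df, cluster_key="cluster", gene_col="gene",
                         logfc_col="log2FC", top_n=3):
       """Hypothetical helper: top N genes by absolute logFC in each cluster."""
       out = {}
       for cluster, sub in df.groupby(cluster_key):
           order = sub[logfc_col].abs().sort_values(ascending=False).index
           out[cluster] = sub.loc[order[:top_n], gene_col].tolist()
       return out


   def genes_in_k_clusters(df, cluster_key="cluster", gene_col="gene",
                           logfc_col="log2FC", k=3, poscut=1.0, negcut=-1.0):
       """Hypothetical helper: genes passing a logFC cutoff in at least k clusters."""
       # Assumption: a gene "passes" when its logFC is beyond either cutoff
       passing = df[(df[logfc_col] >= poscut) | (df[logfc_col] <= negcut)]
       counts = passing.groupby(gene_col)[cluster_key].nunique()
       return sorted(counts[counts >= k].index)


   # Toy long-format table: one row per (cluster, gene) pair
   toy = pd.DataFrame({
       "cluster": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
       "gene":    ["g1", "g2", "g3", "g1", "g2", "g3", "g1", "g2", "g3"],
       "log2FC":  [2.0, -0.5, 1.5, 1.2, 3.0, -0.2, -1.5, 0.1, 0.4],
   })

   top_hits = top_n_per_cluster(toy, top_n=1)
   recurrent = genes_in_k_clusters(toy, k=3)

In this toy table ``g1`` exceeds a cutoff in all three clusters, so it is the
only gene the recurrence rule selects, while the per-cluster rule picks the
single largest absolute change in each cluster.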