SentenceTransformer based on unsloth/all-MiniLM-L6-v2

This model was finetuned with Unsloth.

This is a sentence-transformers model finetuned from unsloth/all-MiniLM-L6-v2 on the codesearchnet dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: unsloth/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: codesearchnet
  • Language: en

Model Sources

  • Documentation: https://sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'PeftModelForFeatureExtraction'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
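Functionally, the three modules form a pipeline: the Transformer produces per-token embeddings, Pooling mean-averages them over non-padding tokens, and Normalize() L2-normalizes the result, so cosine similarity between two embeddings reduces to a plain dot product. Below is a minimal sketch of that computation in plain transformers; the model id is the base checkpoint and is used for illustration only, since the fine-tuned checkpoint wraps the encoder in a PEFT adapter (PeftModelForFeatureExtraction) that this sketch omits.

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Base model id, for illustration; the fine-tuned checkpoint adds a PEFT adapter.
model_id = "unsloth/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

batch = tokenizer(
    ["def add(a, b): return a + b"],
    padding=True, truncation=True, max_length=256, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (1, seq_len, 384)

# (1) Pooling: mean over non-padding tokens only
mask = batch["attention_mask"].unsqueeze(-1).float()
pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: after this, cosine similarity is just a dot product
embedding = F.normalize(pooled, p=2, dim=1)
print(embedding.shape)  # torch.Size([1, 384])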

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("kamp0010/all-minilm-l6-v2-code")
# Run inference
sentences = [
    '<p>\nUser-supplied properties in key-value form.\n</p>\n\n@param parameters\nUser-supplied properties in key-value form.\n@return Returns a reference to this object so that method calls can be chained together.',
    'public StorageDescriptor withParameters(java.util.Map<String, String> parameters) {\n        setParameters(parameters);\n        return this;\n    }',
    "public static function unserializeFromStringRepresentation($string)\n    {\n        if (!preg_match('~k:(?P<k>\\d+)/m:(?P<m>\\d+)\\((?P<bitfield>[0-9a-zA-Z+/=]+)\\)~', $string, $matches)) {\n            throw new InvalidArgumentException('Invalid string representation');\n        }\n        $bf = new self((int) $matches['m'], (int) $matches['k']);\n        $bf->bitField = base64_decode($matches['bitfield']);\n        return $bf;\n    }",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.6597, -0.0469],
#         [ 0.6597,  1.0000,  0.0107],
#         [-0.0469,  0.0107,  1.0000]], dtype=torch.float16)
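Because the training pairs were docstrings (anchor) and code (positive), a natural application is semantic code search: embed a natural-language query and a corpus of code snippets, then rank the snippets by similarity. A minimal sketch; the snippets below are invented for illustration:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("kamp0010/all-minilm-l6-v2-code")

query = "parse a date string into a datetime object"
corpus = [  # toy snippets, invented for illustration
    "def parse_date(s):\n    return datetime.strptime(s, '%Y-%m-%d')",
    "def read_csv(path):\n    with open(path) as f:\n        return list(csv.reader(f))",
    "func Add(a, b int) int { return a + b }",
]

query_embedding = model.encode([query])
corpus_embeddings = model.encode(corpus)

# Rank all snippets by cosine similarity to the query
scores = model.similarity(query_embedding, corpus_embeddings)[0]
for idx in scores.argsort(descending=True):
    i = int(idx)
    print(f"{scores[i]:.3f}  {corpus[i][:50]!r}")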

Training Details

Training Dataset

codesearchnet

  • Dataset: codesearchnet at 079a958
  • Size: 1,375,067 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min: 4 tokens, mean: 29.95 tokens, max: 127 tokens
    • positive (string): min: 28 tokens, mean: 131.03 tokens, max: 256 tokens
  • Samples (each pair is anchor → positive):

    anchor:
    Computes the new parent id for the node being moved.
    @return int

    positive:
    protected function parentId()
    {
        switch ( $this->position )
        {
            case 'root':
                return null;

            case 'child':
                return $this->target->getKey();

            default:
                return $this->target->getParentId();
        }
    }

    anchor:
    SetWinSize overwrites the playlist's window size.

    positive:
    func (p *MediaPlaylist) SetWinSize(winsize uint) error {
        if winsize > p.capacity {
            return errors.New("capacity must be greater than winsize or equal")
        }
        p.winsize = winsize
        return nil
    }

    anchor:
    Show the sidebar and squish the container to make room for the sidebar.
    If hideOthers is true, hide other open sidebars.

    positive:
    function() {
        var options = this.options;

        if (options.hideOthers) {
            this.secondary.each(function() {
                var sidebar = $(this);

                if (sidebar.hasClass('is-expanded')) {
                    sidebar.toolkit('offCanvas', 'hide');
                }
            });
        }

        this.fireEvent('showing');

        this.container.addClass('move-' + this.opposite);

        this.element
            .reveal()
            .addClass('is-expanded')
            .aria('expanded', true);

        if (options.stopScroll) {
            $('body').addClass('no-scroll');
        }

        this.fireEvent('shown');
    }
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
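MultipleNegativesRankingLoss uses in-batch negatives: for each anchor, its paired positive is the target and every other positive in the batch serves as a negative. This is why the no_duplicates batch sampler listed under the hyperparameters below matters; a duplicate positive elsewhere in the batch would be a false negative. A minimal sketch of constructing this loss with the sentence-transformers API; scale=20.0 and cosine similarity match the parameters above (they are also the library defaults):

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("unsloth/all-MiniLM-L6-v2")

# Softmax temperature (scale) and similarity function as listed above
loss = losses.MultipleNegativesRankingLoss(
    model,
    scale=20.0,
    similarity_fct=util.cos_sim,
)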
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • gradient_accumulation_steps: 4
  • learning_rate: 0.0002
  • num_train_epochs: 2
  • warmup_ratio: 0.03
  • fp16: True
  • batch_sampler: no_duplicates
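These non-defaults map directly onto SentenceTransformerTrainingArguments. The sketch below reconstructs a comparable run; the dataset id and output path are assumptions for illustration, not taken from this card:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("unsloth/all-MiniLM-L6-v2")
# Dataset id is an assumption; the card only names "codesearchnet"
train_dataset = load_dataset("sentence-transformers/codesearchnet", split="train")

args = SentenceTransformerTrainingArguments(
    output_dir="all-minilm-l6-v2-code",         # hypothetical output path
    per_device_train_batch_size=64,
    gradient_accumulation_steps=4,              # 64 x 4 = 256 samples per optimizer step per device
    learning_rate=2e-4,
    num_train_epochs=2,
    warmup_ratio=0.03,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no duplicate in-batch negatives
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=losses.MultipleNegativesRankingLoss(model),
)
trainer.train()

For reference, the training logs below show roughly 2,690 optimizer steps per epoch; with 1,375,067 training pairs that works out to about 512 samples per optimizer step, i.e. 64 x 4 per device across what appears to be two devices (the card does not state the device count).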

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0002
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.03
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0186 50 0.5333
0.0372 100 0.3948
0.0559 150 0.311
0.0745 200 0.2721
0.0931 250 0.2809
0.1117 300 0.2533
0.1303 350 0.2472
0.1489 400 0.2378
0.1676 450 0.2383
0.1862 500 0.2239
0.2048 550 0.2236
0.2234 600 0.2191
0.2420 650 0.2248
0.2606 700 0.2176
0.2793 750 0.2171
0.2979 800 0.2114
0.3165 850 0.222
0.3351 900 0.2066
0.3537 950 0.2059
0.3723 1000 0.2053
0.3910 1050 0.2011
0.4096 1100 0.2024
0.4282 1150 0.2006
0.4468 1200 0.1976
0.4654 1250 0.1968
0.4840 1300 0.195
0.5027 1350 0.1921
0.5213 1400 0.1967
0.5399 1450 0.1895
0.5585 1500 0.1864
0.5771 1550 0.189
0.5957 1600 0.1857
0.6144 1650 0.1889
0.6330 1700 0.1796
0.6516 1750 0.1718
0.6702 1800 0.1866
0.6888 1850 0.1874
0.7074 1900 0.178
0.7261 1950 0.1763
0.7447 2000 0.1734
0.7633 2050 0.1823
0.7819 2100 0.1796
0.8005 2150 0.1737
0.8191 2200 0.1796
0.8378 2250 0.1794
0.8564 2300 0.1703
0.8750 2350 0.1746
0.8936 2400 0.1864
0.9122 2450 0.173
0.9308 2500 0.1729
0.9495 2550 0.1742
0.9681 2600 0.1776
0.9867 2650 0.182
1.0052 2700 0.1661
1.0238 2750 0.1627
1.0424 2800 0.158
1.0611 2850 0.1585
1.0797 2900 0.1555
1.0983 2950 0.1566
1.1169 3000 0.1511
1.1355 3050 0.1557
1.1541 3100 0.1589
1.1728 3150 0.1545
1.1914 3200 0.1567
1.2100 3250 0.1561
1.2286 3300 0.1515
1.2472 3350 0.153
1.2658 3400 0.1557
1.2845 3450 0.1506
1.3031 3500 0.1572
1.3217 3550 0.1543
1.3403 3600 0.1619
1.3589 3650 0.1586
1.3775 3700 0.16
1.3962 3750 0.1594
1.4148 3800 0.1528
1.4334 3850 0.1516
1.4520 3900 0.1529
1.4706 3950 0.149
1.4892 4000 0.1572
1.5079 4050 0.1505
1.5265 4100 0.1552
1.5451 4150 0.1488
1.5637 4200 0.161
1.5823 4250 0.151
1.6009 4300 0.1442
1.6196 4350 0.1511
1.6382 4400 0.1475
1.6568 4450 0.1509
1.6754 4500 0.1512
1.6940 4550 0.1484
1.7127 4600 0.1491
1.7313 4650 0.143
1.7499 4700 0.1479
1.7685 4750 0.1459
1.7871 4800 0.1434
1.8057 4850 0.1475
1.8244 4900 0.1485
1.8430 4950 0.147
1.8616 5000 0.157
1.8802 5050 0.1447
1.8988 5100 0.1425
1.9174 5150 0.1491
1.9361 5200 0.1433
1.9547 5250 0.1382
1.9733 5300 0.1391
1.9919 5350 0.1492

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.1
  • Transformers: 4.57.1
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.11.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}