SekoKuva Mobile 423K

A lightweight, 100% open-source image feature extractor designed for on-device transfer learning

🇫🇮 Designed and trained in Finland by BC Bertenex Oy
📱 Built for mobile deployment — runs on any Android/iOS device
🎓 Ideal for education, prototyping, and production mobile apps
⚖️ 100% clean license chain — architecture, data, and weights all openly licensed

| Property | Value |
|---|---|
| Developer | BC Bertenex Oy (Finland) |
| Trademark | SekoKuva® |
| Model Type | Convolutional Neural Network (Image Classification / Feature Extraction) |
| Parameters | 423,020 (0.42M) |
| Model Size | 1.6 MB (float32) · 0.4 MB (int8 quantized) |
| Feature Dimension | 512 |
| Input Resolution | 224 × 224 × 3 (RGB) |
| Training Data | OpenImages V7, bounding-box verified subset (CC BY 4.0) |
| Training Classes | 65 diverse categories |
| License | Apache 2.0 |

Highlights

  • Ultra-Lightweight: Only 423K parameters — 8× smaller than MobileNetV2, runs on the cheapest smartphones
  • 512-Dim Feature Extractor: Outputs a compact, powerful feature vector that enables on-device transfer learning with as few as 5–10 images per class
  • 100% Open License Chain: Own architecture → own training code → CC BY 4.0 training data → own weights. No legal grey areas. No inherited license restrictions. Every component is transparent and traceable.
  • Designed in Finland: Built by BC Bertenex Oy, a Finnish company, to European standards of transparency and data responsibility
  • Education-Friendly Design: Clear, well-documented architecture suitable for teaching how neural networks work — from first-year students to professionals exploring on-device AI
  • Production Ready: Exported to ONNX (and TFLite), suitable for embedding in Android/iOS/Flutter apps via standard inference runtimes

Why This Model Exists

Most pre-trained mobile models (MobileNetV2, EfficientNet-Lite) carry weights derived from ImageNet training under ambiguous or restrictive license terms. If you build a commercial product on those weights, your legal standing is unclear.

SekoKuva Mobile 423K solves this. Every component has a clear, permissive license:

| Component | Source | License |
|---|---|---|
| Architecture | Original design by BC Bertenex Oy | Apache 2.0 |
| Training code | Written from scratch | Apache 2.0 |
| Training data | OpenImages V7 (bbox-verified only) | CC BY 4.0 |
| Model weights | Trained from scratch (random init) | Apache 2.0 |

You can use this model in commercial products, modify it, redistribute it, and build upon it — with full legal clarity.


Model Architecture

SekoKuva Mobile 423K uses depthwise separable convolutions — the same core building block as MobileNetV1 — arranged in 5 progressive stages that transform a raw photo into a compact feature vector.

Input: 224×224×3 (RGB photo)
    │
    ├── Stage 1: Conv2d 3→32, stride 2          → 112×112×32
    ├── Stage 2: DepthwiseSeparable 32→64, s2    → 56×56×64
    ├── Stage 3: DepthwiseSeparable 64→128→128   → 28×28×128
    ├── Stage 4: DepthwiseSeparable 128→256→256  → 14×14×256
    ├── Stage 5: DepthwiseSeparable 256→512, s2  → 7×7×512
    │
    ├── Global Average Pooling                   → 512
    │
    └── Output: 512-dimensional feature vector
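
If you are new to this building block, the sketch below is a minimal PyTorch version of one depthwise separable stage (a depthwise 3×3 convolution followed by a pointwise 1×1 convolution, each with BatchNorm and ReLU6). It mirrors the properties listed in this section but is only illustrative; the authoritative definitions are in the open-source model code.

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Illustrative sketch of the block described above, not the exact SekoKuva module."""
    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride,
                      padding=1, groups=in_channels, bias=False),  # one 3x3 filter per channel
            nn.BatchNorm2d(in_channels),
            nn.ReLU6(inplace=True),
        )
        self.pointwise = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),  # 1x1 channel mixing
            nn.BatchNorm2d(out_channels),
            nn.ReLU6(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# Stage 2 shape check: halve the resolution, double the channels
block = DepthwiseSeparableConv(32, 64, stride=2)
print(block(torch.randn(1, 32, 112, 112)).shape)  # torch.Size([1, 64, 56, 56])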

Architecture Details

| Property | Value |
|---|---|
| Building block | Depthwise Separable Convolution |
| Activation | ReLU6 (quantization-friendly) |
| Normalization | Batch Normalization |
| Pooling | Global Average Pooling |
| Feature dimension | 512 |
| Classifier head | Linear (512 → num_classes), removable |
| Dropout | 0.2 (before classifier only) |
| Total blocks | 7 depthwise separable + 1 standard conv |
| FLOPs (224×224) | ~85M |

Parameter Distribution

| Stage | Output Shape | Parameters |
|---|---|---|
| First Conv (3→32) | 112×112×32 | 896 |
| Stage 2 (32→64) | 56×56×64 | 2,400 |
| Stage 3 (64→128→128) | 28×28×128 | 26,432 |
| Stage 4 (128→256→256) | 14×14×256 | 101,376 |
| Stage 5 (256→512) | 7×7×512 | 267,264 |
| Backbone total | 512-dim | 398,368 |
| Classifier (65 classes) | 65 | 33,345 |
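
The counts above can be sanity-checked directly from the model. The snippet below assumes the sekokuva_mobile.model package layout used in the Quick Start section further down; the per-module names printed depend on how the actual model code is structured.

import torch
from sekokuva_mobile.model import SekoKuvaMobile  # package path assumed, as in Quick Start

model = SekoKuvaMobile(num_classes=65)

# Per-module and total parameter counts
for name, module in model.named_children():
    count = sum(p.numel() for p in module.parameters())
    print(f"{name:<25s} {count:>10,}")
print(f"{'total':<25s} {sum(p.numel() for p in model.parameters()):>10,}")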

Training Details

Data

Trained on a curated subset of OpenImages V7 with the following key properties:

  • 65 diverse categories spanning fruits, vegetables, animals, people, vehicles, household objects, plants, and nature
  • Only bounding-box verified images — every training image has a human-drawn bounding box confirming the object's presence (no machine-generated labels, no ambiguity)
  • Cropped to object region with 20% context padding — ensures every image actually shows the labeled object (see the sketch after this list)
  • 52,898 training images + 9,478 validation images
  • Minimum bounding box size: 50×50 pixels (filters annotation errors)
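
For illustration, the crop-with-padding step mentioned above could look like the sketch below. The actual implementation lives in prepare_data_clean.py; the symmetric 20% padding used here is one plausible reading of the description.

from PIL import Image

def crop_with_context(img: Image.Image, box: tuple, padding: float = 0.20) -> Image.Image:
    """Crop a (x_min, y_min, x_max, y_max) box, expanded by `padding` on each side
    and clamped to the image bounds. Illustrative sketch only."""
    x_min, y_min, x_max, y_max = box
    pad_x = int((x_max - x_min) * padding)
    pad_y = int((y_max - y_min) * padding)
    left = max(0, x_min - pad_x)
    top = max(0, y_min - pad_y)
    right = min(img.width, x_max + pad_x)
    bottom = min(img.height, y_max + pad_y)
    return img.crop((left, top, right, bottom))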

Categories

Full class list (65 classes):

airplane · apple_fruit · backpack · ball · banana · bicycle · bird · boat · book · bottle · bowl · boy · bread · broccoli · bus · butterfly · cabbage · cake · car · carrot · cat · cattle · chair · chicken · coconut · cookie · dog · elephant · fish · flower · frog · girl · goat · grape · hat · horse · houseplant · knife · laptop · lemon · man · mango · mobile_phone · motorcycle · mushroom · orange_fruit · palm_tree · peach · pear · pen · person · pineapple · plate · potato · rose · sheep · strawberry · table · tomato · tortoise · tree · truck · umbrella · watermelon · woman

Training Configuration

| Setting | Value |
|---|---|
| Optimizer | SGD (momentum=0.9, weight decay=1e-4) |
| Learning rate | 0.05 → cosine annealing → 1e-6 |
| Batch size | 128 |
| Epochs | 200 |
| Label smoothing | 0.1 |
| Mixed precision | FP16 (AMP) |
| Augmentation | RandomResizedCrop, HorizontalFlip, Rotation(15°), ColorJitter, CutMix + MixUp |
| Class balancing | WeightedRandomSampler (inversely proportional to class size) |
| Progressive resolution | 112px (ep 0–60) → 160px (ep 60–120) → 224px (ep 120–200) |
| SWA | Stochastic Weight Averaging in final 25% of training |
| Hardware | NVIDIA GeForce RTX 4050 Laptop GPU (6 GB VRAM) |
| Training time | ~7 hours total (200 epochs) |
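
In standard PyTorch, the configuration above corresponds roughly to the following setup. This is a sketch of the listed settings, not an excerpt from train.py.

import torch
import torch.nn as nn
from sekokuva_mobile.model import SekoKuvaMobile  # package path assumed, as in Quick Start

model = SekoKuvaMobile(num_classes=65)
epochs = 200

# SGD with momentum 0.9 and weight decay 1e-4
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=1e-4)
# Cosine annealing from 0.05 down to 1e-6 over the full run
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs, eta_min=1e-6)
# Cross-entropy with label smoothing 0.1
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
# Gradient scaler for FP16 mixed-precision (AMP) training
scaler = torch.cuda.amp.GradScaler()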

Training Techniques

All techniques are implemented in the open-source training script (train.py) and enabled by default:

  1. Automatic Mixed Precision (AMP): ~2× speedup on GPUs with Tensor Cores
  2. CutMix + MixUp: Advanced augmentation that creates mixed training samples, improving regularization (+2–5% accuracy)
  3. Class-Balanced Sampling: WeightedRandomSampler ensures underrepresented classes (e.g., Cabbage: 368 images) are sampled as often as larger classes (e.g., Car: 862 images); see the sketch after this list
  4. Progressive Resolution: Training starts at 112×112 and gradually increases to 224×224, allowing early epochs to run faster while later epochs refine fine details
  5. Stochastic Weight Averaging (SWA): Averages model weights across the final 25% of training epochs, finding a flatter minimum that generalizes better (+1–2% accuracy)
  6. Gradient Accumulation: Configurable effective batch size without additional VRAM
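
As an example of item 3, a class-balanced loader can be built with torch.utils.data.WeightedRandomSampler. This is a minimal sketch over an ImageFolder-style dataset with illustrative paths, not the exact code from train.py.

from collections import Counter

from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, transforms

# Assumes the ImageFolder-style layout described in the Dataset Format section below (paths illustrative)
dataset = datasets.ImageFolder("data/openimages_clean/train", transform=transforms.ToTensor())

# Weight each sample inversely to the size of its class, so rare classes
# (e.g., Cabbage) are drawn roughly as often as common ones (e.g., Car).
class_counts = Counter(dataset.targets)
sample_weights = [1.0 / class_counts[label] for label in dataset.targets]

sampler = WeightedRandomSampler(sample_weights, num_samples=len(sample_weights), replacement=True)
loader = DataLoader(dataset, batch_size=128, sampler=sampler)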

Performance

Classification Accuracy

| Metric | Value |
|---|---|
| Top-1 Accuracy (65 classes) | 67.9% |
| Top-5 Accuracy (65 classes) | ~88% |
| Random baseline (65 classes) | 1.5% |

Context: Model Size vs. Accuracy

| Model | Params | Pre-trained on ImageNet? | Expected Accuracy (65 classes) |
|---|---|---|---|
| Random guess |  |  | 1.5% |
| SekoKuva Mobile 423K | 423K | No (trained from scratch) | 67.9% |
| MobileNetV2 | 3.4M | No | ~75% |
| MobileNetV2 | 3.4M | Yes | ~85% |

SekoKuva Mobile 423K achieves competitive accuracy with 8× fewer parameters than MobileNetV2, while offering the critical advantage of a fully clean license chain. The model is not designed to compete on raw accuracy — it is designed to provide the best possible feature quality at the smallest possible size for on-device transfer learning.

Inference Speed

| Platform | Latency (224×224, single image) |
|---|---|
| NVIDIA RTX 4050 (FP16) | < 1 ms |
| NVIDIA RTX 4050 (FP32) | ~2 ms |
| Snapdragon 8 Gen 2 (TFLite, int8) | ~5 ms |
| Mid-range Android (TFLite, int8) | ~15 ms |
| Low-end Android (TFLite, int8) | ~30 ms |
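
The mobile latencies above assume int8 quantization (via TFLite in the table). As a generic illustration, and not necessarily how the released files were produced, an int8-weight ONNX variant can be generated with onnxruntime's dynamic quantization; for the best CNN latency, static (calibrated) quantization or the TFLite converter is usually the better route.

from onnxruntime.quantization import QuantType, quantize_dynamic

# Output filename is hypothetical; adjust paths to your own export locations.
quantize_dynamic(
    "exported/sekokuva_mobile_classifier.onnx",
    "exported/sekokuva_mobile_classifier_int8.onnx",
    weight_type=QuantType.QInt8,
)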

Quick Start

Installation

pip install torch torchvision onnxruntime

Classification (PyTorch)

import torch
from PIL import Image
from torchvision import transforms

# Load model
from sekokuva_mobile.model import SekoKuvaMobile

checkpoint = torch.load("checkpoints/best.pt", map_location="cpu")
model = SekoKuvaMobile(num_classes=65)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Preprocess image (same as validation transforms)
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("photo.jpg").convert("RGB")
input_tensor = transform(img).unsqueeze(0)  # [1, 3, 224, 224]

# Classify
with torch.no_grad():
    logits = model(input_tensor)
    probs = torch.softmax(logits, dim=1)
    top5 = torch.topk(probs, 5)

# Load class names
import json
with open("checkpoints/class_names.json") as f:
    class_names = json.load(f)

for prob, idx in zip(top5.values[0], top5.indices[0]):
    print(f"  {class_names[idx]:<20s} {prob:.1%}")

Feature Extraction (PyTorch)

# Extract 512-dim features for transfer learning
with torch.no_grad():
    features = model.forward_features(input_tensor)  # [1, 512]

print(f"Feature vector: {features.shape}")  # torch.Size([1, 512])

ONNX Inference

import numpy as np
import onnxruntime as ort

# Load ONNX model
session = ort.InferenceSession("exported/sekokuva_mobile_classifier.onnx")

# Run inference (input_array: numpy float32 [1, 3, 224, 224])
result = session.run(None, {"input_image": input_array})
logits = result[0]  # [1, 65] for classifier, [1, 512] for features
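
The input_array above is assumed to be pre-processed already. A minimal NumPy/PIL sketch that mirrors the PyTorch validation transforms (resize the shorter side to 256, center-crop to 224, ImageNet normalization) could look like this:

import numpy as np
from PIL import Image

def preprocess(path: str) -> np.ndarray:
    """Return a float32 [1, 3, 224, 224] array matching the validation transforms above."""
    img = Image.open(path).convert("RGB")
    # Resize the shorter side to 256, keeping the aspect ratio
    scale = 256 / min(img.size)
    img = img.resize((round(img.width * scale), round(img.height * scale)), Image.BILINEAR)
    # Center-crop to 224x224
    left = (img.width - 224) // 2
    top = (img.height - 224) // 2
    img = img.crop((left, top, left + 224, top + 224))
    # Scale to [0, 1], apply ImageNet mean/std, and reorder to NCHW
    arr = np.asarray(img, dtype=np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    arr = (arr - mean) / std
    return arr.transpose(2, 0, 1)[np.newaxis, ...].astype(np.float32)

input_array = preprocess("photo.jpg")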

Feature Extraction (ONNX)

# Use the features ONNX model for transfer learning pipelines
session = ort.InferenceSession("exported/sekokuva_mobile_features.onnx")
result = session.run(None, {"input_image": input_array})
features = result[0]  # [1, 512]

Transfer Learning — The Key Feature

This is the primary use case. SekoKuva Mobile 423K is designed as a frozen feature extractor that enables on-device transfer learning with minimal data.

How It Works

  1. The model converts any photo into a 512-dimensional feature vector — a compact numerical "fingerprint" that describes the visual content
  2. A user trains a tiny linear layer on top (512 × num_classes parameters) using just 5–10 images per class
  3. Training happens on-device in under a second — no GPU, no server, no cloud

Example: Custom 3-Class Classifier

import torch
import torch.nn as nn

# 1. Freeze the feature extractor
feature_model = SekoKuvaMobile(num_classes=0)  # Feature-only mode
feature_model.load_state_dict(checkpoint["model_state_dict"], strict=False)
feature_model.eval()

# 2. Collect features from user's photos (e.g., 10 photos × 3 classes)
features = []  # list of [512] tensors
labels = []    # list of class indices (0, 1, 2)

for img_path, label in user_training_data:
    img = transform(Image.open(img_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = feature_model.forward_features(img)  # [1, 512]
    features.append(feat.squeeze())
    labels.append(label)

X = torch.stack(features)       # [30, 512]
y = torch.tensor(labels)        # [30]

# 3. Train a tiny classifier (512 × 3 weights + 3 biases = 1,539 parameters)
classifier = nn.Linear(512, 3)
optimizer = torch.optim.Adam(classifier.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

for epoch in range(100):  # Takes < 1 second total
    logits = classifier(X)
    loss = criterion(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# 4. Classify a new image
new_img = transform(Image.open("new_photo.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    feat = feature_model.forward_features(new_img)
    prediction = classifier(feat)
    class_idx = prediction.argmax().item()
    print(f"Predicted class: {class_idx}")

Transfer Learning Performance

The quality of transfer learning depends on the feature vector quality, not the top-1 classification accuracy. With SekoKuva Mobile 423K features:

| Task | Images per class | Expected accuracy |
|---|---|---|
| Binary classification (e.g., healthy vs. sick leaf) | 10 | 85–95% |
| 3-class classification | 10 | 80–90% |
| 5-class classification | 15 | 75–85% |
| 10-class classification | 20 | 70–80% |

These estimates assume the classes are visually distinct. Performance may vary for very similar classes.


Available Model Files

| File | Format | Size | Description |
|---|---|---|---|
| checkpoints/best.pt | PyTorch | ~2 MB | Full model checkpoint (classifier + backbone) |
| checkpoints/swa.pt | PyTorch | ~2 MB | SWA-averaged model (best generalization) |
| exported/sekokuva_mobile_classifier.onnx | ONNX | ~1.7 MB | Full 65-class classifier |
| exported/sekokuva_mobile_features.onnx | ONNX | ~1.6 MB | Feature extractor only (512-dim output) |
| checkpoints/class_names.json | JSON | 1 KB | Ordered list of 65 class names |

Which File Should I Use?

  • Building a mobile app with on-device learning? → sekokuva_mobile_features.onnx
  • Quick image classification demo? → sekokuva_mobile_classifier.onnx
  • Fine-tuning on your own dataset? → checkpoints/best.pt (PyTorch)
  • Research or architecture exploration? → checkpoints/best.pt + source code

Fine-Tuning Guide

You can fine-tune the entire model on your own dataset. This is different from transfer learning — fine-tuning updates all weights, while transfer learning only trains a new head on frozen features.

When to Fine-Tune vs. Transfer Learn

| Approach | Best for | Data needed | Compute needed |
|---|---|---|---|
| Transfer learning (frozen features) | Quick, few-shot tasks on phone | 5–20 per class | CPU, < 1 second |
| Full fine-tuning | Specialized domains (medical, industrial) | 100+ per class | GPU, minutes–hours |

Fine-Tuning Example

# Replace the classifier head and train on your data
python train.py \
    --data_dir /path/to/your/dataset \
    --epochs 50 \
    --batch_size 128 \
    --lr 0.01 \
    --resume checkpoints/best.pt

The training script automatically:

  • Detects the number of classes from your dataset folder structure
  • Replaces the classifier head if class count differs
  • Applies all training enhancements (AMP, CutMix, SWA, etc.)

Dataset Format

your_dataset/
├── train/
│   ├── class_a/
│   │   ├── img001.jpg
│   │   └── img002.jpg
│   └── class_b/
│       ├── img001.jpg
│       └── img002.jpg
└── val/
    ├── class_a/
    │   └── img001.jpg
    └── class_b/
        └── img001.jpg
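
This layout matches the standard torchvision ImageFolder convention, where each sub-directory name becomes a class label, which is presumably what the training script's class detection relies on. A quick sanity check that your dataset is read correctly:

from torchvision import datasets

train_set = datasets.ImageFolder("your_dataset/train")
val_set = datasets.ImageFolder("your_dataset/val")

print(train_set.classes)             # ['class_a', 'class_b']
print(len(train_set), len(val_set))  # images found per split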

Intended Use

Primary Use Cases

  • Mobile image classification: Deploy as a lightweight classifier in Android/iOS/Flutter apps that runs fast even on low-end devices
  • On-device transfer learning: Use as a frozen feature extractor so end users can build custom classifiers with just a few photos — no server, no cloud
  • Educational tool: Teach students how neural networks, feature extraction, and transfer learning work through hands-on experimentation
  • Prototyping: Rapidly test image classification ideas before scaling to larger models
  • Clean-license foundation: Build commercial products with full legal clarity — no inherited license ambiguity

Out of Scope

  • Text recognition / OCR: The model processes whole-image features, not localized text
  • Object detection: The model classifies entire images, not bounding boxes within images
  • High-accuracy production classifier: For applications requiring >90% accuracy, consider larger models or the upcoming SekoKuva Mobile 5M
  • Video processing: Designed for single-frame classification

Limitations

  • 65-class vocabulary: The classifier head recognizes 65 categories. The feature extractor generalizes beyond these, but performance on very dissimilar domains (e.g., medical imaging, satellite imagery) may be limited.
  • Small model capacity: With 423K parameters, the model cannot learn as many fine-grained distinctions as larger models. It trades accuracy for size and speed.
  • No pre-training on ImageNet: The model was trained from scratch on ~53K images. Models pre-trained on ImageNet's 1.2M images will have richer feature representations.
  • Resolution: Fixed 224×224 input. Very small objects or fine details may not be captured.

Ethical Considerations

  • Training data: All training data comes from OpenImages V7, which is publicly available under CC BY 4.0. Images were selected using human-verified bounding box annotations to minimize label noise.
  • Bias: The training categories reflect a curated subset of OpenImages chosen for broad everyday-object diversity (fruits, vegetables, animals, vehicles, household items). The model may perform unevenly across underrepresented visual domains.
  • Privacy: No personal data was used beyond what is publicly available in OpenImages V7. The model does not store, transmit, or identify personal information.
  • Environmental impact: Total training compute was approximately 7 GPU-hours on a laptop GPU — orders of magnitude less than large-scale model training.

About

BC Bertenex Oy

BC Bertenex Oy is a Finnish startup based in Eurajoki, Finland. We build AI-driven solutions for small businesses, design AI models for a range of purposes, develop our own AI-based products, and create educational content about AI.

The SekoKuva Project

SekoKuva (from the Finnish seko kuva — "messed-up image", inspired by the noisy initial state in diffusion models) is a media generation and AI model project started in July 2025. SekoKuva is a Finnish trademark owned by BC Bertenex Oy. Under the SekoKuva brand, we develop consumer-level AI products and open-source AI models — including this family of lightweight vision models designed for mobile deployment and on-device learning.

Roadmap

| Model | Parameters | Status | Description |
|---|---|---|---|
| SekoKuva Mobile 423K | 423K | ✅ Released | Feature extractor for transfer learning |
| SekoKuva Mobile 5M | ~5M | 🔨 In development | Larger model with InvertedResidual blocks, multi-head classification |

Reproduce From Scratch

The entire training pipeline is open source. To reproduce this model:

# 1. Clone the repository
git clone https://github.com/BCBertenex/SekoKuvaMobile.git
cd SekoKuvaMobile

# 2. Install dependencies
pip install torch torchvision onnx onnxruntime fiftyone numpy pillow tqdm

# 3. Download training data (bounding-box verified, CC BY 4.0)
python prepare_data_clean.py --download --preset diverse --max-per-class 1000

# 4. Train (all enhancements enabled by default)
python train.py --data_dir ./data/openimages_clean --epochs 200 --batch_size 128 --num_workers 4

# 5. Export to ONNX
python export_tflite.py --checkpoint checkpoints/best.pt --mode features
python export_tflite.py --checkpoint checkpoints/best.pt --mode classifier

Citation

@misc{sekokuva2026mobile423k,
  title     = {SekoKuva Mobile 423K: A Lightweight Open-Source Feature Extractor for On-Device Transfer Learning},
  author    = {{BC Bertenex Oy}},
  year      = {2026},
  url       = {https://huggingface.co/BCBertenex/sekokuva-mobile-423k},
  note      = {Apache 2.0 License. Trained on OpenImages V7 (CC BY 4.0).}
}

License

This model is released under the Apache 2.0 License.

The training data (OpenImages V7) is licensed under Creative Commons Attribution 4.0 (CC BY 4.0).

You are free to use this model for any purpose — commercial, academic, or personal — with attribution to BC Bertenex Oy.
