SekoKuva Mobile 423K
A lightweight, 100% open-source image feature extractor designed for on-device transfer learning
🇫🇮 Designed and trained in Finland by BC Bertenex Oy
📱 Built for mobile deployment — runs on any Android/iOS device
🎓 Ideal for education, prototyping, and production mobile apps
⚖️ 100% clean license chain — architecture, data, and weights all openly licensed
| Property | Value |
|---|---|
| Developer | BC Bertenex Oy (Finland) |
| Trademark | SekoKuva® |
| Model Type | Convolutional Neural Network (Image Classification / Feature Extraction) |
| Parameters | 423,020 (0.42M) |
| Model Size | 1.6 MB (float32) · 0.4 MB (int8 quantized) |
| Feature Dimension | 512 |
| Input Resolution | 224 × 224 × 3 (RGB) |
| Training Data | OpenImages V7, bounding-box verified subset (CC BY 4.0) |
| Training Classes | 65 diverse categories |
| License | Apache 2.0 |
Highlights
- Ultra-Lightweight: Only 423K parameters — 8× smaller than MobileNetV2, runs on the cheapest smartphones
- 512-Dim Feature Extractor: Outputs a compact, powerful feature vector that enables on-device transfer learning with as few as 5–10 images per class
- 100% Open License Chain: Own architecture → own training code → CC BY 4.0 training data → own weights. No legal grey areas. No inherited license restrictions. Every component is transparent and traceable.
- Designed in Finland: Built by BC Bertenex Oy, a Finnish company, to European standards of transparency and data responsibility
- Education-Friendly Design: Clear, well-documented architecture suitable for teaching how neural networks work — from first-year students to professionals exploring on-device AI
- Production Ready: Exported to ONNX (and TFLite), suitable for embedding in Android/iOS/Flutter apps via standard inference runtimes
Why This Model Exists
Most pre-trained mobile models (MobileNetV2, EfficientNet-Lite) carry weights derived from ImageNet training under ambiguous or restrictive license terms. If you build a commercial product on those weights, your legal standing is unclear.
SekoKuva Mobile 423K solves this. Every component has a clear, permissive license:
| Component | Source | License |
|---|---|---|
| Architecture | Original design by BC Bertenex Oy | Apache 2.0 |
| Training code | Written from scratch | Apache 2.0 |
| Training data | OpenImages V7 (bbox-verified only) | CC BY 4.0 |
| Model weights | Trained from scratch (random init) | Apache 2.0 |
You can use this model in commercial products, modify it, redistribute it, and build upon it — with full legal clarity.
Model Architecture
SekoKuva Mobile 423K uses depthwise separable convolutions — the same core building block as MobileNetV1 — arranged in 5 progressive stages that transform a raw photo into a compact feature vector.
Input: 224×224×3 (RGB photo)
│
├── Stage 1: Conv2d 3→32, stride 2 → 112×112×32
├── Stage 2: DepthwiseSeparable 32→64, s2 → 56×56×64
├── Stage 3: DepthwiseSeparable 64→128→128 → 28×28×128
├── Stage 4: DepthwiseSeparable 128→256→256 → 14×14×256
├── Stage 5: DepthwiseSeparable 256→512, s2 → 7×7×512
│
├── Global Average Pooling → 512
│
└── Output: 512-dimensional feature vector
Architecture Details
| Property | Value |
|---|---|
| Building block | Depthwise Separable Convolution |
| Activation | ReLU6 (quantization-friendly) |
| Normalization | Batch Normalization |
| Pooling | Global Average Pooling |
| Feature dimension | 512 |
| Classifier head | Linear (512 → num_classes), removable |
| Dropout | 0.2 (before classifier only) |
| Total blocks | 7 depthwise separable + 1 standard conv |
| FLOPs (224×224) | ~85M |
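For teaching purposes, the building block is small enough to write out by hand. The sketch below is an illustration of a depthwise separable block with the properties listed in the table above (3×3 depthwise convolution, 1×1 pointwise convolution, Batch Normalization, ReLU6); the actual implementation lives in sekokuva_mobile/model.py and may differ in details.
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            # Depthwise: one 3x3 filter per input channel (groups=in_ch)
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU6(inplace=True),
            # Pointwise: 1x1 convolution mixes channels and sets the output width
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU6(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Example: the Stage 5 block maps a 14×14×256 feature map to 7×7×512
x = torch.randn(1, 256, 14, 14)
print(DepthwiseSeparable(256, 512, stride=2)(x).shape)  # torch.Size([1, 512, 7, 7])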
Parameter Distribution
| Stage | Output Shape | Parameters |
|---|---|---|
| First Conv (3→32) | 112×112×32 | 896 |
| Stage 2 (32→64) | 56×56×64 | 2,400 |
| Stage 3 (64→128→128) | 28×28×128 | 26,432 |
| Stage 4 (128→256→256) | 14×14×256 | 101,376 |
| Stage 5 (256→512) | 7×7×512 | 267,264 |
| Backbone total | 512-dim | 398,368 |
| Classifier (65 classes) | 65 | 33,345 |
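The counts above can be cross-checked directly from a loaded checkpoint. A minimal sketch, assuming the SekoKuvaMobile class and the checkpoint used in the Quick Start section below:
import torch
from sekokuva_mobile.model import SekoKuvaMobile

model = SekoKuvaMobile(num_classes=65)
state = torch.load("checkpoints/best.pt", map_location="cpu")
model.load_state_dict(state["model_state_dict"])

# Count every trainable parameter (backbone + classifier head)
total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total:,}")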
Training Details
Data
Trained on a curated subset of OpenImages V7 with the following key properties:
- 65 diverse categories spanning fruits, vegetables, animals, people, vehicles, household objects, plants, and nature
- Only bounding-box verified images — every training image has a human-drawn bounding box confirming the object's presence (no machine-generated labels, no ambiguity)
- Cropped to object region with 20% context padding — ensures every image actually shows the labeled object
- 52,898 training images + 9,478 validation images
- Minimum bounding box size: 50×50 pixels (filters annotation errors)
Categories
Click to expand full class list (65 classes)
airplane · apple_fruit · backpack · ball · banana · bicycle · bird · boat · book · bottle · bowl · boy · bread · broccoli · bus · butterfly · cabbage · cake · car · carrot · cat · cattle · chair · chicken · coconut · cookie · dog · elephant · fish · flower · frog · girl · goat · grape · hat · horse · houseplant · knife · laptop · lemon · man · mango · mobile_phone · motorcycle · mushroom · orange_fruit · palm_tree · peach · pear · pen · person · pineapple · plate · potato · rose · sheep · strawberry · table · tomato · tortoise · tree · truck · umbrella · watermelon · woman
Training Configuration
| Setting | Value |
|---|---|
| Optimizer | SGD (momentum=0.9, weight decay=1e-4) |
| Learning rate | 0.05 → cosine annealing → 1e-6 |
| Batch size | 128 |
| Epochs | 200 |
| Label smoothing | 0.1 |
| Mixed precision | FP16 (AMP) |
| Augmentation | RandomResizedCrop, HorizontalFlip, Rotation(15°), ColorJitter, CutMix + MixUp |
| Class balancing | WeightedRandomSampler (inversely proportional to class size) |
| Progressive resolution | 112px (ep 0–60) → 160px (ep 60–120) → 224px (ep 120–200) |
| SWA | Stochastic Weight Averaging in final 25% of training |
| Hardware | NVIDIA GeForce RTX 4050 Laptop GPU (6 GB VRAM) |
| Training time | ~7 hours total (200 epochs) |
Training Techniques
All techniques are implemented in the open-source training script (train.py) and enabled by default:
- Automatic Mixed Precision (AMP): ~2× speedup on GPUs with Tensor Cores
- CutMix + MixUp: Advanced augmentation that creates mixed training samples, improving regularization (+2–5% accuracy)
- Class-Balanced Sampling: WeightedRandomSampler ensures underrepresented classes (e.g., Cabbage: 368 images) receive the same training exposure as larger classes (e.g., Car: 862 images); a minimal sampling sketch appears after this list
- Progressive Resolution: Training starts at 112×112 and gradually increases to 224×224, allowing early epochs to run faster while later epochs refine fine details
- Stochastic Weight Averaging (SWA): Averages model weights across the final 25% of training epochs, finding a flatter minimum that generalizes better (+1–2% accuracy)
- Gradient Accumulation: Configurable effective batch size without additional VRAM
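As a concrete illustration of the class-balanced sampling item above, here is the standard PyTorch pattern the description implies: per-sample weights set to the inverse of each class's size, fed to a WeightedRandomSampler. This is a sketch assuming the dataset layout shown in the Fine-Tuning Guide and the data directory from the Reproduce section, not a copy of train.py.
from collections import Counter
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, transforms

# Augmentations omitted for brevity; train.py applies the full pipeline listed above
train_set = datasets.ImageFolder("data/openimages_clean/train",
                                 transform=transforms.ToTensor())

# Weight each sample by 1 / (size of its class) so small classes are drawn as often as large ones
class_counts = Counter(train_set.targets)
sample_weights = [1.0 / class_counts[label] for label in train_set.targets]

sampler = WeightedRandomSampler(sample_weights, num_samples=len(sample_weights), replacement=True)
loader = DataLoader(train_set, batch_size=128, sampler=sampler, num_workers=4)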
Performance
Classification Accuracy
| Metric | Value |
|---|---|
| Top-1 Accuracy (65 classes) | 67.9% |
| Top-5 Accuracy (65 classes) | ~88% |
| Random baseline (65 classes) | 1.5% |
Context: Model Size vs. Accuracy
| Model | Params | Pre-trained on ImageNet? | Expected Accuracy (65 classes) |
|---|---|---|---|
| Random guess | — | — | 1.5% |
| SekoKuva Mobile 423K | 423K | No (trained from scratch) | 67.9% |
| MobileNetV2 | 3.4M | No | ~75% |
| MobileNetV2 | 3.4M | Yes | ~85% |
SekoKuva Mobile 423K achieves competitive accuracy with 8× fewer parameters than MobileNetV2, and with the critical advantage of a fully clean license chain. The model is not designed to compete on raw accuracy; it is designed to provide the best possible feature quality at the smallest possible size for on-device transfer learning.
Inference Speed
| Platform | Latency (224×224, single image) |
|---|---|
| NVIDIA RTX 4050 (FP16) | < 1 ms |
| NVIDIA RTX 4050 (FP32) | ~2 ms |
| Snapdragon 8 Gen 2 (TFLite, int8) | ~5 ms |
| Mid-range Android (TFLite, int8) | ~15 ms |
| Low-end Android (TFLite, int8) | ~30 ms |
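These latency figures depend on hardware, runtime, and thread settings, so treat them as indicative. To get a rough number on your own machine, a minimal timing loop with ONNX Runtime (assuming the exported classifier model described in Quick Start):
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("exported/sekokuva_mobile_classifier.onnx")
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Warm up, then average over repeated single-image runs
for _ in range(10):
    session.run(None, {"input_image": x})
start = time.perf_counter()
runs = 100
for _ in range(runs):
    session.run(None, {"input_image": x})
print(f"Average latency: {(time.perf_counter() - start) / runs * 1000:.2f} ms")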
Quick Start
Installation
pip install torch torchvision onnxruntime
Classification (PyTorch)
import torch
from PIL import Image
from torchvision import transforms
# Load model
from sekokuva_mobile.model import SekoKuvaMobile
checkpoint = torch.load("checkpoints/best.pt", map_location="cpu")
model = SekoKuvaMobile(num_classes=65)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
# Preprocess image (same as validation transforms)
transform = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
])
img = Image.open("photo.jpg").convert("RGB")
input_tensor = transform(img).unsqueeze(0) # [1, 3, 224, 224]
# Classify
with torch.no_grad():
    logits = model(input_tensor)
    probs = torch.softmax(logits, dim=1)
    top5 = torch.topk(probs, 5)
# Load class names
import json
with open("checkpoints/class_names.json") as f:
    class_names = json.load(f)
for prob, idx in zip(top5.values[0], top5.indices[0]):
    print(f" {class_names[idx.item()]:<20s} {prob.item():.1%}")
Feature Extraction (PyTorch)
# Extract 512-dim features for transfer learning
with torch.no_grad():
    features = model.forward_features(input_tensor)  # [1, 512]
print(f"Feature vector: {features.shape}")  # torch.Size([1, 512])
ONNX Inference
import numpy as np
import onnxruntime as ort
# Load ONNX model
session = ort.InferenceSession("exported/sekokuva_mobile_classifier.onnx")
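# The input_array below can be produced with the same validation transform as in the
# PyTorch example above (Resize 256, CenterCrop 224, ImageNet normalization).
# A sketch assuming the `transform` and `img` objects from that snippet:
input_array = transform(img).unsqueeze(0).numpy()  # float32, shape [1, 3, 224, 224]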
# Run inference (input_array: numpy float32 [1, 3, 224, 224])
result = session.run(None, {"input_image": input_array})
logits = result[0] # [1, 65] for classifier, [1, 512] for features
Feature Extraction (ONNX)
# Use the features ONNX model for transfer learning pipelines
session = ort.InferenceSession("exported/sekokuva_mobile_features.onnx")
result = session.run(None, {"input_image": input_array})
features = result[0] # [1, 512]
Transfer Learning — The Key Feature
This is the primary use case. SekoKuva Mobile 423K is designed as a frozen feature extractor that enables on-device transfer learning with minimal data.
How It Works
- The model converts any photo into a 512-dimensional feature vector — a compact numerical "fingerprint" that describes the visual content
- A user trains a tiny linear layer on top (512 × num_classes parameters) using just 5–10 images per class
- Training happens on-device in under a second — no GPU, no server, no cloud
Example: Custom 3-Class Classifier
import torch
import torch.nn as nn
# 1. Freeze the feature extractor
feature_model = SekoKuvaMobile(num_classes=0) # Feature-only mode
feature_model.load_state_dict(checkpoint["model_state_dict"], strict=False)
feature_model.eval()
# 2. Collect features from user's photos (e.g., 10 photos × 3 classes)
features = [] # list of [512] tensors
labels = [] # list of class indices (0, 1, 2)
# user_training_data: iterable of (image_path, class_index) pairs provided by the user
for img_path, label in user_training_data:
    img = transform(Image.open(img_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = feature_model.forward_features(img)  # [1, 512]
    features.append(feat.squeeze())
    labels.append(label)
X = torch.stack(features) # [30, 512]
y = torch.tensor(labels) # [30]
# 3. Train a tiny classifier (512 × 3 weights + 3 biases = 1,539 parameters)
classifier = nn.Linear(512, 3)
optimizer = torch.optim.Adam(classifier.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
for epoch in range(100):  # Takes < 1 second total
    logits = classifier(X)
    loss = criterion(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# 4. Classify a new image
new_img = transform(Image.open("new_photo.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    feat = feature_model.forward_features(new_img)
    prediction = classifier(feat)
class_idx = prediction.argmax().item()
print(f"Predicted class: {class_idx}")
Transfer Learning Performance
The quality of transfer learning depends on the feature vector quality, not the top-1 classification accuracy. With SekoKuva Mobile 423K features:
| Task | Images per class | Expected accuracy |
|---|---|---|
| Binary classification (e.g., healthy vs. sick leaf) | 10 | 85–95% |
| 3-class classification | 10 | 80–90% |
| 5-class classification | 15 | 75–85% |
| 10-class classification | 20 | 70–80% |
These estimates assume the classes are visually distinct. Performance may vary for very similar classes.
Available Model Files
| File | Format | Size | Description |
|---|---|---|---|
| checkpoints/best.pt | PyTorch | ~2 MB | Full model checkpoint (classifier + backbone) |
| checkpoints/swa.pt | PyTorch | ~2 MB | SWA-averaged model (best generalization) |
| exported/sekokuva_mobile_classifier.onnx | ONNX | ~1.7 MB | Full 65-class classifier |
| exported/sekokuva_mobile_features.onnx | ONNX | ~1.6 MB | Feature extractor only (512-dim output) |
| checkpoints/class_names.json | JSON | 1 KB | Ordered list of 65 class names |
Which File Should I Use?
- Building a mobile app with on-device learning? → sekokuva_mobile_features.onnx
- Quick image classification demo? → sekokuva_mobile_classifier.onnx
- Fine-tuning on your own dataset? → checkpoints/best.pt (PyTorch)
- Research or architecture exploration? → checkpoints/best.pt + source code
Fine-Tuning Guide
You can fine-tune the entire model on your own dataset. This is different from transfer learning — fine-tuning updates all weights, while transfer learning only trains a new head on frozen features.
When to Fine-Tune vs. Transfer Learn
| Approach | Best for | Data needed | Compute needed |
|---|---|---|---|
| Transfer learning (frozen features) | Quick, few-shot tasks on phone | 5–20 per class | CPU, < 1 second |
| Full fine-tuning | Specialized domains (medical, industrial) | 100+ per class | GPU, minutes–hours |
Fine-Tuning Example
# Replace the classifier head and train on your data
python train.py \
--data_dir /path/to/your/dataset \
--epochs 50 \
--batch_size 128 \
--lr 0.01 \
--resume checkpoints/best.pt
The training script automatically:
- Detects the number of classes from your dataset folder structure
- Replaces the classifier head if class count differs
- Applies all training enhancements (AMP, CutMix, SWA, etc.)
Dataset Format
your_dataset/
├── train/
│ ├── class_a/
│ │ ├── img001.jpg
│ │ └── img002.jpg
│ └── class_b/
│ ├── img001.jpg
│ └── img002.jpg
└── val/
├── class_a/
│ └── img001.jpg
└── class_b/
└── img001.jpg
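This layout is the standard torchvision ImageFolder convention (one sub-folder per class, with the same class names under train/ and val/). A quick sanity check that your classes are detected as expected, assuming torchvision is installed:
from torchvision import datasets

train_set = datasets.ImageFolder("your_dataset/train")
print(len(train_set), "training images")
print(train_set.classes)  # ['class_a', 'class_b', ...]; this order defines the label indices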
Intended Use
Primary Use Cases
- Mobile image classification: Deploy as a lightweight classifier in Android/iOS/Flutter apps that runs fast even on low-end devices
- On-device transfer learning: Use as a frozen feature extractor so end users can build custom classifiers with just a few photos — no server, no cloud
- Educational tool: Teach students how neural networks, feature extraction, and transfer learning work through hands-on experimentation
- Prototyping: Rapidly test image classification ideas before scaling to larger models
- Clean-license foundation: Build commercial products with full legal clarity — no inherited license ambiguity
Out of Scope
- Text recognition / OCR: The model processes whole-image features, not localized text
- Object detection: The model classifies entire images, not bounding boxes within images
- High-accuracy production classifier: For applications requiring >90% accuracy, consider larger models or the upcoming SekoKuva Mobile 5M
- Video processing: Designed for single-frame classification
Limitations
- 65-class vocabulary: The classifier head recognizes 65 categories. The feature extractor generalizes beyond these, but performance on very dissimilar domains (e.g., medical imaging, satellite imagery) may be limited.
- Small model capacity: With 423K parameters, the model cannot learn as many fine-grained distinctions as larger models. It trades accuracy for size and speed.
- No pre-training on ImageNet: The model was trained from scratch on ~53K images. Models pre-trained on ImageNet's 1.2M images will have richer feature representations.
- Resolution: Fixed 224×224 input. Very small objects or fine details may not be captured.
Ethical Considerations
- Training data: All training data comes from OpenImages V7, which is publicly available under CC BY 4.0. Images were selected using human-verified bounding box annotations to minimize label noise.
- Bias: The training categories reflect a curated subset of OpenImages chosen for broad everyday-object diversity (fruits, vegetables, animals, vehicles, household items). The model may perform unevenly across underrepresented visual domains.
- Privacy: No personal data was used beyond what is publicly available in OpenImages V7. The model does not store, transmit, or identify personal information.
- Environmental impact: Total training compute was approximately 7 GPU-hours on a laptop GPU — orders of magnitude less than large-scale model training.
About
BC Bertenex Oy
BC Bertenex Oy is a Finnish startup based in Eurajoki, Finland. We build AI-driven solutions for small businesses, design AI models for a range of use cases, develop our own AI-based products, and create educational content about AI.
The SekoKuva Project
SekoKuva (from the Finnish seko kuva — "messed-up image", inspired by the noisy initial state in diffusion models) is a media generation and AI model project started in July 2025. SekoKuva is a Finnish trademark owned by BC Bertenex Oy. Under the SekoKuva brand, we develop consumer-level AI products and open-source AI models — including this family of lightweight vision models designed for mobile deployment and on-device learning.
Roadmap
| Model | Parameters | Status | Description |
|---|---|---|---|
| SekoKuva Mobile 423K | 423K | ✅ Released | Feature extractor for transfer learning |
| SekoKuva Mobile 5M | ~5M | 🔨 In development | Larger model with InvertedResidual blocks, multi-head classification |
Reproduce From Scratch
The entire training pipeline is open source. To reproduce this model:
# 1. Clone the repository
git clone https://github.com/BCBertenex/SekoKuva Mobile.git
cd SekoKuva Mobile
# 2. Install dependencies
pip install torch torchvision onnx onnxruntime fiftyone numpy pillow tqdm
# 3. Download training data (bounding-box verified, CC BY 4.0)
python prepare_data_clean.py --download --preset diverse --max-per-class 1000
# 4. Train (all enhancements enabled by default)
python train.py --data_dir ./data/openimages_clean --epochs 200 --batch_size 128 --num_workers 4
# 5. Export to ONNX
python export_tflite.py --checkpoint checkpoints/best.pt --mode features
python export_tflite.py --checkpoint checkpoints/best.pt --mode classifier
Citation
@misc{sekokuva2026mobile423k,
title = {SekoKuva Mobile 423K: A Lightweight Open-Source Feature Extractor for On-Device Transfer Learning},
author = {{BC Bertenex Oy}},
year = {2026},
url = {https://huggingface.co/BCBertenex/sekokuva-mobile-423k},
note = {Apache 2.0 License. Trained on OpenImages V7 (CC BY 4.0).}
}
License
This model is released under the Apache 2.0 License.
The training data (OpenImages V7) is licensed under Creative Commons Attribution 4.0 (CC BY 4.0).
You are free to use this model for any purpose — commercial, academic, or personal — with attribution to BC Bertenex Oy.