SekoKuva Mobile 423K

A lightweight, 100% open-source image feature extractor designed for on-device transfer learning

🇫🇮 Designed and trained in Finland by BC Bertenex Oy
📱 Built for mobile deployment — runs on any Android/iOS device
🎓 Ideal for education, prototyping, and production mobile apps
⚖️ 100% clean license chain — architecture, data, and weights all openly licensed

| Property | Value |
|---|---|
| Developer | BC Bertenex Oy (Finland) |
| Trademark | SekoKuva® |
| Model Type | Convolutional Neural Network (Image Classification / Feature Extraction) |
| Parameters | 423,020 (0.42M) |
| Model Size | 1.6 MB (float32) · 0.4 MB (int8 quantized) |
| Feature Dimension | 512 |
| Input Resolution | 224 × 224 × 3 (RGB) |
| Training Data | OpenImages V7, bounding-box verified subset (CC BY 4.0) |
| Training Classes | 65 diverse categories |
| License | Apache 2.0 |

Highlights

  • Ultra-Lightweight: Only 423K parameters — 8× smaller than MobileNetV2, runs on the cheapest smartphones
  • 512-Dim Feature Extractor: Outputs a compact, powerful feature vector that enables on-device transfer learning with as few as 5–10 images per class
  • 100% Open License Chain: Own architecture → own training code → CC BY 4.0 training data → own weights. No legal grey areas. No inherited license restrictions. Every component is transparent and traceable.
  • Designed in Finland: Built by BC Bertenex Oy, a Finnish company, to European standards of transparency and data responsibility
  • Education-Friendly Design: Clear, well-documented architecture suitable for teaching how neural networks work — from first-year students to professionals exploring on-device AI
  • Production Ready: Exported to ONNX (and TFLite), suitable for embedding in Android/iOS/Flutter apps via standard inference runtimes

Why This Model Exists

Most pre-trained mobile models (MobileNetV2, EfficientNet-Lite) carry weights derived from ImageNet training under ambiguous or restrictive license terms. If you build a commercial product on those weights, your legal standing is unclear.

SekoKuva Mobile 423K solves this. Every component has a clear, permissive license:

| Component | Source | License |
|---|---|---|
| Architecture | Original design by BC Bertenex Oy | Apache 2.0 |
| Training code | Written from scratch | Apache 2.0 |
| Training data | OpenImages V7 (bbox-verified only) | CC BY 4.0 |
| Model weights | Trained from scratch (random init) | Apache 2.0 |

You can use this model in commercial products, modify it, redistribute it, and build upon it — with full legal clarity.


Model Architecture

SekoKuva Mobile 423K uses depthwise separable convolutions — the same core building block as MobileNetV1 — arranged in 5 progressive stages that transform a raw photo into a compact feature vector.

Input: 224×224×3 (RGB photo)
    │
    ├── Stage 1: Conv2d 3→32, stride 2          → 112×112×32
    ├── Stage 2: DepthwiseSeparable 32→64, s2    → 56×56×64
    ├── Stage 3: DepthwiseSeparable 64→128→128   → 28×28×128
    ├── Stage 4: DepthwiseSeparable 128→256→256  → 14×14×256
    ├── Stage 5: DepthwiseSeparable 256→512, s2  → 7×7×512
    │
    ├── Global Average Pooling                   → 512
    │
    └── Output: 512-dimensional feature vector
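
If you are new to this building block, the sketch below is a minimal PyTorch version of one depthwise separable stage (a depthwise 3×3 convolution followed by a pointwise 1×1 convolution, each with BatchNorm and ReLU6). It mirrors the properties listed in this section but is only illustrative; the authoritative definitions are in the open-source model code.

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Illustrative sketch of the block described above, not the exact SekoKuva module."""
    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride,
                      padding=1, groups=in_channels, bias=False),  # one 3x3 filter per channel
            nn.BatchNorm2d(in_channels),
            nn.ReLU6(inplace=True),
        )
        self.pointwise = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),  # 1x1 channel mixing
            nn.BatchNorm2d(out_channels),
            nn.ReLU6(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# Stage 2 shape check: halve the resolution, double the channels
block = DepthwiseSeparableConv(32, 64, stride=2)
print(block(torch.randn(1, 32, 112, 112)).shape)  # torch.Size([1, 64, 56, 56])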

Architecture Details

| Property | Value |
|---|---|
| Building block | Depthwise Separable Convolution |
| Activation | ReLU6 (quantization-friendly) |
| Normalization | Batch Normalization |
| Pooling | Global Average Pooling |
| Feature dimension | 512 |
| Classifier head | Linear (512 → num_classes), removable |
| Dropout | 0.2 (before classifier only) |
| Total blocks | 7 depthwise separable + 1 standard conv |
| FLOPs (224×224) | ~85M |

Parameter Distribution

| Stage | Output Shape | Parameters |
|---|---|---|
| First Conv (3→32) | 112×112×32 | 896 |
| Stage 2 (32→64) | 56×56×64 | 2,400 |
| Stage 3 (64→128→128) | 28×28×128 | 26,432 |
| Stage 4 (128→256→256) | 14×14×256 | 101,376 |
| Stage 5 (256→512) | 7×7×512 | 267,264 |
| Backbone total | 512-dim | 398,368 |
| Classifier (65 classes) | 65 | 33,345 |
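
The counts above can be sanity-checked directly from the model. The snippet below assumes the sekokuva_mobile.model package layout used in the Quick Start section further down; the per-module names printed depend on how the actual model code is structured.

import torch
from sekokuva_mobile.model import SekoKuvaMobile  # package path assumed, as in Quick Start

model = SekoKuvaMobile(num_classes=65)

# Per-module and total parameter counts
for name, module in model.named_children():
    count = sum(p.numel() for p in module.parameters())
    print(f"{name:<25s} {count:>10,}")
print(f"{'total':<25s} {sum(p.numel() for p in model.parameters()):>10,}")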

Training Details

Data

Trained on a curated subset of OpenImages V7 with the following key properties:

  • 65 diverse categories spanning fruits, vegetables, animals, people, vehicles, household objects, plants, and nature
  • Only bounding-box verified images — every training image has a human-drawn bounding box confirming the object's presence (no machine-generated labels, no ambiguity)
  • Cropped to object region with 20% context padding — ensures every image actually shows the labeled object (see the sketch after this list)
  • 52,898 training images + 9,478 validation images
  • Minimum bounding box size: 50×50 pixels (filters annotation errors)
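
For illustration, the crop-with-padding step mentioned above could look like the sketch below. The actual implementation lives in prepare_data_clean.py; the symmetric 20% padding used here is one plausible reading of the description.

from PIL import Image

def crop_with_context(img: Image.Image, box: tuple, padding: float = 0.20) -> Image.Image:
    """Crop a (x_min, y_min, x_max, y_max) box, expanded by `padding` on each side
    and clamped to the image bounds. Illustrative sketch only."""
    x_min, y_min, x_max, y_max = box
    pad_x = int((x_max - x_min) * padding)
    pad_y = int((y_max - y_min) * padding)
    left = max(0, x_min - pad_x)
    top = max(0, y_min - pad_y)
    right = min(img.width, x_max + pad_x)
    bottom = min(img.height, y_max + pad_y)
    return img.crop((left, top, right, bottom))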

Categories

Full class list (65 classes):

airplane · apple_fruit · backpack · ball · banana · bicycle · bird · boat · book · bottle · bowl · boy · bread · broccoli · bus · butterfly · cabbage · cake · car · carrot · cat · cattle · chair · chicken · coconut · cookie · dog · elephant · fish · flower · frog · girl · goat · grape · hat · horse · houseplant · knife · laptop · lemon · man · mango · mobile_phone · motorcycle · mushroom · orange_fruit · palm_tree · peach · pear · pen · person · pineapple · plate · potato · rose · sheep · strawberry · table · tomato · tortoise · tree · truck · umbrella · watermelon · woman

Training Configuration

| Setting | Value |
|---|---|
| Optimizer | SGD (momentum=0.9, weight decay=1e-4) |
| Learning rate | 0.05 → cosine annealing → 1e-6 |
| Batch size | 128 |
| Epochs | 200 |
| Label smoothing | 0.1 |
| Mixed precision | FP16 (AMP) |
| Augmentation | RandomResizedCrop, HorizontalFlip, Rotation(15°), ColorJitter, CutMix + MixUp |
| Class balancing | WeightedRandomSampler (inversely proportional to class size) |
| Progressive resolution | 112px (ep 0–60) → 160px (ep 60–120) → 224px (ep 120–200) |
| SWA | Stochastic Weight Averaging in final 25% of training |
| Hardware | NVIDIA GeForce RTX 4050 Laptop GPU (6 GB VRAM) |
| Training time | ~7 hours total (200 epochs) |
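
In standard PyTorch, the configuration above corresponds roughly to the following setup. This is a sketch of the listed settings, not an excerpt from train.py.

import torch
import torch.nn as nn
from sekokuva_mobile.model import SekoKuvaMobile  # package path assumed, as in Quick Start

model = SekoKuvaMobile(num_classes=65)
epochs = 200

# SGD with momentum 0.9 and weight decay 1e-4
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=1e-4)
# Cosine annealing from 0.05 down to 1e-6 over the full run
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs, eta_min=1e-6)
# Cross-entropy with label smoothing 0.1
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
# Gradient scaler for FP16 mixed-precision (AMP) training
scaler = torch.cuda.amp.GradScaler()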

Training Techniques

All techniques are implemented in the open-source training script (train.py) and enabled by default:

  1. Automatic Mixed Precision (AMP): ~2× speedup on GPUs with Tensor Cores
  2. CutMix + MixUp: Advanced augmentation that creates mixed training samples, improving regularization (+2–5% accuracy)
  3. Class-Balanced Sampling: WeightedRandomSampler ensures underrepresented classes (e.g., Cabbage: 368 images) are sampled as often as larger classes (e.g., Car: 862 images); see the sketch after this list
  4. Progressive Resolution: Training starts at 112×112 and gradually increases to 224×224, allowing early epochs to run faster while later epochs refine fine details
  5. Stochastic Weight Averaging (SWA): Averages model weights across the final 25% of training epochs, finding a flatter minimum that generalizes better (+1–2% accuracy)
  6. Gradient Accumulation: Configurable effective batch size without additional VRAM
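
As an example of item 3, a class-balanced loader can be built with torch.utils.data.WeightedRandomSampler. This is a minimal sketch over an ImageFolder-style dataset with illustrative paths, not the exact code from train.py.

from collections import Counter

from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, transforms

# Assumes the ImageFolder-style layout described in the Dataset Format section below (paths illustrative)
dataset = datasets.ImageFolder("data/openimages_clean/train", transform=transforms.ToTensor())

# Weight each sample inversely to the size of its class, so rare classes
# (e.g., Cabbage) are drawn roughly as often as common ones (e.g., Car).
class_counts = Counter(dataset.targets)
sample_weights = [1.0 / class_counts[label] for label in dataset.targets]

sampler = WeightedRandomSampler(sample_weights, num_samples=len(sample_weights), replacement=True)
loader = DataLoader(dataset, batch_size=128, sampler=sampler)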

Performance

Classification Accuracy

| Metric | Value |
|---|---|
| Top-1 Accuracy (65 classes) | 67.9% |
| Top-5 Accuracy (65 classes) | ~88% |
| Random baseline (65 classes) | 1.5% |

Context: Model Size vs. Accuracy

| Model | Params | Pre-trained on ImageNet? | Expected Accuracy (65 classes) |
|---|---|---|---|
| Random guess |  |  | 1.5% |
| SekoKuva Mobile 423K | 423K | No (trained from scratch) | 67.9% |
| MobileNetV2 | 3.4M | No | ~75% |
| MobileNetV2 | 3.4M | Yes | ~85% |

SekoKuva Mobile 423K achieves competitive accuracy with 8× fewer parameters than MobileNetV2, while offering the critical advantage of a fully clean license chain. The model is not designed to compete on raw accuracy — it is designed to provide the best possible feature quality at the smallest possible size for on-device transfer learning.

Inference Speed

| Platform | Latency (224×224, single image) |
|---|---|
| NVIDIA RTX 4050 (FP16) | < 1 ms |
| NVIDIA RTX 4050 (FP32) | ~2 ms |
| Snapdragon 8 Gen 2 (TFLite, int8) | ~5 ms |
| Mid-range Android (TFLite, int8) | ~15 ms |
| Low-end Android (TFLite, int8) | ~30 ms |
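
The mobile latencies above assume int8 quantization (via TFLite in the table). As a generic illustration, and not necessarily how the released files were produced, an int8-weight ONNX variant can be generated with onnxruntime's dynamic quantization; for the best CNN latency, static (calibrated) quantization or the TFLite converter is usually the better route.

from onnxruntime.quantization import QuantType, quantize_dynamic

# Output filename is hypothetical; adjust paths to your own export locations.
quantize_dynamic(
    "exported/sekokuva_mobile_classifier.onnx",
    "exported/sekokuva_mobile_classifier_int8.onnx",
    weight_type=QuantType.QInt8,
)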

Quick Start

Installation

pip install torch torchvision onnxruntime

Classification (PyTorch)

import torch
from PIL import Image
from torchvision import transforms

# Load model
from sekokuva_mobile.model import SekoKuvaMobile

checkpoint = torch.load("checkpoints/best.pt", map_location="cpu")
model = SekoKuvaMobile(num_classes=65)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Preprocess image (same as validation transforms)
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("photo.jpg").convert("RGB")
input_tensor = transform(img).unsqueeze(0)  # [1, 3, 224, 224]

# Classify
with torch.no_grad():
    logits = model(input_tensor)
    probs = torch.softmax(logits, dim=1)
    top5 = torch.topk(probs, 5)

# Load class names
import json
with open("checkpoints/class_names.json") as f:
    class_names = json.load(f)

for prob, idx in zip(top5.values[0], top5.indices[0]):
    print(f"  {class_names[idx]:<20s} {prob:.1%}")

Feature Extraction (PyTorch)

# Extract 512-dim features for transfer learning
with torch.no_grad():
    features = model.forward_features(input_tensor)  # [1, 512]

print(f"Feature vector: {features.shape}")  # torch.Size([1, 512])

ONNX Inference

import numpy as np
import onnxruntime as ort

# Load ONNX model
session = ort.InferenceSession("exported/sekokuva_mobile_classifier.onnx")

# Run inference (input_array: numpy float32 [1, 3, 224, 224])
result = session.run(None, {"input_image": input_array})
logits = result[0]  # [1, 65] for classifier, [1, 512] for features
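
The input_array above is assumed to be pre-processed already. A minimal NumPy/PIL sketch that mirrors the PyTorch validation transforms (resize the shorter side to 256, center-crop to 224, ImageNet normalization) could look like this:

import numpy as np
from PIL import Image

def preprocess(path: str) -> np.ndarray:
    """Return a float32 [1, 3, 224, 224] array matching the validation transforms above."""
    img = Image.open(path).convert("RGB")
    # Resize the shorter side to 256, keeping the aspect ratio
    scale = 256 / min(img.size)
    img = img.resize((round(img.width * scale), round(img.height * scale)), Image.BILINEAR)
    # Center-crop to 224x224
    left = (img.width - 224) // 2
    top = (img.height - 224) // 2
    img = img.crop((left, top, left + 224, top + 224))
    # Scale to [0, 1], apply ImageNet mean/std, and reorder to NCHW
    arr = np.asarray(img, dtype=np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    arr = (arr - mean) / std
    return arr.transpose(2, 0, 1)[np.newaxis, ...].astype(np.float32)

input_array = preprocess("photo.jpg")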

Feature Extraction (ONNX)

# Use the features ONNX model for transfer learning pipelines
session = ort.InferenceSession("exported/sekokuva_mobile_features.onnx")
result = session.run(None, {"input_image": input_array})
features = result[0]  # [1, 512]

Transfer Learning — The Key Feature

This is the primary use case. SekoKuva Mobile 423K is designed as a frozen feature extractor that enables on-device transfer learning with minimal data.

How It Works

  1. The model converts any photo into a 512-dimensional feature vector — a compact numerical "fingerprint" that describes the visual content
  2. A user trains a tiny linear layer on top (512 × num_classes parameters) using just 5–10 images per class
  3. Training happens on-device in under a second — no GPU, no server, no cloud

Example: Custom 3-Class Classifier

import torch
import torch.nn as nn

# 1. Freeze the feature extractor
feature_model = SekoKuvaMobile(num_classes=0)  # Feature-only mode
feature_model.load_state_dict(checkpoint["model_state_dict"], strict=False)
feature_model.eval()

# 2. Collect features from user's photos (e.g., 10 photos × 3 classes)
features = []  # list of [512] tensors
labels = []    # list of class indices (0, 1, 2)

for img_path, label in user_training_data:
    img = transform(Image.open(img_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = feature_model.forward_features(img)  # [1, 512]
    features.append(feat.squeeze())
    labels.append(label)

X = torch.stack(features)       # [30, 512]
y = torch.tensor(labels)        # [30]

# 3. Train a tiny classifier (512 × 3 weights + 3 biases = 1,539 parameters)
classifier = nn.Linear(512, 3)
optimizer = torch.optim.Adam(classifier.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

for epoch in range(100):  # Takes < 1 second total
    logits = classifier(X)
    loss = criterion(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# 4. Classify a new image
new_img = transform(Image.open("new_photo.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    feat = feature_model.forward_features(new_img)
    prediction = classifier(feat)
    class_idx = prediction.argmax().item()
    print(f"Predicted class: {class_idx}")

Transfer Learning Performance

The quality of transfer learning depends on the feature vector quality, not the top-1 classification accuracy. With SekoKuva Mobile 423K features:

| Task | Images per class | Expected accuracy |
|---|---|---|
| Binary classification (e.g., healthy vs. sick leaf) | 10 | 85–95% |
| 3-class classification | 10 | 80–90% |
| 5-class classification | 15 | 75–85% |
| 10-class classification | 20 | 70–80% |

These estimates assume the classes are visually distinct. Performance may vary for very similar classes.


Available Model Files

| File | Format | Size | Description |
|---|---|---|---|
| checkpoints/best.pt | PyTorch | ~2 MB | Full model checkpoint (classifier + backbone) |
| checkpoints/swa.pt | PyTorch | ~2 MB | SWA-averaged model (best generalization) |
| exported/sekokuva_mobile_classifier.onnx | ONNX | ~1.7 MB | Full 65-class classifier |
| exported/sekokuva_mobile_features.onnx | ONNX | ~1.6 MB | Feature extractor only (512-dim output) |
| checkpoints/class_names.json | JSON | 1 KB | Ordered list of 65 class names |

Which File Should I Use?

  • Building a mobile app with on-device learning? → sekokuva_mobile_features.onnx
  • Quick image classification demo? → sekokuva_mobile_classifier.onnx
  • Fine-tuning on your own dataset? → checkpoints/best.pt (PyTorch)
  • Research or architecture exploration? → checkpoints/best.pt + source code

Fine-Tuning Guide

You can fine-tune the entire model on your own dataset. This is different from transfer learning — fine-tuning updates all weights, while transfer learning only trains a new head on frozen features.

When to Fine-Tune vs. Transfer Learn

| Approach | Best for | Data needed | Compute needed |
|---|---|---|---|
| Transfer learning (frozen features) | Quick, few-shot tasks on phone | 5–20 per class | CPU, < 1 second |
| Full fine-tuning | Specialized domains (medical, industrial) | 100+ per class | GPU, minutes–hours |

Fine-Tuning Example

# Replace the classifier head and train on your data
python train.py \
    --data_dir /path/to/your/dataset \
    --epochs 50 \
    --batch_size 128 \
    --lr 0.01 \
    --resume checkpoints/best.pt

The training script automatically:

  • Detects the number of classes from your dataset folder structure
  • Replaces the classifier head if class count differs
  • Applies all training enhancements (AMP, CutMix, SWA, etc.)

Dataset Format

your_dataset/
├── train/
│   ├── class_a/
│   │   ├── img001.jpg
│   │   └── img002.jpg
│   └── class_b/
│       ├── img001.jpg
│       └── img002.jpg
└── val/
    ├── class_a/
    │   └── img001.jpg
    └── class_b/
        └── img001.jpg
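
This layout matches the standard torchvision ImageFolder convention, where each sub-directory name becomes a class label, which is presumably what the training script's class detection relies on. A quick sanity check that your dataset is read correctly:

from torchvision import datasets

train_set = datasets.ImageFolder("your_dataset/train")
val_set = datasets.ImageFolder("your_dataset/val")

print(train_set.classes)             # ['class_a', 'class_b']
print(len(train_set), len(val_set))  # images found per split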

Intended Use

Primary Use Cases

  • Mobile image classification: Deploy as a lightweight classifier in Android/iOS/Flutter apps that runs fast even on low-end devices
  • On-device transfer learning: Use as a frozen feature extractor so end users can build custom classifiers with just a few photos — no server, no cloud
  • Educational tool: Teach students how neural networks, feature extraction, and transfer learning work through hands-on experimentation
  • Prototyping: Rapidly test image classification ideas before scaling to larger models
  • Clean-license foundation: Build commercial products with full legal clarity — no inherited license ambiguity

Out of Scope

  • Text recognition / OCR: The model processes whole-image features, not localized text
  • Object detection: The model classifies entire images, not bounding boxes within images
  • High-accuracy production classifier: For applications requiring >90% accuracy, consider larger models or the upcoming SekoKuva Mobile 5M
  • Video processing: Designed for single-frame classification

Limitations

  • 65-class vocabulary: The classifier head recognizes 65 categories. The feature extractor generalizes beyond these, but performance on very dissimilar domains (e.g., medical imaging, satellite imagery) may be limited.
  • Small model capacity: With 423K parameters, the model cannot learn as many fine-grained distinctions as larger models. It trades accuracy for size and speed.
  • No pre-training on ImageNet: The model was trained from scratch on ~53K images. Models pre-trained on ImageNet's 1.2M images will have richer feature representations.
  • Resolution: Fixed 224×224 input. Very small objects or fine details may not be captured.

Ethical Considerations

  • Training data: All training data comes from OpenImages V7, which is publicly available under CC BY 4.0. Images were selected using human-verified bounding box annotations to minimize label noise.
  • Bias: The training categories reflect a curated subset of OpenImages chosen for broad everyday-object diversity (fruits, vegetables, animals, vehicles, household items). The model may perform unevenly across underrepresented visual domains.
  • Privacy: No personal data was used beyond what is publicly available in OpenImages V7. The model does not store, transmit, or identify personal information.
  • Environmental impact: Total training compute was approximately 7 GPU-hours on a laptop GPU — orders of magnitude less than large-scale model training.

About

BC Bertenex Oy

BC Bertenex Oy is a Finnish startup based in Eurajoki, Finland. We build AI-driven solutions for small businesses, design AI models for a range of purposes, develop our own AI-based products, and create educational content about AI.

The SekoKuva Project

SekoKuva (from the Finnish seko kuva — "messed-up image", inspired by the noisy initial state in diffusion models) is a media generation and AI model project started in July 2025. SekoKuva is a Finnish trademark owned by BC Bertenex Oy. Under the SekoKuva brand, we develop consumer-level AI products and open-source AI models — including this family of lightweight vision models designed for mobile deployment and on-device learning.

Roadmap

| Model | Parameters | Status | Description |
|---|---|---|---|
| SekoKuva Mobile 423K | 423K | ✅ Released | Feature extractor for transfer learning |
| SekoKuva Mobile 5M | ~5M | 🔨 In development | Larger model with InvertedResidual blocks, multi-head classification |

Reproduce From Scratch

The entire training pipeline is open source. To reproduce this model:

# 1. Clone the repository
git clone https://github.com/BCBertenex/SekoKuvaMobile.git
cd SekoKuvaMobile

# 2. Install dependencies
pip install torch torchvision onnx onnxruntime fiftyone numpy pillow tqdm

# 3. Download training data (bounding-box verified, CC BY 4.0)
python prepare_data_clean.py --download --preset diverse --max-per-class 1000

# 4. Train (all enhancements enabled by default)
python train.py --data_dir ./data/openimages_clean --epochs 200 --batch_size 128 --num_workers 4

# 5. Export to ONNX
python export_tflite.py --checkpoint checkpoints/best.pt --mode features
python export_tflite.py --checkpoint checkpoints/best.pt --mode classifier

Citation

@misc{sekokuva2026mobile423k,
  title     = {SekoKuva Mobile 423K: A Lightweight Open-Source Feature Extractor for On-Device Transfer Learning},
  author    = {{BC Bertenex Oy}},
  year      = {2026},
  url       = {https://huggingface.co/BCBertenex/sekokuva-mobile-423k},
  note      = {Apache 2.0 License. Trained on OpenImages V7 (CC BY 4.0).}
}

License

This model is released under the Apache 2.0 License.

The training data (OpenImages V7) is licensed under Creative Commons Attribution 4.0 (CC BY 4.0).

You are free to use this model for any purpose — commercial, academic, or personal — with attribution to BC Bertenex Oy.
