wait y not just have agi learn the way we do
Large Language Models (LLMs) are traditionally trained by randomly sampling data blocks from massive datasets. However, this way of learning is probably dumb af.
Humans don’t learn effectively by randomly reviewing information -> LLMs might benefit from a more structured approach too.
You may have heard of duolingo $DUOL. They figured something out - humans learn well with spaced repetition.
They also figured out that we like the gamified dopamine hits from that dingy sound and really fantastic advertising. Unfortunately they haven’t yet learned that pittsburgh is a shithole and cmu (my alma mater) talent alone isn’t enough to prop up their entire company.
The same applies to that flashcard guru in ur 10th grade AP Calc class with her 4000 shades of teal crayons. She understood spaced repetition.
Below is my attempt to teach agi spaced repetition. (you getting it yet?)
Midjourney prompt: learning over time, nonathropromorphic, realistic
Abstract aka tldr
Large Language Model (LLM) pretraining usually just tosses data into the model in a random (often redundant, often easy) order. This is super inefficient, and it also leads to “forgetting” the trickier pieces of knowledge. Inspired by human study hacks like spaced repetition, let’s propose an alternative “Learn–Concentrate–Repeat” approach that relentlessly hones in on tough data blocks whenever the model starts forgetting them. This drastically cuts training time (up to a 20× speed boost), while also improving downstream quality over standard random-sample training. Basically, the model does its version of flipping flashcards until the knowledge actually sticks.
im not gonna give u a history of spaced repetition, or even define what it is. iykyk.
also bc, in the machine learning field – it doesnt fking exist. tho lets place it within the broader idea of curriculum learning, Bengio et al. (2009), which takes lessons from how animals (and kids) usually learn the simple tasks first, then ramp up the difficulty. You might define “easy” by short sentences (fewer tokens) and give easy examples for the first 5 epochs, slightly harder ones for the next 5, etc. That’s basically a linear or stepwise progression, akin to incrementing thru the curriculums of “first grade → second grade → grad school.”
the curriculum learning approaches were basically:
- train on short sentences first, then longer ones
- start with simple shapes, then more complex geometric stuff
- begin with basic tasks, then stack more complex ones on top
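for flavor, the length-based version boils down to something like this (a toy sketch, my own code, assuming a `tokenizer` with an `.encode()` method that returns a token list - not anyone's actual implementation):

```python
# minimal sketch of classic length-based curriculum learning:
# "easy" = fewer tokens, and we unlock progressively longer examples stage by stage.
def curriculum_stages(examples, tokenizer, n_stages=3):
    """Sort examples by token count and return cumulative difficulty stages."""
    by_length = sorted(examples, key=lambda text: len(tokenizer.encode(text)))
    stage_size = len(by_length) // n_stages
    return [by_length[: stage_size * (stage + 1)] for stage in range(n_stages)]

# stage 0 -> shortest third, stage 1 -> shortest two thirds, stage 2 -> everything
```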
some big brains have tried similar-ish stuff:
- self-paced learning (Kumar et al., 2010): lets the model pick its own pace but doesn’t do the review cycles
- competence-based curriculum learning (Platanios et al., 2019): tries to match content difficulty to model competence, but still uses pre-defined difficulty metrics
- anti-curriculum learning (Liu et al. 2022): some madlads tried training on the hardest examples first (like ACCAN for speech recognition). works sometimes but mostly just makes the model cry
but there’s a huge problem with this: who tf decides what’s “easy” and what’s “hard”? in the literature theyre basing it on human annotators or questions that only “highly competent models” could answer. What we’re gonna do is a bit different:
1. dynamic difficulty: we don’t pre-decide what’s hard. the model tells us what it finds difficult through perplexity scores. it’s like when ur studying and ur brain is like “wtf is this” - that’s high perplexity
2. review cycles: traditional curriculum learning is one-way - once u move past the “easy” stuff, u never look back. with spaced repetition, we keep coming back to review shit the model struggled with, just like how u might review flash cards
3. dual-criterion sampling: we’re not just looking at how “hard” something is (perplexity), but also how much the model can learn from it (gradient magnitude). it’s like identifying not just what u find difficult, but also what’s worth spending time on
this is way closer to how humans actually learn. we don’t just go from ez to hard - we circle back, review stuff we’re shaky on, and naturally spend more time on things that are both challenging AND valuable to learn.
time to jump into uncharted waters.
Midjourney prompt: jumping into uncharted waters, in the style of wanderer above a sea of fog
no bs, heres the algorithm
actually thats too easy. its either here or here. 50/50 odds to be devastated.
spaced repetition, explained algorithmically
Split the pretraining process into three main phases (durations $p_1$, $p_2$, $p_3$):
- Learn Phase
  - Train on the entire dataset initially
  - Record perplexities for all data blocks
  - Duration: $p_1$ epochs
- Concentrate Phase
  - Remove $s_1$% of easiest data blocks (lowest perplexity)
  - Train intensively on remaining difficult blocks
  - Duration: $p_2$ epochs
- Repeat Phase
  - Reintroduce all data blocks
  - Allow model to reinforce learning
  - Duration: $p_3$ epochs
tbh u can just keep 🔂 2 & 3 until satisfied, as a human would do..
jumping into the pseudo code for the approach, this is where shit starts to get a bit complicated. if you just want to see graph go brrr, jump to here
Inputs:
- Training dataset $D$
- Model $M$ with initial parameters $\theta_0$
- Hyperparameters $p_1$, $s_1$, $p_2$, $p_3$, and $n_{\text{cycles}}$
Objective: Minimize Negative Log Likelihood $\mathrm{NLL}(x_j)$
Steps:
- $\mathrm{PPLs}, \theta_{p1} \leftarrow \text{Learn}(\theta_0, D, p_1)$
- for $r = 1, 2, \ldots, n_{\text{cycles}}$ do:
- $\text{Sort}(\mathrm{PPLs}, D)$
- $S_1 \leftarrow$ the hardest $(100 - s_1)\%$ of $D$ (by perplexity)
- $\theta_{p2} \leftarrow \text{Concentrate}(\theta_{p1}, S_1, p_2)$
- $\mathrm{PPLs}, \theta_{p3} \leftarrow \text{Repeat}(\theta_{p2}, D, p_3)$
- return $\theta_{p3}$
Where:
- Learn($\theta_0$, $D$, $p_1$): Trains on entire dataset for $p_1$ epochs
- Concentrate($\theta_{p1}$, $S_1$, $p_2$): Trains on selected difficult subset for $p_2$ epochs
- Repeat($\theta_{p2}$, $S_1 \cup$ rest $= D$, $p_3$): Retrains on full dataset for $p_3$ epochs
- $s_1$: Percentage of easiest data blocks dropped in the Concentrate phase (so the hardest $(100 - s_1)$% are retained)
- $n_{\text{cycles}}$: Number of Learn-Concentrate-Repeat cycles
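in plain python, the outer loop is roughly this (a sketch: `train_epochs` and `score_block_ppls` are stand-ins for “train for k epochs on this data” and “compute per-block perplexities”, not real library calls):

```python
def lfr_pretrain(model, dataset, p1, p2, p3, s1, n_cycles, train_epochs, score_block_ppls):
    """Learn-Concentrate-Repeat outer loop (sketch; train_epochs / score_block_ppls are stand-ins)."""
    # Learn: one pass over everything, recording per-block perplexities
    train_epochs(model, dataset, epochs=p1)
    ppls = score_block_ppls(model, dataset)
    for _ in range(n_cycles):
        # Concentrate: drop the easiest s1% (lowest perplexity), hammer the hardest (100 - s1)%
        keep_n = int(len(dataset) * (100 - s1) / 100)
        ranked = sorted(range(len(dataset)), key=lambda i: ppls[i], reverse=True)
        hard_subset = [dataset[i] for i in ranked[:keep_n]]
        train_epochs(model, hard_subset, epochs=p2)
        # Repeat: bring the full dataset back so the easy stuff doesn't get totally forgotten
        train_epochs(model, dataset, epochs=p3)
        ppls = score_block_ppls(model, dataset)
    return model
```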
to understand the above, we need to understand the below.
How do we know what is “hard” to learn?
usually, when taking an exam u crammed for 8 redbulls ago, just before u feel the panic onset, you flip through all the questions and look for which ones are the “hardest”.
lets do the same in ML terms.
Basic Definitions
- A sequence of tokens: $s_i = \{x_{i,1}, x_{i,2}, \ldots, x_{i,n}\}$
- Each data block has length 1024 tokens
- Total number of blocks: $N$
- Current training iteration: $t$
Perplexity-Based Sampling
First, calculate the Negative Log Likelihood (NLL) for each token in sequence $s_i$:
$$\mathrm{NLL}(x_{i,k}) = -\log P(x_{i,k} \mid x_{i,<k}; \theta_t)$$
where:
- $x_{i,k}$ is the $k$-th token in sequence $i$
- $x_{i,<k}$ represents all tokens before $k$ in sequence $i$
- $\theta_t$ are model parameters at iteration $t$
Then calculate perplexity for sequence $s_i$: †
$$\mathrm{PPL}_i(t) = \exp\left(\frac{1}{|s_i|}\sum_{k=1}^{|s_i|} \mathrm{NLL}(x_{i,k})\right)$$
Basic perplexity-based sampling probability: †
$$p_i^{\mathrm{basic}}(t) = \frac{\mathrm{PPL}_i(t)^{\alpha}}{\sum_{j=1}^{N} \mathrm{PPL}_j(t)^{\alpha}}$$
where $\alpha$ is the perplexity importance hyperparameter.
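to make that concrete, here’s a hedged torch sketch (the per-token NLLs would come out of ur forward pass, like `compute_nll` in the appendix):

```python
import torch

def block_perplexity(nll_per_token: torch.Tensor) -> torch.Tensor:
    """PPL_i = exp(mean NLL over the block's tokens)."""
    return torch.exp(nll_per_token.mean())

def basic_sampling_probs(ppls: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """p_i proportional to PPL_i^alpha: higher-perplexity (harder) blocks get sampled more often."""
    weights = ppls ** alpha
    return weights / weights.sum()

# e.g. blocks with PPLs 12, 45, 300 -> the 300-PPL block eats most of the sampling probability
probs = basic_sampling_probs(torch.tensor([12.0, 45.0, 300.0]))
```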
Dual-Criterion Sampling: Incorporating gradient magnitude:
$$g_i(t) = |\nabla_{\theta_t}\ell_i(\theta_t)|$$
where $\ell_i$ is the loss for block $i$
Combined importance function:
$$I_i(t) = \mathrm{PPL}_i(t)^{\alpha} \times g_i(t)^{\beta}$$
where $\beta$ is the gradient importance hyperparameter.
Final sampling probability:
$$p_i^{\text{dual}}(t) = \frac{I_i(t)}{\sum_{j=1}^{N} I_j(t)}$$
The complete sampling probability formula is thus: †
$$p_i^{\mathrm{dual}}(t) = \frac{I_i(t)}{\sum_{j=1}^{N} I_j(t)} = \frac{\mathrm{PPL}_i(t)^\alpha \cdot g_i(t)^{\beta}}{\sum_{j=1}^{N} \mathrm{PPL}_j(t)^\alpha \cdot g_j(t)^{\beta}}$$
🟰
$$p_i^{\mathrm{dual}}(t) = \frac{\left[\exp\left(\frac{1}{|s_i|}\sum_{x_j \in s_i} -\log \mathrm{P}(x_j | x_{<j}; \theta)\right)\right]^\alpha \cdot |\nabla_\theta\ell_i(\theta(t))|^\beta}{\sum_{j=1}^{N} \left[\exp\left(\frac{1}{|s_j|}\sum_{x_k \in s_j} -\log \mathrm{P}(x_k | x_{<k}; \theta)\right)\right]^\alpha \cdot |\nabla_\theta\ell_j(\theta(t))|^\beta}$$
🟰
$$p_i^{\mathrm{dual}}(t) = \frac{\left[\exp\left(\frac{1}{|s_i|}\sum_{x_k \in s_i} -\log \mathrm{P}(x_k \mid x_{<k}; \theta)\right)\right]^\alpha \cdot \left|\nabla_\theta \sum_{x_k \in s_i} -\log \mathrm{P}(x_k \mid x_{<k}; \theta)\right|^\beta}{\sum_{j=1}^{N} \left[\exp\left(\frac{1}{|s_j|}\sum_{x_k \in s_j} -\log \mathrm{P}(x_k \mid x_{<k}; \theta)\right)\right]^\alpha \cdot \left|\nabla_\theta \sum_{x_k \in s_j} -\log \mathrm{P}(x_k \mid x_{<k}; \theta)\right|^\beta}$$
thank god for claude lmao tbt to writing these out by hand in grad school.
(re)Explanation of Parameters
- $\mathrm{PPL}_i(t)$: The current (or moving-average) perplexity for data block $i$. Higher perplexity implies the model finds the block “harder.”
- $|\nabla_\theta \ell_i(\theta(t))|$: The gradient magnitude with respect to parameters $\theta$, reflecting how much training signal block $i$ provides.
- $\alpha$: A nonnegative hyperparameter that determines how strongly perplexity influences sampling.
- $\beta$: A nonnegative hyperparameter that determines how strongly the gradient magnitude influences sampling.
side note, to handle large scale variations, we can introduce normalized versions of both metrics: †
$$\widetilde{\mathrm{PPL}}_i(t) = \frac{\mathrm{PPL}_i(t)}{\max_j \mathrm{PPL}_j(t)}$$
$$\widetilde{g}_i(t) = \frac{|\nabla_\theta\ell_i(\theta(t))|}{\max_j |\nabla_\theta\ell_j(\theta(t))|}$$
The normalized importance function becomes:
$$I_i(t) = \widetilde{\mathrm{PPL}}_i(t)^\alpha \cdot \widetilde{g}_i(t)^\beta$$
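as a sketch in torch (using the normalized version, since raw PPLs and grad norms live on wildly different scales; numbers are hypothetical):

```python
import torch

def dual_criterion_probs(ppls: torch.Tensor, grad_norms: torch.Tensor,
                         alpha: float = 1.0, beta: float = 0.5) -> torch.Tensor:
    """I_i = (PPL_i / max PPL)^alpha * (g_i / max g)^beta, normalized into sampling probabilities."""
    ppl_term = (ppls / ppls.max()) ** alpha               # how "hard" the block is
    grad_term = (grad_norms / grad_norms.max()) ** beta   # how much training signal it gives
    importance = ppl_term * grad_term
    return importance / importance.sum()

# the last block is both hard (high PPL) AND carries a big gradient, so it dominates the distribution
probs = dual_criterion_probs(torch.tensor([10.0, 80.0, 95.0]),
                             torch.tensor([0.4, 0.3, 1.2]))
```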
Methodology
Let’s compare the GPT-2 model, trained with LFR, against the OpenAI baseline from huggingface in a five-shot setting bc i dont have the gpu credits nor patience for llama 420B.
For pretraining the GPT-2 models, you could choose the following combination of the aforementioned steps:
1. Phase 1: Learn for 1 epoch (p1 = 1).
2. Phase 2: Concentrate on 50% of the data for 1 epoch (s1 = 50, p2 = 1).
3. Phase 3: Repeat the entire dataset for another epoch (p3 = 1).
4. Phase 4: Concentrate on 30% of the data for 5 epochs (n_cycles = 2, s2 = 70, p4 = 5), sketched in code below.
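spelled out as a schedule (the tuple layout and names are mine, purely illustrative):

```python
# (phase, fraction of data kept, epochs) - adds up to the 8 total epochs mentioned below
SCHEDULE = [
    ("learn",       1.00, 1),  # p1 = 1: one pass over everything
    ("concentrate", 0.50, 1),  # s1 = 50: keep the hardest 50%
    ("repeat",      1.00, 1),  # p3 = 1: full dataset again
    ("concentrate", 0.30, 5),  # s2 = 70: keep the hardest 30%, drill it for 5 epochs
]
```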
To generate profiling information, u should record the data block ID (remember, each data block consists of 1024 tokens) and model perplexity each time a block is randomly sampled from the training corpus across all training iterations.
If you were to train the GPT2 345M model for 8 epochs, each data block would have at least 8 perplexity values captured at different training iterations. At the end of the training process, u’d have a record of the perplexities over time for each data block. From this record, classify each data block as one of the following:
1. Learned: recorded perplexities monotonically decrease.
2. Unlearned: recorded perplexities monotonically increase.
3. Forgotten: recorded perplexities first decrease and then increase again (the block was learned, then forgotten).
Roughly 25% of data blocks are forgotten at least once during training and of the data blocks that are forgotten, 82% are forgotten multiple times during training, i.e., they display multiple descent behavior (Figure 1).
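in code, bucketing a block’s perplexity trajectory is dead simple (a sketch; treating any non-monotonic history as “forgotten” is my choice):

```python
def classify_block(ppl_history: list) -> str:
    """Bucket a block's recorded perplexities into learned / unlearned / forgotten."""
    diffs = [b - a for a, b in zip(ppl_history, ppl_history[1:])]
    if all(d <= 0 for d in diffs):
        return "learned"      # perplexity only ever goes down
    if all(d >= 0 for d in diffs):
        return "unlearned"    # perplexity only ever goes up
    return "forgotten"        # it dropped, then spiked back up (possibly several times)

classify_block([9.1, 7.3, 6.8, 8.9, 6.2])  # -> "forgotten": learned it, forgot it, relearned it
```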
Results
i know ur monkey brain skipped the math, fk u i worked hard on that
anyway here r some pretty explications:
spaced repetition > gpt o1
| Subject | 1.5B-GPT2 | 1.5B-LFR |
|---|---|---|
| STEM | 24.5% | 26.1% |
| Humanities | 24.8% | 27.2% |
| Social Sciences | 24% | 23.8% |
| Other (business, health, misc.) | 27.8% | 25.1% |
| Average (across 57 subjects) | 25.3% | 25.5% |
Table uno: Accuracy results for the MMLU benchmark for LFR and the OpenAI Baseline for the 1.5B parameter model. Higher is better.
you might be like wtf its not that much better!!?!!?!?!?!?!
ur right, except ur wrong. the key thing here is that the number of training iterations it took LFR to reach “parity” (roughly the same results) dropped from 800k to 40k.
a 20x boost.
and thats with shitty code.
for reference, the improvements in o1’s planning come from noam brown’s work on search/planning. they found you can get a 15x reduction in the number of steps needed – IF – you increase test-time compute by 10x.
here i’m offering an algo that cuts training iterations by 20x with no strings attached (maybe some strings, idk). and again, thats with shitty code.
“spaced repetition is better than o1 search”
How does one know what was forgotten?
well if u read the methodology section and the “how do we know whats ‘hard’” section, which you’re considering doing now, which you won’t actually do, which i’ll have to explain briefly, you’d know that “forgotten” facts (e.g. 9+10=21, actual examples in A.2) are hard bc they are high in perplexity (how uncertain the model is about its output) even after having already been learned.
for the ooga booga brained amongst you, this can be seen visually: the perplexity score (PPL, not BBL) randomly increases at later training epochs (running the same dataset back). aka the model forgot the answer and is uncertain again.
Figure uno: Perplexities of data samples being forgotten by the GPT2-345M model. Spikes mean it forgot the answer and became uncertain again. Lower is better.
In the training set, roughly 25% of all data blocks (1024 tokens) are forgotten. and it is almost always the “hard” facts. and its almost always the same ones, even after already having been “relearned”.
Figure dos: Cosine similarities of PPLs for each data block $i$. Higher = more similar.
This can be visually represented via a cosine similarity heatmap (how similar the uncertainty score was) for “hard” data blocks during phases 2 and 4 of pretraining for the GPT-2 124M model. A greater similarity in dropped data blocks over time (lighter color) indicates that it remained uncertain about similar data points.
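the heatmap itself is just pairwise cosine similarity between per-block perplexity vectors from different Concentrate phases, roughly (a sketch with hypothetical numbers):

```python
import torch
import torch.nn.functional as F

def ppl_overlap(ppls_phase2: torch.Tensor, ppls_phase4: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between per-block PPL vectors from two Concentrate phases.
    Close to 1.0 = the model was uncertain about (roughly) the same blocks both times."""
    return F.cosine_similarity(ppls_phase2, ppls_phase4, dim=0)

# hypothetical per-block PPLs: mostly the same blocks stay hard, so similarity comes out high
sim = ppl_overlap(torch.tensor([3.0, 40.0, 55.0, 4.0]),
                  torch.tensor([2.5, 35.0, 60.0, 3.0]))
```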
feelings > facts
if u print out and go over all the texts dropped / retained in Phase 2 vs Phase 4 (first flashcard session vs the $n^{th}$), u’ll notice that text considered “easy” in phase 2 was more conversational, and text considered “easy” in phase 4 was more factual. (See examples in A.2)
This insinuates that the model first learned conversations and personal anecdotes, before being able to retain factual information. they’re just like us fr fr
ok what did we learn
- ☑️ GPTs that do spaced repetition can learn 20x faster
- ☑️ spaced repetition is significantly better than o1 style planning/search
- ☑️ if o1 improved GPT4 by 50% (15/10), this could improve GPT4 by 2000%
- ☑️ GPTs forget about 25% of the data, mostly “hard” facts
- ☑️ “hard”-ness can be measured as perplexity score (how uncertain a model is about its output)
- ☑️ some data points are “hard” even after repeating their flashcards several times
- ☑️ GPT prioritizes learning how to handle conversations & anecdotes, then learns facts and esoteric information
- ☑️ AGI will not kill me bc i helped, right?
tldr dual-criterion perplexity + gradient sampling generalizes spaced repetition principles to large language model training.
further reading
gwern has a bunch of papers in their literature review that are worth reading, if ur curious. here are some good ones and their abstracts:
- The New Organon – Francis Bacon (1620)
“If you read a piece of text through twenty times, you will not learn it by heart so easily as if you read it ten times while attempting to recite from time to time and consulting the text when your memory fails.”
- B. Tabibian, U. Upadhyay, A. De, A. Zarezade, B. Schölkopf, M. Gomez-Rodriguez, Enhancing human learning via spaced repetition optimization, Proc. Natl. Acad. Sci. U.S.A. 116 (10) 3988-3993, (2019).
We perform a large-scale natural experiment using data from Duolingo, a popular language-learning online platform, and show that learners who follow a reviewing schedule determined by our algorithm memorize more effectively than learners who follow alternative schedules determined by several heuristics.
- Smolen, P., Zhang, Y. & Byrne, J. The right time to learn: mechanisms and optimization of spaced learning. Nat Rev Neurosci 17, 77–88 (2016).
Spaced learning or training is robustly superior to massed [cramming] training for many forms of human learning and for virtually every animal model system that has been examined. The phenomenon has been a major focus of learning theorists for the past 50 years.
There are hundreds of studies involving the spacing effect:
- Cepeda et al 2006 is a review of 184 articles with 317 experiments; other reviews include:
  - Ruch 1928, “Factors influencing the relative economy of massed and distributed practice in learning”
  - Crowder 1976, Principles of learning and memory
  - Dempster 1989, “Spacing effects and their implications for theory and practice”
  - Delaney et al 2010, “Spacing and testing effects: A deeply critical, lengthy, and at times discursive review of the literature”
  - Donovan & Radosevich 1999, “A meta-analytic review of the distribution of practice effect: Now you see it, now you don’t”
  - Greene 1992, Human memory: Paradigms and paradoxes
  - Janiszewski et al 2003, “A meta-analysis of the spacing effect in verbal learning: Implications for research on advertising repetition and consumer memory”
  - Pavlik & Anderson 2003, “An ACT-R model of the spacing effect”
  - Balota et al 2006, “Is Expanded Retrieval Practice a Superior Form of Spaced Retrieval? A Critical Review of the Extant Literature”
  - Carpenter et al 2012, “Using Spacing to Enhance Diverse Forms of Learning: Review of Recent Research and Implications for Instruction”
Appendix
Section 1: Da code
from dataclasses import dataclass
from typing import List, Optional, Dict, Tuple, Union
import torch
import torch.nn as nn
import numpy as np
from torch.utils.data import DataLoader, Dataset
from transformers import GPT2LMHeadModel, GPT2Config
import math
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR
@dataclass
class LFRConfig:
"""Configuration for LFR training"""
model_size: str # One of: "124M", "355M", "774M", "1.5B"
context_length: int = 1024
total_iterations: int = 40000
effective_batch_size: int = 512
block_size: int = 1024
weight_decay: float = 0.1
adam_beta1: float = 0.9
adam_beta2: float = 0.95
warmup_iterations: int = 2000
min_lr: float = 6e-5
max_lr: float = 6e-4
lr_decay_iters: int = 40000
# LFR specific parameters
alpha: float = 1.0 # Perplexity importance
beta: float = 0.5 # Gradient importance
p1_epochs: int = 1 # Initial learning phase
p2_epochs: int = 1 # Concentrate phase
p3_epochs: int = 1 # Repeat phase
s1_percent: float = 50.0 # Percentage of data to retain
n_cycles: int = 2 # Number of Concentrate-Repeat cycles
def __post_init__(self):
# Set model-specific parameters
model_configs = {
"124M": {"n_layer": 12, "n_head": 12, "n_embd": 768, "batch_size": 16, "grad_accum": 32},
"355M": {"n_layer": 24, "n_head": 16, "n_embd": 1024, "batch_size": 16, "grad_accum": 32},
"774M": {"n_layer": 36, "n_head": 20, "n_embd": 1280, "batch_size": 8, "grad_accum": 64},
"1.5B": {"n_layer": 48, "n_head": 25, "n_embd": 1600, "batch_size": 4, "grad_accum": 128},
}
self.model_config = model_configs[self.model_size]
class PerplexityTracker:
"""Tracks and computes perplexity for data blocks"""
def __init__(self, n_blocks: int):
self.n_blocks = n_blocks
self.block_ppls = torch.zeros(n_blocks)
self.block_counts = torch.zeros(n_blocks)
def update(self, block_ids: torch.Tensor, ppls: torch.Tensor):
"""Update running perplexity for blocks"""
self.block_counts[block_ids] += 1
self.block_ppls[block_ids] = ppls
def get_block_perplexities(self) -> torch.Tensor:
"""Get current perplexity estimates for all blocks"""
return self.block_ppls.clone()
class GradientTracker:
"""Tracks gradient magnitudes for data blocks"""
def __init__(self, n_blocks: int):
self.n_blocks = n_blocks
self.grad_norms = torch.zeros(n_blocks)
def update(self, block_ids: torch.Tensor, model: nn.Module):
"""Compute and store gradient norms for current batch"""
grad_norm = torch.norm(torch.stack([
p.grad.norm() for p in model.parameters() if p.grad is not None
]))
self.grad_norms[block_ids] = grad_norm
def compute_nll(logits: torch.Tensor, targets: torch.Tensor, ignore_index: int = -100) -> torch.Tensor:
"""Compute negative log likelihood loss"""
return nn.CrossEntropyLoss(ignore_index=ignore_index, reduction='none')(
logits.view(-1, logits.size(-1)), targets.view(-1)
)
def compute_perplexity(nll: torch.Tensor) -> torch.Tensor:
"""Compute perplexity from NLL values"""
return torch.exp(nll.mean())
class LFRTrainer:
def __init__(self, config: LFRConfig, model: nn.Module, train_dataloader: DataLoader):
self.config = config
self.model = model
self.train_dataloader = train_dataloader
# Initialize trackers
self.ppl_tracker = PerplexityTracker(len(train_dataloader.dataset))
self.grad_tracker = GradientTracker(len(train_dataloader.dataset))
# Initialize optimizer and scheduler
self.optimizer = Adam(
model.parameters(),
lr=config.max_lr,
betas=(config.adam_beta1, config.adam_beta2),
weight_decay=config.weight_decay
)
self.scheduler = self._create_scheduler()
def _create_scheduler(self) -> LambdaLR:
"""Create learning rate scheduler with warmup and decay"""
def lr_lambda(step):
# Linear warmup
if step < self.config.warmup_iterations:
return float(step) / float(max(1, self.config.warmup_iterations))
# Linear decay
decay_ratio = (step - self.config.warmup_iterations) / (
self.config.lr_decay_iters - self.config.warmup_iterations
)
return max(self.config.min_lr / self.config.max_lr, 1.0 - decay_ratio)
return LambdaLR(self.optimizer, lr_lambda)
def compute_sampling_probabilities(self) -> torch.Tensor:
"""Compute sampling probabilities for all blocks based on perplexity and gradients"""
ppls = self.ppl_tracker.get_block_perplexities()
grads = self.grad_tracker.grad_norms
# Combined importance scores
importance = (ppls ** self.config.alpha) * (grads ** self.config.beta)
# Normalize to probabilities
probs = importance / importance.sum()
return probs
def train_step(self, batch: Dict[str, torch.Tensor], block_ids: torch.Tensor) -> float:
"""Single training step"""
        # no zero_grad here: gradients accumulate across grad_accum micro-steps and are cleared after optimizer.step() below
input_ids = batch["input_ids"]
attention_mask = batch["attention_mask"]
labels = batch["labels"]
# Forward pass
outputs = self.model(
input_ids=input_ids,
attention_mask=attention_mask,
labels=labels
)
loss = outputs.loss / self.config.model_config["grad_accum"]
# Backward pass
loss.backward()
# Update trackers
with torch.no_grad():
            # shift so position t's logits are scored against token t+1 (GPT2LMHeadModel does this internally for .loss)
            nll = compute_nll(outputs.logits[..., :-1, :].contiguous(), labels[..., 1:].contiguous())
ppl = compute_perplexity(nll)
self.ppl_tracker.update(block_ids, ppl)
self.grad_tracker.update(block_ids, self.model)
# Gradient accumulation
if (self.step + 1) % self.config.model_config["grad_accum"] == 0:
torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
self.optimizer.step()
            self.scheduler.step()
            self.optimizer.zero_grad()
return loss.item()
def train(self):
"""Main training loop implementing LFR"""
self.step = 0
# Phase 1: Initial learning
for _ in range(self.config.p1_epochs):
self._train_epoch(use_full_dataset=True)
# Concentrate-Repeat cycles
for _ in range(self.config.n_cycles):
# Compute sampling probabilities
probs = self.compute_sampling_probabilities()
            # Select the highest-importance blocks for the Concentrate phase
n_retain = int(len(probs) * (self.config.s1_percent / 100))
_, indices = torch.topk(probs, n_retain)
# Phase 2: Concentrate phase
for _ in range(self.config.p2_epochs):
                self._train_epoch(use_full_dataset=False, subset_indices=indices)
# Phase 3: Repeat phase
for _ in range(self.config.p3_epochs):
self._train_epoch(use_full_dataset=True)
def _train_epoch(self, use_full_dataset: bool = True, subset_indices: Optional[torch.Tensor] = None):
"""Train for one epoch"""
dataloader = self.train_dataloader
if not use_full_dataset and subset_indices is not None:
# Create subset dataloader
subset_dataset = torch.utils.data.Subset(
self.train_dataloader.dataset, subset_indices.tolist()
)
dataloader = DataLoader(
subset_dataset,
batch_size=self.config.model_config["batch_size"],
shuffle=True
)
for batch, block_ids in dataloader:
loss = self.train_step(batch, block_ids)
            self.step += 1
if self.step >= self.config.total_iterations:
return
do u trust claude?
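and roughly how u’d wire it up (assumes the classes above plus a hypothetical `my_block_dataset` that yields `(batch_dict, block_id)` pairs; not tested end-to-end):

```python
# hypothetical usage of the trainer above - my_block_dataset is a stand-in, not a real dataset
config = LFRConfig(model_size="124M")
gpt2_config = GPT2Config(
    n_layer=config.model_config["n_layer"],
    n_head=config.model_config["n_head"],
    n_embd=config.model_config["n_embd"],
)
model = GPT2LMHeadModel(gpt2_config)
train_loader = DataLoader(
    my_block_dataset,  # must yield (batch_dict, block_id) with input_ids / attention_mask / labels
    batch_size=config.model_config["batch_size"],
    shuffle=True,
)
trainer = LFRTrainer(config, model, train_loader)
trainer.train()
```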
Section 2: Examples
The learning here is that phase 2 flashcards are “conversations” and phase 4 are “hard” facts – things that are harder to learn even after several spaced repetitions.
Phase 2 | Phase 4 |
---|---|
Become a fan of Slate on Facebook. Follow us on Twitter.The first time I crocheted a soccer ball was on the occasion of the 2010 World Cup. It was being held on the continent of Africa, and I thought the African Flower hexagon motif was the perfect vehicle for a crochet soccer ball celebrating the continent’s first time hosting the World Cup: This time around, instead of using all 9000 of my favorite colors, I limited myself to the colors of the flags of the thirty-two countries that had made it to the final rounds of the World Cup competition, and I did my best to incorporate the designs of their flags into the thirty-two hexagons and pentagons of a soccer ball. | ML-77 Missile Launcher: Based on existing technology, the ML-77 is a rapid-fire missile launcher using seeking projectiles. Each projectile features a friend-or-foe recognition system, ensuring it will find a hostile target even if the user’s aim is not completely accurate. The locking mechanism of the ML-77 allows the shooter to ignore cover and line of sight when shooting at locked on enemies, though an attack roll is still required. Locking on to an enemy requires a move action when the enemy is in line of sight and lasts for the rest of the encounter, or until a new target is locked. |
In the book, the mythical California is ruled by Queen Califa and populated only with female warriors who brandish gold weapons. They even harness their animals in gold because it is the only mineral on the island. The legend of Califa and her island was well known among New World explorers. In 1536 when Hernán Cortéz arrived in Baja California, he believed he had landed on the legendary island. Over three hundred years later gold was discovered in California, making the legend partially true and earning the state its nickname: The Golden State | Segregated Witness, defined by Bitcoin Improvement Proposal 141 (BIP141), was deployed using an activation mechanism (BIP9) that requires 95 percent of all miners (by hash power) to signal support for the upgrade within the span of a two-week difficulty period. That’s at least 1916 blocks within 2016 blocks, to be exact. This threshold has just been reached. While the current difficulty period will not end until tomorrow, all blocks in this difficulty period are signaling support for the upgrade so far. This now totals over 1916 of them. |
Unofficial reports claimed the car was powered by a 95kW 1.5-litre non-turbo petrol engine but Tada didn’t confirm. When asked what powers the S-FR Tada revealed he was considering three choices. “When you see the S-FR concept I suppose you imagine it is a 1.5-litre car but nowadays I can choose many kind of engines,” he explained. “Downsized turbo, 1.5-litre naturally aspirated and something additional as well. Now we are thinking which one is the best engine for a small sports car.” Tada also admitted that the company is unlikely to turn to a partner like it did with Subaru for the 86/BRZ or the new ’big brother’ sports car with BMW. | In April, MYIR released a Linux-powered MYS6ULX single board computer, which was notable for being available in two different versions using NXP’s low power, Cortex-A7 i.MX6 UltraLite (UL) or the more affordable, and almost identical i.MX6 ULL SoC. Now, MYIR has released an “MYB-6ULX Expansion Board” designed to stack onto either model. The $21.20 accessory adds a second 10100 Ethernet port to the MYS-6ULX, as well as new CAN, RS485, audio, micro-USB, RTC, and camera functions. MYB-6ULX Expansion Board with MYS-6ULX (left) and detail view (click images to enlarge). The MYB-6ULX Expansion Board has the same 70 x 55mm dimensions as the MYS-6ULX, which is available in two models: The i.MX6 UL based MYS-6ULX-IND has -40 to 85°C support instead of 0 to 70°C, and the i.MX6 ULL based MYS-6ULX-IOT features a USB-powered WiFi radio. The 4-layer expansion board runs on 5V power, and shares the industrial temperature support of the IND model. |
† = sometimes the formula doesnt display correctly. this has a backup/second life
published: 1-3-2025