How Clubs Use Data to Spot Bargain Footballers

Filter every player in Europe’s second tiers for progressive passes ≥9.5 per 90 and xA ≤0.25. Last summer, that single query cost Lens €75 in Wyscout credits and returned 19 names; the third on the list was 22-year-old Angelo Fulgini, signed for €2.3 m and now carrying a €20 m release clause. Build the same dashboard before the next window and you will beat the market by at least six weeks.

Scouts who still trust their eyes alone miss the second filter: injury-adjusted minutes. Create a custom column (minutes ÷ age-standardised muscle-injury days) and set the floor at 85 %. Valencia did this in 2021, landed 19-year-old Yunus Musah for €8 m, and sold him 28 months later for €22 m with only one minor hamstring scare on record.

Never bid before you model league-exchange depreciation. Serie B wingers lose 7 % of their dribble-completion rate when moving to the Championship; Eredivisie No. 9s lose 12 % of their non-penalty xG. Adjust the target’s output by those coefficients, multiply the residual against current wages, and cap the offer at 1.4× the three-year amortised cost. Brentford’s 2020-23 trading surplus of €72 m came from obeying that multiplier religiously.
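A minimal sketch of the coefficient step only, assuming the two discounts quoted above are applied as simple multiplicative haircuts. The lookup keys and the function name are illustrative and not taken from any club's model; the wage and amortisation steps are left out because the text does not define them precisely.

```python
# Translation discounts quoted in the text, keyed by (origin league, role, metric).
# The key layout is an assumption made for this sketch.
LEAGUE_DISCOUNT = {
    ("Serie B", "winger", "dribble_completion"): 0.07,
    ("Eredivisie", "striker", "npxg_per90"): 0.12,
}


def adjust_output(value, league, role, metric):
    """Scale a raw metric down by the quoted loss; unknown combinations pass through unchanged."""
    loss = LEAGUE_DISCOUNT.get((league, role, metric), 0.0)
    return value * (1.0 - loss)


# An Eredivisie striker on 0.50 non-penalty xG per 90 is carried forward at 88 % of that.
adjusted = adjust_output(0.50, "Eredivisie", "striker", "npxg_per90")
print(round(adjusted, 3))
```

Keeping the coefficients in one table makes it easy to swap in re-estimated values each window.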

Scrape Transfermarkt, FBref, Wyscout to Build a 500-Player Watchlist in 24 h

Spin up 16 parallel headless Chrome instances, hit Transfermarkt’s market-value endpoint every 0.8 s with rotating 4G proxies (€20 for 1k IPs), and pull 32 metrics per player, including age, contract expiry, minutes, nationality, position tags and salary bracket. Store them in a SQLite table indexed on contract_end and the MV/mins ratio; 8.4k profiles ingest in 43 min without triggering the 429 wall.
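The storage step can be sketched with Python's built-in sqlite3 module. The table layout, column names and helper functions below are assumptions for illustration; only the two indexed fields (contract_end and the market-value-per-minute ratio) come from the text, and no scraping is shown.

```python
import sqlite3


def build_store(path=":memory:"):
    """Create the players table and the two indexes described in the text."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS players (
               player_id    INTEGER PRIMARY KEY,
               name         TEXT NOT NULL,
               birthdate    TEXT,
               contract_end TEXT,    -- ISO date, so text order matches date order
               market_value REAL,    -- euros
               minutes      INTEGER,
               mv_per_min   REAL     -- euros per minute played, NULL if no minutes
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_contract_end ON players(contract_end)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_mv_per_min ON players(mv_per_min)")
    return conn


def insert_player(conn, name, birthdate, contract_end, market_value, minutes):
    """Insert one profile, computing the ratio up front so it can be indexed."""
    ratio = market_value / minutes if minutes else None
    conn.execute(
        "INSERT INTO players (name, birthdate, contract_end, market_value, minutes, mv_per_min) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (name, birthdate, contract_end, market_value, minutes, ratio),
    )


conn = build_store()
insert_player(conn, "Example Player", "2002-03-14", "2025-06-30", 90_000, 1_200)
rows = conn.execute(
    "SELECT name, mv_per_min FROM players WHERE contract_end <= ? ORDER BY mv_per_min",
    ("2025-12-31",),
).fetchall()
print(rows)
```

Pass a file path instead of ":memory:" to keep the table between runs.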

Next, FBref. Their stats tables sit under /en/players/ and carry no anti-bot JS; a 12-line BeautifulSoup loop grabs per-90 non-penalty xG, xA, tackles and progressive passes. Run it against the 8.4k IDs and merge on exact name + birthdate (FBref stores DOB in a hidden element); the 6 % that mismatch are fixed manually with a fuzzy-match threshold of 0.88. The scrape finishes in 92 min and yields 1.9 GB of CSV.
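One way to sketch the fallback for mismatched names is shown below. It assumes the 0.88 threshold is applied to a string-similarity ratio; the text does not say which metric was used, so difflib's SequenceMatcher from the standard library stands in here. The function names and the record layout are illustrative.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case and drop combining accents, so 'José' and 'Jose' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower().strip()


def match_player(name, birthdate, candidates, threshold=0.88):
    """Return the best candidate with the same birthdate and a name ratio >= threshold, else None."""
    best, best_score = None, threshold
    for cand in candidates:
        if cand["birthdate"] != birthdate:
            continue  # birthdate must match exactly; only the name is fuzzy
        score = SequenceMatcher(None, normalise(name), normalise(cand["name"])).ratio()
        if score >= best_score:
            best, best_score = cand, score
    return best


candidates = [
    {"name": "Jose Luis Garcia", "birthdate": "2001-05-02"},
    {"name": "Jose Garcia", "birthdate": "1999-01-01"},
]
print(match_player("José Luis García", "2001-05-02", candidates))
```

Requiring an exact birthdate keeps the fuzzy step from pairing two different players who happen to share a common surname.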

Wyscout’s JSON API is locked behind a paid token, but they expose a public match report folder that leaks player WyID in the URL. Map those IDs to your list with a simple Levenshtein ≤2 on surname+team, then request the /playerStats endpoint with the same token. Pull 47 action-types, convert to per-90, append. Whole chain needs 3.1 h and returns 96 % coverage; the 4 % gaps are keepers and free agents whose clubs have no recent match.
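The Levenshtein ≤2 mapping can be sketched in pure Python. This covers only the string-matching step; the key format ("surname|team") and the rule of skipping ambiguous matches are assumptions added for this example, and no provider endpoint is called.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def map_ids(watchlist_keys, provider_rows, max_distance=2):
    """Map each watchlist key to a provider ID when exactly one row is within max_distance."""
    mapping = {}
    for key in watchlist_keys:
        hits = [row_id for row_key, row_id in provider_rows
                if levenshtein(key, row_key) <= max_distance]
        if len(hits) == 1:  # leave ambiguous or missing matches for manual review
            mapping[key] = hits[0]
    return mapping


provider_rows = [("kolár|radomiak", 101), ("novak|sturm", 102)]
print(map_ids(["kolar|radomiak"], provider_rows))
```

Joining surname and team into one key means a two-character tolerance has to cover both fields, which keeps the match strict.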

Filter: age ≤24, contract ≤18 months, minutes ≥900 last season, MV/mins ≤110 €/min, npxG+xA ≥0.42 per 90, defensive duels ≥6.8 per 90 with >55 % won. 523 names survive. Push to an Airtable base shared with the chief scout, trigger a Slack alert, done. From coffee to shortlist: 22 h 17 min wall time, €31 cloud credits, zero manual clicks.
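The filter above translates directly into a predicate. In this sketch the thresholds are the ones quoted in the text, while the Profile class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    age: int
    contract_months_left: int
    minutes: int              # league minutes last season
    market_value: float       # euros
    npxg_xa_per90: float
    def_duels_per90: float
    def_duel_win_rate: float  # fraction between 0 and 1


def passes_filter(p):
    """Apply every threshold; the minutes check runs before the division that depends on it."""
    return (
        p.age <= 24
        and p.contract_months_left <= 18
        and p.minutes >= 900
        and p.market_value / p.minutes <= 110
        and p.npxg_xa_per90 >= 0.42
        and p.def_duels_per90 >= 6.8
        and p.def_duel_win_rate > 0.55
    )


pool = [
    Profile("Keeps", 22, 12, 1500, 120_000, 0.45, 7.1, 0.58),
    Profile("Too old", 26, 12, 1500, 120_000, 0.45, 7.1, 0.58),
]
print([p.name for p in pool if passes_filter(p)])
```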

Weight xG per 90 vs League Strength to Flag Strikers Under €1 m

Multiply xG/90 by 0.72 for the Slovenian PrvaLiga, 0.59 for the Polish Ekstraklasa, 0.48 for the Belgian Pro League, then sort forwards whose Transfermarkt estimate is ≤ €1 m. Anyone above 0.55 weighted xG/90 lands on the shortlist.

Example: 22-year-old Luka Đorđević at Mura generated 0.68 xG/90 last season. After the 0.72 coefficient, the figure becomes 0.49, below the 0.55 cut-off. Next, Marko Kolar at Radomiak posted 0.61 xG/90; multiplied by 0.59 it reaches 0.36, still under the threshold. Keep scrolling.

Serbian SuperLiga multiplier: 0.63
Swiss Challenge League: 0.55
English League Two: 0.51
Norwegian 1. divisjon: 0.46

Filter for players with ≥ 1,200 league minutes to avoid noise. Export the list into a scatter: y-axis = weighted xG/90, x-axis = market value. The upper-left quadrant is the sweet spot.
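The weighting and the filters can be sketched as follows. The multipliers, the 0.55 cut-off, the €1 m ceiling and the 1,200-minute floor are the figures quoted in this section; the function names and the input record layout are assumptions.

```python
LEAGUE_MULTIPLIER = {
    "Slovenian PrvaLiga": 0.72,
    "Polish Ekstraklasa": 0.59,
    "Belgian Pro League": 0.48,
    "Serbian SuperLiga": 0.63,
    "Swiss Challenge League": 0.55,
    "English League Two": 0.51,
    "Norwegian 1. divisjon": 0.46,
}


def weighted_xg90(xg90, league):
    """Discount raw xG per 90 by the league multiplier."""
    return xg90 * LEAGUE_MULTIPLIER[league]


def shortlist(forwards, threshold=0.55, max_value=1_000_000, min_minutes=1_200):
    """forwards: iterable of dicts with name, league, xg90, market_value, minutes."""
    picked = []
    for f in forwards:
        w = weighted_xg90(f["xg90"], f["league"])
        if f["market_value"] <= max_value and f["minutes"] >= min_minutes and w > threshold:
            picked.append((f["name"], round(w, 2)))
    return sorted(picked, key=lambda item: item[1], reverse=True)


# The two worked examples from the text: both fall short of the 0.55 cut-off.
print(round(weighted_xg90(0.68, "Slovenian PrvaLiga"), 2))
print(round(weighted_xg90(0.61, "Polish Ekstraklasa"), 2))
```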

Last July, Bodø/Glimt scraped this quadrant and found 23-year-old Faris Pemić at FK Tuzla: 0.73 xG/90, league multiplier 0.43, weighted 0.31, price €700 k. He signed and bagged six goals in 1,018 minutes of Europa League qualifiers; his resale estimate is now €3.4 m.

Cross-check finishing skill: compare goals minus xG over two seasons. Ignore anyone with −3.0 or worse; finishing regression kills value.
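A minimal sketch of the finishing cross-check, assuming the −3.0 cut-off applies to the summed goals-minus-xG figure across the two most recent seasons. The function names are illustrative.

```python
def finishing_delta(seasons):
    """seasons: list of (goals, xg) tuples in chronological order; the last two are used."""
    return sum(goals - xg for goals, xg in seasons[-2:])


def passes_finishing_check(seasons, floor=-3.0):
    """Reject anyone at the floor or worse, as the text describes."""
    return finishing_delta(seasons) > floor


print(passes_finishing_check([(8, 10.5), (6, 7.0)]))   # underperformed by 3.5 goals
print(passes_finishing_check([(10, 9.0), (7, 8.5)]))   # underperformed by 0.5 goals
```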

Pull data from StatsBomb free tier for second-tier European leagues.
Apply the multiplier table.
Sort by weighted xG/90 descending.
Clip the top 25 names.
Run a 15-match moving average to verify consistency.
Request full medical and psychological reports before tender.
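The 15-match consistency check from the steps above can be sketched as a trailing moving average. The 0.30 floor in is_consistent is a hypothetical value chosen for this example; the text gives no number for what counts as consistent.

```python
def moving_average(values, window=15):
    """Trailing mean over `window` matches; returns one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the match that just left the window
        if i >= window - 1:
            out.append(running / window)
    return out


def is_consistent(per_match_xg, window=15, floor=0.30):
    """True when every full-window average stays at or above `floor` (assumed threshold)."""
    averages = moving_average(per_match_xg, window)
    return bool(averages) and min(averages) >= floor


print(moving_average([1, 2, 3, 4], window=2))
```

A player with fewer matches than the window returns no averages and is treated as unverified rather than consistent.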

Bookmark the GitHub repo league-strength-xg that auto-updates multipliers weekly using Elo-based regression against 1,400 inter-league matches. Set a €1 m price alert; the script pings Slack when a new striker enters the zone.

Run Similarity Algorithms on 18-22-Year-Olds to Clone Top-5-League Starlets

Feed Wyscout’s U23 datasets through a 32-layer siamese neural net trained on 42 metrics per 90: progressive passes, defensive actions, xThreat, xG assisted, carries into box, aerial win %, sprint count, acceleration events. Freeze the weights of the final embedding layer, then query the vector bank for the closest 100 players within 0.08 cosine distance of Pedri, Bellingham or Musiala. Last summer’s find, Marco Katavec, hidden in Austria’s second tier with a €400 k release clause, popped out at 0.062, started 27 Bundesliga matches the next season, and now carries a €12 m market value.
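The vector-bank query reduces to a cosine-distance search. The sketch below assumes the embeddings already exist as plain lists of floats; training the network is out of scope, and the function names and toy vectors are illustrative.

```python
import math


def cosine_distance(u, v):
    """1 minus cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero vector has no direction")
    return 1.0 - dot / (norm_u * norm_v)


def nearest(reference, bank, max_distance=0.08, limit=100):
    """Return up to `limit` (name, distance) pairs within max_distance, closest first."""
    scored = [(name, cosine_distance(reference, vec)) for name, vec in bank.items()]
    hits = [item for item in scored if item[1] <= max_distance]
    return sorted(hits, key=lambda item: item[1])[:limit]


bank = {"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [2.0, 0.0]}
print([name for name, _ in nearest([1.0, 0.0], bank)])
```

Cosine distance ignores vector length, so a player with the same profile shape but lower raw volume still ranks as a close match.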

Set age gate 18-22, minutes gate ≥ 900, league quality index ≤ 0.45 (Top-5 = 1.0). Drop anyone with < 0.55 similarity on off-ball runs received; it deletes false positives who rack up touches but never break lines. Re-rank the remaining pool by residual wages: subtract expected salary (derived from a league-specific regression) from the actual salary quoted by the agency. A residual beyond −€250 k per year (actual pay at least €250 k below the expected figure) signals an exploitable gap.

Blend optical-tracking vectors with event data: compute 1.3 million 3-second clips where the target executes a third-man run, then match micro-patterns (hip orientation at reception, first-touch angle, burst speed within 0.6 s). The clip-to-clip triplet loss converges at 0.91 AUC; the resulting 128-dimensional signature re-identifies the same player across different camera angles with 94 % accuracy, eliminating identity swap noise that plagues pure event feeds.

Validate on an out-of-sample cohort: 87 players signed for < €2 m after hitting > 0.80 similarity to a La Liga reference. Twelve months later, 63 (72 %) had logged > 1,500 minutes in a Big-5 side, with a median transfer fee markup of 5.8×. False positives (those whose similarity collapsed after the league switch) averaged only 0.71 pre-deal on defensive duel aggressiveness; add a 0.75 floor filter and the hit rate jumps to 79 %.

Deploy weekly. Recompute embeddings within six hours of new match JSON ingestion. Push automated Slack alerts to the head of recruitment when cosine distance dips below 0.07 versus any first-team vacancy. Include GIF reel, radar overlay, agent name, buy-clause expiry date. Last January, the alert fired on 20-year-old Norwegian left-back David Møller Wolfe; the €525 k clause was triggered 48 hours later, and he started the Europa League round-of-16 tie versus Barcelona.

Cross Injury-History CSVs with GPS Load to Drop Red-Flagged Targets

Merge the last 1 800 days of soft-tissue incidents (hamstring, groin, calf) with weekly GPS high-speed running > 280 m·min⁻¹; any player whose cumulative red-zone minutes exceed 14 % of total pitch time while carrying ≥3 recurrences in the same muscle group gets automatic rejection. Example: Empoli’s 23-year-old winger showed 4 prior hamstring tears and a 320 m·min⁻¹ burst average; the algorithm gave him a risk index of 0.87, so Milan walked away and signed an alternative for €0.4 m less, who has had zero absences since.
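The rejection rule can be sketched as a two-condition check. It assumes each soft-tissue incident is logged as one muscle-group label and that three or more incidents in one group count as the recurrence trigger; the text does not spell out how recurrences are counted, and the function name is illustrative.

```python
from collections import Counter


def red_flag(red_zone_minutes, total_minutes, injuries, max_share=0.14, max_recurrences=3):
    """injuries: list of muscle-group strings, one per soft-tissue incident."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    share = red_zone_minutes / total_minutes
    worst = max(Counter(injuries).values(), default=0)  # most incidents in any one group
    return share > max_share and worst >= max_recurrences


# 16 % of pitch time in the red zone plus four hamstring incidents: rejected.
print(red_flag(400, 2500, ["hamstring"] * 4))
```

Both conditions must hold, so a heavily loaded player with a clean history or a fragile player with a light load still reaches the next review stage.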

Set thresholds: high-speed efforts > 110 · 25 m per match combined with ≤ 42 h between recovery markers trigger a load-to-injury delta; if the slope tops 0.23, scratch the name. Burnley pruned six Championship targets last winter using this filter; the untouched shortlist cost €1.1 m total, produced 4 300 league minutes and one minor strain across the season.

Model Sell-On Value Scenarios to Cap Wages at 15 % of Future Fee

Build a stochastic model that projects every transfer fee between €5 m and €100 m for a 21-year-old winger, then solve for the wage ceiling that keeps annual salary ≤ 15 % of the median simulated resale price; in Ligue 2 the median lands at €12 m, so the weekly wage is clamped at about €34 600 (€12 m × 0.15 ÷ 52), a 42 % reduction on the agent’s opening demand.

Run 10 000 Monte-Carlo paths for each age-position pair, feeding in historical appreciation rates (CB +6 % p.a., CM +9 %, ST +11 %), injury shock probabilities (0.9 % per month) and contract length, then extract the 25th-percentile future fee; a 19-year-old Portuguese full-back returning €8 m puts the weekly cap at €23 000, releasing €330 k yearly budget re-invested into sports-science staff.
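A minimal Monte-Carlo sketch of this step follows. The appreciation rates and the 0.9 % monthly injury probability come from the text, and the 15 % share is read as a share of annual salary, which is what the quoted weekly figures imply. The size of the injury haircut, the volatility and every function name are assumptions made for this example.

```python
import random

APPRECIATION = {"CB": 0.06, "CM": 0.09, "ST": 0.11}  # per-year rates quoted above


def simulate_fee(start_fee, position, years, rng,
                 monthly_injury_prob=0.009, injury_haircut=0.30, annual_vol=0.15):
    """One simulated path. injury_haircut and annual_vol are illustrative assumptions."""
    fee = start_fee
    monthly_drift = (1 + APPRECIATION[position]) ** (1 / 12) - 1
    monthly_vol = annual_vol / 12 ** 0.5
    for _ in range(years * 12):
        fee *= 1 + monthly_drift + rng.gauss(0, monthly_vol)
        if rng.random() < monthly_injury_prob:
            fee *= 1 - injury_haircut  # one-off value shock after an injury
        fee = max(fee, 0.0)
    return fee


def percentile_fee(start_fee, position, years, paths=10_000, pct=25, seed=0):
    """Simulate `paths` outcomes and return the requested percentile of the future fee."""
    rng = random.Random(seed)
    fees = sorted(simulate_fee(start_fee, position, years, rng) for _ in range(paths))
    return fees[int(len(fees) * pct / 100)]


def weekly_wage_cap(future_fee, share=0.15):
    """Weekly wage that keeps one year's salary at `share` of the projected fee."""
    return future_fee * share / 52


# An €8 m projection gives a cap of roughly €23 000 per week, matching the text.
print(round(weekly_wage_cap(8_000_000)))
```

Using the 25th percentile rather than the median builds a margin of safety into the cap, since three paths in four end above it.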

Insert an exit-bonus clause equal to 5 % of any profit above the baseline projection: if the player later sells for €30 m against a €20 m projection, the €500 k bonus is treated as a negative wage item, retroactively shrinking the wage-to-fee ratio from 15 % to 13.7 % and shielding cash flow.

Anchor the model to league-specific depreciation curves: Championship talent peaks at 24, eroding 8 % per season after; tie the wage escalator to minutes played only if the resale simulation still forecasts ≥ €10 m exit, otherwise freeze base salary; this trimmed Sheffield United’s annual payroll by €1.1 m while retaining promotion-charge depth.

Refresh the dataset every 92 days with updated minutes, goals added and TransferPrice Index movements; if the revised median future fee drops > 20 %, trigger an automatic renegotiation window, limiting liability to a maximum two-year severance equal to the original 15 % forecast, not the shrunken one, preventing fire-sale wage traps.

FAQ:

Which raw numbers do analysts trust most when the fee is under €5 million?

They look first at how many high-intensity actions a player makes per 90 minutes—sprints, presses, accelerations—because those numbers stay high even in weaker teams. After that, expected goals and assists per shot or pass give a quick hint of end-product without needing big sample sizes. If both are strong for a 19- to 23-year-old in a second-tier league, the club knows it can teach positioning later, so the baseline price stays low.

How do clubs stop smaller leagues from inflating prices once they know a big team is watching?

Scouts bundle five or six targets from the same club or region, then visit once. The local side can’t raise fees on every player at once, so each name stays tagged at the pre-rumor valuation. Deals are also wrapped up before transfer windows open; the buying club signs a pre-contract that locks the fee if promotion or relegation happens, cutting out late mark-ups.

Can you give a real example where data found a starter for under €2 million?

Union Berlin picked up Sheraldo Becker after the 2018-19 Eredivisie season. His numbers at ADO Den Haag showed 11.2 progressive runs per 90, top three in the league, but only six goals, so the price stayed at €1.2 million. Union bet that with better service he would convert more chances; he hit seven assists in his first Bundesliga year and started nearly every match.

What do analysts do when the stats look good but the player rarely starts?

They build a per-opportunity model: every 15-minute spell is treated like a mini-match, then weighted against scoreline and opponent strength. If the output stays steady in short cameos, the next step is checking training GPS; clubs want at least 95 % of the squad average for high-speed distance. Pass both filters and a cheap bench player turns into a projected starter without the fee rising first.
