{"id":3093429,"date":"2026-02-17T08:44:14","date_gmt":"2026-02-17T16:44:14","guid":{"rendered":"https:\/\/techcrunch.com\/?p=3093429"},"modified":"2026-02-17T15:20:23","modified_gmt":"2026-02-17T23:20:23","slug":"running-ai-models-is-turning-into-a-memory-game","status":"publish","type":"post","link":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/","title":{"rendered":"Running AI models is turning into a memory game"},"content":{"rendered":"\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs \u2014 but memory is an increasingly important part of the picture. As hyperscalers prepare to build out billions of dollars&#8217; worth of new data centers, the price for DRAM chips has jumped <a href=\"https:\/\/datatrack.trendforce.com\/Chart\/content\/4694\/mainstream-dram-spot-price\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">roughly 7x in the last year<\/a>.<\/p>\n\n<p class=\"wp-block-paragraph\">At the same time, there&#8217;s a growing discipline in orchestrating all that memory to make sure the right data gets to the right agent at the right time. The companies that master it will be able to make the same queries with fewer tokens, which can be the difference between folding and staying in business.<\/p>\n\n<p class=\"wp-block-paragraph\"><a rel=\"nofollow\" href=\"https:\/\/www.fabricatedknowledge.com\/p\/another-conversation-with-val-bercovici\">Semiconductor analyst <\/a><a href=\"https:\/\/www.fabricatedknowledge.com\/p\/another-conversation-with-val-bercovici\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Doug O&#8217;Laughlin<\/a> has an interesting look at the importance of memory chips on his Substack, where he talks with Val Bercovici, chief AI officer at Weka. 
They&#8217;re both semiconductor guys, so the focus is more on the chips than the broader architecture; the implications for AI software are pretty significant too.<\/p>\n\n<p class=\"wp-block-paragraph\">I was particularly struck by this passage, in which Bercovici looks at the growing complexity of <a href=\"https:\/\/platform.claude.com\/docs\/en\/build-with-claude\/prompt-caching\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Anthropic&#8217;s prompt-caching documentation<\/a>:<\/p>\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">The tell is if we go to Anthropic\u2019s prompt caching pricing page. It started off as a very simple page six or seven months ago, especially as Claude Code was launching \u2014 just \u201cuse caching, it\u2019s cheaper.\u201d Now it\u2019s an encyclopedia of advice on exactly how many cache writes to pre-buy. You\u2019ve got 5-minute tiers, which are very common across the industry, or 1-hour tiers \u2014 and nothing above. That\u2019s a really important tell. Then of course you\u2019ve got all sorts of arbitrage opportunities around the pricing for cache reads based on how many cache writes you\u2019ve pre-purchased.<\/p>\n<\/blockquote>\n\n<p class=\"wp-block-paragraph\">The question here is how long Claude holds your prompt in cached memory: You can pay for a 5-minute window, or pay more for an hour-long window. It&#8217;s much cheaper to draw on data that&#8217;s still in the cache, so if you manage it right, you can save an awful lot. There is a catch though: Every new bit of data you add to the query may bump something else out of the cache window.<\/p>\n\n<p class=\"wp-block-paragraph\">This is complex stuff, but the upshot is simple enough: Managing memory in AI models is going to be a huge part of AI going forward. 
Companies that do it well are going to rise to the top.<\/p>\n\n<p class=\"wp-block-paragraph\">And there is plenty of progress to be made in this new field. Back in October, I covered <a href=\"https:\/\/techcrunch.com\/2025\/10\/23\/tensormesh-raises-4-5m-to-squeeze-more-inference-out-of-ai-server-loads\/\">a startup called Tensormesh<\/a> that was working on one layer in the stack known as cache optimization.<\/p>\n\n<p class=\"wp-block-paragraph\">Opportunities exist in other parts 
of the stack. For instance, lower down the stack, there&#8217;s the question of how data centers are using the different types of memory they have. (The interview includes a nice discussion of when DRAM chips are used instead of HBM, although it&#8217;s pretty deep in the hardware weeds.) Higher up the stack, end users are figuring out how to structure their model swarms to take advantage of the shared cache.<\/p>\n\n<p class=\"wp-block-paragraph\">As companies get better at memory orchestration, they&#8217;ll use fewer tokens and inference will get cheaper. Meanwhile, <a href=\"https:\/\/ramp.com\/velocity\/ai-is-getting-cheaper\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">models are getting more efficient at processing each token<\/a>, pushing the cost down still further. As server costs drop, a lot of applications that don&#8217;t seem viable now will start to edge into profitability.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs &#8212; but memory is an increasingly important part of the 
picture.<\/p>\n","protected":false},"author":133574702,"featured_media":389168,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"tc-featured-article":false,"tc-header-option":"","tc-breaking-news":false,"tc-article-brief":false,"carmot_uuid":"","apple_news_api_created_at":"2026-02-17T16:44:19Z","apple_news_api_id":"ede4f65e-af62-4370-b11c-1ae43064d93e","apple_news_api_modified_at":"2026-02-17T23:19:05Z","apple_news_api_revision":"AAAAAAAAAAAAAAAAAAAAAQ==","apple_news_api_share_url":"https:\/\/apple.news\/A7eT2Xq9iQ3CxHBrkMGTZPg","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":[],"apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"tc_subtitle":"","tc_featured_image_disabled":false,"tc_exclude_from_rss_feed":false,"tc_exclude_from_content_rivers":false,"footnotes":""},"categories":[577047203],"tags":[576886827,577122538,16404,22376,577370512],"tc_region":[],"tc_event":[],"tc_storyline_tax":[],"coauthors":[577368069],"class_list":["post-3093429","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","tag-anthropic","tag-claude","tag-dram","tag-exclusive","tag-inference-costs"],"apple_news_notices":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.1 (Yoast SEO v25.1) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Running AI models is turning into a memory game | TechCrunch<\/title>\n<meta name=\"description\" content=\"When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs -- but memory is an increasingly important part of the picture.\" \/>\n<meta 
name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Running AI models is turning into a memory game | TechCrunch\" \/>\n<meta property=\"og:description\" content=\"When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs -- but memory is an increasingly important part of the picture.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/\" \/>\n<meta property=\"og:site_name\" content=\"TechCrunch\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/techcrunch\" \/>\n<meta property=\"article:published_time\" content=\"2026-02-17T16:44:14+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-02-17T23:20:23+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2009\/01\/samsung_50nm_dram.jpg?w=425\" \/>\n\t<meta property=\"og:image:width\" content=\"425\" \/>\n\t<meta property=\"og:image:height\" content=\"266\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Russell Brandom\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@TechCrunch\" \/>\n<meta name=\"twitter:site\" content=\"@TechCrunch\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Russell Brandom\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"NewsArticle\",\"@id\":\"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/\"},\"author\":{\"name\":\"Russell Brandom\",\"@id\":\"https:\/\/techcrunch.com\/#\/schema\/person\/a29f23cab1a9fe3c68d4ccb024335871\"},\"headline\":\"Running AI models is turning into a memory game\",\"datePublished\":\"2026-02-17T16:44:14+00:00\",\"dateModified\":\"2026-02-17T23:20:23+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/\"},\"wordCount\":577,\"publisher\":{\"@id\":\"https:\/\/techcrunch.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/techcrunch.com\/wp-content\/uploads\/2009\/01\/samsung_50nm_dram.jpg\",\"keywords\":[\"Anthropic\",\"Claude\",\"dram\",\"Exclusive\",\"inference costs\"],\"articleSection\":[\"AI\"],\"inLanguage\":\"en-US\",\"copyrightYear\":\"2026\",\"copyrightHolder\":{\"@id\":\"https:\/\/techcrunch.com\/#organization\"}},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/\",\"url\":\"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/\",\"name\":\"Running AI models is turning into a memory game | 
TechCrunch\",\"isPartOf\":{\"@id\":\"https:\/\/techcrunch.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/techcrunch.com\/wp-content\/uploads\/2009\/01\/samsung_50nm_dram.jpg\",\"datePublished\":\"2026-02-17T16:44:14+00:00\",\"dateModified\":\"2026-02-17T23:20:23+00:00\",\"description\":\"When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs -- but memory is an increasingly important part of the picture.\",\"breadcrumb\":{\"@id\":\"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/#primaryimage\",\"url\":\"https:\/\/techcrunch.com\/wp-content\/uploads\/2009\/01\/samsung_50nm_dram.jpg\",\"contentUrl\":\"https:\/\/techcrunch.com\/wp-content\/uploads\/2009\/01\/samsung_50nm_dram.jpg\",\"width\":\"425\",\"height\":\"266\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/techcrunch.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Running AI models is turning into a memory game\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/techcrunch.com\/#website\",\"url\":\"https:\/\/techcrunch.com\/\",\"name\":\"TechCrunch\",\"description\":\"Startup and Technology 
News\",\"publisher\":{\"@id\":\"https:\/\/techcrunch.com\/#organization\"},\"alternateName\":\"TC\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/techcrunch.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/techcrunch.com\/#organization\",\"name\":\"TechCrunch\",\"alternateName\":\"TC\",\"url\":\"https:\/\/techcrunch.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/techcrunch.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/techcrunch.com\/wp-content\/uploads\/2018\/04\/tc-logo-2018-square-reverse2x.png?resize=1200,1200\",\"contentUrl\":\"https:\/\/techcrunch.com\/wp-content\/uploads\/2018\/04\/tc-logo-2018-square-reverse2x.png?resize=1200,1200\",\"width\":1200,\"height\":1200,\"caption\":\"TechCrunch\"},\"image\":{\"@id\":\"https:\/\/techcrunch.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/techcrunch\",\"https:\/\/x.com\/TechCrunch\",\"https:\/\/mstdn.social\/@TechCrunch\",\"https:\/\/bsky.app\/profile\/techcrunch.com\",\"https:\/\/www.threads.net\/@techcrunch\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/techcrunch.com\/#\/schema\/person\/a29f23cab1a9fe3c68d4ccb024335871\",\"name\":\"Russell Brandom\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/techcrunch.com\/#\/schema\/person\/image\/4ac6e41d7218ac0ef4b5333e2235098c\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/7f6ff1b797f0719f588fa9da04f01a705dfeb405dc1d880659b201c25ff6725c?s=96&d=identicon&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/7f6ff1b797f0719f588fa9da04f01a705dfeb405dc1d880659b201c25ff6725c?s=96&d=identicon&r=g\",\"caption\":\"Russell Brandom\"},\"url\":\"https:\/\/techcrunch.com\/author\/rbrandom\/\"}]}<\/script>\n<!-- \/ Yoast SEO 
Premium plugin. -->","yoast_head_json":{"title":"Running AI models is turning into a memory game | TechCrunch","description":"When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs -- but memory is an increasingly important part of the picture.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/","og_locale":"en_US","og_type":"article","og_title":"Running AI models is turning into a memory game | TechCrunch","og_description":"When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs -- but memory is an increasingly important part of the picture.","og_url":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/","og_site_name":"TechCrunch","article_publisher":"https:\/\/www.facebook.com\/techcrunch","article_published_time":"2026-02-17T16:44:14+00:00","article_modified_time":"2026-02-17T23:20:23+00:00","og_image":[{"url":"https:\/\/techcrunch.com\/wp-content\/uploads\/2009\/01\/samsung_50nm_dram.jpg?w=425","width":425,"height":266,"type":"image\/jpeg"}],"author":"Russell Brandom","twitter_card":"summary_large_image","twitter_creator":"@TechCrunch","twitter_site":"@TechCrunch","twitter_misc":{"Written by":"Russell Brandom","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/#article","isPartOf":{"@id":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/"},"author":{"name":"Russell Brandom","@id":"https:\/\/techcrunch.com\/#\/schema\/person\/a29f23cab1a9fe3c68d4ccb024335871"},"headline":"Running AI models is turning into a memory game","datePublished":"2026-02-17T16:44:14+00:00","dateModified":"2026-02-17T23:20:23+00:00","mainEntityOfPage":{"@id":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/"},"wordCount":577,"publisher":{"@id":"https:\/\/techcrunch.com\/#organization"},"image":{"@id":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/#primaryimage"},"thumbnailUrl":"https:\/\/techcrunch.com\/wp-content\/uploads\/2009\/01\/samsung_50nm_dram.jpg","keywords":["Anthropic","Claude","dram","Exclusive","inference costs"],"articleSection":["AI"],"inLanguage":"en-US","copyrightYear":"2026","copyrightHolder":{"@id":"https:\/\/techcrunch.com\/#organization"}},{"@type":"WebPage","@id":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/","url":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/","name":"Running AI models is turning into a memory game | 
TechCrunch","isPartOf":{"@id":"https:\/\/techcrunch.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/#primaryimage"},"image":{"@id":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/#primaryimage"},"thumbnailUrl":"https:\/\/techcrunch.com\/wp-content\/uploads\/2009\/01\/samsung_50nm_dram.jpg","datePublished":"2026-02-17T16:44:14+00:00","dateModified":"2026-02-17T23:20:23+00:00","description":"When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs -- but memory is an increasingly important part of the picture.","breadcrumb":{"@id":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/#primaryimage","url":"https:\/\/techcrunch.com\/wp-content\/uploads\/2009\/01\/samsung_50nm_dram.jpg","contentUrl":"https:\/\/techcrunch.com\/wp-content\/uploads\/2009\/01\/samsung_50nm_dram.jpg","width":"425","height":"266"},{"@type":"BreadcrumbList","@id":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/techcrunch.com\/"},{"@type":"ListItem","position":2,"name":"Running AI models is turning into a memory game"}]},{"@type":"WebSite","@id":"https:\/\/techcrunch.com\/#website","url":"https:\/\/techcrunch.com\/","name":"TechCrunch","description":"Startup and Technology 
News","publisher":{"@id":"https:\/\/techcrunch.com\/#organization"},"alternateName":"TC","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/techcrunch.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/techcrunch.com\/#organization","name":"TechCrunch","alternateName":"TC","url":"https:\/\/techcrunch.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techcrunch.com\/#\/schema\/logo\/image\/","url":"https:\/\/techcrunch.com\/wp-content\/uploads\/2018\/04\/tc-logo-2018-square-reverse2x.png?resize=1200,1200","contentUrl":"https:\/\/techcrunch.com\/wp-content\/uploads\/2018\/04\/tc-logo-2018-square-reverse2x.png?resize=1200,1200","width":1200,"height":1200,"caption":"TechCrunch"},"image":{"@id":"https:\/\/techcrunch.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/techcrunch","https:\/\/x.com\/TechCrunch","https:\/\/mstdn.social\/@TechCrunch","https:\/\/bsky.app\/profile\/techcrunch.com","https:\/\/www.threads.net\/@techcrunch"]},{"@type":"Person","@id":"https:\/\/techcrunch.com\/#\/schema\/person\/a29f23cab1a9fe3c68d4ccb024335871","name":"Russell Brandom","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techcrunch.com\/#\/schema\/person\/image\/4ac6e41d7218ac0ef4b5333e2235098c","url":"https:\/\/secure.gravatar.com\/avatar\/7f6ff1b797f0719f588fa9da04f01a705dfeb405dc1d880659b201c25ff6725c?s=96&d=identicon&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/7f6ff1b797f0719f588fa9da04f01a705dfeb405dc1d880659b201c25ff6725c?s=96&d=identicon&r=g","caption":"Russell 
Brandom"},"url":"https:\/\/techcrunch.com\/author\/rbrandom\/"}]}},"parsely":{"version":"1.1.0","canonical_url":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/","smart_links":{"inbound":0,"outbound":0},"traffic_boost_suggestions_count":0,"meta":{"@context":"https:\/\/schema.org","@type":"NewsArticle","headline":"Running AI models is turning into a memory game","url":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/","mainEntityOfPage":{"@type":"WebPage","@id":"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/"},"thumbnailUrl":"https:\/\/techcrunch.com\/wp-content\/uploads\/2009\/01\/samsung_50nm_dram.jpg?w=150","image":{"@type":"ImageObject","url":"https:\/\/techcrunch.com\/wp-content\/uploads\/2009\/01\/samsung_50nm_dram.jpg"},"articleSection":"AI","author":[{"@type":"Person","name":"Russell Brandom"}],"creator":["Russell Brandom"],"publisher":{"@type":"Organization","name":"TechCrunch","logo":"https:\/\/techcrunch.com\/wp-content\/uploads\/2015\/02\/cropped-cropped-favicon-gradient.png"},"keywords":["exclusive","dram","anthropic","claude","inference costs"],"dateCreated":"2026-02-17T16:44:14Z","datePublished":"2026-02-17T16:44:14Z","dateModified":"2026-02-17T23:20:23Z"},"rendered":"<meta name=\"parsely-title\" content=\"Running AI models is turning into a memory game\" \/>\n<meta name=\"parsely-link\" content=\"https:\/\/techcrunch.com\/2026\/02\/17\/running-ai-models-is-turning-into-a-memory-game\/\" \/>\n<meta name=\"parsely-type\" content=\"post\" \/>\n<meta name=\"parsely-image-url\" content=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2009\/01\/samsung_50nm_dram.jpg?w=150\" \/>\n<meta name=\"parsely-pub-date\" content=\"2026-02-17T16:44:14Z\" \/>\n<meta name=\"parsely-section\" content=\"AI\" \/>\n<meta name=\"parsely-tags\" content=\"exclusive,dram,anthropic,claude,inference costs\" \/>\n<meta name=\"parsely-author\" 
content=\"Russell Brandom\" \/>","tracker_url":"https:\/\/cdn.parsely.com\/keys\/techcrunch.com\/p.js"},"jetpack_featured_media_url":"https:\/\/techcrunch.com\/wp-content\/uploads\/2009\/01\/samsung_50nm_dram.jpg","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/techcrunch.com\/wp-json\/wp\/v2\/posts\/3093429","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techcrunch.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techcrunch.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techcrunch.com\/wp-json\/wp\/v2\/users\/133574702"}],"replies":[{"embeddable":true,"href":"https:\/\/techcrunch.com\/wp-json\/wp\/v2\/comments?post=3093429"}],"version-history":[{"count":7,"href":"https:\/\/techcrunch.com\/wp-json\/wp\/v2\/posts\/3093429\/revisions"}],"predecessor-version":[{"id":3093798,"href":"https:\/\/techcrunch.com\/wp-json\/wp\/v2\/posts\/3093429\/revisions\/3093798"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techcrunch.com\/wp-json\/wp\/v2\/media\/389168"}],"wp:attachment":[{"href":"https:\/\/techcrunch.com\/wp-json\/wp\/v2\/media?parent=3093429"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techcrunch.com\/wp-json\/wp\/v2\/categories?post=3093429"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techcrunch.com\/wp-json\/wp\/v2\/tags?post=3093429"},{"taxonomy":"tc_region","embeddable":true,"href":"https:\/\/techcrunch.com\/wp-json\/wp\/v2\/tc_region?post=3093429"},{"taxonomy":"tc_event","embeddable":true,"href":"https:\/\/techcrunch.com\/wp-json\/wp\/v2\/tc_event?post=3093429"},{"taxonomy":"tc_storyline_tax","embeddable":true,"href":"https:\/\/techcrunch.com\/wp-json\/wp\/v2\/tc_storyline_tax?post=3093429"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/techcrunch.com\/wp-json\/wp\/v2\/coauthors?post=3093429"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}