{ "type": [ "h-entry" ], "properties": { "name": [ "Offline listings" ], "published": [ "2019-09-19T20:00:03Z" ], "category": [ "offline", "cache" ], "bookmark-of": [ { "type": [ "h-cite" ], "properties": { "url": [ "https://remysharp.com/2019/09/05/offline-listings" ], "name": [ "Offline listings" ], "uid": [ "https://remysharp.com/2019/09/05/offline-listings" ], "published": [ "2019-09-05 00:00:00" ], "content": [ { "value": "
The other week I finally pushed full offline access to my blog. I'd taken a lot of inspiration from the service worker on Jeremy Keith's blog.
\nOne defining feature I wanted to support was that if you were offline and visited a page that isn't cached, you'd be shown a list of the recent blog posts you had visited.
\nIf you're a regular visitor of this blog then my service worker (only deployed in the last few weeks) will collect those posts you visit in a dedicated cache. If you then try to visit a URL that hasn't been cached, say a post or page like popular posts (and so on) you'll be presented with a page saying that the page isn't available offline but you can re-visit an existing post:
\n\nIn the service worker this is handled by the following lines:
\nself.addEventListener('fetch', event => {\n /* this logic is trimmed down for brevity */\n const { request } = event;\n\n // only for HTML based requests\n if (request.headers.get('Accept').includes('text/html')) {\n event.respondWith(\n fetch(request) // try the network first method\n .then(response => {\n // if we have a 200 success, cache the result\n // in a cache called \"v1/pages\"\n if (response.status === 200) {\n caches\n .open('v1/pages')\n .then(cache => cache.put(request, response.clone()));\n }\n return response;\n }).catch(() => {\n // the catch fires if we're offline, so first we try the\n // cache for a match, and if `response` is empty (or null)\n // return the `/offline` page instead.\n return caches\n .match(request)\n .then(response => response || caches.match('/offline'));\n\n }) // end fetch\n ); // end event.respondWith\n return;\n }\n})\n
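\nOne gotcha worth flagging (my note, not from the original post): headers.get('Accept') returns null when a request carries no Accept header, which would make the .includes() call above throw. A small defensive helper along these lines avoids that:

```javascript
// Defensive variant of the Accept-header check used above:
// headers.get() returns null when the header is absent, so fall
// back to an empty string before calling includes().
function wantsHTML(request) {
  const accept = request.headers.get('Accept') || '';
  return accept.includes('text/html');
}
```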
\nHowever, the interesting part is how we retrieve the recently visited posts.
\nWhen I'd chatted to Jeremy about his offline/recently visited page, I realised that since the cache API only stores requests and responses, the metadata required for a history page (such as the post title) would have to be stored elsewhere. Jeremy (IIRC) stores his metadata in localStorage.
When I took my first stab at an implementation I used IndexedDB (along with Jake Archibald's idb-keyval script). That meant each page you visit had to include the metadata about the post, which added a little more complexity to the problem.
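For reference, that abandoned metadata-store approach looked roughly like this. This is a sketch of the idea, not the post's actual code: set/get/keys are the real idb-keyval API, but the record shape and keying are my own invention:

```javascript
// Sketch of the abandoned approach: each visited page stores its own
// metadata record in IndexedDB (via idb-keyval), keyed by pathname.
// The record shape here is hypothetical.
function buildPostRecord(url, title, publishedISO) {
  return {
    url,
    title,
    published: publishedISO,
    visited: new Date().toISOString(),
  };
}

// In each page, something like:
//   import { set, keys, get } from 'idb-keyval';
//   set(location.pathname,
//       buildPostRecord(location.href, document.title, postDate));
// ...and the /offline page would iterate keys() + get() to rebuild the
// listing. The metadata is duplicated alongside the cache, which is
// exactly the extra moving part the realisation below removes.
```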
\nUntil I realised I didn't need to store anything: HTML is the API.
\nInstead of capturing metadata separately, my posts themselves, in the markup, include all the metadata about the post. So here's the logic without any additional store:
\n1. Open the v1/pages cache.
\n2. List the keys of the cache (each key is a request, so the URL is available as request.url).
\n3. Read each cached page's body as text: await cache.match(request).then(res => res.text()).
\n4. Pull the post title out with a regex: <title>(.*)</title>.
\n5. Read the published date from the text of the <time> tag.
\nIf you're concerned that using a regex is brittle, the HTML could be put inside a DOM parser and queried out again. You can see that idea in action here (open the browser console) using code such as:
\nconst p = new DOMParser();\nconst dom = p.parseFromString(html, 'text/html');\nconsole.log(dom.querySelector('time').getAttribute('datetime'));\n
\nFor my offline listings code, the actual code looks like this:
\nasync function listPages() {\n // since my cache names are versioned, look for the one that\n // includes \"/posts\"\n const cacheNames = await caches.keys();\n\n // results is recently visited blog posts\n const results = [];\n\n for (const name of cacheNames) {\n if (name.includes('/posts')) {\n const cache = await caches.open(name);\n\n // get a list of all the entries (keys are requests)\n for (const request of await cache.keys()) {\n const url = request.url;\n\n // this regex gets both the publish date of the post,\n // but also ensures the URL is a blog post\n const match = url.match(/\/(\d{4})\/(\d{2})\/(\d{2})\//);\n\n if (match) {\n const response = await cache.match(request);\n\n // capture the plain text HTML\n const body = await response.text();\n\n // regex for the title of the post\n const title = body.match(/<title>(.*)<\/title>/)[1];\n results.push({\n url,\n response,\n title,\n // published date is from the URL\n published: new Date(match.slice(1).join('-')),\n // last visited is the `date` prop in the response header\n visited: new Date(response.headers.get('date'))\n });\n }\n }\n }\n }\n\n // now display the results\n if (results.length) {\n // sort the results, map each result to an <li> tag and put\n // in the `ul#offline-posts` element\n document.querySelector('ul#offline-posts').innerHTML = results\n .sort((a, b) => a.published.toJSON() < b.published.toJSON() ? 1 : -1)\n .map(res => {\n // results in:\n // <li><a href=\"…\">[ Title ] <small>[pubDate] (visited X days ago)</small></a></li>\n let html = `<li><a href=\"${res.url}\">${\n res.title\n }</a> <small class=\"date\">${formatDate(\n res.published\n )} <span title=\"${res.visited.toString()}\">(visited ${daysAgo(\n res.visited\n )})</span></small></li>`;\n return html;\n })\n .join('\n');\n }\n}\n
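\nlistPages() calls two helpers, formatDate and daysAgo, that aren't shown in the post. A minimal sketch of what they might look like, my guess at their behaviour rather than the actual implementations:

```javascript
// Hypothetical stand-ins for the helpers referenced by listPages().

// formatDate: render a Date as e.g. "5 Sep 2019".
function formatDate(date) {
  const months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                  'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];
  return `${date.getDate()} ${months[date.getMonth()]} ${date.getFullYear()}`;
}

// daysAgo: whole days between `date` and now, as human-friendly text.
// `now` is injectable to keep the function testable.
function daysAgo(date, now = new Date()) {
  const days = Math.floor((now - date) / 86400000); // 86400000 ms per day
  if (days <= 0) return 'today';
  return days === 1 ? '1 day ago' : `${days} days ago`;
}
```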
\nThe /offline page is going to do a bit of JavaScript, scraping text out of cached pages to show you recently browsed results. At first I felt like this may be a lot of work for the browser to be doing, but since it only happens in exceptional circumstances, and in reality it takes a handful of milliseconds, the improved user experience is worth this (relatively) small hit.
Oh, and as it happens, this page is now in your recently visited list :)
", "html": "The other week I finally pushed full offline access to my blog. I'd taken a lot of inspiration from the service worker on Jeremy Keith's blog.
\nOne defining feature I wanted to support was that if you were offline and visited a page that isn't cached, you'd be shown a list of the recent blog posts you had visited.
\nIf you're a regular visitor of this blog then my service worker (only deployed in the last few weeks) will collect those posts you visit in a dedicated cache. If you then try to visit a URL that hasn't been cached, say a post or page like popular posts (and so on) you'll be presented with a page saying that the page isn't available offline but you can re-visit an existing post:
\n\nIn the service worker this is handled by the following lines:
\nself.addEventListener('fetch', event => {\n /* this logic is trimmed down for brevity */\n const { request } = event;\n\n // only for HTML based requests\n if (request.headers.get('Accept').includes('text/html')) {\n event.respondWith(\n fetch(request) // try the network first method\n .then(response => {\n // if we have a 200 success, cache the result\n // in a cache called \"v1/pages\"\n if (response.status === 200) {\n caches\n .open('v1/pages')\n .then(cache => cache.put(request, response.clone()));\n }\n return response;\n }).catch(() => {\n // the catch fires if we're offline, so first we try the\n // cache for a match, and if `response` is empty (or null)\n // return the `/offline` page instead.\n return caches\n .match(request)\n .then(response => response || caches.match('/offline'));\n\n }) // end fetch\n ); // end event.respondWith\n return;\n }\n})\n
\nHowever, the interesting part is how we retrieve the recently visited posts.
\nWhen I'd chatted to Jeremy about his offline/recently visited page, I realised that since the cache API only stores requests and responses, the metadata required for a history page (such as the post title) would have to be stored elsewhere. Jeremy (IIRC) stores his metadata in localStorage.
When I took my first stab at an implementation I used IndexedDB (along with Jake Archibald's idb-keyval script). That meant each page you visit had to include the metadata about the post, which added a little more complexity to the problem.
\nUntil I realised I didn't need to store anything: HTML is the API.
\nInstead of capturing metadata separately, my posts themselves, in the markup, include all the metadata about the post. So here's the logic without any additional store:
\n1. Open the v1/pages cache.
\n2. List the keys of the cache (each key is a request, so the URL is available as request.url).
\n3. Read each cached page's body as text: await cache.match(request).then(res => res.text()).
\n4. Pull the post title out with a regex: <title>(.*)</title>.
\n5. Read the published date from the text of the <time> tag.
\nIf you're concerned that using a regex is brittle, the HTML could be put inside a DOM parser and queried out again. You can see that idea in action here (open the browser console) using code such as:
\nconst p = new DOMParser();\nconst dom = p.parseFromString(html, 'text/html');\nconsole.log(dom.querySelector('time').getAttribute('datetime'));\n
\nFor my offline listings code, the actual code looks like this:
\nasync function listPages() {\n // since my cache names are versioned, look for the one that\n // includes \"/posts\"\n const cacheNames = await caches.keys();\n\n // results is recently visited blog posts\n const results = [];\n\n for (const name of cacheNames) {\n if (name.includes('/posts')) {\n const cache = await caches.open(name);\n\n // get a list of all the entries (keys are requests)\n for (const request of await cache.keys()) {\n const url = request.url;\n\n // this regex gets both the publish date of the post,\n // but also ensures the URL is a blog post\n const match = url.match(/\\/(\\d{4})\\/(\\d{2})\\/(\\d{2})\\//);\n\n if (match) {\n const response = await cache.match(request);\n\n // capture the plain text HTML\n const body = await response.text();\n\n // regex for the title of the post\n const title = body.match(/<title>(.*)<\\/title>/)[1];\n results.push({\n url,\n response,\n title,\n // published date is from the URL\n published: new Date(match.slice(1).join('-')),\n // last visited is the `date` prop in the response header\n visited: new Date(response.headers.get('date'))\n });\n }\n }\n }\n }\n\n // now display the results\n if (results.length) {\n // sort the results, map each result to an <li> tag and put\n // in the `ul#offline-posts` element\n document.querySelector('ul#offline-posts').innerHTML = results\n .sort((a, b) => a.published.toJSON() < b.published.toJSON() ? 1 : -1)\n .map(res => {\n // results in:\n // <li><a href=\"…\">[ Title ] <small>[pubDate] (visited X days ago)</small></a></li>\n let html = `<li><a href=\"${res.url}\">${\n res.title\n }</a> <small class=\"date\">${formatDate(\n res.published\n )} <span title=\"${res.visited.toString()}\">(visited ${daysAgo(\n res.visited\n )})</span></small></li>`;\n return html;\n })\n .join('\\n');\n }\n}\n
\nThe /offline page is going to do a bit of JavaScript, scraping text out of cached pages to show you recently browsed results. At first I felt like this may be a lot of work for the browser to be doing, but since it only happens in exceptional circumstances, and in reality it takes a handful of milliseconds, the improved user experience is worth this (relatively) small hit.
Oh, and as it happens, this page is now in your recently visited list :)
" } ] } } ], "content": [ "Caching pages for offline use with Service Workers" ] } }