Generating SVG sprites (the easy way…)

Our current approach

If you’re building user interfaces then there’s a very good chance you’re going to need some icons at some point. I’ve been building web things for 10+ years now (has introspective moment as she realises this…) and the go-to method I’ve used to manage icons has changed several times, each one aging alongside me (heads to mirror to check for wrinkles…).

In the early days, I used image sprites – carefully laying out each icon and its different size options/hover states within a single PNG file. I’d then manually craft the CSS rules to ensure each icon class used the correct dimensions and background position values to match the sprite. Tedious.

From there I discovered icon fonts. Up until a few years ago, we still used Fontello to generate and maintain our icon kits for each project. Slightly less tedious but still a somewhat manual process to update (as well as still being a slightly problematic approach).

Finally, I stumbled upon a blog post by Chris Coyier about SVG sprites a few years back. I trialled a couple of online tools for generating sprites from individual SVG files before settling on Fontastic. We started setting up a new kit on Fontastic for each new project but the process of adding new icons and re-exporting the SVG sprite after each update still felt a little too laborious.

Issues with using an online SVG Sprite tool

Using Fontastic allowed us to ditch icon fonts but the workflow still wasn’t quite right for us. Even though updating the kit and re-exporting the SVG sprite was usually a 10 minute job, it was a 10 minute job that no one wanted to do (especially as it needed doing reasonably frequently).

Another issue we faced was how to manage the logins for Fontastic. Do we create a separate login per client or just have one account with separate icon kits for each project? Either way, it just didn’t seem optimal relying on a third-party tool to manage our clients’ assets – especially in the event that a client chose to move the project in-house or to another provider. Handing over an SVG file that would need to be extracted out and reintegrated next time it needed updating would be a bit of a jerk move.

SVG Sprite Generation Options

So what are the options for generating SVG sprites? Obviously there are services like Fontastic which are basically online tools to manage your individual icon files and export as your preferred format. We used Fontastic but other services such as IcoMoon and Fontello offer similar functionality.

The more obvious (and probably more popular) option is to use a package like grunt-svg-sprite or gulp-svg-sprite as part of your Grunt or Gulp build process (Webpack probably has similar options in this vein also). This would probably be our chosen option if we worked with a consistent stack across all projects and could bake an SVG sprite task into our default setup (more on this below).

Finally, you could look to leverage a JavaScript package like svg-sprite to generate an updated version of your SVG sprite on the fly via the command line (using a script in your package.json file).

The right approach for Media Suite…

Before I describe the approach we’ve settled on, it’s probably important to outline the requirements of our solution as they may very well differ from yours (and thus not be the best solution in your case).

For a start, we still need to support Internet Explorer 11 for most of our projects. We work with a lot of government departments and organisations which still require wide-reaching browser support. This doesn’t actually affect which method we choose for generation but it does mean we need to consider how we include the SVG icons in our HTML (more on this below).

The other key factor is that our projects don’t always align with a consistent stack. The nature of the work we do means that we either don’t get to dictate the stack or we have to make a per-project decision regarding which stack is best suited to solving a particular problem (given constraints such as time/money, user base, government requirements, integrations with other software etc).

This means we sometimes work on projects using content management systems like WordPress and SilverStripe, we sometimes build bespoke software using SPAs like React, Ember and Angular or sometimes we’re putting together a custom prototype as a proof of concept.

The point I’m making is that we don’t have a one-size-fits-all approach that we can roll out with each new project. The way we systemise our workflow needs to be a bit more granular so that we can put together a recipe that improves efficiency, quality and maintainability for each individual use-case.

Our Solution (or the TL;DR)

As mentioned above, the svg-sprite package includes a CLI version which is pretty straightforward to get up and running (see the CLI usage documentation for more details).

1. Install svg-sprite package

The first step is to install the svg-sprite package in your project (either via NPM or Yarn) like so:
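The original snippet isn’t reproduced here, but a typical install command would be:

```
# with npm, saved as a dev dependency
npm install --save-dev svg-sprite

# or with yarn
yarn add --dev svg-sprite
```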

Note: if you don’t have a package.json file yet, run npm init (or yarn init) before installing svg-sprite.

2. Add a task to your package.json scripts

The goal is to keep all your individual SVG icon files stored in one directory and be able to automatically compile these into a single SVG sprite file on the fly via the command line.

We can do this with the CLI options provided by the svg-sprite package but to make things even easier, we can add an ‘update-sprite’ task to our package.json scripts. This way we don’t need to remember the config options each time we want to run it. Hooray!

svg-sprite is super configurable so you can tweak it to output things in a format that suits you. For our projects, we just want to take all the individual SVG files currently sitting in our icons directory and spit them out as an SVG sprite using the <symbol> element.

For simplicity’s sake, let’s assume the following basic file structure for these examples:
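The post’s file-structure snippet isn’t shown here; based on the description that follows, it would look something like this:

```
project/
├── package.json
└── icons/
    └── individual-icons/
        ├── arrow.svg
        ├── cat.svg
        └── heart.svg
```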

We want to combine the heart, cat and arrow SVG files into a single SVG sprite file called icons.svg and output that in the icons directory (alongside the individual-icons directory).

To do this, we add the following task to our package.json file scripts:
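The snippet itself isn’t reproduced here, but given the CLI options described next, the script would look something like this (paths follow the file structure described above):

```json
{
  "scripts": {
    "update-sprite": "svg-sprite -s --symbol-dest icons --symbol-sprite icons.svg icons/individual-icons/*.svg"
  }
}
```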

You can review the CLI options documentation for more details but here’s a quick breakdown of what each option does:

  • -s: run svg-sprite in symbol mode (several other modes are available but not what we’re after)
  • --symbol-dest: location to output the generated SVG sprite file
  • --symbol-sprite: filename for the generated SVG sprite file
  • icons/individual-icons/*.svg: the location of the individual SVG icon files

3. Output icons in index.html

To use our new SVG sprite, we should be able to add the following to our index.html file:
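The markup isn’t reproduced here, but using `<use>` with an external reference it would be along these lines (svg-sprite derives each symbol id from the source filename by default):

```html
<svg class="icon">
  <use xlink:href="icons/icons.svg#heart"></use>
</svg>
```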

The above should output the individual icon specified after the hash symbol (see Chris Coyier’s article regarding SVG <use> with External Reference for more information about including inline SVGs this way).

The only hurdle left is… IE11 🙂

4. Polyfill for Internet Explorer support

To utilise <use> with an external reference in Internet Explorer you’re going to need Jonathan Neal’s SVG for Everybody script.

First, install the package like so:
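The original snippet isn’t shown here; a typical install would be:

```
npm install --save svg4everybody
```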

Then include and initialise it in your HTML like so:
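The original snippet isn’t shown; based on the svg4everybody README, inclusion and initialisation look roughly like this (the script path is an assumption and depends on your setup):

```html
<script src="node_modules/svg4everybody/dist/svg4everybody.min.js"></script>
<script>
  // polyfill external <use> references for IE
  svg4everybody();
</script>
```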

That’s it – your icons should be working in all major browsers now!

Taking it further

That’s the most basic implementation using svg-sprite but you can definitely extend the functionality a bit further to suit your needs. As mentioned above, svg-sprite supports other modes of sprite output if inline embedding using an external reference doesn’t tickle your fancy.

We also have the ability to output an example HTML file to preview all the icons included in the generated sprite. This can be really useful for checking for any malformed SVG that isn’t rendering correctly etc. To generate this, you can add the following script to your package.json:
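The script isn’t reproduced here; svg-sprite’s symbol mode has an example-rendering option, so it would plausibly look like this (the --symbol-example flag is per the CLI docs):

```json
{
  "scripts": {
    "update-sprite-example": "svg-sprite -s --symbol-dest icons --symbol-sprite icons.svg --symbol-example icons/individual-icons/*.svg"
  }
}
```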


If you want to view it in the browser easily, you could add an additional script to serve it up at localhost:8000 (note: if you’re not on OSX or Linux then you may need to install Python first, and SimpleHTTPServer is the Python 2 module name – on Python 3 the equivalent is http.server).
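A serving script along these lines would do it (assuming Python 2’s SimpleHTTPServer, as the note suggests):

```json
{
  "scripts": {
    "serve-icons": "cd icons && python -m SimpleHTTPServer 8000"
  }
}
```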


Those are just a couple of ideas but you could also push it a bit further to automatically generate an array of icon names that you can utilise within your JavaScript app or Storybook (something on my todo list but not yet attempted).

You can find all code examples shown above in my Easy Peasy SVGeezy repo.

NLP – Natural Language Processing

More than 80% of data is recorded in an unstructured format. Unstructured data will continue to increase with the prominence of the internet of things. Data is being generated from audio, social media communications (like posts, tweets, messages on WhatsApp etc) and in various other forms. The majority of this data exists in text form, which is highly unstructured in nature.

In order to produce significant and actionable insights from text data, it is important to get acquainted with the techniques and principles of Natural Language Processing (NLP).

What is NLP?

Massive amounts of data are stored as natural language text, but much of it is not directly machine-understandable. The ability to process and analyse this data has huge scope in our current business and day-to-day lives. However, processing this kind of unstructured data is a complex task.

Natural Language Processing (NLP) techniques provide the basis for mobilising this massive amount of data and making it useful for further processing.

A few examples of NLP that people use every day are:

  • Spell check
  • Autocomplete
  • Voice text messaging
  • Spam filters
  • Related keywords on search engines
  • Siri, Alexa, or Google Assistant

Natural language processing tasks

  1. Tokenisation: This task splits the text roughly into words, depending on the language being used. These tokens help in understanding the context or developing the model for the NLP task. Tokenisation helps in interpreting the meaning of the text by analysing the sequence of the words.
  2. Sentence boundary detection: This task involves splitting text into sub-sentences by a predefined boundary. This boundary could be a new line or a regular expression that matches something to be treated as a sentence boundary (e.g. ., !, ?, <p>, <html> etc.), or you may specify the whole document to be treated as a single sentence.
  3. Shallow parsing: This task involves classifying words within a sentence (nouns, verbs, adjectives, etc.) and linking them to higher-order units that have discrete grammatical meanings (noun groups or phrases, verb groups, etc.).
  4. Stemming and lemmatisation: For grammatical reasons, documents use different forms of a word, such as write, writes and writing. Additionally, there are families of derivationally related words with similar meanings, such as democracy, democratic and democratisation. In many situations, it would be useful if a search for one of these words returned documents that contain another word in the set. For example, when searching for ‘democracy’, results containing the word ‘democratic’ are also returned. The goal of both stemming and lemmatisation is to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form.
  5. Named-entity recognition: This task automatically identifies named entities within raw text and classifies them into predetermined categories, like people, organisations, email addresses, locations, values, etc.
  6. Summarisation: The art of creating short, coherent and accurate content from vast knowledge sources, such as articles, documents, blogs or social media. For example, generating short summaries of news articles for a news feed.
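To make the first few tasks concrete, here is a deliberately naive, standard-library-only sketch of sentence boundary detection, tokenisation and stemming (no NLTK; the function names are illustrative, and a real stemmer such as Porter’s handles far more cases):

```python
import re

def sentences(text):
    # Naive sentence boundary detection: split on ., ! or ? followed by whitespace
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def tokenise(sentence):
    # Naive tokenisation: lowercase runs of letters/apostrophes only
    return re.findall(r"[a-z']+", sentence.lower())

def stem(word):
    # Crude suffix stripping; a real stemmer is far more careful
    for suffix in ('ing', 'es', 's'):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

text = "She writes daily. Writing helps! Does she write well?"
for s in sentences(text):
    print([stem(t) for t in tokenise(s)])
```

Note how writes and writing both reduce to the same stem here, which is exactly what makes search over inflected forms possible.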

Python for Natural Language Processing (NLP)

There are many things about Python that make it a really good programming language choice for an NLP project. Its simple syntax and transparent semantics make it an excellent choice for projects that include Natural Language Processing tasks. Moreover, developers enjoy excellent support for integration with other languages and tools, which comes in handy for techniques like machine learning.

But there’s something else about this versatile language that makes it such a great technology for helping machines process natural language: an extensive collection of NLP tools and libraries that let developers handle a great number of NLP-related tasks. It also has a shallow learning curve and good string-handling functionality.

When it comes to natural language processing, Python is a top technology. Developing software that can handle natural languages in the context of artificial intelligence can be challenging. But thanks to this extensive toolkit and Python NLP libraries, developers get all the support they need while building amazing tools.

The following six libraries make it a top choice for any project that relies on machine understanding of unstructured data.

1. Natural Language Toolkit (NLTK)

Link: https://www.nltk.org/

Info: Its modularised structure makes it excellent for learning and exploring NLP concepts.

2. TextBlob

Link: https://textblob.readthedocs.io/en/dev/

Info: TextBlob is built on top of NLTK, and it’s more easily accessible. This is one of the preferred libraries for fast-prototyping or building applications that don’t require highly optimised performance.

3. CoreNLP

Link: https://stanfordnlp.github.io/CoreNLP/

Info: CoreNLP is a Java library with Python wrappers. It’s used in many existing production systems due to its speed.

4. Gensim

Link: https://github.com/RaRe-Technologies/gensim

Info: Gensim is most commonly used for topic modelling and similarity detection. It’s not a general-purpose NLP library, but for the tasks it does handle, it does them well.

5. spaCy

Link: https://spacy.io/

Info: SpaCy is a new NLP library that’s designed to be fast, streamlined, and production-ready. It’s not as widely adopted, but if you’re building a new application, you should give it a try.

6. scikit-learn

Link: https://scikit-learn.org/

Info: Simple and efficient tools for predictive data analysis. Built on NumPy, SciPy, and matplotlib.

 

Why Python?

Python is a simple yet powerful programming language with excellent functionality for processing linguistic data. It is heavily used in industry, scientific research and education around the world, and is often praised for the way it facilitates productivity, quality and maintainability of software. A collection of Python success stories is posted at Python Success Stories.

NLTK defines an infrastructure that can be used to build NLP programs in Python. It provides basic classes for representing data relevant to natural language processing; standard interfaces for performing tasks such as part-of-speech tagging, syntactic parsing, and text classification; and standard implementations for each task that can be combined to solve complex problems.

Word count program using NLTK python

Install the following packages using pip:
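The exact pip command isn’t shown here; given the libraries used in the following steps, it would likely be:

```
pip install nltk beautifulsoup4 matplotlib
```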

First, Import necessary modules:

Second, we will grab a webpage and analyse the text. 

The urllib module will help us crawl the webpage:

Next, we will use the BeautifulSoup library to pull data out of HTML and XML files and clean the text of HTML tags.

At this point we have clean text from the HTML page; now we convert the text to tokens.

The next pieces of code will remove all stop words and count the word frequency.

In the final part, we will display the top 50 most common words with their counts and plot the data on a graph.
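The individual snippets from the original post aren’t reproduced above. As a substitute, here is a standard-library-only sketch of the same pipeline (the post itself uses BeautifulSoup for tag stripping, NLTK for tokenisation/stop words/FreqDist, and matplotlib for the plot; `TextExtractor`, `top_words` and the tiny stop-word set are illustrative):

```python
import re
from collections import Counter
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text content from HTML, skipping <script> and <style> blocks."""
    def __init__(self):
        super().__init__()
        self.parts, self.skip = [], 0
    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self.skip += 1
    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self.skip:
            self.skip -= 1
    def handle_data(self, data):
        if not self.skip:
            self.parts.append(data)

# A tiny stop-word subset for the demo; NLTK's stopwords corpus is far fuller
STOPWORDS = {'the', 'a', 'an', 'and', 'or', 'of', 'to', 'in', 'is', 'it'}

def top_words(html, n=50):
    """Strip tags, tokenise, drop stop words and return the n most common words."""
    parser = TextExtractor()
    parser.feed(html)
    tokens = re.findall(r"[a-z']+", ' '.join(parser.parts).lower())
    return Counter(t for t in tokens if t not in STOPWORDS).most_common(n)

# For a live page you would fetch html with urllib.request.urlopen(url).read()
html = "<html><body><p>The cat sat and the cat slept.</p></body></html>"
print(top_words(html, 3))  # [('cat', 2), ('sat', 1), ('slept', 1)]
```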

Sample Output:

Pros and cons of NLTK

Pros:

  • Most well-known and most complete NLP library, with many third-party extensions
  • Supports the largest number of languages compared to other libraries

Cons:

  • Difficult to learn
  • Slow
  • Only splits text by sentences, without analysing the semantic structure
  • No neural network models