How much code does it take to format a telephone number?

It’s been a formatting day today.

Not a “font faces, and whether to make something bold, red, and far too large for the paragraph” sort of format, but more a “how do I change a phone number into an international format that TAPI or an SMS app can understand” type of format.

Is it even possible?

Look at how many formats exist in the world. Different lengths of numbers. Varying area codes. Some area codes in brackets. Some numbers separated with dashes. Some users including the country code, some not. Not to mention those good old users who insist on tacking “(home)” after the number, or sticking a backslash between the country and area codes for the sheer hell of it.

Well, as to “is it possible”, yes, this type of challenge is workable. It’s just about the puzzling out and how much time that takes. As a home project with a year or so to develop, it would have been great. As a request for a client who wanted it yesterday, let’s say the clock was ticking. So onto Plan B.

Which, if you’re me, starts with Google.

Here’s what I found.

Google’s handy Java, C++, and JavaScript phone number handling library for parsing, formatting, storing, and validating international phone numbers:

https://code.google.com/p/libphonenumber/

An even more handy implementation of the above converted into C# by Beat Kiener:

http://blog.thekieners.com/2011/06/06/using-googles-libphonenumber-in-microsoft-net-with-c/

But the magic part, the part that really helped me get 98% of the numbers I threw at it into the format I wanted, was Beat Kiener’s own converter in VB.NET.

The main method I made use of was “ConvertToUnformattedPhoneNumber”, which uses various regular expressions to remove dodgy characters (such as hyphens, backslashes, and the good old “(home)”). Here it is as C#:

public string ConvertToUnformattedPhoneNumber(string phoneNumber)
{
    string regexSubPattern = String.Empty;
    Regex regex;
    Match m;
    bool national = false;

    try
    {
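	// Nothing to convert; without this check the regex calls below would
	// throw on a null input
	if (phoneNumber == null)
	{
	    return null;
	}

	// Remove all spaces
	phoneNumber = phoneNumber.Replace(" ", "");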

	for (int i = 0; i < 10; i++)
	{
	    if (i.ToString() != this.NetworkCarrierCode)
	    {
		// Build a character-class body such as "123456789". No "|"
		// separator is needed: inside [...] it would match a literal "|".
		regexSubPattern += i.ToString();
	    }
	}

	// Check if number starts with the network carrier code to identify if
	// it needs the country code adding
	regex = new Regex("^[" + this.NetworkCarrierCode
	    + "][" + regexSubPattern + "].*");
	if (regex.IsMatch(phoneNumber))
	{
	    phoneNumber = "+" + this.CountryCode + phoneNumber.Substring(1);
	}

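	// Check if the number starts with the international dialling prefix
	// (e.g. "00") and change it to a "+". The prefix is escaped and matched
	// literally; inside a character class "00" would match any single "0".
	regex = new Regex("^" + Regex.Escape(this.InternationalCarrierCode));
	if (regex.IsMatch(phoneNumber))
	{
	    phoneNumber = "+" + phoneNumber
		.Substring(this.InternationalCarrierCode.Length);
	}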

	// Remove existing brackets
	regex = new Regex(@"(\[|\()(\d)*(\]|\))");
	m = regex.Match(phoneNumber);
	while (m.Success)
	{
	    phoneNumber = phoneNumber.Remove(m.Index, m.Length);
	    m = regex.Match(phoneNumber);
	}

	// Now remove the remaining non-digit characters. Despite its name,
	// "national" records whether the number starts with "+" (that is,
	// whether it is already in international form), so the "+" can be
	// put back after stripping.
	national = phoneNumber.StartsWith("+");
	phoneNumber = Regex.Replace(phoneNumber, @"\D", "");
	if (national)
	    phoneNumber = "+" + phoneNumber;
    }
    // Swallow any exception and return the number as processed so far
    catch { }

    return phoneNumber;
}

It is far from perfect: the method removes any bracketed digits, even in the middle of a phone number, so a number formatted as +353 (086) 1234567 loses the entire “(086)” and ends up invalid. Even so, it seems like a fantastic start to me, and I wanted to share Beat Kiener’s brilliant solution in case it helps anyone else on the rocky path of telephone number formatting.
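
For what it’s worth, here is one way the bracket problem might be tackled. This is a rough sketch of my own, not part of Beat Kiener’s code: it unwraps bracketed digits instead of deleting them, and drops the leading trunk digit from a bracketed group when the number is already international. The class name PhoneNumberSketch, the method ToInternational, and its parameters are all made up for illustration, and the “drop the trunk prefix inside brackets” rule is an assumption about what people mean when they type +353 (086).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class PhoneNumberSketch
{
    // A bracketed run of digits such as "(086)" or "[0]".
    private static readonly Regex BracketedDigits = new Regex(@"[\[(](\d*)[\])]");
    private static readonly Regex NonDigits = new Regex(@"\D");

    public static string ToInternational(string input, string countryCode,
        string trunkPrefix, string internationalPrefix)
    {
        if (string.IsNullOrWhiteSpace(input))
        {
            return input;
        }

        string number = input.Replace(" ", "");

        // Treat a leading international dialling prefix (e.g. "00") like "+".
        if (internationalPrefix.Length > 0
            && number.StartsWith(internationalPrefix, StringComparison.Ordinal))
        {
            number = "+" + number.Substring(internationalPrefix.Length);
        }

        bool international = number.StartsWith("+", StringComparison.Ordinal);

        // Unwrap bracketed digits rather than deleting them. Assumption: in an
        // international number, a trunk prefix inside brackets should be dropped.
        number = BracketedDigits.Replace(number, m =>
        {
            string digits = m.Groups[1].Value;
            bool dropTrunk = international
                && trunkPrefix.Length > 0
                && digits.StartsWith(trunkPrefix, StringComparison.Ordinal);
            return dropTrunk ? digits.Substring(trunkPrefix.Length) : digits;
        });

        // Strip everything else that is not a digit, "(home)" included.
        number = NonDigits.Replace(number, "");
        if (number.Length == 0)
        {
            return input; // no digits at all, so leave the input untouched
        }

        if (!international)
        {
            // National number: drop the trunk prefix and add the country code.
            if (trunkPrefix.Length > 0
                && number.StartsWith(trunkPrefix, StringComparison.Ordinal))
            {
                number = number.Substring(trunkPrefix.Length);
            }
            number = countryCode + number;
        }

        return "+" + number;
    }
}
```

Called as PhoneNumberSketch.ToInternational("+353 (086) 1234567", "353", "0", "00"), this keeps the area code and gives +353861234567. It will still trip over plenty of real-world input, so treat it as a starting point rather than a fix.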