Change Your Formatted XML’s Encoding

Applies To: .NET (C#, VB.NET)

In my previous post, Prettify Your XML in .NET, I showed a method for taking some XML and making it pretty (indentation, new lines, etc.). Using the method also produced the XML Declaration node for us. Unfortunately, because strings are UTF-16 encoded in .NET, the XML Declaration node generated by this method always lists the encoding as “utf-16”, which may not be what we want.
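To see where the “utf-16” comes from, here is a minimal sketch that prints the encoding a plain StringWriter reports. The module name EncodingCheck is just a placeholder for illustration; the XML writer picks up this same Encoding property when it writes the declaration.

```vbnet
Imports System
Imports System.IO

Module EncodingCheck
    Sub Main()
        ' A StringWriter writes into a .NET String, so it reports UTF-16.
        Using sw As New StringWriter()
            Console.WriteLine(sw.Encoding.WebName) ' prints "utf-16"
        End Using
    End Sub
End Module
```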

Here are the results of the previous post’s prettified XML:

<?xml version="1.0" encoding="utf-16"?>
<TMNT>
    <Turtles>
        <Turtle Name="Leonardo" Color="Blue" Weapon="Katana" />
        <Turtle Name="Raphael" Color="Red" Weapon="Sai" />
        <Turtle Name="Michelangelo" Color="Orange" Weapon="Nunchaku" />
        <Turtle Name="Donatello" Color="Purple" Weapon="Bo" />
    </Turtles>
</TMNT>

As mentioned, you can see that encoding=“utf-16”. But what if you want something else (most likely UTF-8)? Well, there are several ways you can do it with Streams, XmlWriter and XmlWriterSettings objects and other junk, but you can also use a neat little method I found on Project 20 which involves subclassing the StringWriter class. (This idea originally comes from Jon Skeet.)
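For comparison, the stream-based route might look roughly like the sketch below. This is only one way to do it, not the code from the previous post: the module and function names (StreamBasedPrettyXml, PrettyXmlViaStream) are made up for illustration. It writes the document to a MemoryStream as UTF-8 bytes and then decodes those bytes back into a string.

```vbnet
Imports System.IO
Imports System.Text
Imports System.Xml

Module StreamBasedPrettyXml
    ' Hypothetical helper name; not part of the original post.
    Function PrettyXmlViaStream(xmlString As String) As String
        Dim settings As New XmlWriterSettings()
        settings.Indent = True
        settings.IndentChars = "    " ' four spaces, matching the earlier output
        ' False = do not emit a byte order mark at the start of the stream
        settings.Encoding = New UTF8Encoding(False)

        Dim doc As New XmlDocument()
        doc.LoadXml(xmlString)

        Using ms As New MemoryStream()
            ' The writer must be closed before the stream is read,
            ' so that everything it buffered has been flushed.
            Using xw As XmlWriter = XmlWriter.Create(ms, settings)
                doc.Save(xw)
            End Using
            ' Decode the UTF-8 bytes back into a .NET String
            Return Encoding.UTF8.GetString(ms.ToArray())
        End Using
    End Function
End Module
```

It works, but it takes a stream, a settings object and a byte-to-string round trip just to change one attribute, which is why the StringWriter subclass is appealing.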

So, just add a new class to your project and call it StringWriterWithEncoding or something similar and override the Encoding property. Here is the entire class:

Public Class StringWriterWithEncoding
    Inherits IO.StringWriter

    Private _encoding As System.Text.Encoding

    Public Sub New(encoding As System.Text.Encoding)
        MyBase.New()
        _encoding = encoding
    End Sub

    Public Sub New(encoding As System.Text.Encoding, formatProvider As IFormatProvider)
        MyBase.New(formatProvider)
        _encoding = encoding
    End Sub

    Public Sub New(encoding As System.Text.Encoding, sb As System.Text.StringBuilder)
        MyBase.New(sb)
        _encoding = encoding
    End Sub

    Public Sub New(encoding As System.Text.Encoding, sb As System.Text.StringBuilder, formatProvider As IFormatProvider)
        MyBase.New(sb, formatProvider)
        _encoding = encoding
    End Sub

    Public Overrides ReadOnly Property Encoding As System.Text.Encoding
        Get
            Return _encoding
        End Get
    End Property

End Class

So all we’ve really done is provide constructors that allow us to specify the encoding the StringWriter object should report. Then we’ve overridden the Encoding property to always return the value specified in the constructor. The result is that the StringWriter reports our encoding, and the XML writer uses that value when it writes the declaration. So then we can take the PrettyXML code and swap the StringWriter object creation to a StringWriterWithEncoding like so:

    ' Requires Imports System.Xml at the top of the file.
    Private Function PrettyXML(XMLString As String) As String
        ' The writer's Encoding property decides what the XML declaration says
        Dim sw As New StringWriterWithEncoding(System.Text.Encoding.UTF8)
        Dim xw As New XmlTextWriter(sw)
        xw.Formatting = Formatting.Indented
        xw.Indentation = 4
        Dim doc As New XmlDocument
        doc.LoadXml(XMLString)
        doc.Save(xw)
        Return sw.ToString()
    End Function

Then when we run our XML through it we get the results we wanted:

<?xml version="1.0" encoding="utf-8"?>
<TMNT>
    <Turtles>
        <Turtle Name="Leonardo" Color="Blue" Weapon="Katana" />
        <Turtle Name="Raphael" Color="Red" Weapon="Sai" />
        <Turtle Name="Michelangelo" Color="Orange" Weapon="Nunchaku" />
        <Turtle Name="Donatello" Color="Purple" Weapon="Bo" />
    </Turtles>
</TMNT>
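One thing to keep in mind: the string returned by PrettyXML is still an ordinary .NET string (UTF-16 in memory). Only the declaration text has changed. If you write it to a file or send it over the wire, encode the bytes to match what the declaration says. Here is a minimal sketch, assuming you want to save to disk; SaveAsUtf8 and filePath are placeholder names of my own.

```vbnet
Imports System.IO
Imports System.Text

Module SavePrettyXml
    ' Hypothetical helper; the name and parameters are placeholders.
    Sub SaveAsUtf8(prettyXml As String, filePath As String)
        ' UTF8Encoding(False) writes UTF-8 bytes without a byte order mark,
        ' so the bytes on disk agree with encoding="utf-8" in the declaration.
        File.WriteAllText(filePath, prettyXml, New UTF8Encoding(False))
    End Sub
End Module
```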
