Wednesday, June 6, 2012

Refactoring C to C++ Part 2 - Strings, Strings, and More Strings

The previous entry in this series was a general info dump on a single converted class. This time a more general topic will be examined: string usage in C++.

One large improvement in C++ coding over C is in the area of strings. With C, a string is just a raw pointer into memory that should hold a null-terminated sequence of valid characters. In practice there end up being many ways that problems with C strings can creep in:

  • the final zero-byte null terminator might be missed during creation.
  • some common library functions will ensure null termination, while others do not.
  • to determine the length of a string, the entire string needs to be walked until the terminator is found.
  • resizing and appending to strings can be complex multistage operations with many potential failure points.
  • resizing a string most often invalidates the existing pointer.
  • tracking different character encodings can be difficult.
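
A couple of the problems above can be sketched in a few lines. This is only an illustration, not code from any real project: the helper names copy_truncated and join_words are made up for the example. The first shows the manual termination that strncpy() leaves to the caller, and the second shows the same kind of work being handled by std::string.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// C style: strncpy() does NOT write a terminator when the source fills
// the destination, so the caller has to remember to add one.
inline void copy_truncated(char *dest, std::size_t destSize, char const *src)
{
    if (destSize == 0) {
        return;
    }
    std::strncpy(dest, src, destSize - 1);
    dest[destSize - 1] = '\0'; // easy to forget; without it strlen() can run off the end
}

// C++ style: std::string stores its length and handles any reallocation
// needed while appending.
inline std::string join_words(std::string const &first, std::string const &second)
{
    std::string result = first;
    result += ' ';
    result += second;
    return result;
}
```

With a four-byte buffer, copy_truncated keeps only the first three characters plus the terminator, while join_words never needs the caller to think about buffer sizes at all.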

In C++, strings are generally represented by the standard class std::string. However, that still does not address the issue of encodings. What an individual byte or set of bytes means can depend on many factors. Modern programs have to deal with multiple encodings... even if their developers do not always realize it.

With GTK+ programs there are three main encodings to stay aware of: locale encoding, filesystem encoding and internal encoding. The internal encoding is used for UI widgets and most internal GTK+ calls, and is always UTF-8. The locale encoding can vary at runtime, and although it is commonly also UTF-8, it can be anything else. The filesystem encoding is separate again, and is used for paths. This can vary greatly for systems that have been upgraded over time.

I'll cover encodings a bit more at a different time, but in the context of GTK+ and C++ the possible encoding of the data lets us choose between the two main classes for strings:

std::string
The standard class for strings in C++. Should be used when the data might be in an encoding other than UTF-8. This is the case for GTK+ and Glib APIs that operate with either locale or filesystem encodings.
Glib::ustring
A class from glibmm (the Glib wrapper that Gtkmm builds on) that represents strings of UTF-8 data. Among other things, it manages the details of single characters that occupy multiple bytes in UTF-8.
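
To see why a UTF-8-aware class is useful, here is a minimal standard-library-only sketch of the difference between bytes and characters. The function count_utf8_code_points is a made-up helper for illustration, it assumes the input is already valid UTF-8, and it is not a description of how Glib::ustring is implemented.

```cpp
#include <cstddef>
#include <string>

// Counts UTF-8 code points by skipping continuation bytes, which always
// have the bit pattern 10xxxxxx. No validation is performed.
inline std::size_t count_utf8_code_points(std::string const &bytes)
{
    std::size_t count = 0;
    for (unsigned char c : bytes) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the word "café" encoded as UTF-8, std::string::size() reports five bytes while the helper reports four code points. Keeping track of that difference by hand is the kind of bookkeeping a dedicated UTF-8 string class takes over.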

Thankfully we end up with some fairly simple rules for C++ programs:

  • Use a single common encoding for as much of a program as possible. For GTK+ this is UTF-8.
  • Avoid using legacy C strings such as "char *" or "gchar *"
  • Use Glib::ustring for all UTF-8 encoded strings.
  • Use std::string for strings that might be in different encodings.
  • Be very careful about string conversions, and use explicit encodings.
  • Do not mix strings and byte data.
  • Use std::vector<uint8_t> for random byte buffers.
  • For parameters passed into functions, use "Glib::ustring const &" or "std::string const &".
  • For return values, prefer functions that return "Glib::ustring" or "std::string" (note that these do not use 'const' nor references).
  • For functions that return multiple strings, take in parameters of either "Glib::ustring &" or "std::string &".
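
The parameter and return rules above might look like the following in practice. This is just a sketch: the function names are invented for the example, and std::string is used throughout so that it builds without glibmm. For UTF-8 text the same shapes would apply with Glib::ustring.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Input: constant reference, so no copy is made.
// Output: a plain std::string returned by value (no const, no reference).
inline std::string makeBackupName(std::string const &path)
{
    return path + "~";
}

// Multiple string results: non-const reference out-parameters.
inline bool splitExtension(std::string const &filename, std::string &base, std::string &extension)
{
    std::string::size_type const dot = filename.rfind('.');
    if (dot == std::string::npos) {
        return false; // no extension; outputs are left untouched
    }
    base = filename.substr(0, dot);
    extension = filename.substr(dot + 1);
    return true;
}

// Raw bytes go in a byte vector, not in a string. An embedded zero is
// just another value here.
inline std::vector<std::uint8_t> sampleBytes()
{
    return {0x00, 0xFF, 0x10};
}
```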

Finally we end up with a very important question: does any of this make sense? Hopefully some guidance can be quickly drawn from this information. However, if any point needs more clarification, or was missed, please speak up and let me know what to address.


Friday, May 18, 2012

Refactoring C to C++ Part 1

It turns out that a recent Inkscape source change is a good example for showing some of the process of converting a GTK+ type from C to C++. While making some recent usability changes, I'd done a bit of cleanup by 'C++ifying' the Inkscape SPCtrlLine type. To keep our source revision history clear and useful, this cleanup pass went in as a separate change (revision 11321). This also makes it easy to look at for guidance.

A good starting point is to look at the changes to the main header file itself: sp-ctrlline.h.

First is a simple change to a standard GTK+ macro definition. Yes, in general macros are evil, but the few macros listed at the start of the header are following GTK+ conventions.

21    #define SP_TYPE_CTRLLINE (sp_ctrlline_get_type ())
   23 #define SP_TYPE_CTRLLINE (SPCtrlLine::getType())
  • The "SP" prefixing is legacy naming that we will ignore for now.
  • In general this seems like a minor change, with only subtle formatting differences, but there is more to it than that.
  • Instead of invoking a single function with a long name, it now invokes a static method on a class.
  • The method being called is now merely "getType()" (and thus is template-friendly).

One important point to keep in mind is that in C++, a struct is just a class that defaults to public:. So once we're in C++-land, just think of "struct" as a rough synonym for "class".
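
A tiny sketch of that difference, using made-up type names:

```cpp
struct PointStruct {
    int x = 0; // members of a struct are public unless stated otherwise
};

class PointClass {
    int x = 0; // members of a class are private unless stated otherwise
public:
    int getX() const { return x; }
};
```

Code outside PointStruct can read and write x directly, while PointClass only exposes it through getX(). Apart from that default (and the matching default for inheritance), the two keywords behave the same.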

Then the main change in the header involves moving a set of simple C functions to instead be class methods:

33  GType sp_ctrlline_get_type (void);
34 
35  void sp_ctrlline_set_rgba32 (SPCtrlLine *cl, guint32 rgba);
36  void sp_ctrlline_set_coords (SPCtrlLine *cl, gdouble x0, gdouble y0, gdouble x1, gdouble y1);
37  void sp_ctrlline_set_coords (SPCtrlLine *cl, const Geom::Point start, const Geom::Point end);
  • Since sp_ctrlline_get_type() does not have a pointer to an instance, this will be a static method
  • Since the others start with SPCtrlLine *cl instance pointers, these will become normal methods.
  • The prefix "sp_ctrlline_" disappears as a natural part of moving into a class.
  • The explicit instance pointers (SPCtrlLine *cl) disappear and are replaced by the implicit "this" pointer of C++ member functions (aka "methods").
  • To avoid making unnecessary copies of the start and end parameters on sp_ctrlline_set_coords, we change it to pass constant references instead.
  • Since C++ declarations are easiest to understand when read right-to-left ("start is a reference to a constant Geom::Point"), we move the 'const' to be just before the & of the reference.
28    static GType getType();
30    void setRgba32(guint32 rgba);
32    void setCoords(gdouble x0, gdouble y0, gdouble x1, gdouble y1);
34    void setCoords(Geom::Point const &start, Geom::Point const &end);
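
Stripped of the GTK+ details, the shape of that conversion can be sketched as below. Everything here is a stand-in so the example builds on its own: Point takes the place of Geom::Point, Line takes the place of SPCtrlLine, and getType() returns an arbitrary placeholder rather than a real GType.

```cpp
#include <cstdint>

struct Point { // stand-in for Geom::Point
    double x;
    double y;
};

struct Line { // stand-in for SPCtrlLine
    std::uint32_t rgba = 0;
    Point start{0.0, 0.0};
    Point end{0.0, 0.0};

    // No instance is needed, so this is a static method.
    static int getType() { return 42; } // placeholder value only

    // The old 'cl' parameter is now the implicit 'this'.
    void setRgba32(std::uint32_t value) { rgba = value; }

    // Overloads share one name; the compiler picks by argument types.
    void setCoords(double x0, double y0, double x1, double y1)
    {
        start = {x0, y0};
        end = {x1, y1};
    }

    // Constant references avoid copying the two points.
    void setCoords(Point const &newStart, Point const &newEnd)
    {
        start = newStart;
        end = newEnd;
    }
};
```

A caller then writes line.setCoords(1, 2, 3, 4) or line.setCoords(a, b) instead of passing the instance pointer as the first argument.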

Moving on now to the sp-ctrlline.cpp file, there are a few things to note. One is switching from static functions to using an unnamed (or anonymous) namespace. That could have allowed us to drop the "sp_ctrlline_" prefix, but that step was skipped for the moment. We do, however, want to fix casts as we go, such as

49        (GClassInitFunc) sp_ctrlline_class_init, 
   51     reinterpret_cast<GClassInitFunc>(sp_ctrlline_class_init),
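
Both points can be shown in a small stand-alone sketch. The names here are invented: GenericFunc merely plays the role of a generic callback type such as GClassInitFunc, and helperDouble is an arbitrary file-local function.

```cpp
namespace {

// Visible only inside this source file, like a C 'static' function.
int helperDouble(int value)
{
    return value * 2;
}

} // unnamed namespace

using GenericFunc = void (*)(); // stand-in for a generic callback type
using IntFunc = int (*)(int);

int callDoubled(int value)
{
    // The named cast is easy to search for and states its intent,
    // unlike a C-style "(GenericFunc) helperDouble".
    GenericFunc stored = reinterpret_cast<GenericFunc>(helperDouble);

    // A function pointer must be cast back to its real type before use.
    IntFunc restored = reinterpret_cast<IntFunc>(stored);
    return restored(value);
}
```

The behavior is the same as with the C-style cast. The gain is that the risky conversions are now spelled out and stand out during review.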

Inside of the class_init function around lines 63-72/66-72 there is a simplification due to inheritance. There is no need to create object_class and item_class pointers from the passed-in SPCtrlLineClass *klass pointer. The members of the parent types are visible, so we can just use klass directly, such as for

klass->destroy = sp_ctrlline_destroy;
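
As a rough sketch of why that works, here is a chain of made-up class structs that use C++ inheritance. The real GTK+ and Inkscape class structures carry far more members, and all of these names are hypothetical.

```cpp
struct ObjectClass {
    void (*destroy)() = nullptr;
};

struct ItemClass : ObjectClass {
    void (*update)() = nullptr;
};

struct LineClass : ItemClass {
};

void exampleDestroy() {}

void classInit(LineClass *klass)
{
    // 'destroy' is declared two levels up, but inheritance makes it
    // directly reachable. No intermediate parent-class pointers needed.
    klass->destroy = exampleDestroy;
}
```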

Another handy aspect of turning stand-alone C functions into C++ methods is that we get compile-time checks and safety and can drop run-time checks, such as at the beginning of the new SPCtrlLine::setRgba32() method:

154    g_return_if_fail (cl != NULL);
155    g_return_if_fail (SP_IS_CTRLLINE (cl));

The checks at lines 171-172 are similarly dropped.

Once we get to the body of the method, there are a few interesting points to be seen:

157        if (rgba != cl->rgba) {
158            SPCanvasItem *item;
159            cl->rgba = rgba;
160            item = SP_CANVAS_ITEM (cl);
161            item->canvas->requestRedraw((int)item->x1, (int)item->y1, (int)item->x2, (int)item->y2);
    155    if (rgba != this->rgba) {
    156        this->rgba = rgba;
    157        canvas->requestRedraw(x1, y1, x2, y2);
  • At new line 155 since a parameter has the same name as a member, we use "this->" to be able to access the member.
  • There is no need for the casting macro SP_CANVAS_ITEM from line 160, since a subclass has all the superclass accessible.
  • Since canvas, x1, y1, x2 and y2 are all members and we are now a member function, use of cl-> and item-> can be dropped.
  • Since canvas is a member and we are in a member function, we can use it directly in new line 157.
  • C-style casts, and casting in general, are enemies. By dropping the casts to (int), the code gets simpler, overloading can pick the right function for the actual argument types, and type errors become more visible.
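
These points can be tried out in a reduced sketch. The types below are simplified stand-ins, and giving Canvas two requestRedraw overloads is purely an assumption for the demonstration. Whether the real Inkscape canvas offers such overloads is not something this post states.

```cpp
struct Canvas {
    int intCalls = 0;
    int doubleCalls = 0;
    void requestRedraw(int, int, int, int) { ++intCalls; }
    void requestRedraw(double, double, double, double) { ++doubleCalls; }
};

struct Item { // stand-in for the parent canvas item type
    Canvas *canvas = nullptr;
    double x1 = 0.0, y1 = 0.0, x2 = 0.0, y2 = 0.0;
};

struct Line : Item {
    unsigned rgba = 0;

    void setRgba32(unsigned rgba)
    {
        // The parameter hides the member, so "this->" names the member.
        if (rgba != this->rgba) {
            this->rgba = rgba;
            // Inherited members are used directly. With no (int) casts,
            // overload resolution sees the real types (double here).
            canvas->requestRedraw(x1, y1, x2, y2);
        }
    }
};
```

With the casts left in, the int overload would always be chosen no matter what type the members actually have.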

Moving on down into gradient-drag.cpp, there is a very important shift in thought/approach for pointers. Looking at line 1579/1578 we see a difference in type:

1579         SPCanvasItem *line = sp_canvas_item_new(sp_desktop_controls(this->desktop),
1580                                                                  SP_TYPE_CTRLLINE, NULL);
     1578    SPCtrlLine *line = SP_CTRLLINE(sp_canvas_item_new(sp_desktop_controls(this->desktop), SP_TYPE_CTRLLINE, NULL));

Instead of holding a pointer to the more generic parent class SPCanvasItem, we hold and use a more specific pointer to the subclass SPCtrlLine.

With GTK+ in C, holding the more generic type is common, and results in, among other things, excessive use of the type check and type casting macros (such as SP_CTRLLINE()). Aside from any performance slowdown they introduce, they hide things, block overriding, and sacrifice compile-time safety for run-time checks. It is far better for incorrect code to be rejected by the compiler up front than for it to fail at runtime (and only when a user trips over the specific code path in question).
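
The trade-off can be sketched without GTK+ at all. In this illustration the type names are made up, and dynamic_cast stands in for the run-time type check macros. It is not what those macros actually expand to.

```cpp
struct CanvasItem {
    virtual ~CanvasItem() = default;
};

struct CtrlLine : CanvasItem {
    unsigned rgba = 0;
    void setRgba32(unsigned value) { rgba = value; }
};

// Generic pointer: the type has to be re-checked at run time, and a
// wrong type is only discovered when this code path actually executes.
inline bool setColorGeneric(CanvasItem *item, unsigned rgba)
{
    CtrlLine *line = dynamic_cast<CtrlLine *>(item);
    if (!line) {
        return false;
    }
    line->setRgba32(rgba);
    return true;
}

// Specific pointer: passing anything other than a CtrlLine simply
// does not compile, so there is nothing left to check.
inline void setColorSpecific(CtrlLine *line, unsigned rgba)
{
    line->setRgba32(rgba);
}
```

Calling setColorSpecific() with a plain CanvasItem pointer is rejected by the compiler, while the same mistake with setColorGeneric() compiles and only fails when it runs.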

Similar fixes can be seen in the changes to line-geometry.cpp and elsewhere. In pen-context.h, seltrans.h, text-context.h, and node.h the types of the pertinent members have also been changed from the parent class SPCanvasItem to the more specific subclass SPCtrlLine.

In closing, reviewing the entire change with thoughts as to why different things were done can be quite useful. At some point soon I'll be following up with some more examples, along with some summaries of key points to follow and keep in mind. Additionally, this change did not really touch on any conversion from plain GTK+ over to Gtkmm (the C++ wrapper library for GTK+). Subsequent entries will also touch on that.


Tuesday, January 31, 2012

Back on Track

After being bogged down with 'real life', I've finally managed to get things moving back on track... so time to get back to the blogging. A lot has gone on, and more is getting ready to happen. Conferences conferences conferences and more conferences, hardware, Inkscape hacking, and more...

We have a lot planned, with maybe something for almost anyone. Inkscape has picked up a few more active contributors, and I've made progress on a few 'interesting' tweaks. Some seem just for fun, but others have good practical application. We're also trying to get together some more organized meetings, online and in person, so that will be good. Also look for more on the front to help promote Inkscape.

Much went on at this past linux.conf.au, with great people helping out and some really outstanding presentations going on. Bruce Perens had some very important things to say, and it looks to be very helpful. And I even had my talk on logo design for developers make it up online. (There are more going up over time, and the mirrors should be getting ogg versions too.)

Posts will show up highlighting things from linux.conf.au and SCALE10x shortly. There will even be a few photos here and there. Most importantly, though, is that things should get more and more active here, and posts should be quite regular now.
