Wednesday, June 6, 2012

Refactoring C to C++ Part 2 - Strings, Strings, and More Strings

The previous entry in this series was a general info dump on a single converted class. This time a more general rule will be examined: string usage in C++.

One large improvement in C++ coding over C is in the area of strings. With C, a string is just a raw memory pointer to what should be a NUL-terminated sequence of valid characters. In practice there end up being many ways that problems with C strings can creep in:

  • the final zero-byte null terminator might be missed during creation.
  • some common library functions will ensure null termination, while others do not.
  • to determine the length of a string, the entire buffer needs to be walked.
  • resizing and appending to strings can be complex multistage operations with many potential failure points.
  • resizing a string most often invalidates the existing pointer.
  • tracking different character encodings can be difficult.
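A couple of the points above can be sketched in a few lines. This is only an illustration, not code from Inkscape: the helper names copy_truncated and join_with_space are made up for this example. The first shows the hand-written terminator that strncpy() needs, the second shows the same kind of work with std::string, where length and storage are tracked for us.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// C approach: strncpy() leaves 'dest' unterminated whenever 'src' holds
// dest_size or more characters, so the terminator must be added by hand.
// (copy_truncated is a hypothetical helper for this sketch.)
inline void copy_truncated(char *dest, std::size_t dest_size, char const *src)
{
    assert(dest_size > 0);
    std::strncpy(dest, src, dest_size);
    dest[dest_size - 1] = '\0'; // the step that is easy to forget
}

// C++ approach: std::string owns its storage and knows its own length,
// so appending is a single step and size() never walks the buffer.
// (join_with_space is a hypothetical helper for this sketch.)
inline std::string join_with_space(std::string const &first, std::string const &second)
{
    std::string result = first;
    result += ' ';
    result += second;
    return result;
}
```

With a four-byte buffer, copy_truncated() keeps only three characters plus the terminator, and every caller has to know that. The std::string version has no buffer size to get wrong.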

With C++ in general strings are represented by the standard class std::string. However that still does not address the issue of encodings. What the meaning of an individual byte or set of bytes is can depend on many factors. Modern programs have to deal with multiple encodings... even if their developers do not always realize it.

With GTK+ programs there are three main encodings to stay aware of: locale encoding, filesystem encoding and internal encoding. The internal encoding is used for UI widgets and most internal GTK+ calls, and is always UTF-8. The locale encoding can vary at runtime, and although it is commonly also UTF-8, it can be any other encoding. The filesystem encoding can differ again, and is used for paths. This can vary greatly for systems that have been upgraded over time.

I'll cover encodings a bit more at a different time, but in the context of GTK+ and C++ the potential encoding allows us to select between the two main classes for strings:

std::string
The standard class for strings in C++. Should be used when the data might be in an encoding other than UTF-8. This is the case for GTK+ and Glib APIs that operate with either locale or filesystem encodings.
Glib::ustring
A class from glibmm (the Glib wrapper that ships as part of the Gtkmm project) that represents strings of UTF-8 data. Among other things it manages the details of single characters that span multiple bytes in UTF-8.
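To make the "multi-byte single characters" point concrete, here is a minimal sketch in plain standard C++ (so it builds without glibmm). The function utf8_length is a hypothetical helper written for this example; it assumes its input is valid UTF-8. As far as I recall, Glib::ustring gives you this distinction directly, with size() counting characters and bytes() counting bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 byte sequence by skipping
// continuation bytes (those of the form 10xxxxxx).
// Assumes valid UTF-8; utf8_length is a hypothetical helper.
inline std::size_t utf8_length(std::string const &utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

inline void usage_example()
{
    std::string const text = "caf\xC3\xA9"; // "café": the é takes two bytes
    assert(text.size() == 5);       // std::string counts bytes
    assert(utf8_length(text) == 4); // a UTF-8 aware count sees characters
}
```

This is the mismatch that bites when UTF-8 data is indexed or truncated through std::string, and it is the reason to reach for Glib::ustring when the data is known to be UTF-8.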

Thankfully we end up with some fairly simple rules for C++ programs:

  • Use a single common encoding for as much of a program as possible. For GTK+ this is UTF-8.
  • Avoid using legacy C strings such as "char *" or "gchar *".
  • Use Glib::ustring for all UTF-8 encoded strings.
  • Use std::string for strings that might be in different encodings.
  • Be very careful about string conversions, and use explicit encodings.
  • Do not mix strings and byte data.
  • Use std::vector<uint8_t> for random byte buffers.
  • For parameters passed into functions, use "Glib::ustring const &" or "std::string const &".
  • For return values, prefer functions that return "Glib::ustring" or "std::string" (note that these do not use 'const' nor references).
  • For functions that return multiple strings, take in parameters of either "Glib::ustring &" or "std::string &".
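The parameter and return rules above might look like the following in practice. This is a sketch only: the function names are invented for illustration, and it sticks to std::string so it builds without glibmm. The same shapes apply with Glib::ustring for UTF-8 data.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Input parameter: constant reference, so no copy is made.
// Return value: a plain std::string by value (no 'const', no reference).
inline std::string make_label(std::string const &name)
{
    return "Layer: " + name;
}

// Multiple results: the caller passes in the strings to be filled.
inline bool split_key_value(std::string const &entry, std::string &key, std::string &value)
{
    std::string::size_type const pos = entry.find('=');
    if (pos == std::string::npos) {
        return false; // no separator, outputs are left untouched
    }
    key = entry.substr(0, pos);
    value = entry.substr(pos + 1);
    return true;
}

// Byte data gets its own type and never travels as a string.
inline unsigned int sum_bytes(std::vector<std::uint8_t> const &data)
{
    unsigned int total = 0;
    for (std::uint8_t byte : data) {
        total += byte;
    }
    return total;
}

inline void usage_example()
{
    std::string key;
    std::string value;
    assert(split_key_value("stroke=black", key, value));
    assert(key == "stroke" && value == "black");
    assert(make_label(key) == "Layer: stroke");
}
```

Keeping bytes in std::vector<uint8_t> means the compiler will complain if a binary buffer is handed to something expecting text, which is exactly the kind of mix-up the rules are trying to prevent.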

Finally we end up with a very important question: does any of this make sense? Hopefully some guidance can be quickly drawn from this information. However, if any point needs more clarification, or was missed, please speak up and let me know what to address.


Friday, May 18, 2012

Refactoring C to C++ Part 1

It turns out that a recent Inkscape source change is a good example for showing some of the process of conversion from C to C++ of a GTK+ type. In doing some recent usability changes, I'd done a bit of a cleanup on 'C++ifying' the Inkscape SPCtrlLine type. To keep our source revision history clear and useful, this one cleanup pass went in as a separate change (revision 11321). This also makes it easy to look at for guidance.

A good starting point is to look at the changes to the main header file itself: sp-ctrlline.h.

First is a simple change to a standard GTK+ macro definition. Yes, in general macros are evil, but the few macros listed at the start of the header are following GTK+ conventions.

21    #define SP_TYPE_CTRLLINE (sp_ctrlline_get_type ())
   23 #define SP_TYPE_CTRLLINE (SPCtrlLine::getType())
  • The "SP" prefixing is legacy naming that we will ignore for now.
  • In general this seems like a minor change, with only subtle formatting differences, but there is more to it than that.
  • Instead of invoking a single function with a long name, it now invokes a static method on a class.
  • The method being called is now merely "getType()" (and thus is template-friendly).

One important point to keep in mind is that in C++, a struct is just a class that defaults to public:. So once we're in C++-land, just think of "struct" as a rough synonym for "class".
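A tiny sketch of that point, using two made-up types (PointStruct and PointClass are not from Inkscape): the only difference between the keywords is the default access level.

```cpp
#include <cassert>

// With 'struct', members are public unless stated otherwise.
struct PointStruct {
    int x = 0;
};

// With 'class', members are private unless stated otherwise.
class PointClass {
    int x = 0;

public:
    int getX() const { return x; }
    void setX(int value) { x = value; }
};

inline void usage_example()
{
    PointStruct a;
    a.x = 3; // fine: public by default
    assert(a.x == 3);

    PointClass b;
    b.setX(4); // b.x = 4 would not compile: private by default
    assert(b.getX() == 4);
}
```

Both can have methods, constructors, base classes and virtual functions, which is why a GTK+ style struct can pick up member functions without being rewritten as a class first.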

Then the main change in the header involves moving a set of simple C functions to instead be class methods:

33  GType sp_ctrlline_get_type (void);
34 
35  void sp_ctrlline_set_rgba32 (SPCtrlLine *cl, guint32 rgba);
36  void sp_ctrlline_set_coords (SPCtrlLine *cl, gdouble x0, gdouble y0, gdouble x1, gdouble y1);
37  void sp_ctrlline_set_coords (SPCtrlLine *cl, const Geom::Point start, const Geom::Point end);
  • Since sp_ctrlline_get_type() does not have a pointer to an instance, this will be a static method
  • Since the others start with SPCtrlLine *cl instance pointers, these will become normal methods.
  • The prefix "sp_ctrlline_" disappears as a natural part of moving into a class.
  • The explicit instance pointers (SPCtrlLine *cl) disappear and are replaced by the implicit "this" pointer of C++ member functions (aka "methods").
  • To avoid making unnecessary copies of the start and end parameters on sp_ctrlline_set_coords, we change it to pass constant references instead.
  • Since C++ declarations are easiest to understand when read right-to-left ("start is a reference to a constant Geom::Point"), we move the 'const' to be just before the & of the reference.
28    static GType getType();
30    void setRgba32(guint32 rgba);
32    void setCoords(gdouble x0, gdouble y0, gdouble x1, gdouble y1);
34    void setCoords(Geom::Point const &start, Geom::Point const &end);
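For anyone who wants to try the pattern without a GTK+ build, here is a stripped-down sketch. Point and CtrlLine below are hypothetical stand-ins for Geom::Point and SPCtrlLine, and the GType-related static method is left out entirely since it needs the GObject type system.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Geom::Point.
struct Point {
    double x;
    double y;
};

// Hypothetical, heavily simplified stand-in for SPCtrlLine.
struct CtrlLine {
    std::uint32_t rgba = 0;
    Point start{0.0, 0.0};
    Point end{0.0, 0.0};

    // C version would be: void ctrlline_set_rgba32(CtrlLine *cl, guint32 rgba);
    // The instance pointer is now the implicit 'this'.
    void setRgba32(std::uint32_t rgba) { this->rgba = rgba; }

    // Overloading lets both forms share one short name.
    void setCoords(double x0, double y0, double x1, double y1)
    {
        setCoords(Point{x0, y0}, Point{x1, y1});
    }

    // Constant references avoid copying the two points.
    void setCoords(Point const &start, Point const &end)
    {
        this->start = start;
        this->end = end;
    }
};

inline void usage_example()
{
    CtrlLine line;
    line.setRgba32(0xff0000ffu);
    line.setCoords(1.0, 2.0, 3.0, 4.0);
    assert(line.rgba == 0xff0000ffu);
    assert(line.start.x == 1.0 && line.end.y == 4.0);
}
```

Call sites change from function(object, args) to object.method(args), and the compiler will no longer accept a pointer of the wrong type as the first argument.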

Moving on now to the sp-ctrlline.cpp file, there are a few things to note. One is switching from static functions to using an unnamed (or anonymous) namespace. That could have allowed us to drop the "sp_ctrlline_" prefix, but that step was skipped for the moment. We do, however, want to fix casts as we go, such as

49        (GClassInitFunc) sp_ctrlline_class_init, 
   51     reinterpret_cast<GClassInitFunc>(sp_ctrlline_class_init),

Inside of the class_init function around lines 63-72/66-72 there is a simplification due to inheritance. There is no need to create object_class and item_class pointers from the passed-in SPCtrlLineClass *klass pointer. The members of the parent types are visible, so we can just use klass directly, such as for

klass->destroy = sp_ctrlline_destroy;

Another handy aspect to turning stand-alone C functions into C++ methods is that we get compile-time checks and safety and can drop run-time checks, such as at the beginning of the new SPCtrlLine::setRgba32() method:

154    g_return_if_fail (cl != NULL);
155    g_return_if_fail (SP_IS_CTRLLINE (cl));

The checks at lines 171-172 are similarly dropped.

Once we get to the body of the method, there are a few interesting points to be seen:

157        if (rgba != cl->rgba) {
158            SPCanvasItem *item;
159            cl->rgba = rgba;
160            item = SP_CANVAS_ITEM (cl);
161            item->canvas->requestRedraw((int)item->x1, (int)item->y1, (int)item->x2, (int)item->y2);
    155    if (rgba != this->rgba) {
    156        this->rgba = rgba;
    157        canvas->requestRedraw(x1, y1, x2, y2);
  • At new line 155 since a parameter has the same name as a member, we use "this->" to be able to access the member.
  • There is no need for the casting macro SP_CANVAS_ITEM from line 160, since a subclass has all the superclass accessible.
  • Since canvas, x1, y1, x2 and y2 are all members and we are now a member function, use of cl-> and item-> can be dropped.
  • Since canvas is a member and we are in a member function, we can use it directly in new line 157.
  • C-style casts, and casting in general, are enemies. By dropping the casts to (int), we let the code get simpler, gain the ability to take advantage of overloading, and make errors more visible.
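The last few points can be seen in a small stand-alone sketch. Everything here is hypothetical: Canvas and Item are simplified stand-ins rather than the real SPCanvas types, and the second requestRedraw overload exists only to show what a cast can hide.

```cpp
#include <cassert>
#include <string>

// Hypothetical canvas with two overloads; it records which one was chosen.
struct Canvas {
    std::string lastCall;

    void requestRedraw(int, int, int, int) { lastCall = "int"; }
    void requestRedraw(double, double, double, double) { lastCall = "double"; }
};

// Hypothetical item; member functions can use canvas, x1, y1, ... directly.
struct Item {
    Canvas canvas;
    double x1 = 0.5, y1 = 1.5, x2 = 2.5, y2 = 3.5;
    unsigned int rgba = 0;

    void setRgba32(unsigned int rgba)
    {
        if (rgba != this->rgba) { // "this->" reaches the shadowed member
            this->rgba = rgba;
            canvas.requestRedraw(x1, y1, x2, y2); // overload chosen by the real types
        }
    }

    void setRgba32WithCasts(unsigned int rgba)
    {
        if (rgba != this->rgba) {
            this->rgba = rgba;
            // The casts force the int overload and silently drop the fractions.
            canvas.requestRedraw((int)x1, (int)y1, (int)x2, (int)y2);
        }
    }
};

inline void usage_example()
{
    Item item;
    item.setRgba32(0x000000ffu);
    assert(item.canvas.lastCall == "double");
    item.setRgba32WithCasts(0xffffffffu);
    assert(item.canvas.lastCall == "int");
}
```

Without the casts, adding a more precise overload later gets picked up automatically, and a genuinely wrong argument type produces a compiler diagnostic instead of being forced through.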

Moving on down into gradient-drag.cpp, there is a very important shift in thought/approach for pointers. Looking at line 1579/1578 we see a difference in type:

1579         SPCanvasItem *line = sp_canvas_item_new(sp_desktop_controls(this->desktop),
1580                                                                  SP_TYPE_CTRLLINE, NULL);
     1578    SPCtrlLine *line = SP_CTRLLINE(sp_canvas_item_new(sp_desktop_controls(this->desktop), SP_TYPE_CTRLLINE, NULL));

Instead of holding a pointer to the more generic parent class SPCanvasItem, we hold and use a more specific pointer to the subclass SPCtrlLine.

With GTK+ in C, holding the more generic type is common, and results in, among other things, excessive use of the type check and type casting macros (such as SP_CTRLLINE()). Aside from any performance slowdown they introduce, they hide things, block overriding, and sacrifice compile-time safety for run-time checks. It is far better to have incorrect code that will result in the compiler rejecting it upfront rather than code that will fail at runtime (but only when a user trips over the specific code path in question).
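The trade-off between the two styles can be sketched with plain C++ classes. The types below are hypothetical stand-ins, and dynamic_cast is used here only as an analogy for what the GTK+ checking macros do at run time.

```cpp
#include <cassert>

// Hypothetical stand-ins for SPCanvasItem and two of its subclasses.
struct CanvasItem {
    virtual ~CanvasItem() = default;
};

struct CtrlLine : CanvasItem {
    unsigned int rgba = 0;
    void setRgba32(unsigned int value) { rgba = value; }
};

struct CtrlPoint : CanvasItem {};

// Generic pointer: the type has to be re-checked at run time on every use.
inline bool recolorGeneric(CanvasItem *item, unsigned int rgba)
{
    CtrlLine *line = dynamic_cast<CtrlLine *>(item);
    if (!line) {
        return false; // wrong type, only discovered while running
    }
    line->setRgba32(rgba);
    return true;
}

// Specific type: passing anything other than a CtrlLine fails to compile.
inline void recolorSpecific(CtrlLine &line, unsigned int rgba)
{
    line.setRgba32(rgba);
}

inline void usage_example()
{
    CtrlLine line;
    CtrlPoint point;
    assert(recolorGeneric(&line, 1u));
    assert(!recolorGeneric(&point, 2u)); // compiles, fails when run
    recolorSpecific(line, 3u);           // recolorSpecific(point, 3u) would not compile
    assert(line.rgba == 3u);
}
```

The generic version needs a failure path and a test that exercises it. The specific version simply cannot be called with the wrong type.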

Similar fixes can be seen in the changes to line-geometry.cpp and elsewhere. In pen-context.h, seltrans.h, text-context.h, and node.h the type of the pertinent members has also been changed from the parent class SPCanvasItem to the more specific subclass SPCtrlLine.

In closing, reviewing the entire change with thoughts as to why different things were done can be quite useful. At some point soon I'll be following up with some more examples, along with some summaries of key points to follow and keep in mind. Additionally, this change did not really touch on any conversion from plain GTK+ over to Gtkmm (the C++ wrapper library for GTK+). Subsequent entries will also touch on those.


Tuesday, January 31, 2012

Back on Track

After being bogged down with 'real life', I've finally managed to get things moving back on track... so time to get back to the blogging. A lot has gone on, and is getting ready to happen. Conferences conferences conferences and more conferences, hardware, Inkscape hacking, and more...

We have a lot planned, and maybe something for most anyone. Inkscape has picked up a few more active contributors, and I've made progress on a few 'interesting' tweaks. Some seem just for fun, but others have good practical application. We're also trying to get together some more organized meetings, online and in person, so that will be good. Also look for more on the front to help promote Inkscape.

Much went on at this past linux.conf.au, with great people helping out and some really outstanding presentations going on. Bruce Perens had some very important things to say, and it looks to be very helpful. And I even had my talk on logo design for developers make it up online. (There are more going up over time, and the mirrors should be getting ogg versions too.)

Posts will show up highlighting things from linux.conf.au and SCALE10x shortly. There will even be a few photos here and there. Most importantly, though, is that things should get more and more active here, and posts should be quite regular now.


Sunday, March 27, 2011

Inkscape Text Outlines

An Inkscape user mentioned trying to follow this Illustrator-based tutorial on outlined lettering, but said he was having some problems. To get the user going, I pointed out Troy Sobotka's tutorial on text styling. It gets a bit more complicated, but does present several nice features that got the user going even better.

However... I thought that the basic techniques used by Illustrator should work fine with Inkscape, so I pulled things up to take a pass at following it. It turns out I was right, and it wasn't hard at all. In fact, it turns out to be doable with fewer steps. So here is a basic summary of the changes in approach needed to achieve the copy-n-paste over text effects explained there.

Start by reading the tutorial on "Lettering: Multiple Outline SFX". It seems pretty straightforward, right? Since adding a thick stroke often doesn't work the way one might want it to, I was quite familiar with the approach of pasting a copy over a thickened version.

1. Create the letters you want and get things tuned and positioned as desired. This might be just plain text, text with some manual kerning applied, or outlines one draws or modifies.


2. Group them, apply a stroke, and set the desired thickness. Depending on the size of your text, you'll need different thicknesses. (If you look closely, especially at the hole in the 'A', you can see how this version is thicker than the original)


3. Duplicate the text and set the fill white and turn off the stroke. This is the part where it varies a bit from Illustrator. In Inkscape, all you need to do is "Duplicate" the grouped paths/object using Ctrl-D. (That combines the separate copy and paste-in-front steps that were called for in the Illustrator tutorial.)


Moving on, one can also follow the other techniques there. Using the simpler 'duplicate' saves a tiny bit of time, but the general principles are solid and the skills transfer easily from Illustrator to Inkscape once one no longer has to hunt for the specific keystrokes.


Wednesday, October 6, 2010

Inkscape Does Support CMYK

While some are misled by the fact that Inkscape does not (and probably should not) support raw or "generic" CMYK, it does in fact support working with true CMYK for print support. The key factor is that Inkscape only supports real CMYK work, and not "pretend CMYK." In and of itself "CMYK color" does not mean anything specific. It turns out that "RGB color" is also meaningless as far as specifying an actual color goes, but the variances are usually not as strong. To get accurate RGB color, one needs to specify *which* RGB to use: SMPTE television values, Adobe RGB, Wide Gamut RGB, sRGB, etc. When working on the Internet, people usually don't realize that there is an implied colorspace of "sRGB" used for tools, browsers, etc.

In a similar manner, Inkscape needs to be given a *specific* CMYK colorspace to work in. In fact, Inkscape (and SVG itself) can support a document with several different colorspaces at once, including mixing multiple different CMYK colorspaces alongside RGB colorspaces. This could be useful for cases such as when a graphic designer is creating artwork for some brochure that will be printed with cutaways and different paper types. Or it could apply to a case where different printers are used for different parts of a job. (Figure 1 shows what appears to be the same colors)

However a fairly common use case is where one might create a document to be printed mainly in CMYK, but with one or more spot colors, such as Pantone, Toyo, HKS, etc. colors. In this case, some elements of the artwork can be marked with a specific target CMYK profile, while the elements to be done in a spot color can be specified with a named color profile supporting the type of spot color (for SVG 1.2) or perhaps even with simple sRGB equivalents. Then when things go to a service bureau to be run, the CMYK elements can go to a four-color print and the spot colors can go to custom plates per ink. (Figure 2 shows that the colors are actually different)

So how does one get to "real CMYK" in Inkscape? It's actually fairly simple. First at least one CMYK profile needs to be added to the document being worked on. That can be accessed in the GUI through the "Color Management" tab on the document properties dialog. Once at least one profile has been added, the color pickers in the Fill & Stroke dialog can be used to pick colors in that colorspace via specifying the ICC profile. In the past one needed to use the CMS color picker, but with Inkscape 0.48 the other color pickers such as the "CMYK" one will attempt to preserve values. (Figure 3 shows that the exact same CMYK numbers were used)

The main problem left is that even though true CMYK values are stored in the saved SVG file, using those values is now a bottleneck. Printing directly from Inkscape will flatten things to sRGB, and even PDF export will not yet preserve the CMYK values. However, other software can read and use those values. Recent versions of Scribus will read in and preserve ICC colors including CMYK, and will happily save high-quality print-ready PDF output. (Figure 4 reveals the difference in RGB and visible colors that result from using two different CMYK profiles)


Monday, September 20, 2010

"CMYK" is Meaningless

For anyone working with print there is one key principle to keep in mind: "CMYK numbers" are meaningless. Far too often artists are lured into "working in CMYK" without actually understanding what it means, and fall into the trap of believing their work is more precise when the exact opposite is true. Saying "50% Cyan" does not actually specify any color, and will most likely result in many different colors being produced. The subtle yet critical difference between "untagged CMYK" vs. "unspecified CMYK" is the bottom line that will mean the difference between success and failure.

CMYK is a type of colorspace, but in and of itself does not actually specify anything. It just tells how one can define a specific colorspace. To actually make sense a specific device (either real or theoretical) needs to be targeted. Once the specifics are involved, however, working in a specific CMYK is a very helpful thing. It is good to try to not get caught up in the distraction of focusing on the "how" of things and instead look to the "why" of doing things.

In this case, the distracting "how" is people "working in CMYK" while the "why" they should be focusing on is reliably creating and controlling color. Instead of an artist getting caught up in saying "I need to get exactly a 50% saturation of cyan ink onto the printing plate at this point", they need to shift to say "I need to get this point of the final print output to be exactly this shade of color." That is not to say that an artist should never be concerned about CMYK numbers and plate separations, but rather they need to keep in mind that the ink details are merely a means in reaching the ends of the final print. (There is one main exception to the focus on raw CMYK, but I'll cover that later)

So although 'raw CMYK' numbers might be meaningless and produce randomly shifting results in completely unpredictable ways, CMYK values in a specific context normally give very precise and controlled output. Be wary of anyone asking for or providing "generic CMYK". However, specifics such as "SWOP v2 CMYK" will allow for fairly exact control and results. If someone says "CMYK", the response should be "which CMYK?"

What does this all mean in practice? If one is producing artwork that will go to print, getting specifics nailed down is critical. When working with a good print house, they will provide either specific profiles for the output that will be run, or will say which industry standard profiles they will work from. The artist provides content targeting the specific CMYK profile and then the printed result comes back with the exact appearance that was intended. The print house gets this generally specified input and then does a specific conversion from what the artist supplied to the measured control needed for the specific press and the actual paper with that day's inks and the current temperature, humidity, etc.

When such a good workflow is used one can get reprints run at any time later on and the output will match what was printed the past week, month or even year. Thus there is no need for the time nor the expense of multiple tweak-and-reprint runs to end up with acceptable results.

On the other hand, if a small shop is used that employs no color management in their own workflow, the burden falls squarely on the artist/customer to come up with ways to get consistent and desired output. Sometimes the print shop will provide some target profile, but make no guarantees as to what the resulting run will look like.

This is the point where most will just think they're working directly with raw CMYK and just need to tweak things over and over until they are close enough. That's actually not what they should be doing. The critical need here is to understand that they are not working in "generic CMYK", but that they are working to a very specific CMYK. They really are working in "CMYK for fliers printed at the corner press". Sometimes that color will be consistent over time, but more often even that will vary from week to week or from day to day. If the artist was smart, he would have created his own profile for the local print shop and will work in an industry standard CMYK and convert to the specific local shop CMYK for delivery to them. He also should have done some simple calibrated output (perhaps in the margins of his run) so that if (or more likely when) the local print shop gives different colors for the same numbers he can call them on it and get things corrected.

For a shop with a good relationship, the artist can use his own management and tracking to get the print shop to correct their output. Some small shops might not even be employing much in the realm of color control but can happily match output to a reference proof they get handed along with the files for the job to be run that day.

Finally in the case where the local print shop will give varying results and no help for color matching we reach the situation I mentioned where actually sending raw CMYK values is desired. However the reason for sending out a file with raw CMYK that is intended to go exactly to the end plates is in creating a test target output for measurement so that the artist can create his own target CMYK profile. The artist can send over a raw file to get a test output print, and then measure that test print. He then converts the real job files to using that locally created profile and sends the adjusted art files over to be printed. The net result is better output with lower costs due to avoiding reprints, etc. (so "raw CMYK values" should really only be used for printing test targets)

This last case helps illustrate the difference between "untagged CMYK" and "unspecified CMYK". The artist sends over art files that are not tagged with any embedded ICC profiles. However these are not raw nor "unspecified" CMYK values at all. Rather they are CMYK numbers in the specific profile that the artist himself has created to describe the characteristics of the local print shop. Although the files are not literally tagged with the profile, they have been created with an explicit ICC CMYK profile specified in the artist's workflow.

And the bottom line? Be specific and you will save both time and money. And end up with happier clients.


Tuesday, March 16, 2010

Backlogs and Real Life

The last three months have been a bit crazy, with far too much "real life" hitting us upside the head. Things have finally settled in a bit so that I'll be able to get my head above water and surface again. Aside from diving head first at the new day job and surviving the holidays, much has happened in the tech world.

I still haven't had time to finish my writeup of SVG Open (partly since I accepted the new day job while I was attending it up in Mountain View). Then there was the Google Summer of Code Mentors' summit. Great things happened there. Then I had to prep for our visit to New Zealand as co-organizer for a Libre Graphics Day miniconf and as a speaker at the main linux.conf.au. Then we had SCALE8x come 'round where I presented yet another talk and then also ran the Inkscape booth on the show floor. Toss in getting a new tech (adaptive UI) going, starting a new project with other CREATE guys, and doing battle across the board to help get proper CMYK support out for end users everywhere.

Whew!

On top of all that was work for Inkscape and trying to get new features solid for the next release, 0.48. Thankfully I was able to squeeze the time in to finish up the basic support and UI for per-document color/swatch palettes. This allows for basic colors to be stored as a set in a given document, but also for gradients to be included in that. One big thing that inclusion accomplishes is breaking down the artificial barriers software engineers have imposed on artists for far too long. Assets had been artificially separated by their *implementation*, without regard for how artists actually are used to working. This also enabled many workflow enhancements including making art recoloring easier, indicating which swatches are in use on the selected object, etc.

Work on the new input devices dialog also came through. Aside from more end users getting their hands on tablets and such, we had a push in that the ugly outdated GTK+ dialog is being removed. And just in the nick of time we had Krzysztof step up and investigate some of the win32 tablet bugs and get some insight on the problem with Aiptek and others showing up with broken names. I was able to help refine the fixups there while getting them set to be reimplemented in the new dialog.

And then there is the basic work on adaptive UI. This is a very promising area, and is just beginning to show the tip of the iceberg. I'm implementing internals based in part on Michael Terry's work with INGIMP he has presented at LGM. Though 0.48 will only expose a tiny bit of what can go on, the support in Inkscape will give it some very useful functionality in even the near term. We're looking at only giving 0.48 a few set layout modes, but with some handy logic behind the scenes to assist users getting what they need without having to think as much.

Unfortunately, though, we were unable to find time to work in support for Wii remotes, joysticks, and the SpaceNavigator someone at LCA lent me. We are on track to get more in, and 0.49 might even see some of that. Some of this (like using guitar game controllers) might sound a bit silly. However there are some very interesting ways these can be worked in and give Inkscape some nice functionality for average users. And, of course, more hardware toys always makes the geeks happier.
