Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extending SPI by exposing zephyr SPI capabilities. (probably applies to many subsystems) #14

Open
KurtE opened this issue Dec 15, 2024 · 7 comments

Comments

@KurtE
Copy link

KurtE commented Dec 15, 2024

Overview:
Suppose in a library (or sketch), I wish to use some of the more powerful capabilities of the Zephyr system. For example in SPI,
I may wish to output a whole display frame of pixel data: for example an ILI9341 with 320x240 pixels, or 153600 bytes.
Currently with the SPI library, I have the choices of calling simple SPI.transfer(), where for each pixel I have to do something like
SPI.transfer(color >> 8); SPI.transfer(color & 0xff);

Which is rather slow. That is there is a lot of wasted time on the SPI buss.
Image
Or assuming SPI.transfer16(color) works. #13
We can cut the time down more or less half.

The only fast alternative is to use: SPI.transfer(buf, cnt)
However, we have to use a secondary buffer, and reverse the bytes into that secondary buffer, but it does speed things up.
And the transfer overwrites the buffer, with the return data, which is unneeded in this case.

What I would like is the ability to do: SPI.transfer16(send_buf, recv_buf, cnt);
where the two buffers are optional. Can do send only or receive only or send and receive...
And in some case I might want to do this asynchronously, maybe using DMA.

It looks like Zephyr has a lot of these capabilities built into their SPI device. That is the spi_transceive function that our SPI library is calling allows most of the above.

Question, how much of the above can/should be included in the main Arduino SPI library? And if it is not implemented as part of the SPI library, how much should we facilitate, making it easier for developers to implement faster and more efficient SPI functionality?
There are several possibilities:

a) Add some additional methods to the SPI library, for this. Several of other SPI libraries have additional methods, including those for the Teensy and likewise ESP32.

b) Add methods to SPI, that exposes some of the internal data, like:

  // temporary 
  const struct device *getSPIDevice() {return spi_dev;}
  const struct spi_config *getSPIConfig() {return &config;}

Which allow one to gain access to the underlying zephyr SPI object and the config. Note: one could use their own config object without too much problem

c) change the SPI object, to use protected: instead of private: than one could create a new sub-class
like: class extended_SPI : public SPI { ...

d) Hack it, which I will likely do to experiment:

const struct device *spi_dev;
struct spi_config *config;
...
    uint32_t *p = (uint32_t *)&SPI;
    spi_dev = (const struct device *)p[1];
    config = (struct spi_config *)&p[2];

and then maybe in my display code, I might have something like updateScreen:
might change from something like:

      uint16_t *pfbtft_end =
          &_pfbtft[(ILI9341_TFTWIDTH * ILI9341_TFTHEIGHT) - 1]; // setup
      uint16_t *pftbft = _pfbtft;

      // Quick and dirty
      uint16_t c = 0;
      while (pftbft <= pfbtft_end) {
        uint16_t pixel_color = *pftbft++;
        s_row_buff[c++] = (pixel_color >> 8) | ((pixel_color & 0xff) << 8);  // swap the bytes...
        if (c == 320) {
          _pspi->transfer(s_row_buff, c * 2);
          c = 0;
        }
      }
      if (c != 0) _pspi->transfer(s_row_buff, c * 2);

maybe to something like:

  const struct spi_buf tx_buf = {.buf = _pfbtft, .len = TFT_WIDTH*TFT_HEIGHT*2};
  const struct spi_buf_set tx_buf_set = {
      .buffers = &tx_buf,
      .count = 1,
  };
  spi_operation_t operation_save = config->operation;
  config->operation = (operation_save & ~(SPI_WORD_SET(0x3f))) | SPI_WORD_SET(16) | BIT(4);
  ret = spi_transceive(spi_dev, config, &tx_buf_set, nullptr);
  config->operation = operation_save;

Which does not need secondary buffer, nor time to copy it into it and reverse the bytes

Thoughts?

@mjs513
Copy link

mjs513 commented Dec 15, 2024

@KurtE
Personally think this is a great idea will definitely bring the Giga up on par for SPI with Teensy/ESP32 spi transfers. Know we did a lot with using transfer buffers especially with display drivers.

@KurtE
Copy link
Author

KurtE commented Dec 16, 2024

I have examples working for updating the display a few different ways using my updated ILI9441 library, which is up
on my github project: https://github.com/KurtE/Arduino_GIGA-stuff

I have the display being updated, a few different ways:

  1. uses the library updateScreen function which calls the pspi->transfer(temp_buf, 640);
    That is I copy 320 pixels in at a time, doing byte swapping....

  2. Use SPI.transfer(color >> 8); SPI.transfer(color & 0xff);

  3. use SPI.transfer16(color);

  4. call down to zephyr to do it. (spi_transceive(spi_dev, &config16, &tx_buf_set, nullptr);)

Timings for drawing 5 full screens (different colors):

updateScreen: 822
SPI.transfer: 4353
SPI.transfer16: 2357
Zephyr: 603

So the Zephyr one is not that much faster than the SPI buffered transfer. i.e. the spi_transceive being called once versus 240 times per screen. However it also saves a lot of memory.

That is using:

uint16_t arduino::ZephyrSPI::transfer16(uint16_t data) {
  int ret;
  uint16_t rx;
  const struct spi_buf tx_buf = {.buf = &data, .len = sizeof(data)};
  const struct spi_buf_set tx_buf_set = {
      .buffers = &tx_buf,
      .count = 1,
  };
  const struct spi_buf rx_buf = {.buf = &rx, .len = sizeof(rx)};
  const struct spi_buf_set rx_buf_set = {
      .buffers = &rx_buf,
      .count = 1,
  };

  ret = spi_transceive(spi_dev, &config16, &tx_buf_set, &rx_buf_set);
  if (ret < 0) {
    return 0;
  }

  return rx;
}

I use up an extra 640 bytes of memory to copy into, and then the above call use 640 bytes of stack space, to receive the pixels
which the code then copies back over the original data. Note: depending on different memory configuration, my
code would crash with my buffering of 640 bytes, as I assume it ran out of space and corrupted or walked over something or out of bounds.
 ```

//=============================================================================
// Simple image (BMP optional JPEG and PNG) display program, which if the
// sketch is built with one of the USB Types which include MTP support
//=============================================================================
#include <SPI.h>

//-----------------------------------------------------------------------------
// ILI9341 displays
//-----------------------------------------------------------------------------
#include "ILI9341_GIGA_zephyr.h"
#include <elapsedMillis.h>
#define CS_SD 6
#define TFT_DC 9
#define TFT_RST 8
#define TFT_CS 10
ILI9341_GIGA_n tft(TFT_CS, TFT_DC, TFT_RST);
#define TFT_SPI SPI1
#define TFT_SPEED 20000000u

//-----------------------------------------------------------------------------
// Some common things.
//-----------------------------------------------------------------------------

#define BLUE 0x001F
#define BLACK 0x0000
#define WHITE 0xFFFF
#define GREEN 0x07E0
#define RED 0xf800

//****************************************************************************
// Setup
//****************************************************************************
const struct device *spi_dev = nullptr;
struct spi_config config16;
struct spi_buf tx_buf = { .buf = nullptr, .len = 320 * 240 * 2 };
const struct spi_buf_set tx_buf_set = { .buffers = &tx_buf, .count = 1 };
uint16_t *pframeBuffer;

extern void UpdateScreen();

void setup(void) {
// Keep the SD card inactive while working the display.
delay(20);

Serial.begin(115200);
while (!Serial && millis() < 3000)
    ;
// give chance to debug some display startups...

//-----------------------------------------------------------------------------
// initialize display
//-----------------------------------------------------------------------------

pinMode(TFT_CS, OUTPUT);
pinMode(CS_SD, OUTPUT);
digitalWrite(TFT_CS, HIGH);

Serial.println("*** start up ILI9341 ***");
tft.setSPI(TFT_SPI);  // temporary...
tft.begin(TFT_SPEED);
tft.setRotation(1);

tft.fillScreen(RED);
delay(500);
tft.fillScreen(GREEN);
delay(500);
tft.fillScreen(BLUE);
delay(500);

tft.useFrameBuffer(true);

// setup to use zephyr spi
uint32_t *p = (uint32_t *)&TFT_SPI;
spi_dev = (const struct device *)p[1];
memset((void *)&config16, 0, sizeof(config16));
config16.frequency = TFT_SPEED;
config16.operation = SPI_WORD_SET(16) | SPI_TRANSFER_MSB;
tx_buf.buf = pframeBuffer = tft.getFrameBuffer();

}

//****************************************************************************
// loop
//****************************************************************************
uint8_t update_mode = 0;

void loop() {
switch (update_mode) {
case 0: Serial.print("updateScreen: "); break;
case 1: Serial.print("SPI.transfer: "); break;
case 2: Serial.print("SPI.transfer16: "); break;
case 3: Serial.print("Zephyr: "); break;
}
elapsedMillis em;
tft.fillScreen(BLUE);
UpdateScreen();
tft.fillScreen(BLACK);
UpdateScreen();
tft.fillScreen(WHITE);
UpdateScreen();
tft.fillScreen(GREEN);
UpdateScreen();
tft.fillScreen(RED);
UpdateScreen();
Serial.println((uint32_t)em);
update_mode++;
if (update_mode > 3) {
update_mode = 0;
Serial.println();
}
delay(250);
}

void UpdateScreen() {
if (update_mode == 0) {
tft.updateScreen();
} else {
tft.beginSPITransaction(TFT_SPEED);
tft.setAddr(0, 0, tft.width() - 1, tft.height() - 1);
tft.writecommand_cont(ILI9341_RAMWR);
tft.setDataMode();
uint32_t frame_size = tft.width() * tft.height();
if (update_mode == 1) {
for (uint32_t i = 0; i < frame_size; i++) {
uint16_t color = pframeBuffer[i];
TFT_SPI.transfer(color >> 8);
TFT_SPI.transfer(color & 0xff);
}
} else if (update_mode == 2) {
for (uint32_t i = 0; i < frame_size; i++) {
TFT_SPI.transfer16(pframeBuffer[i]);
}
} else {
spi_transceive(spi_dev, &config16, &tx_buf_set, nullptr);
}
tft.endSPITransaction();
}
}

@KurtE
Copy link
Author

KurtE commented Dec 17, 2024

Not sure this is going anywhere,
But:

I hacked up the tft_picture_view_sd_zephyr sketch to overwrite the SDFat default SPI driver implementation,
like the UserSPIDriver.ino SDFat example sketch.

To use this with this sketch you need to:
#if SPI_DRIVER_SELECT == 3 // Must be set in SdFat/SdFatConfig.h
I overwrite the default write buffer and read buffer like code. to use the spi_transceive code of zephyr.
Note: My first attempt to this was to pass in nullptr for TX buffer, but SDFat was not happy. So currently I use
a stack buffer, currently 128 bytes, which I memset to 0xff, and transfer 128 bytes at a time, which is working.

Without using the zephyr code, timings were like:

Loading image 'Sharon Barn.bmp'
Image size: 1716x1312 depth:24Scale: 6 Image Offsets (17, 11)
!!File:Sharon Barn.bmp Time:73754 writeRect calls:0
US:494
Loop looking for image file

2400D080 Loading JPG image 'Small-Drake.jpg' 54453
Image size: 400x320Scale: 1/1 Image Offsets (-40, -40)
!!File:Small-Drake.jpg Time:537 writeRect calls:0
US:495
Loop looking for image file

Loading image 'Annie1.bmp'
Image size: 320x240 depth:24Scale: 1 Image Offsets (0, 0)
!!File:Annie1.bmp Time:3269 writeRect calls:0
US:495

With the code turned on.


Loading image 'Sharon Barn.bmp'
Image size: 1716x1312 depth:24Scale: 6 Image Offsets (17, 11)
!!File:Sharon Barn.bmp Time:11346 writeRect calls:0
US:121
Loop looking for image file

2400D080 Loading JPG image 'Small-Drake.jpg' 54453
Image size: 400x320Scale: 1/1 Image Offsets (-40, -40)
!!File:Small-Drake.jpg Time:103 writeRect calls:0
US:121
Loop looking for image file

Loading image 'Annie1.bmp'
Image size: 320x240 depth:24Scale: 1 Image Offsets (0, 0)
!!File:Annie1.bmp Time:485 writeRect calls:0
US:121

That is for the photo: !!File:Sharon Barn.bmp
**The time to read the file went from: 73.754 seconds down to 11.346 seconds or 6.5 times faster **
Time to display the loaded image went from: 0.495 down to 0.121 seconds.

Annie1.bmp from 3.269 seconds down to 0.385 or 8.49 times faster...

So I think something along this line is worth it!

@KurtE
Copy link
Author

KurtE commented Dec 18, 2024

Follow on to the timings I mentioned:
If you look at my forum post:
https://forum.arduino.cc/t/performance-of-sd-and-usb-drives-on-portenta-h7-and-giga/1198411

You can see some comparisons of the times mentioned in the previous message versus:
Teensy Micromod(SDIO), GIGA MBED, Portenta

Teensy MicroMod: !!File:Sharon Barn.bmp Time:1817 writeRect calls:0

GIGA:
SDStick: !!File:/usb/Sharon Barn.bmp Time:28777 writeRect calls:0
SDFat: !!File:Sharon Barn.bmp Time:29339 writeRect calls:0

Portenta: (Note I did not write down the full numbers but:
SDFat (SPI): same timing as GIGA or about 30 seconds
SDIO: about 19 seconds.

@facchinm
Copy link
Member

@KurtE I like option

c) change the SPI object, to use protected: instead of private: than one could create a new sub-class

the most, since it won't touch the existing APIs but will allow a much greater degree of freedom for hacking without touching the code.
Would you mind submitting a PR?

KurtE added a commit to KurtE/ArduinoCore-zephyr that referenced this issue Dec 18, 2024
…iables

resolves arduino#13

And: provides for c) option in arduino#14

In particular, added another config option (config16) that is initialized at beginTransaction with same
settings as config except word size set to 16 instead of 8.

Also: updated begin() to call beginTransaction with default SPISettings, and the endTransaction, such that sketches that don't call beginTransaction output at the default settings.

Changed the header file private: to protected
@KurtE
Copy link
Author

KurtE commented Dec 18, 2024

As I mentioned in PR#18

I changed private: to protected:

I created a quick and dirty sub-class

class wrapped_SPI : public arduino::ZephyrSPI {
  public:
    const struct device *SPIDevice() {return spi_dev;}
    struct spi_config *getConfig()  {return &config;}
    struct spi_config *getConfig16() {return &config16;}
};

Had sketch with some code to see if I could use this. The only way I saw was to
cast a pointer to an SPI object to a pointer of the class above.

I then verified I got the same information from this as I was getting by my hack version:

uint32_t *p = (uint32_t *)&TFT_SPI;
    Serial.print("PSPI: 0x");
    Serial.println((uint32_t)p, HEX);
    spi_dev = (const struct device *)p[1];
    memset((void *)&config16, 0, sizeof(config16));
    config16.frequency = TFT_SPEED;
    config16.operation = SPI_WORD_SET(16) | SPI_TRANSFER_MSB;
    Serial.print("Get zspi and config by hack: 0x");
    Serial.print((uint32_t)spi_dev, HEX);
    Serial.print(" 0x");
    Serial.println((uint32_t)&p[2], HEX);

    Serial.print("Try SubClass: 0x");
    wrapped_SPI *pwspi = (wrapped_SPI*)&TFT_SPI;
    Serial.print((uint32_t)pwspi->SPIDevice(), HEX);
    Serial.print(" 0x");
    Serial.print((uint32_t)pwspi->getConfig(), HEX);
    Serial.print(" 0x");
    Serial.println((uint32_t)pwspi->getConfig16(), HEX);

and the output was correct:

PSPI: 0x24013D74
Get zspi and config by hack: 0x805A200 0x24013D7C
Try SubClass: 0x805A200 0x24013D7C 0x24013D90

@KurtE
Copy link
Author

KurtE commented Dec 24, 2024

@facchinm - This could be own issue, or part of this one:

Asynchronous SPI transfers:
There are several libraries, which often do some form of Asynchronous SPI transfers. Wondering what is the best way to a accomplish this on Arduino/Zephyr?

Reading the Zephyr documents, I see they have some functionality to do this, including:

static inline int spi_transceive_cb(const struct device *dev,
				    const struct spi_config *config,
				    const struct spi_buf_set *tx_bufs,
				    const struct spi_buf_set *rx_bufs,
				    spi_callback_t callback,
				    void *userdata)
{
	const struct spi_driver_api *api =
		(const struct spi_driver_api *)dev->api;

	return api->transceive_async(dev, config, tx_bufs, rx_bufs, callback, userdata);
}

However, this API is under the: #if defined(CONFIG_SPI_ASYNC)
Which I added to my arguino_giga_r1_m7.conf file
Which then example sketch compiled but returned error code of not supported. Found I need to also add the define:

CONFIG_SPI_ASYNC=y
CONFIG_SPI_STM32_INTERRUPT=y

Which my example sketch now runs.



//=============================================================================
// Simple image (BMP optional JPEG and PNG) display program, which if the
// sketch is built with one of the USB Types which include MTP support
//=============================================================================
#include <SPI.h>

//-----------------------------------------------------------------------------
// ILI9341 displays
//-----------------------------------------------------------------------------
#include "ILI9341_GIGA_zephyr.h"
#include <elapsedMillis.h>
#define CS_SD 6
#define TFT_DC 9
#define TFT_RST 8
#define TFT_CS 10
ILI9341_GIGA_n tft(TFT_CS, TFT_DC, TFT_RST);
#define TFT_SPI SPI1
#define TFT_SPEED 20000000u



//-----------------------------------------------------------------------------
// Some common things.
//-----------------------------------------------------------------------------

#define BLUE 0x001F
#define BLACK 0x0000
#define WHITE 0xFFFF
#define GREEN 0x07E0
#define RED 0xf800

class wrapped_SPI : public arduino::ZephyrSPI {
  public:
    const struct device *SPIDevice() {return spi_dev;}
    struct spi_config *getConfig()  {return &config;}
    struct spi_config *getConfig16() {return &config16;}
};


//****************************************************************************
// Setup
//****************************************************************************
const struct device *spi_dev = nullptr;
struct spi_config config16;
struct spi_buf tx_buf = { .buf = nullptr, .len = 320 * 240 * 2 };
const struct spi_buf_set tx_buf_set = { .buffers = &tx_buf, .count = 1 };
uint16_t *pframeBuffer;

extern void UpdateScreen();

void setup(void) {
    // Keep the SD card inactive while working the display.
    delay(20);

    Serial.begin(115200);
    while (!Serial && millis() < 3000)
        ;
    // give chance to debug some display startups...

    //-----------------------------------------------------------------------------
    // initialize display
    //-----------------------------------------------------------------------------

    pinMode(TFT_CS, OUTPUT);
    pinMode(CS_SD, OUTPUT);
    digitalWrite(TFT_CS, HIGH);

    Serial.println("*** start up ILI9341 ***");
    tft.setSPI(TFT_SPI);  // temporary...
    tft.begin(TFT_SPEED);
    tft.setRotation(1);

    tft.fillScreen(RED);
    delay(500);
    tft.fillScreen(GREEN);
    delay(500);
    tft.fillScreen(BLUE);
    delay(500);

    tft.useFrameBuffer(true);

    // setup to use zephyr spi
    uint32_t *p = (uint32_t *)&TFT_SPI;
    Serial.print("PSPI: 0x");
    Serial.println((uint32_t)p, HEX);
    spi_dev = (const struct device *)p[1];
    memset((void *)&config16, 0, sizeof(config16));
    config16.frequency = TFT_SPEED;
    config16.operation = SPI_WORD_SET(16) | SPI_TRANSFER_MSB;
    Serial.print("Get zspi and config by hack: 0x");
    Serial.print((uint32_t)spi_dev, HEX);
    Serial.print(" 0x");
    Serial.println((uint32_t)&p[2], HEX);

    Serial.print("Try SubClass: 0x");
    wrapped_SPI *pwspi = (wrapped_SPI*)&TFT_SPI;
    Serial.print((uint32_t)pwspi->SPIDevice(), HEX);
    Serial.print(" 0x");
    Serial.print((uint32_t)pwspi->getConfig(), HEX);
    Serial.print(" 0x");
    Serial.println((uint32_t)pwspi->getConfig16(), HEX);


    tx_buf.buf = pframeBuffer = tft.getFrameBuffer();
}

//****************************************************************************
// loop
//****************************************************************************
uint8_t update_mode = 0;

volatile bool spi_async_active = false;

void spi_cb(const struct device *dev, int result, void *data) {
  spi_async_active = false;
}

void loop() {
    switch (update_mode) {
        case 0: Serial.print("updateScreen: "); break;
        case 1: Serial.print("SPI.transfer: "); break;
        case 2: Serial.print("SPI.transfer16: "); break;
        case 3: Serial.print("Zephyr: "); break;
        case 4: Serial.print("Zephyr Async: "); break;
    }
    elapsedMillis em;
    tft.fillScreen(BLUE);
    UpdateScreen();
    tft.fillScreen(BLACK);
    UpdateScreen();
    tft.fillScreen(WHITE);
    UpdateScreen();
    tft.fillScreen(GREEN);
    UpdateScreen();
    tft.fillScreen(RED);
    UpdateScreen();
    Serial.println((uint32_t)em);
    update_mode++;
    if (update_mode > 4) {
        update_mode = 0;
        Serial.println();
    }
    delay(250);
    if (Serial.available()) {
      while (Serial.read() != -1);
      Serial.println("Paused");
      while (Serial.read() == -1);
      while (Serial.read() != -1);

    }
}



void UpdateScreen() {
    if (update_mode == 0) {
        tft.updateScreen();
    } else {
        tft.beginSPITransaction(TFT_SPEED);
        tft.setAddr(0, 0, tft.width() - 1, tft.height() - 1);
        tft.writecommand_cont(ILI9341_RAMWR);
        tft.setDataMode();
        uint32_t frame_size = tft.width() * tft.height();
        if (update_mode == 1) {
            for (uint32_t i = 0; i < frame_size; i++) {
                uint16_t color = pframeBuffer[i];
                TFT_SPI.transfer(color >> 8);
                TFT_SPI.transfer(color & 0xff);
            }
        } else if (update_mode == 2) {
            for (uint32_t i = 0; i < frame_size; i++) {
                TFT_SPI.transfer16(pframeBuffer[i]);
            }
        } else if (update_mode == 3) {
            spi_transceive(spi_dev, &config16, &tx_buf_set, nullptr);
        } else {
            spi_async_active = true;
            int iRet;
            if ((iRet = spi_transceive_cb(spi_dev, &config16, &tx_buf_set, nullptr, &spi_cb, &tft)) < 0) {
              Serial.print("spi_transceiveCB failed:");
              Serial.println(iRet);
            } else {
              while (spi_async_active) {}
            }
        }
        tft.endSPITransaction();
    }
}

Note I have not gone through the code to see if this is done by DMA or interrupt or polling or ???

Suppose I instead wish to do this outside of this, can I setup my own DMA to do this? I have a version of my
ILI9341_GIGA_n library, where in the constructor, I can pass it a DMA_Stream_typedef object, which I default to: DMA1_Stream1,
where I manually setup the hardware registers and do a DMA operation?
Can/Should I do something like that Zephyr?

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants